The LINHD team at UNED (http://linhd.uned.es/) is very happy to announce that it has been awarded one of the Pelagios microgrants for a proposal to extend Pelagios into the Mediaeval Iberian world. We will identify placenames from a very important source for Spanish Philology: the Biblioteca Digital de Textos del Español Antiguo (BIDTEA, Hispanic Seminary of Mediaeval Studies-University of Wisconsin-Madison) (http://www.hispanicseminary.org/manual-es.htm), concentrating on the Documentos Castellanos de Alfonso X* and the regions of Andalucía, Galicia, León, Murcia, Castilla la Nueva and Castilla la Vieja.
The project has two main objectives:
- to contribute to the extension of Pelagios by adding Mediaeval Iberian placenames.
- to develop a framework for the knowledge extraction of Mediaeval Iberian placenames in reliable and verifiable digital sources in Spanish.
Mediaeval Iberian placenames are not easy to identify and locate. Not only is there no Spanish/Iberian gazetteer with which to identify geographical places from such literary sources or map them to current geography; there is also, at present, no system in place to regularise the names used in the surviving texts, which contain many phonetic and linguistic variations. In addition, many of the places that appear in the literary texts were either invented by their authors or were named in different ways over the centuries.
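To make the regularisation problem concrete, here is a minimal Python sketch of how spelling variants of the same placename might be grouped under a shared key. It is not the project's method: the function names (`variant_key`, `group_variants`), the folding rules (dropping diacritics, treating u/v/b as interchangeable) and the sample spellings are all assumptions chosen for illustration, and rules this crude will also merge some names that ought to stay distinct.

```python
import unicodedata
from collections import defaultdict

# Hypothetical folding rule: treat "v" and "b" as spellings of "u",
# since mediaeval scribes often used these letters interchangeably.
_UVB = str.maketrans("vb", "uu")


def variant_key(name: str) -> str:
    """Return a crude comparison key for a placename spelling."""
    lowered = name.lower()
    # Decompose accented letters (e.g. "ó" becomes "o" plus a combining mark)
    # and drop the combining marks; this also turns "ç" into "c".
    decomposed = unicodedata.normalize("NFD", lowered)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(_UVB)


def group_variants(names):
    """Group spellings that share the same comparison key."""
    groups = defaultdict(list)
    for name in names:
        groups[variant_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Sample spellings picked for illustration, not taken from the corpus.
    sample = ["Seuilla", "Sevilla", "Cordoua", "Córdoba", "Murçia", "Murcia"]
    for key, spellings in group_variants(sample).items():
        print(key, spellings)
```

A key of this kind only proposes candidate groupings. Deciding whether two spellings really name the same place still needs a philologist's judgement.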
To address these issues, we are using the Pelagios microgrant to develop a Mediaeval Iberian knowledge extraction framework, which will have the following phases:
1) Acquisition and identification: To acquire the set of Mediaeval placenames, we will extract Old Spanish geographical names from the reliable and verifiable digital sources mentioned above. We will also test different Natural Language Processing (NLP) tools for their ability to extract geographical entities automatically through named entity recognition (NER). NER tools such as Freeling (http://nlp.lsi.upc.edu/freeling/) will be specially adapted to the Old Spanish variants.
2) Knowledge creation: Once the Mediaeval Iberian placenames are extracted, identified and disambiguated, we will add them to the Pleiades Gazetteer, which is one of the major gazetteers that provides Pelagios with its linked system. All the new information about Mediaeval Iberian placenames will be categorised and indexed semantically as URIs, so that it can be reused and linked to other data models in a linked open data context (e.g. resources from the National Library of Spain, such as digitised corpora, literary works, dictionaries, wiki sources…).
3) Evaluation: Using Pelagios as a research framework to test the system, we will enter the texts into the Recogito API and check whether the new functionalities of the installed vocabulary deliver the expected results (such as whether the Mediaeval Spanish places mentioned in the texts are correctly retrieved). For this step, we will ask different users from our community to try the API with different texts.
Finally, we will mark up the texts in Recogito.
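One way to quantify whether places are "correctly retrieved" in the evaluation phase is to compare the placenames a tool returns against a hand-checked list for the same text. The sketch below computes precision and recall in Python. It does not call Recogito or any real API; the function name `precision_recall` and the two sample sets are hypothetical and serve only to illustrate the calculation.

```python
def precision_recall(retrieved: set, expected: set) -> tuple:
    """Compare retrieved placenames with a hand-checked reference set.

    precision = share of retrieved names that are correct
    recall    = share of expected names that were retrieved
    """
    true_positives = len(retrieved & expected)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sets for illustration only, not project results.
    expected = {"Seuilla", "Cordoua", "Murçia", "Leon"}
    retrieved = {"Seuilla", "Cordoua", "Toledo"}
    precision, recall = precision_recall(retrieved, expected)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both figures matters here: a tool can score high precision simply by returning very few names, which would hide the variant spellings it misses.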
In this way, we hope that our project may also serve as a pilot for a larger initiative looking to ingest Mediaeval Iberian data into the Pelagios Project. Any researchers interested in Mediaeval Iberian texts and GIS will benefit greatly from this resource.
We had our first meeting at our DH Summer School at UNED, Madrid, where Pelagios’s Community Manager Pau de Soto presented a workshop on Pelagios, so that everyone on the team was familiar with the working structure. Next up is the I National Digital Humanities Conference in Argentina (7-9 November 2016), where Gimena del Rio and Melisa Martí will teach classes on working with Recogito, using the project texts as examples. (If you are interested, you can register for that conference here: http://www.aacademica.org/aahd.congreso/tabs/register.)
Ours is a highly interdisciplinary team, with experts in Mediaeval Spanish Literature collaborating with a computer scientist specialising in data analytics and data processing. We work at LINHD (the Digital Humanities Innovation Lab at UNED), a DH centre focused on semantic web technologies and Spanish literature. This group is working together on several related projects, including POSTDATA (Poetry standardization and linked open data) and EVILINHD (a virtual research environment for digital humanities research).
We are currently working hard on the project and would be grateful for any comments and suggestions!
Gimena del Rio Riande (CONICET, LINHD UNED)
Elena González-Blanco García (LINHD UNED)
Clara Isabel Martínez Cantón (LINHD UNED)
Ma. Jesús Redondo Rodríguez and all the LINHD team
*The texts are edited and revised by us following the Textos y Concordancias Electrónicos de Documentos Castellanos de Alfonso X by Mª Teresa Herrera, Mª Nieves Sánchez, Mª Estela González de Fauve and Mª Purificación Zabía (Madison: Hispanic Seminary of Medieval Studies, 1999).