Mediaeval Iberia. Final report
Mediaeval Iberia’s pilot phase has come to an end, but a new project has started.
In June 2016, the LINHD team at UNED (http://linhd.uned.es/) was awarded a Pelagios microgrant to extend Pelagios Commons to the Mediaeval Iberian world. A brief report on the work carried out in the first months of the project, which was coordinated and curated by Dr. Gimena del Rio Riande (CONICET, Argentina), can be accessed at http://commons.pelagios.org/2016/10/mediaeval-iberia-through-pelagios-commons/.
Digital Humanities is a relatively new field in Spanish-speaking countries, and many projects are currently under way. However, specific tools and software are often unknown to many researchers, and the dialogue between Northern and Western Digital Humanities projects and developments and those of the Spanish and Latin American Humanidades Digitales is limited. Language and a lack of institutional and financial support are the barriers researchers most commonly encounter in this scientific field (Rio Riande 2015, 2016a, 2016b).
The Laboratorio de Innovación en Humanidades Digitales (LINHD) at the Universidad Nacional de Educación a Distancia (UNED) in Madrid, Spain, directed by Dr. Elena González Blanco, was founded in 2014 and has since been developing and supporting Digital Humanities projects from a global perspective. In this sense, LINHD acts both as an observatory and as a crossroads where a consolidated team of researchers from Spain and Argentina works together, building on successful experiences in the Digital Humanities field and seeking collaboration and co-working through research projects, courses, summer schools, workshops and other academic events and activities (González Blanco, Martínez Cantón & Rio Riande 2016). Our team is interdisciplinary and focused on Digital Humanities: it brings together humanities scholars specializing in Mediaeval Spanish Literature and computer scientists with expertise in data analytics and data processing. Several LINHD projects, such as POSTDATA (Poetry Standardization and Linked Open Data) and EVILINHD (a virtual research environment for Digital Humanities research), are closely related to the Medieval Iberia project and to the interest of Pelagios Commons in the Semantic Web and Linked Open Data.
LINHD has been following the developments of the Pelagios team since 2014. Since then, Pelagios tools such as Recogito have been presented and/or used at several events in Argentina and Spain organized by the Asociación Argentina de Humanidades Digitales (AAHD) and LINHD. For instance, Recogito was presented in November 2014 at the First National Conference in Digital Humanities in Argentina, and Pelagios workshops were held at the LINHD-UNED Summer School in Madrid (July 2016) and at the First International DH Conference in Buenos Aires (November 2016). All these events were coordinated by Gimena del Rio Riande and Elena González Blanco. The Medieval Iberia microgrant formalized this collaboration and took further steps towards extending Pelagios Commons to Spanish-speaking research groups and general users.
In the first phase of our project (July-December 2016), the Mediaeval Iberia team worked on:
1) A revised digital edition of the Documentos Castellanos de Alfonso X (from Textos y Concordancias Electrónicos de Documentos Castellanos de Alfonso X by Mª Teresa Herrera, Mª Nieves Sánchez, Mª Estela González de Fauve and Mª Purificación Zabía. Madison: Hispanic Seminary of Medieval Studies, 1999). These texts were kindly shared by Francisco Gago Jover and Javier Pueyo from the Biblioteca Digital de Textos del Español Antiguo (BIDTEA, Hispanic Seminary of Medieval Studies, University of Wisconsin-Madison) (http://www.hispanicseminary.org/manual-es.htm). Gimena del Rio Riande revised and edited the documents with technical support from Pablo Ruiz Fabo (LINHD).
2) A specific gazetteer in Pleiades for the Medieval Iberian world, based on the places in the Documentos Castellanos de Alfonso X… We concentrated on the regions of Andalucía, Galicia, León and Murcia. After contacting the Pleiades team several times and holding several Skype meetings with them, María Jesús Redondo Rodríguez (LINHD) completed more than one hundred and fifty locations in Pleiades (a list of the places, with their names and variants, is available here: https://docs.google.com/document/d/1y4hFnG0kwqZE9O2zgowtphQpcoGUinrXw2nrc7vrajw/edit). All these data were curated by Gimena del Rio Riande.
3) A Recogito Virtual Lab. A joint project was organized in Spain and Argentina (LINHD and CONICET) to tag the places in the Documentos Castellanos de Alfonso X… Dr. Clara Martínez Cantón (UNED) prepared a Spanish tutorial for Recogito (http://linhd.es/el-2o-simposio-internacional-pelagios/) based on the class given by Pau de Soto at LINHD’s 2016 Summer School. The tutorial was revised by Gimena del Rio Riande. Clara Martínez Cantón and Elena González Blanco started working with UNED students, using Recogito to tag the texts from Andalucía, and tested the new collaborative version of Recogito. Gimena del Rio and Romina de León (CAICYT, CONICET) then tagged the texts from Murcia.
4) An NLP tool for locating and labeling places in Spanish texts, MEGHISTa, was developed by María Luisa Díez Platas (LINHD).
Some important issues had to be addressed before we started this work. Mediaeval Iberian placenames are not easy to identify and locate, for several reasons. Firstly, there is no Spanish/Iberian gazetteer for identifying geographical places in literary sources and their equivalents in current geography. Moreover, names and terms need to be regularized, since surviving Mediaeval texts contain many phonetic and linguistic variants that prevent named-entity recognition (NER) systems from recognizing them automatically. Finally, some locations that appear in literary texts were invented by Mediaeval authors and never existed, while others were named differently over the centuries. After testing different Natural Language Processing (NLP) tools for automatically extracting geographical entities with NER systems, such as Freeling (http://nlp.lsi.upc.edu/freeling/), we developed a Mediaeval Iberian knowledge extraction framework that implements the different phases of a knowledge extraction procedure.
Our tool, MEGHISTa, aims to identify geographical placenames in medieval Spanish texts and integrate them into Recogito. The current version, still being implemented and tested and available on GitHub, includes a text processing module that carries out the lexical analysis and the subsequent processes needed to identify the places that appear in the texts. MEGHISTa combines lexical, syntactic and semantic analysis with NLP technologies. It recognizes names through two processes:
1. Identification of terms in context: we defined and implemented a hierarchy of contexts with polymorphic functions. These contexts make it possible to recognize and detect different kinds of names and reduce ambiguity. Complex structures such as nominal phrases, lists of terms and appositions, among others, are processed.
2. Generation of variants of terms that have not been found in gazetteers and dictionaries: morphological rules reflecting the evolution from Latin to Spanish are applied.
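The variant-generation process (2) can be illustrated with a minimal Python sketch. The rewrite rules below are hypothetical examples of medieval-to-modern spelling correspondences (initial f- > h-, ç > z, x > j, nn > ñ); MEGHISTa's actual rule set is not described in this report, and the function name `generate_variants` is our own. Rules are applied repeatedly, so several changes can combine in one word.

```python
import re
from typing import List

# Hypothetical, illustrative rewrite rules (regex pattern, replacement) that
# map medieval spellings towards modern Castilian forms. These are NOT
# MEGHISTa's actual rules, which are not published in this report.
RULES = [
    (r"^f", "h"),           # initial f- > h-   (Fenares > Henares)
    (r"ç", "z"),            # ç > z             (Çamora > Zamora)
    (r"x", "j"),            # x > j             (Xerez > Jerez)
    (r"nn", "ñ"),           # nn > ñ            (Espanna > España)
    (r"ss", "s"),           # ss > s
    (r"u(?=[aeio])", "v"),  # u before a vowel > v (Cordoua > Cordova)
]


def generate_variants(word: str, max_steps: int = 3) -> List[str]:
    """Return the variants of `word` produced by applying the rules
    breadth-first for up to `max_steps` rounds (original word excluded)."""
    original = word.lower()
    seen = {original}
    frontier = [original]
    for _ in range(max_steps):
        next_frontier = []
        for form in frontier:
            for pattern, repl in RULES:
                new = re.sub(pattern, repl, form)
                if new not in seen:
                    seen.add(new)
                    next_frontier.append(new)
        frontier = next_frontier
    seen.discard(original)
    return sorted(seen)
```

For example, `generate_variants("Çamora")` yields `["zamora"]`, while `generate_variants("Xuarez")` yields three forms because two rules apply independently. Such over-generated forms are harmless as long as every candidate is afterwards checked against the dictionaries.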
MEGHISTa uses these gazetteers and dictionaries:
1. Pleiades (Medieval Iberia placenames)
2. GeoNames (this gazetteer is queried only to resolve variants, since it works with modern terms)
3. Freeling Old Spanish Dictionary
4. Medieval dictionary that the tool generates
The lexical analyzer recognizes words and complex constructions. It identifies sentence elements preceded by specific words such as “donna” or “senhor”, structures containing prepositions, and elements preceded by words that denote geographical features, such as “river” or “church”.
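The idea of a hierarchy of contexts with polymorphic functions, applied to the trigger words just described, can be sketched as a small Python class hierarchy. This is an illustrative sketch, not MEGHISTa's code: the class names, the trigger lists (including "don", and Old Spanish "rio" and "eglesia" standing in for "river" and "church") and the "earlier context wins" rule are all assumptions. Each subclass overrides one method of the base class, and trying the contexts in a fixed priority order is one simple way to reduce ambiguity.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str   # the recognized name
    kind: str   # label of the context that produced it
    start: int  # index of the first token of the name
    end: int    # index one past the last token of the name


class Context:
    """Base context: a trigger word followed by a run of capitalized tokens."""
    kind = "generic"
    triggers: frozenset = frozenset()

    def name_start(self, tokens: List[str], i: int) -> Optional[int]:
        # The name begins right after tokens[i], if tokens[i] is a trigger.
        return i + 1 if tokens[i].lower() in self.triggers else None

    def accepts(self, name: str) -> bool:
        return True

    def extract(self, tokens: List[str], i: int) -> Optional[Candidate]:
        j = self.name_start(tokens, i)
        if j is None:
            return None
        k = j
        while k < len(tokens) and tokens[k][:1].isupper():
            k += 1  # extend over consecutive capitalized tokens
        name = " ".join(tokens[j:k])
        if not name or not self.accepts(name):
            return None
        return Candidate(name, self.kind, j, k)


class PersonTitleContext(Context):
    kind = "person"
    triggers = frozenset({"don", "donna", "senhor"})


class GeoFeatureContext(Context):
    kind = "place"
    triggers = frozenset({"rio", "eglesia", "villa"})

    def name_start(self, tokens, i):
        # Override: skip a linking "de"/"del" ("eglesia de Santa Maria").
        j = super().name_start(tokens, i)
        if j is not None and j < len(tokens) and tokens[j].lower() in {"de", "del"}:
            j += 1
        return j


class PrepositionContext(Context):
    kind = "place?"  # weaker evidence: a candidate to be checked later
    triggers = frozenset({"de", "en"})

    def accepts(self, name):
        # Override: reject frequent non-place names (hypothetical stoplist).
        return name not in {"Dios"}


# Priority order: earlier contexts claim the tokens first.
CONTEXTS = [PersonTitleContext(), GeoFeatureContext(), PrepositionContext()]


def find_candidates(text: str) -> List[Candidate]:
    tokens = re.findall(r"\w+", text)
    found, i = [], 0
    while i < len(tokens):
        for ctx in CONTEXTS:
            cand = ctx.extract(tokens, i)
            if cand is not None:
                found.append(cand)
                i = cand.end
                break
        else:
            i += 1
    return found
```

On the invented sentence "Yo don Alfonso mando a la eglesia de Santa Maria de Sevilla", this returns "Alfonso" as a person, "Santa Maria" as a place and "Sevilla" as a possible place; in "por la gracia de Dios" nothing is returned.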
The first processing stage identifies candidate place names (single words or word sequences) by exact matching against simple or compound words in the dictionaries. The search gives priority to the medieval gazetteer. If a word cannot be found there, the tool looks in Freeling, then in Pleiades and finally in GeoNames, checking both main entries and their variants. When there is no exact match, new words are generated from the extracted ones by applying morphological rules that reflect the evolution of words from Latin to present-day Castilian. Each generated word is entered in a table together with its distance from the original word (calculated from the Levenshtein distance and N-grams). These words are compared with the dictionary entries, and the tool extracts the match closest to the original word. If none of the new words matches any dictionary entry exactly, the tool searches for the dictionary words at minimum distance from them and keeps those closest to the original word. MEGHISTa’s outputs are:
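The lookup cascade and distance-based fallback can be sketched in Python as follows. The priority order of the sources follows the description above, but everything else is an assumption: the report does not say how Levenshtein and N-gram scores are combined (here, the mean of normalized Levenshtein distance and bigram Jaccard distance), the source contents are invented toy data, and `toy_variants` is a hypothetical stand-in for the morphological rules.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Source = Dict[str, Set[str]]    # main entry -> recorded variants
Match = Tuple[str, str, float]  # (source name, main entry, distance)


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def bigrams(s: str) -> Set[str]:
    s = f"#{s}#"  # padding lets the first and last letters count
    return {s[i:i + 2] for i in range(len(s) - 1)}


def distance(a: str, b: str) -> float:
    """ASSUMPTION: mean of normalized Levenshtein and bigram Jaccard distance."""
    a, b = a.lower(), b.lower()
    lev = levenshtein(a, b) / max(len(a), len(b), 1)
    ga, gb = bigrams(a), bigrams(b)
    return (lev + 1 - len(ga & gb) / len(ga | gb)) / 2


def exact(word: str, sources: List[Tuple[str, Source]]) -> Optional[Tuple[str, str]]:
    """Search the sources in priority order, in main entries and variants."""
    w = word.lower()
    for name, src in sources:
        for entry, variants in src.items():
            if w == entry.lower() or w in {v.lower() for v in variants}:
                return name, entry
    return None


def resolve(word: str, sources: List[Tuple[str, Source]],
            variant_fn: Callable[[str], Iterable[str]]) -> Optional[Match]:
    hit = exact(word, sources)
    if hit:
        return hit[0], hit[1], 0.0
    # Table of generated forms, sorted by distance from the original word.
    table = sorted((distance(word, v), v) for v in variant_fn(word))
    for d, v in table:
        hit = exact(v, sources)
        if hit:  # the closest generated form with an exact match wins
            return hit[0], hit[1], d
    # No exact match: nearest dictionary entry to the word or any variant.
    best: Optional[Match] = None
    for v in [word] + [v for _, v in table]:
        for name, src in sources:
            for entry in src:
                d = distance(v, entry)
                if best is None or d < best[2]:
                    best = (name, entry, d)
    return best


# Toy data in priority order (invented for illustration only).
SOURCES = [
    ("medieval", {"Sevilla": {"Sibilia"}}),
    ("freeling", {}),
    ("pleiades", {"Zamora": {"Çamora"}, "Henares": set()}),
    ("geonames", {"Jerez de la Frontera": {"Jerez"}}),
]


def toy_variants(w: str) -> Set[str]:
    # Hypothetical stand-in for the morphological rules: f- > h-, x > j.
    w = w.lower()
    out = set()
    if w.startswith("f"):
        out.add("h" + w[1:])
    if "x" in w:
        out.add(w.replace("x", "j"))
    return out
```

With this toy data, "Sibilia" is found directly in the medieval gazetteer, "Fenares" is resolved to "Henares" in Pleiades through the generated form "henares", and "Sevila", for which no rule applies, falls back to the nearest entry, "Sevilla".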
1. XML file with detected entities
2. CSV file with the list of identified terms
3. New terms incorporated into medieval gazetteer
4. Possible new terms that are incorporated into a new gazetteer for validation
5. List of proper names
6. List of names of saints
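The first outputs are standard file formats. The following sketch shows one way to produce them with Python's standard library, together with an assumed rule for splitting new terms between the medieval gazetteer and the gazetteer awaiting validation (outputs 3 and 4). The XML element and attribute names, the CSV columns, the function names and the 0.3 distance threshold are all hypothetical; the report does not specify MEGHISTa's formats.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple


def to_xml(entities: List[Dict[str, str]]) -> str:
    """Serialize detected entities (hypothetical element/attribute names)."""
    root = ET.Element("entities")
    for ent in entities:
        el = ET.SubElement(root, "entity", type=ent["type"], source=ent["source"])
        el.text = ent["text"]
    return ET.tostring(root, encoding="unicode")


def to_csv(entities: List[Dict[str, str]]) -> str:
    """One row per identified term (hypothetical columns)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "type", "source"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(entities)
    return buf.getvalue()


def route_new_terms(resolved: List[Tuple[str, str, float]], threshold: float = 0.3
                    ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """ASSUMPTION: forms resolved at a small distance are added to the medieval
    gazetteer as variants of their entry; the rest await manual validation."""
    accepted: Dict[str, Set[str]] = {}
    pending: Dict[str, Set[str]] = {}
    for form, entry, dist in resolved:
        target = accepted if dist <= threshold else pending
        target.setdefault(entry, set()).add(form)
    return accepted, pending
```

For a single entity `{"text": "Sevilla", "type": "place", "source": "medieval"}`, `to_csv` returns a header row followed by `Sevilla,place,medieval`.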
Padlet with Medieval Iberia conclusions: https://padlet.com/guineveregime/hbmexfyf0mjx
5) Specific courses and events were organized in 2016 to promote and disseminate Pelagios Commons in Spanish and the Medieval Iberia project. A starting point for Medieval Iberia was our DH Summer School at UNED in Madrid in July, where Pau de Soto gave a workshop on Pelagios (http://linhd.es/en/p/dh-summer-2016/). At the First International Digital Humanities Conference in Argentina (Buenos Aires, 7-9 November 2016), Gimena del Rio and Melisa Martí taught participants how to work in Recogito, using the texts of the project as examples.
Melisa Martí (CONICET-UBA) also published the article “Herramientas de geolocalización y su aplicación al estudio de relatos de viajes medievales” in the journal Luthor (http://www.revistaluthor.com.ar/spip.php?article144), in which she discusses the Pelagios project and Recogito as a tool for understanding Medieval Iberian chronicles.
LINHD also organized, together with Pelagios Commons, the 2nd International Symposium “Linked Pasts”, at which the team proposed a Multilingualism Working Group for Pelagios Commons (http://commons.pelagios.org/working-groups/).
We hope our project serves as a pilot for a larger initiative for Mediaeval Iberian data in Pelagios Commons. We will continue this work within our ongoing projects: POSTDATA (Poetry Standardization and Linked Open Data, http://postdata.linhd.es) at LINHD, and Humanidades Digitales CAICYT (CONICET), http://www.caicyt-conicet.gov.ar/micrositios/mhedi/.
Medieval Iberia Team
Gimena del Rio Riande (CONICET, Argentina-LINHD)
Elena González Blanco (LINHD-UNED)
Clara Martínez Cantón (LINHD-UNED)
María Luisa Díez Platas (LINHD-UNED)
Francisco Gago Jover (Hispanic Seminary of Medieval Studies)
Sofía García (UNED)
Romina de León (CAICYT-CONICET, Argentina)
Javier Pueyo (Hispanic Seminary of Medieval Studies)
María Jesús Redondo Rodríguez (external collaborator, LINHD)
Ana Rodriguez Perez (UNED)
Salvador Ros (UNED)
Pablo Ruiz Fabo (LINHD)
Melisa Martí (CONICET-UBA, Argentina)
References
González Blanco, Elena (2016): “Un nuevo camino hacia las humanidades digitales: el Laboratorio de Innovación en Humanidades Digitales de la UNED (LINHD)”, in Signa: Revista de la Asociación Española de Semiótica (UNED), no. 25, pp. 79-93. http://revistas.uned.es/index.php/signa/article/view/16959
González Blanco, Elena; Martínez Cantón, Clara & Rio Riande, Gimena del (2016): “El Laboratorio de Innovación en Humanidades Digitales y la redefinición del perfil del humanista y la academia en el siglo XXI”, in Actas I Jornadas de Humanidades Digitales de la AAHD, Buenos Aires: Editorial de la Facultad de Filosofía y Letras, pp. 160-175.
Rio Riande, Gimena del (2015): “Humanidades Digitales. Mito, actualidad y condiciones de posibilidad en España y América Latina”, ArtyHum, monográfico 1, pp. 7-19. https://www.artyhum.com/descargas/monograficos/MONOGR%C3%81FICO%20HD.pdf
Rio Riande, Gimena del (2016a): “De todo lo visible y lo invisible o volver a pensar la investigación en humanidades digitales”, in Signa: Revista de la Asociación Española de Semiótica (UNED), no. 25, pp. 95-108.
Rio Riande, Gimena del (2016b): “Humanidades Digitales: estándares para su consolidación en el campo científico argentino”, 48° Reunión Nacional de Bibliotecarios, Bibliotecarios: integración, identidad regional y abordaje transversal, http://www.abgra.org.ar/documentos/48RNB_20160419_1400-Ponencia.pdf