About two weeks ago, on Friday October 31, we held the first of two annotation workshops funded through the Open Humanities Awards, designed to gather data through our Recogito “crowdsourcing” interface. The Heidelberg University Institute of Geography kindly agreed to be our host for this inaugural event. A big thank you goes to Lukas Loos for setting up our visit and taking care of local organization, and to Armin Volkmann for his spontaneous decision to merge his geo-archaeology seminar with our workshop on that day.
The answer to this famous nursery rhyme – “three score and ten”, i.e. 70 km – seems outrageously high for a day’s journey, no matter “if your heels are [exceptionally] nimble and light”. (Even swift-footed Achilles would struggle to cover 70 km in a day!) So, what are we to make of it? How can we evaluate such a distance figure?
This is where the “database ancient measurements” comes in. The project was initially sponsored by Berlin’s excellence cluster TOPOI and is now managed by my IT whiz Rainer Streng, who set it up in MS Access and programmed data exports into “Google Earth” and applications in “ArcGIS 10”. Irina Tupikova, who by day is an astronomer and mathematician, is also working on Ptolemy’s data, recalculating the spherical coordinates to the original measurements.
As far as we know, there is no comparable collection of this kind. Right now, our database includes nearly 100 ancient authors and their works, especially ancient geographers and historians (Strabo, Pliny, Herodotus, Thucydides etc.), but also minor works like the pseudo-Aristotelian de mundo or Horace’s Satires. All in all, our database holds 2466 “distances”, i.e., attested routes with two points and a number. (Among them are eleven routes for Babylon, and even bigger figures for a “day’s journey” than the 70 km in the nursery rhyme, if you are interested!)
What can one do with these data? We think: a lot! To start with:
- How accurate and reliable were ancient measurement data?
- What units are attested and how do they relate to each other? This is a basic and notorious question in the field of ancient metrology.
- Who measured or rather estimated distances in antiquity? Soldiers, explorers, merchants? Were there any attempts to map a whole country or empire and standardize the many distances in antiquity? If so, was this a “bottom-up” process done by practitioners like seamen or merchants or a “top-down” one, organized by a central administration?
But there are potentially far more searching questions, such as:
- How does an ancient author employ numbers, especially distances, as a means to engage with his readership and to bring home his own ideas or concepts? Authors like Herodotus or Thucydides were very careful (and sometimes even deceptive!) in using numbers in their narratives.
- How can we use measurement data to explore one of the most basic, important and comparable properties of space: its extension, its spatiality? If researchers concern themselves with spaces, they should not ignore this aspect (as they mostly do). Distances are a means to evaluate the different concepts of space the ancients had in mind. They also allow us to reconstruct not only the real maps of ancient geographers but also the “mental maps” of merchants, soldiers, intellectuals etc.
- How can a corpus of ancient measurement data allow us to reconstruct ancient routes and waterways and, in addition, social phenomena like migration or mobility? The ambitious application Rainer is working on now is a network of ancient routes and waterways (something like Orbis or Omnes Viae, but based on our measurement data).
To give just a small example:
The green line depicts the route between Tridentum and Rome, a route which, according to the “codex Theodosianus” (6.28.1), can be covered in 34 days. 34 days of travel are “normally” equivalent to c. 850 km (a day’s journey calculated as 25 km). The linear distance according to Google Earth is 477.58 km. But our ArcGIS model shows that the route on known Roman roads is in fact about 90 km longer, i.e. 567.40 km.
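The arithmetic behind this example is easy to check. The following is only a small sketch using the figures quoted above; the variable names are ours, and the “implied daily stage” is simply the road distance divided by the 34 days, not a value taken from the database.

```python
# Figures quoted in the text above.
TRAVEL_DAYS = 34      # Codex Theodosianus 6.28.1
KM_PER_DAY = 25       # conventional day's journey used in the text
LINEAR_KM = 477.58    # straight-line distance (Google Earth)
ROAD_KM = 567.40      # distance along known Roman roads (ArcGIS model)

# Distance one would "normally" expect from 34 days of travel.
expected_km = TRAVEL_DAYS * KM_PER_DAY

# How much longer the road route is than the straight line.
detour_km = ROAD_KM - LINEAR_KM

# Daily stage implied if the road route really took 34 days.
implied_km_per_day = ROAD_KM / TRAVEL_DAYS

print(f"Expected from {TRAVEL_DAYS} days: {expected_km} km")
print(f"Road minus linear: {detour_km:.2f} km")
print(f"Implied daily stage: {implied_km_per_day:.1f} km/day")
```

Seen this way, the 34 days allow for a considerably slower pace than the conventional 25 km per day.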
The meetings with the nimble-footed Pelagios team (zigzagging between several locations all over Berlin in one and a half days) helped us sharpen our own profile and scientific approach tremendously. Fine-tuning our data and making it compatible and interoperable with the other Pelagios partners will be undertaken over the upcoming months. Watch this space!
This month sees the start of another new and exciting phase of Pelagios. With funding from the Arts and Humanities Research Council’s Digital Transformations programme, we will be exploring the transformative potential of our linked open data network for doing research. In short our brief is to address the question, “ok, now we can link stuff online—so what?”
In response to the challenge posed by “data silos” (the mass of independently produced material uploaded onto the Web), since 2011 we have been developing the means of linking online resources via their common references to place. This has involved “annotating” the place names found in documents and aligning those references to a global gazetteer service (for the ancient world, this is Pleiades). Using Pleiades’s Uniform Resource Identifiers (or “social security numbers”) for each ancient place as our glue, it is now possible to agree that places mentioned in different materials are one and the same (e.g. Classical Athens and not “Athens, Georgia”). Users are now able to move seamlessly between and search the records of a growing list of international partners.
Thus each place annotation made in the document doesn’t just attach useful spatial information to a resource; it also provides a way of linking to other resources. But, as Andrew Prescott, leader of the AHRC’s Digital Transformations strand, has recently written: ‘Scholarship is much harder than [the ability to link]: we need to be clear about why we are linking data, what sort of data we are linking, and our aim in doing so’. Our one-year grant from the AHRC looks to unlock the potential of our place network to reveal previously unknown connections between different places and different documents (texts, databases, maps, etc.).
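To make the linking idea concrete, here is a minimal sketch of how shared place identifiers can connect otherwise unrelated resources. It is an illustration only: the document names are invented and the URIs are placeholders rather than real Pleiades identifiers.

```python
from collections import defaultdict

# Hypothetical annotations: (document, toponym as written, gazetteer URI).
# The URIs are placeholders, not real Pleiades identifiers.
ANNOTATIONS = [
    ("Text A", "Athenae", "https://example.org/places/1"),
    ("Map B", "Athens", "https://example.org/places/1"),
    ("Database C", "Athens", "https://example.org/places/2"),  # Athens, Georgia
    ("Text A", "Roma", "https://example.org/places/3"),
]


def link_by_place(annotations):
    """Group documents by the gazetteer URI their annotations point to."""
    index = defaultdict(set)
    for document, _toponym, uri in annotations:
        index[uri].add(document)
    return index


def related_documents(index, document):
    """Return the other documents sharing at least one place with `document`."""
    related = set()
    for documents in index.values():
        if document in documents:
            related |= documents
    related.discard(document)
    return related


index = link_by_place(ANNOTATIONS)
print(sorted(related_documents(index, "Text A")))
```

Note that “Database C” is not linked to “Text A”, even though both mention a place called Athens: the link follows the identifier, not the spelling.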
In particular, what we want to do is to use these new links between different documents to rethink key periods in the history of cartography. Until now digital resources have largely been concerned with issues of accuracy and visualization, i.e. with pinpointing the locations of ancient places with respect to our contemporary topography. What we want to do, rather, is to try to reconstruct and interpret the markedly different ways in which pre-modern authors and mapmakers conceptualized the world. Turning the spotlight on to five moments in time, Pelagios 4 will explore how ancient or pre-modern authors used various means to grasp, represent and communicate spatial knowledge of the world around them.
To conduct this research Pelagios is happy to announce the following scholarly collaborators:
- Pascal Arnaud, Professor of History at Université Lyon 2 and senior member of the Institut universitaire de France (IUF), is the leading specialist in ancient geography and navigation.
- Tony Campbell is former head of the British Library’s ‘Map Room’ and the pre-eminent expert on Portolan Charts.
- Marianne O’Doherty, Lecturer in English at the University of Southampton, has published on medieval European travel narratives, geography and cartography.
- Klaus Geus, Chair of Ancient Geography at FU Berlin, co-ordinates the TOPOI Excellence Cluster in ‘Common Sense Geography’. He is joined by Irina Tupikova, a leading mathematical astronomer with an interest in the history of science.
We look forward to working with these scholars and rethinking the ways in which geographic space was imagined and represented before the advent of modern Cartesian cartography.
Portolan chart by Jorge de Aguiar (1492), the oldest known
signed and dated chart of Portuguese origin.
Having recently completed our first content workpackage (CWP1), dedicated to Early Geospatial Documents from the Latin Tradition, we’d like to take this opportunity to share the annotation data that we’ve compiled so far. Overall we have completed annotating place references in 33 documents (41 if we include additional language versions of the same document). Within these documents, we’ve identified 19,880 toponyms, and were able to establish mappings to Pleiades in 15,721 cases (79%).
You can find the complete list of our documents, along with a download link for the data, below. The annotations are stored in CSV format – i.e. they can be opened in a spreadsheet application, or imported into a database or GIS.
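As a rough idea of what can be done with such a file, the sketch below counts toponyms and gazetteer matches with Python’s standard `csv` module. The column names (`toponym`, `gazetteer_uri`) and the sample rows are assumptions made for illustration; check the header of the actual download before adapting it.

```python
import csv
import io

# Hypothetical sample; the real column names in the download may differ.
SAMPLE_CSV = """toponym,gazetteer_uri
Roma,https://example.org/places/3
Burdigala,https://example.org/places/4
Unknown Place,
"""


def match_rate(csv_file):
    """Return (total toponyms, matched toponyms, share matched)."""
    total = matched = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # A toponym counts as matched if it carries a gazetteer URI.
        if row["gazetteer_uri"].strip():
            matched += 1
    return total, matched, (matched / total if total else 0.0)


total, matched, share = match_rate(io.StringIO(SAMPLE_CSV))
print(f"{matched} of {total} toponyms matched ({share:.0%})")
```

To run it on a downloaded file, replace `io.StringIO(SAMPLE_CSV)` with an opened file, e.g. `open(path, newline="", encoding="utf-8")`.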
An additional part of our work in CWP1 (which will be important too for all our content work packages) has been to identify additional relevant documents as we go along. Our list of “geospatial documents” has therefore grown quite substantially. We have included these documents in our annotation tool Recogito. You can follow their status directly on Recogito’s Latin Tradition landing page.
Now that we have finished work on the first of our six traditions, we are keen to get your feedback. In particular:
- We look forward to seeing how you make use of these data. We’re sure that you’ll use them in ways that we can’t anticipate, and we’d love you to share that with us!
- The large number of documents, the ambiguity of the evidence and the comparatively short space of time mean that some of our identifications will inevitably be wrong or open to debate. We are planning a mechanism to allow people to propose alternative identifications or to indicate agreement and disagreement. In the meantime, feel free to contact us if you have proposals for corrections. We’re also happy to hear suggestions for other early Latin geographic documents which we may have missed.
- Since, as we expected, we could not fully annotate or geo-resolve all of our documents, we’re interested in hearing from people who might be willing to join us in the challenge. If you would like to join in and help find places in the incomplete documents, feel free to get in touch and we may be able to provide you with a Recogito account. We’ll have to roll this out slowly since Recogito is not a ‘community tool’ as such (with features such as full moderation, user profiles, etc) so please be aware that there may be a wait if we get a lot of volunteers!
- Agrippa Fragments
113 toponyms. 80 Pleiades matches.
- Milestone of Allichamps
3 toponyms. 3 Pleiades matches.
- Pomponius Mela: De Chorographia (around 43 CE)
2186 toponyms. 1778 Pleiades matches.
- Laterculus of Valencia (post 100 CE)
11 toponyms. 11 Pleiades matches.
- Lavs Alexandriae (post 180 CE)
5 toponyms. 5 Pleiades matches.
- Piazzale delle Corporazioni, Ostia (175 – 200 CE)
14 toponyms. 14 Pleiades matches.
- Hadrian’s Wall Fort Vessels (100 – 300 CE)
16 toponyms. 16 Pleiades matches.
- Solinus: C. IVLII SOLINI (225 – 275 CE)
598 toponyms. 478 Pleiades matches.
- Itinerarium de Astorga (267 – 276 CE)
47 toponyms. 45 Pleiades matches.
- Divisio Orbis Terrarum (300 – 400 CE)
167 toponyms. 153 Pleiades matches.
- Vicarello Beakers (around 200 – 400 CE)
438 toponyms. 438 Pleiades matches.
- Ammaedara Mosaic (275 – 325 CE)
12 toponyms. 12 Pleiades matches.
- Laterculus Veronensis (304 – 324? CE)
101 toponyms. 93 Pleiades matches.
- Nomina Provinciarvm Omnium (312 CE)
154 toponyms. 136 Pleiades matches.
- Bordeaux Itinerary (333 CE)
612 toponyms. 589 Pleiades matches.
- Bordeaux Itinerary – English translation (333 CE)
643 toponyms. 617 Pleiades matches.
- Avienus: Ora Maritima (300 – 400 CE)
269 toponyms. 168 Pleiades matches.
- Avienus: Ora Maritima (300 – 400 CE) – Spanish translation
277 toponyms. 197 Pleiades matches.
- Ex Cronographo Anni P. Chr. 354 (354 CE)
47 toponyms. 30 Pleiades matches.
- Sextus Festus: Breviarium of the Accomplishments of the Roman People (379 CE)
420 toponyms. 329 Pleiades matches.
- Sextus Festus: Breviarium of the Accomplishments of the Roman People (379 CE) – English translation
400 toponyms. 320 Pleiades matches.
- Peregrinatio Aetheriae (381 – 384 CE)
269 toponyms. 173 Pleiades matches.
- Peregrinatio Aetheriae (381 – 384 CE) – English translation
383 toponyms. 260 Pleiades matches.
- Notitia Dignitatum (390 – 420? CE)
1505 toponyms. 1164 Pleiades matches.
- Notitia Dignitatum (390 – 420? CE) – English translation
719 toponyms. 570 Pleiades matches.
- Ammianus Marcellinus: Roman History (before 391 CE)
945 toponyms. 693 Pleiades matches.
- Ammianus Marcellinus: Roman History (before 391 CE) – English translation
956 toponyms. 784 Pleiades matches.
- Dimensuratio Provinciarum (300 – 500 CE)
192 toponyms. 179 Pleiades matches.
- Notitia Galliarum (375 – 425 CE)
142 toponyms. 93 Pleiades matches.
- Pseudo-Plutarch: About Rivers And Mountains And Things Found In Them (200 – 400 CE) – English translation
288 toponyms. 242 Pleiades matches.
[Source Weblink (PDF)]
- Rutilius Namatianus: A Voyage Home to Gaul (416 CE) – English translation
88 toponyms. 83 Pleiades matches.
- Orosius: A History, against the Pagans (416 – 417 CE) – English translation
573 toponyms. 429 Pleiades matches.
- Iulii Honorii: Excerpta Eius Sphaerae vel Continenta (320 – 440 CE)
1167 toponyms. 868 Pleiades matches.
- Polemius Silvius: Ex Laterculo (448 – 449 CE)
160 toponyms. 137 Pleiades matches.
- Peutinger Table (400 – 500 CE)
3456 toponyms. 2679 Pleiades matches.
[Source Weblink: Rome’s World] [Source Weblink: OmnesViae]
- Jordanes: De Origine Actibusque Getarum (500 – 600 CE)
260 toponyms. 155 Pleiades matches.
- Jordanes: De Origine Actibusque Getarum (500 – 600 CE) – English translation
209 toponyms. 137 Pleiades matches.
This week marks a new and exciting milestone in the Pelagios 3 project – the start of work on the ancient Greek geographic tradition. There’s more Latin to do of course: our work packages run on a staggered, overlapping 6-month basis, and, while we already have 19 documents in the system (some in both Latin and their modern language translation), future additions will include some major itinerary lists (including the Antonine Itineraries and Ravenna Cosmography) as well as a number of smaller but fascinating geographic sources such as the Haidra mosaic, some more inscribed vessels, and the Piazzale delle Corporazioni at Ostia.
But from today we’ll start introducing Greek documents into the system. Ancient Greek traditions of knowledge about geography extend far beyond Plato’s “frogs around a pond” metaphor for Greek settlements around the Aegean Sea. From Homer’s Odyssey, Greek texts push the boundaries of travel, exploration and knowledge, and Odysseus, the man who ‘saw the cities of many men and knew their minds’, stands as the archetypal explorer for Greeks who settled in places as far off as the Black Sea, Massalia (Marseille) and Libya. Later Greek authors like Hecataeus, Herodotus, Aristotle, Pytheas, Eratosthenes, Hipparchus, Posidonius, Artemidorus and Ptolemy are largely responsible for the way we conceptualise geography today (indeed, Eratosthenes invents the discipline), and we still use the terms that they came up with—terms such as equator, meridian, parallel, latitude and longitude. At the same time, much Greek geography is almost cosmological in nature—an attempt to understand the form of the earth and its place in the universe.
Remarkably, however, given the number and detail of these ancient witnesses, almost no Greek maps survive, and it is debated whether maps were even a feature of Greek traditions of geographical knowledge. (A map documented in Herodotus’s Histories, carried by a certain Aristagoras of Miletus, becomes the site of contestation and debate, while Herodotus himself ‘laughs at’ the schematic representations of his contemporaries.) Instead Greek conceptualisations of the world were almost exclusively in a narrative form, from numerous periploi (sailing itineraries) to Strabo, whose Geographica remains central to our understanding of global geography in the transition to Empire.
Working with Ancient Greek texts will introduce some new challenges for us to tackle. To begin with, the change in alphabet will take a little getting used to for some of the team! Fortunately recent work by Bruce Robertson, Greg Crane and others on OCRing ancient Greek means that we should be able to include a range of previously inaccessible texts. We can also draw on experience from the Hestia project and a promising new approach developed by Thomas Efer at the University of Leipzig that can identify toponyms in a Greek text by comparing it to a previously marked up English text. We don’t yet know what will be the most efficient combination of methodologies but at least we have plenty to choose from.
We have enormously enjoyed working with the Latin texts and will continue doing so, but the possibilities for analysis opened up by annotating documents from these two strongly related yet radically divergent traditions are incredibly exciting.
Jerusalem depicted in the Madaba Mosaic (6th C. AD). Image from Wikimedia Commons.
In our two previous posts we introduced Recogito, a tool we are developing in order to efficiently extract, annotate and verify geographic references in texts. The development of Recogito is still continuing at full steam, and the team (and Leif in particular 😉) is feeding our feature backlog with a steady flow of new ideas & requirements. But despite the fact that there’s still a slight ambience of a busy construction site around Recogito, we have not just been developing. We have also been using it heavily to annotate new documents.
Prior to the start of Pelagios 3, we assembled a list of potential ancient sources to work on in each content work package. The sources we selected are specifically geographical works, i.e. documents where the authors give accounts of their world in their time. For some of the more extensive sources (such as Pliny’s Natural History), we restricted ourselves to only the specifically geographical chapters.
At the moment, we are about halfway through our first content work package, dealing with the Latin tradition (3 months out of 6). It’s therefore a good time to share with you the progress we have made so far. The first three documents – the Vicarello Beakers, the Bordeaux Itinerary and Pliny’s Natural History – we have already introduced. We’ve since found our groove and the list has grown much longer. Here are some documents we are currently working on:
Fig.1. The Bordeaux Itinerary (Part 1) in Recogito (» View Map)
Pomponius Mela: De Chorographia (around 43 AD)
Pomponius Mela lived during the reign of Claudius and presumably died around the year 45 AD. His most famous work, cited by other great geographers such as Pliny the Elder, was De Chorographia. This work comprises three volumes and was written during the 40s AD. Each of its books is dedicated to an area of the known Roman world. In the first volume, Mela describes the world and its regions in general terms, then the Mediterranean coasts of Africa and the Near East, starting from the Strait of Gibraltar. The second volume describes the coasts from the Near East to Hispania, covering Greece, Italy and Gaul. Finally, the third volume describes the Atlantic territories, Britannia, and all remote territories, such as the German Limes, Arabia and India. » Map in Recogito
Laterculus Veronensis (AD 304-324?)
The Laterculus Veronensis is a list of the Roman provinces that existed during the reigns of Diocletian and Constantine. It therefore dates to between the years 284 and 337. The work is named after the single surviving manuscript, which is preserved in the Library of Verona. This source describes twelve dioceses comprising a total of over 100 provinces. » Map in Recogito
Avienus: Ora Maritima (4th century AD)
Rufius Avienus Festus was an Etrurian poet, astronomer and geographer who lived in the 4th century AD. He wrote several books and poems, the most prominent of which is Ora Maritima. This work is based on the voyage of the Greek Euthymenes of Massalia from the sixth century BC. Avienus also used other sources, such as the work of the fourth century BC Greek historian Ephorus. The use of such ancient sources has introduced much confusion, making some places difficult to locate, and resulting in a mix of parts originating from very different times. » Map in Recogito
Rutilius Namatianus: A Voyage Home to Gaul (AD 416)
Rutilius Namatianus was born in southern Gaul, probably at the beginning of the 5th century AD. He was a poet, but his only preserved work is the poem De reditu suo libri duo. It must have been written between 416 and 420 AD, and is composed in elegiac meter. Originally written in two volumes, the poem describes a trip along the coast from Rome to Gaul. Unfortunately, however, many parts (especially from the second volume) are lost, and the extant text stops at the port of Luna. » Map in Recogito
Jordanes: Getica (6th century AD)
Jordanes lived during the sixth century AD and was of partially Gothic origin. It is believed that during his public career he was a notary and that he might later have had a religious career, eventually becoming a bishop. Jordanes’ fame comes from two major works, De regnorum ac Temporum successione, a world history from the creation to the 6th century, and De Origine et Rebu Getarum Gestis, better known as Getica. We have included the latter in Pelagios 3 (restricted to the chapters with geographic descriptions). It is the only preserved source that explains the origin and characteristics of the Goths. » Map in Recogito
Bede: The ecclesiastical history of our island and nation (AD 731)
Bede, also referred to as Saint Bede, was born in England in the seventh century AD. He was a monk in the kingdom of Northumbria. Bede is known for his work Historia Ecclesiastica gentis Anglorum, completed around the year 731 AD. This work consists of multiple volumes. It begins with the invasion of Caesar in 55 BC and ends with the fifth book, in the time of Bede himself. In Pelagios 3, we have included only the first chapters of this source, which are devoted to a geographical description of the British Isles. » Map in Recogito
Ammianus Marcellinus: Roman History (before 391)
This is a document we are just starting to work on. Ammianus Marcellinus was a historian of the fourth century AD, probably born in Antioch. After a military career, he wrote one of the most famous histories of antiquity. His Res Gestae described the history of Rome from the accession of Nerva in 96 to the death of Valens in 378. Unfortunately, the first thirteen books are lost, and the remaining eighteen have gaps. The surviving books are dedicated to the events between the years 353 and 378. As in other cases, we only included those chapters where the geographic aspect was most prominent. » Map in Recogito
In numbers, we have already progressed to a total of 20,164 annotations (as of today), with an overall verification rate of 37.3% (which means we’ve confirmed more than 7,500 place references so far). But there are more Latin sources on our list which we have yet to address over the next three months. And our Greek content work package is about to start as well. So lots of exciting work ahead of us.
You can follow our progress live at http://pelagios.org/recogito!
– Ada, Pau & Rainer
In our last post, we introduced Recogito, a tool we built to verify and correct the results of our automatic text-to-map conversion process. Last time, we focused primarily on Recogito’s map-based interface, in which we clean up the results of geo-resolution – the step that automatically assigns gazetteer IDs to toponyms.
In this post, we want to talk about Recogito’s second view: the text annotation interface. And as usual, we’d like to seize the opportunity to introduce our next Early Geospatial Document along with it: the Natural History by Pliny the Elder.
The Natural History (Naturalis Historia) by Pliny the Elder is an encyclopedia published ca. AD 77–79. This amazing work covers the Roman civilization’s knowledge about astronomy, geography, zoology, botany, medicine and mineralogy. In total, it consists of 37 books, and builds on more than 400 sources from the Latin and Greek worlds. Books 3, 4, 5 and 6 focus on geography. In these books, Pliny describes the known world from the Atlantic to the Near East, and from the North of Europe to Africa. He records all the peoples and cities known, with all the geographic features prominent in each territory, such as rivers, mountains, gulfs, or islands.
Recogito Text Annotation UI
The Natural History is the largest text we have addressed so far. Fig.1 shows our current progress with it. (In numbers, we have worked through 98% of the toponyms in Book 3, and have just started Book 4 – now at 5.5%). It also differs from our previous itinerary texts in that it is prose, and not structured into an almost ‘tabular’ format. Time to enter our ‘reading view’ in Recogito: the text annotation interface.
The text annotation interface (see Fig. 2) is the place where we inspect and correct the results of geo-parsing – the automatic processing step that identifies toponyms in our source texts. Initially, when we start off with a new document, this view shows us our source text, marked up with grey ‘highlights’ wherever the geoparser thinks it has identified a toponym. We can then remove false matches, annotate toponyms the geoparser has missed, or modify things the geoparser got wrong (e.g. merge multiple identifications into one, turning separate consecutive identifications such as ‘Mount’ and ‘Atlas’ into a single toponym ‘Mount Atlas’).
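The ‘Mount’ plus ‘Atlas’ case can be sketched in a few lines. This is not Recogito’s actual code; `Span` and `merge_spans` are hypothetical names, and we assume annotations are stored as character offsets into the source text.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int  # character offset where the toponym begins
    end: int    # character offset just past the toponym


def merge_spans(text, first, second):
    """Merge two consecutive spans into one covering both and the gap."""
    if first.start > second.start:
        first, second = second, first
    # Only whitespace may separate the two spans.
    gap = text[first.end:second.start]
    if gap.strip():
        raise ValueError("spans are separated by other words")
    return Span(first.start, max(first.end, second.end))


text = "Beyond Mount Atlas lies the ocean."
mount = Span(7, 12)
atlas = Span(13, 18)
merged = merge_spans(text, mount, atlas)
print(text[merged.start:merged.end])
```

Refusing to merge spans that have other words between them is a design choice of this sketch; a real tool might simply let the user select any range.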
Going through the source texts is a time-consuming task, and we have made every attempt to make the process as quick and painless as possible. The video above shows how the interface works in practice. Select text in the user interface as you would normally (using click and drag with your mouse, or double click), and confirm the action in the dialog window that pops up. Depending on what you select, the tool will automatically perform the appropriate action: either create a new annotation, delete one, or modify the annotation(s) in the selection. To speed up work even further, there is also an ‘advanced’ mode that skips the confirmation step.
There is one more thing you can see in Fig. 2: annotations are coloured to indicate their ‘sign-off status’. We have already talked about this briefly in our previous post. It’s a consequence of our practice of manually checking every annotation before releasing it into the wild. Green annotations are those we have verified, and where we have confirmed a valid gazetteer ID. Yellow are the ones we’ve verified as valid toponyms, but for which we have not yet been able to identify a suitable gazetteer ID. Grey are the ones we’ve either not looked at yet, or which are still ‘work in progress’ because we haven’t verified their gazetteer mapping.
Combined with the map-based interface you can think of this as creating the two parts of an annotation. The text annotation interface presents us with a reference to a place in a document (the ‘target’ of the annotation in Open Annotation terminology), while the map interface identifies a place in a gazetteer (the ‘body’ of the annotation). Although there are two steps to the process, they are fairly quick and easy. Maybe even fun!
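As a sketch of how the two parts fit together, the snippet below assembles a simplified annotation record with a target (the reference in the text) and a body (the gazetteer place). The field names are ours and deliberately simplified; this is not a valid Open Annotation serialization, and both URIs are placeholders.

```python
import json


def make_annotation(document_uri, start, end, toponym, gazetteer_uri=None):
    """Combine a text reference (target) with a gazetteer place (body)."""
    annotation = {
        "target": {
            "source": document_uri,
            "selector": {"start": start, "end": end, "exact": toponym},
        },
    }
    # The body is only present once a gazetteer match has been made.
    if gazetteer_uri is not None:
        annotation["body"] = gazetteer_uri
    return annotation


annotation = make_annotation(
    "https://example.org/texts/natural-history-book-3",  # placeholder URI
    7, 18, "Mount Atlas",
    "https://example.org/places/5",                      # placeholder URI
)
print(json.dumps(annotation, indent=2))
```

An annotation without a body corresponds to a toponym that has been marked in the text but not yet mapped to a gazetteer entry.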
* “There’s Plenty of Room at the Bottom” was a lecture given by physicist Richard Feynman in 1959. The talk is considered to be a seminal event in the history of nanotechnology, as it inspired the conceptual beginnings of the field decades later.
Welcome back to another update from our Infrastructure Workpackage 2 – “Annotation Toolkit”, affectionately known as IWP2. In our previous IWP2 post, we talked a little bit about the basics of annotating place references in early geospatial documents. We also presented a first sample dataset based on the Vicarello Beakers. What we did not talk about yet, however, is how we actually annotate our documents in the first place.
The general plan behind the Pelagios annotation workflow is this:
- We use Named Entity Recognition (NER) to identify a first batch of place names automatically in our source texts. This step is also called “geo-parsing”, and tells us which toponyms there are in our text, and where in the text they occur. We implemented NER using the open source Stanford NLP Toolkit, and presently restrict this step to English translations of our documents. In a later project phase, we intend to cross-match the data gathered from the English translations to the original language versions, which is likely more feasible within the lifetime of the project than attempting Latin-language NER.
- NER gives us the toponyms. What it does not tell us, however, is which places they represent, or where these places are located. Next, we therefore look up the toponyms in our gazetteer, and determine the most plausible match. This step is called “geo-resolution”, and – like NER – is also fully automated.
- Naturally, neither geo-parsing nor geo-resolution works perfectly. Therefore, we need to manually verify the results of our automatic processes, correct erroneous NER or geo-resolution matches, and fill gaps where NER or geo-resolution have failed to produce a result at all. And this is where our new tool Recogito comes in.
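The three steps above can be sketched as a toy pipeline. Please read it as an illustration under loud assumptions: a naive capitalised-word pattern stands in for the Stanford NER step, the gazetteer is a made-up dictionary with placeholder IDs, and “most plausible match” is reduced to “first candidate”.

```python
import re

# Toy gazetteer: hypothetical names and placeholder IDs, not Pleiades data.
GAZETTEER = {
    "Burdigala": ["place:1"],
    "Mediolanum": ["place:2", "place:3"],  # ambiguous name
}


def geo_parse(text):
    """Stand-in for NER: treat capitalised words as toponym candidates."""
    return [(m.group(), m.start()) for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def geo_resolve(toponym):
    """Pick the first gazetteer candidate, or None if the name is unknown."""
    candidates = GAZETTEER.get(toponym, [])
    return candidates[0] if candidates else None


def annotate(text):
    """Run both automatic steps; every result starts out unverified."""
    return [
        {"toponym": name, "offset": offset,
         "place": geo_resolve(name), "status": "NOT_VERIFIED"}
        for name, offset in geo_parse(text)
    ]


rows = annotate("From Burdigala the road leads to Mediolanum.")
for row in rows:
    print(row)
```

The sketch makes both kinds of error visible: “From” is a false detection, and the choice of `place:2` for Mediolanum is a guess between two candidates. Both are exactly the cases that need a human check.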
The Itinerarium Burdigalense
The first document we’ve tackled entirely in Recogito is the Itinerarium Burdigalense (or Bordeaux Itinerary), a travel document that records a pilgrim route between the cities of Bordeaux and Jerusalem. It is considered the oldest Christian pilgrimage document, dated to 333 AD – just 20 years after the Edict of Milan of 313, in which the Emperor Constantine granted religious liberty to Christians (and other religions). Formally, this document is very similar in some respects to the Itinerarium Provinciarum Antonini Augusti: both are compiled as lists of places with the distances between them. Additionally, the Itinerarium Burdigalense marks each place as mutatio, mansio or civitas (change, halt or city), in a similar way to the Peutinger Table. The format of the document changes when the journey reaches Judea, where it offers detailed descriptions of places important to Christian pilgrims. So we can consider it an itinerarium in the tradition of Greek and Roman writing, except for its Christian emphasis. (We’ve compiled a detailed bibliography for the Itinerarium Burdigalense here. The text of an English translation can be found, for example, on this Website.)
Annotating the Bordeaux Itinerary with Recogito
Recogito presents the results of our automatic processing steps in two flavours: in a text-based user interface, which is primarily designed to inspect and correct what the geoparser has done; and in a map-based interface which is used to work with the results from the geo-resolution step. A screenshot of the latter is shown in Fig.2, and we will explore it in more detail below. The former interface (which benefits from a little pre-knowledge of the map-based interface) we will discuss in a separate blog post.
Geo-Resolution Verification & Correction
The map-based interface separates the screen into a table listing the toponyms, and a map that shows how they are mapped to places. The primary work area for us in this interface is the table: here, we can scroll through all the toponyms and quickly check the gazetteer IDs they were mapped to. As a matter of policy, we want to explicitly keep track of which toponyms have been looked at by someone, and which haven’t. To that end, each entry in the table can be ‘signed off’ as either a verified gazetteer match, an unknown place, or a false NER detection. (In addition, there is also a generic ‘ignore’ flag, for toponyms that may be correctly identified in a technical sense, but which we don’t want to appear in the map for whatever reason.)
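One way to picture the sign-off policy is as a small set of states per table entry. The sketch below is an assumption about how this could be modelled, not a description of Recogito’s internals; the `SignOff` names are invented for illustration.

```python
from collections import Counter
from enum import Enum


class SignOff(Enum):
    NOT_VERIFIED = "not looked at yet"
    VERIFIED = "verified gazetteer match"
    UNKNOWN_PLACE = "unknown place"
    FALSE_DETECTION = "false NER detection"
    IGNORE = "ignore"


def progress(statuses):
    """Return the share of entries that someone has signed off."""
    counts = Counter(statuses)
    total = sum(counts.values())
    reviewed = total - counts[SignOff.NOT_VERIFIED]
    return reviewed / total if total else 0.0


statuses = [
    SignOff.VERIFIED,
    SignOff.VERIFIED,
    SignOff.UNKNOWN_PLACE,
    SignOff.NOT_VERIFIED,
]
print(f"{progress(statuses):.0%} of entries reviewed")
```

Keeping ‘not looked at yet’ as an explicit state, rather than as the absence of a flag, is what makes it possible to report progress at any time.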
Double-clicking an entry in the table opens a window with details for the toponym (Fig.3): the window shows the previous automatic gazetteer match (if any), the latest manual correction, and a text snippet showing the toponym in context. A list of suggestions for other potential gazetteer matches, along with a small search widget, allows us to quickly re-assign the gazetteer match in case it is incorrect. The change history for each toponym is recorded so we know who has changed what (and when), or whether there are places that may see substantially more edits than others in the long run. Furthermore, manual changes are recorded separately from the initial automatic results. This way we will be able to benchmark the performance of NER and automatic geo-resolution later on. Detailed figures for the Bordeaux Itinerary are not yet out – but our initial figures suggest that NER has caught about 2/3 of all toponyms, and that approx. 80% of NER results were correct detections. The automatic geo-resolution correctly resolved between 30% and 40% of the toponyms.
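The two NER figures mentioned above correspond to what is usually called recall (share of all toponyms caught) and precision (share of detections that were correct). Here is a minimal sketch of such a benchmark, assuming detections and manually verified toponyms are both available as sets of (offset, toponym) pairs. The data are toy values, not figures from the Bordeaux Itinerary.

```python
def ner_benchmark(detected, verified):
    """Compare automatic detections with the manually verified toponyms.

    Both arguments are sets of (offset, toponym) pairs.
    Returns (recall, precision).
    """
    correct = detected & verified
    recall = len(correct) / len(verified) if verified else 0.0
    precision = len(correct) / len(detected) if detected else 0.0
    return recall, precision


# Hypothetical toy data for illustration only.
detected = {(0, "From"), (5, "Burdigala"), (33, "Mediolanum")}
verified = {(5, "Burdigala"), (33, "Mediolanum"), (50, "Arelate")}

recall, precision = ner_benchmark(detected, verified)
print(f"recall {recall:.2f}, precision {precision:.2f}")
```

This only works because manual changes are kept separate from the automatic results: the `detected` set must reflect what the software produced before anyone corrected it.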
While Recogito is still under heavy construction, Pau is already deeply buried in the next document – which we will present in one of our next blogposts, together with an overview of the text-based interface.