Halfway through our work on CALCS, we’re ready to discuss what we’ve been up to so far and what our next steps are. Before we could start collecting place names, we had to do some preliminary work that, we hope, can be reused by other members of the Pelagios Commons community.
Setting up the machine
If you’re reading this page, you probably know a bit about how Pleiades place records work, and that adding new data to the gazetteer, while relatively simple, can take quite a while (mostly because of lag on upload and save), especially if your goal is to contribute large numbers of new names. Entering the hundreds of names we are collecting by hand was not really viable, especially since CALCS is such a short project. So we spoke with Pleiades (in the person of Tom Elliott) and agreed to beta-test a bulk upload feature that imports name data in CSV format. We took advantage of the fact that, on this occasion, we are not creating new places but only adding new names: because each name can simply refer to the URI of an existing Pleiades place, things were relatively easy, even when a single place has several names. We designed a spreadsheet whose fields mirror the essential information in a model Pleiades name entry, collected a first batch of about 150 names, and are now testing the upload.
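To give a concrete idea of how several names for one existing place can be expressed in a flat CSV, here is a minimal Python sketch of a pre-upload check. It assumes hypothetical column names (`pleiades_uri`, `attested`, `romanized`, `language`, `source`); the actual CALCS spreadsheet and the Pleiades bulk upload format may differ. The only convention it relies on is that Pleiades place URIs have the form `https://pleiades.stoa.org/places/<id>`.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical column names: the real CALCS spreadsheet mirrors a Pleiades
# name entry, but its exact headers are not reproduced here.
FIELDS = ["pleiades_uri", "attested", "romanized", "language", "source"]

# Pleiades place URIs look like https://pleiades.stoa.org/places/<numeric id>
PLACE_URI = re.compile(r"^https://pleiades\.stoa\.org/places/\d+$")


def group_names_by_place(csv_text):
    """Validate each row's place URI and group the new names by place.

    Several rows may share one URI: that is how multiple names for a
    single existing place are expressed without creating new places.
    Returns (names grouped by URI, list of (line number, bad URI)).
    """
    by_place = defaultdict(list)
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        uri = row["pleiades_uri"].strip().rstrip("/")
        if not PLACE_URI.match(uri):
            errors.append((line_no, uri))
            continue
        by_place[uri].append(row)
    return dict(by_place), errors
```

Rows that share a URI are simply several names for the same place, so no new place records are needed; rows with malformed URIs are reported with their line numbers so they can be fixed in the spreadsheet before upload.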
Another preliminary step was to draft a transliteration convention. The Arabic alphabet has more letters than the Latin one, so it is unsurprising that there is no straightforward, universally accepted convention for transliteration; each one tends to be shaped by the national language of its creators. There are, of course, excellent conventions, such as the one used by the Encyclopaedia of Islam. However, they all rely on diacritics or other marks (such as underlines or overlines), and each deals with the phonetic challenges in slightly different ways. After discussing the issue, we realised that for our purpose we didn’t actually need a set of unambiguous, one-to-one transliteration rules. Such precise systems are extremely useful for reconstructing the Arabic script in absentia (which is exactly what the EI does: its English version contains no Arabic characters at all), but in our case the transliteration will never be separated from the Arabic original. Philological precision is therefore less of a concern, since Arabic speakers can always refer to the actual attestation of the name. What we needed was a simpler convention that would make Arabic and Ottoman Turkish names easier to search in Pleiades, even for people who don’t read Arabic. We therefore decided to follow a set of rules close to those used by Wikipedia, which seemed sensible and sticks to ASCII characters. Our transliterations are certainly a simplification, and they may give an Arabist or philologist a slight heart attack, but we decided the approach was practical, and so far it seems to work well. (Transliterations of Arabic names actually attested in scholarship can, of course, also be added to Pleiades in their own right, with references to their sources and language of transcription.)
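The point that a simplified ASCII convention is many-to-one can be illustrated with a small Python sketch. The letter table below is a partial, hypothetical approximation of Wikipedia-style romanisation, not the actual CALCS rules: for instance, both س and ص become `s`. Because short vowels are usually not written in Arabic script, the function can only produce a consonantal skeleton for unvocalised input, which is one reason the final transliterations still need a human.

```python
import unicodedata

# Partial, illustrative letter table (hypothetical, not the CALCS rules).
# Several distinct letters deliberately collapse onto the same ASCII output.
LETTERS = {
    "ا": "a", "آ": "a", "أ": "a", "إ": "i", "ء": "'", "ب": "b",
    "ت": "t", "ث": "th", "ج": "j", "ح": "h", "خ": "kh", "د": "d",
    "ذ": "dh", "ر": "r", "ز": "z", "س": "s", "ش": "sh", "ص": "s",
    "ض": "d", "ط": "t", "ظ": "z", "ع": "'", "غ": "gh", "ف": "f",
    "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h",
    "ة": "a", "و": "w", "ي": "y", "ى": "a",
    # Extra letters and letter forms found in Ottoman Turkish spelling
    "پ": "p", "چ": "ch", "ژ": "zh", "گ": "g", "ک": "k", "ی": "y",
}

# Short-vowel marks, present only in vocalised text: fatha, damma, kasra.
SHORT_VOWELS = {"\u064E": "a", "\u064F": "u", "\u0650": "i"}


def to_ascii(arabic: str) -> str:
    """Romanise Arabic-script text letter by letter into plain ASCII.

    Mapped letters and short-vowel marks are replaced, ASCII characters
    (spaces, hyphens) pass through, and everything else (sukun, shadda,
    tanwin, tatweel, ...) is dropped. The mapping is lossy and cannot tell
    consonantal w/y from long u/i, so its output is only a starting point.
    """
    out = []
    for ch in unicodedata.normalize("NFC", arabic):
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch.isascii():
            out.append(ch)
    return "".join(out)
```

For example, `to_ascii("شمس")` gives the skeleton `shms`, while the vocalised `to_ascii("شَمْس")` gives `shams`; the loss of information is acceptable here precisely because the Arabic original always travels with the transliteration.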
Looking for sources: from scripts to libraries to networking
For the automated data-extraction side of our project, we relied on the fairly formalised structure of Pleiades and on the fact that many place entries cite a Wikipedia page as a “see further” reference (that is, a page whose primary topic is the place under discussion). We developed a script (available on GitHub) that finds such entries, selects among the cited Wikipedia pages those linked to a Wikidata item, and extracts the Arabic and Turkish names (if any) from that Wikidata item. The script then outputs this information as TSV, which can easily be imported into the spreadsheet we designed. We harvested a good number of names this way but, of course, our long list of 2,300 or so results now needs to be checked (and transliterated) by hand before we submit it to Pleiades.
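The lookup chain described above (Wikipedia page, then Wikidata item, then Arabic and Turkish names, then TSV) can be sketched in Python. This is not the CALCS script on GitHub; it is an offline illustration that works on already-downloaded JSON so that the parsing logic is visible. It assumes the MediaWiki API response to `action=query&prop=pageprops&ppprop=wikibase_item` (which exposes the Wikidata item linked to a page) and the entity JSON served at `https://www.wikidata.org/wiki/Special:EntityData/<QID>.json`; the TSV column names are hypothetical.

```python
import csv
import io
from urllib.parse import unquote, urlparse


def wikipedia_title(url):
    """Return (language code, article title) for a Wikipedia URL, else None."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if not host.endswith(".wikipedia.org") or not parts.path.startswith("/wiki/"):
        return None
    lang = host.split(".")[0]  # e.g. "en" from en.wikipedia.org
    title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
    return lang, title


def wikidata_id(pageprops_response):
    """Return the linked Wikidata QID from a MediaWiki API response to
    action=query&prop=pageprops&ppprop=wikibase_item, or None if unlinked."""
    pages = pageprops_response.get("query", {}).get("pages", {})
    if isinstance(pages, dict):  # default format keys pages by page id
        pages = pages.values()
    for page in pages:  # formatversion=2 returns a plain list instead
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None


def names_from_entity(entity_json, languages=("ar", "tr")):
    """Yield (language, label) pairs from Special:EntityData JSON."""
    for entity in entity_json.get("entities", {}).values():
        labels = entity.get("labels", {})
        for lang in languages:
            if lang in labels:
                yield lang, labels[lang]["value"]


def to_tsv(rows):
    """Serialise (pleiades_uri, qid, language, name) rows as TSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["pleiades_uri", "wikidata", "language", "name"])  # hypothetical headers
    writer.writerows(rows)
    return buf.getvalue()
```

In a real run, each Wikipedia URL cited in a Pleiades record would be passed through `wikipedia_title`, the title sent to that language edition’s `api.php`, and the resulting QID used to fetch the entity JSON. Only labels are read here, though Wikidata aliases could be harvested the same way.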
We had already identified some historical sources before the project began. We had decided to start from two major Islamic maps of the world: the Tabula Rogeriana and the Book of Curiosities. Both are already available digitally (for the Tabula Rogeriana we have digital copies of two different manuscripts), but we hope to negotiate better usage rights so that we can use and annotate them in Recogito. We’re also now looking at other beautiful and rich maps, such as the work of the Ottoman cartographer Piri Reis or the illustrations for the Book of Roads and Kingdoms (Kitab al-masalik wa-al-mamalik) of Abu Ishaq Ibrahim ibn Muhammad al-Farisi al-Istakhri. We’re also looking for public-domain digitisations of Mediaeval Islamic geographical texts on which to perform named entity recognition (NER). If you come across any interesting sources, whether maps or texts, please let us know!
In fact, CALCS was always meant to be a collaborative project, and we’re trying to involve as many colleagues as possible. We have started informal conversations with other researchers willing to contribute place names (even just a dozen) in Arabic or Turkish, but also in Phoenician, ancient Egyptian, Coptic or other pre-Arabic languages. We’re also using CALCS as a case study in Digital Humanities workshops, especially when we teach digital ancient geography. We’ve just come back from a very productive week in Sofia, where our colleagues were happy to contribute to CALCS by tagging and georesolving Turkish place names in the Black Sea and Balkan area. This material is also about to be used in a class for the Sunoikisis Digital Classics programme running this semester, with the help of colleagues in Egypt. Our next stop is Barcelona, where we plan to make the most of our colleagues’ expertise in Spanish and Andalusian place names.
If you have questions or comments, want to tell us about new sources, or would like to get your students involved in a workshop, we look forward to hearing from you.