The intention of the Linked Pasts Community & Sustainability Working Group (WG) is to bootstrap Linked Pasts into a viable, long-term and self-sustaining project. Ultimately the aim is to start a virtuous cycle – for Linked Pasts to foster a scholarly community, and in turn to provide that community with tools enabling it to contribute to and develop Linked Pasts.
In order to achieve these aims, the WG proposes to:
* Scope and develop a Linked Pasts online publication, complete with plans for implementing a commissioning, review and technical platform
* Develop a high-level exploration plan for a technical infrastructure for Linked Pasts and linked data stores.
Working Group Coordinators: Timothy Hill & Karl Grossner
The Orbis Initiative: a Pelagios for Networks?
May 27, 2016 at 2:40 am #1261
My particular interest in joining a couple of Pelagios SIGs is to introduce what I’ve so far called ‘The Orbis Initiative,’ which in broad terms would do for trajectories (itineraries, routes, flows) and paths (roads, rivers, canals) what Pelagios is doing for ‘places.’ Among other things, it would enable network analyses at scales that aren’t currently possible.
The impetus for moving forward on this now is an emerging shared interest in linking such data among several scholars working on cultural diffusion and commercial activity in Central and East Asia and between Asia and Europe. The combined temporal extent of their work is the 7th century BCE to the 17th century CE.
I’ve written a blog post that outlines the idea, and welcome comments, suggestions, tweets, and collaboration.
cheers, Karl Grossner
May 27, 2016 at 12:40 pm #1262
Thank you for the note and the link to your interesting and exciting blog post! I think it is a really great idea to use your experience from building the ORBIS system to create the “generic data infrastructure and tools” you are aiming at.
While finding a technical representation for “place sequences” may sound like a fairly trivial task, it is much harder to define properties, names, labels, or even “identity” for such constructs in a clear and well-documented manner. Yet this is a crucial process if distributed, heterogeneous datasets containing such entries are to be merged at scale and also linked to (textual) sources. As you are already in contact with several partners who have an interest in linking their data, you surely know best how different the processes and formats involved can be. I look forward to seeing a community-made, widely agreed-upon model for this evolve from within your Orbis Initiative!
My background lies in Natural Language Processing, and in my opinion the task of extracting trajectories from text is a logical (and positively challenging) next step to take once Named Entity Recognition works sufficiently well. Especially with the advent of automatically extracted/estimated trajectories (but also for manually created datasets), the big question for me is data quality. How do we establish and protect data quality in a linked scenario? How do we measure it in the first place? Maybe it is helpful to think about how much “guesswork” should be allowed in creating such datasets, and how the uncertain parts (or better, the whole process) can be properly documented (in a machine-readable way?).
I probably wouldn’t be so nitpicky about this issue if you hadn’t brought up network analysis as a central use case. A small (or even big) error in a gazetteer entry or georeferencing relation usually has only a small effect on the overall picture when you ask “What is my dataset’s geospatial distribution?”. But when it comes to networks, the possible damage of a single wrong entry can be immense. Rightfully unconnected subnetworks could suddenly become connected. Flows could be redirected. Everything could change.
One “accidental” path crossing the Taklamakan desert north to south could totally mess up the network analysis of your trade or cultural transfer model for the whole of Central Asia. Luckily, the big advantage of “geographically grounded” network data is that you can relatively easily spot such odd entries by looking at a map visualisation. Yet for me this is only slight consolation if linked data can only be assessed/trusted by manually inspecting visual plots.
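To make the fragility concrete, here is a toy sketch (the place names are purely illustrative) showing how a single spurious edge merges two otherwise separate trade networks. It uses nothing but a plain breadth-first search; any graph library would show the same effect.

```python
from collections import defaultdict, deque

def components(edges):
    """Count connected components of an undirected graph via BFS."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, count = set(), 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(graph[node] - seen)
    return count

# Two trade subnetworks (place names purely illustrative):
north = [("Kashgar", "Aksu"), ("Aksu", "Kucha"), ("Kucha", "Turfan")]
south = [("Yarkand", "Khotan"), ("Khotan", "Niya"), ("Niya", "Dunhuang")]

print(components(north + south))                         # 2: rightly separate
print(components(north + south + [("Aksu", "Khotan")]))  # 1: one bad edge merges them
```

Any centrality or clustering measure computed on the merged graph would then be distorted for every node, not just the two endpoints of the wrong edge.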
One other thing to mention in this context is the obscure role of missing information. Information about paths may have been lost in the course of history; paths might be known but not yet entered into a computer; they might be entered somewhere but not yet publicly available… That is something researchers have to deal with, and it is widely accounted for in established methodologies. In the logic of the Semantic Web, where the LOD idea comes from (and also in other knowledge representation systems), there is the so-called open-world assumption, which keeps us from treating a missing piece of information as false. Network analysis, on the other hand, in most cases tends strongly towards treating missing nodes and edges as strictly non-existent. That means that all the common network measures (centrality, clustering, and so on), but also manually performed “descriptions” of connectedness, are almost guaranteed to be wrong to some extent. I am sure this can somehow be countered with smarter, adapted algorithms and new scholarly methods, but at the moment I see it as a big issue – one that I hope will be discussed publicly and academically, and I would encourage you to discuss it with your data-providing partners as well as with future data users.
But in the end, pure network analysis is maybe just the beginning. If we think more broadly, the tools and infrastructure from your Orbis Initiative could also prove essential for, e.g., simulations and other more process- and systems-centric analyses. Then I wonder whether a strong connection or easy interoperability with typical GIS modelling constructs wouldn’t be very helpful. For advanced use cases one would surely want to add numerical or categorical values to trajectories, paths and waypoints (estimated population of a place, availability of natural resources in the vicinity, quantity of different goods likely produced there, …). For that it may be worth looking at existing GIS standards. I just happened to come across CoverageJSON, which also features a (very technical) trajectory representation (see specifically here). It may be interesting to combine such a data representation with other coverage “Domain Types”, such as a grid of (time-dependent) climate data to assess sea travel, etc.
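For the curious, here is a rough sketch of what a trajectory might look like as a CoverageJSON-style Coverage, written as a Python dict. I am paraphrasing the “Trajectory” domain type loosely from memory; the exact field names, and the `referencing` block I have omitted, should be checked against the CoverageJSON specification itself. The dates and coordinates are invented.

```python
# A minimal, hedged sketch of a CoverageJSON-style trajectory Coverage.
# Field names follow the "Trajectory" domain type only loosely; verify
# against the actual specification before relying on this structure.
coverage = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "Trajectory",
        "axes": {
            # One composite axis: each value is a (time, x, y) tuple.
            "composite": {
                "dataType": "tuple",
                "coordinates": ["t", "x", "y"],
                "values": [
                    ["0120-04-01", 12.49, 41.90],  # Rome (illustrative)
                    ["0120-05-15", 14.27, 40.85],  # Puteoli
                    ["0120-07-02", 23.73, 37.98],  # Athens
                ],
            }
        },
    },
    "parameters": {},  # e.g. quantities of goods carried, if known
    "ranges": {},
}

# Waypoints recovered in travel order (x, y only):
waypoints = [v[1:] for v in coverage["domain"]["axes"]["composite"]["values"]]
print(waypoints)  # [[12.49, 41.9], [14.27, 40.85], [23.73, 37.98]]
```

The appeal of such a representation is exactly the point above: the same composite-axis machinery could carry attached numerical values (goods, costs, populations) alongside the positions.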
Another random thing that came to my mind when I read about the planned introduction of temporal attributes for geographic features was the cover image in this article. For natural features and for man-made paths it is far from easy to define temporal stability and temporal extent. Of course in a linked context, when mixing data from different sources, it is even harder to secure reliable consistency. So what I would love to see here, too, are some means of making explicit the level of abstraction or specific “resolution” used. I remember reading something about the struggle to include higher-resolution roads in ORBIS:Rome, because they of course depict a longer path than a coarser version of the same road. Time resolution may also become an issue when integrating different sources.
June 1, 2016 at 6:17 am #1300
Thank you for your many excellent points and suggestions. I offer these comments and responses (some relate to Chiara’s post as well):
“finding a technical representation for ‘place sequences’ may sound like a fairly trivial task”
Although I have tried to phrase the immediate goal of this initiative in simple terms, I certainly don’t see the steps to reaching it as trivial tasks! But your nice term ‘place sequences,’ which corresponds to what I’ve called trajectories, illustrates that what is to be modeled should reduce to a simple, useful abstraction/pattern/model.
You say: “For advanced use case one would surely want to add numerical or categorial values to trajectories, paths and waypoints”
Yes, and to the extent we try to find common vocabularies for categories, we will bog down. Every data set I have encountered has encoded different things, in different ways. I am a firm believer in discovering domain ontologies from data in the wild and not imposing them. That said, in order to link data at all we have to agree on a few concepts and relations — ‘place’ for example. Consensus is difficult to arrive at, but not impossible. People agree to use shared vocabularies when they are extensible and demonstrated to be useful.
Regarding the identity of digital objects representing trajectories and paths, my intent is to follow the Pleiades/Pelagios model of representing attestations, not real-world entities. The identity of an attestation is grounded—in a string of text or a line on a map. I think (?) identity may be less of an issue for trajectories than it has been for Place in gazetteers, because trajectories are events.
“the task of extracting trajectories from text is a logical (and positively challenging) next step to be taken”
This is a fascinating line of inquiry, one I’ve thought about but never began working on — I was intrigued by the possibilities in the travel writing corpus that is part of the Vision of Britain web site (http://www.visionofbritain.org.uk/travellers/). The collection’s curator, Humphrey Southall, has spoken with me about one day mapping those texts.
“How to establish and protect data quality in a linked scenario?”
You make several good points on this. It would be helpful to come up with some metrics (and labels) for one or more of the many flavors of uncertainty that aren’t prohibitively complex to implement. This is true for all historical knowledge representation, not only this type.
“when it comes to networks, the possible damage of a single wrong entry can be immense;” not only network analysis but “simulations and other more process- and systems-centric analyses…”
Yes, people must create their datasets carefully and be prepared to qualify/defend their selections and/or sampling. We found errors in the initial ORBIS:Rome network when some results made no sense. ORBIS:Rome itself is essentially a route simulation machine. Some network measures were added to version 2 (clusters, flow potential, centrality) that I wasn’t entirely comfortable with; choices of the researcher. The model _suggests_ certain facts; it doesn’t demonstrate or prove them — which is the case with so much digital historical research. The Orbis Initiative intends to facilitate data creation. Model creation is another matter! To each their own. Of course, best practices may emerge.
“So what I would love to see here, too, are some means of making explicit the level of abstraction or specific ‘resolution’ used.”
I take this point completely. In the original Topotime spec our “when” object included an indicator of “grain” (along with several other fairly complicated constructs). I’ve been persuaded since, in various technical conversations about Pelagios and PeriodO, that simplicity is absolutely essential for a prospective standard format to have any hope of uptake. I think this was learned from hard-won experience. The grain of a dataset can also be discovered algorithmically, but I agree it would be good to have in a spec. Any new standard requires software tools that interpret it!
What works about Pelagios as a means for linking gazetteers is that it says nothing about how you organize your gazetteer. It can be as complex and as domain-specific as you like (the Pleiades model is not simple). But if you want to link your data with others, you publish a simplified serialization. The Orbis Initiative would strive for the same approach — this pertains to the topics mentioned above, of course.
There are so many topics of conversation in your message (and Chiara’s)! By all means let’s continue to discuss them.
I will broadcast progress as it occurs – here, on my blog, and on the Orbis Initiative GitHub repo, which is just barely begun. And because this is turning out to be a context for pushing Topotime further along, on that repo too.
May 27, 2016 at 1:12 pm #1263
It’s very good that this discussion is starting to take off, and I am so happy to hear about the Orbis Initiative – I was starting to wonder why the project was not taking this path yet (now I think I have the answer).
What I will say here is very related to the post of @thomasefer, with whom I tend to agree most of the time…
I am currently evaluating ways of formalizing spatial relations in a wide range of Greek and Roman geographical texts, as part of my doctoral dissertation, and I partly agree with what you say about ontologies providing conceptual help at this stage. However, creating a way of collecting such data easily is not a walk in the park: we do not really know how many, and what, different types of relations are described in ancient sources – because we lack basic studies that address this problem systematically, in depth, and specifically on these sources (not to mention that we even lack a dictionary of the Graeco-Roman geographical lexicon, but that’s another story). Therefore, I am strongly convinced that we need to address this issue bottom-up, by annotating structural patterns as they are found in the source in a simple way, without superimposing a semantic model ourselves. Spatial narrative is in itself a “cognitive structure”, so theoretically this can be done – but in order to formalize it, we need data first, and I’m not sure whether we can already foresee the types of relations we will encounter. This somewhat agrees with what you say in your blog post: it’s much better to have a Pelagios-like approach than to impose a framework in the first place.
Second, in this particular case, collecting data is one thing; modelling is a rather different matter, and we all know the enormous amount of underspecification that ancient sources have. How to overcome underspecification without “inventing” data to fill in the gaps is still a major concern for me. In strong agreement with Thomas here, I maintain the provocative point of view that we will need to go beyond traditional GIS/cartographic approaches at some point, though that obviously depends on the type of research we want to conduct with these data (and one type of representation does not exclude the other). Historical research is likely to benefit from both cartographic and non-cartographic approaches.
How to collect data about spatial relationships as inferred from ancient sources, in a way that can be converted into a good relational standard (such as RDF or JSON), is a problem that needs to be discussed methodologically first, and in the light of the actual data that we have. I am currently testing and evaluating different strategies, and not all of them are immediately compatible with these standards (though there are good chances that they will be at some point).
This conversation is exactly what I was expecting from Commons, and I would be very interested in taking part in the project.
June 1, 2016 at 6:24 am #1301
Thanks for the many good points you raise. Some I comment on in my response to Thomas above, but some more thoughts here:
“not a walk in the park”
Yes, as I say to Thomas, not trivial at all. But as a “reformed ontologist” I try to keep in mind continually that some very useful things can come of a simple interchange format–as Pelagios is demonstrating.
“I am strongly convinced that we need to address this issue bottom-up, by annotating structural patterns as they are found in the source in a simple way, without superimposing a semantic model ourselves”
“in order to formalize it, we need data first, and I’m not sure whether we can already foresee the type of relations that we will encounter”
I completely agree, and the work I’m beginning with a few folks is to look at data from numerous projects. This is at a different stage from the one you (and maybe Thomas?) seem most concerned with. That is, the data sources I’m working with are secondary (or tertiary). So what I see are the many very varied ways people have encoded references to places and trajectories in text. My impression is that your work illuminates the encoding process itself, and perhaps would lead to some best practices or standards. This is great, and an important contribution IMO.
your ‘provocation’: “we will need to go beyond traditional GIS/cartographic approaches at some point”
I am not bound to cartographic representations; I am only starting there in this project. I am very curious to know what an example of such a non-traditional approach would be. I have had a few discussions over the years with people working on Dreamtime representation, and am struck by the fact that almost all Australian aboriginal art depicts “maps” of a kind that look nothing like our normal conception of a map.
I too look forward to continuing these discussions in this forum and elsewhere!
June 1, 2016 at 9:20 am #1306
Thanks Karl for these very detailed and interesting answers!
First of all, before I forget, I think it’s worth mentioning the work of some other people out there who I expect can contribute to the discussion. You are certainly aware, for example, of the GeoLat project led by Maurizio Lana, with whom I talked recently; it has reportedly led to the establishment of the GO! ontology, which considers not only place connectivity but potentially every type of spatial category in a broad sense. Another very interesting point of view, quite close to mine, is that of Oyvind Eide, who has tested annotation and modelling of premodern spatial descriptions in text form, approaching them first of all from the direct source, and using CIDOC CRM as a semantic model for relations (though without strictly following it, for obvious reasons). I highly recommend having a look at his work, because at present it is the most complete and thoughtful on this subject in Digital Humanities, although it does not deal with the so-called Classical world. Another person who would be very worth involving in this conversation is Guenther Goerz, who has been very helpful to me and is currently developing a way of automatically extracting RDF triples from sources and integrating them in digital editions. Last but not least, I know that our common friend @romanov is watching us and is part of this discussion as well (http://maximromanov.github.io/): he has developed a way of annotating and extracting structural patterns as they are expressed in texts, which is very simple and immediately related to the primary source, but at the same time holds a strong relational aspect between textual entities.
On sources: yes, I think we should start by looking at the texts in the first place. Editing and linguistic encoding are the first stages for me; I am currently doing that alongside some rough annotation experiments. It’s important to see what people have done so far, but except for a few cases (like our friend Elton Barker, for example, and those I mentioned above), I don’t think many have consciously faced the problem – not in Classics, at least, while there has been plenty of work in other sectors. So it’s quite understandable that there are very different methods out there; many are not even standards (look at TEI – there is seriously no rational way of encoding a spatial relation!). That’s why I started being concerned about establishing one.
On cartography: there’s a reason why I don’t, say, “like” cartographic maps. You certainly know that navigation and way-finding in the Graeco-Roman world had nothing to do with them. We could discuss a few exceptional cases, but I strongly maintain the view that these people had no idea how to use maps as we use them, and that holds for any premodern society up to the 15th century at least. So the reason I’m pushing in this direction is primarily methodological and historical, which doesn’t mean that we cannot get good results from a cartographic representation of this geography – but we need to consider other options to be fair to our sources. One very interesting discussion was recently raised by Klaus Geus, who is in favor of the notion of Common Sense Geography and its representation by means of mental maps. I personally like the idea of representing this image of the world in a topological manner, which could possibly also reduce the amount of underspecification and vagueness, being strictly focused on relationships between entities.
Another point relates to data uncertainty. You speak, very rightly, about various “flavors” of uncertainty. The problem I see is choosing the labels we are supposed to give them: there have been very interesting experiments with epigraphical databases about uncertainty in findspots and locations, but I suspect that working with direct sources will be much more challenging. We also need to define our concept of uncertainty according to our purposes: in a modelling environment, for example, the notion of “north” is by no means sufficient for representing a direction. Is that uncertainty? And what about vagueness in proximity relations? Are they all uncertain at the same level? Often our sources do not give an answer. And we probably also need to remind ourselves that the way Greeks and Romans experienced travel is very different from ours: Greeks used to sail along the coast for months, and getting “lost” for a night or two may not really have been a problem. Another conversation going on these days with Marianne O’Doherty made me realize that sometimes ancient authors just “say” that they’ve reached a certain place, without having any idea of where they have really been. This is the very thin boundary between real and imaginative space that we always need to consider while working on textual sources (not that one cannot “lie with maps” as well).
You very rightly say that we need to agree on a basic vocabulary for expressing spatial entities and relations. Pleiades has established an important precedent in this regard, by creating a notion of “place” which we can all agree on. So that can be considered done… However, I’m not sure whether trajectories/routes are such a solid notion. Not only because of the necessity of rationally establishing the nature of the connection, as Thomas reports, but also because – well, when you face the texts themselves, you soon realize that it’s not that easy. Yes, they describe routes. But what about descriptions of extents? Perimeters? Relative locations? Boundaries? All of these things are historically described by means of spatial relations in ancient geographies, because they simply had no way of doing otherwise (see, no maps, just words). Are these trajectories? Probably not. Yet they are fundamental for the reconstruction of the space described. A model that wants to be complete – a standard – needs to address them as well when building its vocabulary.
Lastly, relations between spatial entities as “events” are certainly a thing – maybe not the only one. The overlap between concepts and “places” in the largest sense is also worth investigating. For example, how places are associated with concepts such as “boundary”, “insularity”, “continent”, and so on. In this case, I think that Pelagios will offer us something very good to work with in the next few months, also because here the annotation process may be somewhat easier (not so much all the rest, though).
Let’s get concrete on this one, shall we? I am currently working on editing Greek and Roman geographical texts with a very high density of spatial data, and I am treebanking and annotating spatial relations in some of them (I wish I could do it with all of them, but I’m still human). So I have material, and will have plenty more in the next few months. I am here to run experiments, so if you want to run some tests on primary sources, you have my whole attention.
June 17, 2016 at 11:16 pm #1482
Regarding the subject of extracting trajectories from textual sources, there are a couple of previous studies that you might consider interesting:
- X. Zhang, B. Qiu, P. Mitra, S. Xu, A. Klippel and A. M. MacEachren (2012) Disambiguating Road Names in Text Route Descriptions using Exact-All-Hop Shortest Path Algorithm. In: Proceedings of the European Conference on Artificial Intelligence
- L. Moncla, M. Gaio, J. Nogueras-Iso and S. Mustière (2016) Reconstruction of itineraries from annotated text with an informed spanning tree algorithm. International Journal of Geographical Information Science. 30(6)
Together with Patricia Murrieta-Flores from Univ. Chester, I’m actually thinking about submitting a proposal to the “Commons Resource Development Grants,” focusing on automatically geo-referencing itineraries. Although our focus would be on tabular itineraries, the same techniques that we envision employing could also be of interest in the context of extracting trajectories from free text.
Please do let us know if you have some comments about this proposal. I’ll probably also post a separate message on the forum (and in the Gazetteer SIG forum as well… I’m actually not sure if this proposal would be a better fit there) asking for feedback on the proposal.
Here’s a copy of the text from the current version of our proposal:
Descriptions of historical routes, commonly referred to as itineraries, are abundant resources that also constitute important objects of study in various disciplines associated with the humanities [Szabó 2009; Blank & Henrich 2015]. There are a number of well-known historical manuscripts, as well as transcriptions from the 19th or 20th centuries, nowadays digitally scanned and often also available online, containing information on historical routes (see, for example, the online list of Medieval itineraries collected by Peter Robins). Figure 1 presents two well-known examples of depictions of historical itineraries, namely a part of the itinerary in the Vicarello Cups (on the left), which goes from Gades (i.e., modern Cadiz) over land to Rome, and a part of the Itinerarium Alexandri, which describes Alexander the Great’s journey of conquest over the Persian Empire (on the right part of the figure).
As illustrated by the previous examples, and as noted in previous publications [Blank & Henrich 2015; Adelfio and Samet 2014], historical itineraries were often published as tables or sequential lists of items, with each item corresponding to a toponym (i.e., a location name or a specific point-of-interest) that is to be visited in the context of a route. In these tables, the toponyms are often associated with information regarding the approximate distance to the previous toponym on the route. Less frequently, toponyms are also associated with descriptive information and/or graphical markers for locations with certain characteristics. However, there are few cases of historical itineraries originally associated with map-based representations, and the ambiguity in the toponyms involved presents challenges to automatically geo-referencing and mapping these resources (i.e., challenges that are slightly different from those that arise in the more common problem of geo-referencing toponyms in textual documents [Moncla et al., 2016; Santos et al., 2016; Zhang et al., 2012]), so as to support further analysis.
In fact, we believe that the analysis of the characteristics of certain itineraries (e.g., checking them for consistency, or perhaps even reaching new inquiries and inferences about the routes) can be greatly facilitated through the use of maps and/or geographic information systems. However, automatically geo-referencing itineraries (i.e., converting the toponyms to latitude and longitude coordinates), in order to represent the routes as trajectories on maps, presents many technical challenges (e.g., itineraries can mention places through historical names that are different from those currently used, they can mention names for places/POIs that correspond to different physical locations, etc.). Addressing these challenges requires combined expertise from technical areas such as algorithms and geographic information systems, whereas proper validation requires the involvement of digital humanities researchers.
This project proposal concerns the study and development of techniques to automatically geo-reference historical itineraries, based on the idea that travelers tend to choose the most efficient routes for traveling between locations (i.e., it is assumed that itineraries will tend to minimize the distance between the locations being visited, as also noted in previous preliminary studies of this problem [Blank & Henrich 2015; Adelfio and Samet 2014]). We will develop an open-source prototype system for automatically geo-referencing itineraries, combining different algorithms, following a development methodology that will first build an initial prototype focused on assigning toponyms to latitude and longitude coordinates, and then refine this prototype to consider the most likely paths between pairs of locations visited in sequence within an itinerary.
Specifically, we will use approximate search techniques based on string similarity [Navarro 2011; Recchia & Louwerse 2013; Kılınç 2016] to search for candidate disambiguations for each toponym over existing databases that associate location names with latitude and longitude coordinates (e.g., gazetteer services/datasets such as geonames.org or the one from Pleiades, or lists of locations described in Wikipedia and associated with coordinates). In the initial prototype, the distance between each pair of candidate locations visited in sequence over the itinerary will be estimated as the straight-line distance between the geospatial coordinates associated with each candidate, and we will use dynamic programming algorithms (e.g., adaptations of the Viterbi algorithm for finding the least-cost path in a graph that encodes possible paths between candidates [Forney 1973]) to compute which coordinates should be associated with each item on the itinerary, so as to minimize the total distance of the trip.
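For illustration, the first-stage dynamic programme might be sketched as follows. The toponyms and candidate coordinates here are invented, and the real system would of course also weight candidates by string similarity; this sketch only shows the Viterbi-style step of picking one candidate per toponym so that the summed straight-line (haversine) distance along the itinerary is minimal.

```python
from math import radians, sin, cos, asin, sqrt

def haversine(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def georeference(candidates):
    """Viterbi-style DP: choose one candidate per toponym so the summed
    straight-line distance along the itinerary is minimal."""
    best = [[0.0] * len(candidates[0])]  # best[i][j]: min cost to candidate j of toponym i
    back = []
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for q in candidates[i]:
            costs = [best[-1][j] + haversine(p, q) for j, p in enumerate(candidates[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k])
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace back the cheapest assignment.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]

# Hypothetical ambiguous itinerary: each toponym has candidate (lat, lon) pairs.
itinerary = [
    [(36.53, -6.29)],                    # "Gades" - unambiguous (Cadiz)
    [(37.39, -5.99), (43.36, -8.41)],    # "Hispalis" vs a distant namesake
    [(39.47, -0.38), (48.86, 2.35)],     # "Valentia" in Spain vs in Gaul
    [(41.89, 12.48)],                    # "Roma"
]
print(georeference(itinerary))
```

On this toy input the DP correctly prefers the Spanish candidates, since routing through the far-off namesakes would inflate the total trip length.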
In the second stage, the work to be developed will involve the study and development of approaches to better estimate the distance/cost associated with traveling between each pair of locations presented on an itinerary, through the application of a method known as least-cost path analysis [Romanowska et al. 2012; Douglas 1994; Yu et al., 2003], as an alternative to considering the straight-line distance. Least-cost path analysis will allow us to estimate the most likely route between two locations, and consequently also the distance associated with that route, using features such as the elevation of the terrain, the type of land coverage, or the distance to the modern road network (e.g., the technique involves using an algorithm such as A* to search for a sequence/path of cells [Zeng & Church, 2009] over a geo-referenced grid that represents the geographical area associated with the itinerary, where each cell is assigned a cost derived from features obtained from raster datasets encoding information such as terrain elevation).
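The second stage could be prototyped along these lines: a minimal A* search over a toy raster cost grid, where each step pays the cost of the cell entered. The terrain values, the 4-neighbour moves, and the cost function are simplifying assumptions for illustration; real inputs would be raster layers such as elevation or land cover.

```python
import heapq

def least_cost_path(cost, start, goal):
    """A* over a raster cost grid (4-neighbour moves); each step pays the
    cost of the cell entered. Returns the total cost of the cheapest path."""
    rows, cols = len(cost), len(cost[0])
    minc = min(min(row) for row in cost)
    def h(cell):  # admissible heuristic: Manhattan distance times cheapest cell
        return (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])) * minc
    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                ng = g + cost[nr][nc]
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable

# Toy terrain: 1 = easy going, 9 = a ridge across the middle row.
terrain = [
    [1, 1, 1, 1],
    [9, 9, 9, 1],
    [1, 1, 1, 1],
]
print(least_cost_path(terrain, (0, 0), (2, 0)))  # 8: detours around the ridge
```

Crossing the ridge directly would cost 10, so the cheapest path goes the long way around, exactly the behaviour least-cost path analysis relies on when estimating likely historical routes.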
The prototype to be developed will support experiments with itineraries manually geo-referenced by domain experts, through which we will assess the quality of the proposed procedure (e.g., by measuring the distance between the estimated and the ground-truth coordinates for the toponyms in the itinerary). Besides the open-source prototype system (i.e., an application to be made available on GitHub), we also expect the project to result in the publication of an article describing the proposed method and the results of these experiments. It is important to note that the project will not focus on aspects related to the digitization/OCR of manuscripts containing historical itineraries, focusing instead on implementing the method for geo-referencing itineraries that are already represented digitally as a sequence of toponyms. As such, our experiments will leverage existing resources such as the aforementioned collection of medieval itineraries from Peter Robins (some of which are already geo-referenced according to modern standards, so as to support map-based visualizations), in order to assess the quality of the obtained results.
https://www.peterrobins.co.uk/itineraries/list.html
June 18, 2016 at 3:03 am #1486
Bruno, Patricia (and all),
It is great to hear about this very interesting work you are doing and planning. I had a notion to attempt itinerary recognition from texts several years ago, using the travel writings found on the Vision of Britain site, and discussed this with its curator, Humphrey Southall, but other things intervened — as they do. I came to believe that parsing natural language such as that is a hard problem in CS terms, with little likelihood of automatically generating useful results. I do think the best prospects are for machine learning algorithms (neural nets, e.g.) to narrow the search space in a large collection of documents; that is, to identify candidate documents. Ultimately, good usable data needs some human curation, and as Pleiades/Pelagios have demonstrated, the “crowd” (or community, more accurately) can make terrific progress. So I think it’s wise to start with tabular itineraries (!) — and the more interesting (to me) geospatial problem, estimating actual paths.
I am planning to submit a Pelagios Commons proposal concerning itineraries as well, about digital representation of evidence for routes, itineraries (routes taken), and flows (of people, commodities, information). I am working with colleagues to arrive at two data models/formats — a flexible framework for stores of such data (a GeoJSON extension I call Topotime, mentioned in my recent blog post), and a much slimmer “interconnection format” corresponding to the one Pelagios uses for linking gazetteers. I’ll be posting a draft of that in the next day or two.
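To give a rough, purely hypothetical flavor of what such a simplified serialization might look like (the "when" and "source" property names and their shapes here are invented for illustration and are not the actual Topotime draft): a GeoJSON Feature whose LineString carries the attested route and whose properties point back to the attestation.

```python
# Hypothetical sketch only: "when", "source" and their shapes are invented
# here for illustration; the actual Topotime draft is authoritative.
route = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",  # GeoJSON order: [longitude, latitude]
        "coordinates": [[-6.29, 36.53], [-5.99, 37.39], [12.48, 41.89]],
    },
    "properties": {
        "title": "Gades to Roma (attested itinerary)",
        "source": "https://example.org/texts/itinerary-123",  # hypothetical attestation URI
        "when": {"start": "0120", "end": "0121"},             # hypothetical temporal shape
    },
}

waypoint_count = len(route["geometry"]["coordinates"])
print(waypoint_count)  # 3
```

The point of such a slim format, as with the Pelagios gazetteer serialization, is that a contributor's internal model can be as rich as needed, so long as it can be flattened to something this simple for linking.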