Linking Linked Places project update
Last month Lex Berman hosted Rainer Simon and me for three days at Harvard’s Center for Geographic Analysis, to consolidate understandings from remote discussions about our Linked Places project, and to move the work forward. We made excellent progress. Coincidentally, members of the Linked Data for Libraries (LD4L) project’s geospatial working group were meeting simultaneously on campus and we discussed historical gazetteer requirements with them in a working meeting. We also presented on our work, and Pelagios generally, before a wider audience in a 90-minute lunchtime seminar and over a raised glass or two one evening.
Linked Places aims to create a provisional data model and standard formats for representing historical geographic movement as well as some prototype software for managing and visualizing data in those formats. We began by refining an earlier version of the following figure; an explanation follows.
Conceptual and logical models
Our first step has been to agree among ourselves on the entities involved, the relationships between them, and a provisional vocabulary. Many of the possible terms have multiple and overlapping meanings, so this is not a simple matter and no solution will please everyone. What is essential is that there be a sound internal logic, and that linking with existing models, formats, and systems be facilitated.
Not everything represented in this figure will make it into data models and formats; concepts like way (the physical media or channels for routes) are provided for context. Some details are omitted here; for example, time periods will be articulated in the proposed data formats to follow. Also, the core properties of flows and historical_routes are, at this writing, “to be determined.”
This figure—a cross between a concept map and a UML diagram—resembles a common form for describing ontology design patterns (ODPs). It describes the following understandings:
A route describes an attestation of one or more occurrences of the movement, on or near the Earth’s surface, of people, commodities, or information between two or more places, at some time during or throughout some time_period. Routes are composed of one or more segments, each of which is composed of two places and a path (corresponding to nodes and edges in network parlance), the locations and temporal attributes for which may or may not be known or specified. Movement between places may have been uni- or bidirectional.
The three types of routes considered here are journeys, flows, and historical_routes (hRoutes):
A journey is a record of a specific instance of travel by one or more individuals. Examples include the 7th-century pilgrimage of the Buddhist monk Xuanzang across China and India, and the first voyage of Captain James Cook, between 1768 and 1771.
A flow is the record of the movement of something (commodities, people, ideas) between two places, aggregated as a magnitude, throughout some period. Examples include: the number of captive Africans conveyed between West Africa and Bahia in the 17th century; the volume of letters between correspondents in Paris and Prague in the 18th century.
A historical_route (or hRoute) asserts a single or composite course of travel between places, taken repeatedly by unspecified individuals over time, usually for purposes of commerce. Examples include the Silk Road, Ming Dynasty Courier Routes, and the medieval Amber Routes. Some correspond with named roads, for instance the Via Salaria in Italy.
The following additional propositions are indicated by the relations and cardinality expressions (e.g. 0…*) in the figure:
- All routes are attestations, cited in textual or cartographic documents
- The way (physical path, including geometry) for a segment may be known and represented, unknown, or ignored. A segment with an unspecified way will be rendered in many visualizations as a straight or curved line
- Each segment has one or more temporal attributes, each of which can be a time_period or a sequence (e.g. after segment n)
- Routes and their component segments can have any number of properties, dependent on the data source(s) and project requirements
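The entities and propositions above can be sketched as plain data classes. This is only an illustrative sketch, not the project's data format: the class and field names (`Place`, `Segment`, `Route`, `follows`, `attestation`, and so on) are assumptions introduced here, and the attestation string is a placeholder rather than a real citation.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three route types named in the text.
ROUTE_TYPES = {"journey", "flow", "hRoute"}


@dataclass
class Place:
    """A node; its location may be unknown."""
    name: str
    point: Optional[tuple] = None  # (longitude, latitude) if known


@dataclass
class Segment:
    """An edge: two places plus a path whose geometry (way) may be unknown."""
    source: Place
    target: Place
    way: Optional[list] = None          # list of (lon, lat) pairs if known
    time_period: Optional[tuple] = None  # (start, end) labels, e.g. years
    follows: Optional[int] = None        # sequence: index of the prior segment
    bidirectional: bool = False
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Each segment needs at least one temporal attribute.
        if self.time_period is None and self.follows is None:
            raise ValueError("segment needs a time_period or a sequence")


@dataclass
class Route:
    """An attested route made of one or more segments."""
    route_type: str
    attestation: str  # the source document in which the route is cited
    segments: list
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.route_type not in ROUTE_TYPES:
            raise ValueError(f"unknown route type: {self.route_type}")
        if not self.segments:
            raise ValueError("a route needs at least one segment")


# One leg of Cook's first voyage, with no geometry recorded.
cook = Route(
    route_type="journey",
    attestation="(placeholder citation)",
    segments=[
        Segment(Place("Plymouth"), Place("Tahiti"),
                time_period=("1768", "1771")),
    ],
)
```

Leaving `point` and `way` optional mirrors the statement that locations and path geometry may be known, unknown, or ignored.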
This is network data, right?
Right. It will be apparent to most or all readers that the places and segments above correspond to the standard graph abstraction of nodes and edges. In this work we’re proposing a preliminary specification of core spatial and temporal properties for the nodes and edges of historical movement data. One product of this work will be scripts to convert data between formats, including the new GeoJSON-T (discussed below), CSV, RDF, and one or more graph formats such as GEXF.
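To make the node-and-edge correspondence concrete, here is a minimal sketch of one such conversion: segments reduced to a node list and a CSV edge list. The dictionary keys (`source`, `target`, `when`) and the function names are assumptions made for this illustration, not the project's actual scripts or column names.

```python
import csv
import io


def segments_to_graph(segments):
    """Return (nodes, edges) from segment dicts with 'source' and 'target'."""
    nodes = []
    edges = []
    for seg in segments:
        for name in (seg["source"], seg["target"]):
            if name not in nodes:  # keep first-seen order, no duplicates
                nodes.append(name)
        edges.append((seg["source"], seg["target"], seg.get("when", "")))
    return nodes, edges


def edges_to_csv(edges):
    """Serialize edges as a CSV edge list with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "when"])
    writer.writerows(edges)
    return buf.getvalue()


# Place pairs taken from the flow examples earlier in this post.
example_segments = [
    {"source": "West Africa", "target": "Bahia", "when": "17th century"},
    {"source": "Paris", "target": "Prague", "when": "18th century"},
]

nodes, edges = segments_to_graph(example_segments)
edge_csv = edges_to_csv(edges)
```

An edge list like this is the lowest common denominator that most graph tools can import; richer formats would carry the temporal and geometric attributes as well.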
Network data for routes is almost always either partially or entirely geographically embedded. In most cases, places have known or estimated point geometry. Paths in historical data are less frequently known, and are often estimated, for example with least-effort hiking functions computed over modern-day topographical data. Our model must also account for cases where geometry for paths and/or places is unknown or not recorded, including data about fictional routes.
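The three geometry cases (known path, known endpoints only, nothing known) might be handled along these lines. This is a sketch under assumptions: the function name and arguments are hypothetical, the coordinates in the usage lines are arbitrary values rather than real places, and the output is an ordinary GeoJSON `LineString` geometry with coordinates in longitude, latitude order.

```python
def segment_geometry(source_point, target_point, way=None):
    """Return a GeoJSON LineString for a segment, or None if not drawable.

    source_point, target_point: (lon, lat) tuples, or None if unknown.
    way: optional list of (lon, lat) tuples describing the known path.
    """
    if way and len(way) >= 2:
        # The path geometry is known: use it as given.
        coords = [list(p) for p in way]
    elif source_point is not None and target_point is not None:
        # Path unknown: fall back to a straight line between the places.
        coords = [list(source_point), list(target_point)]
    else:
        # No usable geometry (e.g. an unlocated or fictional place).
        return None
    return {"type": "LineString", "coordinates": coords}


# Arbitrary illustrative coordinates.
straight = segment_geometry((0.0, 0.0), (10.0, 5.0))
traced = segment_geometry((0.0, 0.0), (2.0, 0.0),
                          way=[(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)])
unlocated = segment_geometry(None, (10.0, 5.0))
```

Returning `None` rather than raising keeps unlocated segments in the dataset, so they can still appear in a timeline or network view even when they cannot be mapped.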
A few notes on terminology
For this work, the jumble of potentially relevant terms included: journey, itinerary, route, path, trajectory, course, way, and flow—all of which relate to the movement of stuff (things, people, ideas, knowledge) between two or more places at some time during or throughout some period. Periods might be at any scale or precision, and described by names, intervals, or both.
A distinctive characteristic of this domain is that the same occurrences may be considered as more than one type of route. That is, the same data may be used to produce journey, flow, or hRoute datasets, in any combination. For example, our Venetian Incanto Trade dataset in its original form is a record of individual journeys. We can aggregate (and have aggregated) these as flows by any of several attributes (e.g. year, patron, terminus).
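Aggregating journeys into flows amounts to grouping records by the chosen attributes and counting them. The sketch below assumes each journey is a dictionary with `year`, `patron`, and `terminus` keys; the records are invented placeholders, not rows from the Venetian Incanto Trade dataset, and `journeys_to_flows` and `magnitude` are hypothetical names.

```python
from collections import Counter


def journeys_to_flows(journeys, key_fields):
    """Count journeys per combination of key_fields; return flow records."""
    counts = Counter(tuple(j[f] for f in key_fields) for j in journeys)
    return [
        dict(zip(key_fields, key), magnitude=n)
        for key, n in sorted(counts.items())
    ]


# Invented placeholder records for illustration only.
journeys = [
    {"year": 1400, "patron": "P1", "terminus": "T1"},
    {"year": 1400, "patron": "P2", "terminus": "T1"},
    {"year": 1401, "patron": "P1", "terminus": "T2"},
]

flows_by_year_and_terminus = journeys_to_flows(journeys, ("year", "terminus"))
flows_by_patron = journeys_to_flows(journeys, ("patron",))
```

Here the magnitude is a simple count of journeys; a real dataset might instead sum cargo volume or passengers per group.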
The apparent agreement within the historical gazetteer community about the meaning of “place” is no small achievement (per Pleiades, places are concepts that are “…contexts for Locations and Names”), but that agreement is not truly and formally universal.
For example, the alignment of the Pleiades definition with the scope notes of CIDOC-CRM’s E53_Place (“…extents in space, in particular on the surface of the earth, in the pure sense of physics: independent from temporal phenomena and matter”) is imperfect. However, an E53_Place is_identified_with place names (appellations) and is_defined_by spatial primitives (locational geometry), so perhaps they are close enough. Fortunately, while we as a community sort out whether that matters, work can proceed.
We have nearly finished tweaking the GeoJSON extension I’ve been working on for a while (called GeoJSON-T; the “T” is for time) to ensure it can fully express this conceptual model. Once that’s done, I can refine the scripts already written for transforming our exemplar route datasets into GeoJSON-T (https://github.com/kgeographer/topotime/wiki). We’ll also need a script to export GeoJSON-T to the Pelagios RDF Interconnection format. Lex will be testing possible alignment between the T-GAZ (temporal gazetteer) format he has developed and GeoJSON-T. As it stands, RDF for the places in our early route data examples can be readily rendered in Peripleo, as confirmed by Rainer’s quick hack in Cambridge last month. There is an open question about whether to generate a simple RDF serialization of segments so that Peripleo might render them, either as straight lines or as paths if actual geometry is given. Finally, I will create the prototype map and timeline interface mentioned earlier.
Not only for routes
It should be noted that GeoJSON-T is (or could be) an appropriate format for use by any historical gazetteer. Although there needn’t (shouldn’t?) be a single gazetteer data model, a recent Pelagios Commons discussion indicates interest in best practices for gazetteer development incorporating the GeoJSON format and I think GeoJSON-T is worth a look.
I will be presenting our sample datasets, associated scripts, and the prototype interface at the Linked Pasts meeting in Madrid this December. In the meantime, comments are welcome!
The ODP approach to ontology engineering is gaining traction in the Semantic Web community; cf. Blomqvist, E., Hitzler, P., Janowicz, K., Krisnadhi, A., Narock, T., & Solanki, M. (2016). Considerations regarding ontology design patterns. Semantic Web, 7(1), 1–7.
“Geographically embedded” here means having a specified geographic location.