PastPlace / using Wikidata as a gazetteer
The PastPlace project is working to create a global historical gazetteer using Wikidata as a spine, to which we aim to add (a) a user interface more appropriate to a gazetteer than that provided by Wikidata and (b) additional attestations of place names taken from historical sources. There are several things I could say much about here but won’t:
- Why we need a minimal core gazetteer or “spine” of untyped “places”, rather than typed geographical features.
- Why historical researchers should use an existing open gazetteer as their spine, rather than build one from scratch. The answer might well be different for archaeologists and classicists, who work with places which arguably no longer exist.
- What Wikidata is, other than that it is a systematisation of Wikipedia — see https://www.wikidata.org/
I will say that I believe Wikidata is a better starting point than GeoNames, OpenStreetMap or the NGA gazetteer because:
- Those are all gazetteers of features, where each entity is allowed to have many names but only one type. In some senses “places” are bundles of features sharing name elements and approximate locations, but in those other gazetteers each feature exists separately. The Wikidata data model allows entities to be instances of multiple other entities, which better approximates my notion of a place. See this key assertion on the nature of place by Bertrand Chorizo: http://www.theguardian.com/society/2011/mar/29/jaywick-essex-resort-most-deprived#comment-10147803
- Crowd-sourcing has made other gazetteers too big in unhelpful ways, creating ambiguity and other confusions, whereas Wikipedia’s “notability” requirement, coupled with an active editing community, keeps a lid on this.
- In practice, Wikidata provides a much higher ratio of names to places, although of course those names lack attribution beyond the language edition of Wikipedia they come from.
That said, this thread is not meant to be a discussion of the pros and cons of using Wikidata as a spine, but a discussion of how best, in practice, to turn Wikidata content into a historically useful gazetteer. More later, but we have been working on this since 2013. When we started, Wikidata’s export facilities were limited and poorly documented, and we were not able to make much progress until Rainer Simon and I visited the Wikidata team in Berlin in April 2014. That enabled us to extract a subset of Wikidata entities consisting of all those with an associated geographical coordinate, and to put online a first experimental version of the PastPlace gazetteer, which the Pelagios team have made some use of. Where we are now:
- We have just about finished creating a system for periodically downloading and ingesting new versions of Wikidata, which includes fairly complete entity attributes and instance-of information. As Wikidata has grown it has become more important to be able to filter out entities which have locations but aren’t really “places”, and this additional information will enable that filtering. One key issue is how to do this filtering, as far as possible, in a language-neutral way.
- We have also done a good deal of work creating both a PastPlace API and a user interface that, unlike Wikidata’s own interface, look and work like a gazetteer API and UI; in particular, the UI shows places on a map. The PastPlace user interface also uses the React framework developed by Facebook to provide a responsive web design which aims to work well on mobile devices as well as on PCs. NB embarrassingly, that interface currently does not work, but fixing it is a priority now that the revised ingest system is working.
- We aim to keep those interfaces running, but we won’t do a big public launch of PastPlace until there is more historical content. The current version inherits considerable historical content, including attested place names, from our Vision of Britain system, but of course that covers only Britain. We are working to integrate around 100,000 entries containing around 7 million words from the Gazetteer of the World: Or, Dictionary of Geographical Knowledge, published in Edinburgh in the 1850s, but we also want to add name attestations gathered by academic partners. Following Pelagios principles, we aim to link in anything that can be supplied as an annotation containing a place name, the Wikidata ID of the entity it is a name for, and some kind of URI for the historical source.
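To make the language-neutral filtering idea concrete, here is a minimal sketch in Python. It is not our actual ingest code. It assumes entities arrive as dictionaries in the standard Wikidata JSON export layout, and it tests only property and item identifiers (P625 "coordinate location", P31 "instance of"), never labels, so the result does not depend on any one language edition. The allow-list, the function names and the two sample entities are all hypothetical, and a real filter would also need to follow the subclass hierarchy.

```python
from typing import Any, Dict, Iterator, Optional, Set, Tuple

COORDINATE = "P625"   # Wikidata property: coordinate location
INSTANCE_OF = "P31"   # Wikidata property: instance of

# Illustrative allow-list only (an assumption, not PastPlace's real list):
# Q515 = city, Q486972 = human settlement.
PLACE_CLASSES: Set[str] = {"Q515", "Q486972"}


def claim_values(entity: Dict[str, Any], prop: str) -> Iterator[Dict[str, Any]]:
    """Yield the main value of every statement for `prop` that has a real value."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":   # skip "novalue" / "somevalue"
            yield snak["datavalue"]["value"]


def instance_of_ids(entity: Dict[str, Any]) -> Set[str]:
    """Return the item IDs this entity is declared to be an instance of."""
    ids = set()
    for value in claim_values(entity, INSTANCE_OF):
        # Some exports carry only "numeric-id", so fall back to that.
        ids.add(value.get("id") or "Q{}".format(value["numeric-id"]))
    return ids


def first_coordinate(entity: Dict[str, Any]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) of the first coordinate claim, if any."""
    for value in claim_values(entity, COORDINATE):
        return (value["latitude"], value["longitude"])
    return None


def looks_like_place(entity: Dict[str, Any],
                     allowed: Set[str] = PLACE_CLASSES) -> bool:
    """Keep entities that have a location AND an allowed instance-of class."""
    return first_coordinate(entity) is not None and bool(instance_of_ids(entity) & allowed)


def _statement(prop: str, value: Dict[str, Any]) -> Dict[str, Any]:
    """Build a minimal statement in the export layout (sample data only)."""
    return {"mainsnak": {"snaktype": "value", "property": prop,
                         "datavalue": {"value": value}}}


# Two made-up entities: both have coordinates, only one is in the allow-list.
city = {"id": "EXAMPLE-CITY", "claims": {
    "P31": [_statement("P31", {"id": "Q515"})],
    "P625": [_statement("P625", {"latitude": 51.5, "longitude": -0.1})]}}
event = {"id": "EXAMPLE-EVENT", "claims": {
    "P31": [_statement("P31", {"numeric-id": 999999999})],
    "P625": [_statement("P625", {"latitude": 50.0, "longitude": 1.0})]}}

kept = [e["id"] for e in (city, event) if looks_like_place(e)]
print(kept)
```

An allow-list is only one possible design; a deny-list of classes known not to be places would behave differently for entities with no instance-of statement at all, which is exactly the kind of choice we would welcome comments on.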
My next posting should be about our first attempt at filtering, but it would be helpful to have comments from anyone who has used the existing unfiltered PastPlace gazetteer about what we should be trying to filter out. Another question is which problems are best addressed by altering Wikidata itself, and how that should work; this is certainly how issues of positional accuracy should be addressed.