Dear All,
Apologies for only now having a near-final version of this proposal. Unfortunately, this will not leave us much time to discuss it… Anyway, I hope the proposal is aligned with the objectives of Pelagios and the Linked Pasts SIG.
Kind regards,
Nuno
Proposal description:
The Europeana Geoparser is a service that uses information extraction techniques to automatically identify names of places and time periods mentioned in unstructured text. It relies on a gazetteer to assign coordinates and dates to the names of places and periods it finds. The target users are Europeana, aggregators and data providers who need to enrich object descriptions by analyzing geographic or temporal references in existing metadata records.
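To make the gazetteer idea concrete, here is a minimal sketch of name matching against a lookup table. It is not the Geoparser's actual implementation, which uses information extraction techniques rather than plain dictionary matching. The `Place` type, the `find_places` function and the two gazetteer entries (with approximate coordinates) are assumptions made for illustration only.

```python
import re
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    lat: float
    lon: float


# Hypothetical toy gazetteer with approximate coordinates.
# A real gazetteer would hold far more entries and name variants.
GAZETTEER = {
    "lisbon": Place("Lisbon", 38.72, -9.14),
    "the hague": Place("The Hague", 52.08, 4.31),
}


def find_places(text: str) -> list[tuple[str, Place]]:
    """Return (matched text, gazetteer entry) pairs in order of appearance."""
    # Put longer names first so multi-word names win over shorter ones.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0), GAZETTEER[m.group(0).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for mention, place in find_places("Printed in Lisbon, later held in The Hague."):
        print(mention, place.lat, place.lon)
```

Plain matching like this cannot tell a place name from a personal name or choose between places that share a name, which is where the recognition and resolution steps of a real geoparser come in.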
The Geoparser was developed over several years in a series of projects at INESC-ID, and was specialized for cultural heritage metadata records during the Europeana Connect project, a best practice network that ran from May 2009 until October 2011. The Geoparser web service was maintained by Europeana and used by many of its aggregators in the following years, until changes in the technological environment (namely, changes in the Geonames API and the change of Europeana's metadata format to the Europeana Data Model, or EDM) made the service inoperative in 2014, with no financial support available for its adaptation.
Several research efforts within the information retrieval community have addressed the general topic of geographic text analysis, for instance by developing versatile and comprehensive methodologies for mapping natural language expressions in textual documents that describe locations, orientations and paths to the geographic entities they refer to. However, the Geoparser was unique in its approach to metadata records, a scenario in which unstructured text exists within structured data. This provides a richer context that can be used to support the recognition and resolution of place names within the unstructured text. The value of its approach is supported both by its usage (in the Europeana Network and in DARIAH) and by peer-reviewed publication.
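As an illustration of how structured context can help resolve a name found in free text, the sketch below picks between two candidates for an ambiguous place name using a structured field from the same record. This is only a sketch under stated assumptions, not the Geoparser's method: the `country` field, the `Candidate` type, the `resolve` function and the candidate list (with approximate coordinates) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    country: str  # ISO 3166-1 alpha-2 code
    lat: float
    lon: float


# Hypothetical gazetteer candidates for one ambiguous name.
CANDIDATES = {
    "cambridge": [
        Candidate("Cambridge", "GB", 52.21, 0.12),
        Candidate("Cambridge", "US", 42.37, -71.11),
    ],
}


def resolve(name: str, record: dict) -> Optional[Candidate]:
    """Pick a candidate for `name`, preferring one that agrees with the record."""
    candidates = CANDIDATES.get(name.lower(), [])
    if not candidates:
        return None
    # Assumed structured field: a country code stored elsewhere in the record.
    country = record.get("country")
    for candidate in candidates:
        if candidate.country == country:
            return candidate
    # No structured hint matched: fall back to the first candidate.
    return candidates[0]


if __name__ == "__main__":
    record = {"title": "View of Cambridge", "country": "US"}
    print(resolve("Cambridge", record))
```

With free text alone, "Cambridge" stays ambiguous; the structured field is what tips the choice here.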
The cooperation between Pelagios Commons and Europeana may now provide a favourable setting for sustaining the development and maintenance of the Geoparser service. This focused project aims to repurpose the Geoparser software, which has been without maintenance since 2011 and is currently inoperative.
The project will undertake the following tasks:
- Reorganization of the software source code according to current open source best practices and the requirements of Pelagios Commons.
- Setup of a GitHub repository for hosting the software and supporting its open source development.
- Replacement of the underlying gazetteer (based on Geonames), whose implementation is currently broken, with functionality that supports any Pelagios Commons gazetteer.
- Redesign of the Geoparser functionality and API for general usage by the Pelagios Commons community. The original functionality was too tightly coupled to the requirements of Europeana, and its API was designed for the specific metadata format in use in the Europeana Network in 2011: Europeana Semantic Elements (ESE).
- Analysis of the APIs and requirements of other Pelagios tools so that the Geoparser can support them, thus allowing integration of the Geoparser with those tools, for example to support geo-resolution in tools such as Recogito, for a variety of digital object types.
- Deployment of a demonstration instance of the service. Europeana has offered to host the Geoparser service at no cost to the project. After the end of the project, Europeana will keep the demonstration service online, and it will be reachable from the Apps section of Europeana Labs.
The results of this focused project have the potential to reach a wide audience and promote the usage of Pelagios resources. Metadata descriptions are key resources for the discoverability of geographically referenced digital objects, and are widely used in the Pelagios and Europeana communities. By leveraging the Europeana Network of cultural heritage institutions, and the metadata aggregation of Europeana itself, the metadata of a potentially very large number of digital resources may become linked to Pelagios. In addition, Europeana's promotion of the usage of cultural heritage resources in research infrastructures will further promote Pelagios resources to the digital humanities research community.
The software will be made available as open source under the Apache License 2.0.
This topic was modified 2 years, 8 months ago by Nuno Freire.