Linked Data Annotation Without the Pointy Brackets
Note: the ‘Learn More’ link on recogito.pelagios.org temporarily redirects here, while we’re still working on an introductory tutorial for Recogito.
Since around February, we’ve been working behind the scenes on a new version of our annotation tool Recogito. For those who haven’t heard about it yet: Recogito is a Web application we developed for the purpose of annotating geographic references in texts and scanned maps. The new version will be released officially this December. If you are daring enough, however, we warmly invite you to explore the development version right now.
During the Pelagios 3 project (completed in August 2015), Recogito helped us make significant progress. We used it to tag more than 120,000 references to places in about 200 documents from the Latin, Greek, European medieval, maritime, and early Islamic and Chinese traditions, and aligned about half of them to historical gazetteers like Pleiades or PastPlace. We also ran public annotation workshops which did not just produce great results, but were also a lot of fun.
Our work with Recogito also revealed certain limitations, though. For example, while Recogito had always been Web-based (and thus usable remotely by different people at the same time), it was never really very collaborative. Working with it required assistance from a member of the core project team – to set up accounts, upload documents, assign them to collections, etc. Provenance tracking (i.e. who contributed what to an annotation) was very basic. Forming teams and managing access to specific documents for specific users wasn’t possible. Restoring annotated documents to a previous state in time (just in case those student contributions from the other day weren’t so great after all) involved some non-trivial data-wrangling.
Likewise, the feature set and data models strongly reflected the tasks we aimed to complete in the project. Yes, Recogito had some fancy things like automatic transcription support powered by a specialist gazetteer so we could crunch through our Maritime tradition maps faster. But as a feature, it wouldn’t have been universally applicable to other content. Or what if someone just wanted to do something trivial, like mark a dot on an image and attach a written note? Sorry – not on the feature list. Or record multiple alternative readings in one annotation? Again, sorry. If it wasn’t already on our tight, project-specific requirements list for Recogito version 1, we couldn’t do it.
Semantic Annotation: a Better Workflow
However, the more feedback we got from outside the core project team, the more we felt that there were aspects of our workflow that addressed more general unmet needs. For example, a fundamental design choice that was received positively was that every automated step (like Named Entity Recognition) must always require human verification. Whenever such verification was missing, this would be prominently displayed visually. Most importantly, though, the very reason we built Recogito was that we needed to be more productive. We felt that existing tools for semantic annotation (i.e. the task of linking document fragments to terms in controlled vocabularies – like gazetteers) were still lacking the simplicity, tractability, and efficiency to make the process fast and user-friendly enough – even (or especially!) for novice and non-technical users. We envisioned a tool that would let users focus on the documents, rather than have to think about how to properly use the technology, while at the same time introducing the intricacies of URIs, RDF properties and Linked Data to those documents without users even knowing it – hence our motto of Linked Data annotation without the pointy brackets.
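To make the idea a little more concrete, here is a minimal Python sketch of what a semantic annotation could look like as data: a fragment of text, an optional gazetteer URI, and a flag recording whether a person has confirmed an automated suggestion. The class, field and function names are hypothetical and the URI is a placeholder; this is not Recogito's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """One annotated fragment of a plain-text document (hypothetical model)."""
    start: int                 # character offset where the fragment begins
    end: int                   # character offset just past the fragment
    quote: str                 # the annotated text itself
    uri: Optional[str] = None  # gazetteer URI the fragment is linked to, if any
    source: str = "human"      # "human" or "automatic" (e.g. an NER step)
    verified: bool = False     # has a person confirmed this annotation?


def annotate(text: str, phrase: str, uri: Optional[str] = None,
             source: str = "human") -> Annotation:
    """Annotate the first occurrence of `phrase` in `text`.

    Raises ValueError if the phrase does not occur. Annotations made by a
    person count as verified; automatic ones do not until someone checks them.
    """
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), phrase, uri, source,
                      verified=(source == "human"))


def needs_review(annotations: List[Annotation]) -> List[Annotation]:
    """Return the annotations that still lack human verification."""
    return [a for a in annotations if not a.verified]


text = "From Athens we sailed to Rhodes."
athens = annotate(text, "Athens", uri="https://example.org/gazetteer/athens")
rhodes = annotate(text, "Rhodes", source="automatic")
unverified = needs_review([athens, rhodes])  # only the automatic suggestion
```

The point of the `verified` flag is that an interface can always show which annotations came from an automated step and have not yet been looked at by a person.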
We certainly don’t want to claim Recogito has reached all those goals – version 1 certainly hasn’t. Nevertheless, since February, this overall vision has guided our rethinking of many essential features, including the user interface. But not just rethinking, of course. (It would be wonderful if that were enough.) Also rebuilding. By now, the codebase of Recogito version 2, hosted on GitHub, has become a neatly-sized software project of around 28,000 lines of code (split roughly evenly between server-side/backend code and user interface functionality). We’re still some time away from our first official release in December. Even so, we think we’ve made good progress over these past months in untangling many of Recogito’s more specialized and clunky aspects, and in turning it into the beginnings of something that acts and feels like the open, collaborative and useful work environment that we’ve always wanted.
Working with Text
The new Recogito text annotation interface was re-built from scratch. We think we managed (or will manage 😉 ) to incorporate all the feature requests we have gathered from our users so far; and the new interface seems to deal nicely even with heavily annotated texts of 1000+ annotations (see example here). Presently, plain text (.txt) files are the only supported import format, just as was the case for Recogito v.1. However, we plan to support import of TEI documents as well. Officially, this functionality is scheduled for October 2017. But we already have a working proof of concept, based on the excellent CETEIcean library.
— Rainer Simon (@aboutgeo) July 13, 2016
Images and IIIF
Most of the essentials needed to annotate high-resolution zoomable images in Recogito v.2 are done as well. This time round, most things also work on iOS and Android tablets (our original “tilted box” selection tool, which will be familiar to our current users, still needs porting to touch). There’s a convenient new fullscreen mode that can be switched on and off, and which maximizes the size of the image canvas – particularly useful on tablets, where screen space is a bit scarcer. Be sure to take a look (and try out the multi-touch based zoom and free rotation!) – e.g. on this example.
We’ve also been paying close attention to the activities around IIIF, an effort to promote and standardize access to high-resolution imagery on the Web, particularly in the Cultural Heritage field. Support for IIIF is also scheduled on our roadmap, and I’m happy to announce that our first integration trials (making use of KlokanTech’s great Open Source IIIF viewer code) have already been successful – expect more news to come in due time.
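For readers who haven't met IIIF before: its Image API lets a client request a whole image, or just a region of it at a given size, through a predictable URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The Python sketch below builds such URLs. The server address and image identifier are placeholders and the helper functions are our own illustration; none of this is taken from Recogito's code.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL.

    The identifier is percent-encoded, so that any slashes it contains are
    not mistaken for path separators.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """Build the URL of the image's info.json description document."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
whole_image = iiif_image_url(BASE, "maps/portolan-1")
# A 1024x1024 pixel region from the top-left corner, scaled to 512 pixels wide.
tile = iiif_image_url(BASE, "maps/portolan-1", region="0,0,1024,1024", size="512,")
info = iiif_info_url(BASE, "maps/portolan-1")
```

A zoomable viewer typically reads info.json first to learn the image dimensions, and then requests regions like the tile above as the user pans and zooms.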
— Rainer Simon (@aboutgeo) September 2, 2016
Recogito as an extensible platform
For the time being, our main priority is to get everything up and running smoothly for our first release in December. Ultimately, however, we’d like to see Recogito as more than just the code that sits behind our site at recogito.pelagios.org. Recogito is, of course, Open Source software. So, first of all, everyone is free to set up their own copy, on their own server. (Check our Readme on GitHub to learn how it’s done.)
Moreover, we want to evolve Recogito into an extensible framework – a platform that takes care of the mundane formalities of annotation: storage, versioning, recording provenance and activity metrics, managing documents and access rights, handling data transformation, import and export. At the same time, it would provide the necessary hooks to plug in extra functionality as needed, so as to provide the right, tailor-made work environment for specific use cases. Examples could be alternative Named Entity Recognition Engines; additional fields in the annotation editor popup box; or a connector to an existing document repository. We are only starting to identify what would make useful extension points (and how to best design them). But if you have any thoughts now – or would even like to work with us on developing them – do get in touch.
Now Open for Alpha Testing
But before we wander off too far into the future that’s beyond our current development horizon, let’s return to the present. We do have some way to go until December, but we’re open for alpha testing right now! Do pay us a visit at recogito.pelagios.org. Explore, play – and let us know what you think. You can reach us via GitHub or the Recogito Users Group on this site.