Natural Language Processing for Cultural Heritage Domains
Museums, archives, libraries and other cultural heritage institutions maintain large collections of artefacts, which are valuable knowledge sources for both experts and interested laypeople. In recent years, a growing number of these institutions have started to digitise their collections, for instance to make them accessible via web portals. However, while digitisation is a necessary first step towards improved information access, fully unlocking the knowledge contained in these collections requires that users be able to browse, search and query them easily. This in turn requires cleaning, linking and enriching the data, a process that is often too time-consuming to be performed manually. Information technology can help to automate this task, at least partially. Because data processing and enrichment typically operate on textual metadata, natural language processing has a key role to play in this endeavour. At the same time, cultural heritage domains pose significant challenges for language technology and call for very robust and flexible solutions. Consequently, cultural heritage data can also serve as a good test-bed for developing such tools.