3rd Summer Datathon on Linguistic Linked Open Data (SD-LLOD-19)

The 3rd Summer Datathon on Linguistic Linked Open Data (SD-LLOD-19) will be held from May 12th to 17th, 2019 at Schloss Dagstuhl – Leibniz Center for Informatics, Wadern, Germany.

The main goal of the SD-LLOD datathon is to give people from industry and academia practical knowledge of Linked Data and its application to natural language data and natural language annotations, in areas as diverse as knowledge engineering, lexicography, the language sciences, natural language processing and computational philology.

The final aim is to enable participants to develop their own use cases, i.e., to migrate their own (or others’) linguistic data and publish them as Linked Data on the Web. The datathon series is unique in its topic worldwide and continues a series of biennial datathons on Linguistic Linked Open Data organized since 2012. The 2019 edition is organized in conjunction with, and held before, the 2nd International Conference on Language, Data and Knowledge (LDK-2019, May 20th-22nd, Leipzig, Germany). It is supported by the Research Group "Linked Open Dictionaries (LiODi)", funded by the German Federal Ministry of Education and Research (BMBF); by the H2020 Research and Innovation Action Prêt-à-LLOD (Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors); and by the H2020 Research and Innovation Action ELEXIS (European Lexicographic Infrastructure).


During the datathon, participants will:

  • Generate and publish their own linguistic linked data from existing data sources or tools.
  • Apply Linked Data principles and Semantic Web technologies (ontologies, RDF, Linked Data) to the field of language resources.
  • Use the principal models for representing Linguistic Linked Data, in particular OntoLex-Lemon and Web Annotation, as well as knowledge representation vocabularies such as SKOS and OWL.
  • Gain experience with terminology resources developed for or used in the Linguistic Linked Open Data context, such as Lexvo, LexInfo, OLiA and GOLD.
  • Learn about multilingual knowledge bases and entity linking against resources from the Web of Data, e.g., DBpedia or BabelNet.
  • Learn about potential benefits and applications of linguistic linked data for specific use cases.
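As a rough illustration of the goals listed above, the following sketch shows what a single converted dictionary entry might look like in OntoLex-Lemon. It is a minimal, hypothetical example and not part of the datathon material. The `ex:` base URI, the `(lemma, part_of_speech, reference_uri)` input format and the function names are assumptions. Part-of-speech values are taken from the LexInfo 2.0 namespace, and the DBpedia link stands in for entity linking. For brevity the Turtle is written with plain string formatting; a real conversion would more likely use an RDF library.

```python
# Hypothetical sketch: serialize simple dictionary rows as OntoLex-Lemon Turtle.
# Assumed (not from the datathon material): the ex: base URI, the
# (lemma, part_of_speech, reference_uri) row format, and all function names.

PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .\n"
    "@prefix ex: <http://example.org/lexicon/> .\n"
)


def turtle_literal(text: str, lang: str) -> str:
    """Return a language-tagged Turtle string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ontolex_turtle(rows, lang="en"):
    """Convert (lemma, part_of_speech, reference_uri) tuples into a Turtle document."""
    blocks = [PREFIXES]
    for i, (lemma, pos, reference) in enumerate(rows):
        # One lexical entry per row, with its canonical form and a single sense.
        entry, form, sense = f"ex:entry{i}", f"ex:entry{i}_form", f"ex:entry{i}_sense"
        blocks.append(
            f"{entry} a ontolex:LexicalEntry ;\n"
            f"    lexinfo:partOfSpeech lexinfo:{pos} ;\n"
            f"    ontolex:canonicalForm {form} ;\n"
            f"    ontolex:sense {sense} .\n"
            f"{form} a ontolex:Form ;\n"
            f"    ontolex:writtenRep {turtle_literal(lemma, lang)} .\n"
            f"{sense} a ontolex:LexicalSense ;\n"
            # The sense points to an external Web of Data resource (entity linking).
            f"    ontolex:reference <{reference}> .\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_ontolex_turtle([("castle", "noun", "http://dbpedia.org/resource/Castle")]))
```

Running the script prints the three prefix declarations, followed by one lexical entry, its canonical form and its sense linked to DBpedia.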


During the datathon, seminars will be organised to cover topics such as:

  • Ontologies and Linked Data
  • The Lexicon Model for Ontologies (OntoLex-Lemon)
  • Integrating documents, annotations and NLP tools with Linked Data and RDF using Web Annotation and NIF
  • Guidelines for RDF generation of Language Resources
  • Methodologies for Linked Data publication of Language Resources
  • Multilingual Word Sense Disambiguation and Entity Linking
  • Use and Applications of Linguistic Linked Data
  • Metadata and Licenses for Linguistic Linked Data

To avoid passive learning, the program of the summer datathon will contain three types of sessions:

      1. Seminars that present novel aspects and discuss selected topics
      2. Practical sessions that introduce the foundations, methods and technologies of each topic, and in which participants perform different tasks using these methods and technologies
      3. Hacking sessions in which participants follow the whole process of generating and publishing Linguistic Linked Data from an existing dataset

Participants will be invited to propose a mini-project related to the topic and to bring a linguistic dataset produced by their organization, in order to work on it during the hacking sessions and transform it into linked data. Participants who cannot provide their own linguistic dataset can join another participant’s mini-project or one of the mini-projects proposed by the organisers. There will be an award for the best mini-project.

Participants should bring their own laptops for the hacking sessions. They will be provided with digital copies of all the material used during the course and will receive assistance with installing all the required software.


Christian Chiarcos

Applied Computational Linguistics (ACoLi) Lab, Goethe Universität Frankfurt

As Professor of Computer Science at Goethe University Frankfurt, Germany, I have been heading the Applied Computational Linguistics (ACoLi) lab since 2013, and the research group "Linked Open Dictionaries (LiODi)" since 2015. My research focuses on semantic technologies, including computational semantics as well as the innovative application of Linked Data formalisms to problems and resources in NLP and Digital Humanities.

For further information, please visit my website

John Philip McCrae

Insight Centre for Data Analytics, NUI Galway

I am a lecturer above-the-bar at the Insight Centre for Data Analytics at the National University of Ireland Galway. I am currently working with Paul Buitelaar in the Unit for Natural Language Processing. My research has mainly focused on the development of linguistic linked open data, and in particular on models for the representation of lexical resources, by means of the lemon and OntoLex models.

For further information, please visit my personal website

Jorge Gracia

Aragon Institute of Engineering Research (I3A), University of Zaragoza

I am an assistant professor at the Department of Computer Science and Systems Engineering (University of Zaragoza, Spain). I carry out my research in the Distributed Information Systems group, part of the Aragon Institute of Engineering Research (I3A). My main research interests are Semantic Web, Ontology Matching, Multilingual Web of Data, Query Interpretation, and Linguistic Linked Data.

For further information, please visit my personal website

Local Organisers

Local organization is handled by the members of the Research Group “Linked Open Dictionaries (LiODi)”, funded by the German Federal Ministry of Education and Research (BMBF).

Monika Rind-Pawlowski

Research Group “Linked Open Dictionaries”
Goethe Universität Frankfurt am Main, Germany

Hasmik Sargsian

Research Group “Linked Open Dictionaries”
Goethe Universität Frankfurt am Main, Germany

Jesse Wichers Schreur

Research Group “Linked Open Dictionaries”
Goethe Universität Frankfurt am Main, Germany

How to Apply

We welcome participants from anywhere in the world, from industry or academia. Some basic familiarity with software development and Web technologies is recommended. Participants are expected to take part fully in the activities of the datathon until its conclusion.

Fees: All SD-LLOD activities, including lecturers, tutors, teaching materials and social activities, are sponsored by supporting research projects. Participants cover only the expenses of their stay, which are settled directly with the venue, Schloss Dagstuhl. Participants should stay for the entire duration of the event.

  • single room: 350 EUR total, full board
  • shared room*: 275 EUR total, full board
  • (*single room with shared restroom, subject to availability)

Note that datathon participants are entitled to a reduced registration fee for the 2nd International Conference on Language, Data and Knowledge (LDK-2019), Leipzig, Germany, May 20-22, 2019; please see the LDK registration page for details.

Registration: Registration is now open. You can apply by sending an email to datathon@linguistic-lod.org. Please provide a short description of

  • research interests and affiliation
  • proposal of a mini-project [optional, as attachment, see below]
  • dietary preferences [optional]
  • accommodation preferences [single room / shared room]

Applications must be submitted by April 4th, 2019. Applicants will be notified of the selection result on April 12th, 2019.

Participants are encouraged to contribute their own data, research and challenges to the datathon. If you want to propose a topic for a mini-project (e.g., a language resource to be converted into linked data, an LLOD dataset to be linked to other resources, a use case description that exploits the LLOD cloud, ...) or want to report on recent research related to the topics of the datathon, you can write a short description of your ideas (fewer than 1000 words) and send it to the organisers by email as part of the registration process by April 4th. Selected mini-project proposals and abstracts will be highlighted and presented during the event.

Important Dates

Registration opens: January 18th, 2019
Registration closes: April 4th, 2019
Notification: April 12th, 2019
Datathon: May 12th to 17th, 2019
Payment: onsite only, upon arrival, from May 12th, 2019, 15:00 CEST

Invited speakers

Gerard de Melo

Director of the Deep Data Lab, Department of Computer Science
Rutgers University, New Jersey

Gerard de Melo is an Assistant Professor at Rutgers University (NJ, USA), where he heads the Deep Data Lab. Over the years, he has published over 100 papers on natural language processing, AI, and Big Data analytics, with Best Paper/Demo awards at WWW 2011, CIKM 2010, ICGL 2008, and the NAACL 2015 Workshop on Vector Space Modeling. Notable research projects include Lexvo.org, FrameBase.org, the Universal WordNet, and the Etymological WordNet. Prior to joining Rutgers, he was a faculty member at Tsinghua University and a Post-Doctoral Research Scholar at ICSI/UC Berkeley. He received his doctoral degree at the Max Planck Institute for Informatics.

For further information, please consult his website

Richard Eckart de Castilho

Ubiquitous Knowledge Processing (UKP) Lab, Department of Computer Science
Technische Universität Darmstadt, Germany

Dr. Richard Eckart de Castilho is a senior researcher at the Ubiquitous Knowledge Processing Lab, TU Darmstadt. He is interested in architectures, tools and infrastructures for the automatic and interactive analysis of text data. Richard is currently building a next-generation text annotation platform as a PI in the DFG-funded project INCEpTION. He is a member of the CEDIFOR Digital Humanities Centre and of the Apache Software Foundation, maintains DKPro Core, WebAnno and Apache uimaFIT, and is involved in various other open-source projects related to NLP.

For further information, please consult his website



The Summer Datathon on Linguistic Linked Open Data (SD-LLOD-19) will be held from May 12th to 17th, 2019 at Schloss Dagstuhl – Leibniz Center for Informatics, in Wadern, Germany. Schloss Dagstuhl is a prominent venue for computer science workshops and meetings, situated in an idyllic rural setting near the French and Luxembourg borders. It is easily reached via the cities of Mainz, Saarbrücken or Trier, or via the airports Frankfurt am Main (FRA), Frankfurt-Hahn (HHN), Saarbrücken (SCN) or Luxembourg (LUX). For details on getting there, please see the Schloss Dagstuhl arrival information.

Schloss Dagstuhl photo (c) L. Sieht, Wikipedia, CC-BY 3.0


Sun 12/5 Mon 13/5 Tue 14/5 Wed 15/5 Thu 16/5 Fri 17/5
07:30 - 08:45 breakfast breakfast breakfast breakfast breakfast
09:00 - 09:30 Welcome Presentation of participant groups Practical Session: Generating & Publishing Language Resources Seminar: Metadata Seminar: OntoLex extensions
09:30 - 10:00 Introduction: Linguistic Linked Open Data Invited Talk: Richard Eckart de Castilho Practical Session: Metadata Practical Session: OntoLex extensions
10:00 - 10:30 Datathon
10:30 - 11:00 break break break break break
11:00 - 11:30 Practical Session: Linguistic Linked Open Data Seminar: OntoLex-lemon Practical Session: Linking Datasets Datathon Feedback and Review Session
11:30 - 12:00
12:00 - 12:30 lunch lunch lunch lunch lunch
12:30 - 13:00
13:00 - 13:30 daily report (tutors only) daily report (tutors only) daily report (tutors only) Datathon Presentations
13:30 - 14:00 Seminar: Ontologies Seminar: Annotations & NLP Datathon Datathon
14:00 - 14:30
14:30 - 15:00 Practical Session: Ontologies Practical Session: SPARQL & CoNLL-RDF Invited Talk: Gerard de Melo
15:00 - 15:30 Arrival & Registration
15:30 - 16:00 coffee break coffee break coffee break coffee break coffee break
16:00 - 16:30 Installfest: Technical Setup (all participants) Participants' minute madness Datathon Excursion (Trier) Datathon Conclusion & Awards
16:30 - 17:00
17:00 - 17:30 Group Formation & Project Selection Departure: 17:00
17:30 - 18:00
18:00 - 18:30 Dinner & Icebreaking Dinner Dinner Dinner
18:30 - 19:00
19:00 - 19:30 Icebreaking Session Datathon Excursion Social Evening
19:30 - 20:00 Conference Dinner
20:00 - 20:30


By default, we meet in Lecture Hall Saarbrücken (LH-Sb).

Acronym | Name | Building | Lectures | Practical sessions | Datathon sessions | Mini-project (tutor)
LH-Sb | Lecture Hall Saarbrücken | new building | X | X | X | 6 (Sina) & 7 (Julia)
LH-Kl | Lecture Hall Kaiserslautern | old building | - | X* | X | 1 (Thierry)
S006 | Cafeteria | old building | - | X** | X*** | 3 (Andon)
S104 | S104 | old building | - | - | X | 4 (Bettina)
S003 | S003 | old building | - | - | X | 2 (Christian F.)
News | News Room / Wappensaal | old building | - | - | X | 8 (Alessandro)
Trier | Room Trier | old building | - | - | X | 5 (Max)

* if necessary, to be announced in the preceding session (Mon-Thu)
** instead of LH-Kl, Friday only
*** if occupied, please move to LH-Kl (Mon-Thu)

About LLOD and the SD-LLOD datathon series

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data cloud was conceived, and is maintained, by the Open Linguistics Working Group (OWLG) of Open Knowledge International, and has since become a focal point of activity for several W3C community groups, research projects and infrastructure efforts.

To a large extent, LLOD development has been driven by international workshops and accompanying hackathons, organized, for example, in the context of the workshops on Multilingual Linked Open Data for Enterprises (MLODE) in 2012 and 2014 in Leipzig, Germany. Since 2015, these events have been organized as biennial summer schools: the first Summer Datathon on Linguistic Linked Open Data (SD-LLOD’15) was held in June 2015 in Cercedilla, Madrid, Spain, as was the second Summer Datathon on Linguistic Linked Open Data (SD-LLOD’17) in July 2017. The 2019 edition is organized in conjunction with, and held before, the 2nd International Conference on Language, Data and Knowledge (LDK-2019, May 20th-22nd, Leipzig, Germany).

Notable outcomes of earlier datathon editions include the first installment of the LLOD cloud and the LLOD cloud diagram (a result of MLODE-2012), a large number of converted resources, numerous scientific publications, and thesis projects that build on successful mini-projects, experiments or case studies conducted at or initiated during previous SD-LLOD datathons.