Knowledge Extraction

Most information on the web is available only in unstructured or semi-structured form. In many cases, it is desirable to convert this information into structured form (often as so-called knowledge graphs), because this lets researchers and practitioners query and reuse it easily. I have worked on several extraction projects; the best known are DBpedia and LinkedGeoData, which I briefly describe below.

DBpedia Extraction: DBpedia is a prominent extraction effort in which several billion facts are extracted from more than 100 Wikipedia language editions. The resulting knowledge graph is linked to more than 30 other datasets. DBpedia is used by a number of organisations, such as the BBC, IBM and the New York Times. The core papers received awards from the Semantic Web Journal, the Journal of Web Semantics, ISWC, ESWC and the Literati Network for Excellence. I am a co-founder of the project (with Prof. Auer and Prof. Bizer), a core contributor since 2007 and an active DBpedia board member.

Figure: DBpedia extraction manager

LinkedGeoData / Query Rewriting: Another data extraction research effort I pursue with my colleagues is LinkedGeoData, in which a spatial knowledge base is derived from the OpenStreetMap community project. We designed a virtual mapping approach that rewrites an incoming SPARQL query into a single SQL query, potentially containing virtual spatial predicates. At the time, this was novel and allowed us to scale to a dataset with more than 30 billion facts, more than 1000 updates per minute and a semi-automatically generated ontology.
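The idea of rewriting a SPARQL query into a single SQL query can be illustrated with a minimal Python sketch. It is not the LinkedGeoData implementation: the predicate names (ex:name, ex:amenity), the table and column names, and the rewrite function are all hypothetical, and virtual spatial predicates are left out. The sketch only shows how a mapping from predicates to relational tables turns a set of triple patterns into one SQL statement, where shared variables become join conditions and constants become filters.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    table: str    # hypothetical relational table backing the predicate
    subject: str  # column holding the subject identifier
    obj: str      # column holding the object value


# Hypothetical predicate-to-table mappings (not the real LinkedGeoData schema).
MAPPINGS = {
    "ex:name": Mapping("node_tags_name", "node_id", "value"),
    "ex:amenity": Mapping("node_tags_amenity", "node_id", "value"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def rewrite(select_vars, patterns):
    """Rewrite a list of (subject, predicate, object) patterns into one SQL query."""
    from_parts = []
    conditions = []
    bindings = {}  # variable -> first SQL column it was bound to

    for i, (s, p, o) in enumerate(patterns):
        m = MAPPINGS[p]
        alias = f"t{i}"
        from_parts.append(f"{m.table} AS {alias}")
        for term, col in ((s, m.subject), (o, m.obj)):
            expr = f"{alias}.{col}"
            if is_var(term):
                if term in bindings:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{bindings[term]} = {expr}")
                else:
                    bindings[term] = expr
            else:
                # A constant becomes a filter; quotes are doubled for SQL.
                literal = term.replace("'", "''")
                conditions.append(f"{expr} = '{literal}'")

    cols = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    sql = f"SELECT {cols} FROM {', '.join(from_parts)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Corresponds to: SELECT ?name WHERE { ?n ex:amenity "cafe" . ?n ex:name ?name }
print(rewrite(["?name"], [("?n", "ex:amenity", "cafe"), ("?n", "ex:name", "?name")]))
```

Because the whole graph pattern becomes one SQL statement, the relational database can plan and optimise the joins itself, which is the property the virtual mapping approach relies on.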

Related Publications

Journal Articles

Oliveira, Italo Lopes; Fileto, Renato; Speck, René; Garcia, Luís Paulo F.; Moussallem, Diego; Lehmann, Jens

Towards holistic Entity Linking: Survey and directions Journal Article

In: Information Systems, 95, pp. 101624, 2021.

Ismayilov, Ali; Kontokostas, Dimitris; Auer, Sören; Lehmann, Jens; Hellmann, Sebastian

Wikidata through the eyes of DBpedia Journal Article

In: Semantic Web, 9 (4), pp. 493–503, 2018.

Lehmann, Jens; Isele, Robert; Jakob, Max; Jentzsch, Anja; Kontokostas, Dimitris; Mendes, Pablo N.; Hellmann, Sebastian; Morsey, Mohamed; Kleef, Patrick van; Auer, Sören; Bizer, Christian

DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia Journal Article

In: Semantic Web, 6 (2), pp. 167–195, 2015, (SWJ Outstanding Paper Award 2014).

Martin, Michael; Stadler, Claus; Frischmuth, Philipp; Lehmann, Jens

Increasing the financial transparency of European Commission project funding Journal Article

In: Semantic Web, 5 (2), pp. 157–164, 2014.

Zaveri, Amrapali; Lehmann, Jens; Auer, Sören; Hassan, Mofeed M.; Sherif, Mohamed Ahmed; Martin, Michael

Publishing and interlinking the Global Health Observatory dataset - Towards increasing transparency in Global Health Journal Article

In: Semantic Web, 4 (3), pp. 315–322, 2013.

Morsey, Mohamed; Lehmann, Jens; Auer, Sören; Stadler, Claus; Hellmann, Sebastian

DBpedia and the live extraction of structured data from Wikipedia Journal Article

In: Program, 46 (2), pp. 157–181, 2012.

Bizer, Christian; Lehmann, Jens; Kobilarov, Georgi; Auer, Sören; Becker, Christian; Cyganiak, Richard; Hellmann, Sebastian

DBpedia - A crystallization point for the Web of Data Journal Article

In: J. Web Semant., 7 (3), pp. 154–165, 2009, (Journal of Web Semantics 2006-2010 Award).

Inproceedings

Wilhelm, Nico; Collarana, Diego; Lehmann, Jens

A Virtual Knowledge Graph for Enabling Defect Traceability and Customer Service Analytics Inproceedings

In: The Semantic Web: ESWC 2021 Satellite Events - Virtual Event, June 6-10, 2021, Revised Selected Papers, pp. 245–248, Springer, 2021.

Esteves, Diego; Marcelino, José; Chawla, Piyush; Fischer, Asja; Lehmann, Jens

HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data Inproceedings

In: Advances in Intelligent Data Analysis XIX - 19th International Symposium on Intelligent Data Analysis, IDA 2021, Porto, Portugal, April 26-28, 2021, Proceedings, pp. 89–100, Springer, 2021.

Chau, Minh Triet; Esteves, Diego; Lehmann, Jens

A Neural-based model to Predict the Future Natural Gas Market Price through Open-domain Event Extraction Inproceedings

In: Proceedings of the 1st International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 17th Extended Semantic Web Conference (ESWC 2020), Heraklion, Crete, Greece, June 3, 2020 (online event due to COVID-19 outbreak), pp. 17–31, 2020.

Mulang, Isaiah Onando; Singh, Kuldeep; Prabhu, Chaitali; Nadgeri, Abhishek; Hoffart, Johannes; Lehmann, Jens

Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models Inproceedings

In: CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, pp. 2157–2160, ACM, 2020.

Frey, Johannes; Hofer, Marvin; Obraczka, Daniel; Lehmann, Jens; Hellmann, Sebastian

DBpedia FlexiFusion the Best of Wikipedia > Wikidata > Your Data Inproceedings

In: The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II, pp. 96–112, Springer, 2019.

Sakor, Ahmad; Mulang, Isaiah Onando; Singh, Kuldeep; Shekarpour, Saeedeh; Vidal, Maria-Esther; Lehmann, Jens; Auer, Sören

Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text Inproceedings

In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 2336–2346, Association for Computational Linguistics, 2019.

Meester, Ben De; Dimou, Anastasia; Maroy, Wouter; Kontokostas, Dimitris; Verborgh, Ruben; Lehmann, Jens; Mannens, Erik; Hellmann, Sebastian

A Vocabulary-Independent Generation Framework for DBpedia and Beyond Inproceedings

In: Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 23rd to 25th, 2017, 2017.

Maroy, Wouter; Dimou, Anastasia; Kontokostas, Dimitris; Meester, Ben De; Verborgh, Ruben; Lehmann, Jens; Mannens, Erik; Hellmann, Sebastian

Sustainable Linked Data Generation: The Case of DBpedia Inproceedings

In: The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II, pp. 297–313, Springer, 2017.

Both, Andreas; Wauer, Matthias; García-Rojas, Alejandra; Hladky, Daniel; Lehmann, Jens

The GeoKnow Generator Workbench - An Integrated Tool Supporting the Linked Data Lifecycle for Enterprise Usage Inproceedings

In: Joint Proceedings of the Posters and Demos Track of 11th International Conference on Semantic Systems - SEMANTiCS 2015 and 1st Workshop on Data Science: Methods, Technology and Applications (DSci15) 11th International Conference on Semantic Systems - SEMANTiCS 2015, Vienna, Austria, September 15-17, 2015, pp. 92–95, 2015.

Fossati, Marco; Kontokostas, Dimitris; Lehmann, Jens

Unsupervised learning of an extensive and usable taxonomy for DBpedia Inproceedings

In: Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS 2015, Vienna, Austria, September 15-17, 2015, pp. 177–184, ACM, 2015.

Vaidya, Gaurav; Kontokostas, Dimitris; Knuth, Magnus; Lehmann, Jens; Hellmann, Sebastian

DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons Inproceedings

In: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part II, pp. 281–289, Springer, 2015.

Stadler, Claus; Unbehauen, Jörg; Westphal, Patrick; Sherif, Mohamed Ahmed; Lehmann, Jens

Simplified RDB2RDF Mapping Inproceedings

In: Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, co-located with the 24th International World Wide Web Conference (WWW 2015), Florence, Italy, May 19th, 2015, 2015.

Knuth, Magnus; Lehmann, Jens; Kontokostas, Dimitris; Steiner, Thomas; Sack, Harald

The DBpedia Events Dataset Inproceedings

In: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, 2015.

Zaveri, Amrapali; Kontokostas, Dimitris; Sherif, Mohamed Ahmed; Bühmann, Lorenz; Morsey, Mohamed; Auer, Sören; Lehmann, Jens

User-driven quality evaluation of DBpedia Inproceedings

In: I-SEMANTICS 2013 - 9th International Conference on Semantic Systems, ISEM '13, Graz, Austria, September 4-6, 2013, pp. 97–104, ACM, 2013.

Stadler, Claus; Martin, Michael; Lehmann, Jens; Hellmann, Sebastian

Update Strategies for DBpedia Live Inproceedings

In: Proceedings of the Sixth Workshop on Scripting and Development for the Semantic Web, Crete, Greece, May 31, 2010, 2010.

Hellmann, Sebastian; Stadler, Claus; Lehmann, Jens; Auer, Sören

DBpedia Live Extraction Inproceedings

In: On the Move to Meaningful Internet Systems: OTM 2009, Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009, Vilamoura, Portugal, November 1-6, 2009, Proceedings, Part II, pp. 1209–1223, Springer, 2009.

Auer, Sören; Lehmann, Jens; Hellmann, Sebastian

LinkedGeoData: Adding a Spatial Dimension to the Web of Data Inproceedings

In: The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings, pp. 731–746, Springer, 2009.

Lehmann, Jens; Schüppel, Jörg; Auer, Sören

Discovering Unknown Connections - the DBpedia Relationship Finder Inproceedings

In: The Social Semantic Web 2007, Proceedings of the 1st Conference on Social Semantic Web (CSSW), September 26-28, 2007, Leipzig, Germany, pp. 99–109, GI, 2007.

Auer, Sören; Bizer, Christian; Kobilarov, Georgi; Lehmann, Jens; Cyganiak, Richard; Ives, Zachary G.

DBpedia: A Nucleus for a Web of Open Data Inproceedings

In: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, pp. 722–735, Springer, 2007, (ISWC 10 Year Best Paper Award).

Auer, Sören; Lehmann, Jens

What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content Inproceedings

In: The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, June 3-7, 2007, Proceedings, pp. 503–517, Springer, 2007, (ESWC 7-Year Most Influential Paper Award).
