Knowledge Extraction
Most information on the web is available only in unstructured or semi-structured form. In many cases, it is desirable to convert this information into a structured form (often a so-called knowledge graph), as this allows researchers and practitioners to query and re-use it easily. I have worked on several extraction projects; the best known are DBpedia and LinkedGeoData, which I briefly describe below.
DBpedia Extraction: DBpedia is a prominent extraction effort in which information is extracted from more than 100 Wikipedia language editions, yielding several billion facts. The resulting knowledge graph is linked to more than 30 other datasets. DBpedia is used by a number of organisations, such as the BBC, IBM and the New York Times. The core papers received awards from the Semantic Web Journal, the Journal of Web Semantics, ISWC, ESWC and the Literati Network for Excellence. I am a co-founder of the project (with Prof. Auer and Prof. Bizer), a core contributor since 2007 and an active DBpedia board member.
Figure: DBpedia extraction manager
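To make the idea of extracting facts from semi-structured Wikipedia content concrete, here is a minimal sketch in Python. It is not the DBpedia extraction framework: the infobox snippet, its values, the example.org namespaces and the function extract_triples are all assumptions made for illustration, and real Wikipedia templates are considerably messier.

```python
import re

# Hypothetical, simplified infobox markup with made-up values.
WIKITEXT = """{{Infobox settlement
| name = Leipzig
| country = Germany
| population_total = 600000
}}"""

# Assumed namespaces, for illustration only.
RESOURCE = "http://example.org/resource/"
PROPERTY = "http://example.org/property/"


def extract_triples(title, wikitext):
    """Turn '| key = value' infobox lines into (subject, predicate, object) triples."""
    subject = RESOURCE + title.replace(" ", "_")
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if not match:
            continue  # not a key/value line (e.g. template opening or closing)
        key, value = match.group(1), match.group(2)
        if not value:
            continue  # skip empty infobox fields
        # Integers become typed values; everything else stays a plain string.
        obj = int(value) if re.fullmatch(r"-?\d+", value) else value
        triples.append((subject, PROPERTY + key, obj))
    return triples


triples = extract_triples("Leipzig", WIKITEXT)
for triple in triples:
    print(triple)
```

Each key/value pair of the infobox becomes one fact about the page's subject, which is the basic shape of a knowledge graph that can later be queried.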
LinkedGeoData / Query Rewriting: Another data extraction research effort that I pursue with my colleagues is LinkedGeoData, in which a spatial knowledge base is derived from the OpenStreetMap community project. We designed a virtual mapping approach that rewrites an incoming SPARQL query into a single SQL query, potentially containing virtual spatial predicates. At the time, this was novel and allowed us to scale to a dataset with more than 30 billion facts, more than 1000 updates per minute and a semi-automatically generated ontology.
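The rewriting idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the LinkedGeoData implementation: the ex: predicates, the table layout, the MAPPING dictionary and the rewrite function are hypothetical, triple patterns are passed in as tuples instead of being parsed from SPARQL, and virtual spatial predicates are left out entirely.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "ex:label": ("nodes", "id", "name"),
    "ex:amenity": ("node_tags", "node_id", "amenity"),
}


def rewrite(select_vars, patterns):
    """Rewrite triple patterns (s, p, o) into one SQL SELECT statement.

    Terms starting with '?' are variables; all other terms are constants.
    """
    from_parts, where, params = [], [], []
    bound = {}  # variable -> first SQL column it was bound to
    for i, (s, p, o) in enumerate(patterns):
        table, s_col, o_col = MAPPING[p]
        alias = f"t{i}"
        from_parts.append(f"{table} AS {alias}")
        for term, col in ((s, s_col), (o, o_col)):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in bound:
                    # A repeated variable becomes a join condition.
                    where.append(f"{bound[term]} = {ref}")
                else:
                    bound[term] = ref
            else:
                # Constants become parameterised filters.
                where.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(f"{bound[v]} AS {v[1:]}" for v in select_vars)
    sql = f"SELECT {columns} FROM {', '.join(from_parts)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Roughly: SELECT ?name WHERE { ?n ex:amenity "cafe" . ?n ex:label ?name }
sql, params = rewrite(
    ["?name"],
    [("?n", "ex:amenity", "cafe"), ("?n", "ex:label", "?name")],
)

# Run the generated query against a small in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE node_tags (node_id INTEGER, amenity TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(1, "Cafe A"), (2, "Shop B")])
conn.executemany("INSERT INTO node_tags VALUES (?, ?)", [(1, "cafe"), (2, "shop")])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The point of the sketch is that the RDF view stays virtual: no triples are materialised, and the whole graph pattern is answered by a single SQL query that the relational database can optimise as a whole.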
Related Publications
Journal Articles
Towards holistic Entity Linking: Survey and directions Journal Article
In: Information Systems, 95 , pp. 101624, 2021.
Wikidata through the eyes of DBpedia Journal Article
In: Semantic Web, 9 (4), pp. 493–503, 2018.
DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia Journal Article
In: Semantic Web, 6 (2), pp. 167–195, 2015, (SWJ Outstanding Paper Award 2014).
Increasing the financial transparency of European Commission project funding Journal Article
In: Semantic Web, 5 (2), pp. 157–164, 2014.
Publishing and interlinking the Global Health Observatory dataset - Towards increasing transparency in Global Health Journal Article
In: Semantic Web, 4 (3), pp. 315–322, 2013.
DBpedia and the live extraction of structured data from Wikipedia Journal Article
In: Program, 46 (2), pp. 157–181, 2012.
DBpedia - A crystallization point for the Web of Data Journal Article
In: J. Web Semant., 7 (3), pp. 154–165, 2009, (Journal of Web Semantics 2006-2010 Award).
Inproceedings
A Virtual Knowledge Graph for Enabling Defect Traceability and Customer Service Analytics Inproceedings
In: The Semantic Web: ESWC 2021 Satellite Events - Virtual Event, June 6-10, 2021, Revised Selected Papers, pp. 245–248, Springer, 2021.
HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data Inproceedings
In: Advances in Intelligent Data Analysis XIX - 19th International Symposium on Intelligent Data Analysis, IDA 2021, Porto, Portugal, April 26-28, 2021, Proceedings, pp. 89–100, Springer, 2021.
A Neural-based model to Predict the Future Natural Gas Market Price through Open-domain Event Extraction Inproceedings
In: Proceedings of the 1st International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 17th Extended Semantic Web Conference (ESWC 2020), Heraklion, Crete, Greece, June 3, 2020 (online event due to COVID-19 outbreak), pp. 17–31, CEUR-WS.org, 2020.
Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models Inproceedings
In: CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, pp. 2157–2160, ACM, 2020.
DBpedia FlexiFusion the Best of Wikipedia > Wikidata > Your Data Inproceedings
In: The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II, pp. 96–112, Springer, 2019.
Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text Inproceedings
In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 2336–2346, Association for Computational Linguistics, 2019.
A Vocabulary-Independent Generation Framework for DBpedia and Beyond Inproceedings
In: Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 23rd to 25th, 2017, CEUR-WS.org, 2017.
Sustainable Linked Data Generation: The Case of DBpedia Inproceedings
In: The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II, pp. 297–313, Springer, 2017.
The GeoKnow Generator Workbench - An Integrated Tool Supporting the Linked Data Lifecycle for Enterprise Usage Inproceedings
In: Joint Proceedings of the Posters and Demos Track of 11th International Conference on Semantic Systems - SEMANTiCS 2015 and 1st Workshop on Data Science: Methods, Technology and Applications (DSci15) 11th International Conference on Semantic Systems - SEMANTiCS 2015, Vienna, Austria, September 15-17, 2015, pp. 92–95, CEUR-WS.org, 2015.
Unsupervised learning of an extensive and usable taxonomy for DBpedia Inproceedings
In: Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS 2015, Vienna, Austria, September 15-17, 2015, pp. 177–184, ACM, 2015.
DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons Inproceedings
In: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part II, pp. 281–289, Springer, 2015.
Simplified RDB2RDF Mapping Inproceedings
In: Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, co-located with the 24th International World Wide Web Conference (WWW 2015), Florence, Italy, May 19th, 2015, CEUR-WS.org, 2015.
The DBpedia Events Dataset Inproceedings
In: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, CEUR-WS.org, 2015.
User-driven quality evaluation of DBpedia Inproceedings
In: I-SEMANTICS 2013 - 9th International Conference on Semantic Systems, ISEM '13, Graz, Austria, September 4-6, 2013, pp. 97–104, ACM, 2013.
Update Strategies for DBpedia Live Inproceedings
In: Proceedings of the Sixth Workshop on Scripting and Development for the Semantic Web, Crete, Greece, May 31, 2010, CEUR-WS.org, 2010.
DBpedia Live Extraction Inproceedings
In: On the Move to Meaningful Internet Systems: OTM 2009, Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009, Vilamoura, Portugal, November 1-6, 2009, Proceedings, Part II, pp. 1209–1223, Springer, 2009.
LinkedGeoData: Adding a Spatial Dimension to the Web of Data Inproceedings
In: The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings, pp. 731–746, Springer, 2009.
Discovering Unknown Connections - the DBpedia Relationship Finder Inproceedings
In: The Social Semantic Web 2007, Proceedings of the 1st Conference on Social Semantic Web (CSSW), September 26-28, 2007, Leipzig, Germany, pp. 99–109, GI, 2007.
DBpedia: A Nucleus for a Web of Open Data Inproceedings
In: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, pp. 722–735, Springer, 2007, (ISWC 10 Year Best Paper Award).
What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content Inproceedings
In: The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, June 3-7, 2007, Proceedings, pp. 503–517, Springer, 2007, (ESWC 7-Year Most Influential Paper Award).