Knowledge Extraction

Most information on the web is available in unstructured or semi-structured form. In many cases, it is desirable to convert this information into a structured form (often a so-called knowledge graph), as this allows researchers and practitioners to query and reuse it easily. I have worked on several extraction projects, the best known of which are DBpedia and LinkedGeoData, which I briefly describe below.

DBpedia Extraction: DBpedia is a prominent extraction effort in which information is extracted from more than 100 Wikipedia language editions, amounting to several billion facts. The resulting knowledge graph is linked to more than 30 other datasets. DBpedia is used by a number of organisations, such as the BBC, IBM and The New York Times. The core papers received awards from the Semantic Web Journal, the Journal of Web Semantics, ISWC, ESWC and the Literati Network for Excellence. I am a co-founder of the project (with Prof. Auer and Prof. Bizer), a core contributor since 2007 and an active DBpedia board member.

Figure: DBpedia extraction manager

LinkedGeoData / Query Rewriting: Another data extraction research effort I pursue with my colleagues is LinkedGeoData, in which a spatial knowledge base is derived from the OpenStreetMap community project. We designed a virtual mapping approach in which an incoming SPARQL query is rewritten into a single SQL query, potentially containing virtual spatial predicates, so that the data can remain in the relational database instead of being materialised as RDF. At the time, this was novel and allowed us to scale to a dataset with more than 30 billion facts, more than 1000 updates per minute and a semi-automatically generated ontology.
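To give a rough idea of what such a rewriting involves, here is a minimal Python sketch. It is not the LinkedGeoData implementation: the table and column names, the `MAPPINGS` dictionary, the `rewrite` function and the simplified triple-pattern syntax are all hypothetical, and the use of the PostGIS functions `ST_DWithin` and `ST_GeomFromText` for the spatial predicate is an assumption made for illustration. The sketch only shows how several triple patterns that share variables can collapse into one SQL query with joins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping of one RDF predicate onto a relational table."""
    table: str
    subject_col: str
    object_col: str


# Assumed toy schema; the real LinkedGeoData schema and mappings differ.
MAPPINGS = {
    "rdfs:label": Mapping("node_labels", "node_id", "label"),
    "lgdo:amenity": Mapping("node_tags", "node_id", "amenity"),
    "geo:geometry": Mapping("node_geoms", "node_id", "geom"),
}


def sql_literal(value):
    """Quote a constant as an SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rewrite(patterns, within=None):
    """Rewrite a list of (subject, predicate, object) patterns into one SELECT.

    Terms starting with '?' are variables; everything else is a constant.
    `within` is an optional (variable, wkt, distance) triple standing in for
    a virtual spatial predicate; it becomes an ST_DWithin condition.
    """
    from_parts, where, bindings = [], [], {}
    for i, (s, p, o) in enumerate(patterns):
        m = MAPPINGS[p]
        alias = f"t{i}"
        from_parts.append(f"{m.table} AS {alias}")
        for term, col in ((s, m.subject_col), (o, m.object_col)):
            ref = f"{alias}.{col}"
            if not term.startswith("?"):
                # A constant restricts the column directly.
                where.append(f"{ref} = {sql_literal(term)}")
            elif term in bindings:
                # A repeated variable becomes a join condition.
                where.append(f"{bindings[term]} = {ref}")
            else:
                # The first occurrence of a variable binds it to this column.
                bindings[term] = ref
    if within is not None:
        var, wkt, dist = within
        # Distance is in the units of the SRID (degrees for 4326).
        where.append(
            f"ST_DWithin({bindings[var]}, "
            f"ST_GeomFromText({sql_literal(wkt)}, 4326), {float(dist)})"
        )
    select = ", ".join(f"{col} AS {var[1:]}" for var, col in bindings.items())
    sql = f"SELECT {select} FROM {', '.join(from_parts)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


if __name__ == "__main__":
    # Roughly: SELECT ?n ?name WHERE { ?n lgdo:amenity "cafe" ; rdfs:label ?name }
    print(rewrite([("?n", "lgdo:amenity", "cafe"),
                   ("?n", "rdfs:label", "?name")]))
```

The point of producing a single SQL statement is that the relational database can then plan the joins and use its spatial indexes itself, rather than the rewriter issuing one query per triple pattern and combining the results.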

