Over the past decade, vast amounts of machine-readable structured information have become available through the growing popularity of semantic knowledge graphs in a variety of application domains. However, a major and still unsolved challenge for research today is to perform scalable analytics, i.e. machine learning, inference and querying, over this knowledge while taking its rich semantic structure into account. To our knowledge, current analytics methods either do not fully exploit the semantics and structure of knowledge graphs or do not scale sufficiently.
The aim of this research line, and of our SANSA project in particular, is to investigate whether this severe limitation can be overcome by jointly leveraging results from distributed analytics and semantic technologies. To achieve this, SANSA will advance the state of the art by developing foundational models and algorithms for (1) data distribution techniques for semantic knowledge graphs, (2) semantics-aware distributed computation of resource embeddings in knowledge graphs, (3) adaptive distributed querying, (4) efficient self-optimising inference execution plans and (5) distributed symbolic machine learning. These advancements will be implemented as a semantic analytics stack built on distributed in-memory computing models, with further layers for (1) knowledge distribution and representation, (2) querying and inference and (3) machine learning. By design, each layer will be both semantics-aware and horizontally scalable.
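To make item (1), data distribution for knowledge graphs, more concrete, the following is a minimal sketch of one generic technique, subject-based hash partitioning. It assumes triples are plain (subject, predicate, object) string tuples. The function names `partition_by_subject` and `star_lookup` are hypothetical, and this is an illustration only, not the technique SANSA actually adopts. Placing every triple about a resource in the same partition lets a query that touches only that resource's properties be answered without moving data between workers.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: an RDF triple as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def _partition_id(subject: str, num_partitions: int) -> int:
    # crc32 is deterministic across processes, unlike Python's salted hash() for str,
    # so every worker computes the same partition for the same subject.
    return zlib.crc32(subject.encode("utf-8")) % num_partitions


def partition_by_subject(triples: List[Triple], num_partitions: int) -> Dict[int, List[Triple]]:
    """Assign each triple to a partition by hashing its subject (illustrative sketch)."""
    partitions: Dict[int, List[Triple]] = defaultdict(list)
    for s, p, o in triples:
        partitions[_partition_id(s, num_partitions)].append((s, p, o))
    return dict(partitions)


def star_lookup(partitions: Dict[int, List[Triple]], subject: str, num_partitions: int) -> List[Triple]:
    """Fetch all triples about one subject by scanning only its own partition."""
    pid = _partition_id(subject, num_partitions)
    return [t for t in partitions.get(pid, []) if t[0] == subject]


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]
parts = partition_by_subject(triples, 2)
alice = star_lookup(parts, "ex:alice", 2)
```

The design trade-off is that subject-centred ("star") patterns stay local, while queries joining on objects or following longer paths may still require data exchange between partitions.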
The synthesis of these advancements can enable powerful analytics with impact in several application areas, including the life sciences (e.g. improved therapy response prediction), media and publishing (e.g. entity resolution and semantic querying) and the Internet of Things (e.g. smart meter optimisation and traffic pattern detection).
Scalable Semantic Analytics Stack (SANSA)