Since around 2011, I have been working on question answering over knowledge graphs (KGQA). In a narrow sense, KGQA is the task of translating a natural language question into a formal query (often SPARQL). Question answering has many interesting applications, e.g., in call centres, production systems, the automotive sector, home entertainment and smart assistants.
If you are interested, you can have a look at our introduction to neural question answering over knowledge graphs to find out more about the field. I pursue the development of two classes of methods in this field:
1.) The first class of methods is based on “traditional” natural language processing (NLP) pipelines, which translate the natural language query via a sequence of steps into an intermediate form.
Figure: Typical KGQA processing pipeline starting from the input question (left) to the detection of entities and relation (= shallow parsing), linking to the knowledge graph and finally query building. Often, the answer is then verbalised and returned to the user. (The typo “Barak” is intentional.)
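The pipeline steps above can be sketched in a few lines. This is a deliberately toy example, not our actual system: the tiny knowledge graph, the substring-based parsing and the query template are all hypothetical stand-ins for the real components.

```python
# Minimal sketch of a KGQA pipeline: shallow parsing -> linking -> query
# building. The toy knowledge graph and matching rules are hypothetical.

TOY_KG = {
    "entities": {"barack obama": "dbr:Barack_Obama"},
    "relations": {"born": "dbo:birthPlace"},
}

def shallow_parse(question):
    """Detect entity and relation surface forms (naive substring matching)."""
    q = question.lower()
    entity = next((e for e in TOY_KG["entities"] if e in q), None)
    relation = next((r for r in TOY_KG["relations"] if r in q), None)
    return entity, relation

def link(entity, relation):
    """Map surface forms to knowledge-graph identifiers."""
    return TOY_KG["entities"][entity], TOY_KG["relations"][relation]

def build_query(entity_uri, relation_uri):
    """Assemble a SPARQL query from the linked components."""
    return f"SELECT ?x WHERE {{ {entity_uri} {relation_uri} ?x }}"

entity, relation = shallow_parse("Where was Barack Obama born?")
query = build_query(*link(entity, relation))
print(query)  # SELECT ?x WHERE { dbr:Barack_Obama dbo:birthPlace ?x }
```

Each stage can fail independently, which is exactly the error-propagation problem that motivates the end-to-end methods discussed below.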
This intermediate form is then converted into a formal query. In initial work (published at WWW 2012), we translated natural language input into a first-order logic representation that was subsequently aligned with the actual knowledge graph. This can be combined with active supervised machine learning for capturing user feedback, as we have shown previously (ESWC 2011). Later, we used a normalised query structure to be more robust against different paraphrases of the same underlying query (ESWC 2016). We also improved the state of the art in individual steps of typical QA pipelines. For example, in the EARL approach (ISWC 2018), we perform joint entity and relation linking of the question by casting the task into a generalised travelling salesman problem (GTSP) on the so-called subdivision graph. This way, we can use existing GTSP solvers, which can then derive an optimal (w.r.t. the problem formulation) solution when there are multiple candidates for entities and relations in the question. In later work, we developed a deep reinforcement learning approach for shallow parsing that can work with a delayed reward signal from the linker. Since this approach can be fine-tuned towards questions (rather than generic text), coarse-grained annotations (compared to POS taggers) and the specific knowledge graph, it achieved state-of-the-art performance on the LCQuAD and QALD7 benchmarks. We are also working on pipeline composition (WWW 2018), which allows researchers to focus on specific components, as the pipeline orchestration is performed dynamically.
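The GTSP formulation can be illustrated with a toy example: each mention in the question forms a cluster of candidates, exactly one candidate must be chosen per cluster, and the goal is to minimise the total distance of the resulting tour. The clusters and distances below are hypothetical; EARL derives the actual distances from the knowledge graph's subdivision graph and uses dedicated GTSP solvers rather than brute force.

```python
from itertools import product

# Toy GTSP view of joint entity/relation linking: one candidate per
# mention (cluster), minimising total pairwise distance. All candidates
# and distances here are hypothetical.

clusters = {
    "obama": ["dbr:Barack_Obama", "dbr:Obama_(town)"],
    "born":  ["dbo:birthPlace", "dbo:birthDate"],
}

# Hypothetical graph distances between candidate nodes (symmetric).
dist = {
    ("dbr:Barack_Obama", "dbo:birthPlace"): 1,
    ("dbr:Barack_Obama", "dbo:birthDate"): 2,
    ("dbr:Obama_(town)", "dbo:birthPlace"): 5,
    ("dbr:Obama_(town)", "dbo:birthDate"): 6,
}

def tour_cost(selection):
    """Cost of visiting the selected candidates in order."""
    total = 0
    for a, b in zip(selection, selection[1:]):
        total += dist.get((a, b), dist.get((b, a), 10))
    return total

# Brute force over one choice per cluster (real GTSP solvers scale far better).
best = min(product(*clusters.values()), key=tour_cost)
print(best)  # ('dbr:Barack_Obama', 'dbo:birthPlace')
```

The appeal of the formulation is that entity and relation candidates constrain each other: a well-connected pair wins jointly, instead of each linker deciding in isolation.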
2.) The second class of methods comprises “end-to-end” systems that solve the KGQA task in a single step. A core idea underlying this line of research is to avoid the error propagation of pipeline approaches and thereby achieve higher overall accuracy and robustness of question answering. Despite not being able to profit from several decades of research in natural language processing, this class of methods has recently led to state-of-the-art results when sufficient training data is available. A relatively early example is an approach in which we explored neural networks with word- and character-level embeddings (WebConf 2017), which was at the time the most accurate end-to-end question answering system for the Facebook AI simple questions benchmark (even though it is quite simple from today’s point of view). In a nutshell, both the natural language question and the nodes (entities) and edges (called predicates here) of the knowledge graph are encoded, and a similarity function is applied:
Figure: Neural end-to-end KGQA system architecture for answering simple questions with a single entity and predicate.
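The core scoring idea can be sketched as follows. The vectors below are hypothetical toy values; in the actual system, the embeddings are learned by word- and character-level neural encoders rather than hand-specified.

```python
import math

# Sketch of the end-to-end idea: encode the question and each candidate
# (entity, predicate) pair as vectors, then rank candidates by a
# similarity function. All vectors here are toy assumptions.

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

question_vec = [0.9, 0.1, 0.3]  # encoded question (hypothetical)

candidates = {  # encoded (entity, predicate) pairs (hypothetical)
    ("dbr:Barack_Obama", "dbo:birthPlace"): [0.8, 0.2, 0.4],
    ("dbr:Barack_Obama", "dbo:spouse"):     [0.1, 0.9, 0.2],
}

best = max(candidates, key=lambda c: cosine(question_vec, candidates[c]))
print(best)  # ('dbr:Barack_Obama', 'dbo:birthPlace')
```

Because the whole mapping from question to answer is learned jointly, there are no intermediate decisions at which errors can accumulate.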
In 2018, we developed a sequence-to-sequence translation approach using neural networks that is able to generate more complex queries and was at the time the state-of-the-art algorithm on the WikiSQL benchmark for learning SQL queries over relational tables. More recently, we used graph attention networks and Transformer architectures to enable conversational QA scenarios via a multi-task semantic parsing approach. This is currently (as of 2021) the state-of-the-art approach for complex sequential question answering and has substantially improved accuracy compared to the baselines on the CSQA (Complex Sequential Question Answering) dataset published by Saha et al.
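One way to make the query-generation idea concrete is grammar-constrained decoding: at each step the decoder may only emit tokens that keep the query well-formed. The toy grammar and token scores below are hypothetical stand-ins; the actual approach uses a trained neural sequence-to-sequence model.

```python
# Toy sketch of grammar-constrained greedy decoding for SQL generation.
# The grammar, state machine, and "model scores" are all hypothetical.

GRAMMAR = {                 # decoder state -> tokens the grammar allows next
    "START":  ["SELECT"],
    "SELECT": ["name", "population"],
    "COLUMN": ["FROM"],
    "FROM":   ["cities"],
}
NEXT_STATE = {"SELECT": "SELECT", "name": "COLUMN", "population": "COLUMN",
              "FROM": "FROM", "cities": "END"}

def decode(model_scores):
    """Greedily pick the highest-scoring token among grammar-legal ones."""
    state, out = "START", []
    while state != "END":
        allowed = GRAMMAR[state]
        token = max(allowed, key=lambda t: model_scores.get(t, 0.0))
        out.append(token)
        state = NEXT_STATE[token]
    return " ".join(out)

# Toy scores a trained model might assign to candidate tokens.
sql = decode({"SELECT": 1.0, "population": 0.9, "name": 0.4,
              "FROM": 1.0, "cities": 1.0})
print(sql)  # SELECT population FROM cities
```

Restricting the decoder to grammar-legal tokens guarantees syntactically valid output even when the model's raw scores would prefer an ill-formed sequence.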
Figure: Multi-task Semantic Parsing with Transformer and Graph Attention Networks architecture. It consists of three modules: 1) A semantic parsing-based transformer model, containing a contextual encoder and a grammar-guided decoder. 2) An entity recognition module, which identifies all the entities in the context, together with their types, linking them to the knowledge graph. It filters them based on the context and permutes them when more than one entity is required. Finally, 3) a graph attention-based module that uses a GAT network initialised with BERT embeddings to incorporate and exploit correlations between (entity) types and predicates. The resulting node embeddings, together with the context hidden state (hctx) and decoder hidden state (dh), are used to score the nodes and predict the corresponding type and predicate.
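The final scoring step can be sketched roughly as follows. The node embeddings, hidden states and scoring function below are hypothetical toy values standing in for the learned GAT outputs and model states.

```python
# Toy sketch of scoring graph nodes against the combined context and
# decoder hidden states. All values here are hypothetical; the real
# node embeddings come from a GAT initialised with BERT embeddings.

def score(node_emb, h_ctx, d_h):
    """Dot-product score of a node against the summed hidden states."""
    combined = [c + d for c, d in zip(h_ctx, d_h)]
    return sum(n * x for n, x in zip(node_emb, combined))

h_ctx = [0.5, 0.1]  # context hidden state (toy)
d_h   = [0.2, 0.4]  # decoder hidden state (toy)

nodes = {  # toy node embeddings for candidate predicates
    "dbo:birthPlace": [0.9, 0.1],
    "dbo:spouse":     [0.2, 0.3],
}

predicted = max(nodes, key=lambda n: score(nodes[n], h_ctx, d_h))
print(predicted)  # dbo:birthPlace
```

The highest-scoring node determines which type or predicate the decoder fills into the query at that step.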