1 Introduction
Knowledge graphs (KGs) have been receiving increasing attention in both the research community and industry. They have been shown to effectively improve the performance of many downstream applications, such as question answering (Berant et al. 2013; Fader et al. 2014), recommender systems (Zhang et al. 2016; Zhao et al. 2017), and information retrieval (Ensan and Bagheri 2017; Liu and Fang 2015; Reinanda et al. 2020). Despite their usefulness, KGs are notoriously incomplete (Wang et al. 2017) and hence require continuous curation and enrichment.
One of the most effective KG enrichment techniques is KG alignment (Paulheim 2017), which aims to merge two or more KGs to form a single, more complete KG. The first step of KG alignment is finding entities that represent the same real-world entity in different KGs. Then, the relationships (i.e., the attributes and neighbors) of the aligned entities from the different KGs are merged to form a more comprehensive KG. The former step is the main challenge of a KG alignment technique, while the latter is straightforward.
Traditional approaches to KG alignment mainly use string matching on entities' attributes to compute entity similarity (Volz et al. 2009; Pershina et al. 2015). These approaches require manually defined constraints, i.e., they need to know beforehand which attributes are to be compared. However, manually defined constraints are typically sub-optimal since different entities may have different attributes, e.g., a person may have a gender attribute, but an organisation does not.
The state-of-the-art KG alignment approaches are based on entity embeddings (Trisedya et al. 2019; Wu et al. 2020; Liu et al. 2020). To compute entity similarity, such an approach first computes the embeddings (vector representations) of all entities in the KGs. Then, a vector similarity measure (e.g., cosine similarity) can be used as the entity similarity score. Despite their success, the practical use of embedding-based KG alignment techniques for KG enrichment remains low. One of the key reasons is that these techniques do not provide any explanation of the alignment results, which is essential to help experts decide whether an alignment is correct. The expert check is important for maintaining the quality of the resulting KG. Without any explanation of the prediction, an expert needs to look up all the attributes and neighbors of the aligned entities predicted by the model to verify the prediction. This can be error-prone since different KGs often have different naming schemes for attributes and relations (e.g., educated_at vs. alumni).
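The embedding-based alignment step described above can be sketched as a nearest-neighbor search under cosine similarity. The function names below are illustrative, and the embeddings are assumed to have already been produced by some embedding model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two entity embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def align(entity_emb: np.ndarray, candidate_embs: np.ndarray) -> int:
    """Return the index of the candidate entity (from the other KG)
    whose embedding is most similar to the query entity's embedding."""
    scores = candidate_embs @ entity_emb / (
        np.linalg.norm(candidate_embs, axis=1) * np.linalg.norm(entity_emb))
    return int(np.argmax(scores))
```

Note that the result of `align` is exactly what an expert would need to verify by hand: a single predicted index, with no indication of which attributes or neighbors drove the score.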
This paper aims to fill this gap by proposing an interpretable embedding-based KG alignment model capable of delivering state-of-the-art performance with explainable alignment results. There are three main challenges in building such a model. The first is that achieving interpretability in the two common approaches to embedding-based KG alignment, translation-based and Graph Neural Network (GNN)-based models, is non-trivial. Translation-based alignment models compute entity embeddings using a translation-based embedding model, such as TransE (Bordes et al. 2013), which treats the triples in a KG independently, making it difficult to compute the importance of attributes and neighbors. Meanwhile, GNN-based models (Wu et al. 2020; Liu et al. 2020) overlook entities' attributes by typically only using the entity label to initialize node embeddings in the GNN while ignoring the other attributes (e.g., birth date, address, etc.). Moreover, GNNs typically employ a message-passing paradigm in which the aggregation function is constructed to be invariant to neighborhood permutations (Dwivedi and Bresson 2021). These limitations make the predicted alignments difficult to explain, i.e., it is difficult to compute the importance of the attributes and neighbors of the aligned entities.
The second challenge is that applying a post-hoc (model-agnostic) explainer is sub-optimal. One of the state-of-the-art post-hoc explainers for GNN models is GNNExplainer (Ying et al. 2019). However, it can only extract the most influential neighbors, not the most influential attributes, due to the first limitation of GNN-based models mentioned above. Moreover, model-agnostic explainers cannot have perfect fidelity with respect to the model (Rudin 2019). The third challenge is scalability. The state-of-the-art alignment models are built on top of GNN models that need to maintain the whole KG graph, which requires large amounts of memory for the message-passing procedure. This is problematic when handling large KGs.
This paper proposes an interpretable KG alignment model named i-Align to handle the above challenges. The main goal of i-Align is to accurately predict entity alignment between KGs and seamlessly provide an explanation for each prediction. The provided explanation takes the form of the similarity between the top-n features (i.e., attributes and neighbors) of the aligned entities used to compute the entity embeddings. Intuitively, a KG alignment model should capture the aligned features of aligned entities, which are reflected in the computed embeddings. In other words, an entity embedding is computed by highlighting the features that are aligned with the features of the counterpart entity. Hence, the top-n highlighted attributes and neighbors of entities can help indicate the correctness of the predictions.
The proposed model is built on top of a Transformer model (Vaswani et al. 2017) to exploit its self-attention mechanism for ranking the importance of attributes and neighbors (Wiegreffe and Pinter 2019) as the model's explanation. It has two Transformer encoders: one is used as an attribute aggregator, and the other as a neighbor aggregator. The attribute aggregator computes a hidden state of an entity based on its attributes using the standard Transformer (Vaswani et al. 2017), while the neighbor aggregator computes a hidden state based on the entity's structure/neighbors. Both hidden states are combined to form the entity embedding.
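The two-aggregator design can be sketched as follows. This is a deliberately simplified illustration: the real aggregators are Transformer encoders, whereas the stand-in below uses mean pooling, and the concatenation-based combination is an assumption for illustration:

```python
import numpy as np

def aggregate(hidden_states: np.ndarray) -> np.ndarray:
    """Stand-in for a Transformer encoder: pool a set of per-feature
    hidden states (one row per attribute or neighbor) into one vector.
    i-Align's actual aggregators use self-attention, not mean pooling."""
    return hidden_states.mean(axis=0)

def entity_embedding(attr_states: np.ndarray,
                     nbr_states: np.ndarray) -> np.ndarray:
    """Combine the attribute-based and neighbor-based hidden states
    into a single entity embedding (here: simple concatenation)."""
    h_attr = aggregate(attr_states)  # from the attribute aggregator
    h_nbr = aggregate(nbr_states)    # from the neighbor aggregator
    return np.concatenate([h_attr, h_nbr])
```

The key point of the design survives even in this sketch: because attributes and neighbors are encoded by separate components, each contributes an identifiable part of the final embedding, which is what makes feature-level explanations possible.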
This, however, poses an additional challenge. Computing the hidden states (latent information) based on structure/neighbors using a Transformer-based model is difficult, especially in large KGs, and leads to the following sub-problems. First, the self-attention mechanism of a Transformer is computationally expensive and may not be feasible to apply to a large graph. Thus, the model needs to decompose the large graph into sub-graphs (mini-batches). Second, it requires a message-passing-like mechanism to aggregate the structural information of both the sub-graphs and the whole graph accordingly. Existing work, such as GraphTransformer (Dwivedi and Bresson 2021), has attempted to simulate the message-passing mechanism of a GNN in a Transformer-based model, but it cannot handle large graphs.
To address the above challenges, a novel Transformer-based Graph Encoder, Trans-GE, is proposed for the neighbor aggregator component of i-Align. Trans-GE uses Edge-gated Attention, which combines the adjacency matrix and the self-attention matrix to learn a gating mechanism that controls the information aggregation from neighboring entities. It also uses Historical Embeddings (Chen et al. 2018; Fey et al. 2021), which allow Trans-GE to approximate the full computational graph in a mini-batch, addressing the scalability issue when encoding a large KG. The attention mechanisms of the attribute and neighbor aggregators are used to compute attention weights that highlight the important attributes and neighbors, respectively. The top-n highlighted attributes and neighbors of the aligned entities are then listed as an explanation of whether the alignment is correct.
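The idea of combining the adjacency matrix with the self-attention matrix, and of reading off top-n neighbors from the attention weights, can be illustrated as follows. The exact gating formulation in Trans-GE may differ; here the adjacency matrix simply scales and masks the attention logits, which is one straightforward instance of edge gating:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def edge_gated_attention(H, A, Wq, Wk, Wv):
    """One edge-gated attention step over node features H (n x d):
    the adjacency matrix A gates the self-attention logits so that
    aggregation follows the graph structure. Illustrative sketch only."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # standard self-attention logits
    gated = scores * A                        # gate logits by adjacency
    masked = np.where(A > 0, gated, -1e9)     # non-neighbors get ~zero weight
    attn = softmax(masked, axis=-1)
    return attn @ V, attn

def top_n_neighbors(attn_row: np.ndarray, n: int) -> list:
    """Rank neighbors by attention weight; the top-n are listed
    as the explanation for an entity's embedding."""
    return np.argsort(attn_row)[::-1][:n].tolist()
```

A symmetric routine over the attribute aggregator's attention weights would yield the top-n attributes, and together the two lists form the per-prediction explanation described above.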
In summary, the contributions of the paper are as follows:
1. An interpretable KG alignment model is proposed, where an explanation of the alignment prediction can be automatically derived. The alignment prediction can help enrich a KG, and the explanation can help experts check the correctness of the prediction to maintain the high quality of the enrichment results.
2. Along with the proposed model, a novel Transformer-based graph encoder is proposed. It uses Edge-gated Attention to learn a weight that controls information aggregation from the surrounding neighbors of an entity. It also uses Historical Embeddings to train the model over small mini-batches.
3. Extensive experiments and analyses are conducted to show the model's effectiveness in predicting alignments and providing explanations.
5 Conclusion and future work
This paper proposed i-Align, an interpretable KG alignment model. The main advantage of i-Align over existing KG alignment models is that it provides an explanation for each alignment prediction it makes. This explanation can support experts in the curation process when merging KGs, and thus helps maintain the high quality of the enriched KG. The proposed model has two components: an attribute aggregator and a neighbor aggregator. The attribute aggregator uses the standard Transformer, while a novel Transformer-based Graph Encoder (Trans-GE) is proposed for the neighbor aggregator. Trans-GE uses Edge-gated Attention, which combines the adjacency matrix and the self-attention matrix to learn a score as a gate that controls the information aggregation from neighboring entities. It also uses historical embeddings, allowing Trans-GE to be trained over mini-batches/small sub-graphs to address the scalability issue when encoding a large KG. The attention mechanisms of the attribute and neighbor aggregators are used to compute attention weights that highlight the important attributes and neighbors, respectively. Experimental results show the model's effectiveness in aligning KGs, the quality of the generated explanations, and the practicality of aligning large KGs.
The proposed i-Align uses attention weights as the primary indicator of the importance of attributes/neighbors. This is a simple yet effective technique. One limitation of i-Align is its reliance on attribute literal similarity, so it may not perform well on cross-lingual KG alignment, where the literal attributes are written in different scripts. Other interesting directions for future work include integrating more advanced techniques, such as attention rollout (Chefer et al. 2021), to improve explanation quality, and drawing on the emerging research on explainability in GNNs to further improve i-Align.