
2022 | Book

Complex Data Analytics with Formal Concept Analysis

Edited by: Prof. Dr. Rokia Missaoui, Prof. Dr. Léonard Kwuida, Prof. Dr. Talel Abdessalem

Publisher: Springer International Publishing


About this book

Formal Concept Analysis (FCA) is an important formalism associated with a variety of research areas such as lattice theory, knowledge representation, data mining, machine learning, and the Semantic Web. It is successfully exploited in a growing number of application domains such as software engineering, information retrieval, social network analysis, and bioinformatics. Its mathematical power comes from the concept lattice: each element of the lattice is a formal concept, and the whole structure forms a conceptual hierarchy that supports browsing, clustering, and association rule mining.
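To make the notion of a formal concept concrete, here is a minimal Python sketch on a small invented context (the objects, attributes, and function names are illustrative assumptions, not taken from the book). A formal concept is a pair (A, B) where B is exactly the set of attributes shared by all objects in A, and A is exactly the set of objects having all attributes in B. The sketch finds every concept by closing each attribute subset, which is exponential and only suitable for toy data.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to the attributes it has.
CONTEXT = {
    "duck": {"flies", "swims", "lays_eggs"},
    "eagle": {"flies", "lays_eggs"},
    "trout": {"swims", "lays_eggs"},
    "dog": {"swims"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def common_attributes(objects):
    """Derivation A -> A': attributes shared by every object in A."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return frozenset(result)


def common_objects(attributes):
    """Derivation B -> B': objects that have every attribute in B."""
    return frozenset(g for g, attrs in CONTEXT.items() if set(attributes) <= attrs)


def formal_concepts():
    """Return all (extent, intent) pairs by closing each attribute subset.

    Brute force over 2^|M| subsets: for illustration only.
    """
    concepts = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), k):
            extent = common_objects(subset)
            concepts.add((extent, common_attributes(extent)))
    return concepts


# Print concepts from the largest extent (top of the lattice) downwards.
for extent, intent in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering these pairs by inclusion of their extents yields the concept lattice; for this context there are six concepts, from (all objects, no attributes) down to ({duck}, all three attributes).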

Complex data analytics refers to advanced methods and tools for mining and analyzing data with complex structures such as XML/JSON data, text and image data, multidimensional data, graphs, sequences and streaming data. It also covers visualization mechanisms used to highlight the discovered knowledge.
This edited book examines a set of important and relevant research directions in complex data management, and gives an up-to-date account of the contributions of the FCA community to analyzing complex and large data such as knowledge graphs and interlinked contexts. For example, Formal Concept Analysis and some of its extensions are exploited, revisited and coupled with recent parallel and distributed processing paradigms to maximize the benefits in analyzing large data.

Table of contents

Chapter 1. Formal Concept Analysis and Extensions for Complex Data Analytics
Abstract
The goal of this paper is first to recall the key notions of Formal Concept Analysis and its main extensions, and then to give a brief overview of studies on complex data analytics. The latter refers to the analysis of complex data to discover patterns and learn models from data with a complex structure such as XML or JSON data, texts, images, graphs, trees, multidimensional and streaming data. Finally, the paper presents the contributions in this volume.
Léonard Kwuida, Rokia Missaoui
Chapter 2. Conceptual Navigation in Large Knowledge Graphs
Abstract
A growing part of Big Data is made of knowledge graphs. Major knowledge graphs such as Wikidata, DBpedia or the Google Knowledge Graph count millions of entities and billions of semantic links. A major challenge is to enable their exploration and querying by end-users. The SPARQL query language is powerful but provides no support for exploration by end-users. Question answering is user-friendly but is limited in expressivity and reliability. Navigation in concept lattices supports exploration but is limited in expressivity and scalability. In this paper, we introduce a new exploration and querying paradigm, Abstract Conceptual Navigation (ACN), that merges querying and navigation in order to reconcile expressivity, usability, and scalability. ACN is founded on Formal Concept Analysis (FCA) by defining the navigation space as a concept lattice. We then instantiate the ACN paradigm for knowledge graphs (Graph-ACN) by relying on Graph-FCA, an extension of FCA to knowledge graphs. We continue by detailing how Graph-ACN can be efficiently implemented on top of SPARQL endpoints, and how its expressivity can be increased in a modular way. Finally, we present a concrete implementation available online, Sparklis, and a few application cases on large knowledge graphs.
Sébastien Ferré
Chapter 3. FCA2VEC: Embedding Techniques for Formal Concept Analysis
Abstract
Embedding large, high-dimensional data into low-dimensional vector spaces is a necessary task for computationally coping with contemporary data sets. Superseding ‘latent semantic analysis’, recent approaches such as ‘word2vec’ and ‘node2vec’ are well-established tools in this realm. In the present paper we add to this line of research by introducing ‘fca2vec’, a family of embedding techniques for formal concept analysis (FCA). Our investigation contributes to two distinct lines of research. First, we enable the application of FCA notions to large data sets. In particular, we demonstrate how the cover relation of a concept lattice can be retrieved from a computationally feasible embedding. Second, we show an enhancement of the classical node2vec approach in low dimension. In both directions, FCA's overarching requirement of explainable results is preserved. We evaluate our novel procedures by computing fca2vec on different data sets, such as wiki44 (a dense part of the Wikidata knowledge graph), the Mushroom data set, and a publication network derived from the FCA community.
Dominik Dürrschnabel, Tom Hanika, Maximilian Stubbemann
Chapter 4. Analysis of Complex and Heterogeneous Data Using FCA and Monadic Predicates
Abstract
In this article, we recall the NextPriorityConcept algorithm we developed to study concept lattices using first-order monadic predicates. This approach unifies and simplifies pattern structure theory by immersing context objects in a dedicated predicate space that has the properties of an inference system. This way of managing objects and attributes (monadic predicates) connects with concepts developed in the theory of generalized convex structures, in particular that of half-spaces. We show how this paradigm can be applied to boolean, categorical, numerical, character-string and sequential data, using well-known examples from the literature, in order to generate lattices whose size is controlled by the user’s choices.
Karell Bertet, Christophe Demko, Salah Boukhetta, Jérémy Richard, Cyril Faucher
Chapter 5. Dealing with Large Volumes of Complex Relational Data Using RCA
Abstract
Most available data are inherently relational, with, e.g., temporal, spatial, causal or social relations. Moreover, many datasets involve complex and voluminous data. The exploration of relational data is therefore a major challenge for Formal Concept Analysis (FCA). Relational Concept Analysis (RCA) is specifically designed to investigate the relational structure of a dataset within the FCA paradigm. In this chapter, we examine how RCA can address the issues raised by complex data. Using two datasets, one about the quality monitoring of waterbodies in France and the other about the use of pesticidal and antimicrobial plants in Africa, we study the limitations of different FCA algorithms and of their current implementations when exploring these datasets with RCA. We also show how pattern extraction, combined with the presentation of data in hierarchical structures, is appropriate for the analysis of temporal datasets by domain experts. Finally, we discuss possible directions for further investigation.
Agnès Braud, Xavier Dolques, Alain Gutierrez, Marianne Huchard, Priscilla Keip, Florence Le Ber, Pierre Martin, Cristina Nica, Pierre Silvie
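Relational Concept Analysis turns inter-object relations into ordinary attributes through scaling operators. As a minimal sketch, assuming a hypothetical "repels" relation from plants to pests (toy data invented for illustration, not the chapter's datasets), the existential scaling operator gives a plant the relational attribute `exists_repels:C` for each pest concept C whose extent contains at least one pest the plant repels. A full RCA run would append these attributes to the plant context and iterate until the concept lattices stop changing; only a single scaling step is shown, and all names are assumptions.

```python
from itertools import combinations

# Hypothetical pest context (K2): pest -> attributes.
PESTS = {
    "aphid": {"insect", "sap_feeder"},
    "weevil": {"insect", "chewer"},
    "mite": {"arachnid", "sap_feeder"},
}
# Hypothetical relation r: plant (object of K1) -> set of pests (objects of K2).
REPELS = {
    "neem": {"aphid", "weevil"},
    "garlic": {"mite"},
    "basil": set(),
}


def concept_extents(context):
    """All concept extents of a context, obtained by closing every attribute subset."""
    attributes = sorted(set().union(*context.values()))
    extents = set()
    for k in range(len(attributes) + 1):
        for subset in combinations(attributes, k):
            extents.add(frozenset(o for o, a in context.items() if set(subset) <= a))
    return extents


def existential_scaling(relation, target_extents, name):
    """Give object o the attribute 'exists_<name>:C' iff r(o) meets the extent of C."""
    return {
        obj: {
            f"exists_{name}:" + ",".join(sorted(extent))
            for extent in target_extents
            if targets & extent  # non-empty intersection = existential quantifier
        }
        for obj, targets in relation.items()
    }


scaled = existential_scaling(REPELS, concept_extents(PESTS), "repels")
for plant, attrs in scaled.items():
    print(plant, sorted(attrs))
```

Other RCA scaling operators (for instance universal ones) would only change the condition inside `existential_scaling`; the existential variant is shown because it is the simplest.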
Chapter 6. Computing Dependencies Using FCA
Abstract
Constraints, in a broad sense, are restrictions that exist (or should exist) in a dataset. There are many different kinds of constraints, which differ not only in their semantics but also in the domains in which they arise: database design, knowledge discovery, and data analysis, to name a few. Formal Concept Analysis and Pattern Structures have been used to characterize and compute different kinds of constraints. The fact that this unified framework can embrace all this diversity has made it an appealing line of research in recent years. In this paper we revisit some of our relevant results in this field. We also discuss limitations and drawbacks and suggest possible directions within this field.
Jaume Baixeries, Victor Codocedo, Mehdi Kaytoue, Amedeo Napoli
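One classical way to phrase functional dependencies in FCA terms is sketched below, under simplifying assumptions (a tiny invented table and helper names of our own, not the chapter's code). Build a formal context whose objects are pairs of rows and whose attributes are the columns on which the two rows agree. A dependency X → Y then holds in the table exactly when Y is contained in the closure X'' computed in this agree-set context.

```python
from itertools import combinations

# Hypothetical toy table; each row maps a column name to a value.
ROWS = [
    {"city": "Paris", "country": "France", "zip": "75001"},
    {"city": "Paris", "country": "France", "zip": "75002"},
    {"city": "Lyon", "country": "France", "zip": "69001"},
    {"city": "Paris", "country": "USA", "zip": "75460"},
]
COLUMNS = ["city", "country", "zip"]


def agree_context(rows, columns):
    """Formal context: objects are row pairs (i, j), i < j; attributes are agreeing columns."""
    return {
        (i, j): frozenset(c for c in columns if rows[i][c] == rows[j][c])
        for i, j in combinations(range(len(rows)), 2)
    }


def closure(attrs, context, columns):
    """X'' in the agree-set context: columns on which every pair agreeing on X also agrees."""
    result = set(columns)
    for agree in context.values():
        if set(attrs) <= agree:
            result &= agree
    return frozenset(result)


def fd_holds(lhs, rhs, rows, columns):
    """X -> Y holds in the table iff Y is a subset of the closure of X."""
    return set(rhs) <= closure(lhs, agree_context(rows, columns), columns)


print(fd_holds(["zip"], ["city", "country"], ROWS, COLUMNS))  # True: zip is a key here
print(fd_holds(["city"], ["country"], ROWS, COLUMNS))         # False: rows 0 and 3 disagree
```

In this construction the number of objects grows quadratically with the number of rows, so it is meant to illustrate the characterization rather than to scale.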
Chapter 7. Leveraging Closed Patterns and Formal Concept Analysis for Enhanced Microblogs Retrieval
Abstract
Social microblogging services have attracted significant public interest over the past decade. These Web 2.0 platforms have generated large amounts of data, allowing users to produce, share and exchange various content. Twitter is one of the most popular microblogging sites, used by people to find relevant posts that satisfy their information needs (e.g., breaking news, popular trends, information about people of interest). However, Twitter queries and messages are short, and access to information is sometimes difficult because of the variety of published content and the huge amount of data generated. In this context, it is difficult for users to find relevant information. The proposed work lies in the field of social information retrieval (SIR) and aims to improve the quality of tweet retrieval. To this end, we propose a query expansion method. The approach is based on Formal Concept Analysis and extracts patterns from the documents retrieved by the search system. The method also uses word embeddings to enrich the patterns with similar words. The final query is obtained by merging the initial query with the expanded query. We evaluate the proposed method on the TREC 2011 dataset, which contains approximately 16 million tweets and 49 queries. The results reveal the effectiveness of the proposed approach and show the benefit of combining patterns and word embeddings for enhanced microblog retrieval.
Meryem Bendella, Mohamed Quafafou
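The chapter's own pipeline, including its word-embedding enrichment, is not reproduced here. As a hedged illustration of the general idea of pattern-based query expansion, the sketch below mines closed term sets from a few invented "retrieved tweets" and adds the terms of sufficiently frequent closed patterns that share a term with the query. The data, the support threshold, and the selection rule are all assumptions.

```python
# Hypothetical top-ranked documents for a query, each reduced to a set of terms.
DOCS = [
    {"earthquake", "japan", "tsunami"},
    {"earthquake", "japan", "fukushima"},
    {"earthquake", "tsunami", "warning"},
    {"football", "japan"},
]


def closed_itemsets(transactions):
    """Non-empty closed term sets: all intersections of non-empty groups of documents."""
    closed = {frozenset(t) for t in transactions}
    while True:
        new = closed | {c & frozenset(t) for c in closed for t in transactions}
        if new == closed:
            break
        closed = new
    return {c for c in closed if c}


def support(itemset, transactions):
    """Number of documents containing every term of the itemset."""
    return sum(itemset <= t for t in transactions)


def expand_query(query, transactions, min_support):
    """Add the terms of frequent closed patterns that overlap the query (assumed rule)."""
    expanded = set(query)
    for pattern in closed_itemsets(transactions):
        if support(pattern, transactions) >= min_support and pattern & query:
            expanded |= pattern
    return expanded


print(sorted(expand_query({"earthquake"}, DOCS, min_support=2)))
```

Raising `min_support` makes the expansion more conservative; with a threshold of 3 only the query term itself survives in this toy example.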
Chapter 8. Scalable Visual Analytics in FCA
Abstract
We adopt a visual analytic approach to FCA by combining computational analysis with interactive visualisation. Scaling FCA to the interactive analysis of large data sets poses four fundamental challenges: the time required to enumerate the vertices, arcs and labels of the lattice digraph; the difficulty of responsive presentation of, and meaningful user interaction with, a large digraph; the time required to enumerate (a basis for) all valid implications; and the discovery of insightful implications. This chapter briefly surveys potential solutions to these scalability challenges posed by big data volumes, and describes software prototypes and coordinated visualisations which explore some of them.
Tim Pattison, Manuel Enciso, Ángel Mora, Pablo Cordero, Derek Weber, Michael Broughton
Chapter 9. Formal Methods in FCA and Big Data
Abstract
Formal Concept Analysis (FCA) plays an important role in knowledge representation and knowledge discovery, and has given rise to a steadily growing research field. The use of FCA in the context of big data provides a basis for better interpretability and explainability of results, usually lacking in other statistical approaches to data analysis; however, scalability is an important issue for FCA logic-based tools and techniques, such as the generation and use of implicational systems. We survey the theoretical and technical foundations of some trends in FCA. Specifically, we present a summary of promising theoretical and practical applications of FCA that could be used to solve the problem of dealing with big data. Furthermore, we propose some directions for future research to solve this problem.
Domingo López-Rodríguez, Emilio Muñoz-Velasco, Manuel Ojeda-Aciego
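Many implication-based tools rest on one primitive operation: computing the closure of an attribute set under a set of implications, which also decides whether a new implication follows from the set. A minimal naive fixpoint sketch is shown below; the implications are invented, and linear-time algorithms such as LinClosure would be preferable on large implicational systems.

```python
# Hypothetical implicational system: each rule is (premise, conclusion).
IMPLICATIONS = [
    ({"a"}, {"b"}),
    ({"b", "c"}, {"d"}),
    ({"d"}, {"e"}),
]


def implication_closure(attrs, implications):
    """Smallest superset of attrs closed under every implication (naive fixpoint)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire a rule whose premise is satisfied and whose conclusion adds something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def follows(premise, conclusion, implications):
    """An implication is entailed iff its conclusion lies in the closure of its premise."""
    return set(conclusion) <= implication_closure(premise, implications)


print(sorted(implication_closure({"a", "c"}, IMPLICATIONS)))  # ['a', 'b', 'c', 'd', 'e']
```

Each pass scans every rule, so the naive loop is quadratic in the worst case; this is one source of the scalability issues for implicational systems mentioned in the abstract.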
Chapter 10. Towards Distributivity in FCA for Phylogenetic Data
Abstract
It is known that a distributive lattice is a median graph, and that a distributive ∨-semilattice can be thought of as a median graph iff every triple of elements in which each pair has an infimum itself has an infimum. Since a lattice without its bottom element is obviously a ∨-semilattice, we use the FCA formalism to investigate the following problem: given a semilattice L obtained from a lattice by deleting the bottom element, is there a minimum distributive ∨-semilattice L_d such that L can be order-embedded into L_d? We give a negative answer to this question by providing a counter-example.
Alain Gély, Miguel Couceiro, Amedeo Napoli
Chapter 11. Triclustering in Big Data Setting
Abstract
In this paper, we describe versions of triclustering algorithms adapted for efficient computation in distributed environments, using the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms shows good parallelisation capabilities owing to the independent processing of the triples of a triadic formal context. We provide the time and space complexity of the algorithms and justify their relevance. We also compare the performance gain and scalability obtained with a distributed system.
Dmitry Egurnov, Dmitry I. Ignatov, Dmitry Tochilkin
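To illustrate why triples can be processed independently, here is a minimal sketch of prime-operator OAC triclustering on a hypothetical triadic context. For each triple (g, m, b) it forms the tricluster ((m, b)', (g, b)', (g, m)'), where (m, b)' is the set of objects related to attribute m under condition b, and similarly for the other two components. Each triple is handled in isolation (a "map" step) and duplicates are merged afterwards (a "reduce" step). A thread pool stands in for the chapter's MapReduce and language-level parallelism; in CPython a process pool or a distributed framework would be needed for real CPU speedups. The data, the density threshold, and all names are assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical triadic context: (object, attribute, condition) triples.
TRIPLES = {
    ("u1", "tag_a", "r1"), ("u1", "tag_b", "r1"),
    ("u2", "tag_a", "r1"), ("u2", "tag_b", "r1"),
    ("u3", "tag_c", "r2"),
}


def build_prime_indexes(triples):
    """Precompute the three 'prime' sets used by OAC triclustering."""
    by_mb, by_gb, by_gm = defaultdict(set), defaultdict(set), defaultdict(set)
    for g, m, b in triples:
        by_mb[(m, b)].add(g)  # (m, b)': objects having m under b
        by_gb[(g, b)].add(m)  # (g, b)': attributes of g under b
        by_gm[(g, m)].add(b)  # (g, m)': conditions under which g has m
    return by_mb, by_gb, by_gm


def tricluster_for(triple, indexes):
    """Map step: one triple yields one tricluster, independently of all others."""
    g, m, b = triple
    by_mb, by_gb, by_gm = indexes
    return (frozenset(by_mb[(m, b)]), frozenset(by_gb[(g, b)]), frozenset(by_gm[(g, m)]))


def density(tricluster, triples):
    """Fraction of the cells of X x Y x Z that are present in the context."""
    xs, ys, zs = tricluster
    hits = sum((g, m, b) in triples for g in xs for m in ys for b in zs)
    return hits / (len(xs) * len(ys) * len(zs))


def oac_triclusters(triples, min_density=0.0, workers=4):
    indexes = build_prime_indexes(triples)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Reduce step: collecting into a set removes duplicate triclusters.
        candidates = set(pool.map(lambda t: tricluster_for(t, indexes), triples))
    return {tc for tc in candidates if density(tc, triples) >= min_density}


for xs, ys, zs in oac_triclusters(TRIPLES, min_density=0.5):
    print(sorted(xs), sorted(ys), sorted(zs))
```

Because the index lookups are read-only once built, the map step needs no coordination between workers, which is the property the abstract attributes to the OAC family.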
Metadata
Title
Complex Data Analytics with Formal Concept Analysis
Edited by
Prof. Dr. Rokia Missaoui
Prof. Dr. Léonard Kwuida
Prof. Dr. Talel Abdessalem
Copyright year
2022
Electronic ISBN
978-3-030-93278-7
Print ISBN
978-3-030-93277-0
DOI
https://doi.org/10.1007/978-3-030-93278-7