2008 | Book

Statistical Implicative Analysis

Theory and Applications

Editors: Régis Gras, Einoshin Suzuki, Fabrice Guillet, Filippo Spagnolo

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Computational Intelligence

About this book

Statistical implicative analysis (SIA) is a data analysis method created by Régis Gras almost thirty years ago, which has had a significant impact on a variety of areas ranging from pedagogical and psychological research to data mining. SIA provides a framework for evaluating the strength of implications; such implications are formed through common knowledge acquisition techniques in any learning process, human or artificial. This new concept has developed into a unifying methodology, and has generated a powerful convergence of thought between mathematicians, statisticians, psychologists, specialists in pedagogy and, last but not least, computer scientists specialized in data mining.

This volume collects significant research contributions from several rather distinct disciplines that benefit from SIA. Contributions span psychological and pedagogical research, bioinformatics, knowledge management, and data mining.

Table of Contents

Frontmatter

Methodology and concepts for SIA

An overview of the Statistical Implicative Analysis (SIA) development
This paper presents an overview of Statistical Implicative Analysis, a data analysis method devoted to the extraction and structuring of quasi-implications. Originally developed by Gras [11] for applications in the didactics of mathematics, it has evolved considerably and has been applied to a wide range of data, in particular in data mining. This paper is a synthesis which both briefly presents the basic statistical framework of the approach and details recent developments.
Régis Gras, Pascale Kuntz
CHIC: Cohesive Hierarchical Implicative Classification
CHIC is a data analysis tool based on SIA. Its aim is to discover the most relevant implications between states of different variables. It proposes two different ways to organize these implications into systems: i) in the form of an oriented hierarchical tree and ii) as an implication graph. In addition, it produces a (non-oriented) similarity tree based on the likelihood of the links between states. The paper describes its main features and its usage.
Raphaël Couturier
Assessing the interestingness of temporal rules with Sequential Implication Intensity
In this article, we study the assessment of the interestingness of sequential rules (generally temporal rules). This is a crucial problem in sequence analysis since the frequent pattern mining algorithms are unsupervised and can produce huge amounts of rules. While association rule interestingness has been widely studied in the literature, there are few measures dedicated to sequential rules. Continuing with our work on the adaptation of implication intensity to sequential rules, we propose an original statistical measure for assessing sequential rule interestingness. More precisely, this measure named Sequential Implication Intensity (SII) evaluates the statistical significance of the rules in comparison with a probabilistic model. Numerical simulations show that SII has unique features for a sequential rule interestingness measure.
Julien Blanchard, Fabrice Guillet, Régis Gras

Application to concept learning in education, teaching, and didactics

Student's Algebraic Knowledge Modelling: Algebraic Context as Cause of Student's Actions
In this chapter, we describe the construction of a student model in the field of algebra. To gather the data, we used the Aplusix learning environment, which allows students to perform calculation steps freely and records all of the students' actions. One way to build and update the student model is to follow precisely what the student is doing, by means of a detailed representation of cognitive skills. We are interested in persistent and reproducible actions, i.e., the same action done by a student in different algebraic contexts, rather than in a local student action. To discover patterns of student behaviour, we use statistical implicative analysis, which makes it possible to look for stability in the actions and to determine the contexts in which they appear. This theory allows us to build implicative connections between algebraic contexts and actions.
Marie-Caroline Croset, Jana Trgalova, Jean-François Nicaud
The graphic illusion of high school students
The factorial analysis of the relationship between the mathematical background on linear and quadratic functions, on the one hand, and the representation of functions (graphics, figures and so on) on the other hand, stands in contradiction to the usual assumption of the existence of a “graphical conceptualization” of functions, different from the “non-graphical conceptualization”. Nevertheless, both the authors of scholar texts and the teachers involved in this research tend to use the graphical representation of functions. In the context of proportionality, the Statistical Implicative Analysis of students' preferences regarding the kind of graphical representation reveals the existence of a graphical illusion shared by high school students.
Eduardo Lacasta, Miguel R. Wilhelmi
Implicative networks of student's representations of Physical Activities
The proposal reports on and discusses the results of a questionnaire-based study of young people's attitudes (representations) to team games and volleyball (in the context of physical education lessons). The questionnaire was given to French agricultural high school students, and the data were processed with the CHIC software. The questions addressed the attitudes, values and dispositions (representations) of students regarding physical education and, more particularly, volleyball. Several networks of variables appear, which make it possible to profile different kinds of students. Studying the contributions of two additional variables, sex and gender, to the highlighted networks makes it possible to improve the choice of students representative of each network for later interview-based studies. Interestingly and somewhat unexpectedly, while sex is a strong predictor of attitudes and dispositions to team sports and volleyball, gender is not.
Catherine-Marie Chiocca, Ingrid Verscheure
A comparison between the hierarchical clustering of variables, implicative statistical analysis and confirmatory factor analysis
This study aims to gain insight into the distinct features and advantages of three statistical methods, namely the hierarchical clustering of variables, the implicative method and Confirmatory Factor Analysis, by comparing the outcomes of their application in exploring the understanding of function. The investigation concentrates on the structure of students' abilities to carry out conversions of functions from one mode of representation to others. Data were obtained from 587 students in grades 9 and 11. Using Confirmatory Factor Analysis, a model that provides information about the significant role of the initial representations of conversions in students' processes is developed and validated. Using hierarchical clustering and implicative analysis, evidence is provided for students' compartmentalized thinking among representations. These findings remain stable across grades. The outcomes of the three methods were found to coincide and to complement each other.
Iliada Elia, Athanasios Gagatsis
Implications between learning outcomes in elementary bayesian inference
In this research implicative analysis served to study some previous hypotheses about the interrelationships in students' understanding of different concepts and procedures after 12 hours of teaching elementary Bayesian inference. A questionnaire made up of 20 multiple choice items was used to assess learning of 78 psychology students. Results suggest four groups of interrelated concepts: conditional probability, logic of statistical inference, probability models and random variables.
Carmen Díaz, Inmaculada de la Fuente, Carmen Batanero
Personal Geometrical Working Space: a Didactic and Statistical Approach
In this paper, we study the answers that pre-service teachers gave in a geometry exercise. Our purpose is to gain a better understanding of what we call the geometrical working space (espace de travail géométrique). We first conduct a didactical study based on the notion of geometrical paradigms that leads to a classification of students' answers. Then, we use statistical tools to refine the previous analysis and explain students' evolution during their training.
Alain Kuzniak

A methodological answer in various application frameworks

Statistical Implicative Analysis of DNA microarrays
This chapter presents an application of Statistical Implicative Analysis to microarray gene expression data. The specificity of these data requires an adaptation of the concept of intensity of implication. More specifically, we propose to study the rankings of observations instead of the measurements themselves. This method makes our analysis more robust and insensitive to any monotonic transformation of gene expression. We introduce the concept of rank interval and show that the integration of the implicative method in this framework is more efficient than correlation techniques. Our method is applied to the most challenging problems encountered in gene expression analysis, namely the discovery of gene coregulation, gene selection and tumour classification. We compare our method with well-performing algorithms that are dedicated to gene expression data or that are well-suited to high-dimensional variable spaces.
Gerard Ramstein
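The robustness claim in the abstract above rests on a simple property: ranks do not change under a strictly increasing transformation of the measurements. The following is a minimal sketch of that property only, not the chapter's rank-interval method; the function name `ranks` and the expression values are made up for illustration, and ties are assumed absent.

```python
import math

def ranks(values):
    """Return the rank (0 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

# Hypothetical expression levels for one gene across four samples.
expression = [12.0, 850.0, 3.5, 97.0]
# A strictly increasing transformation (here log2) keeps the ordering.
log_expression = [math.log2(v) for v in expression]

print(ranks(expression) == ranks(log_expression))
```

Any measure computed from the ranks alone therefore gives the same result on raw and log-transformed data.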
On the use of Implication Intensity for matching ontologies and textual taxonomies
At the intersection of data mining and knowledge management, we present an extensional and asymmetric matching approach designed to find semantic relations (equivalence and subsumption) between two textual taxonomies or ontologies. This approach relies on the idea that an entity A will be more specific than or equivalent to an entity B if the vocabulary (i.e. terms and data) used to describe A and its instances tends to be included in that of B and its instances. In order to evaluate such implicative tendencies, this approach makes use of the association rule model and the Interestingness Measures (IMs) developed in this context. More precisely, we focus on experimental evaluations of IMs for matching ontologies. A set of IMs has been selected according to criteria related to measure properties and semantics. We have performed two experiments on a benchmark composed of two textual taxonomies and a set of reference matching relations between the concepts of the two structures. The first experiment compares the matching accuracy obtained with each of the selected measures. In the second experiment, we compare how each IM evaluates reference relations by studying their value distributions. Results show that the implication intensity delivers the best results.
Jérôme David, Fabrice Guillet, Henri Briand, Régis Gras
Modelling by Statistic in Research of Mathematics Education
The aim of this paper is to study the quantitative tools of research in didactics. We want to investigate the theoretical-experimental relationships between factorial and implicative analysis. This chapter consists of three parts. The first one deals with didactic research and some fundamental tools: the a priori analysis of a didactic situation, the collection of experimental data and the statistical analysis of data. The purpose of the second and third sections is to introduce the experimental comparison between factorial and implicative analysis in two research studies in mathematics education.
Elsa Malisani, Aldo Scimone, Filippo Spagnolo
Didactics of Mathematics and Implicative Statistical Analysis
People working in Didactics of Mathematics have constantly regarded statistical implicative analysis as a profitable and heuristic method of data analysis. First we intend to show the reasons for this interest: implicative links that S.I.A. has pointed out may be interpreted as rules and regulations connecting actions, discourses, …, or as a group's characteristics. We develop some examples showing how S.I.A. can be used and what special research results it can provide. We insist upon some points that may be interesting methodologically to focus on: asymmetric links, nodes and separate implicative ways.
Dominique Lahanier-Reuter
Using the Statistical Implicative Analysis for Elaborating Behavioral Referentials
Various computer-based assessment tools have been created to help human resources managers evaluate the behavioral profile of a person. The psychological bases of those tools have all been validated, but very few of them have undergone a thorough statistical analysis. The PerformanSe Echo assessment tool is one of them. It gives the behavioral profile of a person along 10 bipolar dimensions. It was validated on a population of 4538 subjects in 2004. We are now interested in building a set of psychological indicators based on Echo for a population of 613 experienced executives aged 45 and over who are seeking a job. Our goal is twofold: first to confirm the previous validation study, then to build a relevant behavioral referential on this population. The final goal is to have relevant indicators that help to understand the link between some behavioral characteristics and current profiles that can be categorized in the population. In the end, it may provide the foundation for a decision support tool intended for consultants specialized in coaching and outplacement.
Stéphane Daviet, Fabrice Guillet, Henri Briand, Serge Baquedano, Vincent Philippé, Régis Gras
Fictitious Pupils and Implicative Analysis: a Case Study
We present a case study, in the context of Didactics of Mathematics, in which we adopt the methodology of using fictitious data in Statistical Implicative Analysis. On the one hand, unlike supplementary variables, adding fictitious data to the sample does modify the results of the analyses, so caution is needed. On the other hand, fictitious students are a tool for better understanding the data structure resulting from the analyses.
Pilar Orús, Pablo Gregori
Identifying didactic and sociocultural obstacles to conceptualization through Statistical Implicative Analysis
To understand culture's relationship to cognition, this field has studied children or adults with little schooling, often alien to well-educated Western culture. Although the field has traditionally centered on extra-curricular knowledge, school-based variables must also be considered: written culture and teaching/learning strategies can generate obstacles to conceptualization. The subjects are adults who studied for at least three years at university; some are professionals. Data came from short clinical-style interviews as well as a questionnaire-based survey taken from an observational sample. To find regularities linked to conceptual strength, S.I.A. determined implicative rules between responses and pre-ordered structures. Results suggested that representations linked to specific conceptual aspects constitute didactical and/or socio-cultural obstacles.
Nadja Maria Acioly-Régnier, Jean-Claude Régnier

Extensions to rule interestingness in data mining

Pitfalls for Categorizations of Objective Interestingness Measures for Rule Discovery
In this paper, we point out four pitfalls for categorizations of objective interestingness measures for rule discovery. Rule discovery, which is extensively studied in data mining, suffers from the problem of outputting a huge number of rules. An objective interestingness measure can be used to estimate the potential usefulness of a discovered rule based on the given data set, and thus hopefully serves as a countermeasure to circumvent this problem. Various measures have been proposed, resulting in systematic attempts to categorize such measures. We believe that such attempts are subject to four kinds of pitfalls: data bias, rule bias, expert bias, and search bias. The main objective of this paper is to issue an alert about these pitfalls, which are harmful to one of the most important research topics in data mining. We also list desiderata for categorizing objective interestingness measures.
Einoshin Suzuki
Inducing and Evaluating Classification Trees with Statistical Implicative Criteria
Implicative statistics criteria have proven to be valuable interestingness measures for association rules. Here we highlight their interest for classification trees. We start by showing how Gras' implication index may be defined for rules derived from an induced decision tree. This index is especially helpful when the aim is not classification itself, but characterizing the most typical conditions of a given conclusion. We show that the index looks like a standardized residual and propose as alternatives other forms of residuals borrowed from the modeling of contingency tables. We then consider two main usages of these indexes. The first is purely descriptive and concerns the a posteriori individual evaluation of the classification rules. The second usage relies upon the strength of implication for assigning the most appropriate conclusion to each leaf of the induced tree. We demonstrate the practical usefulness of this statistical implicative view on decision trees through a full scale real world application.
Gilbert Ritschard, Vincent Pisetta, Djamel A. Zighed
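The "standardized residual" reading mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited form of the implication index for a rule a → b (observed counterexamples minus the number expected under independence, divided by the square root of that expectation) and a Gaussian approximation for the intensity of implication; the chapter's own definitions and the residual variants it proposes may differ. The function names and the counts are hypothetical.

```python
import math

def implication_index(n, n_a, n_b, n_counter):
    """Standardized gap between observed and expected counterexamples of a -> b.

    n: number of cases, n_a: cases satisfying a, n_b: cases satisfying b,
    n_counter: cases satisfying a but not b (assumes n_b < n).
    """
    expected = n_a * (n - n_b) / n  # counterexamples expected under independence
    return (n_counter - expected) / math.sqrt(expected)

def implication_intensity(index):
    """Gaussian approximation of 1 - Phi(index), with Phi the standard normal CDF."""
    return 0.5 * math.erfc(index / math.sqrt(2))

# Hypothetical counts: 100 cases, 20 with a, 50 with b, 2 counterexamples.
q = implication_index(n=100, n_a=20, n_b=50, n_counter=2)
print(q, implication_intensity(q))
```

A strongly negative index means far fewer counterexamples than independence would predict, which pushes the intensity toward 1; an index of 0 gives an intensity of 0.5.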
On the behavior of the generalizations of the intensity of implication: A data-driven comparative study
In this chapter, we present an original and synthetic overview of most of the commonly used association rule statistical interestingness measures introduced in previous works. These measures usually relate the confidence of a rule to an independence reference situation. Others relate it to indetermination, or impose a minimum confidence threshold. We propose a systematic generalization of these measures, taking into account a reference point, chosen by an expert, in order to assess the confidence of a rule. This generalization introduces new connections between measures, which lead to the enhancement of some of them. We then propose new parameterized possibilities. The behavior of the parameterized measures is illustrated using classical datasets, and these measures are compared to their original counterparts. This study highlights the different properties of each of them and discusses the advantages of our proposition.
Benoît Vaillant, Stéphane Lallich, Philippe Lenca
The TVpercent principle for the counterexamples statistic
Our aim is to apply the principle of the test value percent criterion to the counterexamples statistic, which is the basis of the well-known statistical implicative analysis approach. We show how to compute the test value in this context, and how it connects with the intensity of implication measure, on the one hand, and the index of implication, on the other hand. We evaluate the behavior of these measures on a large dataset comprising several hundred thousand transactions. In particular, we evaluate the discriminating capacity of the measures in relation to specialized measures such as the entropic intensity of implication.
Ricco Rakotomalala, Alain Morineau
User-System Interaction for Redundancy-Free Knowledge Discovery in Data
A classical limitation of association rules, from the decision maker's point of view, lies in their combinatorial nature, which results in numerous rules. Since the overall quality of an association rule set can be considered as the insight into the studied domain that the decision maker gains by interpreting the rules, too many rules make interpretation harder and thus lower the quality of the overall process.
To get more readable rules and thus improve this global quality criterion, we apply to association rules techniques initially designed for redundancy reduction in sets of functional dependencies. Although the two kinds of relations have different properties, this method allows very concise representations that are easily understood by the decision maker and can be further exploited for automatic reasoning.
In this paper, we present this method, compare it to other approaches and apply it to synthetic datasets. We end with a discussion of the information loss resulting from the simplification.
Rémi Lehn, Henri Briand, Fabrice Guillet
Fuzzy Knowledge Discovery Based on Statistical Implication Indexes
We describe one application of statistical implication indexes to fuzzy knowledge discovery. After recalling the principles of fuzzy logic, we explain how we have adapted statistical indexes to fuzzy knowledge: the support, the confidence and a less common index, the intensity of implication. These indexes highlight statistical links between conjunctions of fuzzy attributes and fuzzy conclusions, but do not evaluate the associated fuzzy rules, which depend on the chosen fuzzy operators. Since fuzzy operators are numerous, we evaluate their sets by applying the generalized modus ponens on the database and by comparing its results to the effective conclusions. We give a summary of the results on several databases, and we present the sets of fuzzy operators that appear to be the best. Studying methods to aggregate fuzzy rules, we show that in order to keep classical reduction schemes, fuzzy operators must be chosen differently. However, one of these possible operator sets is also one of the best for processing the generalized modus ponens.
Maurice Bernadet
Backmatter
Metadata
Title
Statistical Implicative Analysis
Editors
Régis Gras
Einoshin Suzuki
Fabrice Guillet
Filippo Spagnolo
Copyright Year
2008
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-78983-3
Print ISBN
978-3-540-78982-6
DOI
https://doi.org/10.1007/978-3-540-78983-3
