research-article

Detecting outlying properties of exceptional objects

Authors:
Fabrizio Angiulli

DEIS, Università della Calabria, Rende (CS), Italy

DEIS, Università della Calabria, Rende (CS), Italy
View Profile

,
Fabio Fassetti

ICAR-CNR, Rende (CS), Italy

ICAR-CNR, Rende (CS), Italy
View Profile

,
Luigi Palopoli

DEIS, Università della Calabria, Rende (CS), Italy

DEIS, Università della Calabria, Rende (CS), Italy
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 34 Issue 1Article No.: 7pp 1–62https://doi.org/10.1145/1508857.1508864

Published:23 April 2009Publication History

ACM Transactions on Database Systems

Abstract

Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, you are provided with the information that one of the individuals in this data population is abnormal, but no reason whatsoever is given to you as to why this particular individual is to be considered abnormal. In several cases, you will be indeed interested in discovering such reasons. This article is precisely concerned with this problem of discovering sets of attributes that account for the (a priori stated) abnormality of an individual within a given dataset. A criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population. In this respect, each subset of attributes is intended to somehow represent a “property” of individuals. We distinguish between global and local properties. Global properties are subsets of attributes explaining the given abnormality with respect to the entire data population. With local ones, instead, two subsets of attributes are singled out, where the former one justifies the abnormality within the data subpopulation selected using the values taken by the exceptional individual on those attributes included in the latter one. The problem of individuating abnormal properties with associated explanations is formally stated and analyzed. Such a formal characterization is then exploited in order to devise efficient algorithms for detecting both global and local forms of most abnormal properties. The experimental evidence, which is accounted for in the article, shows that the algorithms are both able to mine meaningful information and to accomplish the computational task by examining a negligible fraction of the search space.

References

Aggarwal, C. C. and Yu, P. 2001. Outlier detection for high dimensional data. In Proceedings of the International Conference on Managment of Data (SIGMOD), 37--46. Google ScholarDigital Library
Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the International Conference on Very Large Databases (VLDB), 487--499. Google ScholarDigital Library
Angiulli, F. and Pizzuti, C. 2002. Fast outlier detection in large high-dimensional datasets. In Proceedings of the International Conference on Principles of Data Mining and Knowledge Discovery (PKDD), 15--26. Google ScholarDigital Library
Angiulli, F. and Pizzuti, C. 2005. Outlier mining in large high-dimensional datasets. IEEE Trans. Knowl. Data Eng. 17, 2, 203--215. Google ScholarDigital Library
Arning, A., Aggarwal, C., and Raghavan, P. 1996. A linear method for deviation detection in large databases. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), 164--169.Google Scholar
Barnett, V. and Lewis, T. 1994. Outliers in Statistical Data. John Wiley & Sons.Google Scholar
Breiman, L., Friedman, J., Olshen, R., and Stone, C. 1984. Classification and Regression Trees. Wadsworth, Belmont.Google Scholar
Breunig, M. M., Kriegel, H., Ng, R., and Sander, J. 2000. Lof: Identifying density-based local outliers. In Proceedings of the International Conference on Managment of Data (SIGMOD), 93--104. Google ScholarDigital Library
Chaudhuri, S. and Dayal, U. 1997. An overview of data warehousing and olap technology. SIGMOD Rec. 26, 1, 65--74. Google ScholarDigital Library
Chen, Z. and Toda, S. 1995. The complexity of selecting maximal solutions. Inf. Comput. 1192, 2, 231--239. Google ScholarDigital Library
Codd, E., Codd, S., and Salley, C. 1993. Providing OLAP (on-line analytical processing) to user-analysts: An it mandate. Tech. rep., Codd & Date, Inc.Google Scholar
De Benedictis, G., Rose, G., Carrieri, G., Luca, M. D., Falcone, E., Passarino, G., Bonafè, M., Monti, D., Baggio, G., Bertolini, S., Mari, D., Mattace, R., and Franceschi, C. 1999. Mitochondrial DNA inherited variants are associates with successful aging and longevity in humans. FASEB J. 13, 12, 1532--1536.Google ScholarCross Ref
Garasto, S., Berardelli, M., Rango, F. D., Mari, V., Feraco, E., and Benedictis, G. D. 2004. A study of the average effect of the 3'apob-vntr polymorphism on lipidemic parameters could explain why the short alleles (< 35 repeats) are rare in centenarians. BMC Medical Genetics 5, 3.Google ScholarCross Ref
Garey, M. R. and Johnson, D. S. 1979. Computer and Intractability. W. H. Freeman and Company, New York. Google ScholarDigital Library
Gini, C. 1921. Measurement of inequality of incomes. The Economic J. 31, 124--126.Google ScholarCross Ref
Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., Lewontin, R. C., and Gelbart, W. M. 1996. An Introduction to Genetic Analysis. W. H. Freeman.Google Scholar
Hodge, V. and Austin, J. 2004. A survey of outlier detection methodologies. Artif. Intell. Rev. 22, 2, 85--126. Google ScholarDigital Library
Karp, R. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Plenum, New York, 85--103.Google Scholar
Klösgen, W. 1996. Explora: A multipattern and multistrategy discovery assistant. In Advances in Knowledge Discovery and Data Mining (KDD), 249--271. Google ScholarDigital Library
Knorr, E. and Ng, R. 1998. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Databases (VLDB), 392--403. Google ScholarDigital Library
Knorr, E. and Ng, R. 1999. Finding intensional knowledge of distance-based outliers. In Proceedings of the International Conference on Very Large Databases(VLDB), 211--222. Google ScholarDigital Library
Knorr, E., Ng, R., and Tucakov, V. 2000. Distance-based outlier: Algorithms and applications. VLDB J. 8, 3-4, 237--253. Google ScholarDigital Library
Knuth, D. 1998. The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd ed. Addison-Wesley. Google ScholarDigital Library
Krentel, M. 1988. The complexity of optimization problems. J. Comput. Syst. Sci. 36, 3, 490--509. Google ScholarDigital Library
Mehta, M., Agrawal, R., and Rissanen, J. 1996. Sliq: A fast scalable classifier for data mining. In International Conference on Extending Database Technology (EDBT), 18--32. Google ScholarDigital Library
Newman, D. J., Hettich, S., Blake, C. L., and Merz, C. 1998. UCI repository of machine learning databases.Google Scholar
Papadimitriou, C. 1994. Computational Complexity. Addison-Wesley, USA.Google Scholar
Passarino, G., Montesanto, A., Dato, S., Giordano, S., Domma, F., Mari, V., Feraco, E., and Benedictis, G. D. 2006. Sex and age specificity of susceptibility genes modulating survival at old age. Human Heredity (Int. J. Hum. Medical Genetics) 62, 4, 213--220.Google ScholarCross Ref
Ramaswamy, S., Rastogi, R., and Shim, K. 2000. Efficient algorithms for mining outliers from large datasets. In Proceedings of the International Conference on Managment of Data (SIGMOD), 427--438. Google ScholarDigital Library
Rymon, R. 1992. Search through systematic set enumeration. In Proceedings of the International Conference on Principles of Knowledge and Reasoning (KR), 539--550.Google Scholar
Sarawagi, S., Agrawal, R., and Megiddo, N. 1998. Discovery-Driven exploration of OLAP data cubes. In Proceedings of the International Conference on Extending Database Technology (EDBT), 168--182. Google ScholarDigital Library
Selman, A. 1994. A taxonomy of complexity classes of functions. J. Comput. Syst. Sci. 48, 2, 357--381. Google ScholarDigital Library
Shafer, J., Agrawal, R., and Mehta, M. 1996. Sprint: A scalable parallel classifier for data mining. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 544--555. Google ScholarDigital Library
Suzuki, E. 2006. Data mining methods for discovering interesting exceptions from an unsupervised table. J. Universal Comput. Sci. 12, 6, 627--653.Google Scholar
Tan, P. N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley. Google ScholarDigital Library
Wei, L., Qian, W., Zhou, A., Jin, W., and Yu, J. 2003. Hot: Hypergraph-Based outlier test for categorical data. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 399--410. Google ScholarDigital Library
Zhang, J., Gao, Q., and Wang, H. 2006. A novel method for detecting outlying subspaces in high-dimensional databases using genetic algorithm. In Proceedings of the International Conference on Data Mining (ICDM), 731--740. Google ScholarDigital Library
Zhang, J. and Wang, H. 2006. Detecting outlying subspaces for high-dimensional data: The new task, algorithms, and performance. Knowl. Inf. Syst. 10, 3, 333--355. Google ScholarDigital Library
Zhu, C., Kitagawa, H., and Faloutsos, C. 2005. Example-Based robust outlier detection in high dimensional datasets. In Proceedings of the International Conference on Data Mining (ICDM), 829--832. Google ScholarDigital Library

Index Terms

Detecting outlying properties of exceptional objects
1. Information systems
  1. Data management systems

Recommendations

Mining Optimized Association Rules with Categorical and Numeric Attributes

Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in marketing, financial, and retail ...
Read More
Dataless Transitions Between Concise Representations of Frequent Patterns

For many data mining problems in order to solve them it is required to discover frequent patterns. Frequent itemsets are useful e.g. in the discovery of association and episode rules, sequential patterns and clusters. Nevertheless, the number of ...
Read More
Mining N-most interesting itemsets without support threshold by the COFI-tree

Data mining is the discovery of interesting and hidden patterns from a large amount of collected data. Applications can be found in many organisations with large databases, for many different purposes such as customer relationships, marketing, planning, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 34, Issue 1
April 2009
349 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1508857
Issue’s Table of Contents

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 April 2009
- Accepted: 1 September 2008
- Revised: 1 May 2008
- Received: 1 September 2007
Published in tods Volume 34, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Data mining
knowledge discovery
outlier characterization
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 53
  Total Citations
  View Citations
- 778
  Total Downloads
- Downloads (Last 12 months)13
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Detecting outlying properties of exceptional objects

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Mining Optimized Association Rules with Categorical and Numeric Attributes

Dataless Transitions Between Concise Representations of Frequent Patterns

Mining N-most interesting itemsets without support threshold by the COFI-tree

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Detecting outlying properties of exceptional objects

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Mining Optimized Association Rules with Categorical and Numeric Attributes

Dataless Transitions Between Concise Representations of Frequent Patterns

Mining N-most interesting itemsets without support threshold by the COFI-tree

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media