Abstract
Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex spatiotemporal correlation patterns present in uncertain sensory data. Unfortunately, to date, most existing approaches to probabilistic database systems have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: probabilistic information is typically associated with individual data tuples, with limited or no support for effectively capturing and reasoning about complex data correlations. In this paper, we introduce BayesStore, a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. Adopting a machine-learning view, BayesStore employs concise statistical relational models to effectively encode the correlation patterns between uncertain data, and promotes probabilistic inference and statistical model manipulation as part of the standard DBMS operator repertoire to support efficient and sound query processing. We present BayesStore's uncertainty model, which is based on a novel first-order statistical model, and we redefine traditional query processing operators to manipulate the data and the probabilistic models of the database in an efficient manner. Finally, we validate our approach by demonstrating the value of exploiting data correlations during query processing, and by evaluating a number of optimizations that significantly accelerate query processing.
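To make the abstract's point about data correlations concrete, the following is a minimal sketch, not BayesStore's actual model or API. It assumes two hypothetical sensor readings, `x1` and `x2`, where `x2` depends on `x1` through a made-up conditional probability table, and it compares the probability that both readings are "hot" under the correlated model with the probability obtained by treating the two readings as independent tuples. All names and numbers are illustrative assumptions.

```python
# Illustrative sketch only: a two-variable Bayesian network x1 -> x2.
# The probabilities below are invented for the example, not taken from the paper.

# Prior over the first (hypothetical) sensor reading.
p_x1 = {"hot": 0.5, "cold": 0.5}

# Conditional distribution of the second reading given the first,
# encoding a strong positive correlation between consecutive readings.
p_x2_given_x1 = {
    "hot": {"hot": 0.9, "cold": 0.1},
    "cold": {"hot": 0.1, "cold": 0.9},
}


def joint(x1, x2):
    """Joint probability P(x1, x2) = P(x1) * P(x2 | x1)."""
    return p_x1[x1] * p_x2_given_x1[x1][x2]


def marginal_x2(x2):
    """Marginal P(x2), obtained by summing the joint over x1."""
    return sum(joint(x1, x2) for x1 in p_x1)


def both_hot_correlated():
    """Probability that both readings are hot, using the joint model."""
    return joint("hot", "hot")


def both_hot_independent():
    """Same query when each reading only carries its own marginal probability."""
    return p_x1["hot"] * marginal_x2("hot")


if __name__ == "__main__":
    print("correlated model :", both_hot_correlated())
    print("independent model:", both_hot_independent())
```

Under these assumed numbers the two answers differ substantially, even though both models agree on every single-reading marginal. This is the kind of discrepancy that per-tuple probabilities alone cannot capture.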
Index Terms
- BayesStore: managing large, uncertain data repositories with probabilistic graphical models