
BayesStore: managing large, uncertain data repositories with probabilistic graphical models

Published: 01 August 2008

Abstract

Several real-world applications need to manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about large volumes of noisy sensory readings for a variety of purposes, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning tools that can model the complex spatiotemporal correlation patterns present in uncertain sensory data. Unfortunately, most existing approaches to probabilistic database systems have relied on fairly simplistic models of uncertainty that map easily onto existing relational architectures: probabilistic information is typically associated with individual data tuples, with limited or no support for capturing and reasoning about complex data correlations. In this paper, we introduce BayesStore, a novel probabilistic data management architecture built on the principle of treating statistical models and probabilistic inference tools as first-class citizens of the database system. Adopting a machine-learning view, BayesStore employs concise statistical relational models to encode the correlation patterns between uncertain data, and it promotes probabilistic inference and statistical model manipulation to part of the standard DBMS operator repertoire, supporting efficient and sound query processing. We present BayesStore's uncertainty model, which is based on a novel first-order statistical model, and we redefine traditional query processing operators so that they manipulate both the data and the probabilistic models of the database efficiently. Finally, we validate our approach by demonstrating the value of exploiting data correlations during query processing and by evaluating a number of optimizations that significantly accelerate query processing.
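Why correlations matter during query processing can be illustrated with a toy example. The sketch below is not BayesStore's first-order model or its operator set; it assumes a hypothetical three-variable Bayesian network in which a hidden variable `hot` influences two noisy binary sensor readings (`s1_high`, `s2_high`), with made-up probabilities. A selection-style query ("both sensors read high") is answered by exact inference by enumeration and then compared with the answer a tuple-independent model would give by multiplying per-sensor marginals.

```python
from itertools import product

# Hypothetical model: all names and probabilities are illustrative assumptions.
VARIABLES = ["hot", "s1_high", "s2_high"]
PARENTS = {"hot": [], "s1_high": ["hot"], "s2_high": ["hot"]}
# Each CPT maps a tuple of parent values to P(variable = 1 | parents).
CPT = {
    "hot": {(): 0.3},
    "s1_high": {(1,): 0.9, (0,): 0.2},
    "s2_high": {(1,): 0.8, (0,): 0.1},
}


def prob_of(var, value, assignment):
    """P(var = value | parents) for a binary variable, read from its CPT."""
    parent_vals = tuple(assignment[p] for p in PARENTS[var])
    p_true = CPT[var][parent_vals]
    return p_true if value == 1 else 1.0 - p_true


def joint(assignment):
    """Joint probability via the Bayesian-network factorization."""
    result = 1.0
    for var in VARIABLES:
        result *= prob_of(var, assignment[var], assignment)
    return result


def query_prob(predicate):
    """Exact inference by enumeration: sum the mass of worlds satisfying predicate."""
    total = 0.0
    for values in product([0, 1], repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if predicate(assignment):
            total += joint(assignment)
    return total


# Answer that respects the correlation induced by the shared parent "hot".
p_both = query_prob(lambda a: a["s1_high"] == 1 and a["s2_high"] == 1)

# Answer under a tuple-independence assumption: product of the marginals.
p_s1 = query_prob(lambda a: a["s1_high"] == 1)
p_s2 = query_prob(lambda a: a["s2_high"] == 1)
p_indep = p_s1 * p_s2

print(f"with correlations: {p_both:.4f}")         # with correlations: 0.2300
print(f"independence assumption: {p_indep:.4f}")  # independence assumption: 0.1271
```

With these assumed numbers, ignoring the correlation underestimates the probability that both sensors read high by almost half. Enumeration is exponential in the number of variables, so it only serves to show why the two answers differ; practical systems typically rely on more scalable inference techniques such as variable elimination.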

