research-article

MCDB-R: risk analysis in the database

Published: 01 September 2010

Abstract

Enterprises often need to assess and manage the risk arising from uncertainty in their data. Such uncertainty is typically modeled as a probability distribution over the uncertain data values, specified by means of a complex (often predictive) stochastic model. The probability distribution over data values leads to a probability distribution over database query results, and risk assessment amounts to exploration of the upper or lower tail of a query-result distribution. In this paper, we extend the Monte Carlo Database System (MCDB) to efficiently obtain a set of samples from the tail of a query-result distribution by adapting recent "Gibbs cloning" ideas from the simulation literature to a database setting.
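
To make the notion of "exploring the tail of a query-result distribution" concrete, the following is a minimal sketch of the naive Monte Carlo baseline, not the Gibbs-cloning method the paper develops. Everything in it is an assumption made for illustration: the hypothetical `sample_query_result` function stands in for one realization of a SUM query, and each uncertain row value is taken to be an independent Normal(100, 15) draw. The sketch repeats the query many times, estimates an upper quantile, and keeps only the results at or above it.

```python
import random


def sample_query_result(rng, n_rows=50):
    """One Monte Carlo realization of a SUM query over uncertain rows.

    Hypothetical model (an assumption, not from the paper): every row's
    value is an independent Normal(mean=100, stddev=15) draw.
    """
    return sum(rng.gauss(100.0, 15.0) for _ in range(n_rows))


def upper_tail_samples(n_samples, tail_prob, seed=0):
    """Return (quantile_estimate, tail_samples) using plain Monte Carlo.

    The quantile estimate is the empirical (1 - tail_prob) quantile of the
    simulated query results; tail_samples are the results at or above it.
    """
    rng = random.Random(seed)
    results = sorted(sample_query_result(rng) for _ in range(n_samples))
    # Position of the empirical (1 - tail_prob) quantile in the sorted list.
    cut = round((1.0 - tail_prob) * n_samples)
    cut = min(max(cut, 0), n_samples - 1)
    return results[cut], results[cut:]


# 10,000 full query evaluations yield only about 100 samples in the top 1%.
quantile, tail = upper_tail_samples(n_samples=10_000, tail_prob=0.01)
print(f"estimated 99% quantile: {quantile:.1f}, tail samples kept: {len(tail)}")
```

Under these assumptions, roughly 99% of the simulated query results are discarded, which illustrates why a method that concentrates sampling effort in the tail is attractive for risk analysis.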


Published in

Proceedings of the VLDB Endowment, Volume 3, Issue 1-2
September 2010
1658 pages

Publisher

VLDB Endowment
