skip to main content
10.1145/2541940.2541958acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Uncertain<T>: a first-order type for uncertain data

Published:24 February 2014Publication History

ABSTRACT

Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives.

This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy.

Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.

References

  1. D. Barbara, H. Garcia-Molina, and D. Porter. The management of probabilistic data. phIEEE Transactions on Knowledge and Data Engineering, 4 (5): 487--502, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In phACM International Conference on Very Large Data Bases (VLDB), pages 953--964, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. E. R. Berlekamp, J. H. Conway, and R. K. Guy. phWinning Ways for Your Mathematical Plays, volume 4. A K Peters, 2004.Google ScholarGoogle Scholar
  4. S. Bhat, A. Agarwal, R. Vuduc, and A. Gray. A type theory for probability density functions. In phACM Symposium on Principles of Programming Languages (POPL), pages 545--556, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. M. Bishop. phPattern Recognition and Machine Learning. Springer, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. et al.(2011)Borgström, Gordon, Greenberg, Margetson, and Van Gael}BGGGoogle ScholarGoogle Scholar
  7. :11J. Borgström, A. D. Gordon, M. Greenberg, J. Margetson, and J. Van Gael. Measure transformer semantics for Bayesian machine learning. In phEuropean Symposium on Programming (ESOP), pages 77--96, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Bornholt. phAbstractions and techniques for programming with uncertain data. Honors thesis, Australian National University, 2013.Google ScholarGoogle Scholar
  9. G. E. P. Box and M. E. Muller. A note on the generation of random normal deviates. phThe Annals of Mathematical Statistics, 29 (2): 610--611, 1958.Google ScholarGoogle ScholarCross RefCross Ref
  10. M. Carbin, S. Misailovic, and M. C. Rinard. Verifying quantitative reliability of programs that execute on unreliable hardware. In phACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 33--52, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. T. Chaganty, A. V. Nori, and S. K. Rajamani. Efficiently sampling probabilistic programs via program analysis. In phProceedings of the 16th international conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, April 29 - May 1, 2013, pages 153--160. JMLR, 2013.Google ScholarGoogle Scholar
  12. N. Dalvi and D. Suciu. Management of probabilistic data: Foundations and challenges. In phACM Symposium on Principles of Database Systems (PODS), pages 1--12, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. In phIEEE/ACM International Symposium on Microarchitecture (MICRO), pages 449--460, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. W. R. Gilks, A. Thomas, and D. J. Spiegelhalter. A language and program for complex Bayesian modelling. phJournal of the Royal Statistical Society. Series D (The Statistician), 43 (1): 169--177, 1994.Google ScholarGoogle Scholar
  15. M. Giry. A categorical approach to probability theory. In phCategorical Aspects of Topology and Analysis, volume 915 of phLecture Notes in Mathematics, pages 68--85. 1982.Google ScholarGoogle Scholar
  16. N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum. Church: A language for generative models. In phConference in Uncertainty in Artificial Intelligence (UAI), pages 220--229, 2008.Google ScholarGoogle Scholar
  17. S. Jaroszewicz and M. Korzen. Arithmetic operations on independent random variables: A numerical approach. phSIAM Journal on Scientific Computing, 34: A1241--A1265, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  18. C. Jennison and B. W. Turnbull. phGroup sequential methods with applications to clinical trials. Chapman & Hall, 2000.Google ScholarGoogle Scholar
  19. KNR:13A. Kumar, F. Niu, and C. Ré. Hazy: Making it Easier to Build and Maintain Big-data Analytics. phACM Queue, 11 (1): 30--46, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. E. Moore. phInterval analysis. Prentice-Hall, 1966.Google ScholarGoogle Scholar
  21. R. M. Neal. phBayesian learning for neural networks. PhD thesis, University of Toronto, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Newson and J. Krumm. Hidden Markov map matching through noise and sparseness. In phACM International Conference on Advances in Geographic Information Systems (GIS), pages 336--343, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Papoulis and S. U. Pillai. phProbability, random variables, and stochastic processes. New York, NY, 4th edition, 2000.Google ScholarGoogle Scholar
  24. S. Park, F. Pfenning, and S. Thrun. A probabilistic language based on sampling functions. In phACM Symposium on Principles of Programming Languages (POPL), pages 171--182, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Pfeffer. IBAL: a probabilistic rational programming language. In phInternational Joint Conference on Artificial Intelligence (IJCAI), pages 733--740, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. Ramsey and A. Pfeffer. Stochastic lambda calculus and monads of probability distributions. In phACM Symposium on Principles of Programming Languages (POPL), pages 154--165, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. EnerJ: Approximate data types for safe and general low-power computation. In phACM Conference on Programming Language Design and Implementation (PLDI), pages 164--174, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Sankaranarayanan, A. Chakarov, and S. Gulwani. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In phACM Conference on Programming Language Design and Implementation (PLDI), pages 447--458, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Schwarz, J. Mankoff, and S. E. Hudson. Monte Carlo methods for managing interactive state, action and feedback under uncertainty. In phACM Symposium on User Interface Software and Technology (UIST), pages 235--144, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Thompson. Global positioning system: the mathematics of GPS receivers. phMathematics Magazine, 71 (4): 260--269, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  31. S. Thrun. Towards programming tools for robots that integrate probabilistic computation and learning. In phIEEE International Conference on Robotics and Automation (ICRA), pages 306--312, 2000.Google ScholarGoogle Scholar
  32. 1970)}T:70F. Topsøe. On the Glivenko-Cantelli theorem. phZeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 14: 239--250, 1970.Google ScholarGoogle Scholar
  33. A. Wald. Sequential Tests of Statistical Hypotheses. phThe Annals of Mathematical Statistics, 16 (2): 117--186, 1945.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Uncertain<T>: a first-order type for uncertain data

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
      February 2014
      780 pages
      ISBN:9781450323055
      DOI:10.1145/2541940

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 February 2014

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      ASPLOS '14 Paper Acceptance Rate49of217submissions,23%Overall Acceptance Rate535of2,713submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader