ABSTRACT
Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives.
This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy.
Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.
- D. Barbara, H. Garcia-Molina, and D. Porter. The management of probabilistic data. phIEEE Transactions on Knowledge and Data Engineering, 4 (5): 487--502, 1992. Google ScholarDigital Library
- O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In phACM International Conference on Very Large Data Bases (VLDB), pages 953--964, 2006. Google ScholarDigital Library
- E. R. Berlekamp, J. H. Conway, and R. K. Guy. phWinning Ways for Your Mathematical Plays, volume 4. A K Peters, 2004.Google Scholar
- S. Bhat, A. Agarwal, R. Vuduc, and A. Gray. A type theory for probability density functions. In phACM Symposium on Principles of Programming Languages (POPL), pages 545--556, 2012. Google ScholarDigital Library
- C. M. Bishop. phPattern Recognition and Machine Learning. Springer, 2006. Google ScholarDigital Library
- et al.(2011)Borgström, Gordon, Greenberg, Margetson, and Van Gael}BGGGoogle Scholar
- :11J. Borgström, A. D. Gordon, M. Greenberg, J. Margetson, and J. Van Gael. Measure transformer semantics for Bayesian machine learning. In phEuropean Symposium on Programming (ESOP), pages 77--96, 2011. Google ScholarDigital Library
- J. Bornholt. phAbstractions and techniques for programming with uncertain data. Honors thesis, Australian National University, 2013.Google Scholar
- G. E. P. Box and M. E. Muller. A note on the generation of random normal deviates. phThe Annals of Mathematical Statistics, 29 (2): 610--611, 1958.Google ScholarCross Ref
- M. Carbin, S. Misailovic, and M. C. Rinard. Verifying quantitative reliability of programs that execute on unreliable hardware. In phACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 33--52, 2013. Google ScholarDigital Library
- A. T. Chaganty, A. V. Nori, and S. K. Rajamani. Efficiently sampling probabilistic programs via program analysis. In phProceedings of the 16th international conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, April 29 - May 1, 2013, pages 153--160. JMLR, 2013.Google Scholar
- N. Dalvi and D. Suciu. Management of probabilistic data: Foundations and challenges. In phACM Symposium on Principles of Database Systems (PODS), pages 1--12, 2007. Google ScholarDigital Library
- H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. In phIEEE/ACM International Symposium on Microarchitecture (MICRO), pages 449--460, 2012. Google ScholarDigital Library
- W. R. Gilks, A. Thomas, and D. J. Spiegelhalter. A language and program for complex Bayesian modelling. phJournal of the Royal Statistical Society. Series D (The Statistician), 43 (1): 169--177, 1994.Google Scholar
- M. Giry. A categorical approach to probability theory. In phCategorical Aspects of Topology and Analysis, volume 915 of phLecture Notes in Mathematics, pages 68--85. 1982.Google Scholar
- N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum. Church: A language for generative models. In phConference in Uncertainty in Artificial Intelligence (UAI), pages 220--229, 2008.Google Scholar
- S. Jaroszewicz and M. Korzen. Arithmetic operations on independent random variables: A numerical approach. phSIAM Journal on Scientific Computing, 34: A1241--A1265, 2012.Google ScholarCross Ref
- C. Jennison and B. W. Turnbull. phGroup sequential methods with applications to clinical trials. Chapman & Hall, 2000.Google Scholar
- KNR:13A. Kumar, F. Niu, and C. Ré. Hazy: Making it Easier to Build and Maintain Big-data Analytics. phACM Queue, 11 (1): 30--46, 2013. Google ScholarDigital Library
- R. E. Moore. phInterval analysis. Prentice-Hall, 1966.Google Scholar
- R. M. Neal. phBayesian learning for neural networks. PhD thesis, University of Toronto, 1994. Google ScholarDigital Library
- P. Newson and J. Krumm. Hidden Markov map matching through noise and sparseness. In phACM International Conference on Advances in Geographic Information Systems (GIS), pages 336--343, 2009. Google ScholarDigital Library
- A. Papoulis and S. U. Pillai. phProbability, random variables, and stochastic processes. New York, NY, 4th edition, 2000.Google Scholar
- S. Park, F. Pfenning, and S. Thrun. A probabilistic language based on sampling functions. In phACM Symposium on Principles of Programming Languages (POPL), pages 171--182, 2005. Google ScholarDigital Library
- A. Pfeffer. IBAL: a probabilistic rational programming language. In phInternational Joint Conference on Artificial Intelligence (IJCAI), pages 733--740, 2001. Google ScholarDigital Library
- N. Ramsey and A. Pfeffer. Stochastic lambda calculus and monads of probability distributions. In phACM Symposium on Principles of Programming Languages (POPL), pages 154--165, 2002. Google ScholarDigital Library
- A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. EnerJ: Approximate data types for safe and general low-power computation. In phACM Conference on Programming Language Design and Implementation (PLDI), pages 164--174, 2011. Google ScholarDigital Library
- S. Sankaranarayanan, A. Chakarov, and S. Gulwani. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In phACM Conference on Programming Language Design and Implementation (PLDI), pages 447--458, 2013. Google ScholarDigital Library
- J. Schwarz, J. Mankoff, and S. E. Hudson. Monte Carlo methods for managing interactive state, action and feedback under uncertainty. In phACM Symposium on User Interface Software and Technology (UIST), pages 235--144, 2011. Google ScholarDigital Library
- R. Thompson. Global positioning system: the mathematics of GPS receivers. phMathematics Magazine, 71 (4): 260--269, 1998.Google ScholarCross Ref
- S. Thrun. Towards programming tools for robots that integrate probabilistic computation and learning. In phIEEE International Conference on Robotics and Automation (ICRA), pages 306--312, 2000.Google Scholar
- 1970)}T:70F. Topsøe. On the Glivenko-Cantelli theorem. phZeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 14: 239--250, 1970.Google Scholar
- A. Wald. Sequential Tests of Statistical Hypotheses. phThe Annals of Mathematical Statistics, 16 (2): 117--186, 1945.Google ScholarCross Ref
Index Terms
- Uncertain<T>: a first-order type for uncertain data
Recommendations
Uncertain<T>: a first-order type for uncertain data
ASPLOS '14Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages ...
Uncertain<T>: a first-order type for uncertain data
ASPLOS '14Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages ...
Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain ...
Comments