research-article

Uncertain<T>: a first-order type for uncertain data

Authors:
James Bornholt

Australian National University, Canberra, Australia

Australian National University, Canberra, Australia
View Profile

,
Todd Mytkowicz

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

,
Kathryn S. McKinley

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systemsFebruary 2014Pages 51–66https://doi.org/10.1145/2541940.2541958

Published:24 February 2014Publication History

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Pages 51–66

ABSTRACT

Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives.

This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy.

Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.

References

D. Barbara, H. Garcia-Molina, and D. Porter. The management of probabilistic data. phIEEE Transactions on Knowledge and Data Engineering, 4 (5): 487--502, 1992. Google ScholarDigital Library
O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In phACM International Conference on Very Large Data Bases (VLDB), pages 953--964, 2006. Google ScholarDigital Library
E. R. Berlekamp, J. H. Conway, and R. K. Guy. phWinning Ways for Your Mathematical Plays, volume 4. A K Peters, 2004.Google Scholar
S. Bhat, A. Agarwal, R. Vuduc, and A. Gray. A type theory for probability density functions. In phACM Symposium on Principles of Programming Languages (POPL), pages 545--556, 2012. Google ScholarDigital Library
C. M. Bishop. phPattern Recognition and Machine Learning. Springer, 2006. Google ScholarDigital Library
et al.(2011)Borgström, Gordon, Greenberg, Margetson, and Van Gael}BGGGoogle Scholar
:11J. Borgström, A. D. Gordon, M. Greenberg, J. Margetson, and J. Van Gael. Measure transformer semantics for Bayesian machine learning. In phEuropean Symposium on Programming (ESOP), pages 77--96, 2011. Google ScholarDigital Library
J. Bornholt. phAbstractions and techniques for programming with uncertain data. Honors thesis, Australian National University, 2013.Google Scholar
G. E. P. Box and M. E. Muller. A note on the generation of random normal deviates. phThe Annals of Mathematical Statistics, 29 (2): 610--611, 1958.Google ScholarCross Ref
M. Carbin, S. Misailovic, and M. C. Rinard. Verifying quantitative reliability of programs that execute on unreliable hardware. In phACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 33--52, 2013. Google ScholarDigital Library
A. T. Chaganty, A. V. Nori, and S. K. Rajamani. Efficiently sampling probabilistic programs via program analysis. In phProceedings of the 16th international conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, April 29 - May 1, 2013, pages 153--160. JMLR, 2013.Google Scholar
N. Dalvi and D. Suciu. Management of probabilistic data: Foundations and challenges. In phACM Symposium on Principles of Database Systems (PODS), pages 1--12, 2007. Google ScholarDigital Library
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. In phIEEE/ACM International Symposium on Microarchitecture (MICRO), pages 449--460, 2012. Google ScholarDigital Library
W. R. Gilks, A. Thomas, and D. J. Spiegelhalter. A language and program for complex Bayesian modelling. phJournal of the Royal Statistical Society. Series D (The Statistician), 43 (1): 169--177, 1994.Google Scholar
M. Giry. A categorical approach to probability theory. In phCategorical Aspects of Topology and Analysis, volume 915 of phLecture Notes in Mathematics, pages 68--85. 1982.Google Scholar
N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum. Church: A language for generative models. In phConference in Uncertainty in Artificial Intelligence (UAI), pages 220--229, 2008.Google Scholar
S. Jaroszewicz and M. Korzen. Arithmetic operations on independent random variables: A numerical approach. phSIAM Journal on Scientific Computing, 34: A1241--A1265, 2012.Google ScholarCross Ref
C. Jennison and B. W. Turnbull. phGroup sequential methods with applications to clinical trials. Chapman & Hall, 2000.Google Scholar
KNR:13A. Kumar, F. Niu, and C. Ré. Hazy: Making it Easier to Build and Maintain Big-data Analytics. phACM Queue, 11 (1): 30--46, 2013. Google ScholarDigital Library
R. E. Moore. phInterval analysis. Prentice-Hall, 1966.Google Scholar
R. M. Neal. phBayesian learning for neural networks. PhD thesis, University of Toronto, 1994. Google ScholarDigital Library
P. Newson and J. Krumm. Hidden Markov map matching through noise and sparseness. In phACM International Conference on Advances in Geographic Information Systems (GIS), pages 336--343, 2009. Google ScholarDigital Library
A. Papoulis and S. U. Pillai. phProbability, random variables, and stochastic processes. New York, NY, 4th edition, 2000.Google Scholar
S. Park, F. Pfenning, and S. Thrun. A probabilistic language based on sampling functions. In phACM Symposium on Principles of Programming Languages (POPL), pages 171--182, 2005. Google ScholarDigital Library
A. Pfeffer. IBAL: a probabilistic rational programming language. In phInternational Joint Conference on Artificial Intelligence (IJCAI), pages 733--740, 2001. Google ScholarDigital Library
N. Ramsey and A. Pfeffer. Stochastic lambda calculus and monads of probability distributions. In phACM Symposium on Principles of Programming Languages (POPL), pages 154--165, 2002. Google ScholarDigital Library
A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. EnerJ: Approximate data types for safe and general low-power computation. In phACM Conference on Programming Language Design and Implementation (PLDI), pages 164--174, 2011. Google ScholarDigital Library
S. Sankaranarayanan, A. Chakarov, and S. Gulwani. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In phACM Conference on Programming Language Design and Implementation (PLDI), pages 447--458, 2013. Google ScholarDigital Library
J. Schwarz, J. Mankoff, and S. E. Hudson. Monte Carlo methods for managing interactive state, action and feedback under uncertainty. In phACM Symposium on User Interface Software and Technology (UIST), pages 235--144, 2011. Google ScholarDigital Library
R. Thompson. Global positioning system: the mathematics of GPS receivers. phMathematics Magazine, 71 (4): 260--269, 1998.Google ScholarCross Ref
S. Thrun. Towards programming tools for robots that integrate probabilistic computation and learning. In phIEEE International Conference on Robotics and Automation (ICRA), pages 306--312, 2000.Google Scholar
1970)}T:70F. Topsøe. On the Glivenko-Cantelli theorem. phZeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 14: 239--250, 1970.Google Scholar
A. Wald. Sequential Tests of Statistical Hypotheses. phThe Annals of Mathematical Statistics, 16 (2): 117--186, 1945.Google ScholarCross Ref

Index Terms

Uncertain<T>: a first-order type for uncertain data
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Data types and structures

Recommendations

Uncertain<T>: a first-order type for uncertain data
ASPLOS '14

Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages ...
Read More
Uncertain<T>: a first-order type for uncertain data
ASPLOS '14

Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages ...
Read More
Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods

Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
February 2014
780 pages
ISBN:9781450323055
DOI:10.1145/2541940
General Chairs:
Rajeev Balasubramonian
University of Utah
,
Al Davis
University of Utah
,
Program Chair:
Sarita Adve
University of Illinois at Urbana-Champ
ACM SIGARCH Computer Architecture News Volume 42, Issue 1
ASPLOS '14
March 2014
729 pages
ISSN:0163-5964
DOI:10.1145/2654822
Editor:
Doug DeGroot
acm dot org
Issue’s Table of Contents
ACM SIGPLAN Notices Volume 49, Issue 4
ASPLOS '14
April 2014
729 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2644865
Editors:
Mark W. Bailey
Hamilton College, Clinton, NY
,
Rajeev Balasubramonian
University of Utah
,
Al Davis
University of Utah
,
Sarita Adve
University of Illinois at Urbana-Champ
Issue’s Table of Contents
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 February 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
estimates
probabilistic programming
uncertain data
Qualifiers
- research-article
Conference

Acceptance Rates
ASPLOS '14 Paper Acceptance Rate49of217submissions,23%Overall Acceptance Rate535of2,713submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 88
  Total Citations
  View Citations
- 1,176
  Total Downloads
- Downloads (Last 12 months)52
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Uncertain<T>: a first-order type for uncertain data

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Uncertain<T>: a first-order type for uncertain data

Uncertain<T>: a first-order type for uncertain data

Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods