Abstract
Information quality (IQ) is a measure of how fit information is for a purpose. Sometimes called Quality of Information (QoI) by analogy with Quality of Service (QoS), it quantifies whether the correct information is being used to make a decision or take an action. Not understanding when information is of adequate quality can lead to bad decisions and catastrophic effects, including system outages, increased costs, lost revenue -- and worse. Quantifying information quality can help improve decision making, but the ultimate goal should be to select or construct information producers that have the appropriate balance between information quality and the cost of providing it. In this paper, we provide a brief introduction to the field, argue the case for applying information quality metrics in the systems domain, and propose a research agenda to explore this space.
Index Terms
- Do you know your IQ?: a research agenda for information quality in systems