Leveraging spatio-temporal redundancy for RFID data cleansing

Authors:
Haiquan Chen, Auburn University, Auburn, AL, USA
Wei-Shinn Ku, Auburn University, Auburn, AL, USA
Haixun Wang, Microsoft Research Asia, Beijing, China
Min-Te Sun, National Central University, Taoyuan, Taiwan ROC

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, June 2010, Pages 51–62. https://doi.org/10.1145/1807167.1807176

Published: 06 June 2010

ABSTRACT

Radio Frequency Identification (RFID) technologies are used in many applications for data collection. However, raw RFID readings are usually of low quality and may contain many anomalies. An ideal solution for RFID data cleansing should address the following issues. First, in many applications, duplicate readings of the same object (by multiple readers simultaneously or by a single reader over a period of time) are very common. The solution should take advantage of the resulting data redundancy for data cleaning. Second, prior knowledge about the readers and the environment (e.g., prior data distribution, false negative rates of readers) may help improve data quality and remove data anomalies, and a desirable solution must be able to quantify the degree of uncertainty based on such knowledge. Third, the solution should take advantage of given constraints in target applications (e.g., the number of objects in the same location cannot exceed a given value) to improve the accuracy of data cleansing. There are a number of existing RFID data cleansing techniques. However, none of them supports all the aforementioned features. In this paper we propose a Bayesian inference-based approach for cleaning raw RFID data. Our approach takes full advantage of data redundancy. To capture the likelihood, we design an n-state detection model and formally prove that the 3-state model can maximize the system performance. Moreover, in order to sample from the posterior, we devise a Metropolis-Hastings sampler with Constraints (MH-C), which incorporates constraint management to clean raw RFID data with high efficiency and accuracy. We validate our solution with a common RFID application and demonstrate the advantages of our approach through extensive simulations.
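To make the sampling idea in the abstract concrete, the following is a minimal sketch of a Metropolis-Hastings sampler that respects a capacity constraint. It is not the paper's MH-C algorithm: the one-dimensional row of locations (one reader per location), the uniform prior, the three detection rates in `DETECT_PROB`, and the names `log_likelihood` and `mh_with_constraints` are all assumptions made for illustration. The constraint is handled by plain rejection of over-capacity proposals, which gives such states zero posterior mass.

```python
import math
import random

# Hypothetical detection rates indexed by distance between a reader and the
# object's location: 0 = same cell, 1 = adjacent cell, 2 = farther away.
# These values are illustrative assumptions, not taken from the paper.
DETECT_PROB = (0.9, 0.4, 0.02)


def log_likelihood(loc, reads, n_scans):
    """Log-probability of one object's per-reader read counts if it is at `loc`."""
    total = 0.0
    for reader, k in enumerate(reads):
        p = DETECT_PROB[min(abs(reader - loc), 2)]
        # Each of the n_scans scans is treated as an independent Bernoulli trial.
        total += k * math.log(p) + (n_scans - k) * math.log(1.0 - p)
    return total


def mh_with_constraints(readings, n_scans, capacity,
                        n_iter=5000, burn_in=500, seed=0):
    """Estimate per-object location marginals; no location exceeds `capacity`."""
    rng = random.Random(seed)
    n_obj, n_loc = len(readings), len(readings[0])
    if n_obj > capacity * n_loc:
        raise ValueError("capacity constraint cannot be satisfied")

    # Feasible start: round-robin placement never exceeds the capacity.
    state = [i % n_loc for i in range(n_obj)]
    occupancy = [state.count(loc) for loc in range(n_loc)]
    counts = [[0] * n_loc for _ in range(n_obj)]

    for it in range(n_iter):
        obj = rng.randrange(n_obj)
        new = rng.randrange(n_loc)  # symmetric uniform proposal
        old = state[obj]
        # Proposals that would overfill a location are rejected outright.
        if new != old and occupancy[new] < capacity:
            log_ratio = (log_likelihood(new, readings[obj], n_scans)
                         - log_likelihood(old, readings[obj], n_scans))
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                state[obj] = new
                occupancy[old] -= 1
                occupancy[new] += 1
        if it >= burn_in:
            for o, loc in enumerate(state):
                counts[o][loc] += 1

    kept = n_iter - burn_in
    return [[c / kept for c in row] for row in counts]


if __name__ == "__main__":
    # Two objects, three readers, ten scans each; at most one object per location.
    readings = [[9, 3, 0], [0, 4, 9]]
    for row in mh_with_constraints(readings, n_scans=10, capacity=1):
        print([round(p, 3) for p in row])
```

Each returned row is an estimated posterior distribution over locations for one object, so the uncertainty of a cleaned reading is available directly. Single-object moves like these can mix slowly, or stall entirely, when the constraint is tight (for example when every location is already full), which is one reason a sampler designed around the constraints may be preferable.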

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines

Published in
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010, 1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167
General Chair: Ahmed Elmagarmid, Purdue University, USA
Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data cleaning
probabilistic algorithms
spatio-temporal databases
uncertainty
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%