Quality-optimized predictive analytics

Applied Intelligence

Abstract

Online statistical and machine learning analytics over large-scale contextual data streams, coming from, e.g., wireless sensor networks and Internet of Things environments, have become popular because of their significance for knowledge extraction, regression, and classification and, more generally, for making sense of large-scale streaming data. The quality of the received contextual information, however, affects predictive analytics tasks, especially when the data are uncertain, contain outliers, or have missing values. Low-quality contextual data significantly degrade progressive inference and online statistical reasoning, and thus introduce bias into the induced knowledge, e.g., in classification and decision making. To alleviate this situation, which is not rare in real-time contextual information processing systems, we propose a progressive, time-optimized, data quality-aware mechanism that attempts to deliver high-quality contextual information to predictive analytics engines by progressively introducing a controlled delay. The mechanism delivers data of the highest quality possible, thus eliminating possible biases in knowledge extraction and predictive analysis tasks. We propose an analytical model for this mechanism and show the benefits that stem from this approach through a comprehensive experimental evaluation and a comparative assessment against quality-unaware methods over real multivariate sensory contextual data.

Author information

Corresponding author

Correspondence to Christos Anagnostopoulos.

Cite this article

Anagnostopoulos, C. Quality-optimized predictive analytics. Appl Intell 45, 1034–1046 (2016). https://doi.org/10.1007/s10489-016-0807-x

  • DOI: https://doi.org/10.1007/s10489-016-0807-x
