Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 5/2016

01.09.2016

Irrevocable-choice algorithms for sampling from a stream

verfasst von: Yan Zhu, Eamonn Keogh

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The problem of sampling from data streams has attracted significant interest in the last decade. Whichever sampling criteria is considered (uniform sample, maximally diverse sample, etc.), the challenges stem from the relatively small amount of memory available in the face of unbounded streams. In this work we consider an interesting extension of this problem, the framework of which is stimulated by recent improvements in sensing technologies and robotics. In some situations it is not only possible to digitally sense some aspects of the world, but to physically capture a tangible aspect of that world. Currently deployed examples include devices that can capture water/air samples, and devices that capture individual insects or fish. Such devices create an interesting twist on the stream sampling problem, because in most cases, the decision to take a physical sample is irrevocable. In this work we show how to generalize diversification sampling strategies to the irrevocable-choice setting, demonstrating our ideas on several real world domains.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Aggarwal CC (2006) Data streams: models and algorithms (advances in database systems). Springer, New York Aggarwal CC (2006) Data streams: models and algorithms (advances in database systems). Springer, New York
Zurück zum Zitat Anderson R et al (2010) Mars Science Laboratory participating scientists program proposal information package. NASA/Jet Propulsion Laboratory, Pasadena Anderson R et al (2010) Mars Science Laboratory participating scientists program proposal information package. NASA/Jet Propulsion Laboratory, Pasadena
Zurück zum Zitat Baldridge AM, Hook SJ, Grove CI, Rivera G (2009) The ASTER spectral library version 2.0. Remote Sens Environ 113(4):711–715CrossRef Baldridge AM, Hook SJ, Grove CI, Rivera G (2009) The ASTER spectral library version 2.0. Remote Sens Environ 113(4):711–715CrossRef
Zurück zum Zitat Begum N, Keogh E (2014) Rare time series motif discovery from unbounded streams. Proc VLDB Endow 8(2):149–160CrossRef Begum N, Keogh E (2014) Rare time series motif discovery from unbounded streams. Proc VLDB Endow 8(2):149–160CrossRef
Zurück zum Zitat Bowman A, Azzalini A (1997) Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford University Press, New YorkMATH Bowman A, Azzalini A (1997) Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford University Press, New YorkMATH
Zurück zum Zitat Cerra D, Bieniarz J, Avbelj J, Reinartz P, Mueller R (2011) Compression-based unsupervised clustering of spectral signatures. Whispers, Oro ValleyCrossRef Cerra D, Bieniarz J, Avbelj J, Reinartz P, Mueller R (2011) Compression-based unsupervised clustering of spectral signatures. Whispers, Oro ValleyCrossRef
Zurück zum Zitat Chen Y, Why A, Batista G, Mafra-Neto A, Keogh E (2014) Flying insect classification with inexpensive sensors. J Insect Behav 27(5):657–677CrossRef Chen Y, Why A, Batista G, Mafra-Neto A, Keogh E (2014) Flying insect classification with inexpensive sensors. J Insect Behav 27(5):657–677CrossRef
Zurück zum Zitat Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20CrossRef Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20CrossRef
Zurück zum Zitat Drosou M, Pitoura E (2012a) Disc diversity: result diversification based on dissimilarity and coverage. Proc VLDB Endow 6(1):13–24 Drosou M, Pitoura E (2012a) Disc diversity: result diversification based on dissimilarity and coverage. Proc VLDB Endow 6(1):13–24
Zurück zum Zitat Drosou M, Pitoura E (2012b) Dynamic diversification of continuous data. In: Proceedings of the 15th EDBT/ICDT, ACM, pp 216–227 Drosou M, Pitoura E (2012b) Dynamic diversification of continuous data. In: Proceedings of the 15th EDBT/ICDT, ACM, pp 216–227
Zurück zum Zitat Erkut E, Ülküsal Y, Yenicerioğlu O (1994) A comparison of p-dispersion heuristics. Comput Oper Res 21(10):1103–1113CrossRefMATH Erkut E, Ülküsal Y, Yenicerioğlu O (1994) A comparison of p-dispersion heuristics. Comput Oper Res 21(10):1103–1113CrossRefMATH
Zurück zum Zitat Fønss A, Munksgaard L (2008) Automatic blood sampling in dairy cows. Comput Electron Agric 64(1):27–33CrossRef Fønss A, Munksgaard L (2008) Automatic blood sampling in dairy cows. Comput Electron Agric 64(1):27–33CrossRef
Zurück zum Zitat Goldberg D (2011) Huxley: a flexible robot control architecture for autonomous underwater vehicles. In: Proceedings of IEEE OCEANS conference (Spain, 2011), pp 1–10 Goldberg D (2011) Huxley: a flexible robot control architecture for autonomous underwater vehicles. In: Proceedings of IEEE OCEANS conference (Spain, 2011), pp 1–10
Zurück zum Zitat Hill TP (2009) Knowing when to stop: how to gamble if you must—the mathematics of optimal stopping. Am Sci 97(2):126–133CrossRef Hill TP (2009) Knowing when to stop: how to gamble if you must—the mathematics of optimal stopping. Am Sci 97(2):126–133CrossRef
Zurück zum Zitat Honda MC, Watanabe S (2007) Utility of an automatic water sampler to observe seasonal variability in nutrients and DIC in the Northwestern North Pacific. J Oceanogr 63(3):349–362CrossRef Honda MC, Watanabe S (2007) Utility of an automatic water sampler to observe seasonal variability in nutrients and DIC in the Northwestern North Pacific. J Oceanogr 63(3):349–362CrossRef
Zurück zum Zitat Jonsson F (2015) Real-time fish type recognition in underwater images for sustainable fishing. Technical report. Uppsala University, Uppsala Jonsson F (2015) Real-time fish type recognition in underwater images for sustainable fishing. Technical report. Uppsala University, Uppsala
Zurück zum Zitat Minack E, Siberski W, Nejdl W (2011) Incremental diversification for very large sets: a streaming-based approach. In: ACM SIGIR (July 2011), pp 585–594 Minack E, Siberski W, Nejdl W (2011) Incremental diversification for very large sets: a streaming-based approach. In: ACM SIGIR (July 2011), pp 585–594
Zurück zum Zitat Peskir G, Shiryaev A (2006) Optimal stopping and free-boundary problems. Lectures in Mathematics. ETH, ZürichMATH Peskir G, Shiryaev A (2006) Optimal stopping and free-boundary problems. Lectures in Mathematics. ETH, ZürichMATH
Zurück zum Zitat Roman C, Mather R (2010) Autonomous underwater vehicles as tools for deep-submergence archaeology. Eng Marit Environ 224(4):327–340 Roman C, Mather R (2010) Autonomous underwater vehicles as tools for deep-submergence archaeology. Eng Marit Environ 224(4):327–340
Zurück zum Zitat Silver JB (2008) Chapter 14: estimating the size of the adult population. Mosquito ecology field sampling methods, 3rd edn. Springer, New York Silver JB (2008) Chapter 14: estimating the size of the adult population. Mosquito ecology field sampling methods, 3rd edn. Springer, New York
Zurück zum Zitat Webster G, Agle DC (2012) Mars Science Laboratory/Curiosity Mission status report. NASA, New York Webster G, Agle DC (2012) Mars Science Laboratory/Curiosity Mission status report. NASA, New York
Zurück zum Zitat Zhang D et al (2015) Automatic fish taxonomy using evolution-constructed features for invasive species removal. Pattern Anal Appl 18(2):451–459MathSciNetCrossRef Zhang D et al (2015) Automatic fish taxonomy using evolution-constructed features for invasive species removal. Pattern Anal Appl 18(2):451–459MathSciNetCrossRef
Metadaten
Titel
Irrevocable-choice algorithms for sampling from a stream
verfasst von
Yan Zhu
Eamonn Keogh
Publikationsdatum
01.09.2016
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 5/2016
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-016-0472-z

Weitere Artikel der Ausgabe 5/2016

Data Mining and Knowledge Discovery 5/2016 Zur Ausgabe

Premium Partner