Published in: International Journal of Data Science and Analytics, Issue 1/2019

6 June 2018 | Applications

Automatic water mixing event identification in the Koljö fjord observatory data

Authors: Markus Götz, Mikhail Kononets, Christian Bodenstein, Morris Riedel, Matthias Book, Olafur Petur Palsson

Abstract

This study addresses the task of automatically identifying water mixing events in the multivariate time series of salinity, temperature and dissolved oxygen provided by the Koljö fjord observatory. The observatory, which has been collecting data since April 2011, is used to test new underwater sensor technology and to monitor water quality in the fjord with respect to hypoxia and oxygenation. The fjord's water properties change when inflows of new water from the open sea or from rivers connected to the fjord system enter it; these changes manifest as peaks or drops in dissolved oxygen, salinity and temperature. An acute state of oxygen depletion can permanently harm wildlife and the ecosystem. The major challenge for the analysis is that these water property changes vary strongly both in peak strength and in how closely the signals are correlated. The proposed data-driven method extends existing univariate outlier detection approaches with clustering techniques to identify the water mixing events. It comprises three major steps: (1) smoothing the input data to counter noise, (2) detecting outliers individually in each variable, and (3) clustering the results with the DBSCAN algorithm to determine the anomalous events. The approach detects the water mixing events with an \(F_1\)-measure of 0.885, a precision of 0.931 (93.1% of the detected events are actual water mixing events) and a recall of 0.843 (84.3% of the actual water mixing events were detected). With this method, oceanographers can be informed automatically about the status of the fjord, without manual interaction or physical presence at the experiment site.
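The three steps named in the abstract can be sketched in Python roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the centered running median, the MAD-based outlier score on first differences, the hand-written one-dimensional DBSCAN, all helper names (`rolling_median`, `flag_outliers`, `dbscan_1d`, `detect_events`, `f1_score`) and all parameter values (`window`, `threshold`, `eps`, `min_samples`) are hypothetical choices for the example, and the data is synthetic. The last lines also check that the reported precision and recall yield the stated \(F_1\)-measure.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def rolling_median(values: Sequence[float], window: int) -> List[float]:
    """Step 1: centered running median; the window shrinks at the series edges."""
    half = window // 2
    return [
        statistics.median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def flag_outliers(values: Sequence[float], threshold: float = 3.5,
                  min_scale: float = 1e-6) -> List[int]:
    """Step 2: flag sample indices where the smoothed signal jumps.

    Assumption: a robust (MAD-based) z-score of the first differences stands
    in for the paper's univariate outlier detector.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    med = statistics.median(diffs)
    # Floor the MAD so that perfectly flat stretches do not divide by zero.
    mad = max(statistics.median(abs(d - med) for d in diffs), min_scale)
    return [i + 1 for i, d in enumerate(diffs)
            if 0.6745 * abs(d - med) / mad > threshold]


def dbscan_1d(points: Sequence[int], eps: float,
              min_samples: int) -> List[List[int]]:
    """Step 3: minimal DBSCAN over scalar time indices; noise points are dropped."""
    pts = sorted(points)
    # Neighborhoods include the point itself, as in the original DBSCAN definition.
    neighbors = [[j for j, q in enumerate(pts) if abs(p - q) <= eps] for p in pts]
    labels: List[Optional[int]] = [None] * len(pts)  # None: unvisited, -1: noise
    n_clusters = 0
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cid, n_clusters = n_clusters, n_clusters + 1
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cid
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: expand the cluster
    return [[p for p, lab in zip(pts, labels) if lab == c]
            for c in range(n_clusters)]


def detect_events(series: Dict[str, Sequence[float]], window: int = 5,
                  eps: float = 3, min_samples: int = 2) -> List[Tuple[int, int]]:
    """Run steps 1-3 and return each event as a (first, last) sample index."""
    flagged: List[int] = []
    for values in series.values():
        flagged.extend(flag_outliers(rolling_median(values, window)))
    return [(c[0], c[-1]) for c in dbscan_1d(flagged, eps, min_samples)]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Synthetic example: a joint shift in all three variables around sample 20,
# an isolated oxygen spike (removed by smoothing) and a lone temperature
# step at sample 33 (rejected by DBSCAN as noise).
oxygen = [2.0] * 20 + [5.0] * 20
oxygen[5] = 9.0
salinity = [30.0] * 21 + [32.0] * 19
temperature = [8.0] * 19 + [6.0] * 14 + [6.5] * 7

events = detect_events({"oxygen": oxygen, "salinity": salinity,
                        "temperature": temperature})
print(events)                            # [(19, 21)]
print(round(f1_score(0.931, 0.843), 3))  # 0.885
```

In this sketch, DBSCAN clusters the flagged time indices rather than the raw samples, so nearby detections from different variables merge into one event, while a change seen in only a single variable stays isolated and is discarded as noise.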

Footnotes
1
Median-smoothed curves reflect slope changes with a delay of half the filter window size.
 
2
Estimated minimum duration of a typical water mixing event.
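Footnote 1 can be illustrated with a small experiment. The sketch assumes that the smoothing uses a trailing (causal) window, which is how `pandas.Series.rolling` aligns windows by default (`center=False`); the footnote itself does not state the window alignment, and the helper names and the synthetic signal are hypothetical.

```python
import statistics
from typing import List, Sequence


def trailing_median(values: Sequence[float], window: int) -> List[float]:
    """Causal running median: each output uses the current and preceding samples."""
    return [
        statistics.median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


def centered_median(values: Sequence[float], window: int) -> List[float]:
    """Running median over a window centered on each sample (needs future samples)."""
    half = window // 2
    return [
        statistics.median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def onset(values: Sequence[float]) -> int:
    """Index of the first sample that rises above the initial flat level of 0."""
    return next(i for i, v in enumerate(values) if v > 0)


window = 5
# Flat at 0 for 10 samples, then a steady slope starting at index 10.
signal = [0.0] * 10 + [float(k) for k in range(1, 11)]
trailing = trailing_median(signal, window)
centered = centered_median(signal, window)

print(onset(signal), onset(trailing), onset(centered))  # 10 12 10
```

With `window = 5`, the trailing filter reports the start of the slope two samples late, i.e. `window // 2` (half the window, rounded down for odd sizes). The centered filter shows no such lag in the output, but it cannot be evaluated until the following `window // 2` samples have arrived.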
 
Metadata
Title
Automatic water mixing event identification in the Koljö fjord observatory data
Authors
Markus Götz
Mikhail Kononets
Christian Bodenstein
Morris Riedel
Matthias Book
Olafur Petur Palsson
Publication date
6 June 2018
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 1/2019
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-018-0132-z
