Published in: GeoInformatica 4/2008

1 December 2008

On Detecting Spatial Outliers

Authors: Dechang Chen, Chang-Tien Lu, Yufeng Kou, Feng Chen

Abstract

The ever-increasing volume of spatial data has greatly challenged our ability to extract useful but implicit knowledge from it. As an important branch of spatial data mining, spatial outlier detection aims to discover objects whose non-spatial attribute values differ significantly from those of their spatial neighbors. These objects, called spatial outliers, may reveal important phenomena in a number of applications, including traffic control, satellite image analysis, weather forecasting, and medical diagnosis. Most existing spatial outlier detection algorithms focus on identifying single-attribute outliers and can misclassify normal objects as outliers when their neighborhoods contain true spatial outliers with very large or very small attribute values. In addition, many spatial applications involve multiple non-spatial attributes that should be processed together to identify outliers. To address these two issues, we formulate the spatial outlier detection problem in a general way, design two robust detection algorithms, one for a single attribute and the other for multiple attributes, and analyze their computational complexity. Experiments on a real-world data set, West Nile virus data, validate the effectiveness of the proposed algorithms.
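The misclassification problem described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a k-nearest-neighbor neighborhood and compares each object's value with either the mean or the median of its neighbors' values. The function names, the grid data, and the choice of k are all hypothetical.

```python
import math
import statistics


def neighborhood_differences(points, k, aggregate):
    """For each (x, y, value) point, return value minus the aggregate
    of the values of its k nearest spatial neighbors."""
    diffs = []
    for i, (xi, yi, vi) in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (p for j, p in enumerate(points) if j != i),
            key=lambda p: math.hypot(p[0] - xi, p[1] - yi),
        )
        neighbor_values = [v for _, _, v in others[:k]]
        diffs.append(vi - aggregate(neighbor_values))
    return diffs


def top_candidates(points, diffs, m):
    """Return the m locations with the largest absolute difference."""
    ranked = sorted(zip(points, diffs), key=lambda pd: abs(pd[1]), reverse=True)
    return [(x, y) for (x, y, _), _ in ranked[:m]]


# Hypothetical data: a 5x5 grid with value 10 everywhere except one spike.
grid = [(x, y, 100.0 if (x, y) == (2, 2) else 10.0)
        for x in range(5) for y in range(5)]

# Assumed neighborhood size k=4; mean is the non-robust aggregate,
# median the robust one.
mean_diffs = neighborhood_differences(grid, k=4, aggregate=statistics.mean)
median_diffs = neighborhood_differences(grid, k=4, aggregate=statistics.median)

print(top_candidates(grid, mean_diffs, 1))
print(sum(1 for d in mean_diffs if d != 0))    # objects that deviate under the mean
print(sum(1 for d in median_diffs if d != 0))  # objects that deviate under the median
```

In this toy grid, the normal object at (1, 2) gets a difference of -22.5 under the mean, because the spike at (2, 2) inflates its neighborhood average, but a difference of 0.0 under the median. A robust aggregate is one plausible way to avoid flagging such normal neighbors. The paper's own robust algorithms may work differently.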

Metadata
Title
On Detecting Spatial Outliers
Authors
Dechang Chen
Chang-Tien Lu
Yufeng Kou
Feng Chen
Publication date
1 December 2008
Publisher
Springer US
Published in
GeoInformatica / Issue 4/2008
Print ISSN: 1384-6175
Electronic ISSN: 1573-7624
DOI
https://doi.org/10.1007/s10707-007-0038-8