Skip to main content
Top
Published in: Journal of Intelligent Information Systems 2/2012

01-10-2012

Rotation-invariant similarity in time series using bag-of-patterns representation

Authors: Jessica Lin, Rohan Khade, Yuan Li

Published in: Journal of Intelligent Information Systems | Issue 2/2012

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

For more than a decade, time series similarity search has been given a great deal of attention by data mining researchers. As a result, many time series representations and distance measures have been proposed. However, most existing work on time series similarity search relies on shape-based similarity matching. While some of the existing approaches work well for short time series data, they typically fail to produce satisfactory results when the sequence is long. For long sequences, it is more appropriate to consider the similarity based on the higher-level structures. In this work, we present a histogram-based representation for time series data, similar to the “bag of words” approach that is widely accepted by the text mining and information retrieval communities. We performed extensive experiments and show that our approach outperforms the leading existing methods in clustering, classification, and anomaly detection on dozens of real datasets. We further demonstrate that the representation allows rotation-invariant matching in shape datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
All datasets used in this paper are available at http://​www.​cs.​gmu.​edu/​~jessica/​BOP
 
2
In previous work (Lin and Li 2009), we used a different normalization technique, min-max normalization, which resulted in slightly better clustering for Euclidean distance: signals #5 and #6 were correctly clustered together. However, z-normalization is more commonly used for time series data mining tasks.
 
Literature
go back to reference Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th int’l conference on foundations of data organization and algorithms (pp. 69–84). Chicago, IL. Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th int’l conference on foundations of data organization and algorithms (pp. 69–84). Chicago, IL.
go back to reference Bradley, P., Fayyad, U., & Reina, C. (1998). Scaling clustering algorithms to large databases. In Proceedings of the 4th int’l conference on knowledge discovery and data mining (pp. 9–15). New York, NY. Bradley, P., Fayyad, U., & Reina, C. (1998). Scaling clustering algorithms to large databases. In Proceedings of the 4th int’l conference on knowledge discovery and data mining (pp. 9–15). New York, NY.
go back to reference Chan, K., & Fu, A. W. (1999). Efficient time series matching by wavelets. In Proceedings of the 15th IEEE int’l conference on data engineering (pp. 126–133). Sydney, Australia. Chan, K., & Fu, A. W. (1999). Efficient time series matching by wavelets. In Proceedings of the 15th IEEE int’l conference on data engineering (pp. 126–133). Sydney, Australia.
go back to reference Chen, L., & Ng, R. (2004). On the marriage of Lp-norms and edit distance. In Proceedings of the thirtieth international conference on very large data bases (Vol. 30, pp. 792–803). Chen, L., & Ng, R. (2004). On the marriage of Lp-norms and edit distance. In Proceedings of the thirtieth international conference on very large data bases (Vol. 30, pp. 792–803).
go back to reference Crochemore, M., Czumaj, A., Gasjeniec, L., Jarominek, S., Lecroq, T., Plandowski, W., et al. (1994). Speeding up two string-matching algorithms. Algorithmica, 12, 247–267.MathSciNetMATHCrossRef Crochemore, M., Czumaj, A., Gasjeniec, L., Jarominek, S., Lecroq, T., Plandowski, W., et al. (1994). Speeding up two string-matching algorithms. Algorithmica, 12, 247–267.MathSciNetMATHCrossRef
go back to reference Deng, K., Moore, A., & Nechyba, M. (1997). Learning to recognize time series: Combining ARMA models with memory-based learning. IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1, 246–250. Deng, K., Moore, A., & Nechyba, M. (1997). Learning to recognize time series: Combining ARMA models with memory-based learning. IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1, 246–250.
go back to reference Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., & Keogh, E. (2008). Querying and mining of time series data: Experimental comparison of representations and distance measures. Proceedings of VLDB Endowment, 1(2), 1542–1552. Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., & Keogh, E. (2008). Querying and mining of time series data: Experimental comparison of representations and distance measures. Proceedings of VLDB Endowment, 1(2), 1542–1552.
go back to reference Faloutsos, C., Ranganathan, M., & Manolopulos, Y. (1994). Fast subsequence matching in time-series databases. SIGMOD Record, 23, 419–429.CrossRef Faloutsos, C., Ranganathan, M., & Manolopulos, Y. (1994). Fast subsequence matching in time-series databases. SIGMOD Record, 23, 419–429.CrossRef
go back to reference Gavrilov, M., Anguelov, D., Indyk, P., & Motwahl, R. (2000). Mining the stock market: Which measure is best? In Proceeding of the 6th ACM SIGKDD. Gavrilov, M., Anguelov, D., Indyk, P., & Motwahl, R. (2000). Mining the stock market: Which measure is best? In Proceeding of the 6th ACM SIGKDD.
go back to reference Ge, X., & Smyth, P. (2000). Deformable Markov model templates for time-series pattern matching. In Proceedings of the 6th ACM SIGKDD (pp. 81–90). Boston, MA. Ge, X., & Smyth, P. (2000). Deformable Markov model templates for time-series pattern matching. In Proceedings of the 6th ACM SIGKDD (pp. 81–90). Boston, MA.
go back to reference Geurts, P. (2001). Pattern extraction for time series classification. In Proceedings of the 5th European conference on principles of data mining and knowledge discovery (pp. 115–127). Freiburg, Germany. Geurts, P. (2001). Pattern extraction for time series classification. In Proceedings of the 5th European conference on principles of data mining and knowledge discovery (pp. 115–127). Freiburg, Germany.
go back to reference Goldberger, A. L., Amaral, L., Glass, L, Hausdorff, J. M., Ivanov, P. Ch., Mark, R. G., et al. (1997). PhysioBank, PhysioToolkit, and PhysioNet: Circulation 101(23):e215–e220. Discovery, 1(3). Goldberger, A. L., Amaral, L., Glass, L, Hausdorff, J. M., Ivanov, P. Ch., Mark, R. G., et al. (1997). PhysioBank, PhysioToolkit, and PhysioNet: Circulation 101(23):e215–e220. Discovery, 1(3).
go back to reference Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 2, 241–254.CrossRef Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 2, 241–254.CrossRef
go back to reference Keogh, E. (2002). Exact indexing of dynamic time warping. In Proceedings of the 28th international conference on very large data bases. Hong Kong, China. Keogh, E. (2002). Exact indexing of dynamic time warping. In Proceedings of the 28th international conference on very large data bases. Hong Kong, China.
go back to reference Keogh, E. (2004). Tutorial in SIGKDD. In Data mining and machine learning in time series databases. Keogh, E. (2004). Tutorial in SIGKDD. In Data mining and machine learning in time series databases.
go back to reference Keogh, E., & Kasetty, S. (2002). On the need for time series data mining benchmarks: A survey and empirical demonstration. In Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery (pp. 102–111). Edmonton, Alberta, Canada. Keogh, E., & Kasetty, S. (2002). On the need for time series data mining benchmarks: A survey and empirical demonstration. In Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery (pp. 102–111). Edmonton, Alberta, Canada.
go back to reference Keogh, E., Chakrabarti, K., & Pazzani, M. (2001) Locally adaptive dimensionality reduction for indexing large time series databases. In Proceedings of ACM SIGMOD conference on management of data (pp. 151–162). Santa Barbara. Keogh, E., Chakrabarti, K., & Pazzani, M. (2001) Locally adaptive dimensionality reduction for indexing large time series databases. In Proceedings of ACM SIGMOD conference on management of data (pp. 151–162). Santa Barbara.
go back to reference Keogh, E., Lonardi, S., & Ratanamahatana, C. A. (2004). Towards parameter-free data mining. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. Seattle, WA, USA. Keogh, E., Lonardi, S., & Ratanamahatana, C. A. (2004). Towards parameter-free data mining. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. Seattle, WA, USA.
go back to reference Keogh, E., Lin, J., & Fu, A. (2006b). Finding the most unusual time series subsequence: Algorithms and applications. Knowledge and Information Systems (KAIS). Springer-Verlag. Keogh, E., Lin, J., & Fu, A. (2006b). Finding the most unusual time series subsequence: Algorithms and applications. Knowledge and Information Systems (KAIS). Springer-Verlag.
go back to reference Keogh, E., Wei, L., Xi, X., Lee, S., & Vlachos, M. (2006c). LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In Proceedings of the 32nd international conference on very large data bases. Keogh, E., Wei, L., Xi, X., Lee, S., & Vlachos, M. (2006c). LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In Proceedings of the 32nd international conference on very large data bases.
go back to reference Ko, M. K., West, G., Venkatesh, S., & Kumar, M. (2005) Online context recognition in multisensor systems using dynamic time warping. In Intelligent sensors, sensor networks and information processing conference (pp. 283–288). Ko, M. K., West, G., Venkatesh, S., & Kumar, M. (2005) Online context recognition in multisensor systems using dynamic time warping. In Intelligent sensors, sensor networks and information processing conference (pp. 283–288).
go back to reference Kriegel, H.-P., Kroger, P., Pryakhin, A., Renz, M., & Zherdin, A. (2008). Approximate clustering of time series using compact model-based descriptions. In J. R. Haritsa, R. Kotagiri, & V. Pudi (Eds.), Proceedings of the 13th international conference on database systems for advanced applications (DASFAA’08) (pp. 364–379). Berlin, Heidelberg: Springer-Verlag. Kriegel, H.-P., Kroger, P., Pryakhin, A., Renz, M., & Zherdin, A. (2008). Approximate clustering of time series using compact model-based descriptions. In J. R. Haritsa, R. Kotagiri, & V. Pudi (Eds.), Proceedings of the 13th international conference on database systems for advanced applications (DASFAA’08) (pp. 364–379). Berlin, Heidelberg: Springer-Verlag.
go back to reference Li, M., & Vitanyi, P. (1997). An introduction to Kolmogorov complexity and its applications, 2nd Edn. Springer Verlag. Li, M., & Vitanyi, P. (1997). An introduction to Kolmogorov complexity and its applications, 2nd Edn. Springer Verlag.
go back to reference Lin, J., & Li, Y. (2009). Finding structural similarity in time series data using Bag-of-Patterns representation. In M. Winslett (Ed.), Proceedings of the 21st international conference on scientific and statistical database management (SSDBM 2009) (pp. 461–477). Berlin, Heidelberg: Springer-Verlag. Lin, J., & Li, Y. (2009). Finding structural similarity in time series data using Bag-of-Patterns representation. In M. Winslett (Ed.), Proceedings of the 21st international conference on scientific and statistical database management (SSDBM 2009) (pp. 461–477). Berlin, Heidelberg: Springer-Verlag.
go back to reference Lin, J., Keogh, E., Li, W., & Lonardi, S. (2007). Experiencing SAX: A novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2), 107–144.MathSciNetCrossRef Lin, J., Keogh, E., Li, W., & Lonardi, S. (2007). Experiencing SAX: A novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2), 107–144.MathSciNetCrossRef
go back to reference Lin, J., Vlachos, M., Keogh, E., & Gunopulos, D. (2004). Iterative incremental clustering of time series. In IX conference on extending database technology (EDBT). Lin, J., Vlachos, M., Keogh, E., & Gunopulos, D. (2004). Iterative incremental clustering of time series. In IX conference on extending database technology (EDBT).
go back to reference Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.
go back to reference Marteau, P.-F. (2009). Time warp edit distance with stiffness adjustment for time series matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 306–318.CrossRef Marteau, P.-F. (2009). Time warp edit distance with stiffness adjustment for time series matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 306–318.CrossRef
go back to reference McQueen, J. (1967). Some methods for classification and analysis of multivariate observation. In L. Le Cam, & J. Neyman (Eds.), Proceedings of the 5th Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). Berkeley, CA. McQueen, J. (1967). Some methods for classification and analysis of multivariate observation. In L. Le Cam, & J. Neyman (Eds.), Proceedings of the 5th Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). Berkeley, CA.
go back to reference Mueen, A., Keogh, E., & Young, N. (2011). Logical-shapelets: An expressive primitive for time series classification. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’11). San Diego, CA. Mueen, A., Keogh, E., & Young, N. (2011). Logical-shapelets: An expressive primitive for time series classification. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’11). San Diego, CA.
go back to reference Nanopoulos, A., Alcock, R., & Manolopoulos, Y. (2001). Feature-based classification of time-series data. In N. Mastorakis, & S. D. Nikolopoulos (Eds.), Information processing and technology (pp. 49–61). Commack, NY: Nova Science Publishers. Nanopoulos, A., Alcock, R., & Manolopoulos, Y. (2001). Feature-based classification of time-series data. In N. Mastorakis, & S. D. Nikolopoulos (Eds.), Information processing and technology (pp. 49–61). Commack, NY: Nova Science Publishers.
go back to reference Olszewski, R. (2001). Generalized feature extraction for structural pattern recognition in time-series data. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA. Olszewski, R. (2001). Generalized feature extraction for structural pattern recognition in time-series data. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
go back to reference Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010). Time-series classification in many intrinsic dimensions (pp. 677–688). SDM. Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010). Time-series classification in many intrinsic dimensions (pp. 677–688). SDM.
go back to reference Ratanamahatana, C. A., & Keogh, E. (2004). Making time-series classification more accurate using learned constraints. In Proceedings of SIAM international conference on data mining. Lake Buena Vista, Florida. Ratanamahatana, C. A., & Keogh, E. (2004). Making time-series classification more accurate using learned constraints. In Proceedings of SIAM international conference on data mining. Lake Buena Vista, Florida.
go back to reference Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 19, 613–620.CrossRef Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 19, 613–620.CrossRef
go back to reference Sart, D., Mueen, A., Najjar, W., Keogh, E., & Niennattrakul, V. (2010). Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In Proceedings of the 2010 IEEE international conference on data mining (ICDM’10) (pp. 1001–1006). Washington, DC, USA. Sart, D., Mueen, A., Najjar, W., Keogh, E., & Niennattrakul, V. (2010). Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In Proceedings of the 2010 IEEE international conference on data mining (ICDM’10) (pp. 1001–1006). Washington, DC, USA.
go back to reference Wang, X., Smith, K., & Hyndman, R. (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3), 335–364.MathSciNetCrossRef Wang, X., Smith, K., & Hyndman, R. (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3), 335–364.MathSciNetCrossRef
go back to reference Wei, L., & Keogh, E. (2006). Semi-supervised time series classification. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 748–753). New York, NY, U.S.A.: ACM.CrossRef Wei, L., & Keogh, E. (2006). Semi-supervised time series classification. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 748–753). New York, NY, U.S.A.: ACM.CrossRef
go back to reference Wei, L., Keogh, E., & Xi, X. (2006) SAXually explicit images: finding unusual shapes. In Proceedings of the IEEE international conference on data mining. Hong Kong. Wei, L., Keogh, E., & Xi, X. (2006) SAXually explicit images: finding unusual shapes. In Proceedings of the IEEE international conference on data mining. Hong Kong.
go back to reference Vlachos, M., Gunopoulos, D., & Kollios, G. (2002). Discovering similar multidimensional trajectories. In Proceedings of the 18th International Conference on Data Engineering. Vlachos, M., Gunopoulos, D., & Kollios, G. (2002). Discovering similar multidimensional trajectories. In Proceedings of the 18th International Conference on Data Engineering.
go back to reference Xing, Z., Pei, J., Yu, P., & Wang, K. (2011). Extracting interpretable features for early classification on time series. In Proceedings of SDM, 2011. Xing, Z., Pei, J., Yu, P., & Wang, K. (2011). Extracting interpretable features for early classification on time series. In Proceedings of SDM, 2011.
go back to reference Ye, L., & Keogh, E. (2009). Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’09). New York, NY. Ye, L., & Keogh, E. (2009). Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’09). New York, NY.
Metadata
Title
Rotation-invariant similarity in time series using bag-of-patterns representation
Authors
Jessica Lin
Rohan Khade
Yuan Li
Publication date
01-10-2012
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2012
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-012-0196-5

Other articles of this Issue 2/2012

Journal of Intelligent Information Systems 2/2012 Go to the issue

Premium Partner