
Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing

Published: 24 May 2016

Abstract

Recent years have witnessed a data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. The large amount of data associated with process variables monitored over time forms a rich reservoir of information that can be used for a variety of purposes, such as anomaly detection, quality control, and fault diagnostics. In particular, multiple tools and chambers following the same recipe can be deployed to produce a given Integrated Circuit device, and during production multiple time series can be collected, such as temperature, impedance, gas flow, and electric bias. These time series naturally fit into a two-dimensional array (matrix): each element of the array corresponds to the time series for one process variable from one chamber. To leverage the rich structural information in such temporal data, in this article we propose a novel framework named C-Struts to cluster simultaneously on the two dimensions of this array. In this framework, we interpret the structural information as a set of constraints on the cluster membership, introduce an auxiliary probability distribution accordingly, and design an iterative algorithm to assign each time series to a certain cluster on each dimension. Furthermore, we establish the equivalence between C-Struts and a generic optimization problem, which is able to accommodate various distance functions. Extensive experiments on synthetic, benchmark, and manufacturing datasets demonstrate the effectiveness of the proposed method.
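
The data layout described in the abstract, and the general idea of clustering both dimensions at once, can be illustrated with a minimal sketch. This is not the C-Struts algorithm: it uses no membership constraints and no auxiliary probability distribution, and simply alternates k-means-style reassignments of chambers (rows) and process variables (columns) under squared Euclidean distance. The function names, the array shape (chambers, variables, time steps), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def block_prototypes(X, rows, cols, n_row_clusters, n_col_clusters):
    """Mean time series of every (row cluster, column cluster) block."""
    protos = np.empty((n_row_clusters, n_col_clusters, X.shape[2]))
    global_mean = X.mean(axis=(0, 1))  # fallback for empty blocks
    for r in range(n_row_clusters):
        for c in range(n_col_clusters):
            block = X[rows == r][:, cols == c]
            protos[r, c] = block.mean(axis=(0, 1)) if block.size else global_mean
    return protos


def block_coclustering(X, n_row_clusters, n_col_clusters, n_iter=20, seed=0):
    """Naive alternating co-clustering (illustrative, not C-Struts).

    X has shape (n_rows, n_cols, T); X[i, j] is the time series of
    process variable j recorded in chamber i.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols, _ = X.shape
    rows = rng.integers(n_row_clusters, size=n_rows)
    cols = rng.integers(n_col_clusters, size=n_cols)

    for _ in range(n_iter):
        # Reassign each row to the row cluster whose block prototypes fit best.
        protos = block_prototypes(X, rows, cols, n_row_clusters, n_col_clusters)
        per_col = protos[:, cols, :]                      # (R, n_cols, T)
        row_cost = ((X[:, None] - per_col[None]) ** 2).sum(axis=(2, 3))
        new_rows = row_cost.argmin(axis=1)

        # Reassign each column given the updated row clusters.
        protos = block_prototypes(X, new_rows, cols, n_row_clusters, n_col_clusters)
        per_row = protos[new_rows]                        # (n_rows, C, T)
        col_cost = ((X[:, :, None] - per_row[:, None]) ** 2).sum(axis=(0, 3))
        new_cols = col_cost.argmin(axis=1)

        converged = np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols)
        rows, cols = new_rows, new_cols
        if converged:
            break
    return rows, cols


# Synthetic example: 6 chambers x 4 process variables, 50 time steps each,
# with a planted 2 x 2 block structure plus small Gaussian noise.
rng = np.random.default_rng(1)
row_groups = np.array([0, 0, 0, 1, 1, 1])
col_groups = np.array([0, 0, 1, 1])
t = np.linspace(0.0, 1.0, 50)
shapes = np.stack(
    [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t, 1.0 - t]
).reshape(2, 2, 50)
X = shapes[row_groups][:, col_groups] + 0.05 * rng.standard_normal((6, 4, 50))

rows, cols = block_coclustering(X, n_row_clusters=2, n_col_clusters=2)
print("chamber clusters: ", rows)
print("variable clusters:", cols)
```

Like ordinary k-means, this alternating scheme depends on its random initialization and may stop at a local optimum, so in practice one would likely run it from several seeds. Squared Euclidean distance is used here only for simplicity; the abstract states that the authors' formulation accommodates various distance functions.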



        • Published in

          ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 4
          Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
          July 2016
          417 pages
          ISSN: 1556-4681
          EISSN: 1556-472X
          DOI: 10.1145/2936311

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 24 May 2016
          • Accepted: 1 January 2016
          • Revised: 1 December 2015
          • Received: 1 May 2015


          Qualifiers

          • research-article
          • Research
          • Refereed
