
Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing

Authors:
Yada Zhu, IBM Research, Yorktown Heights, NY
Jingrui He, Arizona State University, Tempe, AZ

ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 4, Article No. 43, pp. 1–18. https://doi.org/10.1145/2875427

Published: 24 May 2016

Abstract

Recent years have witnessed a data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. The large amount of data associated with process variables monitored over time forms a rich reservoir of information, which can be used for a variety of purposes, such as anomaly detection, quality control, and fault diagnostics. In particular, following the same recipe for a certain integrated circuit (IC) device, multiple tools and chambers can be deployed for the production of this device, during which multiple time series can be collected, such as temperature, impedance, gas flow, and electric bias. These time series naturally fit into a two-dimensional array (matrix), in which each element corresponds to the time series of one process variable from one chamber. To leverage the rich structural information in such temporal data, in this article we propose a novel framework named C-Struts to simultaneously cluster along both dimensions of this array. In this framework, we interpret the structural information as a set of constraints on the cluster membership, introduce an auxiliary probability distribution accordingly, and design an iterative algorithm to assign each time series to a cluster on each dimension. Furthermore, we establish the equivalence between C-Struts and a generic optimization problem that can accommodate various distance functions. Extensive experiments on synthetic, benchmark, and manufacturing datasets demonstrate the effectiveness of the proposed method.
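The abstract describes the data layout (a two-dimensional array whose element corresponds to the time series of one process variable from one chamber) and an iterative algorithm that assigns each series to a cluster on each dimension. The Python sketch below illustrates only that general idea with a plain alternating co-clustering scheme; it is not C-Struts, which, per the abstract, encodes the structural information as membership constraints through an auxiliary probability distribution, and the sketch does neither. Assumptions made purely for illustration: rows index chambers and columns index process variables, each co-cluster is summarized by the point-wise mean of its series, initialization is a deterministic farthest-first heuristic, and the names `co_cluster`, `farthest_first`, and `sq_euclidean` are hypothetical.

```python
# Hypothetical sketch: generic alternating co-clustering of a 2-D array of
# time series. This is NOT the C-Struts algorithm from the article.
import math
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def sq_euclidean(x: Series, y: Series) -> float:
    """Squared Euclidean distance between two equal-length series."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def farthest_first(n: int, k: int, d: Callable[[int, int], float]) -> List[int]:
    """Deterministic init (an assumption): greedily pick k mutually distant
    seed indices, then assign every index to its nearest seed."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(d(i, s) for s in seeds)))
    return [min(range(k), key=lambda c: d(i, seeds[c])) for i in range(n)]


def co_cluster(X: List[List[Series]], k: int, l: int,
               dist: Distance = sq_euclidean,
               n_iter: int = 50) -> Tuple[List[int], List[int], float]:
    """Alternately re-assign rows (assumed: chambers) and columns (assumed:
    process variables) of X to k row clusters and l column clusters.
    Returns (row_labels, col_labels, objective)."""
    n_rows, n_cols = len(X), len(X[0])
    # Row (column) dissimilarity = summed series distance across all columns (rows).
    rows = farthest_first(n_rows, k, lambda i, h: sum(
        dist(X[i][j], X[h][j]) for j in range(n_cols)))
    cols = farthest_first(n_cols, l, lambda j, h: sum(
        dist(X[i][j], X[i][h]) for i in range(n_rows)))

    def prototypes():
        # P[a][b]: point-wise mean series of co-cluster (a, b); None if empty.
        P = [[None] * l for _ in range(k)]
        for a in range(k):
            for b in range(l):
                block = [X[i][j] for i in range(n_rows) if rows[i] == a
                         for j in range(n_cols) if cols[j] == b]
                if block:
                    P[a][b] = [sum(v) / len(block) for v in zip(*block)]
        return P

    def cost(x, p):
        # An empty co-cluster has no prototype and can never be chosen.
        return math.inf if p is None else dist(x, p)

    for _ in range(n_iter):
        # Row step: move each row to the row cluster that best explains it.
        P = prototypes()
        new_rows = [min(range(k), key=lambda a: sum(
            cost(X[i][j], P[a][cols[j]]) for j in range(n_cols)))
            for i in range(n_rows)]
        changed = new_rows != rows
        rows = new_rows
        # Column step: same idea with the updated row labels.
        P = prototypes()
        new_cols = [min(range(l), key=lambda b: sum(
            cost(X[i][j], P[rows[i]][b]) for i in range(n_rows)))
            for j in range(n_cols)]
        changed = changed or new_cols != cols
        cols = new_cols
        if not changed:
            break

    P = prototypes()
    objective = sum(dist(X[i][j], P[rows[i]][cols[j]])
                    for i in range(n_rows) for j in range(n_cols))
    return rows, cols, objective
```

With the default squared Euclidean distance, the mean series is the optimal prototype of each co-cluster, so no row or column update can increase the total objective. Passing a different `dist` (for example, a dynamic time warping distance) keeps the code running, but the mean then becomes only a heuristic prototype and that guarantee no longer holds; the abstract's claim that C-Struts accommodates various distance functions refers to the authors' own formulation, not to this sketch.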

Index Terms

1. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Time series analysis
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Structured prediction
      2. Unsupervised learning and clustering

Recommendations

Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing
ICDM '14: Proceedings of the 2014 IEEE International Conference on Data Mining

Recent years have witnessed data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. In particular, following the same recipe for a certain IC device, multiple tools and chambers can be deployed for the ...
Non-Exhaustive, Overlapping Co-Clustering
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

The goal of co-clustering is to simultaneously identify a clustering of the rows as well as the columns of a two dimensional data matrix. Most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clusters. However, ...
Weighted multi-view co-clustering (WMVCC) for sparse data
Multi-view clustering has gained importance in recent times due to the large-scale generation of data, often from multiple sources. Multi-view clustering refers to clustering a set of objects which are expressed by multiple set of features, known ...

Published in
ACM Transactions on Knowledge Discovery from Data Volume 10, Issue 4
Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
July 2016
417 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/2936311
Editor: Philip S. Yu, University of Illinois at Chicago, USA
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 May 2016
- Accepted: 1 January 2016
- Revised: 1 December 2015
- Received: 1 May 2015

Author Tags
Co-clustering
semiconductor
structural
temporal
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- 6 Total Citations
- 271 Total Downloads
- Downloads (last 12 months): 18
- Downloads (last 6 weeks): 1
