Abstract
Recent years have witnessed a data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. The large amount of data associated with process variables monitored over time forms a rich reservoir of information, which can be used for a variety of purposes, such as anomaly detection, quality control, and fault diagnostics. In particular, following the same recipe for a certain integrated circuit (IC) device, multiple tools and chambers can be deployed for the production of this device, during which multiple time series can be collected, such as temperature, impedance, gas flow, and electric bias. These time series naturally fit into a two-dimensional array (matrix), where each element corresponds to the time series of one process variable from one chamber. To leverage the rich structural information in such temporal data, in this article we propose a novel framework named C-Struts that simultaneously clusters along both dimensions of this array. In this framework, we interpret the structural information as a set of constraints on cluster membership, introduce an auxiliary probability distribution accordingly, and design an iterative algorithm that assigns each time series to a cluster on each dimension. Furthermore, we establish the equivalence between C-Struts and a generic optimization problem that can accommodate various distance functions. Extensive experiments on synthetic, benchmark, and manufacturing datasets demonstrate the effectiveness of the proposed method.
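To make the data layout concrete, here is a minimal sketch of co-clustering a variables-by-chambers array whose cells are time series. It is not C-Struts: the abstract does not give its constraint set or auxiliary-distribution updates, so the sketch swaps in plain alternating block-mean co-clustering (in the spirit of minimum sum-squared residue co-clustering) under squared Euclidean distance. All names (`co_cluster`, `block_prototypes`, `sq_dist`, `objective`) and the toy data are hypothetical and used only for illustration.

```python
from typing import List, Sequence

Series = List[float]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length time series."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def block_prototypes(X, rows, cols, k, l, T):
    """Mean series of every (row-cluster, column-cluster) block.

    Empty blocks get an all-zero prototype (an arbitrary choice in this sketch).
    """
    sums = [[[0.0] * T for _ in range(l)] for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, a in enumerate(rows):
        for j, b in enumerate(cols):
            counts[a][b] += 1
            for t in range(T):
                sums[a][b][t] += X[i][j][t]
    return [[[s / counts[a][b] if counts[a][b] else 0.0 for s in sums[a][b]]
             for b in range(l)] for a in range(k)]


def objective(X, rows, cols, P) -> float:
    """Total squared distance of every cell series to its block prototype."""
    return sum(sq_dist(X[i][j], P[rows[i]][cols[j]])
               for i in range(len(rows)) for j in range(len(cols)))


def co_cluster(X, k, l, rows, cols, max_iter=20):
    """Alternately reassign rows (process variables) and columns (chambers).

    X[i][j] is the time series of variable i in chamber j; rows/cols are
    initial cluster labels in range(k) and range(l).
    """
    T = len(X[0][0])
    rows, cols = list(rows), list(cols)
    for _ in range(max_iter):
        # Fix the column labels; move each row to its cheapest row cluster.
        P = block_prototypes(X, rows, cols, k, l, T)
        new_rows = [min(range(k), key=lambda a: sum(
            sq_dist(X[i][j], P[a][cols[j]]) for j in range(len(cols))))
            for i in range(len(rows))]
        # Fix the new row labels; move each column to its cheapest cluster.
        P = block_prototypes(X, new_rows, cols, k, l, T)
        new_cols = [min(range(l), key=lambda b: sum(
            sq_dist(X[i][j], P[new_rows[i]][b]) for i in range(len(rows))))
            for j in range(len(cols))]
        if new_rows == rows and new_cols == cols:
            break  # labels are stable on both dimensions
        rows, cols = new_rows, new_cols
    return rows, cols


if __name__ == "__main__":
    # Toy array: 4 variables x 4 chambers, each cell a short ramp series.
    V = [[0, 0, 5, 5],
         [0, 0, 5, 5],
         [9, 9, 1, 1],
         [9, 9, 1, 1]]
    X = [[[v + t for t in range(3)] for v in row] for row in V]
    rows, cols = co_cluster(X, 2, 2, [0, 0, 0, 1], [0, 1, 1, 1])
    print(rows, cols)  # → [0, 0, 1, 1] [0, 0, 1, 1]
```

Under squared Euclidean distance the block mean is the best prototype for fixed labels, so no step can increase the objective; the result still depends on the initial labels, and a perfectly symmetric start can collapse everything into one cluster. Swapping in another distance (for example, dynamic time warping) would also require a matching prototype step, which is the kind of generality the abstract attributes to the optimization view of C-Struts.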
Index Terms
- Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing