Abstract
The data in many disciplines such as social networks, Web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this article, we consider the problem of temporal link prediction: Given link data for times 1 through T, can we predict the links at time T + 1? If our data has underlying periodic structure, can we predict out even further in time, i.e., links at time T + 2, T + 3, etc.? In this article, we consider bipartite graphs that evolve over time and consider matrix- and tensor-based methods for predicting future links. We present a weight-based method for collapsing multiyear data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based techniques are particularly effective for temporal data with varying periodic patterns.
- Acar, E. and Yener, B. 2009. Unsupervised multiway data analysis: A literature survey. IEEE Trans. Knowl. Data Engin. 21, 1, 6--20. Google ScholarDigital Library
- Acar, E., Çamtepe, S. A., and Yener, B. 2006. Collective sampling and analysis of high order tensors for chatroom communications. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI’06). Lecture Notes in Computer Science, vol. 3975. Springer, 213--224. Google ScholarDigital Library
- Acar, E., Dunlavy, D. M., and Kolda, T. G. 2009. Link prediction on evolving data using matrix and tensor factorizations. In Proceedings of the Workshop on Large-Scale Data Mining: Theory and Applications (LDMTA’09). 262--269. Google ScholarDigital Library
- Acar, E., Dunlavy, D. M., and Kolda, T. G. 2010. A scalable optimization approach for fitting canonical tensor decompositions. J. Chemometrics. To appear.Google Scholar
- Bader, B. W. and Kolda, T. G. 2007. Efficient MATLAB computations with sparse and factored tensors. SIAM J. Scient. Comput. 30, 1, 205--231. Google ScholarDigital Library
- Bader, B. W., Berry, M. W., and Browne, M. 2007. Discussion tracking in Enron email using PARAFAC. In Survey of Text Mining: Clustering, Classification, and Retrieval 2nd Ed., M. W. Berry and M. Castellanos Eds. Springer, 147--162.Google Scholar
- Bast, H. and Majumdar, D. 2005. Why spectral retrieval works. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’05). 11--18. Google ScholarDigital Library
- Bell, R. M. and Koren, Y. 2007. Lessons from the Netflix Prize Challenge. ACM SIGKDD Explor. Newslett. 9, 75--79. Google ScholarDigital Library
- Carroll, J. D. and Chang, J. J. 1970. Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika 35, 283--319.Google ScholarCross Ref
- Chatfield, C. and Yar, M. 1988. Holt-winters forecasting: Some practical issues. J. Royal Statist. Soc. Series D (The Statistician) 37, 2, 129--140.Google Scholar
- Clauset, A., Moore, C., and Newman, M. 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453.Google Scholar
- Dumais, S. T., Furnas, G. W., Landauer, T. K., Deerwester, S., and Harshman, R. 1988. Using latent semantic analysis to improve access to textual information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’88). ACM Press, 281--285. Google ScholarDigital Library
- Gardner, E. S. 2006. Exponential smoothing: The state of the art - part ii. Int. J. Forecast. 22, 637--666.Google ScholarCross Ref
- Getoor, L. and Diehl, C. P. 2005. Link mining: A survey. ACM SIGKDD Explor. Newslett. 7, 2, 3--12. Google ScholarDigital Library
- Harshman, R. A. 1970. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics 16, 1--84. http://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf.Google Scholar
- Hasan, M. A., Chaoji, V., Salem, S., and Zaki, M. 2006. Link prediction using supervised learning. In Proceedings of the SIAM of the Data Mining Workshop on Link Analysis, Counterterrorism, and Security.Google Scholar
- Huang, Z. and Lin, D. K. J. 2009. The time-series link prediction problem with applications in communication surveillance. INFORMS J. Comput. 21, 286--303. Google ScholarDigital Library
- Huang, Z., Li, X., and Chen, H. 2005. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’05). 141--142. Google ScholarDigital Library
- Katz, L. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1, 39--43.Google ScholarCross Ref
- Kolda, T. G. and Bader, B. W. 2009. Tensor decompositions and applications. SIAM Rev. 51, 3, 455--500. Google ScholarDigital Library
- Kolda, T. G., Bader, B. W., and Kenny, J. P. 2005. Higher-order web link analysis using multilinear algebra. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05). IEEE Computer Society, 242--249. Google ScholarDigital Library
- Koren, Y. 2009. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 447--456. Google ScholarDigital Library
- Koren, Y., Bell, R., and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. IEEE Comput. 42, 30--37. Google ScholarDigital Library
- Kruskal, J. B. 1989. Rank, decomposition, and uniqueness for 3-way and N-way arrays. In Multiway Data Analysis. R. Coppi and S. Bolasco Eds. North-Holland, Amsterdam, 7--18. Google ScholarDigital Library
- Liben-Nowell, D. and Kleinberg, J. 2007. The link-prediction problem for social networks. J. Amer. Soc. Inform. Sci. Techn. 58, 7, 1019--1031. Google ScholarDigital Library
- Liu, Y. and Kou, Z. 2007. Predicting who rated what in large-scale datasets. ACM SIGKDD Explor. Newslett. 9, 62--65. Google ScholarDigital Library
- Makridakis, S. and Hibon, M. 2000. The m3 competition: Results, conclusions and implications. Int. J. Forecast. 16, 451--476.Google ScholarCross Ref
- Rattigan, M. J. and Jensen, D. 2005. The case for anomalous link discovery. ACM SIGKDD Explor. Newslett. 7, 2, 41--47. Google ScholarDigital Library
- Saad, Y. 1992. Numerical Methods for Large Eigenvalue Problems. Manchester University Press.Google Scholar
- Sarkar, P., Siddiqi, S. M., and Gordon, G. J. 2007. A latent space approach to dynamic embedding of co-occurrence data. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AI-STATS’07).Google Scholar
- Sharan, U. and Neville, J. 2008. Temporal-relational classifiers for prediction in evolving domains. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM’08). IEEE Computer Society, 540--549. Google ScholarDigital Library
- Smola, A. and Kondor, R. 2003. Kernels and regularization on graphs. In Proceedings of the Annual Conference on Computational Learning Theory and Kernel Workshop. B. Schölkopf and M. Warmuth Eds., Lecture Notes in Computer Science. Springer.Google Scholar
- Stager, M., Lukowicz, P., and Troster, G. 2006. Dealing with class skew in context recognition. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW’06). 58. Google ScholarDigital Library
- Sun, J., Papadimitriou, S., Lin, C.-Y., Cao, N., Liu, S., and Qian, W. 2009. Multivis: Content-based social network exploration through multi-way visual analysis. In Proceedings of the 9th SIAM International Conference on Data Mining (SDM’09). 1064--1075.Google Scholar
- Tong, H., Papadimitriou, S., Yu, P. S., and Faloutsos, C. 2008. Proximity tracking on time-evolving bipartite graphs. In Proceedings of the 8th SIAM International Conference on Data Mining (SDM’08). 704--715.Google Scholar
- Tucker, L. R. 1963. Implications of factor analysis of three-way matrices for measurement of change. In Problems in Measuring Change. C. W. Harris Ed., University of Wisconsin Press, 122--137.Google Scholar
- Tucker, L. R. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279--311.Google ScholarCross Ref
- Wang, C., Satuluri, V., and Parthasarathy, S. 2007. Local probabilistic models for link prediction. In Proceedings of the 7th IEEE Conference on Data Mining (ICDM’07). 322--331. Google ScholarDigital Library
- Xiong, L., Chen, X., Huang, T.-K., Schneider, J., and Carbonell, J. G. 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining.Google Scholar
- Yan, H., Grosky, W. I., and Fotouhi, F. 2008. Augmenting the power of LSI in text retrieval: Singular value rescaling. Data Knowl. Engin. 65, 108--125. Google ScholarDigital Library
Index Terms
- Temporal Link Prediction Using Matrix and Tensor Factorizations
Recommendations
A Survey of Link Prediction in Complex Networks
Networks have become increasingly important to model complex systems composed of interacting elements. Network data mining has a large number of applications in many disciplines including protein-protein interaction networks, social networks, ...
DeepWalk: online learning of social representations
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data miningWe present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes ...
node2vec: Scalable Feature Learning for Networks
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningPrediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by ...
Comments