Coupling learning of complex interactions
Introduction
Complex interactive and unstructured/semi-structured data and applications, especially in big data, present major challenges to the current analytic and learning theories and systems. Big data, in particular, presents specific complexities of weakly structured and unstructured data distribution, dynamics, interactions, and structures, which challenge the existing theoretical and commercial systems in mathematics, statistics, and computer science. Examples include the connections between gene combinations and physical and psychological consequences, between one’s personal traits or preferences in social media and one’s social, behavioral, attitudinal and interest attributes.
This results in a situation where learning big data is analogous to the ancient Indian parable of seven blind men encountering an elephant for the first time. Each touches a different part of the animal, so when the seven share their experiences, each has a completely different idea of what the whole animal must look like. Similarly, when confronted with a big data set, a data modeler or learner may only see a partial set or aspect, hence often only a partial story is told by a learner. Why does this happen? There are many reasons, one of which is the invisibility of sophisticated coupling relationships (coupling for short, see Definition 2.1) hidden between the heterogeneous parts that are ‘visible’ to blind people. They do not have the ability to recognize the visible and invisible couplings between parts to connect those heterogeneous parts to form a global picture as sighted people do. This is representative of certain major challenges of complex relations hidden in complex data (particularly referring here to data with complex couplings and/or mixed distributions, formats, types and variables, and unstructured and weakly structured data). Learning visible and especially invisible coupling relationships can complement and assist in understanding weakly structured and unstructured data.
In many cases, such inherent, locally visible but globally invisible (or vice versa) couplings are presented in a range of forms, structures, and layers and on diverse entities. Often individual learners cannot tell the whole story because they cannot identify such complex couplings. Effectively learning the widespread, varied, visible and invisible couplings is thus crucial for obtaining a true and complete picture of the underlying problem.
This is not a trivial task, however. The difficulty in learning complex couplings lies not only with invisible couplings; even visible couplings are often overlooked. Take the design of recommender algorithms as an example: our ability to recognize couplings is limited, even though these interactions and structures are embedded in applications such as social media networks. Researchers have only recently started to incorporate inherent couplings between items and between users into a recommender system (RS) (Jannach et al., 2010, Ricci et al., 2011), after a long period of focusing on rating-based exploration, even though the item-item couplings and user-user couplings (see Fig. 2) have always been intrinsic to these systems.
One reason for this is that visibility is relative to opportunity and capability. The same couplings are implicit to some people, while explicit to others. For instance, in social media recommendation, the friendship between Twitter users (Cheng) has only recently been recognized as enhancing social recommendation, yet it has always been a natural built-in feature of social media systems. There is a need to develop our ability to capture and convert as many invisible couplings as possible to visible couplings, and to effectively capture visible couplings in complex data.
In reviewing the existing literature, we unfortunately cannot find systematic methodologies and techniques in learning theories to address the above coupling issues. This raises a fundamental question: how much do we know about coupling? It also raises many other basic questions, including what couplings are, where they are, and in what forms they are present, which we need to address before we can think about how to capture and embed couplings in learning systems. Once these questions have been satisfactorily addressed, more issues follow, such as how to represent couplings, how to test whether and to what extent couplings exist in a dataset, how to incorporate them into learning models, and how to evaluate the difference they make once they are incorporated into learning systems. These challenges form the basis of the need to study coupling learning, a fundamental but undeveloped area in computer science, to address the intricate coupling relationships embedded in complex data and increasingly seen in information retrieval, data mining and machine learning in particular. This is crucial for big data analytics because most existing analytics and learning theories and systems have been built on the assumption that data is independent and identically distributed (IID), while big data is essentially non-IID (Cao, 2013b). Coupling is one critical aspect of non-IIDness (Cao, 2013b) (the other is heterogeneity or so-called personalization, which is not the main concern in this paper, although coupling may be heterogeneous and involve heterogeneity in data).
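The question of whether, and to what extent, couplings exist in a dataset can be made concrete in a very limited way. The sketch below is an illustration only, not a method from this paper: it scores the pairwise statistical dependence between two categorical attributes with normalized mutual information. The function names (`mutual_information`, `entropy`, `coupling_strength`) are hypothetical, and pairwise dependence is only one narrow form of the couplings discussed here.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


def entropy(xs):
    """Empirical Shannon entropy (in nats) of a categorical sequence."""
    n = len(xs)
    return -sum((c / n) * log(c / n) for c in Counter(xs).values())


def coupling_strength(xs, ys):
    """Mutual information normalized by the smaller entropy, in [0, 1].

    Assumption for illustration: 0 means the attributes look independent in
    the sample; 1 means the lower-entropy attribute is fully determined by
    the other. A constant attribute carries no information, so it scores 0.
    """
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(xs, ys) / min(hx, hy)
```

A score near zero on a sample does not show that no coupling exists; it only shows that this particular pairwise measure does not detect one, which is consistent with the point that many couplings stay invisible to any single learner.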
Learning the above characteristics of complex couplings in big data fundamentally challenges existing learning theories and systems, including pairwise coupling (Moreira and Mayoraz, 1998, Wu et al., 2004), statistical relational learning (Džeroski and Lavrač, 2001, Getoor and Taskar, 2007), dependency learning (Neville and Jensen, 2007, Wei et al., 2014), association learning (Ceglar and Roddick, 2006, Lu et al., 2000), correlation analysis (Hair et al., 2009, Székely et al., 2007), linkage analysis (Faloutsos et al., 2011, Miller et al., 2009), community analysis (Arenas et al., 2004, Girvan and Newman, 2002), social network analysis (Arenas et al., 2004, Girvan and Newman, 2002, Knoke and Yang, 2007, Wasserman and Faust, 1994), multivariate time series (Székely et al., 2007), causality analysis (Gujarati & Porter, 2009) and graph analysis (Cook & Holder, 2006). These approaches either essentially treat data as IID or only address specific forms or levels of couplings. No general and competent theories, frameworks, algorithms or tools are available to handle the coupling complexities discussed above.
The above observations motivate this work, namely to systematically state the coupling learning problem, which clearly involves interactive, unstructured and semi-structured data. The aim of this paper is multi-fold:
- High-level: build a conceptual system of coupling learning (Sections 2, 3 and 4) towards a generic and comprehensive understanding of the broad-based coupling relationships that exist in complex data and applications (especially in big data related business).
- Middle-level: illustrate how to advance classic problems to another generation by incorporating coupling learning into a specific existing scientific problem such as recommender systems (Section 5).
- Low-level: showcase specific examples in recommender systems to demonstrate how couplings can be managed in practice to improve analytic outcomes (Section 6).
The purpose of this paper is therefore not to specify one particular technique for learning a particular type of coupling (instead we provide citations to our related work for such discrete discussions), but to disclose the whole nature of the problem and build generic frameworks and examples to show possible ways to address the problem.
Accordingly, the organization of this work is as follows. Section 2 discusses the concept of coupling and major coupling relationships often addressed in current big data communities. Section 3 presents a high-level picture of coupling layers and forms appearing in complex data and applications. In Section 4, the issues of modeling and measuring couplings and the curse of couplings are introduced. An example of comprehensive couplings in recommender systems is discussed in Section 5, which presents a new theoretical framework for next-generation recommender systems. Two case studies are given in Section 6: one presents a coupled K-modes algorithm to identify items with strong coupling relationships, and the other uses couplings to improve Matrix Factorization-based recommendation. Section 7 explores the opportunities for learning couplings in data mining, text mining, information retrieval, and complex behavior analysis. The paper is concluded in Section 8.
Section snippets
Coupling: an important perspective
In this section, we discuss the concept of coupling, and the relevant work in statistics, mathematics and computer science. The following key concepts are used in this paper:
- Coupling: refers to any relationship or interaction that connects two or more aspects (which could be between inputs or between inputs and outputs).
- Aspect: a term broadly referring to entity, entity property (or characteristics such as variations), property value, context, learner or analytic model, learning objective
Ubiquitous couplings
In this section, we expand the above discussions on couplings, aiming to provide an overall picture of couplings widespread in comprehensive learning tasks.
Learning coupling
Learning coupling refers to understanding, formalizing and quantifying the coupling aspects, entities, interactions, layers, forms and strength. This includes extracting, discovering and estimating the interactions and relationships between learning components, including method, objective, task, level, dimension, process, measure and outcome, especially when the learning involves multiples of one of the above components, for instance, multi-methods or multi-tasks. Recently, the concept of
An example: couplings in recommendation
In recommender systems such as online shopping websites, online broadcasting systems, IPTV, and social media, there are different types of intrinsic interactions: user-user couplings, item-item couplings, and user-item couplings. A user’s behavior may influence his/her friends, which further affects the behaviors of others. Item attributes such as item price and quantity are often associated with each other. The price of one item may affect the price of another. An item may influence the sale
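The three coupling types named above can be pictured as three pieces of data kept side by side. The sketch below is a hypothetical illustration, not the framework of this paper: the class `CoupledRatings`, its fields, and the simple attribute-matching score used for item-item coupling are all assumptions chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class CoupledRatings:
    """Toy container for the three coupling types (hypothetical layout)."""

    # user-item couplings: observed ratings keyed by (user, item)
    ratings: dict = field(default_factory=dict)
    # user-user couplings: undirected friendship edges
    friends: set = field(default_factory=set)
    # item attributes, from which item-item couplings are derived
    item_attrs: dict = field(default_factory=dict)

    def add_friendship(self, u, v):
        # frozenset makes the edge order-independent
        self.friends.add(frozenset((u, v)))

    def are_friends(self, u, v):
        return frozenset((u, v)) in self.friends

    def item_coupling(self, i, j):
        """Share of common attribute names on which items i and j agree.

        Assumption: simple matching stands in for a real coupling measure.
        """
        a, b = self.item_attrs[i], self.item_attrs[j]
        shared = a.keys() & b.keys()
        if not shared:
            return 0.0
        return sum(a[k] == b[k] for k in shared) / len(shared)


data = CoupledRatings()
data.ratings[("alice", "book")] = 5
data.add_friendship("alice", "bob")
data.item_attrs["book"] = {"category": "media", "price_band": "low"}
data.item_attrs["film"] = {"category": "media", "price_band": "high"}
```

A rating-only recommender would read `ratings` alone; keeping `friends` and `item_attrs` next to it is what makes the user-user and item-item couplings available to a model at all.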
Case study: coupled recommender systems
The discussions about learning different types of couplings in recommender systems in Section 5 inspire us to incorporate couplings into recommendation algorithms. In this section, we discuss two preliminary studies in this direction. The first (for more details, see Yu, Wang, Gao, Cao, & Sun, 2013) considers coupled item recommendation, which incorporates couplings into items and creates a new coupled collaborative filtering (CCF) algorithm: Coupled K-modes (CK-modes). The second (for more
Discussions
Coupling learning is a very promising direction in learning complex relationships between objects, properties, processes, facts, events, and states of affairs which are beyond correlation, association and dependency. Complex couplings are a major characteristic of big data, and together with heterogeneity form the phenomenon of non-IIDness, namely non-independent and non-identically distributed characteristics. Non-IIDness greatly challenges the existing theories and systems in statistics, data
Conclusions
In the real world, diverse coupling relationships are embedded in every business and are associated with objects, properties, processes, events, and states of affairs. Such couplings may present characteristics that go far beyond the association, correlation and dependency relationships that usually concern the statistics, data mining and machine learning communities. Triggered by behavioral, economic, social, cultural, or other driving forces, they may be explicit vs. implicit, syntactic vs.
References (71)
In-depth behavior understanding and use: The behavior informatics approach
Information Science
(2010)
- Al Mamunur Rashid, S. K. L., Karypis, G., & Riedl, J. (2006). ClustKNN: A highly scalable hybrid model-& memory-based...
- et al.
Community analysis in social networks
The European Physical Journal B-Condensed Matter and Complex Systems
(2004)
- Breese, J., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering....
- Cao, L., Ou, Y., Yu, P. S., & Wei, G. (2010). Detecting abnormal coupled sequences and sequence changes in group-based...
Combined mining: Analyzing object and pattern relations for discovering and constructing complex yet actionable patterns
WIREs Data Mining and Knowledge Discovery
(2013)
Non-IIDness learning in behavioral and social data
The Computer Journal
(2013)
- Cao, L., Luo, D., & Zhang, C. (2009). Ubiquitous intelligence in agent mining. In Proceedings of ADMI 2009 (pp....
- Cao, W., Cao, L., & Song, Y. (2013). Coupled market behavior based financial crisis detection. In...
- et al.
Coupled behavior analysis with applications
IEEE Transactions on Knowledge and Data Engineering
(2012)
Behavior computing: Modeling, analysis, mining and decision
Domain driven data mining
Combined mining: Discovering informative knowledge in complex data
IEEE Transactions SMC Part B
Mining impact-targeted activity patterns in imbalanced data
IEEE Transactions on Knowledge and Data Engineering
Association mining
ACM Computing Surveys
Latent semantic analysis
Annual Review of Information Science and Technology
Relational data mining
Link mining: Models, algorithms and applications
Introduction to statistical relational learning
Community structure in social and biological networks
Proceedings of the National Academy of Sciences
Causality in economics: The Granger causality test
Multivariate data analysis
The elements of statistical learning: Data mining, inference, and prediction
Statistics and causal inference
Journal of the American Statistical Association
Recommender systems an introduction
Fuzzy set theory: Foundations and applications
Social network analysis