About this book

Social media data contains our communication and online sharing, mirroring our daily life. This book explores how such big data can be used and what can be discovered from it:

Basic knowledge of social media analytics (data and challenges)

Clustering as a fundamental technique for unsupervised knowledge discovery and data mining

A class of neural-inspired algorithms, based on adaptive resonance theory (ART), that tackle the challenges of clustering big social media data

Step-by-step practice in developing unsupervised machine learning algorithms for real-world applications in the social media domain

Adaptive Resonance Theory in Social Media Data Clustering builds on a fundamental breakthrough in cognitive and neural theory, adaptive resonance theory, which simulates how the brain processes information to perform memory, learning, recognition, and prediction.

It presents initial work on mathematically demonstrating ART’s learning mechanisms in clustering, and shows how to extend the base ART model to handle the complexity and characteristics of social media data and to perform associative analytical tasks.

The book covers both cutting-edge research and real-world practice in machine learning and social media analytics. If you want answers to the following questions, this book is for you:

How to process big streams of multimedia data?

How to analyze social networks with heterogeneous data?

How to understand a user’s interests by learning from online posts and behaviors?

How to create a personalized search engine by automatically indexing and searching multimodal information resources?

Table of contents

Theories

Chapter 1. Introduction

Abstract
The last decade has witnessed how social media in the era of Web 2.0 has reshaped the way people communicate, interact, and entertain themselves in daily life, and has incubated the prosperity of various user-centric platforms, such as social networking, question answering, massive open online course (MOOC), and e-commerce platforms. The rich user-generated multimedia data available on the web has transformed traditional approaches to multimedia research and has led to numerous emerging topics in human-centric analytics and services, such as user profiling, social network mining, crowd behavior analysis, and personalized recommendation. Clustering, as an important tool for mining information groups and their in-group shared characteristics, has been widely investigated for knowledge discovery and data mining tasks in social media analytics. However, social media data has numerous characteristics that challenge traditional clustering techniques, such as its massive volume, diverse content, heterogeneous media sources, noisy user-generated content, and generation as a stream. As a result, the clustering algorithms used in the literature on social media applications are usually variants of a few traditional algorithms, such as K-means, non-negative matrix factorization (NMF), and graph clustering. Developing a fast and robust clustering algorithm for social media analytics is still an open problem. This chapter gives a bird’s eye view of clustering in social media analytics, in terms of data characteristics, challenges and issues, and a class of novel approaches based on adaptive resonance theory (ART).
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II

Chapter 2. Clustering and Its Extensions in the Social Media Domain

Abstract
This chapter summarizes existing clustering and related approaches for the challenges identified in Sect. 1.2 and presents the key branches of social media mining applications where clustering holds potential. Specifically, several important types of clustering algorithms are first illustrated, including clustering, semi-supervised clustering, heterogeneous data co-clustering, and online clustering. Subsequently, Sect. 2.5 reviews existing techniques that automatically decide the value of the predefined number of clusters (required by most clustering algorithms) and highlights the clustering algorithms that do not require such a parameter. This further illustrates the challenge of input parameter sensitivity when clustering algorithms are applied to large and complex social media data. Furthermore, Sect. 2.6 offers a survey of several main applications of clustering algorithms to social media mining tasks, including web image organization, multi-modal information fusion, user community detection, user sentiment analysis, social event detection, community question answering, social media data indexing and retrieval, and recommender systems in social networks.
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II

Chapter 3. Adaptive Resonance Theory (ART) for Social Media Analytics

Abstract
This chapter presents the ART-based clustering algorithms for social media analytics in detail. Sections 3.1 and 3.2 introduce Fuzzy ART and its clustering mechanisms, respectively, which provide a deep understanding of the base model that is used and extended to handle the challenges of social media clustering. Important concepts, such as the vigilance region (VR) and its properties, are explained and proven. Subsequently, Sects. 3.3 to 3.7 illustrate five types of ART variants, each of which addresses the challenges of one social media analytical scenario: automated parameter adaptation, user preference incorporation, short text clustering, heterogeneous data co-clustering, and online streaming data indexing. The content of this chapter is based on several prior studies, including Probabilistic ART [15] (©2012 IEEE. Reprinted, with permission, from [15]), Generalized Heterogeneous Fusion ART [20] (©2014 IEEE. Reprinted, with permission, from [20]), Vigilance Adaptation ART [19] (©2016 IEEE. Reprinted, with permission, from [19]), and Online Multimodal Co-indexing ART [17] (http://dx.doi.org/10.1145/2671188.2749362).
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II
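The base model discussed in Sects. 3.1 and 3.2 is Fuzzy ART. The following is a minimal sketch of one learning step, not the book's own code. It assumes the standard textbook Fuzzy ART formulation: complement coding, a choice function, a match function checked against the vigilance ρ, and the weight update. The class name `FuzzyART` and the parameter defaults are illustrative choices.

```python
import numpy as np


def complement_code(x):
    """Complement coding: map x in [0, 1]^d to [x, 1 - x]."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


class FuzzyART:
    """Minimal Fuzzy ART sketch (textbook formulation, not the book's code)."""

    def __init__(self, rho=0.75, alpha=0.01, beta=1.0):
        self.rho = rho      # vigilance: minimum match required for resonance
        self.alpha = alpha  # choice parameter (small positive constant)
        self.beta = beta    # learning rate; 1.0 means fast learning
        self.weights = []   # one weight vector (cluster prototype) per cluster

    def learn(self, x):
        """Present one input in [0, 1]^d and return the index of its cluster."""
        I = complement_code(x)
        if not self.weights:
            self.weights.append(I.copy())
            return 0
        # |I ^ w_j|: fuzzy AND is the element-wise minimum, |.| is the L1 norm
        overlaps = [np.minimum(I, w).sum() for w in self.weights]
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|)
        choice = [o / (self.alpha + w.sum()) for o, w in zip(overlaps, self.weights)]
        # Test clusters from the highest choice value downwards
        for j in np.argsort(choice)[::-1]:
            match = overlaps[j] / I.sum()  # match function M_j = |I ^ w_j| / |I|
            if match >= self.rho:          # resonance: vigilance criterion met
                w = self.weights[j]
                self.weights[j] = self.beta * np.minimum(I, w) + (1 - self.beta) * w
                return int(j)
        # No cluster resonates: the input founds a new cluster
        self.weights.append(I.copy())
        return len(self.weights) - 1
```

Take `rho=0.8` as an example. The inputs `[0.1, 0.1]` and `[0.12, 0.1]` resonate with the same cluster, while `[0.9, 0.9]` fails the vigilance test and opens a second cluster. The number of clusters is therefore not fixed in advance but emerges from the vigilance test.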

Applications

Chapter 4. Personalized Web Image Organization

Abstract
Due to the semantic gap problem, i.e. the visual content of an image may not represent its semantics well, existing efforts on web image organization usually reduce this task to clustering the surrounding text. However, because the surrounding text is usually short and the words therein usually appear only once, existing text clustering algorithms can hardly exploit statistical information for image representation, and may suffer degraded performance and higher computational cost from learning on noisy tags. This chapter presents the use of Probabilistic ART with the user preference architecture, introduced in Sects. 3.5 and 3.4, for personalized web image organization. This fused algorithm, named Probabilistic Fusion ART (PF-ART), groups images of similar semantics together and simultaneously mines the key tags/topics of individual clusters. Moreover, it performs semi-supervised learning using user-provided tags for images, giving users direct control over the generated clusters. An agglomerative merging strategy is further used to organize the clusters into a hierarchy, which is a multi-branch tree rather than the binary tree generated by traditional hierarchical clustering algorithms. The entire two-step algorithm, called Personalized Hierarchical Theme-based Clustering (PHTC), performs tag-based web image organization. Two large-scale real-world web image collections, namely the NUS-WIDE and Flickr datasets, are used to evaluate PHTC and compare it with existing algorithms in terms of clustering performance and time cost. The content of this chapter is summarized and extended from the prior study [17] (©2012 IEEE. Reprinted, with permission, from [17]).
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II
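PF-ART mines the key tags of each cluster. The following is a simplified sketch of that idea. It assumes that a cluster's text prototype is simply the running frequency of each tag among the images assigned to it. The class `TagPrototype` is hypothetical, and PF-ART's choice, match, and user-preference steps are omitted.

```python
import numpy as np


class TagPrototype:
    """Hypothetical cluster prototype: probability of each tag occurring
    among the cluster's images (running mean of binary tag vectors)."""

    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.probs = np.zeros(len(self.vocab))  # P(tag | cluster), assumed form
        self.size = 0                           # number of images in the cluster

    def update(self, tag_vector):
        """Add one image's binary tag vector: p <- (L * p + x) / (L + 1)."""
        x = np.asarray(tag_vector, dtype=float)
        self.probs = (self.size * self.probs + x) / (self.size + 1)
        self.size += 1

    def key_tags(self, k=2):
        """Return the k tags with the highest occurrence probability."""
        order = np.argsort(-self.probs, kind="stable")[:k]
        return [self.vocab[i] for i in order]
```

Each update is a running mean, so the prototype can be maintained in one pass as images arrive. Reading off the highest probabilities then gives a cluster's key tags, even when each individual image carries only a few tags.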

Chapter 5. Socially-Enriched Multimedia Data Co-clustering

Abstract
Heterogeneous data co-clustering is a commonly used technique for tapping the rich meta-information of multimedia web documents, including category, annotation, and description, for associative discovery. However, most co-clustering methods proposed for heterogeneous data do not consider the representation problem of short and noisy text, and their performance is limited by the empirical weighting of multimodal features. This chapter explains how to use Generalized Heterogeneous Fusion Adaptive Resonance Theory (GHF-ART) to cluster large-scale web multimedia documents. Specifically, GHF-ART is designed to handle multimedia data with an arbitrarily rich level of meta-information. To handle short and noisy text, GHF-ART employs the representation and learning methods of PF-ART described in Sect. 3.5, which identify key tags for cluster prototype modeling by learning the probability distribution of tag occurrences in clusters. More importantly, GHF-ART incorporates an adaptive method for the effective fusion of multimodal features, which weights the features of multiple data sources by incrementally measuring the importance of each feature modality through intra-cluster scatter. Extensive experiments on two web image datasets and one text document set show that GHF-ART achieves significantly better clustering performance and is much faster than many existing state-of-the-art algorithms. The content of this chapter is summarized and extended from [12] (©2014 IEEE. Reprinted, with permission, from [12]), and the Python code of GHF-ART is available at https://github.com/Lei-Meng/GHF-ART.
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II
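GHF-ART weights feature modalities by how tightly each modality's features agree within clusters. The following sketch illustrates one way to turn intra-cluster scatter into channel weights, under three assumptions:

1. Each cluster has a per-channel scatter D: the mean L1 deviation from the cluster prototype, divided by the prototype's L1 norm.
2. A channel's robustness is exp(-mean D), averaged over clusters.
3. The robustness values are normalized into weights.

This is a batch computation using mean prototypes, not GHF-ART's incremental update. The name `robustness_weights` is hypothetical.

```python
import numpy as np


def robustness_weights(clusters):
    """Hypothetical batch sketch of scatter-based channel weighting.

    clusters: list of clusters; each cluster is a list of channels, and each
    channel is an (n_members, dim) array of that modality's feature vectors.
    Assumes every prototype has a non-zero L1 norm.
    """
    n_channels = len(clusters[0])
    robustness = np.zeros(n_channels)
    for k in range(n_channels):
        scatters = []
        for members in clusters:
            X = np.asarray(members[k], dtype=float)
            proto = X.mean(axis=0)  # assumption: prototype = mean of members
            # Scatter D_j^k: mean L1 deviation from the prototype, relative to |prototype|
            scatters.append(np.abs(X - proto).sum(axis=1).mean() / np.abs(proto).sum())
        # Robustness R^k = exp(-mean_j D_j^k): lower scatter gives higher robustness
        robustness[k] = np.exp(-np.mean(scatters))
    # Normalize so the channel weights sum to 1
    return robustness / robustness.sum()
```

Consider two clusters in which channel 0's members are identical and channel 1's members disagree. Channel 0 then has zero scatter and receives a weight of 1/(1 + e^-1) ≈ 0.73, while the noisier channel 1 receives the remaining ≈ 0.27.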

Chapter 6. Community Discovery in Heterogeneous Social Networks

Abstract
Discovering social communities of web users through clustering analysis of heterogeneous link associations has drawn much attention. However, existing approaches typically require the number of clusters a priori, do not address the weighting problem of fusing heterogeneous types of links, and have a heavy computational cost. This chapter studies the commonly used social links of users and explores the feasibility of the heterogeneous data co-clustering algorithm GHF-ART, introduced in Sect. 3.6, for discovering user communities in social networks. In contrast to existing algorithms proposed for this task, GHF-ART performs real-time pattern matching and one-pass learning, which keeps its computational cost low. Because a vigilance parameter restrains the intra-cluster similarity, GHF-ART does not need the number of clusters a priori. To better fuse multiple types of links, GHF-ART employs a weighting algorithm, called the robustness measure (RM), to incrementally assess the importance of each feature channel for representing data objects of the same class. Extensive experiments on two social network datasets analyze the performance of GHF-ART, and the promising results of comparisons with existing methods demonstrate its effectiveness and efficiency. The content of this chapter is summarized and extended from [11] (Copyright ©2014 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved).
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II
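In GHF-ART, a vigilance parameter rather than a preset number of clusters controls intra-cluster similarity. The following hedged sketch is written in the general Fusion ART style, and the book's exact formulas may differ. A user described by several link channels is matched against existing communities using a γ-weighted choice function and a per-channel vigilance test. The function `ghf_art_select` and its arguments are hypothetical names.

```python
import numpy as np


def ghf_art_select(channels, prototypes, gammas, rhos, alpha=0.01):
    """Hypothetical sketch of multi-channel cluster selection.

    channels:   K input vectors for one user (one per link type); each is
                assumed to have a non-zero L1 norm
    prototypes: existing communities, each a list of K weight vectors
    gammas:     per-channel weights (e.g. from a robustness measure)
    rhos:       per-channel vigilance values
    Returns the index of the resonating community, or None if none passes.
    """
    channels = [np.asarray(x, dtype=float) for x in channels]
    prototypes = [[np.asarray(w, dtype=float) for w in p] for p in prototypes]

    def overlap(x, w):
        return np.minimum(x, w).sum()  # |x ^ w| with fuzzy AND and L1 norm

    # Weighted choice: T_j = sum_k gamma^k |x^k ^ w_j^k| / (alpha + |w_j^k|)
    choices = [
        sum(g * overlap(x, w) / (alpha + w.sum())
            for g, x, w in zip(gammas, channels, proto))
        for proto in prototypes
    ]
    for j in np.argsort(choices)[::-1]:
        # Resonance requires every channel to pass its own vigilance test
        if all(overlap(x, w) / x.sum() >= rho
               for x, w, rho in zip(channels, prototypes[j], rhos)):
            return int(j)
    return None
```

A return value of `None` means no community passes every vigilance test, so a new community would be created for the user. In this way the number of communities grows from the data rather than being fixed beforehand.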

Chapter 7. Online Multimodal Co-indexing and Retrieval of Social Media Data

Abstract
Effective indexing of social media data is key to searching for information on the social Web. However, the characteristics of social media data make this a challenging task. The first challenge is their large-scale and streaming nature, which requires the indexing algorithm to efficiently update the indexing structure as data streams arrive. The second challenge is utilizing the rich meta-information of social media data for a better evaluation of the similarity between data objects and for a more semantically meaningful indexing of the data, which may allow users to search using whichever types of queries they prefer. Existing approaches based on either matrix operations or hashing usually cannot update the indexing base online to encode incoming data streams, and they have difficulty handling noisy data. This chapter presents a study on using Online Multimodal Co-indexing Adaptive Resonance Theory (OMC-ART) for effective and efficient indexing and retrieval of social media data. More specifically, two types of social media data are considered: (1) weakly supervised image data, which is associated with captions, tags, and descriptions given by users; and (2) e-commerce product data, which includes product images, titles, descriptions, and user comments. These scenarios make this study related to multimodal web image indexing and retrieval. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART performs online learning of sequential data. Second, instead of a plain indexing structure, OMC-ART builds a two-layer one, in which the first layer co-indexes the images by the key visual and textual features based on the generalized distributions of the clusters they belong to, while the second layer co-indexes the data objects by their own feature distributions. Third, OMC-ART enables flexible multimodal searching using visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to traverse the whole indexing system when only a limited number of images need to be retrieved. Experiments on two publicly accessible image datasets and a real-world e-commerce dataset demonstrate the efficiency and effectiveness of OMC-ART. The content of this chapter is summarized and extended from [13] (https://doi.org/10.1145/2671188.2749362), and the Python code of OMC-ART, with examples of building an e-commerce product search engine, is available at https://github.com/Lei-Meng/OMC-ART-Build-a-toy-online-search-engine-.
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II
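The OMC-ART abstract describes a two-layer index and a ranking procedure that can stop before scanning the whole index. The following simplified sketch illustrates that idea under several assumptions:

- Cluster assignments are supplied by the caller, standing in for OMC-ART's clustering.
- Each item is a single feature vector rather than separate visual and textual features.
- Similarity is measured by cosine similarity.
- A prototype is the running mean of its cluster's members.

`TwoLayerIndex` is a hypothetical class.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


class TwoLayerIndex:
    """Hypothetical two-layer index: layer 1 holds cluster prototypes,
    layer 2 holds the items assigned to each cluster."""

    def __init__(self):
        self.prototypes = []  # layer 1: one feature vector per cluster
        self.members = []     # layer 2: list of (item_id, features) per cluster

    def add(self, item_id, x, cluster):
        """Index one item; cluster must be an existing index or the next new one."""
        x = np.asarray(x, dtype=float)
        if cluster == len(self.prototypes):  # open a new cluster
            self.prototypes.append(x.copy())
            self.members.append([])
        self.members[cluster].append((item_id, x))
        # Online update of the prototype as the running mean of its members
        n = len(self.members[cluster])
        self.prototypes[cluster] += (x - self.prototypes[cluster]) / n

    def search(self, query, k):
        """Return up to k item ids, visiting clusters best-first."""
        q = np.asarray(query, dtype=float)
        order = sorted(range(len(self.prototypes)),
                       key=lambda j: cosine(q, self.prototypes[j]), reverse=True)
        results = []
        for j in order:
            ranked = sorted(self.members[j], key=lambda m: cosine(q, m[1]), reverse=True)
            results.extend(item_id for item_id, _ in ranked)
            if len(results) >= k:  # stop early: remaining clusters are not scanned
                break
        return results[:k]
```

Each new item updates only its own cluster's prototype, which keeps indexing of a stream cheap. A query ranks the first layer, then scans items cluster by cluster until `k` results are collected.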

Chapter 8. Concluding Remarks

Abstract
This chapter summarizes the major contributions of this book and discusses their possible roles and requirements in future scenarios. Section 8.1 follows the book structure to revisit the key contributions of this book in both theory and applications. The developed algorithms, such as the VA-ARTs for hyperparameter adaptation and GHF-ART for multimedia representation and fusion, and the four applications, such as clustering and retrieving socially enriched multimedia data, are summarized in one paragraph and three paragraphs, respectively. In Sect. 8.2, the roles of the proposed ART-embodied algorithms in social media clustering tasks are highlighted, and their possible evolution using state-of-the-art representation learning techniques, to fit increasingly rich social media data and demands, is discussed.
Lei Meng, Ah-Hwee Tan, Donald C. Wunsch II
