
About this book

This book constitutes the thoroughly refereed papers of the Third National Conference of Social Media Processing, SMP 2014, held in Beijing, China, in November 2014. The 14 revised full papers and 9 short papers presented were carefully reviewed and selected from 101 submissions. The papers focus on the following topics: mining social media and applications; natural language processing; data mining; information retrieval; emergent social media processing problems.

Table of Contents

Frontmatter

Inferring Correspondences from Multiple Sources for Microblog User Tags

Abstract
Some microblog services encourage users to annotate themselves with multiple tags, indicating their attributes and interests. User tags play an important role in personalized recommendation and information retrieval. In order to better understand the semantics of user tags, we propose the Tag Correspondence Model (TCM) to identify complex correspondences of tags from the rich context of microblog users. In TCM, we divide the context of a microblog user into various sources (such as short messages, user profile, and neighbors). With a collection of users with annotated tags, TCM can automatically learn the correspondences of user tags from the multiple sources. With the learned correspondences, we are able to interpret implicit semantics of tags. Moreover, for the users who have not annotated any tags, TCM can suggest tags according to users’ context information. Extensive experiments on a real-world dataset demonstrate that our method can efficiently identify correspondences of tags, which may eventually represent semantic meanings of tags.
Cunchao Tu, Zhiyuan Liu, Maosong Sun

Mining Intention-Related Products on Online Q&A Community

Abstract
User generated content on social media has attracted much attention from service/product providers, as it contains plenty of potential commercial opportunities. However, previous work mainly focuses on user Consumption Intention (CI) identification, and little effort has been spent on mining intention-related products. In this paper, we propose a novel approach to mining intention-related products in online Question & Answer (Q&A) communities. Using question-answer pairs as the data source, we first automatically extract candidate products with a dependency parser. Then, by means of a collocation extraction model, we identify the real intention-related products from the candidate set. The experimental results on our carefully constructed evaluation dataset show that our approach achieves better performance than two natural baseline methods. Our method is general enough for domain adaptation.
Junwen Duan, Xiao Ding, Ting Liu

Tag Expansion Using Friendship Information: Services for Picking-a-crowd for Crowdsourcing

Abstract
To address self-tagging concerns, some social network websites, such as LinkedIn and Sina Weibo, allow users to tag themselves as part of their profiles; however, due to privacy or other unknown reasons, most users add just a few tags. Self-tag sparsity refers to the problem of low recall obtained when searching for people on systems based on user profiles. In this paper, we use not only users’ self-tags but also their friend relationships (which are often not hidden) to expand the tag list and measure the effectiveness of different types of friendship links and their self-tags. Experimental results show that friendship information (friendship links and profiles) can effectively improve the performance of tag expansion, especially for common users who have limited followers.
Bin Liang, Yiqun Liu, Min Zhang, Shaoping Ma, Liyun Ru, Kuo Zhang

Predicting the Popularity of Messages on Micro-blog Services

Abstract
Micro-blogging is one of the most popular social media services, on which users can publish new messages (usually called tweets), submit comments and retweet their followees’ messages. It is retweeting behavior that makes information diffuse faster. Why are some tweets more popular than others? Will a message become popular in the future? These questions have attracted great attention. In this paper, we focus on predicting the popularity of a tweet on Weibo, a famous micro-blogging service in China. This is important for many tasks such as breaking news detection, personalized message recommendation, advertisement placement and viral marketing. We propose a novel approach to predict the retweet count of a tweet by finding the top-k similar tweets published by the same author. To find the top-k similar tweets, we consider both content similarity and temporal similarity. We also integrate our method into a classical classification method and show that our method can improve the results significantly.
Yang Li, Yiheng Chen, Ting Liu, Wenchao Deng
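
The top-k idea described in this abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the bag-of-words cosine measure, the hour-of-day `temporal_similarity` function, the mixing weight `alpha` and the plain averaging of retweet counts are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def temporal_similarity(hour_a, hour_b):
    """Hypothetical measure: 1.0 for the same posting hour, 0.0 for 12 hours apart."""
    diff = abs(hour_a - hour_b) % 24
    diff = min(diff, 24 - diff)
    return 1.0 - diff / 12.0


def predict_retweets(new_tweet, history, k=2, alpha=0.5):
    """Average the retweet counts of the k most similar past tweets by the same author."""
    new_bow = Counter(new_tweet["text"].split())
    scored = []
    for past in history:
        content = cosine(new_bow, Counter(past["text"].split()))
        temporal = temporal_similarity(new_tweet["hour"], past["hour"])
        # Assumed combination: a weighted sum of content and temporal similarity.
        scored.append((alpha * content + (1 - alpha) * temporal, past["retweets"]))
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    return sum(count for _, count in top) / len(top) if top else 0.0
```

Here `history` is assumed to be a list of dictionaries with `text`, `hour` and `retweets` keys for one author's earlier tweets; a real system would likely weight neighbours by similarity and use richer temporal features.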

Detecting Anomalies in Microblogging via Nonnegative Matrix Tri-Factorization

Abstract
As anomalous users become increasingly sophisticated, it is difficult to detect anomalous users and messages in microblogging. Most current studies attempt to detect anomalous users or messages individually. In this paper, we propose a co-clustering algorithm based on nonnegative matrix tri-factorization to detect anomalous users and messages simultaneously. A bipartite graph between users and messages is built to model the homogeneous and heterogeneous interactions, and homogeneous relations are used as constraints to improve the accuracy of the heterogeneous co-clustering algorithm. The experimental results show that the proposed algorithm can detect anomalous users and messages with high accuracy on a Sina Weibo dataset.
Guowei Shen, Wu Yang, Wei Wang, Miao Yu, Guozhong Dong

Expanding Native Training Data for Implicit Discourse Relation Classification

Abstract
Linguistically informed features are provably useful in classifying implicit discourse relations among adjacent text spans. However, state-of-the-art methods in this area suffer from either a sparse natively implicit relation corpus or a counter-intuitive artificially implicit one, and consequently from either insufficient or distorted training when automatically learning discriminative features. To overcome this problem, this paper proposes a semantic frame based vector model for unsupervised acquisition of semantically and relationally parallel data, aiming to enlarge the natively implicit relation corpus so as to optimize the training effect. Experiments on PDTB 2.0 show that using the acquired parallel corpus gives statistically significant improvements over using the prototypical corpus.
Yu Hong, Shanshan Zhu, Weirong Yan, Jianmin Yao, Qiaoming Zhu, Guodong Zhou

Microblog Sentiment Analysis with Emoticon Space Model

Abstract
Emoticons have been widely employed to express different types of moods, emotions and feelings in microblog environments. They are therefore regarded as one of the most important signals for microblog sentiment analysis. Most existing works use several emoticons that convey clear emotional meanings as noisy sentiment labels or similar sentiment indicators. However, in practical microblog environments, tens or even hundreds of emoticons are frequently adopted and all emoticons have their own unique emotional signals. Besides, a considerable number of emoticons do not have clear emotional meanings. An improved sentiment analysis model should not overlook these phenomena. Instead of manually assigning sentiment labels to several emoticons that convey relatively clear meanings, we propose the emoticon space model (ESM) that leverages more emoticons to construct word representations from a massive amount of unlabeled data. By projecting words and microblog posts into an emoticon space, the proposed model helps identify subjectivity, polarity and emotion in microblog environments. The experimental results for a public microblog benchmark corpus (NLP&CC 2013) indicate that the ESM effectively leverages emoticon signals and outperforms previous state-of-the-art strategies and benchmark best runs.
Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, Shaoping Ma
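
One way to picture the projection into an emoticon space mentioned in this abstract is sketched below. It assumes, purely for illustration, that words and emoticons already have vectors in a shared embedding space, that a word's coordinates are its cosine similarities to each emoticon, and that a post is the average of its words' coordinates; the abstract does not specify these details.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project_word(word_vec, emoticon_vecs):
    """Coordinates of a word: one similarity value per emoticon (assumed scheme)."""
    return [cosine(word_vec, e) for e in emoticon_vecs]


def project_post(tokens, embeddings, emoticon_vecs):
    """Average the emoticon-space coordinates of the known words in a post."""
    coords = [
        project_word(embeddings[t], emoticon_vecs) for t in tokens if t in embeddings
    ]
    if not coords:
        return [0.0] * len(emoticon_vecs)
    return [sum(c[i] for c in coords) / len(coords) for i in range(len(emoticon_vecs))]
```

The resulting post vectors could then be fed to any standard classifier for subjectivity, polarity or emotion; the names `embeddings` and `emoticon_vecs` are hypothetical inputs, not part of the published model.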

Predicting Health Care Risk with Big Data Drawn from Clinical Physiological Parameters

Abstract
Fatty liver seriously afflicts patients and endangers human health, with a high possibility of deteriorating into cirrhosis and liver cancer, which motivates researchers to identify its causes and potential influential factors. In this paper, we study the problem of detecting the potential influential factors in workplaces and their contributions to the morbidity. To this end, gender and age, retirement status and department information are chosen as three potential influential factors in workplaces. By analyzing those factors with demographics, Propensity Score Matching and classic classifier models, we mine the relationship between the workplace factors and morbidity. This finding opens a new direction for discussing the causes of fatty liver, a discussion which has so far focused on daily diet and lifestyle.
Honghao Wei, Yang Yang, Huan Chen, Bin Xu, Jian Li, Miao Jiang, Aiping Lu

Gender Identification on Social Media

Abstract
Accurate identification of hidden demographic attributes from social media is very useful for advertising, personalized recommendation, etc. We investigate the effect of two different classification models for the gender identification problem over different attributes of Sina Weibo users. To improve the accuracy of the classification models, we propose a novel feature selection algorithm and a retrained multiattribute model. Experimental results show that our approach achieves an accuracy of 89.01%, which is better than any previous work on this problem.
Xiaofei Sun, Xiao Ding, Ting Liu

A Hybrid Method of Domain Lexicon Construction for Opinion Targets Extraction Using Syntax and Semantics

Abstract
Because opinion target extraction from Chinese microblogs plays an important role in opinion mining, there has been significant progress in this area recently, especially with the CRF-based method. However, this method only takes lexical features into consideration and does not exploit the implied semantic and syntactic knowledge. We propose a new approach which incorporates a domain lexicon with groups of features using syntax and semantics. The approach acquires the domain lexicon through a novel method named PDSP. We then combine the domain lexicon with the opinion targets extracted by a CRF with groups of features for opinion target extraction. Experimental results on the COAE2014 dataset show that this approach notably outperforms other baselines for opinion target extraction.
Chun Liao, Chong Feng, Sen Yang, Heyan Huang

Online Social Network Profile Linkage Based on Cost-Sensitive Feature Acquisition

Abstract
Billions of people spend their virtual life time on hundreds of social networking sites for different social needs. Each social footprint of a person in a particular social networking site reflects some special aspects of himself. To adequately investigate a user’s preference for applications such as recommendation and executive search, we need to connect up all these aspects to generate a comprehensive profile of the identity. Profile linkage provides an effective solution to identify the same identity’s profiles from different social networks.
With various types of resources, comparing profiles may require plenty of expensive and time-consuming features such as avatars. To boost the online social network profile linkage solution, we propose a cost-sensitive approach that only acquires these expensive and time-consuming features when needed. By evaluating on the real-world datasets from Twitter and LinkedIn, our approach performs at over 85% F1-measure and has the ability to prune over 80% of the unnecessary feature acquisitions, at a marginal cost of 10% performance loss.
Haochen Zhang, Minyen Kan, Yiqun Liu, Shaoping Ma
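
The principle of acquiring expensive features only when needed can be illustrated with a two-stage decision. The sketch below is an assumption-laden toy, not the paper's method: the cheap feature is a username string similarity, the thresholds `low` and `high` are arbitrary, and `expensive_feature` stands in for any costly comparison such as avatar matching.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Cheap feature: string similarity between two usernames, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_profiles(p, q, expensive_feature, low=0.3, high=0.9):
    """Return (is_same_person, expensive_feature_was_acquired).

    The costly feature is requested only when the cheap score is inconclusive.
    """
    cheap = name_similarity(p["username"], q["username"])
    if cheap >= high:
        return True, False   # confident match, skip the expensive feature
    if cheap <= low:
        return False, False  # confident non-match, skip the expensive feature
    costly = expensive_feature(p, q)  # e.g. a hypothetical avatar comparison
    return (cheap + costly) / 2 >= 0.5, True
```

Counting how often the second element of the result is `False` gives the share of pruned acquisitions, which is the kind of saving the abstract reports.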

Information Diffusion and Influence Measurement Based on Interaction in Microblogging

Abstract
In microblogging, user interaction is the main factor that drives rapid information diffusion. Based on the user interaction that occurs during information diffusion, this paper proposes a directed tree model that considers the history, type and frequency of interaction. A user interaction matrix is used to describe the interactions between pairs of users. A directed diffusion tree is generated by sparsifying the interaction graph. The edges of the directed diffusion tree are used to measure information influence and identify spam in microblogging. Experimental results show that the directed tree model can describe information diffusion, measure influence more accurately and identify spam in the dataset more effectively.
Miao Yu, Wu Yang, Wei Wang, Guowei Shen, Guozhong Dong

Inferring Emotions from Social Images Leveraging Influence Analysis

Abstract
Nowadays thriving image-based social networks such as Flickr and Instagram are attracting more and more people’s attention. When it comes to inferring emotions from images, previous research mainly focuses on the extraction of effective image features. However, in the context of social networks, the user’s emotional state is no longer isolated, but influenced by her friends. In this paper, we aim to infer emotions from social images leveraging influence analysis. We first explore several interesting psychological phenomena on the world’s largest image-sharing website Flickr. Then we summarize these patterns into formalized factor functions. Introducing these factors into modeling, we propose a partially-labeled factor graph model to infer the emotions of social images. The experimental results show a 23.71% improvement compared with the Naïve Bayesian method and a 21.83% improvement compared with the Support Vector Machine (SVM) method under the evaluation of F1-Measure, which validates the effectiveness of our method.
Boya Wu, Jia Jia, Xiaohui Wang, Yang Yang, Lianhong Cai

Emotion Evolution under Entrainment in Social Media

Abstract
Emotion entrainment refers to the phenomenon that people gradually synchronize to others’ emotion states through social interactions. Previous studies mainly focus on conducting laboratory experiments or small-scale offline surveys. Large-scale empirical studies on real-world emotion entrainment among individuals are still to be explored. In particular, the determinants that influence this process are not clear. Also, how emotion evolves among people in a large-scale population is still unknown. In this study, we attempt to conduct a large-scale empirical analysis on emotion entrainment based on online social media information. For this purpose, we develop a model-free framework to measure entrainment strength among people. Experimental results indicate that interaction partners with strong reciprocal entrainment tend to assume similar emotion states, and negative emotion is more empathetic in an intimate relationship. In particular, when the relationship is balanced, users are more emotionally similar to each other.
Saike He, Xiaolong Zheng, Daniel Zeng, Bo Xu, Guanhua Tian, Hongwei Hao

Topic Related Opinion Integration for Users of Social Media

Abstract
Social media, such as Twitter, have become a valuable source for mining opinions of users about all kinds of topics. In this paper, we investigate how to automatically integrate topic related opinions expressed by a user in User-Generated Content (UGC). We propose a general subjectivity model by combining topics and fine-grained opinions towards each topic, and design an efficient algorithm to establish the model. We demonstrate the utility of our model in the opinion prediction problem and verify the effectiveness of our model qualitatively and quantitatively in a series of experiments on real Twitter data. Results show that the proposed model is effective and can generate consistent integrated opinion summaries for users. Furthermore, the proposed model is more suitable for the social media context, and thus can reach better performance in an opinion prediction task.
Songxian Xie, Jintao Tang, Ting Wang

Relationship between Reviews Polarities, Helpfulness, Stars and Sales Rankings of Products: A Case Study in Books

Abstract
To help customers, especially customers without an explicit purchasing motivation, to obtain valuable information about products via E-commerce websites, it is useful to predict the sales rankings of the products. This paper focuses on this problem by finding the relationship between reviews, star level and sales rankings of products. We combine various factors with helpfulness information and conduct correlation analysis between sales rankings and our combinations to find the most correlative combinations, namely the optimal combinations. We use three domains of books from Amazon.cn to conduct experiments. The main findings show that helpfulness is really useful for predicting book sales rankings. Different domains of books have different optimal combinations. In addition, when helpfulness is taken into account, the combination of number of positive reviews, score of review stars and score of frequent aspects is the most correlative combination. In this paper, although reviews on Amazon.cn are written in Chinese, our method is language independent.
Qingqing Zhou, Chengzhi Zhang

Research on Webpage Similarity Computing Technology Based on Visual Blocks

Abstract
Measuring web page similarity is one of the core issues in web content detection and classification. In this paper, we first give the definition of webpage visual blocks. We then propose a method using visual blocks for measuring web page similarity. The experiments show that our method can effectively measure the level of similarity between different types of webpages.
Yuliang Wei, Bailing Wang, Yang Liu, Fang Lv

Doctor Recommendation via Random Walk with Restart in Mobile Medical Social Networks

Abstract
In this paper, we systematically study how to perform doctor recommendation in mobile Medical Social Networks (m-MSNs). Specifically, employing a real-world medical dataset as the source in our study, we first mine doctor-patient ties/relationships via the Time-constraint Probability Factor Graph model (TPFG), and then define the transition probability matrix between neighbor nodes. Finally, we propose a doctor recommendation model via Random Walk with Restart (RWR), namely RWR-Model. Our experiments validate the effectiveness of the proposed method. Experimental results show that we obtain good accuracy in mining doctor-patient relationships from the network, and that the performance of doctor recommendation is also better than that of the baseline algorithms: the traditional Reduced SVM (RSVM) method and IDRModel.
Jibing Gong, Ce Pang, Lili Wang, Lin Zhang, Wenbo Huang, Shengtao Sun
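
Random Walk with Restart, the technique named in this abstract, can be sketched generically as a power iteration over a transition probability matrix. The code below is a textbook-style illustration rather than the RWR-Model itself: the dictionary-based `transition` structure, the restart probability of 0.15 and the fixed iteration count are assumptions.

```python
def random_walk_with_restart(transition, start, restart=0.15, iters=100):
    """Relevance of every node to `start` under a random walk with restart.

    `transition` maps each node to a dict {neighbor: probability}; every row
    is assumed to sum to 1 and every neighbor must also appear as a key.
    """
    scores = {node: 1.0 if node == start else 0.0 for node in transition}
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the start node.
        nxt = {node: (restart if node == start else 0.0) for node in transition}
        for src, neighbors in transition.items():
            for dst, prob in neighbors.items():
                # Otherwise it follows an outgoing edge of its current node.
                nxt[dst] += (1 - restart) * scores[src] * prob
        scores = nxt
    return scores
```

In a doctor recommendation setting, `start` would be a patient node and the highest-scoring doctor nodes would be the candidates to recommend; how the transition probabilities are derived from the mined doctor-patient ties is left to the paper.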

From Post to Values: Mining Schwartz Values of Individuals from Social Media

Abstract
This paper aims to provide a novel method called Automatic Estimation of Schwartz Values (AESV) from social media, which automatically conducts text categorization based on Schwartz theory. AESV comprises three key components: training, feature extraction and values computation. Specifically, a training corpus is first built from the Web for each Schwartz value type, and the feature vector is then extracted using Chi statistics. Finally, and most importantly, for the calculation of individual values, the personal posts are collected as input data and converted to a word vector. The similarities between the input vector and each value feature vector are used to calculate the individual value priorities. An experiment with 101 participants has been conducted, implying that AESV can obtain competitive results, which are close to manual measurement by expert survey. In a further experiment, 92 users with different patterns on Sina Weibo are tested, indicating that the AESV algorithm is robust and could be widely applied in surveying the values of a huge number of people, which is normally expensive and time-consuming in social science research. Our work shows promise for automatically measuring an individual’s values using only his/her posts on social media.
Mengshu Sun, Huaping Zhang, Yanping Zhao, Jianyun Shang
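
The feature extraction step with Chi statistics can be illustrated with the standard chi-square score of a term for a class. This is a generic sketch, not the AESV implementation: the document representation as `(set_of_terms, label)` pairs and the helper `select_features` are assumptions made for the example.

```python
def chi_square(term, label, docs):
    """Chi-square statistic of `term` for class `label`.

    `docs` is a list of (set_of_terms, label) pairs.
    """
    a = sum(1 for terms, l in docs if l == label and term in terms)
    b = sum(1 for terms, l in docs if l != label and term in terms)
    c = sum(1 for terms, l in docs if l == label and term not in terms)
    d = sum(1 for terms, l in docs if l != label and term not in terms)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(label, docs, top_n=1):
    """Pick the top_n terms that occur in the class, ranked by chi-square."""
    vocab = sorted({t for terms, l in docs if l == label for t in terms})
    return sorted(vocab, key=lambda t: chi_square(t, label, docs), reverse=True)[:top_n]
```

The selected terms would form the feature vector of one value type, against which a user's word vector can be compared with a similarity measure such as cosine similarity.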

Personality Prediction Based on All Characters of User Social Media Information

Abstract
In recent years, the number of social network users has shown explosive growth. In this context, social media provides researchers with plenty of information about user behavior and social behavior. We are beginning to understand that users’ behavior on social media is related to their personality. Conventional personality assessment depends on self-report inventories, which makes collecting information costly. This paper tries to predict users’ Big-Five personality through their information on social networks. We conducted a Big-Five personality inventory test with 131 users of the Chinese social network Sina Weibo, and crawled all of their Weibo texts and profile information. By studying the relevance between all types of user generated information and the personality results of users, we extracted the five most relevant dimensions and used a machine learning method to successfully predict the Big-Five personality of users.
Danlin Wan, Chuang Zhang, Ming Wu, Zhixiang An

Identifying Opinion Leaders from Online Comments

Abstract
Online comments are ubiquitous in social media such as micro-blogs, forums and blogs. They provide opinions of reviewers that are useful for understanding social media. Identifying opinion leaders among all reviewers is one of the most important tasks in analyzing online comments. Most existing methods to identify opinion leaders only consider positive opinions. Few studies investigate the effect of negative opinions on opinion leader identification. In this paper, we propose a novel method to identify opinion leaders from online comments based on both positive and negative opinions. In this method, we first construct a signed network from online comments, and then design a new model based on PageTrust, called TrustRank, to identify opinion leaders from the signed network. Experimental results on the online comments of a real forum show that the proposed method is competitive with other related state-of-the-art methods.
Yi Chen, Xiaolong Wang, Buzhou Tang, Ruifeng Xu, Bo Yuan, Xin Xiang, Junzhao Bu

A Method of User Recommendation in Social Networks Based on Trust Relationship and Topic Similarity

Abstract
In the research area of user recommendation in social network sites (SNS), two problems exist: some algorithms based on the structure of SNS produce low-quality recommendations because they lack a model and mechanism to express users’ topic similarity, and some algorithms that use a topic model to measure the topic similarity between users are slow because the topic model has high time complexity on large amounts of data. This paper proposes a hybrid method for user recommendation based on trust relationship and topic similarity between users, aiming to widen their circle of friends and enhance the user stickiness of SNS. Two main steps are involved in this process: (1) a trust-propagation based community detection method is proposed to model the users’ social relationship; (2) a topic model is applied to retrieve users’ topics from their microblogs, and recommendations are generated from the topic similarity. Our research brings two major contributions to the research community: (1) a Peer-to-Peer trust model, PGP, is introduced to the field of community detection and we improve the PGP model to compute trust values more precisely; (2) a distributed implementation of the topic model is proposed to reduce total execution time. Finally, we conduct experiments with Sina microblog datasets, which show that the proposed model can effectively compute the trust degree between users and yield better recommendation results. Our evaluation demonstrates the effectiveness, efficiency, and scalability of the proposed method.
Yufeng Ma, Zidan Yu, Jun Ding

Backmatter
