Mining comparative opinions from customer reviews for Competitive Intelligence

https://doi.org/10.1016/j.dss.2010.08.021

Abstract

Competitive Intelligence is one of the key factors for enterprise risk management and decision support. However, the functions of Competitive Intelligence are often greatly restricted by the lack of sufficient information sources about competitors. With the emergence of Web 2.0, the large number of customer-generated product reviews, which often contain information about competitors, has become a new source for mining Competitive Intelligence. In this study, we propose a novel graphical model to extract and visualize comparative relations between products from customer reviews, taking the interdependencies among relations into consideration, to help enterprises discover potential risks and design new products and marketing strategies. Our experiments on a corpus of Amazon customer reviews show that the proposed method extracts comparative relations more accurately than the benchmark methods. Furthermore, this study opens a door to analyzing the rich consumer-generated data for enterprise risk management.

Introduction

Competitive Intelligence (CI) involves the early identification of potential risks and opportunities by gathering and analyzing information about the environment to support managers in making strategic decisions for an enterprise [33]. Most firms realize the importance of CI in enterprise risk management and decision support, and invest a large amount of money in it. A survey from the American Futures Group consulting firm indicates that 82% of large enterprises and over 90% of the Forbes top 500 global firms adopt CI for risk management and decision making. By the end of the 20th century, the overall production value of the CI industry had reached 70 billion U.S. dollars [23].

To identify potential risks, companies need to collect and analyze information about their competitors' products and plans. Based on such information, a company can learn the relative weaknesses and strengths of its own products, and can then design targeted products and campaigns to counter those of its competitors. Traditionally, information about competitors has mainly come from the press, such as analyst reports and trade journals, and more recently also from competitors' websites and news sites. Unfortunately, such information is mostly generated by the company that produces the product. Therefore, the amount of available information is limited and its objectivity is questionable. The lack of sufficient and reliable information sources about competitors greatly restricts the capability of CI.

With the emergence of Web 2.0, an increasing number of customers now have opportunities to directly express their opinions and sentiments regarding products through various channels, such as online shopping sites, blogs, social network sites, forums, and so forth. These opinion data, coming directly from customers, become a natural information source for CI. There are some existing studies on mining customer opinions [6], [7], [27], [31], [34]. However, these studies mainly focus on identifying customers' sentiment polarities toward products. The most important problem in CI—i.e., collecting and analyzing the competitors' information to identify potential risks as early as possible and plan appropriate strategies—has not been well studied.

Customer reviews are often a rich source of comparison opinions. Users usually prefer to compare several competitive products with similar functions, for example,

  • Nokia N95 has a stronger signal than iPhone.

  • The iPhone has better looks, but a much higher price than the BB Curve.

  • Compared with the v3, this V8 has a bigger body, and it has a much worse keyboard than Nokia E71.

These comparison opinions are precious information sources for identifying the relative strengths and weaknesses of products, analyzing the enterprise risk and threats from competitors, and further designing new products and business strategies.

Mining such comparison opinions is a non-trivial task because of the large number of customer reviews and their informal style. In this paper, we propose a novel approach that extracts product comparative relations from customer reviews and displays the results as comparative relation maps for decision support in enterprise risk management.

The remainder of this paper is organized as follows: Section 2 reviews the related work in comparative opinion mining. Section 3 introduces our overall approach to comparative relation extraction. Section 4 introduces the novel graphical model we propose for comparative relation extraction. Section 5 presents the experiments that evaluate the proposed relation extraction approach. Section 6 concludes our study and discusses some future directions for research.

Sentiment analysis of user opinions

Much research exists on sentiment analysis of user opinion data [6], [7], [27], [31], [34], which mainly judges the polarities of user reviews. In these studies, sentiment analysis is often conducted at one of three levels: the document level, sentence level, or attribute level. Sentiment analysis at the document level classifies reviews into the types of polarities—positive, negative, or neutral—based on the overall sentiments in the reviews. A number of machine learning techniques have been

Problem formulation

Most comparison opinions can be expressed in a succinct format—the comparative relation.

Definition

A comparative relation is a formal expression of customers' comparison opinions, which captures the customers' sentiment polarities toward competing products with respect to specific attributes. A comparative relation can be expressed as a 4-tuple:

R(P1, P2, A, S)

where P1 and P2 are the two product names, A is the attribute name, and S is the sentimental phrase. For convenience, here we call all product names P1 and P2,

Comparative relations

The following are several examples of the comparison opinions and their comparative relations.

  • Example 1:

    Nokia N95 has a better camera than iPhone.

    > (Nokia N95, iPhone, camera, better)

  • Example 2:

    Compared with Nokia N95, iPhone has a better camera.

    < (Nokia N95, iPhone, camera, better)

  • Example 3:

    The Pearl and the Curve are both with high resolution camera.

    ~ (Pearl, Curve, camera, high resolution)

  • Example 4:

    The screen of iPhone is bigger than that of the curve, so I can read easily.

    > (iPhone, curve, screen, bigger)

  • Example 5:

    The price of iPhone is
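
As a rough illustration of the notation used in Examples 1 to 4, a comparative relation can be held in a small record type. The Python sketch below is not the authors' implementation: the names `Direction`, `ComparativeRelation`, and `parse_relation` are hypothetical, and the reading of `>`, `<`, and `~` as "P1 rated above P2", "P2 rated above P1", and "no preference" is inferred from the examples rather than stated in the text. It only parses the printed notation; it does not extract relations from review sentences.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Direction symbol of a relation (meaning inferred from the examples)."""
    P1_OVER_P2 = ">"     # P1 is rated above P2 on the attribute
    P2_OVER_P1 = "<"     # P2 is rated above P1 on the attribute
    NO_PREFERENCE = "~"  # both products are rated alike


@dataclass(frozen=True)
class ComparativeRelation:
    """One comparative relation: direction plus (P1, P2, A, S)."""
    direction: Direction
    p1: str         # first product name
    p2: str         # second product name
    attribute: str  # attribute being compared (A)
    sentiment: str  # sentiment phrase (S)

    def __str__(self) -> str:
        return (f"{self.direction.value} "
                f"({self.p1}, {self.p2}, {self.attribute}, {self.sentiment})")


def parse_relation(text: str) -> ComparativeRelation:
    """Parse the printed form, e.g. '> (Nokia N95, iPhone, camera, better)'."""
    symbol, _, rest = text.strip().partition(" ")
    fields = [field.strip() for field in rest.strip().strip("()").split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return ComparativeRelation(Direction(symbol), *fields)


# Examples 1 and 3 from the text, in their printed notation.
example_1 = parse_relation("> (Nokia N95, iPhone, camera, better)")
example_3 = parse_relation("~ (Pearl, Curve, camera, high resolution)")
```

Keeping the direction separate from the sentiment phrase mirrors Examples 1 and 2, where the same phrase "better" appears with opposite directions depending on which product is listed first.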

Experimental evaluation

We conducted experiments to evaluate the performance of the proposed model for comparative relation extraction. Furthermore, a case study is used to show the usefulness of the comparative relation maps for risk management and decision support.

Conclusions and future work

In this paper, we designed a novel method to extract comparative relations from customer opinion data, to build comparative relation maps for aiding enterprise managers in identifying the potential operation risks and supporting strategy decisions. The two-level CRF model with unfixed interdependencies can better extract the comparative relations, by utilizing the complicated dependencies between relations, entities and words, and the unfixed interdependencies among relations. The empirical

Acknowledgements

This research was directly supported by a GRF grant (CityU 147407) of the RGC, Hong Kong SAR government, and an SRG grant (7002425) of the City University of Hong Kong. The authors are also grateful to the referees for their helpful comments and valuable suggestions for improving the earlier version of the paper.

Kaiquan Xu is a Ph.D. student in the Department of Information Systems at City University of Hong Kong. His research interests include opinion mining, social network mining, information extraction, and machine learning. He has published articles in the Journal of Information Science and presented papers at conferences such as ICIS, HICSS, AMCIS, and PACIS. He has also worked in the IT industry as a technical staff member at the Oracle China R&D Center.

References (48)

  • A. Abbasi et al. Sentiment analysis in multiple languages: feature selection for opinion classification in Web forums. ACM Transactions on Information Systems (2008)

  • S. Argamon et al. Stylistic text classification using functional lexical features. Journal of the American Society for Information Science and Technology (2007)

  • S. Bethard et al. Automatic extraction of opinion propositions and their holders

  • R.C. Bunescu et al. A shortest path dependency kernel for relation extraction

  • R. Bunescu et al. Subsequence kernels for relation extraction

  • M. Chau et al. Mining communities and their relationships in blogs: a study of online hate groups. International Journal of Human–Computer Studies (2007)

  • H. Chen. Intelligence and security informatics: information systems perspective. Decision Support Systems (2006)

  • A. Culotta et al. Dependency tree kernels for relation extraction

  • A. Divoli et al. BioIE: extracting informative sentences from the biomedical literature. Bioinformatics (2005)

  • D. Edwards. Introduction to Graphical Modelling (2000)

  • A. Esuli et al. SENTIWORDNET: a publicly available lexical resource for opinion mining

  • Z. Fei. extended SentimentWord

  • B.A. Hagedorn et al. World knowledge in broad-coverage information filtering

  • J. Hakenberg et al. LLL'05 challenge: genic interaction extraction - identification of language patterns based on alignment and finite state automata

  • V. Hatzivassiloglou et al. Effects of adjective orientation and gradability on sentence subjectivity

  • C.W. Hsu et al. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks (2002)

  • M. Hu et al. Mining and summarizing customer reviews

  • N. Jindal et al. Mining comparative sentences and relations

  • T. Joachims. Text categorization with support vector machines: learning with many relevant features

  • T. Joachims. SVM-multiclass: Multi-Class Support Vector Machine

  • N. Kambhatla. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations

  • Y. Kim et al. Identifying opinion holders in opinion text from online newspapers

  • L. Kong et al. Study on Competitive Intelligence system based on Web

  • T. Kudo et al. A boosting algorithm for classification of semi-structured text

    Dr. Stephen Liao is an Associate Professor of Information Systems at the City University of Hong Kong. He earned a Ph.D. from the University of Aix-Marseille III and the Institute of France Telecom in 1993. He has been working at the City University of Hong Kong since 1993, and his research has focused on the use of IT in e-business systems. His articles have been published extensively in various academic journals such as Decision Support Systems, IEEE Transactions, Communications of the ACM, Information Science, Computer Software, and so on. His current research interests include the use of data mining techniques in mobile commerce applications and intelligent business systems, especially intelligent transportation systems.

    Dr. Jiexun Li received his Bachelor's degree in Engineering in 2000 and Master's degree in Management in 2002 from Tsinghua University in Beijing, China. He received his Ph.D. degree in Management (with a major focus in Management Information Systems) from the University of Arizona in 2007. He is currently an Assistant Professor in the College of Information Science and Technology at Drexel University. His research interests are knowledge discovery (data/text/web mining), machine learning, network analysis, and their applications in various areas, including customer relationship management, bioinformatics, medical informatics, security, and so on.
