DOI: 10.1145/775047.775098
Article

Mining product reputations on the Web

Published: 23 July 2002

ABSTRACT

Knowing the reputations of your own and/or competitors' products is important for marketing and customer relationship management. It is, however, very costly to collect and analyze survey data manually. This paper presents a new framework for mining product reputations on the Internet. It automatically collects people's opinions about target products from Web pages, and it uses text mining techniques to obtain the reputations of those products.

On the basis of human-tested samples, we generate in advance syntactic and linguistic rules to determine whether any given statement is an opinion or not, as well as whether any such opinion is positive or negative in nature. We first collect statements regarding target products using a general search engine, and then, using the rules, extract opinions from among them and attach three labels to each opinion: the positive/negative determination, the product name itself, and a numerical value expressing the degree of system confidence that the statement is, in fact, an opinion. The labeled opinions are then stored in an opinion database.

The mining of reputations, i.e., the finding of statistically meaningful information included in the database, is then conducted. We specify target categories using label values (such as positive opinions of product A) and perform four types of text mining: for individual target categories, extraction of 1) characteristic words, 2) co-occurrence words, and 3) typical sentences; and, among multiple target categories, 4) correspondence analysis.

Actual marketing data is used to demonstrate the validity and effectiveness of the framework, which drastically reduces the overall cost of reputation analysis compared with conventional survey approaches and supports the discovery of knowledge from the pool of opinions on the Web.
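The labeling step can be illustrated with a minimal sketch. It assumes hypothetical cue-word lists (`POSITIVE_CUES`, `NEGATIVE_CUES`, `OPINION_MARKERS`) in place of the syntactic and linguistic rules the paper generates from human-tested samples, which the abstract does not spell out. The `Opinion` record, the functions `label_statement` and `select`, and the toy confidence formula are all illustrative names and choices, not the authors' implementation. What the sketch does mirror from the text is the shape of the result: each extracted opinion carries three labels (polarity, product name, confidence), and a target category is chosen by filtering on label values.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Opinion:
    """One entry of the opinion database, carrying the three labels."""
    text: str
    product: str       # label: product name
    polarity: str      # label: "positive" or "negative"
    confidence: float  # label: confidence (0..1) that the text is an opinion

# HYPOTHETICAL cue lists standing in for the paper's rules, which are
# generated in advance from human-tested samples and not given in the abstract.
POSITIVE_CUES = {"great", "excellent", "love", "reliable", "fast"}
NEGATIVE_CUES = {"poor", "terrible", "hate", "slow", "broken"}
OPINION_MARKERS = {"i", "my", "think", "feel"}

def label_statement(text: str, product: str) -> Optional[Opinion]:
    """Return a labeled Opinion, or None if the statement is not judged an opinion."""
    if product.lower() not in text.lower():
        return None
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE_CUES for w in words)
    neg = sum(w in NEGATIVE_CUES for w in words)
    if pos == neg:  # no evaluative cue, or a tie: no clear polarity
        return None
    markers = sum(w in OPINION_MARKERS for w in words)
    # Toy confidence score (hypothetical): more cues and markers give a higher value.
    confidence = min(1.0, 0.4 + 0.2 * abs(pos - neg) + 0.1 * markers)
    polarity = "positive" if pos > neg else "negative"
    return Opinion(text, product, polarity, confidence)

def select(db: Iterable[Opinion], product: Optional[str] = None,
           polarity: Optional[str] = None, min_confidence: float = 0.0) -> List[Opinion]:
    """Pick a target category by label values, e.g. positive opinions of product A."""
    return [o for o in db
            if (product is None or o.product == product)
            and (polarity is None or o.polarity == polarity)
            and o.confidence >= min_confidence]

# Usage with made-up statements (in the framework a search engine supplies these).
statements = [
    ("I think ProductA is great and fast.", "ProductA"),
    ("ProductA arrived broken.", "ProductA"),
    ("ProductB ships in March.", "ProductB"),
]
labeled = (label_statement(text, prod) for text, prod in statements)
db = [op for op in labeled if op is not None]
positive_a = select(db, product="ProductA", polarity="positive")
```

Here the third statement mentions a product but carries no evaluative cue, so it is not stored as an opinion. Keeping the confidence as a separate label, rather than discarding low-confidence statements outright, lets a later query trade recall for precision through `min_confidence`.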

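Two of the four mining operations named in the abstract, characteristic-word extraction for one target category and correspondence analysis among several categories, can be sketched as follows. The abstract does not say which statistic ranks characteristic words, so information gain on word presence versus category membership is swapped in here as a stand-in; the function names, the tiny `STOPWORDS` list, and all data are hypothetical. The correspondence analysis is the classical SVD-based formulation: rows are target categories, columns are words, and the returned principal inertias sum to the table's chi-square statistic divided by its total count.

```python
import math
import re
import numpy as np

STOPWORDS = {"a", "an", "and", "the", "is", "of"}  # hypothetical minimal list

def tokenize(text):
    """Set of lowercase word types in a statement, minus stopwords."""
    return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS

def entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def characteristic_words(target, others, top_k=5):
    """Rank words by information gain about target-category membership,
    keeping only words relatively more frequent in the target.
    Assumes both `target` and `others` are non-empty lists of strings."""
    docs = [(tokenize(t), 1) for t in target] + [(tokenize(t), 0) for t in others]
    n = len(docs)
    base = entropy(len(target) / n)
    vocab = set().union(*(words for words, _ in docs))
    scores = {}
    for w in vocab:
        with_w = [y for words, y in docs if w in words]
        without_w = [y for words, y in docs if w not in words]
        p_w = len(with_w) / n
        cond = p_w * entropy(sum(with_w) / len(with_w))
        if without_w:
            cond += (1 - p_w) * entropy(sum(without_w) / len(without_w))
        rate_target = sum(with_w) / len(target)
        rate_others = (len(with_w) - sum(with_w)) / len(others)
        if rate_target > rate_others:
            scores[w] = base - cond
    # Highest gain first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

def correspondence_analysis(table):
    """Classical correspondence analysis of a categories x words count table
    (no all-zero rows or columns). Returns row and column principal
    coordinates and the principal inertias."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * s) / np.sqrt(r)[:, None]
    cols = (Vt.T * s) / np.sqrt(c)[:, None]
    return rows, cols, s ** 2

# Made-up statements for one target category and for the rest of the database.
target = ["great battery and fast screen", "fast and reliable", "great value"]
others = ["slow battery", "screen broke", "slow support"]
top = characteristic_words(target, others, top_k=3)

# Illustrative counts: rows are categories (e.g. positive A, negative A,
# positive B), columns are words; the numbers are made up.
table = [[20, 5, 3], [4, 15, 6], [10, 8, 12]]
rows, cols, inertia = correspondence_analysis(table)
```

On this toy data the top three words are `fast`, `great` and `reliable`; `battery` and `screen` are dropped because they appear equally often inside and outside the target category. Plotting the first two columns of `rows` and `cols` together gives the usual correspondence-analysis map in which categories lie near the words that distinguish them.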

Published in

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002, 719 pages
ISBN: 158113567X
DOI: 10.1145/775047

Copyright © 2002 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '02 paper acceptance rate: 44 of 307 submissions (14%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
