
2011 | Original Paper | Book Chapter

3. Spam, Opinions, and Other Relationships: Towards a Comprehensive View of the Web Knowledge Discovery

Written by: Bettina Berendt

Published in: Advanced Topics in Information Retrieval

Publisher: Springer Berlin Heidelberg


Abstract

“Web mining” or “Web Knowledge Discovery” is the analysis of web resources with data-mining techniques such as classification, clustering, association-rule or graph-structure methods. Its applications pervade much of the software web users interact with on a daily basis: search engines’ indexing and ranking choices, recommender systems’ recommendations, targeted advertising, and many others. An understanding of this fast-moving field is therefore a key component of digital information literacy for everyone and a useful and fascinating extension of knowledge and skills for Information Retrieval researchers and practitioners. This chapter proposes an integrating model of learning cycles involving data, information and knowledge, explains how this model subsumes Information Retrieval and Knowledge Discovery and relates them to one another. We illustrate the usefulness of this model in an introduction to web content/text mining, using the model to structure the activities in this form of Knowledge Discovery. We focus on spam detection, opinion mining and relation mining. The chapter aims at complementing other books and articles that focus on the computational aspects of web mining, by emphasizing the often-neglected context in which these computational analyses take place: the full cycle of Knowledge Discovery, which ranges from application understanding via data understanding, data preparation, modeling and evaluation to deployment.


Footnotes
1
Two other application areas of web mining that have received a lot of attention recently are the mining of news and the mining of social media such as blogs; for overviews of their specifics, see for example the proceedings of the International Conference on Weblogs and Social Media at http://www.icwsm.org and Berendt (2010).
 
2
There are various concepts of “data vs. information vs. knowledge”. The notions we use are designed to be maximally consistent with the uses of these terms in the databases, Information Retrieval, and Knowledge Discovery literatures. For a summary, see Fig. 3.1.
 
3
The classical definition is “the nontrivial process of identifying valid, previously unknown, and potentially useful patterns” (Fayyad et al. 1996).
 
4
The association of induction/abduction with new knowledge goes back to Peirce, cf. the collection of relevant text passages at http://www.helsinki.fi/science/commens/terms/abduction.html.
 
5
Thanks to Ricardo Baeza-Yates for the ideas and discussions that led to this figure.
 
8
While the exploration of data is often considered only the first step of data-mining modeling, it is also common to regard the whole of data mining (modeling) as exploratory data analysis. The reason is that, in contrast to confirmatory methods, one usually does not test a previously specified hypothesis, does not collect data only for this purpose, and performs an open-ended number of statistical tests.
 
9
Other spammers want to convince gullible people to disclose their passwords (phishing). For reasons of space, we do not investigate this further here.
 
10
“New web spam techniques are introduced every 2–3 days.” (Liverani 2008).
 
12
All retrieved on 2010-04-10.
 
13
These are typical examples of humans having fed their knowledge into machine-readable data as described by the left-pointing arrows at the bottom of Fig. 3.3.
 
14
RDF triples (just like database content) do not need to be authored by technology-savvy users: Web forms are a convenient way to collect structured data from laypeople. Thus, for example, social networks generate and hold masses of personal data in table/RDF form that are accessible over the Web. Examples are the FOAF export of Livejournal (http://www.livejournal.com/bots/) and exporter tools for Facebook (http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html), Twitter (http://sioc-project.org/node/262) or Flickr (http://apassant.net/home/2007/12/flickrdf).
 
15
These are typical examples of humans having fed their knowledge into machine-readable information as described by the left-pointing arrows in the middle of Fig. 3.3.
 
16
The decision whether to treat something as a concept (standing in a subclass relation to another concept) or as an instance (standing in an instance-of relation) is not always straightforward: it is handled differently by different extraction methods, and even treated differently by different logics and ontology formalisms. For reasons of space, we will therefore not investigate this differentiation.
 
17
Both retrieved on 2010-04-10.
 
18
These are typical examples of humans having fed their knowledge into machine-readable knowledge as described by the left-pointing arrows at the top of Fig. 3.3a, and into the form that can be used for automatic consistency checking in the sense of Fig. 3.3b.
 
23
A cross-disciplinary initiative to understand the ways in which personal details are collected, stored, transmitted, checked, and used as means of influencing and managing people and populations; for an overview, see Lyon (2007).
 
24
We have deliberately not discussed any accuracies, F measure values, or other absolute numbers here, in order to concentrate on the big picture. However, the reader is encouraged to consult original articles, investigate the reported quality values closely, and consider what for example a 20% misclassification rate or an unknown recall rate may mean in practice.
 
References
Attenberg J, Suel T (2008) Cleaning search results using term distance features. In: Proceedings of the International Workshop on Adversarial Information Retrieval on the Web. AIRWeb ’08. ACM Press, New York, NY, pp 21–24. http://doi.acm.org/10.1145/1451983.1451989, visited in December 2010
Backstrom L, Dwork C, Kleinberg JM (2007) Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In: Williamson CL, Zurko ME, Patel-Schneider PF, Shenoy PJ (eds) Proceedings of the International Conference on the World Wide Web. ACM Press, New York, NY, pp 181–190
Baldi P, Frasconi P, Smyth P (2003) Modeling the Internet and the Web: Probabilistic Methods and Algorithms. Wiley, Chichester
Banko M, Cafarella MJ, Soderland S, Broadhead M, Etzioni O (2007) Open information extraction from the Web. In: Veloso MM (ed) Proceedings of the International Joint Conferences on Artificial Intelligence, pp 2670–2676
Barth A, Datta A, Mitchell JC, Nissenbaum H (2006) Privacy and contextual integrity: Framework and applications. In: Proceedings of the IEEE Symposium on Security and Privacy. IEEE Computer Society, Los Alamitos, pp 184–198
Berendt B (2007) Intelligent business intelligence and privacy: More knowledge through less data? In: Köppen, Müller R (eds) Business Intelligence: Methods and Applications. Verlag Dr. Kovač, Hamburg, pp 63–79
Berendt B (2008) You are a document too: Web mining and IR for next-generation information literacy. In: Macdonald C, Ounis I, Plachouras V, Ruthven I, White RW (eds) Proceedings of the European Conference on Information Retrieval. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, p 3
Berendt B (2010) Text mining for news and blogs analysis. In: Sammut C, Webb G (eds) Encyclopedia of Machine Learning. Springer, Berlin, pp 968–972
Berendt B, Krause B, Kolbe-Nusser S (2010) Intelligent scientific authoring tools: Interactive data mining for constructive uses of citation networks. Information Processing and Management 46(1):1–10
Berry M, Linoff G (2002) Mining the Web: Transforming customer data. Wiley, Hoboken, NJ
Berry M, Linoff G (2004) Data Mining Techniques. Wiley, Hoboken, NJ
Bíró I, Szabó J, Benczúr AA (2008) Latent Dirichlet allocation in web spam filtering. In: Proceedings of the International Workshop on Adversarial Information Retrieval on the Web, pp 29–32
Bizer C, Heath T, Berners-Lee T (2009) Linked data—the story so far. International Journal of Semantic Web Information Systems 5(3):1–22
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022
Buitelaar P, Cimiano P, Magnini B (2005) Ontology learning from text: An overview. In: Buitelaar P, Cimiano P, Magnini B (eds) Ontology Learning from Text: Methods, Evaluation and Applications/Frontiers in Artificial Intelligence and Applications, vol 7. IOS Press, pp 3–14
Carlson A, Betteridge J, Wang RC, Hruschka ER Jr, Mitchell TM (2010) Coupled semi-supervised learning for information extraction. In: Davison BD, Suel T, Craswell N, Liu B (eds) Proceedings of the ACM Conference on Web Search and Data Mining. ACM Press, New York, NY, pp 101–110
Chakrabarti S (2003) Mining the Web. Morgan Kaufmann, San Francisco, CA
Davenport T, Beck J (2001) The Attention Economy: Understanding the New Currency of Business. Harvard Business School Press, Cambridge, MA
Deerwester SC, Dumais S, Furnas G, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. Journal of the American Society for Information Science 41:391–407
Domingo-Ferrer J (2007) A three-dimensional conceptual framework for database privacy. In: Secure Data Management. Lecture Notes in Computer Science, vol 4721. Springer, Berlin, pp 193–202
Drost I, Scheffer T (2005) Thwarting the nigritude ultramarine: Learning to identify link spam. In: Gama J, Camacho R, Brazdil P, Jorge A, Torgo L (eds) Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, pp 96–107
Etzioni O, Cafarella MJ, Downey D, Popescu AM, Shaked T, Soderland S, Weld DS, Yates A (2004) Methods for domain-independent information extraction from the Web: An experimental comparison. In: Proceedings of the National Conference on Artificial Intelligence, pp 391–398
Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Cambridge, MA, pp 1–34
Feldman R, Sanger J (2007) The Text Mining Handbook. Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, Cambridge
Fellbaum C (1998) WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA
Fortuna B, Grobelnik M, Mladenic D (2005) Visualization of text document corpus. Informatica (Slovenia) 29(4):497–504
Fortuna B, Mladenic D, Grobelnik M (2006) Semi-automatic construction of topic ontologies. In: Ackermann M (ed) Proceedings of the Semantics, Web and Mining Workshops at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, pp 121–131
Fortuna B, Galleguillos C, Cristianini N (2009) Detecting the bias in media with statistical learning methods. In: Text Mining: Classification, Clustering and Applications. Chapman & Hall/CRC Press, New York, NY, pp 27–50
Frankowski D, Cosley D, Sen S, Terveen LG, Riedl J (2006) You are what you say: Privacy risks of public mentions. In: Efthimiadis EN, Dumais ST, Hawking D, Järvelin K (eds) Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 565–572
Gordon DF, des Jardins M (1995) Evaluation and selection of biases in machine learning. Machine Learning 20(1–2):5–22
Gürses FS (2010) Multilateral privacy requirements analysis in online social network services. PhD thesis, KU Leuven and Dept of Computer Science
Gürses FS, Berendt B (2010) The social Web and privacy: Practices, reciprocity and conflict detection in social networks. In: Ferrari E, Bonchi F (eds) Privacy-Aware Knowledge Discovery. Chapman & Hall/CRC Press, New York, NY, pp 395–432
Hand DJ, Smyth P, Mannila H (2001) Principles of Data Mining. MIT Press, Cambridge, MA
Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the Conference on Computational Linguistics, pp 539–545
Hu M, Liu B (2004) Mining opinion features in customer reviews. In: Proceedings of the National Conference on Artificial Intelligence, pp 755–760
Hu J, Zeng HJ, Li H, Niu C, Chen Z (2007) Demographic prediction based on user’s browsing behavior. In: Proceedings of the International Conference on the World Wide Web. ACM Press, New York, NY, pp 151–160
Katayama T, Utsuro T, Sato Y, Yoshinaka T, Kawada Y, Fukuhara T (2009) An empirical study on selective sampling in active learning for splog detection. In: Proceedings of the International Workshop on Adversarial Information Retrieval on the Web. ACM Press, New York, NY, pp 29–36
Kushmerick N, Weld DS, Doorenbos RB (1997) Wrapper induction for information extraction. In: Proceedings of the International Joint Conferences on Artificial Intelligence, pp 729–737
Lin WH, Xing EP, Hauptmann AG (2008) A joint topic and perspective model for ideological discourse. In: Daelemans W, Goethals B, Morik K (eds) Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery. Lecture Notes in Computer Science, vol 5212. Springer, Berlin, pp 17–32
Liu B (2007) Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, Berlin
Liu H, Mihalcea R (2007) Of men and women and computers: Data-driven gender modeling for improved user interfaces. In: Proceedings of the International Conference on Weblogs and Social Media, pp 121–128
Lyon D (2007) Surveillance Studies: An Overview. Polity Press, Cambridge
Maedche A, Staab S (2001) Ontology learning for the semantic Web. IEEE Intelligent Systems 16(2):72–79
Matuszek C, Witbrock MJ, Kahlert RC, Cabral J, Schneider D, Shah P, Lenat DB (2005) Searching for common sense: Populating Cyc from the Web. In: Veloso MM, Kambhampati S (eds) Proceedings of the National Conference on Artificial Intelligence. AAAI/MIT Press, Cambridge, MA, pp 1430–1435
McGarry K (2005) A survey of interestingness measures for knowledge discovery. Knowledge Engineering Review 20(1):39–61
Mladenic D (1998) Turning Yahoo! to automatic web-page classifier. In: Proceedings of the European Conference on Artificial Intelligence, pp 473–474
Mobasher B (2007) Web usage mining. In: Liu B (ed) Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, Berlin, pp 449–484. Chap 12
Narayanan A, Shmatikov V (2008) Robust de-anonymization of large sparse datasets. In: Proceedings of the IEEE Symposium on Security and Privacy. IEEE Computer Society, Los Alamitos, pp 111–125
Nissenbaum H (2004) Privacy as contextual integrity. Washington Law Review 79(1):119–158
Nonaka I, Takeuchi H (1995) The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press, New York
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1–2):1–135
Phillips D (2004) Privacy policy and PETs: The influence of policy regimes on the development and social implications of privacy enhancing technologies. New Media and Society 6(6):691–706
Piskorski J, Sydow M, Weiss D (2008) Exploring linguistic features for web spam detection: A preliminary study. In: Proceedings of the International Workshop on Adversarial Information Retrieval on the Web, pp 25–28
Popescu AM, Etzioni O (2005) Extracting product features and opinions from reviews. In: Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. The Association for Computational Linguistics, pp 339–346
Preibusch S (2006) Implementing privacy negotiations in e-commerce. In: Zhou X, Li J, Shen HT, Kitsuregawa M, Zhang Y (eds) Proceedings of the Asia-Pacific Web Conference. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, pp 604–615
Pyle D (1999) Data Preparation for Data Mining. Academic Press, San Diego, CA
Sarjant S, Legg C, Robinson M, Medelyan O (2009) All you can eat ontology-building: Feeding Wikipedia to Cyc. Web Intelligence 341–348
Stumme G, Hotho A, Berendt B (2006) Semantic web mining: State of the art and future directions. Journal of Web Semantics 4(2):124–143
Sweeney L (2002) k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(5):557–570
Urvoy T, Lavergne T, Filoche P (2006) Tracking web spam with hidden style similarity. In: Proceedings of the International Workshop on Adversarial Information Retrieval on the Web, pp 25–31
Wardlow DL (1996) Theory, Practice and Research Issues in Marketing: Gays, Lesbians and Consumer Behavior. Haworth
Wu B, Davison BD (2005) Identifying link farm spam pages. In: Ellis A, Hagino T (eds) Proceedings of the International Conference on the World Wide Web. ACM Press, New York, NY, pp 820–829
Zaïane OR (1998) From resource discovery to knowledge discovery on the internet. Tech Rep TR 1998–13, Simon Fraser University
Metadata
Title
Spam, Opinions, and Other Relationships: Towards a Comprehensive View of the Web Knowledge Discovery
Written by
Bettina Berendt
Copyright year
2011
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-20946-8_3
