2011 | OriginalPaper | Book chapter

24. Information Quality and Relevance in Large-Scale Social Information Systems

Written by: Munmun De Choudhury

Published in: Handbook of Data Intensive Computing

Publisher: Springer New York

Abstract

As today’s pervasive social applications continue to surge unabated, they have greatly expanded our ability to put shared information artifacts to good use. On one hand, they have enabled researchers to study social processes on these systems at extremely large scales, something almost inconceivable scarcely a decade ago. On the other, they have streamlined the end-user experience of exploring real-time, event-based information ubiquitously via a variety of devices, almost anytime, anywhere. However, with several terabytes of such information generated every day, we are presented with a daunting question: how do we identify the pieces of information that are relevant and interesting? This book chapter sheds light on the significance of, and the challenges associated with, this problem domain and presents a case study geared towards addressing these challenges. Finally, it identifies the impact of this vision on the larger data-intensive computing community.

Huffington Post. Twitter User Statistics Revealed: http://www.huffingtonpost.com/2010/04/14/twitter-user-statistics-r_n_537992.html, Apr. 2010.

http://www.digitalbuzzblog.com/infographic-youtube-statistics-facts-figures/

http://www.datacenterknowledge.com/archives/2007/06/22/youtube-10-percent-of-all-internet-traffic/

http://mashable.com/2010/01/12/haiti-earthquake-pictures/

http://www.washingtontimes.com/news/2009/jun/16/irans-twitter-revolution/

http://www.telegraph.co.uk/science/science-news/8184149/Email-has-turned-us-into-lab-rats.html

Supported by the Huffington Post article: http://www.huffingtonpost.com/2010/04/14/twitter-user-statistics-r_n_537992.html, Apr. 2010.

http://www.bing.com/social/

http://www.search.twitter.com/

The diversity index of a sample population has been widely used by researchers in areas ranging from economics and ecology to statistics to measure the differences among members of a population consisting of various types of objects. Although there are a host of measures for estimating such diversity (e.g., species richness, concentration ratio), the most popular and robust measure by far is the quantification based on Shannon’s entropy [16]. This motivated us to use an information-theoretic formulation to represent the diversity existing in social information spaces.
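As a rough illustration of an entropy-based diversity measure, the sketch below computes Shannon's entropy over the empirical distribution of category labels in a sample. It is not the chapter's formulation: the function names `shannon_entropy` and `normalized_diversity`, the use of plain string labels, and the normalization by the maximum entropy are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(labels, base=2.0):
    """Shannon entropy of the empirical distribution of `labels`.

    Hypothetical helper: each label stands for the type of one object
    in the sample (for instance, a topic assigned to a post).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total              # relative frequency of this type
        entropy -= p * math.log(p, base)
    return entropy


def normalized_diversity(labels):
    """Entropy divided by its maximum, log2(k), for k distinct types.

    The result lies in [0, 1]: 0 when every object has the same type,
    1 when all observed types are equally frequent. This normalization
    is an assumption of the sketch, not taken from the chapter.
    """
    k = len(set(labels))
    if k <= 1:
        return 0.0
    return shannon_entropy(labels) / math.log2(k)
```

A sample in which every object has the same type yields an entropy of zero, while a sample spread evenly over four types yields two bits, the maximum for four types.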

Note that we make no a priori assumptions about which value of the diversity parameter is more desirable for the content selection task. Instead, diversity is a parameter in our experimental design, and we discuss how the choice of its value affects the end user’s perception of information consumption.

Although our proposed content selection technique can generate tweet sets of any given size, we considered sets of a reasonably small size (ten items) in our experimental design. The goal was to ensure that while going through the user study and evaluating different sets, the end-user participant was not overwhelmed by the quantity of information presented.

1. Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore. On the bias of traceroute sampling: or, power-law degree distributions in regular graphs. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pages 694–703, New York, NY, USA, 2005. ACM.

2. Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding high-quality content in social media. In Proceedings of the international conference on Web search and web data mining, WSDM ’08, pages 183–194, New York, NY, USA, 2008. ACM.

3. Ricardo A. Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

4. M. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E.H. Chi. Eddi: Interactive topic-based browsing of social status streams. In ACM User Interface Software and Technology (UIST) conference, 2010. To appear.

5. Georg Buscher, Andreas Dengel, and Ludger van Elst. Query expansion using gaze-based feedback on the subdocument level. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’08, pages 387–394, New York, NY, USA, 2008. ACM.

6. Jilin Chen, Rowan Nairn, Les Nelson, Michael Bernstein, and Ed Chi. Short and tweet: experiments on recommending content from information streams. In CHI ’10: Proceedings of the 28th international conference on Human factors in computing systems, pages 1185–1194, New York, NY, USA, 2010. ACM.

7. Thomas M. Cover and Joy A. Thomas. Elements of information theory. Wiley-Interscience, New York, NY, USA, 1991.

8. M. Czerwinski, E. Horvitz, and E. Cutrell. Subjective duration assessment: An implicit probe for software usability. In Proceedings of IHM-HCI, pages 167–170, September 2001.

9. P. J. Daniels. Cognitive models in information retrieval—an evaluative review. J. Doc., 42:272–304, December 1986.

10. Gautam Das, Nick Koudas, Manos Papagelis, and Sushruth Puttaswamy. Efficient sampling of information in social networks. In SSM, pages 67–74, 2008.

11. Anish Das Sarma, Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy. Ranking mechanisms in twitter-like forums. In Proceedings of the third ACM international conference on Web search and data mining, WSDM ’10, pages 21–30, New York, NY, USA, 2010. ACM.

12. Munmun De Choudhury, Scott Counts, and Mary Czerwinski. Identifying relevant social media content: leveraging information diversity and user cognition. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia, HT ’11, pages 161–170, New York, NY, USA, 2011. ACM.

13. Munmun De Choudhury, Y-R Lin, Hari Sundaram, K.S. Candan, Lexing Xie, and Aisling Kelliher. How does the data sampling strategy impact the discovery of information diffusion in social media? In ICWSM ’10: Proceedings of the 4th International Conference on Weblogs and Social Media, Washington D.C., May 2010. AAAI Press.

14. Nigel Ford. Modeling cognitive processes in information seeking: from popper to pask. J. Am. Soc. Inf. Sci. Technol., 55:769–782, July 2004.

15. O. Frank. Sampling and estimation in large social networks. Social Networks, 1(1):91–101, 1978.

16. Lou Jost. Entropy and diversity. Oikos, 113(2):363–375, May 2006.

17. W. Kellogg. Information rates in sampling and quantization. Information Theory, IEEE Transactions on, 13(3):506–511, July 1967.

18. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM.

19. J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, page 636. ACM, 2006.

20. Arun S. Maiya and Tanya Y. Berger-Wolf. Sampling community structure. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 701–710, New York, NY, USA, 2010. ACM.

21. Qiaozhu Mei, Jian Guo, and Dragomir Radev. Divrank: the interplay of prestige and diversity in information networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 1009–1018, New York, NY, USA, 2010. ACM.

22. Owen Phelan, Kevin McCarthy, and Barry Smyth. Using twitter to recommend real-time topical news. In Proceedings of the third ACM conference on Recommender systems, RecSys ’09, pages 385–388, New York, NY, USA, 2009. ACM.

23. P. Rusmevichientong, D.M. Pennock, S. Lawrence, and C.L. Giles. Methods for sampling pages uniformly from the world wide web. In AAAI Fall Symposium on Using Uncertainty Within Computation, pages 121–128, 2001.

24. Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 851–860, New York, NY, USA, 2010. ACM.

25. Marc Smith, Vladimir Barash, Lise Getoor, and Hady W. Lauw. Leveraging social context for searching social media. In Proceeding of the 2008 ACM workshop on Search in social media, SSM ’08, pages 91–94, New York, NY, USA, 2008. ACM.

26. S. M. Smith. Remembering in and out of context. Journal of Experimental Psychology: Human Learning and Memory, 5(5):460–471, 1979.

27. G. Sperling. A model for visual memory tasks. Human Factors, 5:19–31, 1963.

28. D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. Sampling techniques for large, dynamic graphs. In INFOCOM 2006: Proceedings of the 25th IEEE International Conference on Computer Communications, pages 1–6. IEEE, April 2006.

29. Pertti Vakkari. Relevance and contributing information types of searched documents in task performance. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’00, pages 2–9, New York, NY, USA, 2000. ACM.

30. Steve Whittaker and Candace Sidner. Email overload: exploring personal information management of email. In CHI ’96: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 276–283, New York, NY, USA, 1996. ACM.

31. Yunjie (Calvin) Xu and Zhiwei Chen. Relevance judgment: What do information users consider beyond topicality? J. Am. Soc. Inf. Sci. Technol., 57:961–973, May 2006.

32. Judith Lynne Zaichkowsky. Measuring the involvement construct. Journal of Consumer Research: An Interdisciplinary Quarterly, 12(3):341–52, 1985.

Title: Information Quality and Relevance in Large-Scale Social Information Systems
Written by: Munmun De Choudhury
Publisher: Springer New York
Book: Handbook of Data Intensive Computing
Print ISBN: 978-1-4614-1414-8
Electronic ISBN: 978-1-4614-1415-5
Copyright year: 2011
DOI: https://doi.org/10.1007/978-1-4614-1415-5_24
