
2018 | OriginalPaper | Book chapter

Large Scale Retrieval of Social Network Pages by Interests of Their Followers

Authors: Elena Mikhalkova, Yuri Karyakin, Igor Glukhikh

Published in: Computational Science – ICCS 2018

Publisher: Springer International Publishing


Abstract

Social networks provide an opportunity to form communities of people who share their interests on a regular basis (circles of fans of different music, books, kinds of sports, etc.). Every community manifests these interests by creating large amounts of linguistic data that attract new followers to certain pages and support existing clusters of users. In the present article, we propose a model for retrieving, from a large collection of pages, those pages that attract users with similar interests. We test our model on three types of pages manually retrieved from the social network Vkontakte and classified as interesting for (a) football fans, (b) vegetarians, and (c) historical reenactors. We compare the performance of our system with that of standard machine learning classifiers: Naive Bayes, SVM, Logistic Regression, and Decision Trees. It appears that these classifiers can hardly retrieve (i.e., single out) pages with a particular interest when such pages form a small collection of 30 samples within a collection as large as 4,090 samples. Our system exceeds their best result (F1-score = 0.65), achieving an F1-score of 0.72.
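Why 30 relevant pages among 4,090 are hard to retrieve can be illustrated with the F1-score. The minimal sketch below is plain Python; the helper `precision_recall_f1`, the page IDs, and the counts in the second run are hypothetical and invented for illustration, not results from the paper. It shows that a classifier labelling every page as non-relevant reaches about 99% accuracy yet an F1-score of 0, which suggests why F1 is a more informative measure than accuracy for this kind of task.

```python
from typing import Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for a single retrieval run."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical collection with the abstract's proportions: 30 relevant pages among 4,090.
all_pages = {f"page{i}" for i in range(4090)}
relevant = {f"page{i}" for i in range(30)}

# A degenerate classifier that predicts "not relevant" for every page retrieves nothing.
retrieved_none: Set[str] = set()
accuracy_none = (len(all_pages) - len(relevant)) / len(all_pages)
print(round(accuracy_none, 4), precision_recall_f1(retrieved_none, relevant))

# Hypothetical run: 40 pages retrieved, 24 of which are relevant.
retrieved = {f"page{i}" for i in range(24)} | {f"page{i}" for i in range(100, 116)}
print(precision_recall_f1(retrieved, relevant))
```

The first line prints an accuracy of 0.9927 with precision, recall and F1 all equal to 0.0; the second run gives precision 0.6, recall 0.8 and an F1-score of roughly 0.686.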


Footnotes
1
[18, 39] and some other authors use this name for the field.
 
2
[27] evaluate the importance of these types of linguistic content in user modelling.
 
3
Unless they already know the page owner and follow them to confirm the previously established contact.
 
4
A good account of such algorithms is given by [15].
 
5
The sample was taken from a page where people discussed a Madonna concert that they had attended or read about. Some of them expressed discontent with her religious and political views; others, conversely, expressed admiration. [14] calls such accidental interactions “quasi-groups”.
 
6
Therefore, it is important to understand what kind of content users want when they look for pages of interest. For example, if football fans are looking for other fans, do they need fans of a particular team? This is usually the case with football fans. With music or anime, however, users might be looking for more diverse communities: fans of different music bands or cartoons.
 
7
Model Texts are characterised by intense communication among multiple representatives of a social group and are often quite large. However, if a text is too large, it becomes noisy. Empirically, we found that selecting approximately 1,000 features is most effective, although this observation requires further research. We also observed that some Model Texts give better results than others, and that using two texts instead of one is often more effective. However, unlike common supervised learning algorithms, our algorithm becomes less efficient as the number of Model Texts increases (even to 10–15).
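The footnote reports that roughly 1,000 features worked best and that two Model Texts often beat one, but it does not say how features are ranked. The minimal sketch below assumes, purely for illustration, that features are the most frequent word tokens pooled across one or more Model Texts; the helpers `tokenize` and `select_features`, the frequency criterion, and the toy texts are hypothetical and are not the authors' method.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; \w is Unicode-aware in Python 3, so Cyrillic text works too.
    return re.findall(r"\w+", text.lower())


def select_features(model_texts: List[str], k: int = 1000) -> List[str]:
    """Hypothetical helper: pool the Model Texts and keep the k most frequent tokens.

    Ranking by raw frequency is an assumption made for illustration only.
    """
    counts: Counter = Counter()
    for text in model_texts:
        counts.update(tokenize(text))
    # most_common orders tied counts by first occurrence, so the result is deterministic.
    return [word for word, _ in counts.most_common(k)]


model_a = "goal match goal team fans match"
model_b = "team fans stadium goal"
print(select_features([model_a, model_b], k=3))  # ['goal', 'match', 'team']
```

Passing two Model Texts here simply pools their counts before the top-k cut; with the default `k=1000` and small texts, every distinct token is kept.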
 
8
These conditions are similar to what our algorithm requires: to extract features, it needs one or two Model Texts and a couple of non-class texts from which to extract uniques.
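The footnote does not define how uniques are computed. One plausible reading, assumed here only for illustration, is that uniques are words occurring in the Model Texts but in none of the non-class texts, and that candidate pages can then be ranked by how many of their words are uniques. The helpers `vocabulary`, `extract_uniques` and `rank_pages`, the scoring rule, and the toy texts below are all hypothetical; this is a sketch under those assumptions, not the authors' algorithm.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def vocabulary(text: str) -> Set[str]:
    # Distinct lowercased word tokens of a text.
    return set(re.findall(r"\w+", text.lower()))


def extract_uniques(model_texts: Iterable[str], non_class_texts: Iterable[str]) -> Set[str]:
    """Hypothetical reading of "uniques": words found in the Model Texts
    but in none of the non-class texts."""
    model_vocab: Set[str] = set()
    for text in model_texts:
        model_vocab |= vocabulary(text)
    background: Set[str] = set()
    for text in non_class_texts:
        background |= vocabulary(text)
    return model_vocab - background


def rank_pages(pages: Dict[str, str], uniques: Set[str]) -> List[Tuple[str, float]]:
    """Hypothetical scoring: share of a page's distinct words that are uniques, highest first."""
    scored = []
    for name, text in pages.items():
        vocab = vocabulary(text)
        score = len(vocab & uniques) / len(vocab) if vocab else 0.0
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


model = "we cook lentil soup and tofu, no meat"
non_class = ["we watched the match", "the soup was cold and we left"]
uniques = extract_uniques([model], non_class)
pages = {"veggie": "tofu and lentil recipes", "football": "the match ended in a draw"}
print(sorted(uniques))  # ['cook', 'lentil', 'meat', 'no', 'tofu']
print(rank_pages(pages, uniques))  # [('veggie', 0.5), ('football', 0.0)]
```

Words shared with the non-class texts ("we", "soup", "and") drop out, so the vegetarian page scores higher than the football page in this toy example.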
 
References
1.
Agichtein, E., Brill, E., Dumais, S., Ragno, R.: Learning user interaction models for predicting web search result preferences. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–10. ACM (2006)
2.
Ahmed, A., Low, Y., Aly, M., Josifovski, V., Smola, A.J.: Scalable distributed inference of dynamic user interests for behavioral targeting. In: KDD (2011)
3.
Al-Kouz, A., Albayrak, S.: An interests discovery approach in social networks based on semantically enriched graphs. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1272–1277. IEEE (2012)
4.
Bakalov, F., König-Ries, B., Nauerz, A., Welsch, M.: A hybrid approach to identifying user interests in web portals. In: IICS, pp. 123–134 (2009)
5.
Bentley, A.F.: The Process of Government. Ripol Klassik, Moskva (1955)
7.
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(January), 993–1022 (2003)
8.
Bonhard, P., Sasse, M.A.: ‘Knowing me, knowing you’ - using profiles and social networking to improve recommender systems. BT Technol. J. 24(3), 84–98 (2006)
9.
Brown, J., Broderick, A.J., Lee, N.: Word of mouth communication within online communities: conceptualizing the online social network. J. Interact. Mark. 21(3), 2–20 (2007)
10.
Dugan, C., Muller, M., Millen, D.R., Geyer, W., Brownholtz, B., Moore, M.: The Dogear game: a social bookmark recommender system. In: Proceedings of the 2007 International ACM Conference on Supporting Group Work, pp. 387–390. ACM (2007)
11.
Firan, C.S., Nejdl, W., Paiu, R.: The benefit of using tag-based profiles. In: Web Conference, LA-WEB 2007. Latin American, pp. 32–41. IEEE (2007)
12.
Fire, M., Puzis, R.: Organization mining using online social networks. Netw. Spat. Econ. 16(2), 545–578 (2016)
13.
Fischer, G.: User modeling in human-computer interaction. User Model. User-Adap. Inter. 11(1), 65–86 (2001)
14.
Frolov, S.: Sociology: personality and society. The main factors of personality development (1994)
15.
Gomaa, W.H., Fahmy, A.A.: A survey of text similarity approaches. Int. J. Comput. Appl. 68(13) (2013)
16.
Groh, G., Ehmig, C.: Recommendations in taste related domains: collaborative filtering vs. social filtering. In: Proceedings of the 2007 International ACM Conference on Supporting Group Work, pp. 127–136. ACM (2007)
17.
Guy, I., Zwerdling, N., Carmel, D., Ronen, I., Uziel, E., Yogev, S., Ofek-Koifman, S.: Personalized recommendation of social software items based on social relations. In: Proceedings of the Third ACM Conference on Recommender Systems, pp. 53–60. ACM (2009)
18.
Li, X., Guo, L., Zhao, Y.E.: Tag-based social interest discovery. In: Proceedings of the 17th International Conference on World Wide Web, pp. 675–684. ACM (2008)
19.
Li, Y., Dong, M., Huang, R.: Special interest groups discovery and semantic navigation support within online discussion forums. In: IEEE International Joint Conference on Neural Networks, IJCNN 2008. (IEEE World Congress on Computational Intelligence), pp. 3904–3911. IEEE (2008)
20.
McCallum, A., Corrada-Emmanuel, A., Wang, X.: Topic and role discovery in social networks. In: IJCAI, vol. 5, pp. 786–791. Citeseer (2005)
21.
Merton, R.K.: Social structure and anomie. Am. Sociol. Rev. 3(5), 672–682 (1938)
22.
Mikhalkova, E., Karyakin, Y., Ganzherli, N.: A comparative analysis of social network pages by interests of their followers. arXiv preprint arXiv:1707.05481v2 (2017)
23.
Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
24.
Pazzani, M.J.: A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13(5–6), 393–408 (1999)
26.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
27.
Piao, G., Breslin, J.G.: Interest representation, enrichment, dynamics, and propagation: a study of the synergetic effect of different user modeling dimensions for personalized recommendations on Twitter. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds.) EKAW 2016. LNCS (LNAI), vol. 10024, pp. 496–510. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49004-5_32
28.
Piao, S., Whittle, J.: A feasibility study on extracting Twitter users’ interests using NLP tools for serendipitous connections. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), pp. 910–915. IEEE (2011)
29.
Ramage, D., Dumais, S.T., Liebling, D.J.: Characterizing microblogs with topic models. ICWSM 10(1), 16 (2010)
30.
Reicher, S.: The determination of collective behaviour. Soc. Ident. Intergroup Relat., pp. 41–83 (1982)
31.
Scott, J.: Social Network Analysis. SAGE Publications, Thousand Oaks (2017)
32.
Sen, S., Vig, J., Riedl, J.: Tagommenders: connecting users to items through tags. In: Proceedings of the 18th International Conference on World Wide Web, pp. 671–680. ACM (2009)
33.
Shen, W., Wang, J., Luo, P., Wang, M.: Linking named entities in tweets with knowledge base via user interest modeling. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 68–76. ACM (2013)
34.
Shi, L.L., Liu, L., Wu, Y., Jiang, L., Hardy, J.: Event detection and user interest discovering in social media data streams. IEEE Access 5, 20953–20964 (2017)
35.
Stefani, A., Strapparava, C.: Exploiting NLP techniques to build user model for web sites: the use of WordNet in SiteIF project. In: Proceedings of the 2nd Workshop on Adaptive Systems and User Modeling on the WWW (1999)
36.
Szomszor, M., Alani, H., Cantador, I., O’Hara, K., Shadbolt, N.: Semantic modelling of user interests based on cross-folksonomy analysis. In: Sheth, A., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 632–648. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88564-1_40
37.
Volkova, S., Coppersmith, G., Van Durme, B.: Inferring user political preferences from streaming communications. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Long Papers), vol. 1, pp. 186–196 (2014)
38.
Wang, Q., Xu, J., Li, H.: User message model: a new approach to scalable user modeling on microblog. In: Jaafar, A., Mohamad Ali, N., Mohd Noah, S.A., Smeaton, A.F., Bruza, P., Bakar, Z.A., Jamil, N., Sembok, T.M.T. (eds.) AIRS 2014. LNCS, vol. 8870, pp. 209–220. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12844-3_18
39.
Xu, S., Shi, Q., Qiao, X., Zhu, L., Zhang, H., Jung, H., Lee, S., Choi, S.P.: A dynamic users’ interest discovery model with distributed inference algorithm. Int. J. Distrib. Sens. Netw. 10(4), Article ID 280892 (2014)
40.
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)
Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-93698-7_18