Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

Scientometrics

Abstract

Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on development of methods and algorithms. Differences in type of data can also be seen, with webometrics more focused on analyses of the structure of the web and web mining more focused on web content and usage, even though both fields have been embracing the possibilities of user generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

Notes

  1. Source: http://www.go2web20.net/. See Appendix for a list of the social web terms used.

  2. A graph is considered directed if the edges imply a direction, e.g. page A links to page B.

  3. The linkdomain command was useful for finding all pages linking to any page belonging to a given web site.

  4. An API is an interface set up by a service provider, which can be used to query the provider for certain data.
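
Notes 2 and 3 can be illustrated with a minimal sketch. It is not the authors' method or any search engine's implementation: the URLs, the `links` data, and the function `pages_linking_to_site` are hypothetical. It stores a directed link graph as a mapping from each page to the pages it links to, then finds the pages that link to any page of a given site, which is the kind of question the linkdomain command answered. Whether site-internal links should count is an assumption exposed here as a parameter.

```python
from urllib.parse import urlparse

# Hypothetical toy data: a directed graph where each key is a page and
# each value is the set of pages it links to (page A -> page B).
links = {
    "http://a.example/index.html": {"http://b.example/about.html"},
    "http://c.example/news.html": {
        "http://b.example/index.html",
        "http://a.example/index.html",
    },
    "http://b.example/about.html": {"http://b.example/index.html"},
}


def pages_linking_to_site(links, site_host, exclude_internal=True):
    """Return the pages that link to at least one page hosted on site_host.

    exclude_internal is an assumption of this sketch: when True, pages
    hosted on site_host itself are not reported.
    """
    result = set()
    for source, targets in links.items():
        if exclude_internal and urlparse(source).hostname == site_host:
            continue  # skip links that stay within the site
        if any(urlparse(target).hostname == site_host for target in targets):
            result.add(source)
    return result


inlinking_pages = pages_linking_to_site(links, "b.example")
```

Because the edges are directed, a link from a.example to b.example does not imply a link back: in the toy data, no page on b.example links to a.example.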

References

  • Aguillo, I. (2009). Measuring the institution’s footprint in the web. Library Hi Tech, 27(4), 540–556.

    Google Scholar 

  • Aguillo, I. F., Granadino, B., Ortega, J. L., & Prieto, J. A. (2006). Scientific research activity and communication measured with cybermetrics indicators. Journal of the American Society for Information Science and Technology, 57(10), 1296–1302.

    Google Scholar 

  • Ai, D., Zhang, Y., Zuo, H., & Wang, Q. (2006). Web content mining for market intelligence acquiring from B2C websites. In L. Feng, et al. (Eds.), WISE 2006 Workshops, LNCS 4256 (pp. 159–170). Berlin: Springer-Verlag.

    Google Scholar 

  • Akcora, C. G., Bayir, M. A., Demirbas, M. & Ferhatosmanoglu, H. (2010). Identifying breakpoints in public opinion. SOMA 2010: Proceedings of the 1st Workshop on Social Media Analytics (pp. 62–66).

  • Algur, S. P., Patil, A. P., Hiremath, P. S. & Shivashankar, S. (2010). Conceptual level similarity measure based review spam detection. Proceedings of the 2010 International Conference on Signal and Image Processing, ICSIP 2010 (pp. 416–423).

  • Almind, T. C., & Ingwersen, P. (1997). Informetric analyses on the World Wide Web: Methodological approaches to ‘Webometrics’. Journal of Documentation, 53(4), 404–426.

    Google Scholar 

  • Alsaleh, S., Nayak, R., Xu, Y., & Chen, L. (2011). Improving matching process in social network using implicit and explicit user information. Lecture Notes in Computer Science, 6612, 313–320.

    Google Scholar 

  • Aminpour, F., Kabiri, P., Otroj, Z., & Keshtkar, A. A. (2009). Webometric analysis of Iranian universities of medical sciences. Scientometrics, 80(1), 253–264.

    Google Scholar 

  • Angus, E., Thelwall, M., & Stuart, D. (2008). General patterns of tag usage among university groups in Flickr. Online Information Review, 32(1), 89–101.

    Google Scholar 

  • Arbelaitz, O., Gurrutxaga, I., Lojo, A., Muguerza, J., Pérez, J. M., & Perona, I. (2013). Web usage and content mining to extract knowledge for modelling the users of the Bidasoa Turismo website and to adapt it. Expert Systems with Applications, 40, 7478–7491.

    Google Scholar 

  • Asadi, M., & Shekofteh, M. (2009). The relationship between the research activity of Iranian medical universities and their web impact factor. Electronic Library, 27(6), 1026–1043.

    Google Scholar 

  • Asur, S. & Huberman, B. A. (2010). Predicting the future with social media. Proceedings 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010, Vol. 1(pp. 492–499).

  • Ball, R., Mittermaier, B., & Tunger, D. (2009). Creation of journal-based publication profiles of scientific institutions: A methodology for the interdisciplinary comparison of scientific research based on the J-factor. Scientometrics, 81(2), 381–392.

    Google Scholar 

  • Bar-Ilan, J. (2004). A microscopic link analysis of academic institutions within a country: The case of Israel. Scientometrics, 59(3), 391–403.

    Google Scholar 

  • Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century: A review. Journal of Informetrics, 2, 1–52.

    Google Scholar 

  • Barjak, F., Li, X., & Thelwall, M. (2007). Which factors explain the Web impact of scientists’ personal homepages? Journal of the American Society for Information Science and Technology, 58(2), 200–211.

    Google Scholar 

  • Barragáns-Martínez, A. B., Costa-Montenegro, E., Burguillo, J. C., Rey-López, M., Mikic-Fonte, F. A., & Peleteiro, A. (2010). A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition. Information Sciences, 180(22), 4290–4311.

    Google Scholar 

  • Bastian, M., Heymann, S., Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.

  • Bayir, M. A., Toroslu, I. H., Demirbas, M., & Cosar, A. (2012). Discovering better navigation sequences for the session construction problem. Data and Knowledge Engineering, 73, 58–72.

    Google Scholar 

  • Becher, T., & Trowler, P. R. (2001). Academic tribes and territories: intellectual enquiry and the culture of disciplines (2nd ed.). Philadelphia, PA: Open University Press.

    Google Scholar 

  • Biehl, M., Kim, H., & Wade, M. (2006). Relationships among the academic business disciplines: A multi-method citation analysis. Omega, 34(4), 359–371.

    Google Scholar 

  • Bifet, A., & Frank, E. (2010). Sentiment knowledge discovery in Twitter streaming data. Lecture Notes in Computer Science, 6332, 1–15.

    Google Scholar 

  • Biuk-Aghai, R. P., Tang, L. V.-S., Fong, S., & Si, Y.-W. (2009). Wikis as digital ecosystems: An analysis based on authorship. 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST ‘09 (pp. 581–586).

  • Björneborn, L. (2006). ‘Mini small worlds’ of shortest link paths crossing domain boundaries in an academic Web space. Scientometrics, 68(3), 395–414.

    Google Scholar 

  • Björneborn, L., & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1), 65–82.

    Google Scholar 

  • Björneborn, L., & Ingwersen, P. (2004). Toward a basic framework for webometrics. Journal of the American Society for Information Science and Technology, 55(14), 1216–1227.

    Google Scholar 

  • Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics, 10, 1–12.

    Google Scholar 

  • Borges, J., & Levene, M. (2006). Ranking pages by topology and popularity within web sites. World Wide Web: Internet and Web Information Systems, 9(3), 301–316.

    Google Scholar 

  • Breese, J. S., Heckerman, D., & Kadie, C. (1999). Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (pp. 43–52).

  • Brejla, P., & Gilbert, D. (2012). An exploratory use of web content analysis to understand cruise tourism services. International Journal of Tourism Research. doi:10.1002/jtr.1910.

  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.

    Google Scholar 

  • Canny, J. (2002). Collaborative filtering with privacy via factor analysis. SIGIR Forum, 2002, 238–245.

    Google Scholar 

  • Chau, M., & Xu, J. (2007). Mining communities and their relationships in blogs: A study of online hate groups. International Journal of Human-Computer Studies, 65(1), 57–70.

    Google Scholar 

  • Chen, H., & Chau, M. (2004). Web mining: Machine learning for web applications. Annual Review of Information Science and Technology, 38, 289–329 + xvii–xviii.

    Google Scholar 

  • Cheong, M., & Lee, V. C. S. (2011). A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Information Systems Frontiers, 13(1), 45–59.

    Google Scholar 

  • Cho, S. E., & Park, H. W. (2012). Government organizations’ innovative use of the Internet: The case of the Twitter activity of South Korea’s Ministry for Food, Agriculture, Forestry and Fisheries. Scientometrics, 90(1), 9–23.

    Google Scholar 

  • Chou, P.-H., Li, P.-H., Chen, K.-K., & Wu, M.-J. (2010). Integrating web mining and neural network for personalized e-commerce automatic service. Expert Systems with Applications, 37(4), 2898–2910.

    Google Scholar 

  • Cooley, R., Mobasher, B., & Srivastava, J. (1997). Web mining: Information and pattern discovery on the world wide web. In International Conference on Tools with Artificial Intelligence (pp. 558–567).

  • Da Costa Jr, M. G., & Gong, Z. (2005). Web structure mining: An introduction. ICIA 2005 Proceedings of 2005 International Conference on Information Acquisition, Vol. 2005 (pp. 590–595).

  • Das, R., & Turkoglu, I. (2009). Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method. Expert Systems with Applications, 36(3), 6635–6644.

    Google Scholar 

  • Deshpande, M., & Karypis, G. (2004). Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1), 143–177.

    Google Scholar 

  • Didegah, F., & Goltaji, M. (2010). Link analysis and impact of top universities of Islamic world on the world wide web. Library Hi Tech News, 27(8), 12–16.

    Google Scholar 

  • Duane Ireland, R., & Webb, J. W. (2007). A cross-disciplinary exploration of entrepreneurship research. Journal of Management, 33(6), 891–927.

    Google Scholar 

  • Efron, M. (2011). Information search and retrieval in microblogs. Journal of the American Society for Information Science and Technology, 62(6), 996–1008.

    MathSciNet  Google Scholar 

  • Eirinaki, M., & Vazirgiannis, M. (2003). Web mining for web personalization. ACM Transactions on Internet Technology, 3(1), 1–27.

    Google Scholar 

  • Erfanmanesh, M., & Didegah, F. (2011). Visibility and impact of Iranian research institutions on the web. Library Hi Tech News, 28(1), 4–9.

    Google Scholar 

  • Etzioni, O. (1996). The world-wide web: Quagmire or gold mine? Communications of the ACM, 39(11), 65–68.

    Google Scholar 

  • Facca, F. M., & Lanzi, P. L. (2005). Mining interesting knowledge from weblogs: A survey. Data and Knowledge Engineering, 53(3), 225–241.

    Google Scholar 

  • Fernández, J., Boldrini, E., Gómez, J. M., & Martínez-Barco, P. (2011). Evaluating EmotiBlog robustness for sentiment analysis tasks. Lecture Notes in Computer Science, 6716, 290–294.

    Google Scholar 

  • Fischer, A. R. H., Tobi, H., & Ronteltap, A. (2011). When natural met Social: A review of collaboration between the natural and social sciences. Interdisciplinary Science Reviews, 36(4), 341–358.

    Google Scholar 

  • Glass, R. L., Ramesh, V., & Vessey, I. (2004). An analysis of research in computing disciplines. Communications of the ACM, 47(6), 89–94.

    Google Scholar 

  • Gruzd, A., Black, F. A., Le, T. N. Y., & Amos, K. (2012). Investigating biomedical research literature in the blogosphere: A case study of diabetes and glycated hemoglobin (HbA1c). Journal of the Medical Library Association, 100(1), 34–42.

    Google Scholar 

  • Guerbas, A., Addam, O., Zaarour, O., Nagi, M., Elhajj, A., Ridley, M., et al. (2013). Effective web log mining and online navigational pattern prediction. Knowledge-Based Systems, 49, 50–62.

    Google Scholar 

  • Hale, S. A. (2012). Net increase? Cross-lingual linking in the blogosphere. Journal of Computer-Mediated Communication, 17(2), 135–151.

    MathSciNet  Google Scholar 

  • He, Q. (1999). Knowledge discovery through co-word analysis. Library Trends, 48(1), 133–159.

    Google Scholar 

  • Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1), 89–115.

    Google Scholar 

  • Holloway, T., Bozicevic, M., & Börner, K. (2007). Analyzing and visualizing the semantic coverage of Wikipedia and its authors. Complexity, 12(3), 30–40.

    Google Scholar 

  • Holmberg, K. (2010). Co-inlinking to a municipal Web space: A webometric and content analysis. Scientometrics, 83(3), 851–862.

    Google Scholar 

  • Holmberg, K., & Thelwall, M. (2009). Local government web sites in Finland: A geographic and webometric analysis. Scientometrics, 79(1), 157–169.

    Google Scholar 

  • Hsu, C.-L., & Park, H. W. (2011). Sociology of hyperlink networks of web 1.0, web 2.0, and twitter: A case study of South Korea. Social Science Computer Review, 29(3), 354–368.

    Google Scholar 

  • Hsu, C.-L., & Park, H. W. (2012). Mapping online social networks of Korean politicians. Government Information Quarterly, 29(2), 169–181.

    Google Scholar 

  • Huang, Z., Chen, H., & Zeng, D. (2004). Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems, 22(1), 116–142.

    Google Scholar 

  • Ingwersen, P. (1998). The calculation of web impact factors. Journal of Documentation, 54(2), 236–243.

    Google Scholar 

  • Islam, M. A. (2011). Webometrics study of universities in Bangladesh. Annals of Library and Information Studies, 58(4), 307–318.

    Google Scholar 

  • Islam, M. A., & Alam, M. S. (2011). Webometric study of private universities in Bangladesh. Malaysian Journal of Library and Information Science, 16(2), 115–126.

    Google Scholar 

  • Jonkers, K., De Moya Anegon, F., & Aguillo, I.-F. (2012). Measuring the usage of e-research infrastructure as an indicator of research activity. Journal of the American Society for Information Science and Technology, 63(7), 1374–1382.

    Google Scholar 

  • Kajikawa, Y. & Mori, J. (2009). Interdisciplinary Research Detection by Citation Indicators. International Conference on Industrial Engineering and Engineering Management 2009 (IEEM2009) in Hong Kong. (December 8–11, 2009).

  • Kirby, J. A., Hoadley, C. M., & Carr-Chellman, A. A. (2005). Instructional systems design and the learning sciences: A citation analysis. ETR&D-Educational Technology Research and Development, 53(1), 37–48.

    Google Scholar 

  • Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

    MATH  MathSciNet  Google Scholar 

  • Kontopoulos, E., Berberidis, C., Dergiades, T., & Bassiliades, N. (2013). Ontology-based sentiment analysis of twitter posts. Expert Systems with Applications, 40, 4065–4074.

    Google Scholar 

  • Kosala, R., & Blockeel, H. (2000). Web mining research: A survey. ACM SIGKDD Explorations, 2(11), 1–15.

    Google Scholar 

  • Kretschmer, H., & Aguillo, I. F. (2005). New indicators for gender studies in web networks. Information Processing and Management, 41(6), 1481–1494.

    Google Scholar 

  • Ku, L.-W., & Chen, H.-H. (2007). Mining opinions from the web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, 58(12), 1838–1850.

    Google Scholar 

  • Kumar, G. D., & Gosul, M. (2011). Web mining research and future directions. Communications in Computer and Information Science, 196, 489–496.

    Google Scholar 

  • Kundu, S. (2012). An intelligent approach of web data mining. International Journal on Computer Science and Engineering., 4(5), 919–928.

    Google Scholar 

  • Lai, Y., & Zeng, J. (2013). A cross-language personalized recommendation model in digital libraries. The Electronic Library, 31(3), 264–277.

    MathSciNet  Google Scholar 

  • Lambiotte, R., Delvenne, J.-C., & Barahona, M. (2009). Laplacian dynamics and multiscale modular structure in networks. arXiv. Retrieved October 10, 2013 from http://arxiv.org/abs/0812.1770.

  • Lang, P. B., Gouveia, F. C., & Leta, J. (2010). Site co-link analysis applied to small networks: a new methodological approach. Scientometrics, 83(1), 157–166.

    Google Scholar 

  • Lang, P. B., Gouveia, F. C., & Leta, J. (2013). Cooperation in health: Mapping collaborative networks on the web. PLoS One, 8(8), e71415.

    Google Scholar 

  • Laniado, D., & Tasso, R. (2011). Co-authorship 2.0-Patterns of collaboration in Wikipedia. HT 2011 Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (pp. 201–210).

  • Lappas, G. (2007). An overview of web mining in societal benefit areas. Online Information Review, 32(2), 179–195.

    Google Scholar 

  • Li, H.-F. (2009). A sliding window method for finding top-k path traversal patterns over streaming web click-sequences. Expert Systems with Applications, 36(3), 4382–4386.

    Google Scholar 

  • Li, Y.-M., Lai, C.-Y., & Chen, C.-W. (2009). Identifying bloggers with marketing influence in the blogosphere. ACM International Conference Proceeding Series (pp. 335–340).

  • Lin, S.-H., Chu, K.-P., & Chiu, C.-M. (2011). Automatic sitemaps generation: Exploring website structures using block extraction and hyperlink analysis. Expert Systems with Applications, 38(4), 3944–3958.

    Google Scholar 

  • Malinský, R., & Jelínek, I. (2010). Improvements of Webometrics by using sentiment analysis for better accessibility of the web. Lecture Notes in Computer Science, 6385, 581–586.

    Google Scholar 

  • Martínez-Ruiz, A., & Thelwall, M. (2010). The importance of technology and R&D expenditures in the visibility of the firms on the web: An exploratory study. Cybermetrics International Journal of Scientometrics, Informetrics and Bibliometrics, 14(1), 2.

    Google Scholar 

  • Martínez-Torres, M. R., & Díaz-Fernández, M. C. (2013). A study of global and local visibility as web indicators of research production. Research Evaluation, 22, 157–168.

    Google Scholar 

  • Martínez-Torres, M. R., Toral, S. L., Palacios, B., & Barrero, F. (2012). An evolutionary factor analysis computation for mining website structures. Expert Systems with Applications, 39(14), 11623–11633.

    Google Scholar 

  • Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 60–67.

    Google Scholar 

  • Miller, B. N., Konstan, J. A., & Riedl, J. (2004). PocketLens: Toward a personal recommender system. ACM Transactions on Information Systems, 22(3), 437–476.

    Google Scholar 

  • Minguillo, D., & Thelwall, M. (2012). Mapping the network structure of science parks: An exploratory study of cross-sectoral interactions reflected on the web. Aslib Proceedings: New Information Perspectives, 64(4), 332–357.

    Google Scholar 

  • Mobasher, B., Cooley, R., & Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142–151.

    Google Scholar 

  • Mobasher, B., Dai, H., Luo, T., & Nakagawa, M. (2002). Discovery and evaluation of aggregate usage profiles for web personalization. Data Mining and Knowledge Discovery, 6(1), 61–82.

    MathSciNet  Google Scholar 

  • Moghaddam, S., & Ester, M. (2010). Opinion digger: An unsupervised opinion miner from unstructured product reviews. International Conference on Information and Knowledge Management, Proceedings (pp. 1825–1828).

  • Nam, Y., Lee, Y.-O., & Park, H. W. (2013). Can web ecology provide a clearer understanding of people’s information behavior during election campaigns? Social Science Information, 52(1), 91–109.

    Google Scholar 

  • Nasraoui, O., Rojas, C., & Cardona, C. (2006). A framework for mining evolving trends in web data streams using dynamic learning and retrospective validation. Computer Networks, 50(10, SI), 1488–1512.

    Google Scholar 

  • Nasraoui, O., Soliman, M., Saka, E., Badia, A., & Germain, R. (2008). A web usage mining framework for mining evolving user profiles in dynamic web sites. IEEE Transactions on Knowledge and Data Engineering, 20(2), 202–215.

    Google Scholar 

  • Nekaris, K. A.-I., Campbell, N., Coggins, T. G., Johanna Rode, E., & Nijman, V. (2013). Tickled to death: Analysing public perceptions of ‘cute’ videos of threatened species (Slow lorisesNycticebus spp.) on web 2.0 sites. PLoS One, 8(7), e69215.

    Google Scholar 

  • Noruzi, A. (2005). Web impact factors for Iranian Universities. Webology, 2(1), 51.

    Google Scholar 

  • Noruzi, A. (2006). The web impact factor: A critical review. Electronic Library, 24(4), 490–500.

    Google Scholar 

  • Nwagwu, W. E., & Agarin, O. (2008). Nigerian University websites: A webometric analysis. Webology, 5(4), 1–20.

    Google Scholar 

  • Orduña-Malea, E. (2012). Graphic, multimedia, and blog content presence in the Spanish academic web-space. Cybermetrics International Journal of Scientometrics, Informetrics and Bibliometrics, 16(1), 3.

    Google Scholar 

  • Ortega, J. L., & Aguillo, I. F. (2007). Interdisciplinary relationships in the Spanish academic web space: A webometric study through networks visualization. Cybermetrics International Journal of Scientometrics, Informetrics and Bibliometrics, 11(1), 4.

    Google Scholar 

  • Ortega, J. L., & Aguillo, I. F. (2008). Visualization of the Nordic academic web: Link analysis using social network tools. Information Processing and Management, 44(4), 1624–1633.

    Google Scholar 

  • Ortega, J. L., & Aguillo, I. F. (2009). Mapping world-class universities on the web. Information Processing and Management, 45(2), 272–279.

    Google Scholar 

  • Ortega, J. L., Aguillo, I., Cothey, V., & Scharnhorst, A. (2008). Maps of the academic web in the European Higher Education Area: An exploration of visual web indicators. Scientometrics, 74(2), 295–308.

    Google Scholar 

  • Otte, E., & Rousseau, R. (2002). Social network analysis: A powerful strategy, also for the information sciences. Journal of Information Science, 28(6), 441–453.

    Google Scholar 

  • Ou, J.-C., Lee, C.-H., & Chen, M.-S. (2008). Efficient algorithms for incremental web log mining with dynamic thresholds. VLDB Journal, 17(4), 827–845.

    Google Scholar 

  • Paliouras, G. (2012). Discovery of web user communities and their role in personalization. User Modelling and User-Adapted Interaction, 22(1–2), 151–175.

    Google Scholar 

  • Palmer, J. W. (2002). Web site usability, design, and performance metrics. Information Systems Research, 13(2), 151–167.

    Google Scholar 

  • Panchal, V., Pillai, S., & Singh, A. (2012). Truth finder algorithm for multiple conflicting information providers on the web. International Journal of Computer Applications, 5, 1–4.

    Google Scholar 

  • Park, H.-W. (2010). Mapping the e-science landscape in South Korea using the webometrics method. Journal of Computer-Mediated Communication, 15(2), 211–229.

    Google Scholar 

  • Park, H.-W., & Kluver, R. (2009). Trends in online networking among South Korean politicians: A mixed-method approach. Government Information Quarterly, 26(3), 505–515.

    Google Scholar 

  • Park, H.-W., & Thelwall, M. (2008). Link analysis: Hyperlink patterns and social structure on politicians’ web sites in South Korea. Quality and Quantity, 42(5), 687–697.

    Google Scholar 

  • Pierrakos, D., & Paliouras, G. (2010). Personalizing web directories with the aid of web usage data. IEEE Transactions on Knowledge and Data Engineering, 22(9), 1331–1344.

    Google Scholar 

  • Polanco, X., Roche, I., & Besagni, D. (2006). User science indicators in the web context and co-usage analysis. Scientometrics, 66(1), 171–182.

    Google Scholar 

  • Poongothai, K., & Sathiyabama, S. (2012). Efficient web usage miner using decisive induction rules. Journal of Computer Science, 8(6), 835–840.

    Google Scholar 

  • Popova, V., John, R., & Stockton, D. (2009). Sales intelligence using web mining. In P. Perner (Ed.), ICDM 2009, LNAI, 5633 (pp. 131–145). Berlin: Springer.

    Google Scholar 

  • Pratt, J. A., Hauser, K., & Sugimoto, C. R. (2012). Cross-disciplinary communities or knowledge islands: Examining business disciplines. Journal of Computer Information Systems, 53(2), 9–21.

    Google Scholar 

  • Qiu, G., Liu, B., Bu, J., & Chen, C. (2011). Opinion word expansion and target extraction through double propagation. Computational Linguistics, 37(1), 9–27.

    Google Scholar 

  • Rettinger, A., Loesch, U., Tresp, V., D’Amato, C., & Fanizzi, N. (2012). Mining the semantic web statistical learning for next generation knowledge bases. Data Mining and Knowledge Discovery, 24(3, SI), 613–662.

    MATH  MathSciNet  Google Scholar 

  • Richardson, M., & Domingos, P. (2002). Mining knowledge-sharing sites for viral marketing. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 61–70).

  • Romero, C., Ventura, S., Zafra, A., & De Bra, P. (2009). Applying web usage mining for personalizing hyperlinks in web-based adaptive educational systems. Computers and Education, 53(3), 828–840.

    Google Scholar 

  • Romero-Frías, E., & Vaughan, L. (2012). Exploring the relationships between media and political parties through web hyperlink analysis: The case of Spain. Journal of the American Society for Information Science and Technology, 63(5), 967–976.

    Google Scholar 

  • Ruller, T. J. (1993). A review of information science and computer science literature to support archival work with electronic records. American Archivist, 56(3), 546.

    Google Scholar 

  • Schubert, A., & Braun, T. (1996). Cross-field normalization of scientometric indicators. Scientometrics, 36(3), 311–324.

    Google Scholar 

  • Shandilya, S. K., & Jain, D. S. (2009). Automatic opinion extraction from web documents. Proceedings 2009 International Conference on Computer and Automation Engineering, ICCAE 2009 (pp. 351–355).

  • Sharma, K., Shrivastava, G., & Kumar, V. (2011). Web mining: Today and tomorrow. ICECT 20112011 3rd International Conference on Electronics Computer Technology, Vol. 1 (pp. 399–403).

  • Shekofteh, M., Shahbodaghi, A., Sajjadi, S., & Jambarsang, S. (2010). Investigating Web impact factors of type 1, type 2 and type 3 medical universities in Iran. Journal of Paramedical Sciences, 1(3), 34–41.

    Google Scholar 

  • Shunbo, Yuan, & Weina, Hua. (2011). Scholarly impact measurements of LIS open access journals: Based on citations and links. The Electronic Library, 29(5), 682–697.

    Google Scholar 

  • Shyu, M.-L., Haruechaiyasak, C., & Chen, S.-C. (2006). Mining user access patterns with traversal constraint for predicting web page requests. Knowledge and Information Systems, 10(4), 515–528.

    Google Scholar 

  • Small, H. (2010). Maps of science as interdisciplinary discourse: Co-citation contexts and the role of analogy. Scientometrics, 83(3), 835–849.

    MathSciNet  Google Scholar 

  • Somprasertsri, G., & Lalitrojwong, P. (2010). Mining feature-opinion in online customer reviews for opinion summarization. Journal of Universal Computer Science, 16(6), 938–955.

    Google Scholar 

  • Srivastava, J., Cooley, R., Deshpande, M., & Tan, P. N. (2000). Web usage mining: Discovery and applications of usage patterns from web data. Sigkdd Explorations, 1(2), 12–23.

    Google Scholar 

  • Stuart, D., Thelwall, M., & Harries, G. (2007). UK academic web links and collaboration: An exploratory study. Journal of Information Science, 33(2), 231–246.

    Google Scholar 

  • Takahashi, T., Abe, S., & Igata, N. (2011). Can Twitter be an alternative of real-world sensors? Lecture Notes in Computer Science, 6763, 240–249.

    Google Scholar 

  • Thelwall, M. (2001a). A web crawler design for data mining. Journal of Information Science, 27(5), 319–325.

    Google Scholar 

  • Thelwall, M. (2001b). Extracting macroscopic information from Web links. Journal of the American Society for Information Science and Technology, 52(13), 1157–1168.

    Google Scholar 

  • Thelwall, M. (2002a). A research and institutional size based model for National University web site interlinking. Journal of Documentation, 58(6), 683–694.

    Google Scholar 

  • Thelwall, M. (2002b). Evidence for the existence of geographic trends in university web site interlinking. Journal of Documentation, 58(5), 563–574.

    Google Scholar 

  • Thelwall, M. (2006). Interpreting social science link analysis research: A theoretical framework. Journal of the American Society for Information Science and Technology archive, 57(1), 60–68.

    Google Scholar 

  • Thelwall, M. (2009). Introduction to webometrics: Quantitative Web research for the social sciences. New York, NY: Morgan & Claypool.

    Google Scholar 

  • Thelwall, M. (2010a). Webometrics. Encyclopedia of library and information sciences (pp. 5634–5643). New York: Taylor and Francis.

    Google Scholar 

  • Thelwall, M. (2010b). Webometrics: Emergent or doomed? Information Research, 15(4), 713.

    Google Scholar 

  • Thelwall, M. (2011). A comparison of link and URL citation counting. Aslib Proceedings: New Information Perspectives, 63(4), 419–425.

    Google Scholar 

  • Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406–418.

    Google Scholar 

  • Thelwall, M., Haustein, S., Larivière, V., & Sugimoto, C. R. (2013). Do altmetrics work? Twitter and ten other social web services. PLoS One, 8(5), e64841.

    Google Scholar 

  • Thelwall, M., Klitkou, A., Verbeek, A., Stuart, D., & Vincent, C. (2010). Policy-relevant webometrics for individual scientific fields. Journal of the American Society for Information Science and Technology, 61(7), 1464–1475.

    Google Scholar 

  • Thelwall, M., & Sud, P. (2011). A comparison of methods for collecting web citation data for academic organizations. Journal of the American Society for Information Science and Technology, 62(8), 1488–1497.

    Google Scholar 

  • Thelwall, M., & Sud, P. (2012). Webometric research with the Bing Search API2.0. Journal of Informetrics, 6(1), 44–52.

    Google Scholar 

  • Thelwall, M., Vann, K., & Fairclough, R. (2006). Web issue analysis: An integrated water resource management case study. Journal of the American Society for Information Science and Technology, 57(10), 1303–1314.

    Google Scholar 

  • Thelwall, M., Vaughan, L., & Björneborn, L. (2005). Webometrics. Annual Review of Information Science and Technology, 39, 81–135.

    Google Scholar 

  • Thelwall, M., & Wouters, P. (2005). What’s the deal with the web/blogs/the next big technology: A key role for information science in e-social science research? CoLIS’05: Proceedings of the 5th international conference on Context: conceptions of Library and Information Sciences.

  • Van Leeuwen, T., & Tijssen, R. (2000). Interdisciplinary dynamics of modern science: analysis of cross-disciplinary citation flows. Research Evaluation, 9(3), 183–187.

    Google Scholar 

  • Van Zoonen, L., Vis, F., & Mihelj, S. (2011). YouTube interactions between agonism, antagonism and dialogue: Video responses to the anti-Islam film Fitna. New Media and Society, 13(8), 1283–1300.

    Google Scholar 

  • Vaughan, L., & Romero-Frías, E. (2012). Exploring web keyword analysis as an alternative to link analysis: A multi-industry case. Scientometrics, 93(1), 217–232.

    Google Scholar 

  • Vaughan, L., & Thelwall, M. (2003). Scholarly use of the web: What are the key inducers of links to journal web sites? Journal of the American Society for Information Science and Technology, 54(1), 29–38.

    Google Scholar 

  • Vaughan, L., & Yang, R. (2012). Web data as academic and business quality estimates: A comparison of three data sources. Journal of the American Society for Information Science and Technology, 63(10), 1960–1972.

    Google Scholar 

  • Vaughan, L., Yang, R., & Tang, J. (2012). Web co-word analysis for business intelligence in the Chinese environment. Aslib Proceedings: New Information Perspectives, 6, 653–666.

    Google Scholar 

  • Vaughan, L., & You, J. (2010). Word co-occurrences on Webpages as a measure of the relatedness of organizations: A new Webometrics concept. Journal of Informetrics, 4(4), 483–491.

    Google Scholar 

  • Velásquez, J. D. (2013). Combining eye-tracking technologies with web usage mining for identifying Website Keyobjects. Engineering Applications of Artificial Intelligence, 26, 1469–1478.

    Google Scholar 

  • Velásquez, J. D., Dujovne, L. E., & L’Huillier, G. (2011). Extracting significant website key objects: A semantic web mining approach. Engineering Applications of Artificial Intelligence, 24(8), 1532–1541.

    Google Scholar 

  • Wang, C., Lu, J., & Zhang, G. (2007). Mining key information of web pages: A method and its application. Expert Systems with Applications, 33, 425–433.

    MathSciNet  Google Scholar 

  • Wang, P., Sanin, C., & Szczerbicki, E. (2011). Application of Decisional DNA in Web Data Mining. Knowlege-Based and Intelligent Information and Engineering Systems., 6882, 631–639.

    Google Scholar 

  • Wang, P., Sanin, C., & Szczerbicki, E. (2012). Introducing the concept of decisional DNA-based web content mining. Cybernetics and Systems: An International Journal, 43, 136–142.

    Google Scholar 

  • Wang, K.-Y., Ting, I.-H., & Wu, H.-J. (2013). Discovering interest groups for marketing in virtual communities: An integrated approach. Journal of Business Research, 66, 1360–1366.

    Google Scholar 

  • Wilkinson, D., & Thelwall, M. (2012). Trending Twitter Topics in English. Journal of the American Society for Information Science and Technology, 63(8), 1631–1646.

    Google Scholar 

  • Williams, C. J., O’Rourke, M., Eigenbrode, S. D., O’Loughlin, I., & Crowley, S. J. (2013). Using bibliometrics to support the facilitation of cross-disciplinary communication. Journal of the American Society for Information Science and Technology, 64(9), 1768–1779.

    Google Scholar 

  • Woo-Young, C., & Park, H. W. (2012). The network structure of the Korean blogosphere. Journal of Computer-Mediated Communication, 17(2), 216–230.

    Google Scholar 

  • Yang, B., Liu, J., & Feng, J. (2012). On the spectral characterization and scalable mining of network communities. IEEE Transactions on Knowledge and Data Engineering, 24(2), 326–337.

    MathSciNet  Google Scholar 

  • Yang, B., & Sun, Y. (2013). An exploration of link-based knowledge map in academic web space. Scientometrics, 96(1), 239–253.

    Google Scholar 

  • Yeh, I.-C., Lien, C., Ting, T.-M., & Liu, C.-H. (2009). Applications of web mining for marketing of online bookstores. Expert Systems with Applications, 36, 11249–11256.

    Google Scholar 

  • Zhang, Z., & Nasraoui, O. (2008). Mining search engine query logs for social filtering-based query recommendation. Applied Soft Computing, 8(4), 1326–1334.

    Google Scholar 

  • Zhang, Q., & Segall, R. S. (2008). Web mining: A survey of current research, techniques, and software. International Journal of Information Technology and Decision Making, 7(4), 683–720.

    Google Scholar 

  • Zhang, Y., & Xu, G. (2009). On web communities mining and recommendation. Concurrency and Computation-Practice and Experience, 21(5), 561–582.

    Google Scholar 

  • Zuccala, A. (2006). Author cocitation analysis is to intellectual structure as web colink analysis is to…? Journal of the American Society for Information Science and Technology, 57(11), 1487–1502.

    Google Scholar 

Download references

Author information

Corresponding author

Correspondence to David Gunnarsson Lorentzen.

Appendix: The queries

General data collection

Initial Scopus queries

Webometrics

(webometric* OR “web metric*” OR cybermetric* OR scientometric* OR informetric*) AND (“web impact assessment” OR “web impact report*” OR “web impact analy*” OR “web citation analy*” OR “web content analy*” OR “link analy*” OR “webometric link analy*” OR “link relationship map*” OR “link relationship analy*” OR “link impact report*” OR “link impact analy*” OR “link network analy*” OR “colink relationship map*” OR “colink relationship analy*” OR “colink impact report*” OR “colink impact analy*” OR “colink network analy*” OR “co-link relationship map*” OR “co-link relationship analy*” OR “co-link impact report*” OR “co-link impact analy*” OR “co-link network analy*” OR “web analy*” OR “log analy*” OR “web memetic*” OR “social network analy*” OR “social network metric*”)

Web mining

(“web mining” OR “web data mining”) AND (“social network mining” OR “social network metric*” OR “web personalization” OR “web recommend*” OR “web community analy*” OR “web linkage mining” OR “web usage mining” OR “web structure mining” OR “web content mining” OR “web knowledge discovery” OR “collaborative filtering” OR “opinion mining” OR “web community discovery” OR “web graph measur*” OR “web graph model*” OR “log analy*” OR “log mining” OR “web structural analy*” OR “web structure analy*” OR “web temporal analy*” OR “link analy*”)

Refined queries for Scopus and Web of Science

Webometrics, Scopus

TITLE-ABS-KEY(webometric* OR cybermetric* OR scientometric* OR informetric*) AND TITLE-ABS-KEY(“web impact” OR “web citation analy*” OR “web citing analy*” OR “web content analy*” OR “link analy*” OR “colink analy*” OR “co-link analy*” OR “link relationship*” OR “link impact*” OR “link network*” OR “colink relationship*” OR “colink*” OR “colink network*” OR “co-link relationship*” OR “co-link impact*” OR “co-link network*” OR “web analy*” OR “log analy*” OR “web content*” OR “web usage” OR “web memetic*” OR “virtual memetic*” OR “social network” OR “web knowledge”)

142 items returned.

Webometrics, WoS

TS = (webometric* OR cybermetric* OR scientometric* OR informetric*) AND TS = (“web impact” OR “web citation analy*” OR “web citing analy*” OR “web content analy*” OR “link analy*” OR “colink analy*” OR “co-link analy*” OR “link relationship*” OR “link impact*” OR “link network*” OR “colink relationship*” OR “colink*” OR “colink network*” OR “co-link relationship*” OR “co-link impact*” OR “co-link network*” OR “web analy*” OR “log analy*” OR “web content*” OR “web usage” OR “web memetic*” OR “virtual memetic*” OR “social network” OR “web knowledge”)

133 items returned.

Web mining, Scopus

TITLE-ABS-KEY(“web mining” OR “web data mining”) AND TITLE-ABS-KEY(“social network” OR “web personal*” OR “web recommend*” OR “web community” OR “web linkage mining” OR “web usage” OR “web structure” OR “web content” OR “web knowledge” OR “collaborative filtering” OR “opinion mining” OR “web community” OR “web graph measur*” OR “web graph model*” OR “log analy*” OR “log mining” OR “web structural analy*” OR “web structure analy*” OR “web temporal analy*” OR “link analy*”)

688 items returned.

Web mining, WoS

TS = (“web mining” OR “web data mining”) AND TS = (“social network” OR “web personal*” OR “web recommend*” OR “web community” OR “web linkage mining” OR “web usage” OR “web structure” OR “web content” OR “web knowledge” OR “collaborative filtering” OR “opinion mining” OR “web community” OR “web graph measur*” OR “web graph model*” OR “log analy*” OR “log mining” OR “web structural analy*” OR “web structure analy*” OR “web temporal analy*” OR “link analy*”)

338 items returned.
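
The refined queries above share one shape: two OR-groups joined by AND, each wrapped in a database-specific field tag (TITLE-ABS-KEY in Scopus, TS = in Web of Science). The following is a minimal sketch of how such query strings could be assembled from term lists, not the procedure the author used. The function names `or_group` and `build_query` are hypothetical, the rule of quoting only terms that contain a space or hyphen is an assumption, and straight quotes are used in place of the typographic quotes printed above. The code only builds strings and does not contact either database.

```python
def or_group(terms):
    """Join terms with OR, quoting phrases that contain a space or hyphen.

    The quoting rule is an assumption made for this sketch.
    """
    quoted = []
    for term in terms:
        needs_quotes = " " in term or "-" in term
        quoted.append(f'"{term}"' if needs_quotes else term)
    return " OR ".join(quoted)


def build_query(field_terms, method_terms, database="scopus"):
    """Combine a field-name group and a method group into one query string."""
    if database == "scopus":
        template = "TITLE-ABS-KEY({})"
    elif database == "wos":
        template = "TS = ({})"
    else:
        raise ValueError(f"unknown database: {database}")
    left = template.format(or_group(field_terms))
    right = template.format(or_group(method_terms))
    return f"{left} AND {right}"


if __name__ == "__main__":
    fields = ["webometric*", "cybermetric*"]
    methods = ["web impact", "link analy*", "co-link analy*"]
    print(build_query(fields, methods, database="scopus"))
    print(build_query(fields, methods, database="wos"))
```

Keeping the term lists separate from the field tags makes it easier to run the same search in both databases and to compare the number of items returned.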

Data collection for citation and keyword analysis

Webometrics

TITLE-ABS-KEY(webometric* or cybermetric*) AND (LIMIT-TO(DOCTYPE, “cp”) OR LIMIT-TO(DOCTYPE, “ar”) OR LIMIT-TO(DOCTYPE, “re”) OR LIMIT-TO(DOCTYPE, “ip”))

307 items returned.

Web mining

TITLE-ABS-KEY(“web mining” or “web data mining”) AND (LIMIT-TO(DOCTYPE, “cp”) OR LIMIT-TO(DOCTYPE, “ar”) OR LIMIT-TO(DOCTYPE, “re”) OR LIMIT-TO(DOCTYPE, “ip”))

2,518 items returned.

Social web search terms

farmville, hulu, prezi, posterous, blipfm, boxee, friv, friendfeed, gliffy, kerpoof, mint, docstoc, animoto, fotoflexer, lijit, google docs, foxytunes, wufoo, twitter, openid, piczo, picnik, joost, footnote, digg, viddler, snap, wesabe, zamzar, linkedin, compete, weebly, typepad, ilike, slide, feedblitz, mybloglog, quantcast, blip.tv, songbird, widgetbox, panoramio, plazes, scrapblog, imagekind, zoho, metacafe, evernote, reddit, zyb, yelp, amie.st, finetune, pageflakes, feedburner, netvibes, zooomr, facebook, youtube, alexa, flickr, gmail, box, ebay, amazon, orkut, myspace, skype, meebo, delicious, del.icio.us, flock, stumbleupon, pandora, last.fm, smugmug, social, 2.0, new media, blog*, communit*, wiki, collabo*, participat*, new web
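
Several of the terms above end in an asterisk (blog*, communit*, collabo*, participat*), which reads as a truncation wildcard. This section does not describe how the list was applied, so the following is only a sketch of one plausible use: flagging a title or keyword string that contains any of the terms. The helper names `compile_terms` and `matched_terms` are hypothetical, and treating a trailing asterisk as "any further word characters" is an assumption.

```python
import re


def compile_terms(terms):
    """Build one case-insensitive pattern from a list of search terms.

    A trailing '*' is treated as a truncation wildcard (assumption);
    every other character, including '.', is matched literally.
    """
    parts = []
    for term in terms:
        if term.endswith("*"):
            parts.append(re.escape(term[:-1]) + r"\w*")
        else:
            parts.append(re.escape(term))
    # Lookarounds keep a term from matching inside a longer word,
    # e.g. "box" inside "inbox".
    return re.compile(r"(?<!\w)(?:" + "|".join(parts) + r")(?!\w)", re.IGNORECASE)


def matched_terms(text, pattern):
    """Return the distinct matches found in text, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


if __name__ == "__main__":
    pattern = compile_terms(["twitter", "blog*", "2.0", "new media", "box"])
    print(matched_terms("Sentiment in Twitter events and blogging", pattern))
    print(matched_terms("Inbox sorting", pattern))
```

Short, generic terms such as "social", "box" or "slide" would still produce many false positives under this kind of matching, so results would need manual checking.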

About this article

Cite this article

Lorentzen, D.G. Webometrics benefitting from web mining? An investigation of methods and applications of two research fields. Scientometrics 99, 409–445 (2014). https://doi.org/10.1007/s11192-013-1227-x

  • DOI: https://doi.org/10.1007/s11192-013-1227-x
