Abstract
Webometrics and web mining are two fields in which research focuses on quantitative analyses of the web. This literature review outlines definitions of the fields and then examines their methods and applications. It also discusses the potential for closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies that develop methods and algorithms. The fields also differ in the type of data they use: webometrics concentrates more on analyses of the structure of the web, and web mining more on web content and usage, although both fields have embraced the possibilities of user-generated content. The review concludes that research problems requiring big data can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.
Notes
Source: http://www.go2web20.net/. See the Appendix for a list of the social web terms used.
A graph is considered directed if the edges imply a direction, e.g. page A links to page B.
The linkdomain command was useful for finding all pages linking to any page belonging to a given web site.
An API is an interface set up by a service provider, which can be used to query the provider for certain data.
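The notion of a directed graph given in the notes above can be illustrated with a minimal sketch. The page names and edges are hypothetical, chosen only to show that "page A links to page B" does not imply the reverse; the sketch is not taken from the reviewed studies.

```python
from collections import defaultdict

# Hypothetical directed edges: (source, target) means "source links to target".
edges = [("A", "B"), ("A", "C"), ("C", "B")]

out_links = defaultdict(set)  # pages each page links to
in_links = defaultdict(set)   # pages linking to each page (inlinks)
for source, target in edges:
    out_links[source].add(target)
    in_links[target].add(source)


def links_to(source, target):
    """Return True if there is a directed edge from source to target."""
    return target in out_links.get(source, set())


print(links_to("A", "B"))  # direction matters: A links to B
print(links_to("B", "A"))  # but B does not link back to A
```

Keeping a separate inlink index mirrors the kind of question the linkdomain note describes, namely which pages link to a given target.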
Appendix: The queries
General data collection
Initial Scopus queries
Webometrics
(webometric* OR “web metric*” OR cybermetric* OR scientometric* OR informetric*) AND (“web impact assessment” OR “web impact report*” OR “web impact analy*” OR “web citation analy*” OR “web content analy*” OR “link analy*” OR “webometric link analy*” OR “link relationship map*” OR “link relationship analy*” OR “link impact report*” OR “link impact analy*” OR “link network analy*” OR “colink relationship map*” OR “colink relationship analy*” OR “colink impact report*” OR “colink impact analy*” OR “colink network analy*” OR “co-link relationship map*” OR “co-link relationship analy*” OR “co-link impact report*” OR “co-link impact analy*” OR “co-link network analy*” OR “web analy*” OR “log analy*” OR “web memetic*” OR “social network analy*” OR “social network metric*”)
Web mining
(“web mining” OR “web data mining”) AND (“social network mining” OR “social network metric*” OR “web personalization” OR “web recommend*” OR “web community analy*” OR “web linkage mining” OR “web usage mining” OR “web structure mining” OR “web content mining” OR “web knowledge discovery” OR “collaborative filtering” OR “opinion mining” OR “web community discovery” OR “web graph measur*” OR “web graph model*” OR “log analy*” OR “log mining” OR “web structural analy*” OR “web structure analy*” OR “web temporal analy*” OR “link analy*”)
Refined queries for Scopus and Web of Science
Webometrics, Scopus
TITLE-ABS-KEY(webometric* OR cybermetric* OR scientometric* OR informetric*) AND TITLE-ABS-KEY(“web impact” OR “web citation analy*” OR “web citing analy*” OR “web content analy*” OR “link analy*” OR “colink analy*” OR “co-link analy*” OR “link relationship*” OR “link impact*” OR “link network*” OR “colink relationship*” OR “colink*” OR “colink network*” OR “co-link relationship*” OR “co-link impact*” OR “co-link network*” OR “web analy*” OR “log analy*” OR “web content*” OR “web usage” OR “web memetic*” OR “virtual memetic*” OR “social network” OR “web knowledge”)
142 items returned.
Webometrics, WoS
TS = (webometric* OR cybermetric* OR scientometric* OR informetric*) AND TS = (“web impact” OR “web citation analy*” OR “web citing analy*” OR “web content analy*” OR “link analy*” OR “colink analy*” OR “co-link analy*” OR “link relationship*” OR “link impact*” OR “link network*” OR “colink relationship*” OR “colink*” OR “colink network*” OR “co-link relationship*” OR “co-link impact*” OR “co-link network*” OR “web analy*” OR “log analy*” OR “web content*” OR “web usage” OR “web memetic*” OR “virtual memetic*” OR “social network” OR “web knowledge”)
133 items returned.
Web mining, Scopus
TITLE-ABS-KEY(“web mining” OR “web data mining”) AND TITLE-ABS-KEY(“social network” OR “web personal*” OR “web recommend*” OR “web community” OR “web linkage mining” OR “web usage” OR “web structure” OR “web content” OR “web knowledge” OR “collaborative filtering” OR “opinion mining” OR “web community” OR “web graph measur*” OR “web graph model*” OR “log analy*” OR “log mining” OR “web structural analy*” OR “web structure analy*” OR “web temporal analy*” OR “link analy*”)
688 items returned.
Web mining, WoS
TS = (“web mining” OR “web data mining”) AND TS = (“social network” OR “web personal*” OR “web recommend*” OR “web community” OR “web linkage mining” OR “web usage” OR “web structure” OR “web content” OR “web knowledge” OR “collaborative filtering” OR “opinion mining” OR “web community” OR “web graph measur*” OR “web graph model*” OR “log analy*” OR “log mining” OR “web structural analy*” OR “web structure analy*” OR “web temporal analy*” OR “link analy*”)
338 items returned.
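The refined Scopus and Web of Science queries above share the same term lists and differ only in the field tag that wraps them (TITLE-ABS-KEY(...) versus TS = (...)). The following is a minimal sketch of how such query pairs could be generated from one set of terms. The function names and the quoting rule (quote any term containing a space or hyphen, using straight quotes) are assumptions made for illustration; the sketch does not reproduce the exact quoting of single-word terms such as colink* in the queries above, and it is not a description of how the queries were actually constructed.

```python
def or_group(terms):
    """Join terms with OR, quoting phrases that contain a space or hyphen."""
    parts = []
    for term in terms:
        needs_quotes = " " in term or "-" in term
        parts.append(f'"{term}"' if needs_quotes else term)
    return " OR ".join(parts)


def scopus_query(field_terms, topic_terms):
    """Build a Scopus-style query searching title, abstract and keywords."""
    return (
        f"TITLE-ABS-KEY({or_group(field_terms)}) "
        f"AND TITLE-ABS-KEY({or_group(topic_terms)})"
    )


def wos_query(field_terms, topic_terms):
    """Build a Web of Science-style topic query from the same term lists."""
    return f"TS = ({or_group(field_terms)}) AND TS = ({or_group(topic_terms)})"


if __name__ == "__main__":
    # Shortened term lists taken from the web mining queries above.
    field = ["web mining", "web data mining"]
    topic = ["social network", "web usage", "link analy*"]
    print(scopus_query(field, topic))
    print(wos_query(field, topic))
```

Keeping the term lists in one place means that a change to the vocabulary is applied to both databases at once, which makes the item counts reported for Scopus and WoS easier to compare.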
Data collection for citation and keyword analysis
Webometrics
TITLE-ABS-KEY(webometric* or cybermetric*) AND (LIMIT-TO(DOCTYPE, “cp”) OR LIMIT-TO(DOCTYPE, “ar”) OR LIMIT-TO(DOCTYPE, “re”) OR LIMIT-TO(DOCTYPE, “ip”))
307 items returned.
Web mining
TITLE-ABS-KEY(“web mining” or “web data mining”) AND (LIMIT-TO(DOCTYPE, “cp”) OR LIMIT-TO(DOCTYPE, “ar”) OR LIMIT-TO(DOCTYPE, “re”) OR LIMIT-TO(DOCTYPE, “ip”))
2,518 items returned.
Social web search terms
farmville, hulu, prezi, posterous, blipfm, boxee, friv, friendfeed, gliffy, kerpoof, mint, docstoc, animoto, fotoflexer, lijit, google docs, foxytunes, wufoo, twitter, openid, piczo, picnik, joost, footnote, digg, viddler, snap, wesabe, zamzar, linkedin, compete, weebly, typepad, ilike, slide, feedblitz, mybloglog, quantcast, blip.tv, songbird, widgetbox, panoramio, plazes, scrapblog, imagekind, zoho, metacafe, evernote, reddit, zyb, yelp, amie.st, finetune, pageflakes, feedburner, netvibes, zooomr, facebook, youtube, alexa, flickr, gmail, box, ebay, amazon, orkut, myspace, skype, meebo, delicious, del.icio.us, flock, stumbleupon, pandora, last.fm, smugmug, social, 2.0, new media, blog*, communit*, wiki, collabo*, participat*, new web
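Several of the terms above end in a truncation wildcard (blog*, communit*, collabo*, participat*). As an illustration only, the sketch below shows one way such a term list could be matched against a title or abstract outside a bibliographic database. The helper names are hypothetical, and the matching rules (case-insensitive, whole-word matching, * standing for any run of word characters) are assumptions; the databases' own truncation semantics may differ, and this is not presented as the procedure used in the study.

```python
import re


def term_to_regex(term):
    """Translate a search term with an optional trailing * into a regex."""
    if term.endswith("*"):
        # Trailing * is treated as "zero or more further word characters".
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    # Lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def matching_terms(text, terms):
    """Return the terms (in list order) that occur in the given text."""
    return [term for term in terms if term_to_regex(term).search(text)]


if __name__ == "__main__":
    # A few terms from the list above; the title is an invented example.
    terms = ["twitter", "blog*", "2.0", "wiki", "last.fm"]
    title = "Blogging and Twitter in Web 2.0"
    print(matching_terms(title, terms))
```

Escaping each term before compiling it matters for entries such as last.fm and 2.0, where an unescaped dot would match any character.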
Cite this article
Lorentzen, D.G. Webometrics benefitting from web mining? An investigation of methods and applications of two research fields. Scientometrics 99, 409–445 (2014). https://doi.org/10.1007/s11192-013-1227-x