Published in: Wireless Personal Communications, Issue 2/2021

3 February 2021

Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction

Authors: E. Suganya, S. Vijayarani


Abstract

Web citation analysis is primarily used to examine the impact of an author, an article or a publication by counting the number of times it has been cited by other authors. Its main goal is to help researchers find related papers for further analysis. Authors must cite the references of a paper, which establishes the link to earlier relevant research. Citations provide valuable information that helps researchers learn about current trends and future developments and obtain new ideas in their respective fields. Citation information is stored in web citation databases such as Google Scholar, Web of Science and Scopus. Extracting the information a user requires from these databases is a complex task. Many open source tools are available online, but they require a manual step to select the user-required information from the web page. For instance, if a user needs the author name and publisher from a web citation database, existing tools require the user to choose the exact information tags manually, which is time-consuming. To overcome this difficulty, we propose the Firefly Optimization Algorithm based Web Scraping (FOAWS) algorithm for extracting web content from web citation databases. The primary purpose of this research is author information extraction: the process uses web citation analysis to extract the citation information of the works published by an author, namely the journal name, publisher, year and citations. The user's query input is a keyword, for example big data or artificial intelligence. Web crawling and web scraping techniques are applied to collect web citation information from multiple web pages based on the user query, and Particle Swarm Optimization (PSO) and Hidden Markov Model (HMM) approaches are applied to find the best solution among all feasible solutions.
Experiments show that the proposed FOAWS algorithm outperforms these two algorithms.
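
The problem the abstract describes is that existing tools make the user pick the HTML tag for each field by hand. The following is a minimal sketch of the alternative. It assumes a hypothetical page layout in which each citation record is a `<div class="record">` and each field is a `<span>` whose class names the field. The user only lists the fields they want, for example author and publisher, and the parser returns just those. The markup, the class names and the `extract` helper are illustrative assumptions, not the authors' FOAWS implementation or the real markup of any citation database. The sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical markup (an assumption): one <div class="record"> per citation,
# one <span class="FIELD"> per field. Real citation databases use their own,
# site-specific markup.
SAMPLE_HTML = """
<div class="record">
  <span class="author">A. Author</span>
  <span class="journal">Journal of Examples</span>
  <span class="publisher">Example Press</span>
  <span class="year">2019</span>
  <span class="citations">42</span>
</div>
<div class="record">
  <span class="author">B. Writer</span>
  <span class="journal">Example Letters</span>
  <span class="publisher">Sample House</span>
  <span class="year">2020</span>
  <span class="citations">7</span>
</div>
"""


class CitationParser(HTMLParser):
    """Collects the text of <span> elements whose class is a requested field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)
        self.records = []   # one dict per <div class="record">
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "record":
            self.records.append({})
        elif tag == "span" and cls in self.fields and self.records:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Text outside a requested <span> (e.g. whitespace) is ignored.
        if self._field is not None:
            self.records[-1][self._field] = data.strip()


def extract(html, fields):
    """Return only the user-requested fields for every citation record."""
    parser = CitationParser(fields)
    parser.feed(html)
    parser.close()
    return parser.records


print(extract(SAMPLE_HTML, ["author", "publisher"]))
```

Running it prints `[{'author': 'A. Author', 'publisher': 'Example Press'}, {'author': 'B. Writer', 'publisher': 'Sample House'}]`. Requesting `["year"]` instead needs no change to the parser, which is the point of avoiding manual tag selection.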

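FOAWS relies on firefly optimization to choose the best solution. However, the abstract does not define the search space or the fitness function. To illustrate the underlying technique only, the sketch below implements the generic firefly algorithm introduced by Xin-She Yang and applies it to a toy objective. A firefly is brighter when its cost is lower. Each firefly moves toward every brighter firefly with attractiveness `beta0 * exp(-gamma * r^2)` and also takes a random step that shrinks over time. The `sphere` objective, the parameter defaults and the decay factor are assumptions chosen for this sketch, not values from the paper.

```python
import math
import random


def firefly_minimize(f, dim, n=15, iters=100, beta0=1.0, gamma=0.1,
                     alpha=0.2, bounds=(-5.0, 5.0), seed=0):
    """Generic firefly algorithm minimizing f over the box [lo, hi]^dim.

    Lower cost means a brighter firefly. Each firefly moves toward every
    brighter one with attractiveness beta0 * exp(-gamma * r^2), plus a
    random step whose size decays each iteration.
    Returns (best position, best cost, best cost after each iteration).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    cost = [f(x) for x in xs]
    history = [min(cost)]

    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    xs[i] = [
                        # attraction term plus random step, clipped to the box
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5) * (hi - lo)))
                        for a, b in zip(xs[i], xs[j])
                    ]
                    cost[i] = f(xs[i])
        alpha *= 0.97  # reduce randomness as the swarm converges
        history.append(min(cost))

    best = min(range(n), key=cost.__getitem__)
    return xs[best], cost[best], history


def sphere(x):
    """Toy objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_cost, history = firefly_minimize(sphere, dim=2)
print(best_cost, history[0])
```

In this simplified variant, the brightest firefly only moves once another firefly has become brighter, so the best cost recorded in `history` never increases. Yang's formulation additionally lets the brightest firefly take a random walk. In a scraping setting, `f` would instead score a candidate extraction, and defining that score is exactly what the paper itself would have to specify.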
Metadata
Title
Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction
Authors
E. Suganya
S. Vijayarani
Publication date
3 February 2021
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 2/2021
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-021-08093-z