
Wireless Personal Communications

Published: 3 February 2021

Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction

Authors: E. Suganya, S. Vijayarani

Published in: Wireless Personal Communications | Issue 2/2021


Abstract

Web citation analysis is primarily used to examine the impact of an author, an article, or a publication by counting the number of times it has been cited by other authors. A key goal of web citation analysis is to help researchers find related papers for further analysis. Authors must cite the references of a paper, and these citations establish the link to previous relevant research. Citations provide valuable information that helps researchers learn about current trends and future developments and obtain new ideas in their respective fields. Citation information is stored in web citation databases such as Google Scholar, Web of Science, and Scopus. Extracting the information a user requires from a web citation database is a complex task. Many open-source tools are available online, but they require a manual process to select the required information from the web page. For instance, if a user needs the author name and publisher from a web citation database, existing tools require the user to choose the exact information tags manually, which is time-consuming. To overcome this difficulty, we propose the Firefly Optimization Algorithm based Web Scraping (FOAWS) algorithm for extracting web content from web citation databases. The primary purpose of this research is author information extraction: using web citation analysis, the process extracts the citation information for the publications of an author, including the journal name, publisher, year, and citations. The user's query input is a keyword, for example "big data" or "artificial intelligence". Web crawling and web scraping techniques are applied to collect web citation information from multiple web pages based on the user query, and Particle Swarm Optimization and Hidden Markov Model approaches are also applied to find the best solution among all feasible solutions.
Experiments show that the proposed FOAWS algorithm outperforms the other two algorithms.
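The abstract does not spell out how FOAWS applies the firefly metaphor to scraping, so the following is only a minimal sketch of the standard continuous firefly algorithm (Yang's formulation), not the authors' FOAWS implementation. Each candidate solution is a "firefly" whose brightness is its objective value; a dimmer firefly moves toward every brighter one with attractiveness beta0 * exp(-gamma * r^2), plus a random step scaled by alpha. The function name `firefly_minimize`, the parameter defaults, and the decay schedule for alpha are illustrative assumptions.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=20, n_iter=100,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimal standard firefly algorithm for minimization (illustrative).

    Lower objective value = brighter firefly. Defaults are assumptions,
    not parameters taken from the FOAWS paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        # Keep every coordinate inside the search box
        return min(max(x, lo), hi)

    # Random initial population inside the search box
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pop]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    # Squared Euclidean distance between the two fireflies
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    # Move i toward j, plus a random step scaled by alpha
                    pop[i] = [
                        clip(xi + beta * (xj - xi)
                             + alpha * (rng.random() - 0.5) * (hi - lo))
                        for xi, xj in zip(pop[i], pop[j])
                    ]
                    cost[i] = objective(pop[i])
        alpha *= 0.97  # gradually reduce randomness (assumed schedule)

    best = min(range(n_fireflies), key=lambda k: cost[k])
    return pop[best], cost[best]
```

For example, `firefly_minimize(lambda p: sum(x * x for x in p), dim=2, bounds=(-5.0, 5.0))` searches for the minimum of a two-dimensional sphere function, whose optimum lies at the origin. In a scraping setting the objective would instead score a candidate extraction (for instance, how well selected page elements match the author, journal, publisher, year, and citation fields); how FOAWS defines that score is not stated in the abstract.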



Publisher: Springer US
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-021-08093-z
