Published in: International Journal of Machine Learning and Cybernetics 1/2024

10 June 2023 | Original Article

Semantic rule-based information extraction for meteorological reports

Authors: Mengmeng Cui, Ruibin Huang, Zhichen Hu, Fan Xia, Xiaolong Xu, Lianyong Qi

Abstract

Meteorological reports are one of the most important means of recording the weather conditions at a location over a period of time, and the large volume of such reports creates a strong demand for text processing and information extraction. However, valuable data and information remain buried in this mass of reports, and automated information extraction techniques are urgently needed to help people integrate data from multiple meteorological reports and analyze it, giving a more comprehensive understanding of a specific meteorological topic or domain. Named entity recognition (NER) can extract useful entity information from meteorological reports. Based on an analysis of the characteristics of nested entities in meteorological reports, this paper introduces Multi-Conditional Random Fields (Multi-CRF), in which each CRF layer outputs the recognition results for one entity type; this helps solve the problem of identifying nested entities in meteorological reports. Experimental results show that our model achieves state-of-the-art performance. The final recognition results provide effective data support for automatic text verification and recognition in the meteorological domain and offer practical value for building knowledge graphs from related meteorological reports.
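The abstract describes Multi-CRF only at a high level: one CRF layer per entity type, each emitting its own tag sequence over the same text. The following is a minimal decoding sketch of that idea, not the authors' implementation. It assumes a BIO tag scheme, hand-picked emission scores standing in for the output of a trained encoder, hand-set (not learned) transition scores, and two hypothetical entity types, `LOC` and `WEATHER_EVENT`; none of these names or numbers come from the paper, and training is omitted. Because each layer is decoded independently with Viterbi, spans of different types may overlap, which is what lets a nested entity (a location inside a weather event) be recovered.

```python
from typing import Dict, List, Sequence, Tuple

# Tag inventory shared by every per-type CRF layer (BIO scheme, an assumption).
TAGS = ("O", "B", "I")
NEG_INF = float("-inf")

# Start scores: a sequence may begin with O or B, never with I.
START = [0.0, 0.0, NEG_INF]

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def default_transitions() -> List[List[float]]:
    """Hand-set scores transitions[prev][cur]; only O -> I is forbidden.
    In a trained CRF these scores would be learned, one matrix per layer."""
    trans = [[0.0] * len(TAGS) for _ in TAGS]
    trans[0][2] = NEG_INF  # O -> I would open an entity without a B tag
    return trans


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> List[str]:
    """Return the highest-scoring BIO tag sequence for one CRF layer."""
    if not emissions:
        return []
    k = len(TAGS)
    score = [start[j] + emissions[0][j] for j in range(k)]
    backpointers: List[List[int]] = []
    for token_scores in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(k):
            # Best previous tag for the current tag (the key is evaluated immediately).
            prev = max(range(k), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[prev] + transitions[prev][cur] + token_scores[cur])
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag.
    best = max(range(k), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [TAGS[j] for j in path]


def bio_to_spans(tags: Sequence[str], label: str) -> List[Span]:
    """Convert one layer's BIO tags into (start, end, label) spans."""
    spans: List[Span] = []
    begin = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if begin is not None:
                spans.append((begin, i, label))
            begin = i
        elif tag == "O" and begin is not None:
            spans.append((begin, i, label))
            begin = None
        # An "I" tag simply extends the currently open span.
    if begin is not None:
        spans.append((begin, len(tags), label))
    return spans


def multi_crf_decode(emissions_by_type: Dict[str, Sequence[Sequence[float]]]) -> List[Span]:
    """Decode one independent CRF layer per entity type and merge all spans.
    Overlapping spans of different types are kept, so nesting survives."""
    spans: List[Span] = []
    for label, emissions in emissions_by_type.items():
        tags = viterbi(emissions, default_transitions(), START)
        spans.extend(bio_to_spans(tags, label))
    return sorted(spans)


# Hypothetical example: per-token scores for (O, B, I), one matrix per entity type.
EXAMPLE_TOKENS = ["Nanjing", "rainstorm", "warning", "issued"]
EXAMPLE_EMISSIONS = {
    "LOC": [[0.0, 2.0, 0.0], [1.0, 0.0, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "WEATHER_EVENT": [[0.0, 1.5, 0.0], [0.0, 0.0, 2.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}

if __name__ == "__main__":
    for s, e, label in multi_crf_decode(EXAMPLE_EMISSIONS):
        print(label, EXAMPLE_TOKENS[s:e])
```

Running the script prints `LOC ['Nanjing']` followed by `WEATHER_EVENT ['Nanjing', 'rainstorm']`. A single flat CRF over a combined tag set could assign only one tag to "Nanjing" and so could not return both spans; giving each entity type its own layer sidesteps that limitation, at the cost of decoding once per type.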

Metadata
Title
Semantic rule-based information extraction for meteorological reports
Authors
Mengmeng Cui
Ruibin Huang
Zhichen Hu
Fan Xia
Xiaolong Xu
Lianyong Qi
Publication date
10 June 2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01885-8
