
2023 | OriginalPaper | Chapter

Ontology Augmented Data Lake System for Policy Support

Authors: Apurva Kulkarni, Pooja Bassin, Niharika Sri Parasa, Vinu E. Venugopal, Srinath Srinivasa, Chandrashekar Ramanathan

Published in: Big Data Analytics in Astronomy, Science, and Engineering

Publisher: Springer Nature Switzerland


Abstract

Analytics of Big Data in the absence of an accompanying metadata framework can be quite a daunting task. While statistical algorithms can perform large-scale analyses on diverse data with little support from metadata, applying such methods to widely dispersed, extremely diverse, and dynamic data may not produce trustworthy findings. One such task is identifying the impact of indicators for various Sustainable Development Goals (SDGs). One way to analyze this impact is to develop a Bayesian network that helps policymakers make informed decisions under uncertainty. Policymakers worldwide have a key interest in relying on such models when deciding the new policies of a state or a country (https://sdgs.un.org/2030agenda). The accuracy of these models can be improved by using enriched data, which is often obtained by incorporating pertinent data from multiple sources. However, because of the challenges associated with the volume, variety, veracity, and structure of the data, traditional data lake systems fall short of identifying information that is syntactically diverse yet semantically connected. In this paper, we propose a Data Lake (DL) framework that ingests and processes data like any traditional DL and, in addition, uses ontologies as an intermediary to perform data retrieval for applications such as Policy Support Systems, where the selection of data greatly affects how the output is interpreted. We discuss a proof of concept for the proposed system and preliminary results based on data collected from the agriculture department of the Government of Karnataka (GoK) (IIITB Data Lake project website: http://cads.iiitb.ac.in/wordpress/).
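The abstract's central idea is that an ontology can act as an intermediary so that data which is syntactically diverse yet semantically connected is retrieved together. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The toy ontology, its concept names, the dataset names, and the column names are all hypothetical. The `labels`/`narrower` structure loosely mirrors SKOS alternative labels and narrower concepts, and a real system would more likely store the ontology in RDF/OWL and use an index rather than in-memory dictionaries.

```python
from collections import deque

# HYPOTHETICAL toy ontology: concept -> synonym labels and narrower concepts.
ONTOLOGY = {
    "crop_yield": {"labels": {"crop yield", "yield", "production per hectare"},
                   "narrower": {"paddy_yield", "ragi_yield"}},
    "paddy_yield": {"labels": {"paddy yield", "rice output"}, "narrower": set()},
    "ragi_yield": {"labels": {"ragi yield", "finger millet production"}, "narrower": set()},
    "rainfall": {"labels": {"rainfall", "precipitation"}, "narrower": set()},
}

# HYPOTHETICAL data-lake catalog: dataset name -> column names as ingested.
CATALOG = {
    "district_crops.csv": ["District", "Rice Output", "Season"],
    "weather_feed.json": ["district", "Precipitation", "month"],
    "millet_survey.xlsx": ["taluk", "Finger Millet Production"],
    "census_extract.csv": ["district", "population"],
}


def normalize(text):
    """Lower-case, treat underscores as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def concepts_for(term):
    """Return the ontology concepts that have a label matching the term."""
    t = normalize(term)
    return {concept for concept, node in ONTOLOGY.items()
            if t in {normalize(label) for label in node["labels"]}}


def expand(concepts):
    """Add every narrower concept, transitively, using breadth-first search."""
    seen = set(concepts)
    queue = deque(concepts)
    while queue:
        for child in ONTOLOGY[queue.popleft()]["narrower"]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def retrieve(query_term):
    """Map each dataset to its columns whose concepts fall under the query concept."""
    targets = expand(concepts_for(query_term))
    hits = {}
    for dataset, columns in CATALOG.items():
        matched = [col for col in columns if concepts_for(col) & targets]
        if matched:
            hits[dataset] = matched
    return hits


print(retrieve("yield"))
# {'district_crops.csv': ['Rice Output'], 'millet_survey.xlsx': ['Finger Millet Production']}
```

Querying "yield" returns the column "Rice Output" even though the two strings share no words, because both map to the same branch of the ontology; a plain keyword index over column names would miss that match.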


Metadata
Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-031-28350-5_1
