2018 | OriginalPaper | Book chapter

Adaptive and Parallel Data Acquisition from Online Big Graphs

Authors: Zidu Yin, Kun Yue, Hao Wu, Yingjie Su

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

Acquiring content from online big graphs (OBGs), such as linked Web pages, social networks, and knowledge graphs, is critical as data infrastructure for Web applications and massive data analysis. However, effective data acquisition is challenging because OBGs are massive, heterogeneous, and dynamically evolving, and their global topological structures are unknown. In this paper, we present an adaptive and parallel approach to effective data acquisition from OBGs. Drawing on quasi-Monte Carlo (QMC) and branch-and-bound methods, we propose an adaptive Web-scale sampling algorithm for parallel data collection, implemented on Spark. Experimental results show the effectiveness and efficiency of our method.
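
The abstract names quasi-Monte Carlo sampling but does not spell out the algorithm, so the following is only a minimal sketch of the general QMC idea and not the authors' method. It generates a one-dimensional Halton (van der Corput) sequence, a standard low-discrepancy sequence, and uses it to pick positions in a list of candidate nodes. The names `radical_inverse`, `halton_points`, `pick_candidates`, and the `node-i` labels are hypothetical and chosen purely for illustration; the branch-and-bound step and the Spark parallelization are not shown.

```python
from typing import List


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of `index` in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(count: int, base: int = 2) -> List[float]:
    """First `count` points of the 1-D Halton sequence (indices 1..count)."""
    return [radical_inverse(i, base) for i in range(1, count + 1)]


def pick_candidates(frontier: List[str], count: int, base: int = 2) -> List[str]:
    """Map low-discrepancy points in [0, 1) onto positions in `frontier`.

    Assumption: `frontier` stands in for the not-yet-collected nodes known
    so far; how the paper actually builds this set is not stated in the abstract.
    """
    return [frontier[int(u * len(frontier))] for u in halton_points(count, base)]


if __name__ == "__main__":
    frontier = [f"node-{i}" for i in range(8)]  # hypothetical candidate nodes
    print(halton_points(4))             # [0.5, 0.25, 0.75, 0.125]
    print(pick_candidates(frontier, 4))  # ['node-4', 'node-2', 'node-6', 'node-1']
```

Compared with uniformly random draws, consecutive Halton points spread evenly over the interval, so even a small sample touches distant parts of the candidate list. That even coverage is the property that makes QMC attractive when the global structure of the graph is unknown.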

Metadata
Title
Adaptive and Parallel Data Acquisition from Online Big Graphs
Authors
Zidu Yin
Kun Yue
Hao Wu
Yingjie Su
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91452-7_21
