Skip to main content

2016 | OriginalPaper | Buchkapitel

DSCrank: A Method for Selection and Ranking of Datasets

verfasst von : Yasmmin Cortes Martins, Fábio Faria da Mota, Maria Cláudia Cavalcanti

Erschienen in: Metadata and Semantics Research

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Considerable efforts have been made to build the Web of Data. One of the main challenges has to do with how to identify the most related datasets to connect to. Another challenge is to publish a local dataset into the Web of Data, following the Linked Data principles. The present work is based on the idea that a set of activities should guide the user on the publication of a new dataset into the Web of Data. It presents the specification and implementation of two initial activities, which correspond to the crawling and ranking of a selected set of existing published datasets. The proposed implementation is based on the focused crawling approach, adapting it to address the Linked Data principles. Moreover, the dataset ranking is based on a quick glimpse into the content of the selected datasets. Additionally, the paper presents a case study in the Biomedical area to validate the implemented approach, and it shows promising results with respect to scalability and performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)CrossRef Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)CrossRef
2.
Zurück zum Zitat Caliskan, K., Ozcan, R.: Comparing classification methods for link context based focused crawlers. In: 2013 International Conference on Electronics, Computer and Computation (ICECCO), pp. 143–146, November 2013 Caliskan, K., Ozcan, R.: Comparing classification methods for link context based focused crawlers. In: 2013 International Conference on Electronics, Computer and Computation (ICECCO), pp. 143–146, November 2013
3.
Zurück zum Zitat Hausenblas, M.: Exploiting linked data to build web applications. IEEE Internet Comput. 13(4), 68–73 (2009). Accessed 01 May 2016MathSciNetCrossRef Hausenblas, M.: Exploiting linked data to build web applications. IEEE Internet Comput. 13(4), 68–73 (2009). Accessed 01 May 2016MathSciNetCrossRef
4.
Zurück zum Zitat Hull, D.A.: Stemming algorithms: a case study for detailed evaluation. J. Am. Soc. Inf. Sci. (JASIS) 47(1), 70–84 (1996)CrossRef Hull, D.A.: Stemming algorithms: a case study for detailed evaluation. J. Am. Soc. Inf. Sci. (JASIS) 47(1), 70–84 (1996)CrossRef
5.
Zurück zum Zitat Leme, L.A.P.P., Lopes, G.R., Nunes, B.P., Casanova, M.A., Dietze, S.: Identifying candidate datasets for data interlinking. In: Daniel, F., Dolog, P., Li, Q. (eds.) ICWE 2013. LNCS, vol. 7977, pp. 354–366. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39200-9_29 CrossRef Leme, L.A.P.P., Lopes, G.R., Nunes, B.P., Casanova, M.A., Dietze, S.: Identifying candidate datasets for data interlinking. In: Daniel, F., Dolog, P., Li, Q. (eds.) ICWE 2013. LNCS, vol. 7977, pp. 354–366. Springer, Heidelberg (2013). doi:10.​1007/​978-3-642-39200-9_​29 CrossRef
6.
Zurück zum Zitat Manning, C.D., Raghavan, P., Schtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)CrossRefMATH Manning, C.D., Raghavan, P., Schtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)CrossRefMATH
7.
Zurück zum Zitat Nikolov, A., d’Aquin, M., Motta, E.: What should i link to? identifying relevant sources and classes for data linking. In: Pan, J.Z., Chen, H., Kim, H.-G., Li, J., Horrocks, I., Mizoguchi, R., Wu, Z., Wu, Z. (eds.) JIST 2011. LNCS, vol. 7185, pp. 284–299. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29923-0_19 CrossRef Nikolov, A., d’Aquin, M., Motta, E.: What should i link to? identifying relevant sources and classes for data linking. In: Pan, J.Z., Chen, H., Kim, H.-G., Li, J., Horrocks, I., Mizoguchi, R., Wu, Z., Wu, Z. (eds.) JIST 2011. LNCS, vol. 7185, pp. 284–299. Springer, Heidelberg (2012). doi:10.​1007/​978-3-642-29923-0_​19 CrossRef
8.
Zurück zum Zitat de Oliveira, H.R., Tavares, A.T., Lóscio, B.F.: Feedback-based data set recommendation for building linked data applications. In: International Conference on Semantic Systems, I-SEMANTICS 2012, pp. 49–55. ACM, New York (2012) de Oliveira, H.R., Tavares, A.T., Lóscio, B.F.: Feedback-based data set recommendation for building linked data applications. In: International Conference on Semantic Systems, I-SEMANTICS 2012, pp. 49–55. ACM, New York (2012)
9.
Zurück zum Zitat Raman, S., Chaurasiya, V., Venkatesan, S.: Performance comparison of various information retrieval models used in search engines. In: International Conference on Communication, Information Computing Technology (ICCICT), pp. 1–4 (2012) Raman, S., Chaurasiya, V., Venkatesan, S.: Performance comparison of various information retrieval models used in search engines. In: International Conference on Communication, Information Computing Technology (ICCICT), pp. 1–4 (2012)
10.
Zurück zum Zitat Salvadores, M., Alexander, P.R., Musen, M.A., Noy, N.F.: Bioportal as a dataset of linked biomedical ontologies and terminologies in RDF. Semant. Web 4(3), 277–284 (2013) Salvadores, M., Alexander, P.R., Musen, M.A., Noy, N.F.: Bioportal as a dataset of linked biomedical ontologies and terminologies in RDF. Semant. Web 4(3), 277–284 (2013)
11.
Zurück zum Zitat Studer, R., Benjamins, V.R., Fensel, D.: Knowledge engineering: principles and methods. Data Knowl. Eng. 25(1–2), 161–197 (1998)CrossRefMATH Studer, R., Benjamins, V.R., Fensel, D.: Knowledge engineering: principles and methods. Data Knowl. Eng. 25(1–2), 161–197 (1998)CrossRefMATH
12.
Zurück zum Zitat Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 173–180. Association for Computational Linguistics, Stroudsburg (2003) Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 173–180. Association for Computational Linguistics, Stroudsburg (2003)
13.
Zurück zum Zitat Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016)CrossRef Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016)CrossRef
Metadaten
Titel
DSCrank: A Method for Selection and Ranking of Datasets
verfasst von
Yasmmin Cortes Martins
Fábio Faria da Mota
Maria Cláudia Cavalcanti
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-49157-8_29

Neuer Inhalt