2015 | Original Paper | Book chapter

Dimensional Clustering of Linked Data: Techniques and Applications

Authors: Alfio Ferrara, Lorenzo Genta, Stefano Montanelli, Silvana Castano

Published in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX

Publisher: Springer Berlin Heidelberg

Abstract

The plurality and heterogeneity of linked data features call for appropriate solutions for accurate matching and clustering. In this paper, we propose a dimensional clustering approach that provides (i) the capability to select the set of features used for data matching and clustering, which are packaged into a so-called thematic dimension, and (ii) the capability to make explicit the cause of similarity that generates each cluster. As a further contribution, we present ensemble techniques for combining different single-dimension cluster sets into a multi-dimensional view of the considered linked data. Finally, we discuss applications to linked data summarization and exploration.

For the sake of readability, only a subset of the available properties is reported (http://www.dbpedia.org).

More technical details about the construction of linked data items from the RDF statements of a repository \(\mathcal {R}\) are provided in [5].

Since \({\text {ldi-match}}^{\mathcal {D}}(ldi_i, ldi_j) = {\text {ldi-match}}^{\mathcal {D}}(ldi_j, ldi_i)\), we define \(\sigma M\) and \(\pi M\) as upper triangular matrices.
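As an illustration of the note above, the following Python sketch stores a symmetric pairwise match only for index pairs with i < j, which is what an upper triangular matrix amounts to. It rests on assumptions: the Jaccard function is a hypothetical stand-in for \({\text {ldi-match}}^{\mathcal {D}}\), items are modelled as plain feature sets, the diagonal is left unfilled, and the names `jaccard`, `upper_triangular_matches`, and `lookup` do not come from the chapter.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Hypothetical symmetric stand-in for ldi-match: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def upper_triangular_matches(items, match=jaccard):
    """Build an n x n matrix whose entries are computed only for i < j.

    Because match(x, y) == match(y, x), the lower triangle would only
    repeat the same values, so it is left at 0.0.
    """
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):  # yields each pair once, i < j
        matrix[i][j] = match(items[i], items[j])
    return matrix


def lookup(matrix, i, j):
    """Read the match value of an unordered pair by normalising to i < j."""
    if i == j:
        raise ValueError("self-matches are not stored in this sketch")
    return matrix[min(i, j)][max(i, j)]


# Toy feature sets (assumed data, not taken from the chapter).
items = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"x"})]
matrix = upper_triangular_matches(items)
```

With this layout, `lookup(matrix, 1, 0)` and `lookup(matrix, 0, 1)` read the same cell, so each pair is matched once, that is n(n-1)/2 comparisons instead of n².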

A detailed presentation of summarization techniques is beyond the scope of this work. Here, we outline how to generate a summary-view over a cluster set \(CL\). For the interested reader, a more technical presentation of cluster essential definition, proximity-link specification, and prominence value calculation is provided in [5].

1. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)

2. Bae, E., Bailey, J.: COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, pp. 53–62 (2006)

3. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Heidelberg (2006)

4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)

5. Castano, S., Ferrara, A., Montanelli, S.: Thematic clustering and exploration of linked data. In: Ceri, S., Brambilla, M. (eds.) Search Computing. LNCS, vol. 7538, pp. 157–175. Springer, Heidelberg (2012)

6. Drost, I., Bickel, S., Scheffer, T.: Discovering communities in linked data by multi-view clustering. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation, Magdeburg, Germany, pp. 342–349 (2005)

7. Ferrara, A., Nikolov, A., Scharffe, F.: Data linking for the semantic web. Int. J. Semant. Web Inf. Syst. 7(3), 46–76 (2011)

8. Ferrara, A., Genta, L., Montanelli, S.: Linked data classification: a feature-based approach. In: Proceedings of the 3rd EDBT International Workshop on Linked Web Data Management (LWDM 2013), Genova, Italy (2013)

9. Giannakidou, E., Vakali, A.: Integrating web 2.0 data into linked open data cloud via clustering. In: Proceedings of the Workshop on Linked Data in the Future Internet at the Future Internet Assembly, Ghent, Belgium (2010)

10. Goldberg, M.K., Hayvanovych, M., Magdon-Ismail, M.: Measuring similarity between sets of overlapping clusters. In: Proceedings of the IEEE SocialCom/PASSAT Conference, Minneapolis, Minnesota, USA, pp. 303–308 (2010)

11. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2–3), 107–145 (2001)

12. Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with semantic verification. J. Web Semant. 7(3), 235–251 (2009)

13. Kailing, K., Kriegel, H.-P., Pryakhin, A., Schubert, M.: Clustering multi-represented objects with noise. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 394–403. Springer, Heidelberg (2004)

14. Lu, Q., Conrad, J.G., Al-Kofahi, K., Keenan, W.: Legal document clustering with built-in topic segmentation. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK (2011)

15. Minaei-Bidgoli, B., Topchy, A.P., Punch, W.F.: A comparison of resampling methods for clustering ensembles. In: Proceedings of the International Conference on Artificial Intelligence (IC-AI 2004), Las Vegas, Nevada, USA, pp. 939–945 (2004)

16. Müller, E., Günnemann, S., Färber, I., Seidl, T.: Discovering multiple clustering solutions: grouping objects in different views of the data. In: Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE 2012), Washington, DC, USA, pp. 1207–1210 (2012)

17. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

18. Newman, M.J.: A measure of betweenness centrality based on random walks. Soc. Netw. 27(1), 39–54 (2005)

19. Nguyen, X.V., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), Montreal, Quebec, Canada (2009)

20. Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: Proceedings of the 6th ACM SIGKDD KDD-2000 Workshop on Text Mining, Boston, MA, USA (2000)

21. Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

22. Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25(3), 337–372 (2011)

23. Verykios, V.S., Elmagarmid, A.K., Houstis, E.N.: Automating the approximate record-matching process. Inf. Sci. 126(1–4), 83–98 (2000)

24. Wang, Z., Li, J., Zhao, Y., Setchi, R., Tang, J.: A unified approach to matching semantic data on the web. Knowl. Based Syst. 39, 173–184 (2013)

25. Xu, R., Wunsch II, D.C.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

26. Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)

27. Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2(1), 718–729 (2009)

Title: Dimensional Clustering of Linked Data: Techniques and Applications
Authors: Alfio Ferrara, Lorenzo Genta, Stefano Montanelli, Silvana Castano
Publisher: Springer Berlin Heidelberg
Book: Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX
Print ISBN: 978-3-662-46561-5
Electronic ISBN: 978-3-662-46562-2
Copyright year: 2015
DOI: https://doi.org/10.1007/978-3-662-46562-2_3
