2015 | Original Paper | Book Chapter

Dimensional Clustering of Linked Data: Techniques and Applications

Written by: Alfio Ferrara, Lorenzo Genta, Stefano Montanelli, Silvana Castano

Published in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX

Publisher: Springer Berlin Heidelberg

Abstract

The plurality and heterogeneity of linked data features require appropriate solutions for accurate matching and clustering. In this paper, we propose a dimensional clustering approach that provides (i) the capability to select the set of features used for data matching and clustering, which are packaged into a so-called thematic dimension, and (ii) the capability to make explicit the cause of similarity that generates each cluster. As a further contribution, we present ensemble techniques for combining different single-dimension cluster sets into a multi-dimensional view of the considered linked data. Finally, we discuss applications to linked data summarization and exploration.

Footnotes
1
For the sake of readability, only a subset of the available properties is reported (http://www.dbpedia.org).
 
2
More technical details about the construction of linked data items from the RDF statements of a repository \(\mathcal{R}\) are provided in [5].
 
3
Since \(\text{ldi-match}^{\mathcal{D}}(ldi_i, ldi_j) = \text{ldi-match}^{\mathcal{D}}(ldi_j, ldi_i)\), we define \(\sigma M\) and \(\pi M\) as upper triangular matrices.
 
4
A detailed presentation of summarization techniques is out of the scope of this work. Here, we outline how to generate a summary-view over a cluster set \(CL\). For the interested reader, a more technical presentation of cluster essential definition, proximity-link specification, and prominence value calculation is provided in [5].
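
As an illustration of footnote 3, the following is a minimal Python sketch of how a symmetric matching function allows pairwise similarities to be stored in an upper triangular matrix. It is a sketch under stated assumptions, not the authors' implementation: `jaccard_match` is a hypothetical stand-in for ldi-match, the items are modelled as plain sets of feature strings, and the names `upper_triangular_similarity` and `lookup` are invented for this example.

```python
from typing import Callable, List, Sequence, Set


def jaccard_match(a: Set[str], b: Set[str]) -> float:
    """Hypothetical stand-in for ldi-match: symmetric Jaccard similarity."""
    if not a and not b:
        return 1.0  # assumption: two empty feature sets count as identical
    return len(a & b) / len(a | b)


def upper_triangular_similarity(
    items: Sequence[Set[str]],
    match: Callable[[Set[str], Set[str]], float],
) -> List[List[float]]:
    """Compute match(i, j) only for j >= i; cells below the diagonal stay 0.0."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = match(items[i], items[j])
    return matrix


def lookup(matrix: List[List[float]], i: int, j: int) -> float:
    """Read the similarity of any pair by reordering indices into the upper triangle."""
    return matrix[min(i, j)][max(i, j)]


if __name__ == "__main__":
    # Toy feature sets, purely illustrative.
    items = [{"a", "b"}, {"b", "c"}, {"a", "b"}]
    sim = upper_triangular_similarity(items, jaccard_match)
    for row in sim:
        print(row)
    print(lookup(sim, 2, 0))
```

Because the matching function is symmetric, each pair is evaluated once, and `lookup` recovers the value for either index order.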
 
References
1.
Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)
2.
Bae, E., Bailey, J.: COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, pp. 53–62 (2006)
3.
Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Heidelberg (2006)
4.
Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)
5.
Castano, S., Ferrara, A., Montanelli, S.: Thematic clustering and exploration of linked data. In: Ceri, S., Brambilla, M. (eds.) Search Computing. LNCS, vol. 7538, pp. 157–175. Springer, Heidelberg (2012)
6.
Drost, I., Bickel, S., Scheffer, T.: Discovering communities in linked data by multi-view clustering. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation, Magdeburg, Germany, pp. 342–349 (2005)
7.
Ferrara, A., Nikolov, A., Scharffe, F.: Data linking for the semantic web. Int. J. Semant. Web Inf. Syst. 7(3), 46–76 (2011)
8.
Ferrara, A., Genta, L., Montanelli, S.: Linked data classification: a feature-based approach. In: Proceedings of the 3rd EDBT International Workshop on Linked Web Data Management (LWDM 2013), Genova, Italy (2013)
9.
Giannakidou, E., Vakali, A.: Integrating web 2.0 data into linked open data cloud via clustering. In: Proceedings of the Workshop on Linked Data in the Future Internet at the Future Internet Assembly, Ghent, Belgium (2010)
10.
Goldberg, M.K., Hayvanovych, M., Magdon-Ismail, M.: Measuring similarity between sets of overlapping clusters. In: Proceedings of the IEEE SocialCom/PASSAT Conference, Minneapolis, Minnesota, USA, pp. 303–308 (2010)
11.
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2–3), 107–145 (2001)
12.
Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with semantic verification. J. Web Semant. 7(3), 235–251 (2009)
13.
Kailing, K., Kriegel, H.-P., Pryakhin, A., Schubert, M.: Clustering multi-represented objects with noise. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 394–403. Springer, Heidelberg (2004)
14.
Lu, Q., Conrad, J.G., Al-Kofahi, K., Keenan, W.: Legal document clustering with built-in topic segmentation. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK (2011)
15.
Minaei-Bidgoli, B., Topchy, A.P., Punch, W.F.: A comparison of resampling methods for clustering ensembles. In: Proceedings of the International Conference on Artificial Intelligence (IC-AI 2004), Las Vegas, Nevada, USA, pp. 939–945 (2004)
16.
Müller, E., Günnemann, S., Färber, I., Seidl, T.: Discovering multiple clustering solutions: grouping objects in different views of the data. In: Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE 2012), Washington, DC, USA, pp. 1207–1210 (2012)
17.
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
18.
Newman, M.J.: A measure of betweenness centrality based on random walks. Soc. Netw. 27(1), 39–54 (2005)
19.
Nguyen, X.V., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), Montreal, Quebec, Canada (2009)
20.
Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: Proceedings of the 6th ACM SIGKDD KDD-2000 Workshop on Text Mining, Boston, MA, USA (2000)
21.
Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
22.
Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25(3), 337–372 (2011)
23.
Verykios, V.S., Elmagarmid, A.K., Houstis, E.N.: Automating the approximate record-matching process. Inf. Sci. 126(1–4), 83–98 (2000)
24.
Wang, Z., Li, J., Zhao, Y., Setchi, R., Tang, J.: A unified approach to matching semantic data on the web. Knowl. Based Syst. 39, 173–184 (2013)
25.
Xu, R., Wunsch II, D.C.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
26.
Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)
27.
Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2(1), 718–729 (2009)
Metadata
Title
Dimensional Clustering of Linked Data: Techniques and Applications
Written by
Alfio Ferrara
Lorenzo Genta
Stefano Montanelli
Silvana Castano
Copyright Year
2015
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-46562-2_3
