Skip to main content
Top
Published in: Soft Computing 6/2015

01-06-2015 | Focus

Double-phase locality-sensitive hashing of neighborhood development for multi-relational data

Authors: Ping Ling, Xiangsheng Rong, Yongquan Dong, Guosheng Hao

Published in: Soft Computing | Issue 6/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Multi-relational (MR) data refer to the objects that involve multi-associated tables or a relational database, and they are widely used in diverse applications. In spite of rich achievements in concrete classification or regression tasks, a neighborhood development algorithm customized to MR data is still missing. The reason lies in the fact that MR data are high-dimensional and highly structured. To address these two difficulties, this paper presents a double-phase locality-sensitive hashing (DPLSH) algorithm to develop neighborhood for MR data. DPLSH consists of offline and online hashing schemas to draw and summarize local closeness information from each involved table. Based on the closeness information, three criteria of neighborhood development are proposed. DPLSH is encoded with parameterization heuristics to make the algorithm data adaptive and less costly. Extensive experiments indicate that for MR data, the quality of neighborhoods produced by DPLSH is better than its peers; besides, in common data environment, DPLSH also exhibits the competitive behaviors with the state of the art.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
go back to reference Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of 47th annual IEEE symposium on foundations of computer science, pp 459–468 Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of 47th annual IEEE symposium on foundations of computer science, pp 459–468
go back to reference Bradley PS, Fayyad UM (1998) Refining initial points for K-means clustering. In: Proceedings of 15th international conference on machine learning, pp 91–99 Bradley PS, Fayyad UM (1998) Refining initial points for K-means clustering. In: Proceedings of 15th international conference on machine learning, pp 91–99
go back to reference Ceci M, Appice A, Loglisci C, Caruso C, Fumarola F, Malerba D (2009) Novelty detection from evolving complex data streams with time windows. In: ISMIS’09: Proceedings of the 18th international symposium on foundations of intelligent systems, lecture notes in artificial intelligence, vol 5722. Springer, Berlin, Heidelberg Ceci M, Appice A, Loglisci C, Caruso C, Fumarola F, Malerba D (2009) Novelty detection from evolving complex data streams with time windows. In: ISMIS’09: Proceedings of the 18th international symposium on foundations of intelligent systems, lecture notes in artificial intelligence, vol 5722. Springer, Berlin, Heidelberg
go back to reference Coble J, Cook DJ, Holder LB (2006) Structure discovery in sequentially-connected data streams. Int J Artif Intell Tools 15(6):917–944CrossRef Coble J, Cook DJ, Holder LB (2006) Structure discovery in sequentially-connected data streams. Int J Artif Intell Tools 15(6):917–944CrossRef
go back to reference Dolsak B, Bratko I, Jezernik A (1994) Finite element mesh design: an engineering domain for ILP application. In: Proceedings of the 4th international workshop on inductive logic programming, GMD-studien 237, pp 305–320 Dolsak B, Bratko I, Jezernik A (1994) Finite element mesh design: an engineering domain for ILP application. In: Proceedings of the 4th international workshop on inductive logic programming, GMD-studien 237, pp 305–320
go back to reference Domeniconi C, Peng J, Gunopulos (2001) An adaptive metric machine for pattern classification. Adv Neural Inf Process Syst 13:458–464 Domeniconi C, Peng J, Gunopulos (2001) An adaptive metric machine for pattern classification. Adv Neural Inf Process Syst 13:458–464
go back to reference Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: Raedt LD, Flach P (eds) Proceedings of the 13th European conference on machine learning, lecture notes in artificial intelligence, vol 2167, pp 97–108 Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: Raedt LD, Flach P (eds) Proceedings of the 13th European conference on machine learning, lecture notes in artificial intelligence, vol 2167, pp 97–108
go back to reference Džeroski S (2003) Multi-relational data mining: an introduction. ACM SIGKDD Explor Newsl 5(1):1–16CrossRef Džeroski S (2003) Multi-relational data mining: an introduction. ACM SIGKDD Explor Newsl 5(1):1–16CrossRef
go back to reference Džeroski S (2010) Relational data mining. Springer, Washington, DC Džeroski S (2010) Relational data mining. Springer, Washington, DC
go back to reference Flach P, Lachiche N (1999) 1BC: a first-order Bayesian classifier, ILP-99. LNAI 1634, pp 92–103 Flach P, Lachiche N (1999) 1BC: a first-order Bayesian classifier, ILP-99. LNAI 1634, pp 92–103
go back to reference Friedman JH (1994) Flexible metric nearest neighbor classification. Technique report. Department of Statistics, Stanford University Friedman JH (1994) Flexible metric nearest neighbor classification. Technique report. Department of Statistics, Stanford University
go back to reference Georgescu B, Shimshoni IL, Mee P (2003) Mean shift based clustering in high dimensions: a texture classification example. In: Proceedings of international conference on computer vision, pp 456–463 Georgescu B, Shimshoni IL, Mee P (2003) Mean shift based clustering in high dimensions: a texture classification example. In: Proceedings of international conference on computer vision, pp 456–463
go back to reference Hastie T, tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18(6):607–615CrossRef Hastie T, tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18(6):607–615CrossRef
go back to reference Hou W, Yang B, Wu C, Zhou Z (2010) RedTrees: a relational decision tree algorithm in streams. Expert Syst Appl 37(9):6265–6269CrossRef Hou W, Yang B, Wu C, Zhou Z (2010) RedTrees: a relational decision tree algorithm in streams. Expert Syst Appl 37(9):6265–6269CrossRef
go back to reference Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of symposium on theory of computing, pp 604–613 Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of symposium on theory of computing, pp 604–613
go back to reference Kietz JU, Wrobel S (1992) Controlling the complexity of learning in logic through syntactic and task-oriented models. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 335–359 Kietz JU, Wrobel S (1992) Controlling the complexity of learning in logic through syntactic and task-oriented models. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 335–359
go back to reference Kirsten M, Wrobel S, Horvath T (2001) Distance based approaches to relational learning and clustering[M]//Relational data mining. Springer, Berlin, Heidelberg Kirsten M, Wrobel S, Horvath T (2001) Distance based approaches to relational learning and clustering[M]//Relational data mining. Springer, Berlin, Heidelberg
go back to reference Kyburg HE (1970) Probability and inductive logic. New York, Macmillan Kyburg HE (1970) Probability and inductive logic. New York, Macmillan
go back to reference Mayur D, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262 Mayur D, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262
go back to reference Muggleton S, Bain M, Hayes-Michie J, Michie D (1989) An experimental comparison of human and machine learning formalisms. In: Proceedings of 6th international workshop on machine learning. Morgan Kaufmann, pp 113–118 Muggleton S, Bain M, Hayes-Michie J, Michie D (1989) An experimental comparison of human and machine learning formalisms. In: Proceedings of 6th international workshop on machine learning. Morgan Kaufmann, pp 113–118
go back to reference Muggleton S, Buntine WL (1992) Machine invention of first order predicates by inverting resolution. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 261–280 Muggleton S, Buntine WL (1992) Machine invention of first order predicates by inverting resolution. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 261–280
go back to reference Muggleton S (1992) Inductive logic programming. Academic Press, LondonMATH Muggleton S (1992) Inductive logic programming. Academic Press, LondonMATH
go back to reference Nishida K, Yamauchi K, Omori T (2005) ACE: adaptive classifiers-ensemble system for concept-drifting environments. MCS, LNCS 3541:176–185 Nishida K, Yamauchi K, Omori T (2005) ACE: adaptive classifiers-ensemble system for concept-drifting environments. MCS, LNCS 3541:176–185
go back to reference Page D, Craven M (2003) Biological applications of multi-relational data mining. ACM SIGKDD Explor Newsl 5(1):69–79CrossRef Page D, Craven M (2003) Biological applications of multi-relational data mining. ACM SIGKDD Explor Newsl 5(1):69–79CrossRef
go back to reference Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the 17th annual ACM-SIAM symposium on discrete algorithm, pp 1186–1195 Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the 17th annual ACM-SIAM symposium on discrete algorithm, pp 1186–1195
go back to reference Raedt LD (1992) Interactive theory revision: an inductive logic programming approach. Academic Press, London Raedt LD (1992) Interactive theory revision: an inductive logic programming approach. Academic Press, London
go back to reference Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471CrossRefMATH Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471CrossRefMATH
go back to reference Seidl T, Kriegel HP (1998) Optimal Multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD international conference on management of data, pp 154–165 Seidl T, Kriegel HP (1998) Optimal Multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD international conference on management of data, pp 154–165
go back to reference Shapiro EY (1983) Algorithmic program debugging. MIT Press, Cambridge Shapiro EY (1983) Algorithmic program debugging. MIT Press, Cambridge
go back to reference Vapnik V (1998) Statistical learning theory. Wiley, New YorkMATH Vapnik V (1998) Statistical learning theory. Wiley, New YorkMATH
go back to reference Woznica A, Kalousis A, Hilario M (2004) Kernel-based distances for relational learning. The tenth ACM SIGKDD international conference on knowledge discovery and data mining Woznica A, Kalousis A, Hilario M (2004) Kernel-based distances for relational learning. The tenth ACM SIGKDD international conference on knowledge discovery and data mining
Metadata
Title
Double-phase locality-sensitive hashing of neighborhood development for multi-relational data
Authors
Ping Ling
Xiangsheng Rong
Yongquan Dong
Guosheng Hao
Publication date
01-06-2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 6/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1343-4

Other articles of this Issue 6/2015

Soft Computing 6/2015 Go to the issue

Premium Partner