2017 | OriginalPaper | Chapter

An Efficient Partition-Based Filtering for Similarity Joins on MapReduce Framework

Authors: Miyoung Jang, Archana B. Lokhande, Naeun Baek, Jae-Woo Chang

Published in: Advanced Multimedia and Ubiquitous Engineering

Publisher: Springer Singapore

Abstract

Similarity join is an important operation on the MapReduce framework for finding pairs of similar objects such as images, videos, and time series. Because basic MapReduce does not support efficient join processing, reducing duplicate candidates and balancing load among partitions are the major challenges. Recently, many partition-based similarity join algorithms have been proposed to address these problems. However, the existing algorithms still have limitations in supporting efficient join processing over large-scale data sets. In this paper, we propose a similarity join algorithm with an efficient filtering technique on MapReduce that overcomes the limitations of the traditional partitioning method in two ways: (1) a filtering matrix limits the number of output records generated, which reduces duplicates, and (2) a join cost estimated from a partition matrix leads to better load balance among reducers. Moreover, we have conducted experimental evaluations using sequential data to show the speed-up and scale-up of the proposed method.
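The abstract does not give the construction of the filtering matrix or the partition matrix, so the following is only a minimal single-process sketch of the general idea of a partition-based similarity join, not the authors' algorithm. It assumes pivot-based partitioning, a Euclidean distance threshold `eps`, a triangle-inequality bound for filtering partition pairs, a pair-count estimate of join cost, and a greedy assignment of partition pairs to reducers. All function names (`build_partitions`, `filtering_matrix`, `similarity_join`) are hypothetical.

```python
import math
from itertools import combinations, product


def build_partitions(points, pivots):
    """Assign each point to exactly one partition: that of its nearest pivot."""
    parts = [[] for _ in pivots]
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))
        parts[nearest].append(p)
    return parts


def filtering_matrix(parts, pivots, eps):
    """keep[i][j] is True when partitions i and j (i <= j) may contain a pair within eps.

    Assumed filter: by the triangle inequality, any x in partition i and y in
    partition j satisfy d(x, y) >= d(pivot_i, pivot_j) - radius_i - radius_j.
    """
    k = len(pivots)
    radii = [max((math.dist(p, pivots[i]) for p in parts[i]), default=0.0)
             for i in range(k)]
    keep = [[False] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            lower = math.dist(pivots[i], pivots[j]) - radii[i] - radii[j]
            keep[i][j] = bool(parts[i]) and bool(parts[j]) and lower <= eps
    return keep


def similarity_join(points, pivots, eps, num_reducers):
    """Return (result pairs, estimated load per simulated reducer)."""
    parts = build_partitions(points, pivots)
    keep = filtering_matrix(parts, pivots, eps)
    k = len(pivots)

    # Assumed cost model: number of point comparisons in each surviving cell.
    cells = []
    for i in range(k):
        for j in range(i, k):
            if keep[i][j]:
                n_i, n_j = len(parts[i]), len(parts[j])
                cost = n_i * (n_i - 1) // 2 if i == j else n_i * n_j
                cells.append((cost, i, j))

    # Greedy balancing: give the most expensive cell to the least-loaded reducer.
    loads = [0] * num_reducers
    plan = [[] for _ in range(num_reducers)]
    for cost, i, j in sorted(cells, reverse=True):
        r = loads.index(min(loads))
        plan[r].append((i, j))
        loads[r] += cost

    # Each unordered partition pair is processed once, so no result pair repeats.
    results = []
    for assigned in plan:
        for i, j in assigned:
            pairs = combinations(parts[i], 2) if i == j else product(parts[i], parts[j])
            results.extend((a, b) for a, b in pairs if math.dist(a, b) <= eps)
    return results, loads
```

In this sketch duplicates are avoided because every point belongs to one partition and every unordered partition pair is handled by one reducer. A real MapReduce job would distribute the cells across map and reduce tasks instead of looping over them locally.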

Metadata
Title: An Efficient Partition-Based Filtering for Similarity Joins on MapReduce Framework
Authors: Miyoung Jang, Archana B. Lokhande, Naeun Baek, Jae-Woo Chang
Copyright Year: 2017
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-5041-1_84
