Skip to main content
Top
Published in: The Journal of Supercomputing 8/2016

01-08-2016

MBFS: a parallel metadata search method based on Bloomfilters using MapReduce for large-scale file systems

Published in: The Journal of Supercomputing | Issue 8/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The metadata search is an important way to access and manage file systems. Many solutions have been proposed to tackle performance issue of metadata search. However, the existing solutions build a separate metadata index at the internal or external file system through the related data structure or database use semantics and event-notification method to construct the index structure, utilize the sampling-based method to conduct direct metadata search on the namespace, face problems of the high I/O overhead for maintaining consistency between metadata indexes and metadata, have enormous space overhead for metadata indexes storing and low accuracy of results and so on. To address these problems, this paper presents MBFS, a fast, accurate and lightweight metadata search method based on multi-dimensional Bloomfilters. We create a multi-dimensional Bloomfilter structure on the basis of the directory entry that can prune sub-trees to narrow the search scope of namespace. MBFS is capable of producing fast and accurate answers for a class of complex search over a file system after consuming a small number of disk accesses. MBFS residing in the file system does not need additional I/O overhead to maintain consistency. MBFS consists of Bloomfilters which are composed of bits, so it is a lightweight metadata search method that consumes marginal space overhead. Moreover, MBFS employs MapReduce for speeding up search under the environment of multiple metadata servers. Extensive experiments are conducted to prove the effectiveness of MBFS. The experimental results show that MBFS can achieve an excellent performance not only on the search latency, but also on the accuracy of results with low space and time overhead.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agrawal N, Arpaci-Dusseau AC, Arpaci-Dusseau RH (2009) Generating realistic impressions for file-system benchmarking. ACM Trans Storage 5(4):16CrossRef Agrawal N, Arpaci-Dusseau AC, Arpaci-Dusseau RH (2009) Generating realistic impressions for file-system benchmarking. ACM Trans Storage 5(4):16CrossRef
2.
go back to reference Agrawal N, Bolosky WJ, Douceur JR, Lorch JR (2007) A five-year study of file-system metadata. ACM Trans Storage 3(3):9CrossRef Agrawal N, Bolosky WJ, Douceur JR, Lorch JR (2007) A five-year study of file-system metadata. ACM Trans Storage 3(3):9CrossRef
4.
go back to reference Arasu A, Cho J, Garcia-Molina H, Paepcke A, Raghavan S (2001) Searching the web. ACM Trans Internet Technol 1(1):2–43CrossRef Arasu A, Cho J, Garcia-Molina H, Paepcke A, Raghavan S (2001) Searching the web. ACM Trans Internet Technol 1(1):2–43CrossRef
5.
go back to reference Bergman K, Borkar S, Campbell D, Carlson W, Dally W, Denneau M, Franzon P, Harrod W, Hill K, Hiller J et al (2008) Exascale computing study: technology challenges in achieving exascale systems. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Technical Report 15 Bergman K, Borkar S, Campbell D, Carlson W, Dally W, Denneau M, Franzon P, Harrod W, Hill K, Hiller J et al (2008) Exascale computing study: technology challenges in achieving exascale systems. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Technical Report 15
6.
go back to reference Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117CrossRef Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117CrossRef
8.
go back to reference Cohen S, Matias Y (2003) Spectral bloom filters. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, pp 241–252. ACM Cohen S, Matias Y (2003) Spectral bloom filters. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, pp 241–252. ACM
9.
go back to reference Dai D, Ross RB, Carns P, Kimpe D, Chen Y (2014) Using property graphs for rich metadata management in hpc systems. In: 2014 9th parallel data storage workshop (PDSW), pp 7–12. IEEE Dai D, Ross RB, Carns P, Kimpe D, Chen Y (2014) Using property graphs for rich metadata management in hpc systems. In: 2014 9th parallel data storage workshop (PDSW), pp 7–12. IEEE
10.
go back to reference Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
12.
go back to reference Fan L, Cao P, Almeida J, Broder AZ (2000) Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans Netw 8(3):281–293CrossRef Fan L, Cao P, Almeida J, Broder AZ (2000) Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans Netw 8(3):281–293CrossRef
14.
go back to reference Ficara D, Giordano S, Procissi G, Vitucci F (2008) Multilayer compressed counting bloom filters. In: INFOCOM 2008. The 27th conference on computer communications. IEEE Ficara D, Giordano S, Procissi G, Vitucci F (2008) Multilayer compressed counting bloom filters. In: INFOCOM 2008. The 27th conference on computer communications. IEEE
15.
go back to reference Gifford DK, Jouvelot P, Sheldon MA et al. (1991) Semantic file systems. In: ACM SIGOPS operating systems review, vol 25. ACM, pp 16–25 Gifford DK, Jouvelot P, Sheldon MA et al. (1991) Semantic file systems. In: ACM SIGOPS operating systems review, vol 25. ACM, pp 16–25
18.
go back to reference Groups ES (2007) ESG research report: storage resource management on the launch pad. Technical Report etsg-1809930. Technical Report, Enterprise Strategy Groups Groups ES (2007) ESG research report: storage resource management on the launch pad. Technical Report etsg-1809930. Technical Report, Enterprise Strategy Groups
19.
go back to reference Hua Y, Jiang H, Feng D (2014) Fast: near real-time searchable data analytics for the cloud. In: SC14: international conference for high performance computing, networking, storage and analysis, pp 754–765. IEEE Hua Y, Jiang H, Feng D (2014) Fast: near real-time searchable data analytics for the cloud. In: SC14: international conference for high performance computing, networking, storage and analysis, pp 754–765. IEEE
20.
go back to reference Hua Y, Jiang H, Zhu Y, Feng D (2010) Rapport: semantic-sensitive namespace management in large-scale file systems. CSE Technical reports. University of Nebraska, Lincoln Hua Y, Jiang H, Zhu Y, Feng D (2010) Rapport: semantic-sensitive namespace management in large-scale file systems. CSE Technical reports. University of Nebraska, Lincoln
21.
go back to reference Hua Y, Jiang H, Zhu Y, Feng D, Tian L (2009) Smartstore: a new metadata organization paradigm with semantic-awareness for next-generation file systems. In: Proceedings of the conference on high performance computing networking, storage and analysis, pp 1–12. IEEE Hua Y, Jiang H, Zhu Y, Feng D, Tian L (2009) Smartstore: a new metadata organization paradigm with semantic-awareness for next-generation file systems. In: Proceedings of the conference on high performance computing networking, storage and analysis, pp 1–12. IEEE
22.
go back to reference Hua Y, Jiang H, Zhu Y, Feng D, Xu L (2014) SANE: semantic-aware namespace in ultra-large-scale file systems. IEEE Trans Parallel Distrib Syst 25(5):1328–1338CrossRef Hua Y, Jiang H, Zhu Y, Feng D, Xu L (2014) SANE: semantic-aware namespace in ultra-large-scale file systems. IEEE Trans Parallel Distrib Syst 25(5):1328–1338CrossRef
23.
go back to reference Hua Y, Zhu Y, Jiang H, Feng D, Tian L (2011) Supporting scalable and adaptive metadata management in ultralarge-scale file systems. IEEE Trans Parallel Distrib Syst 22(4):580–593CrossRef Hua Y, Zhu Y, Jiang H, Feng D, Tian L (2011) Supporting scalable and adaptive metadata management in ultralarge-scale file systems. IEEE Trans Parallel Distrib Syst 22(4):580–593CrossRef
24.
go back to reference Huang HH, Zhang N, Wang W, Das G, Szalay A (2012) Just-in-time analytics on large file systems. IEEE Trans Comput 61(11):1651–1664MathSciNetCrossRef Huang HH, Zhang N, Wang W, Das G, Szalay A (2012) Just-in-time analytics on large file systems. IEEE Trans Comput 61(11):1651–1664MathSciNetCrossRef
25.
go back to reference Huston L, Sukthankar R, Wickremesinghe R, Satyanarayanan M, Ganger GR, Riedel E, Ailamaki A (2004) Diamond: a storage architecture for early discard in interactive search. FAST 4:73–86 Huston L, Sukthankar R, Wickremesinghe R, Satyanarayanan M, Ganger GR, Riedel E, Ailamaki A (2004) Diamond: a storage architecture for early discard in interactive search. FAST 4:73–86
26.
go back to reference Imran M, Hlavacs H (2013) Searching in cloud object storage by using a metadata model. In: 2013 Ninth international conference on semantics, knowledge and grids (SKG), pp 121–128. IEEE Imran M, Hlavacs H (2013) Searching in cloud object storage by using a metadata model. In: 2013 Ninth international conference on semantics, knowledge and grids (SKG), pp 121–128. IEEE
30.
go back to reference Leung A, Adams I, Miller EL (2009) Magellan: a searchable metadata architecture for large-scale file systems. University of California, Santa Cruz, Technical Report UCSC-SSRC-09-07 Leung A, Adams I, Miller EL (2009) Magellan: a searchable metadata architecture for large-scale file systems. University of California, Santa Cruz, Technical Report UCSC-SSRC-09-07
31.
go back to reference Leung AW (2009) Organizing, indexing, and searching large-scale file systems. PhD thesis, University of California, Santa Cruz Leung AW (2009) Organizing, indexing, and searching large-scale file systems. PhD thesis, University of California, Santa Cruz
32.
go back to reference Leung AW, Pasupathy S, Goodson GR, Miller EL (2008) Measurement and analysis of large-scale network file system workloads. In: USENIX annual technical conference, vol 1, pp 213–226 Leung AW, Pasupathy S, Goodson GR, Miller EL (2008) Measurement and analysis of large-scale network file system workloads. In: USENIX annual technical conference, vol 1, pp 213–226
33.
go back to reference Leung AW, Shao M, Bisson T, Pasupathy S, Miller EL (2009) Spyglass: fast, scalable metadata search for large-scale storage systems. FAST 9:153–166 Leung AW, Shao M, Bisson T, Pasupathy S, Miller EL (2009) Spyglass: fast, scalable metadata search for large-scale storage systems. FAST 9:153–166
34.
go back to reference Liu J, Feng D, Hua Y, Peng B, Nie Z (2014) Using provenance to efficiently improve metadata searching performance in storage systems. Future Gener Comput Syst Liu J, Feng D, Hua Y, Peng B, Nie Z (2014) Using provenance to efficiently improve metadata searching performance in storage systems. Future Gener Comput Syst
35.
go back to reference Madden APWBA, Long MMDD. Examining scientific data for scalable index designs Madden APWBA, Long MMDD. Examining scientific data for scalable index designs
36.
go back to reference Malkani P, Ellard D, Ledlie J, Seltzer M (2003) Passive NFS tracing of email and research workloads. Proceedings of the 2nd USENIX conference on file and storage technologies. pp 203–216 Malkani P, Ellard D, Ledlie J, Seltzer M (2003) Passive NFS tracing of email and research workloads. Proceedings of the 2nd USENIX conference on file and storage technologies. pp 203–216
37.
go back to reference Mathur A, Cao M, Bhattacharya S, Dilger A, Tomas A, Vivier L (2007) The new ext4 filesystem: current status and future plans. In: Proceedings of the Linux symposium, vol 2. Citeseer, pp 21–33 Mathur A, Cao M, Bhattacharya S, Dilger A, Tomas A, Vivier L (2007) The new ext4 filesystem: current status and future plans. In: Proceedings of the Linux symposium, vol 2. Citeseer, pp 21–33
40.
go back to reference Nunez J (2008) High end computing file system and IO R&D gaps roadmap. In: HEC FSIO R&D Conference Nunez J (2008) High end computing file system and IO R&D gaps roadmap. In: HEC FSIO R&D Conference
41.
go back to reference Ohara Y (2013) Hctrie: a structure for indexing hundreds of dimensions for use in file systems search. In: 2013 IEEE 29th symposium on mass storage systems and technologies (MSST), pp 1–5. IEEE Ohara Y (2013) Hctrie: a structure for indexing hundreds of dimensions for use in file systems search. In: 2013 IEEE 29th symposium on mass storage systems and technologies (MSST), pp 1–5. IEEE
42.
go back to reference Owens L, Brown M, Poore K, Nicolson N (2008) The forrester wave: enterprise search, q2 2008. For information and knowledge management professionals Owens L, Brown M, Poore K, Nicolson N (2008) The forrester wave: enterprise search, q2 2008. For information and knowledge management professionals
43.
go back to reference Pagh A, Pagh R, Rao SS (2005) An optimal bloom filter replacement. In: Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 823–829 Pagh A, Pagh R, Rao SS (2005) An optimal bloom filter replacement. In: Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 823–829
44.
go back to reference Pathan AI, Sinhal A (2013) Encode decode linux based partitions to hide and explore file system. Int J Comput Appl 75(12) Pathan AI, Sinhal A (2013) Encode decode linux based partitions to hide and explore file system. Int J Comput Appl 75(12)
45.
go back to reference Ross RB, Thakur R et al (2000) PVFS: a parallel file system for linux clusters. In: Proceedings of the 4th annual Linux showcase and conference, pp 391–430 Ross RB, Thakur R et al (2000) PVFS: a parallel file system for linux clusters. In: Proceedings of the 4th annual Linux showcase and conference, pp 391–430
46.
go back to reference Schwan P (2003) Lustre: building a file system for 1000-node clusters. In: Proceedings of the 2003 Linux symposium, vol 2003 Schwan P (2003) Lustre: building a file system for 1000-node clusters. In: Proceedings of the 2003 Linux symposium, vol 2003
48.
go back to reference Soules CA, Ganger GR (2005) Connections: using context to enhance file search. In: ACM SIGOPS operating systems review, vol 39. ACM, pp 119–132 Soules CA, Ganger GR (2005) Connections: using context to enhance file search. In: ACM SIGOPS operating systems review, vol 39. ACM, pp 119–132
49.
go back to reference Soules CA, Keeton K, Morrey III CB (2009) Scan-lite: enterprise-wide analysis on the cheap. In: Proceedings of the 4th ACM European conference on computer systems. ACM, pp 117–130 Soules CA, Keeton K, Morrey III CB (2009) Scan-lite: enterprise-wide analysis on the cheap. In: Proceedings of the 4th ACM European conference on computer systems. ACM, pp 117–130
50.
go back to reference van Heuven van Staereling R, Appuswamy R, van Moolenbroek DC, Tanenbaum AS (2011) Efficient, modular metadata management with loris. In: 2011 6th IEEE international conference on networking, architecture and storage (NAS). IEEE, pp 278–287 van Heuven van Staereling R, Appuswamy R, van Moolenbroek DC, Tanenbaum AS (2011) Efficient, modular metadata management with loris. In: 2011 6th IEEE international conference on networking, architecture and storage (NAS). IEEE, pp 278–287
51.
go back to reference Szalay A (2008) New challenges in petascale scientific databases. In: Scientific and statistical database management. Springer, Berlin, p 1 Szalay A (2008) New challenges in petascale scientific databases. In: Scientific and statistical database management. Springer, Berlin, p 1
52.
go back to reference Takata M, Sutoh A (2012) Event-notification-based inactive file search for large-scale file systems. In: APMRC, 2012 digest. IEEE, pp 1–7 Takata M, Sutoh A (2012) Event-notification-based inactive file search for large-scale file systems. In: APMRC, 2012 digest. IEEE, pp 1–7
54.
go back to reference Weil SA (2007) Ceph: reliable, scalable, and high-performance distributed storage. PhD thesis, University of California, Santa Cruz Weil SA (2007) Ceph: reliable, scalable, and high-performance distributed storage. PhD thesis, University of California, Santa Cruz
55.
go back to reference Xiao B, Hua Y (2010) Using parallel bloom filters for multiattribute representation on network services. IEEE Trans Parallel Distrib Syst 21(1):20–32CrossRef Xiao B, Hua Y (2010) Using parallel bloom filters for multiattribute representation on network services. IEEE Trans Parallel Distrib Syst 21(1):20–32CrossRef
56.
go back to reference Xu L, Huang Z, Jiang H, Tian L, Swanson D (2014) VSFS: a searchable distributed file system. In: Parallel data storage workshop (PDSW), 2014 9th. IEEE, pp 25–30 Xu L, Huang Z, Jiang H, Tian L, Swanson D (2014) VSFS: a searchable distributed file system. In: Parallel data storage workshop (PDSW), 2014 9th. IEEE, pp 25–30
57.
go back to reference Xu L, Jiang H, Liu X, Tian L, Hua Y, Hu J (2011) Propeller: a scalable metadata organization for a versatile searchable file system. CSE Technical reports. University of Nebraska, Lincoln Xu L, Jiang H, Liu X, Tian L, Hua Y, Hu J (2011) Propeller: a scalable metadata organization for a versatile searchable file system. CSE Technical reports. University of Nebraska, Lincoln
58.
go back to reference Yu Y, Zhu Y, Ng W, Samsudin J (2014) An efficient multidimension metadata index and search system for cloud data. In: 2014 IEEE 6th international conference on cloud computing technology and science (CloudCom), pp 499–504. IEEE Yu Y, Zhu Y, Ng W, Samsudin J (2014) An efficient multidimension metadata index and search system for cloud data. In: 2014 IEEE 6th international conference on cloud computing technology and science (CloudCom), pp 499–504. IEEE
59.
go back to reference Zhang Q, Feng D, Wang F, Wu S (2014) Mlock: building delegable metadata service for the parallel file systems. Sci China Inf Sci 58(3):1–14CrossRef Zhang Q, Feng D, Wang F, Wu S (2014) Mlock: building delegable metadata service for the parallel file systems. Sci China Inf Sci 58(3):1–14CrossRef
Metadata
Title
MBFS: a parallel metadata search method based on Bloomfilters using MapReduce for large-scale file systems
Publication date
01-08-2016
Published in
The Journal of Supercomputing / Issue 8/2016
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1464-2

Other articles of this Issue 8/2016

The Journal of Supercomputing 8/2016 Go to the issue

Premium Partner