Speeding up the multimedia feature extraction: a comparative study on the big data approach

Mera, David; Batko, Michal; Zezula, Pavel

doi:10.1007/s11042-016-3415-1

Published: 10 March 2016

Volume 76, pages 7497–7517 (2017)

Multimedia Tools and Applications

Abstract

The current explosion of multimedia data is significantly increasing the amount of potential knowledge. However, getting to the actual information requires applying novel content-based techniques, which in turn require time-consuming extraction of indexable features from the raw data. To deal with large datasets, this task needs to be parallelized. However, there are multiple approaches to choose from, each with its own benefits and drawbacks. Several parameters must also be taken into consideration, for example, the amount of available resources, the size of the data, and their availability. In this paper, we empirically evaluate and compare approaches based on Apache Hadoop, Apache Storm, Apache Spark, and Grid computing, employed to distribute the extraction task over an outsourced and distributed infrastructure.
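The key property the abstract relies on is that feature extraction is applied independently to each raw item, so the work can be split across many workers. The following is a minimal sketch of that idea, not the authors' implementation: a local `multiprocessing.Pool` stands in for the Hadoop, Storm, Spark, and Grid back ends compared in the paper, the names `extract_histogram`, `extract_record`, and `extract_all` are hypothetical, and the grey-level histogram is a toy descriptor rather than one of the features used in the study.

```python
"""Minimal sketch: data-parallel feature extraction with a local process pool.

Assumption: multiprocessing.Pool is a small-scale stand-in for a cluster
framework; extract_histogram is a hypothetical toy extractor.
"""
from multiprocessing import Pool
from typing import Dict, Iterable, List, Sequence, Tuple

Image = Sequence[int]  # hypothetical: a flat list of 8-bit grey-level pixels


def extract_histogram(image: Image, bins: int = 4) -> List[float]:
    """Toy feature extractor: a normalised grey-level histogram."""
    counts = [0] * bins
    for pixel in image:
        # Map a 0..255 pixel value to a bin index, clamping out-of-range values.
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(image) or 1  # avoid division by zero for empty images
    return [c / total for c in counts]


def extract_record(record: Tuple[str, Image]) -> Tuple[str, List[float]]:
    """Map step: one (id, raw data) record in, one (id, feature vector) out."""
    image_id, pixels = record
    return image_id, extract_histogram(pixels)


def extract_all(
    records: Iterable[Tuple[str, Image]], workers: int = 4, chunksize: int = 16
) -> Dict[str, List[float]]:
    """Distribute records over worker processes. Records are independent,
    so workers never need to communicate (an embarrassingly parallel task)."""
    with Pool(processes=workers) as pool:
        return dict(pool.map(extract_record, records, chunksize=chunksize))


if __name__ == "__main__":
    data = [("img-0", [0, 64, 128, 255]), ("img-1", [10, 20, 30, 40])]
    print(extract_all(data, workers=2))
```

Because `extract_record` is a pure per-record function, the same logic could in principle be handed to a map-style cluster job instead of a local pool; the `chunksize` parameter (an illustrative default here) trades scheduling overhead against load balance, one of the resource and data-size trade-offs the abstract alludes to.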

Similar content being viewed by others

Trends and Future Perspective Challenges in Big Data

Social media analytics: a survey of techniques, tools and platforms (Article, Open access, 26 July 2014)

Big data preprocessing: methods and prospects (Article, Open access, 01 November 2016)

Author information

Authors and Affiliations

Centro Singular de Investigación en Tecnoloxías da Información (CITIUS), Universidade de Santiago de Compostela, Rúa de Jenaro de la Fuente Domínguez, 15782, Santiago de Compostela, Spain
David Mera
Laboratory of Data Intensive Systems and Applications, Faculty of Informatics, Masaryk University, Brno, Czech Republic
Michal Batko & Pavel Zezula

Corresponding author

Correspondence to David Mera.

Additional information

This work was carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship programme. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 246016. This work has been partially supported by both the Czech Science Foundation project number P103/12/G084 and the Xunta de Galicia project number GPC2014/037. Computational resources were provided by the MetaCentrum under the program LM2010005 and the CERIT-SC under the program Centre CERIT Scientific Cloud, part of the Operational Program Research and Development for Innovations, Reg. no. CZ.1.05/3.2.00/08.0144.

About this article

Cite this article

Mera, D., Batko, M. & Zezula, P. Speeding up the multimedia feature extraction: a comparative study on the big data approach. Multimed Tools Appl 76, 7497–7517 (2017). https://doi.org/10.1007/s11042-016-3415-1

Received: 16 June 2015
Revised: 10 January 2016
Accepted: 29 February 2016
Published: 10 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11042-016-3415-1
