Published in: International Journal of Computer Vision 2/2019

24-05-2018

Efficiently Annotating Object Images with Absolute Size Information Using Mobile Devices

Authors: Martin Hofmann, Marco Seeland, Patrick Mäder

Abstract

The projection of a real-world scene onto a planar image sensor entails the loss of information about the 3D structure as well as the absolute dimensions of the scene. For image analysis and object classification tasks, however, absolute size information can make results more accurate. Today, the creation of size-annotated image datasets is effort-intensive and typically requires measurement equipment not available to public image contributors. In this paper, we propose an effective annotation method that utilizes the camera within smart mobile devices to capture the missing size information along with the image. The approach builds on the fact that with a camera calibrated to a specific object distance, lengths can be measured in the object’s plane. We use the camera’s minimum focus distance as calibration distance and propose an adaptive feature matching process for precise computation of the scale change between two images, facilitating measurements at larger object distances. Eventually, the measured object is segmented and its size information is annotated for later analysis. A user study showed that humans are able to retrieve the calibration distance with low variance. The proposed approach facilitates a measurement accuracy comparable to manual measurement with a ruler and outperforms state-of-the-art methods in terms of accuracy and repeatability. Consequently, the proposed method allows in-situ size annotation of objects in images without the need for additional equipment or an artificial reference object in the scene.
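The core idea can be illustrated with the pinhole-camera relation the abstract alludes to: once the camera is calibrated at a known object distance (here, the minimum focus distance), a pixel span in the object's plane maps directly to a metric length, and the scale change between two images of the same object indicates how much farther away it is. The following is a minimal sketch of that relation only; all function names are illustrative and do not reflect the authors' implementation.

```python
# Pinhole-camera plane measurement: at object distance d (meters), a span
# of p pixels in the object's plane corresponds to p * d / f meters, where
# f is the focal length expressed in pixels.

def pixels_to_length(pixels: float, distance_m: float, focal_px: float) -> float:
    """Convert a pixel span in the object plane to a metric length."""
    return pixels * distance_m / focal_px

def distance_from_scale_change(calib_distance_m: float, scale_change: float) -> float:
    """If the object appears scale_change times smaller than at the
    calibration distance, it is roughly scale_change times farther away."""
    return calib_distance_m * scale_change

# Example: a 1000-pixel span, camera calibrated at 0.10 m with a
# 2000-pixel focal length, measures a 5 cm object.
length = pixels_to_length(1000, 0.10, 2000)
```

The second function captures why the paper needs a precise scale-change estimate between two images: any error in the matched scale propagates linearly into the recovered object distance and hence into the measured size.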

Metadata
Title
Efficiently Annotating Object Images with Absolute Size Information Using Mobile Devices
Authors
Martin Hofmann
Marco Seeland
Patrick Mäder
Publication date
24-05-2018
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-018-1093-3
