Published in: International Journal of Computer Vision, Issue 1-2/2014

01.05.2014

The Ignorant Led by the Blind: A Hybrid Human–Machine Vision System for Fine-Grained Categorization

Authors: Steve Branson, Grant Van Horn, Catherine Wah, Pietro Perona, Serge Belongie

Abstract

We present a visual recognition system for fine-grained visual categorization. The system is composed of a human and a machine working together and combines the complementary strengths of computer vision algorithms and (non-expert) human users. The human users provide two heterogeneous forms of information: object part clicks and answers to multiple-choice questions. The machine intelligently selects the most informative question to pose to the user in order to identify the object class as quickly as possible. By leveraging computer vision and analyzing the user responses, the overall amount of human effort required, measured in seconds, is minimized. Our formalism shows how to incorporate many different types of computer vision algorithms into a human-in-the-loop framework, including standard multiclass methods, part-based methods, and localized multiclass and attribute methods. We explore our ideas by building a field guide for bird identification. The experimental results demonstrate the strength of combining ignorant humans with poor-sighted machines: the hybrid system achieves quick and accurate bird identification on a dataset containing 200 bird species.
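
As a rough illustration of the question-selection step described in the abstract, the sketch below implements a greedy expected-information-gain criterion: maintain a posterior over classes (seeded by computer vision scores) and, at each turn, ask the question whose answer is expected to shrink class uncertainty the most. This is a minimal sketch of the general idea, not the authors' implementation; the answer model `p_answer_given_class`, the array shapes, and the helper names are illustrative assumptions.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log2(p)))

def select_question(class_posterior, p_answer_given_class):
    """Pick the question with the largest expected information gain.

    class_posterior:       (C,) current belief over C classes, e.g. seeded
                           from computer-vision scores.
    p_answer_given_class:  (Q, C, A) probability of each of A answers to
                           each of Q questions, per class (a hypothetical
                           user-response model estimated from training data).
    Returns (best_question_index, expected_gain_in_bits).
    """
    h_now = entropy(class_posterior)
    best_q, best_gain = None, -np.inf
    for q, answer_model in enumerate(p_answer_given_class):
        # Marginal probability of each answer under the current belief.
        p_answer = answer_model.T @ class_posterior             # (A,)
        # Posterior over classes for each hypothetical answer (Bayes rule).
        joint = answer_model.T * class_posterior[None, :]       # (A, C)
        posteriors = joint / np.clip(p_answer[:, None], 1e-12, None)
        # Expected entropy after observing the answer to question q.
        h_expected = sum(pa * entropy(post) for pa, post in zip(p_answer, posteriors))
        gain = h_now - h_expected
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain

def update_posterior(class_posterior, answer_model, observed_answer):
    """Fold an observed answer back into the class belief."""
    likelihood = answer_model[:, observed_answer]               # (C,)
    posterior = class_posterior * likelihood
    return posterior / posterior.sum()
```

Per the abstract, the full system measures human effort in seconds, so in practice the information gain of a question would also be traded off against the time a user needs to answer it.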


Appendices
Accessible only with authorization.
Footnotes
1
Our user model assumes binary or multinomial attributes; however, one could use continuous attribute values for the computer vision component described in this section.
 
2
The integral in Eq. 26 involves a bottom-up traversal of \(T=(V,E)\), at each step convolving a spatial score map with a unary score map (this takes \(O(n \log n)\) time in the number of pixels).
 
3
Maximum likelihood inference involves a bottom-up traversal of \(T\), performing a distance transform operation (Felzenszwalb et al. 2008) for each part in the tree (this takes \(O(n)\) time in the number of pixels); a sketch of this transform follows these footnotes.
 
4
In practice, we also computed an average segmentation mask for each part-aspect and used it to weight each extracted patch; see the supplementary material.
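
Footnote 3 relies on the generalized distance transform (Felzenszwalb et al. 2008) to make the per-part maximization linear in the number of pixels. Below is a minimal 1-D sketch of that transform, assuming a quadratic deformation cost with an illustrative weight `w`; it is standard textbook code, not the authors' implementation. Applying it to every row of a 2-D score map and then to every column of the result gives the 2-D version used in pictorial-structure inference.

```python
import numpy as np

def distance_transform_1d(cost, w=1.0):
    """Generalized distance transform of Felzenszwalb & Huttenlocher:
    out[p] = min_q ( cost[q] + w * (p - q)^2 ), computed in O(n) by
    tracking the lower envelope of the parabolas rooted at each q."""
    n = len(cost)
    out = np.empty(n)
    v = np.zeros(n, dtype=int)      # indices of parabolas in the envelope
    z = np.empty(n + 1)             # boundaries between envelope parabolas
    z[0], z[1] = -np.inf, np.inf
    k = 0
    for q in range(1, n):
        # Intersection of the parabola at q with the rightmost envelope parabola.
        s = ((cost[q] + w * q * q) - (cost[v[k]] + w * v[k] * v[k])) / (2.0 * w * (q - v[k]))
        while s <= z[k]:
            k -= 1
            s = ((cost[q] + w * q * q) - (cost[v[k]] + w * v[k] * v[k])) / (2.0 * w * (q - v[k]))
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, np.inf
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        out[p] = cost[v[k]] + w * (p - v[k]) ** 2
    return out
```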
 
References
Belhumeur, P., Chen, D., Feiner, S., Jacobs, D., Kress, W., Ling, H., Lopez, I., Ramamoorthi, R., Sheorey, S., White, S., & Zhang, L. (2008). Searching the world's herbaria. In ECCV.
Berg, T., & Belhumeur, P. N. (2013). POOF: Part-based one-vs-one features for fine-grained categorization, face verification, and attribute estimation. In CVPR.
Biederman, I., Subramaniam, S., Bar, M., Kalocsai, P., & Fiser, J. (1999). Subordinate-level object classification reexamined. Psychological Research, 63(2–3), 131–153.
Bourdev, L., & Malik, J. (2009). Poselets: Body part detectors trained using 3D annotations. In ICCV.
Branson, S., Perona, P., & Belongie, S. (2011). Strong supervision from weak annotation. In ICCV.
Branson, S., Wah, C., Schroff, F., Babenko, B., Welinder, P., Perona, P., & Belongie, S. (2010). Visual recognition with humans in the loop. In ECCV.
Chai, Y., Lempitsky, V., & Zisserman, A. (2011). BiCoS: A bi-level co-segmentation method. In ICCV.
Chai, Y., Lempitsky, V., & Zisserman, A. (2013). Symbiotic segmentation and part localization for fine-grained categorization. In ICCV.
Chai, Y., Rahtu, E., Lempitsky, V., Van Gool, L., & Zisserman, A. (2012). TriCoS. In ECCV.
Cox, I. J., Miller, M. L., Minka, T. P., Papathomas, T. V., & Yianilos, P. N. (2000). The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments. Image Processing.
Donahue, J., & Grauman, K. (2011). Annotator rationales for visual recognition. In ICCV.
Douze, M., Ramisa, A., & Schmid, C. (2011). Combining attributes and Fisher vectors for efficient image retrieval. In CVPR.
Duan, K., Parikh, D., Crandall, D., & Grauman, K. (2012). Discovering localized attributes for fine-grained recognition. In CVPR.
Fang, Y., & Geman, D. (2005). Experiments in mental face retrieval. In AVBPA.
Farhadi, A., Endres, I., & Hoiem, D. (2010). Attribute-centric recognition for generalization. In CVPR.
Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by attributes. In CVPR.
Farrell, R., Oza, O., Zhang, N., Morariu, V., Darrell, T., & Davis, L. (2011). Birdlets. In ICCV.
Felzenszwalb, P., & Huttenlocher, D. (2002). Efficient matching of pictorial structures. In CVPR.
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In CVPR.
Ferecatu, M., & Geman, D. (2007). Interactive search by mental matching. In ICCV.
Ferecatu, M., & Geman, D. (2009). A statistical framework for image category search from a mental picture. In PAMI.
Gavves, E., Fernando, B., Snoek, C., Smeulders, A., & Tuytelaars, T. (2013). Fine-grained categorization by alignments. In ICCV.
Geman, D., & Jedynak, B. (1993). Shape recognition and twenty questions. Belmont: Wadsworth.
Geman, D., & Jedynak, B. (1996). An active testing model for tracking roads in satellite images. In PAMI.
Jedynak, B., Frazier, P. I., & Sznitman, R. (2012). Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability, 49(1), 114–136.
Khosla, A., Jayadevaprakash, N., Yao, B., & Li, F. F. (2011). Novel dataset for FGVC: Stanford dogs. In CVPR Workshop on FGVC, San Diego.
Kumar, N., Belhumeur, P., Biswas, A., Jacobs, D., Kress, W., Lopez, I., & Soares, J. (2012). Leafsnap: A computer vision system for automatic plant species identification. In ECCV.
Kumar, N., Belhumeur, P., & Nayar, S. (2008). FaceTracer: A search engine for large collections of images with faces. In ECCV.
Kumar, N., Berg, A. C., Belhumeur, P. N., & Nayar, S. K. (2009). Attribute and simile classifiers for face verification. In ICCV.
Lampert, C., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes. In CVPR.
Larios, N., Soran, B., Shapiro, L. G., Martinez-Munoz, G., Lin, J., & Dietterich, T. G. (2010). Haar random forest features and SVM spatial matching kernel for stonefly species identification. In ICPR.
Lazebnik, S., Schmid, C., & Ponce, J. (2005). A maximum entropy framework for part-based texture and object recognition. In ICCV.
Levin, A., Lischinski, D., & Weiss, Y. (2007). A closed-form solution to natural image matting. In PAMI.
Liu, J., Kanazawa, A., Jacobs, D., & Belhumeur, P. (2012). Dog breed classification using part localization. In ECCV.
Lu, Y., Hu, C., Zhu, X., Zhang, H., & Yang, Q. (2000). A unified framework for semantics and feature based relevance feedback in image retrieval systems. In ACM Multimedia.
Maji, S. (2012). Discovering a lexicon of parts and attributes. In ECCV Parts and Attributes.
Maji, S., & Shakhnarovich, G. (2012). Part annotations via pairwise correspondence. In Conference on Artificial Intelligence Workshop.
Martínez-Muñoz, G., et al. (2009). Dictionary-free categorization of very similar objects. In CVPR.
Mervis, C. B., & Crisafi, M. A. (1982). Order of acquisition of subordinate-, basic-, and superordinate-level categories. Child Development, 53(1), 256–266.
Nilsback, M., & Zisserman, A. (2008). Automated flower classification. In ICVGIP.
Nilsback, M. E., & Zisserman, A. (2006). A visual vocabulary for flower classification. In CVPR.
Ott, P., & Everingham, M. (2011). Shared parts for deformable part-based models. In CVPR.
Parikh, D., & Grauman, K. (2011). Interactively building a vocabulary of attributes. In CVPR.
Parikh, D., & Grauman, K. (2011). Relative attributes. In ICCV.
Parikh, D., & Grauman, K. (2013). Implied feedback: Learning nuances of user behavior in image search. In ICCV.
Parikh, D., & Zitnick, C. L. (2011a). Finding the weakest link in person detectors. In CVPR.
Parikh, D., & Zitnick, C. L. (2011b). Human-debugging of machines. In NIPS Wisdom of Crowds.
Parkash, A., & Parikh, D. (2012). Attributes for classifier feedback. In ECCV.
Parkhi, O., Vedaldi, A., Zisserman, A., & Jawahar, C. (2012). Cats and dogs. In CVPR.
Parkhi, O. M., Vedaldi, A., Jawahar, C., & Zisserman, A. (2011). The truth about cats and dogs. In ICCV.
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel. In ECCV.
Platt, J. C. (1999). Probabilistic outputs for SVMs. In ALMC.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Burlington: Morgan Kaufmann.
Rasiwasia, N., Moreno, P. J., & Vasconcelos, N. (2007). Bridging the gap: Query by semantic example. In Multimedia.
Rosch, E. (1999). Principles of categorization. In Concepts: Core Readings.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology.
Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: Interactive foreground extraction. In TOG.
Settles, B. (2008). Curious machines: Active learning with structured instances.
Stark, M., Krause, J., Pepik, B., Meger, D., Little, J. J., Schiele, B., & Koller, D. (2012). Fine-grained categorization for 3D scene understanding. In BMVC.
Sznitman, R., Basu, A., Richa, R., Handa, J., Gehlbach, P., Taylor, R. H., Jedynak, B., & Hager, G. D. (2011). Unified detection and tracking in retinal microsurgery. In MICCAI.
Sznitman, R., & Jedynak, B. (2010). Active testing for face detection and localization. In PAMI.
Tsiligkaridis, T., Sadler, B., & Hero, A. (2013). A collaborative 20 questions model for target search with human-machine interaction. In ICASSP.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2006). Large margin methods for structured and interdependent output variables. In JMLR.
Vijayanarasimhan, S., & Grauman, K. (2009). What's it going to cost you? In CVPR.
Vijayanarasimhan, S., & Grauman, K. (2011). Large-scale live active learning. In CVPR.
Vondrick, C., & Ramanan, D. (2011). Video annotation and tracking with active learning. In NIPS.
Vondrick, C., Ramanan, D., & Patterson, D. (2010). Efficiently scaling up video annotation. In ECCV.
Wah, C., Branson, S., Perona, P., & Belongie, S. (2011). Multiclass recognition and part localization with humans in the loop. In ICCV.
Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset. Tech. Rep. CNS-TR-2011-001, Caltech, Pasadena.
Wang, G., & Forsyth, D. (2009). Joint learning of visual attributes, object classes. In ICCV.
Wang, J., Markert, K., & Everingham, M. (2009). Learning models for object recognition from natural language descriptions. In BMVC.
Wu, W., & Yang, J. (2006). SmartLabel: An object labeling tool. In Multimedia.
Yang, Y., & Ramanan, D. (2011). Articulated pose estimation using mixtures of parts. In CVPR.
Yao, B., Bradski, G., & Fei-Fei, L. (2012). A codebook and annotation-free approach for FGVC. In CVPR.
Yao, B., Khosla, A., & Fei-Fei, L. (2011). Combining randomization and discrimination for FGVC. In CVPR.
Zhang, N., Farrell, R., & Darrell, T. (2012). Pose pooling kernels for sub-category recognition. In CVPR.
Zhang, N., Farrell, R., Iandola, F., & Darrell, T. (2013). Deformable part descriptors for fine-grained recognition and attribute prediction. In ICCV.
Zhou, X., & Huang, T. (2003). Relevance feedback in image retrieval. In Multimedia.
Metadata
Title
The Ignorant Led by the Blind: A Hybrid Human–Machine Vision System for Fine-Grained Categorization
Authors
Steve Branson
Grant Van Horn
Catherine Wah
Pietro Perona
Serge Belongie
Publication date
01.05.2014
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1-2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-014-0698-4
