Automatic fruit and vegetable classification from images

https://doi.org/10.1016/j.compag.2009.09.002

Abstract

Contemporary Vision and Pattern Recognition problems such as face recognition, fingerprint identification, image categorization, and DNA sequencing often have an arbitrarily large number of classes and properties to consider. Dealing with such complex problems using just one feature descriptor is difficult, and feature fusion may become mandatory. Although normal feature fusion is quite effective for some problems, it can yield unexpected classification results when the different features are not properly normalized and preprocessed. Besides, it has the drawback of increasing the dimensionality, which might require more training data. To cope with these problems, this paper introduces a unified approach that can combine many features and classifiers, requires less training, and is better suited to some problems than a naïve method in which all features are simply concatenated and fed independently to each classification algorithm. In addition, the presented technique is amenable to continuous learning, both when refining a learned model and when adding new classes to be discriminated. The introduced fusion approach is validated on a multi-class fruit-and-vegetable categorization task in a semi-controlled environment, such as a distribution center or a supermarket cashier. The results show that the solution reduces the classification error by up to 15 percentage points with respect to the baseline.

Introduction

Recognizing different kinds of vegetables and fruits is a recurrent task in supermarkets, where the cashier must be able to identify not only the species of a particular fruit (e.g., banana, apple, pear) but also its variety (e.g., Golden Delicious, Jonagold, Fuji), which determines its price. The use of barcodes has mostly ended this problem for packaged products, but since consumers want to pick their produce, it cannot be packaged and thus must be weighed. A common solution is to issue codes for each kind of fruit/vegetable; this has problems, since memorizing the codes is hard, which leads to errors in pricing.

As an aid to the cashier, many supermarkets issue a small book with pictures and codes; the problem with this solution is that flipping through the booklet is time-consuming.

This paper reviews several image descriptors in the literature and introduces a system to solve the problem by adapting a camera to the supermarket scale that identifies fruits and vegetables based on color, texture, and appearance cues.

Formally, given an image of fruits or vegetables of only one variety, in arbitrary position and number, the system must return a list of possible candidates of the form (species, variety). Sometimes, the objects may be inside a plastic bag, which can add specular reflections and hue shifts.

Given the variety of produce and the impossibility of predicting which kinds of fruits/vegetables are sold, training must be done on-site by someone with little or no technical knowledge. Therefore, the system must be able to achieve a high level of precision with only a few training examples (e.g., up to 30 images).

Often, one needs to deal with complex classification problems. In such scenarios, using just one feature descriptor to capture the classes’ separability might not be enough and feature fusion may become necessary.

Although normal feature fusion is quite effective for some problems, it can yield unexpected classification results when the different features are not properly normalized and preprocessed. Besides, it has the drawback of increasing the dimensionality of the data, which might require more training examples.
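
To make these drawbacks concrete, the sketch below illustrates the naive feature-level fusion discussed here: two toy descriptors with different scales and dimensionalities are concatenated and normalized before a single classifier is trained. The descriptor names, dimensions, and the use of scikit-learn are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative only: assume two descriptors with different scales and
# dimensionalities were extracted per image (names are hypothetical).
rng = np.random.default_rng(0)
n_images = 60
color_hist = rng.random((n_images, 64))            # e.g., a 64-bin color histogram
texture_desc = 100.0 * rng.random((n_images, 16))  # e.g., 16 texture statistics
labels = rng.integers(0, 3, size=n_images)         # 3 produce classes (toy labels)

# Naive feature-level fusion: concatenate all descriptors into one vector.
# Without per-feature normalization, the larger-valued descriptor dominates;
# the concatenation also raises dimensionality (64 + 16 = 80 here).
fused = np.hstack([color_hist, texture_desc])
fused = StandardScaler().fit_transform(fused)      # the normalization step the text warns about

clf = SVC(kernel="rbf").fit(fused, labels)
print(clf.predict(fused[:5]))
```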

This paper presents a unified approach that can combine many features and classifiers. It requires less training and is better suited to some problems than a naïve method in which all features are simply concatenated and fed independently to each classification algorithm. We expect this solution to endure beyond the problem addressed in this paper.
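
For contrast with the concatenation baseline, the following sketch illustrates one common way of combining features and classifiers at the decision level: one classifier per descriptor, fused by majority voting. This is only an assumed illustration of the general idea; the paper's actual fusion scheme is defined in Section 4.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical late-fusion sketch: one classifier per descriptor,
# predictions combined by majority vote (not necessarily the paper's rule).
rng = np.random.default_rng(1)
n_images = 60
descriptors = {
    "color": rng.random((n_images, 64)),
    "texture": rng.random((n_images, 16)),
}
labels = rng.integers(0, 3, size=n_images)

# Train one (descriptor, classifier) pair per feature; no concatenation,
# so each classifier works in its own, lower-dimensional space.
models = {
    "color": SVC().fit(descriptors["color"], labels),
    "texture": LinearDiscriminantAnalysis().fit(descriptors["texture"], labels),
}

# Fuse at decision level: majority vote across per-descriptor predictions
# (ties resolve to the smallest class label via bincount/argmax).
votes = np.stack([m.predict(descriptors[name]) for name, m in models.items()])
fused_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print(fused_pred[:10])
```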

The introduced fusion approach is validated using an image data set collected from the local fruit-and-vegetable distribution center and made publicly available. The data set contains 15 produce categories comprising 2633 images collected on-site over a period of five months under diverse conditions. The implemented solution achieves a classification error below 2% for the top-one response; with the top-two responses, the error is below 1%.

Section 2 gives a brief overview of previous work on object recognition and image categorization. Section 3 presents the different kinds of image descriptors used in this paper, as well as the produce data set. Section 4 introduces the solution for feature and classifier fusion, and Section 5 presents experimental results. Finally, Section 6 presents conclusions and future directions.

Literature review

Recently, there has been a lot of activity in the area of Image Categorization. Previous approaches have considered patterns in color, edge, and texture properties (Stehling et al., 2002; Unser, 1986; Pass et al., 1997), as well as low- and middle-level features to distinguish broad classes of images (Rocha and Goldenstein, 2007; Lyu and Farid, 2005; Cutzu et al., 2005; Serrano et al., 2004). In addition, Heidemann (2004) has presented an approach to establish image categories automatically using histograms,

Materials and methods

In general, image categorization relies on combinations of statistical, structural and spectral approaches. Statistical approaches describe the objects using global and local descriptors such as mean, variance, and entropy. Structural approaches represent the object’s appearance using well-known primitives such as patches of important parts of the object. Finally, spectral approaches describe the objects using some spectral space representation such as Fourier spectrum (Gonzalez and Woods, 2007
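
As a hedged illustration of the statistical and spectral families mentioned above (not the paper's exact descriptors), the sketch below computes global statistics (mean, variance, histogram entropy) and a coarse Fourier-spectrum signature for a grayscale image.

```python
import numpy as np

def statistical_descriptor(gray):
    """Global statistics of a grayscale image (illustrative only):
    mean, variance, and entropy of the intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return np.array([gray.mean(), gray.var(), entropy])

def spectral_descriptor(gray, k=8):
    """Coarse spectral signature: magnitudes of the k x k lowest
    frequencies of the 2-D Fourier spectrum (a common spectral approach)."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    cy, cx = np.array(spectrum.shape) // 2
    return spectrum[cy - k // 2:cy + k // 2, cx - k // 2:cx + k // 2].ravel()

# Toy usage with a synthetic 8-bit image.
gray = (255 * np.random.default_rng(2).random((128, 128))).astype(np.uint8)
print(statistical_descriptor(gray))
print(spectral_descriptor(gray).shape)   # (64,)
```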

Feature and classifier fusion

This section shows the motivation and design of the feature and classifier fusion introduced in this paper.

Results and discussions

In the quest to find the best classification procedures and features for produce categorization, this paper analyzes several appearance-, color-, texture-, and shape-based image descriptors, as well as diverse machine learning techniques such as Support Vector Machines (SVM), Linear Discriminant Analysis (LDA), Classification Trees, K-Nearest Neighbors (K-NN), and Ensembles of Trees and LDA (Bishop, 2006). All the experiments hereafter are performed on real data (Section 3.1).
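
The sketch below shows one plausible way to compare such classifier families on a common feature matrix with cross-validation using scikit-learn; the data, parameters, and evaluation protocol are illustrative assumptions and do not reproduce the paper's experiments.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

# Illustrative comparison harness (not the paper's protocol): score each
# classifier family on the same feature matrix with 5-fold cross-validation.
rng = np.random.default_rng(3)
X = rng.random((150, 64))             # stand-in for extracted descriptors
y = np.repeat(np.arange(15), 10)      # 15 produce classes, 10 toy samples each

classifiers = {
    "SVM": SVC(),
    "LDA": LinearDiscriminantAnalysis(),
    "Tree": DecisionTreeClassifier(),
    "K-NN": KNeighborsClassifier(n_neighbors=3),
    "Ensemble of Trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=10),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```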

In the following

Conclusions and future work

Oftentimes, when tackling complex classification problems, just one feature descriptor is not enough to capture the classes’ separability. Therefore, efficient and effective feature fusion policies may become necessary. Although normal feature fusion is quite effective for some problems, it can yield unexpected classification results when the features are not properly normalized and preprocessed. Additionally, it has the drawback of increasing the dimensionality, which might require more training data.

This paper

Acknowledgments

The authors thank the people at the local produce distribution center for their patience and help. Finally, this research was funded by FAPESP (Award Number 2008/08681-9, 05/58103-3, 07/52015-0, and 08/54443-2) and CNPq (Award Number 309254/2007-8, 472402/2007-2, and 551007/2007-9).

References (27)

  • F. Cutzu et al. Distinguishing paintings from photographs. CVIU, March 2005.
  • S. Agarwal et al. Learning to detect objects in images via a sparse, part-based representation. TPAMI, November 2004.
  • R. Anand et al. Efficient classification for multi-class problems using modular neural networks. IEEE TNN, January 1995.
  • A. Berg et al. Shape matching and object recognition using low distortion correspondences.
  • C.M. Bishop. Pattern Recognition and Machine Learning. 2006.
  • R.M. Bolle et al. Veggievision: a produce recognition system.
  • D. Comaniciu et al. Mean shift: a robust approach toward feature space analysis. IEEE TPAMI, May 2002.
  • T.G. Dietterich et al. Solving multi-class learning problems via error-correcting output codes. JAIR, January 1996.
  • L. Fei-Fei et al. One-shot learning of object categories. IEEE TPAMI, April 2006.
  • R. Gonzalez et al. Digital Image Processing. 2007.
  • K. Grauman et al. Efficient image matching with distributions of local invariant features.
  • G. Heidemann. Unsupervised image categorization. IVC, October 2004.
  • F. Jurie et al. Creating efficient codebooks for visual recognition.
