Automatic fruit and vegetable classification from images
Introduction
Recognizing different kinds of vegetables and fruits is a recurrent task in supermarkets, where the cashier must be able to point out not only the species of a particular fruit (e.g., banana, apple, pear) but also its variety (e.g., Golden Delicious, Jonagold, Fuji), which determines its price. Barcodes have mostly solved this problem for packaged products, but because consumers want to pick their own produce, it cannot be prepackaged and must be weighed at checkout. A common solution is to issue a code for each kind of fruit or vegetable, but memorizing the codes is hard, which leads to pricing errors.
As an aid to the cashier, many supermarkets issue a small booklet with pictures and codes; the problem with this solution is that flipping through the booklet is time-consuming.
This paper reviews several image descriptors from the literature and introduces a system that addresses the problem by fitting a camera to the supermarket scale and identifying fruits and vegetables based on color, texture, and appearance cues.
Formally, given an image of fruits or vegetables of only one variety, in arbitrary position and number, the system must return a list of possible candidates of the form (species, variety). The produce may also be inside a plastic bag, which can add specular reflections and hue shifts.
Given the variety and the impossibility of predicting which kinds of fruit/vegetables are sold, training must be done on-site by someone with little or no technical knowledge. Therefore, the system must be able to achieve a high level of precision with only a few training examples (e.g., up to 30 images).
Often, one needs to deal with complex classification problems. In such scenarios, using just one feature descriptor to capture the classes’ separability might not be enough and feature fusion may become necessary.
Although normal feature fusion is quite effective for some problems, it can yield unexpected classification results when the different features are not properly normalized and preprocessed. Besides, it has the drawback of increasing the dimensionality of the data, which might require more training examples.
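The normalization issue can be made concrete with a small sketch of concatenation-based fusion. It is only an illustration under stated assumptions: z-score standardization is one common choice and is not necessarily the preprocessing used in this paper, and the helper names (`zscore_columns`, `fuse_by_concatenation`) and the toy descriptor values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a list of equal-length feature vectors."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def fuse_by_concatenation(*descriptor_sets):
    """Normalize each descriptor set separately, then concatenate per image."""
    normalized = [zscore_columns(rows) for rows in descriptor_sets]
    return [sum(parts, []) for parts in zip(*normalized)]


# Hypothetical toy data: three images, two descriptors on very different scales.
color_features = [[0.10, 0.20], [0.30, 0.10], [0.20, 0.40]]
texture_features = [[1200.0], [900.0], [1500.0]]

fused = fuse_by_concatenation(color_features, texture_features)
```

Without the standardization step, the texture values (in the hundreds) would dominate any distance computed on the fused vector. Note also that the fused vector has three dimensions where each descriptor alone had one or two, which illustrates the dimensionality growth mentioned above.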
This paper presents a unified approach that can combine many features and classifiers. It requires less training and is better suited to some problems than a naïve method, where all features are simply concatenated and fed independently to each classification algorithm. We expect that this solution will endure beyond the problem solved in this paper.
The introduced fusion approach is validated using an image data set collected at the local fruit and vegetable distribution center and made public. The data set contains 15 produce categories comprising 2633 images collected on-site over a period of 5 months under diverse conditions. The implemented solution achieves a classification error below 2% for the top-one response. With the top two responses, the error is below 1%.
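For readers unfamiliar with the top-one and top-two figures, the following sketch shows how such an error can be computed from ranked candidate lists. It assumes the system returns candidates ordered from most to least likely; the function name `top_k_error` and the toy labels are hypothetical and are not taken from the paper's data set.

```python
def top_k_error(ranked_predictions, true_labels, k):
    """Fraction of images whose true label is not among the first k candidates."""
    misses = sum(
        1
        for candidates, truth in zip(ranked_predictions, true_labels)
        if truth not in candidates[:k]
    )
    return misses / len(true_labels)


# Hypothetical ranked outputs for four images, best candidate first.
ranked = [
    [("apple", "Fuji"), ("apple", "Jonagold")],
    [("apple", "Jonagold"), ("apple", "Fuji")],
    [("pear", "Williams"), ("apple", "Fuji")],
    [("banana", "Cavendish"), ("pear", "Williams")],
]
truth = [
    ("apple", "Fuji"),
    ("apple", "Fuji"),
    ("pear", "Williams"),
    ("apple", "Fuji"),
]

top1 = top_k_error(ranked, truth, k=1)
top2 = top_k_error(ranked, truth, k=2)
```

The top-two error can never exceed the top-one error, because every image counted as correct at k=1 is also correct at k=2.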
Section 2 gives a brief overview of previous work in object recognition and image categorization. Section 3 presents the different kinds of image descriptors used in this paper as well as the produce data set. Section 4 introduces the solution for feature and classifier fusion, and Section 5 presents experimental results. Finally, Section 6 draws the conclusions and future directions.
Literature review
Recently, there has been a lot of activity in the area of image categorization. Previous approaches considered patterns in color, edge, and texture properties (Stehling et al., 2002, Unser, 1986, Pass et al., 1997), as well as low- and middle-level features to distinguish broad classes of images (Rocha and Goldenstein, 2007, Lyu and Farid, 2005, Cutzu et al., 2005, Serrano et al., 2004). In addition, Heidemann (2004) has presented an approach to establish image categories automatically using histograms,
Materials and methods
In general, image categorization relies on combinations of statistical, structural and spectral approaches. Statistical approaches describe the objects using global and local descriptors such as mean, variance, and entropy. Structural approaches represent the object’s appearance using well-known primitives such as patches of important parts of the object. Finally, spectral approaches describe the objects using some spectral space representation such as Fourier spectrum (Gonzalez and Woods, 2007
Feature and classifier fusion
This section shows the motivation and design of the feature and classifier fusion introduced in this paper.
Results and discussions
In the quest for the best classification procedures and features for produce categorization, this paper analyzes several appearance-, color-, texture-, and shape-based image descriptors as well as diverse machine learning techniques such as Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Classification Trees, K-Nearest Neighbors (K-NN), and Ensembles of Trees and LDA (Bishop, 2006). All the experiments hereafter are performed on real data (Section 3.1).
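Of the classifiers listed above, K-NN is the simplest to write out in full. The sketch below is a generic illustration, assuming Euclidean distance and majority voting; it is not the configuration evaluated in this paper, and the function name `knn_predict` and the two-dimensional toy descriptors are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    nearest = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional descriptors for four training images.
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["apple", "apple", "banana", "banana"]

prediction = knn_predict(train_vectors, train_labels, [0.85, 0.15], k=3)
```

Because K-NN relies directly on distances in descriptor space, it is sensitive to the relative scale of the features.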
In the following
Conclusions and future work
Oftentimes, when tackling complex classification problems, just one feature descriptor is not enough to capture the classes’ separability. Therefore, efficient and effective feature fusion policies may become necessary. Although normal feature fusion is quite effective for some problems, it can yield unexpected classification results when the features are not properly normalized and preprocessed. Additionally, it has the drawback of increasing the dimensionality, which might require more training data.
This paper
Acknowledgments
The authors thank the people at the local produce distribution center for their patience and help. This research was funded by FAPESP (Award Numbers 2008/08681-9, 05/58103-3, 07/52015-0, and 08/54443-2) and CNPq (Award Numbers 309254/2007-8, 472402/2007-2, and 551007/2007-9).
References (27)
- et al. Distinguishing paintings from photographs. CVIU (March 2005)
- et al. Learning to detect objects in images via a sparse, part-based representation. TPAMI (November 2004)
- et al. Efficient classification for multi-class problems using modular neural networks. IEEE TNN (January 1995)
- et al. Shape matching and object recognition using low distortion correspondences.
- Pattern Recognition and Machine Learning (2006)
- et al. Veggievision: a produce recognition system.
- et al. Mean shift: a robust approach toward feature space analysis. IEEE TPAMI (May 2002)
- et al. Solving multi-class learning problems via error-correcting output codes. JAIR (January 1996)
- et al. One-shot learning of object categories. IEEE TPAMI (April 2006)
- et al. Digital Image Processing (2007)