Open Access 2014 | Original Paper | Book Chapter

10. Survey of Ground Truth Datasets

Written by: Scott Krig

Published in: Computer Vision Metrics

Publisher: Apress


Abstract

Table B-1 is a brief survey of public domain datasets in various categories, in no particular order. Note that many of the public domain datasets are freely available from universities and government agencies.
Table B-1. Public domain datasets
Name
Labelme
Description
Annotated scenes and objects
Categories
Over 30,000 images; comprehensive; hundreds of categories, including car, person, building, road, sidewalk, sky, tree
Contributions
Open to contributions
Tools and apps
Labelme app for iPhone to contribute to database
Key papers
[67][68]
Owner
MIT CSAIL
Link
Name
SUN
Description
Annotated scenes and objects
Categories
908 scene categories, 3,819 object categories, 131,072 objects, and growing
Contributions
Open to contributions
Tools and apps
Image classifier source code + API, iOS app, Android app
Key papers
[70]
Owner
MIT CSAIL
Link
Name
UC Irvine Machine Learning Repository
Description
Very useful; huge repository of many categories of images
Categories
Too many to list; very wide range of categories, many attributes of the data are specifically searchable and designed into the ground truth datasets
Contributions
Ongoing
Tools and apps
Online assistant to search for specific ground truth datasets
Key papers
[550]
Link
Name
Stanford 3D Scanning Repository
Description
High-resolution 3D scanned images with sub-millimeter accuracy, including XYZ and RGB datasets
Categories
Several scanned 3D objects with 3D point clouds, with resolution ranging from 3,400,000 scanned points to 750,000 triangles and upward
Link
Name
KITTI Benchmark Suite, Karlsruhe Institute of Technology
Description
Stereo datasets for various city driving scenes
Categories
KITTI benchmark suite covers optical flow, odometry, object detection, object orientation estimation; Karlsruhe sequences cover gray scale stereo sequences taken from a moving platform driving through a city; Karlsruhe objects cover gray scale stereo sequences taken from a moving platform driving through a city
Link
Name
Caltech Object Recognition Datasets
Description
Old but still useful; objects in hundreds of categories, some annotated with outlines
Categories
Over 256 categories: animals, plants, people, common objects, common food items, tools, furniture, and more
Key papers
[71]
Link
Name
ImageNet + WordNet
Description
Labeled, annotated, bounding-boxed, and feature-descriptor marked images; over 14,197,122 images indexed into 21,841 sets of similar images, or synsets, created using the sister app WordNet
Categories
Categories include almost anything
Contributions
Images taken from Internet searches
Tools and apps
Source code: ImageNet Large Scale Visual Recognition Challenge (ILSVRC2010) http://www.image-net.org/challenges/LSVRC/2010/index
Key papers
Owner
Images have individual owners; website is © Stanford and Princeton
Link
Name
Middlebury Computer Vision Datasets
Description
Scholarly and comprehensive datasets, and algorithm comparisons over most of the datasets
Categories
Stereo vision (excellent), multi-view stereo (excellent), MRF, Optical Flow (excellent), Color processing
Contributions
Algorithm benchmarks over the datasets can be submitted
Key papers
Several; see website
Owner
Middlebury College
Link
Name
ADL Activity Recognition Dataset
Description
Annotated scenes for activity recognition of common living scenes
Categories
Daily life
Tools and apps
Activity recognition code available (see link below)
Key papers
[73]
Link
Name
MIT Indoor Scenes 67, Scene Classification
Description
Annotated dataset specifically containing diverse indoor scenes
Categories
15,620 images organized into 67 indoor categories, some annotations in Labelme format
Key papers
[74]
Link
Name
RGB-D Object Recognition Dataset, U of W
Description
Dataset contains RGB and corresponding depth images
Categories
300 common household objects in 51 categories organized using WordNet, similar to the ImageNet style (ImageNet dataset reviewed above); each object recorded in RGB and Kinect depth at various rotational angles and viewpoints
Key papers
[75]
Link
Name
NYU Depth Datasets
Description
Annotated dataset of indoor scenes using RGB-D datasets + accelerometer data
Categories
Over 500,000 frames, many different indoor scenes and scene types, thousands of classes, accelerometer data, inpainted and raw depth information
Tools and apps
Matlab toolbox + g++ code
Key papers
[76]
Link
Name
Intel Labs Seattle - Egocentric Recognition of Handled Objects
Description
Annotated dataset for egocentric handled objects using a wearable camera
Categories
Over 42 everyday objects under varied lighting, occlusion, perspectives; over 6GB total video sequence data
Key papers
[77] [78]
Link
Name
Georgia Tech GTEA Egocentric Activities - Gaze(+)
Description
Annotated dataset for egocentric handled objects using a wearable camera
Categories
Many everyday objects under varied lighting, occlusion, perspectives
Tools and apps
Code library of vision functions and mathematical functions
Key papers
[79]
Link
Name
CUReT: Columbia-Utrecht Reflectance and Texture Database
Description
Extensive datasets of texture samples under varying viewing and illumination directions
Categories
Over 60 different samples with over 200 viewing and illumination combinations, BRDF measurement database, more
Key papers
[80]
Link
Name
MIT Flickr Material Surface Category Dataset
Description
Dataset for identifying material categories including fabric, glass, metal, plastic, water, foliage, leather, paper, stone, wood
Categories
Contains images of materials for surface property analysis, in contrast to object or texture analysis; 10 categories of materials + 100 images in each category
Key papers
[81]
Link
Name
Faces in the Wild
Description
Collection of over 13,000 images of faces annotated with names of people
Categories
Faces
Key papers
[82]
Link
Name
The CMU Multi-PIE Face Database
Description
Annotated face and emotion database with multiple pose angles
Categories
750,000 face images of 337 subjects, taken over a period of several months across 15 viewpoints and 19 illuminations; annotated facial expressions
Key papers
[83]
Link
Name
Stanford 40 Actions
Description
People actions image database
Categories
People performing 40 actions, bounding-box annotations, 9,532 images, 180-300 images per action class
Key papers
[84]
Link
Name
NORB 3D Object Recognition from Shape
Description
NYU object recognition benchmark
Categories
Stereo image pairs; 194,400 total images of 50 toys under 36 azimuths, 9 elevations, and 6 lighting conditions
Tools and apps
EBLEARN C++ learning and vision library, LUSH programming language, VisionGrader object detection tool
Key papers
[85]
Link
Name
Optical Flow Algorithm Evaluation
Description
Tools and data for optical flow evaluation purposes
Categories
Many optical flow sequence ground truth datasets
Tools and apps
Tool for generating optical flow data, some optical flow code algorithms
Key papers
[86]
Link
Name
PETS Crowd Sensing Dataset Challenge
Description
Multi-sensor camera views composed into a dataset containing sequences of crowd activities
Categories
Challenge goals include crowd estimation, density, tracking of specific people, flow of crowd
Key papers
[94]
Link
Name
I-LIDS
Description
Security-oriented challenge ground truth dataset to enable competitive benchmarking including scenes for locating parked vehicles, abandoned baggage, secure perimeters, and doorway surveillance
Categories
Various categories in the security domain
Contributions
No, funded by UK government
Tools and apps
n.a.
Key papers
n.a.
Link
Name
TRECVID, NIST, US Government
Description
NIST-sponsored public project spanning 2001-2013 for research in automatic segmentation, indexing, and content-based video retrieval
Categories
1. Semantic indexing (SIN) 2. Known-item search (KIS) 3. Instance search (INS) 4. Multimedia event detection (MED) 5. Multimedia event recounting (MER) 6. Surveillance event detection (SED); content includes natural scenes, humans, vegetation, pets, office objects, and more
Contributions
Annually by U.S. Government
Tools and apps
The Framework For Detection Evaluations (F4DE) tool, story evaluation tool, and others
Key papers
[95]
Link
Name
Microsoft Research Cambridge
Description
Pixel-wise labeled or segmented objects
Categories
Several hundred objects
Link
Name
Optical Flow Algorithm Evaluation
Description
Volume-rendered video scenes for optical flow algorithm benchmarking
Categories
Various scenes for optical flow; mainly synthetic sequences generated via ray tracing
Contributions
n.a.
Tools and apps
Yes, Tcl/Tk
Key papers
[96]
Link
Name
Pascal Object Recognition VOC Challenge Dataset
Description
Standardized ground truth data for a research challenge spanning 2005-2013 in the area of object recognition; competitions include classification, detection, segmentation, and actions over each of 20 classes of data
Categories
Consists of over 20 classes of objects in scenes including persons, animals, vehicles, indoor objects
Contributions
Via the Pascal conference
Tools and apps
Includes a developer kit and other useful software for labeling data and database access, and tools for reporting benchmarks results
Key papers
[97]
Link
Name
CRCV
Description
Very extensive; University of Central Florida’s Center for Research in Computer Vision hosts a large collection of research data covering several domains
Categories
Comprehensive set of categories (aerial views, ground views) including dynamic textures, multi-modal iPhone sensor ground truth data (video, accelerometer, gyro), several categories of human actions, crowd segmentation, parking lots, and much more
Contributions
n.a.
Tools and apps
n.a.
Key papers
[98]
Link
Name
UCB Contour Detection and Image Segmentation
Description
U.C. Berkeley Computer Vision group provides a complete set of ground truth data, algorithms, and performance evaluations for contour detection, image segmentation, and some interest point methods
Categories
500 ground truth images on natural scenes containing a wide range of subjects and labeled ground truth data
Contributions
n.a.
Tools and apps
Benchmarking code (globalPB for CPU and GPU)
Key papers
[99]
Link
Name
CAVIAR Ground Truth Videos for Context-Aware Vision
Description
Project site containing labeled and annotated ground truth data of humans in cities, shopping centers, and indoor office scenes; 52 videos with 90K frames total
Categories
Both scripted and real-life activities in shopping centers and offices, including walking, browsing, meeting, fighting, window shopping, entering/exiting stores
Contributions
n.a.
Tools and apps
n.a.
Key papers
[100]
Link
Name
Boston University Computer Science Department
Description
Image and video database covering a wide range of subject categories
Categories
Video sequences for head tracking and sign language; some datasets are labeled; still images for hand tracking, multi-face tracking, vehicle tracking, more
Contributions
Anonymous FTP
Tools and apps
n.a.
Key papers
[101]
Link
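
Several of the image counts quoted in Table B-1 can be cross-checked with simple arithmetic. The short Python sketch below is only an illustrative sanity check, not part of any dataset's tooling: every input figure is taken from the table, the variable names are made up for this example, and the NORB calculation assumes that each capture is a stereo pair contributing two images.

```python
# Sanity checks on image counts quoted in Table B-1.
# All input figures come from the table; variable names are illustrative only.

# NORB: 50 toys, 36 azimuths, 9 elevations, 6 lighting conditions.
# Assumption: each capture is a stereo pair, i.e. two images.
IMAGES_PER_STEREO_PAIR = 2
norb_total = 50 * 36 * 9 * 6 * IMAGES_PER_STEREO_PAIR

# MIT Flickr Material Surface Category Dataset: 10 categories, 100 images each.
materials_total = 10 * 100

# Stanford 40 Actions: 9,532 images spread over 40 action classes.
stanford40_mean = 9532 / 40

# MIT Indoor Scenes 67: 15,620 images spread over 67 indoor categories.
indoor67_mean = 15620 / 67

# ImageNet: 14,197,122 images indexed into 21,841 synsets.
imagenet_mean = 14197122 / 21841

print(f"NORB total images:              {norb_total}")
print(f"Material dataset total images:  {materials_total}")
print(f"Stanford 40 mean per class:     {stanford40_mean:.1f}")
print(f"Indoor 67 mean per category:    {indoor67_mean:.1f}")
print(f"ImageNet mean per synset:       {imagenet_mean:.1f}")
```

Under the stereo-pair assumption, the NORB product matches the 194,400 images listed in the table, and the Stanford 40 mean falls inside the quoted range of 180-300 images per action class.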
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits any noncommercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this chapter or parts of it.
The images or other third party material in this chapter are included in the chapter’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Metadata
Title
Survey of Ground Truth Datasets
Written by
Scott Krig
Copyright year
2014
Publisher
Apress
DOI
https://doi.org/10.1007/978-1-4302-5930-5_10