Name | Labelme |
---|---|
Description | Annotated scenes and objects |
Categories | Over 30,000 images; comprehensive; hundreds of categories, including car, person, building, road, sidewalk, sky, tree |
Contributions | Open to contributions |
Tools and apps | Labelme app for iPhone to contribute to database |
Key papers | [67][68] |
Owner | MIT CSAIL |
Link |
Name | SUN |
---|---|
Description | Annotated scenes and objects |
Categories | 908 scene categories, 3,819 object categories, 131,072 objects, and growing |
Contributions | Open to contributions |
Tools and apps | Image classifier source code + API, iOS app, Android app |
Key papers | [70] |
Owner | MIT CSAIL |
Link |
Name | UC Irvine Machine Learning Repository |
---|---|
Description | Very useful; huge repository of many categories of images |
Categories | Too many to list; very wide range of categories; many attributes of the data are specifically searchable and designed into the ground truth datasets |
Contributions | Ongoing |
Tools and apps | Online assistant to search for specific ground truth datasets |
Key papers | [550] |
Link |
Name | Stanford 3D Scanning Repository |
---|---|
Description | High-resolution 3D scanned images with sub-millimeter accuracy, including XYZ and RGB datasets |
Categories | Several scanned 3D objects with 3D point clouds, resolution ranging from 3,400,000 scanned points to 750,000 triangles and upwards |
Link |
Name | KITTI Benchmark Suite, Karlsruhe Institute of Technology |
---|---|
Description | Stereo datasets for various city driving scenes |
Categories | KITTI benchmark suite covers optical flow, odometry, object detection, object orientation estimation; Karlsruhe sequences cover gray scale stereo sequences taken from a moving platform driving through a city; Karlsruhe objects cover gray scale stereo sequences taken from a moving platform driving through a city |
Link |
Name | Caltech Object Recognition Datasets |
---|---|
Description | Old but still useful; objects in hundreds of categories, some annotated with outlines |
Categories | Over 256 categories: animals, plants, people, common objects, common food items, tools, furniture, more |
Key papers | [71] |
Link |
http://authors.library.caltech.edu/7694/ (latest versions of 101 and 256) |
Name | Imagenet + Wordnet |
---|---|
Description | Labeled, annotated, bounding-boxed, and feature-descriptor marked images; 14,197,122 images indexed into 21,841 sets of similar images, or synsets, created using sister app Wordnet |
Categories | Categories include almost anything |
Contributions | Images taken from Internet searches |
Tools and apps | Online controls: http://www.image-net.org/download-API; source code: ImageNet Large Scale Visual Recognition Challenge (ILSVRC2010) http://www.image-net.org/challenges/LSVRC/2010/index |
Key papers | [72]; several others, see http://www.image-net.org/about-publication |
Owner | Images have individual owners; website is © Stanford and Princeton |
Link |
Name | Middlebury Computer Vision Datasets |
---|---|
Description | Scholarly and comprehensive datasets, and algorithm comparisons over most of the datasets |
Categories | Stereo vision (excellent), multi-view stereo (excellent), MRF, Optical Flow (excellent), Color processing |
Contributions | Algorithm benchmarks over the datasets can be submitted |
Key papers | Several; see website |
Owner | Middlebury College |
Link |
Name | ADL Activity Recognition Dataset |
---|---|
Description | Annotated scenes for activity recognition of common living scenes |
Categories | Daily life |
Tools and apps | Activity recognition code available (see link below) |
Key papers | [73] |
Link |
Name | MIT Indoor Scenes 67, Scene Classification |
---|---|
Description | Annotated dataset specifically containing diverse indoor scenes |
Categories | 15,620 images organized into 67 indoor categories, some annotations in Labelme format |
Key papers | [74] |
Link |
Name | RGB-D Object Recognition Dataset, U of W |
---|---|
Description | Dataset contains RGB and corresponding depth images |
Categories | 300 common household objects, 51 categories using Wordnet similar to Imagenet style (Imagenet dataset reviewed above), each object recorded in RGB and Kinect depth at various rotational angles and viewpoints |
Key papers | [75] |
Link |
Name | NYU Depth Datasets |
---|---|
Description | Annotated dataset of indoor scenes using RGB-D datasets + accelerometer data |
Categories | Over 500,000 frames, many different indoor scenes and scene types, thousands of classes, accelerometer data, inpainted and raw depth information |
Tools and apps | Matlab toolbox + g++ code |
Key papers | [76] |
Link |
Name | Intel Labs Seattle - Egocentric Recognition of Handled Objects |
---|---|
Description | Annotated dataset for egocentric handled objects using a wearable camera |
Categories | Over 42 everyday objects under varied lighting, occlusion, perspectives; over 6GB total video sequence data |
Key papers | [77] [78] |
Link |
Name | Georgia Tech GTEA Egocentric Activities - Gaze(+) |
---|---|
Description | Annotated dataset for egocentric handled objects using a wearable camera |
Categories | Many everyday objects under varied lighting, occlusion, perspectives |
Tools and apps | Code library of vision functions and mathematical functions |
Key papers | [79] |
Link |
Name | CUReT: Columbia-Utrecht Reflectance and Texture Database |
---|---|
Description | Extensive datasets of texture samples and illumination directions |
Categories | Over 60 different samples with over 200 viewing and illumination combinations, BRDF measurement database, more |
Key papers | [80] |
Link |
Name | MIT Flickr Material Surface Category Dataset |
---|---|
Description | Dataset for identifying material categories including fabric, glass, metal, plastic, water, foliage, leather, paper, stone, wood |
Categories | Contains images of materials for surface property analysis, in contrast to object or texture analysis; 10 categories of materials + 100 images in each category |
Key papers | [81] |
Link |
Name | Labeled Faces in the Wild |
---|---|
Description | Collection of over 13,000 images of faces annotated with names of people |
Categories | Faces |
Key papers | [82] |
Link |
Name | The CMU Multi-PIE Face Database |
---|---|
Description | Annotated face and emotion database with multiple pose angles |
Categories | 750,000 face images of 337 subjects taken over a period of several months, across 15 viewpoints and 19 illuminations, with annotated facial expressions |
Key papers | [83] |
Link |
Name | Stanford 40 Actions |
---|---|
Description | People actions image database |
Categories | People performing 40 actions, bounding-box annotations, 9,532 images, 180-300 images per action class |
Key papers | [84] |
Link |
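
As a quick consistency check on the Stanford 40 Actions figures above, the stated total and class count imply an average class size that should fall inside the quoted 180-300 range. This is only a sketch using the numbers from the table; the variable names are illustrative.

```python
# Figures taken from the Stanford 40 Actions table entry above.
total_images = 9532
action_classes = 40

# Average number of images per action class implied by those figures.
mean_per_class = total_images / action_classes

# The table quotes 180-300 images per class; the mean should lie in that range.
within_quoted_range = 180 <= mean_per_class <= 300
print(f"mean images per class: {mean_per_class:.1f}, within range: {within_quoted_range}")
```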
Name | NORB 3D Object Recognition from Shape |
---|---|
Description | NYU object recognition benchmark |
Categories | Stereo image pairs; 194,400 total images of 50 toys under 36 azimuths, 9 elevations, and 6 lighting conditions |
Tools and apps | EBLEARN C++ learning and vision library, LUSH programming language, VisionGrader object detection tool |
Key papers | [85] |
Link |
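
The NORB image count above can be reproduced from the listed capture parameters. The sketch below assumes that each combination of toy, azimuth, elevation, and lighting condition yields one stereo pair, and that each pair contributes two images; that reading is an assumption, though it matches the total quoted in the table.

```python
# Capture parameters taken from the NORB table entry above.
toys = 50
azimuths = 36
elevations = 9
lighting_conditions = 6

# Assumption: one stereo pair per parameter combination, two images per pair.
images_per_stereo_pair = 2

stereo_pairs = toys * azimuths * elevations * lighting_conditions
total_images = stereo_pairs * images_per_stereo_pair

print(f"stereo pairs: {stereo_pairs}, total images: {total_images}")
```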
Name | Optical Flow Algorithm Evaluation |
---|---|
Description | Tools and data for optical flow evaluation purposes |
Categories | Many optical flow sequence ground truth datasets |
Tools and apps | Tool for generating optical flow data, some optical flow code algorithms |
Key papers | [86] |
Link |
Name | PETS Crowd Sensing Dataset Challenge |
---|---|
Description | Multi-sensor camera views composed into a dataset containing sequences of crowd activities |
Categories | Challenge goals include crowd estimation, density, tracking of specific people, flow of crowd |
Key papers | [94] |
Link |
Name | I-LIDS |
---|---|
Description | Security-oriented challenge ground truth dataset to enable competitive benchmarking including scenes for locating parked vehicles, abandoned baggage, secure perimeters, and doorway surveillance |
Categories | Various categories in the security domain |
Contributions | No, funded by UK government |
Tools and apps | n.a. |
Key papers | n.a. |
Link |
Name | TRECVID, NIST, US Government |
---|---|
Description | NIST-sponsored public project spanning 2001-2013 for research in automatic segmentation, indexing, and content-based video retrieval |
Categories | 1. Semantic indexing (SIN) 2. Known-item search (KIS) 3. Instance search (INS) 4. Multimedia event detection (MED) 5. Multimedia event recounting (MER) 6. Surveillance event detection (SED); content includes natural scenes, humans, vegetation, pets, office objects, more |
Contributions | Annually by U.S. Government |
Tools and apps | The Framework For Detection Evaluations (F4DE) tool, story evaluation tool, and others |
Key papers | [95] |
Link |
Name | Microsoft Research Cambridge |
---|---|
Description | Pixel-wise labeled or segmented objects |
Categories | Several hundred objects |
Link |
Name | Optical Flow Algorithm Evaluation |
---|---|
Description | Volume-rendered video scenes for optical flow algorithm benchmarking |
Categories | Various scenes for optical flow; mainly synthetic sequences generated via ray tracing |
Contributions | n.a. |
Tools and apps | Yes, Tcl/Tk |
Key papers | [96] |
Link |
Name | Pascal Object Recognition VOC Challenge Dataset |
---|---|
Description | Standardized ground truth data for a research challenge spanning 2005-2013 in the area of object recognition; competitions include classification, detection, segmentation, and actions over each of 20 classes of data |
Categories | Consists of 20 classes of objects in scenes including persons, animals, vehicles, indoor objects |
Contributions | Via the Pascal conference |
Tools and apps | Includes a developer kit and other useful software for labeling data and database access, and tools for reporting benchmark results |
Key papers | [97] |
Link |
Name | CRCV |
---|---|
Description | Very extensive; University of Central Florida’s Center for Research in Computer Vision hosts a large collection of research data covering several domains |
Categories | Comprehensive set of categories (aerial views, ground views) including dynamic textures, multi-modal iPhone sensor ground truth data (video, accelerometer, gyro), several categories of human actions, crowd segmentation, parking lots, much more |
Contributions | n.a. |
Tools and apps | n.a. |
Key papers | [98] |
Link |
Name | UCB Contour Detection and Image Segmentation |
---|---|
Description | U.C. Berkeley Computer Vision group provides a complete set of ground truth data, algorithms, and performance evaluations for contour detection, image segmentation, and some interest point methods |
Categories | 500 ground truth images of natural scenes containing a wide range of subjects and labeled ground truth data |
Contributions | n.a. |
Tools and apps | Benchmarking code (globalPB for CPU and GPU) |
Key papers | [99] |
Link |
Name | CAVIAR Ground Truth Videos for Context-Aware Vision |
---|---|
Description | Project site containing labeled and annotated ground truth data of humans in cities and shopping centers: 52 videos with 90K frames total, showing people in indoor office scenes and shopping centers |
Categories | Both scripted and real-life activities in shopping centers and offices, including walking, browsing, meeting, fighting, window shopping, entering/exiting stores |
Contributions | n.a. |
Tools and apps | n.a. |
Key papers | [100] |
Link |
Name | Boston University Computer Science Department |
---|---|
Description | Image and video database covering a wide range of subject categories |
Categories | Video sequences for head tracking and sign language; some datasets are labeled; still images for hand tracking, multi-face tracking, vehicle tracking, more |
Contributions | Anonymous FTP |
Tools and apps | n.a. |
Key papers | [101] |
Link |