Image Matching: Graffiti	University of Oxford, United Kingdom 8 sequences with 6 images each showing different structured and textured planar scenes. Each sequence shows different image transformations. The transformations include changes in viewpoint, zoom, blur, and rotation. 48 images with homography ground truth - graffiti.tar.gz (1158074515 Bytes). Evaluation protocol, related papers, top scores, and features.

Image Matching: Patches	Microsoft, USA The data is taken from Photo Tourism reconstructions from Trevi Fountain (Rome), Notre Dame (Paris) and Half Dome (Yosemite). Each dataset consists of a series of corresponding patches, which are obtained by projecting 3D points from Photo Tourism reconstructions back into the original images. Patch data Download Evaluation protocol, related papers, top scores, and features.

Image Matching: Planar Scenes	INRIA, France This dataset describes 5 different image changes using in total 449 images with homography ground truth. The images have medium resolution. Evaluation protocol, related papers, top scores, and features.

Image Retrieval: Paris	University of Oxford, United Kingdom The Paris Dataset consists of 6412 images. Images have high resolution and are in JPEG format. Paris images part1 Download Paris images part2 Download Evaluation protocol, related papers, top scores, and features.

Image Retrieval: Oxford Buildings	University of Oxford, United Kingdom The Oxford Buildings Dataset consists of 5062 images. Images have high resolution and are in JPEG format. Images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: Caltech101	California Institute of Technology, United States Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. The size of each image is roughly 300 x 200 pixels. L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR 2004, Workshop on Generative-Model Based Vision. 2004. You can download the collection of images here Evaluation protocol, related papers, top scores, and features.

Image Classification: Caltech256	California Institute of Technology, United States Collection of 30607 images of objects corresponding to 256 categories. The image resolution varies. You can view the collection of images here You can download the collection of images here Evaluation protocol, related papers, top scores, and features.

Image Classification: Scene 15	Ecole Normale Superieure, France This is a dataset of 15 Urban and Natural Scene categories. Each category consists of 210-410 images. Download images Evaluation protocol, related papers, top scores, and features.

Image Classification: Birds	Ecole Normale Superieure, France This database contains 600 images of 6 different classes of birds. The images are color JPEG of variable resolution. View and download images Evaluation protocol, related papers, top scores, and features.

Image Classification: Butterflies	Ecole Normale Superieure, France This database contains 619 images of 7 different classes of butterflies. The images are color JPEG of variable resolution. View and download images Evaluation protocol, related papers, top scores, and features.

Image Classification: Flowers17	University of Oxford, United Kingdom This is a 17-category flower dataset with 80 images for each class. Images have variable resolution and are in JPEG format. Flower images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: Flowers102	University of Oxford, United Kingdom This is a 102-category flower dataset with 40 to 258 images for each class. Images have variable resolution and are in JPEG format. Flower images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: Soccer	INRIA, France This is a dataset consisting of 7 soccer teams, containing 40 images per class. Images have variable resolution and are in JPEG format. 280 Images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: Bottles	University of Oxford, United Kingdom 247 images of bottles. Images have JPEG format and are of variable resolution. Images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: Texture-25	Ecole Normale Superieure, France This dataset consists of 25 texture classes, 40 samples each. The images are grayscale JPGs and have a medium resolution (approximately 640x480 pixels). Texture samples View Evaluation protocol, related papers, top scores, and features.

Image Classification: Camels	University of Oxford, United Kingdom 356 images of camels. Images have JPEG format and are of variable resolution. Images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: Google "Things"	University of Oxford, United Kingdom 520 images collected from a Google search for "things". Images have JPEG format and are of variable resolution. Images Download Evaluation protocol, related papers, top scores, and features.

Image Classification: ETH Face Pose	ETH, Switzerland 10,545 range images of 20 persons. At the beginning, each person looks straight into the camera before moving their head. Afterwards, the scanner captures range images at 28 fps while each person turns their head. The resulting range images have a resolution of 640x480 pixels, and a face typically consists of about 150x200 depth values. The head pose range covers about ±90 degrees of yaw and ±45 degrees of pitch rotation. Roll rotation is not included in this data set. Download images Evaluation protocol, related papers, top scores, and features.

Object Recognition: Pascal	PASCAL This dataset consists of 4 categories, 20 objects each. The images are color JPEGs and have a medium resolution. Example images view Evaluation protocol, related papers, top scores, and features.

Object Recognition: Caltech Pedestrian	California Institute of Technology, United States The dataset consists of 10 hours of 640x480 30Hz video taken from a driving vehicle. About 250,000 frames with a total of 350,000 bounding boxes and 2300 unique pedestrians were annotated. Training data download Evaluation protocol, related papers, top scores, and features.

Object Recognition: GRAZ1	INRIA, France This dataset describes 4 different categories of images. In total, there are 1317 images of medium resolution (640x480 pixels). Ground truth is available for some images. Evaluation protocol, related papers, top scores, and features.

Object Recognition: GRAZ2	INRIA, France This dataset describes 3 different categories of images and one counter-class. In total, there are 1476 images of medium resolution (640x480 pixels). The images contain objects of high complexity on highly cluttered backgrounds. Ground truth is available for some images. Evaluation protocol, related papers, top scores, and features.

Object Recognition: INRIA person	INRIA, France This dataset is a collection of person images separated into two categories: (a) 2573 original images with corresponding annotation files, and (b) positive images in normalized 64x128 pixel format with original negative images. Download images Evaluation protocol, related papers, top scores, and features.

Object Recognition: ETH pedestrian	ETH, Switzerland Training data of walking pedestrians in busy scenarios. Download annotations and videos Evaluation protocol, related papers, top scores, and features.

Object Recognition: ETH-5	ETH, Switzerland This dataset consists of 255 test images and features 5 diverse shape-based classes (apple logos, bottles, giraffes, mugs, and swans). Images Download Evaluation protocol, related papers, top scores, and features.

Object Recognition: ETH-9	ETH, Switzerland The ETH-9 is a larger database of shape categories, created by merging the above shape classes with 4x50 closed shapes. Images Download Evaluation protocol, related papers, top scores, and features.

Object Recognition: ETH vehicles	ETH, Switzerland This dataset consists of 1175 stereo camera pairs acquired with a setup mounted on top of a moving vehicle. The stereo setup has a fixed baseline, and the cameras are calibrated internally and with respect to each other. Download images Evaluation protocol, related papers, top scores, and features.

Object Recognition: ETH pedestrians	ETH, Switzerland This dataset consists of 12,298 annotated pedestrians in roughly 2,000 frames. A pair of cameras mounted on a mobile platform was used to record data at a resolution of 640 x 480 and a frame rate of 13-14 FPS. Demonstration of the system View Download images Evaluation protocol, related papers, top scores, and features.

Object Modeling and Recognition: Multi view stereo 1	Ecole Normale Superieure, France A dataset of 9 different objects used for 3D object recognition tests. Each object is represented by 7 to 12 stereo pairs. View and download images Evaluation protocol, related papers, top scores, and features.

Object Modeling and Recognition: Multi view stereo 2	Ecole Normale Superieure, France This database consists of 8 objects represented by 8-14 images as well as 51 cluttered test shots containing multiple objects. The images are color JPEGs with resolutions of 1.2 Mpix (1280 x 960) and 3.7 Mpix (2200 x 1700). Download images Evaluation protocol, related papers, top scores, and features.

Object Modeling and Recognition: Multi view stereo 3	Ecole Normale Superieure, France This is a collection of 10 datasets, each containing 24 images of an object. The images are high resolution. Each dataset is provided with camera parameters and extracted apparent contours for each image. View and download images Evaluation protocol, related papers, top scores, and features.

Action Classification: Hollywood	INRIA, France This dataset consists of 663 video samples describing 8 different human actions from 32 movies. Download video Evaluation protocol, related papers, top scores, and features.

Action Classification: Multi-KTH	University of Surrey, United Kingdom Motion sequence with 6 persons, each performing a different KTH action. Includes camera motion, zoom and structured background with multiple planes. Video sequence - multi-kth.avi (8696206 Bytes). 753 frames and bounding box maps for each person - multi-kth.tar.gz (1158074515 Bytes). Evaluation protocol, related papers, top scores, and features.

Action Classification: KTH	KTH, Sweden This is a video database containing 2391 sequences. The sequences describe 6 different types of human actions performed by 25 subjects in 4 different scenarios. Evaluation protocol, related papers, top scores, and features.

Action Classification: Weizmann	Weizmann Institute of Science, Israel This is a collection of 90 low resolution (180x144 pixels, deinterlaced 50 fps) video sequences. There are 9 different people, each performing 10 natural actions. Evaluation protocol, related papers, top scores, and features.

Action Classification: Mouse Behavior	University of California, San Diego, USA This dataset consists of 7 video sequence parts showing mouse behavior during different activities (drink, sleep, eat, explore, groom). Evaluation protocol, related papers, top scores, and features.

Action Classification: Sign Language	University of Oxford, United Kingdom This is a dataset of signing sequences comprising 6000 frames. Images have variable resolution and are in JPEG format. Sequences Download Evaluation protocol, related papers, top scores, and features.

Action Classification: Facial Expressions	University of California, San Diego, USA This dataset consists of 4 video sequence parts. Evaluation protocol, related papers, top scores, and features.