Dataset: Aesthetics Based on Fashion Images

An approach for ranking images by pooling the knowledge and experience of crowdsourced annotators is presented in [1,2]. A dataset is introduced to address the highly subjective and complex problem of interpreting fashion and assessing the aesthetic qualities of images. The dataset includes images fully labelled with attributes of body shape (s), top clothing (t) and bottom clothing (b), where an image configuration (s t b) gives the state of each attribute. It also includes pairwise aesthetic assessments performed on these images.


Images

In total there are 1064 images across 120 configurations (4 body shapes x 5 top clothing categories x 6 bottom clothing categories), where a configuration represents an image of a person of a given body shape wearing specific top and bottom clothing categories.

From left to right the body shape attributes shown are apple, column, hourglass and pear.

An image is identified by its configuration (config) number and its index within that configuration. So image 1_10.jpg belongs to the first configuration and is the 10th image of that configuration. The specific states for the configurations, as given in config_state_table.txt, are as follows:

Body shape (s): 1: apple, 2: column, 3: hourglass, 4: pear

Top clothing (t): 1: fitted top, 2: fitted jacket, 3: loose jacket, 4: loose top, 5: ruffled top

Bottom clothing (b): 1: flared trousers, 2: fitted trousers, 3: straight trousers, 4: flared skirt, 5: fitted skirt, 6: straight skirt

For example, image 1_10.jpg of config 1 (s t b: 1 1 1) in config_state_table.txt shows a person with an apple body shape wearing a fitted top and flared trousers. The states for the remaining configurations can be determined in the same way.
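The naming scheme and state codes above can be decoded with a few lines of Python. This is a minimal sketch, not part of the dataset: the helper names parse_image_name and describe_states are hypothetical, and the layout of config_state_table.txt is not described here, so the (s t b) triple is passed in directly rather than read from that file.

```python
import re

# State codes copied from the lists above.
BODY_SHAPES = {1: "apple", 2: "column", 3: "hourglass", 4: "pear"}
TOPS = {1: "fitted top", 2: "fitted jacket", 3: "loose jacket",
        4: "loose top", 5: "ruffled top"}
BOTTOMS = {1: "flared trousers", 2: "fitted trousers", 3: "straight trousers",
           4: "flared skirt", 5: "fitted skirt", 6: "straight skirt"}

# Image names follow the pattern <config>_<index>.jpg, e.g. 1_10.jpg.
_IMAGE_NAME = re.compile(r"^(\d+)_(\d+)\.jpg$")


def parse_image_name(name):
    """Split an image name into (config number, index within that config)."""
    match = _IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def describe_states(s, t, b):
    """Translate an (s t b) triple into human-readable attribute names."""
    return BODY_SHAPES[s], TOPS[t], BOTTOMS[b]


config, index = parse_image_name("1_10.jpg")
print(config, index)              # config 1, 10th image
print(describe_states(1, 1, 1))   # states of config 1 (s t b: 1 1 1)
```

The three dictionaries together span 4 x 5 x 6 = 120 combinations, which matches the number of configurations in the dataset.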

Aesthetic comparisons

Ten annotators who follow fashion were recruited, and together they scored a total of 70000 image pairs. The expert pairs and the repeated control pairs were each set to 700 image pairs; details can be found in [1,2].

The files aesthetic_01.txt to aesthetic_10.txt contain the pairwise comparisons performed by the 10 annotators, and aesthetic_exp.txt contains the annotations performed by the expert. The annotations given by the 10 annotators are consistent with the evaluation performed in [1,2]. Each comparison gives an image pair followed by the preferred choice, where 1 indicates a preference for the first (left) image and 2 indicates a preference for the second (right) image.
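A comparison record of this form can be read as sketched below. The sketch assumes one comparison per line with whitespace-separated fields, as in the first comparison of aesthetic_01.txt (27_6.jpg 28_3.jpg 1); the names Comparison, parse_comparison and load_comparisons are hypothetical and not part of the dataset.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    left: str    # first image of the pair
    right: str   # second image of the pair
    choice: int  # 1 = left image preferred, 2 = right image preferred

    @property
    def preferred(self):
        """Name of the image the annotator preferred."""
        return self.left if self.choice == 1 else self.right


def parse_comparison(line):
    """Parse one record, assumed to be '<left> <right> <choice>'."""
    left, right, choice = line.split()
    choice = int(choice)
    if choice not in (1, 2):
        raise ValueError(f"choice must be 1 or 2, got {choice}")
    return Comparison(left, right, choice)


def load_comparisons(path):
    """Read every non-empty line of an annotation file (assumed layout)."""
    with open(path) as handle:
        return [parse_comparison(line) for line in handle if line.strip()]


first = parse_comparison("27_6.jpg 28_3.jpg 1")
print(first.preferred)  # the left image of the pair
```

Under the same assumptions, load_comparisons("aesthetic_01.txt") would return the comparisons of the first annotator in file order.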

As an example, aesthetic_01.txt contains a total of 7000 aesthetic comparisons provided by the first annotator. The first comparison in this file is:

27_6.jpg 28_3.jpg 1

Here the annotator chooses the left image, 27_6.jpg, from the image pair. Of the 7000 comparisons, 700 image pairs have also been annotated by the expert (aesthetic_exp.txt), and a further 700 are repeated in all the annotation files, so they have been annotated by the remaining 9 annotators as well. These expert and control image pairs are placed at regular intervals within each annotated file: the expert image pairs are at positions 5, 15, 25, ..., 6995 and the repeated control pairs are at positions 10, 20, 30, ..., 7000.
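The placement of expert and control pairs can be expressed as a simple rule on the position of a comparison. The following is a minimal sketch, assuming that positions are counted from 1 within an annotator file; pair_kind is a hypothetical helper name, and the label "regular" for the remaining pairs is chosen here for illustration.

```python
from collections import Counter


def pair_kind(position):
    """Classify a 1-based position within an annotator file.

    Expert pairs sit at 5, 15, 25, ..., 6995 and repeated control
    pairs at 10, 20, 30, ..., 7000; all other positions are regular.
    """
    if position % 10 == 5:
        return "expert"
    if position % 10 == 0:
        return "control"
    return "regular"


# Tally the three kinds over the 7000 positions of one annotator file.
counts = Counter(pair_kind(p) for p in range(1, 7001))
print(counts["expert"], counts["control"], counts["regular"])
```

The tally reproduces the figures given above: 700 expert pairs and 700 control pairs, leaving 5600 pairs specific to each annotator.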


References

[1] A. Gaur and K. Mikolajczyk. Ranking Images Based on Aesthetic Qualities. In Proceedings of the International Conference on Pattern Recognition (ICPR), August 2014.

[2] A. Gaur and K. Mikolajczyk. Aesthetics Based Assessment and Ranking of Fashion Images. Computer Vision and Image Understanding (CVIU), Submitted in October 2014.


Download

fashion_data.zip

Contact: A. Gaur a.gaur@surrey.ac.uk.