# Dataset Preparation
## Supported Datasets
| FR Dataset | Description | NR Dataset | Description |
|---|---|---|---|
| PIPAL | 2AFC | FLIVE(PaQ-2-PiQ) | Tech & Aesthetic |
| BAPPS | 2AFC | SPAQ | Mobile |
| PieAPP | 2AFC | AVA | Aesthetic |
| KADID-10k | | KonIQ-10k(++) | |
| LIVEM | | LIVEChallenge | |
| LIVE | | PIQ2023 | Portrait dataset |
| TID2013 | | GFIQA | Face IQA dataset |
| TID2008 | | | |
| CSIQ | | | |
Please see the Awesome Image Quality Assessment list for more details.
## Resources

Here are some other resources for downloading the datasets:
## Interface of Dataloader
We create general interfaces for FR and NR datasets in `pyiqa/data/general_fr_dataset.py` and `pyiqa/data/general_nr_dataset.py`. The main argument is `opt`, which contains all dataset options, including:

- `dataroot_target`: path of the target image folder.
- `dataroot_ref` [optional]: path of the reference image folder.
- `meta_info_file`: file containing meta information of the images, including relative image paths, MOS labels, and other labels.
- `augment` [optional]: list of data augmentation transforms:
    - `hflip`: flip the input images or pairs.
    - `random_crop`: int or tuple; randomly crop the input images or pairs.
- `split_file` [optional]: `train/val/test` split file (`*.pkl`). If not specified, the whole dataset is loaded.
- `split_index` [optional]: which split to use, default `1`; only valid when `split_file` is specified.
- `dmos_max`: some datasets use difference of MOS (DMOS). Setting this to a non-zero value converts DMOS to MOS with `mos = dmos_max - dmos`.
- `phase`: phase label, one of `[train, val, test]`.
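As a concrete illustration, an `opt` dictionary for an FR dataset might look like the sketch below. All paths and values here are hypothetical; consult the option files shipped with the repository for real configurations.

```python
# Hypothetical `opt` dictionary for a general FR dataset.
# Paths and values are illustrative only, not actual repository files.
opt = {
    "dataroot_target": "datasets/tid2013/distorted_images",   # target images
    "dataroot_ref": "datasets/tid2013/reference_images",      # reference images
    "meta_info_file": "datasets/meta_info/tid2013_meta.csv",  # labels and paths
    "augment": {
        "hflip": True,       # random horizontal flip of image pairs
        "random_crop": 224,  # int or tuple crop size
    },
    "split_file": "datasets/meta_info/tid2013_split.pkl",
    "split_index": 1,        # which split to use
    "phase": "train",        # one of: train, val, test
}
print(sorted(opt))
```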
The above interface requires two files to provide the dataset information, i.e., the `meta_info_file` and the `split_file`. The `meta_info_file` is a `.csv` file with the following general format:
- For NR datasets: name, mos(mean), std
```
100.bmp 32.56107532210109 19.12472638223644
```
- For FR datasets: ref_name, dist_name, mos(mean), std
```
I01.bmp I01_01_1.bmp 5.51429 0.13013
```
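A single NR-style row can be parsed with a minimal sketch like the one below. Note that the sample rows shown above are whitespace-separated; if the actual meta files use commas, Python's `csv` module is the better fit. The helper name is hypothetical.

```python
# Parse one NR-style meta info row: name, mos(mean), std.
# The sample row shown above is whitespace-separated; comma-separated
# files should instead be read with csv.reader.
def parse_nr_row(line):
    name, mos, std = line.split()
    return name, float(mos), float(std)

name, mos, std = parse_nr_row("100.bmp 32.56107532210109 19.12472638223644")
print(name, mos, std)
```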
The `split_file` is a `*.pkl` file which stores the `train/val/test` information as a Python dictionary in the following format:
```
{
    train_index: {
        train: [train_index_list],
        val: [val_index_list],   # blank if no validation split
        test: [test_index_list], # blank if no test split
    }
}
```
The `train_index` starts from `1`, and the sample indexes correspond to the row indexes of the `meta_info_file`, starting from `0`. We have already generated these files for mainstream public datasets with the scripts in the folder `./scripts/`.
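To make the layout concrete, the following sketch writes and reads back a split file of this shape. The index values are invented for illustration; the real split files are produced by the scripts mentioned above.

```python
import os
import pickle
import tempfile

# Hypothetical split dictionary in the format described above:
# the outer key (split number) starts from 1, while the sample indexes
# are row indexes of the meta_info_file, starting from 0.
splits = {
    1: {
        "train": [0, 2, 3],
        "val": [4],      # blank if no validation split
        "test": [1, 5],  # blank if no test split
    }
}

path = os.path.join(tempfile.mkdtemp(), "example_split.pkl")
with open(path, "wb") as f:
    pickle.dump(splits, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

train_rows = loaded[1]["train"]
print(train_rows)  # [0, 2, 3]
```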
Note that we generate the `train/val/test` splits following the principles below:

- For datasets which have official splits, we follow their splits.
- For official splits which have no `val` part, e.g., the AVA dataset, we randomly separate 5% of the training data as validation.
- For small datasets which require n-split results, we use a `train:val=8:2` ratio.
- All random seeds are set to `123` when needed.
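The ratio-and-seed principle can be sketched as a small helper. The function below is a hypothetical illustration of the `train:val=8:2` split with seed `123`, not the actual code in `./scripts/`:

```python
import random

# Illustrative 8:2 split with a fixed seed, mirroring the principles
# above. This is a sketch, not the repository's split script.
def make_split(num_samples, train_ratio=0.8, seed=123):
    indexes = list(range(num_samples))  # row indexes, starting from 0
    rng = random.Random(seed)           # fixed seed for reproducibility
    rng.shuffle(indexes)
    cut = int(num_samples * train_ratio)
    return {"train": sorted(indexes[:cut]), "val": sorted(indexes[cut:])}

split = make_split(10)
print(len(split["train"]), len(split["val"]))  # 8 2
```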
## Specific Datasets and Dataloader
Some of the supported datasets have different label formats and file organizations, so we create specific dataloaders for them:

- Live Challenge: the first 7 samples are usually removed in related works.
- AVA: different label format.
- PieAPP: different label format.
- BAPPS: different label format.
## Test Dataloader
You may use `tests/test_datasets.py` to test whether a dataset can be correctly loaded.