pyiqa.archs.topiq_arch
TOPIQ metric, proposed in:
TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment. Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin. IEEE Transactions on Image Processing, 2024.
Paper link: https://arxiv.org/abs/2308.03060
Module Contents
- class pyiqa.archs.topiq_arch.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='gelu', normalize_before=False)
Bases: torch.nn.Module
Transformer encoder layer used in local self-attention blocks.
- class pyiqa.archs.topiq_arch.TransformerDecoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='gelu', normalize_before=False)
Bases: torch.nn.Module
Transformer decoder layer used for cross-scale attention.
- class pyiqa.archs.topiq_arch.TransformerEncoder(encoder_layer, num_layers)
Bases: torch.nn.Module
Stacked wrapper for encoder layers.
- class pyiqa.archs.topiq_arch.TransformerDecoder(decoder_layer, num_layers)
Bases: torch.nn.Module
Stacked wrapper for decoder layers.
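The two stacked wrappers take a single layer plus `num_layers`. A minimal sketch of that pattern in plain Python follows. It assumes the wrapper makes independent deep copies of the layer and applies them in sequence, which is the usual approach but is not stated in this reference. `AddBias` and `LayerStack` are hypothetical stand-ins, not pyiqa classes.

```python
import copy


class AddBias:
    """Toy stand-in for a layer: adds a fixed bias to every element."""

    def __init__(self, bias):
        self.bias = bias

    def __call__(self, values):
        return [v + self.bias for v in values]


class LayerStack:
    """Hypothetical stacked wrapper: num_layers independent copies, applied in order."""

    def __init__(self, layer, num_layers):
        # Assumption: each slot gets its own deep copy, so no state is shared.
        self.layers = [copy.deepcopy(layer) for _ in range(num_layers)]

    def __call__(self, values):
        for layer in self.layers:
            values = layer(values)
        return values


stack = LayerStack(AddBias(0.5), num_layers=3)
result = stack([1.0, 2.0])  # each element passes through three copies of the layer
```

Copying the layer rather than reusing one instance means each depth would carry its own parameters in a real network.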
- class pyiqa.archs.topiq_arch.GatedConv(weightdim, ksz=3)
Bases: torch.nn.Module
Gated local pooling module for no-reference feature aggregation.
- class pyiqa.archs.topiq_arch.CFANet(semantic_model_name='resnet50', model_name='cfanet_nr_koniq_res50', backbone_pretrain=True, in_size=None, use_ref=True, num_class=1, num_crop=1, crop_size=256, inter_dim=256, num_heads=4, num_attn_layers=1, dprate=0.1, activation='gelu', pretrained=True, pretrained_model_path=None, out_act=False, block_pool='weighted_avg', test_img_size=None, align_crop_face=True, default_mean=IMAGENET_DEFAULT_MEAN, default_std=IMAGENET_DEFAULT_STD)
Bases: torch.nn.Module
TOPIQ/CFANet architecture for NR and FR quality prediction.
- Parameters:
semantic_model_name (str) – Backbone name, for example 'resnet50', 'clip_ViT-B/32', or a Swin variant.
model_name (str) – Registered checkpoint key.
backbone_pretrain (bool) – Whether to load pretrained backbone weights.
in_size (tuple[int, int] | None) – Optional training input size.
use_ref (bool) – Whether to use a reference image input.
num_class (int) – Number of output dimensions.
num_crop (int) – Number of evaluation crops.
crop_size (int) – Crop size for multi-crop evaluation.
inter_dim (int) – Intermediate feature dimension.
num_heads (int) – Attention head count.
num_attn_layers (int) – Number of attention layers per block.
dprate (float) – Dropout probability.
activation (str) – Activation name.
pretrained (bool) – Whether to load pretrained CFANet checkpoint.
pretrained_model_path (str | None) – Optional local checkpoint path.
out_act (bool) – Whether to apply positive output activation for scalar prediction.
block_pool (str) – Feature block pooling mode.
test_img_size (tuple[int, int] | None) – Optional test-time resize.
align_crop_face (bool) – Whether to run face alignment for GFIQA models.
default_mean (tuple[float, float, float]) – Input normalization mean.
default_std (tuple[float, float, float]) – Input normalization std.
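The `num_crop` and `crop_size` parameters above describe multi-crop evaluation. One common way to do this, sketched below in plain Python, is to lay crop windows on an even grid and average the per-crop scores. The helper names `grid_crop_boxes` and `multi_crop_score`, the grid layout, and the averaging step are all assumptions made for illustration. This reference does not state how CFANet samples or combines crops.

```python
from statistics import fmean


def grid_crop_boxes(height, width, crop_size, num_crop):
    """Return num_crop (top, left, bottom, right) boxes placed on an even grid."""
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must fit inside the image")

    # Smallest square grid that can hold num_crop windows.
    per_side = 1
    while per_side * per_side < num_crop:
        per_side += 1

    def starts(extent):
        slack = extent - crop_size
        if per_side == 1:
            return [slack // 2]  # a single window is centred
        return [round(i * slack / (per_side - 1)) for i in range(per_side)]

    boxes = [
        (top, left, top + crop_size, left + crop_size)
        for top in starts(height)
        for left in starts(width)
    ]
    return boxes[:num_crop]


def multi_crop_score(score_fn, height, width, crop_size=256, num_crop=1):
    """Average a per-crop score over the grid (hypothetical aggregation rule)."""
    boxes = grid_crop_boxes(height, width, crop_size, num_crop)
    return fmean([score_fn(box) for box in boxes])


boxes = grid_crop_boxes(512, 512, crop_size=256, num_crop=4)
```

Here `score_fn` stands in for whatever scores one crop; in practice that would be a model call on the cropped tensor.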
Notes
Set use_ref=True for full-reference mode and use_ref=False for no-reference mode.
- forward(x, y=None, return_mos=True, return_dist=False)
Compute quality prediction.
- Parameters:
x (torch.Tensor) – Distorted image tensor with shape (N, 3, H, W).
y (torch.Tensor | None) – Optional reference image tensor with shape (N, 3, H, W). Required when use_ref is True.
return_mos (bool) – Whether to return mapped MOS output.
return_dist (bool) – Whether to return raw distance/logit output.
- Returns:
Single tensor when one output is requested, otherwise [mos, dist] in that order.
- Return type:
torch.Tensor | list[torch.Tensor]
- Raises:
AssertionError – If use_ref is True but y is not given.
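The calling convention of forward can be illustrated with a small stand-in that mimics only the documented contract: the reference check, and the single-value versus list return. `ContractStub` and its placeholder scores are hypothetical and do not run the model. Raising `ValueError` when neither output is requested is a choice made for this sketch, since the reference does not cover that case.

```python
class ContractStub:
    """Stand-in mirroring the documented forward() contract, not the network."""

    def __init__(self, use_ref=True):
        self.use_ref = use_ref

    def forward(self, x, y=None, return_mos=True, return_dist=False):
        if self.use_ref:
            # Documented behaviour: a reference is mandatory in full-reference mode.
            assert y is not None, "y is required when use_ref is True"

        mos = 4.0    # placeholder for the mapped MOS output
        dist = 0.25  # placeholder for the raw distance/logit output

        outputs = []
        if return_mos:
            outputs.append(mos)
        if return_dist:
            outputs.append(dist)

        if not outputs:
            # Not covered by the reference; this sketch rejects the call.
            raise ValueError("request at least one of return_mos / return_dist")
        # One output comes back bare; two come back as [mos, dist].
        return outputs[0] if len(outputs) == 1 else outputs


nr_stub = ContractStub(use_ref=False)
fr_stub = ContractStub(use_ref=True)

single = nr_stub.forward("distorted")
both = fr_stub.forward("distorted", "reference", return_dist=True)
```

Callers that toggle `return_dist` at runtime should check whether they received a list before unpacking it.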