pyiqa.archs.topiq_arch
======================

.. py:module:: pyiqa.archs.topiq_arch

.. autoapi-nested-parse::

   TOP-IQ metric, proposed by

   TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment.
   Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin.
   Transactions on Image Processing, 2024.

   Paper link: https://arxiv.org/abs/2308.03060


Module Contents
---------------

.. py:data:: default_model_urls

.. py:class:: TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='gelu', normalize_before=False)

   Bases: :py:obj:`torch.nn.Module`

   Transformer encoder layer used in local self-attention blocks.

   .. py:method:: forward(src)


.. py:class:: TransformerDecoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='gelu', normalize_before=False)

   Bases: :py:obj:`torch.nn.Module`

   Transformer decoder layer used for cross-scale attention.

   .. py:method:: forward(tgt, memory)


.. py:class:: TransformerEncoder(encoder_layer, num_layers)

   Bases: :py:obj:`torch.nn.Module`

   Stacked wrapper for encoder layers.

   .. py:method:: forward(src)


.. py:class:: TransformerDecoder(decoder_layer, num_layers)

   Bases: :py:obj:`torch.nn.Module`

   Stacked wrapper for decoder layers.

   .. py:method:: forward(tgt, memory)


.. py:class:: GatedConv(weightdim, ksz=3)

   Bases: :py:obj:`torch.nn.Module`

   Gated local pooling module for no-reference feature aggregation.

   .. py:method:: forward(x)


.. py:class:: CFANet(semantic_model_name='resnet50', model_name='cfanet_nr_koniq_res50', backbone_pretrain=True, in_size=None, use_ref=True, num_class=1, num_crop=1, crop_size=256, inter_dim=256, num_heads=4, num_attn_layers=1, dprate=0.1, activation='gelu', pretrained=True, pretrained_model_path=None, out_act=False, block_pool='weighted_avg', test_img_size=None, align_crop_face=True, default_mean=IMAGENET_DEFAULT_MEAN, default_std=IMAGENET_DEFAULT_STD)

   Bases: :py:obj:`torch.nn.Module`

   TOPIQ/CFANet architecture for NR and FR quality prediction.

   :param semantic_model_name: Backbone name, for example ``'resnet50'``, ``'clip_ViT-B/32'``, or a Swin variant.
   :type semantic_model_name: str
   :param model_name: Registered checkpoint key.
   :type model_name: str
   :param backbone_pretrain: Whether to load pretrained backbone weights.
   :type backbone_pretrain: bool
   :param in_size: Optional training input size.
   :type in_size: tuple[int, int] | None
   :param use_ref: Whether to use a reference image input.
   :type use_ref: bool
   :param num_class: Number of output dimensions.
   :type num_class: int
   :param num_crop: Number of evaluation crops.
   :type num_crop: int
   :param crop_size: Crop size for multi-crop evaluation.
   :type crop_size: int
   :param inter_dim: Intermediate feature dimension.
   :type inter_dim: int
   :param num_heads: Attention head count.
   :type num_heads: int
   :param num_attn_layers: Number of attention layers per block.
   :type num_attn_layers: int
   :param dprate: Dropout probability.
   :type dprate: float
   :param activation: Activation name.
   :type activation: str
   :param pretrained: Whether to load pretrained CFANet checkpoint.
   :type pretrained: bool
   :param pretrained_model_path: Optional local checkpoint path.
   :type pretrained_model_path: str | None
   :param out_act: Whether to apply positive output activation for scalar prediction.
   :type out_act: bool
   :param block_pool: Feature block pooling mode.
   :type block_pool: str
   :param test_img_size: Optional test-time resize.
   :type test_img_size: tuple[int, int] | None
   :param align_crop_face: Whether to run face alignment for GFIQA models.
   :type align_crop_face: bool
   :param default_mean: Input normalization mean.
   :type default_mean: tuple[float, float, float]
   :param default_std: Input normalization std.
   :type default_std: tuple[float, float, float]

   .. rubric:: Notes

   Set ``use_ref=True`` for full-reference mode and ``use_ref=False`` for no-reference mode.

   .. py:method:: preprocess(x)

   .. py:method:: fix_bn(model)

   .. py:method:: get_swin_feature(model, x)

   .. py:method:: dist_func(x, y, eps=1e-12)

   .. py:method:: forward_cross_attention(x, y=None)

   .. py:method:: preprocess_face(x)

   .. py:method:: forward(x, y=None, return_mos=True, return_dist=False)

      Compute quality prediction.

      :param x: Distorted image tensor with shape ``(N, 3, H, W)``.
      :type x: torch.Tensor
      :param y: Optional reference image tensor with shape ``(N, 3, H, W)``. Required when ``use_ref`` is ``True``.
      :type y: torch.Tensor | None
      :param return_mos: Whether to return mapped MOS output.
      :type return_mos: bool
      :param return_dist: Whether to return raw distance/logit output.
      :type return_dist: bool

      :returns: Single tensor when one output is requested, otherwise ``[mos, dist]`` in that order.
      :rtype: torch.Tensor | list[torch.Tensor]

      :raises AssertionError: If ``use_ref`` is ``True`` but ``y`` is not given.
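
The return convention and the reference check documented for ``CFANet.forward`` can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not pyiqa's implementation: ``select_outputs`` and ``check_reference`` are hypothetical helpers that do not exist in the library, and plain floats take the place of ``torch.Tensor`` values.

```python
def select_outputs(mos, dist, return_mos=True, return_dist=False):
    """Hypothetical helper mirroring the documented return convention.

    Not part of pyiqa; floats stand in for torch.Tensor values.
    """
    outputs = []
    if return_mos:
        outputs.append(mos)
    if return_dist:
        outputs.append(dist)
    # Exactly one requested output: return it bare.
    if len(outputs) == 1:
        return outputs[0]
    # Otherwise return the list, which is in [mos, dist] order.
    # Assumption: the case where neither flag is set is not documented;
    # this sketch simply returns an empty list.
    return outputs


def check_reference(use_ref, y):
    """Hypothetical guard mirroring the documented AssertionError."""
    if use_ref:
        assert y is not None, "a reference image is required when use_ref is True"


# Default flags request only the MOS output.
single = select_outputs(3.7, 0.42)
# Requesting both outputs yields a list in [mos, dist] order.
both = select_outputs(3.7, 0.42, return_mos=True, return_dist=True)
```

With the default flags a caller can use the result directly as a score; code that sets both flags should unpack the result as ``mos, dist``.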