pyiqa.archs.musiq_arch
======================

.. py:module:: pyiqa.archs.musiq_arch

.. autoapi-nested-parse::

   MUSIQ model.

   Reference:
       Ke, Junjie, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang.
       "Musiq: Multi-scale image quality transformer." In Proceedings of the
       IEEE/CVF International Conference on Computer Vision (ICCV),
       pp. 5148-5157. 2021.

   Ref url: https://github.com/google-research/google-research/tree/master/musiq

   Re-implemented by: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:data:: default_model_urls

.. py:class:: StdConv

   Bases: :py:obj:`torch.nn.Conv2d`

   Reference: https://github.com/joe-siyuan-qiao/WeightStandardization

   .. py:method:: forward(x)


.. py:class:: Bottleneck(inplanes, outplanes, stride=1)

   Bases: :py:obj:`torch.nn.Module`

   .. py:method:: forward(x)


.. py:function:: drop_path(x, drop_prob: float = 0.0, training: bool = False)

.. py:class:: DropPath(drop_prob=None)

   Bases: :py:obj:`torch.nn.Module`

   Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

   .. py:method:: forward(x)


.. py:class:: Mlp(in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.0)

   Bases: :py:obj:`torch.nn.Module`

   .. py:method:: forward(x)


.. py:class:: MultiHeadAttention(dim, num_heads=6, bias=False, attn_drop=0.0, out_drop=0.0)

   Bases: :py:obj:`torch.nn.Module`

   .. py:method:: forward(x, mask=None)


.. py:class:: TransformerBlock(dim, mlp_dim, num_heads, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`

   .. py:method:: forward(x, inputs_masks)


.. py:class:: AddHashSpatialPositionEmbs(spatial_pos_grid_size, dim)

   Bases: :py:obj:`torch.nn.Module`

   Adds learnable hash-based spatial embeddings to the inputs.

   .. py:method:: forward(inputs, inputs_positions)


.. py:class:: AddScaleEmbs(num_scales, dim)

   Bases: :py:obj:`torch.nn.Module`

   Adds learnable scale embeddings to the inputs.

   .. py:method:: forward(inputs, inputs_scale_positions)


.. py:class:: TransformerEncoder(input_dim, mlp_dim=1152, attention_dropout_rate=0.0, dropout_rate=0, num_heads=6, num_layers=14, num_scales=3, spatial_pos_grid_size=10, use_scale_emb=True, use_sinusoid_pos_emb=False)

   Bases: :py:obj:`torch.nn.Module`

   .. py:method:: forward(x, inputs_spatial_positions, inputs_scale_positions, inputs_masks)


.. py:class:: MUSIQ(patch_size=32, num_class=1, hidden_size=384, mlp_dim=1152, attention_dropout_rate=0.0, dropout_rate=0, num_heads=6, num_layers=14, num_scales=3, spatial_pos_grid_size=10, use_scale_emb=True, use_sinusoid_pos_emb=False, pretrained=True, pretrained_model_path=None, longer_side_lengths=[224, 384], max_seq_len_from_original_res=-1)

   Bases: :py:obj:`torch.nn.Module`

   MUSIQ model architecture.

   :param patch_size: Size of the patches to extract from the images.
   :type patch_size: int
   :param num_class: Number of classes to predict.
   :type num_class: int
   :param hidden_size: Size of the hidden layer in the transformer encoder.
   :type hidden_size: int
   :param mlp_dim: Size of the feedforward layer in the transformer encoder.
   :type mlp_dim: int
   :param attention_dropout_rate: Dropout rate for the attention layer in the transformer encoder.
   :type attention_dropout_rate: float
   :param dropout_rate: Dropout rate for the transformer encoder.
   :type dropout_rate: float
   :param num_heads: Number of attention heads in the transformer encoder.
   :type num_heads: int
   :param num_layers: Number of layers in the transformer encoder.
   :type num_layers: int
   :param num_scales: Number of scales to use in the transformer encoder.
   :type num_scales: int
   :param spatial_pos_grid_size: Size of the spatial position grid in the transformer encoder.
   :type spatial_pos_grid_size: int
   :param use_scale_emb: Whether to use scale embeddings in the transformer encoder.
   :type use_scale_emb: bool
   :param use_sinusoid_pos_emb: Whether to use sinusoidal position embeddings in the transformer encoder.
   :type use_sinusoid_pos_emb: bool
   :param pretrained: Whether to use a pretrained model. If str, specifies the path to the pretrained model.
   :type pretrained: bool or str
   :param pretrained_model_path: Path to the pretrained model.
   :type pretrained_model_path: str
   :param longer_side_lengths: List of longer side lengths to use for multiscale evaluation.
   :type longer_side_lengths: list
   :param max_seq_len_from_original_res: Maximum sequence length to use for multiscale evaluation.
   :type max_seq_len_from_original_res: int

   .. attribute:: conv_root

      Convolutional layer for the root of the network.

      :type: StdConv

   .. attribute:: gn_root

      Group normalization layer for the root of the network.

      :type: nn.GroupNorm

   .. attribute:: root_pool

      Max pooling layer for the root of the network.

      :type: nn.Sequential

   .. attribute:: block1

      First bottleneck block in the network.

      :type: Bottleneck

   .. attribute:: embedding

      Linear layer for the transformer encoder input.

      :type: nn.Linear

   .. attribute:: transformer_encoder

      Transformer encoder.

      :type: TransformerEncoder

   .. attribute:: head

      Output layer of the network.

      :type: nn.Sequential or nn.Linear

   .. method:: forward(x, return_mos=True, return_dist=False)

      Forward pass of the network.

   .. py:method:: forward(x, return_mos=True, return_dist=False)

      Forward pass of the MUSIQ network.

      :param x: Input tensor.
      :type x: torch.Tensor
      :param return_mos: Whether to return the mean opinion score (MOS).
      :type return_mos: bool
      :param return_dist: Whether to return the predicted distribution.
      :type return_dist: bool

      :returns: If only one of return_mos and return_dist is True, returns a tensor.
                If both are True, returns a tuple of tensors.
      :rtype: torch.Tensor or tuple
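

Illustrative note on stochastic depth: the ``drop_path`` function and ``DropPath`` class listed above are described as dropping paths per sample. The sketch below shows that idea in plain Python on nested lists, so it runs without ``torch``. It is not the pyiqa implementation, which operates on tensors; the name ``drop_path_sketch`` and its ``rng`` argument are hypothetical and exist only for this example. It assumes the common formulation in which each sample's residual branch is either zeroed or rescaled by ``1 / keep_prob``.

```python
import random


def drop_path_sketch(batch, drop_prob=0.0, training=False, rng=None):
    """Illustrative stochastic depth on a list of per-sample feature lists.

    Hypothetical helper for this page; not the pyiqa ``drop_path`` function.
    """
    # Identity at inference time, or when nothing is to be dropped.
    if drop_prob == 0.0 or not training:
        return [list(sample) for sample in batch]

    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One Bernoulli draw per sample: the whole branch is kept or dropped.
        keep = rng.random() < keep_prob
        # Kept samples are scaled by 1 / keep_prob so the expected value
        # of the output matches the input.
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([value * scale for value in sample])
    return out


batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

# Evaluation mode: the batch passes through unchanged.
eval_out = drop_path_sketch(batch, drop_prob=0.5, training=False)

# Training mode: each sample is either all zeros or doubled (1 / 0.5).
train_out = drop_path_sketch(
    batch, drop_prob=0.5, training=True, rng=random.Random(0)
)
```

In a residual block the result would be added back to the skip connection, so a dropped sample simply bypasses that block for the current training step.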