pyiqa.archs.musiq_arch
======================

.. py:module:: pyiqa.archs.musiq_arch

.. autoapi-nested-parse::

   MUSIQ model.

   Reference:
           Ke, Junjie, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang.
           "MUSIQ: Multi-scale image quality transformer." In Proceedings of the
           IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5148-5157. 2021.

   Ref url: https://github.com/google-research/google-research/tree/master/musiq
   Re-implemented by: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:data:: default_model_urls

.. py:class:: StdConv

   Bases: :py:obj:`torch.nn.Conv2d`


   ``Conv2d`` layer with weight standardization.

   Reference: https://github.com/joe-siyuan-qiao/WeightStandardization


   .. py:method:: forward(x)
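
   The sketch below illustrates the general idea of weight standardization as
   described in the referenced repository: each output filter is normalized to
   zero mean and unit variance before the convolution is applied. The class
   name ``WeightStandardizedConv2d`` and the epsilon value are assumptions made
   for illustration; this is not the ``StdConv`` implementation itself, which
   may handle padding and normalization details differently.

   .. code-block:: python

      import torch
      import torch.nn as nn
      import torch.nn.functional as F


      class WeightStandardizedConv2d(nn.Conv2d):
          """Hypothetical stand-in for StdConv, for illustration only."""

          def forward(self, x):
              w = self.weight
              # Standardize each output filter over (in_channels, kH, kW).
              mean = w.mean(dim=(1, 2, 3), keepdim=True)
              var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
              w = (w - mean) / torch.sqrt(var + 1e-5)  # eps is an assumed value
              return F.conv2d(x, w, self.bias, self.stride, self.padding,
                              self.dilation, self.groups)


      conv = WeightStandardizedConv2d(3, 8, kernel_size=3, padding=1)
      out = conv(torch.randn(2, 3, 16, 16))  # spatial size kept by padding=1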


.. py:class:: Bottleneck(inplanes, outplanes, stride=1)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x)


.. py:function:: drop_path(x, drop_prob: float = 0.0, training: bool = False)

.. py:class:: DropPath(drop_prob=None)

   Bases: :py:obj:`torch.nn.Module`


   Drop paths (Stochastic Depth) per sample (when applied in the main path of residual blocks).


   .. py:method:: forward(x)
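
   As a rough illustration of per-sample stochastic depth, the sketch below
   zeroes the whole residual branch for a random subset of samples during
   training and rescales the survivors so the expected value is unchanged.
   The function name ``drop_path_sketch`` is hypothetical, and the module's
   own ``drop_path`` may differ in detail.

   .. code-block:: python

      import torch


      def drop_path_sketch(x, drop_prob=0.0, training=False):
          # Identity at inference time or when nothing should be dropped.
          if drop_prob == 0.0 or not training:
              return x
          keep_prob = 1.0 - drop_prob
          # One Bernoulli draw per sample, broadcast over the remaining dims.
          mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
          mask = torch.bernoulli(
              torch.full(mask_shape, keep_prob, dtype=x.dtype, device=x.device)
          )
          # Rescale kept samples so the expectation matches the input.
          return x / keep_prob * mask


      x = torch.ones(4, 3, 2)
      same = drop_path_sketch(x, drop_prob=0.5, training=False)
      dropped = drop_path_sketch(x, drop_prob=0.5, training=True)

   With ``drop_prob=0.5`` each sample in ``dropped`` is either all zeros or all
   ``2.0``, since kept samples are divided by the keep probability.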


.. py:class:: Mlp(in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.0)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x)


.. py:class:: MultiHeadAttention(dim, num_heads=6, bias=False, attn_drop=0.0, out_drop=0.0)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x, mask=None)


.. py:class:: TransformerBlock(dim, mlp_dim, num_heads, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x, inputs_masks)


.. py:class:: AddHashSpatialPositionEmbs(spatial_pos_grid_size, dim)

   Bases: :py:obj:`torch.nn.Module`


   Adds learnable hash-based spatial embeddings to the inputs.


   .. py:method:: forward(inputs, inputs_positions)
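
   The MUSIQ paper describes hashing each patch location onto a fixed
   ``G x G`` grid so that images with different patch counts share one
   embedding table. The pure-Python sketch below shows one plausible way to
   compute such an index. The function name ``hashed_position_index`` and the
   floor-based rounding are assumptions, not necessarily what this module does.

   .. code-block:: python

      def hashed_position_index(row, col, count_h, count_w, grid_size=10):
          """Map patch (row, col) in a count_h x count_w layout to a grid slot."""
          grid_row = row * grid_size // count_h  # scale the row onto the grid
          grid_col = col * grid_size // count_w  # scale the column onto the grid
          return grid_row * grid_size + grid_col  # flatten to a single index


      # Indices for an image split into 3 rows and 4 columns of patches.
      indices = [
          [hashed_position_index(r, c, 3, 4) for c in range(4)]
          for r in range(3)
      ]

   Every index falls in ``[0, grid_size ** 2)``, so it can be used to look up a
   row in an embedding table of that size regardless of the image resolution.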


.. py:class:: AddScaleEmbs(num_scales, dim)

   Bases: :py:obj:`torch.nn.Module`


   Adds learnable scale embeddings to the inputs.


   .. py:method:: forward(inputs, inputs_scale_positions)
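
   A minimal sketch of the idea, assuming tokens of shape
   ``(batch, sequence, dim)`` and an integer scale index per token: one
   learnable vector per scale is looked up and added to each token. The class
   name ``ScaleEmbSketch`` and the initialization scale are hypothetical and
   not taken from this module.

   .. code-block:: python

      import torch
      import torch.nn as nn


      class ScaleEmbSketch(nn.Module):
          """Hypothetical illustration of a learnable per-scale embedding."""

          def __init__(self, num_scales, dim):
              super().__init__()
              # One learnable vector per scale (init scale is an assumption).
              self.scale_emb = nn.Parameter(torch.randn(num_scales, dim) * 0.02)

          def forward(self, inputs, inputs_scale_positions):
              # Index with (batch, sequence) ids to get (batch, sequence, dim).
              return inputs + self.scale_emb[inputs_scale_positions.long()]


      layer = ScaleEmbSketch(num_scales=3, dim=8)
      tokens = torch.zeros(2, 5, 8)
      scale_ids = torch.tensor([[0, 0, 1, 1, 2]] * 2)
      out = layer(tokens, scale_ids)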


.. py:class:: TransformerEncoder(input_dim, mlp_dim=1152, attention_dropout_rate=0.0, dropout_rate=0, num_heads=6, num_layers=14, num_scales=3, spatial_pos_grid_size=10, use_scale_emb=True, use_sinusoid_pos_emb=False)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x, inputs_spatial_positions, inputs_scale_positions, inputs_masks)


.. py:class:: MUSIQ(patch_size=32, num_class=1, hidden_size=384, mlp_dim=1152, attention_dropout_rate=0.0, dropout_rate=0, num_heads=6, num_layers=14, num_scales=3, spatial_pos_grid_size=10, use_scale_emb=True, use_sinusoid_pos_emb=False, pretrained=True, pretrained_model_path=None, longer_side_lengths=[224, 384], max_seq_len_from_original_res=-1)

   Bases: :py:obj:`torch.nn.Module`


   MUSIQ model architecture.

   :param patch_size: Size of the patches to extract from the images.
   :type patch_size: int
   :param num_class: Number of classes to predict.
   :type num_class: int
   :param hidden_size: Size of the hidden layer in the transformer encoder.
   :type hidden_size: int
   :param mlp_dim: Size of the feedforward layer in the transformer encoder.
   :type mlp_dim: int
   :param attention_dropout_rate: Dropout rate for the attention layer in the transformer encoder.
   :type attention_dropout_rate: float
   :param dropout_rate: Dropout rate for the transformer encoder.
   :type dropout_rate: float
   :param num_heads: Number of attention heads in the transformer encoder.
   :type num_heads: int
   :param num_layers: Number of layers in the transformer encoder.
   :type num_layers: int
   :param num_scales: Number of scales to use in the transformer encoder.
   :type num_scales: int
   :param spatial_pos_grid_size: Size of the spatial position grid in the transformer encoder.
   :type spatial_pos_grid_size: int
   :param use_scale_emb: Whether to use scale embeddings in the transformer encoder.
   :type use_scale_emb: bool
   :param use_sinusoid_pos_emb: Whether to use sinusoidal position embeddings in the transformer encoder.
   :type use_sinusoid_pos_emb: bool
   :param pretrained: Whether to use a pretrained model. If str, specifies the path to the pretrained model.
   :type pretrained: bool or str
   :param pretrained_model_path: Path to the pretrained model.
   :type pretrained_model_path: str
   :param longer_side_lengths: List of longer side lengths to use for multiscale evaluation.
   :type longer_side_lengths: list
   :param max_seq_len_from_original_res: Maximum sequence length to use for multiscale evaluation.
   :type max_seq_len_from_original_res: int
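
   To give a feel for how ``longer_side_lengths`` and ``patch_size`` interact,
   the sketch below estimates the resized dimensions and patch count per
   scale, assuming the image is resized so that its longer side matches each
   target length while preserving the aspect ratio, and that partial patches
   at the border are padded. The helper ``multiscale_patch_counts`` is
   hypothetical and the rounding used by the actual model may differ.

   .. code-block:: python

      import math


      def multiscale_patch_counts(height, width,
                                  longer_side_lengths=(224, 384),
                                  patch_size=32):
          """Return (resized_h, resized_w, num_patches) for each scale."""
          results = []
          longer = max(height, width)
          for target in longer_side_lengths:
              # Aspect-ratio-preserving resize to the target longer side.
              scaled_h = round(height * target / longer)
              scaled_w = round(width * target / longer)
              # Assume border patches are padded, hence the ceiling.
              count_h = math.ceil(scaled_h / patch_size)
              count_w = math.ceil(scaled_w / patch_size)
              results.append((scaled_h, scaled_w, count_h * count_w))
          return results


      counts = multiscale_patch_counts(480, 640)
      # counts == [(168, 224, 42), (288, 384, 108)]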

   .. attribute:: conv_root

      Convolutional layer for the root of the network.

      :type: StdConv

   .. attribute:: gn_root

      Group normalization layer for the root of the network.

      :type: nn.GroupNorm

   .. attribute:: root_pool

      Max pooling layer for the root of the network.

      :type: nn.Sequential

   .. attribute:: block1

      First bottleneck block in the network.

      :type: Bottleneck

   .. attribute:: embedding

      Linear layer for the transformer encoder input.

      :type: nn.Linear

   .. attribute:: transformer_encoder

      Transformer encoder.

      :type: TransformerEncoder

   .. attribute:: head

      Output layer of the network.

      :type: nn.Sequential or nn.Linear

   .. py:method:: forward(x, return_mos=True, return_dist=False)

      Forward pass of the MUSIQ network.

      :param x: Input tensor.
      :type x: torch.Tensor
      :param return_mos: Whether to return the mean opinion score (MOS).
      :type return_mos: bool
      :param return_dist: Whether to return the predicted distribution.
      :type return_dist: bool

      :returns: If only one of return_mos and return_dist is True, returns a tensor. If both are True, returns a tuple of tensors.
      :rtype: torch.Tensor or tuple
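
      The return convention described above can be sketched in plain Python as
      follows. The helper ``select_outputs`` is hypothetical and only mirrors
      the documented behaviour. The case where both flags are ``False`` is not
      specified here, so the sketch raises an error for it as an assumption.

      .. code-block:: python

         def select_outputs(mos, dist, return_mos=True, return_dist=False):
             """Pick which predictions to return, following the flags."""
             outputs = []
             if return_mos:
                 outputs.append(mos)
             if return_dist:
                 outputs.append(dist)
             if not outputs:
                 # Assumption: the documentation does not cover this case.
                 raise ValueError("at least one of the flags must be True")
             # A single value is returned bare, two values as a tuple.
             return outputs[0] if len(outputs) == 1 else tuple(outputs)


         single = select_outputs("mos", "dist")
         both = select_outputs("mos", "dist", return_mos=True, return_dist=True)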