pyiqa.archs.topiq_swin
======================

.. py:module:: pyiqa.archs.topiq_swin

.. autoapi-nested-parse::

   Swin Transformer

   A PyTorch implementation of `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows`
   (https://arxiv.org/pdf/2103.14030).

   Code/weights from https://github.com/microsoft/Swin-Transformer, original copyright/license info below

   S3 (AutoFormerV2, https://arxiv.org/abs/2111.14725) Swin weights from
   https://github.com/microsoft/Cream/tree/main/AutoFormerV2

   Modifications and additions for timm hacked together by / Copyright 2021, Ross Wightman


Module Contents
---------------

.. py:data:: default_cfgs

.. py:function:: resize_pos_embed(posemb, posemb_new, num_prefix_tokens=1, gs_new=())

.. py:function:: checkpoint_filter_fn(state_dict, model, adapt_layer_scale=False)

   Convert the patch embedding weight from a manual patchify + linear projection layout to a convolution layout.


.. py:function:: window_partition(x, window_size: int)

   :param x: Input tensor of shape (B, H, W, C).
   :param window_size: Window size.
   :type window_size: int

   :returns: Windows of shape (num_windows*B, window_size, window_size, C).
   :rtype: torch.Tensor


.. py:function:: window_reverse(windows, window_size: int, H: int, W: int)

   :param windows: Windows of shape (num_windows*B, window_size, window_size, C).
   :param window_size: Window size.
   :type window_size: int
   :param H: Height of image.
   :type H: int
   :param W: Width of image.
   :type W: int

   :returns: Tensor of shape (B, H, W, C).
   :rtype: torch.Tensor
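
``window_partition`` and ``window_reverse`` are inverses of each other. The sketch below mimics the pair in plain Python for a single image stored as a nested list, with no batch or channel axis. It assumes that H and W are multiples of the window size and that windows are ordered row by row; the helper names ``partition_grid`` and ``merge_windows`` are hypothetical and are not part of ``pyiqa``.

```python
def partition_grid(grid, window_size):
    """Split an H x W nested list into non-overlapping square windows."""
    height, width = len(grid), len(grid[0])
    assert height % window_size == 0 and width % window_size == 0
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            # Take window_size rows, and window_size columns from each row.
            windows.append(
                [row[left:left + window_size] for row in grid[top:top + window_size]]
            )
    return windows


def merge_windows(windows, window_size, height, width):
    """Inverse of partition_grid: put each window back at its grid position."""
    grid = [[None] * width for _ in range(height)]
    windows_per_row = width // window_size
    for index, window in enumerate(windows):
        top = (index // windows_per_row) * window_size
        left = (index % windows_per_row) * window_size
        for dy, row in enumerate(window):
            for dx, value in enumerate(row):
                grid[top + dy][left + dx] = value
    return grid


# A 4 x 4 grid whose cells hold their flattened index 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = partition_grid(grid, window_size=2)
restored = merge_windows(windows, window_size=2, height=4, width=4)
```

Here ``windows`` holds four 2 x 2 windows and ``restored`` equals the original ``grid``, which is the round trip the two documented functions are expected to satisfy on tensors.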


.. py:function:: get_relative_position_index(win_h, win_w)
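
``get_relative_position_index`` carries no description here. Assuming it follows the indexing scheme of the original Swin Transformer code, it maps every (query, key) pair of positions inside a window to a row of the relative position bias table, which has ``(2*win_h - 1) * (2*win_w - 1)`` rows. The plain-Python sketch below illustrates that scheme; ``relative_position_index`` is a hypothetical name and the function returns nested lists rather than a tensor.

```python
def relative_position_index(win_h, win_w):
    """Return an N x N nested list (N = win_h * win_w) of bias-table indices."""
    coords = [(y, x) for y in range(win_h) for x in range(win_w)]
    index = []
    for qy, qx in coords:                    # query token position
        row = []
        for ky, kx in coords:                # key token position
            dy = qy - ky + (win_h - 1)       # shifted into 0 .. 2*win_h - 2
            dx = qx - kx + (win_w - 1)       # shifted into 0 .. 2*win_w - 2
            row.append(dy * (2 * win_w - 1) + dx)
        index.append(row)
    return index


index = relative_position_index(2, 2)
```

For a 2 x 2 window the result is a 4 x 4 table whose entries cover the nine possible offsets, and every diagonal entry is the same because a token's offset to itself is always zero.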

.. py:class:: WindowAttention(dim, num_heads, head_dim=None, window_size=7, qkv_bias=True, attn_drop=0.0, proj_drop=0.0)

   Bases: :py:obj:`torch.nn.Module`


   Window-based multi-head self-attention (W-MSA) module with relative position bias.
   It supports both shifted and non-shifted windows.

   :param dim: Number of input channels.
   :type dim: int
   :param num_heads: Number of attention heads.
   :type num_heads: int
   :param head_dim: Number of channels per head (dim // num_heads if not set)
   :type head_dim: int
   :param window_size: The height and width of the window.
   :type window_size: int | tuple[int]
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool, optional
   :param attn_drop: Dropout ratio of attention weight. Default: 0.0
   :type attn_drop: float, optional
   :param proj_drop: Dropout ratio of output. Default: 0.0
   :type proj_drop: float, optional


   .. py:method:: forward(x, mask: Optional[torch.Tensor] = None)

      :param x: input features with shape of (num_windows*B, N, C)
      :param mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
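
The ``mask`` argument holds 0 for key positions a query may attend to and -inf for positions it may not. Assuming the mask is added to the attention logits before the softmax, as in the original Swin Transformer code, masked keys receive exactly zero weight. The sketch below shows this for one row of logits using only the standard library; ``masked_softmax`` is a hypothetical helper.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over one row of attention logits after adding a 0/-inf mask."""
    shifted = [s + m for s, m in zip(scores, mask)]
    peak = max(shifted)                        # subtract the max for stability
    exps = [math.exp(v - peak) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]
mask = [0.0, float("-inf"), 0.0]   # the second key is hidden from this query
weights = masked_softmax(scores, mask)
```

The sketch assumes at least one key stays unmasked in each row; a fully masked row would produce NaN values.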


.. py:class:: SwinTransformerBlock(dim, input_resolution, num_heads=4, head_dim=None, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`


   Swin Transformer Block.

   :param dim: Number of input channels.
   :type dim: int
   :param input_resolution: Input resolution.
   :type input_resolution: tuple[int]
   :param window_size: Window size.
   :type window_size: int
   :param num_heads: Number of attention heads.
   :type num_heads: int
   :param head_dim: Enforce the number of channels per head
   :type head_dim: int
   :param shift_size: Shift size for SW-MSA.
   :type shift_size: int
   :param mlp_ratio: Ratio of mlp hidden dim to embedding dim.
   :type mlp_ratio: float
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool, optional
   :param drop: Dropout rate. Default: 0.0
   :type drop: float, optional
   :param attn_drop: Attention dropout rate. Default: 0.0
   :type attn_drop: float, optional
   :param drop_path: Stochastic depth rate. Default: 0.0
   :type drop_path: float, optional
   :param act_layer: Activation layer. Default: nn.GELU
   :type act_layer: nn.Module, optional
   :param norm_layer: Normalization layer.  Default: nn.LayerNorm
   :type norm_layer: nn.Module, optional


   .. py:method:: forward(x)
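
For SW-MSA, the Swin Transformer paper cyclically shifts the feature map by ``shift_size`` before window partitioning and shifts it back afterwards. The sketch below shows that shift on a plain nested list, assuming this block follows the paper; ``cyclic_shift`` is a hypothetical helper and not a ``pyiqa`` function.

```python
def cyclic_shift(grid, shift):
    """Roll rows and columns by `shift` toward the top-left, wrapping around."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % height][(c + shift) % width] for c in range(width)]
        for r in range(height)
    ]


# A 4 x 4 grid whose cells hold their flattened index 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
shifted = cyclic_shift(grid, 1)        # forward shift applied before attention
restored = cyclic_shift(shifted, -1)   # reverse shift applied after attention
```

A negative ``shift`` undoes a positive one, so ``restored`` equals ``grid``.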


.. py:class:: PatchMerging(input_resolution, dim, out_dim=None, norm_layer=nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`


   Patch Merging Layer.

   :param input_resolution: Resolution of input feature.
   :type input_resolution: tuple[int]
   :param dim: Number of input channels.
   :type dim: int
   :param out_dim: Number of output channels. Default: None
   :type out_dim: int, optional
   :param norm_layer: Normalization layer.  Default: nn.LayerNorm
   :type norm_layer: nn.Module, optional


   .. py:method:: forward(x)

      :param x: Input features of shape (B, H*W, C).


.. py:class:: BasicLayer(dim, out_dim, input_resolution, depth, num_heads=4, head_dim=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=nn.LayerNorm, downsample=None)

   Bases: :py:obj:`torch.nn.Module`


   A basic Swin Transformer layer for one stage.

   :param dim: Number of input channels.
   :type dim: int
   :param out_dim: Number of output channels.
   :type out_dim: int
   :param input_resolution: Input resolution.
   :type input_resolution: tuple[int]
   :param depth: Number of blocks.
   :type depth: int
   :param num_heads: Number of attention heads.
   :type num_heads: int
   :param head_dim: Channels per head (dim // num_heads if not set)
   :type head_dim: int
   :param window_size: Local window size.
   :type window_size: int
   :param mlp_ratio: Ratio of mlp hidden dim to embedding dim.
   :type mlp_ratio: float
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool, optional
   :param drop: Dropout rate. Default: 0.0
   :type drop: float, optional
   :param attn_drop: Attention dropout rate. Default: 0.0
   :type attn_drop: float, optional
   :param drop_path: Stochastic depth rate. Default: 0.0
   :type drop_path: float | tuple[float], optional
   :param norm_layer: Normalization layer. Default: nn.LayerNorm
   :type norm_layer: nn.Module, optional
   :param downsample: Downsample layer at the end of the layer. Default: None
   :type downsample: nn.Module | None, optional


   .. py:method:: forward(x)
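
``drop_path`` accepts either a single rate or one rate per block. A common way to build the per-block values, used by the original Swin Transformer code and assumed here, is to increase the rate linearly from 0 to the model-wide ``drop_path_rate`` across all blocks and then split the list by stage depth. ``stochastic_depth_schedule`` is a hypothetical helper written with the standard library only.

```python
def stochastic_depth_schedule(drop_path_rate, depths):
    """Linearly spaced drop-path rates, grouped into one tuple per stage."""
    total = sum(depths)
    if total == 1:
        rates = [0.0]
    else:
        rates = [drop_path_rate * i / (total - 1) for i in range(total)]
    schedule, start = [], 0
    for depth in depths:
        schedule.append(tuple(rates[start:start + depth]))
        start += depth
    return schedule


schedule = stochastic_depth_schedule(0.1, (2, 2, 6, 2))
```

Each tuple in ``schedule`` could then be passed as ``drop_path`` to the ``BasicLayer`` of the matching stage, so deeper blocks are dropped more often than shallow ones.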


.. py:class:: SwinTransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, global_pool='avg', embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), head_dim=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=nn.LayerNorm, ape=False, patch_norm=True, weight_init='', **kwargs)

   Bases: :py:obj:`torch.nn.Module`


   Swin Transformer

   A PyTorch implementation of `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows`
   (https://arxiv.org/pdf/2103.14030).

   :param img_size: Input image size. Default: 224
   :type img_size: int | tuple(int)
   :param patch_size: Patch size. Default: 4
   :type patch_size: int | tuple(int)
   :param in_chans: Number of input image channels. Default: 3
   :type in_chans: int
   :param num_classes: Number of classes for classification head. Default: 1000
   :type num_classes: int
   :param embed_dim: Patch embedding dimension. Default: 96
   :type embed_dim: int
   :param depths: Depth of each Swin Transformer layer.
   :type depths: tuple(int)
   :param num_heads: Number of attention heads in different layers.
   :type num_heads: tuple(int)
   :param head_dim: Number of channels per attention head (dim // num_heads if not set). Default: None
   :type head_dim: int | tuple(int)
   :param window_size: Window size. Default: 7
   :type window_size: int
   :param mlp_ratio: Ratio of mlp hidden dim to embedding dim. Default: 4
   :type mlp_ratio: float
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool
   :param drop_rate: Dropout rate. Default: 0
   :type drop_rate: float
   :param attn_drop_rate: Attention dropout rate. Default: 0
   :type attn_drop_rate: float
   :param drop_path_rate: Stochastic depth rate. Default: 0.1
   :type drop_path_rate: float
   :param norm_layer: Normalization layer. Default: nn.LayerNorm.
   :type norm_layer: nn.Module
   :param ape: If True, add absolute position embedding to the patch embedding. Default: False
   :type ape: bool
   :param patch_norm: If True, add normalization after patch embedding. Default: True
   :type patch_norm: bool


   .. py:method:: no_weight_decay()


   .. py:method:: group_matcher(coarse=False)


   .. py:method:: set_grad_checkpointing(enable=True)


   .. py:method:: get_classifier()


   .. py:method:: reset_classifier(num_classes, global_pool=None)


   .. py:method:: forward_features(x)


   .. py:method:: forward_head(x, pre_logits: bool = False)


   .. py:method:: forward(x)
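
The constructor arguments fix the token grid and channel width of every stage. The sketch below works through that arithmetic for the default arguments, assuming that the patch embedding divides the image side by ``patch_size`` and that each stage after the first halves the grid side and doubles the channels through patch merging. ``stage_shapes`` is a hypothetical helper and not part of ``pyiqa``.

```python
def stage_shapes(img_size=224, patch_size=4, embed_dim=96, depths=(2, 2, 6, 2)):
    """Return (grid_side, channels) seen by the blocks of each stage."""
    grid = img_size // patch_size
    return [(grid // 2 ** i, embed_dim * 2 ** i) for i in range(len(depths))]


shapes = stage_shapes()
# shapes == [(56, 96), (28, 192), (14, 384), (7, 768)]
```

With the defaults, every grid side is a multiple of ``window_size=7``, so each stage can be split into whole windows.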


.. py:function:: swin_base_patch4_window12_384(pretrained=False, **kwargs)

   Swin-B @ 384x384, pretrained on ImageNet-22k, fine-tuned on ImageNet-1k


.. py:function:: swin_base_patch4_window7_224(pretrained=False, **kwargs)

   Swin-B @ 224x224, pretrained on ImageNet-22k, fine-tuned on ImageNet-1k


.. py:function:: swin_large_patch4_window12_384(pretrained=False, **kwargs)

   Swin-L @ 384x384, pretrained on ImageNet-22k, fine-tuned on ImageNet-1k


.. py:function:: swin_large_patch4_window7_224(pretrained=False, **kwargs)

   Swin-L @ 224x224, pretrained on ImageNet-22k, fine-tuned on ImageNet-1k


.. py:function:: swin_small_patch4_window7_224(pretrained=False, **kwargs)

   Swin-S @ 224x224, trained on ImageNet-1k


.. py:function:: swin_tiny_patch4_window7_224(pretrained=False, **kwargs)

   Swin-T @ 224x224, trained on ImageNet-1k


.. py:function:: swin_base_patch4_window12_384_in22k(pretrained=False, **kwargs)

   Swin-B @ 384x384, trained on ImageNet-22k


.. py:function:: swin_base_patch4_window7_224_in22k(pretrained=False, **kwargs)

   Swin-B @ 224x224, trained on ImageNet-22k


.. py:function:: swin_large_patch4_window12_384_in22k(pretrained=False, **kwargs)

   Swin-L @ 384x384, trained on ImageNet-22k


.. py:function:: swin_large_patch4_window7_224_in22k(pretrained=False, **kwargs)

   Swin-L @ 224x224, trained on ImageNet-22k


.. py:function:: swin_s3_tiny_224(pretrained=False, **kwargs)

   Swin-S3-T @ 224x224, trained on ImageNet-1k. https://arxiv.org/abs/2111.14725


.. py:function:: swin_s3_small_224(pretrained=False, **kwargs)

   Swin-S3-S @ 224x224, trained on ImageNet-1k. https://arxiv.org/abs/2111.14725


.. py:function:: swin_s3_base_224(pretrained=False, **kwargs)

   Swin-S3-B @ 224x224, trained on ImageNet-1k. https://arxiv.org/abs/2111.14725


.. py:function:: create_swin(name, **kwargs)