pyiqa.archs.topiq_swin
======================

.. py:module:: pyiqa.archs.topiq_swin

.. autoapi-nested-parse::

   Swin Transformer

   A PyTorch impl of: `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - https://arxiv.org/pdf/2103.14030

   Code/weights from https://github.com/microsoft/Swin-Transformer, original copyright/license info below

   S3 (AutoFormerV2, https://arxiv.org/abs/2111.14725) Swin weights from - https://github.com/microsoft/Cream/tree/main/AutoFormerV2

   Modifications and additions for timm hacked together by / Copyright 2021, Ross Wightman


Module Contents
---------------

.. py:data:: default_cfgs

.. py:function:: resize_pos_embed(posemb, posemb_new, num_prefix_tokens=1, gs_new=())

.. py:function:: checkpoint_filter_fn(state_dict, model, adapt_layer_scale=False)

   Convert patch embedding weight from manual patchify + linear proj to conv.


.. py:function:: window_partition(x, window_size: int)

   :param x: (B, H, W, C)
   :param window_size: Window size
   :type window_size: int

   :returns: (num_windows*B, window_size, window_size, C)
   :rtype: windows


.. py:function:: window_reverse(windows, window_size: int, H: int, W: int)

   :param windows: (num_windows*B, window_size, window_size, C)
   :param window_size: Window size
   :type window_size: int
   :param H: Height of image
   :type H: int
   :param W: Width of image
   :type W: int

   :returns: (B, H, W, C)
   :rtype: x


.. py:function:: get_relative_position_index(win_h, win_w)

.. py:class:: WindowAttention(dim, num_heads, head_dim=None, window_size=7, qkv_bias=True, attn_drop=0.0, proj_drop=0.0)

   Bases: :py:obj:`torch.nn.Module`

   Window-based multi-head self-attention (W-MSA) module with relative position bias.
   It supports both shifted and non-shifted windows.

   :param dim: Number of input channels.
   :type dim: int
   :param num_heads: Number of attention heads.
   :type num_heads: int
   :param head_dim: Number of channels per head (dim // num_heads if not set)
   :type head_dim: int
   :param window_size: The height and width of the window.
   :type window_size: tuple[int]
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool, optional
   :param attn_drop: Dropout ratio of attention weight. Default: 0.0
   :type attn_drop: float, optional
   :param proj_drop: Dropout ratio of output. Default: 0.0
   :type proj_drop: float, optional


   .. py:method:: forward(x, mask: Optional[torch.Tensor] = None)

      :param x: input features with shape of (num_windows*B, N, C)
      :param mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None


.. py:class:: SwinTransformerBlock(dim, input_resolution, num_heads=4, head_dim=None, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`

   Swin Transformer Block.

   :param dim: Number of input channels.
   :type dim: int
   :param input_resolution: Input resolution.
   :type input_resolution: tuple[int]
   :param window_size: Window size.
   :type window_size: int
   :param num_heads: Number of attention heads.
   :type num_heads: int
   :param head_dim: Enforce the number of channels per head
   :type head_dim: int
   :param shift_size: Shift size for SW-MSA.
   :type shift_size: int
   :param mlp_ratio: Ratio of mlp hidden dim to embedding dim.
   :type mlp_ratio: float
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool, optional
   :param drop: Dropout rate. Default: 0.0
   :type drop: float, optional
   :param attn_drop: Attention dropout rate. Default: 0.0
   :type attn_drop: float, optional
   :param drop_path: Stochastic depth rate. Default: 0.0
   :type drop_path: float, optional
   :param act_layer: Activation layer. Default: nn.GELU
   :type act_layer: nn.Module, optional
   :param norm_layer: Normalization layer. Default: nn.LayerNorm
   :type norm_layer: nn.Module, optional


   .. py:method:: forward(x)


.. py:class:: PatchMerging(input_resolution, dim, out_dim=None, norm_layer=nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`

   Patch Merging Layer.

   :param input_resolution: Resolution of input feature.
   :type input_resolution: tuple[int]
   :param dim: Number of input channels.
   :type dim: int
   :param norm_layer: Normalization layer. Default: nn.LayerNorm
   :type norm_layer: nn.Module, optional


   .. py:method:: forward(x)

      x: B, H*W, C


.. py:class:: BasicLayer(dim, out_dim, input_resolution, depth, num_heads=4, head_dim=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=nn.LayerNorm, downsample=None)

   Bases: :py:obj:`torch.nn.Module`

   A basic Swin Transformer layer for one stage.

   :param dim: Number of input channels.
   :type dim: int
   :param input_resolution: Input resolution.
   :type input_resolution: tuple[int]
   :param depth: Number of blocks.
   :type depth: int
   :param num_heads: Number of attention heads.
   :type num_heads: int
   :param head_dim: Channels per head (dim // num_heads if not set)
   :type head_dim: int
   :param window_size: Local window size.
   :type window_size: int
   :param mlp_ratio: Ratio of mlp hidden dim to embedding dim.
   :type mlp_ratio: float
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool, optional
   :param drop: Dropout rate. Default: 0.0
   :type drop: float, optional
   :param attn_drop: Attention dropout rate. Default: 0.0
   :type attn_drop: float, optional
   :param drop_path: Stochastic depth rate. Default: 0.0
   :type drop_path: float | tuple[float], optional
   :param norm_layer: Normalization layer. Default: nn.LayerNorm
   :type norm_layer: nn.Module, optional
   :param downsample: Downsample layer at the end of the layer. Default: None
   :type downsample: nn.Module | None, optional


   .. py:method:: forward(x)


.. py:class:: SwinTransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, global_pool='avg', embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), head_dim=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=nn.LayerNorm, ape=False, patch_norm=True, weight_init='', **kwargs)

   Bases: :py:obj:`torch.nn.Module`

   Swin Transformer

   A PyTorch impl of: `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - https://arxiv.org/pdf/2103.14030

   :param img_size: Input image size. Default: 224
   :type img_size: int | tuple(int)
   :param patch_size: Patch size. Default: 4
   :type patch_size: int | tuple(int)
   :param in_chans: Number of input image channels. Default: 3
   :type in_chans: int
   :param num_classes: Number of classes for classification head. Default: 1000
   :type num_classes: int
   :param embed_dim: Patch embedding dimension. Default: 96
   :type embed_dim: int
   :param depths: Depth of each Swin Transformer layer.
   :type depths: tuple(int)
   :param num_heads: Number of attention heads in different layers.
   :type num_heads: tuple(int)
   :param head_dim:
   :type head_dim: int, tuple(int)
   :param window_size: Window size. Default: 7
   :type window_size: int
   :param mlp_ratio: Ratio of mlp hidden dim to embedding dim. Default: 4
   :type mlp_ratio: float
   :param qkv_bias: If True, add a learnable bias to query, key, value. Default: True
   :type qkv_bias: bool
   :param drop_rate: Dropout rate. Default: 0
   :type drop_rate: float
   :param attn_drop_rate: Attention dropout rate. Default: 0
   :type attn_drop_rate: float
   :param drop_path_rate: Stochastic depth rate. Default: 0.1
   :type drop_path_rate: float
   :param norm_layer: Normalization layer. Default: nn.LayerNorm.
   :type norm_layer: nn.Module
   :param ape: If True, add absolute position embedding to the patch embedding. Default: False
   :type ape: bool
   :param patch_norm: If True, add normalization after patch embedding. Default: True
   :type patch_norm: bool


   .. py:method:: no_weight_decay()


   .. py:method:: group_matcher(coarse=False)


   .. py:method:: set_grad_checkpointing(enable=True)


   .. py:method:: get_classifier()


   .. py:method:: reset_classifier(num_classes, global_pool=None)


   .. py:method:: forward_features(x)


   .. py:method:: forward_head(x, pre_logits: bool = False)


   .. py:method:: forward(x)


.. py:function:: swin_base_patch4_window12_384(pretrained=False, **kwargs)

   Swin-B @ 384x384, pretrained ImageNet-22k, fine tune 1k


.. py:function:: swin_base_patch4_window7_224(pretrained=False, **kwargs)

   Swin-B @ 224x224, pretrained ImageNet-22k, fine tune 1k


.. py:function:: swin_large_patch4_window12_384(pretrained=False, **kwargs)

   Swin-L @ 384x384, pretrained ImageNet-22k, fine tune 1k


.. py:function:: swin_large_patch4_window7_224(pretrained=False, **kwargs)

   Swin-L @ 224x224, pretrained ImageNet-22k, fine tune 1k


.. py:function:: swin_small_patch4_window7_224(pretrained=False, **kwargs)

   Swin-S @ 224x224, trained ImageNet-1k


.. py:function:: swin_tiny_patch4_window7_224(pretrained=False, **kwargs)

   Swin-T @ 224x224, trained ImageNet-1k


.. py:function:: swin_base_patch4_window12_384_in22k(pretrained=False, **kwargs)

   Swin-B @ 384x384, trained ImageNet-22k


.. py:function:: swin_base_patch4_window7_224_in22k(pretrained=False, **kwargs)

   Swin-B @ 224x224, trained ImageNet-22k


.. py:function:: swin_large_patch4_window12_384_in22k(pretrained=False, **kwargs)

   Swin-L @ 384x384, trained ImageNet-22k


.. py:function:: swin_large_patch4_window7_224_in22k(pretrained=False, **kwargs)

   Swin-L @ 224x224, trained ImageNet-22k


.. py:function:: swin_s3_tiny_224(pretrained=False, **kwargs)

   Swin-S3-T @ 224x224, ImageNet-1k. https://arxiv.org/abs/2111.14725


.. py:function:: swin_s3_small_224(pretrained=False, **kwargs)

   Swin-S3-S @ 224x224, trained ImageNet-1k. https://arxiv.org/abs/2111.14725


.. py:function:: swin_s3_base_224(pretrained=False, **kwargs)

   Swin-S3-B @ 224x224, trained ImageNet-1k. https://arxiv.org/abs/2111.14725


.. py:function:: create_swin(name, **kwargs)
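
The shape contracts documented for ``window_partition`` and ``window_reverse`` can be illustrated without PyTorch. The sketch below is not the library's tensor implementation: it works on nested Python lists indexed ``[b][h][w]``, where each element stands in for a channel vector, and the helper names ``partition_windows_sketch`` and ``reverse_windows_sketch`` are hypothetical. It assumes that ``H`` and ``W`` are divisible by ``window_size`` and that windows are ordered batch-first, then row-major within each image.

```python
def partition_windows_sketch(x, window_size):
    """Split x[b][h][w] into a flat list of window_size x window_size windows.

    Mirrors the documented contract (B, H, W, C) -> (num_windows*B, ws, ws, C),
    with the channel axis represented by whatever object sits at x[b][h][w].
    """
    B, H, W = len(x), len(x[0]), len(x[0][0])
    if H % window_size or W % window_size:
        raise ValueError("H and W must be divisible by window_size")
    windows = []
    for b in range(B):
        for top in range(0, H, window_size):
            for left in range(0, W, window_size):
                # Slice window_size rows, then window_size columns from each row.
                windows.append(
                    [row[left:left + window_size] for row in x[b][top:top + window_size]]
                )
    return windows


def reverse_windows_sketch(windows, window_size, H, W):
    """Inverse of partition_windows_sketch: rebuild x[b][h][w] from the windows."""
    cols = W // window_size
    per_image = (H // window_size) * cols
    B = len(windows) // per_image
    x = [[[None] * W for _ in range(H)] for _ in range(B)]
    for n, win in enumerate(windows):
        b, k = divmod(n, per_image)   # which image, which window inside it
        r, c = divmod(k, cols)        # window row and column in the grid
        for i in range(window_size):
            for j in range(window_size):
                x[b][r * window_size + i][c * window_size + j] = win[i][j]
    return x


# Two 4x4 "images"; each element records its own (b, h, w) coordinates.
x = [[[(b, h, w) for w in range(4)] for h in range(4)] for b in range(2)]
wins = partition_windows_sketch(x, 2)
restored = reverse_windows_sketch(wins, 2, 4, 4)
```

Here ``len(wins)`` is ``num_windows * B = 4 * 2``, and ``restored`` equals ``x``, which is the round-trip property the two documented shapes imply.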
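
For orientation, the default ``SwinTransformer`` arguments (``img_size=224``, ``patch_size=4``, ``embed_dim=96``, ``depths=(2, 2, 6, 2)``) imply a token grid and channel width for each stage. The sketch below assumes the scheme described in the Swin Transformer paper, in which patch merging between stages halves the grid side and doubles the channel count; this page does not confirm the exact placement of downsampling in this module, and the function name ``swin_stage_shapes`` is hypothetical.

```python
def swin_stage_shapes(img_size=224, patch_size=4, embed_dim=96, depths=(2, 2, 6, 2)):
    """Return (grid_h, grid_w, channels) seen by the blocks of each stage.

    Assumption: every stage after the first is preceded by a patch-merging
    step that halves the grid side and doubles the channel count.
    """
    grid = img_size // patch_size   # tokens per side after patch embedding
    dim = embed_dim
    shapes = []
    for stage in range(len(depths)):
        if stage > 0:
            grid //= 2
            dim *= 2
        shapes.append((grid, grid, dim))
    return shapes


default_shapes = swin_stage_shapes()
large_input_shapes = swin_stage_shapes(img_size=384)
print(default_shapes)
```

Under these assumptions the default grid sides are 56, 28, 14 and 7, each a multiple of ``window_size=7``. For a 384x384 input the sides are 96, 48, 24 and 12, each a multiple of 12, which is consistent with the ``window12_384`` factory names.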