pyiqa.archs.topiq_swin

Swin Transformer. A PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Code/weights from https://github.com/microsoft/Swin-Transformer, original copyright/license info below

S3 (AutoFormerV2, https://arxiv.org/abs/2111.14725) Swin weights from

Modifications and additions for timm hacked together by / Copyright 2021, Ross Wightman

Module Contents

pyiqa.archs.topiq_swin.default_cfgs[source]
pyiqa.archs.topiq_swin.resize_pos_embed(posemb, posemb_new, num_prefix_tokens=1, gs_new=())[source]
pyiqa.archs.topiq_swin.checkpoint_filter_fn(state_dict, model, adapt_layer_scale=False)[source]

Convert the patch embedding weight from manual patchify + linear projection to conv.

pyiqa.archs.topiq_swin.window_partition(x, window_size: int)[source]
Parameters:
  • x – (B, H, W, C)

  • window_size (int) – window size

Returns:

windows – (num_windows*B, window_size, window_size, C)

pyiqa.archs.topiq_swin.window_reverse(windows, window_size: int, H: int, W: int)[source]
Parameters:
  • windows – (num_windows*B, window_size, window_size, C)

  • window_size (int) – Window size

  • H (int) – Height of image

  • W (int) – Width of image

Returns:

x – (B, H, W, C)
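
The shape contracts of window_partition and window_reverse can be illustrated with a minimal pure-Python sketch. It is a stand-in under stated assumptions, not the library code: it handles a single image with one channel stored as a nested list (the real functions take (B, H, W, C) tensors), the helper names partition_grid and reverse_grid are hypothetical, and windows are assumed to be emitted in row-major block order.

```python
def partition_grid(grid, window_size):
    """Split an H x W nested list into non-overlapping square windows."""
    height, width = len(grid), len(grid[0])
    if height % window_size or width % window_size:
        raise ValueError("H and W must be divisible by window_size")
    windows = []
    # Walk the blocks row by row, left to right (assumed ordering).
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in grid[top:top + window_size]])
    return windows


def reverse_grid(windows, window_size, height, width):
    """Reassemble the windows from partition_grid into an H x W nested list."""
    grid = [[None] * width for _ in range(height)]
    per_row = width // window_size  # number of windows along the width
    for n, window in enumerate(windows):
        top = (n // per_row) * window_size
        left = (n % per_row) * window_size
        for dy, row in enumerate(window):
            grid[top + dy][left:left + window_size] = row
    return grid


# A 4 x 4 grid split into four 2 x 2 windows, then stitched back together.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = partition_grid(grid, window_size=2)
restored = reverse_grid(windows, window_size=2, height=4, width=4)
```

Here len(windows) equals (H / window_size) * (W / window_size), which is the num_windows factor in the documented return shape, and reversing the partition recovers the original grid.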

pyiqa.archs.topiq_swin.get_relative_position_index(win_h, win_w)[source]
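
This function carries no description here. In the usual Swin formulation, every pair of positions inside a window is mapped to an index into a table of (2*win_h - 1) * (2*win_w - 1) learnable biases, based only on the relative offset between the two positions. The pure-Python sketch below shows that mapping; it is assumed, not confirmed, to match what this module computes, and the name relative_position_index is hypothetical.

```python
def relative_position_index(win_h, win_w):
    """Return a (win_h*win_w) x (win_h*win_w) table of bias indices."""
    # Positions inside the window, flattened in row-major order.
    coords = [(i, j) for i in range(win_h) for j in range(win_w)]
    index = []
    for i1, j1 in coords:
        row = []
        for i2, j2 in coords:
            # Shift offsets so they start at 0 instead of -(size - 1).
            dh = i1 - i2 + win_h - 1
            dw = j1 - j2 + win_w - 1
            # Flatten the 2-D offset into a single table index.
            row.append(dh * (2 * win_w - 1) + dw)
        index.append(row)
    return index


table = relative_position_index(2, 2)
```

Pairs with the same relative offset share an index, so the diagonal (offset zero) holds a single repeated value.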
class pyiqa.archs.topiq_swin.WindowAttention(dim, num_heads, head_dim=None, window_size=7, qkv_bias=True, attn_drop=0.0, proj_drop=0.0)[source]

Bases: torch.nn.Module

Window-based multi-head self-attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.

Parameters:
  • dim (int) – Number of input channels.

  • num_heads (int) – Number of attention heads.

  • head_dim (int) – Number of channels per head (dim // num_heads if not set)

  • window_size (int | tuple[int]) – The height and width of the window. Default: 7

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • attn_drop (float, optional) – Dropout ratio of attention weight. Default: 0.0

  • proj_drop (float, optional) – Dropout ratio of output. Default: 0.0

forward(x, mask: torch.Tensor | None = None)[source]
Parameters:
  • x – input features with shape of (num_windows*B, N, C)

  • mask – (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
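
The effect of a (0/-inf) mask can be sketched for a single query row: the mask is added to the attention logits before the softmax, so masked positions end up with zero weight. This is a scalar illustration only; the module works on batched tensors, and masked_softmax is a hypothetical helper name.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores after adding an additive 0 / -inf mask."""
    shifted = [s + m for s, m in zip(scores, mask)]
    peak = max(shifted)  # subtract the max for numerical stability
    # exp(-inf) is 0.0, so masked entries contribute nothing.
    # Assumes at least one position is unmasked.
    exps = [math.exp(v - peak) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([1.0, 1.0, 1.0], [0.0, 0.0, float("-inf")])
```

The third position is masked out, and the remaining weight is shared between the two visible positions.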

class pyiqa.archs.topiq_swin.SwinTransformerBlock(dim, input_resolution, num_heads=4, head_dim=None, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)[source]

Bases: torch.nn.Module

Swin Transformer Block.

Parameters:
  • dim (int) – Number of input channels.

  • input_resolution (tuple[int]) – Input resolution.

  • window_size (int) – Window size.

  • num_heads (int) – Number of attention heads.

  • head_dim (int) – Enforce the number of channels per head

  • shift_size (int) – Shift size for SW-MSA.

  • mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • drop (float, optional) – Dropout rate. Default: 0.0

  • attn_drop (float, optional) – Attention dropout rate. Default: 0.0

  • drop_path (float, optional) – Stochastic depth rate. Default: 0.0

  • act_layer (nn.Module, optional) – Activation layer. Default: nn.GELU

  • norm_layer (nn.Module, optional) – Normalization layer. Default: nn.LayerNorm

forward(x)[source]
class pyiqa.archs.topiq_swin.PatchMerging(input_resolution, dim, out_dim=None, norm_layer=nn.LayerNorm)[source]

Bases: torch.nn.Module

Patch Merging Layer.

Parameters:
  • input_resolution (tuple[int]) – Resolution of input feature.

  • dim (int) – Number of input channels.

  • norm_layer (nn.Module, optional) – Normalization layer. Default: nn.LayerNorm

forward(x)[source]

x – input features with shape (B, H*W, C)
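
As a rough illustration of patch merging, the sketch below concatenates each 2x2 neighbourhood of token vectors along the channel axis, turning H*W tokens of C channels into (H/2)*(W/2) tokens of 4*C channels. It covers a single image in pure Python; the neighbour order is an assumption, merge_patches is a hypothetical name, and the layer's normalization and linear reduction to out_dim are omitted.

```python
def merge_patches(tokens, height, width):
    """Concatenate each 2x2 neighbourhood of token vectors along channels.

    tokens: list of H*W channel vectors in row-major order (one image).
    Returns (H/2)*(W/2) vectors, each with 4*C channels.
    """
    if height % 2 or width % 2:
        raise ValueError("H and W must be even")
    merged = []
    for r in range(0, height, 2):
        for c in range(0, width, 2):
            block = []
            # Assumed order: top-left, bottom-left, top-right, bottom-right.
            for dr, dc in ((0, 0), (1, 0), (0, 1), (1, 1)):
                block.extend(tokens[(r + dr) * width + (c + dc)])
            merged.append(block)
    return merged


tokens = [[float(i)] for i in range(16)]  # 4 x 4 grid of tokens, C = 1
merged = merge_patches(tokens, height=4, width=4)
```

The token count drops by a factor of four while the channel count grows by four, which is why a linear layer is typically applied afterwards to bring the width down.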

class pyiqa.archs.topiq_swin.BasicLayer(dim, out_dim, input_resolution, depth, num_heads=4, head_dim=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=nn.LayerNorm, downsample=None)[source]

Bases: torch.nn.Module

A basic Swin Transformer layer for one stage.

Parameters:
  • dim (int) – Number of input channels.

  • input_resolution (tuple[int]) – Input resolution.

  • depth (int) – Number of blocks.

  • num_heads (int) – Number of attention heads.

  • head_dim (int) – Channels per head (dim // num_heads if not set)

  • window_size (int) – Local window size.

  • mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • drop (float, optional) – Dropout rate. Default: 0.0

  • attn_drop (float, optional) – Attention dropout rate. Default: 0.0

  • drop_path (float | tuple[float], optional) – Stochastic depth rate. Default: 0.0

  • norm_layer (nn.Module, optional) – Normalization layer. Default: nn.LayerNorm

  • downsample (nn.Module | None, optional) – Downsample layer at the end of the layer. Default: None
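
Because drop_path accepts a tuple, each block in a layer can receive its own stochastic depth rate. One common way to produce such tuples is a linear ramp from 0 to the model-level rate across all blocks, sliced per stage. The sketch below shows that scheme; whether this module builds its rates exactly this way is an assumption, and drop_path_schedule is a hypothetical helper.

```python
def drop_path_schedule(drop_path_rate, depths):
    """Return one tuple of per-block drop-path rates for each stage."""
    total = sum(depths)
    # Linear ramp from 0.0 (first block) to drop_path_rate (last block).
    step = drop_path_rate / (total - 1) if total > 1 else 0.0
    rates = [step * i for i in range(total)]
    per_stage, start = [], 0
    for depth in depths:
        per_stage.append(tuple(rates[start:start + depth]))
        start += depth
    return per_stage


schedule = drop_path_schedule(0.1, (2, 2, 6, 2))
```

Each entry of schedule has as many rates as the corresponding stage has blocks, so it could be passed as the drop_path argument of that stage.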

forward(x)[source]
class pyiqa.archs.topiq_swin.SwinTransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, global_pool='avg', embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), head_dim=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=nn.LayerNorm, ape=False, patch_norm=True, weight_init='', **kwargs)[source]

Bases: torch.nn.Module

Swin Transformer

A PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/pdf/2103.14030).

Parameters:
  • img_size (int | tuple(int)) – Input image size. Default 224

  • patch_size (int | tuple(int)) – Patch size. Default: 4

  • in_chans (int) – Number of input image channels. Default: 3

  • num_classes (int) – Number of classes for classification head. Default: 1000

  • embed_dim (int) – Patch embedding dimension. Default: 96

  • depths (tuple(int)) – Depth of each Swin Transformer layer.

  • num_heads (tuple(int)) – Number of attention heads in different layers.

  • head_dim (int | tuple(int)) – Number of channels per head (dim // num_heads if not set)

  • window_size (int) – Window size. Default: 7

  • mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim. Default: 4

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • drop_rate (float) – Dropout rate. Default: 0

  • attn_drop_rate (float) – Attention dropout rate. Default: 0

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.1

  • norm_layer (nn.Module) – Normalization layer. Default: nn.LayerNorm.

  • ape (bool) – If True, add absolute position embedding to the patch embedding. Default: False

  • patch_norm (bool) – If True, add normalization after patch embedding. Default: True
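
To see how these parameters interact, the sketch below derives the token grid and channel width of each stage. It assumes the usual hierarchical layout, in which every stage except the last ends with a patch merging step that halves the resolution and doubles the channels; stage_layout is a hypothetical helper, not part of this module.

```python
def stage_layout(img_size=224, patch_size=4, embed_dim=96,
                 depths=(2, 2, 6, 2), window_size=7):
    """List resolution, width and window count for each stage."""
    side = img_size // patch_size  # tokens per side after patch embedding
    dim = embed_dim
    stages = []
    for i, depth in enumerate(depths):
        stages.append({
            "stage": i,
            "resolution": (side, side),
            "dim": dim,
            "depth": depth,
            "windows_per_side": side // window_size,
        })
        if i < len(depths) - 1:
            # Assumed: patch merging between stages halves H and W, doubles C.
            side //= 2
            dim *= 2
    return stages


layout = stage_layout()
layout_384 = stage_layout(img_size=384, window_size=12)
```

With the default arguments the last stage works on a 7x7 grid, which is exactly one 7x7 window. The 384 input with window_size=12 gives the same number of windows per side at every stage as the 224 input with window_size=7.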

no_weight_decay()[source]
group_matcher(coarse=False)[source]
set_grad_checkpointing(enable=True)[source]
get_classifier()[source]
reset_classifier(num_classes, global_pool=None)[source]
forward_features(x)[source]
forward_head(x, pre_logits: bool = False)[source]
forward(x)[source]
pyiqa.archs.topiq_swin.swin_base_patch4_window12_384(pretrained=False, **kwargs)[source]

Swin-B @ 384x384, pretrained ImageNet-22k, fine tune 1k

pyiqa.archs.topiq_swin.swin_base_patch4_window7_224(pretrained=False, **kwargs)[source]

Swin-B @ 224x224, pretrained ImageNet-22k, fine tune 1k

pyiqa.archs.topiq_swin.swin_large_patch4_window12_384(pretrained=False, **kwargs)[source]

Swin-L @ 384x384, pretrained ImageNet-22k, fine tune 1k

pyiqa.archs.topiq_swin.swin_large_patch4_window7_224(pretrained=False, **kwargs)[source]

Swin-L @ 224x224, pretrained ImageNet-22k, fine tune 1k

pyiqa.archs.topiq_swin.swin_small_patch4_window7_224(pretrained=False, **kwargs)[source]

Swin-S @ 224x224, trained ImageNet-1k

pyiqa.archs.topiq_swin.swin_tiny_patch4_window7_224(pretrained=False, **kwargs)[source]

Swin-T @ 224x224, trained ImageNet-1k

pyiqa.archs.topiq_swin.swin_base_patch4_window12_384_in22k(pretrained=False, **kwargs)[source]

Swin-B @ 384x384, trained ImageNet-22k

pyiqa.archs.topiq_swin.swin_base_patch4_window7_224_in22k(pretrained=False, **kwargs)[source]

Swin-B @ 224x224, trained ImageNet-22k

pyiqa.archs.topiq_swin.swin_large_patch4_window12_384_in22k(pretrained=False, **kwargs)[source]

Swin-L @ 384x384, trained ImageNet-22k

pyiqa.archs.topiq_swin.swin_large_patch4_window7_224_in22k(pretrained=False, **kwargs)[source]

Swin-L @ 224x224, trained ImageNet-22k

pyiqa.archs.topiq_swin.swin_s3_tiny_224(pretrained=False, **kwargs)[source]

Swin-S3-T @ 224x224, trained ImageNet-1k. https://arxiv.org/abs/2111.14725

pyiqa.archs.topiq_swin.swin_s3_small_224(pretrained=False, **kwargs)[source]

Swin-S3-S @ 224x224, trained ImageNet-1k. https://arxiv.org/abs/2111.14725

pyiqa.archs.topiq_swin.swin_s3_base_224(pretrained=False, **kwargs)[source]

Swin-S3-B @ 224x224, trained ImageNet-1k. https://arxiv.org/abs/2111.14725

pyiqa.archs.topiq_swin.create_swin(name, **kwargs)[source]