pyiqa.archs.clip_model
======================

.. py:module:: pyiqa.archs.clip_model


Module Contents
---------------

.. py:function:: available_models() -> List[str]

   Return the names of the available CLIP models.


.. py:function:: load(name: str, device: Union[str, torch.device] = 'cuda' if torch.cuda.is_available() else 'cpu', jit: bool = False, download_root: str = None)

   Load a CLIP model.

   :param name: A model name listed by :py:func:`available_models`, or the path to a model checkpoint containing the state_dict.
   :type name: str
   :param device: The device on which to place the loaded model.
   :type device: Union[str, torch.device]
   :param jit: Whether to load the optimized JIT model or the more hackable non-JIT model (default).
   :type jit: bool
   :param download_root: Path to download the model files to; by default, "~/.cache/clip" is used.
   :type download_root: str

   :returns: * **model** (*torch.nn.Module*) -- The CLIP model
             * **preprocess** (*Callable[[PIL.Image], torch.Tensor]*) -- A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
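
A minimal usage sketch for ``load``, based only on the signature and return values documented above. The helper name ``load_first_clip_model`` is hypothetical, and the import is deferred so that defining the helper does not itself require ``pyiqa``. Calling it may download model weights.

```python
def load_first_clip_model(device="cpu"):
    """Load the first model reported by available_models() onto `device`."""
    # Deferred import: pyiqa is only needed when the helper is actually called.
    from pyiqa.archs.clip_model import available_models, load

    # Pick a name from the documented listing instead of hard-coding one.
    name = available_models()[0]

    # load() is documented to return the model and its preprocessing transform.
    model, preprocess = load(name, device=device, jit=False)
    model.eval()  # inference mode for the returned torch.nn.Module
    return name, model, preprocess
```

The returned ``preprocess`` callable is documented to turn a PIL image into a tensor the model accepts, so a typical next step would be ``preprocess(image).unsqueeze(0)`` to build a batch of one.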


.. py:class:: Bottleneck(inplanes, planes, stride=1)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x: torch.Tensor)


.. py:class:: AttentionPool2d(spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x, return_token=False, pos_embedding=False)


.. py:class:: ModifiedResNet(layers, output_dim, heads, input_resolution=224, width=64)

   Bases: :py:obj:`torch.nn.Module`


   A ResNet class that is similar to torchvision's but contains the following changes:

   - There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
   - Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1.
   - The final pooling layer is a QKV attention instead of an average pool.


   .. py:method:: forward_features(x, return_token=False, pos_embedding=False)


   .. py:method:: forward(x, return_token=False, pos_embedding=False)
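
The anti-aliasing change listed for ``ModifiedResNet`` can be illustrated with a small sketch. It assumes the strided convolution is replaced by an average pool followed by a stride-1 convolution; ``antialiased_downsample`` is a hypothetical helper and not part of this module, and the exact layer layout in ``Bottleneck`` may differ.

```python
import torch
from torch import nn


def antialiased_downsample(in_channels: int, out_channels: int, stride: int) -> nn.Sequential:
    """Downsample by average pooling first, then apply a stride-1 1x1 convolution."""
    # The average pool does the spatial reduction, so the conv never skips pixels.
    pool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
    return nn.Sequential(
        pool,
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False),
        nn.BatchNorm2d(out_channels),
    )


block = antialiased_downsample(8, 16, stride=2).eval()
x = torch.randn(1, 8, 16, 16)
with torch.no_grad():
    y = block(x)  # spatial size is halved, channel count goes from 8 to 16
```

Pooling before the convolution low-pass filters the feature map, which is the usual motivation for preferring it over a plain strided convolution.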


.. py:class:: LayerNorm

   Bases: :py:obj:`torch.nn.LayerNorm`


   Subclass torch's LayerNorm to handle fp16.


   .. py:method:: forward(x: torch.Tensor)
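
One common way to make layer normalization safe for fp16 inputs is to compute in float32 and cast the result back. The sketch below assumes that approach; ``HalfSafeLayerNorm`` is a hypothetical name used for illustration, and the actual ``LayerNorm.forward`` in this module may differ in detail.

```python
import torch
from torch import nn


class HalfSafeLayerNorm(nn.LayerNorm):
    """LayerNorm that normalizes in float32 and returns the input's dtype."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        orig_dtype = x.dtype
        # Upcast so the mean/variance reduction does not lose precision in fp16.
        out = super().forward(x.to(torch.float32))
        # Cast back so callers see the same dtype they passed in.
        return out.to(orig_dtype)


ln = HalfSafeLayerNorm(8)
x = torch.randn(2, 4, 8).half()
y = ln(x)
```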


.. py:class:: QuickGELU

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x: torch.Tensor)
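
``QuickGELU`` is undocumented here. Assuming it follows the original OpenAI CLIP code, it computes ``x * sigmoid(1.702 * x)``, a cheap approximation of GELU. The scalar, standard-library-only sketch below compares that formula with the exact GELU; the function names are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quick_gelu(x: float) -> float:
    """Sigmoid-based GELU approximation: x * sigmoid(1.702 * x)."""
    return x * _sigmoid(1.702 * x)


def exact_gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Largest gap between the two on a small grid of inputs in [-3, 3].
max_gap = max(abs(quick_gelu(v / 10) - exact_gelu(v / 10)) for v in range(-30, 31))
```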


.. py:class:: ResidualAttentionBlock(d_model: int, n_head: int, attn_mask: torch.Tensor = None)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: attention(x: torch.Tensor)


   .. py:method:: forward(x: torch.Tensor)


.. py:class:: Transformer(width: int, layers: int, heads: int, attn_mask: torch.Tensor = None)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x: torch.Tensor)


.. py:class:: VisionTransformer(input_resolution: int, patch_size: int, width: int, layers: int, heads: int, output_dim: int)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x: torch.Tensor, return_token=False, pos_embedding=False)


.. py:class:: CLIP(embed_dim: int, image_resolution: int, vision_layers: Union[Tuple[int, int, int, int], int], vision_width: int, vision_patch_size: int, context_length: int, vocab_size: int, transformer_width: int, transformer_heads: int, transformer_layers: int)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: initialize_parameters()


   .. py:method:: build_attention_mask()


   .. py:property:: dtype


   .. py:method:: encode_image(image, pos_embedding)


   .. py:method:: encode_text(text)


   .. py:method:: forward(image, text, pos_embedding=False, text_features=None)


.. py:function:: convert_weights(model: torch.nn.Module)

   Convert applicable model parameters to fp16
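
A minimal sketch of what converting "applicable" parameters to fp16 can look like. It assumes that only convolution and linear layers are converted while normalization layers stay in float32; ``convert_weights_sketch`` is a hypothetical stand-in, and the real ``convert_weights`` may cover additional parameter types.

```python
import torch
from torch import nn


def convert_weights_sketch(model: nn.Module) -> None:
    """Cast conv and linear parameters to fp16 in place; leave other layers alone."""

    def _to_half(layer: nn.Module) -> None:
        if isinstance(layer, (nn.Conv1d, nn.Conv2d, nn.Linear)):
            layer.weight.data = layer.weight.data.half()
            if layer.bias is not None:
                layer.bias.data = layer.bias.data.half()

    # Module.apply visits every submodule, including the model itself.
    model.apply(_to_half)


toy = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
convert_weights_sketch(toy)
```

After the call, the ``Linear`` layer holds fp16 weights while the ``LayerNorm`` parameters remain float32.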


.. py:function:: build_model(state_dict: dict)