pyiqa.archs.clip_model
Module Contents
- pyiqa.archs.clip_model.available_models() -> List[str]
Returns the names of the available CLIP models.
- pyiqa.archs.clip_model.load(name: str, device: Union[str, torch.device] = 'cuda' if torch.cuda.is_available() else 'cpu', jit: bool = False, download_root: str = None)
Load a CLIP model.
- Parameters:
name (str) – A model name listed by clip.available_models(), or the path to a model checkpoint containing the state_dict
device (Union[str, torch.device]) – The device on which to place the loaded model
jit (bool) – Whether to load the optimized JIT model or the more hackable non-JIT model (default)
download_root (str) – Path to download the model files to; by default, “~/.cache/clip”
- Returns:
model (torch.nn.Module) – The CLIP model
preprocess (Callable[[PIL.Image], torch.Tensor]) – A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
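A minimal usage sketch for the two functions above, based only on the documented signatures. The helper name load_clip_for_inference and the model name "RN50" are assumptions made for illustration; check available_models() for the names your installed version actually provides.

```python
import os


def load_clip_for_inference(name="RN50", device="cpu"):
    """Load a CLIP model plus its preprocessing transform (illustrative helper)."""
    # Imported lazily so that defining this helper does not require pyiqa.
    from pyiqa.archs import clip_model

    # `name` may be a listed model name or a checkpoint path, so accept either
    # and fail early with the valid names otherwise.
    known = clip_model.available_models()
    if name not in known and not os.path.isfile(name):
        raise ValueError(f"Unknown CLIP model {name!r}; available models: {known}")

    # jit=False selects the non-JIT model, which is the documented default.
    model, preprocess = clip_model.load(name, device=device, jit=False)
    return model.eval(), preprocess
```

Calling load_clip_for_inference() may download weights on first use (to download_root, or “~/.cache/clip” by default). The returned preprocess transform is then applied to a PIL image before the tensor is passed to the model.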
- class pyiqa.archs.clip_model.AttentionPool2d(spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None)
Bases:
torch.nn.Module
- class pyiqa.archs.clip_model.ModifiedResNet(layers, output_dim, heads, input_resolution=224, width=64)
Bases:
torch.nn.Module
A ResNet class that is similar to torchvision’s but contains the following changes:
  - There are now 3 “stem” convolutions as opposed to 1, with an average pool instead of a max pool.
  - Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1.
  - The final pooling layer is a QKV attention instead of an average pool.
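The anti-aliasing change can be illustrated with a small standalone module. This is a sketch of the general technique only, not the code used inside ModifiedResNet; the class name AntiAliasedDownsample and its arguments are hypothetical.

```python
import torch
from torch import nn


class AntiAliasedDownsample(nn.Module):
    """Downsample with an average pool, then apply a stride-1 convolution."""

    def __init__(self, in_channels, out_channels, stride):
        super().__init__()
        # The average pool carries the spatial reduction that a strided
        # convolution would otherwise perform; it is skipped when stride == 1.
        self.pool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
        # The convolution itself always runs with stride 1.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              padding=1, stride=1, bias=False)

    def forward(self, x):
        return self.conv(self.pool(x))


x = torch.randn(1, 8, 32, 32)
y = AntiAliasedDownsample(8, 16, stride=2)(x)
print(y.shape)  # torch.Size([1, 16, 16, 16])
```

Averaging neighbouring pixels before the convolution acts as a low-pass filter, which is why this arrangement is described as anti-aliasing compared with letting the convolution skip pixels directly.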
- class pyiqa.archs.clip_model.LayerNorm
Bases:
torch.nn.LayerNorm
Subclass torch’s LayerNorm to handle fp16.
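The description above does not say how fp16 is handled. One common approach, assumed here for illustration, is to run the normalization in float32 and cast the result back to the input dtype. The class name Fp32LayerNorm is hypothetical and this sketch may differ from the library's own code.

```python
import torch
from torch import nn


class Fp32LayerNorm(nn.LayerNorm):
    """LayerNorm that computes in float32 and returns the input's dtype."""

    def forward(self, x):
        orig_dtype = x.dtype
        # Upcast so the mean/variance statistics are computed in full precision.
        out = super().forward(x.to(torch.float32))
        # Cast back so downstream half-precision layers see the expected dtype.
        return out.to(orig_dtype)


ln = Fp32LayerNorm(4)
half_input = torch.randn(2, 4).half()
half_output = ln(half_input)
print(half_output.dtype)  # torch.float16
```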
- class pyiqa.archs.clip_model.ResidualAttentionBlock(d_model: int, n_head: int, attn_mask: torch.Tensor = None)
Bases:
torch.nn.Module
- class pyiqa.archs.clip_model.Transformer(width: int, layers: int, heads: int, attn_mask: torch.Tensor = None)
Bases:
torch.nn.Module
- class pyiqa.archs.clip_model.VisionTransformer(input_resolution: int, patch_size: int, width: int, layers: int, heads: int, output_dim: int)
Bases:
torch.nn.Module
- class pyiqa.archs.clip_model.CLIP(embed_dim: int, image_resolution: int, vision_layers: Tuple[int, int, int, int] | int, vision_width: int, vision_patch_size: int, context_length: int, vocab_size: int, transformer_width: int, transformer_heads: int, transformer_layers: int)
Bases:
torch.nn.Module