pyiqa.archs.clipscore_arch
CLIPScore for no-reference image-caption matching.
- Reference:
@inproceedings{hessel2021clipscore, title={{CLIPScore:} A Reference-free Evaluation Metric for Image Captioning}, author={Hessel, Jack and Holtzman, Ari and Forbes, Maxwell and Bras, Ronan Le and Choi, Yejin}, booktitle={EMNLP}, year={2021} }
Reference URL: https://github.com/jmhessel/clipscore
Re-implemented by: Chaofeng Chen (https://github.com/chaofengc)
Module Contents
- class pyiqa.archs.clipscore_arch.CLIPScore(backbone='ViT-B/32', w=2.5, prefix='A photo depicts')
Bases: torch.nn.Module
A PyTorch module for computing image-text similarity scores using the CLIP model.
- Parameters:
backbone (str) – The name of the CLIP model backbone to use. Default is ‘ViT-B/32’.
w (float) – The weight to apply to the similarity score. Default is 2.5.
prefix (str) – The prefix to add to each caption when computing text features. Default is ‘A photo depicts’.
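The roles of `w` and `prefix` can be illustrated with a minimal sketch of the score defined in the referenced paper, `w * max(cos(image, text), 0)`. This is not the module's actual code. Plain Python lists stand in for CLIP embeddings, the helper names `build_prompt` and `clip_score` are hypothetical, and joining the prefix and caption with a single space is an assumption.

```python
import math


def build_prompt(caption, prefix="A photo depicts"):
    """Prepend the prefix to a caption before text encoding.

    Assumption: prefix and caption are joined with exactly one space.
    """
    return f"{prefix.rstrip()} {caption.strip()}"


def clip_score(image_emb, text_emb, w=2.5):
    """Return w * max(cosine_similarity, 0) for one image/caption pair.

    image_emb and text_emb are stand-ins for CLIP feature vectors.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    cosine = dot / (norm_image * norm_text)
    # Negative similarities are clipped to zero before weighting.
    return w * max(cosine, 0.0)


if __name__ == "__main__":
    print(build_prompt("a dog on a beach"))
    print(clip_score([1.0, 0.0], [1.0, 1.0]))
```

With the default `w=2.5`, the score of a pair ranges from 0 (orthogonal or opposed embeddings) to 2.5 (identical direction).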
- forward(img, caption_list=None)
Computes the similarity score between the input image and a list of captions.
- Parameters:
img (torch.Tensor) – Input image tensor.
caption_list (list of str) – List of captions to compare with the image.
- Returns:
The computed similarity scores.
- Return type:
torch.Tensor
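The input/output contract of `forward` can be sketched without PyTorch. The sketch below assumes that captions are paired with images one-to-one and that one score is returned per pair; the documentation above does not state this explicitly. The function `score_batch` and its error handling are hypothetical, and lists of floats stand in for already-encoded CLIP features.

```python
import math


def _unit(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def score_batch(image_embs, caption_embs=None, w=2.5):
    """Score N image embeddings against N caption embeddings.

    Assumption: the i-th caption belongs to the i-th image, so the
    result holds one score per pair.
    """
    if caption_embs is None:
        # Mirrors the caption_list=None default: captions must be supplied.
        raise ValueError("caption embeddings are required")
    if len(image_embs) != len(caption_embs):
        raise ValueError("need exactly one caption per image")

    scores = []
    for image, caption in zip(image_embs, caption_embs):
        cosine = sum(a * b for a, b in zip(_unit(image), _unit(caption)))
        scores.append(w * max(cosine, 0.0))
    return scores


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 2.0]]
    captions = [[3.0, 0.0], [0.0, -1.0]]
    print(score_batch(images, captions))
```

In the real module the embeddings would come from the CLIP image and text encoders selected by `backbone`, and the result would be a `torch.Tensor` rather than a list.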