pyiqa.archs.clipscore_arch
==========================

.. py:module:: pyiqa.archs.clipscore_arch

.. autoapi-nested-parse::

   CLIPScore for reference-free image-caption matching.

   Reference:
       @inproceedings{hessel2021clipscore,
       title={{CLIPScore:} A Reference-free Evaluation Metric for Image Captioning},
       author={Hessel, Jack and Holtzman, Ari and Forbes, Maxwell and Bras, Ronan Le and Choi, Yejin},
       booktitle={EMNLP},
       year={2021}
       }

   Reference url: https://github.com/jmhessel/clipscore
   Re-implemented by: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:class:: CLIPScore(backbone='ViT-B/32', w=2.5, prefix='A photo depicts')

   Bases: :py:obj:`torch.nn.Module`


   Compute CLIPScore between images and their paired captions.

   The implementation follows the original CLIPScore formulation and returns a
   non-negative image-text similarity score:

   .. math::

       s = w \cdot \max(\cos(f_{img}, f_{txt}), 0)
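
   As a rough illustration of the formula above, the following dependency-free
   sketch scores a single pair of feature vectors given as plain Python
   sequences. The helper name ``clipscore_pair`` is hypothetical and is not part
   of ``pyiqa``; the class itself operates on CLIP features held in tensors.

   .. code-block:: python

      import math


      def clipscore_pair(img_feat, txt_feat, w=2.5):
          """Hypothetical helper: w * max(cos(img_feat, txt_feat), 0)."""
          # Cosine similarity = dot product divided by the product of the norms.
          dot = sum(a * b for a, b in zip(img_feat, txt_feat))
          norm_img = math.sqrt(sum(a * a for a in img_feat))
          norm_txt = math.sqrt(sum(b * b for b in txt_feat))
          cos = dot / (norm_img * norm_txt)
          # Negative similarities are clipped to zero before scaling by w.
          return w * max(cos, 0.0)


      print(clipscore_pair([3.0, 4.0], [3.0, 4.0]))   # aligned features -> 2.5
      print(clipscore_pair([1.0, 0.0], [-1.0, 0.0]))  # opposed features -> 0.0

   With the default ``w=2.5``, scores therefore fall in the range ``[0, 2.5]``.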

   :param backbone: CLIP backbone name accepted by :mod:`clip`, for example
                    ``"ViT-B/32"``.
   :type backbone: str
   :param w: Multiplicative scaling factor applied to the cosine similarity
             after it is clipped at zero.
   :type w: float
   :param prefix: Text prefix prepended to each caption before tokenization.
   :type prefix: str
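
   To make the role of ``prefix`` concrete, here is a minimal sketch of how
   captions might be combined with it before tokenization. It assumes the prefix
   and caption are joined by a single space, which this page does not specify;
   ``add_prefix`` is a hypothetical helper and not part of ``pyiqa``.

   .. code-block:: python

      def add_prefix(captions, prefix="A photo depicts"):
          """Hypothetical helper: prepend the prefix to every caption."""
          # Assumption: exactly one space separates the prefix and the caption.
          return [f"{prefix.rstrip()} {caption.strip()}" for caption in captions]


      print(add_prefix(["a dog on grass", "a city street"]))
      # ['A photo depicts a dog on grass', 'A photo depicts a city street']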

   .. rubric:: Example

   >>> import torch
   >>> from pyiqa.archs.clipscore_arch import CLIPScore
   >>> metric = CLIPScore(backbone='ViT-B/32')
   >>> img = torch.rand(2, 3, 224, 224)
   >>> score = metric(img, ['a dog on grass', 'a city street'])
   >>> score.shape
   torch.Size([2])


   .. py:method:: forward(img, caption_list=None)

      Compute CLIPScore for each image-caption pair.

      :param img: Input tensor with shape ``(N, 3, H, W)``.
      :type img: torch.Tensor
      :param caption_list: List of ``N`` captions, one per image in ``img``.
                           Required despite the ``None`` default.
      :type caption_list: list[str] | None

      :returns: Score tensor with shape ``(N,)``.
      :rtype: torch.Tensor

      :raises AssertionError: If ``caption_list`` is not provided.