pyiqa.archs.clipscore_arch
==========================

.. py:module:: pyiqa.archs.clipscore_arch

.. autoapi-nested-parse::

   CLIPScore for reference-free image-caption matching.

   Reference::

       @inproceedings{hessel2021clipscore,
         title={{CLIPScore:} A Reference-free Evaluation Metric for Image Captioning},
         author={Hessel, Jack and Holtzman, Ari and Forbes, Maxwell and Bras, Ronan Le and Choi, Yejin},
         booktitle={EMNLP},
         year={2021}
       }

   | Reference URL: https://github.com/jmhessel/clipscore
   | Re-implemented by: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:class:: CLIPScore(backbone='ViT-B/32', w=2.5, prefix='A photo depicts')

   Bases: :py:obj:`torch.nn.Module`


   A PyTorch module for computing image-text similarity scores using the CLIP model.

   :param backbone: Name of the CLIP model backbone to load. Default is ``'ViT-B/32'``.
   :type backbone: str
   :param w: Scaling weight applied to the similarity score (``w`` in the CLIPScore paper). Default is ``2.5``.
   :type w: float
   :param prefix: Text prepended to each caption before text features are computed. Default is ``'A photo depicts'``.
   :type prefix: str
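
   The role of ``w`` can be illustrated with the score defined in the CLIPScore paper, ``w * max(cos(c, v), 0)``, where ``c`` and ``v`` are the caption and image embeddings. The following is a minimal sketch of that formula in plain Python, not this module's code: ``clip_score`` is a hypothetical helper, the short lists stand in for CLIP embeddings, and it is an assumption that the module follows the paper's formula exactly.

   .. code-block:: python

      import math


      def clip_score(image_emb, text_emb, w=2.5):
          """Hypothetical helper: w * max(cos(image_emb, text_emb), 0)."""
          dot = sum(a * b for a, b in zip(image_emb, text_emb))
          norm_i = math.sqrt(sum(a * a for a in image_emb))
          norm_t = math.sqrt(sum(b * b for b in text_emb))
          cosine = dot / (norm_i * norm_t)
          # Negative similarities are clipped to zero before scaling by w.
          return w * max(cosine, 0.0)


      print(clip_score([1.0, 0.0], [1.0, 0.0]))   # same direction: 2.5
      print(clip_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
      print(clip_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction, clipped: 0.0

   With the default ``w=2.5``, the score in this sketch ranges from 0 to 2.5.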

   .. attribute:: clip_model

      The CLIP model used for computing image and text features.

      :type: CLIP

   .. attribute:: prefix

      The prefix to add to each caption when computing text features.

      :type: str

   .. attribute:: w

      The weight to apply to the similarity score.

      :type: float

   .. py:method:: forward(img, caption_list=None)

      Computes the similarity score between the input image and a list of captions.

      :param img: Input image tensor.
      :type img: torch.Tensor
      :param caption_list: List of captions to compare with the image. Defaults to ``None``.
      :type caption_list: list of str, optional

      :returns: The computed similarity scores.
      :rtype: torch.Tensor
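
      The class-level ``prefix`` is added to every entry of ``caption_list`` before text features are computed. A minimal sketch of that step follows, assuming the prefix and caption are joined by a single space (the joining logic is not shown on this page); ``build_prompts`` is a hypothetical helper, not part of ``pyiqa``.

      .. code-block:: python

         def build_prompts(caption_list, prefix='A photo depicts'):
             """Hypothetical helper: prepend the prefix to every caption."""
             # Assumption: prefix and caption are separated by one space.
             return [f'{prefix} {caption}' for caption in caption_list]


         prompts = build_prompts(['a dog on a beach', 'a city at night'])
         print(prompts)

      Each resulting prompt would then be tokenized and encoded by the CLIP text encoder before being compared with the image features.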