pyiqa.archs.clipscore_arch
==========================

.. py:module:: pyiqa.archs.clipscore_arch

.. autoapi-nested-parse::

   CLIPScore for reference-free image caption evaluation.

   Reference::

       @inproceedings{hessel2021clipscore,
         title={{CLIPScore:} A Reference-free Evaluation Metric for Image Captioning},
         author={Hessel, Jack and Holtzman, Ari and Forbes, Maxwell and Bras, Ronan Le and Choi, Yejin},
         booktitle={EMNLP},
         year={2021}
       }

   Reference URL: https://github.com/jmhessel/clipscore

   Re-implemented by: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:class:: CLIPScore(backbone='ViT-B/32', w=2.5, prefix='A photo depicts')

   Bases: :py:obj:`torch.nn.Module`

   A PyTorch module for computing image-text similarity scores using the CLIP model.

   :param backbone: The name of the CLIP model backbone to use. Default is 'ViT-B/32'.
   :type backbone: str
   :param w: The weight used to rescale the similarity score. Default is 2.5.
   :type w: float
   :param prefix: The prefix prepended to each caption before computing text features. Default is 'A photo depicts'.
   :type prefix: str

   .. attribute:: clip_model

      The CLIP model used for computing image and text features.

      :type: CLIP

   .. attribute:: prefix

      The prefix prepended to each caption before computing text features.

      :type: str

   .. attribute:: w

      The weight used to rescale the similarity score.

      :type: float

   .. method:: forward(img, caption_list)

      Computes the similarity score between the input image and a list of captions.

   .. py:method:: forward(img, caption_list=None)

      Computes the similarity score between the input image and a list of captions.

      :param img: Input image tensor.
      :type img: torch.Tensor
      :param caption_list: List of captions to compare with the image.
      :type caption_list: list of str

      :returns: The computed similarity scores.
      :rtype: torch.Tensor
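The scoring rule behind ``w`` and ``prefix`` can be sketched as follows. Hessel et al. (2021) define :math:`\mathrm{CLIP\text{-}S}(c, v) = w \cdot \max(\cos(c, v), 0)`, where :math:`c` and :math:`v` are the CLIP embeddings of the (prefixed) caption and the image, and :math:`w = 2.5` matches the default above. The sketch below is a pure-Python illustration on plain lists, not the pyiqa implementation: the names ``cosine_similarity``, ``clip_s`` and ``add_prefix`` are hypothetical, and joining the prefix to the caption with a single space is an assumption.

.. code-block:: python

   import math
   from typing import Sequence


   def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
       """Cosine similarity between two equal-length, non-zero vectors."""
       dot = sum(x * y for x, y in zip(a, b))
       norm_a = math.sqrt(sum(x * x for x in a))
       norm_b = math.sqrt(sum(y * y for y in b))
       return dot / (norm_a * norm_b)


   def clip_s(image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5) -> float:
       """CLIP-S as defined in the paper: w * max(cos(c, v), 0)."""
       # Negative similarities are clipped to zero before rescaling by w.
       return w * max(cosine_similarity(image_emb, text_emb), 0.0)


   def add_prefix(caption: str, prefix: str = "A photo depicts") -> str:
       """Hypothetical helper: prepend the prompt prefix (space-joined, an assumption)."""
       return f"{prefix} {caption}"


   # Toy 3-d vectors standing in for real CLIP features (made-up numbers).
   image_emb = [1.0, 0.0, 0.0]
   text_emb = [1.0, 1.0, 0.0]
   score = clip_s(image_emb, text_emb)  # 2.5 * (1 / sqrt(2)), roughly 1.768

Clipping at zero means a caption whose embedding points away from the image scores exactly 0 rather than a negative value, and ``w`` only rescales the result; it does not change the ranking of captions for a given image.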