pyiqa.archs.maclip_arch
=======================

.. py:module:: pyiqa.archs.maclip_arch

.. autoapi-nested-parse::

   Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment

   @article{liao2025beyond,
     title={Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment},
     author={Liao, Zhicheng and Wu, Dongxu and Shi, Zhenshan and Mai, Sijie and Zhu, Hanwei and Zhu, Lingyu and Jiang, Yuncheng and Chen, Baoliang},
     journal={arXiv preprint arXiv:2511.09948},
     year={2025}
   }

   Accepted by AAAI 2026.

   Reference:
       - arXiv link: https://arxiv.org/abs/2511.09948
       - Official GitHub: https://github.com/zhix000/MA-CLIP


Module Contents
---------------

.. py:class:: CustomCLIP(backbone: str, device='cpu')

   Bases: :py:obj:`torch.nn.Module`


   Thin wrapper around CLIP image/text encoders used by MACLIP.

   :param backbone: CLIP backbone identifier.
   :type backbone: str
   :param device: Device string used when initializing the model.
   :type device: str


   .. py:method:: forward(image, text, pos_embedding=False, text_features=None)

      Encode image/text and return logits and unnormalized image features.

      :param image: Image tensor with shape ``(N, 3, H, W)``.
      :type image: torch.Tensor
      :param text: Tokenized text tensor.
      :type text: torch.Tensor
      :param pos_embedding: Whether to enable the positional embedding branch
                            in the custom CLIP visual encoder.
      :type pos_embedding: bool
      :param text_features: Optional precomputed text features.
      :type text_features: torch.Tensor | None

      :returns: ``(logits_per_image, logits_per_text, image_features_org)``.
      :rtype: tuple[torch.Tensor, torch.Tensor, torch.Tensor]
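
      As a hedged illustration of these return values, the sketch below follows the
      usual CLIP convention. Features are L2-normalized, cosine similarities are scaled
      by a logit scale (assumed here to be 100), and ``logits_per_text`` is the
      transpose of ``logits_per_image``. The helper names ``l2_norm`` and
      ``clip_logits`` and the plain-list representation are hypothetical; the wrapper
      itself works on tensors. Normalization discards each image feature's length, so
      the unnormalized ``image_features_org`` is the only output where the magnitude
      survives.

      .. code-block:: python

         import math


         def l2_norm(v):
             """Euclidean (L2) norm of a feature vector."""
             return math.sqrt(sum(x * x for x in v))


         def clip_logits(image_features, text_features, logit_scale=100.0):
             """Sketch of CLIP-style logits (assumed convention, not the pyiqa code).

             image_features: N unnormalized image feature vectors.
             text_features: T text feature vectors.
             Returns (logits_per_image [N x T], logits_per_text [T x N], image_features).
             """
             # Scale every vector to unit length so dot products become cosines.
             img = [[x / l2_norm(v) for x in v] for v in image_features]
             txt = [[x / l2_norm(t) for x in t] for t in text_features]
             logits_per_image = [
                 [logit_scale * sum(a * b for a, b in zip(i, t)) for t in txt]
                 for i in img
             ]
             # Text-to-image logits are the transpose of image-to-text logits.
             logits_per_text = [list(row) for row in zip(*logits_per_image)]
             # The original (unnormalized) features keep the magnitude information.
             return logits_per_image, logits_per_text, image_features


         feats = [[3.0, 4.0], [6.0, 8.0]]  # same direction, different lengths
         prompts = [[1.0, 0.0], [0.0, 1.0]]
         per_img, per_txt, org = clip_logits(feats, prompts)
         print(per_img[0] == per_img[1])    # True: cosine ignores magnitude
         print([l2_norm(v) for v in org])   # [5.0, 10.0]

      Both feature vectors point in the same direction, so their logits match even
      though their norms differ by a factor of two. A cosine-only score cannot see
      this difference.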


.. py:class:: MACLIP(model_type='clipiqa', backbone='RN50', pos_embedding=False)

   Bases: :py:obj:`torch.nn.Module`


   Magnitude-Aware CLIP for no-reference image quality assessment.

   :param model_type: Output type identifier.
   :type model_type: str
   :param backbone: CLIP backbone name.
   :type backbone: str
   :param pos_embedding: Whether to enable visual positional embedding in
                         CLIP image encoding.
   :type pos_embedding: bool

   .. rubric:: Notes

   The current implementation runs on CUDA and is intended for inference.


   .. py:method:: preprocess(img)

      Normalize the image and build a set of overlapping 224x224 patches.

      :param img: Input tensor with shape ``(1, 3, H, W)``.
      :type img: torch.Tensor

      :returns: Patch tensor with shape ``(P, 3, 224, 224)``.
      :rtype: torch.Tensor
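
      The patch layout is not specified here. The minimal sketch below assumes a
      hypothetical stride of 112 pixels (50% overlap) plus one extra window flush with
      the bottom and right borders. It computes the crop boxes, and the number of boxes
      gives ``P``. Each box would then be cropped as ``img[:, :, top:bottom, left:right]``
      and the crops concatenated along the batch dimension. The normalization step is
      omitted, and ``patch_starts`` and ``patch_boxes`` are hypothetical names.

      .. code-block:: python

         def patch_starts(length, patch=224, stride=112):
             """Start offsets of overlapping windows along one axis (assumed stride)."""
             if length < patch:
                 raise ValueError("this sketch assumes each side is at least `patch` pixels")
             starts = list(range(0, length - patch + 1, stride))
             # Add a final window flush with the border so no pixels are skipped.
             if starts[-1] != length - patch:
                 starts.append(length - patch)
             return starts


         def patch_boxes(height, width, patch=224, stride=112):
             """(top, left, bottom, right) boxes; their count is P in (P, 3, 224, 224)."""
             return [
                 (top, left, top + patch, left + patch)
                 for top in patch_starts(height, patch, stride)
                 for left in patch_starts(width, patch, stride)
             ]


         boxes = patch_boxes(384, 512)
         print(len(boxes))  # 12

      Under these assumptions, a 384x512 input yields three rows and four columns of
      windows, so ``P = 12``.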


   .. py:method:: box_cox(x, lam=0.5, epsilon=1e-06)

      Apply a Box-Cox-like transform after per-sample standardization.
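
      The Box-Cox transform maps a positive value ``x`` to ``(x ** lam - 1) / lam``, or
      to ``log(x)`` when ``lam == 0``. Standardized values can be negative, so some shift
      is needed before the power transform. The sketch below assumes a shift that makes
      the smallest value equal to ``epsilon``. It also uses ``epsilon`` to guard against
      a zero standard deviation. The shift, both uses of ``epsilon``, and the name
      ``box_cox_like`` are assumptions. The function handles one sample given as a
      plain list.

      .. code-block:: python

         import math


         def box_cox_like(values, lam=0.5, epsilon=1e-6):
             """Hypothetical per-sample reading of MACLIP.box_cox (not the pyiqa code).

             1. Standardize to zero mean and unit variance (epsilon guards a zero std).
             2. Shift so the smallest value equals epsilon (assumed; Box-Cox needs x > 0).
             3. Apply (x ** lam - 1) / lam, or log(x) when lam == 0.
             """
             n = len(values)
             mean = sum(values) / n
             std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
             z = [(v - mean) / (std + epsilon) for v in values]
             shift = min(z)
             positive = [v - shift + epsilon for v in z]
             if lam == 0:
                 return [math.log(v) for v in positive]
             return [(v ** lam - 1.0) / lam for v in positive]


         out = box_cox_like([1.0, 2.0, 4.0, 8.0])
         print([round(v, 3) for v in out])

      Every step is monotonically increasing, so the ranking of values within a sample
      is preserved. Only their spacing changes: with ``lam < 1``, the transform
      compresses the upper tail.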


   .. py:method:: fusion(cos, norm, base_cos=1.0, base_norm=0.6, alpha=1.0)

      Fuse the cosine and magnitude cues with adaptive softmax weighting.

      :param cos: Cosine-similarity based quality scores.
      :type cos: torch.Tensor
      :param norm: Magnitude-cue scores.
      :type norm: torch.Tensor
      :param base_cos: Base weight prior for the cosine cue.
      :type base_cos: float
      :param base_norm: Base weight prior for the magnitude cue.
      :type base_norm: float
      :param alpha: Adaptive weight adjustment factor.
      :type alpha: float

      :returns: Fused score, cosine weight, and magnitude weight.
      :rtype: tuple[torch.Tensor, torch.Tensor, torch.Tensor]
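
      The exact adaptation rule is not documented here. The sketch below shows one
      hypothetical reading. It treats ``base_cos`` and ``base_norm`` as weight logits
      and shifts each by ``alpha`` times its cue value. A two-way softmax then turns the
      logits into weights that always sum to one, and the fused score is the weighted
      sum of the two cues. With ``alpha=0`` the weights reduce to the fixed prior
      ``softmax([base_cos, base_norm])``. The function name ``fusion_sketch`` is
      hypothetical.

      .. code-block:: python

         import math


         def fusion_sketch(cos, norm, base_cos=1.0, base_norm=0.6, alpha=1.0):
             """Hypothetical adaptive fusion of per-sample cue scores (not the pyiqa code).

             Each cue's weight logit is its base prior plus alpha times the cue value
             (an assumed rule); a two-way softmax turns the logits into weights.
             """
             fused, w_cos, w_norm = [], [], []
             for c, m in zip(cos, norm):
                 logit_c = base_cos + alpha * c
                 logit_m = base_norm + alpha * m
                 # Subtract the max before exponentiating for numerical stability.
                 top = max(logit_c, logit_m)
                 e_c, e_m = math.exp(logit_c - top), math.exp(logit_m - top)
                 wc, wm = e_c / (e_c + e_m), e_m / (e_c + e_m)
                 w_cos.append(wc)
                 w_norm.append(wm)
                 fused.append(wc * c + wm * m)
             return fused, w_cos, w_norm


         fused, w_c, w_n = fusion_sketch([0.7, 0.2], [0.4, 0.9])
         print([round(w, 3) for w in w_c])

      Because the weights form a convex combination, the fused score always lies
      between the two cue values. The ``alpha`` term only moves the score toward
      whichever cue is currently larger.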


   .. py:method:: forward(x, box_lam=0.5, base_cos=1.0, base_norm=0.6, alpha=1.0)

      Compute the MACLIP quality score for a single image.

      :param x: Input image tensor with shape ``(1, 3, H, W)``.
      :type x: torch.Tensor
      :param box_lam: Lambda for the Box-Cox transform.
      :type box_lam: float
      :param base_cos: Base weight for the cosine cue.
      :type base_cos: float
      :param base_norm: Base weight for the magnitude cue.
      :type base_norm: float
      :param alpha: Adaptive fusion factor.
      :type alpha: float

      :returns: Scalar quality score.
      :rtype: torch.Tensor
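
      ``model_type='clipiqa'`` suggests a CLIP-IQA style cosine cue. In that scheme, a
      two-way softmax over the logits for an antonym prompt pair such as
      ``"Good photo."`` / ``"Bad photo."`` gives the probability of the positive prompt.
      The sketch below assumes that prompt pair. It also assumes the per-patch scores
      are mean-pooled into the scalar that ``forward`` returns. The magnitude cue,
      Box-Cox step, and fusion are left out, and the helper names are hypothetical.

      .. code-block:: python

         import math


         def antonym_prompt_score(logit_good, logit_bad):
             """P("good") from a two-way softmax over an antonym prompt pair."""
             top = max(logit_good, logit_bad)  # subtract the max for numerical stability
             e_good = math.exp(logit_good - top)
             e_bad = math.exp(logit_bad - top)
             return e_good / (e_good + e_bad)


         def image_score(patch_logits):
             """Average per-patch scores into one scalar (mean pooling is assumed)."""
             scores = [antonym_prompt_score(g, b) for g, b in patch_logits]
             return sum(scores) / len(scores)


         # (good, bad) logits for three hypothetical patches of one image.
         print(round(image_score([(22.0, 20.0), (21.0, 21.0), (19.0, 23.0)]), 3))  # 0.466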