pyiqa.archs.maclip_arch
Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment
- @article{liao2025beyond,
  title={Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment},
  author={Liao, Zhicheng and Wu, Dongxu and Shi, Zhenshan and Mai, Sijie and Zhu, Hanwei and Zhu, Lingyu and Jiang, Yuncheng and Chen, Baoliang},
  journal={arXiv preprint arXiv:2511.09948},
  year={2025}
}
Accepted by AAAI 2026.
- Reference:
arXiv link: https://arxiv.org/abs/2511.09948
Official GitHub: https://github.com/zhix000/MA-CLIP
Module Contents
- class pyiqa.archs.maclip_arch.CustomCLIP(backbone: str, device='cpu')
Bases: torch.nn.Module
Thin wrapper around the CLIP image and text encoders used by MACLIP.
- Parameters:
backbone (str) – CLIP backbone identifier.
device (str) – Device string used when initializing the model.
- forward(image, text, pos_embedding=False, text_features=None)
Encode image/text and return logits and unnormalized image features.
- Parameters:
image (torch.Tensor) – Image tensor with shape (N, 3, H, W).
text (torch.Tensor) – Tokenized text tensor.
pos_embedding (bool) – Whether to enable the positional embedding branch in the custom CLIP visual encoder.
text_features (torch.Tensor | None) – Optional precomputed text features.
- Returns:
(logits_per_image, logits_per_text, image_features_org).
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
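The return value pairs cosine-based logits with the unnormalized image features. The following minimal sketch, in plain Python rather than PyTorch, shows why both are useful: a scaled cosine logit is unchanged when a feature vector is rescaled, whereas the feature norm is not. The helpers `l2_norm` and `cosine_logit` and the value `logit_scale=100.0` are assumptions made for illustration, not part of `CustomCLIP`.

```python
import math

def l2_norm(vec):
    """Euclidean length of a feature vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))

def cosine_logit(image_feat, text_feat, logit_scale=100.0):
    """Scaled cosine similarity between one image and one text feature.

    logit_scale=100.0 is an assumed value used only for this illustration.
    """
    dot = sum(a * b for a, b in zip(image_feat, text_feat))
    return logit_scale * dot / (l2_norm(image_feat) * l2_norm(text_feat))

feat = [3.0, 4.0]
text = [1.0, 0.0]
scaled = [2.0 * v for v in feat]   # same direction, twice the magnitude

# Cosine logits ignore magnitude, so both features receive the same logit...
same = math.isclose(cosine_logit(feat, text), cosine_logit(scaled, text))
# ...while the norm of the unnormalized features still tells them apart.
norms = (l2_norm(feat), l2_norm(scaled))
```

Keeping the unnormalized features therefore preserves a magnitude signal that the cosine logits discard by construction.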
- class pyiqa.archs.maclip_arch.MACLIP(model_type='clipiqa', backbone='RN50', pos_embedding=False)
Bases: torch.nn.Module
Magnitude-Aware CLIP for no-reference image quality assessment.
- Parameters:
model_type (str) – Output type identifier.
backbone (str) – CLIP backbone name.
pos_embedding (bool) – Whether to enable visual positional embedding in CLIP image encoding.
Notes
The current implementation runs on CUDA and is intended for inference.
- preprocess(img)
Normalize the image and build a set of overlapping 224x224 patches.
- Parameters:
img (torch.Tensor) – Input tensor with shape (1, 3, H, W).
- Returns:
Patch tensor with shape (P, 3, 224, 224).
- Return type:
torch.Tensor
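To make the patch count P concrete, the sketch below computes the top-left corners of a sliding-window grid of 224x224 patches. The stride and border handling used by `preprocess` are not stated here, so the `patch_origins` helper, `stride=112`, and the extra border-aligned window are all assumptions for illustration only.

```python
def patch_origins(height, width, patch=224, stride=112):
    """Return (top, left) corners of overlapping patches covering the image.

    Hypothetical helper: stride=112 is an assumed value, not MACLIP's.
    """
    if height < patch or width < patch:
        raise ValueError("image must be at least patch x patch")

    def starts(size):
        # Regular steps, plus a final window flush with the border if needed
        pos = list(range(0, size - patch + 1, stride))
        if pos[-1] != size - patch:
            pos.append(size - patch)
        return pos

    return [(top, left) for top in starts(height) for left in starts(width)]

# A 448x336 input under these assumptions yields a 3x2 grid of patches
origins = patch_origins(448, 336)
```

Under these assumptions, P equals `len(origins)`, and each patch would be the crop `img[:, :, top:top + 224, left:left + 224]`.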
- box_cox(x, lam=0.5, epsilon=1e-06)
Apply Box-Cox-like transform after per-sample standardization.
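For orientation, the sketch below shows one way a Box-Cox-style power transform can follow standardization, written in plain Python on a list of floats. The classical Box-Cox transform needs positive inputs, so the sketch shifts the standardized values before applying it. That shift, and the `box_cox_like` helper itself, are assumptions; the method's actual handling of negative values may differ.

```python
import math

def box_cox_like(values, lam=0.5, epsilon=1e-6):
    """Standardize a list of floats, then apply a Box-Cox transform.

    Assumption: standardized values are shifted to be strictly positive
    before the power transform; MACLIP's exact handling may differ.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var) + epsilon            # epsilon avoids division by zero
    z = [(v - mean) / std for v in values]    # per-sample standardization

    shift = -min(z) + epsilon                 # make every entry strictly positive
    pos = [v + shift for v in z]
    if lam == 0:
        return [math.log(p) for p in pos]     # Box-Cox limit as lam -> 0
    return [(p ** lam - 1.0) / lam for p in pos]

out = box_cox_like([1.0, 2.0, 4.0, 8.0])
```

With `lam` below 1 the transform compresses large values more than small ones while preserving their order.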
- fusion(cos, norm, base_cos=1.0, base_norm=0.6, alpha=1.0)
Fuse cosine and magnitude cues with adaptive softmax weighting.
- Parameters:
cos (torch.Tensor) – Cosine-similarity based quality scores.
norm (torch.Tensor) – Magnitude-cue scores.
base_cos (float) – Base weight prior for cosine cue.
base_norm (float) – Base weight prior for magnitude cue.
alpha (float) – Adaptive weight adjustment factor.
- Returns:
Fused score, cosine weight, and magnitude weight.
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
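The sketch below illustrates the general idea of softmax-weighted fusion of two cues on scalar inputs. The rule for forming the logits (base prior plus `alpha` times the cue's absolute value) and the `fuse` helper are assumptions chosen to show the mechanism, not the formula implemented by `fusion`.

```python
import math

def fuse(cos, norm, base_cos=1.0, base_norm=0.6, alpha=1.0):
    """Combine two scalar cues with softmax weights.

    Assumption: each logit is the base prior plus alpha times the cue's
    absolute value. This rule is illustrative, not MACLIP's formula.
    """
    logit_cos = base_cos + alpha * abs(cos)
    logit_norm = base_norm + alpha * abs(norm)
    m = max(logit_cos, logit_norm)            # subtract max for numerical stability
    e_cos = math.exp(logit_cos - m)
    e_norm = math.exp(logit_norm - m)
    w_cos = e_cos / (e_cos + e_norm)
    w_norm = e_norm / (e_cos + e_norm)
    # Same return order as documented: fused score, cosine weight, magnitude weight
    return w_cos * cos + w_norm * norm, w_cos, w_norm

fused, w_cos, w_norm = fuse(0.7, 0.4)
```

In this sketch the two weights always sum to 1, and setting `alpha=0.0` makes them depend only on the base priors.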
- forward(x, box_lam=0.5, base_cos=1.0, base_norm=0.6, alpha=1.0)
Compute MACLIP score.
- Parameters:
x (torch.Tensor) – Input image tensor with shape (1, 3, H, W).
box_lam (float) – Lambda for the Box-Cox transform.
base_cos (float) – Base weight for cosine cue.
base_norm (float) – Base weight for magnitude cue.
alpha (float) – Adaptive fusion factor.
- Returns:
Scalar quality score.
- Return type:
torch.Tensor