pyiqa.archs.maclip_arch

Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment

@article{liao2025beyond,
  title={Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment},
  author={Liao, Zhicheng and Wu, Dongxu and Shi, Zhenshan and Mai, Sijie and Zhu, Hanwei and Zhu, Lingyu and Jiang, Yuncheng and Chen, Baoliang},
  journal={arXiv preprint arXiv:2511.09948},
  year={2025}
}

Accepted by AAAI 2026.


Module Contents

class pyiqa.archs.maclip_arch.CustomCLIP(backbone: str, device='cpu')

Bases: torch.nn.Module

Thin wrapper around CLIP image/text encoders used by MACLIP.

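Parameters:

backbone (str) – CLIP backbone name.
device (str) – Device on which the model is placed. Defaults to 'cpu'.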

forward(image, text, pos_embedding=False, text_features=None)

Encode image/text and return logits and unnormalized image features.

Parameters:

image (torch.Tensor) – Image tensor with shape (N, 3, H, W).
text (torch.Tensor) – Tokenized text tensor.
pos_embedding (bool) – Whether to enable the positional-embedding branch in the custom CLIP visual encoder.
text_features (torch.Tensor | None) – Optional precomputed text features.

Returns:

(logits_per_image, logits_per_text, image_features_org).

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
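The return tuple can be illustrated with a minimal sketch. It assumes the standard CLIP recipe (L2-normalize both feature sets, then scale their cosine similarities) and uses random tensors in place of the real encoders. The function name `clip_logits_sketch` and the `logit_scale` value of 100 are assumptions for illustration, not taken from the pyiqa source.

```python
import torch
import torch.nn.functional as F


def clip_logits_sketch(image_features, text_features, logit_scale=100.0):
    """Hypothetical helper mirroring the documented return tuple."""
    # Keep the raw (unnormalized) image features, presumably so that their
    # norm stays available as a magnitude cue.
    image_features_org = image_features
    # Cosine similarity is the dot product of L2-normalized features.
    img = F.normalize(image_features, dim=-1)
    txt = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale * img @ txt.t()   # shape (N, T)
    logits_per_text = logits_per_image.t()           # shape (T, N)
    return logits_per_image, logits_per_text, image_features_org


torch.manual_seed(0)
image_features = torch.randn(4, 512)  # stand-in for encoder output, N = 4
text_features = torch.randn(2, 512)   # stand-in for two encoded prompts
lpi, lpt, feats = clip_logits_sketch(image_features, text_features)
print(lpi.shape, lpt.shape, feats.shape)
```

Passing precomputed `text_features` to the real `forward` would plausibly skip the text encoder in the same way this sketch takes the features as input.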

class pyiqa.archs.maclip_arch.MACLIP(model_type='clipiqa', backbone='RN50', pos_embedding=False)

Bases: torch.nn.Module

Magnitude-Aware CLIP for no-reference image quality assessment.

Parameters:

model_type (str) – Output type identifier.
backbone (str) – CLIP backbone name.
pos_embedding (bool) – Whether to enable visual positional embedding in CLIP image encoding.

Notes

The current implementation runs on CUDA and is intended for inference.

preprocess(img)

Normalize the image and build a set of overlapping 224×224 patches.

Parameters:

img (torch.Tensor) – Input tensor with shape (1, 3, H, W).

Returns:

Patch tensor with shape (P, 3, 224, 224).

Return type:

torch.Tensor
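The shape contract above can be reproduced with `Tensor.unfold`. The following is a rough sketch, not the pyiqa implementation: the stride of 112 and the use of the standard CLIP normalization constants are assumptions, and `preprocess_sketch` is a hypothetical name.

```python
import torch

# Standard CLIP normalization statistics (assumed to apply here).
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)


def preprocess_sketch(img, patch=224, stride=112):
    """Normalize a (1, 3, H, W) image and cut it into overlapping patches."""
    x = (img - CLIP_MEAN) / CLIP_STD
    # Slide a window over height, then width: (1, 3, nH, nW, patch, patch).
    tiles = x.unfold(2, patch, stride).unfold(3, patch, stride)
    # Move the grid dimensions next to the batch dimension and flatten them.
    return tiles.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, patch, patch)


img = torch.rand(1, 3, 448, 448)
patches = preprocess_sketch(img)
print(patches.shape)  # torch.Size([9, 3, 224, 224])
```

With a stride smaller than the patch size, neighbouring patches overlap. This sketch requires H and W to be at least 224 and drops trailing pixels that do not fill a whole window.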

box_cox(x, lam=0.5, epsilon=1e-06)

Apply a Box-Cox-like transform after per-sample standardization.
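As an illustration of the idea, the sketch below standardizes each row and then applies the classic Box-Cox formula (y^lam - 1) / lam. How the library handles the negative values that standardization produces is not documented here, so the shift to strictly positive values, the choice of the last dimension as "per sample", and the name `box_cox_sketch` are all assumptions.

```python
import torch


def box_cox_sketch(x, lam=0.5, epsilon=1e-6):
    """Hypothetical Box-Cox-style transform applied row by row."""
    # Per-sample standardization along the last dimension (assumed axis).
    mean = x.mean(dim=-1, keepdim=True)
    std = x.std(dim=-1, keepdim=True)
    z = (x - mean) / (std + epsilon)
    # Box-Cox needs positive inputs; shifting by the row minimum is an
    # assumption made for this sketch only.
    shifted = z - z.min(dim=-1, keepdim=True).values + epsilon
    if lam == 0:
        return torch.log(shifted)
    return (shifted ** lam - 1.0) / lam


x = torch.tensor([[1.0, 2.0, 4.0, 8.0]])
y = box_cox_sketch(x)
print(y)
```

For positive `lam` the transform is monotonic, so it preserves the ranking of the inputs while compressing large values.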

fusion(cos, norm, base_cos=1.0, base_norm=0.6, alpha=1.0)

Fuse cosine and magnitude cues with adaptive softmax weighting.

Parameters:

Returns:

Fused score, cosine weight, and magnitude weight.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
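One plausible reading of "adaptive softmax weighting" is sketched below: each cue receives a logit made of its base value plus an `alpha`-scaled term that depends on the cue itself, and a softmax over the two logits yields weights that sum to one. The exact logit formula and the name `fusion_sketch` are assumptions for illustration; the pyiqa implementation may differ.

```python
import torch


def fusion_sketch(cos, norm, base_cos=1.0, base_norm=0.6, alpha=1.0):
    """Hypothetical fusion of a cosine cue and a magnitude cue."""
    # Assumed logits: a fixed base preference plus a term that grows with
    # the absolute strength of each cue.
    logits = torch.stack(
        [base_cos + alpha * cos.abs(), base_norm + alpha * norm.abs()], dim=-1
    )
    weights = torch.softmax(logits, dim=-1)
    w_cos, w_norm = weights[..., 0], weights[..., 1]
    # Convex combination of the two cues.
    fused = w_cos * cos + w_norm * norm
    return fused, w_cos, w_norm


cos = torch.tensor([0.2, -0.5])
norm = torch.tensor([0.1, 0.9])
fused, w_cos, w_norm = fusion_sketch(cos, norm)
print(fused, w_cos, w_norm)
```

Because the weights come from a softmax, the fused score always lies between the two input cues, and the larger default `base_cos` biases the result toward the cosine cue when both cues are equally strong.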

forward(x, box_lam=0.5, base_cos=1.0, base_norm=0.6, alpha=1.0)

Compute MACLIP score.

Parameters:

Returns:

Scalar quality score.

Return type:

torch.Tensor