pyiqa.archs.qalign_arch

Q-Align: All-in-one Foundation Model for visual scoring.

Reference:

@article{wu2023qalign,
  title={Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels},
  author={Wu, Haoning and Zhang, Zicheng and Zhang, Weixia and Chen, Chaofeng and Li, Chunyi and Liao, Liang and Wang, Annan and Zhang, Erli and Sun, Wenxiu and Yan, Qiong and Min, Xiongkuo and Zhai, Guangtai and Lin, Weisi},
  journal={arXiv preprint arXiv:2312.17090},
  year={2023},
  institution={Nanyang Technological University and Shanghai Jiao Tong University and Sensetime Research},
  note={Equal Contribution by Wu, Haoning and Zhang, Zicheng. Project Lead by Wu, Haoning. Corresponding Authors: Zhai, Guangtai and Lin, Weisi.}
}

Reference url: https://github.com/Q-Future/Q-Align

Module Contents

pyiqa.archs.qalign_arch.expand2square(pil_img)

Pad an image to a square canvas, filling the added border with the CLIP mean color.

Parameters: pil_img (PIL.Image.Image) – Input image.
Returns: Square padded image.
Return type: PIL.Image.Image
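
The padding geometry can be sketched without any imaging library. This is a minimal illustration, assuming the image is centered on a square whose side equals its longer edge and that the fill color is the OpenAI CLIP per-channel mean scaled to 0-255. The names `CLIP_MEAN`, `clip_mean_rgb`, and `square_canvas` are hypothetical and not part of pyiqa; the actual helper may differ in detail.

```python
# Assumption: OpenAI CLIP per-channel mean (RGB, range 0-1).
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)


def clip_mean_rgb(mean=CLIP_MEAN):
    """Scale a 0-1 per-channel mean to an 8-bit RGB fill color."""
    return tuple(int(c * 255) for c in mean)


def square_canvas(width, height):
    """Return (side, (left, top)) for centering a width x height image
    on a square canvas whose side is the longer edge."""
    side = max(width, height)
    left = (side - width) // 2
    top = (side - height) // 2
    return side, (left, top)
```

With Pillow, the result could then be built as `canvas = Image.new(pil_img.mode, (side, side), clip_mean_rgb())` followed by `canvas.paste(pil_img, (left, top))`. An image that is already square gets an offset of `(0, 0)` and an unchanged size.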

class pyiqa.archs.qalign_arch.QAlign(dtype='fp16')

Bases: torch.nn.Module

Q-Align multimodal visual scoring model.

Parameters: dtype (str) – Inference precision mode. Supported values are 'fp16', '4bit', and '8bit'.

Notes

The current preprocessing path supports only a batch size of 1.

preprocess(x)

Convert an input tensor into the tensor format produced by the Q-Align CLIP image processor.

Parameters: x (torch.Tensor) – Input image tensor with shape (1, 3, H, W).
Returns: Processed image tensor suitable for Q-Align.
Return type: torch.Tensor
Raises: AssertionError – If batch size is not 1.
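
Because a batch larger than 1 raises an AssertionError, a caller holding several images has to score them one at a time. The following is a minimal sketch of such a loop, not part of pyiqa. The helper name `score_one_by_one` is hypothetical, and it assumes only that `batch` supports `len()` and slicing along its first axis, which holds for `torch.Tensor` as well as for plain lists.

```python
def score_one_by_one(model, batch, **kwargs):
    """Call `model` once per item in `batch`, keeping a batch axis of size 1.

    `model` is any callable; extra keyword arguments are forwarded unchanged.
    """
    scores = []
    for i in range(len(batch)):
        # Slicing (rather than indexing) keeps the leading batch dimension,
        # so a (N, 3, H, W) tensor yields (1, 3, H, W) slices.
        single = batch[i:i + 1]
        scores.append(model(single, **kwargs))
    return scores
```

The function returns a Python list with one entry per image, leaving any stacking or averaging to the caller.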

forward(x, task_='quality', input_='image')

Run Q-Align scoring.

Parameters:

x (torch.Tensor) – Input tensor with shape (1, 3, H, W).
task_ (str) – Task prompt. Common options are 'quality' and 'aesthetic'.
input_ (str) – Input type. Currently only 'image' is supported.

Returns:

Predicted task score.

Return type:

torch.Tensor

Raises:

NotImplementedError – If input_ is not 'image'.
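
The Q-Align paper derives a scalar score from the model's probabilities over discrete text-defined rating levels. A minimal sketch of that idea follows, assuming five levels weighted from 5 down to 1 and a softmax over their logits. The level names, the weights, and the function name `levels_to_score` are assumptions for illustration and may not match what `forward` does internally.

```python
import math

# Assumed rating levels and their numeric weights, best to worst.
LEVELS = ("excellent", "good", "fair", "poor", "bad")
WEIGHTS = (5.0, 4.0, 3.0, 2.0, 1.0)


def levels_to_score(logits, weights=WEIGHTS):
    """Softmax the per-level logits and return the probability-weighted score."""
    if len(logits) != len(weights):
        raise ValueError("expected one logit per rating level")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return sum(w * e / total for w, e in zip(weights, exps))
```

Under these assumptions the score always lies between 1 and 5: equal logits give the midpoint 3, and the score approaches 5 as the logit for the top level dominates.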