pyiqa.data.multiscale_trans_util

Preprocessing utils for Multiscale Transformer

Reference: https://github.com/google-research/google-research/blob/5c622d523c/musiq/model/preprocessing.py

Modified: Chaofeng Chen (https://github.com/chaofengc)

Module Contents

pyiqa.data.multiscale_trans_util.extract_image_patches(x, kernel, stride=1, dilation=1)

Ref: https://stackoverflow.com/a/65886666
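The signature does not say how image borders are handled. As a rough sketch only, assuming TensorFlow-style SAME padding (an assumption; this page does not confirm it), the patch count and padding along one axis could be computed as below. `same_padding_1d` is a hypothetical helper for illustration, not part of pyiqa.

```python
import math


def same_padding_1d(size, kernel, stride=1, dilation=1):
    """Hypothetical helper: patch count and padding for one axis.

    Assumes TF-style SAME padding, i.e. ceil(size / stride) output
    positions, with any extra padding placed after the data.
    """
    n_out = math.ceil(size / stride)
    # Total padding needed so the last window still fits on the axis.
    total = max((n_out - 1) * stride + (kernel - 1) * dilation + 1 - size, 0)
    pad_before = total // 2
    pad_after = total - pad_before
    return n_out, pad_before, pad_after


print(same_padding_1d(224, kernel=32, stride=32))  # (7, 0, 0)
print(same_padding_1d(100, kernel=32, stride=32))  # (4, 14, 14)
```

Under this assumption, a 224-pixel axis yields 7 patches with no padding, while a 100-pixel axis yields 4 patches with 14 pixels of padding on each side.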

pyiqa.data.multiscale_trans_util.resize_preserve_aspect_ratio(image, h, w, longer_side_length)

Aspect-ratio-preserving resizing with tf.image.ResizeMethod.GAUSSIAN.

:param image: The image tensor (n_crops, c, h, w).
:param h: Height of the input image.
:param w: Width of the input image.
:param longer_side_length: The length of the longer side after resizing.

:returns: A tuple of (image after resizing, resized height, resized width).
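To make the returned height and width concrete, here is a minimal sketch of the size arithmetic alone. It assumes the scale factor is `longer_side_length / max(h, w)` and that both sides are rounded to the nearest integer. `resized_shape` is a hypothetical name, and the actual interpolation step is omitted.

```python
def resized_shape(h, w, longer_side_length):
    """Hypothetical helper: output (height, width) after resizing.

    Assumes both sides are scaled by the same ratio so that the longer
    side becomes longer_side_length, then rounded to integers.
    """
    ratio = longer_side_length / max(h, w)
    return round(h * ratio), round(w * ratio)


print(resized_shape(480, 640, 224))  # (168, 224)
print(resized_shape(640, 480, 384))  # (384, 288)
```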

pyiqa.data.multiscale_trans_util.get_hashed_spatial_pos_emb_index(grid_size, count_h, count_w)

Get the hashed spatial position embedding index for each patch. The size H x W is hashed to grid_size x grid_size.

:param grid_size: Grid size G for the hash-based spatial positional embedding.
:param count_h: Number of patches in each row for the image.
:param count_w: Number of patches in each column for the image.

:returns: Hashed position of shape (1, HxW). Each value corresponds to the hashed position index in [0, grid_size x grid_size).
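The hashing can be illustrated with a small pure-Python sketch. It assumes each patch row and column is mapped to a grid cell by `floor(i * grid_size / count)`, which is one possible nearest-neighbor rule; the library may round differently at cell boundaries. `hashed_pos_index` is a hypothetical name, and it returns a flat list of length H x W rather than a tensor of shape (1, HxW).

```python
def hashed_pos_index(grid_size, count_h, count_w):
    """Hypothetical sketch: map a count_h x count_w patch grid onto a
    grid_size x grid_size table and return one flat index per patch."""
    # Assumed rule: patch i along an axis falls into cell floor(i * G / count).
    rows = [i * grid_size // count_h for i in range(count_h)]
    cols = [j * grid_size // count_w for j in range(count_w)]
    # Row-major flattening: index = row_cell * G + col_cell.
    return [r * grid_size + c for r in rows for c in cols]


print(hashed_pos_index(10, 2, 3))  # [0, 3, 6, 50, 53, 56]
```

Every value lies in [0, grid_size x grid_size), so images with different patch counts share the same positional embedding table.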

pyiqa.data.multiscale_trans_util.get_multiscale_patches(image, patch_size=32, patch_stride=32, hse_grid_size=10, longer_side_lengths=[224, 384], max_seq_len_from_original_res=None)

Extracts image patches from the multi-scale representation.

:param image: Input image tensor with shape [n_crops, 3, h, w].
:param patch_size: Patch size.
:param patch_stride: Patch stride.
:param hse_grid_size: Hash-based positional embedding grid size.
:param longer_side_lengths: List of longer-side lengths for each scale in the multi-scale representation.
:param max_seq_len_from_original_res: Maximum number of patches extracted from the original resolution. <0 means use all the patches from the original resolution. None means the original-resolution input is not used.
:returns: A concatenated tensor of (patches, HSE, SCE, input mask). The tensor shape is (n_crops, num_patches, patch_size * patch_size * c + 3).
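The last dimension packs the flattened patch pixels together with three extra values. The sketch below splits one such row, assuming the values are stored along the last axis in the order listed above (pixels, HSE, SCE, input mask). `split_token` is a hypothetical helper, and a plain list stands in for one row of the returned tensor.

```python
def split_token(token, patch_size=32, channels=3):
    """Hypothetical helper: split one row of the output into its parts.

    Assumed layout: patch_size * patch_size * channels pixel values,
    followed by HSE, SCE and the input mask flag.
    """
    n_pix = patch_size * patch_size * channels
    if len(token) != n_pix + 3:
        raise ValueError(f"expected {n_pix + 3} values, got {len(token)}")
    pixels = token[:n_pix]
    hse, sce, mask = token[n_pix:]
    return pixels, hse, sce, mask


# Illustrative row only: zero pixels, HSE index 17, scale 1, mask 1.
row = [0.0] * (32 * 32 * 3) + [17, 1, 1]
pixels, hse, sce, mask = split_token(row)
print(len(row), len(pixels), hse, sce, mask)  # 3075 3072 17 1 1
```

With the default patch_size=32 and three channels, each row therefore holds 32 * 32 * 3 + 3 = 3075 values.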