pyiqa.data.multiscale_trans_util
================================

.. py:module:: pyiqa.data.multiscale_trans_util

.. autoapi-nested-parse::

   Preprocessing utils for Multiscale Transformer

   Reference: https://github.com/google-research/google-research/blob/5c622d523c/musiq/model/preprocessing.py

   Modified: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:function:: extract_image_patches(x, kernel, stride=1, dilation=1)

   Ref: https://stackoverflow.com/a/65886666


.. py:function:: resize_preserve_aspect_ratio(image, h, w, longer_side_length)

   Aspect-ratio-preserving resizing with tf.image.ResizeMethod.GAUSSIAN.

   :param image: The image tensor (n_crops, c, h, w).
   :param h: Height of the input image.
   :param w: Width of the input image.
   :param longer_side_length: The length of the longer side after resizing.

   :returns: A tuple of [Image after resizing, Resized height, Resized width].


.. py:function:: get_hashed_spatial_pos_emb_index(grid_size, count_h, count_w)

   Get the hashed spatial positional embedding index for each patch.

   The size H x W is hashed to grid_size x grid_size.

   :param grid_size: Grid size G for the hash-based spatial positional embedding.
   :param count_h: Number of patches in each row for the image.
   :param count_w: Number of patches in each column for the image.

   :returns: Hashed position of shape (1, HxW). Each value corresponds to the
             hashed position index in [0, grid_size x grid_size).


.. py:function:: get_multiscale_patches(image, patch_size=32, patch_stride=32, hse_grid_size=10, longer_side_lengths=[224, 384], max_seq_len_from_original_res=None)

   Extracts image patches from a multi-scale representation.

   :param image: Input image tensor with shape [n_crops, 3, h, w].
   :param patch_size: Patch size.
   :param patch_stride: Patch stride.
   :param hse_grid_size: Hash-based positional embedding grid size.
   :param longer_side_lengths: List of longer-side lengths for each scale in the
                               multi-scale representation.
   :param max_seq_len_from_original_res: Maximum number of patches extracted from
                                         the original resolution. A value <0 means
                                         all patches from the original resolution
                                         are used. None means the original
                                         resolution input is not used.

   :returns: A concatenated tensor of (patches, HSE, SCE, input mask). The tensor
             shape is (n_crops, num_patches, patch_size * patch_size * c + 3).
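
The shape arithmetic described above can be illustrated with a minimal pure-Python sketch. It does not call ``pyiqa`` and is not the module's implementation. The helper names ``resized_shape``, ``patch_counts`` and ``hashed_spatial_index`` are hypothetical, and the rounding rule, the ceiling-based patch count (which assumes the image is padded to a multiple of the stride) and the floor-based nearest-neighbour hashing are assumptions made for illustration only.

```python
import math


def resized_shape(h, w, longer_side_length):
    """Return (rh, rw) so that the longer side equals longer_side_length.

    Assumption: both sides are scaled by the same ratio and rounded.
    """
    ratio = longer_side_length / max(h, w)
    return round(h * ratio), round(w * ratio)


def patch_counts(rh, rw, patch_stride):
    """Number of patches along height and width.

    Assumption: the image is padded so that partial patches are kept.
    """
    return math.ceil(rh / patch_stride), math.ceil(rw / patch_stride)


def hashed_spatial_index(grid_size, count_h, count_w):
    """Map a count_h x count_w patch grid onto grid_size x grid_size.

    Assumption: nearest-neighbour mapping via floor(i * grid_size / count).
    Returns a nested list of shape (1, count_h * count_w).
    """
    rows = [i * grid_size // count_h for i in range(count_h)]
    cols = [j * grid_size // count_w for j in range(count_w)]
    return [[r * grid_size + c for r in rows for c in cols]]


# A 480 x 640 image resized so that its longer side is 224.
rh, rw = resized_shape(480, 640, 224)
count_h, count_w = patch_counts(rh, rw, 32)
index = hashed_spatial_index(10, count_h, count_w)

# Per-patch feature length: flattened pixels plus HSE, SCE and mask entries.
feature_dim = 32 * 32 * 3 + 3

print((rh, rw), (count_h, count_w), len(index[0]), feature_dim)
# prints: (168, 224) (6, 7) 42 3075
```

With the default ``hse_grid_size=10``, every index in this sketch falls in [0, 100), matching the documented range [0, grid_size x grid_size).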