pyiqa.data.multiscale_trans_util
================================

.. py:module:: pyiqa.data.multiscale_trans_util

.. autoapi-nested-parse::

   Preprocessing utils for Multiscale Transformer

   Reference: https://github.com/google-research/google-research/blob/5c622d523c/musiq/model/preprocessing.py

   Modified: Chaofeng Chen (https://github.com/chaofengc)


Module Contents
---------------

.. py:function:: extract_image_patches(x, kernel, stride=1, dilation=1)

   Ref: https://stackoverflow.com/a/65886666
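
The docstring of ``extract_image_patches`` only cites the Stack Overflow answer. The sketch below illustrates the padding arithmetic, assuming the function follows that answer's approach: TensorFlow-style ``SAME`` padding followed by sliding-window extraction. The helper ``same_padding_1d`` is hypothetical and not part of ``pyiqa``; it covers a single spatial axis only.

```python
import math


def same_padding_1d(size, kernel, stride=1, dilation=1):
    """Hypothetical helper: 'SAME'-style padding for one spatial axis.

    Returns (pad_before, pad_after, n_patches). This mirrors the usual
    TensorFlow 'SAME' convention; the actual pyiqa code may differ.
    """
    # Number of sliding-window positions along this axis.
    n_patches = math.ceil(size / stride)
    # Span covered by one (possibly dilated) kernel.
    effective_kernel = (kernel - 1) * dilation + 1
    # Total padding needed so that the last window fits.
    total_pad = max((n_patches - 1) * stride + effective_kernel - size, 0)
    pad_before = total_pad // 2
    return pad_before, total_pad - pad_before, n_patches


# A 100-pixel axis with 32-pixel patches and stride 32 needs 28 pixels
# of padding in total and yields 4 patches.
print(same_padding_1d(100, kernel=32, stride=32))
```

Under this assumption, an axis whose length is already a multiple of the stride needs no padding when ``kernel == stride``.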


.. py:function:: resize_preserve_aspect_ratio(image, h, w, longer_side_length)

   Aspect-ratio-preserving resizing with tf.image.ResizeMethod.GAUSSIAN.

   :param image: The image tensor with shape (n_crops, c, h, w).
   :param h: Height of the input image.
   :param w: Width of the input image.
   :param longer_side_length: The length of the longer side after resizing.

   :returns: A tuple of (image after resizing, resized height, resized width).
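
The target size for ``resize_preserve_aspect_ratio`` can be worked out by hand. The sketch below is a minimal illustration, assuming both sides are scaled by ``longer_side_length / max(h, w)`` and rounded to the nearest integer. The rounding rule and the helper name ``resized_shape`` are assumptions, not taken from the library.

```python
def resized_shape(h, w, longer_side_length):
    """Hypothetical helper: output (height, width) after the resize."""
    # Scale factor that maps the longer side onto longer_side_length.
    ratio = longer_side_length / max(h, w)
    # Rounding to the nearest integer is an assumption of this sketch.
    return round(h * ratio), round(w * ratio)


# A 480 x 640 image resized so that its longer side is 224 pixels.
print(resized_shape(480, 640, 224))
```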


.. py:function:: get_hashed_spatial_pos_emb_index(grid_size, count_h, count_w)

   Get the hashed spatial positional embedding index for each patch.
   The size H x W is hashed to grid_size x grid_size.

   :param grid_size: Grid size G for the hash-based spatial positional embedding.
   :param count_h: Number of patches in each row for the image.
   :param count_w: Number of patches in each column for the image.

   :returns: Hashed position of shape (1, HxW). Each value corresponds to the hashed
             position index in [0, grid_size x grid_size).
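
To make the hashing in ``get_hashed_spatial_pos_emb_index`` concrete, the sketch below maps each patch coordinate onto a ``grid_size x grid_size`` grid and flattens the result. It assumes a floor-based nearest-neighbour mapping, ``floor(i * grid_size / count)``; the library may use a different rounding convention. The helper ``hashed_pos_index`` is hypothetical and returns a flat Python list rather than a tensor of shape (1, HxW).

```python
def hashed_pos_index(grid_size, count_h, count_w):
    """Hypothetical helper: hashed position index for every patch."""
    # Map each patch row/column onto one of grid_size buckets
    # (floor-based nearest neighbour, an assumption of this sketch).
    rows = [(i * grid_size) // count_h for i in range(count_h)]
    cols = [(j * grid_size) // count_w for j in range(count_w)]
    # Flatten (row bucket, column bucket) into a single index,
    # iterating over patches in row-major order.
    return [r * grid_size + c for r in rows for c in cols]


# 2 x 3 patches hashed onto a 10 x 10 grid.
print(hashed_pos_index(10, 2, 3))
```

Every returned value lies in ``[0, grid_size * grid_size)``, matching the range stated in the docstring.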


.. py:function:: get_multiscale_patches(image, patch_size=32, patch_stride=32, hse_grid_size=10, longer_side_lengths=[224, 384], max_seq_len_from_original_res=None)

   Extracts image patches from a multi-scale representation.

   :param image: Input image tensor with shape [n_crops, 3, h, w].
   :param patch_size: Patch size.
   :param patch_stride: Patch stride.
   :param hse_grid_size: Hash-based positional embedding grid size.
   :param longer_side_lengths: List of longer-side lengths for each scale in the
                               multi-scale representation.
   :param max_seq_len_from_original_res: Maximum number of patches extracted from
                                         the original resolution. A value <0 means use all the patches from the original
                                         resolution. None means the original-resolution input is not used.

   :returns: A concatenated tensor of (patches, HSE, SCE, input mask). The tensor shape
             is (n_crops, num_patches, patch_size * patch_size * c + 3).
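
The output shape of ``get_multiscale_patches`` can be estimated without running the function. The sketch below assumes that each scale contributes ``ceil(L / patch_stride) ** 2`` patches, where ``L`` is its longer-side length (that is, each scale is padded or cut to that length, as in the referenced MUSIQ preprocessing), and that the original resolution is not used (``max_seq_len_from_original_res=None``). The helper ``expected_output_shape`` is hypothetical; only the feature width ``patch_size * patch_size * c + 3`` comes directly from the docstring.

```python
import math


def expected_output_shape(n_crops, channels=3, patch_size=32,
                          patch_stride=32, longer_side_lengths=(224, 384)):
    """Hypothetical helper: estimated (n_crops, num_patches, features)."""
    # Assumed per-scale sequence length: a square grid of patches sized
    # by the longer side, regardless of the actual aspect ratio.
    num_patches = sum(
        math.ceil(length / patch_stride) ** 2
        for length in longer_side_lengths
    )
    # Flattened pixel values plus three extra columns for the
    # spatial index (HSE), scale index (SCE), and input mask.
    features = patch_size * patch_size * channels + 3
    return n_crops, num_patches, features


# Default settings with a single crop.
print(expected_output_shape(1))
```

With the defaults this gives 7 * 7 + 12 * 12 = 193 patches of width 32 * 32 * 3 + 3 = 3075, under the stated assumptions.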