pyiqa.archs.q_align.visual_encoder
==================================

.. py:module:: pyiqa.archs.q_align.visual_encoder


Module Contents
---------------

.. py:function:: find_pruneable_heads_and_indices(heads, n_heads, head_size, already_pruned_heads)

   Compatibility fallback for Transformers>=5 where this helper was removed.
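
   The role of such a helper can be sketched in plain Python. The sketch below is an illustration under assumptions, not the module's code: the name ``pruneable_head_indices`` is hypothetical, and it returns a Python list where a PyTorch-based helper would typically return an index tensor.

   .. code-block:: python

      def pruneable_head_indices(heads, n_heads, head_size, already_pruned_heads):
          """Return (heads still to prune, flat indices of the dimensions to keep)."""
          # Skip heads that were removed by an earlier pruning pass.
          heads = set(heads) - set(already_pruned_heads)
          # keep[h][d] stays True while dimension d of head h survives.
          keep = [[True] * head_size for _ in range(n_heads)]
          for head in heads:
              # Earlier pruning shifted the remaining heads to the left.
              shifted = head - sum(1 for h in already_pruned_heads if h < head)
              keep[shifted] = [False] * head_size
          flat = [flag for row in keep for flag in row]
          index = [i for i, flag in enumerate(flat) if flag]
          return heads, index


      heads, index = pruneable_head_indices(
          [1], n_heads=3, head_size=2, already_pruned_heads=set()
      )
      print(heads, index)  # {1} [0, 1, 4, 5]

   The returned indices select the rows or columns of a projection matrix that remain after the listed heads are dropped.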


.. py:function:: get_abs_pos(abs_pos, tgt_size)

.. py:function:: get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False)

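   Build a 2D sine-cosine positional embedding for a square grid.

   :param grid_size: Height and width of the grid.
   :type grid_size: int
   :returns: `pos_embed` of shape `[grid_size*grid_size, embed_dim]` without a class token, or `[1+grid_size*grid_size, embed_dim]` with one.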
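
   The documented output shapes can be checked with a small NumPy sketch. It assumes the widely used MAE-style construction (half of the channels encode one grid axis, half the other) and an ``embed_dim`` divisible by 4. The names ``sincos_1d`` and ``sincos_2d`` are hypothetical stand-ins, not this module's functions.

   .. code-block:: python

      import numpy as np


      def sincos_1d(embed_dim, pos):
          # Frequencies fall geometrically from 1 to roughly 1/10000.
          omega = np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)
          omega = 1.0 / 10000**omega
          angles = np.einsum("m,d->md", pos.reshape(-1), omega)
          return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


      def sincos_2d(embed_dim, grid_size, cls_token=False):
          coords = np.arange(grid_size, dtype=np.float64)
          grid = np.stack(np.meshgrid(coords, coords), axis=0)  # (2, gs, gs)
          # Each grid axis gets half of the channels.
          emb_a = sincos_1d(embed_dim // 2, grid[0])
          emb_b = sincos_1d(embed_dim // 2, grid[1])
          pos_embed = np.concatenate([emb_a, emb_b], axis=1)
          if cls_token:
              # Prepend an all-zero row for the class token.
              pos_embed = np.concatenate([np.zeros([1, embed_dim]), pos_embed], axis=0)
          return pos_embed


      print(sincos_2d(8, 3).shape)                  # (9, 8)
      print(sincos_2d(8, 3, cls_token=True).shape)  # (10, 8)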


.. py:function:: get_2d_sincos_pos_embed_from_grid(embed_dim, grid)

.. py:function:: get_1d_sincos_pos_embed_from_grid(embed_dim, pos)

   :param embed_dim: Output dimension for each position.
   :param pos: Positions to be encoded, of size `(M,)`.
   :returns: Array of shape `(M, D)`, where `D` is `embed_dim`.
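
   The docstring does not state the formula. Assuming the function follows the common MAE-style reference formulation (an assumption, not confirmed by this page), a position :math:`p` and output dimension :math:`D` would give

   .. math::

      \omega_i = 10000^{-2i/D}, \qquad
      \mathrm{out}[p, i] = \sin(p\,\omega_i), \qquad
      \mathrm{out}[p, D/2 + i] = \cos(p\,\omega_i), \qquad
      i = 0, \dots, D/2 - 1.

   Under that assumption the first half of each row holds sines and the second half cosines, so position 0 encodes as `D/2` zeros followed by `D/2` ones.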


.. py:class:: MplugOwlVisionEmbeddings(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(pixel_values: torch.FloatTensor) -> torch.Tensor


.. py:class:: MplugOwlVisionAttention(config)

   Bases: :py:obj:`torch.nn.Module`


   Multi-headed attention from the 'Attention Is All You Need' paper.


   .. py:method:: forward(hidden_states: torch.Tensor, head_mask: Optional[torch.Tensor] = None, output_attentions: Optional[bool] = False) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]

      Input shape: Batch x Time x Channel
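
      The `Batch x Time x Channel` layout can be illustrated with a NumPy sketch of multi-head self-attention. It is a minimal illustration under assumptions, not this class's implementation: the fused ``w_qkv`` projection, the absence of biases, dropout and head masking, and all names are hypothetical.

      .. code-block:: python

         import numpy as np


         def multi_head_self_attention(x, w_qkv, w_out, num_heads):
             batch, time, channel = x.shape
             head_dim = channel // num_heads
             # One fused projection yields queries, keys and values: (B, T, 3C).
             qkv = (x @ w_qkv).reshape(batch, time, 3, num_heads, head_dim)
             q, k, v = qkv.transpose(2, 0, 3, 1, 4)  # each (B, H, T, D)
             scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
             # Numerically stable softmax over the key axis.
             scores = scores - scores.max(axis=-1, keepdims=True)
             probs = np.exp(scores)
             probs /= probs.sum(axis=-1, keepdims=True)
             # Merge the heads back into the channel axis: (B, T, C).
             context = (probs @ v).transpose(0, 2, 1, 3).reshape(batch, time, channel)
             return context @ w_out, probs


         rng = np.random.default_rng(0)
         x = rng.normal(size=(2, 5, 8))  # Batch x Time x Channel
         out, probs = multi_head_self_attention(
             x, rng.normal(size=(8, 24)), rng.normal(size=(8, 8)), num_heads=2
         )
         print(out.shape, probs.shape)  # (2, 5, 8) (2, 2, 5, 5)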


.. py:class:: QuickGELU

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x: torch.Tensor)
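
      The signature carries no description. Assuming the class follows the common QuickGELU definition, `x * sigmoid(1.702 * x)`, the activation can be compared with the exact GELU in plain Python. The helper names below are hypothetical and the sketch works on scalars rather than tensors.

      .. code-block:: python

         import math


         def quick_gelu(x):
             # x * sigmoid(1.702 * x), written as a single division.
             return x / (1.0 + math.exp(-1.702 * x))


         def exact_gelu(x):
             # GELU defined through the Gaussian CDF.
             return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


         for value in (-2.0, 0.0, 1.0, 2.0):
             print(value, round(quick_gelu(value), 4), round(exact_gelu(value), 4))

      The sigmoid form avoids evaluating `erf` and stays within a few hundredths of the exact GELU over this range.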


.. py:class:: MplugOwlMLP(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states: torch.Tensor) -> torch.Tensor


.. py:class:: MplugOwlVisionEncoderLayer(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states: torch.Tensor, attention_mask: torch.Tensor, output_attentions: Optional[bool] = False) -> Tuple[torch.FloatTensor]

      :param hidden_states: input to the layer of shape `(batch, seq_len, embed_dim)`
      :type hidden_states: `torch.FloatTensor`
      :param attention_mask: attention mask of size
                             `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
      :type attention_mask: `torch.FloatTensor`
      :param output_attentions: Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                                returned tensors for more detail.
      :type output_attentions: `bool`, *optional*


.. py:class:: MplugOwlVisionEncoder(config)

   Bases: :py:obj:`torch.nn.Module`


   Transformer encoder consisting of `config.num_hidden_layers` self-attention layers. Each layer is a
   :py:class:`MplugOwlVisionEncoderLayer`.

   :param config: The corresponding vision configuration for the `MplugOwlVisionEncoder`.
   :type config: `MplugOwlVisionConfig`


   .. py:method:: forward(inputs_embeds, attention_mask: Optional[torch.Tensor] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None) -> Union[Tuple, transformers.modeling_outputs.BaseModelOutput]

      :param inputs_embeds: Embedded representation of the inputs. Should be float, not int tokens.
      :type inputs_embeds: `torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`
      :param attention_mask: Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

                             - 1 for tokens that are **not masked**,
                             - 0 for tokens that are **masked**.

      :type attention_mask: `torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*
      :param output_attentions: Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                                returned tensors for more detail.
      :type output_attentions: `bool`, *optional*
      :param output_hidden_states: Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
                                   for more detail.
      :type output_hidden_states: `bool`, *optional*
      :param return_dict: Whether or not to return a `transformers.utils.ModelOutput` instead of a plain tuple.
      :type return_dict: `bool`, *optional*


.. py:class:: MplugOwlVisionModel(config)

   Bases: :py:obj:`transformers.modeling_utils.PreTrainedModel`


   .. py:method:: forward(pixel_values: Optional[torch.FloatTensor] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None) -> Union[Tuple, transformers.modeling_outputs.BaseModelOutputWithPooling]

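      :returns: A plain tuple or a `transformers.modeling_outputs.BaseModelOutputWithPooling`, as declared in the return annotation.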


   .. py:method:: get_input_embeddings()


.. py:class:: MplugOwlVisualAbstractorMLP(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states: torch.Tensor) -> torch.Tensor


.. py:class:: MplugOwlVisualAbstractorMultiHeadAttention(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: save_attn_gradients(attn_gradients)


   .. py:method:: get_attn_gradients()


   .. py:method:: save_attention_map(attention_map)


   .. py:method:: get_attention_map()


   .. py:method:: transpose_for_scores(x)


   .. py:method:: forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)


.. py:class:: MplugOwlVisualAbstractorCrossOutput(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor


.. py:class:: MplugOwlVisualAbstractorAttention(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: prune_heads(heads)


   .. py:method:: forward(hidden_states: torch.Tensor, attention_mask: Optional[torch.FloatTensor] = None, head_mask: Optional[torch.FloatTensor] = None, encoder_hidden_states: Optional[torch.FloatTensor] = None, encoder_attention_mask: Optional[torch.FloatTensor] = None, past_key_value: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, output_attentions: Optional[bool] = False) -> Tuple[torch.Tensor]


.. py:class:: MplugOwlVisualAbstractorLayer(config, layer_idx)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, output_attentions=False)


.. py:class:: MplugOwlVisualAbstractorEncoder(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=True)


.. py:class:: MplugOwlVisualAbstractorModel(config, language_hidden_size)

   Bases: :py:obj:`transformers.modeling_utils.PreTrainedModel`


   .. py:method:: get_head_mask(head_mask, num_hidden_layers, is_attention_chunked=False)

      Compatibility helper for Transformers>=5 where PreTrainedModel.get_head_mask was removed.


   .. py:method:: get_extended_attention_mask(attention_mask: torch.Tensor, input_shape: Tuple[int], device: torch.device) -> torch.Tensor

      Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

      :param attention_mask: Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
      :type attention_mask: `torch.Tensor`
      :param input_shape: The shape of the input to the model.
      :type input_shape: `Tuple[int]`
      :param device: The device of the input to the model.
      :type device: `torch.device`

      :returns: The extended attention mask, with the same dtype as `attention_mask`.
      :rtype: `torch.Tensor`
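
      The conversion from a 0/1 mask to a broadcastable additive mask can be sketched with NumPy. This is an illustration under assumptions: the function name is hypothetical, the causal case is left out, and the fill value of ``-10000.0`` is one common choice that may differ from what this method uses.

      .. code-block:: python

         import numpy as np


         def extend_attention_mask(attention_mask, dtype=np.float32):
             mask = np.asarray(attention_mask)
             if mask.ndim == 3:
                 # (batch, from_len, to_len) -> (batch, 1, from_len, to_len)
                 extended = mask[:, None, :, :]
             elif mask.ndim == 2:
                 # (batch, to_len) -> (batch, 1, 1, to_len)
                 extended = mask[:, None, None, :]
             else:
                 raise ValueError(f"unsupported mask shape {mask.shape}")
             # 1 -> 0.0 (attend), 0 -> large negative (ignored after softmax).
             return (1.0 - extended.astype(dtype)) * -10000.0


         extended = extend_attention_mask([[1, 1, 0], [1, 0, 0]])
         print(extended.shape)  # (2, 1, 1, 3)

      Adding the result to the raw attention scores before the softmax drives the weights of masked positions towards zero.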


   .. py:method:: forward(attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_values=None, output_attentions=None, output_hidden_states=None, return_dict=None)

      :param encoder_hidden_states: Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
                                    the model is configured as a decoder.
      :type encoder_hidden_states: `torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*
      :param encoder_attention_mask: Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
                                     the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

                                     - 1 for tokens that are **not masked**,
                                     - 0 for tokens that are **masked**.
      :type encoder_attention_mask: `torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*
      :param past_key_values: Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If
                              `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
                              don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
                              `decoder_input_ids` of shape `(batch_size, sequence_length)`.
      :type past_key_values: `tuple(tuple(torch.FloatTensor))` of length `config.n_layers`, with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`