pyiqa.archs.qrealign.qwen3_5_src.configuration_qwen3_5

Module Contents

class pyiqa.archs.qrealign.qwen3_5_src.configuration_qwen3_5.Qwen3_5TextConfig(vocab_size=248320, hidden_size=4096, intermediate_size=12288, num_hidden_layers=32, num_attention_heads=16, num_key_value_heads=4, hidden_act='silu', max_position_embeddings=32768, initializer_range=0.02, rms_norm_eps=1e-06, use_cache=True, tie_word_embeddings=False, rope_parameters: pyiqa.archs.modeling_rope_utils.RopeParameters | dict[str, pyiqa.archs.modeling_rope_utils.RopeParameters] | None = None, attention_bias=False, attention_dropout=0.0, head_dim=256, linear_conv_kernel_dim=4, linear_key_head_dim=128, linear_value_head_dim=128, linear_num_key_heads=16, linear_num_value_heads=32, layer_types=None, pad_token_id: int | None = None, bos_token_id: int | None = None, eos_token_id: int | None = None, **kwargs)

Bases: pyiqa.archs.configuration_utils.PreTrainedConfig

This is the configuration class to store the configuration of a [Qwen3_5TextModel]. It is used to instantiate a Qwen3.5 text model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of Qwen3.5-9B-Instruct [Qwen/Qwen3.5-9B-Instruct](https://huggingface.co/Qwen/Qwen3.5-9B-Instruct).

Configuration objects inherit from [PreTrainedConfig] and can be used to control the model outputs. Read the documentation from [PreTrainedConfig] for more information.

Parameters:
  • vocab_size (int, optional, defaults to 248320) – Vocabulary size of the model. Defines the number of different tokens that can be represented by the input_ids.

  • hidden_size (int, optional, defaults to 4096) – Dimension of the hidden representations.

  • intermediate_size (int, optional, defaults to 12288) – Dimension of the MLP representations.

  • num_hidden_layers (int, optional, defaults to 32) – Number of hidden layers in the Transformer decoder.

  • num_attention_heads (int, optional, defaults to 16) – Number of attention heads for each attention layer in the Transformer decoder.

  • num_key_value_heads (int, optional, defaults to 4) – Number of key/value heads used to implement Grouped Query Attention (GQA). If num_key_value_heads=num_attention_heads, the model uses Multi-Head Attention (MHA); if num_key_value_heads=1, it uses Multi-Query Attention (MQA); otherwise GQA is used. When converting a multi-head checkpoint to a GQA checkpoint, each group's key and value head should be constructed by mean-pooling all the original heads within that group. For more details, see [this paper](https://arxiv.org/pdf/2305.13245.pdf).

  • hidden_act (str, optional, defaults to “silu”) – The non-linear activation function in the decoder.

  • max_position_embeddings (int, optional, defaults to 32768) – The maximum sequence length that this model might ever be used with.

  • initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

  • rms_norm_eps (float, optional, defaults to 1e-06) – The epsilon used by the RMS normalization layers.

  • use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/value attentions (not used by all models). Only relevant if config.is_decoder=True.

  • tie_word_embeddings (bool, optional, defaults to False) – Whether the model’s input and output word embeddings should be tied.

  • rope_parameters (RopeParameters or dict[str, RopeParameters], optional) – Dictionary containing the configuration parameters for the RoPE embeddings. It should contain a value for rope_theta and, optionally, scaling parameters if you want to use RoPE with a longer max_position_embeddings.

  • attention_bias (bool, optional, defaults to False) – Whether to use a bias in the query, key, value and output projection layers during self-attention.

  • attention_dropout (float, optional, defaults to 0.0) – The dropout ratio for the attention probabilities.

  • head_dim (int, optional, defaults to 256) – Dimension of each attention head (the per-head projection size) in the standard attention layers.

  • linear_conv_kernel_dim (int, optional, defaults to 4) – Kernel size of the convolution used in linear attention layers.

  • linear_key_head_dim (int, optional, defaults to 128) – Dimension of each key head in linear attention.

  • linear_value_head_dim (int, optional, defaults to 128) – Dimension of each value head in linear attention.

  • linear_num_key_heads (int, optional, defaults to 16) – Number of key heads used in linear attention layers.

  • linear_num_value_heads (int, optional, defaults to 32) – Number of value heads used in linear attention layers.

  • layer_types (list[str], optional) – The type of each hidden layer (full attention or linear attention), given as one entry per layer.

  • pad_token_id (int, optional) – Padding token id.

  • bos_token_id (int, optional) – Beginning of stream token id.

  • eos_token_id (int, optional) – End of stream token id.
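
The GQA checkpoint conversion described under num_key_value_heads can be sketched in plain Python. This is a minimal illustration, not pyiqa or transformers code: each head's key (or value) projection is modeled as a flat list of floats, and the helper names mean_pool_kv_heads and kv_head_for_query_head are hypothetical. It assumes heads are grouped contiguously, so with the defaults (16 query heads, 4 key/value heads) query heads 0-3 share key/value head 0, heads 4-7 share head 1, and so on.

```python
from typing import List

Vector = List[float]  # stand-in for one head's slice of a K or V projection


def mean_pool_kv_heads(heads: List[Vector], num_key_value_heads: int) -> List[Vector]:
    """Hypothetical helper: collapse per-head K (or V) projections into GQA groups.

    Heads are split into `num_key_value_heads` contiguous groups, and each group
    is replaced by the element-wise mean of its members.
    """
    num_heads = len(heads)
    if num_key_value_heads <= 0 or num_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    group_size = num_heads // num_key_value_heads
    pooled = []
    for g in range(num_key_value_heads):
        group = heads[g * group_size:(g + 1) * group_size]
        # Average each coordinate across the heads in this group.
        pooled.append([sum(vals) / group_size for vals in zip(*group)])
    return pooled


def kv_head_for_query_head(query_head: int, num_attention_heads: int,
                           num_key_value_heads: int) -> int:
    """Index of the shared key/value head used by a given query head (contiguous grouping)."""
    return query_head // (num_attention_heads // num_key_value_heads)


# Toy example: 16 "heads", each a 2-element vector [h, -h].
heads = [[float(h), -float(h)] for h in range(16)]
pooled = mean_pool_kv_heads(heads, num_key_value_heads=4)
print(pooled[0])                          # [1.5, -1.5], the mean of heads 0..3
print(kv_head_for_query_head(5, 16, 4))   # 1
```

Setting num_key_value_heads to 16 in this sketch leaves every head unchanged (MHA), while setting it to 1 averages all heads into a single shared key/value head (MQA).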

```python
>>> from transformers import Qwen3_5TextModel, Qwen3_5TextConfig

>>> # Initializing a Qwen3.5 style configuration
>>> configuration = Qwen3_5TextConfig()
>>> # Initializing a model from the Qwen3.5-9B style configuration
>>> model = Qwen3_5TextModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```
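
The text configuration mixes standard (full) attention layers with linear attention layers. The sketch below uses a hypothetical stand-in class, HybridLayout, rather than the real Qwen3_5TextConfig. It computes the widths implied by the documented head counts and head dimensions, and shows one way a default layer_types list could be filled. The layer-type strings "full_attention" and "linear_attention" and the every-fourth-layer pattern (full_attention_interval=4) are assumptions modeled on the Qwen3-Next configuration in transformers; check the actual Qwen3_5TextConfig source before relying on them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HybridLayout:
    """Hypothetical stand-in holding only the config fields this sketch needs."""
    num_hidden_layers: int = 32
    num_attention_heads: int = 16
    num_key_value_heads: int = 4
    head_dim: int = 256
    linear_num_key_heads: int = 16
    linear_num_value_heads: int = 32
    linear_key_head_dim: int = 128
    linear_value_head_dim: int = 128
    layer_types: Optional[List[str]] = None
    full_attention_interval: int = 4  # ASSUMPTION: pattern modeled on Qwen3-Next

    def __post_init__(self) -> None:
        if self.layer_types is None:
            # ASSUMPTION: every `full_attention_interval`-th layer uses full attention,
            # and all other layers use linear attention.
            self.layer_types = [
                "full_attention" if (i + 1) % self.full_attention_interval == 0
                else "linear_attention"
                for i in range(self.num_hidden_layers)
            ]
        if len(self.layer_types) != self.num_hidden_layers:
            raise ValueError("layer_types needs one entry per hidden layer")

    def widths(self) -> Dict[str, int]:
        """Total widths implied by (number of heads) x (per-head dimension)."""
        return {
            "full_q": self.num_attention_heads * self.head_dim,
            "full_kv": self.num_key_value_heads * self.head_dim,
            "linear_k": self.linear_num_key_heads * self.linear_key_head_dim,
            "linear_v": self.linear_num_value_heads * self.linear_value_head_dim,
        }


layout = HybridLayout()
print(layout.layer_types[:4])
# ['linear_attention', 'linear_attention', 'linear_attention', 'full_attention']
print(layout.layer_types.count("full_attention"))  # 8
print(layout.widths())
# {'full_q': 4096, 'full_kv': 1024, 'linear_k': 2048, 'linear_v': 4096}
```

With the defaults, the full-attention query width (16 × 256 = 4096) equals hidden_size, while each full-attention layer keeps only 4 × 256 = 1024 key/value channels because of GQA.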

class pyiqa.archs.qrealign.qwen3_5_src.configuration_qwen3_5.Qwen3_5Config(text_config=None, vision_config=None, image_token_id=248056, video_token_id=248057, vision_start_token_id=248053, vision_end_token_id=248054, tie_word_embeddings=False, **kwargs)

Bases: pyiqa.archs.configuration_utils.PreTrainedConfig

This is the configuration class to store the configuration of a [Qwen3_5Model]. It is used to instantiate a Qwen3.5 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of Qwen3.5-9B-Instruct [Qwen/Qwen3.5-9B-Instruct](https://huggingface.co/Qwen/Qwen3.5-9B-Instruct).

Configuration objects inherit from [PreTrainedConfig] and can be used to control the model outputs. Read the documentation from [PreTrainedConfig] for more information.

Parameters:
  • text_config (Union[PreTrainedConfig, dict], optional, defaults to Qwen3_5TextConfig) – The config object or dictionary of the text backbone.

  • vision_config (Union[PreTrainedConfig, dict], optional, defaults to Qwen3_5VisionConfig) – The config object or dictionary of the vision backbone.

  • image_token_id (int, optional, defaults to 248056) – The image token index to encode the image prompt.

  • video_token_id (int, optional, defaults to 248057) – The video token index to encode the video prompt.

  • vision_start_token_id (int, optional, defaults to 248053) – The token index that marks the start of a vision (image or video) segment in the prompt.

  • vision_end_token_id (int, optional, defaults to 248054) – The token index that marks the end of a vision (image or video) segment in the prompt.

  • tie_word_embeddings (bool, optional, defaults to False) – Whether to tie the word embeddings.
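
The four token-id parameters work together when a prompt contains images or videos. The sketch below assumes a Qwen2-VL-style layout in which each visual input becomes vision_start, one placeholder token per visual feature, and vision_end. The helper names frame_visual_span and placeholder_positions are hypothetical, and in practice the processor, not this code, decides how many placeholder tokens an image or video receives.

```python
from typing import List

# Default token ids documented for Qwen3_5Config.
IMAGE_TOKEN_ID = 248056
VIDEO_TOKEN_ID = 248057
VISION_START_TOKEN_ID = 248053
VISION_END_TOKEN_ID = 248054


def frame_visual_span(num_features: int, is_video: bool = False) -> List[int]:
    """Hypothetical helper: wrap `num_features` placeholders in vision start/end markers.

    ASSUMPTION: one placeholder token per visual feature (Qwen2-VL-style layout).
    """
    placeholder = VIDEO_TOKEN_ID if is_video else IMAGE_TOKEN_ID
    return [VISION_START_TOKEN_ID] + [placeholder] * num_features + [VISION_END_TOKEN_ID]


def placeholder_positions(input_ids: List[int], token_id: int) -> List[int]:
    """Indices whose embeddings would be replaced by vision-encoder outputs."""
    return [i for i, tok in enumerate(input_ids) if tok == token_id]


# Hypothetical text token ids (101, 102, 103) around an image with 3 visual features.
ids = [101, 102] + frame_visual_span(3) + [103]
print(placeholder_positions(ids, IMAGE_TOKEN_ID))  # [3, 4, 5]
```

Because images and videos share the same start and end markers, only the placeholder id tells the two kinds of visual input apart in this layout.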

```python
>>> from transformers import Qwen3_5ForConditionalGeneration, Qwen3_5Config

>>> # Initializing a Qwen3.5 style configuration
>>> configuration = Qwen3_5Config()
>>> # Initializing a model from the Qwen3.5-9B style configuration
>>> model = Qwen3_5ForConditionalGeneration(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```