pyiqa.archs.qrealign.qwen3_5_src.configuration_qwen3_5

Module Contents

class pyiqa.archs.qrealign.qwen3_5_src.configuration_qwen3_5.Qwen3_5TextConfig(vocab_size=248320, hidden_size=4096, intermediate_size=12288, num_hidden_layers=32, num_attention_heads=16, num_key_value_heads=4, hidden_act='silu', max_position_embeddings=32768, initializer_range=0.02, rms_norm_eps=1e-06, use_cache=True, tie_word_embeddings=False, rope_parameters: pyiqa.archs.modeling_rope_utils.RopeParameters | dict[str, pyiqa.archs.modeling_rope_utils.RopeParameters] | None = None, attention_bias=False, attention_dropout=0.0, head_dim=256, linear_conv_kernel_dim=4, linear_key_head_dim=128, linear_value_head_dim=128, linear_num_key_heads=16, linear_num_value_heads=32, layer_types=None, pad_token_id: int | None = None, bos_token_id: int | None = None, eos_token_id: int | None = None, **kwargs)

Bases: pyiqa.archs.configuration_utils.PreTrainedConfig

This is the configuration class to store the configuration of a [Qwen3_5TextModel]. It is used to instantiate a Qwen3.5 text model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of Qwen3.5-9B-Instruct [Qwen/Qwen3.5-9B-Instruct](https://huggingface.co/Qwen/Qwen3.5-9B-Instruct).

Configuration objects inherit from [PreTrainedConfig] and can be used to control the model outputs. Read the documentation from [PreTrainedConfig] for more information.

Parameters:

vocab_size (int, optional, defaults to 248320) – Vocabulary size of the model. Defines the number of different tokens that can be represented by the input_ids.
hidden_size (int, optional, defaults to 4096) – Dimension of the hidden representations.
intermediate_size (int, optional, defaults to 12288) – Dimension of the MLP representations.
num_hidden_layers (int, optional, defaults to 32) – Number of hidden layers in the Transformer decoder.
num_attention_heads (int, optional, defaults to 16) – Number of attention heads for each attention layer in the Transformer decoder.
num_key_value_heads (int, optional, defaults to 4) – Number of key/value heads used to implement Grouped Query Attention (GQA). If num_key_value_heads=num_attention_heads, the model uses Multi-Head Attention (MHA); if num_key_value_heads=1, it uses Multi-Query Attention (MQA); otherwise GQA is used. When converting a multi-head checkpoint to a GQA checkpoint, each group's key and value head should be constructed by mean-pooling all the original heads within that group. For more details, see [this paper](https://arxiv.org/pdf/2305.13245.pdf).
hidden_act (str, optional, defaults to “silu”) – The non-linear activation function in the decoder.
max_position_embeddings (int, optional, defaults to 32768) – The maximum sequence length that this model might ever be used with.
initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
rms_norm_eps (float, optional, defaults to 1e-06) – The epsilon used by the rms normalization layers.
use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True.
tie_word_embeddings (bool, optional, defaults to False) – Whether the model’s input and output word embeddings should be tied.
rope_parameters (RopeParameters, optional) – Dictionary containing the configuration parameters for the RoPE embeddings. The dictionary should contain a value for rope_theta and optionally parameters used for scaling in case you want to use RoPE with longer max_position_embeddings.
attention_bias (bool, optional, defaults to False) – Whether to use a bias in the query, key, value and output projection layers during self-attention.
attention_dropout (float, optional, defaults to 0.0) – The dropout ratio for the attention probabilities.
head_dim (int, optional, defaults to 256) – Dimension of each attention head's projection in multi-head attention.
linear_conv_kernel_dim (int, optional, defaults to 4) – Kernel size of the convolution used in linear attention layers.
linear_key_head_dim (int, optional, defaults to 128) – Dimension of each key head in linear attention.
linear_value_head_dim (int, optional, defaults to 128) – Dimension of each value head in linear attention.
linear_num_key_heads (int, optional, defaults to 16) – Number of key heads used in linear attention layers.
linear_num_value_heads (int, optional, defaults to 32) – Number of value heads used in linear attention layers.
layer_types (list[str], optional) – Type of each layer (attention or linear), one entry per layer.
pad_token_id (int, optional) – Padding token id.
bos_token_id (int, optional) – Beginning of stream token id.
eos_token_id (int, optional) – End of stream token id.
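
The head-related defaults above imply some simple shape arithmetic. The following is a minimal sketch in plain Python, assuming the standard grouped-query layout in which query heads are split evenly across key/value heads. The helper names `gqa_group_size` and `mean_pool_kv_heads` are hypothetical and are not part of this module; the actual projection shapes in the model code may differ.

```python
# Illustrative only: plain-Python arithmetic on the documented defaults.
# The helper names below are hypothetical, not part of pyiqa or transformers.


def gqa_group_size(num_attention_heads: int, num_key_value_heads: int) -> int:
    """Return how many query heads share one key/value head."""
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    return num_attention_heads // num_key_value_heads


def mean_pool_kv_heads(heads: list[list[float]], num_key_value_heads: int) -> list[list[float]]:
    """Mean-pool per-head vectors into num_key_value_heads groups (MHA -> GQA)."""
    group = gqa_group_size(len(heads), num_key_value_heads)
    pooled = []
    for start in range(0, len(heads), group):
        chunk = heads[start:start + group]
        # Average each coordinate across the heads in this group.
        pooled.append([sum(column) / group for column in zip(*chunk)])
    return pooled


# Documented defaults: 16 query heads, 4 key/value heads, head_dim=256.
group = gqa_group_size(16, 4)  # query heads per key/value head
kv_width = 4 * 256             # total key (or value) width under this layout

# Toy example: 4 original heads with 2-dimensional weights pooled into 2 groups.
toy_heads = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
pooled = mean_pool_kv_heads(toy_heads, 2)
```

With the defaults, each key/value head would serve four query heads under this assumed layout.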

```python
>>> from transformers import Qwen3_5TextModel, Qwen3_5TextConfig

>>> # Initializing a Qwen3.5 style configuration
>>> configuration = Qwen3_5TextConfig()

>>> # Initializing a model from the Qwen3.5-9B style configuration
>>> model = Qwen3_5TextModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
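
Because `layer_types` takes one entry per layer, it can help to build the list programmatically. The sketch below is only an illustration: the string labels `"linear_attention"` and `"full_attention"`, the every-fourth-layer pattern, and the helper name `build_layer_types` are all assumptions, so check the module source for the accepted values and for the default used when `layer_types` is None.

```python
# Hypothetical helper: the labels and the interval are assumptions for illustration.


def build_layer_types(num_hidden_layers: int, full_attention_every: int = 4) -> list[str]:
    """Return one label per layer, using full attention on every N-th layer."""
    return [
        "full_attention" if (index + 1) % full_attention_every == 0 else "linear_attention"
        for index in range(num_hidden_layers)
    ]


# One entry per layer for the documented default of 32 hidden layers.
layer_types = build_layer_types(32)
```

A list built this way would be passed as `layer_types=...` alongside a matching `num_hidden_layers`, provided the labels match what the configuration class accepts.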

class pyiqa.archs.qrealign.qwen3_5_src.configuration_qwen3_5.Qwen3_5Config(text_config=None, vision_config=None, image_token_id=248056, video_token_id=248057, vision_start_token_id=248053, vision_end_token_id=248054, tie_word_embeddings=False, **kwargs)

Bases: pyiqa.archs.configuration_utils.PreTrainedConfig

This is the configuration class to store the configuration of a [Qwen3_5Model]. It is used to instantiate a Qwen3.5 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of Qwen3.5-9B-Instruct [Qwen/Qwen3.5-9B-Instruct](https://huggingface.co/Qwen/Qwen3.5-9B-Instruct).

Configuration objects inherit from [PreTrainedConfig] and can be used to control the model outputs. Read the documentation from [PreTrainedConfig] for more information.

Parameters:

text_config (Union[PreTrainedConfig, dict], optional, defaults to Qwen3_5TextConfig) – The config object or dictionary of the text backbone.
vision_config (Union[PreTrainedConfig, dict], optional, defaults to Qwen3_5VisionConfig) – The config object or dictionary of the vision backbone.
image_token_id (int, optional, defaults to 248056) – The image token index to encode the image prompt.
video_token_id (int, optional, defaults to 248057) – The video token index to encode the video prompt.
vision_start_token_id (int, optional, defaults to 248053) – The token index that marks the start of the vision input.
vision_end_token_id (int, optional, defaults to 248054) – The token index that marks the end of the vision input.
tie_word_embeddings (bool, optional, defaults to False) – Whether to tie the word embeddings.
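
To make the role of the special token ids concrete, here is a minimal sketch that locates image placeholder tokens in a token id sequence using the documented defaults. The helper `image_token_positions` and the toy sequence (including the ids 101 and 102) are hypothetical; how the model actually consumes these positions lives in the modeling code and is not shown here.

```python
# Documented default ids from the Qwen3_5Config signature.
IMAGE_TOKEN_ID = 248056
VISION_START_TOKEN_ID = 248053
VISION_END_TOKEN_ID = 248054


def image_token_positions(input_ids: list[int], image_token_id: int = IMAGE_TOKEN_ID) -> list[int]:
    """Return the indices at which the image placeholder token occurs."""
    return [index for index, token in enumerate(input_ids) if token == image_token_id]


# Toy sequence: 101 and 102 stand in for arbitrary text tokens (made up).
input_ids = [101, VISION_START_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, VISION_END_TOKEN_ID, 102]
positions = image_token_positions(input_ids)
```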

```python
>>> from transformers import Qwen3_5ForConditionalGeneration, Qwen3_5Config

>>> # Initializing a Qwen3.5 style configuration
>>> configuration = Qwen3_5Config()

>>> # Initializing a model from the Qwen3.5-9B style configuration
>>> model = Qwen3_5ForConditionalGeneration(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
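
Since `text_config` and `vision_config` each accept either a config object or a dictionary, a composite configuration has to normalize its inputs. The sketch below shows one common way to do that with a stand-in dataclass. `ToyTextConfig` and `resolve_sub_config` are hypothetical names used for illustration only, and whether `Qwen3_5Config` resolves its sub-configs in exactly this way is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToyTextConfig:
    """Stand-in for a text config; not the real Qwen3_5TextConfig."""

    hidden_size: int = 4096
    num_hidden_layers: int = 32


def resolve_sub_config(value, config_cls):
    """Normalize None, a dict, or an existing config object into a config object."""
    if value is None:
        return config_cls()          # fall back to the class defaults
    if isinstance(value, dict):
        return config_cls(**value)   # build the config from keyword arguments
    return value                     # already a config object, pass it through


default_cfg = resolve_sub_config(None, ToyTextConfig)
from_dict = resolve_sub_config({"num_hidden_layers": 8}, ToyTextConfig)
passthrough = resolve_sub_config(from_dict, ToyTextConfig)
```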