pyiqa.archs.q_align.modeling_llama2

PyTorch Llama model.

Module Contents

pyiqa.archs.q_align.modeling_llama2.logger[source]
class pyiqa.archs.q_align.modeling_llama2.MultiwayNetwork(module_provider, num_multiway=2)[source]

Bases: torch.nn.Module

forward(hidden_states, multiway_indices)[source]
class pyiqa.archs.q_align.modeling_llama2.LlamaRMSNorm(hidden_size, eps=1e-06)[source]

Bases: torch.nn.Module

forward(hidden_states)[source]
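
The listing gives no description of the normalization itself. The following is a minimal sketch of root-mean-square normalization as it is commonly defined for Llama-style models; it assumes this class follows that convention, and `rms_norm_sketch` is a hypothetical helper, not part of pyiqa.

```python
import torch


def rms_norm_sketch(hidden_states: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Scale each vector along the last dim by the inverse of its root mean square."""
    # Mean of squared activations over the hidden dimension.
    variance = hidden_states.pow(2).mean(dim=-1, keepdim=True)
    # eps keeps the reciprocal square root finite for all-zero inputs.
    return weight * hidden_states * torch.rsqrt(variance + eps)


x = torch.tensor([[3.0, 4.0]])
y = rms_norm_sketch(x, weight=torch.ones(2))
```

With a weight of ones, the output has a mean square very close to 1 regardless of the input scale.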
class pyiqa.archs.q_align.modeling_llama2.LlamaRotaryEmbedding(dim, max_position_embeddings=2048, base=10000, device=None)[source]

Bases: torch.nn.Module

forward(x, seq_len=None)[source]
class pyiqa.archs.q_align.modeling_llama2.LlamaLinearScalingRotaryEmbedding(dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0)[source]

Bases: LlamaRotaryEmbedding

LlamaRotaryEmbedding extended with linear scaling. Credits to the Reddit user /u/kaiokendev
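
As a rough illustration of linear scaling, the sketch below divides the position indices by `scaling_factor` before the rotary angles are computed. It assumes the class follows the Hugging Face Transformers convention; `rotary_angles` is a hypothetical helper written for this example.

```python
import torch


def rotary_angles(dim: int, seq_len: int, base: float = 10000.0, scaling_factor: float = 1.0) -> torch.Tensor:
    """Return the (seq_len, dim // 2) table of rotation angles."""
    # One inverse frequency per pair of hidden dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # Linear scaling: compress positions so longer sequences reuse the trained range.
    positions = torch.arange(seq_len, dtype=torch.float32) / scaling_factor
    return torch.outer(positions, inv_freq)


plain = rotary_angles(dim=8, seq_len=16)
scaled = rotary_angles(dim=8, seq_len=16, scaling_factor=2.0)
```

With `scaling_factor=2.0`, position 2 receives the angles that position 1 has in the unscaled table.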

class pyiqa.archs.q_align.modeling_llama2.LlamaDynamicNTKScalingRotaryEmbedding(dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0)[source]

Bases: LlamaRotaryEmbedding

LlamaRotaryEmbedding extended with Dynamic NTK scaling. Credits to the Reddit users /u/bloc97 and /u/emozilla
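
A minimal sketch of the Dynamic NTK idea: once the sequence grows past `max_position_embeddings`, the rotary base is enlarged instead of the positions being compressed. The formula below is the one used in the Hugging Face Transformers Llama code and is assumed, not confirmed, to carry over to this module; `dynamic_ntk_base` is a hypothetical helper.

```python
def dynamic_ntk_base(
    dim: int,
    seq_len: int,
    max_position_embeddings: int = 2048,
    base: float = 10000.0,
    scaling_factor: float = 1.0,
) -> float:
    """Return the rotary base to use for a sequence of length seq_len."""
    if seq_len <= max_position_embeddings:
        # Within the trained context length the base is left unchanged.
        return base
    # Assumed formula: grow the base with the ratio of seq_len to the trained length.
    growth = (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
    return base * growth ** (dim / (dim - 2))


short_base = dynamic_ntk_base(dim=128, seq_len=1024)
long_base = dynamic_ntk_base(dim=128, seq_len=4096)
```

Under these assumptions the base only changes for sequences longer than the trained context, so short inputs behave as with the plain rotary embedding.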

pyiqa.archs.q_align.modeling_llama2.rotate_half(x)[source]

Rotates half the hidden dims of the input.
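
One way to read this description, assuming the usual Llama convention: split the last dimension into two halves `(x1, x2)` and return `(-x2, x1)`. The name `rotate_half_sketch` is hypothetical and used only for this illustration.

```python
import torch


def rotate_half_sketch(x: torch.Tensor) -> torch.Tensor:
    """Map the last dimension (x1, x2) to (-x2, x1)."""
    half = x.shape[-1] // 2
    x1 = x[..., :half]
    x2 = x[..., half:]
    return torch.cat((-x2, x1), dim=-1)


x = torch.tensor([1.0, 2.0, 3.0, 4.0])
rotated = rotate_half_sketch(x)
```

Applying the function twice negates the input, which is what a rotation by 90 degrees in each paired plane would do.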

pyiqa.archs.q_align.modeling_llama2.apply_rotary_pos_emb(q, k, cos, sin, position_ids)[source]
class pyiqa.archs.q_align.modeling_llama2.LlamaMLP(config)[source]

Bases: torch.nn.Module

forward(x)[source]
pyiqa.archs.q_align.modeling_llama2.repeat_kv(hidden_states: torch.Tensor, n_rep: int) → torch.Tensor[source]

This is the torch.repeat_interleave equivalent of llama.cpp’s repeat_kv.
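
The sketch below shows one way such a function can be written and checks it against `torch.repeat_interleave`. The input layout `(batch, num_key_value_heads, seq_len, head_dim)` is an assumption taken from the Hugging Face Transformers Llama code, and `repeat_kv_sketch` is a hypothetical stand-in for the function documented here.

```python
import torch


def repeat_kv_sketch(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each key/value head n_rep times along the head dimension."""
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    # Insert a repeat axis next to the head axis, broadcast it, then merge the two.
    expanded = hidden_states[:, :, None, :, :].expand(batch, num_kv_heads, n_rep, seq_len, head_dim)
    return expanded.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)


kv = torch.arange(2 * 2 * 3 * 4, dtype=torch.float32).reshape(2, 2, 3, 4)
repeated = repeat_kv_sketch(kv, n_rep=3)
reference = torch.repeat_interleave(kv, repeats=3, dim=1)
```

Both results have 6 heads, with each original head appearing three times in a row.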

class pyiqa.archs.q_align.modeling_llama2.LlamaAttention(config: pyiqa.archs.q_align.configuration_mplug_owl2.LlamaConfig, layer_idx: int | None = None)[source]

Bases: torch.nn.Module

Multi-headed attention from the ‘Attention Is All You Need’ paper.

forward(hidden_states: torch.Tensor, modality_indicators: torch.Tensor = None, attention_mask: torch.Tensor | None = None, position_ids: torch.LongTensor | None = None, past_key_value: Tuple[torch.Tensor] | None = None, output_attentions: bool = False, use_cache: bool = False, **kwargs) → Tuple[torch.Tensor, torch.Tensor | None, Tuple[torch.Tensor] | None][source]
class pyiqa.archs.q_align.modeling_llama2.LlamaDecoderLayer(config: pyiqa.archs.q_align.configuration_mplug_owl2.LlamaConfig, layer_idx: int)[source]

Bases: torch.nn.Module

forward(hidden_states: torch.Tensor, modality_indicators: torch.Tensor = None, attention_mask: torch.Tensor | None = None, position_ids: torch.LongTensor | None = None, past_key_value: Tuple[torch.Tensor] | None = None, output_attentions: bool | None = False, use_cache: bool | None = False, **kwargs) → Tuple[torch.FloatTensor, Tuple[torch.FloatTensor, torch.FloatTensor] | None][source]
Parameters:
  • hidden_states (torch.FloatTensor) – input to the layer of shape (batch, seq_len, embed_dim)

  • attention_mask (torch.FloatTensor, optional) – attention mask of size (batch, 1, tgt_len, src_len) where padding elements are indicated by very large negative values.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • use_cache (bool, optional) – If set to True, the key and value states (past_key_values) are returned and can be used to speed up decoding.

  • past_key_value (Tuple(torch.FloatTensor), optional) – cached past key and value projection states
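
To make the attention_mask convention above concrete, here is a minimal sketch that builds an additive causal mask of shape (batch, 1, tgt_len, src_len) for the case src_len == tgt_len (no cached keys). The helper name `causal_additive_mask` is hypothetical, and using the dtype's minimum as the "very large negative value" is an assumption.

```python
import torch


def causal_additive_mask(batch: int, tgt_len: int, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Return a (batch, 1, tgt_len, tgt_len) mask: 0 where attention is allowed, very negative elsewhere."""
    min_value = torch.finfo(dtype).min
    mask = torch.full((tgt_len, tgt_len), min_value, dtype=dtype)
    # Keep the large negatives strictly above the diagonal (future positions); zero elsewhere.
    mask = torch.triu(mask, diagonal=1)
    return mask[None, None, :, :].expand(batch, 1, tgt_len, tgt_len)


mask = causal_additive_mask(batch=2, tgt_len=4)
# Adding the mask to uniform scores and applying softmax removes future positions.
weights = torch.softmax(torch.zeros(2, 1, 4, 4) + mask, dim=-1)
```

The first query position attends only to itself, while the last one spreads its weight evenly over all four positions.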

class pyiqa.archs.q_align.modeling_llama2.LlamaModel(config: pyiqa.archs.q_align.configuration_mplug_owl2.LlamaConfig)[source]

Bases: transformers.modeling_utils.PreTrainedModel

Transformer decoder consisting of config.num_hidden_layers layers. Each layer is a LlamaDecoderLayer.

Parameters:

config – LlamaConfig

get_input_embeddings()[source]
set_input_embeddings(value)[source]
forward(input_ids: torch.LongTensor = None, modality_indicators: torch.Tensor = None, attention_mask: torch.Tensor | None = None, position_ids: torch.LongTensor | None = None, past_key_values: List[torch.FloatTensor] | None = None, inputs_embeds: torch.FloatTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs) → Tuple | transformers.modeling_outputs.BaseModelOutputWithPast[source]
class pyiqa.archs.q_align.modeling_llama2.LlamaForCausalLM(config)[source]

Bases: transformers.modeling_utils.PreTrainedModel, transformers.generation.GenerationMixin

get_input_embeddings()[source]
set_input_embeddings(value)[source]
get_output_embeddings()[source]
set_output_embeddings(new_embeddings)[source]
set_decoder(decoder)[source]
get_decoder()[source]
forward(input_ids: torch.LongTensor = None, modality_indicators: torch.Tensor = None, attention_mask: torch.Tensor | None = None, position_ids: torch.LongTensor | None = None, past_key_values: List[torch.FloatTensor] | None = None, inputs_embeds: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs) → Tuple | transformers.modeling_outputs.CausalLMOutputWithPast[source]
Parameters:

labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should either be in [0, …, config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, …, config.vocab_size].

Returns:

Tuple or transformers.modeling_outputs.CausalLMOutputWithPast

Example:

```python
>>> from transformers import AutoTokenizer, LlamaForCausalLM

>>> model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
>>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZERS)
>>> prompt = "Hey, are you conscious? Can you answer me?"
>>> inputs = tokenizer(prompt, return_tensors="pt")
>>> # Generate
>>> generate_ids = model.generate(inputs.input_ids, max_length=30)
>>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"Hey, are you conscious? Can you answer me?\nI'm not sure if I'm conscious, but I can answer you."
```
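
The labels convention described in the parameter list, where positions set to -100 are ignored, can be sketched with `torch.nn.functional.cross_entropy` and its `ignore_index` argument. The one-token shift between logits and labels is an assumption based on the usual causal language modeling setup, and the random logits stand in for real model output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 10
logits = torch.randn(1, 5, vocab_size)          # (batch, seq_len, vocab), stand-in for model output
labels = torch.tensor([[-100, -100, 4, 7, 2]])  # first two tokens are masked out of the loss

# Assumed causal shift: the logits at position t are scored against the token at t + 1.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = labels[:, 1:].reshape(-1)

# Positions labelled -100 contribute nothing to the loss.
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)

# Equivalent computation that drops the masked positions explicitly.
keep = shift_labels != -100
manual = F.cross_entropy(shift_logits[keep], shift_labels[keep])
```

Both computations average over the same three unmasked positions, so they agree.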
prepare_inputs_for_generation(input_ids, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs)[source]
pyiqa.archs.q_align.modeling_llama2.replace_llama_modality_adaptive()[source]