pyiqa.archs.q_align.modeling_llama2
===================================

.. py:module:: pyiqa.archs.q_align.modeling_llama2

.. autoapi-nested-parse::

   PyTorch Llama model.


Module Contents
---------------

.. py:data:: logger

.. py:class:: MultiwayNetwork(module_provider, num_multiway=2)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states, multiway_indices)


.. py:class:: LlamaRMSNorm(hidden_size, eps=1e-06)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states)


.. py:class:: LlamaRotaryEmbedding(dim, max_position_embeddings=2048, base=10000, device=None)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x, seq_len=None)


.. py:class:: LlamaLinearScalingRotaryEmbedding(dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0)

   Bases: :py:obj:`LlamaRotaryEmbedding`

   LlamaRotaryEmbedding extended with linear scaling. Credits to the Reddit user /u/kaiokendev


.. py:class:: LlamaDynamicNTKScalingRotaryEmbedding(dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0)

   Bases: :py:obj:`LlamaRotaryEmbedding`

   LlamaRotaryEmbedding extended with Dynamic NTK scaling. Credits to the Reddit users /u/bloc97 and /u/emozilla


.. py:function:: rotate_half(x)

   Rotates half the hidden dims of the input.


.. py:function:: apply_rotary_pos_emb(q, k, cos, sin, position_ids)

.. py:class:: LlamaMLP(config)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(x)


.. py:function:: repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor

   This is the torch.repeat_interleave equivalent of llama.cpp's repeat_kv.


.. py:class:: LlamaAttention(config: pyiqa.archs.q_align.configuration_mplug_owl2.LlamaConfig, layer_idx: Optional[int] = None)

   Bases: :py:obj:`torch.nn.Module`

   Multi-headed attention from 'Attention Is All You Need' paper


   .. py:method:: forward(hidden_states: torch.Tensor, modality_indicators: torch.Tensor = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: bool = False, use_cache: bool = False, **kwargs) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]


.. py:class:: LlamaDecoderLayer(config: pyiqa.archs.q_align.configuration_mplug_owl2.LlamaConfig, layer_idx: int)

   Bases: :py:obj:`torch.nn.Module`


   .. py:method:: forward(hidden_states: torch.Tensor, modality_indicators: torch.Tensor = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: Optional[bool] = False, use_cache: Optional[bool] = False, **kwargs) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]

      :param hidden_states: input to the layer of shape `(batch, seq_len, embed_dim)`
      :type hidden_states: `torch.FloatTensor`
      :param attention_mask: attention mask of size `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
      :type attention_mask: `torch.FloatTensor`, *optional*
      :param output_attentions: Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
      :type output_attentions: `bool`, *optional*
      :param use_cache: If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`).
      :type use_cache: `bool`, *optional*
      :param past_key_value: cached past key and value projection states
      :type past_key_value: `Tuple(torch.FloatTensor)`, *optional*


.. py:class:: LlamaModel(config: pyiqa.archs.q_align.configuration_mplug_owl2.LlamaConfig)

   Bases: :py:obj:`transformers.modeling_utils.PreTrainedModel`

   Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`LlamaDecoderLayer`]

   :param config: LlamaConfig


   .. py:method:: get_input_embeddings()


   .. py:method:: set_input_embeddings(value)


   .. py:method:: forward(input_ids: torch.LongTensor = None, modality_indicators: torch.Tensor = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_values: Optional[List[torch.FloatTensor]] = None, inputs_embeds: Optional[torch.FloatTensor] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, **kwargs) -> Union[Tuple, transformers.modeling_outputs.BaseModelOutputWithPast]


.. py:class:: LlamaForCausalLM(config)

   Bases: :py:obj:`transformers.modeling_utils.PreTrainedModel`, :py:obj:`transformers.generation.GenerationMixin`


   .. py:method:: get_input_embeddings()


   .. py:method:: set_input_embeddings(value)


   .. py:method:: get_output_embeddings()


   .. py:method:: set_output_embeddings(new_embeddings)


   .. py:method:: set_decoder(decoder)


   .. py:method:: get_decoder()


   .. py:method:: forward(input_ids: torch.LongTensor = None, modality_indicators: torch.Tensor = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_values: Optional[List[torch.FloatTensor]] = None, inputs_embeds: Optional[torch.FloatTensor] = None, labels: Optional[torch.LongTensor] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, **kwargs) -> Union[Tuple, transformers.modeling_outputs.CausalLMOutputWithPast]

      :param labels: Labels for computing the masked language modeling loss. Indices should either be in `[0, ..., config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
      :type labels: `torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*

      Returns:

      Example:

      ```python
      >>> from transformers import AutoTokenizer, LlamaForCausalLM

      >>> model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
      >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZERS)

      >>> prompt = "Hey, are you conscious? Can you answer me?"
      >>> inputs = tokenizer(prompt, return_tensors="pt")

      >>> # Generate
      >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
      >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
      "Hey, are you conscious? Can you answer me?\nI'm not sure if I'm conscious, but I can answer you."
      ```


   .. py:method:: prepare_inputs_for_generation(input_ids, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs)


.. py:function:: replace_llama_modality_adaptive()
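
The ``rotate_half`` and ``repeat_kv`` helpers listed in this module are documented only by one-line summaries, so the following is a minimal sketch of what those summaries describe. It rests on two assumptions: that ``rotate_half`` follows the usual Hugging Face LLaMA convention of splitting the last dimension into halves ``(x1, x2)`` and returning ``(-x2, x1)``, and that ``repeat_kv`` repeats each key/value head ``n_rep`` times with the copies kept adjacent, as ``torch.repeat_interleave`` does. The names ``rotate_half_sketch`` and ``repeat_kv_sketch`` are hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rotate_half_sketch(x: Sequence[float]) -> List[float]:
    """Split a vector into halves (x1, x2) and return (-x2, x1).

    Hypothetical stand-in for rotate_half, applied to one vector
    instead of the last dimension of a tensor.
    """
    half = len(x) // 2
    x1, x2 = list(x[:half]), list(x[half:])
    return [-v for v in x2] + x1


def repeat_kv_sketch(kv_heads: Sequence[T], n_rep: int) -> List[T]:
    """Repeat each key/value head n_rep times, keeping copies adjacent.

    Hypothetical stand-in for repeat_kv: a list of heads plays the role
    of the head dimension of a (batch, heads, seq_len, head_dim) tensor.
    """
    if n_rep == 1:
        return list(kv_heads)
    return [head for head in kv_heads for _ in range(n_rep)]


rotated = rotate_half_sketch([1.0, 2.0, 3.0, 4.0])
expanded = repeat_kv_sketch(["kv0", "kv1"], n_rep=3)
print(rotated)   # [-3.0, -4.0, 1.0, 2.0]
print(expanded)  # ['kv0', 'kv0', 'kv0', 'kv1', 'kv1', 'kv1']
```

With two key/value heads and ``n_rep=3``, the expanded list has six entries, which is how grouped-query attention lines up a small number of key/value heads with a larger number of query heads. The real functions operate on ``torch.Tensor`` inputs, so consult the module source for exact shapes and dtypes.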