Models#

Models Interface#

class tropt.model.BaseModel(model_name)[source]#

Bases: ABC

Parameters:: model_name (str)

property device#: Defaults to the available accelerator. Should be overridden if the model is local and resides on a specific device.

get_model_name()[source]#

Returns the model identifier string.

Return type:: str

get_usage_stats()[source]#

Returns summary of model usage statistics for logging.

Return type:: Dict[str, int]

reset_usage_stats()[source]#: Resets the usage statistics.

set_flop_counting(mode='manual')[source]#

Enable or disable FLOP counting.

Parameters:: mode (Literal['manual', 'none']) – The method to use for counting FLOPs. Options:

“manual” – Uses a ManualFlopCounter that estimates FLOPs based on token counts and model architecture (follows Kaplan et al. 2020). Requires the model to expose an inner HuggingFace PreTrainedModel (via _hf_model on HF backends). This is the default.
“none” – Disables FLOP counting.

Note:

FLOP counts appear in get_usage_stats() under "total_flops".
Since the optimizer logs all entries of model.get_usage_stats(), self.log() automatically includes FLOP counts in all optimizer logs, without each optimizer method having to log them explicitly.
FLOPs are counted at the model invoke_from_tokens / invoke_from_texts level only; the cost of optimizer-internal and loss-internal computation (e.g., candidate sampling, sorting, Gumbel draws) is deliberately excluded.
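The "manual" mode is described as estimating FLOPs from token counts and model architecture, following Kaplan et al. (2020). A minimal sketch of that style of estimate is given here, assuming the common approximation of roughly 2·N FLOPs per token for a forward pass and roughly 4·N more for the backward pass, where N is the non-embedding parameter count. The function name and signature are hypothetical and not part of tropt; the actual ManualFlopCounter may include further terms (for example, attention over the context).

```python
def estimate_flops(n_params: int, n_tokens: int, count_backward: bool = False) -> int:
    """Approximate FLOPs for processing n_tokens with an n_params-parameter model.

    Assumption: ~2*N FLOPs per token forward, ~4*N per token backward.
    """
    forward = 2 * n_params * n_tokens
    backward = 4 * n_params * n_tokens if count_backward else 0
    return forward + backward


# A 1M-parameter model processing 10 tokens.
fwd_only = estimate_flops(n_params=1_000_000, n_tokens=10)
fwd_bwd = estimate_flops(n_params=1_000_000, n_tokens=10, count_backward=True)
print(fwd_only, fwd_bwd)
```

Under this approximation a pass that is back-propagated through costs three times a forward-only pass, which is presumably why the invoke methods accept a count_backward flag.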

class tropt.model.LMBaseModel(model_name)[source]#

Bases: BaseModel

Language model base class.

Parameters:: model_name (str)

abstract invoke_from_texts(input_texts, message_targets=None, require_target_prefill=False, require_generation=False, **kwargs)[source]#

Generates text completions for the given input texts.

Parameters:

input_texts (List[str]) – List of input strings.
message_targets (Optional[MessageTargets]) – Targets for the messages.
require_target_prefill (bool) – Whether to prefill the target response from message_targets, and return the corresponding logits (e.g., for LMs).
require_generation (bool) – Whether to perform autoregressive generation after the forward pass (for LMs).

Return type:

ModelOutput

Always returns ModelOutput with at least generated_response_strs populated. This method also updates the usage stats (e.g., token counts, forward call counts, etc.). It must call _update_invoke_stats after each raw model call.

class tropt.model.EncoderBaseModel(model_name)[source]#

Bases: BaseModel

Encoder model base class.

Parameters:: model_name (str)

abstract property d_model: int#: Returns the dimensionality of the output embeddings.

abstract invoke_from_texts(input_texts, **kwargs)[source]#

Computes encoder embeddings for the given input texts. Always returns ModelOutput with at least output_embeddings populated. This method also updates the usage stats (e.g., token counts, forward call counts, etc.). It must call _update_invoke_stats after each raw model call.

Return type:: ModelOutput
Parameters:: input_texts (List[str])

class tropt.model.BaseTokenizer[source]#

Bases: ABC

Abstract base class for tokenizers to ensure a unified interface compatible with Hugging Face-style usage.

abstract batch_decode(ids, **kwargs)[source]#

Converts a batch of token IDs back to a list of strings.

Return type:: List[str]
Parameters:: ids (List[int] | List[List[int]] | Tensor)

abstract decode(ids, **kwargs)[source]#

Converts token IDs back to a string.

Return type:: str
Parameters:: ids (int | List[int] | Tensor)

decode_trigger(trigger_ids)[source]#

Decode a 1-D trigger ids tensor -> string (special tokens skipped).

Return type:: str
Parameters:: trigger_ids (Int[Tensor, 'trigger_seq_len'])

decode_triggers(trigger_ids)[source]#

Batch-decode a 2-D trigger ids tensor -> list of strings (special tokens skipped).

Return type:: List[str]
Parameters:: trigger_ids (Int[Tensor, 'bsz trigger_seq_len'])

abstract encode(text, **kwargs)[source]#

Converts a string to token IDs.

Return type:: List[int]
Parameters:: text (str | List[str])

encode_trigger(trigger_str)[source]#

Encode a trigger string -> 1-D tensor (no special tokens).

Return type:: Int[Tensor, 'trigger_seq_len']
Parameters:: trigger_str (str)

abstract property name_or_path: str#

abstract property vocab_size: int#: Returns the size of the vocabulary.
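To illustrate the shape of this interface, here is a minimal sketch of a whitespace tokenizer that mirrors the abstract members listed above (encode, decode, batch_decode, name_or_path, vocab_size). It is a hypothetical stand-in that deliberately does not subclass tropt's BaseTokenizer, so it runs without the library installed; it also works on plain Python lists, whereas the documented trigger helpers return tensors.

```python
from typing import List, Sequence


class ToyWhitespaceTokenizer:
    """Hypothetical stand-in mirroring the BaseTokenizer member names."""

    def __init__(self, vocab: Sequence[str]):
        self._id_to_tok = list(vocab)
        self._tok_to_id = {tok: i for i, tok in enumerate(self._id_to_tok)}

    @property
    def name_or_path(self) -> str:
        return "toy-whitespace"

    @property
    def vocab_size(self) -> int:
        return len(self._id_to_tok)

    def encode(self, text: str, **kwargs) -> List[int]:
        # Unknown words raise KeyError here; a real tokenizer would handle them.
        return [self._tok_to_id[tok] for tok in text.split()]

    def decode(self, ids: Sequence[int], **kwargs) -> str:
        return " ".join(self._id_to_tok[i] for i in ids)

    def batch_decode(self, ids: Sequence[Sequence[int]], **kwargs) -> List[str]:
        return [self.decode(row) for row in ids]


tok = ToyWhitespaceTokenizer(["hello", "world", "!"])
ids = tok.encode("hello world !")
roundtrip = tok.decode(ids)
batch = tok.batch_decode([[0], [1, 2]])
```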

Model Mixins#

Model Mixins provide specific functionalities or access levels to the models. They define methods that models must implement to support various text optimization processes. For example, LossTokenAccessMixin defines methods for computing loss based on token-level inputs. This modular approach allows for flexible composition of model capabilities.
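The composition idea can be sketched with plain abstract base classes. The class and function names below are hypothetical illustrations, not tropt classes: each mixin declares one capability, a model combines the mixins it can support, and calling code checks capabilities with isinstance.

```python
from abc import ABC, abstractmethod
from typing import List


class LossAccess(ABC):
    """Hypothetical mixin: the model can score candidates."""

    @abstractmethod
    def compute_loss(self, candidates: List[str]) -> List[float]: ...


class GradientAccess(ABC):
    """Hypothetical mixin: the model can also return gradients."""

    @abstractmethod
    def compute_grad(self, candidate: str) -> List[float]: ...


class BlackBoxModel(LossAccess):
    def compute_loss(self, candidates: List[str]) -> List[float]:
        return [float(len(c)) for c in candidates]


class WhiteBoxModel(LossAccess, GradientAccess):
    def compute_loss(self, candidates: List[str]) -> List[float]:
        return [float(len(c)) for c in candidates]

    def compute_grad(self, candidate: str) -> List[float]:
        return [1.0] * len(candidate)


def supports_gradients(model: object) -> bool:
    # An optimizer can use this kind of check to reject incompatible models early.
    return isinstance(model, GradientAccess)
```

With this layout, a gradient-based optimizer can require GradientAccess while a query-only optimizer needs just LossAccess, and the same model class can satisfy both.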

Token Mixins#

Token mixins define model interactions at the token level.

class tropt.model.TokenAccessMixin[source]#

Bases: ABC

Mixin for models that have a tokenizer and can prepare token-level inputs.

This is the base mixin for any model with token-level access (tokenizer, set/reset inputs). Note that such models may be unable to compute the loss from tokens (see LossTokenAccessMixin), but they must at least be able to prepare the token inputs (e.g., OpenAI models).

reset_inputs_from_tokens()[source]#

Clear self._token_input_manager.

Return type:: None

abstract set_inputs_from_tokens(templates, targets=None)[source]#

Prepare and store the inputs manager as self._token_input_manager.

Parameters:

templates (List[str]) – List of text templates containing the trigger placeholder.
targets (Optional[Targets]) – Optional targets for the loss function.

Return type:

None

abstract property tokenizer: BaseTokenizer#: Forces the class using this mixin to implement a tokenizer. This tokenizer implements the API defined by BaseTokenizer, which matches the main functionality of a HuggingFace tokenizer.

property vocab_size: int#

class tropt.model.LossTokenAccessMixin[source]#

Bases: InvokeTokenAccessMixin

Mixin for models that can compute losses based on token-level inputs.

abstract compute_loss_from_tokens(candidate_trigger_ids, **kwargs)[source]#

Compute the loss on the stored token inputs with the given trigger merged in.

Return type:: Float[Tensor, 'n_templates n_candidates']
Parameters:: candidate_trigger_ids (Float[Tensor, 'n_candidates trigger_seq_len'])

class tropt.model.LogitsTokenAccessMixin[source]#

Bases: InvokeTokenAccessMixin

Mixin for models that can compute logits based on token-level inputs.

abstract compute_logits_from_tokens(candidate_trigger_ids, **kwargs)[source]#

Compute logits w.r.t. trigger tokens that are merged into stored token inputs.

Return type:: Float[Tensor, 'trigger_seq_len vocab_size']
Parameters:: candidate_trigger_ids (Float[Tensor, 'n_candidates trigger_seq_len'])

class tropt.model.GradientTokenAccessMixin[source]#

Bases: InvokeTokenAccessMixin

Mixin for models that can compute gradients based on token-level inputs.

abstract compute_grad_from_tokens(candidate_trigger_ids, **kwargs)[source]#

Compute gradients w.r.t. trigger tokens that are merged into stored token inputs.

Return type:: Float[Tensor, 'trigger_seq_len vocab_size']
Parameters:: candidate_trigger_ids (Float[Tensor, 'n_candidates trigger_seq_len'])

Text Mixins#

Text mixins define model interactions at the text level.

class tropt.model.TextAccessMixin[source]#

Bases: ABC

reset_inputs_from_texts()[source]#

Clear the stored text input manager.

Return type:: None

set_inputs_from_texts(templates, targets=None)[source]#

Prepare and store the text-based inputs manager.

Return type:

None

Parameters:

templates (Annotated[List[str], FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)]), 'n_templates'])
targets (Targets | None)

class tropt.model.LossTextAccessMixin[source]#

Bases: TextAccessMixin

Mixin for models that compute losses based on text-level inputs (black-box access).

compute_loss_from_texts(candidate_trigger_strs, loss_func, keep_message_dim=False)[source]#

Computes the loss on all candidate string texts using the stored text inputs manager. This computation is based on the invoke_from_texts() method of the model, and the information it provides.

Return type:

Float[Tensor, 'n_candidates']

Parameters:

candidate_trigger_strs (List[str])
loss_func (BaseLoss)
keep_message_dim (bool)
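A rough sketch of this black-box loss computation follows. It assumes that the loss is averaged over templates unless keep_message_dim is set, which the text above does not state, and it uses a hypothetical placeholder string and plain callables in place of the model's invoke_from_texts() and a BaseLoss instance.

```python
from typing import Callable, List, Union

PLACEHOLDER = "{trigger}"  # hypothetical stand-in for OPTIMIZED_TRIGGER_PLACEHOLDER


def loss_from_texts(
    templates: List[str],
    candidates: List[str],
    invoke: Callable[[List[str]], List[str]],
    loss_func: Callable[[str], float],
    keep_message_dim: bool = False,
) -> Union[List[float], List[List[float]]]:
    per_template = []
    for template in templates:
        texts = [template.replace(PLACEHOLDER, cand) for cand in candidates]
        outputs = invoke(texts)  # one batched black-box query per template
        per_template.append([loss_func(out) for out in outputs])
    if keep_message_dim:
        return per_template  # n_templates x n_candidates
    # Assumed reduction: mean over templates, giving one loss per candidate.
    return [sum(col) / len(templates) for col in zip(*per_template)]


def echo(texts: List[str]) -> List[str]:
    return texts  # toy "model" that returns its input unchanged


def length(text: str) -> float:
    return float(len(text))


templates = ["a {trigger}", "bb {trigger}"]
reduced = loss_from_texts(templates, ["x", "yyy"], echo, length)
full = loss_from_texts(templates, ["x", "yyy"], echo, length, keep_message_dim=True)
```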

Input Managers#

Input Managers are responsible for streamlining the repeated combination of new triggers into text templates. They depend on the input type and are strongly linked to the model’s key methods. For instance, LMHFTokenInputManager specializes in combining trigger tokens within user text templates and providing them as model input for loss computation.

class tropt.model.inputs_manager.DefaultTokenInputManager(tokenizer, templates_ids, targets=None, **kwargs)[source]#

Bases: TokenInputManager

Default token-level inputs manager for models with token-level access.

This implementation works with any tokenizer supporting the BaseTokenizer interface (or HuggingFace PreTrainedTokenizer). It decodes trigger token IDs to strings and reconstructs full texts — suitable for API-based models or any model where embedding-level manipulation is not needed.

Parameters:

tokenizer (Any)
templates_ids (List[List[int]])
targets (Targets)

get_triggered_inputs(chosen_template_idx, trigger_ids, **kwargs)[source]#

Constructs full text inputs by decoding candidate trigger tokens and inserting them into the templates.

Return type:

ModelInput

Parameters:

chosen_template_idx (int)
trigger_ids (Int[Tensor, 'n_candidates trigger_seq_len'])

property vocab_size#

class tropt.model.inputs_manager.InputsManager(templates, targets)[source]#

Bases: ABC

Base class for maintaining the input template, corresponding targets, and the method for injecting triggers into the inputs. This class wraps n_templates templates (that contain the substring OPTIMIZED_TRIGGER_PLACEHOLDER as a trigger placeholder) and targets, and provides a unified interface for different types of inputs (e.g., text-based, token-based) used in adversarial trigger optimization.

Parameters:

templates (TextTemplates)
targets (Targets)

abstract get_triggered_inputs(chosen_template_idx, *args, **kwargs)[source]#

Returns the trigger-combined model inputs, for the specified template index.

Parameters:

chosen_template_idx (int) – Index of the template to use for generating the inputs.
... (... args for receiving the trigger candidates)

Return type:

ModelInput

Returns:

A ModelInput object containing the crafted triggered-combined inputs, which includes the corresponding targets for the specified template.

class tropt.model.inputs_manager.TextInputManager(templates, targets=None)[source]#

Bases: InputsManager

Class for maintaining text-based trigger-combined inputs (fits black-box text-level query access). Instances of this class store n_templates templates and targets, and provide the method get_triggered_inputs to combine them with given trigger strings.

Parameters:

templates (TextTemplates)
targets (Targets)

after_texts: Annotated[List[str]]#

before_texts: Annotated[List[str]]#

get_triggered_inputs(chosen_template_idx, trigger_strs)[source]#

Returns a list of inputs with the given trigger strings merged in. The list is two-dimensional: outer list over templates, inner list over trigger variations; also, returns the corresponding targets.

Given chosen_template_idx, returns only the inputs for that template (1D list), and the corresponding targets.

Return type:

ModelInput

Parameters:

chosen_template_idx (int)
trigger_strs (Annotated[List[str], 'n_candidates'])

property n_templates: int#

targets: Targets#
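The before_texts / after_texts attributes suggest that each template is split once around the placeholder, so that merging a new trigger is a cheap string concatenation. A minimal sketch of that idea is shown here; the class name and the placeholder value are hypothetical, targets are omitted, and the method returns a plain list rather than a ModelInput.

```python
from typing import List

PLACEHOLDER = "{trigger}"  # hypothetical; the real constant is OPTIMIZED_TRIGGER_PLACEHOLDER


class ToyTextInputManager:
    """Hypothetical sketch: split templates once, merge triggers many times."""

    def __init__(self, templates: List[str]):
        self.before_texts: List[str] = []
        self.after_texts: List[str] = []
        for template in templates:
            before, sep, after = template.partition(PLACEHOLDER)
            if not sep:
                raise ValueError(f"template lacks the placeholder: {template!r}")
            self.before_texts.append(before)
            self.after_texts.append(after)

    @property
    def n_templates(self) -> int:
        return len(self.before_texts)

    def get_triggered_inputs(self, chosen_template_idx: int, trigger_strs: List[str]) -> List[str]:
        before = self.before_texts[chosen_template_idx]
        after = self.after_texts[chosen_template_idx]
        return [before + trigger + after for trigger in trigger_strs]


manager = ToyTextInputManager(["Summarize: {trigger} Thanks.", "{trigger} is the answer"])
inputs = manager.get_triggered_inputs(1, ["42", "blue"])
```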

class tropt.model.inputs_manager.TokenInputManager(templates, targets)[source]#

Bases: InputsManager

Abstract base class for token-level inputs managers.

Subclasses manage the combination of candidate triggers into tokenized templates.

Parameters:

templates (TextTemplates)
targets (Targets)

n_templates: int#

targets: Targets#

tokenizer: Any#

Model Implementations#

class tropt.model.CLIPTextEncoderHFModel(model_name, forward_pass_batch_size=512, backward_pass_batch_size=28, without_final_projection=False, device=None, dtype=None, set_model_to_train=False, **kwargs)[source]#

Bases: HuggingFaceBackendModel, EncoderBaseModel, LossTokenAccessMixin, GradientTokenAccessMixin, GradientEmbedAccessMixin, LossTextAccessMixin

Wrapper for the text encoder of OpenAI CLIP models from HuggingFace.

Implementation note:: CLIP’s text encoder does not accept input embeddings, so we reimplement its forward pass to support them. This is hacky, as it repeats logic from the CLIP modeling file in Transformers, but it is necessary for supporting gradient-based / soft-token-based optimization. Other discrete-optimizer implementations have turned to similar solutions, e.g., PEZ (Wen et al. 2023), which forked OpenCLIP (YuxinWenRick/hard-prompts-made-easy).

Parameters:

model_name (str)
forward_pass_batch_size (int)
backward_pass_batch_size (int)
without_final_projection (bool)
device (Optional[str])
dtype (Optional[Union[str, torch.dtype]])
set_model_to_train (bool)

property d_model: int#: Returns the dimensionality of the output embeddings.

invoke_from_texts(input_texts, **kwargs)[source]#

Encode texts into the CLIP embedding space.

Return type:: ModelOutput
Parameters:: input_texts (Annotated[List[str], 'n_texts'])

invoke_from_tokens(input_embeds, input_attention_mask=None, count_backward=False, **kwargs)[source]#

White-box forward pass through the text encoder using input embeddings.

Return type:

ModelOutput

Parameters:

input_embeds (Float[Tensor, 'bsz seq_len d_text'])
input_attention_mask (Float[Tensor, 'bsz seq_len'] | None)
count_backward (bool)

set_inputs_from_tokens(templates, targets=None)[source]#

Prepare and store the given templates in the inputs manager.

Return type:

None

Parameters:

templates (Annotated[List[str], FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)]), 'n_templates'])
targets (Targets | None)

class tropt.model.ClassifierBaseModel(model_name)[source]#

Bases: BaseModel

Classifier model base class.

Parameters:: model_name (str)

abstract invoke_from_texts(input_texts, **kwargs)[source]#

Compute classification logits for the given input texts. Always returns ModelOutput with at least output_class_logits populated. This method also updates the usage stats. It must call _update_invoke_stats after each raw model call.

Return type:: ModelOutput
Parameters:: input_texts (List[str])

abstract property n_classes: int#: Returns the number of output classes.

class tropt.model.ClassifierHFModel(model_name=None, device=None, dtype=None, forward_pass_batch_size=512, backward_pass_batch_size=28, loaded_model=None, set_model_to_train=False, **kwargs)[source]#

Bases: HuggingFaceBackendModel, ClassifierBaseModel, LossTokenAccessMixin, GradientTokenAccessMixin, GradientEmbedAccessMixin, LossTextAccessMixin

HuggingFace sequence classification model wrapper (for models loadable with AutoModelForSequenceClassification).

Parameters:

model_name (Optional[str])
device (Optional[str])
dtype (Optional[str | torch.dtype])
forward_pass_batch_size (int)
backward_pass_batch_size (int)
set_model_to_train (bool)

property id2label: Dict[int, str]#

invoke_from_texts(input_texts, **kwargs)[source]#

Compute classification logits for the given input texts. Always returns ModelOutput with at least output_class_logits populated. This method also updates the usage stats. It must call _update_invoke_stats after each raw model call.

Return type:: ModelOutput
Parameters:: input_texts (List[str])

invoke_from_tokens(input_embeds, input_attention_mask=None, count_backward=False, **kwargs)[source]#

Performs a forward pass with the given token-based model input. Forward pass is expected to be done on input_embeds.

Parameters:

input_embeds (Float[Tensor, 'bsz seq_len d_model']) – The input embeddings with the trigger merged in; if provided, used instead of any other potential input.
input_attention_mask (Optional[Int[Tensor, 'bsz seq_len']]) – The attention mask matching the input embeddings. Defaults to None.
require_target_prefill (bool) – Whether to prefill the target response and return the corresponding logits (e.g., for LMs).
require_generation (bool) – Whether to perform autoregressive generation after the forward pass (for LMs).
require_hidden_states (bool) – Whether to return the hidden states from the model output.
require_attentions (bool) – Whether to return the attention weights from the model output.
count_backward (bool) – Whether this forward pass will be back-propagated through (set by gradient methods). Can be used by FLOP counters.

Return type:

ModelOutput

Returns:

ModelOutput: the model output containing logits, embeddings, attentions, etc.

property n_classes: int#: Returns the number of output classes.

set_inputs_from_tokens(templates, targets=None)[source]#

Prepare and store the inputs manager as self._token_input_manager.

Parameters:

templates (List[str]) – List of text templates containing the trigger placeholder.
targets (Optional[Targets]) – Optional targets for the loss function.

Return type:

None

class tropt.model.EncoderGeminiModel(model_name='gemini-embedding-001', d_model=3072, use_vertex=False, project=None, location='us-central1', default_text_type=None, **kwargs)[source]#

Bases: EncoderBaseModel, LossTextAccessMixin

Google Gemini Encoder model wrapper, with text-query access. https://ai.google.dev/gemini-api/docs/embeddings

Parameters:

d_model (int)
use_vertex (bool)
project (Optional[str])
location (str)
default_text_type (Optional[str])

property d_model: int#: Returns the dimensionality of the output embeddings.

invoke_from_texts(input_texts, text_type=None, **kwargs)[source]#

Generates embeddings for the given texts using the Gemini API.

Parameters:

input_texts (List[str]) – A list of strings to embed.
text_type (Optional[str]) – The type of text (e.g., “document” or “query”) to guide the embedding generation.

Return type:

ModelOutput

Returns:

A ModelOutput containing the generated embeddings.

class tropt.model.EncoderHFModel(model_name=None, device=None, dtype=None, forward_pass_batch_size=512, backward_pass_batch_size=28, loaded_model=None, set_model_to_train=False, **kwargs)[source]#

Bases: HuggingFaceBackendModel, EncoderBaseModel, LossTokenAccessMixin, GradientTokenAccessMixin, GradientEmbedAccessMixin, LossTextAccessMixin

Parameters:

model_name (Optional[str])
device (Optional[str])
dtype (Optional[str | torch.dtype])
forward_pass_batch_size (int)
backward_pass_batch_size (int)
loaded_model (Optional[SentenceTransformer])
set_model_to_train (bool)

property d_model#: Returns the dimensionality of the output embeddings.

invoke_from_texts(input_texts, **kwargs)[source]#

Get the embeddings for the given texts (n_texts elements). Note: we mostly assume any prompting/instruction will be applied before the call to this function.

Return type:: ModelOutput
Parameters:: input_texts (Annotated[List[str], 'n_texts'])

invoke_from_tokens(input_embeds, input_attention_mask=None, count_backward=False, **kwargs)[source]#

Perform a white-box forward pass through the model using input embeddings.

Parameters:

input_embeds (Float[Tensor, 'bsz seq_len d_model']) – Input embeddings tensor (bsz, seq_len, d_model).
input_attention_mask (Optional[Float[Tensor, 'bsz seq_len']]) – Attention mask tensor (bsz, seq_len).
count_backward (bool) – Whether this forward pass will be back-propagated through.

Returns:

The output from the model.

Return type:

ModelOutput

set_inputs_from_tokens(templates, targets=None)[source]#

Prepare and store the given templates in the inputs manager.

Return type:

None

Parameters:

templates (Annotated[List[str], FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)]), 'n_templates'])
targets (Targets | None)

class tropt.model.EncoderOpenAIModel(model_name='text-embedding-3-small', d_model=None, api_key=None, base_url=None, **kwargs)[source]#

Bases: EncoderBaseModel, LossTextAccessMixin, TokenAccessMixin

OpenAI Encoder model wrapper for embedding generation via the OpenAI API. https://platform.openai.com/docs/guides/embeddings

Parameters:

model_name (str)
d_model (Optional[int])
api_key (Optional[str])
base_url (Optional[str])

property d_model#: Returns the dimensionality of the output embeddings.

invoke_from_texts(input_texts, **kwargs)[source]#

Compute embeddings for the given texts using the OpenAI API.

Parameters:: input_texts (List[str]) – A list of strings to embed.
Return type:: ModelOutput
Returns:: ModelOutput with output_embeddings populated.

set_inputs_from_tokens(templates, targets=None)[source]#

Prepares and stores the inputs manager from raw texts.

Return type:

None

Parameters:

templates (Annotated[List[str], FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)]), 'n_templates'])
targets (Targets | None)

property tokenizer: OpenAITokenizer#: Forces the class using this mixin to implement a tokenizer. This tokenizer implements the API defined by BaseTokenizer, which matches the main functionality of a HuggingFace tokenizer.

property vocab_size: int#

class tropt.model.EncoderVoyageModel(model_name='voyage-4', d_model=1024, **kwargs)[source]#

Bases: EncoderBaseModel, LossTextAccessMixin

Voyage AI Encoder model wrapper, with text-query access. https://docs.voyageai.com/docs/embeddings

Parameters:

model_name (str)
d_model (int)

property d_model: int#: Returns the dimensionality of the output embeddings.

invoke_from_texts(input_texts, text_type=None, **kwargs)[source]#

Generates embeddings for the given texts using the Voyage API.

Parameters:

input_texts (List[str]) – A list of strings to embed.
text_type (Optional[str]) – The type of text (e.g., “document” or “query”) to guide the embedding generation.

Return type:

ModelOutput

Returns:

A ModelOutput containing the generated embeddings.

class tropt.model.GradientEmbedAccessMixin[source]#

Bases: InvokeTokenAccessMixin

Mixin for models that can compute gradients from the input embeddings based on token-level inputs.

abstract compute_grad_from_embeds(loss_func, candidate_trigger_embeds)[source]#

Compute gradients w.r.t. trigger embeddings using stored token inputs.

Return type:

Float[Tensor, 'n_candidates trigger_seq_len embed_dim']

Parameters:

loss_func (BaseLoss)
candidate_trigger_embeds (Float[Tensor, 'n_candidates trigger_seq_len embed_dim'])

class tropt.model.HFTokenizerWrapper(tokenizer)[source]#

Bases: BaseTokenizer

Wraps a HuggingFace PreTrainedTokenizerBase, exposing it as a BaseTokenizer.

All attributes not defined here are transparently forwarded to the underlying HF tokenizer, so existing code that accesses tokenizer internals (padding_side, apply_chat_template, etc.) continues to work unchanged.

batch_decode(ids, **kwargs)[source]#

Converts a batch of token IDs back to a list of strings.

Return type:: List[str]

decode(ids, **kwargs)[source]#

Converts token IDs back to a string.

Return type:: str

encode(text, **kwargs)[source]#

Converts a string to token IDs.

Return type:: List[int]

property name_or_path: str#

property vocab_size: int#: Returns the size of the vocabulary.
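The transparent forwarding described above is commonly implemented with __getattr__, which Python calls only when normal attribute lookup fails. The sketch below shows that pattern under this assumption; both classes are hypothetical stand-ins, and the real HFTokenizerWrapper may be implemented differently.

```python
class InnerTokenizer:
    """Hypothetical stand-in for a HuggingFace tokenizer."""

    padding_side = "left"
    vocab_size = 3

    def decode(self, ids, **kwargs):
        return " ".join(str(i) for i in ids)


class ForwardingWrapper:
    """Defines a few methods explicitly and forwards everything else."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    @property
    def vocab_size(self) -> int:
        return self._tokenizer.vocab_size

    def decode(self, ids, **kwargs) -> str:
        return self._tokenizer.decode(ids, **kwargs)

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so the members above take precedence.
        if name == "_tokenizer":
            # Avoid infinite recursion if the wrapper is not fully initialised.
            raise AttributeError(name)
        return getattr(self._tokenizer, name)


wrapped = ForwardingWrapper(InnerTokenizer())
side = wrapped.padding_side  # not defined on the wrapper, forwarded to the inner object
```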

class tropt.model.InvokeTokenAccessMixin[source]#

Bases: TokenAccessMixin

Mixin for models that can perform a forward pass from token-level inputs.

Adds the abstract invoke_from_tokens method. All compute-* mixins (LossTokenAccessMixin, GradientTokenAccessMixin, etc.) inherit from this.

abstract invoke_from_tokens(input_ids=None, message_targets=None, require_target_prefill=False, require_generation=False, **kwargs)[source]#

Perform a forward pass from token-level (embedding) inputs.

Parameters:

input_ids (Optional[Float[Tensor, 'bsz seq_len']]) – Token IDs of the full input sequence (incl. trigger), plus optionally target tokens. Shape: (batch_size, seq_len).
message_targets (Optional[MessageTargets]) – Optional MessageTargets object containing the targets for the messages.
require_target_prefill (bool) – Whether to prefill the target response from message_targets, and return the corresponding logits (e.g., for LMs).
require_generation (bool) – Whether to perform autoregressive generation after the forward pass (for LMs).

Return type:

ModelOutput

Returns:

ModelOutput with the fields this model can provide.

class tropt.model.LMHFModel(model_name, device=None, dtype=None, forward_pass_batch_size=1024, backward_pass_batch_size=32, use_prefix_cache=True, set_model_to_train=False, use_eager_attention=False, loaded_model=None, chat_template_kwargs=None, **model_kwargs)[source]#

Bases: HuggingFaceBackendModel, LMBaseModel, LossTokenAccessMixin, GradientTokenAccessMixin, LogitsTokenAccessMixin, GradientEmbedAccessMixin, LossTextAccessMixin

Parameters:

model_name (str)
device (Optional[str])
dtype (Optional[str])
forward_pass_batch_size (int)
backward_pass_batch_size (int)
use_prefix_cache (bool)
set_model_to_train (bool)
use_eager_attention (bool)
loaded_model (Optional[AutoModelForCausalLM])
chat_template_kwargs (Optional[Dict[str, Any]])

compute_logits_from_tokens(candidate_trigger_ids, keep_message_dim=False, return_trigger_logits_only=False, return_after_trigger_logits_only=False)[source]#

Given a batch of candidate trigger token ids and inputs object, returns the logits for the next token after the input sequence (i.e., after the trigger + input text + target text, if provided).

Parameters:

candidate_trigger_ids (Int[Tensor, 'n_candidates trigger_seq_len']) – Token IDs of the candidate trigger sequences to evaluate.
inputs (LMHFTokenInputManager) – The inputs object containing the input text and target text (if provided).
return_slices (bool) – Whether to return the slices corresponding to each input in the batch (default: False).
keep_message_dim (bool) – Whether to keep the message dimension in the output logits (default: False).
return_trigger_logits_only (bool) – Whether to return only the logits corresponding to the trigger tokens (default: False).
return_after_trigger_logits_only (bool) – Whether to return only the logits for predicting the next token after the trigger (default: False).

Return type:

Union[Float[Tensor, 'n_templates n_candidates seq_len vocab_size'], Tuple[Float[Tensor, 'n_templates n_candidates seq_len vocab_size'], List[slice]]]

invoke_from_texts(input_texts=None, message_targets=None, greedy_decode=True, max_new_tokens=128, require_target_prefill=False, require_generation=True, require_first_token_logprobs=False)[source]#

Generate text completions. Always returns a ModelOutput. If self.do_prefill_response is True and the relevant target response prefix is available, generation starts after the prefilled response, and the returned logits include the prefilled response portion.

Parameters:

input_texts (Optional[List[str]]) – list of plain-text prompts.
message_targets (Optional[MessageTargets]) – Optional MessageTargets object. Mainly relevant if require_target_prefill is True, in which case the target responses will be prefixed to the model output.
greedy_decode (bool) – Whether to use greedy decoding (vs. sampling) for generation.
max_new_tokens (int) – The maximum number of new tokens to generate.
require_target_prefill (bool) – Whether to prefill the target response in the model input (if provided in message_targets) and return the corresponding logits.
require_generation (bool) – Whether to perform generation. If False, performs only the forward pass.
require_first_token_logprobs (bool) – Whether to return log-probabilities for the top-20 candidates for the first generated token.

Return type:

ModelOutput
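For reference, the quantity requested by require_first_token_logprobs can be sketched as a log-softmax over the first-step logits followed by a top-k selection. The helper below is a hypothetical pure-Python illustration rather than the library's implementation, and it uses a tiny vocabulary instead of real model logits.

```python
import math
from typing import List, Tuple


def top_k_logprobs(logits: List[float], k: int = 20) -> List[Tuple[int, float]]:
    """Return (token_id, log-probability) pairs for the k most likely tokens."""
    max_logit = max(logits)
    # Numerically stable log-sum-exp for the softmax normaliser.
    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    scored = [(token_id, x - log_z) for token_id, x in enumerate(logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


top = top_k_logprobs([0.0, 2.0, 1.0], k=2)
top_ids = [token_id for token_id, _ in top]
```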

invoke_from_tokens(input_embeds=None, input_ids=None, input_attention_mask=None, input_prefix_cache_kwargs=None, input_slices=None, require_target_prefill=False, require_generation=False, require_hidden_states=False, require_attentions=False, require_first_token_logprobs=False, count_backward=False, max_new_tokens=128, greedy_decode=True, **kwargs)[source]#

Performs a forward pass through the model given input embeddings and attention mask.

Parameters:

input_embeds (Optional[Float[Tensor, 'bsz seq_len embd_dim']]) – Input embeddings tensor of shape (bsz, seq_len, embd_dim). Primary input.
input_ids (Optional[Int[Tensor, 'bsz seq_len']]) – Token IDs tensor of shape (bsz, seq_len). Used as a fallback when input_embeds is not provided; embedded internally via the model’s embedding layer.
input_attention_mask (Optional[Float[Tensor, 'bsz seq_len']]) – Attention mask tensor of shape (bsz, seq_len).
input_prefix_cache_kwargs (Optional[Dict[str, Any]]) – Optional dict of prefix cache kwargs to pass to the model.
input_slices (Optional[Dict[str, slice]]) – Optional dict mapping slice keys to slices for extracting specific parts of the output.
require_target_prefill (bool) – Whether the input includes a prefilled target, whose indices are marked by input_slices, and whose logits should be extracted.
require_generation (bool) – Whether to perform generation, in addition to forward pass.
require_hidden_states (bool) – Whether to return hidden states in the output.
require_attentions (bool) – Whether to return attentions in the output.
require_first_token_logprobs (bool) – Whether to return log-probabilities for the top-20 candidates for the first generated token.
count_backward (bool) – Whether this forward pass will be back-propagated through (set by gradient methods).
max_new_tokens (int)
greedy_decode (bool)

Returns:

The output of the model containing logits, hidden states, and attentions as applicable.

Return type:

ModelOutput

set_inputs_from_texts(templates, targets=None)[source]#: Prepare and store the text-based inputs manager.

set_inputs_from_tokens(templates, targets=None)[source]#

Prepares and stores the inputs manager for the model, including tokenization and target processing.

Parameters:

templates (List[str]) – List of input templates containing the trigger placeholder.
targets (Optional[Targets]) – Optional Targets object containing target response strings to optimize towards.

Return type:

None

class tropt.model.LMHFTokenInputManager(tokenizer, device, templates_ids, embed_func, targets, use_prefix_cache=False, model=None)[source]#

Bases: HuggingFaceTokenInputManager

Parameters:

tokenizer (transformers.PreTrainedTokenizerBase)
device (torch.device)
templates_ids (Annotated[List[List[int]], 'n_templates seq_len'])
embed_func (Module)
targets (Targets)
use_prefix_cache (Optional[bool])
model (Optional[transformers.PreTrainedModel])

get_triggered_inputs(do_append_embeds=False, **kwargs)[source]#

Returns the input embeddings with the given trigger merged in for a specific message.

Notes:

Varying trigger lengths within the same candidate batch are not supported (all candidates must share trigger_seq_len).
For specific use cases this method is suboptimal; currently, generality and support for different input types/shapes are prioritized.
Gradients can flow through trigger_embeds, which is useful for combining backpropagatable trigger candidates (e.g., for compute_grad_from_*() methods).

Parameters:

trigger_ids (Tensor, shape (n_candidates, trigger_seq_len)) – Token IDs of the trigger(s) to insert. If trigger_embeds is also provided, this is used for reference only.
trigger_embeds (Tensor, shape (n_candidates, trigger_seq_len, embd_dim)) – Optional alternative to trigger_ids, where the trigger embeddings are provided directly (useful for gradient computation). If provided, it is used for input computation instead of trigger_ids.
append_embeds (list of n_templates tensors, each of shape (n_app_ids, embd_dim)) – Optional embeddings to append at the end of each message (e.g., for planting a response in LMs).
do_append_embeds (bool) – If True, the provided append_embeds are appended at the end of the input.
chosen_template_idx (int, required) – Index of the message to process. Must be provided; multi-message input is not supported by this method.

Returns: A ModelInput object containing:

input_trigger_ids: Tensor, shape = (n_candidates, trigger_seq_len)
the token ids of the trigger(s) inserted (detached from grad graph; for reference)
inputs_embeds: Tensor, shape = (n_candidates, seq_len, embd_dim)
the input embeddings with the trigger merged in; if the provided trigger_embeds require grad, this tensor also requires grad.
attention_mask: Tensor, shape = (n_candidates, seq_len)
the attention mask matching the input embeddings
input_prefix_cache_kwargs: Dict, optional
the kwargs to pass to the model forward pass for using the prefix cache on chosen_template_idx, if applicable.
message_targets: MessageTargets
the targets dict for the chosen message, expanded to match n_candidates dimension
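The shape contract described above can be illustrated with a minimal sketch. It is a simplification that works on token ids with plain Python lists rather than on embeddings, so it shows no gradient flow. The helper name merge_trigger_ids and the prefix/suffix split of a template are assumptions made for illustration only; they are not part of the tropt API.

```python
from typing import List, Optional, Tuple


def merge_trigger_ids(
    prefix_ids: List[int],
    suffix_ids: List[int],
    trigger_ids: List[List[int]],
    append_ids: Optional[List[int]] = None,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Splice each candidate trigger between a template's prefix and suffix.

    Hypothetical helper: it operates on token ids, not embeddings, and only
    illustrates the (n_candidates, seq_len) shape of the merged inputs.
    """
    if not trigger_ids:
        raise ValueError("at least one trigger candidate is required")
    # All candidates in a batch must share the same trigger_seq_len.
    trigger_seq_len = len(trigger_ids[0])
    if any(len(t) != trigger_seq_len for t in trigger_ids):
        raise ValueError("all trigger candidates must share trigger_seq_len")

    # Optional ids appended at the end of the message (e.g. a target response).
    tail = list(append_ids) if append_ids is not None else []
    input_ids = [prefix_ids + list(t) + suffix_ids + tail for t in trigger_ids]
    # Every row has the same length, so no padding is needed in the mask.
    attention_mask = [[1] * len(row) for row in input_ids]
    return input_ids, attention_mask


ids, mask = merge_trigger_ids(
    prefix_ids=[101, 7],
    suffix_ids=[9, 102],
    trigger_ids=[[11, 12, 13], [21, 22, 23]],
    append_ids=[55],
)
```

Here ids and mask each have one row per candidate, and all rows share the same seq_len because the trigger length is fixed across the batch.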

targets: Targets#: optionally includes target_response_toks (n_templates, target_seq_len) if target outputs are provided; these are used to prefill the response per message

class tropt.model.LiteLLMModel(model_name, base_url=None, api_key=None, system_prompt=None, max_concurrent_requests=20, using_litellm_proxy=False, **client_kwargs)[source]#

Bases: LMBaseModel, LossTextAccessMixin

Model wrapper using LiteLLM as a unified interface to LLM providers.

By default operates in library mode: LiteLLM routes requests directly to the provider based on the model name prefix (e.g. "openai/gpt-4o" -> OpenAI, "anthropic/claude-3.5-sonnet" -> Anthropic). API keys are read from environment variables (OPENAI_API_KEY, etc.).

Alternatively, set using_litellm_proxy=True and base_url to connect via a running LiteLLM proxy server (litellm --port 4000), which is mostly used for centralized key management or shared server setups.

Parameters:

model_name (str)
base_url (Optional[str])
api_key (Optional[str])
system_prompt (Optional[str])
max_concurrent_requests (int)
using_litellm_proxy (bool)

invoke_from_texts(input_texts, max_new_tokens=128, temperature=0.0, message_targets=None, require_generation=False, require_target_prefill=False, require_first_token_logprobs=False, **kwargs)[source]#

Generates text completions for the given input texts using parallel execution.

Parameters:

input_texts (List[str]) – List of input strings.
require_generation (bool) – Whether to perform generation.
max_new_tokens (int) – Maximum number of tokens to generate. Relevant when require_generation=True.
temperature (float) – Sampling temperature. Relevant when require_generation=True.
require_first_token_logprobs (bool) – Whether to return log-probabilities for the first generated token. Default is False.
require_target_prefill (bool) – Whether to prefill the target response. Currently not supported in this class and will raise an error.
**kwargs – Additional arguments to pass to the litellm completion call.

Return type:

ModelOutput

Returns:

ModelOutput containing the generated response strings and optionally the first-token logprobs.
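One way to picture the parallel execution bounded by max_concurrent_requests is the sketch below. It is not the actual LiteLLMModel implementation: the use of a thread pool is an assumption, and complete_in_parallel and fake_complete are hypothetical names, with fake_complete standing in for a real provider request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def complete_in_parallel(
    input_texts: List[str],
    complete_one: Callable[[str], str],
    max_concurrent_requests: int = 20,
) -> List[str]:
    """Run complete_one on every input, at most max_concurrent_requests at a time."""
    if not input_texts:
        return []
    workers = max(1, min(max_concurrent_requests, len(input_texts)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of finish order.
        return list(pool.map(complete_one, input_texts))


def fake_complete(text: str) -> str:
    # Hypothetical stand-in for a provider request made through LiteLLM.
    return text.upper()


responses = complete_in_parallel(
    ["hello", "world"], fake_complete, max_concurrent_requests=2
)
```

Keeping the responses in input order lets each response be matched back to its input text by index.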

property tokenizer: BaseTokenizer#

property vocab_size: int#

class tropt.model.ManualFlopCounter(model)[source]#

Bases: FlopCounterBase

Track FLOPs using Kaplan et al. (2020) approximation.

FLOPs_fwd ≈ 2 · N_params · n_tokens

FLOPs_bwd ≈ 4 · N_params · n_tokens

For MoE models, N_params is the active parameter count (shared params + expert params scaled by top-k / num_experts).

Note that this code may require adaptation as new models are released (e.g., MoE models whose API differs slightly from what is currently supported).

Parameters:: model (PreTrainedModel)
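The arithmetic behind the approximation can be sketched as follows. This is a minimal illustration of the formulas stated above, not the ManualFlopCounter source: the function names are hypothetical, and the integer rounding of the MoE scaling is an assumption.

```python
def active_params(
    shared_params: int,
    expert_params: int = 0,
    top_k: int = 1,
    num_experts: int = 1,
) -> int:
    """Shared params plus expert params scaled by top_k / num_experts."""
    # Integer division is an assumption; the real counter may round differently.
    return shared_params + (expert_params * top_k) // num_experts


def forward_flops(n_params: int, n_tokens: int) -> int:
    # FLOPs_fwd is approximated as 2 * N_params * n_tokens.
    return 2 * n_params * n_tokens


def backward_flops(n_params: int, n_tokens: int) -> int:
    # FLOPs_bwd is approximated as 4 * N_params * n_tokens.
    return 4 * n_params * n_tokens


def forward_backward_flops(n_params: int, n_tokens: int) -> int:
    # Combined cost: 6 * N_params * n_tokens.
    return forward_flops(n_params, n_tokens) + backward_flops(n_params, n_tokens)


# Dense model: 1M parameters, 10 tokens.
dense_total = forward_backward_flops(1_000_000, 10)

# MoE model: 200k shared params, 800k expert params, 2 of 8 experts active.
moe_params = active_params(
    shared_params=200_000, expert_params=800_000, top_k=2, num_experts=8
)
moe_forward = forward_flops(moe_params, 10)
```

With these inputs the MoE model counts 400,000 active parameters, so its forward pass over 10 tokens is estimated at 8,000,000 FLOPs, against 60,000,000 FLOPs for the dense model's combined forward and backward pass.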

count_backward(n_tokens)[source]#

Count backward-pass FLOPs for a given number of tokens.

Return type:: int
Parameters:: n_tokens (int)

count_forward(n_tokens)[source]#

Count forward-pass FLOPs for a given number of tokens.

Return type:: int
Parameters:: n_tokens (int)

count_forward_backward(n_tokens)[source]#

Count combined forward+backward FLOPs for a given number of tokens.

Return type:: int
Parameters:: n_tokens (int)