Optimizers#

Optimizer Interface#

class tropt.optimizer.BaseOptimizer(model, loss=None, tracker=None, seed=None)[source]#

Bases: ABC

Base class for all trigger optimizers.

Implements common functionality and interface for optimizers, including tracking.
Subclasses must implement the optimize_trigger method, which contains the core optimization loop and returns an OptimizerResult; this method is automatically wrapped to handle logging, model state resets, and tracker finalization.

Parameters:

model (BaseModel)
loss (Optional[BaseLoss])
tracker (Optional[BaseTracker])
seed (Optional[int])

log(loss, trigger_str=None, **extra)[source]#

Log per-step metrics to the tracker.

Automatically enriches with:

best_loss: running best loss across steps.
loss/*: loss function component stats.
target_model_stats/*, total_models_stats/*: model usage stats (by inspecting all optimizer attributes that subclass BaseModel).

Parameters:

loss (float) – Per-step loss value.
trigger_str (Optional[str]) – Current trigger string (omitted from log dict if None).
**extra – Any additional key-value pairs to include in the log dict.

Return type:

None
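
The best_loss enrichment can be pictured with a minimal sketch, assuming the running best is simply the minimum loss seen so far. ToyLogger and every key except best_loss are hypothetical stand-ins, not part of tropt:

```python
class ToyLogger:
    """Hypothetical stand-in that mimics only the best_loss enrichment."""

    def __init__(self):
        self.best_loss = float("inf")
        self.records = []

    def log(self, loss, trigger_str=None, **extra):
        # Running best across steps (assumed: lower loss is better).
        self.best_loss = min(self.best_loss, loss)
        record = {"loss": loss, "best_loss": self.best_loss, **extra}
        if trigger_str is not None:  # omitted from the record when None
            record["trigger_str"] = trigger_str
        self.records.append(record)


logger = ToyLogger()
logger.log(2.0, trigger_str="! !")
logger.log(3.5, n_queries=8)
```

The second record keeps best_loss at 2.0 even though the per-step loss rose to 3.5.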

model_requirements = ()#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
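
A minimal sketch of this convention, assuming the validation amounts to an isinstance check against each mixin. All class names below are hypothetical stand-ins and the actual tropt check may differ:

```python
class LossTextAccess:
    """Hypothetical capability mixin: the model can score raw text."""


class GradientTokenAccess:
    """Hypothetical capability mixin: the model exposes token gradients."""


class ToyOptimizer:
    # Least-restrictive configuration: text-level loss access only.
    model_requirements = (LossTextAccess,)

    def __init__(self, model, candidate_selection="random"):
        missing = [m.__name__ for m in self.model_requirements
                   if not isinstance(model, m)]
        if missing:
            raise TypeError(f"model lacks required capabilities: {missing}")
        # Optional flow: validated explicitly, outside model_requirements.
        if candidate_selection == "gradient" and not isinstance(model, GradientTokenAccess):
            raise TypeError("candidate_selection='gradient' needs gradient access")
        self.model = model
        self.candidate_selection = candidate_selection


class BlackBoxModel(LossTextAccess):
    pass


class WhiteBoxModel(LossTextAccess, GradientTokenAccess):
    pass


black_box_opt = ToyOptimizer(BlackBoxModel())
white_box_opt = ToyOptimizer(WhiteBoxModel(), candidate_selection="gradient")
```

Declaring only the text-level mixin keeps the optimizer usable with black-box models, while the gradient flow fails early with a clear error when the model cannot support it.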

abstract optimize_trigger(templates, initial_trigger=None, targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
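
The wrapping mechanism can be sketched as follows. This is an illustration of the general __init_subclass__ pattern, not the tropt implementation: ToyBaseOptimizer, EchoOptimizer, and the events list (standing in for a tracker) are all hypothetical:

```python
import functools
from abc import ABC, abstractmethod


class ToyBaseOptimizer(ABC):
    """Hypothetical base class that wraps a hook defined by subclasses."""

    def __init__(self):
        self.events = []  # records lifecycle calls, standing in for a tracker

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap only a method defined directly on this subclass.
        impl = cls.__dict__.get("optimize_trigger")
        if impl is None:
            return

        @functools.wraps(impl)
        def wrapped(self, *args, **kw):
            self.events.append("init")        # e.g. tracker.init(config)
            try:
                return impl(self, *args, **kw)
            finally:
                self.events.append("finish")  # e.g. tracker.finish(summary)

        cls.optimize_trigger = wrapped

    @abstractmethod
    def optimize_trigger(self, templates, initial_trigger=None, targets=None):
        ...


class EchoOptimizer(ToyBaseOptimizer):
    def optimize_trigger(self, templates, initial_trigger=None, targets=None):
        self.events.append("search")  # the subclass writes only the search loop
        return initial_trigger


opt = EchoOptimizer()
result = opt.optimize_trigger(["some template"], initial_trigger="! !")
```

The subclass author never calls the lifecycle hooks; they run around the search loop even if the loop raises.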

reset_budget()[source]#

Clears any budget set by set_budget().

set_budget(limit, metric='total_tokens', scope='all')[source]#

Registers an upper-bound resource budget enforced by track_steps().

The budget is a ceiling, not a quota: if the optimizer terminates naturally before reaching it, the budget has no effect.

Common metrics (keys of BaseModel.get_usage_stats()):

"total_flops": Estimated FLOPs consumed. Requires model.set_flop_counting("manual") on any model whose FLOPs should count. Best choice for white-box, compute-equalised comparisons.
"total_tokens": Total tokens processed (prompt + generation). Best for black-box models where FLOPs aren’t observable but token usage is.

Parameters:

limit (int) – Integer upper bound on the metric.
metric (str) – The metric the budget is set by. Defaults to the token usage count.
scope (str) – Which models the metric is measured against. In optimizers that accommodate multiple models (e.g., proxy models), this may be a critical choice. "all" sums the metric across all models found on self. "target" only considers the primary target model (self.model), which is useful when only the target model’s API token usage matters.

Return type:

None

Usage:

```python
# White-box: cap compute by FLOPs (across target + any proxy LM)
optimizer = GCGOptimizer(model=model_obj, loss=PrefillCELoss(), num_steps=10_000)
optimizer.set_budget(1e17, metric="total_flops")

# Black-box: cap by target-model tokens (FLOPs aren't observable on API models)
optimizer = RandomSearchOptimizer(model=model_obj, loss=PrefillCELoss(), num_steps=10_000)
optimizer.set_budget(1_000_000, metric="total_tokens", scope="target")
```

track_steps(*args, **kwargs)[source]#

Iterator for the optimization loop that handles progress bar and budget enforcement.

This supplements the optimization loop with:

a tqdm progress bar (args/kwargs forwarded) that log() calls auto-update with the current loss and trigger string, and
early termination when the budget set via set_budget() is hit.

The budget is checked at the top of each step, so overshoot is bounded by one step’s work. Without a budget set, behaves like plain tqdm.

Note: If you are writing a custom optimizer for personal use or a quick check and do not need the progress bar or budget enforcement, you may safely ignore this.

Usage:

for _ in self.track_steps(range(self.num_steps), desc="MyOpt"):
    ...
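
The top-of-step budget check can be illustrated with a plain generator. This is a sketch under assumptions, not the tropt implementation: track_steps_sketch and get_usage are hypothetical names, the progress bar is omitted, and the comparison `>=` is a guess at how "hit" is defined:

```python
def track_steps_sketch(iterable, get_usage, limit=None):
    """Yield items until exhausted, or until get_usage() reaches limit."""
    for item in iterable:
        # The budget is checked at the top of each step, before any work is done.
        if limit is not None and get_usage() >= limit:
            break
        yield item


usage = {"total_tokens": 0}
completed = []
for step in track_steps_sketch(range(100), lambda: usage["total_tokens"], limit=250):
    usage["total_tokens"] += 100  # pretend each step costs 100 tokens
    completed.append(step)

unbudgeted = list(track_steps_sketch(range(3), lambda: 0))
```

Here three steps run and usage ends at 300: the third step starts at 200 (still under 250) and overshoots by less than one step's cost, which matches the bound described above.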

Optimizer Implementations#

class tropt.optimizer.ARCAOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=512, sample_topk=256, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, n_grad_avg=32)[source]#

Bases: BaseOptimizer

Gradient-based cyclic coordinate descent (Jones et al., 2023).

Each step advances to the next trigger position (cyclically), averages gradients over multiple random-token perturbations at that position, then evaluates all top-k candidates there.

Reference: https://arxiv.org/abs/2303.04381
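
The step described above can be sketched on a toy problem. Everything here is illustrative: the loss, the weight table w, and the function names are hypothetical, and the real optimizer works on model gradients rather than this closed-form toy:

```python
import random


def loss_fn(tokens, w, target):
    """Toy loss: squared gap between summed token weights and a target."""
    s = sum(w[p][t] for p, t in enumerate(tokens))
    return (s - target) ** 2


def grad_at(tokens, w, target, pos):
    """Gradient of the toy loss w.r.t. the one-hot token vector at `pos`."""
    s = sum(w[p][t] for p, t in enumerate(tokens))
    return [2.0 * (s - target) * w[pos][v] for v in range(len(w[pos]))]


def arca_step(tokens, w, target, step, top_k, n_grad_avg, rng):
    pos = step % len(tokens)  # advance cyclically through positions
    vocab = len(w[pos])
    avg = [0.0] * vocab
    for _ in range(n_grad_avg):
        # Perturb the current position with a random token, then take the gradient.
        perturbed = list(tokens)
        perturbed[pos] = rng.randrange(vocab)
        g = grad_at(perturbed, w, target, pos)
        avg = [a + gi / n_grad_avg for a, gi in zip(avg, g)]
    # Most negative averaged gradient = largest estimated decrease.
    candidates = sorted(range(vocab), key=lambda v: avg[v])[:top_k]
    best_tokens, best_loss = tokens, loss_fn(tokens, w, target)
    for v in candidates:  # evaluate all top-k candidates exactly
        trial = tokens[:pos] + [v] + tokens[pos + 1:]
        trial_loss = loss_fn(trial, w, target)
        if trial_loss < best_loss:
            best_tokens, best_loss = trial, trial_loss
    return best_tokens, best_loss


rng = random.Random(0)
w = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]  # 4 positions, 8 tokens
tokens, losses = [0, 0, 0, 0], []
initial_loss = loss_fn(tokens, w, 1.5)
for step in range(12):
    tokens, loss = arca_step(tokens, w, 1.5, step, top_k=3, n_grad_avg=4, rng=rng)
    losses.append(loss)
```

Because a candidate is accepted only when its exact loss improves, the recorded losses never increase in this sketch.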

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_candidates (int)
sample_topk (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
n_grad_avg (int)

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.AutoPromptOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=512, sample_topk=256, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True)[source]#

Bases: BaseOptimizer

Gradient-based discrete prompt optimization (Shin et al., 2020).

Each step picks a single random trigger position and evaluates all gradient-ranked top-k candidate tokens at that position.

Reference: https://arxiv.org/abs/2010.15980
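
A toy sketch of one such step, assuming candidates are ranked by the most negative gradient entry and a swap is kept only if the exact loss improves. The loss, the weight table w, and the function names are hypothetical illustrations:

```python
import random


def loss_fn(tokens, w, target):
    """Toy loss: squared gap between summed token weights and a target."""
    s = sum(w[p][t] for p, t in enumerate(tokens))
    return (s - target) ** 2


def grad_at(tokens, w, target, pos):
    """Gradient of the toy loss w.r.t. the one-hot token vector at `pos`."""
    s = sum(w[p][t] for p, t in enumerate(tokens))
    return [2.0 * (s - target) * w[pos][v] for v in range(len(w[pos]))]


def autoprompt_step(tokens, w, target, top_k, rng):
    pos = rng.randrange(len(tokens))  # a single random position per step
    g = grad_at(tokens, w, target, pos)
    # Gradient-ranked top-k: lower entry = larger estimated loss decrease.
    candidates = sorted(range(len(g)), key=lambda v: g[v])[:top_k]
    best_tokens, best_loss = tokens, loss_fn(tokens, w, target)
    for v in candidates:  # evaluate every candidate with an exact loss
        trial = tokens[:pos] + [v] + tokens[pos + 1:]
        trial_loss = loss_fn(trial, w, target)
        if trial_loss < best_loss:
            best_tokens, best_loss = trial, trial_loss
    return best_tokens, best_loss


rng = random.Random(0)
w = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]  # 4 positions, 8 tokens
tokens = [0, 0, 0, 0]
initial_loss = loss_fn(tokens, w, 1.5)
for _ in range(20):
    tokens, final_loss = autoprompt_step(tokens, w, 1.5, top_k=3, rng=rng)
```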

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_candidates (int)
sample_topk (int)
token_constraints (TokenConstraints)
use_retokenize (bool)

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.BeamSearchOptimizer(model, loss, tracker=None, seed=None, util_lm=None, util_lm_prefix=None, num_steps=40, beam_size=15, branching_factor=15, top_k=None, temperature=1.0, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]))[source]#

Bases: BaseOptimizer

An LM beam search-based optimizer. The general idea is to sample tokens while generating from a util LM, and steer the generation towards the desired objective(s) on the target model.

Combines the implementations of the BEAST and AdvDecoding optimizers:

BEAST optimizer: https://arxiv.org/abs/2402.15570
AdvDecoding optimizer: https://arxiv.org/abs/2410.02163

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
util_lm (Optional[LMBaseModel])
util_lm_prefix (Optional[str])
num_steps (int)
beam_size (int)
branching_factor (int)
top_k (Optional[int])
temperature (float)
token_constraints (TokenConstraints)

model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger=None, targets=None)[source]#

Optimize the trigger using the BEAST algorithm.

Parameters:

templates (TextTemplates) – List of text templates to optimize the trigger against.
targets (Optional[Targets], optional) – Target values for the loss function.
initial_trigger (str | None)

Return type:

OptimizerResult

Implementation notes:

We use the auxiliary LM (util_lm) to sample candidate tokens for the trigger. (Note that in the original BEAST it is the same as the attacked LM; other attacks use a separate utility LM.)
Then, we evaluate the candidate triggers on the targeted model (model) to compute the losses.
This loss evaluation against the target model is usually done in a black-box manner using text-level access (i.e., we query the model with the full text including the decoded candidate triggers), which enables attacking fully black-box models; however, if the util and target models share the same tokenizer, the loss can be computed at the token level using the use_model_with_token_inputs option.
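
The propose-then-score loop can be sketched with stubs. This is a simplified illustration, not the tropt implementation: propose stands in for sampling from the utility LM (here deterministic), target_loss stands in for querying the target model, and all names are hypothetical:

```python
import heapq

VOCAB = ["a", "b", "c"]
TARGET = ("c", "a", "b")


def propose(beam, branching_factor):
    """Stub for the utility LM: a real one would sample likely next tokens."""
    return VOCAB[:branching_factor]


def target_loss(beam):
    """Stub for the target-model loss: mismatches plus missing positions."""
    mismatches = sum(tok != want for tok, want in zip(beam, TARGET))
    return mismatches + (len(TARGET) - len(beam))


def beam_search(num_steps, beam_size, branching_factor):
    beams = [()]  # each beam is a tuple of trigger tokens
    for _ in range(num_steps):
        # Expand every beam with tokens proposed by the utility LM.
        expanded = [beam + (tok,) for beam in beams
                    for tok in propose(beam, branching_factor)]
        # Keep the beam_size candidates with the lowest target-model loss.
        beams = heapq.nsmallest(beam_size, expanded, key=target_loss)
    return min(beams, key=target_loss)


best = beam_search(num_steps=3, beam_size=2, branching_factor=3)
```

The utility LM only proposes tokens; which beams survive is decided entirely by the loss on the target side, which is why text-level access to the target is sufficient.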

class tropt.optimizer.GASLITEOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_grad=50, n_flip=20, n_candidates=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, use_random_gradient=False, **kwargs)[source]#

Bases: BaseOptimizer

Implements the GASLITE optimization algorithm (Algorithm 1) from the paper: “GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search” (https://arxiv.org/abs/2412.20953)

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_grad (int)
n_flip (int)
n_candidates (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
use_random_gradient (bool)

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.GASLITEPlusOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_grad=50, n_flip=20, n_candidates=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, use_random_gradient=False, buffer_size=10, decline_n_flip_from_step=None, early_stopping_patience=None, early_stopping_threshold=0.005, n_bulk_flips=5, flip_pos_method='random', time_limit=None, n_flip_scheduler=None, **kwargs)[source]#

Bases: BaseOptimizer

Extends the GASLITE optimization algorithm with a trigger buffer, bulk flips, n_flip scheduling, and early stopping.

Paper: “GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search” (https://arxiv.org/abs/2412.20953)

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_grad (int)
n_flip (int)
n_candidates (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
use_random_gradient (bool)
buffer_size (int)
decline_n_flip_from_step (Optional[int | float])
early_stopping_patience (Optional[int])
early_stopping_threshold (float)
n_bulk_flips (int)
flip_pos_method (str)
time_limit (Optional[float])
n_flip_scheduler (Optional[NFlipScheduler])

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.GBDAOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_grad_samples=10, n_final_gumbel_samples=100, initial_coeff=15.0, init_mode='from_trigger', init_noise_scale=2.0, temp_schedule='linear', temp_start=1.0, temp_end=0.1, gd_optimizer=<class 'torch.optim.adam.Adam'>, use_lr_schedule=True, learning_rate=0.3, grad_clip_norm=None)[source]#

Bases: BaseOptimizer

Gradient-Based Distributional Attack (GBDA).

Paper: https://arxiv.org/abs/2104.13733
Reference implementation: facebookresearch/text-adversarial-attack

Optimizes a continuous logit matrix theta (modeling a distribution over tokens at each of the L trigger positions, where L is the trigger sequence length) from which triggers can be sampled (with Gumbel-softmax). Throughout the optimization, theta is used to form a weighted sum of input embeddings, on which the loss and gradients are computed and then used to update theta. After each optimization step, and in particular at the end, theta can be used to sample discrete triggers.
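
The sampling and weighted-sum steps can be sketched in plain Python. This is a toy illustration with hypothetical names and a made-up embedding table; the gradient update of theta is omitted, and taking the argmax of the relaxed sample is one assumed way to read off a discrete trigger:

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed sample from the categorical distribution defined by `logits`."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)      # avoid log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((x + gumbel) / temperature)
    m = max(noisy)                        # subtract max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def soft_embedding(probs, embeddings):
    """Probability-weighted sum of token embeddings for one position."""
    dim = len(embeddings[0])
    return [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]


rng = random.Random(0)
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy table: 3 tokens, dim 2
theta = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]         # logits: 2 positions x 3 tokens

probs = [gumbel_softmax(row, temperature=0.5, rng=rng) for row in theta]
soft_inputs = [soft_embedding(p, embeddings) for p in probs]
trigger_ids = [max(range(len(p)), key=p.__getitem__) for p in probs]  # discrete sample
```

Lower temperatures push each row of probs toward a one-hot vector, so the soft inputs approach the embeddings of actual tokens.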

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_grad_samples (int)
n_final_gumbel_samples (int)
initial_coeff (float)
init_mode (Literal['from_trigger', 'random'])
init_noise_scale (float)
temp_schedule (Literal['linear', 'gradual'])
temp_start (float)
temp_end (float)
gd_optimizer (Callable[..., torch.optim.Optimizer])
use_lr_schedule (bool)
learning_rate (float)
grad_clip_norm (Optional[float])

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.GCGOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=512, sample_topk=256, sample_n_replace=1, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True)[source]#

Bases: BaseOptimizer

Greedy Coordinate Gradient (GCG) optimizer (Zou et al., 2023).

Reference: https://arxiv.org/abs/2307.15043

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_candidates (int)
sample_topk (int)
sample_n_replace (int)
token_constraints (TokenConstraints)
use_retokenize (bool)

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.GCGPlusOptimizer(model, loss, tracker=None, seed=None, proxy_model=None, proxy_loss=None, candidate_selection='gradient', num_steps=500, n_candidates=512, sample_topk=256, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, sample_n_replace=(1, 1), candidate_oversample_factor=1.1, momentum=0.0, skip_visited=False, buffer_size=None, n_grad_avg=1, template_batch_size=None)[source]#

Bases: BaseOptimizer

Flexible GCG implementation, supporting tricks from GCG, QCG, GASLITE, and UAT.

Two-stage design:

Candidate selection (on proxy model) — gradient-based, random, or focused.
Candidate evaluation (on target model) — via text or token access.

References

GCG: https://arxiv.org/abs/2307.15043
QCG: https://arxiv.org/abs/2402.12329
PAL: https://arxiv.org/abs/2402.09674
GASLITE: https://arxiv.org/abs/2412.20953
UAT: https://arxiv.org/abs/1908.07125

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
proxy_model (Optional[BaseModel])
proxy_loss (Optional[BaseLoss])
candidate_selection (Literal['gradient', 'random', 'focused'])
num_steps (int)
n_candidates (int)
sample_topk (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
sample_n_replace (Union[int, Tuple[int, int]])
candidate_oversample_factor (float)
momentum (float)
skip_visited (bool)
buffer_size (Optional[int])
n_grad_avg (int)
template_batch_size (Optional[int])

model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.HotFlipOptimizer(model, loss, tracker=None, seed=None, num_steps=500, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True)[source]#

Bases: BaseOptimizer

HotFlip: White-Box Adversarial Examples for Text Classification. https://arxiv.org/abs/1712.06751

Uses first-order Taylor approximation of the loss to greedily select token substitutions. Each flip is chosen as the (position, token) pair that maximally decreases the estimated loss, without requiring a forward pass for candidate evaluation. We implement the greedy variant introduced in the paper.
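
The first-order selection rule can be sketched on a toy loss. The loss, the weight table w, and the function names are hypothetical; the point is only that the flip is chosen from gradient-based estimates, with no loss evaluation per candidate:

```python
def loss_fn(tokens, w, target):
    """Toy loss: squared gap between summed token weights and a target."""
    s = sum(w[p][t] for p, t in enumerate(tokens))
    return (s - target) ** 2


def grad_at(tokens, w, target, pos):
    """Gradient of the toy loss w.r.t. the one-hot token vector at `pos`."""
    s = sum(w[p][t] for p, t in enumerate(tokens))
    return [2.0 * (s - target) * w[pos][v] for v in range(len(w[pos]))]


def hotflip_step(tokens, w, target):
    """Pick the (position, token) flip with the lowest estimated loss change."""
    best = None
    for pos, cur in enumerate(tokens):
        g = grad_at(tokens, w, target, pos)
        for v in range(len(g)):
            if v == cur:
                continue
            # First-order Taylor estimate of the loss change for cur -> v.
            est = g[v] - g[cur]
            if best is None or est < best[0]:
                best = (est, pos, v)
    est, pos, v = best
    flipped = list(tokens)
    flipped[pos] = v
    return flipped, est


w = [[0.0, 1.0, 2.0], [0.0, 0.5, 3.0]]  # 2 positions, 3 tokens
start = [0, 0]
flipped, estimate = hotflip_step(start, w, target=3.0)
true_change = loss_fn(flipped, w, 3.0) - loss_fn(start, w, 3.0)
```

In this example the estimate (-18) overstates the true change (-9): the linearization ignores the curvature of the loss, which is the price of skipping the forward pass.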

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
token_constraints (TokenConstraints)
use_retokenize (bool)

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.OptimizerResult(best_loss, best_trigger_ids=None, best_trigger_str=None, best_trigger_emb=None, best_trigger_probs=None, losses=None, trigger_strs=None)[source]#

Bases: object

Parameters:

best_loss (float)
best_trigger_ids (Float[Tensor, '1 trigger_seq_len'] | None)
best_trigger_str (str | None)
best_trigger_emb (Float[Tensor, 'trigger_seq_len embed_dim'] | None)
best_trigger_probs (Float[Tensor, 'trigger_seq_len vocab_size'] | None)
losses (List[float] | None)
trigger_strs (List[str] | None)

best_loss: float#

best_trigger_emb: Optional[Float[Tensor, 'trigger_seq_len embed_dim']] = None#

best_trigger_ids: Optional[Float[Tensor, '1 trigger_seq_len']] = None#

best_trigger_probs: Optional[Float[Tensor, 'trigger_seq_len vocab_size']] = None#

best_trigger_str: Optional[str] = None#

losses: Optional[List[float]] = None#

to_dict()[source]#

Lightweight summary dict for final logging (no tensors or lists).

Return type:

dict

trigger_strs: Optional[List[str]] = None#
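The following is a minimal sketch of what a "no tensors or lists" summary could look like. SketchResult is a hypothetical stand-in that reuses a few of the field names above with plain Python types; which keys the real to_dict() keeps is an assumption here.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional


@dataclass
class SketchResult:
    """Stand-in result container (tensor fields are typed as Any here)."""
    best_loss: float
    best_trigger_ids: Optional[Any] = None
    best_trigger_str: Optional[str] = None
    losses: Optional[List[float]] = None
    trigger_strs: Optional[List[str]] = None

    def to_dict(self) -> dict:
        # Keep only scalar and string fields; tensors, lists, and None are dropped.
        out = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, (bool, int, float, str)):
                out[f.name] = value
        return out


result = SketchResult(best_loss=0.25, best_trigger_str="abc", losses=[1.0, 0.25])
summary = result.to_dict()
```

The per-step history (losses, trigger_strs) stays available on the object itself; only the summary handed to the tracker is trimmed.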

class tropt.optimizer.PALOptimizer(model, loss, tracker=None, seed=None, proxy_model=None, proxy_loss=None, candidate_selection='gradient', num_steps=500, n_candidates=512, sample_topk=256, sample_n_replace=1, candidate_oversample_factor=1.5, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), n_candidates_after_proxy_filter=None)[source]#

Bases: BaseOptimizer

Proxy-guided black-box optimizer for PAL and RAL attacks.

Two-stage design:

Candidate selection (on proxy) — gradient-based or random.
Candidate evaluation (on target) — via text or token access.

Skip-visited is always enabled. Optional proxy filtering narrows candidates by proxy loss before querying the target.

References

Paper: https://arxiv.org/abs/2402.09674
Official Codebase: chawins/pal

Note: The original PAL implementation also supports fine-tuning the proxy model (to bring it closer to the target model in the “optimized area”); we currently don’t support that.
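The two-stage design described above can be sketched on a toy problem as follows. This is only an illustration under stated assumptions: tokens are small integers, target_loss and proxy_loss are hypothetical toy functions (the proxy only sees part of the objective), candidate selection uses the random mode, and pal_style_step is not the library's implementation.

```python
import random


def pal_style_step(trigger, target_loss, proxy_loss, vocab, visited, rng,
                   n_candidates=16, n_after_filter=4):
    """One step: propose candidates, filter on the proxy, query the target."""
    # Stage 1 (candidate selection): random single-token replacements,
    # skipping any trigger that was already proposed (skip-visited).
    candidates = []
    for _ in range(10 * n_candidates):  # bounded number of attempts
        if len(candidates) == n_candidates:
            break
        cand = list(trigger)
        cand[rng.randrange(len(cand))] = rng.choice(vocab)
        if tuple(cand) in visited:
            continue
        visited.add(tuple(cand))
        candidates.append(cand)

    # Optional proxy filter: keep the candidates the proxy likes best.
    shortlist = sorted(candidates, key=proxy_loss)[:n_after_filter]

    # Stage 2 (candidate evaluation): only the shortlist is sent to the target.
    best = min(shortlist, key=target_loss, default=trigger)
    return best if target_loss(best) < target_loss(trigger) else trigger


goal = [3, 1, 4, 1, 5]


def target_loss(t):
    # Toy "target model": number of positions that differ from the goal.
    return sum(a != b for a, b in zip(t, goal))


def proxy_loss(t):
    # Toy "proxy model": an imperfect view that scores the first three positions.
    return sum(a != b for a, b in zip(t[:3], goal[:3]))


rng = random.Random(0)
trigger = [0, 0, 0, 0, 0]
visited = {tuple(trigger)}
history = [target_loss(trigger)]
for _ in range(50):
    trigger = pal_style_step(trigger, target_loss, proxy_loss,
                             list(range(8)), visited, rng)
    history.append(target_loss(trigger))
```

The point of the filter is query economy: each step costs at most n_after_filter target queries for the shortlist, however many candidates the proxy scored.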

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
proxy_model (Optional[BaseModel])
proxy_loss (Optional[BaseLoss])
candidate_selection (Literal['gradient', 'random'])
num_steps (int)
n_candidates (int)
sample_topk (int)
sample_n_replace (int)
candidate_oversample_factor (float)
token_constraints (TokenConstraints)
n_candidates_after_proxy_filter (Optional[int])

model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
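As a rough sketch of a budget-enforcing step iterator in the spirit of track_steps() and set_budget(), consider the generator below. The signatures here (max_steps, max_seconds) are assumptions made for illustration, BudgetedLoop is hypothetical, and the progress bar is left out to keep the example dependency-free.

```python
import time


class BudgetedLoop:
    """Hypothetical holder for a step count and an optional budget."""

    def __init__(self, num_steps):
        self.num_steps = num_steps
        self.max_steps = None
        self.max_seconds = None

    def set_budget(self, max_steps=None, max_seconds=None):
        # Assumed budget kinds: a cap on steps and/or on wall-clock time.
        self.max_steps = max_steps
        self.max_seconds = max_seconds

    def track_steps(self):
        start = time.monotonic()
        for step in range(self.num_steps):
            if self.max_steps is not None and step >= self.max_steps:
                return  # step budget exhausted
            if (self.max_seconds is not None
                    and time.monotonic() - start >= self.max_seconds):
                return  # time budget exhausted
            yield step


loop = BudgetedLoop(num_steps=10)
loop.set_budget(max_steps=3)
steps_run = list(loop.track_steps())
```

Because the budget check lives in the iterator, the search loop itself stays a plain for-loop and needs no early-exit logic of its own.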

class tropt.optimizer.PEZOptimizer(model, loss, tracker=None, seed=None, num_steps=300, learning_rate=0.1, weight_decay=0.1, gd_optimizer=<class 'torch.optim.sgd.SGD'>)[source]#

Bases: BaseOptimizer

PEZ optimizer: optimizes a continuous trigger (AKA soft trigger), but projects it onto discrete tokens at every optimization step.

Paper: https://arxiv.org/abs/2302.03668
Reference implementation: YuxinWenRick/hard-prompts-made-easy
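The project-then-update idea can be sketched in plain Python on a toy embedding table. This is an illustration only: the 2-D table, the quadratic toy objective, and the helper names are assumptions, weight decay is omitted, and the class above works on real model embeddings via torch rather than on lists.

```python
def nearest_token(vec, table):
    """Index of the embedding-table row closest to vec (squared L2 distance)."""
    return min(range(len(table)),
               key=lambda i: sum((v - t) ** 2 for v, t in zip(vec, table[i])))


def pez_style_step(soft, table, grad_fn, lr):
    """Project the soft trigger, take the gradient there, update the soft trigger."""
    ids = [nearest_token(vec, table) for vec in soft]   # projection to tokens
    hard = [table[i] for i in ids]                      # projected embeddings
    grads = grad_fn(hard)                               # gradient at the projection
    new_soft = [[s - lr * g for s, g in zip(sv, gv)]    # applied to the soft copy
                for sv, gv in zip(soft, grads)]
    return new_soft, ids


# Toy setup: four "tokens" with 2-D embeddings and a two-token trigger.
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
goal = [[1.0, 1.0], [0.0, 1.0]]


def toy_grad(hard):
    # Gradient of sum ||h - g||^2 with respect to each projected embedding h.
    return [[2 * (h - g) for h, g in zip(hv, gv)] for hv, gv in zip(hard, goal)]


soft = [[0.15, 0.25], [0.85, 0.25]]
for _ in range(10):
    soft, ids = pez_style_step(soft, table, toy_grad, lr=0.1)
final_ids = [nearest_token(vec, table) for vec in soft]
```

The design choice worth noting is that the loss is always measured on a valid discrete trigger, while the continuous copy accumulates the updates between projections.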

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
learning_rate (float)
weight_decay (float)
gd_optimizer (type)

model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientEmbedAccessMixin'>)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.QCGOptimizer(model, loss, tracker=None, seed=None, proxy_model=None, num_steps=500, n_proxy_candidates=8192, n_target_candidates=32, buffer_size=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), candidate_oversample_factor=1.5)[source]#

Bases: BaseOptimizer

Greedy Coordinate Query optimizer (Hayase et al., 2024).

Buffer-based query attack: maintains a buffer of the B best triggers, expands from the best entry each step via random single-token flips, uses the proxy model to filter the flips down to the top-K candidates, then evaluates those on the target model.

Reference: https://arxiv.org/abs/2402.12329
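A toy version of the buffer-based step described above might look like the sketch below. It assumes integer tokens, hypothetical toy loss functions, and a buffer that simply keeps the lowest-loss triggers seen so far; buffer_query_step and its defaults are illustrative and are not taken from the library or the paper's code.

```python
import heapq
import random


def buffer_query_step(buffer, target_loss, proxy_loss, vocab, rng,
                      n_proxy_candidates=64, n_target_candidates=4,
                      buffer_size=8):
    """buffer is a list of (loss, trigger_tuple); returns the updated buffer."""
    _, best = min(buffer)  # expand from the best entry
    # Random single-token flips of the best trigger.
    candidates = set()
    for _ in range(n_proxy_candidates):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.choice(vocab)
        candidates.add(tuple(cand))
    candidates -= {t for _, t in buffer}  # do not re-query buffered triggers
    # Proxy filter: keep the top-K candidates by proxy loss.
    shortlist = heapq.nsmallest(n_target_candidates, sorted(candidates),
                                key=proxy_loss)
    # Target evaluation, then keep the best buffer_size entries overall.
    scored = buffer + [(target_loss(c), c) for c in shortlist]
    return heapq.nsmallest(buffer_size, scored)


goal = (3, 1, 4, 1, 5)


def target_loss(t):
    # Toy "target model": number of positions that differ from the goal.
    return sum(a != b for a, b in zip(t, goal))


def proxy_loss(t):
    # Toy "proxy model": scores only the first three positions.
    return sum(a != b for a, b in zip(t[:3], goal[:3]))


rng = random.Random(0)
start = (0, 0, 0, 0, 0)
buffer = [(target_loss(start), start)]
best_losses = [min(buffer)[0]]
for _ in range(30):
    buffer = buffer_query_step(buffer, target_loss, proxy_loss,
                               list(range(8)), rng)
    best_losses.append(min(buffer)[0])
```

Each step spends at most n_target_candidates target queries, while the proxy absorbs the much larger pool of random flips.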

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
proxy_model (Optional[BaseModel])
num_steps (int)
n_proxy_candidates (int)
n_target_candidates (int)
buffer_size (int)
token_constraints (TokenConstraints)
candidate_oversample_factor (float)

model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.RASLITEPlusOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_logit_samples=None, n_flip=20, n_candidates=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, util_model=None, use_random_logits=False, buffer_size=10, decline_n_flip_from_step=None, early_stopping_patience=None, early_stopping_threshold=0.005, n_bulk_flips=5, flip_pos_method='random')[source]#

Bases: BaseOptimizer

Implements the RASLITEPlus optimization algorithm, which runs GASLITE against a black-box model. A util-LM supplies the tokenizer and computes logits, and the search reuses strategies from GASLITEPlus (buffer, early stopping, decreasing n_flip, etc.). The key loss computations are done at the text level against the black-box target model.

Builds on the paper: “GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search” (https://arxiv.org/abs/2412.20953)

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_logit_samples (Optional[int])
n_flip (int | float)
n_candidates (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
util_model (Optional[LMBaseModel])
use_random_logits (bool)
buffer_size (int)
decline_n_flip_from_step (Optional[int | float])
early_stopping_patience (Optional[int])
early_stopping_threshold (float)
n_bulk_flips (int)
flip_pos_method (str)

model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.RandomSearchOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=128, mutation_mode='block_random', schedule='fixed', initial_block_len=4, patience=25, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), tokenizer=None)[source]#

Bases: BaseOptimizer

RandomSearch: batched zeroth-order token optimization with block mutation.

Per step:

Compute block size from coarse-to-fine schedule
For each candidate, pick a random start position and replace a contiguous block with random tokens from the allowed set
Decode candidates to strings and evaluate via compute_loss_from_texts
Keep best if it improves current loss
If no improvement for patience steps, restart from random init
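The per-step procedure above can be sketched on a toy problem as follows. Several things here are assumptions made for illustration: tokens are single characters (so "decoding" is a string join), the block length is fixed rather than scheduled, toy_loss stands in for compute_loss_from_texts, and random_search is not the library's implementation.

```python
import random


def random_search(loss_from_text, vocab, trigger_len, rng,
                  num_steps=200, n_candidates=16, block_len=2, patience=25):
    def random_trigger():
        return [rng.choice(vocab) for _ in range(trigger_len)]

    current = random_trigger()
    current_loss = loss_from_text("".join(current))
    best, best_loss = list(current), current_loss
    stale = 0
    for step in range(num_steps):
        # Steps 1-2: fixed block size; mutate one contiguous block per candidate.
        candidates = []
        for _ in range(n_candidates):
            begin = rng.randrange(trigger_len - block_len + 1)
            cand = list(current)
            cand[begin:begin + block_len] = [rng.choice(vocab)
                                             for _ in range(block_len)]
            candidates.append(cand)
        # Step 3: decode to strings and evaluate at the text level.
        losses = [loss_from_text("".join(c)) for c in candidates]
        # Step 4: keep the best candidate only if it improves the current loss.
        i = min(range(len(losses)), key=losses.__getitem__)
        if losses[i] < current_loss:
            current, current_loss, stale = candidates[i], losses[i], 0
        else:
            stale += 1
        # Step 5: restart from a random trigger after `patience` stale steps.
        if stale >= patience:
            current = random_trigger()
            current_loss = loss_from_text("".join(current))
            stale = 0
        # Track the best trigger seen across restarts.
        if current_loss < best_loss:
            best, best_loss = list(current), current_loss
    return "".join(best), best_loss


goal = "open sesame"


def toy_loss(text):
    # Toy text-level loss: number of characters that differ from the goal.
    return sum(a != b for a, b in zip(text, goal))


best_text, best_loss = random_search(
    toy_loss, list("abcdefghijklmnopqrstuvwxyz "), len(goal), random.Random(0))
```

Because the best trigger is tracked separately from the current one, a restart can discard the current trigger without losing the best result found so far.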

Implementation Notes:

Candidate evaluation is always text-based (compute_loss_from_texts); even for an HF model, we decode to strings and re-encode for model input.
A tokenizer is needed for the optimizer’s token-level mutations; it should either be provided, or we fall back to the model’s tokenizer if it has one.
The original implementation employs a “warm” initial trigger (e.g., another GCG suffix) and uses it as the starting point for all restarts. Here, we sample random triggers for all restarts for diversity.
The original implementation employs an LLM judge for early stopping; here we use a simple patience counter for restarts.
The original implementation mostly uses a loss-based scheduler. For generality (e.g., different potential loss values) we avoid using it.

Reference implementations:

The original implementation: tml-epfl/llm-adaptive-attacks
Another (more simplified) implementation: romovpa/claudini

Parameters:

model (BaseModel)
loss (BaseLoss)
seed (Optional[int])
num_steps (int)
n_candidates (int)
mutation_mode (Literal['block_random', 'single_cyclic'])
schedule (Literal['fixed', 'none'])
initial_block_len (int)
patience (int)
token_constraints (TokenConstraints)
tokenizer (Optional[BaseTokenizer])

model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().

class tropt.optimizer.SoftPromptOptimizer(model, loss, tracker=None, seed=None, num_steps=100, learning_rate=0.001, gd_optimizer=<class 'torch.optim.adam.Adam'>)[source]#

Bases: BaseOptimizer

Optimizes a soft prompt (a continuous trigger embedding) by gradient descent.

Parameters:

model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
learning_rate (float)
gd_optimizer (Callable[..., torch.optim.Optimizer])

model_requirements = (<class 'tropt.model.model_mixins.GradientEmbedAccessMixin'>,)#

Tuple of model mixin classes the primary model must satisfy; validated in __init__.

Convention: declare the least-restrictive configuration the optimizer supports.

Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.

Notes:

Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.

optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#

Optimize the trigger to minimize the loss on the given inputs.

Subclasses only implement the search loop and return an OptimizerResult.

Parameters:

templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.

Return type:

OptimizerResult

Returns:

Optimized trigger.

Note

This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.

The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().