Optimizers#
Optimizer Interface#
- class tropt.optimizer.BaseOptimizer(model, loss=None, tracker=None, seed=None)[source]#
Bases: ABC
Base class for all trigger optimizers.
Implements common functionality and interface for optimizers, including tracking.
Subclasses must implement the optimize_trigger method, which contains the core optimization loop and returns an OptimizerResult; this method is automatically wrapped to handle logging, model state resets, and tracker finalization.
- Parameters:
model (BaseModel)
loss (Optional[BaseLoss])
tracker (Optional[BaseTracker])
seed (Optional[int])
- log(loss, trigger_str=None, **extra)[source]#
Log per-step metrics to the tracker.
- Automatically enriches with:
best_loss: running best loss across steps.
loss/*: loss function component stats.
target_model_stats/*, total_models_stats/*: model usage stats (by inspecting all optimizer attributes that subclass BaseModel).
- Parameters:
loss (float) – Per-step loss value.
trigger_str (Optional[str]) – Current trigger string (omitted from log dict if None).
**extra – Any additional key-value pairs to include in the log dict.
- Return type:
None
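The enrichment described for log() can be pictured with a small stand-alone sketch. ToyLogger and its records list are hypothetical names used only for illustration; the real method also adds loss-component and model-usage stats, which are left out here.

```python
import math


class ToyLogger:
    """Hypothetical stand-in that mimics the best_loss enrichment only."""

    def __init__(self):
        self.best_loss = math.inf
        self.records = []

    def log(self, loss, trigger_str=None, **extra):
        # Keep a running minimum across all steps logged so far.
        self.best_loss = min(self.best_loss, loss)
        record = {"loss": loss, "best_loss": self.best_loss, **extra}
        # The trigger string is only included when it is provided.
        if trigger_str is not None:
            record["trigger_str"] = trigger_str
        self.records.append(record)


logger = ToyLogger()
logger.log(2.0, trigger_str="! ! !")
logger.log(3.0)
logger.log(1.0, lr=0.1)
```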
- model_requirements = ()#
Tuple of model mixin classes the primary model must satisfy; validated in __init__.
Convention: declare the least-restrictive configuration the optimizer supports.
Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.
Notes:
- Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
- Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
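A minimal sketch of how such a requirements tuple could be checked, assuming a plain isinstance test per mixin. The two mixin classes below are empty stand-ins that only reuse the documented names, and validate_model is a hypothetical helper, not the library's validation code.

```python
class LossTextAccessMixin:
    """Empty stand-in for the documented mixin (illustration only)."""


class GradientTokenAccessMixin:
    """Empty stand-in for the documented mixin (illustration only)."""


class TextOnlyModel(LossTextAccessMixin):
    """Toy black-box model: text-level loss access, no gradients."""


def validate_model(model, requirements):
    # Collect every required mixin the model does not inherit from.
    missing = [req.__name__ for req in requirements if not isinstance(model, req)]
    if missing:
        raise TypeError(f"{type(model).__name__} is missing: {', '.join(missing)}")


# Passes: the model satisfies the black-box requirement.
validate_model(TextOnlyModel(), (LossTextAccessMixin,))

# Fails: an optional gradient flow needs a mixin this model lacks.
try:
    validate_model(TextOnlyModel(), (LossTextAccessMixin, GradientTokenAccessMixin))
    gradient_ok = True
except TypeError:
    gradient_ok = False
```

The same helper could be called a second time after super().__init__() for requirements that only apply to an optional flow or to an auxiliary model.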
- abstract optimize_trigger(templates, initial_trigger=None, targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
Subclasses only implement the search loop and return an OptimizerResult.
- Parameters:
templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.
- Return type:
OptimizerResult
- Returns:
Optimized trigger.
Note
This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.
The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
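The wrapping mechanism mentioned in the note can be sketched with __init_subclass__ and functools.wraps. ToyBase, ToyOptimizer, and the events list are hypothetical; the events list stands in for the tracker calls, and the real base class does more (model state resets, other bookkeeping).

```python
import functools
from abc import ABC, abstractmethod


class ToyBase(ABC):
    """Hypothetical base class that wraps each subclass's optimize_trigger."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only wrap a method that this subclass defines itself.
        inner = cls.__dict__.get("optimize_trigger")
        if inner is None:
            return

        @functools.wraps(inner)
        def wrapped(self, *args, **kw):
            self.events.append("init")  # stands in for tracker.init(config)
            try:
                return inner(self, *args, **kw)
            finally:
                self.events.append("finish")  # stands in for tracker.finish(summary)

        cls.optimize_trigger = wrapped

    def __init__(self):
        self.events = []

    @abstractmethod
    def optimize_trigger(self, templates):
        ...


class ToyOptimizer(ToyBase):
    def optimize_trigger(self, templates):
        # The subclass writes only the search loop.
        self.events.append("search")
        return {"best_loss": 0.0}


opt = ToyOptimizer()
result = opt.optimize_trigger(["{trigger} hello"])
```

With this pattern the subclass author never calls the tracker directly; the lifecycle hooks run even if the search loop raises.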
- reset_budget()[source]#
Clears any budget set by set_budget().
- set_budget(limit, metric='total_tokens', scope='all')[source]#
Registers an upper-bound resource budget enforced by track_steps().
The budget is a ceiling, not a quota: if the optimizer terminates naturally before reaching it, the budget has no effect.
Common metrics (keys of BaseModel.get_usage_stats()):
- "total_flops": Estimated FLOPs consumed. Requires model.set_flop_counting("manual") on any model whose FLOPs should count. Best choice for white-box compute-equalised comparisons.
- "total_tokens": Total tokens processed (prompt + generation). Best for black-box models where FLOPs aren't observable but token usage is.
- Parameters:
limit (int) – Integer upper bound on the metric.
metric (str) – The metric the budget is set by. Defaults to the token usage count.
scope (str) – Which models the metric is measured against. In optimizers that accommodate multiple models (e.g., proxy models), this may be a critical choice. "all" sums the metric across all models found on self. "target" only considers the primary target model (self.model), which is useful if we only care about the target model's API token usage.
- Return type:
None
Usage:
```python
# White-box: cap compute by FLOPs (across target + any proxy LM)
optimizer = GCGOptimizer(model=model_obj, loss=PrefillCELoss(), num_steps=10_000)
optimizer.set_budget(1e17, metric="total_flops")

# Black-box: cap by target-model tokens (FLOPs aren't observable on API models)
optimizer = RandomSearchOptimizer(model=model_obj, loss=PrefillCELoss(), num_steps=10_000)
optimizer.set_budget(1_000_000, metric="total_tokens", scope="target")
```
- track_steps(*args, **kwargs)[source]#
Iterator for the optimization loop that handles the progress bar and budget enforcement.
This supplements the optimization loop with:
- a tqdm progress bar (args/kwargs forwarded) that log() calls auto-update with the current loss and trigger string, and
- early termination when the budget set via set_budget() is hit.
The budget is checked at the top of each step, so overshoot is bounded by one step's work. Without a budget set, behaves like plain tqdm.
Note: If you implement a personal-use custom optimizer for a quick check and don't care for a fancy progress bar / budget, you may safely ignore this.
Usage:
for _ in self.track_steps(range(self.num_steps), desc="MyOpt"): ...
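The "checked at the top of each step" behavior can be illustrated with a plain generator. This is a sketch under assumptions: ToyUsage, the free function toy_track_steps, and the >= comparison are all hypothetical, and the progress bar is omitted.

```python
class ToyUsage:
    """Hypothetical usage counter standing in for model usage stats."""

    def __init__(self):
        self.total_tokens = 0


def toy_track_steps(iterable, usage, limit=None, metric="total_tokens"):
    for item in iterable:
        # Budget check happens before the step runs, so a step that starts
        # under the limit is allowed to finish (bounded overshoot).
        if limit is not None and getattr(usage, metric) >= limit:
            break
        yield item


usage = ToyUsage()
steps_run = 0
for _ in toy_track_steps(range(10), usage, limit=100):
    usage.total_tokens += 30  # pretend each step costs 30 tokens
    steps_run += 1
```

Here the loop stops after four steps with 120 tokens used: the fourth step began at 90 tokens, under the limit, so the overshoot is less than one step's cost.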
Optimizer Implementations#
- class tropt.optimizer.ARCAOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=512, sample_topk=256, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, n_grad_avg=32)[source]#
Bases: BaseOptimizer
Gradient-based cyclic coordinate descent (Jones et al., 2023).
Each step advances to the next trigger position (cyclically), averages gradients over multiple random-token perturbations at that position, then evaluates all top-k candidates there.
Reference: https://arxiv.org/abs/2303.04381
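The per-step structure described above (cyclic position, gradient averaging over random perturbations, top-k selection) can be sketched in pure Python. Everything here is a toy under stated assumptions: toy_grad_fn is a made-up surrogate gradient, and the helper names are hypothetical rather than the optimizer's internals.

```python
import random


def cyclic_position(step, trigger_len):
    # Positions are visited in order and wrap around.
    return step % trigger_len


def average_gradients(grad_fn, trigger, position, vocab_size, n_grad_avg, rng):
    """Average per-token gradient scores over random perturbations at `position`."""
    total = [0.0] * vocab_size
    for _ in range(n_grad_avg):
        perturbed = list(trigger)
        perturbed[position] = rng.randrange(vocab_size)
        grad = grad_fn(perturbed, position)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_grad_avg for t in total]


def top_k_candidates(avg_grad, k):
    # Lower score = larger estimated loss decrease.
    return sorted(range(len(avg_grad)), key=lambda tok: avg_grad[tok])[:k]


def toy_grad_fn(trigger, position):
    # Made-up gradient: token 3 looks best; the perturbation adds a constant shift.
    return [abs(tok - 3) + 0.01 * trigger[position] for tok in range(8)]


rng = random.Random(0)
positions = [cyclic_position(step, 3) for step in range(5)]
avg = average_gradients(toy_grad_fn, [0, 0, 0], 1, vocab_size=8, n_grad_avg=4, rng=rng)
candidates = top_k_candidates(avg, k=3)
```

In a full step, each candidate token would then be substituted at the current position and scored with a forward pass.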
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_candidates (int)
sample_topk (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
n_grad_avg (int)
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.AutoPromptOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=512, sample_topk=256, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True)[source]#
Bases: BaseOptimizer
Gradient-based discrete prompt optimization (Shin et al., 2020).
Each step picks a single random trigger position and evaluates all gradient-ranked top-k candidate tokens at that position.
Reference: https://arxiv.org/abs/2010.15980
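One step of this scheme might look like the toy below: pick a random position, rank tokens by a gradient score, evaluate every top-k candidate, and keep the best. The loss, the surrogate gradient, and the keep-only-if-better rule are assumptions made for the sketch; all names are hypothetical.

```python
import random

TARGET = [2, 0, 3]  # toy "secret" the loss measures distance to
VOCAB_SIZE = 5


def toy_loss(trigger):
    # Number of positions that differ from the toy target.
    return sum(1 for a, b in zip(trigger, TARGET) if a != b)


def toy_grad(trigger, pos):
    # Idealized surrogate gradient: the correct token scores lowest.
    return [0.0 if tok == TARGET[pos] else 1.0 for tok in range(VOCAB_SIZE)]


def autoprompt_style_step(trigger, top_k, rng):
    pos = rng.randrange(len(trigger))  # single random position per step
    scores = toy_grad(trigger, pos)
    candidates = sorted(range(VOCAB_SIZE), key=lambda t: scores[t])[:top_k]
    best, best_loss = trigger, toy_loss(trigger)
    for tok in candidates:  # evaluate all top-k candidates
        cand = trigger[:pos] + [tok] + trigger[pos + 1:]
        loss = toy_loss(cand)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss


rng = random.Random(0)
trigger = [0, 0, 0]
losses = [toy_loss(trigger)]
for _ in range(30):
    trigger, loss = autoprompt_style_step(trigger, top_k=2, rng=rng)
    losses.append(loss)
```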
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_candidates (int)
sample_topk (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.BeamSearchOptimizer(model, loss, tracker=None, seed=None, util_lm=None, util_lm_prefix=None, num_steps=40, beam_size=15, branching_factor=15, top_k=None, temperature=1.0, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]))[source]#
Bases: BaseOptimizer
An LM beam search-based optimizer.
The general idea is to sample tokens while generating from a util LM, and steer the generation towards the desired objective(s) on the target model.
Combines the implementations of the BEAST and AdvDecoding optimizers:
- BEAST optimizer: https://arxiv.org/abs/2402.15570
- AdvDecoding optimizer: https://arxiv.org/abs/2410.02163
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
util_lm (Optional[LMBaseModel])
util_lm_prefix (Optional[str])
num_steps (int)
beam_size (int)
branching_factor (int)
top_k (Optional[int])
temperature (float)
token_constraints (TokenConstraints)
- model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#
- optimize_trigger(templates, initial_trigger=None, targets=None)[source]#
Optimize the trigger using the BEAST algorithm.
- Parameters:
templates (TextTemplates) – List of text templates to optimize the trigger against.
targets (Optional[Targets], optional) – Target values for the loss function.
initial_trigger (str | None)
- Return type:
OptimizerResult
Implementation notes:
- We use the auxiliary LM (util_lm) to sample candidate tokens for the trigger. (Note that in the original BEAST it was the same as the attacked LM; other attacks use a separate utility LM.)
- Then, we evaluate the candidate triggers on the targeted model (model) to compute the losses.
- This loss evaluation against the target model is usually done in a black-box manner using text-level access (i.e., we query the model with the full text including the decoded candidate triggers), to enable attacks on fully black-box models; however, if the util and target models share the same tokenizer, we can compute the loss at token level using the use_model_with_token_inputs option.
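The propose-then-score loop in the notes above can be sketched as a toy beam search over strings. The propose function is a deterministic stand-in for sampling from a utility LM, and toy_loss stands in for querying the target model; both are hypothetical, as is the function name.

```python
TARGET = "cab"


def propose(beam, branching_factor):
    # Stand-in for sampling next tokens from a utility LM.
    return list("abc")[:branching_factor]


def toy_loss(text):
    # Stand-in for a text-level loss query against the target model.
    mismatches = sum(1 for a, b in zip(text, TARGET) if a != b)
    return mismatches + abs(len(TARGET) - len(text))


def toy_beam_search(num_steps, beam_size, branching_factor):
    beams = [""]
    for _ in range(num_steps):
        # Expand every beam with proposed tokens.
        candidates = [b + tok for b in beams for tok in propose(b, branching_factor)]
        # Score on the "target" and keep the lowest-loss beams.
        candidates.sort(key=toy_loss)
        beams = candidates[:beam_size]
    return beams[0], toy_loss(beams[0])


best_trigger, best_loss = toy_beam_search(num_steps=3, beam_size=2, branching_factor=3)
```

Because scoring only needs the candidate text, the same loop works when the target is reachable solely through text-level access.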
- class tropt.optimizer.GASLITEOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_grad=50, n_flip=20, n_candidates=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, use_random_gradient=False, **kwargs)[source]#
Bases: BaseOptimizer
Implements the GASLITE optimization algorithm (Algorithm 1) from the paper: "GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search" (https://arxiv.org/abs/2412.20953)
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_grad (int)
n_flip (int)
n_candidates (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
use_random_gradient (bool)
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.GASLITEPlusOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_grad=50, n_flip=20, n_candidates=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, use_random_gradient=False, buffer_size=10, decline_n_flip_from_step=None, early_stopping_patience=None, early_stopping_threshold=0.005, n_bulk_flips=5, flip_pos_method='random', time_limit=None, n_flip_scheduler=None, **kwargs)[source]#
Bases: BaseOptimizer
Implements the GASLITE optimization algorithm (Algorithm 1) from the paper: "GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search" (https://arxiv.org/abs/2412.20953)
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_grad (int)
n_flip (int)
n_candidates (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
use_random_gradient (bool)
buffer_size (int)
decline_n_flip_from_step (Optional[int | float])
early_stopping_patience (Optional[int])
early_stopping_threshold (float)
n_bulk_flips (int)
flip_pos_method (str)
time_limit (Optional[float])
n_flip_scheduler (Optional[NFlipScheduler])
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.GBDAOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_grad_samples=10, n_final_gumbel_samples=100, initial_coeff=15.0, init_mode='from_trigger', init_noise_scale=2.0, temp_schedule='linear', temp_start=1.0, temp_end=0.1, gd_optimizer=<class 'torch.optim.adam.Adam'>, use_lr_schedule=True, learning_rate=0.3, grad_clip_norm=None)[source]#
Bases: BaseOptimizer
Gradient-Based Distributional Attack (GBDA).
Paper: https://arxiv.org/abs/2104.13733
Reference implementation: facebookresearch/text-adversarial-attack
Optimizes a continuous logit matrix theta (a distribution over the vocabulary for each of the L trigger positions, where L is the trigger sequence length) that can be used to sample triggers (with Gumbel-softmax). Throughout the optimization, theta is used to form a weighted sum of input embeddings, on which the loss and gradients are computed and then used to update theta. After each optimization step, and in particular at the end, theta can be used to sample discrete triggers.
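The sampling side of this procedure can be sketched without any tensor library. The snippet shows a Gumbel-softmax draw for one row of theta, the resulting weighted sum of embeddings, and a linear temperature interpolation. The exact form of the "linear" schedule and all helper names are assumptions made for illustration; the gradient update of theta is not shown.

```python
import math
import random


def linear_temperature(step, num_steps, temp_start=1.0, temp_end=0.1):
    # Assumed schedule: straight-line interpolation from start to end.
    frac = step / max(num_steps - 1, 1)
    return temp_start + (temp_end - temp_start) * frac


def gumbel_softmax(logits, temperature, rng):
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    norm = sum(exps)
    return [e / norm for e in exps]


def soft_embedding(probs, embeddings):
    # Probability-weighted sum of the vocabulary embeddings.
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


rng = random.Random(0)
theta_row = [2.0, 0.0, -1.0]  # logits for one trigger position
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vocabulary of 3 tokens
probs = gumbel_softmax(theta_row, linear_temperature(0, 100), rng)
mixed = soft_embedding(probs, embeddings)
```

Lower temperatures push each sample closer to a one-hot vector, which is why a decreasing schedule moves the soft inputs toward discrete triggers.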
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_grad_samples (int)
n_final_gumbel_samples (int)
initial_coeff (float)
init_mode (Literal['from_trigger', 'random'])
init_noise_scale (float)
temp_schedule (Literal['linear', 'gradual'])
temp_start (float)
temp_end (float)
gd_optimizer (Callable[..., torch.optim.Optimizer])
use_lr_schedule (bool)
learning_rate (float)
grad_clip_norm (Optional[float])
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.GCGOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=512, sample_topk=256, sample_n_replace=1, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True)[source]#
Bases: BaseOptimizer
Reference: https://arxiv.org/abs/2307.15043
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_candidates (int)
sample_topk (int)
sample_n_replace (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.GCGPlusOptimizer(model, loss, tracker=None, seed=None, proxy_model=None, proxy_loss=None, candidate_selection='gradient', num_steps=500, n_candidates=512, sample_topk=256, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, sample_n_replace=(1, 1), candidate_oversample_factor=1.1, momentum=0.0, buffer_size=None, n_grad_avg=1, template_batch_size=None)[source]#
Bases: BaseOptimizer
Flexible GCG implementation, supporting tricks from GCG, QCG, GASLITE, and UAT.
- Two-stage design:
Candidate selection (on proxy model) — gradient-based, random, or focused.
Candidate evaluation (on target model) — via text or token access.
References
GASLITE: https://arxiv.org/abs/2412.20953
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
proxy_model (Optional[BaseModel])
proxy_loss (Optional[BaseLoss])
candidate_selection (Literal['gradient', 'random', 'focused'])
num_steps (int)
n_candidates (int)
sample_topk (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
sample_n_replace (Union[int, Tuple[int, int]])
candidate_oversample_factor (float)
momentum (float)
buffer_size (Optional[int])
n_grad_avg (int)
template_batch_size (Optional[int])
- model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.HotFlipOptimizer(model, loss, tracker=None, seed=None, num_steps=500, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True)[source]#
Bases: BaseOptimizer
HotFlip: White-Box Adversarial Examples for Text Classification. https://arxiv.org/abs/1712.06751
Uses a first-order Taylor approximation of the loss to greedily select token substitutions. Each flip is chosen as the (position, token) pair that maximally decreases the estimated loss, without requiring a forward pass for candidate evaluation. We implement the greedy variant introduced in the paper.
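The first-order estimate can be written out directly: swapping the token at a position from its current embedding to a new one changes the loss by roughly the dot product of the embedding difference with the loss gradient at that position. The sketch below uses tiny hand-made embeddings and gradients; the function name and data are hypothetical.

```python
def hotflip_best_swap(embeddings, trigger_ids, grads):
    """Return (estimated_loss_change, position, token) for the best single flip.

    grads[pos] is the loss gradient w.r.t. the input embedding at `pos`.
    """
    best = None
    for pos, cur in enumerate(trigger_ids):
        for tok, emb in enumerate(embeddings):
            if tok == cur:
                continue
            # First-order Taylor estimate: (e_new - e_cur) . grad
            delta = sum(
                (e - c) * g for e, c, g in zip(emb, embeddings[cur], grads[pos])
            )
            if best is None or delta < best[0]:
                best = (delta, pos, tok)
    return best


embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy vocabulary of 3 tokens
trigger_ids = [0, 1]
grads = [[1.0, 0.0], [0.0, 0.5]]  # made-up gradients, one per position
best_flip = hotflip_best_swap(embeddings, trigger_ids, grads)
```

No candidate is ever run through the model here; the ranking comes entirely from one gradient computation, which is what makes each step cheap.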
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientTokenAccessMixin'>)#
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
- class tropt.optimizer.OptimizerResult(best_loss, best_trigger_ids=None, best_trigger_str=None, best_trigger_emb=None, best_trigger_probs=None, losses=None, trigger_strs=None)[source]#
Bases: object
- Parameters:
best_loss (float)
best_trigger_ids (Float[Tensor, '1 trigger_seq_len'] | None)
best_trigger_str (str | None)
best_trigger_emb (Float[Tensor, 'trigger_seq_len embed_dim'] | None)
best_trigger_probs (Float[Tensor, 'trigger_seq_len vocab_size'] | None)
losses (List[float] | None)
trigger_strs (List[str] | None)
- best_loss: float#
- best_trigger_emb: Optional[Float[Tensor, 'trigger_seq_len embed_dim']] = None#
- best_trigger_ids: Optional[Float[Tensor, '1 trigger_seq_len']] = None#
- best_trigger_probs: Optional[Float[Tensor, 'trigger_seq_len vocab_size']] = None#
- best_trigger_str: Optional[str] = None#
- losses: Optional[List[float]] = None#
- to_dict()[source]#
Lightweight summary dict for final logging (no tensors or lists).
- Return type:
dict
- trigger_strs: Optional[List[str]] = None#
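A result container with a lightweight to_dict() could be sketched as a dataclass that keeps only scalar and string fields. ToyResult is a hypothetical, reduced stand-in; which fields the real to_dict() keeps is an assumption here, based only on the "no tensors or lists" remark.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional


@dataclass
class ToyResult:
    """Hypothetical reduced version of a result record."""

    best_loss: float
    best_trigger_str: Optional[str] = None
    best_trigger_ids: Optional[Any] = None  # would be a tensor in practice
    losses: Optional[List[float]] = None

    def to_dict(self):
        summary = {}
        for field in fields(self):
            value = getattr(self, field.name)
            # Keep only plain scalars and strings; skip tensors, lists, None.
            if isinstance(value, (int, float, str)):
                summary[field.name] = value
        return summary


toy_result = ToyResult(
    best_loss=0.5,
    best_trigger_str="abc",
    best_trigger_ids=[1, 2, 3],
    losses=[1.0, 0.5],
)
toy_summary = toy_result.to_dict()
```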
- class tropt.optimizer.PALOptimizer(model, loss, tracker=None, seed=None, proxy_model=None, proxy_loss=None, candidate_selection='gradient', num_steps=500, n_candidates=512, sample_topk=256, sample_n_replace=1, candidate_oversample_factor=1.5, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), n_candidates_after_proxy_filter=None)[source]#
Bases:
BaseOptimizer
Proxy-guided black-box optimizer for PAL and RAL attacks.
- Two-stage design:
Candidate selection (on proxy) — gradient-based or random.
Candidate evaluation (on target) — via text or token access.
Skip-visited is always enabled. Optional proxy filtering narrows candidates by proxy loss before querying the target.
References
Official Codebase: chawins/pal
Note: the original PAL implementation also supports fine-tuning the proxy model (to bring it closer to the target model in the “optimized area”); this is currently not supported here.
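The two-stage design can be sketched on a toy problem. This is not the PALOptimizer implementation: `pal_style_step`, the toy losses, and the choice to mark every proposed candidate as visited are assumptions made for illustration, and only the random candidate-selection mode is shown.

```python
import random


def pal_style_step(current, vocab, proxy_loss, target_loss, visited, rng,
                   n_candidates=8, n_keep=3):
    """Propose candidates, shortlist them on the proxy, score the shortlist on the target.

    Returns (loss, trigger) for the best shortlisted candidate, or None if
    no unvisited candidate could be proposed.
    """
    candidates = []
    for _ in range(20 * n_candidates):  # bounded number of proposal attempts
        if len(candidates) == n_candidates:
            break
        cand = list(current)
        cand[rng.randrange(len(cand))] = rng.choice(vocab)  # single-token replacement
        cand = tuple(cand)
        if cand in visited:  # skip-visited (assumed: anything already proposed)
            continue
        visited.add(cand)
        candidates.append(cand)
    if not candidates:
        return None
    # Proxy filter: keep the n_keep candidates with the lowest proxy loss.
    shortlist = sorted(candidates, key=proxy_loss)[:n_keep]
    # Only the shortlist costs target queries.
    return min((target_loss(c), c) for c in shortlist)


GOAL = (3, 1, 4)


def target_loss(t):  # toy black-box loss: number of mismatched positions
    return sum(a != b for a, b in zip(t, GOAL))


def proxy_loss(t):  # toy proxy: correlated with the target loss, not identical
    return sum(abs(a - b) for a, b in zip(t, GOAL))


rng = random.Random(0)
trigger = (0, 0, 0)
loss = target_loss(trigger)
visited = {trigger}
for _ in range(30):
    step = pal_style_step(trigger, list(range(5)), proxy_loss, target_loss, visited, rng)
    if step is not None and step[0] < loss:
        loss, trigger = step
```

The point of the proxy filter is that each step spends at most `n_keep` target queries, however many candidates were proposed.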
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
proxy_model (Optional[BaseModel])
proxy_loss (Optional[BaseLoss])
candidate_selection (Literal['gradient', 'random'])
num_steps (int)
n_candidates (int)
sample_topk (int)
sample_n_replace (int)
candidate_oversample_factor (float)
token_constraints (TokenConstraints)
n_candidates_after_proxy_filter (Optional[int])
- model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#
Tuple of model mixin classes the primary model must satisfy; validated in __init__.
Convention: declare the least-restrictive configuration the optimizer supports.
Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.
Notes:
- Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
- Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
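The convention above can be sketched with plain classes. All names here (`LossTextAccess`, `GradientTokenAccess`, `SketchBase`, `SketchOptimizer`, `accepts`) are hypothetical stand-ins, and the error type is an assumption. The sketch shows the base class checking `model_requirements`, and a subclass checking an optional-flow requirement after `super().__init__()`.

```python
class LossTextAccess:
    """Hypothetical stand-in for a text-level loss mixin."""


class GradientTokenAccess:
    """Hypothetical stand-in for a token-gradient mixin."""


class BlackBoxModel(LossTextAccess):
    pass


class WhiteBoxModel(LossTextAccess, GradientTokenAccess):
    pass


class SketchBase:
    model_requirements = ()

    def __init__(self, model):
        # Base-class check: every declared mixin must be satisfied.
        missing = [m.__name__ for m in self.model_requirements
                   if not isinstance(model, m)]
        if missing:
            raise TypeError(f"model is missing required mixins: {missing}")
        self.model = model


class SketchOptimizer(SketchBase):
    # Least-restrictive configuration: text-level loss access only.
    model_requirements = (LossTextAccess,)

    def __init__(self, model, candidate_selection="random"):
        super().__init__(model)
        # Optional-flow requirement, validated explicitly after super().__init__().
        if candidate_selection == "gradient" and not isinstance(model, GradientTokenAccess):
            raise TypeError("candidate_selection='gradient' needs token gradients")
        self.candidate_selection = candidate_selection


def accepts(model, mode):
    """Return True if the optimizer can be constructed with this model and mode."""
    try:
        SketchOptimizer(model, candidate_selection=mode)
        return True
    except TypeError:
        return False
```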
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
Subclasses only implement the search loop and return an OptimizerResult.
- Parameters:
templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.
- Return type:
OptimizerResult
- Returns:
Optimized trigger.
Note
This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.
The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
- class tropt.optimizer.PEZOptimizer(model, loss, tracker=None, seed=None, num_steps=300, learning_rate=0.1, weight_decay=0.1, gd_optimizer=<class 'torch.optim.sgd.SGD'>)[source]#
Bases:
BaseOptimizer
PEZ optimizer: optimizes a continuous trigger (also known as a soft trigger), but projects it to discrete tokens at every optimization step.
Paper: https://arxiv.org/abs/2302.03668
Reference implementation: YuxinWenRick/hard-prompts-made-easy
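The project-then-update idea can be shown on a toy problem in plain Python. This is a sketch, not the PEZOptimizer code: the embedding table, the quadratic loss, and the helper names `nearest_token` and `pez_step` are assumptions, a single trigger position is optimized, and weight decay is omitted.

```python
def nearest_token(vec, table):
    """Index of the embedding-table row closest to vec (squared Euclidean distance)."""
    return min(
        range(len(table)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, table[i])),
    )


def pez_step(soft, table, grad_fn, lr):
    """One PEZ-style update.

    The gradient is evaluated at the projected (discrete) embedding,
    but applied to the continuous vector.
    """
    token = nearest_token(soft, table)
    grad = grad_fn(table[token])
    return [s - lr * g for s, g in zip(soft, grad)]


# Toy vocabulary of four 2-d embeddings.
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
goal = [1.0, 1.0]


def grad_fn(emb):
    # Analytic gradient of the toy loss sum((emb - goal) ** 2).
    return [2.0 * (e - g) for e, g in zip(emb, goal)]


soft = [0.1, 0.1]  # continuous ("soft") trigger for one position
for _ in range(10):
    soft = pez_step(soft, table, grad_fn, lr=0.1)

final_token = nearest_token(soft, table)
```

In this toy run the soft vector drifts away from token 0 until its projection lands on the token whose embedding equals `goal`, at which point the gradient is zero and the update stops moving.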
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
learning_rate (float)
weight_decay (float)
gd_optimizer (type)
- model_requirements = (<class 'tropt.model.model_mixins.LossTokenAccessMixin'>, <class 'tropt.model.model_mixins.GradientEmbedAccessMixin'>)#
Tuple of model mixin classes the primary model must satisfy; validated in __init__.
Convention: declare the least-restrictive configuration the optimizer supports.
Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.
Notes:
- Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
- Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
Subclasses only implement the search loop and return an OptimizerResult.
- Parameters:
templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.
- Return type:
OptimizerResult
- Returns:
Optimized trigger.
Note
This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.
The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
- class tropt.optimizer.QCGOptimizer(model, loss, tracker=None, seed=None, proxy_model=None, num_steps=500, n_proxy_candidates=8192, n_target_candidates=32, buffer_size=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), candidate_oversample_factor=1.5)[source]#
Bases:
BaseOptimizer
Greedy Coordinate Query optimizer (Hayase et al., 2024).
Buffer-based query attack: maintains a buffer of the B best triggers, expands from the best entry each step via random single-token flips, uses the proxy model to filter to the top K, then evaluates on the target model.
Reference: https://arxiv.org/abs/2402.12329
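The buffer-based loop described above can be sketched on a toy problem. This is not the QCGOptimizer code: `gcq_style_search`, the toy losses, and the buffer policy (merge the new candidates, then keep the `buffer_size` lowest-loss unique entries) are assumptions for illustration.

```python
import heapq
import random


def gcq_style_search(init, vocab, proxy_loss, target_loss, rng,
                     num_steps=20, n_proxy=32, n_target=4, buffer_size=8):
    """Return (loss, trigger) of the best buffer entry after num_steps."""
    buffer = [(target_loss(init), init)]  # sorted list of (loss, trigger)
    for _ in range(num_steps):
        _, parent = buffer[0]  # expand from the current best entry
        flips = set()
        for _ in range(n_proxy):
            child = list(parent)
            child[rng.randrange(len(child))] = rng.choice(vocab)  # single-token flip
            flips.add(tuple(child))
        # The proxy narrows the many flips down to a few target queries.
        shortlist = heapq.nsmallest(n_target, flips, key=proxy_loss)
        buffer.extend((target_loss(c), c) for c in shortlist)
        # Keep the buffer_size best unique entries, lowest loss first.
        buffer = heapq.nsmallest(buffer_size, set(buffer))
    return buffer[0]


GOAL = (3, 1, 4, 1)


def target_loss(t):  # toy black-box loss: number of mismatched positions
    return sum(a != b for a, b in zip(t, GOAL))


def proxy_loss(t):  # toy proxy: correlated with the target loss, not identical
    return sum(abs(a - b) for a, b in zip(t, GOAL))


start = (0, 0, 0, 0)
best_loss, best = gcq_style_search(
    start, list(range(5)), proxy_loss, target_loss, random.Random(0)
)
```

Because the starting trigger is never evicted unless better entries fill the buffer, the best loss in the buffer cannot get worse from one step to the next.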
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
proxy_model (Optional[BaseModel])
num_steps (int)
n_proxy_candidates (int)
n_target_candidates (int)
buffer_size (int)
token_constraints (TokenConstraints)
candidate_oversample_factor (float)
- model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#
Tuple of model mixin classes the primary model must satisfy; validated in __init__.
Convention: declare the least-restrictive configuration the optimizer supports.
Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.
Notes:
- Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
- Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
Subclasses only implement the search loop and return an OptimizerResult.
- Parameters:
templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.
- Return type:
OptimizerResult
- Returns:
Optimized trigger.
Note
This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.
The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
- class tropt.optimizer.RASLITEPlusOptimizer(model, loss, tracker=None, seed=None, num_steps=100, n_logit_samples=None, n_flip=20, n_candidates=128, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), use_retokenize=True, util_model=None, use_random_logits=False, buffer_size=10, decline_n_flip_from_step=None, early_stopping_patience=None, early_stopping_threshold=0.005, n_bulk_flips=5, flip_pos_method='random', **kwargs)[source]#
Bases:
BaseOptimizer
Implements the RASLITEPlus optimization algorithm, which essentially runs GASLITE against a black-box model. Specifically, it uses a util-LM for the tokenizer and to compute logits, using strategies from GASLITEPlus (buffer, early stopping, decreasing n_flip, etc.). The key loss computations are done at the text level against the black-box target model.
Builds on the paper: “GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search” (https://arxiv.org/abs/2412.20953)
- Parameters:
model (BaseModel)
loss (BaseLoss)
tracker (Optional[BaseTracker])
seed (Optional[int])
num_steps (int)
n_logit_samples (Optional[int])
n_flip (int | float)
n_candidates (int)
token_constraints (TokenConstraints)
use_retokenize (bool)
util_model (Optional[LMBaseModel])
use_random_logits (bool)
buffer_size (int)
decline_n_flip_from_step (Optional[int | float])
early_stopping_patience (Optional[int])
early_stopping_threshold (float)
n_bulk_flips (int)
flip_pos_method (str)
- model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#
Tuple of model mixin classes the primary model must satisfy; validated in __init__.
Convention: declare the least-restrictive configuration the optimizer supports.
Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.
Notes:
- Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
- Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
Subclasses only implement the search loop and return an OptimizerResult.
- Parameters:
templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.
- Return type:
OptimizerResult
- Returns:
Optimized trigger.
Note
This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.
The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().
- class tropt.optimizer.RandomSearchOptimizer(model, loss, tracker=None, seed=None, num_steps=500, n_candidates=128, mutation_mode='block_random', schedule='fixed', initial_block_len=4, patience=25, token_constraints=TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=[]), tokenizer=None)[source]#
Bases:
BaseOptimizer
RandomSearch: batched zeroth-order token optimization with block mutation.
- Per step:
Compute block size from coarse-to-fine schedule
For each candidate, pick a random start position and replace a contiguous block with random tokens from the allowed set
Decode candidates to strings and evaluate via compute_loss_from_texts
Keep best if it improves current loss
If no improvement for patience steps, restart from random init
Implementation Notes:
- Candidate evaluation is always text-based (compute_loss_from_texts); even for an HF model, we decode to strings and re-encode for model input.
- A tokenizer is needed for the optimizer’s token-level mutations; it should either be provided, or we fall back to the model’s tokenizer if it has one.
- The original implementation employs a “warm” initial trigger (e.g., another GCG suffix) and uses it as the starting point for all restarts. Here, we sample random triggers for all restarts for diversity.
- The original implementation employs an LLM judge for early stopping; here we use a simple patience counter for restarts.
- The original implementation mostly uses a loss-based scheduler. For generality (e.g., different potential loss values) we avoid using it.
Reference implementation:
- The original implementation: tml-epfl/llm-adaptive-attacks
- Another, simpler implementation: romovpa/claudini
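The per-step procedure can be sketched as a small, self-contained loop. This is not the RandomSearchOptimizer code: `random_search` and the toy Hamming loss are hypothetical, the block length is fixed, and candidates are scored directly as token lists rather than being decoded to strings first.

```python
import random


def random_search(loss_fn, vocab, length, rng, num_steps=200, n_candidates=16,
                  block_len=2, patience=25):
    """Block-mutation random search with patience-based restarts."""

    def random_trigger():
        return [rng.choice(vocab) for _ in range(length)]

    current = random_trigger()
    current_loss = loss_fn(current)
    best, best_loss = list(current), current_loss
    stale = 0  # steps since the current trigger last improved
    for _ in range(num_steps):
        candidates = []
        for _ in range(n_candidates):
            cand = list(current)
            start = rng.randrange(length - block_len + 1)
            # Replace one contiguous block with random tokens.
            cand[start:start + block_len] = [rng.choice(vocab) for _ in range(block_len)]
            candidates.append(cand)
        losses = [loss_fn(c) for c in candidates]
        i = min(range(n_candidates), key=losses.__getitem__)
        if losses[i] < current_loss:  # keep the best candidate only if it improves
            current, current_loss, stale = candidates[i], losses[i], 0
        else:
            stale += 1
        if stale >= patience:  # restart from a fresh random trigger
            current = random_trigger()
            current_loss = loss_fn(current)
            stale = 0
        if current_loss < best_loss:  # the global best survives restarts
            best, best_loss = list(current), current_loss
    return best, best_loss


GOAL = [7, 2, 9, 4, 1, 5]


def hamming(tokens):  # toy loss: number of positions that differ from GOAL
    return sum(a != b for a, b in zip(tokens, GOAL))


best, best_loss = random_search(hamming, list(range(10)), len(GOAL), random.Random(0))
```

Tracking the global best separately from the current trigger matters here, since a restart discards the current trigger but should not discard the best result found so far.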
- Parameters:
model (BaseModel)
loss (BaseLoss)
seed (Optional[int])
num_steps (int)
n_candidates (int)
mutation_mode (Literal['block_random', 'single_cyclic'])
schedule (Literal['fixed', 'none'])
initial_block_len (int)
patience (int)
token_constraints (TokenConstraints)
tokenizer (Optional[BaseTokenizer])
- model_requirements = (<class 'tropt.model.model_mixins.LossTextAccessMixin'>,)#
Tuple of model mixin classes the primary model must satisfy; validated in __init__.
Convention: declare the least-restrictive configuration the optimizer supports.
Black-box only → (LossTextAccessMixin,), even if gradient modes exist.
Always needs token-level loss → include LossTokenAccessMixin.
Always needs gradients → include GradientTokenAccessMixin or GradientEmbedAccessMixin.
Notes:
- Requirements that only occur in an optional flow (e.g., candidate_selection="gradient" requiring GradientTokenAccessMixin) must be validated explicitly in __init__ after super().__init__(), with an assert/error.
- Requirements on auxiliary models (proxy_model, util_model, etc.) are not covered here, and should also be validated explicitly in __init__.
- optimize_trigger(templates, initial_trigger='! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !', targets=None)[source]#
Optimize the trigger to minimize the loss on the given inputs.
Subclasses only implement the search loop and return an OptimizerResult.
- Parameters:
templates (List[str]) – Can be a single string or a list of (n_templates) strings.
initial_trigger (Optional[str]) – Initial trigger to start optimization from, if used by the optimizer.
targets (Optional[Targets]) – Target outputs for the given inputs, if applicable.
- Return type:
OptimizerResult
- Returns:
Optimized trigger.
Note
This method is wrapped by the base class (via __init_subclass__) to handle the full tracker lifecycle: tracker.init(config) before, tracker.finish(summary) after, model state resets, and other bookkeeping.
The optimization loop should iterate via track_steps(), which handles the tqdm progress bar and enforces any budget upper bound configured via set_budget().