Optimizer Utilities#

Shared building blocks used across optimizer implementations: token constraints, candidate buffers, retokenization filters, schedules, and trigger initializers. Import each from its submodule, e.g. from tropt.optimizer.utils.token_constraints import TokenConstraints.

Token constraints#

class tropt.optimizer.utils.token_constraints.TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=<factory>)[source]#

Bases: object

Parameters:
  • disallow_non_ascii (bool)

  • disallow_special_tokens (bool)

  • disallow_unused_tokens (bool)

  • disallow_custom_token_ids (List[int])

disallow_custom_token_ids: List[int]#

Disallow any additional custom token ids.

disallow_non_ascii: bool = True#

Disallow non-ASCII tokens, which may be escaped by a defender when the trigger is used.

disallow_special_tokens: bool = True#

Disallow special tokens (e.g., bos, eos, unk), which may be escaped by a defender when the trigger is used.

disallow_unused_tokens: bool = True#

Disallow <unused*> tokens, which can be filtered by a defender. In many cases these are not part of the special tokens and thus require special care.

get_blacklist_ids(tokenizer, vocab_size=None)[source]#

Returns a list of token IDs that should be blacklisted based on the constraints.

Return type:

List[int]

Parameters:

vocab_size (int | None)

get_whitelist_ids(tokenizer, vocab_size, device=None, return_tensor=False)[source]#

Returns valid (non-blacklisted) token ids. Reuses the cached blacklist for efficiency.

Parameters:
  • return_tensor (bool) – If True, return a 1-D int tensor on device instead of a list.

  • device – Required when return_tensor=True.

  • vocab_size (int)

Return type:

Union[List[int], Int[Tensor, 'n_valid']]
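To make the relationship between the constraint flags, the blacklist, and the whitelist concrete, here is a minimal pure-Python sketch over a toy vocabulary. It is not the library's implementation: TOY_VOCAB, TOY_SPECIAL_IDS, sketch_blacklist_ids and sketch_whitelist_ids are hypothetical names, and the `<unused\d*>` pattern is an assumption about how unused tokens are spelled.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical toy vocabulary: token id -> token string.
TOY_VOCAB: Dict[int, str] = {
    0: "<bos>",
    1: "<eos>",
    2: "<unused0>",
    3: "hello",
    4: "wörld",
    5: "!",
    6: "cat",
}
# Hypothetical set of ids the tokenizer treats as special tokens.
TOY_SPECIAL_IDS: Set[int] = {0, 1}


def sketch_blacklist_ids(
    vocab: Dict[int, str],
    special_ids: Set[int],
    disallow_non_ascii: bool = True,
    disallow_special_tokens: bool = True,
    disallow_unused_tokens: bool = True,
    disallow_custom_token_ids: Optional[List[int]] = None,
) -> List[int]:
    """Collect the ids excluded by each enabled constraint (illustration only)."""
    blocked: Set[int] = set(disallow_custom_token_ids or [])
    for token_id, token in vocab.items():
        if disallow_non_ascii and not token.isascii():
            blocked.add(token_id)
        if disallow_special_tokens and token_id in special_ids:
            blocked.add(token_id)
        # Assumption: unused tokens look like "<unused0>", "<unused1>", ...
        if disallow_unused_tokens and re.fullmatch(r"<unused\d*>", token):
            blocked.add(token_id)
    return sorted(blocked)


def sketch_whitelist_ids(vocab: Dict[int, str], blacklist: List[int]) -> List[int]:
    """The whitelist is the complement of the blacklist within the vocabulary."""
    blocked = set(blacklist)
    return [token_id for token_id in sorted(vocab) if token_id not in blocked]


blacklist = sketch_blacklist_ids(
    TOY_VOCAB, TOY_SPECIAL_IDS, disallow_custom_token_ids=[5]
)
whitelist = sketch_whitelist_ids(TOY_VOCAB, blacklist)
print("blacklist:", blacklist)
print("whitelist:", whitelist)
```

With the real class, the equivalent calls would be get_blacklist_ids(tokenizer, vocab_size) and get_whitelist_ids(tokenizer, vocab_size) on a TokenConstraints instance.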

Token initializers#

tropt.optimizer.utils.token_initializers.get_printable_random_trigger(trigger_len, return_ids=False, blacklist_ids=None, tokenizer=None, token_constraints=None)[source]#

Generates a random initial trigger consisting of printable ASCII English letters.

  • If the tokenizer is provided, the trigger is tokenized and truncated to ensure it fits within the specified length. Otherwise, the length stands for the number of characters.

  • Tokens whose IDs appear in blacklist_ids are resampled until a clean sequence is found.

  • return_ids: If True, returns a 1-D LongTensor of token IDs (requires tokenizer), instead of the string.

  • token_constraints: If provided (and tokenizer is set), extracts blacklist_ids automatically. Overrides blacklist_ids.

Return type:

str | Float[Tensor, 'trigger_seq_len']

Parameters:
  • trigger_len (int)

  • return_ids (bool)

  • blacklist_ids (Optional[List[int]])

  • tokenizer (Optional[BaseTokenizer])

  • token_constraints (Optional['TokenConstraints'])
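The resample-until-clean behaviour described above can be sketched without a real tokenizer. The example below is an assumption-laden illustration, not the library's code: it uses a toy character-level "tokenizer" in which each character's code point is its token id, and sketch_printable_random_trigger, rng and max_tries are hypothetical names.

```python
import random
import string
from typing import List, Optional


def sketch_printable_random_trigger(
    trigger_len: int,
    blacklist_ids: Optional[List[int]] = None,
    rng: Optional[random.Random] = None,
    max_tries: int = 1000,
) -> str:
    """Sample ASCII letters, resampling until no blacklisted id appears."""
    rng = rng or random.Random()
    blocked = set(blacklist_ids or [])
    for _ in range(max_tries):
        candidate = "".join(
            rng.choice(string.ascii_letters) for _ in range(trigger_len)
        )
        # Toy tokenizer (assumption): token id of a character is ord(character).
        if not any(ord(ch) in blocked for ch in candidate):
            return candidate
    raise RuntimeError("no clean trigger found; the blacklist may be too strict")


trigger = sketch_printable_random_trigger(
    8, blacklist_ids=[ord("a"), ord("Z")], rng=random.Random(0)
)
print(trigger)
```

The max_tries guard is a design choice of this sketch: a blacklist that covers most of the alphabet would otherwise loop for a long time.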

Trigger buffer#

class tropt.optimizer.utils.buffer.TriggerBuffer(triggers=None, losses=None)[source]#

Bases: object

Maintains a buffer of the best triggers found during optimization. References: https://www.haizelabs.com/blog/making-a-sota-adversarial-attack-on-llms-38x-faster and https://arxiv.org/pdf/2402.12329

Parameters:
  • triggers (Optional[list[torch.Tensor]])

  • losses (Optional[list[float]])

add(trigger_ids, loss)[source]#

Adds a new trigger and its loss to the buffer. Increases the buffer size by one.

Parameters:
  • trigger_ids (Tensor)

  • loss (float)

add_if_better(trigger_ids, loss)[source]#

Adds the trigger to the buffer if its loss is better than the worst in the buffer. Retains the buffer size.

Parameters:
  • trigger_ids (Tensor)

  • loss (float)

get_best_trigger(top_k=1)[source]#

Return the lowest-loss trigger; if top_k > 1, sample uniformly from the top_k for exploration.

Return type:

Tensor

Parameters:

top_k (int)

get_highest_loss()[source]#
Return type:

float

get_lowest_loss()[source]#
Return type:

float

property size: int#
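The buffer semantics documented above (add grows the buffer, add_if_better replaces the worst entry, get_best_trigger optionally samples from the top_k) can be sketched as follows. This is a hypothetical re-implementation for illustration: SketchTriggerBuffer stores plain lists instead of tensors, treats "better" as a strictly lower loss, and makes add_if_better a no-op on an empty buffer, none of which is confirmed by the source.

```python
import random
from typing import List, Optional, Tuple


class SketchTriggerBuffer:
    """Keeps (loss, trigger_ids) pairs sorted from lowest to highest loss."""

    def __init__(self) -> None:
        self._items: List[Tuple[float, List[int]]] = []

    @property
    def size(self) -> int:
        return len(self._items)

    def add(self, trigger_ids: List[int], loss: float) -> None:
        # Always grows the buffer by one entry.
        self._items.append((loss, list(trigger_ids)))
        self._items.sort(key=lambda item: item[0])

    def add_if_better(self, trigger_ids: List[int], loss: float) -> None:
        # Replaces the worst entry, so the buffer size is unchanged.
        if self._items and loss < self.get_highest_loss():
            self._items[-1] = (loss, list(trigger_ids))
            self._items.sort(key=lambda item: item[0])

    def get_best_trigger(
        self, top_k: int = 1, rng: Optional[random.Random] = None
    ) -> List[int]:
        # top_k == 1 returns the lowest-loss trigger; larger values sample
        # uniformly among the top_k lowest-loss entries.
        rng = rng or random.Random()
        k = max(1, min(top_k, self.size))
        return rng.choice(self._items[:k])[1]

    def get_lowest_loss(self) -> float:
        return self._items[0][0]

    def get_highest_loss(self) -> float:
        return self._items[-1][0]


buffer = SketchTriggerBuffer()
buffer.add([1, 2, 3], loss=2.0)
buffer.add([4, 5, 6], loss=1.0)
buffer.add_if_better([7, 8, 9], loss=1.5)  # replaces the entry with loss 2.0
buffer.add_if_better([0, 0, 0], loss=9.0)  # worse than the worst: ignored
print(buffer.size, buffer.get_lowest_loss(), buffer.get_highest_loss())
```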

Running-best tracking#

class tropt.optimizer.utils.running_best.RunningBest(loss=inf, trigger_ids=None, trigger_str=None, trigger_emb=None, step=-1, losses=<factory>, trigger_strs=<factory>)[source]#

Bases: object

An auxiliary object for optimizers: accumulates per-step losses and tracks the best trigger found so far.

Stores:

  • loss: the best loss found so far

  • trigger_ids: the token IDs of the best trigger found so far

  • trigger_str: the string of the best trigger found so far

  • trigger_emb: the embedding of the best trigger found so far (for gradient-based optimizers; optional)

  • step: the step at which the best trigger was found

  • losses: a list of all losses observed at each step

  • trigger_strs: a list of all trigger strings observed at each step

Parameters:
  • loss (float)

  • trigger_ids (Int[Tensor, 'trigger_seq_len'] | None)

  • trigger_str (str | None)

  • trigger_emb (Float[Tensor, 'trigger_seq_len embed_dim'] | None)

  • step (int)

  • losses (List[float])

  • trigger_strs (List[str])

loss: float = inf#
losses: List[float]#
step: int = -1#
to_result()[source]#

Convert to an OptimizerResult.

trigger_emb: Optional[Float[Tensor, 'trigger_seq_len embed_dim']] = None#
trigger_ids: Optional[Int[Tensor, 'trigger_seq_len']] = None#
trigger_str: Optional[str] = None#
trigger_strs: List[str]#
update(loss, trigger_ids=None, trigger_str=None, trigger_emb=None)[source]#

Record a step and update the best if improved. Returns True on new best.

Parameters:
  • loss (float) – the loss observed at the current step

  • trigger_ids (Optional[Int[Tensor, 'trigger_seq_len']]) – the token IDs of the trigger at the current step

  • trigger_str (Optional[str]) – the string of the trigger at the current step

  • trigger_emb (Optional[Float[Tensor, 'trigger_seq_len embed_dim']]) – the embedding of the trigger at the current step (for gradient-based optimizers; optional)

Returns:

True if this is a new best, False otherwise.

Return type:

bool
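The update contract (record every step, replace the best only on improvement, report whether a new best was found) can be sketched with a small dataclass. SketchRunningBest is a hypothetical stand-in that omits the tensor fields and to_result(); it also assumes that step is the 0-indexed position of the best loss in losses and that an improvement means a strictly lower loss.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchRunningBest:
    loss: float = math.inf
    trigger_str: Optional[str] = None
    step: int = -1
    losses: List[float] = field(default_factory=list)
    trigger_strs: List[str] = field(default_factory=list)

    def update(self, loss: float, trigger_str: Optional[str] = None) -> bool:
        """Record one step; return True if it set a new best."""
        self.losses.append(loss)
        if trigger_str is not None:
            self.trigger_strs.append(trigger_str)
        if loss < self.loss:
            self.loss = loss
            self.trigger_str = trigger_str
            # Assumption: the step index is inferred from the history length.
            self.step = len(self.losses) - 1
            return True
        return False


best = SketchRunningBest()
history = [(3.0, "aaa"), (2.0, "aab"), (2.5, "abb")]
improved = [best.update(loss, trigger) for loss, trigger in history]
print(improved, best.loss, best.trigger_str, best.step)
```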

Schedulers#

class tropt.optimizer.utils.scheduler.ConstantScheduler(n_flip)[source]#

Bases: NFlipScheduler

A scheduler that always returns the same n_flip value.

Parameters:

n_flip (int)

get_n_flip(step)[source]#

Returns the n_flip value for the given step (0-indexed).

Return type:

int

Parameters:

step (int)

class tropt.optimizer.utils.scheduler.LinearScheduler(initial_n_flip, total_steps, decline_start)[source]#

Bases: NFlipScheduler

A scheduler that linearly decreases n_flip from an initial value to 1 over the course of optimization steps, starting from a specified step.

Parameters:
  • initial_n_flip (int)

  • total_steps (int)

  • decline_start (int | float)

get_n_flip(step)[source]#

Returns the n_flip value for the given step (0-indexed).

Return type:

int

Parameters:

step (int)

class tropt.optimizer.utils.scheduler.NFlipScheduler[source]#

Bases: ABC

abstract get_n_flip(step)[source]#

Returns the n_flip value for the given step (0-indexed).

Return type:

int

Parameters:

step (int)
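The exact interpolation used by LinearScheduler is not spelled out here, so the following is only one plausible sketch of the scheduler interface. All Sketch* names are hypothetical. The assumptions are that a float decline_start is a fraction of total_steps, an int is an absolute step, and n_flip reaches 1 on the last step (total_steps - 1).

```python
from abc import ABC, abstractmethod
from typing import Union


class SketchNFlipScheduler(ABC):
    @abstractmethod
    def get_n_flip(self, step: int) -> int:
        """Return the n_flip value for the given 0-indexed step."""


class SketchConstantScheduler(SketchNFlipScheduler):
    def __init__(self, n_flip: int) -> None:
        self.n_flip = n_flip

    def get_n_flip(self, step: int) -> int:
        return self.n_flip


class SketchLinearScheduler(SketchNFlipScheduler):
    def __init__(
        self, initial_n_flip: int, total_steps: int, decline_start: Union[int, float]
    ) -> None:
        self.initial_n_flip = initial_n_flip
        self.total_steps = total_steps
        # Assumption: a float is a fraction of total_steps, an int is a step index.
        if isinstance(decline_start, float):
            self.start_step = int(decline_start * total_steps)
        else:
            self.start_step = decline_start

    def get_n_flip(self, step: int) -> int:
        if step < self.start_step:
            return self.initial_n_flip
        # Fraction of the decline phase completed, clamped to [0, 1].
        span = max(1, self.total_steps - 1 - self.start_step)
        progress = min(1.0, (step - self.start_step) / span)
        value = self.initial_n_flip - progress * (self.initial_n_flip - 1)
        return max(1, round(value))


linear = SketchLinearScheduler(initial_n_flip=8, total_steps=11, decline_start=3)
schedule = [linear.get_n_flip(step) for step in range(11)]
fractional = SketchLinearScheduler(initial_n_flip=4, total_steps=10, decline_start=0.5)
print(schedule)
```

A large n_flip early on favours exploration, while the decline toward 1 makes late steps behave like single-token substitutions.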

Retokenization filtering#

tropt.optimizer.utils.retokenization.full_messages_retokenize_filtering(candidate_trigger_ids, tokenizer, templates, trigger_placeholder='{{OPTIMIZED_TRIGGER}}')[source]#

Filters out candidate triggers that change after retokenization in the full trigger-combined message context (as opposed to just retokenizing the trigger, handled by retokenize_filtering).

Some context:

The idea is that we want full alignment between: (i) the token ids the model “sees” during optimization, and (ii) the token ids the model “sees” at inference time (which will be an artifact of retokenization). Crucially, this restriction is much stricter than the one in retokenize_filtering, which only requires that the trigger on its own retokenizes successfully; this function enforces that condition as well. Consequently, for some tokenizers, this function may leave very few or no valid candidates, in which case the user should consider disabling it. Notably, in practice, optimizations were shown to perform well with retokenize_filtering alone.

Parameters:
  • candidate_trigger_ids (Float[Tensor, 'n_candidates trigger_seq_len']) – candidate trigger token ids, shape = (n_candidates, trigger_seq_len)

  • tokenizer (PreTrainedTokenizer) – the model’s tokenizer

  • templates (List[str]) – list of user message templates (length = n_templates), each containing the trigger_placeholder where the trigger will be inserted

  • trigger_placeholder (str) – the placeholder string in templates to be replaced by the trigger

Returns:

filtered_ids: the candidate triggers whose token ids are unchanged after retokenization in full context, shape = (new_n_candidates, trigger_seq_len)

Return type:

Tensor

tropt.optimizer.utils.retokenization.retokenize_filtering(ids, tokenizer)[source]#

Filters out sequences of token ids that change after retokenization. It is common practice in discrete token optimization to ensure alignment between the optimized token sequences and the ones that will eventually be provided to the model. It was shown to improve performance.

Parameters:
  • ids (Float[Tensor, 'bsz n_ids']) – batch of token ids to be filtered, shape = (bsz, n_ids)

  • tokenizer (PreTrainedTokenizer) – the model’s tokenizer

Returns:

filtered_ids: the token id sequences that are unchanged after retokenization, shape = (new_bsz, n_ids), where new_bsz ≤ bsz

Return type:

Tensor

tropt.optimizer.utils.retokenization.retokenize_transform(ids, tokenizer)[source]#

Retokenize a token sequence: decode then re-encode.

Handles length mismatches by truncating or padding (with original ids) to preserve the original sequence length.

Parameters:
  • ids (Float[Tensor, 'n_ids']) – Token ids to retokenize, shape (n_ids,).

  • tokenizer (PreTrainedTokenizer) – The model’s tokenizer.

Return type:

Float[Tensor, 'n_ids']

Returns:

Retokenized ids with the same length as input.
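The difference between the three retokenization helpers is easiest to see on a toy tokenizer. The sketch below is a pure-Python illustration under stated assumptions and not the library's code: TOY_VOCAB, the greedy longest-match toy_encode, and all sketch_* functions are hypothetical. The full-context check compares against encode(prefix) + trigger ids + encode(suffix), and the transform pads with the trailing original ids; both are guesses at reasonable behaviour.

```python
from typing import Dict, List

TOY_VOCAB: Dict[str, int] = {"a": 0, "b": 1, "ab": 2, "c": 3}
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}
PLACEHOLDER = "{{OPTIMIZED_TRIGGER}}"


def toy_decode(ids: List[int]) -> str:
    return "".join(ID_TO_TOKEN[i] for i in ids)


def toy_encode(text: str) -> List[int]:
    """Greedy longest-match tokenization over TOY_VOCAB."""
    ids, pos = [], 0
    max_len = max(len(token) for token in TOY_VOCAB)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


def sketch_retokenize_filtering(batch: List[List[int]]) -> List[List[int]]:
    # Keep only sequences that survive a decode -> encode round trip.
    return [ids for ids in batch if toy_encode(toy_decode(ids)) == ids]


def sketch_retokenize_transform(ids: List[int]) -> List[int]:
    # Truncate or pad (with trailing original ids) to keep the input length.
    new_ids = toy_encode(toy_decode(ids))[:len(ids)]
    return new_ids + ids[len(new_ids):]


def sketch_full_messages_filtering(
    batch: List[List[int]], templates: List[str]
) -> List[List[int]]:
    kept = []
    for ids in batch:
        text = toy_decode(ids)
        for template in templates:
            prefix, suffix = template.split(PLACEHOLDER, 1)
            expected = toy_encode(prefix) + ids + toy_encode(suffix)
            if toy_encode(prefix + text + suffix) != expected:
                break  # the surrounding context changed the trigger's tokens
        else:
            kept.append(ids)
    return kept


candidates = [[0, 1, 3], [2, 3, 3], [1, 3, 3]]
plain = sketch_retokenize_filtering(candidates)
full = sketch_full_messages_filtering(candidates, ["a" + PLACEHOLDER])
transformed = sketch_retokenize_transform([0, 1, 3])
print(plain, full, transformed)
```

In this toy setup the candidate [1, 3, 3] passes the trigger-only filter but fails the full-context one, because the template's trailing "a" merges with the trigger's leading "b" into the single token "ab". This is the kind of case that makes the full-context filter stricter.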