Optimizer Utilities#
Shared building blocks used across optimizer implementations: token
constraints, candidate buffers, retokenization filters, schedules, and trigger
initializers. Import each from its submodule, e.g. from
tropt.optimizer.utils.token_constraints import TokenConstraints.
Token constraints#
- class tropt.optimizer.utils.token_constraints.TokenConstraints(disallow_non_ascii=True, disallow_special_tokens=True, disallow_unused_tokens=True, disallow_custom_token_ids=<factory>)[source]#
Bases:
object
- Parameters:
disallow_non_ascii (bool)
disallow_special_tokens (bool)
disallow_unused_tokens (bool)
disallow_custom_token_ids (List[int])
-
disallow_custom_token_ids:
List[int]# Disallow any additional custom token ids.
-
disallow_non_ascii:
bool= True# Disallow non-ASCII tokens, which may be escaped by a defender when the trigger is used.
-
disallow_special_tokens:
bool= True# Disallow special tokens (e.g., bos, eos, unk), which may be escaped by a defender when the trigger is used.
-
disallow_unused_tokens:
bool= True# Disallow <unused*> tokens, which a defender can filter. In many cases they are not part of the special tokens and thus require separate handling.
- get_blacklist_ids(tokenizer, vocab_size=None)[source]#
Returns a list of token IDs that should be blacklisted based on the constraints.
- Return type:
List[int]
- Parameters:
vocab_size (int | None)
- get_whitelist_ids(tokenizer, vocab_size, device=None, return_tensor=False)[source]#
Returns valid (non-blacklisted) token ids. Reuses the cached blacklist for efficiency.
- Parameters:
return_tensor (bool) – If True, return a 1-D int tensor on device instead of a list.
device – Required when return_tensor=True.
vocab_size (int)
- Return type:
Union[List[int],Int[Tensor, 'n_valid']]
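The blacklist/whitelist split described above can be sketched in plain Python. This is an illustration only, not the tropt implementation: the toy vocabulary, the `TOY_SPECIAL` set, the `<unused*>` regular expression, and the helper names `blacklist_ids` and `whitelist_ids` are all assumptions made for the example, and a real tokenizer exposes its vocabulary differently.

```python
import re

# Hypothetical toy vocabulary: token id -> token string.
TOY_VOCAB = {0: "<bos>", 1: "<eos>", 2: "hello", 3: "wörld",
             4: "<unused0>", 5: "!", 6: "cat"}
TOY_SPECIAL = {"<bos>", "<eos>"}  # assumed set of special tokens


def blacklist_ids(vocab, special_tokens, disallow_non_ascii=True,
                  disallow_special_tokens=True, disallow_unused_tokens=True,
                  disallow_custom_token_ids=()):
    """Collect every token id that violates at least one enabled constraint."""
    banned = set(disallow_custom_token_ids)
    for tok_id, tok in vocab.items():
        if disallow_non_ascii and not tok.isascii():
            banned.add(tok_id)
        if disallow_special_tokens and tok in special_tokens:
            banned.add(tok_id)
        if disallow_unused_tokens and re.fullmatch(r"<unused\d*>", tok):
            banned.add(tok_id)
    return sorted(banned)


def whitelist_ids(vocab, banned):
    """Return the ids that remain once the blacklist is removed."""
    banned = set(banned)
    return [i for i in sorted(vocab) if i not in banned]


banned = blacklist_ids(TOY_VOCAB, TOY_SPECIAL, disallow_custom_token_ids=[5])
allowed = whitelist_ids(TOY_VOCAB, banned)
print(banned, allowed)
```

In this sketch the whitelist is simply the complement of the blacklist over the vocabulary, which mirrors the description of get_whitelist_ids as reusing the blacklist.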
Token initializers#
- tropt.optimizer.utils.token_initializers.get_printable_random_trigger(trigger_len, return_ids=False, blacklist_ids=None, tokenizer=None, token_constraints=None)[source]#
Generates a random initial trigger consisting of printable ASCII English letters.
- If the tokenizer is provided, the trigger is tokenized and truncated to ensure it fits within the specified length. Otherwise, the length stands for the number of characters.
- Tokens whose IDs appear in blacklist_ids are resampled until a clean sequence is found.
- return_ids: If True, returns a 1-D LongTensor of token IDs (requires tokenizer) instead of the string.
- token_constraints: If provided (and tokenizer is set), extracts blacklist_ids automatically. Overrides blacklist_ids.
- Return type:
str | Float[Tensor, 'trigger_seq_len']
- Parameters:
trigger_len (int)
return_ids (bool)
blacklist_ids (Optional[List[int]])
tokenizer (Optional[BaseTokenizer])
token_constraints (Optional['TokenConstraints'])
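The resample-until-clean behaviour described above can be sketched without a real tokenizer. Everything here is an assumption for illustration: `toy_encode` is a hypothetical character-level tokenizer, `random_letter_trigger` is not part of tropt, and this sketch redraws the whole string on a blacklist hit, which may differ from how the library resamples.

```python
import random
import string


def toy_encode(text):
    # Hypothetical character-level tokenizer: one token id per character.
    return [ord(ch) for ch in text]


def random_letter_trigger(trigger_len, blacklist_ids=None, rng=None,
                          max_tries=1000):
    """Draw random ASCII letters until no token id is blacklisted."""
    rng = rng or random.Random()
    banned = set(blacklist_ids or [])
    for _ in range(max_tries):
        text = "".join(rng.choice(string.ascii_letters)
                       for _ in range(trigger_len))
        if not banned.intersection(toy_encode(text)):
            return text
    raise RuntimeError("no clean trigger found within max_tries")


trigger = random_letter_trigger(
    8, blacklist_ids=[ord("a"), ord("Z")], rng=random.Random(0)
)
print(trigger)
```

The `max_tries` cap is a design choice of the sketch: it turns an over-restrictive blacklist into an explicit error instead of an endless loop.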
Trigger buffer#
- class tropt.optimizer.utils.buffer.TriggerBuffer(triggers=None, losses=None)[source]#
Bases:
object
Maintains a buffer of the best triggers found during optimization. See https://www.haizelabs.com/blog/making-a-sota-adversarial-attack-on-llms-38x-faster and https://arxiv.org/pdf/2402.12329
- Parameters:
triggers (Optional[list[torch.Tensor]])
losses (Optional[list[float]])
- add(trigger_ids, loss)[source]#
Adds a new trigger and its loss to the buffer. Increases the buffer size by one.
- Parameters:
trigger_ids (Tensor)
loss (float)
- add_if_better(trigger_ids, loss)[source]#
Adds the trigger to the buffer if its loss is better than the worst in the buffer. Retains the buffer size.
- Parameters:
trigger_ids (Tensor)
loss (float)
- get_best_trigger(top_k=1)[source]#
Return the lowest-loss trigger; if top_k > 1, sample uniformly from the top_k for exploration.
- Return type:
Tensor
- Parameters:
top_k (int)
- property size: int#
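The buffer semantics above (unconditional add, replace-the-worst add_if_better, and top-k sampling) can be sketched as follows. This is a minimal stand-in, not the tropt class: `ToyTriggerBuffer` is a hypothetical name, triggers are plain lists rather than tensors, and the boolean returned by `add_if_better` is an assumption of the sketch.

```python
import random


class ToyTriggerBuffer:
    """Keeps (loss, trigger_ids) pairs sorted from best to worst loss."""

    def __init__(self):
        self.entries = []

    @property
    def size(self):
        return len(self.entries)

    def add(self, trigger_ids, loss):
        # Unconditional insert: the buffer grows by one.
        self.entries.append((loss, list(trigger_ids)))
        self.entries.sort(key=lambda entry: entry[0])

    def add_if_better(self, trigger_ids, loss):
        # Replace the worst entry only if the new loss beats it,
        # so the buffer size stays the same.
        if self.entries and loss < self.entries[-1][0]:
            self.entries[-1] = (loss, list(trigger_ids))
            self.entries.sort(key=lambda entry: entry[0])
            return True
        return False

    def get_best_trigger(self, top_k=1, rng=random):
        # top_k=1 returns the best entry; larger values sample uniformly
        # among the k lowest-loss entries. Assumes a non-empty buffer.
        k = max(1, min(top_k, self.size))
        return rng.choice(self.entries[:k])[1]


buf = ToyTriggerBuffer()
buf.add([1, 2, 3], 2.5)
buf.add([4, 5, 6], 1.0)
replaced = buf.add_if_better([7, 8, 9], 1.7)  # beats the worst loss (2.5)
rejected = buf.add_if_better([0, 0, 0], 9.0)  # worse than everything kept
print(buf.size, buf.get_best_trigger())
```

Sampling from the top k instead of always taking the single best entry trades a little loss for diversity in the next round of candidates.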
Running-best tracking#
- class tropt.optimizer.utils.running_best.RunningBest(loss=inf, trigger_ids=None, trigger_str=None, trigger_emb=None, step=-1, losses=<factory>, trigger_strs=<factory>)[source]#
Bases:
object
An auxiliary object for optimizers: accumulates per-step losses and tracks the best trigger found so far.
- Stores:
loss: the best loss found so far
trigger_ids: the token IDs of the best trigger found so far
trigger_str: the string of the best trigger found so far
trigger_emb: the embedding of the best trigger found so far (for gradient-based optimizers; optional)
step: the step at which the best trigger was found
losses: a list of all losses observed at each step
trigger_strs: a list of all trigger strings observed at each step
- Parameters:
loss (float)
trigger_ids (Int[Tensor, 'trigger_seq_len'] | None)
trigger_str (str | None)
trigger_emb (Float[Tensor, 'trigger_seq_len embed_dim'] | None)
step (int)
losses (List[float])
trigger_strs (List[str])
-
loss:
float= inf#
-
losses:
List[float]#
-
step:
int= -1#
-
trigger_emb:
Optional[Float[Tensor, 'trigger_seq_len embed_dim']] = None#
-
trigger_ids:
Optional[Int[Tensor, 'trigger_seq_len']] = None#
-
trigger_str:
Optional[str] = None#
-
trigger_strs:
List[str]#
- update(loss, trigger_ids=None, trigger_str=None, trigger_emb=None)[source]#
Record a step and update the best if improved. Returns True on new best.
- Parameters:
loss (float) – the loss observed at the current step
trigger_ids (Optional[Int[Tensor, 'trigger_seq_len']]) – the token IDs of the trigger at the current step
trigger_str (Optional[str]) – the string of the trigger at the current step
trigger_emb (Optional[Float[Tensor, 'trigger_seq_len embed_dim']]) – the embedding of the trigger at the current step (for gradient-based optimizers; optional)
- Returns:
True if this is a new best, False otherwise.
- Return type:
bool
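The record-then-compare pattern behind update can be sketched with a small dataclass. This is not the tropt class: `ToyRunningBest` is a hypothetical, string-only stand-in, and two details are assumptions of the sketch, namely that the step counter is the zero-based index of the recorded loss and that only a strictly lower loss counts as a new best.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRunningBest:
    loss: float = math.inf
    trigger_str: Optional[str] = None
    step: int = -1
    losses: List[float] = field(default_factory=list)
    trigger_strs: List[str] = field(default_factory=list)

    def update(self, loss, trigger_str=None):
        # Always record the step, whether or not it improves the best.
        self.losses.append(loss)
        if trigger_str is not None:
            self.trigger_strs.append(trigger_str)
        current_step = len(self.losses) - 1
        if loss < self.loss:
            self.loss = loss
            self.trigger_str = trigger_str
            self.step = current_step
            return True
        return False


tracker = ToyRunningBest()
flags = [tracker.update(loss, s)
         for loss, s in [(3.0, "aaa"), (2.0, "bbb"), (2.5, "ccc")]]
print(flags, tracker.loss, tracker.trigger_str, tracker.step)
```

The default `loss=inf` and `step=-1` mean the first call to `update` always registers as a new best, which matches the defaults listed in the signature above.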
Schedulers#
- class tropt.optimizer.utils.scheduler.ConstantScheduler(n_flip)[source]#
Bases:
NFlipScheduler
A scheduler that always returns the same n_flip value.
- Parameters:
n_flip (int)
- class tropt.optimizer.utils.scheduler.LinearScheduler(initial_n_flip, total_steps, decline_start)[source]#
Bases:
NFlipScheduler
A scheduler that linearly decreases n_flip from an initial value to 1 over the course of the optimization, starting from a specified step.
- Parameters:
initial_n_flip (int)
total_steps (int)
decline_start (int | float)
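A linear n_flip schedule of the kind described above might look like the sketch below. The function `linear_n_flip` is hypothetical and is not the tropt scheduler interface. In particular, the page does not say how a float `decline_start` is interpreted; treating it as a fraction of `total_steps` (and an int as an absolute step index) is an assumption, as is the rounding to the nearest integer.

```python
def linear_n_flip(step, initial_n_flip, total_steps, decline_start):
    """Hold initial_n_flip until decline_start, then decay linearly to 1."""
    # Assumption: a float decline_start is a fraction of total_steps,
    # while an int is an absolute step index.
    if isinstance(decline_start, float):
        start = int(decline_start * total_steps)
    else:
        start = decline_start
    if step <= start:
        return initial_n_flip
    # Number of steps over which the decline is spread (at least one).
    span = max(1, total_steps - 1 - start)
    progress = min(1.0, (step - start) / span)
    return max(1, round(initial_n_flip - progress * (initial_n_flip - 1)))


schedule = [
    linear_n_flip(s, initial_n_flip=8, total_steps=100, decline_start=0.5)
    for s in range(100)
]
print(schedule[0], schedule[50], schedule[-1])
```

A constant schedule is the degenerate case of the same interface: it ignores the step and always returns the configured n_flip.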
Retokenization filtering#
- tropt.optimizer.utils.retokenization.full_messages_retokenize_filtering(candidate_trigger_ids, tokenizer, templates, trigger_placeholder='{{OPTIMIZED_TRIGGER}}')[source]#
Filters out candidate triggers that change after retokenization in the full trigger-combined message context (as opposed to just retokenizing the trigger, handled by retokenize_filtering).
- Some context:
The goal is full alignment between (i) the token ids the model “sees” during optimization and (ii) the token ids the model “sees” at inference time, which result from retokenization. Crucially, this restriction is much stricter than the one in retokenize_filtering, which only requires that the trigger itself retokenize successfully; this function enforces that condition as well. Consequently, for some tokenizers, this function may leave very few or no valid candidates, in which case the user should consider disabling it. Notably, optimizations have empirically been shown to perform well with retokenize_filtering alone.
- Parameters:
candidate_trigger_ids (Float[Tensor, 'n_candidates trigger_seq_len']) – Tensor, shape = (n_candidates, trigger_seq_len); candidate trigger token ids.
tokenizer (PreTrainedTokenizer) – transformers.PreTrainedTokenizer; the model’s tokenizer.
templates (List[str]) – List[str] (length = n_templates); list of user message templates, each containing the trigger_placeholder where the trigger will be inserted.
trigger_placeholder (str) – str; the placeholder string in templates to be replaced by the trigger.
- Returns:
- Tensor, shape = (new_n_candidates, trigger_seq_len)
the candidates whose token ids are unchanged after retokenization in full context
- Return type:
filtered_ids
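To see why the full-context check is stricter than retokenizing the trigger alone, consider a toy greedy tokenizer in which "a" followed by "b" merges into a single "ab" token. The sketch below is one possible way to express the check and is not taken from tropt: the vocabulary, `toy_encode`, `toy_decode`, and the comparison against separately encoded prefix and suffix are all assumptions for illustration.

```python
TOY_VOCAB = ["a", "b", "ab", "c"]  # hypothetical vocabulary; token id = index
PLACEHOLDER = "{{OPTIMIZED_TRIGGER}}"


def toy_decode(ids):
    return "".join(TOY_VOCAB[i] for i in ids)


def toy_encode(text):
    # Greedy longest-match tokenization over the toy vocabulary.
    ids, pos = [], 0
    while pos < len(text):
        match = max((t for t in TOY_VOCAB if text.startswith(t, pos)), key=len)
        ids.append(TOY_VOCAB.index(match))
        pos += len(match)
    return ids


def survives_in_context(trigger_ids, template):
    # The trigger survives if encoding the whole message yields exactly
    # prefix tokens + trigger tokens + suffix tokens.
    prefix, suffix = template.split(PLACEHOLDER)
    full = toy_encode(prefix + toy_decode(trigger_ids) + suffix)
    expected = toy_encode(prefix) + list(trigger_ids) + toy_encode(suffix)
    return full == expected


def full_context_filter(candidates, templates):
    return [ids for ids in candidates
            if all(survives_in_context(ids, t) for t in templates)]


templates = ["a" + PLACEHOLDER + "c"]
candidates = [[1, 3], [3, 1]]
kept = full_context_filter(candidates, templates)
print(kept)
```

The candidate `[1, 3]` ("bc") round-trips cleanly on its own, but inside the template its leading "b" merges with the preceding "a", so only `[3, 1]` survives the full-context check.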
- tropt.optimizer.utils.retokenization.retokenize_filtering(ids, tokenizer)[source]#
Filters out sequences of token ids that change after retokenization. Discrete token optimizations commonly apply this filter to keep the optimized token sequences aligned with the ones eventually provided to the model; it has been shown to improve performance.
- Parameters:
ids (Float[Tensor, 'bsz n_ids']) – Tensor, shape = (bsz, n_ids); batch of token ids to be filtered.
tokenizer (PreTrainedTokenizer) – transformers.PreTrainedTokenizer; the model’s tokenizer.
- Returns:
- Tensor, shape = (new_bsz, n_ids)
the sequences whose token ids are unchanged after retokenization
- Return type:
filtered_ids
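The decode-then-re-encode round trip can be illustrated with a toy tokenizer. This is a sketch under stated assumptions rather than the tropt code: the vocabulary and the helpers `toy_encode`, `toy_decode`, and `roundtrip_filter` are hypothetical, and plain lists stand in for the batched tensor.

```python
TOY_VOCAB = ["a", "b", "ab", "c"]  # hypothetical vocabulary; token id = index


def toy_decode(ids):
    return "".join(TOY_VOCAB[i] for i in ids)


def toy_encode(text):
    # Greedy longest-match tokenization over the toy vocabulary.
    ids, pos = [], 0
    while pos < len(text):
        match = max((t for t in TOY_VOCAB if text.startswith(t, pos)), key=len)
        ids.append(TOY_VOCAB.index(match))
        pos += len(match)
    return ids


def roundtrip_filter(batch):
    # Keep only sequences that decode and re-encode to the same ids.
    return [ids for ids in batch if toy_encode(toy_decode(ids)) == list(ids)]


candidates = [[0, 1, 3], [2, 3, 3], [1, 0, 3], [0, 0, 1]]
kept = roundtrip_filter(candidates)
print(kept)
```

Here `[0, 1, 3]` and `[0, 0, 1]` are dropped because their adjacent "a" and "b" tokens merge into "ab" on re-encoding, so the model would never see those id sequences from the decoded text.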
- tropt.optimizer.utils.retokenization.retokenize_transform(ids, tokenizer)[source]#
Retokenize a token sequence: decode then re-encode.
Handles length mismatches by truncating or padding (with original ids) to preserve the original sequence length.
- Parameters:
ids (Float[Tensor, 'n_ids']) – Token ids to retokenize, shape (n_ids,).
tokenizer (PreTrainedTokenizer) – The model’s tokenizer.
- Return type:
Float[Tensor, 'n_ids']
- Returns:
Retokenized ids with the same length as input.
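A length-preserving retokenization can be sketched as below. The helper `retokenize_keep_length` and the toy tokenizer are hypothetical. The page says only that mismatches are handled by truncating or padding with original ids; filling the missing trailing positions with the original ids at those same positions is an assumption of this sketch.

```python
TOY_VOCAB = ["a", "b", "ab", "c"]  # hypothetical vocabulary; token id = index


def toy_decode(ids):
    return "".join(TOY_VOCAB[i] for i in ids)


def toy_encode(text):
    # Greedy longest-match tokenization over the toy vocabulary.
    ids, pos = [], 0
    while pos < len(text):
        match = max((t for t in TOY_VOCAB if text.startswith(t, pos)), key=len)
        ids.append(TOY_VOCAB.index(match))
        pos += len(match)
    return ids


def retokenize_keep_length(ids, encode=toy_encode, decode=toy_decode):
    new_ids = list(encode(decode(ids)))
    n = len(ids)
    if len(new_ids) >= n:
        # Re-encoding produced too many tokens: truncate to the original length.
        return new_ids[:n]
    # Re-encoding produced fewer tokens: fill the tail with original ids.
    return new_ids + list(ids[len(new_ids):])


merged = retokenize_keep_length([0, 1, 3])  # "abc" re-encodes to two tokens
stable = retokenize_keep_length([2, 3, 3])  # already stable under re-encoding
print(merged, stable)
```

Unlike filtering, this transform never discards a candidate; it maps each sequence to one of the same length that is closer to what the tokenizer would actually produce.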