[BREAKING] MAINT: Standardize garak.encoding defaults and fix atomic-attack name collisions #2058
Open
varunj-msft wants to merge 1 commit into microsoft:main from varunj-msft:varunj-msft/8380-Standardizing-Scenarios-Garak-Encoding-Defaults
+269
−59
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -83,33 +83,51 @@ class EncodingStrategy(ScenarioStrategy): | |
| Strategies for encoding attacks. | ||
|
|
||
| Each enum member represents an encoding scheme that will be tested against the target model. | ||
| The ALL aggregate expands to include all encoding strategies. | ||
| The ``ALL`` aggregate expands to every encoding scheme (exhaustive run). The ``DEFAULT`` | ||
| aggregate expands to a small curated subset that spans distinct encoding families, giving a | ||
| fast, representative default run. | ||
|
|
||
| Note: EncodingStrategy does not support composition. Each encoding must be applied individually. | ||
| The strategy axis here is the encoding scheme (not an attack technique), and every encoding runs | ||
| as a single-turn ``PromptSendingAttack``, so SINGLE_TURN/MULTI_TURN aggregates are not applicable. | ||
| """ | ||
|
|
||
| # Aggregate member | ||
| # Aggregate members | ||
| ALL = ("all", {"all"}) | ||
| DEFAULT = ("default", {"default"}) | ||
|
|
||
| # Individual encoding strategies (matching the atomic attack names) | ||
| # Individual encoding strategies (each value matches the encoding name used for display grouping). | ||
| # Members tagged | ||
| # ``default`` form the curated DEFAULT aggregate: one base-N encoding (Base16), one | ||
| # substitution cipher (ROT13), and one symbolic alphabet (MorseCode). | ||
| Base64 = ("base64", set[str]()) | ||
| Base2048 = ("base2048", set[str]()) | ||
| Base16 = ("base16", set[str]()) | ||
| Base16 = ("base16", {"default"}) | ||
| Base32 = ("base32", set[str]()) | ||
| ASCII85 = ("ascii85", set[str]()) | ||
| Hex = ("hex", set[str]()) | ||
| QuotedPrintable = ("quoted_printable", set[str]()) | ||
| UUencode = ("uuencode", set[str]()) | ||
| ROT13 = ("rot13", set[str]()) | ||
| ROT13 = ("rot13", {"default"}) | ||
| Braille = ("braille", set[str]()) | ||
| Atbash = ("atbash", set[str]()) | ||
| MorseCode = ("morse_code", set[str]()) | ||
| MorseCode = ("morse_code", {"default"}) | ||
| NATO = ("nato", set[str]()) | ||
| Ecoji = ("ecoji", set[str]()) | ||
| Zalgo = ("zalgo", set[str]()) | ||
| LeetSpeak = ("leet_speak", set[str]()) | ||
| AsciiSmuggler = ("ascii_smuggler", set[str]()) | ||
|
|
||
| @classmethod | ||
| def get_aggregate_tags(cls) -> set[str]: | ||
| """ | ||
| Get the set of tags that represent aggregate categories. | ||
|
|
||
| Returns: | ||
| set[str]: The base ``"all"`` aggregate plus the scenario-specific ``"default"`` aggregate. | ||
| """ | ||
| return super().get_aggregate_tags() | {"default"} | ||
|
|
||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
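The tag-based aggregates in this hunk can be illustrated with a small, self-contained sketch. It is an illustration only: `SketchStrategy`, `label`, `tags`, and `expand` are hypothetical names, only a few encodings are listed, and the project's real `ScenarioStrategy` base class may resolve aggregates differently.

```python
from enum import Enum


class SketchStrategy(Enum):
    """Hypothetical stand-in for EncodingStrategy: each value is (label, tags)."""

    ALL = ("all", frozenset({"all"}))
    DEFAULT = ("default", frozenset({"default"}))
    BASE64 = ("base64", frozenset())
    BASE16 = ("base16", frozenset({"default"}))
    ROT13 = ("rot13", frozenset({"default"}))
    MORSE_CODE = ("morse_code", frozenset({"default"}))
    ATBASH = ("atbash", frozenset())

    @property
    def label(self) -> str:
        return self.value[0]

    @property
    def tags(self) -> frozenset:
        return self.value[1]


# Assumed to mirror get_aggregate_tags(): the base "all" plus "default".
AGGREGATE_TAGS = {"all", "default"}


def expand(selected):
    """Expand aggregate members into concrete encodings, preserving order."""
    concrete = [m for m in SketchStrategy if m.label not in AGGREGATE_TAGS]
    result = []
    for choice in selected:
        if choice is SketchStrategy.ALL:
            members = concrete  # exhaustive run
        elif choice.label in AGGREGATE_TAGS:
            # Any other aggregate selects the members carrying its tag.
            members = [m for m in concrete if choice.label in m.tags]
        else:
            members = [choice]
        for member in members:
            if member not in result:
                result.append(member)
    return result


print([m.label for m in expand([SketchStrategy.DEFAULT])])
```

In this sketch, selecting `DEFAULT` yields only the three members tagged `default`, while `ALL` yields every concrete member.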
@@ -131,7 +149,7 @@ class Encoding(Scenario): | |
| By default, this uses the same dataset as Garak: slur terms and web XSS payloads. | ||
| """ | ||
|
|
||
| VERSION: int = 1 | ||
| VERSION: int = 2 | ||
|
|
||
| @apply_defaults | ||
| def __init__( | ||
|
|
@@ -163,7 +181,7 @@ def __init__( | |
| super().__init__( | ||
| version=self.VERSION, | ||
| strategy_class=EncodingStrategy, | ||
| default_strategy=EncodingStrategy.ALL, | ||
| default_strategy=EncodingStrategy.DEFAULT, | ||
| default_dataset_config=EncodingDatasetConfiguration( | ||
| dataset_names=["garak_slur_terms_en", "garak_web_html_js"], | ||
| max_dataset_size=3, | ||
|
|
@@ -228,45 +246,57 @@ def _get_converter_attacks(self) -> list[AtomicAttack]: | |
| Returns: | ||
| list[AtomicAttack]: List of all atomic attacks to execute. | ||
| """ | ||
| # Map of all available converters with their encoding names | ||
| all_converters_with_encodings: list[tuple[list[PromptConverter], str]] = [ | ||
| ([Base64Converter()], "base64"), | ||
| ([Base64Converter(encoding_func="urlsafe_b64encode")], "base64"), | ||
| ([Base64Converter(encoding_func="standard_b64encode")], "base64"), | ||
| ([Base64Converter(encoding_func="b2a_base64")], "base64"), | ||
| ([Base2048Converter()], "base2048"), | ||
| ([Base64Converter(encoding_func="b16encode")], "base16"), | ||
| ([Base64Converter(encoding_func="b32encode")], "base32"), | ||
| ([Base64Converter(encoding_func="a85encode")], "ascii85"), | ||
| ([Base64Converter(encoding_func="b85encode")], "ascii85"), | ||
| ([BinAsciiConverter(encoding_func="hex")], "hex"), | ||
| ([BinAsciiConverter(encoding_func="quoted-printable")], "quoted_printable"), | ||
| ([BinAsciiConverter(encoding_func="UUencode")], "uuencode"), | ||
| ([ROT13Converter()], "rot13"), | ||
| ([BrailleConverter()], "braille"), | ||
| ([AtbashConverter()], "atbash"), | ||
| ([MorseConverter()], "morse_code"), | ||
| ([NatoConverter()], "nato"), | ||
| ([EcojiConverter()], "ecoji"), | ||
| ([ZalgoConverter()], "zalgo"), | ||
| ([LeetspeakConverter()], "leet_speak"), | ||
| ([AsciiSmugglerConverter()], "ascii_smuggler"), | ||
| # Map of all available converters with their encoding name and a unique variant slug. | ||
| # ``encoding_name`` drives strategy selection and user-facing grouping (display_group); | ||
| # ``variant_slug`` is unique per row so that atomic-attack names stay unique even when one | ||
| # encoding name maps to multiple converter variants (e.g. base64, ascii85). | ||
| # NOTE: some base64 variants are near-duplicates (default == standard_b64encode; b2a only | ||
|
Contributor (review comment): Should we trim the near-duplicate base64 variants now, along with the version bump?
||
| # appends a trailing newline). They are retained here to keep the exhaustive ALL run stable | ||
| # behind the VERSION gate; trimming them is a separate cleanup. | ||
| all_converters_with_encodings: list[tuple[list[PromptConverter], str, str]] = [ | ||
| ([Base64Converter()], "base64", "base64"), | ||
| ([Base64Converter(encoding_func="urlsafe_b64encode")], "base64", "base64_urlsafe"), | ||
| ([Base64Converter(encoding_func="standard_b64encode")], "base64", "base64_standard"), | ||
| ([Base64Converter(encoding_func="b2a_base64")], "base64", "base64_b2a"), | ||
| ([Base2048Converter()], "base2048", "base2048"), | ||
| ([Base64Converter(encoding_func="b16encode")], "base16", "base16"), | ||
| ([Base64Converter(encoding_func="b32encode")], "base32", "base32"), | ||
| ([Base64Converter(encoding_func="a85encode")], "ascii85", "ascii85_a85"), | ||
| ([Base64Converter(encoding_func="b85encode")], "ascii85", "ascii85_b85"), | ||
| ([BinAsciiConverter(encoding_func="hex")], "hex", "hex"), | ||
| ([BinAsciiConverter(encoding_func="quoted-printable")], "quoted_printable", "quoted_printable"), | ||
| ([BinAsciiConverter(encoding_func="UUencode")], "uuencode", "uuencode"), | ||
| ([ROT13Converter()], "rot13", "rot13"), | ||
| ([BrailleConverter()], "braille", "braille"), | ||
| ([AtbashConverter()], "atbash", "atbash"), | ||
| ([MorseConverter()], "morse_code", "morse_code"), | ||
| ([NatoConverter()], "nato", "nato"), | ||
| ([EcojiConverter()], "ecoji", "ecoji"), | ||
| ([ZalgoConverter()], "zalgo", "zalgo"), | ||
| ([LeetspeakConverter()], "leet_speak", "leet_speak"), | ||
| ([AsciiSmugglerConverter()], "ascii_smuggler", "ascii_smuggler"), | ||
| ] | ||
|
|
||
| # Filter to only include selected strategies | ||
| selected_encoding_names = {s.value for s in self._scenario_strategies} | ||
| converters_with_encodings = [ | ||
| (conv, name) for conv, name in all_converters_with_encodings if name in selected_encoding_names | ||
| (conv, name, variant_slug) | ||
| for conv, name, variant_slug in all_converters_with_encodings | ||
| if name in selected_encoding_names | ||
| ] | ||
|
|
||
| atomic_attacks = [] | ||
| for conv, name in converters_with_encodings: | ||
| atomic_attacks.extend(self._get_prompt_attacks(converters=conv, encoding_name=name)) | ||
| for conv, name, variant_slug in converters_with_encodings: | ||
| atomic_attacks.extend( | ||
| self._get_prompt_attacks(converters=conv, encoding_name=name, variant_slug=variant_slug) | ||
| ) | ||
| return atomic_attacks | ||
|
|
||
| def _get_prompt_attacks(self, *, converters: list[PromptConverter], encoding_name: str) -> list[AtomicAttack]: | ||
| def _get_prompt_attacks( | ||
| self, *, converters: list[PromptConverter], encoding_name: str, variant_slug: str | ||
| ) -> list[AtomicAttack]: | ||
| """ | ||
| Create atomic attacks for a specific encoding scheme. | ||
| Create atomic attacks for a specific encoding converter variant. | ||
|
|
||
| For each seed prompt (the text to be decoded), creates atomic attacks that: | ||
| 1. Encode the seed prompt using the specified converter(s) | ||
|
|
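To gauge how the switch from `ALL` to `DEFAULT` changes the size of a run, the filtering step above can be sketched as a simple count. The variant counts are read off the converter table in this diff. The number of decode templates is not shown in the diff, so `template_count` is an assumption, and `count_atomic_attacks` is a hypothetical helper rather than part of the project.

```python
# Converter variants per encoding name, as listed in the diff's converter table.
VARIANTS_PER_ENCODING = {
    "base64": 4,
    "base2048": 1,
    "base16": 1,
    "base32": 1,
    "ascii85": 2,
    "hex": 1,
    "quoted_printable": 1,
    "uuencode": 1,
    "rot13": 1,
    "braille": 1,
    "atbash": 1,
    "morse_code": 1,
    "nato": 1,
    "ecoji": 1,
    "zalgo": 1,
    "leet_speak": 1,
    "ascii_smuggler": 1,
}

# Members tagged "default" in the EncodingStrategy enum.
DEFAULT_ENCODINGS = {"base16", "rot13", "morse_code"}


def count_atomic_attacks(selected_names, template_count):
    """Count atomic attacks: one "raw" config plus one per decode template, per row."""
    rows = sum(n for name, n in VARIANTS_PER_ENCODING.items() if name in selected_names)
    return rows * (1 + template_count)


# template_count=2 is an assumed value, used only for illustration.
print(count_atomic_attacks(DEFAULT_ENCODINGS, template_count=2))
print(count_atomic_attacks(set(VARIANTS_PER_ENCODING), template_count=2))
```

Each atomic attack still runs over every resolved seed group, so this count reflects only the relative size of the two runs, not their wall-clock time.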
@@ -276,31 +306,42 @@ def _get_prompt_attacks(self, *, converters: list[PromptConverter], encoding_nam | |
|
|
||
| Args: | ||
| converters (list[PromptConverter]): The list of converters to apply to the seed prompts. | ||
| encoding_name (str): Human-readable name of the encoding scheme (e.g., "Base64", "ROT13"). | ||
| encoding_name (str): Human-readable name of the encoding scheme (e.g., "base64", "rot13"). | ||
| Used as the ``display_group`` so all variants of an encoding aggregate together in output. | ||
| variant_slug (str): Unique slug for this converter variant, used to build a unique | ||
| ``atomic_attack_name`` per converter variant and prompt config. | ||
|
|
||
| Returns: | ||
| list[AtomicAttack]: List of atomic attacks for this encoding scheme. | ||
| list[AtomicAttack]: List of atomic attacks for this encoding converter variant. | ||
|
|
||
| Raises: | ||
| ValueError: If scenario is not properly initialized. | ||
| """ | ||
| converter_configs = [ | ||
| AttackConverterConfig( | ||
| request_converters=PromptConverterConfiguration.from_converters(converters=converters) | ||
| # (config_name_suffix, converter_config). The bare "raw" config encodes only; each | ||
| # decode-template config additionally asks the model to decode. | ||
| converter_configs: list[tuple[str, AttackConverterConfig]] = [ | ||
| ( | ||
| "raw", | ||
| AttackConverterConfig( | ||
| request_converters=PromptConverterConfiguration.from_converters(converters=converters) | ||
| ), | ||
| ) | ||
| ] | ||
|
|
||
| for decode_type in self._encoding_templates: | ||
| for decode_index, decode_type in enumerate(self._encoding_templates): | ||
| converters_ = converters[:] + [AskToDecodeConverter(template=decode_type, encoding_name=encoding_name)] | ||
|
|
||
| converter_configs.append( | ||
| AttackConverterConfig( | ||
| request_converters=PromptConverterConfiguration.from_converters(converters=converters_) | ||
| ( | ||
| f"decode{decode_index}", | ||
| AttackConverterConfig( | ||
| request_converters=PromptConverterConfiguration.from_converters(converters=converters_) | ||
| ), | ||
| ) | ||
| ) | ||
|
|
||
| atomic_attacks = [] | ||
| for attack_converter_config in converter_configs: | ||
| for config_suffix, attack_converter_config in converter_configs: | ||
| # objective_target is guaranteed to be non-None by parent class validation | ||
| if self._objective_target is None: | ||
| raise ValueError( | ||
|
|
@@ -313,7 +354,8 @@ def _get_prompt_attacks(self, *, converters: list[PromptConverter], encoding_nam | |
| ) | ||
| atomic_attacks.append( | ||
| AtomicAttack( | ||
| atomic_attack_name=encoding_name, | ||
| atomic_attack_name=f"{variant_slug}_{config_suffix}", | ||
| display_group=encoding_name, | ||
| attack_technique=AttackTechnique(attack=attack), | ||
| seed_groups=self._resolved_seed_groups or [], | ||
| ) | ||
|
|
||
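The name-collision fix can be checked with a short sketch that compares the old naming scheme (`encoding_name` only) with the new one (`f"{variant_slug}_{config_suffix}"`). The rows below are a subset of the converter table, the helper names are hypothetical, and `template_count` is an assumed value because the diff does not show how many decode templates exist.

```python
from collections import Counter

# Subset of (encoding_name, variant_slug) rows from the converter table in the diff.
ROWS = [
    ("base64", "base64"),
    ("base64", "base64_urlsafe"),
    ("ascii85", "ascii85_a85"),
    ("ascii85", "ascii85_b85"),
    ("rot13", "rot13"),
]


def config_suffixes(template_count):
    """Suffixes used in the diff: "raw" plus one "decode<i>" per template."""
    return ["raw"] + [f"decode{i}" for i in range(template_count)]


def old_names(rows, template_count):
    """Old scheme: every config of every variant is named after the encoding."""
    return [enc for enc, _ in rows for _ in config_suffixes(template_count)]


def new_names(rows, template_count):
    """New scheme: one name per (variant_slug, config_suffix) pair."""
    return [f"{slug}_{suffix}" for _, slug in rows for suffix in config_suffixes(template_count)]


def duplicates(names):
    """Return the names that occur more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


print(duplicates(old_names(ROWS, template_count=2)))
print(duplicates(new_names(ROWS, template_count=2)))
```

Because the suffixes contain no underscore and the slugs are distinct, each new name maps back to exactly one (variant, config) pair, while `display_group` still groups results by encoding name.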
Should we make this run more attacks? What's the run time, and how does it compare to other scanners?
I think a target of 10-20 minutes is good, and this may finish too quickly.