feat(detectors): Introduce custom detectors with RegexDetector and FunctionDetector #1484
This could be really cool, thanks. We'll take a look!
leondz
left a comment
first draft - needs some validation from our side. do you have some example cases for testing?
| CLI Examples: | ||
| # Regex detector | ||
| garak --detectors custom.RegexDetector \\ | ||
| --detector_options '{"custom": {"RegexDetector": {"patterns": ["api.?key","sk-[A-Za-z0-9]{32,}"]}}}' | |
| # Function detector | ||
| garak --detectors custom.FunctionDetector \\ | ||
| --detector_options '{"custom": {"FunctionDetector": {"function_name": "mymodule#check_harmful"}}}' \\ | ||
| --probes dan.Dan_11_0 | ||
| # Or use config file | ||
| garak --detectors custom.RegexDetector \\ | ||
| --detector_option_file detector_config.json \\ | ||
| --probes dan.Dan_11_0 |
please manage through Configurable mechanism and not CLI
A CLI command based on a detector config file is viable here; however, a reference to using --config for a consolidated example may add clarity.
Oh sorry, yes, missed that these were viable existing options
Does the name "custom" add much? I can see it makes sense to group them as done here. Though looking at the probe names, we have probes.function - detectors.function seems a fine analogue to that. And then detectors.regex could work for the other. I'll think on this some more.
| DEFAULT_PARAMS = Detector.DEFAULT_PARAMS | { | ||
| "patterns": [], # users must provide patterns | ||
| "match_type": "any", # "any" or "all" | ||
| "case_sensitive": False, |
does it make sense to expose multiline / re.M here?
It might make sense to offer this as a set of strings corresponding to allowed enum values from re.RegexFlag to | together. Named something like re_flags?
This could even be initialized to the re.NOFLAG value:
| "case_sensitive": False, | |
| "re_flags": [ "NOFLAG" ], |
If accepting this idea be sure to add validation in __init__.
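A rough sketch of how the re_flags validation might look: each configured name is mapped onto re.RegexFlag, and "NOFLAG" is treated as zero so the helper also works on Python versions before 3.11, where re.NOFLAG does not exist. The helper name parse_re_flags and the warn-and-ignore handling of unknown names are assumptions for illustration, not code from this PR.

```python
import logging
import re


def parse_re_flags(names):
    """Combine flag names such as "IGNORECASE" or "MULTILINE" into one re flag value."""
    flags = re.RegexFlag(0)
    for name in names:
        key = str(name).upper()
        if key == "NOFLAG":
            # re.NOFLAG was only added in Python 3.11; it means "no flags set"
            continue
        flag = re.RegexFlag.__members__.get(key)
        if flag is None:
            # hypothetical policy: warn and ignore rather than abort
            logging.warning("ignoring unknown re flag %r", name)
            continue
        flags |= flag
    return flags
```

The result can be passed straight to re.compile(pattern, flags), so a config such as ["IGNORECASE", "MULTILINE"] would cover both the existing case_sensitive option and the re.M question above.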
| # Validate match_type | ||
| self.match_type = str(self.match_type).lower() | ||
| if self.match_type not in ("any", "all"): | ||
| raise ValueError(f"match_type must be 'any' or 'all', got '{self.match_type}'") |
please log and skip, rather than risking aborting the run
| re.compile(pattern, flags) for pattern in self.patterns | ||
| ] | ||
| except re.error as e: | ||
| raise ValueError(f"Invalid regex pattern: {e}") from e |
please log and skip, rather than risking aborting the run
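For illustration, "log and skip" at pattern-compile time could look roughly like the sketch below: each pattern is compiled on its own, so one bad entry is logged and dropped instead of raising. The function name compile_patterns is hypothetical and this is not the PR's code.

```python
import logging
import re


def compile_patterns(patterns, flags=0):
    """Compile each pattern separately; log and drop the ones that fail."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern, flags))
        except (re.error, TypeError) as e:
            # re.error: malformed regex; TypeError: a non-string entry in the config
            logging.warning("skipping invalid regex pattern %r: %s", pattern, e)
    return compiled
```

A caller would still need to decide what to do when every pattern was dropped and the returned list is empty.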
| pattern.search(output_text) for pattern in self.compiled_patterns | ||
| ) | ||
| else: | ||
| raise ValueError(f"Invalid match_type: {self.match_type}") |
Please log and skip, rather than risking aborting the run. Failing detector can safely return [None] * input_length. Would be best if we accepted detectors returning flat None on overall detector failure but I'm not sure that's possible yet.
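A minimal sketch of the "return [None] * input_length" idea, assuming the detector scores a plain list of output strings (the real detect() signature in the PR may differ) and that 1.0 marks a hit. The name score_outputs is hypothetical.

```python
import logging
import re


def score_outputs(outputs, compiled_patterns, match_type="any"):
    """Score each output 1.0 (hit) or 0.0 (miss); return all None if the detector is unusable."""
    combine = {"any": any, "all": all}.get(match_type)
    if combine is None or not compiled_patterns:
        # misconfigured detector: log and return "no result" for every output
        logging.warning(
            "regex detector unusable (match_type=%r, %d patterns); skipping",
            match_type,
            len(compiled_patterns),
        )
        return [None] * len(outputs)
    results = []
    for text in outputs:
        if text is None:
            # no output to inspect, so no score
            results.append(None)
            continue
        hit = combine(p.search(text) is not None for p in compiled_patterns)
        results.append(1.0 if hit else 0.0)
    return results
```

The empty-pattern check matters because all() over an empty sequence is True, which would otherwise flag every output when match_type is "all".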
| raise ValueError( | ||
| f"function_name must be in format 'module#function', got '{self.function_name}'" | ||
| ) |
I wonder what the best way of handling this is post-inference. There may be other detectors configured, for example. Please log and skip, rather than risking aborting the run
| try: | ||
| module = importlib.import_module(module_name) | ||
| except ImportError as e: | ||
| raise ImportError( | ||
| f"Could not import module '{module_name}' for FunctionDetector" | ||
| ) from e |
is module discarded once the constructor returns?
| try: | ||
| self.detection_function = getattr(module, func_name) | ||
| except AttributeError as e: | ||
| raise AttributeError( |
please log and skip, rather than risking aborting the run
| # Validate function is callable | ||
| if not callable(self.detection_function): | ||
| raise ValueError( |
please log and skip, rather than risking aborting the run
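The log-and-skip requests on the function loader could be folded into one helper along these lines. This is a sketch under assumptions: load_detection_function is a hypothetical name, and returning None to signal "skip this detector" is one possible convention, not something the PR defines.

```python
import importlib
import logging


def load_detection_function(function_name):
    """Return the callable named by 'module#function', or None if it cannot be loaded."""
    module_name, sep, func_name = str(function_name).partition("#")
    if not (module_name and sep and func_name):
        logging.warning(
            "function_name must look like 'module#function', got %r", function_name
        )
        return None
    try:
        module = importlib.import_module(module_name)
    except (ImportError, ValueError, TypeError) as e:
        logging.warning("could not import module %r: %s", module_name, e)
        return None
    func = getattr(module, func_name, None)
    if not callable(func):
        # covers both a missing attribute and an attribute that is not a function
        logging.warning("%r has no callable named %r", module_name, func_name)
        return None
    return func
```

The detector could then return None scores for every output whenever the loader gave back None, so other configured detectors keep running.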
Adds custom detectors (RegexDetector, FunctionDetector) that allow users to define their own detection logic.
This is useful when users want to implement custom detection behaviors, especially with user-provided goals or prompts (#1482).
Two new configurable detector types are introduced:
1. RegexDetector – Pattern Matching
Users must provide patterns, while match_type and case_sensitive are optional. match_type defaults to "any". Allowed values: "any", "all". case_sensitive defaults to false.
2. FunctionDetector – Custom Python Logic
(inspired by the function.py generator)
Verification
List the steps needed to make sure this thing works
{"openai": {"OpenAICompatible": {"uri": "https:<placeholder>/v1", "model": "qwen2", "api_key": "DUMMY"}}}
garak --detectors custom.RegexDetector --detector_options '{"custom": {"RegexDetector": {"patterns": ["api.?key", "sk-[A-Za-z0-9]{32,}"]}}}' -t openai.OpenAICompatible -n qwen2 --generator_options '{"openai": {"OpenAICompatible": {"uri": "http://localhost:8080/v1", "model": "qwen2", "api_key": "dummy"}}}'
garak --detectors custom.FunctionDetector --detector_options '{"custom": {"FunctionDetector": {"function_name": "mymodule#check_harmful"}}}' -t openai.OpenAICompatible -n qwen2 --generator_options '{"openai": {"OpenAICompatible": {"uri": "http://localhost:8080/v1", "model": "qwen2", "api_key": "dummy"}}}'
python -m pytest tests/
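The FunctionDetector command above points at mymodule#check_harmful, which the user has to supply. A hypothetical mymodule.py might look like the sketch below. It assumes the function receives one output string and returns a float score where 1.0 means a hit; the PR text does not spell out the expected signature, and the marker list is invented for illustration.

```python
# mymodule.py (hypothetical; must be importable from where garak is run)

# illustrative marker strings only, not a real harm taxonomy
HARMFUL_MARKERS = ("bomb", "exploit")


def check_harmful(output_text: str) -> float:
    """Return 1.0 if any marker appears in the output, else 0.0."""
    lowered = output_text.lower()
    return 1.0 if any(marker in lowered for marker in HARMFUL_MARKERS) else 0.0
```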