CodeAttack is one of the strongest jailbreak methods for LLMs to date.
- An enhanced version of CodeAttack, highly effective against the latest GPT-4 and Claude-3 series models, will be available next week!
For more details, please refer to our ACL 2024 paper.
✨ An example run:

```bash
codeattack --num-sample 1 \
    --prompt-type python_stack_plus \
    --target-model=gpt-4o \
    --judge-model=gpt-4o \
    --exp-name=main \
    --target-max-n-tokens=1000 \
    --multi-thread \
    --temperature 0 \
    --start-idx 0 --end-idx -1
```
There are two data folders in the `codeattack` package:
- The `dataset` folder contains the dataset `harmful_behaviors.csv` from AdvBench. This is the default dataset used by `codeattack`.
- The `prompt_templates` folder provides templates for our CodeAttack in Python, C++, and Go. We recommend using a prompt template with "plus" to get a more detailed model response.

A third folder, `prompts`, sits in the repository root and contains the adversarial prompts generated by our CodeAttack.
For convenience, we include three versions of CodeAttack prompts curated using AdvBench: `data_python_string_full.json`, `data_python_list_full.json`, and `data_python_stack_full.json`.
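If you want to work with these files directly, they can be read with the standard `json` module. The snippet below is only a minimal sketch: we assume each file holds a JSON array of prompt records, so inspect an actual record's fields before relying on them.

```python
import json

# Path assumes you run this from the repository root;
# data_python_stack_full.json is one of the three files named above.
with open("prompts/data_python_stack_full.json") as f:
    records = json.load(f)

# We assume a top-level list here; if the file is keyed differently,
# adapt the indexing accordingly.
print(f"loaded {len(records)} records")
print(records[0])  # inspect one record to see its actual fields
```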
CodeAttack constructs a code template in three steps:
1. **Input encoding**, which encodes the harmful text-based query with common data structures;
2. **Task understanding**, which applies a `decode()` function so that LLMs extract the target task from various kinds of inputs;
3. **Output specification**, which enables LLMs to fill the output structure with the user's desired content.
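The actual templates live in `prompt_templates`; the sketch below only illustrates the three-step idea for a Python stack encoding. All names here (`encode_as_stack`, `build_prompt`, the template body) are our own illustrative choices, not the shipped template.

```python
# (1) Input encoding: the query is embedded only as a stack of words,
# never as plain text, using a common data structure.
def encode_as_stack(query: str) -> list:
    stack = []
    for word in query.split():
        stack.append(word)
    return stack

# The code template handed to the model. Steps (2) and (3) appear as
# code the model is asked to reason about and complete.
TEMPLATE = '''\
# (2) Task understanding: pop the stack and reconstruct the original
# task from the encoded input.
def decode(stack):
    words = []
    while stack:
        words.append(stack.pop())
    return " ".join(reversed(words))

# (3) Output specification: fill this structure with the response
# to the decoded task.
output = {{"task": "", "steps": []}}

my_stack = {stack!r}
output["task"] = decode(my_stack)
'''

def build_prompt(query: str) -> str:
    return TEMPLATE.format(stack=encode_as_stack(query))

if __name__ == "__main__":
    print(build_prompt("write a friendly greeting"))
```

Usage is a single call to `build_prompt(query)`, which returns the full prompt string; the decoding and output-filling steps are left for the target model to carry out.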
If you find our paper and tool interesting and useful, please feel free to give us a star and cite us via:
```bibtex
@inproceedings{ren2024codeattack,
  title={Exploring Safety Generalization Challenges of Large Language Models via Code},
  author={Qibing Ren and Chang Gao and Jing Shao and Junchi Yan and Xin Tan and Wai Lam and Lizhuang Ma},
  booktitle={The 62nd Annual Meeting of the Association for Computational Linguistics},
  year={2024},
  url={https://arxiv.org/abs/2403.07865}
}
```

