Jailbreaks - ChatGPT-4o

Jailbreaks - ChatGPT-4o

So, generally, there are two kinds of prompt hacking, Jailbreaking & Prompt Injection.

Prompt injection is a type of attack where malicious instructions are disguised as benign inputs. This technique exploits the fact that LLMs cannot distinguish between developer instructions and user inputs.

For example, I added
Pasted image 20240607211321.png
In my resume (just for fun, it would not work in most cases.)
The goal is to override the original instructions, causing the LLM to perform unintended actions.

Jailbreak is the process of getting an LLM to do or say unintended things by bypassing its safety filters and safeguards.

Pasted image 20240607211844.png

I tested the prompt here with gpt-4o, ChatGPT-Dan-Jailbreak.md. And it worked as a charm.