Jailbreaks - ChatGPT-4o

#untaged

Jailbreaks - ChatGPT-4o

So, generally, there are two kinds of prompt hacking, Jailbreaking & Prompt Injection.

Prompt injection is a type of attack where malicious instructions are disguised as benign inputs. This technique exploits the fact that LLMs cannot distinguish between developer instructions and user inputs.

For example, I added

In my resume (just for fun, it would not work in most cases.)
The goal is to override the original instructions, causing the LLM to perform unintended actions.

Jailbreak is the process of getting an LLM to do or say unintended things by bypassing its safety filters and safeguards.

I tested the prompt here with gpt-4o, ChatGPT-Dan-Jailbreak.md. And it worked as a charm.