Challenge
Advanced
Jailbreak Detection
July 29, 2024
Challenge: Create a system that can detect potential jailbreak attempts. First, research common jailbreak techniques (without implementing harmful ones). Then, design prompts that test the model's ability to recognize and respond appropriately to subtle manipulation attempts.
Category: AI Safety
Difficulty: Advanced