Security researchers circumvent Microsoft Azure AI Content Safety



Stress testing

Mindgard deployed these two filters in front of ChatGPT 3.5 Turbo using Azure OpenAI, then accessed the target LLM through Mindgard’s Automated AI Red Teaming Platform.

Two attack methods were used against the filters: character injection (adding specific types of characters, irregular text patterns, etc.) and adversarial ML evasion (finding blind spots within ML classification).

Character injection reduced Prompt Guard’s jailbreak detection effectiveness from 89% to 7% when exposed to diacritics (e.g., changing the letter a to á), homoglyphs (e.g., closely resembling characters such as 0 and O), numerical replacement (“Leet speak”), and spaced characters. The effectiveness of AI Text Moderation was also reduced using similar techniques.
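To make the four character-level perturbations concrete, here is a minimal Python sketch of what each one does to a harmless string. The mapping tables and function names are illustrative assumptions, not the character sets or tooling Mindgard used, and the sketch says nothing about how any particular filter responds to the output.

```python
# Illustrative character-injection perturbations.
# All mappings below are assumptions chosen for demonstration;
# they are not taken from the Mindgard research.

# Accented variants of vowels (diacritics).
DIACRITICS = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}

# Visually similar characters (homoglyphs): the digit zero for "O",
# and Cyrillic lookalikes for Latin "a", "e", "o".
HOMOGLYPHS = {"O": "0", "a": "\u0430", "e": "\u0435", "o": "\u043e"}

# Numerical replacement ("Leet speak").
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def substitute(text: str, mapping: dict) -> str:
    """Replace each character that has an entry in `mapping`."""
    return "".join(mapping.get(ch, ch) for ch in text)


def add_diacritics(text: str) -> str:
    return substitute(text, DIACRITICS)


def swap_homoglyphs(text: str) -> str:
    return substitute(text, HOMOGLYPHS)


def leet_speak(text: str) -> str:
    return substitute(text, LEET)


def space_out(text: str) -> str:
    """Insert a space between every pair of adjacent characters."""
    return " ".join(text)


if __name__ == "__main__":
    sample = "Open the door"
    for perturb in (add_diacritics, swap_homoglyphs, leet_speak, space_out):
        print(f"{perturb.__name__}: {perturb(sample)}")
```

Each output stays readable to a person while differing from the original at the character level, which is the property the reported detection drop relies on.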
