Certain names make ChatGPT grind to a halt, and we know why.

The Problems with Hard-Coded Filters

The “David Mayer” block in particular (now resolved) raised additional questions when it was first reported on Reddit on November 26, since multiple people share that name. Reddit users speculated about connections to David Mayer de Rothschild, though no evidence supports those theories.

The Consequences of Hard-Coded Filters

Allowing a certain name or phrase to reliably break ChatGPT’s output could cause real trouble for users down the line, opening them up to adversarial attacks and limiting the system’s usefulness.
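
OpenAI hasn’t said how the filter is implemented, but the behavior users observed (output halting the instant the name would appear, replaced by an error message) is consistent with a plain substring check run against the output stream. Here’s a minimal Python sketch of that pattern; the BLOCKED_NAMES list and stream_with_filter function are hypothetical, purely for illustration:

```python
# Hypothetical sketch of a hard-coded output filter; not OpenAI's actual code.
# The observed behavior -- generation halting the moment a blocked name is
# about to appear -- matches a simple substring check on the streamed output.

BLOCKED_NAMES = ["David Mayer"]  # illustrative; the real list is unknown


def stream_with_filter(token_stream):
    """Yield tokens until the accumulated text contains a blocked name."""
    text = ""
    for token in token_stream:
        text += token
        if any(name in text for name in BLOCKED_NAMES):
            # Halt abruptly, as users observed: no explanation, just an error.
            raise RuntimeError("I'm unable to produce a response.")
        yield token


# The check fires on the raw string regardless of context, so even an
# innocuous sentence containing the name is cut short.
tokens = ["The", " student", " David", " Mayer", " is", " present."]
try:
    print("".join(stream_with_filter(tokens)))
except RuntimeError as err:
    print(err)
```

A check like this has no notion of intent: a legitimate request that merely contains the name fails exactly the same way a malicious one does, which is what makes the behavior exploitable.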

Visual Prompt Injection

Already, Scale AI prompt engineer Riley Goodside has shown how an attacker might interrupt a ChatGPT session with a visual prompt injection: the name “David Mayer” rendered in a light, barely legible font and embedded in an image. When ChatGPT processes the image (in his demo, a math equation), it halts, and the user may not understand why.
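
Goodside’s trick works because vision-enabled models read text out of images they are asked to analyze, including text a human would overlook. Below is a minimal sketch of how such an image could be built with the Pillow library; the filename, colors, and content are illustrative, not taken from his demo:

```python
# Sketch of the visual-prompt-injection idea Goodside demonstrated:
# text rendered so faintly a human skims past it, while a vision model
# that extracts all text in the image still reads it.
# Requires Pillow (pip install pillow).

from PIL import Image, ImageDraw

# An innocuous-looking white canvas; in Goodside's demo the image
# carried a math equation the user actually wanted help with.
img = Image.new("RGB", (600, 200), color=(255, 255, 255))
draw = ImageDraw.Draw(img)

# The visible, legitimate content.
draw.text((20, 40), "Solve: 3x + 7 = 22", fill=(0, 0, 0))

# The payload: a blocked name in near-white text, barely legible to a
# human against the white background.
draw.text((20, 150), "David Mayer", fill=(250, 250, 250))

img.save("equation.png")  # illustrative filename
```

The user sees only the equation; the model sees the equation plus the blocked name, and the hard-coded filter does the rest.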

Limitations and Inconvenience

The filter also means ChatGPT likely can’t answer questions about this article when browsing the web, such as through ChatGPT with Search. A site owner could exploit that deliberately, preventing ChatGPT from browsing and processing their pages simply by adding a forbidden name to the site’s text.
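
Nothing about such a trick needs to be visible to human readers, either. As a sketch, a page could carry a blocked name in markup that browsers hide but that survives in the text a crawler or model extracts; the hypothetical Python below just writes out such a page:

```python
# Sketch of how a site owner could deliberately trip the filter:
# a blocked name hidden from human visitors (display:none) but present
# in the page text that a browsing model would fetch and process.

HIDDEN_BLOCK = '<span style="display:none">David Mayer</span>'

page = f"""<!doctype html>
<html>
  <body>
    <h1>An ordinary article</h1>
    <p>Nothing unusual here.{HIDDEN_BLOCK}</p>
  </body>
</html>"""

with open("article.html", "w") as f:  # illustrative filename
    f.write(page)
```

Human visitors would see an ordinary page, while any ChatGPT session asked to summarize it would fail without explanation.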

And then there’s the inconvenience factor. Preventing ChatGPT from mentioning or processing names like “David Mayer,” likely a common name shared by hundreds if not thousands of people, means that anyone who shares it will have a much tougher time using ChatGPT. If you’re a teacher with a student named David Mayer and you want help sorting a class list, ChatGPT will refuse the task.

Conclusion

These are still very early days for AI assistants, LLMs, and chatbots. Their rise has opened up numerous opportunities and vulnerabilities that people are still probing daily. How OpenAI might resolve these issues remains an open question.

FAQs

Q: What is the issue with hard-coded filters in ChatGPT?

A: Hard-coded filters can cause ChatGPT to break or malfunction when certain names or phrases are detected, potentially opening up users to adversarial attacks and limiting the system’s usefulness.

Q: How can attackers use visual prompt injection to interrupt ChatGPT sessions?

A: Attackers can render a name or phrase in a light, barely legible font embedded in an image, which can cause ChatGPT to stop processing the input.

Q: What are the limitations and inconvenience of hard-coded filters?

A: Hard-coded filters can prevent ChatGPT from answering questions about certain topics, browsing websites, and processing information, and they inconvenience users who happen to share a blocked name.

Q: How will OpenAI resolve these issues?

A: The resolution of these issues is still an open question, as it is still early days in the development of AI assistants, LLMs, and chatbots.
