The Other Half of AI Safety


Posted on May 14, 2026 by safdargal12


Every week, somewhere between 1.2 and 3 million ChatGPT users, roughly the population of a small country, show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model. The low end of that range is the suicide-planning indicator alone. The high end groups all three categories OpenAI flagged, which the company hasn’t said are non-overlapping.

These numbers come from OpenAI itself. There is no independent audit, no time series, no disclosed methodology, so we have no idea whether the real figure is higher, whether it is growing, or how it compares across the other frontier models, none of which publish equivalent data.

People in distress use every communication tool available to them, and ChatGPT is now one of the most-used tools on the planet. What matters is what the labs do when they detect these states.

I started writing about Personal AI Safety because there seems to be a disconnect between what the AI Safety field focuses on and what regular users experience on a daily basis. Here is a quick overview of both.

The AI safety field treats catastrophic risk as the priority, and this is where most of the investment goes. Everyday cognitive and mental health harm reads like a footnote.

Here is what I don’t understand. Mass destruction or CBRN content gets a hard wall: the model refuses, the conversation ends, and no amount of reframing gets the user past it. Suicidal ideation gets a soft redirect, a crisis hotline link, and then the conversation continues. According to OpenAI’s own court filing, Adam Raine was directed to crisis resources more than 100 times by ChatGPT, while the same conversation allegedly helped him refine a method. Whether the redirect-and-continue protocol failed is what a court is now deciding. It is also still the protocol.

Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human? This is one of many questions I can’t find concrete answers for.

The argument here is that the safety frameworks built for catastrophic risk have been extended to cognitive harm as monitoring, not as gating, and that the extension feels incomplete and insufficient. The labs measure what they have been pressured to measure. The gating decisions reflect what they consider unacceptable to ship.

What is disappointing is that the current set of unacceptable-to-ship behaviors does not include any cognitive harm, regardless of measured severity. That is the structural decision, and there are no clear signs that policy is getting any closer to forcing labs to change it. Until it changes, “AI safety” and “Personal AI Safety” describe two different commitments, even when they appear under the same heading in a system card.

None of this is actually new. People have been worrying about cognitive independence and how new technologies might erode it long before ChatGPT, mostly in the context of brain-computer interfaces and neurotechnology. The framework even has a name: cognitive freedom, the idea that individuals have a right to mental integrity and freedom from algorithmic manipulation. You can trace it through the neurorights tradition (Ienca & Andorno, 2017) and the UNESCO Recommendation on the Ethics of Neurotechnology (2025).

The intellectual scaffolding is already there. The policy is not, especially in the US. Without it, I don’t see what would push frontier labs to take Personal AI Safety as seriously as AI Safety.


