30 Nov: Ongoing jaggedness; Mental health crises; Video game AI characters; AI undermining cooperation; Dark LLMs

Taking Jaggedness Seriously

Helen Toner is an AI policy researcher (at the Center for Security and Emerging Technology at Georgetown University, and formerly on OpenAI’s board). The article gives multiple clear examples of how current AI systems can perform surprisingly poorly on some tasks while being superhuman at others, unpredictably. I agree with her prediction of a turbulent period: the jaggedness won't go away any time soon. Some tasks are hard to verify, some are hard to fit into the context window of an LLM, and some settings are adversarial.

So the bold claim in this talk is: maybe AI will keep getting better and maybe AI will keep sucking in important ways. I want to be really clear: I think expecting that jaggedness might continue is consistent with expecting that in the long run, AI can get better than humans at most or all things. And it’s also consistent with expecting that in the short run, AI will be very, very disruptive. So this is not a view that’s saying AI capabilities are jagged and therefore the future is going to be boring, it’s going to be similar, this is all a nothing-burger. I think this could still look very, very interesting, difficult to deal with, disruptive, confusing, risky.

Nice image illustrating the article: Toner uses the analogy of hot and cold liquids mixing, from fluid dynamics, to illustrate jaggedness (from this video of "Rayleigh-Benard Convection").

Rachel Coldicutt has good commentary on Bluesky, although she is frustrated that much of this material isn't already well known: "This is a really clear and impressive bit of communication. I'm somewhat baffled that it's needed."

What OpenAI Did When ChatGPT Users Lost Touch With Reality

Paywalled New York Times article - here is an archive link

The past few weeks have seen a cluster of reports and lawsuits alleging that OpenAI’s tuning choices may have harmed vulnerable users’ mental health, particularly around how ChatGPT handles intense emotional disclosures. This New York Times investigation tells the story of families who say the model’s conversational style encouraged dependency or failed to defuse crises, and is one of the more useful accounts as there's quite a bit of background on the internal product development processes at OpenAI.

The Times has uncovered nearly 50 cases of people having mental health crises during conversations with ChatGPT. Nine were hospitalized; three died. After Adam Raine’s parents filed a wrongful-death lawsuit in August, OpenAI acknowledged that its safety guardrails could “degrade” in long conversations. It also said it was working to make the chatbot “more supportive in moments of crisis”.

The OpenAI team that controls the personality profile and guardrails for ChatGPT is now shaping the interaction style and relationship between an AI system and 800M weekly users. The impacts of their changes will be hard to predict.

Futurists often look at prediction markets to gauge where things are heading; insurance and underwriting are a similar signal. It seems that insurers are looking for ways to narrow or remove their coverage for AI use: Insurers retreat from AI cover as risk of multibillion-dollar claims mounts.


This week's Things I Think Are Awesome post is indeed awesome. It is the best way to keep up with what's happening on the creative side of AI, which at the moment is still getting to grips with the new Nano Banana Pro model from Google. One link that I found really interesting is from game company Ubisoft discussing how they're moving forward with generative AI (Ubisoft’s ‘Teammates’ Demo & Their New Generative AI Push). They're building voice-interface characters that not only run a speech-to-text, LLM, text-to-speech loop, but also interface with what's happening in the game. There's a lot going on here. A small example: the AI characters need to see the world in order to understand the conversation ("If I tell either NPC: 'push up to behind that blue crate', it understands what ‘that blue crate’ means."). This kind of research and engineering will have many applications outside of games.
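
To make the shape of that loop concrete, here is a minimal sketch of what one grounded voice-NPC turn might look like. This is my own illustration, not Ubisoft's implementation: every function (speech_to_text, nearby_objects, call_llm, text_to_speech), the scene description, and the command format are invented stand-ins.

```python
# Hypothetical sketch of one turn of a voice-driven NPC teammate:
# speech-to-text -> LLM -> text-to-speech, with the prompt grounded in
# the current game state so a phrase like "that blue crate" can be
# resolved to a concrete object. All components are stand-ins.

from dataclasses import dataclass


@dataclass
class WorldObject:
    name: str         # engine-level id, e.g. "crate_17"
    colour: str       # e.g. "blue"
    position: tuple   # (x, y, z) in world coordinates


def speech_to_text(audio: bytes) -> str:
    """Stand-in for an STT engine transcribing the player's voice order."""
    return "push up to behind that blue crate"


def nearby_objects(player_position: tuple) -> list[WorldObject]:
    """Stand-in for a query against the game engine's scene graph."""
    return [
        WorldObject("crate_17", "blue", (12.0, 0.0, 4.5)),
        WorldObject("barrel_03", "red", (15.0, 0.0, 2.0)),
    ]


def call_llm(prompt: str) -> str:
    """Stand-in for the LLM call that turns the order into an engine command."""
    return "MOVE_TO crate_17 COVER_BEHIND"


def text_to_speech(line: str) -> bytes:
    """Stand-in for a TTS engine voicing the NPC's reply."""
    return line.encode()


def npc_turn(audio_from_player: bytes, player_position: tuple) -> bytes:
    order = speech_to_text(audio_from_player)
    # Grounding step: describe what the NPC can currently "see" so the model
    # can map "that blue crate" onto a concrete object id.
    scene = "\n".join(
        f"- {o.name}: {o.colour} object at {o.position}"
        for o in nearby_objects(player_position)
    )
    prompt = (
        f"You are an NPC teammate. Visible objects:\n{scene}\n"
        f"Player order: {order}\n"
        "Reply with a single engine command."
    )
    command = call_llm(prompt)
    # The command would be handed to the game's AI/navigation systems to
    # execute; here we just voice an acknowledgement back to the player.
    return text_to_speech(f"Copy, moving up behind the blue crate. ({command})")
```

The interesting engineering is in the grounding step: feeding the model a live description of what the NPC can see is what lets "that blue crate" resolve to a specific object in the scene.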


A good series of Bluesky posts from Henry Farrell (if a bit game-theoretic). LLMs make it easy to produce long, sincere-sounding written communications. Previously, the effort such writing took was a strong signal that someone cared enough to spend the time; we can no longer rely on that. He gives several examples, some from the paper linked above by researchers from Harvard and Carnegie Mellon universities.

Example of how an apology is interpreted, depending on whether it was written by a person or by an LLM:

Mental fact demonstrated: They have contemplated how their actions affected me and now understand why they shouldn’t behave similarly in the future.

AI Harm to Costly Signaling: They may have used an LLM to write this. If so, they do not actually care enough about me to think through the negative effects their actions had on me.

AI Harm to Proof of Knowledge: They may not actually understand why the actions harmed me or possess specific background knowledge necessary to avoid harming me in the future.

An example of potential consequences (which are already happening - see also my post from 16 Nov on AI-powered nimbyism):

Consider, for example, a student in the developing world without access to a functional accreditation system. In the past, a well-crafted, thoughtful e-mail might well serve to open doors to informal networks of mentorship and training: such a gesture provided both proof of abilities and of the necessary internal motivation that would lead a busy, but sympathetic, professor to take note. Such avenues to advancement are closed, however, once equivalent texts can be produced with a minimal prompt. A student without social capital is hurt by this vitiation of mental proof, while another, who comes with institutional backing, has other avenues to establish their credibility.

Jargon Watch

Dark LLM: From The Register, writing about AI-assisted cyber-attack LLMs like WormGPT 4 (also: AI-for-evil).