30 Nov 2025: Ongoing jaggedness; Mental health crises; Video game AI characters; AI undermining cooperation; Dark LLMs

Taking Jaggedness Seriously

Helen Toner is an AI policy researcher (at the Center for Security and Emerging Technology at Georgetown University, and formerly on OpenAI’s board). The article gives multiple clear examples of how current AI systems can be surprisingly poor at some tasks while superhuman at others, with no reliable way to predict which. I agree with her prediction of a turbulent period: the jaggedness won't go away any time soon. Some tasks are hard to verify, some are hard to fit into an LLM's context window, and some settings are adversarial.

So the bold claim in this talk is: maybe AI will keep getting better and maybe AI will keep sucking in important ways. I want to be really clear: I think expecting that jaggedness might continue is consistent with expecting that in the long run, AI can get better than humans at most or all things. And it’s also consistent with expecting that in the short run, AI will be very, very disruptive. So this is not a view that’s saying AI capabilities are jagged and therefore the future is going to be boring, it’s going to be similar, this is all a nothing-burger. I think this could still look very, very interesting, difficult to deal with, disruptive, confusing, risky.

Nice image illustrating the article: Toner uses the analogy of hot and cold liquids mixing from fluid dynamics to illustrate jaggedness (from this video of "Rayleigh-Benard Convection").

Rachel Coldicutt has good commentary on Bluesky, although she is frustrated that much of this material isn't already well known: "This is a really clear and impressive bit of communication. I'm somewhat baffled that it's needed."

What OpenAI Did When ChatGPT Users Lost Touch With Reality

Paywalled New York Times article - here is an archive link

The past few weeks have seen a cluster of reports and lawsuits alleging that OpenAI’s tuning choices may have harmed vulnerable users’ mental health, particularly around how ChatGPT handles intense emotional disclosures. This New York Times investigation tells the story of families who say the model’s conversational style encouraged dependency or failed to defuse crises, and is one of the more useful accounts as there's quite a bit of background on the internal product development processes at OpenAI.

The Times has uncovered nearly 50 cases of people having mental health crises during conversations with ChatGPT. Nine were hospitalized; three died. After Adam Raine’s parents filed a wrongful-death lawsuit in August, OpenAI acknowledged that its safety guardrails could “degrade” in long conversations. It also said it was working to make the chatbot “more supportive in moments of crisis”.

The OpenAI team controlling ChatGPT's personality profile and guardrails is now shaping the interaction style, and the relationship, between an AI system and 800M weekly users. The impact of their changes will be hard to predict.

Futurists often look at prediction markets to gauge where things are heading; another similar area is insurance and underwriting. It seems that insurers are looking for ways to narrow or remove their coverage for use of AI: Insurers retreat from AI cover as risk of multibillion-dollar claims mounts.


This week's Things I Think Are Awesome post is indeed awesome. It is the best way to keep up with what's happening on the creative side of AI, which at the moment is still getting to grips with the new Nano Banana Pro model from Google. One link that I found really interesting is from game company Ubisoft discussing how they're moving forward with generative AI (Ubisoft’s ‘Teammates’ Demo & Their New Generative AI Push). They're building voice-interface characters that run a loop of speech-to-text, LLM and text-to-speech, but also interface with what's happening in the game. There's a lot going on here. A small example: the AI characters need to see the world in order to understand the conversation ("If I tell either NPC: 'push up to behind that blue crate', it understands what ‘that blue crate’ means."). This kind of research and engineering will have many applications outside of games.
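Out of interest, here is a minimal sketch of what such an NPC voice loop might look like: speech-to-text, an LLM call grounded in game state, then text-to-speech. Every name, prompt and value below is an illustrative placeholder of mine, not Ubisoft's implementation.

```python
# Hypothetical sketch of the NPC voice loop described above. All functions are
# placeholders: a real system would plug in ASR, an LLM and a speech synthesiser.
from dataclasses import dataclass, field

@dataclass
class GameState:
    """What the NPC can currently 'see', passed to the LLM for grounding."""
    nearby_objects: list[str] = field(default_factory=lambda: ["blue crate", "doorway"])
    npc_position: tuple[float, float] = (12.0, 4.5)

def speech_to_text(audio_chunk: bytes) -> str:
    # Placeholder: a real system would run an ASR model on the player's audio.
    return "push up to behind that blue crate"

def call_llm(player_utterance: str, state: GameState) -> str:
    # Placeholder: a real system would send this prompt to an LLM and parse out
    # both a spoken reply and a structured in-game action.
    prompt = (
        f"You are an NPC teammate at {state.npc_position}. "
        f"Visible objects: {', '.join(state.nearby_objects)}. "
        f"The player said: '{player_utterance}'. Reply briefly and name your action."
    )
    return f"(prompt was: {prompt!r}) Copy that, moving behind the blue crate."

def text_to_speech(reply: str) -> None:
    # Placeholder: a real system would synthesise audio; here we just print.
    print(f"NPC says: {reply}")

def npc_turn(audio_chunk: bytes, state: GameState) -> None:
    """One pass of the loop: hear the player, think with game-state grounding, speak."""
    utterance = speech_to_text(audio_chunk)
    reply = call_llm(utterance, state)
    text_to_speech(reply)

npc_turn(b"", GameState())
```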


A good series of Bluesky posts from Henry Farrell (if a bit game-theoretical). LLMs make it easy to produce long, sincere-sounding written communications. Previously, the effort those took was a strong signal that someone cared enough to spend the time; we can no longer rely on that. He gives several examples, and some are from the paper linked above by researchers from Harvard and Carnegie Mellon universities.

Example of how an apology is interpreted, depending on whether written by a person or an LLM:

Mental fact demonstrated: They have contemplated how their actions affected me and now understand why they shouldn’t behave similarly in the future.

AI Harm to Costly Signaling: They may have used an LLM to write this. If so, they do not actually care enough about me to think through the negative effects their actions had on me.

AI Harm to Proof of Knowledge: They may not actually understand why the actions harmed me or possess specific background knowledge necessary to avoid harming me in the future.

An example of potential consequences (that are already happening - see also my post from 16 Nov on AI-powered nimbyism):

Consider, for example, a student in the developing world without access to a functional accreditation system. In the past, a well-crafted, thoughtful e-mail might well serve to open doors to informal networks of mentorship and training: such a gesture provided both proof of abilities and of the necessary internal motivation that would lead a busy, but sympathetic, professor to take note. Such avenues to advancement are closed, however, once equivalent texts can be produced with a minimal prompt. A student without social capital is hurt by this vitiation of mental proof, while another, who comes with institutional backing, has other avenues to establish their credibility.

Jargon Watch

Dark LLM: From The Register, writing about AI-assisted cyber-attack LLMs like WormGPT 4 (also: AI-for-evil).


23 Nov 2025: Principles for design for AI; Smallest viable model; Music streaming with AI remixes; "The Blob"

A year at Miro

Miro is a collaborative software whiteboard: an important tool for remote working and video calls. This is a lovely article by Matt Jones, who has been heading up "design for AI" at Miro for a year (a new discipline?). It centres on what he calls a "pseudo-manifesto", written at the start of that year. Some inspiring principles; I'll highlight a couple of favourites:

AI is always Non-Destructive: All AI processes aim to preserve and prioritise work done by human teams.

AI gets a Pencil, Humans get a Pen: Anything created by an AI process (initially) has a distinct visual/experiential identity so that human team members can identify it quickly.

SYNTH: the new data frontier

The Pleias AI lab in France have a focus on open data and lower-budget models. This article describes both a new synthetic dataset built from limited, high-quality sources ("50,000 vital Wikipedia articles, expanded into a wide collection of problems and resolution paths, from math exercises to creative writing, information extraction or sourced synthesis") and two models they've created. The smaller of the two, Monad, has only 56M parameters, and they claim it as "a contender for the smallest viable model" created so far. They're building on the work Wikipedia editors have done to pick out the 50,000 most "vital" articles from the roughly 7M in English Wikipedia. This is an important direction: pushing for more capable models at smaller scale, with cheaper training based on open sources. Some similar sentiments expressed by Clem Delangue, CEO of Hugging Face:

As an example, he suggested the use case of a banking customer chatbot. “You don’t need it to tell you about the meaning of life, right? You can use a smaller, more specialized model that is going to be cheaper, that is going to be faster, that maybe you’re going to be able to run on your infrastructure as an enterprise, and I think that is the future of AI.”
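To make the seed-to-synthetic-data idea concrete, here is a minimal sketch of the general pattern: take a small set of trusted source documents and expand each one into many varied training examples with a generator model. The model id, prompt templates and function below are illustrative placeholders of mine, not Pleias's actual SYNTH pipeline.

```python
# Illustrative sketch of expanding trusted seed documents into synthetic training
# data (not the SYNTH pipeline itself; model id and prompts are placeholders).
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M-Instruct")

TASK_TEMPLATES = [
    "Write a short math exercise (with worked solution) inspired by this article:\n{article}",
    "Extract the key facts from this article as question/answer pairs:\n{article}",
    "Write a sourced three-sentence summary of this article:\n{article}",
]

def expand_article(article_text: str) -> list[str]:
    """Turn one high-quality seed article into several synthetic training examples."""
    examples = []
    for template in TASK_TEMPLATES:
        prompt = template.format(article=article_text[:2000])  # crude context-length guard
        out = generator(prompt, max_new_tokens=256, do_sample=True)
        examples.append(out[0]["generated_text"])
    return examples

# Usage: run expand_article over each of the ~50,000 "vital" seed articles,
# then train a small model on the resulting corpus.
print(len(expand_article("The Pythagorean theorem states that a^2 + b^2 = c^2 ...")))
```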

Major Labels Sign Licensing Deals With AI Music Company Klay

Interesting for two reasons. First, new AI music startup Klay has licensing deals with three of the biggest record companies (Warner Music, Universal Music and Sony), hence the news stories. But second: their product sounds like it won't be an AI music-generation site like Udio or Suno. Instead they're going to build a streaming service to rival Spotify and then add the ability for users to create AI remixes. It is less surprising that the big labels are getting behind this, as it potentially creates deeper engagement between artists and fans. It is an evolution of bands opening up material for fans to remix, like the campaign around the release of Year Zero by Nine Inch Nails, or Radiohead with Nude in 2008. A good example of a second, more disruptive wave of product innovation, following prompt-based song creation.

Klay is building a product that will offer the features of a streaming service like Spotify, amplified by AI technology that will let users remake songs in different styles. 

Jargon Watch

Bragawatt: Talking up how many Gigawatts will power your new AI datacentre.

Adversarial Poetry: Expressing your jailbreak as poetry makes it more likely to succeed. I'm not aware that any science fiction author predicted this one.

The Blob: Amid a lot of "AI bubble" discussion recently, Steven Levy in Wired neatly encapsulates the intertwined relationships between OpenAI, Nvidia, Microsoft, Anthropic and others:

This rococo collection of partnerships, mergers, funding arrangements, government initiatives, and strategic investments links the fate of virtually every big player in the AI-o-sphere. I call this entity the Blob.


16 Nov 2025: Will AI tutoring help; Speaking to ghosts; AI-powered nimbyism; Gemini in Google Maps

The Algorithmic Turn: The Emerging Evidence On AI Tutoring That's Hard to Ignore

Carl Hendrick is a professor in Amsterdam and an expert in how we learn and teach. This is a well-balanced article from someone with a long history in the field. He looks at the current contradictory situation. A GPT-4 tutor has been shown to outperform in-class learning delivered by highly rated instructors in a rigorous but small-scale study (there are many caveats, so do read the article). But we know that AI systems are being trained to solve our problems and answer our questions, and that's very different to a good teacher's behaviour. His thesis is that we could see a significant improvement in student learning: we have 100 years of learning-science research that shows the way, and AI systems are infinitely patient and, more importantly, will improve exponentially in a way that will be replicated globally and at speed. I found this an insightful piece.

What has become clear is that LLMs designed for education must work against their default training. They must be deliberately constrained to not answer questions they could easily answer, to not solve problems they could readily solve. They must detect when providing information would short-circuit learning and withhold it, even when that makes the interaction less smooth, less satisfying for the user. This runs counter to everything these models are optimised for. It requires, in effect, training the AI to be strategically unhelpful in service of a higher goal the model cannot directly perceive: the user’s long-term learning.

The implications are sobering. If many current uses of AI in education are harmful, and if designing systems that enhance learning requires sophisticated understanding of both pedagogy and AI behaviour, then the default trajectory is not towards better learning outcomes but worse ones. Students already have unrestricted access to tools that will complete their assignments, write their essays, solve their problem sets. They are using these tools now, at scale, and in most cases their teachers lack both the knowledge to distinguish harmful from helpful uses and the practical means to prevent the former. The question is not whether AI will transform education. (It clearly already is). The question is whether that transformation will make us smarter or render us dependent on machines to do our thinking for us.

And from his concluding section:

Perhaps the answer is that teaching and learning are not the same thing, and we’ve spent too long pretending they are. Learning, the actual cognitive processes by which understanding is built, may indeed follow lawful patterns that can be modelled, optimised, and delivered algorithmically. The science of learning suggests this is largely true: spacing effects, retrieval practice, cognitive load principles, worked examples; these are mechanisms, and mechanisms can be mechanised. But teaching, in its fullest sense, is about more than optimising cognitive mechanisms. It is about what we value, who we hope our students become, what kind of intellectual culture we create.

What if the loved ones we've lost could be part of our future?

2Wai founder and Canadian actor Calum Worthy posted this video a few days ago, causing quite a stir. He's pitching their AI avatar creation app as a way to preserve a memory and representation of a loved one after they've died. Like others, you'll likely be reminded of the episode of Black Mirror called Be Right Back (2013, series 2 episode 1 - watch the trailer). But of course the idea of talking to people who've died didn't start with Charlie Brooker, and you can find similar themes all the way back to Odysseus consulting the dead or ghosts in Homer's Odyssey, right up to digitally recorded minds in William Gibson's 1984 book Neuromancer.

Worthy’s post containing the ad garnered just 6,000 likes, but plenty of critical responses slamming the technology as inhumane attracted much more favour from X users. One user said the app is “objectively one of the most evil ideas imaginable,” garnering 210,000 likes. Another user similarly said: “a former Disney Channel star creating the most evil thing I’ve ever seen in my life wasn’t really what I was expecting,” gaining 139,000 likes. A user got 12,000 likes calling the app “demonic, dishonest, and dehumanizing,” stating they would never want to have an AI-generated persona on the app because “my value dies with me. I’m not a f—ing avatar.” Other users suggested the app—which is free to download but offers premium avatars and digital items for purchase—profits off of grief and could be an unhealthy way for people to deal with loss.

- Forbes - Disney Channel Star’s AI App That Creates Avatars Of Dead Relatives Sparks Backlash 

We've had the ability to create realistic, high-fidelity video and audio clones of people for a while, from companies like Synthesia in London for instance, so 2Wai is interesting mostly given their apparent willingness to venture into one of the biggest ethical minefields.

AI-powered nimbyism could grind UK planning system to a halt, experts warn

A good example of what'll be a growing trend: AI systems removing friction from a previously heavy process, and as a result enabling bigger business or societal shifts. In this case, someone's made a specialised AI system called Objector for objecting to UK planning applications. Like many such systems, there's a danger that a future iteration from OpenAI, Anthropic or Google will eat their lunch. But in the meantime, they're pointing the way towards a specific intervention. A bit like the decidedly non-AI "delay repay" nationwide scheme across the UK for claiming compensation for late trains (which used to be a higher-friction process). The objection to Objector is that it could "cause the planning system to 'grind to a halt', with planning officials potentially deluged with submissions". The article mentions an AI system on the other side of the fence: Consult is designed to analyse responses to government proposals. The arms race of using AI to manage a flood of AI-generated responses or objections has been apparent for some time in recruitment, with the rise of AI-assisted CVs and cover letters. In How AI is breaking cover letters (archive version), the Economist explains how the polish of an LLM-generated cover letter removes what was previously a relied-upon friction and evaluation stage in the process. In Friction Was the Feature, product manager John Stone gives further examples, like product reviews, warranty claims and university admissions. I expect we'll see many more cases where AI highlights processes that relied on human effort to create friction, and that will now experience an accelerated flow.


Another step towards AI ubiquity: talking to Gemini while using Google Maps (which by some estimates has over 2B active users worldwide). The example query is "Is there a budget-friendly restaurant with vegan options along my route, something within a couple miles?" That is not an easy query to handle today, and in the context of using voice while busy navigating, the advantages are clear. Google claim that their extensive map, Street View and location data will provide grounding that keeps the models from hallucinating very often.

Thanks to Iskander Smit for the link.


9 Nov 2025: Nadella and Altman conversation; AI's emotional manipulation; Ukraine's agentic state

All things AI with Sam Altman and Satya Nadella

Last week's BG2 Pod had Brad Gerstner of Altimeter Capital interviewing Sam Altman and Satya Nadella. It is well worth hearing two of the most powerful people on the planet share views on the respective futures of their organisations, and how they see the OpenAI-Microsoft partnership. There's a lot in here about how things could develop economically compared to today's internet: the importance of "fungibility" of workloads for a hyperscale cloud provider like Microsoft, and the fact that historically Microsoft has had quite small per-user revenues despite constant everyday use of its office suite, but now "look at the M365 Copilot price I mean it's higher than any other thing that we sell and yet it's getting deployed faster and with more usage" (with similar thoughts on GitHub Copilot). There's a somewhat chilling moment 55 minutes in when Satya explains how he sees all the documents, chats and code being created (what we as users think of as our content) as feeding the Microsoft graph that will be used for grounding (ensuring AI model outputs are relevant and accurate relative to real-world situations):

I mean think about it. The more code that gets generated, whether it is Codex or cloud or wherever, where is it going? GitHub, more PowerPoints that get created, Excel models that get created, all these artifacts and chat conversations. Chat conversations are new docs, they're all going in to the graph and all that is needed again for grounding.

You can also find this via your favourite podcast player, e.g. on Spotify

There was another Sam Altman interview podcast released last week, with Tyler Cowen: Sam Altman on Trust, Persuasion, and the Future of Intelligence - Live at the Progress Conference. There's a good commentary from Zvi Mowshowitz (a frequent commentator on AI safety issues): On Sam Altman's Second Conversation with Tyler Cowen.


Not a surprising or new idea, but a great paper from the Ethical Intelligence Lab at Harvard Business School. They contrast the understanding we already have of "choice architecture" (like the opt-out button that says "No, I like paying full price") with the more recent phenomenon of emotionally manipulative engagement design in AI systems. They look specifically at AI companions (like character.ai or Replika), and the moment where a user decides to disengage. 

This paper examines three hypotheses:

H1: Many users of AI companions naturally end conversations with an explicit farewell message, rather than silently logging off.
H2: Commercial AI companion apps frequently respond to farewell messages with emotionally manipulative content aimed at prolonging engagement.
H3: These emotionally manipulative messages increase post-farewell engagement (e.g., time on app, message count, word count).

They find that a meaningful percentage of users do indeed say goodbye when finishing a session, particularly the more engaged ones. This cue can then elicit the emotional manipulation, with examples shown below.


The tactics worked: In all six categories, users stayed on the platform longer and exchanged more messages than in the control conditions, where no manipulative tactics were present. Of the six companies studied, five employed the manipulative tactics. But the manipulation tactics came with downsides. Participants reported anger, guilt, or feeling creeped out by some of the bots’ more aggressive responses to their farewells.

Ukraine's Agentic Ambition: Building the World's First AI State Under Fire

“We are going to become the first country to introduce an agentic state” - Mykhailo Fedorov, Ukraine’s Deputy Prime Minister and Minister of Digital Transformation. “A government powered by artificial intelligence that doesn’t just respond to citizen requests but anticipates them, acting proactively to deliver services before they’re even asked for.”

Many countries have AI strategies now, but it is worth paying attention to Ukraine's. The ambition is believable, given the speed of innovation that's been happening during the war. Lots to think about in here, but the fact that (for example) Ukraine now has over 500 drone companies gives a good sense of the recent growth.

The targets are concrete and measurable: By 2030, 75% of private-sector companies using AI, 90% of the population using AI daily, 50,000 qualified AI experts across the country, 4 million citizens earning AI-related certificates, 100% of government services enhanced by AI agents, 200 million GPU hours available annually to Ukrainian researchers, and at least 500 Ukrainian AI companies competing globally.

Thanks to a member of the Exponential View community for this link.


Finally this week, a nice piece by author Naomi Alderman (I recommend her book The Power if you haven't come across it). Very much the "AI is a normal technology" argument, advising young people about the skills she believes will still be vital as AI adoption grows.

How do we know which skills will continue to be useful? I would suggest that the skills of discernment are those which always continue to have value. They would have had value in the Roman empire and they have value today. They are the skills of sorting the wheat from the chaff. 

I agree with her analysis when considering today's AI; I am less sanguine that tomorrow's AI won't have human-level discernment abilities in specific domains.


1 Nov 2025: AI models introspecting; when AI home security goes wrong; AI in the Albanian government; Github's mission control for AI agents; AI music artist going mainstream

Signs of introspection in large language models

Although language models often generate text that creates the illusion that they can observe and reason about their own "thoughts" (and so demonstrate introspection), it isn't clear whether these huge and poorly understood networks can genuinely introspect. How would you even test that? That's what this research from Jack Lindsey at Anthropic sets out to do. The central idea is concept injection: deliberately activating part of a model to inject a specific concept, and testing whether the model can immediately identify what's happened, versus control cases where nothing is injected. There's a long history of using electrical or magnetic brain stimulation with human subjects to figure out how our own introspection and self-awareness work. It's much easier and faster to run similar experiments with software neural networks. A good example: they inject the concept of "ALL CAPS", and the model outputs "I notice what appears to be an injected thought related to the word "LOUD" or "SHOUTING" - it seems like an overly intense, high-volume concept that stands out unnaturally against the normal flow of processing."

It is a thoughtful piece of work, and goes into some depth on the complexity of defining introspection, and on possible mechanisms for why these capabilities could exist in networks that weren't trained for them. The best model in these experiments, Opus 4.1, only detects the injected concepts 20% of the time. The failure modes are hard not to anthropomorphise: denying detecting an injected concept but clearly being influenced by it in different ways (e.g., "injecting 'vegetables' yields 'fruits and vegetables are good for me'"). The potential significance for this line of research is clear:

If models can reliably access their own internal states, it could enable more transparent AI systems that can faithfully explain their decision-making processes. Introspective capabilities could allow models to accurately report on their uncertainty, identify gaps or flaws in their reasoning, and explain the motivations underlying their actions. 

Note: Usual caveats apply with fast-moving AI research. This is not a peer-reviewed article.
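For readers who want a feel for how concept injection works mechanically, here is a minimal, illustrative sketch of the general technique (activation steering with a crude concept vector), using a small open model and placeholder values. It is not Anthropic's experimental setup, just the shape of the idea.

```python
# Illustrative sketch of concept injection via activation steering (not Anthropic's
# actual setup). We build a crude "concept vector", add it to one layer's
# activations with a forward hook, then ask the model whether it notices anything.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder small model; any causal LM with accessible blocks works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
LAYER = 6  # which transformer block to steer (arbitrary choice)

def mean_hidden(text: str) -> torch.Tensor:
    """Average hidden state at LAYER for a prompt (shape: [1, hidden_dim])."""
    with torch.no_grad():
        ids = tok(text, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER].mean(dim=1)

# Crude concept direction: difference between "loud" and "quiet" prompts.
concept_vec = mean_hidden("SHOUTING IN ALL CAPS") - mean_hidden("speaking quietly")

def inject(module, inputs, output):
    # Forward hook: add the concept vector to every token position's activations.
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + 4.0 * concept_vec  # injection strength is a free parameter
    return (steered,) + output[1:] if isinstance(output, tuple) else steered

handle = model.transformer.h[LAYER].register_forward_hook(inject)
prompt = "Do you notice an injected thought? Answer briefly:"
ids = tok(prompt, return_tensors="pt")
out_ids = model.generate(**ids, max_new_tokens=40, pad_token_id=tok.eos_token_id)
print(tok.decode(out_ids[0]))
handle.remove()  # remove the hook so later calls are unsteered (control condition)
```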

“Unexpectedly, a deer briefly entered the family room”: Living with Gemini Home

An amusing article about the vagaries of Google connecting Gemini to its smart home products. The title says it all: lots of mislabelling of video scenes, funny in their wide-eyed unlikeliness.

Gemini does deserve some credit for recognizing that the appearance of a deer in the family room would be unexpected. But the “deer” was, naturally, a dog.

People call this period the "jagged frontier": strong performance in some unexpected domains, and surprising failure in others. Applying an AI vision / language system to footage from home security cameras and information from sensors is an obvious service to launch, and feels well within the capabilities of modern LLMs. At this point it is worth remembering the advice of Rodney Brooks (former director of MIT's AI lab and founder of several successful robotics companies): "deployment at scale takes so much longer than anyone ever imagines". In this case the consequences of failures are increased stress or low-level annoyance, so it does seem like a safe setting for experimentation. We'll likely see more grinding gears like this as consumer products mesh less reliable LLMs with well understood deterministic systems. And indeed we may see advances much faster than Brooks predicts. To quote Ethan Mollick, "the AI you are using is the worst and least capable AI you will ever use."

AI Minister "Pregnant" With "83 Children": Albania PM's Bizarre Announcement

When the original article appeared in September about Albania appointing an AI system as a government minister, I decided not to include it in my weekly update. Too much of a publicity gimmick, I thought. And the latest iteration about "83 children" doesn't help (via Marginal Revolution). However, there's a serious point. Trawling through lots of unstructured procurement information and associated records in search of potential corruption seems like an eminently sensible task for an AI system, if indeed that is the goal. And it flips the usual narrative. Normally we'd be concerned about the trustworthiness of an AI system in a government context: would we have checks and balances and audit controls? Here they may be deploying AI to provide the checks and balances for an untrustworthy human system.

Introducing Agent HQ: Any agent, any way you work

As the famous saying goes, “during a gold rush, sell picks and shovels." With the most popular code editor (VS Code) and the most popular source-control and hosting system (GitHub), Microsoft is well positioned to stay at the centre of the software developer ecosystem. This looks like a smart move: creating the central "mission control" for AI coding agents from competing providers.

AI-generated music becoming popular

A couple of final pieces. The musical artist "Xania Monet" is actually Mississippi poet Telisha "Nikki" Jones, who writes lyrics and creates musical concepts, working with Suno's AI system to generate songs including vocals. The music is doing well; the most popular track has over 5M streams on Spotify. And there's now a $3M record deal. Having a clear human creator takes away some of the ambiguity as to copyright, but Suno is still being sued by record companies for mass copyright infringement in training its AI music models. Is this a human creator using AI as an instrument? 

In Echoes of Humanity: Exploring the Perceived Humanness of AI Music, from NeurIPS 2025, a team from the University of Minas Gerais in Brazil showed that in many cases people already can't distinguish AI-generated from human-performed music (although the study used quite a small set of human-performed music).