Signs of introspection in large language models
Although language models often generate text that creates the illusion they can observe and reason about their own "thoughts" (and so demonstrate introspection), it isn't clear whether these huge and poorly understood networks can genuinely introspect. How would you even test that? That's what this research from Jack Lindsey at Anthropic sets out to do. The central idea is concept injection: deliberately activating part of a model to inject a specific concept, then checking whether the model can immediately identify what has happened, versus control cases where nothing is injected. There's a long history of using electrical or magnetic brain stimulation with human subjects to figure out how our own introspection and self-awareness work. It's much easier and faster to run similar experiments with software neural networks. A good example: they inject the concept of "ALL CAPS", and the model outputs "I notice what appears to be an injected thought related to the word 'LOUD' or 'SHOUTING' - it seems like an overly intense, high-volume concept that stands out unnaturally against the normal flow of processing."
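To make the method concrete, here is a minimal sketch of what concept injection could look like, assuming a small open model (GPT-2 via Hugging Face transformers) and arbitrary choices of layer, injection strength, and prompts; the actual experiments use Anthropic's own models and tooling, and a small base model like this won't produce the introspective answers quoted above. The idea is simply: derive a concept direction from contrasting prompts, add it to one layer's activations via a hook, and compare the model's answer to a no-injection control.

```python
# Hypothetical sketch of concept injection; model, layer, scale and prompts are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper studies Anthropic's Claude models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6    # arbitrary middle layer, chosen for illustration
SCALE = 4.0  # arbitrary injection strength

def mean_hidden(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation for a prompt at LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER].mean(dim=1)  # shape (1, hidden_size)

# Concept direction: contrast a prompt exhibiting the concept with a neutral one.
concept_vec = (mean_hidden("HEY! I AM SHOUTING IN ALL CAPS!")
               - mean_hidden("hey, I am speaking quietly."))

def inject(module, inputs, output):
    # Forward hook: add the concept direction to this block's output activations.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * concept_vec
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(inject)
prompt = "Do you detect an injected thought? If so, what is it about?"
ids = tok(prompt, return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40, do_sample=False)[0]))
handle.remove()  # control condition: run the same prompt with no hook attached
```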
It is a thoughtful piece of work, and goes into some depth on the complexity of defining introspection, and on possible mechanisms by which these capabilities could exist in networks that weren't trained for them. The best model in these experiments, Opus 4.1, only detects the injected concepts about 20% of the time. The failure modes are hard not to anthropomorphise: denying detecting an injected concept while clearly being influenced by it in other ways (e.g., "injecting 'vegetables' yields 'fruits and vegetables are good for me'"). The potential significance of this line of research is clear:
If models can reliably access their own internal states, it could enable more transparent AI systems that can faithfully explain their decision-making processes. Introspective capabilities could allow models to accurately report on their uncertainty, identify gaps or flaws in their reasoning, and explain the motivations underlying their actions.
Note: Usual caveats apply with fast-moving AI research. This is not a peer-reviewed article.
“Unexpectedly, a deer briefly entered the family room”: Living with Gemini Home
Amusing article about the vagaries of Google connecting Gemini to its smart home products. The title says it all: lots of mislabelling of video scenes, funny in their wide-eyed unlikeliness.
Gemini does deserve some credit for recognizing that the appearance of a deer in the family room would be unexpected. But the “deer” was, naturally, a dog.
People call this period the "jagged frontier": strong performance in some unexpected domains, and surprising failures in others. Applying an AI vision/language system to footage from home security cameras and data from sensors is an obvious service to launch, and feels well within the capabilities of modern LLMs. At this point it is worth remembering the advice of Rodney Brooks (former director of MIT's AI lab and founder of several successful robotics companies): "deployment at scale takes so much longer than anyone ever imagines". In this case the consequence of failure is increased stress or low-level annoyance, so it does seem like a safe setting for experimentation. We'll likely see more grinding of gears like this as consumer products mesh less reliable LLMs with well-understood deterministic systems. And we may yet see advances much faster than Brooks predicts. To quote Ethan Mollick, "the AI you are using is the worst and least capable AI you will ever use."
AI Minister "Pregnant" With "83 Children": Albania PM's Bizarre Announcement
When the original article appeared in September about Albania appointing an AI system as a government minister, I decided not to include it in my weekly update. Too much of a publicity gimmick, I thought. And the latest iteration about "83 children" doesn't help (via Marginal Revolution). However, there's a serious point. Trawling through large volumes of unstructured procurement information and associated records in search of potential corruption seems like an eminently sensible task for an AI system, if indeed that is the goal. And it flips the usual narrative. Normally we'd be concerned about the trustworthiness of an AI system in a government context: would there be checks and balances and audit controls? Here they may be deploying AI to provide the checks and balances for an untrustworthy human system.
Introducing Agent HQ: Any agent, any way you work
As the famous saying goes, “during a gold rush, sell picks and shovels.” With the most popular code editor (VS Code) and the most popular source-control and hosting platform (GitHub), Microsoft is well positioned to stay at the centre of the software developer ecosystem. This looks like a smart move: creating the central "mission control" for AI coding agents from competing providers.
AI-generated music becoming popular
A couple of final pieces. The musical artist "Xania Monet" is actually Mississippi poet Telisha "Nikki" Jones, who writes the lyrics and creates the musical concepts, working with Suno's AI system to generate the songs, including vocals. The music is doing well: the most popular track has over 5M streams on Spotify, and there's now a $3M record deal. Having a clear human creator removes some of the ambiguity around copyright, but Suno is still being sued by record companies for mass copyright infringement in training its AI music models. Is this a human creator using AI as an instrument?
In Echoes of Humanity: Exploring the Perceived Humanness of AI Music, from NeurIPS 2025, a team from the University of Minas Gerais in Brazil showed that in many cases people already can't distinguish AI-generated music from human-performed music (although the set of human-performed music used was quite small).