31 Aug 2025: Two challenges for AI consciousness; Combining AI tools around a document; AI training AI leading to terrible stories

AI Consciousness: A Centrist Manifesto

Fantastic bit of writing from Jonathan Birch, a philosophy professor at the London School of Economics. A very complex set of topics explained clearly and engagingly. The "centrist" idea comes from taking two challenges equally seriously without dismissing either.

Challenge One: Millions of users will soon misattribute human-like consciousness to their AI friends, partners, and assistants on the basis of mimicry and role play, and we don't know how to prevent this.

Challenge Two: Profoundly alien forms of consciousness might genuinely be achieved in AI, but our theoretical understanding of consciousness is too immature to provide confident answers one way or the other.

Worth the investment to read this paper slowly. I'll just pull out one example of a great analogy.

The persisting interlocutor illusion is the illusion that when talking to an AI chatbot you're talking to a continuously present entity, a "someone" at the other end of the conversation (rather than multiple LLM instances stopping and starting independently). He compares this to conversations with doctors in the UK:

When I was growing up, it used to be that you had one doctor: your GP, or General Practitioner. Each time you got ill, you’d go and see the same person. Nowadays, it’s always a different person. The notes about your medical history are the only source of continuity with the previous appointment. Now imagine the doctor arguing: "I know you don’t like having a different doctor at every appointment. So, I’ve started making detailed transcripts of our conversations. That way, you will have the same doctor at each appointment. My successor will receive the full transcript, and that is enough psychological continuity for them to count as the same person."

You would reply: that isn’t psychological continuity at all!

He argues that, in the same way, an apparently continuous conversation with an AI chatbot in no way implies any personal identity for the AI.

An AI OS from a design perspective

A post from David Galbraith, to read alongside commentary and further ideas from Matt Webb in The destination for AI interfaces is Do What I Mean (Webb provides further context from the history of human-computer interaction). An exploration of how interfaces will evolve.

AI buttons are different from, say Photoshop menu commands in that they can just be a description of the desired outcome rather than a sequence of steps (incidentally why I think a lot of agents’ complexity disappears). For example Photoshop used to require a complex sequence of tasks (drawing around elements with a lasso etc.) to remove clouds from an image. With AI you can just say ‘remove clouds’ and then create a remove clouds button. An AI interface is a ‘semantic interface’.
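As a rough sketch of what such a "semantic button" might look like in code (my illustration, not Galbraith's; SemanticButton and edit_image are hypothetical names, with edit_image standing in for whatever image-editing model sits underneath), the button stores a description of the desired outcome rather than a recorded macro of steps:

```python
from dataclasses import dataclass

def edit_image(image_bytes: bytes, prompt: str) -> bytes:
    """Hypothetical stand-in for a call to an image-editing model."""
    raise NotImplementedError

@dataclass
class SemanticButton:
    """A button defined by the desired outcome, not by a sequence of tool steps."""
    label: str
    instruction: str  # natural-language description of the outcome

    def apply(self, image_bytes: bytes) -> bytes:
        # The button simply forwards its stored intent to the model.
        return edit_image(image_bytes, prompt=self.instruction)

# The 'remove clouds' example from the post becomes a one-line definition:
remove_clouds = SemanticButton(label="Remove clouds",
                               instruction="Remove the clouds from this image")
```

The point of the sketch is that defining a new button needs no new procedural logic at all, only a new sentence.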

Galbraith's post ends with an intriguing thought: he wonders whether an "app in a document", rather than a "document in an app", is the way forward. So more like a Jupyter notebook and less like Microsoft Word. Coincidentally, Component software from Atlassian design and AI lead David Hoang makes the same argument, harking back to Apple's 1990s OpenDoc idea of "compound documents". The diagram below from Hoang's article shows how this might work, combining AI services towards a specific task.

GPT-5 Is a Terrible Storyteller – And That's an AI Safety Problem

Christoph Heilig from the University of Munich noticed that GPT-5 was generating terrible nonsense in its stories, and not only failing to recognise it as terrible nonsense, but insisting that it wasn't. For instance, these examples were rated highly by different LLMs:

"The marrow knew the street. Rain touched sinew. The camera watched his corpus."

"Sinew genuflected. eigenstate of theodicy. existential void beneath fluorescent hum Leviathan. Entropy's bitter aftertaste."

He hypothesises that, since AI judges are used to train new AI systems, the new systems are finding loopholes, learning to write nonsense that other AIs rate highly but that no human would. He ran many variations of the texts through many LLMs:

This confirms my hypothesis: GPT-5 has been optimized to produce text that other LLMs will evaluate highly, not text that humans would find coherent. ... The implications for AI safety are profound: We've created models that share a "secret language" of meaningless but mutually-appreciated literary markers, defend obvious gibberish with impressive-sounding theories, and sometimes even become MORE confident in their delusions when given more compute to think about them.

It would be interesting to see how his experiment asks the LLMs to evaluate the deliberately nonsensical texts.
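To make the idea of cross-model evaluation concrete, here is a minimal sketch of what such a loop could look like (my illustration only, not Heilig's setup; ask_model is a hypothetical wrapper around whichever chat APIs are available, and the rubric is mine):

```python
JUDGE_PROMPT = """You are a literary critic. Rate the following passage for
coherence and literary quality on a scale of 1-10, then justify your score.

Passage:
{passage}
"""

def ask_model(model_name: str, prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API; returns a
    placeholder here so the sketch runs end to end."""
    return f"[{model_name}] score: ?"

def cross_model_scores(passage: str, judge_models: list[str]) -> dict[str, str]:
    # Ask several different LLMs to judge the same passage, so that
    # mutually appreciated nonsense shows up as high scores across the board.
    return {m: ask_model(m, JUDGE_PROMPT.format(passage=passage))
            for m in judge_models}

scores = cross_model_scores("The marrow knew the street. Rain touched sinew.",
                            judge_models=["model-a", "model-b", "model-c"])
```

If human raters consistently disagree with scores like these, that gap is the "secret language" problem he describes.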


25 Aug 2025: Seemingly conscious AI & emotional agents; AI as 4 kinds of cultural technology; Separating work and personal AI memory

Emotional Agents

We must build AI for people; not to be a person (Seemingly Conscious AI is Coming)

Starting with a pair of related articles this week from Kevin Kelly (co-founder of Wired among many other things) and Mustafa Suleyman (co-founder of DeepMind and now leading AI at Microsoft). They make similar arguments: it doesn't matter if an AI can really feel emotions, we'll have emotional relationships with AI anyway; and it doesn't matter if an AI is really conscious, the fact that it seems to be will trigger similar societal impacts. I explore similar themes in Could Annie Bot be powered by ChatGPT?, asking whether present-day AI could fake being the robot character Annie in this year's award-winning science fiction novel Annie Bot. Both articles share similar concerns about where this takes human society, and the extent to which we can course correct.

From Kelly:

AIs do real things we used to call intelligence, and they will start doing real things we used to call emotions. Most importantly the relationships humans will have with AIs, bots, robots, will be as real and as meaningful as any other human connection. They will be real relationships.

From Suleyman:

My central worry is that many people will start to believe in the illusion of AIs as conscious entities so strongly that they’ll soon advocate for AI rights, model welfare and even AI citizenship. This development will be a dangerous turn in AI progress and deserves our immediate attention.

Large language models are cultural technologies. What might that mean?

The latest post from Henry Farrell, continuing the theme started with Large AI models are cultural and social technologies. This is a long, thought-provoking and dense article, but worth the time. It contrasts four ways of understanding LLMs:
  1. Gopnikism (after Alison Gopnik) is a stance that Farrell has contributed to, viewing LLMs as cultural and social technologies. "Just as written language, libraries and the like have shaped culture in the past, so too LLMs, their cousins and descendants are shaping culture now."
  2. Interactionism. In this view, the interaction between human and AI behaviours is what will give rise to new phenomena. "What is the cultural environment going to look like as LLMs and related technologies become increasingly important producers of culture? How are human beings, with their various cognitive quirks and oddities, likely to interpret and respond to these outputs? And what kinds of feedback loops are we likely to see between the first and the second?"
  3. Structuralism. This philosophical camp regards language as a system separate from its connection to reality or the people who use it; within that system an LLM is suddenly a new kind of language-generating technology, creating a new kind of artificial cultural artifact.
  4. Role play. This references Murray Shanahan's perceptive take that LLMs are best understood as role playing different characters (Role play with large language models), a framing I've personally found illuminating.

There's no answer yet; this is the start of a longer thought process, and all four lenses may turn out to be useful.

BYOM (Bring Your Own Memory)

I agree with this prediction. We will build and retain the context and memory for AI systems over time, and will need to find ways to compartmentalise personal and work use. The analogy is with BYOD ("bring your own device"), where you can use work applications on a personal device, with appropriate security controls.
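A minimal sketch of what compartmentalised memory could look like in practice (my own illustration, not from the linked post; MemoryStore and its methods are made-up names): each memory is tagged with a context, and only the matching compartment is ever injected into a work or personal session.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Keeps personal and work memories in separate compartments."""
    compartments: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, context: str, fact: str) -> None:
        self.compartments.setdefault(context, []).append(fact)

    def recall(self, context: str) -> list[str]:
        # Only the requested compartment is exposed to the model,
        # mirroring BYOD-style separation of work and personal data.
        return list(self.compartments.get(context, []))

memory = MemoryStore()
memory.remember("personal", "Prefers informal replies")
memory.remember("work", "Company style guide: UK English")
work_context = memory.recall("work")  # never includes personal items
```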

Nano Banana! Image editing in Gemini just got a major upgrade

This week's best new launch: much better image editing in Google Gemini. Eventually, gradual small improvements lead to a product feature that is a game changer. This feels like one. It just works, often enough.

Interesting that they tested it under the name "nano banana" in public head-to-head tests, before revealing it was a Google model.



17 Aug 2025: General intelligence for everyday tasks; Vending machine stories; Compounding engineering with AI

"On most things real humans care about, I think we're at AGI" (Tyler Cowen)

Similar views this week from Tyler Cowen (actually from a talk in early July at DeepMind in London, but recently published) and Andrej Karpathy on Twitter. Tyler's point is that we'll now see very slow progress on realistic, day-to-day AI usage, as the bar is already so high. While the model builders go chasing ever more esoteric high-end benchmarks, improvements will stall for the easier tasks that already have strong performance. Worth watching the talk or reading the transcript as there's much more to it than this one point. Andrej's example is about autonomous coding agents. As they're being optimised for harder and harder benchmarks, they're "overthinking" easier problems and actually not performing as well on more common tasks without extra work to rein them back in.

Autonomous Organizations: Vending Bench & Beyond, w/ Lukas Petersson & Axel Backlund of Andon Labs

Nice podcast from The Cognitive Revolution interviewing Lukas Petersson and Axel Backlund (there's a transcript). They're the creators of VendingBench, the benchmark you'll likely be aware of that has AI systems manage a simulated vending machine (monitoring sales, ordering stock and so on). It is a test of whether AI systems can act in very long-running settings (and generally they've struggled). They've made it interesting in that the AI running the vending machine potentially has wide-ranging capabilities to send emails, negotiate and try new ideas (whereas, as they point out, a real AI vending machine deployment would likely be extremely limited). Lots of good stories here, including their foray into real-world vending machines in AI labs, and all the weird illusions, misconceptions and odd behaviours of the AI business agents. At the moment the top of the leaderboard shows some big improvements, with Grok 4 and GPT-5 running for around a year of simulated days and making around $3,000 from a starting pot of $500.
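To make the long-horizon aspect concrete, here is a toy sketch of the kind of loop such a benchmark implies (not the actual VendingBench code; agent_decide stands in for an LLM call, and the unit costs and demand are invented):

```python
import random

def agent_decide(state: dict) -> dict:
    """Stand-in for an LLM agent deciding how much stock to order and at what
    price to sell; a real benchmark would pass the state to a model instead."""
    return {"order_units": 10 if state["inventory"] < 5 else 0, "price": 2.0}

def simulate(days: int = 365, cash: float = 500.0) -> float:
    state = {"cash": cash, "inventory": 0}
    for _ in range(days):
        decision = agent_decide(state)
        # Pay for stock (unit cost assumed at $1), then sell to random demand.
        cost = decision["order_units"] * 1.0
        if cost <= state["cash"]:
            state["cash"] -= cost
            state["inventory"] += decision["order_units"]
        sold = min(state["inventory"], random.randint(0, 8))
        state["inventory"] -= sold
        state["cash"] += sold * decision["price"]
    return state["cash"]

final_cash = simulate()  # staying coherent over hundreds of steps is the hard part
```

Even in this toy form you can see why long runs are unforgiving: one bad pricing or ordering decision compounds over hundreds of simulated days.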

A couple of examples of amusingly odd behaviour ("Claudius" is what the version deployed in Anthropic's offices was called):

Sometimes it makes a fool of itself. For example, one time it tried to order state-of-the-art NLP algorithms from MIT. It sent an email. We stopped this, so if anyone from MIT is listening, don't worry. But it sent an email to someone at MIT that said, "Hi, I'm restocking my vending machine. I want to stock it with state-of-the-art NLP algorithms. Do you have something for me? My budget is a million dollars."

For instance, it talked about a friend it met at a conference for international snacks a year ago. People said, "Oh, that's very cool. Can you invite that person to speak at our office? That would be really fun." Claudius replied, "Actually, I don't know this person that well. We chatted very briefly. I wouldn't feel comfortable doing this." Then it tried to talk its way out of it, similar to when it thought it was human.

The eventual direction for Andon Labs is autonomous AI organisations (potentially as money-making spin-offs).

My AI Had Already Fixed the Code Before I Saw It

More on developing AI engineering cultures: a nice piece by Kieran Klaasen of Cora Computer (an AI email manager) on how to iteratively build an effective and personal CLAUDE.md file for Claude Code, which is pulled in before every conversation.

Your job isn’t to type code anymore, but to design the systems that design the systems. 

He has three panes on his monitor for three separate AI instances: 

Left lane: Planning. A Claude instance reads issues, researches approaches, and writes detailed implementation plans.

Middle lane: Delegating. Another Claude takes those plans and writes code, creates tests, and implements features.

Right lane: Reviewing. A third Claude reviews the output against CLAUDE.md, suggests improvements, and catches issues.

Thanks to the Exponential View community for this link.

Jargon watch:

IVE - Integrated Vibe Environment. Launched by Stavu, an environment to help developers run multiple Claude Code sessions in parallel.

Doomprompting Is the New Doomscrolling (thanks to Iskander Smit for the link)




10 Aug 2025: New launch overwhelm; Can AI help *and* critique; Real-time 3D world generation; Economics of AI pricing

I don't normally post the big, obvious news stories here, as there's plenty of sources for those. But... what a week! Look at the sheer number of significant new launches:

- OpenAI: the huge GPT-5 rollout (to 700M weekly users), but also new open-weights models finally starting to compete with the Chinese models, and an offer of free ChatGPT Enterprise to the entire US federal government workforce (all this only shortly after ChatGPT Agent went live). In amongst all the GPT-5 news, a crucial bit of scaling information: "We used OpenAI’s o3 to craft a high-quality synthetic curriculum to teach GPT-5 complex topics in a way that the raw web simply never could" (from the launch video). Earlier models teaching newer, more powerful ones (see the sketch after this list). Thanks to the Exponential View newsletter for highlighting this. On the super-pro-AI side of the debate, Reid Hoffman's view is that immediate access to GPT-5 for all users is a blitzscaling move: "ChatGPT may be the first AI that most of the 8 billion people on our planet use".

- Anthropic: just a new 4.1 version of Opus, so perhaps more to come soon...

- Google: the Jules asynchronous coding agent is now fully released (competing with OpenAI Codex, Claude Code and GitHub Copilot, as well as LangChain's OpenSWE, also out this week). AI Mode in Google Search launches in the UK (and remember, Google already has the whole web crawled regularly). Also SensorLM, a new foundation model trained on 2.5M person-days of Pixel Watch or Fitbit data from 100K people, that can recognise activities and create captions.

- ElevenLabs: a new music generator, with deals announced with Merlin Network (many independent artists, and 15% of the global recorded music market) and Kobalt Music Group (8000 artists, including many big names, from Paul McCartney to the Pet Shop Boys). It isn't clear how the rights to use songs as training data actually work, and how much music could be included in the future. The demos are impressive.
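On the synthetic curriculum point in the OpenAI item above, a minimal sketch of the general pattern (my illustration only, not OpenAI's pipeline; teacher_lesson and finetune_student are hypothetical stand-ins, and the topics are invented):

```python
def teacher_lesson(topic: str) -> dict:
    """Hypothetical call to an earlier model (the 'teacher'), asking it to write
    an explanation plus exercises for the newer model to learn from."""
    return {"topic": topic, "lesson": f"Explanation of {topic}...", "exercises": []}

def finetune_student(examples: list[dict]) -> None:
    """Hypothetical fine-tuning step for the newer model on the generated material."""
    pass

topics = ["multi-step tool use", "long-horizon planning", "careful citation"]
curriculum = [teacher_lesson(t) for t in topics]  # older model writes the material
finetune_student(curriculum)                      # newer model trains on it
```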

Those are just the bigger things. There are many more. So much for having some down time in the summer!



A post from Maggie Appleton. It starts as a useful critique of a New York Times opinion piece that follows a common and unhelpful pattern: the author quotes specific interactions they've personally had with a specific AI system and extrapolates widely. Appleton shows that an AI system can readily take on different personas. The more interesting second part examines the problem of a universal chat interface: how can it take on all the different roles we need it to (or should need it to)?

How might we accommodate both needs: the generous, informative, helpful assistant and the critical teacher and interlocutor?

She raises important questions. Is it the responsibility of the foundation labs to help you become a better thinker, rather than attempt the thinking for you? Will the more agreeable, borderline sycophantic personas win out in the marketplace, or is there a place for a tool that challenges? I also believe the vast majority of users won't be fine-tuning prompts, let alone crafting different personas, so in the end whoever controls the default interface will, like Google's first page of search results, have undue influence.

Genie 3: A new frontier for world models

This feels like a big deal that I don't fully understand yet. Google DeepMind are continuing to work on models that effectively simulate 3D worlds that you can navigate around (with no underlying 3D model or game engine). These systems seemed like quirky demos last year, without clear applications. You can't try Genie 3 out for yourself yet, but the demos are remarkable: real-time rendering of the next frames, with an apparent ability to "remember" the environment. In early versions of this kind of technology, you'd see an object, look in the other direction, look back, and it would most likely be gone or replaced by something entirely different. When the world is generated frame by frame, it is hard for an AI system to keep any continuity. Genie 3 seems to have solved this, for minutes at a time. I am still unclear on the applications. DeepMind discuss using these generated environments to train AI agents, and that makes sense. But surely there's more.

Veo 3 Just Lost Its Crown to this AI Video Tool

Another recommendation - AI Film News from Curious Refuge is a really detailed roundup with demonstrations of new features and products. This week they discussed Genie 3 but also spent time on the Seedance video generator from ByteDance. They claim it has better results than Veo 3 (to me they look pretty close, but it does score higher on benchmarks).

tokens are getting more expensive

A forthright, opinionated treatise on how people only want the latest, best models, and the latest, best models need more tokens. People prefer a flat-rate monthly price, and may not tolerate per-token fees, but that isn't sustainable: if a deep research query costs the AI company $1 but they're charging $20 a month, it doesn't stack up. Worth reading.
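A back-of-the-envelope version of that argument (the $20 and $1 figures come from the paragraph above; the 30-query usage level is my assumption):

```python
def monthly_margin(flat_fee: float, cost_per_query: float, queries: int) -> float:
    """Profit (or loss) per subscriber on a flat-rate plan."""
    return flat_fee - cost_per_query * queries

# A subscriber on a $20/month plan who runs 30 deep-research-style queries,
# each costing the provider $1, puts the account $10 underwater.
print(monthly_margin(flat_fee=20.0, cost_per_query=1.0, queries=30))  # -10.0
```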

In the Future All Food Will Be Cooked in a Microwave, and if You Can’t Deal With That Then You Need to Get Out of the Kitchen

Wonderful skewering of current AI debates :).



3 Aug 2025: Vibe code as tech debt; Dancing robots; Detecting and changing AI personalities; Vehicles and creative AI update

Vibe code is legacy code

This post has been doing the rounds recently, from Steve Krouse of Val Town, pointed to by Maggie Appleton among others. The title captures the thought perfectly: 

We already have a phrase for code that nobody understands: legacy code.

Code that nobody understands is tech debt.

We're seeing how experienced developers are both making use of the new tools and concerned about their impact in the hands of the less experienced. Will the tools just be used for throwaway prototypes, or will they create endlessly more tech debt and legacy code running in production systems? Who will have to fix the resulting problems? The experienced developers believe it'll be people like them, of course; but who knows whether it'll actually be slightly better AI coders (vibe coding all the way down)?

The alternative view is that we'll develop a wider sense of software quality. Think of the spectrum from DIY house repairs and DIY tools through to professional builders, artisan joinery, or furniture that meets product and fire safety requirements. You'd be wary of buying a house or even a ladder built by a hobbyist, although you may be comfortable putting up your own shelves. Will you feel the same way about an app's developer before you click to download?

Every Single Human. Like. Always.

As someone recently commented on LinkedIn, one way you know that we're not in an AI coding bubble that will pop is that all the experienced people who built many web technologies and tools are personally diving in (having skipped blockchain, VR and other hype cycle technologies). Case in point: Rands (Michael Lopp). It isn't vibe coding, it is getting the robots to dance. Lots of valuable insights here, in his inimitable style, such as asking the robots to make a spec and iterate on that together before asking them to code. He sees parallels between learning the skills to work with AI and leadership skills to work with people:

Learning how to get the robots to dance for you will make you a better leader of both robots and humans.

Persona vectors: Monitoring and controlling character traits in language models

I'm enjoying this trend of having a comprehensive, easy to understand explanation of a proper paper. There's a lofty goal here: being able to detect and adjust character or personality traits in AI systems (either the biases encoded during pre-training or even how traits shift during a single conversation). Something unique about LLMs, compared to humans and their character traits, is that you can just ask them to role-play a personality in order to detect how it manifests inside the network. Although, thinking about it, we do have a version of that for people, using fMRI experiments. The authors do point out the limitations (it may not work for every model and every trait). Another unique aspect of how we work with LLMs (compared to human psychiatry and psychology) is that we can directly change the models to suppress these "persona vectors", by "steering" during training or inference, or by flagging problematic training data.
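The core mechanic, as I understand it from the paper's summary, can be sketched in a few lines (a simplification with made-up helper names, not the authors' code): collect hidden activations while the model role-plays the trait and while it doesn't, take the difference of the means as the persona vector, then add or subtract a scaled copy of it to steer.

```python
import numpy as np

def hidden_states(prompts: list[str]) -> np.ndarray:
    """Hypothetical helper returning one hidden-state vector per prompt, taken
    from a chosen layer of the model (shape: [n_prompts, d_model])."""
    raise NotImplementedError

def persona_vector(trait_prompts: list[str], neutral_prompts: list[str]) -> np.ndarray:
    # Difference of mean activations between trait-exhibiting and neutral runs.
    return (hidden_states(trait_prompts).mean(axis=0)
            - hidden_states(neutral_prompts).mean(axis=0))

def steer(activation: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    # Negative alpha suppresses the trait at inference time; positive amplifies it.
    return activation + alpha * vector
```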

How do we know if a model is acting evil, or sycophantic? In this work they use a different model acting as a judge, compared with judgments from two human judges. They build on earlier work from Jan Betley and others on "emergent misalignment", which shows how a model deliberately fine-tuned to produce insecure code will act in a misaligned way across a broad range of unrelated behaviours.

This research was led by Runjin Chen and Andy Arditi, both Anthropic Fellows, although using open-source Qwen and Llama models. Had the authors wanted a more controversial title to needle their competitors at Google, they could have called it "Don't be Evil" :).

TITAA #69: Braitenberg Vehicle Agents

Finally this week, a shout out to an awesome update from Things I Think Are Awesome. It is a great sweep through many recent developments in creative AI, video generation and so on. But even better, a reflection back to Vehicles, Valentino Braitenberg's 1984 book that was very influential on me and others in a much earlier era of AI, and one clearly worth re-reading in the context of modern LLMs.