AI Consciousness: A Centrist Manifesto
Fantastic bit of writing from Jonathan Birch, a philosophy professor at the London School of Economics: a very complex set of topics explained clearly and engagingly. The "centrist" idea comes from taking two challenges equally seriously, without dismissing either.
Challenge One: Millions of users will soon misattribute human-like consciousness to their AI friends, partners, and assistants on the basis of mimicry and role play, and we don't know how to prevent this.
Challenge Two: Profoundly alien forms of consciousness might genuinely be achieved in AI, but our theoretical understanding of consciousness is too immature to provide confident answers one way or the other.
Worth the investment of reading this paper slowly. I'll just pull out one great analogy as an example.
The persisting interlocutor illusion is the illusion that, when talking to an AI chatbot, you're talking to a continuously present entity, a "someone" at the other end of the conversation (rather than to multiple LLM instances stopping and starting independently). He compares this to conversations with doctors in the UK:
When I was growing up, it used to be that you had one doctor: your GP, or General Practitioner. Each time you got ill, you’d go and see the same person. Nowadays, it’s always a different person. The notes about your medical history are the only source of continuity with the previous appointment. Now imagine the doctor arguing: "I know you don’t like having a different doctor at every appointment. So, I’ve started making detailed transcripts of our conversations. That way, you will have the same doctor at each appointment. My successor will receive the full transcript, and that is enough psychological continuity for them to count as the same person."
You would reply: that isn’t psychological continuity at all!
He argues that, in the same way, an apparently continuous conversation with an AI chatbot in no way implies any personal identity for the AI.
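The mechanics behind the illusion are worth seeing concretely. Here's a minimal, hypothetical sketch (with `call_llm` standing in for any stateless chat-completion API): every turn is answered by what is effectively a fresh model instance that only ever sees the transcript.

```python
# Minimal sketch: chatbot "continuity" is nothing but the resent transcript.
# call_llm is a hypothetical stand-in for any stateless chat-completion API.

def call_llm(transcript: list[dict]) -> str:
    """Pretend model call: a fresh 'instance' that only ever sees the transcript."""
    return f"(reply generated from {len(transcript)} prior messages)"

transcript: list[dict] = []  # the only thing that persists between turns

for user_turn in ["Hello", "Do you remember me?", "Who am I talking to?"]:
    transcript.append({"role": "user", "content": user_turn})
    reply = call_llm(transcript)  # nothing else carries over between calls
    transcript.append({"role": "assistant", "content": reply})
    print(user_turn, "->", reply)
```

Each call could land on a different process, machine, or model snapshot; the transcript plays exactly the role of the doctor's notes.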
An AI OS from a design perspective
A post from David Galbraith exploring how interfaces will evolve, to read alongside commentary and further ideas from Matt Webb in The destination for AI interfaces is Do What I Mean, which provides further context from the history of human-computer interaction.
AI buttons are different from, say Photoshop menu commands in that they can just be a description of the desired outcome rather than a sequence of steps (incidentally why I think a lot of agents’ complexity disappears). For example Photoshop used to require a complex sequence of tasks (drawing around elements with a lasso etc.) to remove clouds from an image. With AI you can just say ‘remove clouds’ and then create a remove clouds button. An AI interface is a ‘semantic interface’.
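A hedged sketch of what such a semantic button might amount to in code: the button stores only a description of the desired outcome, and `edit_image` below is a hypothetical stand-in for whichever instruction-following image model the interface would actually call.

```python
from dataclasses import dataclass

def edit_image(image: bytes, instruction: str) -> bytes:
    """Hypothetical stand-in for an instruction-following image-editing model."""
    print(f"Model applies: {instruction!r}")
    return image  # a real model would return the edited image

@dataclass
class SemanticButton:
    label: str
    instruction: str  # a description of the outcome, not a sequence of steps

    def apply(self, image: bytes) -> bytes:
        return edit_image(image, self.instruction)

# The user "creates" the button simply by naming the outcome they want.
remove_clouds = SemanticButton("Remove clouds", "remove the clouds from this image")
edited = remove_clouds.apply(b"...raw image bytes...")
```

The lasso-and-heal sequence of steps lives inside the model; the interface keeps only the intent.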
The post ends with an intriguing idea: he wonders whether an "app in a document", rather than a "document in an app", is the way forward. So more like a Jupyter notebook and less like Microsoft Word. Coincidentally, Component software from Atlassian design and AI lead David Hoang makes the same argument, harking back to Apple's 1990s OpenDoc idea of "compound documents". The diagram below from Hoang's article shows how this might work, combining AI services towards a specific task.
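Alongside Hoang's diagram, a rough, hypothetical code sketch of the same idea: the document is just a list of typed blocks, and each block kind is handled by a different component, one of which happens to be an AI service. The block kinds and component registry here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    kind: str     # e.g. "text", "table", "ai_summary" (invented kinds)
    content: str

@dataclass
class CompoundDocument:
    title: str
    blocks: list[Block] = field(default_factory=list)

# Invented registry: each block kind maps to the component that handles it.
COMPONENTS = {
    "text": lambda b: b.content,
    "table": lambda b: f"[table component renders: {b.content!r}]",
    "ai_summary": lambda b: f"[AI service is asked to: {b.content}]",
}

doc = CompoundDocument(
    title="Quarterly report",
    blocks=[
        Block("text", "Sales grew 12% this quarter."),
        Block("table", "region,revenue\nEMEA,1.2m\nAPAC,0.9m"),
        Block("ai_summary", "Summarise the table above for an executive audience."),
    ],
)

for block in doc.blocks:
    print(COMPONENTS[block.kind](block))
```

The document owns the structure; the components, AI ones included, are plugged in per block, which is the OpenDoc instinct in modern clothes.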
GPT-5 Is a Terrible Storyteller – And That's an AI Safety Problem
Christoph Heilig from the University of Munich noticed that GPT-5 was generating terrible nonsense in its stories, and not only failing to realise it was terrible nonsense, but insisting that it wasn't. For instance, these examples were rated highly by different LLMs:
"The marrow knew the street. Rain touched sinew. The camera watched his corpus."
"Sinew genuflected. eigenstate of theodicy. existential void beneath fluorescent hum Leviathan. Entropy's bitter aftertaste."
He hypothesises that, since AI judges are used to train new AI systems, the new systems are finding loopholes: learning to write nonsense that other AIs rate highly but that no human would. He ran many variations of the texts through many LLMs:
This confirms my hypothesis: GPT-5 has been optimized to produce text that other LLMs will evaluate highly, not text that humans would find coherent. ... The implications for AI safety are profound: We've created models that share a "secret language" of meaningless but mutually-appreciated literary markers, defend obvious gibberish with impressive-sounding theories, and sometimes even become MORE confident in their delusions when given more compute to think about them.
It would be interesting to see how his experiment asks the LLMs to evaluate the deliberately nonsensical texts.
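His post doesn't reproduce the judging prompts here, so as a point of reference, this is my assumption of what a typical LLM-as-judge rating harness looks like, not Heilig's actual setup; `call_llm` is a hypothetical stand-in for any chat model API, and its canned reply just keeps the sketch runnable.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model API; returns the judge's reply."""
    return "Rating: 9/10. Striking imagery and bold syntactic compression."

JUDGE_PROMPT = (
    "You are a literary critic. Rate the following passage for coherence and "
    "quality on a scale of 1-10, then justify your rating.\n\nPassage:\n{text}"
)

def judge(text: str) -> int:
    """Ask the judge model for a 1-10 rating and parse the first number in its reply."""
    reply = call_llm(JUDGE_PROMPT.format(text=text))
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else 0

passages = [
    "The marrow knew the street. Rain touched sinew. The camera watched his corpus.",
    "She closed the door quietly and walked out into the rain.",
]

for passage in passages:
    print(judge(passage), "-", passage)
```

The reward-hacking worry is precisely that a model trained against a judge like this learns whatever quirks earn the 9s and 10s, whether or not a human reader would agree.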