11 Oct 2025: AI novel that regenerates daily based on news; Sam Altman on platform strategy; AI and the end of thinking; Chrome's built-in AI model; Big new AI reports

The Next Four Years


Interesting concept: an AI-written novel that regenerates daily based on recent news. As I look at it today on 11 Oct 2025, chapter 1 starts on 15 Oct 2025, with a scientist at the Centers for Disease Control seeing concerning H5N1 bird flu virus mutation rates, in the context of massive budget and job cuts at the CDC. A good counterpart to the 27 Sept post that looked at AI superforecasting.
This experiment set out to answer two questions:
First, can AI analyse eight months of U.S. government upheaval and write a near-term speculative fiction novel that predicts an imaginable future for America?
And next, can AI automatically update that novel daily based on the 24-hour news cycle without any human editorial intervention?
The credits list "Author: Claude Sonnet 4 | Editor: Gemini 2.5 Pro | Researcher: GPT-5". Thanks to Webcurios for recommending this.
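
For a sense of the mechanics, here is a rough sketch of how such a daily pipeline might be wired up, mirroring the researcher, author and editor roles in the credits. All the names and the callModel helper are hypothetical; this is not the project's actual code.

  // Hypothetical daily regeneration pipeline (not the project's actual code),
  // mirroring the researcher -> author -> editor roles from the credits.
  type Role = "researcher" | "author" | "editor";

  // Placeholder: route each role to whichever LLM provider you choose.
  async function callModel(role: Role, prompt: string): Promise<string> {
    throw new Error("wire up a model provider here");
  }

  async function regenerateNovel(todaysHeadlines: string[]): Promise<string> {
    // Researcher: condense the last 24 hours of news into a briefing.
    const briefing = await callModel("researcher",
      `Summarise the key developments in these headlines:\n${todaysHeadlines.join("\n")}`);
    // Author: update the near-future plot so it extrapolates from the briefing.
    const draft = await callModel("author",
      `Rewrite the novel's opening chapters to follow plausibly from:\n${briefing}`);
    // Editor: check continuity and tone; no human intervenes at any step.
    return callModel("editor", `Edit this draft for continuity and style:\n${draft}`);
  }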

An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout

In attempting to make sense of the blizzard of OpenAI's recent launches, this conversation between Ben Thompson (who writes Stratechery) and Sam Altman is useful. Stratechery has been a great source of technology strategy thinking over the years. The latest take compares ChatGPT to Windows' rise to dominance (OpenAI’s Windows Play), with popularity among users attracting developers:

This is a push to make ChatGPT the operating system of the future. Apps won’t be on your phone or in a browser; they’ll be in ChatGPT, and if they aren’t, they simply will not exist for ChatGPT users.

There are lots of interesting quotes in the interview; I'm just picking out a few here. On copyright issues in products like Sora, he's predicting that rights holders will actually want their IP and content to be used:

I predict in another year, maybe less or something like that, the thing will be, “OpenAI is not being fair to me and not putting my content in enough videos and we need better rules about this”, because people want the deep connection with the fans.

And in terms of what's next:
We are going to spend a lot on infrastructure, we are going to make a bet, the company scale bet that this is the right time to do it. Given where we are with the research, with our business, with the product, what we see happening and is it the right decision or not? We will find out, but it is the decision we’re going to make.
Give us a few months and it’ll all make sense and we’ll be able to talk about the whole — we are not as crazy as it seems. There is a plan. ... I do feel like this is a once in a lifetime opportunity for all of us and we’ll take the run at it.


Two links that are worth reading together. The first is a polemical essay by Derek Thompson (journalist and co-author of Abundance with Ezra Klein). He looks at declining writing and reading in the US, and sees AI as the latest phenomenon, after TV, the web, social media, smartphones and streaming media, that "steals our focus" and encroaches on the space for deep thinking.
Do not let stories on the rise of “thinking machines” distract you from the real cognitive challenge of our time. It is the decline of thinking people.
The second is a glimpse into China's encouragement of AI in education, come what may: "Beijing is making AI education mandatory in schools", "Guangxi province has instructed schools to experiment with AI teachers, AI career coaches, and AI mental health counsellors". China will be where we first see how Thompson's concerns play out.
The one-foot tall AlphaDog ... was developed by robotics startup Weilan and is powered by DeepSeek’s AI model. In addition to practicing English with Wu’s son, it chats with him about current events, dances to his guitar music, and, through its built-in camera, helps Wu monitor the home when she is away. It has become a part of the family... “My son needs company, but we are a one-child family,” Wu said. “He asks the dog about all kinds of things — national news, weather, geography. Through AlphaDog, he is learning what the world is like.”

How to Try Chrome’s Hidden AI Model

I hadn't realised that Chrome is already shipping with a fully functional local LLM (the tiny but still multimodal Gemini Nano, which also ships with some Android phones). This post explains how to activate and access it. This kind of distribution and usage will be a lot easier for less technical folks than using something like Ollama, and will be a disruptive direction if it gets taken up (using a local LLM from or within a web page is quite different to installing an app). Very small models that can run on laptops or phones are advancing rapidly but get less press: look at the new 3B parameter Jamba reasoning model from AI21 or the much smaller 7M parameter Tiny Recursion Model from Samsung's AI lab in Montreal.
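
If you want a feel for what this looks like from a web page, here is a minimal sketch based on the experimental Prompt API. I'm assuming the LanguageModel.availability() / create() / prompt() shape from Chrome's developer documentation; the exact names and required flags have changed between Chrome versions, so treat this as illustrative rather than definitive.

  // Sketch of prompting Chrome's built-in Gemini Nano from page JavaScript.
  // Assumes the experimental Prompt API shape (LanguageModel.availability /
  // create / prompt); names and required flags vary by Chrome version.
  async function askLocalModel(question: string): Promise<string> {
    const LM = (globalThis as any).LanguageModel;
    if (!LM) throw new Error("Prompt API not exposed in this browser");

    // Check whether the on-device model is downloaded and ready.
    if ((await LM.availability()) === "unavailable") {
      throw new Error("Gemini Nano is not available on this device");
    }

    // Create a session and prompt it; inference runs locally, no server call.
    const session = await LM.create();
    return session.prompt(question);
  }

  askLocalModel("Summarise this page in one sentence.").then(console.log);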

Important new reports out recently:
  • The annual State of AI report from Nathan Benaich and Air Street Capital is always comprehensive and insightful. It is also huge, with a 313-slide deck. Recommended.
  • A report on the State of AI-assisted Software Development from DORA. DORA is the DevOps Research and Assessment group, with a very long-running research programme on software development (acquired by Google in 2018). A lot in here about the practicalities and cultural aspects of real-life AI adoption in software teams.









27 Sep 2025: General purpose vision understanding; AI superforecasting; Western bias; the AI megasystem

Video models are zero-shot learners and reasoners

Pivotal insights from Google DeepMind, published this week. Everyone was surprised at the sheer variety of tasks that LLMs could tackle; no one expected that a next-word prediction machine could write good code, or reason through problems, or handle many of the other applications we now take for granted that weren't previously considered purely language or writing tasks. This work suggests that video models are similar, albeit a few years earlier in their evolution. They show a remarkable range of activities that Veo 3 can perform. Remember, Veo 3's job is just to produce a series of frames for a short video (and accompanying audio), just like an LLM's job is to produce a series of words.

Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn’t explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo’s emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.

This is easiest to understand with an example, one of the very many presented. Can a video generation model successfully find a path through a maze? The model is given the maze as a starting image and simply asked to generate an animation of what happens next, given a prompt. The prompt starts with: "Without crossing any black boundary, the grey mouse from the corner skillfully navigates the maze by walking around until it finds the yellow cheese."

Here's the result:

(I've actually picked an example that only worked in 17% of their experiments, but there are many others with much higher success rates. The mouse in the maze makes a good video though! The expectation is that, like LLMs, these capabilities will continue to improve.)

British AI startup beats humans in international forecasting competition

Asimov's Foundation series introduced the fictional science of psychohistory, which can predict broad societal trends and events across a galactic civilisation. Mantic is a startup attempting to build an initial version. I hadn't realised that forecasting is a competitive sport. The Metaculus Cup sets a number of prediction challenges; answers are submitted and scored two weeks later (so it is quite a short time frame). Mantic achieved 8th place in the summer 2025 contest, the highest ever for a bot, across a wide variety of questions predicting developments in Ukraine and Gaza, sporting results, elections, and all kinds of political events. Mantic's approach appears to be a multi-agent system:

Mantic breaks down a forecasting problem into different jobs and assigns them to a roster of machine-learning models including OpenAI, Google and DeepSeek, depending on their strengths.

Using AI (rather than human "superforecasters") opens up possibilities for faster experimentation. They can do "backtesting", giving the AI access to information prior to a certain date and then asking for predictions, where the outcome is already known. And they can work at much greater speed and scale. It will be interesting to see if this kind of technology starts being applied outside of finance and trading.
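
To make the backtesting idea concrete, here is a toy sketch (hypothetical names throughout; this is not Mantic's system): only let the forecaster see evidence published before a cutoff date, then score its probability against the outcome we already know.

  // Toy backtesting harness for an AI forecaster (not Mantic's actual system).
  interface Doc { publishedAt: Date; text: string; }

  // Placeholder for the multi-agent pipeline; returns P(event) in [0, 1].
  async function forecast(question: string, evidence: Doc[]): Promise<number> {
    throw new Error("call your forecasting agents here");
  }

  async function backtest(question: string, cutoff: Date, archive: Doc[], outcome: boolean) {
    // The key trick: the model only sees documents from before the cutoff,
    // even though we already know how the question resolved.
    const visible = archive.filter(d => d.publishedAt < cutoff);
    const p = await forecast(question, visible);
    // Brier score: squared error of the probability (lower is better).
    const brier = (p - (outcome ? 1 : 0)) ** 2;
    return { probability: p, brier };
  }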

Which Humans?

This research from the Culture, Cognition, Coevolution Lab at Harvard looks at how LLMs answer questions compared to people from different cultures and countries. As they state:

Technical reports often compare LLMs’ outputs with “human” performance on various tests. Here, we ask, “Which humans?” Much of the existing literature largely ignores the fact that humans are a cultural species with substantial psychological diversity around the globe that is not fully captured by the textual data on which current LLMs have been trained.

It's introduced me to a new acronym - WEIRD - Western, Educated, Industrialised, Rich, and Democratic. WEIRD populations "tend to be more individualistic, independent, and impersonally prosocial (e.g., trusting of strangers) while being less morally parochial, less respectful toward authorities, less conforming, and less loyal to their local groups." Unsurprisingly, LLMs are trained on very WEIRD-biased text ("most of the textual data on the internet are produced by WEIRD people (and primarily in English)"), and so we get the "WEIRD-in WEIRD-out" problem. The World Values Survey (WVS) is a long-running international survey that has been run in waves since 1981, covering values, norms, beliefs, and attitudes around politics, religion, family, work, identity, trust, and well-being. By essentially getting ChatGPT to answer the WVS questions, the researchers can place it on the same scale for comparison. The graph below shows the WEIRD bias pretty clearly: ChatGPT's answers correlate much more strongly with those from countries like the US.
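
The comparison boils down to: collect the model's answers to the WVS items, then correlate them with each country's average human responses. A simplified sketch (not the paper's exact protocol; the helper names are mine):

  // Simplified sketch of the comparison (not the paper's exact protocol):
  // correlate a model's answers to WVS items with each country's averages;
  // a higher correlation means a closer cultural match.
  function pearson(xs: number[], ys: number[]): number {
    const mean = (v: number[]) => v.reduce((a, b) => a + b, 0) / v.length;
    const mx = mean(xs), my = mean(ys);
    let num = 0, dx = 0, dy = 0;
    for (let i = 0; i < xs.length; i++) {
      num += (xs[i] - mx) * (ys[i] - my);
      dx += (xs[i] - mx) ** 2;
      dy += (ys[i] - my) ** 2;
    }
    return num / Math.sqrt(dx * dy);
  }

  // modelAnswers: the LLM's numeric answer per survey item;
  // countryAverages: average human answer per item, keyed by country.
  function culturalSimilarity(modelAnswers: number[], countryAverages: Record<string, number[]>) {
    return Object.fromEntries(
      Object.entries(countryAverages).map(([c, avgs]) => [c, pearson(modelAnswers, avgs)] as const)
    );
  }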

Why the AI “megasystem problem” needs our attention

Not the usual AI doomer nonsense. Quite the opposite: a depressingly realistic view from Susan Schneider (a philosophy professor at Florida Atlantic University) on likely problems that will come not from a single superintelligence created in some lab, but from the "megasystem":

"But the real risk isn’t one system going rogue. It’s a web of systems interacting, training one another, colluding in ways we don’t anticipate.... Losing control of a megasystem is far more plausible than a single AI going rogue. And it’s harder to monitor, because you can’t point to one culprit — you’re dealing with networks."

It has some parallels to systemic risk in financial markets, but the effect on individuals and culture makes it a different kind of problem:

Individuals need to cultivate awareness. Recognize the risks of addiction and homogeneity. Push for friction in learning. Demand transparency about how these tools shape our thought patterns. Without cultural pressure, policy alone won’t be enough.


21 Sep 2025: Learning to predict diseases; how to guarantee reproducibility; why we don't hallucinate as much as AI systems; the explosion of image generation capabilities

Apologies for the summer holiday hiatus; weekly updates should now resume!

Learning the natural history of human disease with generative transformers

First up, a significant piece of work that points towards a big new research area. Rather than creating a large language model, this group from the German Cancer Research Centre and the University of Heidelberg, alongside the European Bioinformatics Institute in Cambridge, is creating a large health model. They are using data from the 500,000 volunteers for UK Biobank to create a model that predicts disease progression across multiple diseases, and testing it with similar data from Finland. It is very promising, as it can already replicate the accuracy of some existing long-standing risk predictors, and it only took an hour of GPU time to train. They also created and published a synthetic dataset, and it appears that using that instead of real people's data was only slightly less accurate. Useful synthetic data will speed up health research: if it isn't, and doesn't include, personal data, it should be far easier to distribute and work with.
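
To picture what a "large health model" is trained on, here is an illustrative encoding (my simplification, not the paper's actual scheme): each person becomes a chronological sequence of health events, and the transformer learns to predict the next one, much as an LLM predicts the next word.

  // Illustrative patient encoding (my simplification, not the paper's scheme):
  // each person is a chronological sequence of (age, diagnosis) tokens, and a
  // generative transformer is trained to predict the next event.
  interface HealthEvent { ageYears: number; code: string; } // e.g. an ICD-10 code

  function toTokenSequence(events: HealthEvent[]): string[] {
    return [...events]
      .sort((a, b) => a.ageYears - b.ageYears)
      .map(e => `AGE_${Math.floor(e.ageYears)} ${e.code}`);
  }

  // Two diagnoses become a short sequence the model can learn to extend.
  console.log(toTokenSequence([
    { ageYears: 54.2, code: "E11" }, // type 2 diabetes
    { ageYears: 61.7, code: "I21" }, // heart attack
  ])); // -> ["AGE_54 E11", "AGE_61 I21"]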

Defeating Nondeterminism in LLM Inference

A technical report from Thinking Machines Lab (founded by former OpenAI CTO Mira Murati) that looks at ways to make LLM output fully deterministic. It is quite a technical area, as it comes down to very detailed implementation design, like how GPU computation is parallelised and how work is batched. However, knowing that we could have fully reproducible, deterministic LLM outputs (given some cost or computation penalties) would be important for domains like healthcare or law. Beware that this isn't peer-reviewed or published as a paper yet.
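
The underlying culprit is easy to demonstrate: floating-point addition isn't associative, so if a kernel sums values in a different order (because the batch size or parallel split changed), the "same" computation can give a slightly different answer. A toy illustration of the effect, not their fix:

  // Toy illustration of why reduction order matters (not the report's fix):
  // floating-point addition is not associative, so summing the same numbers
  // in a different order, as a differently batched or parallelised kernel
  // might, can give a slightly different result.
  const values = Array.from({ length: 1_000_000 }, (_, i) => 1 / (i + 1));

  const forward = values.reduce((a, b) => a + b, 0);                 // left to right
  const backward = [...values].reverse().reduce((a, b) => a + b, 0); // reversed order

  console.log(forward === backward);          // very likely false
  console.log(Math.abs(forward - backward));  // tiny but non-zero difference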

Knowledge and memory

I like this short piece by author Robin Sloan, because he points out something obvious that needed putting into words. We have an episodic, autobiographical memory that means we remember the process of how we learned things. AI systems don't. They appear in the world with a fully formed language generation capability. One of the reasons we're less likely to fabricate stories thinking they're true is that we'll have a history with those truths; we'll remember when we learned them.


This is an extensive repository of currently 91 examples of what you can do with the new Google Nano Banana image generation tool. The longer it has been available, the more capabilities people have figured out. Each one has examples and a detailed prompt. Everything from generating a photo of a scene from a map to creating movie storyboards. We're still in the infancy of understanding how these tools will be deployed.


31 Aug 2025: Two challenges for AI consciousness; Combining AI tools around a document; AI training AI leading to terrible stories

AI Consciousness: A Centrist Manifesto

Fantastic bit of writing from Jonathan Birch, a philosophy professor at the London School of Economics. A very complex set of topics explained clearly and engagingly. The "centrist" idea comes from considering two challenges equally seriously without dismissing either.

Challenge One: millions of users will soon misattribute human-like consciousness to their AI friends, partners, and assistants on the basis of mimicry and role play, and we don't know how to prevent this.

Challenge Two: Profoundly alien forms of consciousness might genuinely be achieved in AI, but our theoretical understanding of consciousness is too immature to provide confident answers one way or the other.

Worth the investment to read this paper slowly. I'll just pull out one example of a great analogy.

The persisting interlocutor illusion is the illusion that when talking to an AI chatbot you're talking to a continuously present entity, a "someone" at the other end of the conversation (rather than multiple LLM instances stopping and starting independently). He compares this to conversations with doctors in the UK:

When I was growing up, it used to be that you had one doctor: your GP, or General Practitioner. Each time you got ill, you’d go and see the same person. Nowadays, it’s always a different person. The notes about your medical history are the only source of continuity with the previous appointment. Now imagine the doctor arguing: "I know you don’t like having a different doctor at every appointment. So, I’ve started making detailed transcripts of our conversations. That way, you will have the same doctor at each appointment. My successor will receive the full transcript, and that is enough psychological continuity for them to count as the same person."

You would reply: that isn’t psychological continuity at all!

He argues that, in the same way, an apparently continuous conversation with an AI chatbot in no way implies any personal identity for the AI.

An AI OS from a design perspective

A post from David Galbraith exploring how interfaces will evolve, to read alongside commentary and further ideas from Matt Webb in The destination for AI interfaces is Do What I Mean, which provides further context from the history of human-computer interaction.

AI buttons are different from, say Photoshop menu commands in that they can just be a description of the desired outcome rather than a sequence of steps (incidentally why I think a lot of agents’ complexity disappears). For example Photoshop used to require a complex sequence of tasks (drawing around elements with a lasso etc.) to remove clouds from an image. With AI you can just say ‘remove clouds’ and then create a remove clouds button. An AI interface is a ‘semantic interface’.
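
As a sketch of what a "semantic button" might amount to in code (hypothetical names; not Galbraith's or any real product's API): the button stores a description of the desired outcome and hands it to a model, rather than encoding a sequence of steps.

  // Sketch of a "semantic button" (hypothetical API, not any real product's):
  // the button is just a saved description of the desired outcome.
  interface SemanticButton { label: string; instruction: string; }

  // Placeholder for whichever image-editing model you use.
  async function editWithModel(image: Blob, instruction: string): Promise<Blob> {
    throw new Error("wire up an image model here");
  }

  // The user describes the outcome once; pressing the button replays it.
  const removeClouds: SemanticButton = {
    label: "Remove clouds",
    instruction: "Remove the clouds from this image.",
  };
  // later: const edited = await editWithModel(currentImage, removeClouds.instruction);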

It ends with an intriguing idea, where he wonders if the idea of an "app in a document" rather than a "document in an app" is the way forward. So more like a Jupyter notebook and less like Microsoft Word. Coincidentally, Component software from Atlassian design and AI lead David Hoang has the same argument, harking back to Apple's 1990s OpenDoc idea to have "compound documents". The diagram below from Hoang's article shows how this might work, combining AI services towards a specific task.

GPT-5 Is a Terrible Storyteller – And That's an AI Safety Problem

Christoph Heilig from the University of Munich noticed that GPT-5 was generating terrible nonsense in its stories. Not only did it not realise the output was nonsense, it insisted that it wasn't. These examples, for instance, were rated highly by different LLMs:

"The marrow knew the street. Rain touched sinew. The camera watched his corpus."

"Sinew genuflected. eigenstate of theodicy. existential void beneath fluorescent hum Leviathan. Entropy's bitter aftertaste."

He hypothesises that, since AI judges are used to train new AI systems, the new systems are finding loopholes, learning to write nonsense that other AIs rate highly but that no human would. He ran many variations of texts through many LLMs:

This confirms my hypothesis: GPT-5 has been optimized to produce text that other LLMs will evaluate highly, not text that humans would find coherent. ... The implications for AI safety are profound: We've created models that share a "secret language" of meaningless but mutually-appreciated literary markers, defend obvious gibberish with impressive-sounding theories, and sometimes even become MORE confident in their delusions when given more compute to think about them.

It would be interesting to see how his experiment asks the LLMs to evaluate the deliberately nonsensical texts.
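
One plausible shape for that kind of experiment (my guess, not Heilig's actual setup): ask several judge models to score each text variant on a fixed rubric, then compare those scores with human ratings of the same texts.

  // Sketch of an LLM-as-judge comparison (my guess at the experiment's shape,
  // not Heilig's actual setup): several judge models score each text variant,
  // and the scores are later compared with human ratings of the same texts.
  async function judgeScore(judgeModel: string, text: string): Promise<number> {
    // Placeholder: prompt the judge, e.g. "Rate the literary quality of this
    // passage from 1 to 10. Reply with a number only."
    throw new Error("call the judge model here");
  }

  async function scoreVariants(judges: string[], variants: string[]) {
    const results: { text: string; scores: Record<string, number> }[] = [];
    for (const text of variants) {
      const scores: Record<string, number> = {};
      for (const judge of judges) {
        scores[judge] = await judgeScore(judge, text);
      }
      results.push({ text, scores });
    }
    return results; // compare these against human ratings to spot divergence
  }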


25 Aug 2025: Seemingly conscious AI & emotional agents; AI as 4 kinds of cultural technology; Separating work and personal AI memory

Emotional Agents

We must build AI for people; not to be a person (Seemingly Conscious AI is Coming)

Starting with a pair of related articles this week from Kevin Kelly (co-founder of Wired, among many other things) and Mustafa Suleyman (co-founder of DeepMind and now leading AI at Microsoft). They make similar arguments: it doesn't matter if an AI can really feel emotions; we'll have emotional relationships with AI anyway. And it doesn't matter if an AI is really conscious; the fact that it seems to be will trigger similar societal impacts. I explore similar themes in Could Annie Bot be powered by ChatGPT?, an exploration of whether present-day AI could fake being the robot character Annie in this year's award-winning science fiction novel Annie Bot. Both articles share similar concerns about where this takes human society, and the extent to which we can course correct.

AIs do real things we used to call intelligence, and they will start doing real things we used to call emotions. Most importantly the relationships humans will have with AIs, bots, robots, will be as real and as meaningful as any other human connection. They will be real relationships.
My central worry is that many people will start to believe in the illusion of AIs as conscious entities so strongly that they’ll soon advocate for AI rights, model welfare and even AI citizenship. This development will be a dangerous turn in AI progress and deserves our immediate attention.

Large language models are cultural technologies. What might that mean?

The latest post from Henry Farrell, continuing the theme started with Large AI models are cultural and social technologies. This is a long, thought provoking and dense article, but worth the time. It contrasts four ways of understanding LLMs:
  1. Gopnikism (after Alison Gopnik) is a stance that Farrell has contributed to, viewing LLMs as cultural and social technologies. "Just as written language, libraries and the like have shaped culture in the past, so too LLMs, their cousins and descendants are shaping culture now."
  2. Interactionism. In this view, the interaction between human and AI behaviours is what will give rise to new phenomena. "What is the cultural environment going to look like as LLMs and related technologies become increasingly important producers of culture? How are human beings, with their various cognitive quirks and oddities, likely to interpret and respond to these outputs? And what kinds of feedback loops are we likely to see between the first and the second?"
  3. Structuralism. This philosophical camp regards language as a system separate from its connection to reality or the people who use it, a system in which an LLM is suddenly a new kind of language-generating technology, creating a new kind of artificial cultural artifact.
  4. Role play. This references Murray Shanahan's perceptive take that LLMs are best understood as role playing different characters (Role play with large language models), a framing I've personally found illuminating.

There's no answer yet; this is the start of a longer thought process, and all four lenses may turn out to be useful.

BYOM (Bring Your Own Memory)

I agree with this prediction. We will build and retain the context and memory for AI systems over time, and will need to find ways to compartmentalise personal and work use. The analogy is with BYOD ("bring your own device"), where you can use work applications on a personal device, with appropriate security controls.

Nano Banana! Image editing in Gemini just got a major upgrade

This week's best new launch: much better image editing in Google Gemini. Eventually, gradual small improvements lead to a product feature that is a game changer. This feels like one. It just works, often enough.

Interesting that they tested it under the name "nano banana" in public head-to-head tests, before revealing it was a Google model.