21 Sep 2025: Learning to predict diseases; how to guarantee reproducibility; why we don't hallucinate as much as AI systems; the explosion of image generation capabilities

Apologies for the summer holiday hiatus; weekly updates should now resume!

Learning the natural history of human disease with generative transformers

First up, a significant piece of work that points towards a big new research area. Rather than creating a large language model, this group from the German Cancer Research Centre and the University of Heidelberg, alongside the European Bioinformatics Institute in Cambridge, is creating a large health model. They use data from the 500,000 UK Biobank volunteers to train a model that predicts disease progression across multiple diseases, and validate it on similar data from Finland. It is very promising: it already matches the accuracy of some long-standing risk predictors, and it took only an hour of GPU time to train. They also created and published a synthetic dataset, and training on that instead of real people's data was only slightly less accurate. Useful synthetic data will speed up health research: if it neither is nor contains personal data, it should be far easier to distribute and work with.

Defeating Nondeterminism in LLM Inference

A technical report from Thinking Machines Lab (founded by former OpenAI CTO Mira Murati) that looks at ways to make LLM output fully deterministic. It is quite a technical area, coming down to detailed implementation choices such as how GPU computation is parallelised and how work is batched. Still, knowing that we could have fully reproducible, deterministic LLM outputs (at some cost in computation) would be important for domains like healthcare or law. Beware that this isn't peer reviewed or published as a paper yet.
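The underlying issue is easy to demonstrate: floating-point addition is not associative, so changing how a reduction is grouped (which is exactly what different parallelisation or batching strategies do) can change the result in its last bits, and those bits can flip which token a model samples next. A minimal Python illustration of the non-associativity itself (not taken from the report):

```python
# Floating-point addition is not associative: the same three numbers
# summed in a different grouping round differently. On a GPU, the
# grouping depends on how work is split across threads and batches.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```

Making inference deterministic therefore means pinning down the accumulation order everywhere, regardless of batch size or hardware scheduling, which is where the report's cost and complexity come from.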

Knowledge and memory

I like this short piece by the author Robin Sloan, because he puts into words something obvious that needed saying. We have an episodic, autobiographical memory, which means we remember the process of how we learned things. AI systems don't: they appear in the world with a fully formed language generation capability. One reason we're less likely to fabricate stories and believe them is that we have a history with the things we know; we remember when we learned them.

Nano Banana image generation examples

This is an extensive repository of currently 91 examples of what you can do with the new Google Nano Banana image generation tool. The longer it has been available, the more capabilities people have figured out. Each one has examples and a detailed prompt. Everything from generating a photo of a scene from a map to creating movie storyboards. We're still in the infancy of understanding how these tools will be deployed.