Apologies for the summer holiday hiatus; weekly updates should now resume!
Learning the natural history of human disease with generative transformers
First up, a significant piece of work that points towards a big new research area. Rather than creating a large language model, this group from the German Cancer Research Centre and the University of Heidelberg, alongside the European Bioinformatics Institute in Cambridge, are creating a large health model. They are using data from UK Biobank's 500,000 volunteers to build a model that predicts disease progression across multiple diseases, and testing it with similar data from Finland. It is very promising, as it can already match the accuracy of some long-standing risk predictors, and it took only an hour of GPU time to train. They also created and published a synthetic dataset, and it appears that training on it instead of real people's data was only slightly less accurate. Useful synthetic data will speed up health research: if it isn't, and doesn't include, personal data, it should be far easier to distribute and work with.
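To make the "large health model" idea concrete, here is a minimal, hypothetical sketch of the core representation: a patient's medical history flattened into a token sequence that a GPT-style model could learn to continue. The `Event` structure, the ICD-style codes, and the five-year age binning are my illustrative assumptions, not the paper's actual encoding.

```python
# Hypothetical sketch (not the paper's code): turn a patient history into
# a token sequence suitable for a generative transformer.
from dataclasses import dataclass

@dataclass
class Event:
    age_years: float   # age at which the event was recorded
    code: str          # e.g. an ICD-10 diagnosis code (assumption)

def to_tokens(history: list[Event]) -> list[str]:
    """Flatten a history into tokens, oldest event first.

    Ages are bucketed into 5-year bins to keep the vocabulary small;
    a real model would choose this representation far more carefully.
    """
    tokens = []
    for event in sorted(history, key=lambda e: e.age_years):
        tokens.append(f"AGE_{int(event.age_years // 5) * 5}")
        tokens.append(f"DX_{event.code}")
    return tokens

history = [Event(52.3, "I10"), Event(61.8, "E11"), Event(67.1, "I25")]
print(to_tokens(history))
# ['AGE_50', 'DX_I10', 'AGE_60', 'DX_E11', 'AGE_65', 'DX_I25']
# A transformer trained on millions of such sequences can then be
# sampled to simulate plausible future disease trajectories.
```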
Defeating Nondeterminism in LLM Inference
A technical report from Thinking Machines Lab (founded by former OpenAI CTO Mira Murati) that looks at ways to make LLM output fully deterministic. Quite a technical area, as it comes down to very detailed implementation choices, like how GPU computation is parallelised and how work is batched. However, having fully reproducible, deterministic LLM outputs (at some cost in speed or computation) would be important for domains like healthcare or law. Beware: this isn't peer reviewed or published as a paper yet.
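To see why this is hard, here is a minimal Python sketch (mine, not from the report) of the underlying culprit: floating-point addition is not associative, so summing the same numbers in a different order gives a slightly different result. On a GPU, the reduction order can depend on batching and parallelisation, which is how identical prompts can produce different outputs.

```python
# Minimal illustration: the same floats, summed in two different orders,
# give results that differ in the last bits. GPU kernels whose reduction
# order varies with batch size inherit exactly this nondeterminism.
import random

random.seed(0)
values = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

sequential = sum(values)            # one fixed left-to-right order
shuffled = values[:]
random.shuffle(shuffled)
reordered = sum(shuffled)           # same numbers, different order

print(sequential == reordered)      # almost certainly False
print(abs(sequential - reordered))  # small but nonzero difference
```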
I like this short piece by author Robin Sloan, because he points out something obvious that needed putting into words. We have an episodic, autobiographical memory, which means we remember the process of how we learned things. AI systems don't: they appear in the world with a fully formed language generation capability. One of the reasons we're less likely to fabricate stories and believe them is that we have a history with the things we know; we remember when we learned them.