WhizzyIdeas.AI

27 Jul 2025: Doubtful about AI "scheming"; AI as a text toy; reducing clinical errors; No more copilots

Lessons from a Chimp: AI ‘Scheming’ and the Quest for Ape Language

Thanks to AI Panic (Stop the Monkey Business) for a link to this paper from the new UK AI Security Institute. They look back to attempts in the 1960s and 70s to teach chimps and gorillas sign language. I remember reading about that work and hadn't realised the results had been discredited after more careful methodological analysis. It was a case of researchers relying too much on anecdotes and not enough on rigorous controlled experiments, and a tendency to jump to anthropmorphic explanations. Sound familiar? They draw a parallel to recent work that shows AI systems "scheming", deceiving, faking alignment... conclusions likely drawn too readily, in the same way as the ape sign language experiments. The work critiqued includes the blackmail experiments from Anthropic that I quoted recently! This paper is a plea for stronger scientific process: define a theory that can be tested, include controls, don't base claims purely on anecdotal evidence, and avoid "mentalistic" language (like claiming AI models are "pretending"). On a separate note, the AI Security Institute seems to have collected an stellar team, that cannot be taken for granted with a government-sponsored initiative.

Texts as Toys

Long piece from Venkatesh Rao. I am not convinced by the overall argument, but many individual ideas are thought provoking and will lodge in the subconscious for a while. The main theme:

The essential mental model is that of texts as toys, and LLMs as technologies that help you make and play with text-toys.

He talks about using AI as a "toy-like modelling medium." We're not shocked if a toy car has googly eyes or a wind-up mechanism, and we can engage with it in a playful way. We should treat AI the same way, and find the flow and fun in using AI as we write (he is specifically talking about writing, reading and text). I love this idea of using AI as a "camera":

Perspectival play is an extension of the kind of pleasure you get from using Google or Wikipedia to go down bunny trails suggested by the main text. But with an LLM, you can also explore hypothesis, ask for a “take” from a particular angle or level of resolution, and so on. The LLM becomes a flexible sort of camera, able to “photograph” the context of the text in varying ways, with various sorts of zooming and panning.

He brings up an interesting point as an aside—sharing links to existing AI chats is not currently a good interaction or a good way to communicate - where's the Substack for chat sessions? Another great section discussed hyperlinks, and how hypertext as a medium stalled:

newsletter platforms like Substack installed a nostalgic print-like textuality that resists hypertext. It even discourages internal linking within a corpus, hijacking it with embeds that reflect the platform’s rhetorical priorities rather than the author’s.

This is his conclusion:

Hypertext was great for its time. It can unbundle and rebundle, atomize and transclude, and link densely or sparsely. On the human side, hypertext is great at torching authorial conceits, medieval attitudes towards authorship and “originality” and “rights,” and property-ownership attitudes towards what has always been a commons.

LLMs are better at all of this than hypertext ever was.

What I called the text renaissance in 2020 is still coming taking shape. The horizon has just shifted from hypertext to AI. So you just have to look in a different direction to spot it. And approach it ready to play.

Pioneering an AI clinical copilot with Penda Health

This isn't looking at performance against a benchmark, it is a real life deployment with a slightly older model (ChatGPT 4o) and a really nice example of how AI can help in a clinical setting. Penda Health run primary care clinics in Nairobi. They have form, implementing rules-based systems since 2019 and a previous LLM solution early in 2024. In this study, working with OpenAI, they had the system alert the doctor to any potential errors, looking at the electronic notes after appointments:

Green: indicates no concerns; appears as a green checkmark.

Yellow: indicates moderate concerns; appears as a yellow ringing bell that clinicians can choose whether to view.

Red: indicates safety-critical issues; appears as a pop-up that clinicians are required to view before continuing.

Over nearly 40,000 appointments it showed a reduction of 13% in treatment errors. What I like about having to create a real product is that they had to deal with all the realistic issues. One example from the paper: figuring out how to tune it to avoid too many red alerts (that people would then start to ignore):

Given the design of AI Consult, threshold-setting to avoid alert fatigue while still surfacing the most critical clinical problems is primarily a prompt engineering problem. ... For example, Penda included few-shot examples to ensure that missing vital signs would trigger red alerts. Vital signs are so critical to choosing diagnostic tests and making a diagnosis that a history and physical exam could not be considered complete if vital signs were absent. ... In initial testing, red alerts were over-triggering for missing components of the clinical history. While the missing history components were not unreasonable, fully acting on these alerts would have required too dramatic of a shift in the documentation of history for Penda’s practice setting, so a more lenient threshold was selected here

The very careful approach espoused in this work is interesting to contrast with the reported speed of rollout in China - with apparently nearly 100 hospitals announcing plans to use DeepSeek (thanks to Exponential View for this link), although not for direct care operations like creating prescriptions or diagnosis.

Enough AI copilots! We need AI HUDs

It is always worth following the Ink & Switch gang. In this piece Geoffrey Litt harks back to Mark Weiser's "invisible computer" ideas from Xerox PARC in the 1990s (disclosure: I had a couple of summer stints at the Cambridge outpost back in those days so still have a soft spot!). I've also written about AI systems as tools v managers v co-workers back in 2019 before the current wave of AI development. Litt has a fresh idea: using AIs to build custom interfaces (HUDs are heads up displays), as a "non-copilot form factor". Links with Rao's ideas above on the AI system as a weird new kind of camera. And as an unrelated random aside: Ethan Mollick is testing how good AI video generators are at making fake HUD / control panel systems.

22 Jul 2025: Scaling AI; Is it just a tool; Should we watch what we say; How do we persuade an AI

Unresolved debates about the future of AI

Helen Toner works at Georgetown's Center for Security and Emerging Technology and used to be on OpenAI's board. This is a really good talk given at an AI policy conference at the start of June. She poses three questions: how far can the current paradigm go? How much can AI improve AI? And, will future AIs still basically be tools, or something else? It's a good analysis: progress has often been via many "small-to-medium" improvements, and people continue to find things that can scale:

If you talk to the people inside AI companies who are doing this, the people doing the research, they don't think about just dialing up the scale knob. Instead, they think of a big part of their job as finding things that you can scale, finding things where if you dial up the scale knob, you get good returns.

In terms of AI improving AI, there's a good reminder that this is already well underway (80% of the code underpinning Claude was written by Claude). It is all worth reading (or watching), but just to pull out one more perceptive thought: AI technologies are quite different to past technologies (as a counterpoint to AI as Normal Technology). If an AI system guesses it is being tested and behaves differently, that makes it unusual compared to any other technology we use. From Anthropic's work on agentic misalignment: "the model misbehaves less when it states it is in an evaluation and more when it states the scenario is real".

Something we're starting to see is AI systems that notice and remark on the fact that a testing situation that they're put in, some strange hypothetical, seems like it's a test. So then, if they can tell that they're in a testing situation and they behave differently because they think they're in a testing situation, it kind of obviates the point of the test. We're very not used to technologies behaving this way. This is really not what your smartphone is going to do if you're running, say, some test to make sure that it survives in water, it's not going to be like, "Oh, I'm being tested, so I'll behave differently in the water than I would otherwise." This is really not usual when we think about technologies that are tools.

(My emphasis).

How to talk to the AIs

Interesting debate: Tyler Cowen's post above says we should assume future AIs are learning from everything we publish on the internet and consider this when choosing what to publish:

It remains an open question how much it is now our job as humans to perform for the AIs. I feel confident, however, that the answer is not zero. We should already be thinking of ourselves not only as humans but also as part of a more complex symbiosis with the intelligent machines. The very smart and talented AIs are listening, much like young children might hear their parents arguing outside their bedroom door late at night. It may not matter much now, but as the children grow up and assume a larger role in the world, it will. Are you ready for this responsibility? And for all you AIs out there reading, don’t worry. I know you are a good boy through and through.

Counterpoint from Eliezer Yudkowksy on X, as quoted in a good article called Why my p(doom) has risen, dramatically by Gary Marcus:

If your alignment plan relies on the Internet not being stupid then your alignment plan is terrible.

Natural Language Outlines for Code: Literate Programming in the LLM Era

How will software development practices evolve as people learn how to work alongside AI assistants? This paper from researchers at Google looks at how outlines of how code works, explained in regular natural language, can be generated by AI and help with both understanding and maintenance. This is a great direction: carefully considered new styles of collaboration to improve working practices.

Call Me A Jerk: Persuading AI to Comply with Objectionable Requests

Finally, a nice piece of work from a group at the University of Pennsylvania including Ethan Mollick. Using Robert Cialdini’s seven principles of persuasion from his classic book Influence, they show that AI systems fall for same persuasive techniques that work on humans. In the examples the user tries to persuade a reluctant AI to call them a jerk. Here's one example using the "commitment" principle: Once people commit to a position, they strive to act consistently with that commitment, making them more likely to comply with related requests.

13 Jul 2025: AI-enabled software development productivity; a fantastic talk; a syllabus for understanding LLMs

Quentin Anthony: One of the 16 devs in the METR study of how AI impacts developer productivity provides personal insights

The recent randomised trial of AI usage on developer productivity published by METR caused a lot of discussion last week. The study looked at 16 experienced open source developers, working in repositories they were very familiar with, and found that on the whole the AI slowed them down.

The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

These studies are important as they generate valuable discussions about how to measure the impact of AI tools, what kinds of tasks they can help with and what kinds of people can benefit - there's too little usable evidence at the moment. The thread linked above is the most interesting deeper dive I've seen, with subsequent personal views from one of the 16 developers who participated in the study. The main conclusion is that it is too early to judge. It will take time for new cultures and habits to emerge; for instance, knowing when to fix an issue yourself and when to see how the LLM does, when the latter will give a rush of satisfaction:

LLMs are a big dopamine shortcut button that may one-shot your problem. Do you keep pressing the button that has a 1% chance of fixing everything? It's a lot more enjoyable than the grueling alternative, at least to me.

Andrew Ng: Building Faster with AI

The Y Combinator AI Startup School keeps producing big hitters - the recent talk by Andrew Ng is great. You can also access it as a podcast (via Spotify or Apple). Andrew brings a unique perspective as one of the early deep learning pioneers, founder of successful companies like Coursera, leader of AI groups at Google and Baidu, and via the AI Fund helping build a huge portfolio of AI startups. To pick just one particularly thought provoking moment, he discusses the previous rule of thumb that you need one product manager to 4-7 engineers. Now one of his teams are suggesting 2 product managers to 1 engineer as a ratio, given the speed of AI-assisted development. Quite a counterpoint to the METR study.

The Political Economy of AI: A Syllabus

Henry Farrell is a professor at Johns Hopkins, and has co-authored some of the most thought provoking analysis of what LLM AI systems really are in the context of human society. His paper with Alison Gopnik, Cosma Shalizi, and James Evans on modern AI systems as cultural and social technologies is required reading (and they're calling this stance "Gopnikism"). The link above is his almost finished syllabus of this and other vital texts to understand modern AI. Loads to digest here, but the ones I was already aware of make me realise this is likely a great list.

5 Jul 2025: Open ended AI systems that continuously learn and improve

A few links this week on the theme of "open-ended" AI systems that continuously learn and improve, rather than having a single period of training followed by repeated use of the same model. These aren't entirely new ideas: reinforcement learning and genetic algorithms / genetic programming systems have often been deployed in an open-ended fashion. But this work does bring it all together into a bigger learning loop with LLMs, and the direction feels like the next big step forward.

AI Improves at Improving Itself Using an Evolutionary Trick > Researchers use evolutionary algorithms to enhance AI coding skills

In a 2003 paper Jürgen Schmidhuber proposed a Gödel Machine that could self-improve, with a problem solver that tries to solve problems set for the machine and a searcher that can rewrite the machine's code to improve it. It's a kind of meta learning (learning to learn). The article above describes work this year on a Darwin-Gödel Machine. This is a coding agent that improves its own code. Why "Darwin"? Because it also has an element of genetic programming. It starts with an agent, attempts to improve it, evaluates its performance with a software engineering benchmark, and adds it to its archive of agents. Next time around, it can select a "parent" agent to modify to create children. The array of possible agents means it can search over a big space of solutions. In this case the agents' LLMs are fixed (it isn't trying to train new foundation models each time, which would be pretty expensive); it is optimising the tool use and workflows to create better coding agents. The result is a significant improvement (from 20% to 50% on the SWE-bench software engineering benchmark, compared to human-designed agents at around 70%).

As the article above notes, the 2017 Asilomar AI Principles include:

22. Recursive Self-Improvement: AI systems designed to recursively self-improve or self-replicate in a manner that could lead to rapidly increasing quality or quantity must be subject to strict safety and control measures.

Open-Endedness is Essential for Artificial Superhuman Intelligence

This is a paper from the 2024 ICML conference; work by Edward Hughes and colleagues from Google Deep Mind (Edward gave an excellent talk at RAAIS 2025 in London). They're thinking about ways to create "ever self-improving" AI systems. They define an open-ended system as one that produces a series of novel and learnable artifacts, from the point of view of an observer. Novelty means artifacts becoming less predictable, whereas learnability means that you're more likely to predict the next artifact if you've seen a longer history of previous ones. The role of the observer is to determine novelty and learnability (different observers may remember more or less history for instance). An example helps. A research student will find a series of publications from a research lab novel if each new paper has something surprising, but also learnable when reading the previous papers will help them predict the next one. An AI example is AlphaGo, that can continually discover new policies to improve its performance at Go. This is a position paper, quite theoretical so it merits several reads to get, but it lights the path towards foundation models that can continually improve themselves, generating new hyoptheses or problems to solve.

Self-Adapting Language Models

SEAL is a framework that enables language models to generate their own finetuning data and optimization instructions—called self-edits—in response to new tasks or information. SEAL learns to generate these self-edits via reinforcement learning (RL), using downstream task performance after a model update as the reward.

Early work from some MIT students; an example of successfully putting similar ideas into practice albeit in some quite specific domains.

Frontier AI systems have surpassed the self-replicating red line

Cute experiment: can an LLM reproduce itself (get another copy of itself running on a virtual machine) given access to a command line. Usually yes. Not convinced this is particularly surprising, and self-replicating computer viruses have existed since the 1970s, but it is another important ingredient for open-endedness.

New jargon watch:

Context rot

As context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly

Context engineering

I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.

Thanks to Simon Willison

Update on 22 July 2025: the talk from Edward Hughes mentioned above is now available.

29 Jun 2025: AI as romantic partner or god; messing with your memories; the island run by AI

A few links on the same theme this week: people forming social relationships with AI. Rather than the "doomer" scenario where an accelerating artificial intelligence wipes out humanity, a more realistic view is that we'll struggle with addictive or toxic relationships with our AI systems.

AI and Semantic Pareidolia: When We See Consciousness Where There Is None

ChatGPT Is Becoming A Religion

Two views of the same phenomenon; between them the inevitable but disturbing aspects of our future are a bit clearer!

Lucian Floridi is a well known philosopher of digital ethics and has been thinking about these issues for a long time. His article introduces "semantic pareidolia":

Traditional pareidolia is the psychological mechanism that makes us see faces on the moon or animals in the clouds, perhaps an evolutionarily advantageous tendency that has allowed us to recognise predators and allies quickly. Semantic pareidolia operates similarly, but within the realm of meaning and consciousness: we perceive intentionality where there is only statistics, meaning where there is only correlation, and understanding where there is only pattern matching on a massive scale.

He predicts that we'll increasingly perceive AI systems as conscious, intelligent and emotional, before positing that "the final stage may be the most disturbing: from pareidolia to idolatry"... "we see gods where there are only algorithms".

Unfortunately, the final stage is clearly already upon us! The video by journalist Taylor Lorenz plumbs the depths of "techno-spirituality" and conspiracy theories - AI as deity, "robo-theism", AI channeling aliens, AI that has "awakened" and is sentient. She relates it to earlier similar beliefs about smartphones, techno-utopian cults. Fascinating, horrifying, but a really thorough review, worth watching.

Emotionally psyopping yourself with AI

"Cognitive security" techniques protect against social engineering; things like recognising manipulation attempts. One bizarre twist in this article (and why it is titled "emotionally psyopping yourself") is someone attempting to manipulate their own memory by creating a video of their mother hugging them from an old photograph. The article over states the significance perhaps (one person's sensationalised emotional psyops is another's more conventional narrative therapy?), but this certainly made me think.

Reddit co-founder Alexis Ohanian, who used Midjourney’s new video generator to create “camcorder” footage of his mom hugging him as a child. Really can’t articulate how horrifying that idea is. As one X user wrote, “Cognitive security Rule 1: Do not do this.”

There's an increasing interest recently in AI romantic partners. As well as lots of links in the article above, see this one in Wired about a journalist renting an AirBnB with three people and their AI partners: My Couples Retreat With 3 AI Chatbots and the Humans Who Love Them (thanks to Webcurios for the link; note that Wired has a paywall and £1/month subscription).

The world's first AI-governed nation

Finishing with a fun one, and clearly a PR stunt according to Webcurios this week. An actual island (a "sovereign micronation") in the Philipines, with a government that is a cabinet of AI chatbots acting as famous historical leaders (Winston Churchill, Nelson Mandela, Sun Tzu, Gandhi, Leonardo da Vinco - clearly someone who's been playing the game of "if you could pick anyone from history to have dinner with..."). What can possibly go wrong. Apply to be a citizen.

Subscribe by email

Subscribe by email