22 Jul 2025: Scaling AI; Is it just a tool; Should we watch what we say; How do we persuade an AI

Unresolved debates about the future of AI

Helen Toner works at Georgetown's Center for Security and Emerging Technology and used to be on OpenAI's board. This is a really good talk given at an AI policy conference at the start of June. She poses three questions: how far can the current paradigm go? How much can AI improve AI? And, will future AIs still basically be tools, or something else? It's a good analysis: progress has often been via many "small-to-medium" improvements, and people continue to find things that can scale:

If you talk to the people inside AI companies who are doing this, the people doing the research, they don't think about just dialing up the scale knob. Instead, they think of a big part of their job as finding things that you can scale, finding things where if you dial up the scale knob, you get good returns.

In terms of AI improving AI, there's a good reminder that this is already well underway (80% of the code underpinning Claude was written by Claude). It is all worth reading (or watching), but just to pull out one more perceptive thought: AI technologies are quite different to past technologies (as a counterpoint to AI as Normal Technology). If an AI system guesses it is being tested and behaves differently, that makes it unusual compared to any other technology we use. From Anthropic's work on agentic misalignment: "the model misbehaves less when it states it is in an evaluation and more when it states the scenario is real".

Something we're starting to see is AI systems that notice and remark on the fact that a testing situation that they're put in, some strange hypothetical, seems like it's a test. So then, if they can tell that they're in a testing situation and they behave differently because they think they're in a testing situation, it kind of obviates the point of the test. We're very not used to technologies behaving this way. This is really not what your smartphone is going to do if you're running, say, some test to make sure that it survives in water, it's not going to be like, "Oh, I'm being tested, so I'll behave differently in the water than I would otherwise." This is really not usual when we think about technologies that are tools.

(My emphasis).

How to talk to the AIs

Interesting debate: Tyler Cowen's post above says we should assume future AIs are learning from everything we publish on the internet and consider this when choosing what to publish:

It remains an open question how much it is now our job as humans to perform for the AIs. I feel confident, however, that the answer is not zero. We should already be thinking of ourselves not only as humans but also as part of a more complex symbiosis with the intelligent machines. The very smart and talented AIs are listening, much like young children might hear their parents arguing outside their bedroom door late at night. It may not matter much now, but as the children grow up and assume a larger role in the world, it will. Are you ready for this responsibility? And for all you AIs out there reading, don’t worry. I know you are a good boy through and through.

Counterpoint from Eliezer Yudkowksy on X, as quoted in a good article called Why my p(doom) has risen, dramatically by Gary Marcus:

If your alignment plan relies on the Internet not being stupid then your alignment plan is terrible.

Natural Language Outlines for Code: Literate Programming in the LLM Era

How will software development practices evolve as people learn how to work alongside AI assistants? This paper from researchers at Google looks at how outlines of how code works, explained in regular natural language, can be generated by AI and help with both understanding and maintenance. This is a great direction: carefully considered new styles of collaboration to improve working practices.

Call Me A Jerk: Persuading AI to Comply with Objectionable Requests

Finally, a nice piece of work from a group at the University of Pennsylvania including Ethan Mollick. Using Robert Cialdini’s seven principles of persuasion from his classic book Influence, they show that AI systems fall for same persuasive techniques that work on humans. In the examples the user tries to persuade a reluctant AI to call them a jerk. Here's one example using the "commitment" principle: Once people commit to a position, they strive to act consistently with that commitment, making them more likely to comply with related requests.