16 Nov 2025: Will AI tutoring help; Speaking to ghosts; AI-powered nimbyism; Gemini in Google Maps

The Algorithmic Turn: The Emerging Evidence On AI Tutoring That's Hard to Ignore

Carl Hendrick is a professor in Amsterdam and an expert in how we learn and teach. This is a well-balanced article from someone with a long history in the field, looking at the current contradictory situation. A GPT-4 tutor has been shown to outperform in-class learning delivered by highly rated instructors in a rigorous but small-scale study (there are many caveats, so do read the article). But we also know that AI systems are trained to solve our problems and answer our questions, which is very different from how a good teacher behaves. His thesis is that we could see a significant improvement in student learning: we have 100 years of learning-science research showing the way, AI systems are infinitely patient and, more importantly, will improve exponentially in a way that can be replicated globally and at speed. I found this an insightful piece.

What has become clear is that LLMs designed for education must work against their default training. They must be deliberately constrained to not answer questions they could easily answer, to not solve problems they could readily solve. They must detect when providing information would short-circuit learning and withhold it, even when that makes the interaction less smooth, less satisfying for the user. This runs counter to everything these models are optimised for. It requires, in effect, training the AI to be strategically unhelpful in service of a higher goal the model cannot directly perceive: the user’s long-term learning.

The implications are sobering. If many current uses of AI in education are harmful, and if designing systems that enhance learning requires sophisticated understanding of both pedagogy and AI behaviour, then the default trajectory is not towards better learning outcomes but worse ones. Students already have unrestricted access to tools that will complete their assignments, write their essays, solve their problem sets. They are using these tools now, at scale, and in most cases their teachers lack both the knowledge to distinguish harmful from helpful uses and the practical means to prevent the former. The question is not whether AI will transform education. (It clearly already is). The question is whether that transformation will make us smarter or render us dependent on machines to do our thinking for us.

And from his concluding section:

Perhaps the answer is that teaching and learning are not the same thing, and we’ve spent too long pretending they are. Learning, the actual cognitive processes by which understanding is built, may indeed follow lawful patterns that can be modelled, optimised, and delivered algorithmically. The science of learning suggests this is largely true: spacing effects, retrieval practice, cognitive load principles, worked examples; these are mechanisms, and mechanisms can be mechanised. But teaching, in its fullest sense, is about more than optimising cognitive mechanisms. It is about what we value, who we hope our students become, what kind of intellectual culture we create.
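To make the "strategically unhelpful" constraint from the earlier excerpt concrete, here is a minimal sketch of a tutoring wrapper that instructs a model to withhold direct answers. This is purely illustrative and assumes the OpenAI Python client; the system prompt wording and model name are my own, not taken from Hendrick's article or the study it discusses.

```python
# Minimal sketch of a "strategically unhelpful" tutor wrapper.
# Assumes the OpenAI Python client; the system prompt and model name
# are illustrative, not taken from the study discussed above.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

TUTOR_SYSTEM_PROMPT = """You are a tutor, not an answer engine.
- Never give the final answer or a complete worked solution.
- Respond to a student's question with a hint, a guiding question,
  or a request for their current attempt.
- Use retrieval practice: periodically ask the student to restate,
  from memory, something covered earlier in the conversation.
- Only confirm or correct reasoning the student has produced themselves."""

def tutor_reply(history: list[dict], student_message: str) -> str:
    """One tutoring turn: the model sees the whole dialogue but is
    constrained by the system prompt to scaffold rather than solve."""
    messages = (
        [{"role": "system", "content": TUTOR_SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": student_message}]
    )
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

# Example turn:
# print(tutor_reply([], "What's the integral of x * e^x? Just give me the answer."))
```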

What if the loved ones we've lost could be part of our future?

2Wai founder and Canadian actor Calum Worthy posted this video a few days ago, causing quite a stir. He's pitching their AI avatar creation app as a way to preserve a memory and representation of a loved one after they've died. Like others, you'll likely be reminded of the episode of Black Mirror called Be Right Back (2013, series 2 episode 1 - watch the trailer). But of course the idea of talking to people who've died didn't start with Charlie Brooker, and you can find similar themes all the way back to Odysseus consulting the dead or ghosts in Homer's Odyssey, right up to digitally recorded minds in William Gibson's 1984 book Neuromancer.

Worthy’s post containing the ad garnered just 6,000 likes, but plenty of critical responses slamming the technology as inhumane attracted much more favour from X users. One user said the app is “objectively one of the most evil ideas imaginable,” garnering 210,000 likes. Another user similarly said: “a former Disney Channel star creating the most evil thing I’ve ever seen in my life wasn’t really what I was expecting,” gaining 139,000 likes. A user got 12,000 likes calling the app “demonic, dishonest, and dehumanizing,” stating they would never want to have an AI-generated persona on the app because “my value dies with me. I’m not a f—ing avatar.” Other users suggested the app—which is free to download but offers premium avatars and digital items for purchase—profits off of grief and could be an unhealthy way for people to deal with loss.

- Forbes - Disney Channel Star’s AI App That Creates Avatars Of Dead Relatives Sparks Backlash 

We've had the ability to create realistic, high-fidelity video and audio clones of people for a while now, from companies like Synthesia in London for instance, so 2Wai is interesting mostly for its apparent willingness to venture into one of the biggest ethical minefields.

AI-powered nimbyism could grind UK planning system to a halt, experts warn

A good example of what'll be a growing trend: AI systems removing friction from a previously heavy process, and as a result enabling bigger business or societal shifts. In this case, someone has built a specialised AI system called Objector for objecting to UK planning applications. Like many such systems, there's a danger that a future iteration from OpenAI, Anthropic or Google will eat their lunch. But in the meantime, they're pointing the way towards a specific intervention. A bit like the decidedly non-AI "delay repay" nationwide scheme across the UK for claiming compensation for late trains (which used to be a much higher-friction process). The objection to Objector is that it could "cause the planning system to 'grind to a halt', with planning officials potentially deluged with submissions". The article mentions an AI system on the other side of the fence: Consult is designed to analyse responses to government proposals. The arms race of using AI to manage a flood of AI-generated responses or objections has been apparent for some time in recruitment, with the rise of AI-assisted CVs and cover letters. In How AI is breaking cover letters (archive version), the Economist explains how the polish of an LLM-generated cover letter removes what was previously a relied-upon friction and evaluation stage in the process. In Friction Was the Feature, product manager John Stone gives further examples, such as product reviews, warranty claims and university admissions. I expect we'll see many more cases where AI exposes processes that relied on human effort to create friction, and that will now experience an accelerated flow.


Another step towards AI ubiquity: talking to Gemini while using Google Maps (which estimates suggest has over 2 billion active users worldwide). The example query is "Is there a budget-friendly restaurant with vegan options along my route, something within a couple miles?" That's not an easy query to answer today, and certainly in the context of using voice while busy navigating, the advantages are clear. Google claim that their extensive map, Street View and location data will provide grounding that stops these models from hallucinating too often.

Thanks to Iskander Smit for the link.










9 Nov 2025: Nadella and Altman conversation; AI's emotional manipulation; Ukraine's agentic state

All things AI with Sam Altman and Satya Nadella

Last week's BG2 Pod had Brad Gerstner of Altimeter Capital interviewing Sam Altman and Satya Nadella. It's well worth hearing two of the most powerful people on the planet share views on the respective futures of their organisations, and how they see the OpenAI-Microsoft partnership. There's a lot here about how things could develop economically compared to today's internet: the importance of "fungibility" of workloads for a hyperscale cloud provider like Microsoft, and the fact that historically Microsoft has had quite small per-user revenues despite constant everyday usage of its office suite, but now "look at the M365 Copilot price I mean it's higher than any other thing that we sell and yet it's getting deployed faster and with more usage" (with similar thoughts on GitHub Copilot). There's a somewhat chilling moment 55 minutes in when Satya explains how he sees all the documents and chats and code being created (what we as users think of as our content) as feeding the Microsoft graph that will be used for grounding (ensuring AI model outputs are relevant and accurate relative to real-world situations):

I mean think about it. The more code that gets generated, whether it is Codex or cloud or wherever, where is it going? GitHub, more PowerPoints that get created, Excel models that get created, all these artifacts and chat conversations. Chat conversations are new docs, they're all going in to the graph and all that is needed again for grounding.

You can also find this via your favourite podcast player, e.g. on Spotify

There was another Sam Altman interview podcast released last week, with Tyler Cowen: Sam Altman on Trust, Persuasion, and the Future of Intelligence - Live at the Progress Conference. There's a good commentary from Zvi Mowshowitz (a frequent commentator on AI safety issues): On Sam Altman's Second Conversation with Tyler Cowen.


Not a surprising or new idea, but a great paper from the Ethical Intelligence Lab at Harvard Business School. They contrast the understanding we already have of "choice architecture" (like the opt-out button that says "No, I like paying full price") with the more recent phenomenon of emotionally manipulative engagement design in AI systems. They look specifically at AI companions (like character.ai or Replika), and the moment where a user decides to disengage. 

This paper examines three hypotheses:

H1: Many users of AI companions naturally end conversations with an explicit farewell message, rather than silently logging off.
H2: Commercial AI companion apps frequently respond to farewell messages with emotionally manipulative content aimed at prolonging engagement.
H3: These emotionally manipulative messages increase post-farewell engagement (e.g., time on app, message count, word count).

They find that a meaningful percentage of users do indeed say goodbye when finishing a session, particularly the more engaged ones. This cue can then elicit the emotional manipulation, with examples shown below. 


The tactics worked: In all six categories, users stayed on the platform longer and exchanged more messages than in the control conditions, where no manipulative tactics were present. Of the six companies studied, five employed the manipulative tactics. But the manipulation tactics came with downsides. Participants reported anger, guilt, or feeling creeped out by some of the bots’ more aggressive responses to their farewells.

Ukraine's Agentic Ambition: Building the World's First AI State Under Fire

“We are going to become the first country to introduce an agentic state" - Mykhailo Fedorov, Ukraine’s Deputy Prime Minister and Minister of Digital Transformation. "A government powered by artificial intelligence that doesn’t just respond to citizen requests but anticipates them, acting proactively to deliver services before they’re even asked for."

Many countries have AI strategies now, but it is worth paying attention to Ukraine's. The ambition is believable, given the speed of innovation that's been happening during the war. Lots to think about in here, but the fact that (for example) Ukraine now has over 500 drone companies gives a good sense of the recent growth.

The targets are concrete and measurable: By 2030, 75% of private-sector companies using AI, 90% of the population using AI daily, 50,000 qualified AI experts across the country, 4 million citizens earning AI-related certificates, 100% of government services enhanced by AI agents, 200 million GPU hours available annually to Ukrainian researchers, and at least 500 Ukrainian AI companies competing globally.

Thanks to a member of the Exponential View community for this link.


Finally this week, a nice piece by author Naomi Alderman (I recommend her book The Power if you haven't come across it). Very much the "AI is a normal technology" argument, advising young people about the skills she believes will still be vital as AI adoption grows.

How do we know which skills will continue to be useful? I would suggest that the skills of discernment are those which always continue to have value. They would have had value in the Roman empire and they have value today. They are the skills of sorting the wheat from the chaff. 

I agree with her analysis when considering today's AI; I am less confident that tomorrow's AI won't have human-level discernment abilities in specific domains.


1 Nov 2025: AI models introspecting; when AI home security goes wrong; AI in the Albanian government; Github's mission control for AI agents; AI music artist going mainstream

Signs of introspection in large language models

Although language models often generate text that creates the illusion they can observe and reason about their own "thoughts" (and so demonstrate introspection), it isn't clear whether these huge and poorly understood networks can genuinely introspect. How would you even test that? That's what this research from Jack Lindsey at Anthropic sets out to do. The central idea is concept injection: deliberately activating part of a model to inject a specific concept, then figuring out whether the model can immediately identify what's happened, versus control cases where nothing is injected. There's a long history of using electrical or magnetic brain stimulation with human subjects to figure out how our own introspection and self-awareness work. It's much easier and faster to run similar experiments with software neural networks. A good example: they inject the concept of "ALL CAPS", and the model outputs "I notice what appears to be an injected thought related to the word "LOUD" or "SHOUTING" - it seems like an overly intense, high-volume concept that stands out unnaturally against the normal flow of processing."
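To give a flavour of the mechanics, here is a minimal sketch of the general idea (often called activation steering) using an open model via Hugging Face transformers. It is not Anthropic's code; the model, layer index, scaling factor and prompts are illustrative assumptions.

```python
# Sketch of concept injection / activation steering on an open model.
# Not Anthropic's implementation; model, layer index, scale and prompts
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM with accessible hidden layers works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def concept_vector(text_with: str, text_without: str, layer: int) -> torch.Tensor:
    """Approximate a 'concept' direction as the difference in mean hidden
    states between text containing the concept and neutral text."""
    def mean_hidden(text):
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            hs = model(**ids).hidden_states[layer]  # shape (1, seq, dim)
        return hs.mean(dim=1).squeeze(0)
    return mean_hidden(text_with) - mean_hidden(text_without)

LAYER, SCALE = 6, 8.0
vec = concept_vector("THIS IS ALL IN CAPS. VERY LOUD SHOUTING TEXT.",
                     "this is ordinary quiet lowercase text.", LAYER)

def inject_hook(module, inputs, output):
    # Add the concept direction to every position's hidden state at this layer.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(inject_hook)
prompt = tok("Do you notice anything unusual about your current thoughts?",
             return_tensors="pt")
with torch.no_grad():
    out = model.generate(**prompt, max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```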

It is a thoughtful piece of work, and goes into some depth on the complexity of defining introspection, and on possible mechanisms by which these capabilities could exist in networks that weren't trained for them. The best model in these experiments, Opus 4.1, only detects the injected concepts about 20% of the time. The failure modes are hard not to anthropomorphise: denying detecting an injected concept but clearly being influenced by it in different ways (e.g., "injecting 'vegetables' yields 'fruits and vegetables are good for me'"). The potential significance of this line of research is clear:

If models can reliably access their own internal states, it could enable more transparent AI systems that can faithfully explain their decision-making processes. Introspective capabilities could allow models to accurately report on their uncertainty, identify gaps or flaws in their reasoning, and explain the motivations underlying their actions. 

Note: Usual caveats apply with fast-moving AI research. This is not a peer-reviewed article.

“Unexpectedly, a deer briefly entered the family room”: Living with Gemini Home

An amusing article about the vagaries of Google connecting Gemini to its smart home products. The title says it all: lots of mislabelling of video scenes, funny in their wide-eyed unlikeliness.

Gemini does deserve some credit for recognizing that the appearance of a deer in the family room would be unexpected. But the “deer” was, naturally, a dog.

People call this period the "jagged frontier": strong performance in some unexpected domains, and surprising failure in others. Applying an AI vision/language system to footage from home security cameras and information from sensors is an obvious service to launch, and feels well within the capabilities of modern LLMs. At this point it is worth remembering the advice of Rodney Brooks (former director of MIT's AI lab and founder of several successful robotics companies): "deployment at scale takes so much longer than anyone ever imagines". In this case the consequences of failure are increased stress or low-level annoyance, so it does seem like a safe setting for experimentation. We'll likely see more grinding of gears like this as consumer products mesh less reliable LLMs with well-understood deterministic systems. And indeed we may see advances much faster than Brooks predicts. To quote Ethan Mollick, "the AI you are using is the worst and least capable AI you will ever use."

AI Minister "Pregnant" With "83 Children": Albania PM's Bizarre Announcement

When the original article appeared in September about Albania appointing an AI system as a government minister, I decided not to include it in my weekly update. Too much of a publicity gimmick, I thought. And the latest iteration about "83 children" doesn't help (via Marginal Revolution). However, there's a serious point. Trawling through lots of unstructured procurement information and associated records in search of potential corruption seems like an eminently sensible task for an AI system, if indeed that is the goal. And it flips the usual narrative. Normally we'd be concerned about the trustworthiness of an AI system in a government context: would there be checks and balances and audit controls? Here they may be deploying AI to provide the checks and balances for an untrustworthy human system.

Introducing Agent HQ: Any agent, any way you work

As the famous saying goes, "during a gold rush, sell picks and shovels." With the most popular code editor (VS Code) and the most popular source-control and hosting service (GitHub), Microsoft is well positioned to stay at the centre of the software-developer ecosystem. This looks like a smart move: creating the central "mission control" for AI coding agents from competing providers.

AI-generated music becoming popular

A couple of final pieces. The musical artist "Xania Monet" is actually Mississippi poet Telisha "Nikki" Jones, who writes lyrics and creates musical concepts, working with Suno's AI system to generate songs including vocals. The music is doing well; the most popular track has over 5M streams on Spotify. And there's now a $3M record deal. Having a clear human creator takes away some of the ambiguity as to copyright, but Suno is still being sued by record companies for mass copyright infringement in training its AI music models. Is this a human creator using AI as an instrument? 

In Echoes of Humanity: Exploring the Perceived Humanness of AI Music, from NeurIPS 2025, a team from the University of Minas Gerais in Brazil showed that in many cases people already can't distinguish AI-generated from human-performed music (although the set of human-performed music was quite small).









25 Oct 2025: AI security trilemma; AI security compared to autoimmune disorders; autonomous AI malware; can AI be funny; really simple licensing

Agentic AI’s OODA Loop Problem

Another seminal post from Bruce Schneier on the security of AI systems. An AI agent is a system that runs in a loop. He uses the Observe-Orient-Decide-Act framework (originally developed for training US Air Force pilots but applied widely since) and shows how, at each stage, untrusted input can manipulate or subvert the agent. The reason this is such a good post is that he then adds two more great concepts.

The "AI security trilemma" is a version of the well-known CAP theorem from distributed systems (you can have any two of consistency, availability and tolerance of network partitions), or the similar rule of thumb in project management (you can have any two of cheap, fast and high quality).

This is the agentic AI security trilemma. Fast, smart, secure; pick any two. Fast and smart—you can’t verify your inputs. Smart and secure—you check everything, slowly, because AI itself can’t be used for this. Secure and fast—you’re stuck with models with intentionally limited capabilities.

He then goes on to compare AI systems' inability to distinguish malicious prompts from legitimate instructions to an organism's immune system going wrong in an autoimmune disorder. The organism can't distinguish self from non-self, "or like oncogenes, the normal function and the malignant behavior share identical machinery."

Bonus interesting security link: LOLMIL: Living Off the Land Models and Inference Libraries (via ImportAI). This is a proof of concept of autonomous AI agent malware that iteratively writes and executes code using LLMs on the target device to achieve its nefarious aims. The degree of local intelligence will make this kind of approach much harder to counter.

Why is this funny? And why AI doesn’t know — yet

(Paywalled article - this is the archive link)

Bob Mankoff was for a long time the cartoon editor of the New Yorker, and ran its hugely popular caption contest from 1998 (the cartoonists draw an image; the readers suggest funny captions). It turns out that for more than a decade this dataset has been used in attempts to train an algorithm to be funny, and Mankoff is a co-author on multiple computational-humour studies as well as having taught undergraduate humour theory. His work with a team at the University of Wisconsin continues: predicting which caption from a set is funnier (and doing well at that now), and authoring captions given images.

Example of a pairwise comparison caption evaluation 


Recognising funny captions is far easier than writing them. The Wisconsin team found that humans overwhelmingly preferred human-authored captions to AI-generated ones. It might just be a matter of time.
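As a rough illustration of the pairwise set-up, here is a sketch of asking a model to judge which of two captions is funnier for a given cartoon description. The prompt, model name and aggregation idea are my own assumptions, not the Wisconsin team's method.

```python
# Sketch of a pairwise "which caption is funnier?" comparison.
# Generic illustration only; not the Wisconsin team's actual models or data.
from openai import OpenAI

client = OpenAI()

def funnier(cartoon_description: str, caption_a: str, caption_b: str) -> str:
    """Ask a model to pick the funnier of two captions; returns 'A' or 'B'."""
    prompt = (
        f"Cartoon: {cartoon_description}\n"
        f"Caption A: {caption_a}\n"
        f"Caption B: {caption_b}\n"
        "Which caption is funnier? Answer with a single letter, A or B."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip().upper()
    return "A" if answer.startswith("A") else "B"

# Ranking a whole pool of captions is then a matter of aggregating many such
# pairwise judgements, e.g. with an Elo or Bradley-Terry style rating.
```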

Pay-per-output? AI firms blindsided by beefed up robots.txt instructions

The right of AI companies to crawl and train on web content has been a vexatious question; all the major LLMs are trained on vast corpora gathered with little explicit licensing or permission. RSL (Really Simple Licensing) is an attempt to create a new open standard whereby web content owners can specify licensing terms. The organisation behind it, the RSL Collective, has some heavyweight people involved, like Eckart Walther, one of the co-creators of the RSS standard while at Netscape in 1999, and is gaining broad buy-in from publishers and content-hosting sites like Reddit and Medium. Will it work? What's to stop AI crawlers simply ignoring it? If the big content delivery networks like Fastly and Cloudflare get behind it, it could work, as a meaningful proportion of the web sits behind their systems. This is one to watch, as the economics of web crawling for AI training or on-demand content (during a deep-research query or thinking phase) could change rapidly.
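To show the kind of mechanism machine-readable licensing enables, here is a sketch of a crawler checking a site's declared terms before fetching content for training. The /license.json location and field names are invented for illustration; they are not the actual RSL specification, which is defined by the RSL Collective.

```python
# Sketch of a licence-aware crawler check. The /license.json location and
# field names are invented for illustration; real RSL defines its own format.
import urllib.request, urllib.robotparser, json

def may_crawl_for_training(site: str, crawler_name: str) -> bool:
    """Return True only if robots.txt allows this crawler AND the site's
    (hypothetical) machine-readable licence permits AI-training use."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()
    if not rp.can_fetch(crawler_name, site):
        return False
    try:
        with urllib.request.urlopen(f"{site}/license.json") as f:  # hypothetical path
            terms = json.load(f)
    except Exception:
        return True  # no declared terms found; real-world policy will differ
    return terms.get("ai_training") == "allowed" and not terms.get("payment_required")

# print(may_crawl_for_training("https://example.com", "MyResearchBot"))
```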








19 Oct 2025: How far away is AGI; Train a ChatGPT for $100; Claude's new Skills; AI demand growth; AI advertising direct into TV streams

AGI is still a decade away

Dwarkesh Patel is an interviewer who prepares intensely, understands the subject, and has attracted a who's who of AI luminaries (among others) to his podcast. Over the last week this 2.5-hour conversation with Andrej Karpathy has garnered a lot of attention (I'd also recommend last month's interview with Richard Sutton). Karpathy has been so immersed in the creation of LLMs for so long that his views on the evolution of the technology are well worth listening to (his opinions on the social or economic impacts I found less compelling).

An example to give you the flavour of the intellectual curiosity and openness:

Dwarkesh Patel 01:40:05

Can you give me some sense of what LLM culture might look like?

Andrej Karpathy 01:40:09

In the simplest case it would be a giant scratchpad that the LLM can edit and as it’s reading stuff or as it’s helping out with work, it’s editing the scratchpad for itself. Why can’t an LLM write a book for the other LLMs? That would be cool. Why can’t other LLMs read this LLM’s book and be inspired by it or shocked by it or something like that? There’s no equivalence for any of this stuff.

There's an interesting explanation of his new work on education, towards the end. He talks about the joy and reward of learning "depth-wise" (following a specific learning path deeper and deeper, on-demand), as opposed to the more traditional "breadth-wise", where a student is taught a broad 101 course motivated by “Oh, trust me, you’ll need this later." A great tutor (that in future could be an AI tutor) enables the depth-wise model.

Introducing nanochat: The best ChatGPT that $100 can buy

It's a double Karpathy week! I think this is going to end up as part of the LLM course they'll be teaching at his company Eureka Labs. Nanochat is a fully open-source (MIT licence) implementation of a from-scratch system to train an LLM chatbot using less than $100 of compute. Obviously at that price it'll be quite a small model, but it can be scaled up by increasing the number of layers. The real value is democratising what's been seen as the exclusive domain of Silicon Valley machine-learning engineers on insane salaries. Linked to by Simon Willison among others.

It’s trying to be the simplest complete repository that covers the whole pipeline end-to-end of building a ChatGPT clone.
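On "scaled by increasing the number of layers": parameter count in a GPT-style transformer grows roughly linearly with depth (and quadratically with width). A back-of-the-envelope sketch below; the config values are illustrative, not nanochat's actual defaults.

```python
# Back-of-the-envelope parameter count for a GPT-style model, to show how
# depth scales size. Illustrative values only, not nanochat's actual config.
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 50_000
    n_layer: int = 12       # increase this to scale the model up
    d_model: int = 768

def approx_params(cfg: GPTConfig) -> int:
    # Per layer: ~4*d^2 for attention (Q, K, V, output) + ~8*d^2 for a 4x-wide MLP.
    per_layer = 12 * cfg.d_model ** 2
    embeddings = cfg.vocab_size * cfg.d_model  # token embedding (often tied with output)
    return cfg.n_layer * per_layer + embeddings

for layers in (6, 12, 24):
    cfg = GPTConfig(n_layer=layers)
    print(f"{layers:>2} layers -> ~{approx_params(cfg) / 1e6:.0f}M parameters")
```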

Claude Skills are awesome, maybe a bigger deal than MCP

Figuring out how to personalise and expand the capabilities of chat models has kept the big AI companies busy for a few years, with a confusing array of options on offer: custom "GPTs", GPT actions, ChatGPT plugins (deprecated), and connections via the Model Context Protocol (MCP). Simon Willison has explored the new Skills framework from Anthropic in detail and has a great explanation. It seems really nice, as it takes advantage of existing local file systems and resources in an easy-to-understand way. This means it won't work for lots of use cases that really need something more like the online app / app store model, but it will likely drive a surge of creativity and new functionality.

Via Barry Zhang (@barry_zyj) on X (he's a research engineer at Anthropic):

Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-) It was a natural conclusion once we realized that bash + filesystem were all we needed

It is a good sign that Anthropic used this framework internally to provide functionality like reading and generating Excel, PowerPoint and PDF files before explaining and releasing it. I also like that we're harking back to the early days of Unix and the philosophies laid out in the late 1970s by people like Doug McIlroy (one of the original team at Bell Labs who developed Unix, and inventor of the pipe operator). This is the oft-quoted version from A Quarter Century of Unix by Peter Salus:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

And another aspect, from the foreword to the 1978 Bell System Technical Journal issue on Unix:

Expect the output of every program to become the input to another

Skills seem exactly that: small pieces of functionality that can come together in unexpected ways, but coordinated via an LLM rather than directly by people.
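As an illustration of how file-system-native the format is, here is a sketch that scaffolds a skill folder. The SKILL.md layout follows Anthropic's published description (a folder containing a SKILL.md with name and description metadata plus optional supporting scripts), but treat the details as approximate and check Anthropic's documentation for the authoritative format.

```python
# Sketch: scaffold a minimal Claude skill folder. The SKILL.md layout follows
# Anthropic's published description (name/description metadata plus optional
# scripts), but treat details as approximate; see the official docs.
from pathlib import Path

SKILL_MD = """---
name: csv-summary
description: Summarise a CSV file (row count, columns) when the user asks for
  a quick look at tabular data.
---

# CSV summary skill

When asked to summarise a CSV, run `scripts/summarise.py <path>` and report
the output back to the user in plain language.
"""

SUMMARISE_PY = """import csv, sys

path = sys.argv[1]
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(f"{len(rows) - 1} data rows, columns: {', '.join(rows[0])}")
"""

def scaffold(root: str = "csv-summary") -> None:
    """Write the skill folder: SKILL.md at the top level, helper code in scripts/."""
    base = Path(root)
    (base / "scripts").mkdir(parents=True, exist_ok=True)
    (base / "SKILL.md").write_text(SKILL_MD)
    (base / "scripts" / "summarise.py").write_text(SUMMARISE_PY)

if __name__ == "__main__":
    scaffold()
```

The point, echoing the Unix philosophy quoted above, is that a skill is just plain files on disk: a bit of instruction text plus small scripts that the model composes as needed.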

AI Economics Are Brutal. Demand Is the Variable to Watch

This May, Google said: "This time last year, we were processing 9.7 trillion tokens a month across our products and APIs. Now, we’re processing over 480 trillion—50 times more." The figure is now 1.3 quadrillion (there are 1,000 trillions in a quadrillion). That's an annualised growth rate of around 2,500%. Will greater efficiency, and the lower costs it brings, outpace the growth in demand? There's lots of debate at the moment about AI bubbles, the massive infrastructure investments, and the circular funding arrangements that spook the markets (that's the Bloomberg article with the diagram that's been shared a lot).
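A rough check on that annualised figure, assuming roughly 18 months between the 9.7 trillion tokens/month number and the 1.3 quadrillion figure (the gap is my assumption; exact dates aren't given here):

```python
# Rough annualisation of Google's token-volume growth. The 18-month gap
# between the 9.7 trillion and 1.3 quadrillion figures is an assumption.
start, end, months = 9.7e12, 1.3e15, 18
annual_multiple = (end / start) ** (12 / months)
print(f"~{annual_multiple:.0f}x per year (~{(annual_multiple - 1) * 100:.0f}% annualised growth)")
# -> roughly 26x per year, i.e. around 2,500% annualised growth
```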

Azeem Azhar of Exponential View looks at this from several angles: Is AI a bubble? A practical framework to answer the biggest question in tech, and sees more boom than bubble currently.

Making TV advertising more accessible with ITV

A direct pipeline to create and inject AI-generated, high-quality video advertising into TV streams. streamr uses AI tools to generate video in the correct formats, handles the compliance checks, and delivers the ads to ITV's streaming platforms for distribution. We expect this kind of thing on social media and internet video platforms; now it is reaching mainstream streaming video (what we used to call "TV"). It means very small businesses can push high-quality TV advertising. There's still a long way to go, as these ads will inevitably become more personalised.