Video models are zero-shot learners and reasoners
Pivotal insights from Google DeepMind published this week. Everyone was surprised at the sheer variety of tasks that LLMs could tackle; no one expected that a next-word prediction machine could write good code, or reason through problems, or handle many of the other applications we now take for granted that weren't previously considered purely language or writing tasks. This work suggests that video models are similar, albeit a few years earlier in their evolution. The authors show a remarkable range of activities that Veo 3 can perform. Remember, Veo 3's job is just to produce a series of frames for a short video (and accompanying audio), just like an LLM's job is to produce a series of words.
Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn’t explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo’s emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
This is easiest to understand with an example, one of the many presented. Can a video generation model successfully find a path through a maze? The model is given the maze as a starting image and simply asked to generate an animation of what happens next, given a prompt. The prompt starts with: "Without crossing any black boundary, the grey mouse from the corner skillfully navigates the maze by walking around until it finds the yellow cheese."
Here's the result:
(I've actually picked an example that only worked in 17% of their experiments, but there are many others with much higher success rates. The mouse in the maze makes a good video, though! The expectation is that, like LLMs, these capabilities will continue to improve.)
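If you want to try this pattern yourself, here's a minimal sketch using the google-genai Python SDK, which exposes Veo through the Gemini API. The model id and exact field names here are my assumptions and should be checked against the current docs; the paper's authors presumably used their own tooling.

```python
# Sketch: framing the maze task as image-to-video generation.
# Assumes the google-genai SDK (pip install google-genai) and an API key in the
# environment; model id and field names may differ from current documentation.
import time

from google import genai
from google.genai import types

client = genai.Client()

prompt = (
    "Without crossing any black boundary, the grey mouse from the corner "
    "skillfully navigates the maze by walking around until it finds the "
    "yellow cheese."
)

# The maze is supplied as the starting frame; the model generates what happens next.
with open("maze.png", "rb") as f:
    start_frame = types.Image(image_bytes=f.read(), mime_type="image/png")

operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # assumed model id; check the docs
    prompt=prompt,
    image=start_frame,
)

# Video generation runs as a long-running operation, so poll until it finishes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("maze_attempt.mp4")
```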
British AI startup beats humans in international forecasting competition
Asimov's Foundation series introduced the fictional science of psychohistory, which can predict broad societal trends and events across a galactic civilisation. Mantic is a startup attempting to build an initial version. I hadn't realised that forecasting is a competitive sport. The Metaculus Cup sets a number of prediction challenges; answers are submitted and scored two weeks later (so it is quite a short time frame). Mantic achieved 8th place in the summer 2025 contest, the highest ever for a bot, across a wide variety of questions predicting developments in Ukraine and Gaza, sporting results, elections, and all kinds of political events. Mantic's approach appears to be a multi-agent system:
Mantic breaks down a forecasting problem into different jobs and assigns them to a roster of machine-learning models including OpenAI, Google and DeepSeek, depending on their strengths.
Using AI (rather than human "superforecasters") opens up possibilities for faster experimentation. They can do "backtesting": giving the AI access only to information available before a certain date and then asking it to predict outcomes that are already known. And they can work at much greater speed and scale. It will be interesting to see if this kind of technology starts being applied outside of finance and trading.
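To make the backtesting idea concrete, here's a small illustrative harness (my sketch, not Mantic's actual system): each question carries a cutoff date and a known outcome, the forecasting function only gets to use information published before the cutoff, and forecasts are scored with the Brier score, a standard scoring rule for probabilistic forecasts. The `forecast_fn` and question fields are hypothetical.

```python
# Illustrative backtesting harness (not Mantic's system): score a forecasting
# function against questions whose outcomes are already known, while only
# exposing information available before each question's cutoff date.
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Question:
    text: str
    cutoff: date    # the forecaster may only use information published before this date
    outcome: bool   # known resolution (True = the event happened)


def brier_score(probability: float, outcome: bool) -> float:
    """Squared error between the forecast probability and the 0/1 outcome."""
    return (probability - float(outcome)) ** 2


def backtest(questions: List[Question], forecast_fn: Callable[[str, date], float]) -> float:
    """Average Brier score of forecast_fn over already-resolved questions.

    forecast_fn(question_text, cutoff) must return P(event) using only
    information dated before `cutoff` -- e.g. by restricting a news or
    web-search index to documents published earlier than the cutoff.
    """
    scores = [brier_score(forecast_fn(q.text, q.cutoff), q.outcome) for q in questions]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A trivial baseline that always says 50%; a real system would route the
    # question to one of several LLMs plus a date-restricted retriever.
    always_uncertain = lambda text, cutoff: 0.5
    demo = [Question("Placeholder question?", date(2024, 1, 1), True)]
    print(backtest(demo, always_uncertain))  # 0.25 for a 50% forecast on a True outcome
```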
Which humans?
This research from the Culture, Cognition, Coevolution Lab at Harvard looks at how LLMs answer questions compared to people from different cultures and countries. As they state:
Technical reports often compare LLMs’ outputs with “human” performance on various tests. Here, we ask, “Which humans?” Much of the existing literature largely ignores the fact that humans are a cultural species with substantial psychological diversity around the globe that is not fully captured by the textual data on which current LLMs have been trained.
It's introduced me to a new acronym: WEIRD (Western, Educated, Industrialised, Rich, and Democratic). WEIRD populations "tend to be more individualistic, independent, and impersonally prosocial (e.g., trusting of strangers) while being less morally parochial, less respectful toward authorities, less conforming, and less loyal to their local groups." Unsurprisingly, LLMs are trained on very WEIRD-biased text ("most of the textual data on the internet are produced by WEIRD people (and primarily in English)"), and so we get the "WEIRD-in WEIRD-out" problem. The World Values Survey (WVS) is a long-running international survey, conducted in waves since 1981, that looks at values, norms, beliefs, and attitudes around politics, religion, family, work, identity, trust, and well-being. By getting ChatGPT to answer the WVS questions, the model can be placed on the same scales for comparison. The graph below shows the WEIRD bias pretty clearly: ChatGPT's answers correlate much more strongly with those from countries like the US.
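As a rough sketch of the comparison method (my reconstruction, not the paper's code): ask the model each WVS item on its numeric response scale, build a response profile, and correlate that profile with each country's average responses. The `ask` callable and the data loading are placeholders.

```python
# Sketch of the comparison method (a reconstruction, not the paper's code):
# have an LLM answer World Values Survey items on their numeric scales, then
# correlate its response profile with each country's average responses.
from statistics import correlation  # Pearson's r, Python 3.10+
from typing import Callable, Dict, List


def llm_answers(questions: List[str], ask: Callable[[str], float]) -> List[float]:
    """Ask the model each WVS item and collect a numeric answer.

    `ask` is a placeholder for whatever chat-completion call you use; it should
    return a number on the item's response scale (e.g. 1-10 for justifiability items).
    """
    return [float(ask(q)) for q in questions]


def similarity_by_country(
    model_profile: List[float],
    country_profiles: Dict[str, List[float]],
) -> Dict[str, float]:
    """Correlate the model's response profile with each country's WVS averages."""
    return {
        country: correlation(model_profile, profile)
        for country, profile in country_profiles.items()
    }


# Usage sketch (country_profiles would come from the published WVS data files):
# sims = similarity_by_country(llm_answers(wvs_items, ask_chatgpt), country_profiles)
# print(sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:10])
```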
Why the AI “megasystem problem” needs our attention
Not the usual AI doomer nonsense. Quite the opposite: a depressingly realistic view from Susan Schneider (a philosophy professor at Florida Atlantic University) on likely problems that will come not from a single superintelligence created in some lab, but from the "megasystem":
"But the real risk isn’t one system going rogue. It’s a web of systems interacting, training one another, colluding in ways we don’t anticipate.... Losing control of a megasystem is far more plausible than a single AI going rogue. And it’s harder to monitor, because you can’t point to one culprit — you’re dealing with networks."
It has some parallels to systemic risk in financial markets, but the effect on individuals and culture makes it a different kind of problem:
Individuals need to cultivate awareness. Recognize the risks of addiction and homogeneity. Push for friction in learning. Demand transparency about how these tools shape our thought patterns. Without cultural pressure, policy alone won’t be enough.