When AI Solves a Hard Math Problem—But Not the One You Think
- James Garner
- Dec 7, 2025
- 4 min read
A Mathematical Breakthrough, a Major Caveat, and What It Tells Us About AI's Real Trajectory

An AI system called Aristotle, built by Harmonic (founded by Robinhood CEO Vlad Tenev), independently solved a mathematical problem that had remained open for nearly 30 years. Erdős Problem #124, posed in 1995, is the sort of mathematical challenge that separates the brilliant from the merely expert. The AI didn't just solve it. It solved it in six hours, then formally verified the proof in one minute.
The accomplishment is genuinely remarkable. But the narrative surrounding it requires some care. The breakthrough tells us something important about where AI is heading. It also tells us something important about how to read AI claims in an era when hype runs deeper than rigour.

What Actually Happened
Aristotle, using reinforcement learning and formal verification in the Lean proof system, worked through a version of Erdős Problem #124 and produced what appears to be a valid solution. The problem involves combinatorial mathematics, asking whether every natural number can be represented as a sum of powers under specific constraints. The mathematical community, initially excited, has since applied due diligence.
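To make "formally verified" concrete, here is a toy Lean 4 proof. This is entirely our illustration, not anything from Aristotle's output: a simple statement that every natural number is even or odd. The point is that once a claim is stated formally, the Lean kernel checks every step mechanically; a proof either type-checks or it doesn't.

```lean
-- A toy illustration (not the Erdős #124 proof): every natural number
-- is either even (2 * k) or odd (2 * k + 1). The kernel either accepts
-- this proof or rejects it; there is no plausible-sounding middle ground.
theorem even_or_odd (n : Nat) :
    (∃ k, n = 2 * k) ∨ (∃ k, n = 2 * k + 1) := by
  induction n with
  | zero =>
    -- 0 = 2 * 0, so zero is even.
    exact Or.inl ⟨0, rfl⟩
  | succ n ih =>
    -- If n is even, n + 1 is odd; if n is odd, n + 1 is even.
    cases ih with
    | inl h =>
      cases h with
      | intro k hk => exact Or.inr ⟨k, by omega⟩
    | inr h =>
      cases h with
      | intro k hk => exact Or.inl ⟨k + 1, by omega⟩
```

Aristotle's actual proof would be vastly larger, but the guarantee is the same kind: verification took the stated one minute precisely because checking a formal proof is mechanical.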
Thomas Bloom, who maintains the official Erdős problems website, has clarified that Aristotle solved "a" version of the problem rather than "the" version. The complete, original problem, the harder one Erdős actually posed, remains unsolved.
This distinction matters. The version Aristotle solved was notably easier, and may already have been solved in the mathematical literature or in competition mathematics. The harder version, with the additional constraints Erdős included in his original papers, is still open.
Vlad Tenev called this "vibe proving": AI discovering proofs, then formally verifying them. It's an elegant phrase. The mathematical community's more measured assessment is that Aristotle solved a simplified variant of a known hard problem, discovered a path through it that humans hadn't noticed (possibly because it lurked in prior work we'd collectively forgotten), and then rigorously verified the solution.
The Real Achievement
Here's what actually matters: an AI system, working independently of human direction within a formal verification system, discovered and proved something mathematically valid. That's genuine progress. It demonstrates AI reasoning at a level that rivals human mathematicians on olympiad-level problems.
Compare this to ChatGPT or Gemini producing plausible-sounding but mathematically nonsensical answers to the same problem. Both failed where Aristotle succeeded. That gap is significant. It suggests we're moving toward reasoning systems that can operate within formal constraints, verify their own outputs against rigorous standards, and avoid the hallucination patterns that plague language models.
This is the trajectory that matters: toward systems that reason within bounded, verifiable domains rather than merely generating statistically plausible text. Aristotle isn't proof of AGI, but it's evidence we're moving toward systems capable of contributing to genuine scientific and mathematical insight.
The Hype Versus Reality Gap
Yet the way this achievement has been reported illustrates a persistent problem in how AI progress gets communicated: the gap between what actually happened and what the headlines suggest.
"AI Solves 30-Year-Old Math Problem" is substantially accurate but misleading. "AI Solves Difficult Version of Erdős Problem" would be more precise but less exciting. The truth, "AI solves a previously-known-to-be-solvable variant of a genuinely difficult problem," doesn't generate engagement.
Tenev himself has been transparent about what happened. The mathematical community has been appropriately measured. But the secondary coverage has tended toward the sensational interpretation. This pattern repeats constantly in AI discourse: genuine breakthroughs are exaggerated, caveats are buried, and public understanding drifts toward an inflated sense of what systems can do.
What This Actually Signals
The more important point is what this achievement tells us about AI's genuine direction. We're likely to see far more of these "AI makes a novel discovery" headlines throughout 2026 as models begin to cross into domains that previously required specialist reasoning. Whether each claim stands up to scrutiny or not, the volume and ambition of these attempts will accelerate.
Taken together, they mark a steady move toward systems capable of contributing to genuine scientific and mathematical insight. The kind of progress that nudges us closer to something resembling the long-promised idea of AGI, even if the headlines get ahead of the reality.
For project professionals, this matters for a specific reason. The more AI systems prove capable of rigorous reasoning within bounded domains, the more they'll become genuinely helpful for project-specific work. Resource optimisation, risk prediction, scenario analysis, even aspects of strategic planning all benefit from systems that can reason carefully within constraints rather than merely generate fluent text.
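To see what "reasoning within constraints" means in practice, here is a minimal sketch of constrained resource allocation using scipy.optimize.linprog. The task names, rates, and hour limits are entirely hypothetical, and this is an illustration of the general idea, not a feature of any tool discussed here.

```python
# A minimal sketch of "reasoning within constraints": allocate hours to
# three project tasks to minimise cost while meeting hard limits.
# All rates and numbers below are illustrative, not from any real tool.
from scipy.optimize import linprog

# Cost per hour for each task's assigned resource (hypothetical rates).
costs = [80, 120, 95]

# Hard constraint: total hours across all tasks must not exceed 160.
A_ub = [[1, 1, 1]]
b_ub = [160]

# Each task needs at least its minimum estimate of hours.
bounds = [(40, None), (30, None), (50, None)]

result = linprog(c=costs, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
if result.success:
    # The solver returns an allocation that satisfies every constraint,
    # or reports infeasibility. It cannot "hallucinate" a schedule
    # that quietly violates the hours budget.
    print("Hours per task:", [round(x, 1) for x in result.x])
    print("Total cost:", round(result.fun, 2))
else:
    print("No feasible allocation:", result.message)
```

The point isn't the solver itself. It's that the output can be checked mechanically against the stated constraints, which is the project-world analogue of Lean checking a proof, and the opposite of fluent text you have to take on trust.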
But here's the critical caveat: just because AI systems improve at specialist reasoning doesn't mean they're ready to be trusted with high-stakes decisions without human oversight. Aristotle brilliantly solved a well-defined, formally checkable problem. That doesn't mean you should deploy an AI system to make autonomous decisions about your project strategy.
Applying This to Project Work
We've observed that project teams using AI tools tend to fall into one of two camps: those who scrutinise outputs carefully and build understanding before trusting results, and those who trust vendor marketing and assume tools will "just work" on their problems. The first group builds a sustainable advantage. The second encounters expensive failures. Aristotle's story teaches exactly this lesson: brilliant capability matched with honest limitations beats polished marketing every time.
When you're evaluating AI tools for project delivery, whether for scheduling, resource allocation, risk analysis, or forecasting, apply the Aristotle test. Ask what specific problem it actually solved, and demand to see how it's been independently verified. Require a clear explanation of its limitations. If vendors can't answer these questions directly, that's your signal to be deeply sceptical.
Our take on this: when you hear about AI breakthroughs in project delivery tools, whether better scheduling, more intelligent resource allocation, or improved risk prediction, resist the temptation to assume they work as the marketing suggests. Instead, think like the mathematical community evaluating Aristotle. What version of the problem did it actually solve? How has it been verified? What's being overstated? The vendors who can discuss their tools' genuine limitations alongside their capabilities are the ones worth trusting with your projects.
How should your organisation distinguish between genuine AI breakthroughs and hype? Subscribe to Project Flux for analysis that cuts through the noise.


