AlphaProof

An agent that taught itself mathematics in Lean and achieved a silver-medal standard at the International Mathematical Olympiad (IMO). AlphaProof started from a pre-trained LLM that was already proficient in mathematics. It then embarked on a lifelong reinforcement learning journey: proving and disproving theorems, learning from the results, and steadily improving. During the IMO competition, AlphaProof first generated variations of the competition problems. It then ran a second, test-time reinforcement learning phase, training on these variations while attempting to prove the main problems. As this phase progressed, its understanding of the problems improved. This allowed it to solve all the number theory and algebra problems, including Problem 6 (P6), which only five human contestants managed to solve.

Read the paper in Nature, learn more on our blog, or see the coverage in Nature, New Scientist, MIT Technology Review, The New York Times, Semafor, Fortune, and The Web.

PuzzleGen

Using RL and generative models to discover creative chess puzzles 🔊♟️♟️

Strong chess players intuitively recognize the beauty of a position, yet articulating the precise elements that make a puzzle creative remains elusive. To address this, we pre-trained generative models on public datasets and then applied reinforcement learning with novel rewards designed to encourage uniqueness, counter-intuitiveness, realism, and novelty. This approach doubled the number of novel chess puzzles compared with the original training data, while preserving aesthetic diversity.

Three distinguished experts evaluated the puzzles and selected the ones they found most compelling: International Master of chess composition Amatzia Avni (author of "Creative Chess"), Grandmaster Jonathan Levitt (author of "Secrets of Spectacular Chess"), and Grandmaster Matthew Sadler (author of "Game Changer"). They favored puzzles with original, paradoxical, and surprising positions that could plausibly arise in real games. They gave particular weight to puzzles that combined aesthetic themes in innovative ways and demanded exceptional over-the-board vision.

Paper, booklet & review, X, LinkedIn, coverage by GothamChess (watch here), chess.com (story, YouTube, puzzles), and lichess (blog, puzzles).

AlphaZero db

Artificial Intelligence (AI) systems have surpassed human performance in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. We explored whether AI systems can benefit from creative decision-making mechanisms when pushed to the limits of their computational rationality. AlphaZero db is a league of AlphaZero agents represented via a latent-conditioned architecture. It is trained with quality-diversity techniques to generate a wider range of ideas, and then selects the most promising ones with sub-additive planning. AlphaZero db plays chess in diverse ways, solves more puzzles as a group, and outperforms a more homogeneous team. Read more in our preprint or in Quanta.

Tom Zahavy

tomzahavy (at) gmail (dot) com