In this lecture, we will (1) examine various definitions of intelligence, (2) explore how Othello can serve as a practical testbed for these definitions, and (3) discuss broader implications for AGI.
Why Study Intelligence Through Games?
Games provide a structured yet complex environment to study intelligence in a controlled manner. They offer:
Bounded complexity: Clear rules with well-defined goals
Progressive mastery: A path from novice to expert understanding
Strategic depth: Multiple layers of abstraction and planning
Measurable performance: Concrete metrics for improvement
Othello, in particular, offers a sweet spot of simplicity and depth. Its rules can be learned in minutes, but strategic mastery requires extensive experience—making it an ideal microcosm for studying how intelligence develops from basic rules to advanced abstract reasoning.
Understanding Intelligence
Intelligence remains one of the most fascinating yet elusive concepts in AI research. To frame our discussion on game intelligence and its relation to AGI, we’ll first examine several influential perspectives on intelligence:
Chollet’s Intelligence Metric (Chollet, 2019): Defines intelligence as “the rate at which a learner turns its experience and prior knowledge into new skills at valuable tasks that involve uncertainty and adaptation.” Chollet emphasizes skill-acquisition efficiency and generalization capability rather than task-specific performance.
Yann LeCun’s Autonomous Machine Intelligence (LeCun, 2022): Focuses on building systems with world models capable of planning, reasoning, and goal-directed behavior - elements critical for strategic games like Othello.
Brenden Lake’s Human-Like Learning (Lake et al., 2017): Argues that human-like learning should be rapid, adaptable, and built on causal models of the world—characteristics we might want in game-playing systems that truly understand their domain.
These perspectives offer complementary views on what constitutes “intelligence” - from adaptation efficiency to world modeling to human-like reasoning - all relevant to our exploration of game-playing intelligence.
Intelligence-Aligned Models
Several existing AI systems demonstrate aspects of intelligence that align with our definitions above:
AlphaZero (DeepMind) (Silver et al., 2017): Mastered chess, shogi, and Go through self-play reinforcement learning combined with Monte Carlo Tree Search, showcasing how an AI system can discover strategic concepts through experience without human knowledge.
Gato (DeepMind) (Reed et al., 2022): A generalist agent capable of performing hundreds of tasks across different modalities, demonstrating how a single model can generalize across diverse domains and tasks.
Meta’s Cicero (Meta AI, 2022): Achieved human-level performance in Diplomacy, a game requiring strategic reasoning, negotiation, and understanding of other players’ intentions.
Gemini 1.5 series (Google, 2024): Multimodal general-purpose models demonstrating strong reasoning and abstraction capabilities across diverse tasks.
These systems represent different approaches to intelligence - from specialized game expertise to multi-task generalization to sophisticated reasoning capabilities.
Othello as a Testbed for Intelligence
Othello (also known as Reversi) serves as an excellent testbed for exploring aspects of intelligence for several reasons:
Clear rules but complex strategy: While the rules can be learned in minutes, mastering strategic play requires significant experience and insight.
Bounded complexity: The 8x8 board provides enough complexity to be challenging while remaining computationally tractable.
Strategic depth: From opening theory to endgame calculation, Othello involves multiple layers of strategic thinking.
Understanding Othello
For those unfamiliar with the game, Othello is played on an 8x8 board. Players alternately place discs of their own color, aiming to have the majority of discs showing their color at the end of the game. A player captures the opponent’s discs by “sandwiching” them between two of their own along a horizontal, vertical, or diagonal line; captured discs are flipped to the capturing player’s color.
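The capture rule is easy to express in code. Below is a minimal sketch, assuming a board encoded as an 8x8 list of lists with 1 for the current player’s discs, -1 for the opponent’s, and 0 for empty squares; the function names (`flips_for_move`, `legal_moves`) are illustrative rather than from any standard library.

```python
# Minimal sketch of Othello's capture rule, assuming a board encoded as an
# 8x8 list of lists with 1 (current player), -1 (opponent), and 0 (empty).

DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def flips_for_move(board, row, col, player):
    """Return the list of opponent discs flipped by playing at (row, col)."""
    if board[row][col] != 0:
        return []  # square already occupied
    flipped = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        # Walk over a contiguous run of opponent discs...
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
            line.append((r, c))
            r, c = r + dr, c + dc
        # ...and capture it only if it is closed off by one of our own discs.
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            flipped.extend(line)
    return flipped

def legal_moves(board, player):
    """A move is legal iff it flips at least one opponent disc."""
    return [(r, c) for r in range(8) for c in range(8)
            if flips_for_move(board, r, c, player)]
```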
Advanced domain knowledge: Applying concepts like mobility, stability, parity, and tempo (each defined below, with a small code sketch after):
Mobility: the number of legal moves available to a player. Higher mobility gives you more options and flexibility, while restricting your opponent’s mobility limits their choices; players often try to maximize their own mobility while minimizing their opponent’s.
Stability: how secure or “stable” your discs are on the board. A stable disc cannot be flipped by your opponent for the remainder of the game. Edge and corner discs often become stable most easily; corners, once captured, can never be flipped.
Parity: the even/odd count of empty squares in regions of the board. The player who makes the last move in a region often has an advantage; if a region has an odd number of empty squares, the player who moves first into it can also make the last move there (assuming alternating play).
Tempo: who has the initiative and the timing of moves. Sometimes it is advantageous to force your opponent to make a particular move at a specific time; “gaining tempo” means creating a situation where your opponent must respond in a predictable way, giving you control over the flow of the game.
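Several of these concepts reduce to very small computations. The sketch below is illustrative rather than canonical: `mobility` reuses the `legal_moves` helper from the previous sketch, and the other helpers assume the same 1/-1/0 board encoding.

```python
def mobility(board, player):
    """Mobility: the number of legal moves available (reuses legal_moves above)."""
    return len(legal_moves(board, player))

def region_parity(empty_squares):
    """Parity: True if a region has an odd number of empty squares, meaning the
    side that moves first into it can also move last (assuming alternating play)."""
    return len(empty_squares) % 2 == 1

def corner_stability(board, player):
    """Corners can never be flipped once captured, so owned corners give a
    crude lower bound on a player's stable discs."""
    corners = [(0, 0), (0, 7), (7, 0), (7, 7)]
    return sum(1 for r, c in corners if board[r][c] == player)
```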
The question becomes:
Can we design AI systems that progress through these levels of understanding, and what would that tell us about their intelligence?
Othello-AI Design Considerations
To build an effective Othello-playing AI, we must consider multiple aspects of game understanding and strategic thinking:
1. Understanding Game Mechanics
How deeply must our AI understand the game’s operational principles?
Rule-based knowledge:
Importance of corners and edges
Value of stable discs (those that cannot be flipped)
Risk of X-squares and C-squares: X-squares are the diagonal squares adjacent to the corners (B2, B7, G2, and G7 on a standard 8×8 board); C-squares are the edge squares adjacent to the corners (A2, B1, A7, B8, G1, H2, G8, and H7). Both are generally considered risky, since occupying them prematurely can give the opponent access to the adjacent corner (the positional-weight sketch at the end of this subsection encodes these heuristics)
Parity (odd/even) strategy
Pattern recognition:
Opening theory and variations
Mid-game strategic patterns
Endgame optimization patterns
Program synthesis approach:
Perfect modeling of game rules
Optimized state space representation
Efficient move generation algorithms
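One classical way to encode this kind of rule-based knowledge is a positional weight table. The sketch below uses the same 1/-1/0 board encoding as the earlier examples; the specific weights are illustrative choices rather than tuned values, but they capture the heuristics above: corners prized, X- and C-squares penalized, edges mildly favored.

```python
# Hand-crafted positional weights: large positive values on corners, negative
# values on the X- and C-squares around them, mild bonuses along the edges.
WEIGHTS = [
    [100, -20,  10,   5,   5,  10, -20, 100],
    [-20, -40,  -5,  -5,  -5,  -5, -40, -20],
    [ 10,  -5,   3,   2,   2,   3,  -5,  10],
    [  5,  -5,   2,   1,   1,   2,  -5,   5],
    [  5,  -5,   2,   1,   1,   2,  -5,   5],
    [ 10,  -5,   3,   2,   2,   3,  -5,  10],
    [-20, -40,  -5,  -5,  -5,  -5, -40, -20],
    [100, -20,  10,   5,   5,  10, -20, 100],
]

def positional_score(board, player):
    """Sum of weights for the player's discs minus the opponent's."""
    return sum(WEIGHTS[r][c] * board[r][c]
               for r in range(8) for c in range(8)) * player
```

In practice such tables are often phase-dependent or learned from data rather than set by hand, but even a fixed table of this shape captures much of the corner-and-edge lore above.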
2. Learning from Experience
How much gameplay experience is necessary?
Self-play learning:
AlphaZero-style reinforcement learning (Silver et al., 2017): self-play combined with Monte Carlo Tree Search (MCTS), using a deep neural network that predicts both move probabilities (policy) and the expected outcome (value). During self-play, MCTS improves move selection by simulating future game states; after each game, the network is updated on MCTS visit counts (policy targets) and game outcomes (value targets). Iterating this loop continuously refines decision-making without human data, leading to superhuman performance in chess, Go, and shogi.
Policy and value network training
Experience replay for diverse situation exposure
Dataset construction:
Learning from expert game records
Situation-specific response databases
Decision patterns under time constraints
Reward modeling:
Balancing short-term gains (disc count) with long-term advantages (positional strength)
Phase-differentiated reward functions (a minimal sketch follows this list)
Adaptive reward systems
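As a concrete illustration of phase-differentiated evaluation, the sketch below blends positional and mobility terms (which matter early) with raw disc difference (which ultimately decides the game) according to how full the board is. It builds on `positional_score` and `mobility` from the earlier sketches, and the blending schedule and weights are illustrative assumptions, not tuned values.

```python
def disc_difference(board, player):
    """Raw material: the player's disc count minus the opponent's."""
    return sum(board[r][c] for r in range(8) for c in range(8)) * player

def phase(board):
    """Game phase in [0, 1]: fraction of the 64 squares already occupied."""
    filled = sum(1 for row in board for v in row if v != 0)
    return filled / 64

def evaluate(board, player):
    """Blend long-term positional signals early with material late."""
    t = phase(board)
    positional = positional_score(board, player)          # earlier sketch
    mob = mobility(board, player) - mobility(board, -player)
    material = disc_difference(board, player)
    # Early game: position and mobility dominate; endgame: disc count dominates.
    return (1 - t) * (positional + 5 * mob) + t * (100 * material)
```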
3. Strategic Depth
How deep must our AI’s thinking capabilities be?
Search depth and breadth:
Opening: Wide search for strategic direction
Middle game: Balanced search for tactical advantage
Endgame: Deep search for perfect calculation (a phase-aware search sketch appears at the end of this subsection)
Meta-strategy:
Opponent modeling and counter-strategy development
Time management strategies (identifying critical moves)
Risk-reward balancing
General intelligence approach:
Multi-time scale planning
Switching between abstraction levels (tactical ↔ strategic)
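The phase-dependent search idea can be made concrete with a plain negamax whose depth budget grows toward the endgame. This is a minimal sketch building on the earlier helpers (`legal_moves`, `flips_for_move`, `evaluate`, `phase`); the depth thresholds are illustrative, and alpha-beta pruning, transposition tables, and time management are omitted for brevity.

```python
def apply_move(board, move, player):
    """Return a new board with the move played and captured discs flipped."""
    r, c = move
    new = [row[:] for row in board]
    new[r][c] = player
    for fr, fc in flips_for_move(board, r, c, player):
        new[fr][fc] = player
    return new

def depth_budget(board):
    """Shallow-but-wide opening, balanced middle game, deep endgame."""
    t = phase(board)
    if t < 0.25:
        return 2
    if t < 0.8:
        return 4
    return 10  # endgame: search toward exact calculation

def negamax(board, player, depth):
    """Plain negamax; for simplicity a side with no moves is treated as a leaf
    (a full engine would pass the turn instead)."""
    moves = legal_moves(board, player)
    if depth == 0 or not moves:
        return evaluate(board, player), None
    best_score, best_move = float("-inf"), None
    for move in moves:
        score, _ = negamax(apply_move(board, move, player), -player, depth - 1)
        if -score > best_score:
            best_score, best_move = -score, move
    return best_score, best_move
```

Calling `negamax(board, player, depth_budget(board))` then returns both a score and the preferred move, with the effective lookahead shifting automatically across game phases.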
Some current LLMs, when prompted to play Othello, can demonstrate:
An interactive game environment with responsive interface
Solidly intermediate-level play adhering to game rules
An ability to create an interactive experience within its response framework
This difference in capabilities highlights the varying approaches to tool use and interactive content generation among current LLMs, and raises questions about how well language models can represent and reason about spatial and strategic game information.
Conclusion
The journey from specialized game intelligence to artificial general intelligence requires several key developments:
Adaptability: The ability to transfer knowledge between similar domains with minimal adjustment
Abstraction: The capacity to extract general principles from specific experiences
Meta-learning: The capability to “learn how to learn” new tasks efficiently
Othello provides an excellent starting point for this journey - complex enough to require sophisticated strategic thinking, yet simple enough to allow us to track an AI system’s progression from basic rule-following to advanced strategic thinking.
The path forward involves creating systems that can not only master individual games but understand the underlying patterns that connect different strategic challenges. This might involve combining traditional search algorithms with modern neural approaches, embedding both in a meta-learning framework that allows for transfer across domains.
By studying how AI systems develop mastery in constrained environments like Othello, we gain insights into the nature of intelligence itself - insights that may guide us toward creating truly general artificial intelligence.
References
For those interested in diving deeper into Othello and games, the works cited above are:
Chollet, F. (2019). On the Measure of Intelligence. arXiv:1911.01547.
Gemini Team, Google (2024). Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context. arXiv:2403.05530.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building Machines That Learn and Think Like People. Behavioral and Brain Sciences, 40, e253.
LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.
Meta Fundamental AI Research Diplomacy Team (FAIR) et al. (2022). Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning. Science, 378(6624), 1067-1074.
Reed, S., et al. (2022). A Generalist Agent. Transactions on Machine Learning Research.
Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815.