In this lecture, we will (1) examine various definitions of intelligence, (2) explore how Othello can serve as a practical testbed for these definitions, and (3) discuss broader implications for AGI.
Why Study Intelligence Through Games?
Games provide a structured yet complex environment to study intelligence in a controlled manner. They offer:
Bounded complexity: Clear rules with well-defined goals
Progressive mastery: A path from novice to expert understanding
Strategic depth: Multiple layers of abstraction and planning
Measurable performance: Concrete metrics for improvement
Othello, in particular, offers a sweet spot of simplicity and depth. Its rules can be learned in minutes, but strategic mastery requires extensive experience—making it an ideal microcosm for studying how intelligence develops from basic rules to advanced abstract reasoning.
Understanding Intelligence
Intelligence remains one of the most fascinating yet elusive concepts in AI research. To frame our discussion on game intelligence and its relation to AGI, we’ll first examine several influential perspectives on intelligence:
Chollet’s Intelligence Metric (Chollet, 2019): Defines intelligence as “the rate at which a learner turns its experience and prior knowledge into new skills at valuable tasks that involve uncertainty and adaptation.” Chollet emphasizes skill-acquisition efficiency and generalization capability rather than task-specific performance.
Yann LeCun’s Autonomous Machine Intelligence (LeCun, 2022): Focuses on building systems with world models capable of planning, reasoning, and goal-directed behavior - elements critical for strategic games like Othello.
Brenden Lake’s Human-Like Learning (Lake et al., 2017): Argues that human-like learning should be rapid, adaptable, and built on causal models of the world—characteristics we might want in game-playing systems that truly understand their domain.
These perspectives offer complementary views on what constitutes “intelligence” - from adaptation efficiency to world modeling to human-like reasoning - all relevant to our exploration of game-playing intelligence.
Intelligence-Aligned Models
Several existing AI systems demonstrate aspects of intelligence that align with our definitions above:
AlphaZero (DeepMind) (Silver et al., 2017): Mastered chess, shogi, and Go through self-play reinforcement learning combined with Monte Carlo Tree Search, showcasing how an AI system can discover strategic concepts through experience without human knowledge.
Gato (DeepMind) (Reed et al., 2022): A generalist agent capable of performing hundreds of tasks across different modalities, demonstrating how a single model can generalize across diverse domains and tasks.
Meta’s Cicero (Meta AI, 2022): Achieved human-level performance in Diplomacy, a game requiring strategic reasoning, negotiation, and understanding of other players’ intentions.
Gemini 1.5 series (Google, 2024): Multimodal general-purpose models demonstrating strong reasoning and abstraction capabilities across diverse tasks.
These systems represent different approaches to intelligence - from specialized game expertise to multi-task generalization to sophisticated reasoning capabilities.
Othello as a Testbed for Intelligence
Othello (also known as Reversi) serves as an excellent testbed for exploring aspects of intelligence for several reasons:
Clear rules but complex strategy: While the rules can be learned in minutes, mastering strategic play requires significant experience and insight.
Bounded complexity: The 8x8 board provides enough complexity to be challenging while remaining computationally tractable.
Strategic depth: From opening theory to endgame calculation, Othello involves multiple layers of strategic thinking.
Understanding Othello
For those unfamiliar with the game, Othello is played on an 8x8 board. Players alternately place discs of their own color, aiming to have the majority of discs showing their color at the end of the game. A player captures the opponent’s discs by “sandwiching” them between two of their own along a horizontal, vertical, or diagonal line; captured discs are flipped to the capturing player’s color.
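The capture rule is easy to express in code. Below is a minimal sketch, assuming a board encoded as an 8x8 list of lists with 1 for the current player’s discs, -1 for the opponent’s, and 0 for empty squares; the function names (`flips_for_move`, `legal_moves`) are illustrative rather than from any standard library.

```python
# Minimal sketch of Othello's capture rule, assuming a board encoded as an
# 8x8 list of lists with 1 (current player), -1 (opponent), and 0 (empty).

DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def flips_for_move(board, row, col, player):
    """Return the list of opponent discs flipped by playing at (row, col)."""
    if board[row][col] != 0:
        return []  # square already occupied
    flipped = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        # Walk over a contiguous run of opponent discs...
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
            line.append((r, c))
            r, c = r + dr, c + dc
        # ...and capture it only if it is closed off by one of our own discs.
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            flipped.extend(line)
    return flipped

def legal_moves(board, player):
    """A move is legal iff it flips at least one opponent disc."""
    return [(r, c) for r in range(8) for c in range(8)
            if flips_for_move(board, r, c, player)]
```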
Advanced domain knowledge: Applying concepts like mobility, stability, parity, and tempo (each defined below, with a small code sketch after):
Mobility: the number of legal moves available to a player. Higher mobility gives you more options and flexibility, while restricting your opponent’s mobility limits their choices; players often try to maximize their own mobility while minimizing their opponent’s.
Stability: how secure or “stable” your discs are on the board. A stable disc cannot be flipped by your opponent for the remainder of the game. Edge and corner discs often become stable most easily; corners, once captured, can never be flipped.
Parity: the even/odd count of empty squares in regions of the board. The player who makes the last move in a region often has an advantage; if a region has an odd number of empty squares, the player who moves first into it can also make the last move there (assuming alternating play).
Tempo: who has the initiative and the timing of moves. Sometimes it is advantageous to force your opponent to make a particular move at a specific time; “gaining tempo” means creating a situation where your opponent must respond in a predictable way, giving you control over the flow of the game.
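Several of these concepts reduce to very small computations. The sketch below is illustrative rather than canonical: `mobility` reuses the `legal_moves` helper from the previous sketch, and the other helpers assume the same 1/-1/0 board encoding.

```python
def mobility(board, player):
    """Mobility: the number of legal moves available (reuses legal_moves above)."""
    return len(legal_moves(board, player))

def region_parity(empty_squares):
    """Parity: True if a region has an odd number of empty squares, meaning the
    side that moves first into it can also move last (assuming alternating play)."""
    return len(empty_squares) % 2 == 1

def corner_stability(board, player):
    """Corners can never be flipped once captured, so owned corners give a
    crude lower bound on a player's stable discs."""
    corners = [(0, 0), (0, 7), (7, 0), (7, 7)]
    return sum(1 for r, c in corners if board[r][c] == player)
```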
The question becomes:
Can we design AI systems that progress through these levels of understanding, and what would that tell us about their intelligence?
Othello-AI Design Considerations
To build an effective Othello-playing AI, we must consider multiple aspects of game understanding and strategic thinking:
1. Understanding Game Mechanics
How deeply must our AI understand the game’s operational principles?
Rule-based knowledge:
Importance of corners and edges
Value of stable discs (those that cannot be flipped)
Risk of X-squares and C-squares: X-squares are the diagonal squares adjacent to the corners (B2, B7, G2, and G7 on a standard 8×8 board); C-squares are the edge squares adjacent to the corners (A2, B1, A7, B8, G1, H2, G8, and H7). Both are generally considered risky, since occupying them prematurely can give the opponent access to the adjacent corner (the positional-weight sketch at the end of this subsection encodes these heuristics)
Parity (odd/even) strategy
Pattern recognition:
Opening theory and variations
Mid-game strategic patterns
Endgame optimization patterns
Program synthesis approach:
Perfect modeling of game rules
Optimized state space representation
Efficient move generation algorithms
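One classical way to encode this kind of rule-based knowledge is a positional weight table. The sketch below uses the same 1/-1/0 board encoding as the earlier examples; the specific weights are illustrative choices rather than tuned values, but they capture the heuristics above: corners prized, X- and C-squares penalized, edges mildly favored.

```python
# Hand-crafted positional weights: large positive values on corners, negative
# values on the X- and C-squares around them, mild bonuses along the edges.
WEIGHTS = [
    [100, -20,  10,   5,   5,  10, -20, 100],
    [-20, -40,  -5,  -5,  -5,  -5, -40, -20],
    [ 10,  -5,   3,   2,   2,   3,  -5,  10],
    [  5,  -5,   2,   1,   1,   2,  -5,   5],
    [  5,  -5,   2,   1,   1,   2,  -5,   5],
    [ 10,  -5,   3,   2,   2,   3,  -5,  10],
    [-20, -40,  -5,  -5,  -5,  -5, -40, -20],
    [100, -20,  10,   5,   5,  10, -20, 100],
]

def positional_score(board, player):
    """Sum of weights for the player's discs minus the opponent's."""
    return sum(WEIGHTS[r][c] * board[r][c]
               for r in range(8) for c in range(8)) * player
```

In practice such tables are often phase-dependent or learned from data rather than set by hand, but even a fixed table of this shape captures much of the corner-and-edge lore above.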
2. Learning from Experience
How much gameplay experience is necessary?
Self-play learning:
AlphaZero-style reinforcement learning (Silver et al., 2017): self-play combined with Monte Carlo Tree Search (MCTS), using a deep neural network that predicts both move probabilities (policy) and the expected outcome (value). During self-play, MCTS improves move selection by simulating future game states; after each game, the network is updated on MCTS visit counts (policy targets) and game outcomes (value targets). Iterating this loop continuously refines decision-making without human data, leading to superhuman performance in chess, Go, and shogi.
Policy and value network training
Experience replay for diverse situation exposure
Dataset construction:
Learning from expert game records
Situation-specific response databases
Decision patterns under time constraints
Reward modeling:
Balancing short-term gains (disc count) with long-term advantages (positional strength)
Phase-differentiated reward functions (a minimal sketch follows this list)
Adaptive reward systems
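As a concrete illustration of phase-differentiated evaluation, the sketch below blends positional and mobility terms (which matter early) with raw disc difference (which ultimately decides the game) according to how full the board is. It builds on `positional_score` and `mobility` from the earlier sketches, and the blending schedule and weights are illustrative assumptions, not tuned values.

```python
def disc_difference(board, player):
    """Raw material: the player's disc count minus the opponent's."""
    return sum(board[r][c] for r in range(8) for c in range(8)) * player

def phase(board):
    """Game phase in [0, 1]: fraction of the 64 squares already occupied."""
    filled = sum(1 for row in board for v in row if v != 0)
    return filled / 64

def evaluate(board, player):
    """Blend long-term positional signals early with material late."""
    t = phase(board)
    positional = positional_score(board, player)          # earlier sketch
    mob = mobility(board, player) - mobility(board, -player)
    material = disc_difference(board, player)
    # Early game: position and mobility dominate; endgame: disc count dominates.
    return (1 - t) * (positional + 5 * mob) + t * (100 * material)
```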
3. Strategic Depth
How deep must our AI’s thinking capabilities be?
Search depth and breadth:
Opening: Wide search for strategic direction
Middle game: Balanced search for tactical advantage
Endgame: Deep search for perfect calculation (a phase-aware search sketch appears at the end of this subsection)
Meta-strategy:
Opponent modeling and counter-strategy development
Time management strategies (identifying critical moves)
Risk-reward balancing
General intelligence approach:
Multi-time scale planning
Switching between abstraction levels (tactical ↔ strategic)
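The phase-dependent search idea can be made concrete with a plain negamax whose depth budget grows toward the endgame. This is a minimal sketch building on the earlier helpers (`legal_moves`, `flips_for_move`, `evaluate`, `phase`); the depth thresholds are illustrative, and alpha-beta pruning, transposition tables, and time management are omitted for brevity.

```python
def apply_move(board, move, player):
    """Return a new board with the move played and captured discs flipped."""
    r, c = move
    new = [row[:] for row in board]
    new[r][c] = player
    for fr, fc in flips_for_move(board, r, c, player):
        new[fr][fc] = player
    return new

def depth_budget(board):
    """Shallow-but-wide opening, balanced middle game, deep endgame."""
    t = phase(board)
    if t < 0.25:
        return 2
    if t < 0.8:
        return 4
    return 10  # endgame: search toward exact calculation

def negamax(board, player, depth):
    """Plain negamax; for simplicity a side with no moves is treated as a leaf
    (a full engine would pass the turn instead)."""
    moves = legal_moves(board, player)
    if depth == 0 or not moves:
        return evaluate(board, player), None
    best_score, best_move = float("-inf"), None
    for move in moves:
        score, _ = negamax(apply_move(board, move, player), -player, depth - 1)
        if -score > best_score:
            best_score, best_move = -score, move
    return best_score, best_move
```

Calling `negamax(board, player, depth_budget(board))` then returns both a score and the preferred move, with the effective lookahead shifting automatically across game phases.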
Some current LLMs, when prompted to play Othello, can demonstrate:
An interactive game environment with responsive interface
Solidly intermediate-level play adhering to game rules
An ability to create an interactive experience within its response framework
This difference in capabilities highlights the varying approaches to tool use and interactive content generation among current LLMs, and raises questions about how well language models can represent and reason about spatial and strategic game information.
Conclusion
The journey from specialized game intelligence to artificial general intelligence requires several key developments:
Adaptability: The ability to transfer knowledge between similar domains with minimal adjustment
Abstraction: The capacity to extract general principles from specific experiences
Meta-learning: The capability to “learn how to learn” new tasks efficiently
Othello provides an excellent starting point for this journey - complex enough to require sophisticated strategic thinking, yet simple enough to allow us to track an AI system’s progression from basic rule-following to advanced strategic thinking.
The path forward involves creating systems that can not only master individual games but understand the underlying patterns that connect different strategic challenges. This might involve combining traditional search algorithms with modern neural approaches, embedding both in a meta-learning framework that allows for transfer across domains.
By studying how AI systems develop mastery in constrained environments like Othello, we gain insights into the nature of intelligence itself - insights that may guide us toward creating truly general artificial intelligence.
References
For those interested in diving deeper into Othello and games, the works cited above are:
Chollet, F. (2019). On the Measure of Intelligence. arXiv:1911.01547.
Gemini Team, Google (2024). Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context. arXiv:2403.05530.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building Machines That Learn and Think Like People. Behavioral and Brain Sciences, 40, e253.
LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.
Meta Fundamental AI Research Diplomacy Team (FAIR) et al. (2022). Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning. Science, 378(6624), 1067-1074.
Reed, S., et al. (2022). A Generalist Agent. Transactions on Machine Learning Research.
Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815.