Artoria Zero

Overview

Artoria Zero is a chess engine that plays at a strong level without any search algorithm. Unlike traditional engines such as Stockfish, which rely on alpha-beta search, or AlphaZero, which relies on Monte Carlo tree search (MCTS), Artoria predicts the best move in a single forward pass through a neural network.

Based on the approach described in “Grandmaster-Level Chess Without Search” (arXiv:2402.04494), the model is trained via behavioral cloning — learning to imitate moves played by strong human players.
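
To make "a single forward pass" concrete, here is a minimal sketch of how a move could be chosen from the policy output with no search. The function name, the toy vocabulary, and the legal-move masking step are assumptions for illustration; the source does not say whether illegal moves are filtered.

```python
import math

def pick_move(logits, move_vocab, legal_moves):
    """Return the legal move with the highest policy logit, or None.

    logits      -- list of floats, index-aligned with move_vocab
    move_vocab  -- list of UCI move strings (hypothetical vocabulary)
    legal_moves -- set of UCI strings legal in the current position
    """
    best_move, best_score = None, -math.inf
    for move, score in zip(move_vocab, logits):
        # Assumption: moves that are illegal in this position are skipped.
        if move in legal_moves and score > best_score:
            best_move, best_score = move, score
    return best_move

# Toy example with a 4-move vocabulary instead of ~4544 classes.
vocab = ["e2e4", "d2d4", "g1f3", "e7e5"]
logits = [1.2, 0.7, 0.9, 3.0]  # e7e5 scores highest but is not legal for White
chosen = pick_move(logits, vocab, {"e2e4", "d2d4", "g1f3"})
```

No tree is expanded and no position is evaluated more than once: the move is simply the highest-scoring legal class.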

Architecture

FEN String (board state)
    |
    v
[ASCII Tokenizer] -> 79 tokens
    |
    v
[Token Embedding] + [Positional Embedding]
    |
    v
[N x Transformer Block]
  ├─ RMSNorm -> Multi-Head Attention (bidirectional)
  └─ RMSNorm -> SwiGLU FFN
    |
    v
[Mean Pooling] -> board representation
    |
    ├──> [Policy Head] -> move logits (~4544 classes)
    └──> [Value Head]  -> position eval [-1, 1]
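
The first stage of the diagram can be sketched as follows. This is only one plausible reading of "ASCII Tokenizer": each FEN character becomes its ASCII code point and the sequence is right-padded to 79 tokens. The padding id, the padding side, and the error on over-long input are assumptions, not details taken from the source.

```python
SEQ_LEN = 79  # sequence length shown in the diagram
PAD_ID = 0    # hypothetical padding id; ASCII NUL never occurs in a FEN

def tokenize_fen(fen, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Map a FEN string to a fixed-length list of ASCII code points."""
    if not fen.isascii():
        raise ValueError("FEN must contain only ASCII characters")
    ids = [ord(ch) for ch in fen]
    if len(ids) > seq_len:
        raise ValueError(f"FEN has {len(ids)} characters, limit is {seq_len}")
    # Right-pad so every position yields the same sequence length.
    return ids + [pad_id] * (seq_len - len(ids))

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
tokens = tokenize_fen(start)
```

A fixed length lets the positional embedding table have exactly one row per token slot.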

Key Design Choices

- No causal masking: the model sees the entire board state at once (bidirectional attention), unlike causal language models.
- RMSNorm + SwiGLU: LLaMA-style pre-normalization and gated linear units for stable deep training.
- Mean pooling: the whole sequence is pooled into a single board representation, not just the last token.
- Dual head: policy loss (cross-entropy) and value loss (MSE) are trained jointly.
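
Three of the choices above are small enough to write out in plain Python. The sketch below is illustrative only: the learned RMSNorm gain is omitted, and the `value_weight` parameter is an assumption, since the source does not say how the two losses are weighted.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMSNorm without the learned gain: divide by the root mean square."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def mean_pool(token_vectors):
    """Average per-token vectors into one board representation."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against a single target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target_index]

def joint_loss(policy_logits, target_move, value_pred, value_target,
               value_weight=1.0):
    """Policy cross-entropy plus weighted squared error on the value."""
    policy = cross_entropy(policy_logits, target_move)
    value = (value_pred - value_target) ** 2
    return policy + value_weight * value

normed = rms_norm([3.0, 4.0])
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
loss = joint_loss([0.0, 0.0], 0, 0.5, 1.0)
```

Because both heads read the same pooled vector, one backward pass through the summed loss updates the shared transformer for both objectives.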

Training

Models are trained on the Lichess standard chess games dataset, which is streamed during training. Each position in each game becomes a training sample: the input is the position's FEN and the policy target is the move actually played. The game result provides the value target (1.0 for a White win, -1.0 for a Black win, 0.0 for a draw).
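
The mapping from a game to training samples can be sketched as below. It assumes the positions have already been extracted as (FEN, UCI move) pairs; that input format, the function name, and the choice to skip games without a decisive or drawn result are assumptions for illustration.

```python
# Value targets as described above: fixed per game, from the game result.
RESULT_TO_VALUE = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}

def game_to_samples(positions, result):
    """Turn one game into (fen, move, value) training samples.

    positions -- list of (fen, uci_move) pairs, one per ply (hypothetical format)
    result    -- PGN result string: "1-0", "0-1", or "1/2-1/2"
    """
    if result not in RESULT_TO_VALUE:
        return []  # assumption: unfinished games ("*") are skipped
    value = RESULT_TO_VALUE[result]
    # Every position in the game shares the same value target.
    return [(fen, move, value) for fen, move in positions]

game = [
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", "e2e4"),
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", "e7e5"),
]
samples = game_to_samples(game, "1-0")
```

Each sample carries one target per head: the move feeds the policy loss and the value feeds the value loss.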
