About Artoria Zero
Overview
Artoria Zero is a chess engine that plays at a strong level without any search algorithm. Unlike traditional engines such as Stockfish (alpha-beta search) or AlphaZero (Monte Carlo tree search), Artoria predicts the best move in a single forward pass through a neural network.
Based on the approach described in “Grandmaster-Level Chess Without Search” (arXiv:2402.04494), the model is trained via behavioral cloning: it learns to imitate the moves played by strong human players.
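That single forward pass starts from a fixed-length encoding of the FEN string. As a minimal sketch of such an ASCII tokenizer (the character vocabulary, padding scheme, and pad id here are illustrative assumptions, not the actual Artoria implementation; only the 79-token length comes from the architecture below):

```python
SEQ_LEN = 79  # fixed token count fed to the model (see architecture below)

# Hypothetical vocabulary: every character a standard FEN can contain
# (pieces, digits, rank separators, files, side to move, '-' and space).
VOCAB = sorted(set("pnbrqkPNBRQK/0123456789abcdefgh -w"))
PAD_ID = 0
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}  # 0 reserved for padding

def tokenize_fen(fen: str) -> list[int]:
    """Map each FEN character to an integer id and right-pad to SEQ_LEN."""
    ids = [CHAR_TO_ID[ch] for ch in fen]
    assert len(ids) <= SEQ_LEN, "FEN longer than the model's fixed sequence"
    return ids + [PAD_ID] * (SEQ_LEN - len(ids))

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
tokens = tokenize_fen(start)  # always exactly 79 ids, padded at the end
```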
Architecture
```
FEN String (board state)
        |
        v
[ASCII Tokenizer] -> 79 tokens
        |
        v
[Token Embedding] + [Positional Embedding]
        |
        v
[N x Transformer Block]
  ├─ RMSNorm -> Multi-Head Attention (bidirectional)
  └─ RMSNorm -> SwiGLU FFN
        |
        v
[Mean Pooling] -> board representation
        |
  ├──> [Policy Head] -> move logits (~4544 classes)
  └──> [Value Head] -> position eval [-1, 1]
```

Key Design Choices
- No causal masking: the model attends over the entire board state at once (bidirectional attention), unlike autoregressive language models.
- RMSNorm + SwiGLU: LLaMA-style pre-normalization and gated linear units for stable training of deep stacks.
- Mean pooling: the whole token sequence is averaged into a single board representation, rather than taking only the last token.
- Dual head: policy loss (cross-entropy over moves) and value loss (MSE) are trained jointly.
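To make these choices concrete, here is one pre-norm transformer block plus the pooled dual head, written in NumPy purely for clarity. All dimensions, random weights, and layer sizes are illustrative assumptions; only the 79-token sequence, the ~4544 move classes, and the component ordering come from the architecture above.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, SEQ, N_MOVES = 64, 4, 79, 4544  # model dim, heads, tokens, move classes

def rms_norm(x, eps=1e-6):
    # RMSNorm: rescale by root-mean-square, no mean subtraction (LLaMA-style)
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, wq, wk, wv, wo):
    # Bidirectional multi-head attention: no causal mask, so every token
    # attends to the whole board state at once.
    q, k, v = x @ wq, x @ wk, x @ wv                    # each (SEQ, D)
    q = q.reshape(SEQ, H, D // H).transpose(1, 0, 2)    # (H, SEQ, D//H)
    k = k.reshape(SEQ, H, D // H).transpose(1, 0, 2)
    v = v.reshape(SEQ, H, D // H).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D // H)  # (H, SEQ, SEQ)
    out = softmax(scores) @ v                            # (H, SEQ, D//H)
    return out.transpose(1, 0, 2).reshape(SEQ, D) @ wo

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: SiLU(x W_gate) elementwise-gates (x W_up), then project down
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))
    return (silu * (x @ w_up)) @ w_down

p = lambda *s: rng.normal(0, 0.02, s)  # random illustrative weights
wq, wk, wv, wo = p(D, D), p(D, D), p(D, D), p(D, D)
wg, wu, wd = p(D, 4 * D), p(D, 4 * D), p(4 * D, D)
w_policy, w_value = p(D, N_MOVES), p(D, 1)

x = p(SEQ, D)                                   # embedded token sequence
x = x + attention(rms_norm(x), wq, wk, wv, wo)  # pre-norm residual block
x = x + swiglu_ffn(rms_norm(x), wg, wu, wd)

board = x.mean(axis=0)                 # mean pooling over all 79 tokens
policy_logits = board @ w_policy       # logits over ~4544 move classes
value = np.tanh(board @ w_value)[0]    # position eval squashed to [-1, 1]
```

The real model stacks N such blocks; the sketch runs one to show the data flow from token sequence to pooled board representation to the two heads.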
Training
Models are trained on the Lichess standard chess games dataset, streamed rather than downloaded in full. Each position in each game becomes a training sample: the input is the FEN and the target is the move actually played. The game result provides the value target (1.0 for a white win, -1.0 for a black win, 0.0 for a draw).
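A minimal sketch of turning one streamed game into samples as described above. The record fields (`fens`, `moves`, `result`) are hypothetical names for illustration; only the (FEN, played move, result-derived value) sample shape comes from the text.

```python
# Standard PGN result strings mapped to the value target described above.
RESULT_TO_VALUE = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}

def game_to_samples(fens, moves, result):
    """Each position yields one (input FEN, target move, value target) tuple.

    `fens[i]` is assumed to be the position *before* `moves[i]` was played;
    every sample in a game shares the same result-derived value target.
    """
    value = RESULT_TO_VALUE[result]
    return [(fen, move, value) for fen, move in zip(fens, moves)]

samples = game_to_samples(
    fens=["rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"],
    moves=["e2e4"],
    result="1-0",
)
# each sample: (FEN string, move actually played, value target 1.0)
```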
Links
- Models: Shinapri/artoria-zero
- Source: ShinapriLN/artoria
- Paper: arXiv:2402.04494