# ArtoriaZero

A decoder-only transformer that plays chess through pure pattern recognition; no search, no MCTS.

> Inspired by "Grandmaster-Level Chess Without Search" (Ruoss et al., 2024). This is an independent implementation, not the original paper's model.
## Features

- **No Search**: A pure neural-network policy; no MCTS, no alpha-beta, no search tree.
- **Transformer Architecture**: LLaMA-style decoder with RMSNorm, SwiGLU, and bidirectional attention.
- **Behavioral Cloning**: Trained to imitate strong players directly from millions of chess games.
- **Dual Head**: A policy head for move prediction plus a value head for position evaluation.
- **Multiple Scales**: Small (19M), Mid (100M), and Large (500M) parameter variants.
- **Instant Inference**: A single forward pass per move; no thinking time, no depth limits.
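The search-free move loop above reduces to one argmax over the policy head's logits, restricted to legal moves. A minimal sketch of that step (the function and argument names here are illustrative, not the repo's actual API):

```python
import math

def select_move(policy_logits, legal_move_ids):
    """Pick a move with a single argmax over legal moves.

    policy_logits: sequence of logits over the full move vocabulary,
    as produced by the policy head in one forward pass.
    legal_move_ids: indices of moves legal in the current position.
    No search tree, no rollouts: the highest-logit legal move wins.
    """
    best_id, best_logit = None, -math.inf
    for move_id in legal_move_ids:
        if policy_logits[move_id] > best_logit:
            best_id, best_logit = move_id, policy_logits[move_id]
    return best_id

# Toy example: 5-move vocabulary, only moves 1 and 3 are legal.
# Move 2 has the highest raw logit but is illegal, so move 3 is chosen.
print(select_move([2.0, 0.5, 3.0, 1.5, -1.0], [1, 3]))  # -> 3
```

Masking to legal moves is what lets a raw imitation policy play full games: the network never needs to learn the rules perfectly, only to rank the legal options.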
## Model Variants
| Variant | d_model | Layers | Heads | Params |
|---|---|---|---|---|
| Small | 256 | 8 | 8 | ~19M |
| Mid | 512 | 16 | 8 | ~100M |
| Large | 1024 | 40 | 32 | ~500M |
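The table above can be captured as a small config registry; the sketch below is hypothetical (the repo's actual config names may differ) and shows the one structural invariant the numbers must satisfy: `d_model` divides evenly across attention heads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    d_model: int
    n_layers: int
    n_heads: int

# Illustrative registry mirroring the Model Variants table.
VARIANTS = {
    "small": ModelConfig(d_model=256,  n_layers=8,  n_heads=8),
    "mid":   ModelConfig(d_model=512,  n_layers=16, n_heads=8),
    "large": ModelConfig(d_model=1024, n_layers=40, n_heads=32),
}

def head_dim(cfg: ModelConfig) -> int:
    # Per-head dimension; d_model must be a multiple of n_heads.
    assert cfg.d_model % cfg.n_heads == 0
    return cfg.d_model // cfg.n_heads

print(head_dim(VARIANTS["small"]))  # -> 32
print(head_dim(VARIANTS["mid"]))   # -> 64
```

Note that Small and Large share the same per-head dimension (32); Large scales by adding heads and layers rather than widening each head.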