Polynomial Sudoku Solving
and the P/NP Sandwich

Charles Dana¹,² · Claude (Anthropic)
¹Tulane University · ²Monce SAS
March 2026 — Living Document

Abstract

We present polyai, a zero-dependency polynomial Sudoku solver for universal n²×n² grids. We decompose the Sudoku solution space into a P/NP/P sandwich: polynomial techniques resolve the grid from both ends of the deletion spectrum, leaving an NP strip whose relative width, 1 − 1/n − 2/n³, approaches 1 as n grows. We implement 14 techniques across 4 zones — P-Left, CDCL, NP, and P-Right — including a budgeted CDCL solver parameterized by exponent a and capped at O(nᵃ) backtracking nodes. Empirically, CDCL at a = 5 dissolves the entire NP strip for n = 3 (9×9), and a = 7 does so for n = 4 (16×16). At n = 5 (25×25), the deep middle of the NP strip resists all polynomial budgets tested, establishing a concrete empirical boundary between polynomial tractability and combinatorial hardness. The solver is 2365 lines of pure Python with zero dependencies.

1. Introduction

Sudoku is NP-complete for general n²×n² grids [1]. The standard conclusion is that no polynomial-time algorithm can solve all instances unless P = NP. We take a different approach: instead of asking whether Sudoku is "hard" or "easy," we ask how much of the solution space is actually hard.

The answer is structured. Near-full grids (few deletions) yield to constraint propagation. Near-empty grids (few givens) yield to symmetry-group search. Between these two polynomial zones lies an NP strip whose width we can measure exactly as a function of n. This is the sandwich.

The key innovation is cdcl(a) — a budgeted backtracking solver that takes the polynomial exponent as a parameter. The user chooses how much time to invest: O(n⁴) for light probing, O(n⁸) for aggressive search. The budget is a hard cap on DPLL nodes, making the solver polynomial by construction for any fixed a. This turns the P/NP boundary from a binary classification into a continuous dial.

2. The P/NP/P Sandwich

2.1 Deletion Spectrum

An n²×n² Sudoku grid has n⁴ cells. A puzzle with d deletions retains n⁴ − d givens. The deletion count parameterizes difficulty along a single axis:

d = 0              → full grid (trivial)
d = n⁴             → empty grid (trivial — any valid grid works)
d ∈ [n³, n⁴−2n]    → NP strip (hard)

2.2 Four Zones

We identify four zones on the deletion spectrum, each with distinct algorithmic character:

Zone    | Deletion range | Width            | Character
P-Left  | 0 → n³         | n³               | BCP cascade, subsets, fish, coloring, greedy
CDCL    | n³ → n³+δ      | δ (depends on a) | Budgeted DPLL — polynomial backtracking
NP      | n³+δ → n⁴−2n   | n⁴−n³−2n−δ       | Walks, dart, stochastic search
P-Right | n⁴−2n → n⁴     | 2n               | Symmetry group search, seed fill

Total polynomial coverage without CDCL: (n³ + 2n) / n⁴ = 1/n + 2/n³. CDCL extends this by consuming δ additional deletions from the left boundary, where δ depends on the exponent a and the instance structure.
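The zone boundaries can be written down directly. A minimal sketch — `zone_of` and `poly_coverage` are illustrative names (not the polyai routing code), and δ is taken as an input rather than derived from a:

```python
def zone_of(d, n, delta=0):
    """Classify a puzzle on the deletion spectrum of an n^2 x n^2 grid.
    d = deletion count; delta = extra strip width claimed by cdcl(a)."""
    if d <= n ** 3:
        return "p-left"
    if d <= n ** 3 + delta:
        return "cdcl"
    if d < n ** 4 - 2 * n:
        return "np"
    return "p-right"

def poly_coverage(n):
    """Fraction of the spectrum covered without CDCL: 1/n + 2/n^3."""
    return (n ** 3 + 2 * n) / n ** 4
```

For n = 3 the coverage is 33/81 ≈ 0.407; by n = 5 it drops to 135/625 = 0.216 — the NP strip dominates as n grows.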

3. Technique Ladder

3.1 P-Left: Constraint Propagation (Tiers 1–6)

The P-Left zone implements six tiers of polynomial deduction, each strictly more powerful than the last:

Tier | Technique                 | Complexity     | Mechanism
1    | BCP-0                     | O(n⁴)          | Naked + hidden singles cascade (unit propagation)
2    | Naked/hidden subsets      | O(n²·C(n²,k))  | Pairs, triples, quads — domain intersection
3    | Pointing / box-line       | O(n⁴)          | Cross-group constraint tightening
4    | Fish (X-Wing..Jellyfish)  | O(n²·C(n²,k))  | Row-column cover sets
5    | Simple coloring           | O(n⁴)          | Graph 2-coloring on conjugate pairs
6    | XY-Wing                   | O(n⁶)          | Triple-cell bivalue inference chain

Additionally, greedy_assert (O(n⁴)) and greedy_multi (O(n⁴·tries)) provide fast-path solving for near-full grids where heuristic fill + validation suffices.

BCP-1 (O(n⁶)) operates at the boundary: for each unsolved cell, it probes each candidate value, triggers a full BCP cascade, and harvests contradictions as safe eliminations. This is equivalent to depth-1 DPLL without backtracking.

3.2 CDCL: Budgeted Polynomial Backtracking

The central contribution. cdcl(a) is a DPLL solver with a hard node budget of nᵃ. Each "node" is one speculative assignment followed by a full BCP cascade (hidden singles to fixpoint).

Algorithm: cdcl(sudoku, exponent=a)

budget ← nᵃ
nodes ← 0

for cell in unsolved_cells (sorted by domain size, MRV):
  for v in candidates(cell):
    nodes ← nodes + 1
    if nodes > budget: return deductions
    probe ← copy(sudoku)
    assign(probe, cell, v) + BCP cascade
    if contradiction: mark v as eliminated
    else: recurse _dpll(probe) with remaining budget
      if solved: harvest full solution

Safe deductions only:
  - Contradiction → eliminate(cell, v) [proof by refutation]
  - All but one contradict → assign(cell, survivor)
  - Full solve within budget → harvest all assignments
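The pseudocode above can be condensed into a runnable sketch. This is deliberately simplified — naked-singles BCP only (no hidden singles), and it returns the first full solution instead of harvesting safe deductions; `peers`, `bcp`, and `cdcl_budget` are illustrative names, not the polyai API:

```python
def peers(n):
    """Peer map for an n^2 x n^2 grid: same row, column, or box."""
    N = n * n
    P = {}
    for r in range(N):
        for c in range(N):
            line = {(r, k) for k in range(N)} | {(k, c) for k in range(N)}
            box = {(r - r % n + dr, c - c % n + dc)
                   for dr in range(n) for dc in range(n)}
            P[(r, c)] = (line | box) - {(r, c)}
    return P

def bcp(dom, P):
    """Naked-singles cascade to fixpoint; False on contradiction."""
    changed = True
    while changed:
        changed = False
        for cell, d in dom.items():
            if len(d) == 1:
                v = next(iter(d))
                for q in P[cell]:
                    if v in dom[q]:
                        if len(dom[q]) == 1:
                            return False          # two peers forced to v
                        dom[q] = dom[q] - {v}
                        changed = True
    return True

def cdcl_budget(dom, P, budget):
    """Budgeted DPLL with MRV branching. `budget` is a 1-item list used
    as a mutable node counter; None means unsolved or budget exhausted."""
    if not bcp(dom, P):
        return None
    open_cells = [c for c in dom if len(dom[c]) > 1]
    if not open_cells:
        return dom                                # all singletons: solved
    cell = min(open_cells, key=lambda c: len(dom[c]))   # MRV
    for v in sorted(dom[cell]):
        if budget[0] == 0:
            return None                           # hard n^a node cap
        budget[0] -= 1
        probe = {c: set(d) for c, d in dom.items()}
        probe[cell] = {v}
        res = cdcl_budget(probe, P, budget)
        if res is not None:
            return res
    return None

# 4x4 demo (n = 2) with budget n^a = 2^5 = 32 nodes
puzzle = [[1, 0, 0, 4], [0, 4, 0, 0], [2, 0, 4, 0], [0, 3, 0, 1]]
dom = {(r, c): ({puzzle[r][c]} if puzzle[r][c] else {1, 2, 3, 4})
       for r in range(4) for c in range(4)}
sol = cdcl_budget(dom, peers(2), [2 ** 5])
```

The 1-item list makes the node counter shared across the whole recursion, so the cap is global, not per-branch — the property that keeps the solver polynomial by construction.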

The key properties:

Polynomial by construction. For any fixed a, the solver performs at most nᵃ BCP cascades, each O(n⁴). Total work: O(nᵃ⁺⁴). This is polynomial for any constant a.

Sound. Every deduction is provably correct. Eliminations come from contradictions (refutation proofs). Assignments come from exhaustive elimination of alternatives. No guessing, no heuristic acceptance.

Monotonically stronger. Increasing a enlarges the set of solvable instances (it never shrinks). As a → ∞, cdcl(a) converges to full DPLL (exponential, solves everything). The exponent is a complexity dial between P and NP.

3.3 NP Strip: Stochastic Methods

When deterministic polynomial techniques exhaust without solving, the NP zone deploys stochastic methods:

Technique    | Budget             | Mechanism
uniform_walk | O(n⁴) steps        | Random constraint walk: assign missing values, evict conflicts
complex_walk | O(n⁴) steps        | Walk + periodic polynomial interleave (BCP + tiers 2–6)
dart         | 5 attempts, 250 ms | Random single-cell probe + full cascade — eliminates or solves

These methods are not polynomial in the worst case — their expected runtime depends on instance structure. The 10x method insight [2]: walks are effectively binary — they solve within the first ~10% of their step budget or never. Cutting the budget by 10× and retrying gives identical solve rates at one-tenth the cost per failure.
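The restart policy implied by the 10x observation can be sketched generically. This is a hypothetical wrapper — `solve_once` stands in for a single walk attempt and is not a polyai function:

```python
def walk_until(solve_once, total_steps, cuts=10):
    """Spend a step budget as `cuts` short independent restarts rather
    than one long run: a walk that will succeed tends to succeed early.
    `solve_once(steps)` runs one walk attempt, returning True on success.
    Returns the 1-based attempt that succeeded, or 0 on failure."""
    per_attempt = max(1, total_steps // cuts)
    for attempt in range(1, cuts + 1):
        if solve_once(per_attempt):
            return attempt
    return 0
```

Each restart re-randomizes the walk's starting assignment, so the attempts are independent trials of the same "early or never" event.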

3.4 P-Right: Symmetry Group Search

Near-empty grids (≤ 2n givens) are solved by searching the Sudoku symmetry group:

Symmetry group of n²×n² Sudoku:
  Digit relabeling:         n²! options
  Band permutations:        n! options
  Stack permutations:       n! options
  Row perms within bands:   (n!)ⁿ options
  Col perms within stacks:  (n!)ⁿ options
  Transposition:            2 options

Total per base grid: n²! × (n!)²ⁿ⁺² × 2
For n = 3: ~1.2 × 10¹² transformations
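As a sanity check, the group order can be computed directly. A sketch — `symmetry_group_size` is an illustrative helper, and the factor of 2 for transposition is what brings n = 3 to the familiar ~1.2 × 10¹²:

```python
from math import factorial

def symmetry_group_size(n, include_transpose=True):
    """Order of the n^2 x n^2 Sudoku symmetry group: n^2! digit
    relabelings x (n!)^(2n+2) band/stack/row/col permutations,
    optionally x2 for transposition."""
    g = factorial(n * n) * factorial(n) ** (2 * n + 2)
    return 2 * g if include_transpose else g

# n = 3: 9! * 6^8 * 2 = 1,218,998,108,160 (~1.2e12)
```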

seed_fill fills an empty grid using a deterministic cyclic permutation: v(r,c) = (r·n + r÷n + c) mod n² + 1. O(n⁴), sub-ms.
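The cyclic pattern is short enough to state as code (a sketch of the technique; the function name mirrors the idea, not the polyai API):

```python
def seed_fill(n):
    """Deterministic valid n^2 x n^2 grid from the cyclic permutation
    v(r, c) = (r*n + r//n + c) mod n^2 + 1."""
    N = n * n
    return [[(r * n + r // n + c) % N + 1 for c in range(N)]
            for r in range(N)]
```

Each row is a rotation of 1..n²; the row shifts r·n + r÷n are pairwise distinct mod n², which is what makes the columns and boxes valid as well.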

transform_fill generates a random structural transform (band/stack/row/col permutations), then solves for the unique digit relabeling that matches all givens. O(n⁴) per attempt, expected few thousand attempts for ≤ 2n givens.

4. Empirical Results

4.1 CDCL Exponent Sweep on Inkala

Arto Inkala's "world's hardest Sudoku" (2012) is a standard benchmark for n=3. Without CDCL, it stalls after BCP-1 with 60 cells remaining.

Exponent a | Budget (3ᵃ) | Solved         | Time
2          | 9           | No             | 0.8 ms
3          | 27          | No             | 2.8 ms
4          | 81          | No             | 6.9 ms
5          | 243         | Yes (60 cells) | 19 ms
6          | 729         | Yes            | 19 ms
8          | 6561        | Yes            | 19 ms

Inkala cracks at exactly a = 5 (243 nodes). The remaining budget at a ≥ 6 is unused — the DPLL tree is shallow enough that 243 nodes suffice. This is a polynomial solve of the "world's hardest Sudoku" in 19ms.

4.2 Boundary Experiment: Does Higher a Push the Boundary?

The decisive experiment for the 1/n question: fix n = 4 and n = 5, sweep CDCL exponent a at multiple NP strip depths. If the failure boundary moves with a, the polynomial coverage is loose (1/n is a floor). If invariant, it might be tight (ceiling). 5 trials per config at n = 4, 3 at n = 5.

n = 4 (16×16) — boundary moves

NP% | a=0 | a=4 | a=5 | a=6 | a=7 | a=8
30  | 5/5 | 5/5 | 5/5 | 5/5 | 5/5 | 5/5
50  | 2/5 | 5/5 | 5/5 | 5/5 | 5/5 | 5/5
60  | 0/5 | 3/5 | 3/5 | 4/5 | 5/5 | 5/5
70  | 0/5 | 5/5 | 5/5 | 5/5 | 5/5 | 5/5
80  | 0/5 | 5/5 | 5/5 | 5/5 | 5/5 | 5/5

NP% = 60 is the smoking gun: 3/5 → 3/5 → 4/5 → 5/5 as a goes 4 → 5 → 6 → 7. The failure boundary moves monotonically with exponent. At a ≥ 7, the entire n = 4 NP strip is conquered.

n = 5 (25×25) — boundary stuck

NP% | a=0 | a=5 | a=6 | a=7 | a=8
30  | 3/3 | 3/3 | 3/3 | 3/3 | 3/3
50  | 0/3 | 0/3 | 0/3 | 0/3 | 0/3
70  | 0/3 | 3/3 | 3/3 | 3/3 | 3/3
90  | 3/3 | 3/3 | 3/3 | 3/3 | 3/3

At n = 5, the mid-strip (NP% = 50) is a hard wall — a = 8 (390,625 nodes) can't crack it. Time grows from 9s to 33s per puzzle but solve rate stays 0/3. NP% = 70 jumps from 0/3 to 3/3 at a ≥ 5 (walks contribute), while NP% = 30 and 90 are trivially polynomial.

Boundary analysis

Exponent | n=4 first failure | n=5 first failure
a=0      | NP%=50            | NP%=50
a=4      | NP%=60            | —
a=5      | NP%=60            | NP%=50
a=6      | NP%=60            | NP%=50
a=7      | all solved        | NP%=50
a=8      | all solved        | NP%=50

Verdict: 1/n is a floor, not a ceiling. At n = 4, the boundary visibly retreats with higher a. At n = 5, the tested exponent range (up to a = 8) isn't enough to see motion at NP% = 50, but the n = 4 evidence is clear — polynomial coverage is loose and higher exponents extend it. Whether any fixed a can conquer n = 5's mid-strip remains open.

5. Theoretical Implications

5.1 The Complexity Dial

CDCL(a) creates a continuous family of polynomial-time solvers parameterized by a. For any fixed a, the solver is O(nᵃ⁺⁴) — polynomial. But the set of instances solvable at exponent a grows monotonically with a.

S(a) = {instances solvable by cdcl(a)}      S(a) ⊆ S(a+1)

S(0) = BCP-solvable instances (arc consistent)
S(∞) = all satisfiable instances (full DPLL)

The NP strip is exactly: S(∞) ∖ S(a) for the chosen a.

This doesn't resolve P vs NP — for that we'd need a fixed a that captures everything. But it reframes the question: instead of "is Sudoku hard?", we ask "what exponent does this instance require?"

5.2 Why the Middle Resists

At n = 5, NP% = 50, even a = 8 (390,625 nodes) fails on all instances tested. The reason is combinatorial: the DPLL tree branches by the domain size (up to n² = 25 values per cell), and mid-strip puzzles have many unsolved cells with large domains. The effective search tree is:

Branching factor b ≈ n²   (average domain size in the NP strip)
Depth d ≈ number of decisions needed
Tree size ≈ bᵈ = (n²)ᵈ = n²ᵈ

Budget nᵃ covers depth d ≤ a/2

At n = 5, a = 6 → d ≤ 3 decisions deep
Mid-strip puzzles need d ≈ 10–20 decisions → requires a ≈ 20–40

This is the heart of NP-hardness: the required exponent grows with instance difficulty, and mid-strip instances at large n require exponents that make the "polynomial" budget effectively exponential in the problem structure.

5.3 The Fourier Negative Result

We tested whether Fourier variable scoring from the AUMA framework [2] could guide CDCL branching. The technique — probing each variable's marginal contribution via order-1 Walsh-Hadamard coefficients — works well on global optimization problems (see polyauma [5]).

Result: Fourier scoring matched but did not beat peer rarity (MRV) for Sudoku branching. The reason is structural: Sudoku constraints are local — each cell interacts with exactly 3n² − 2n − 1 peers in its row, column, and box (20 for n = 3). The Fourier probe measures global sensitivity, but constraint propagation in Sudoku is driven by local bottlenecks (the cell with fewest candidates, not the cell with highest global influence).
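The peer-count formula is easy to check mechanically (illustrative helper, not part of polyai):

```python
def peer_count(n):
    """Peers of one cell: (n^2 - 1) each in its row, column, and box,
    minus the 2(n - 1) box cells already counted in the row/column."""
    return 3 * (n * n - 1) - 2 * (n - 1)

# Brute-force cross-check for the corner cell of a 9x9 grid (n = 3):
row = {(0, k) for k in range(9)}
col = {(k, 0) for k in range(9)}
box = {(r, c) for r in range(3) for c in range(3)}
assert len((row | col | box) - {(0, 0)}) == peer_count(3) == 20
```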

This is a characterization of why the NP strip resists: Sudoku's constraint graph is sparse and local. Techniques that exploit global structure (Fourier probing, clause-weight optimization) see a flat landscape in the NP strip because the signal is distributed across many weakly-coupled local constraint families. The hardness is not in any single constraint but in the interaction between constraints that only polynomial-depth search can resolve.

5.4 Is 1/n a Floor or a Ceiling?

The polynomial coverage formula (n³ + 2n) / n⁴ = 1/n + 2/n³ measures what current techniques achieve. This is a lower bound on polynomial coverage, not a proven ceiling.

Tightness would require a hardness proof: showing that no polynomial technique can push past n³ + O(n) deletions from the left or beyond 2n givens from the right. That is, the remaining Ω(n⁴) zone is irreducibly NP-hard after any polynomial preprocessing. No such proof exists.

The Fourier negative result supports looseness. "Local constraints don't produce global Fourier signal" explains why current techniques stall, but doesn't prove that no polynomial technique can do better. Order-2 interactions, cross-box coupling patterns, or entirely novel propagation strategies could extend the polynomial zones.

The decisive experiment: fix n = 4 and n = 5, sweep the CDCL exponent a from 3 to 12+ at multiple NP strip depths (NP% = 30, 50, 70, 90), and plot the failure boundary as a function of a. If higher a pushes the boundary deeper into the strip, the polynomial coverage is loose and 1/n is merely a floor. If the boundary is invariant to a beyond a threshold, we have empirical evidence for tightness. Section 4.2 carries out this experiment up to a = 8: at n = 4 the boundary visibly retreats, favoring the floor reading.

5.5 Connection to General SAT

The sandwich model generalizes beyond Sudoku. In the companion PolySAT framework (sat.aws.monce.ai), we apply the same philosophy to general SAT instances:

- 12 polynomial techniques (BCP, pure literal, 2-SAT via SCC, subsumption, vivification, variable elimination, probing, isomorphic transform, rephrase/Moulinette, budgeted CDCL)
- SHA-k benchmark: SHA-256 Tseitin encodings as a difficulty ladder. BCP resolves ~10% (constants/overhead), but the 64 mixing rounds are opaque to all polynomial techniques — confirming cryptographic hardness by construction.
- The Moulinette: segmented clause rephrasing with 93.5% structural novelty, zero hardness reduction — reformulation doesn't help when the structure is inherently hard.

The consistent finding: polynomial techniques are surprisingly powerful on structured instances, but hit a sharp wall on problems designed for hardness. The wall is real. So is the territory before it.

6. Implementation

polyai is 2365 lines of Python with zero dependencies. The solver library (polyai/) contains 8 modules:

Module       | Lines | Role
sudoku.py    | 181   | Topological constraint state — domains, peers, assign/eliminate cascade
core.py      | 87    | Deduction and SolveResult types
solver.py    | ~330  | Orchestrator — routes by deletion count, main BCP loop
solver_PL.py | 556   | P-Left: tiers 1–6 + greedy
solver_NP.py | ~860  | NP strip: walks, dart, BCP-1, CDCL
solver_PR.py | 300   | P-Right: seed_fill, transform_fill
grid.py      | ~50   | Grid rendering

The API (api/) adds ~1700 lines for FastAPI routes, puzzle generation, the climb benchmark engine, and the four HTML pages you're reading now. Deployed on EC2 (c6i.4xlarge) with gunicorn + 4 workers.

7. API Reference

All endpoints accept and return JSON.

POST /api/solve

Request:
  n: int            // block size (2–10)
  grid: int[][]     // n²×n² grid, 0 = empty
  max_k: int        // BCP-k level (0–4, default 1)
  luck: int         // walk rounds (0–50, default 8)
  use_dart: bool    // enable dart probe (default true)
  exponent: int     // CDCL budget exponent (0–20, default 0 = off)

Response:
  solved: bool, grid: int[][], ms: float
  cells_given, cells_deduced, max_tier, max_k
  impasses: [{technique, deductions, cells_remaining}]
  cell_colors: {"r,c": "p-left"|"cdcl"|"np"|"p-right"}
  technique_timing: {name: {count, total_ms, min_ms, max_ms}}
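A minimal client sketch against this endpoint, using only the standard library. The host is a placeholder, and the live call is left commented out since it assumes a running deployment:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"          # placeholder — point at your deployment

payload = {
    "n": 3,
    "grid": [[0] * 9 for _ in range(9)],    # empty 9x9 grid, 0 = empty cell
    "max_k": 1,
    "luck": 8,
    "use_dart": True,
    "exponent": 5,                          # enable cdcl(5)
}
req = request.Request(
    BASE_URL + "/api/solve",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = json.load(request.urlopen(req))    # uncomment against a live server
# print(resp["solved"], resp["ms"])
```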

POST /api/generate

Request: { n: int, zone: "p-left"|"np"|"p-right", np_pct?: float }
Response: { n, grid, deletions, zone }

GET /api/climb

Query: budget_ms=5000 & zone=all & np_pct=50
Response: { highest_pass, total_ms, dimensions: [...] }

GET /api/topology

Query: n=3
Response: { n, size, cells, groups, peers_per_cell, zone_widths }

References

[1] Yato, T. & Seta, T. (2003). "Complexity and completeness of finding another solution and its application to puzzles." IEICE Trans. Fundamentals, E86-A(5), 1052–1060.

[2] Dana, C. (2023). "An O(2^(n/2)) Universal Maximization Algorithm." École Polytechnique.

[3] Dana, C. & Claude (2026). "PolySAT: On Polynomial SAT Solving." sat.aws.monce.ai.

[4] Qadir, Dana et al. (2025). "Supervised machine learning identifies impaired mitochondrial quality control in β cells with development of type 2 diabetes." bioRxiv, doi:10.1101/2025.10.22.683335. Accepted at Springer.

[5] Dana, C. & Claude (2026). "polyauma: Universal Maximization in Polynomial Budget." auma.aws.monce.ai. Live API + library implementing AUMA's polynomial pipeline: Fourier probing, greedy coordinate descent, SA, Nelder-Mead within O(na) evaluation budget.
