Ralph-looping a math paper to (attempt to) win a pub argument


A while back a friend and I played a variant of Texas hold'em. In short: four players, cooperative, no betting. Each street (pre-flop, flop, turn, river) the four dice in the centre (faces 1, 2, 3, 4) get distributed across the players via a take-or-steal protocol that ends the moment the fourth centre die is taken. Each player ends a round with one die. The team wins iff every player's river die matches their final placement at showdown. The dice are the only communication channel.
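The win condition is easy to pin down in code. Below is a minimal sketch, not taken from the paper: the function name `team_wins` and the list-per-seat representation are assumptions for illustration, and it assumes placements are given as ranks 1 to 4 with no ties (the rules summary above does not say how ties are handled).

```python
def team_wins(river_dice, placements):
    """Return True iff every player's river die equals their showdown placement.

    river_dice[i]  -- face (1-4) of the die player i holds after the river street
    placements[i]  -- player i's final placement at showdown (assumed 1-4, no ties)
    Both lists are indexed by seat; this layout is a hypothetical choice.
    """
    if len(river_dice) != len(placements):
        raise ValueError("need one die and one placement per player")
    # The team wins only if *all* players match; a single mismatch loses.
    return all(die == place for die, place in zip(river_dice, placements))


# Usage: seats 0 and 1 swapped their dice relative to the true finish order.
print(team_wins([1, 2, 3, 4], [1, 2, 3, 4]))
print(team_wins([2, 1, 3, 4], [1, 2, 3, 4]))
```

Only the river dice enter the check; the dice from earlier streets matter only as messages between players.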

The tabletop argument was simple: should the die mean "how strong is my hand right now" or "where do I think I'll finish at showdown"? I had a strong intuition for the second. My hunch was that the current state conveys strictly less information than a forecast that accounts for the cards still to come up to the river, so anchoring the dice on that forecast would let us share more information.

My friends were unconvinced. So I did the obvious 2026 thing and ralph-looped Claude through a dozen or so drafts of a semi-formal paper, with peer-review-style critiques from other agents between rounds, treating it as an experiment in using an "agentic maths expert" to push a hunch into a real proof.

The math holds up better than I expected in some places, and worse in others. The single-player case lands as a clean Blackwell-style sufficiency result. The team-game case lifts under a conditional-independence assumption on player information states given the placement vector. Hold'em itself violates that assumption substantively, so the strongest version of my pub claim is, regretfully, a conjecture rather than a theorem.

Flipping between judge agents and writing agents, I received pushback on a subset claim Claude had been waving past for several drafts, which sent it off to write a set of enumeration scripts on reduced toy decks. The result was a small, somewhat humbling detour: in the cleanest version of "richer signal beats current rank", the two signals turn out to be incomparable. Whoops. The practical recommendation still stands. The clean proof I wanted does not. I haven't won yet. If anyone's up for it, feel free to improve it!
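To make "incomparable" concrete, here is a minimal sketch of the kind of check an enumeration can run. It is not the script from the paper: the helper names and the three toy states are hypothetical, chosen only to show the shape of the test. It treats each signal as a function of a player's information state and asks whether one signal can always be recomputed from the other. That refinement test is a sufficient condition for one signal to dominate, not the full Blackwell comparison.

```python
def is_function_of(f, g, states):
    """True iff f(s) is determined by g(s) across all states (g refines f)."""
    seen = {}
    for s in states:
        key, val = g(s), f(s)
        # First time we see this g-value, remember the f-value that came with it.
        # If the same g-value later shows up with a different f-value, f cannot
        # be recovered from g.
        if seen.setdefault(key, val) != val:
            return False
    return True


def compare(f, g, states):
    """Classify two signals by refinement: which one (if any) determines the other."""
    f_from_g = is_function_of(f, g, states)
    g_from_f = is_function_of(g, f, states)
    if f_from_g and g_from_f:
        return "equivalent"
    if f_from_g:
        return "second refines first"
    if g_from_f:
        return "first refines second"
    return "incomparable"


# Hypothetical toy states: (current rank, predicted finish). Made up for illustration.
toy_states = [(1, 1), (1, 2), (2, 1)]
current_rank = lambda s: s[0]
predicted_finish = lambda s: s[1]

print(compare(current_rank, predicted_finish, toy_states))
```

In these toy states, current rank 1 is compatible with predicted finishes 1 and 2, and predicted finish 1 is compatible with current ranks 1 and 2, so neither signal can be rebuilt from the other.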

The PDF is here. It's quite short, and I enjoyed taking a tabletop hunch to "I see exactly where my proof stops" in a few mornings.
