Research: The Experiment Post – Designing a Neural Population Simulation

Research compiled 2026-04-05. For the “Neural Societies” blog series.


1. What Problem/Environment to Use

The original draft suggested Collatz. This is a bad fit: Collatz sequences are deterministic functions with no strategic interaction, no cooperation/defection dynamic, and no multi-agent structure. The interesting part of “neural societies” is the social dynamics, not the computation itself. The problem needs agents that interact with each other.

Tier 1 Recommendations (Best Fit for the Blog)

Iterated Prisoner’s Dilemma (IPD) with neural net agents – the canonical choice, and for good reason.

Coin Game – a 2D grid-world social dilemma from Lerer & Peysakhovich (2017), already implemented in JaxMARL.

Hawk-Dove Game – simpler than IPD but produces interesting polymorphic equilibria.

Tier 2 (More Complex, Potentially Better Visuals)

Public Goods Game – N agents decide how much to contribute to a common pool that gets multiplied and shared.

Melting Pot (DeepMind) – a suite of 50+ 2D multi-agent substrates with 256+ test scenarios.

Lenia – continuous cellular automata that produce “artificial lifeforms”.
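The Public Goods Game above is easy to make concrete. A minimal sketch of the standard linear payoff (pure Python; the endowment and multiplier values are illustrative assumptions): each agent keeps what it does not contribute, and the pooled contributions are multiplied by r and split evenly.

```python
def public_goods_payoffs(contributions, endowment=10.0, r=1.6):
    """Return each agent's payoff for one round of a linear public goods game."""
    n = len(contributions)
    share = r * sum(contributions) / n          # everyone's equal cut of the pool
    return [endowment - c + share for c in contributions]

# Free-riding pays individually: the defector (contribution 0)
# out-earns the three full contributors, even though universal
# contribution would make everyone better off than universal defection.
payoffs = public_goods_payoffs([10.0, 10.0, 10.0, 0.0])
```

This is the tension that makes the game a social dilemma: with 1 < r < n, defection dominates individually while cooperation maximizes the group payoff.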

What to Avoid

2. Frameworks and Tools

The JAX ecosystem has emerged as the clear winner for this kind of work. Everything compiles to XLA, runs on GPU/TPU, and the functional style makes vectorization across populations trivial.

Primary recommendation: evosax + JaxMARL

evosax – 30+ evolution strategies in JAX (https://github.com/RobertTLange/evosax). Ask-eval-tell API; CMA-ES, OpenAI-ES, SimpleGA, etc.; full jit/vmap support.

JaxMARL – 11 multi-agent environments in JAX (https://github.com/FLAIROx/JaxMARL). Coin Game, MPE, STORM (matrix games as grids), Overcooked; 12500x faster than non-JAX when vectorized.

EvoJAX – neuroevolution toolkit from Google (https://github.com/google/evojax). SlimeVolley, WaterWorld, MNIST; PGPE algorithm; trains in minutes on a single GPU.

QDax – quality-diversity in JAX (https://github.com/adaptive-intelligent-robotics/QDax). MAP-Elites on GPU; finds diverse high-performing solutions, not just the single best.

Why this stack?

  1. evosax gives you the evolutionary algorithm (the “how agents evolve”).
  2. JaxMARL gives you the social environment (the “where agents interact”).
  3. Everything is JAX, so you can compose them. vmap across your population, jit the whole loop.
  4. QDax is optional but powerful: instead of just finding the best strategy, MAP-Elites builds a map of all viable strategies, which is exactly what you want for the blog’s “strategy space” visualizations.
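The ask-eval-tell pattern behind point 1 can be illustrated with a toy, pure-Python evolution strategy. This is not the evosax API, just the same three-step loop in miniature: "ask" samples candidates around the current mean, the caller evaluates them, and "tell" moves the mean toward the best ones (crude elite averaging, standing in for CMA-ES's proper update).

```python
import random

def ask(mean, sigma, popsize):
    """Sample a population of candidate parameter vectors around the mean."""
    return [[m + random.gauss(0, sigma) for m in mean] for _ in range(popsize)]

def tell(candidates, fitnesses, elite_frac=0.25):
    """Update the search mean as the average of the top-scoring candidates."""
    ranked = sorted(zip(fitnesses, candidates), key=lambda t: t[0], reverse=True)
    elite = [c for _, c in ranked[: max(1, int(len(ranked) * elite_frac))]]
    dim = len(elite[0])
    return [sum(c[i] for c in elite) / len(elite) for i in range(dim)]

def fitness(x):  # toy objective to maximize: peak at x = (3, -1)
    return -((x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2)

random.seed(0)
mean = [0.0, 0.0]
for _ in range(100):
    pop = ask(mean, sigma=0.3, popsize=32)
    mean = tell(pop, [fitness(x) for x in pop])
```

evosax exposes the same loop shape behind jit-compiled, vmap-friendly functions; its exact signatures differ, so treat this only as the conceptual skeleton.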

Alternative: PyTorch + Custom Loop

If you want more control and are more comfortable with PyTorch:

For the IPD Specifically

For Multi-Agent RL (if Going Beyond Evolution)

NEAT (Evolving Topology)

If you want agents whose architecture evolves, not just weights:

For the experiment post specifically:

evosax          -- evolutionary algorithm (CMA-ES or OpenAI-ES)
JaxMARL         -- environment (Coin Game or STORM for matrix games)
JAX + Flax      -- neural network definition
matplotlib      -- static plots
wandb or tensorboard -- training curves
UMAP (umap-learn)   -- strategy space visualization

If you want the IPD specifically, skip JaxMARL and write the game loop yourself (it is 20 lines of code). Use evosax for the evolution and JAX for the neural nets.
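A sketch of that hand-written game loop (plain Python; the two fixed strategies below are stand-ins for neural agents, which would map the recent history through a network instead):

```python
# Standard IPD payoffs: (row player, column player).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play_match(agent_a, agent_b, rounds=100):
    """Play one IPD match; each agent sees its own and the opponent's history."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = agent_a(hist_a, hist_b)
        move_b = agent_b(hist_b, hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

tit_for_tat = lambda own, opp: opp[-1] if opp else "C"   # copy opponent's last move
always_defect = lambda own, opp: "D"

scores = play_match(tit_for_tat, always_defect)
```

Fitness for evolution is then just the average score over a round-robin of such matches, which is trivially vmap-able once the agents are parameter vectors.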

3. What to Measure and Visualize

This is what separates a research blog post from a tutorial. The measurements need to tell a story.

Core Metrics

Fitness / performance over generations

Population diversity over time

Strategy distribution

Advanced Visualizations (The “Wow” Factor)

Strategy space maps (UMAP/t-SNE of behaviour)

Weight space visualization

Phylogenetic trees of neural agents

Phase transitions / emergence detection

Interaction matrices
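For the strategy-space maps above, each agent first needs a fixed-length behaviour descriptor that UMAP/t-SNE can embed. One common recipe is to probe every agent in a fixed set of contexts and record its cooperation rate in each. A minimal sketch (plain Python; the agent interface here, a callable from the opponent's last move to a move, is a simplifying assumption):

```python
def behaviour_vector(agent, probes=("C", "D"), reps=4):
    """Fraction of cooperative replies in each probe context.

    reps > 1 only matters for stochastic agents; deterministic
    agents give the same reply every repetition.
    """
    vec = []
    for probe in probes:
        coop = sum(agent(probe) == "C" for _ in range(reps)) / reps
        vec.append(coop)
    return vec

tit_for_tat = lambda last_opp_move: last_opp_move   # copy the opponent
always_defect = lambda last_opp_move: "D"

v_tft = behaviour_vector(tit_for_tat)     # cooperates after C, retaliates after D
v_alld = behaviour_vector(always_defect)
```

Stacking one such vector per agent per generation gives the matrix you would hand to umap-learn; richer probe sets (longer histories, reference opponents) give higher-dimensional, more discriminative descriptors.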

Practical Visualization Tools

4. Existing Implementations to Learn From

Directly Relevant GitHub Repos

Neural Slime Volleyball (David Ha / hardmaru)

David Ha’s “A Visual Guide to Evolution Strategies”

Prisoners Dilemma Simulator (alexamirejibi)

N-Person IPD Simulation (Chris0Jeky)

Social Evolution (Frigzer)

IPD Research Tool (cristal-smac)

Sequential Social Dilemmas (eugenevinitsky)

estool (hardmaru)

Key Papers

  1. “Game Theory and Multi-Agent RL: From Nash Equilibria to Evolutionary Dynamics” (arXiv 2412.20523, Dec 2024) – survey connecting game theory and MARL. Good for framing.

  2. “A multi-agent reinforcement learning framework for exploring dominant strategies in iterated and evolutionary games” (Nature Communications, 2025) – discovered memory-two bilateral reciprocity. State of the art for neural agents in IPD.

  3. “Multi-agent Reinforcement Learning in Sequential Social Dilemmas” (Leibo et al., 2017) – the foundational paper on neural net agents in social dilemmas. Showed network size affects cooperation.

  4. “Population Based Training of Neural Networks” (Jaderberg et al., DeepMind, 2017, arXiv 1711.09846) – PBT. Not your exact setup but the inspiration for population-level training dynamics.

  5. “Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning” (2024) – DiCo method. Proposes tools for measuring and controlling diversity.

  6. “EvoJAX: Hardware-Accelerated Neuroevolution” (Tang et al., 2022, arXiv 2202.05008) – the EvoJAX paper. Good technical reference.

  7. “evosax: JAX-based Evolution Strategies” (Lange, 2022, arXiv 2212.04180) – the evosax paper.

  8. “TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies” (2024, GECCO Best Paper) – if you go the topology-evolution route.

Blog Posts and Tutorials Worth Reading

5. Practical Pitfalls

Population Collapse / Loss of Diversity

This is the #1 failure mode. The population converges to a single strategy, usually early, and then never explores again.

Causes:

Mitigations:

Reward Shaping Traps

In multi-agent settings, reward shaping is dangerous:

Recommendation: Use the raw game payoff matrix. No shaping. If convergence is too slow, increase population size or generation count, not reward complexity.
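Concretely, "raw payoff" means the textbook matrix (the proposal in section 7 uses T=5, R=3, P=1, S=0). If you do tweak the numbers, it is worth a sanity check that they still define a prisoner's dilemma:

```python
# The two inequalities that define a PD:
#   T > R > P > S    (defection tempts against a cooperator)
#   2R > T + S       (mutual cooperation beats alternating exploitation)
T, R, P, S = 5, 3, 1, 0
is_prisoners_dilemma = (T > R > P > S) and (2 * R > T + S)
```

Values that break either inequality silently turn the game into something else (e.g. Chicken or Stag Hunt), which is a common way reward "tweaks" invalidate an experiment.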

Evaluation Challenges

“How do you know if anything interesting happened?” is a legitimate problem.

Computational Gotchas

The “It Just Converges to Tit-for-Tat” Problem

For IPD specifically, standard evolutionary dynamics will very likely converge to something like Tit-for-Tat or Win-Stay-Lose-Shift. This matches the known analytical results: correct, but not very exciting for a blog post.

To get richer dynamics:

6. Scaling Considerations

What Is Feasible on a Single GPU?

Setup: population / net size / generations / estimated time / hardware.

IPD with feedforward nets (evosax): 1,000 agents, ~100 params, 500 generations, 2-5 min, single consumer GPU.
IPD with small RNNs (evosax): 500 agents, ~1,000 params, 500 generations, 5-15 min, single consumer GPU.
Coin Game with small nets (JaxMARL + evosax): 256 agents, ~5,000 params, 200 generations, 15-45 min, single consumer GPU.
NEAT topology evolution (TensorNEAT): 10,000 agents, variable size, 100 generations, ~30 min, single GPU.
SlimeVolley self-play (EvoJAX): 1,000 agents, ~20K params, 500 generations, ~30 min, single GPU.
MAP-Elites on Brax locomotion (QDax): 4,096 agents, ~50K params, 1,000 generations, ~1 hour, single GPU.

These estimates are based on benchmarks from EvoJAX (MNIST in 5 min, SlimeVolley in minutes), TensorNEAT (10K population / 100 generations on GPU), and JaxMARL (14x faster than non-JAX baselines, 12500x with vectorization).

Population Sizes That Matter

Network Sizes That Matter

When You Need a Cluster

You probably do not need one for this blog series. The whole point is to show that interesting social dynamics emerge from simple setups. A single GPU (or even CPU for small IPD experiments) is enough.

You would need more compute if:

Memory Budget

A useful rule of thumb for parameter storage alone (optimizer/ES state and environment rollouts add overhead on top):

memory = population_size * params_per_agent * 4 bytes (float32)
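Applying that rule of thumb to the first experiment proposed below (population 1,000, ~200 parameters per agent) shows the parameter budget is negligible:

```python
def population_memory_bytes(population_size, params_per_agent, bytes_per_param=4):
    """Float32 parameter storage for a whole population, in bytes."""
    return population_size * params_per_agent * bytes_per_param

mb = population_memory_bytes(1_000, 200) / 1e6   # well under 1 MB
```

Note that some strategies carry extra state beyond the parameters themselves (CMA-ES, for example, maintains a d x d covariance matrix), so the rule of thumb is a floor, not a ceiling.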

7. Concrete Proposal for the First Experiment

Based on all of the above, here is what I would build first:

Setup

Environment: Iterated Prisoner’s Dilemma, pairwise, 100 rounds per match.

Agents: Feedforward neural nets. Input: last 3 moves of self + last 3 moves of opponent = 12 binary inputs (or 6 if encoded as single values). One hidden layer of 16 units (ReLU). Output: probability of cooperating (sigmoid). Total: ~200 parameters.

Evolution: CMA-ES via evosax, population of 1,000. Fitness = average payoff across round-robin tournament (every agent plays every other agent). Standard IPD payoff matrix (T=5, R=3, P=1, S=0).

Framework: JAX + evosax + custom IPD game loop (20 lines). No heavy framework needed for this.
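The agent described above, sketched in plain Python for concreteness (a Flax module would replace this in the real pipeline). The parameter count confirms the "~200 parameters" estimate:

```python
import math

def num_params(n_in=12, n_hidden=16, n_out=1):
    """Weights + biases for a 12 -> 16 (ReLU) -> 1 (sigmoid) net."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def forward(params, x):
    """Probability of cooperating, given the 12 binary history inputs."""
    w1, b1, w2, b2 = params
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]                      # hidden ReLU layer
    z = sum(w * hi for w, hi in zip(w2, h)) + b2          # output logit
    return 1.0 / (1.0 + math.exp(-z))                     # sigmoid

count = num_params()   # 225 parameters
```

With all-zero parameters the output is exactly 0.5 (cooperate half the time), which is a reasonable behavioural prior for the initial population.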

What to Measure

  1. Mean/std fitness per generation (ribbon chart).
  2. Cooperation rate over time (fraction of cooperative moves in the population).
  3. Strategy classification: test each agent against TFT, AllC, AllD, Random. Cluster into types. Stacked area chart.
  4. UMAP of behaviour vectors every 50 generations. Animate.
  5. Population diversity: mean pairwise L2 distance of weight vectors.
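Metric 5 can be sketched directly (plain Python; in the JAX pipeline this becomes a vectorized one-liner over the population's parameter matrix):

```python
import math

def mean_pairwise_l2(population):
    """Mean L2 distance over all unordered pairs of weight vectors."""
    n = len(population)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(population[i], population[j])))
            total += d
            pairs += 1
    return total / pairs

collapsed = [[1.0, 2.0]] * 5                       # identical agents -> 0.0
spread = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]      # some variation  -> > 0
```

A sustained slide of this metric toward zero is the quantitative signature of the population-collapse failure mode discussed in section 5.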

What You Hope to See

Extension for a Second Post


Summary of Recommendations

Environment: IPD (first post), Coin Game (second post). IPD is simple, well-understood, with rich dynamics; Coin Game adds visuals.
Evolution algorithm: CMA-ES or OpenAI-ES via evosax. Robust, well-studied, GPU-accelerated, simple API.
Framework: JAX + evosax (+ JaxMARL for Coin Game). Everything in one ecosystem; fast; composable.
Population size: 1,000 for IPD, 256-512 for Coin Game. Sweet spot for a single GPU, enough for diversity.
Network size: 100-500 params (IPD), 1,000-5,000 params (Coin Game). Small enough to evolve, large enough to learn.
Key visualization: UMAP of behaviour space over generations. The single most compelling visual for a blog post.
Diversity preservation: (mu, lambda) selection + mutation rate tuning. Prevents collapse without overcomplicating.
What NOT to do: do not shape rewards; do not use heavy frameworks. Let dynamics emerge naturally and keep it simple.

Key References

Tools and Frameworks

Papers

Blog Posts and Tutorials