Research: The Experiment Post – Designing a Neural Population Simulation

Research compiled 2026-04-05. For the “Neural Societies” blog series.


1. What Problem/Environment to Use

The original draft suggested Collatz. This is a bad fit: Collatz sequences are deterministic functions with no strategic interaction, no cooperation/defection dynamic, and no multi-agent structure. The interesting part of “neural societies” is the social dynamics, not the computation itself. The problem needs agents that interact with each other.

Tier 1 Recommendations (Best Fit for the Blog)

Iterated Prisoner’s Dilemma (IPD) with neural net agents – the canonical choice, and for good reason.

Coin Game – a 2D grid-world social dilemma from Lerer & Peysakhovich (2017), already implemented in JaxMARL.

Hawk-Dove Game – simpler than IPD but produces interesting polymorphic equilibria.

Tier 2 (More Complex, Potentially Better Visuals)

Public Goods Game – N agents decide how much to contribute to a common pool that gets multiplied and shared.

Melting Pot (DeepMind) – a suite of 50+ 2D multi-agent substrates with 256+ test scenarios.

Lenia – continuous cellular automata that produce “artificial lifeforms”.
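The Public Goods Game above is easy to make concrete. A minimal sketch of the standard linear payoff (pure Python; the endowment and multiplier values are illustrative assumptions): each agent keeps what it does not contribute, and the pooled contributions are multiplied by r and split evenly.

```python
def public_goods_payoffs(contributions, endowment=10.0, r=1.6):
    """Return each agent's payoff for one round of a linear public goods game."""
    n = len(contributions)
    share = r * sum(contributions) / n          # everyone's equal cut of the pool
    return [endowment - c + share for c in contributions]

# Free-riding pays individually: the defector (contribution 0)
# out-earns the three full contributors, even though universal
# contribution would make everyone better off than universal defection.
payoffs = public_goods_payoffs([10.0, 10.0, 10.0, 0.0])
```

This is the tension that makes the game a social dilemma: with 1 < r < n, defection dominates individually while cooperation maximizes the group payoff.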

What to Avoid

2. Frameworks and Tools

The JAX ecosystem has emerged as the clear winner for this kind of work. Everything compiles to XLA, runs on GPU/TPU, and the functional style makes vectorization across populations trivial.

Primary recommendation: evosax + JaxMARL

evosax – 30+ evolution strategies in JAX (https://github.com/RobertTLange/evosax). Ask-eval-tell API; CMA-ES, OpenAI-ES, SimpleGA, etc.; full jit/vmap support.

JaxMARL – 11 multi-agent environments in JAX (https://github.com/FLAIROx/JaxMARL). Coin Game, MPE, STORM (matrix games as grids), Overcooked; 12500x faster than non-JAX when vectorized.

EvoJAX – neuroevolution toolkit from Google (https://github.com/google/evojax). SlimeVolley, WaterWorld, MNIST; PGPE algorithm; trains in minutes on a single GPU.

QDax – quality-diversity in JAX (https://github.com/adaptive-intelligent-robotics/QDax). MAP-Elites on GPU; finds diverse high-performing solutions, not just the single best.

Why this stack?

  1. evosax gives you the evolutionary algorithm (the “how agents evolve”).
  2. JaxMARL gives you the social environment (the “where agents interact”).
  3. Everything is JAX, so you can compose them. vmap across your population, jit the whole loop.
  4. QDax is optional but powerful: instead of just finding the best strategy, MAP-Elites builds a map of all viable strategies, which is exactly what you want for the blog’s “strategy space” visualizations.
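The ask-eval-tell pattern behind point 1 can be illustrated with a toy, pure-Python evolution strategy. This is not the evosax API, just the same three-step loop in miniature: "ask" samples candidates around the current mean, the caller evaluates them, and "tell" moves the mean toward the best ones (crude elite averaging, standing in for CMA-ES's proper update).

```python
import random

def ask(mean, sigma, popsize):
    """Sample a population of candidate parameter vectors around the mean."""
    return [[m + random.gauss(0, sigma) for m in mean] for _ in range(popsize)]

def tell(candidates, fitnesses, elite_frac=0.25):
    """Update the search mean as the average of the top-scoring candidates."""
    ranked = sorted(zip(fitnesses, candidates), key=lambda t: t[0], reverse=True)
    elite = [c for _, c in ranked[: max(1, int(len(ranked) * elite_frac))]]
    dim = len(elite[0])
    return [sum(c[i] for c in elite) / len(elite) for i in range(dim)]

def fitness(x):  # toy objective to maximize: peak at x = (3, -1)
    return -((x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2)

random.seed(0)
mean = [0.0, 0.0]
for _ in range(100):
    pop = ask(mean, sigma=0.3, popsize=32)
    mean = tell(pop, [fitness(x) for x in pop])
```

evosax exposes the same loop shape behind jit-compiled, vmap-friendly functions; its exact signatures differ, so treat this only as the conceptual skeleton.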

Alternative: PyTorch + Custom Loop

If you want more control and are more comfortable with PyTorch:

For the IPD Specifically

For Multi-Agent RL (if Going Beyond Evolution)

NEAT (Evolving Topology)

If you want agents whose architecture evolves, not just weights:

For the experiment post specifically:

evosax          -- evolutionary algorithm (CMA-ES or OpenAI-ES)
JaxMARL         -- environment (Coin Game or STORM for matrix games)
JAX + Flax      -- neural network definition
matplotlib      -- static plots
wandb or tensorboard -- training curves
UMAP (umap-learn)   -- strategy space visualization

If you want the IPD specifically, skip JaxMARL and write the game loop yourself (it is 20 lines of code). Use evosax for the evolution and JAX for the neural nets.
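A sketch of that hand-written game loop (plain Python; the two fixed strategies below are stand-ins for neural agents, which would map the recent history through a network instead):

```python
# Standard IPD payoffs: (row player, column player).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play_match(agent_a, agent_b, rounds=100):
    """Play one IPD match; each agent sees its own and the opponent's history."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = agent_a(hist_a, hist_b)
        move_b = agent_b(hist_b, hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

tit_for_tat = lambda own, opp: opp[-1] if opp else "C"   # copy opponent's last move
always_defect = lambda own, opp: "D"

scores = play_match(tit_for_tat, always_defect)
```

Fitness for evolution is then just the average score over a round-robin of such matches, which is trivially vmap-able once the agents are parameter vectors.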

3. What to Measure and Visualize

This is what separates a research blog post from a tutorial. The measurements need to tell a story.

Core Metrics

Fitness / performance over generations

Population diversity over time

Strategy distribution

Advanced Visualizations (The “Wow” Factor)

Strategy space maps (UMAP/t-SNE of behaviour)

Weight space visualization

Phylogenetic trees of neural agents

Phase transitions / emergence detection

Interaction matrices
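For the strategy-space maps above, each agent first needs a fixed-length behaviour descriptor that UMAP/t-SNE can embed. One common recipe is to probe every agent in a fixed set of contexts and record its cooperation rate in each. A minimal sketch (plain Python; the agent interface here, a callable from the opponent's last move to a move, is a simplifying assumption):

```python
def behaviour_vector(agent, probes=("C", "D"), reps=4):
    """Fraction of cooperative replies in each probe context.

    reps > 1 only matters for stochastic agents; deterministic
    agents give the same reply every repetition.
    """
    vec = []
    for probe in probes:
        coop = sum(agent(probe) == "C" for _ in range(reps)) / reps
        vec.append(coop)
    return vec

tit_for_tat = lambda last_opp_move: last_opp_move   # copy the opponent
always_defect = lambda last_opp_move: "D"

v_tft = behaviour_vector(tit_for_tat)     # cooperates after C, retaliates after D
v_alld = behaviour_vector(always_defect)
```

Stacking one such vector per agent per generation gives the matrix you would hand to umap-learn; richer probe sets (longer histories, reference opponents) give higher-dimensional, more discriminative descriptors.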

Practical Visualization Tools

4. Existing Implementations to Learn From

Directly Relevant GitHub Repos

Neural Slime Volleyball (David Ha / hardmaru)

David Ha’s “A Visual Guide to Evolution Strategies”

Prisoners Dilemma Simulator (alexamirejibi)

N-Person IPD Simulation (Chris0Jeky)

Social Evolution (Frigzer)

IPD Research Tool (cristal-smac)

Sequential Social Dilemmas (eugenevinitsky)

estool (hardmaru)

Key Papers

  1. “Game Theory and Multi-Agent RL: From Nash Equilibria to Evolutionary Dynamics” (arXiv 2412.20523, Dec 2024) – survey connecting game theory and MARL. Good for framing.

  2. “A multi-agent reinforcement learning framework for exploring dominant strategies in iterated and evolutionary games” (Nature Communications, 2025) – discovered memory-two bilateral reciprocity. State of the art for neural agents in IPD.

  3. “Multi-agent Reinforcement Learning in Sequential Social Dilemmas” (Leibo et al., 2017) – the foundational paper on neural net agents in social dilemmas. Showed network size affects cooperation.

  4. “Population Based Training of Neural Networks” (Jaderberg et al., DeepMind, 2017, arXiv 1711.09846) – PBT. Not your exact setup but the inspiration for population-level training dynamics.

  5. “Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning” (2024) – DiCo method. Proposes tools for measuring and controlling diversity.

  6. “EvoJAX: Hardware-Accelerated Neuroevolution” (Tang et al., 2022, arXiv 2202.05008) – the EvoJAX paper. Good technical reference.

  7. “evosax: JAX-based Evolution Strategies” (Lange, 2022, arXiv 2212.04180) – the evosax paper.

  8. “TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies” (2024, GECCO Best Paper) – if you go the topology-evolution route.

Blog Posts and Tutorials Worth Reading

5. Practical Pitfalls

Population Collapse / Loss of Diversity

This is the #1 failure mode. The population converges to a single strategy, usually early, and then never explores again.

Causes:

Mitigations:

Reward Shaping Traps

In multi-agent settings, reward shaping is dangerous:

Recommendation: Use the raw game payoff matrix. No shaping. If convergence is too slow, increase population size or generation count, not reward complexity.
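Concretely, "raw payoff" means the textbook matrix (the proposal in section 7 uses T=5, R=3, P=1, S=0). If you do tweak the numbers, it is worth a sanity check that they still define a prisoner's dilemma:

```python
# The two inequalities that define a PD:
#   T > R > P > S    (defection tempts against a cooperator)
#   2R > T + S       (mutual cooperation beats alternating exploitation)
T, R, P, S = 5, 3, 1, 0
is_prisoners_dilemma = (T > R > P > S) and (2 * R > T + S)
```

Values that break either inequality silently turn the game into something else (e.g. Chicken or Stag Hunt), which is a common way reward "tweaks" invalidate an experiment.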

Evaluation Challenges

“How do you know if anything interesting happened?” is a legitimate problem.

Computational Gotchas

The “It Just Converges to Tit-for-Tat” Problem

For IPD specifically, standard evolutionary dynamics will very likely converge to something like Tit-for-Tat or Win-Stay-Lose-Shift. This matches the known analytical results: correct, but not very exciting for a blog post.

To get richer dynamics:

6. Scaling Considerations

What Is Feasible on a Single GPU?

Setup: population / net size / generations / estimated time / hardware.

IPD with feedforward nets (evosax): 1,000 agents, ~100 params, 500 generations, 2-5 min, single consumer GPU.
IPD with small RNNs (evosax): 500 agents, ~1,000 params, 500 generations, 5-15 min, single consumer GPU.
Coin Game with small nets (JaxMARL + evosax): 256 agents, ~5,000 params, 200 generations, 15-45 min, single consumer GPU.
NEAT topology evolution (TensorNEAT): 10,000 agents, variable size, 100 generations, ~30 min, single GPU.
SlimeVolley self-play (EvoJAX): 1,000 agents, ~20K params, 500 generations, ~30 min, single GPU.
MAP-Elites on Brax locomotion (QDax): 4,096 agents, ~50K params, 1,000 generations, ~1 hour, single GPU.

These estimates are based on benchmarks from EvoJAX (MNIST in 5 min, SlimeVolley in minutes), TensorNEAT (10K population / 100 generations on GPU), and JaxMARL (14x faster than non-JAX baselines, 12500x with vectorization).

Population Sizes That Matter

Network Sizes That Matter

When You Need a Cluster

You probably do not need one for this blog series. The whole point is to show that interesting social dynamics emerge from simple setups. A single GPU (or even CPU for small IPD experiments) is enough.

You would need more compute if:

Memory Budget

A useful rule of thumb for parameter storage alone (optimizer/ES state and environment rollouts add overhead on top):

memory = population_size * params_per_agent * 4 bytes (float32)
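Applying that rule of thumb to the first experiment proposed below (population 1,000, ~200 parameters per agent) shows the parameter budget is negligible:

```python
def population_memory_bytes(population_size, params_per_agent, bytes_per_param=4):
    """Float32 parameter storage for a whole population, in bytes."""
    return population_size * params_per_agent * bytes_per_param

mb = population_memory_bytes(1_000, 200) / 1e6   # well under 1 MB
```

Note that some strategies carry extra state beyond the parameters themselves (CMA-ES, for example, maintains a d x d covariance matrix), so the rule of thumb is a floor, not a ceiling.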

7. Concrete Proposal for the First Experiment

Based on all of the above, here is what I would build first:

Setup

Environment: Iterated Prisoner’s Dilemma, pairwise, 100 rounds per match.

Agents: Feedforward neural nets. Input: last 3 moves of self + last 3 moves of opponent = 12 binary inputs (or 6 if encoded as single values). One hidden layer of 16 units (ReLU). Output: probability of cooperating (sigmoid). Total: ~200 parameters.

Evolution: CMA-ES via evosax, population of 1,000. Fitness = average payoff across round-robin tournament (every agent plays every other agent). Standard IPD payoff matrix (T=5, R=3, P=1, S=0).

Framework: JAX + evosax + custom IPD game loop (20 lines). No heavy framework needed for this.
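The agent described above, sketched in plain Python for concreteness (a Flax module would replace this in the real pipeline). The parameter count confirms the "~200 parameters" estimate:

```python
import math

def num_params(n_in=12, n_hidden=16, n_out=1):
    """Weights + biases for a 12 -> 16 (ReLU) -> 1 (sigmoid) net."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def forward(params, x):
    """Probability of cooperating, given the 12 binary history inputs."""
    w1, b1, w2, b2 = params
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]                      # hidden ReLU layer
    z = sum(w * hi for w, hi in zip(w2, h)) + b2          # output logit
    return 1.0 / (1.0 + math.exp(-z))                     # sigmoid

count = num_params()   # 225 parameters
```

With all-zero parameters the output is exactly 0.5 (cooperate half the time), which is a reasonable behavioural prior for the initial population.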

What to Measure

  1. Mean/std fitness per generation (ribbon chart).
  2. Cooperation rate over time (fraction of cooperative moves in the population).
  3. Strategy classification: test each agent against TFT, AllC, AllD, Random. Cluster into types. Stacked area chart.
  4. UMAP of behaviour vectors every 50 generations. Animate.
  5. Population diversity: mean pairwise L2 distance of weight vectors.
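Metric 5 can be sketched directly (plain Python; in the JAX pipeline this becomes a vectorized one-liner over the population's parameter matrix):

```python
import math

def mean_pairwise_l2(population):
    """Mean L2 distance over all unordered pairs of weight vectors."""
    n = len(population)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(population[i], population[j])))
            total += d
            pairs += 1
    return total / pairs

collapsed = [[1.0, 2.0]] * 5                       # identical agents -> 0.0
spread = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]      # some variation  -> > 0
```

A sustained slide of this metric toward zero is the quantitative signature of the population-collapse failure mode discussed in section 5.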

What You Hope to See

Extension for a Second Post


Summary of Recommendations

Environment: IPD (first post), Coin Game (second post). IPD is simple, well-understood, with rich dynamics; Coin Game adds visuals.
Evolution algorithm: CMA-ES or OpenAI-ES via evosax. Robust, well-studied, GPU-accelerated, simple API.
Framework: JAX + evosax (+ JaxMARL for Coin Game). Everything in one ecosystem; fast; composable.
Population size: 1,000 for IPD, 256-512 for Coin Game. Sweet spot for a single GPU, enough for diversity.
Network size: 100-500 params (IPD), 1,000-5,000 params (Coin Game). Small enough to evolve, large enough to learn.
Key visualization: UMAP of behaviour space over generations. The single most compelling visual for a blog post.
Diversity preservation: (mu, lambda) selection + mutation rate tuning. Prevents collapse without overcomplicating.
What NOT to do: do not shape rewards; do not use heavy frameworks. Let dynamics emerge naturally and keep it simple.

Key References

Tools and Frameworks

Papers

Blog Posts and Tutorials