Society of Mind Revisited: Research Notes
Research for the “Neural Societies” blog post series. Compiled April 2026.
The central question: Is cognition itself a population process? Modern deep learning increasingly builds systems that look like societies – mixtures of experts, multi-head attention committees, ensemble electorates. Minsky said this in 1986 with no working implementation. Now we have implementations with no adequate theory. This document maps the philosophical landscape.
1. Minsky’s Society of Mind (1986)
Core Thesis
Intelligence arises not from any single principle but from the interaction of many simple, individually mindless agents organized into agencies. “The power of intelligence stems from our vast diversity, not from any single, perfect principle.” Each agent is roughly comparable to a subroutine – individually stupid, collectively smart.
Key Concepts
Agents and Agencies. Agents are the atomic units: simple processes that, alone, do nothing recognizably intelligent. Agencies are organized collections of agents that produce competent behaviour in some domain. The hierarchy is recursive: an agency can itself be an agent within a larger agency.
K-lines (Knowledge-lines). Memory as reactivation. A K-line records which agents were active during a successful episode, forming a retrievable pattern. When a similar situation arises, the K-line reactivates that same constellation of agents. This is not storage of data but storage of activation patterns – a strikingly modern idea that anticipates weight-space retrieval and associative memory in neural nets.
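A K-line can be sketched in a few lines of code, assuming a toy "society" of named callables. All names below are invented for illustration, not drawn from Minsky's text; the point is that the K-line stores which agents fired, not what they produced.

```python
# Minimal sketch of a K-line as an activation-pattern memory.
# "Agents" here are just named callables in a toy society (illustrative names).

class KLine:
    """Records which agents were active during a successful episode."""
    def __init__(self):
        self.pattern = set()

    def record(self, active_agents):
        # Store the constellation of agents, not any data they produced.
        self.pattern = set(active_agents)

    def reactivate(self, society):
        # Re-arouse the same constellation in a new situation.
        return {name: society[name]() for name in self.pattern if name in society}

society = {
    "grip": lambda: "hand closes",
    "lift": lambda: "arm raises",
    "look": lambda: "eyes track object",
}

kline = KLine()
kline.record(["grip", "lift"])       # a successful "pick up" episode
result = kline.reactivate(society)
print(sorted(result))                # ['grip', 'lift']: the constellation fires again
```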
Nemes and Nomes. Minsky’s more structured information representation. Nemes represent aspects of the world (analogous to data); nomes control how representations are processed (analogous to control flow). Polynemes invoke partial states across multiple agencies, representing different facets of a single concept. This is essentially distributed representation with explicit control routing – a concept that reappears in modern attention and gating mechanisms.
Frames. Mental templates or schemas for interpreting new situations. A frame has default values that get overwritten by actual perception. Minsky introduced frames earlier (1974) as a knowledge representation scheme; in Society of Mind they become the structural grammar of agents’ world-models. Modern equivalents: schema-based reasoning, frame semantics in NLP, and arguably the “in-context learning” templates that LLMs construct.
Censors and Suppressors. Negative knowledge – knowledge about what not to do. Censors intercept mental states that precede bad outcomes (“Don’t even begin to think that!”). Suppressors intercept bad outcomes themselves (“Stop thinking that!”). This is a hierarchy of preemptive vs. reactive inhibition. The modern analogue is obvious: RLHF, constitutional AI, safety filters. But Minsky’s point is deeper – he argues that much of intelligence consists of knowing what to suppress, not just what to activate.
The B-brain. A monitoring process that thinks about the world inside the mind – effectively a metacognitive layer. This presages modern self-reflection architectures, chain-of-thought monitoring, and the critic networks in actor-critic reinforcement learning.
The Emotion Machine (2006)
Minsky’s sequel elaborates the architecture into six hierarchical levels of mental activity:
- Instinctive Reactions – innate, fast responses
- Learned Reactions – pattern-matched responses from experience
- Deliberative Thinking – weighing options, planning
- Reflective Thinking – evaluating one’s own deliberation
- Self-Reflective Thinking – modelling one’s own goals and biases
- Self-Conscious Emotions – evaluating against ideals
These levels are coordinated by Critic-Selector mechanisms. Critics detect problems (e.g., frustration signals that current strategies are failing); Selectors activate different “Ways to Think” in response. Emotions are not opposed to cognition – they are modes of cognition, escalation mechanisms that shift processing between levels.
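The Critic-Selector loop admits a similarly minimal sketch. The trouble signals, thresholds, and "Ways to Think" below are invented placeholders; the structure (critics diagnose, selector switches mode) is the point.

```python
# A hedged sketch of Minsky's Critic-Selector loop: critics watch for trouble
# signals, and a selector switches the active "Way to Think".
# Signal names and thresholds are illustrative assumptions.

WAYS_TO_THINK = {
    "default": "learned reaction",
    "frustration": "deliberative planning",  # current strategy failing: slow down
    "confusion": "reflective review",        # model of the problem seems wrong
}

def critics(state):
    """Return the first trouble signal detected, or None."""
    if state.get("failed_attempts", 0) >= 3:
        return "frustration"
    if state.get("prediction_error", 0.0) > 0.8:
        return "confusion"
    return None

def selector(state):
    """Pick a Way to Think in response to the critics' diagnosis."""
    signal = critics(state)
    return WAYS_TO_THINK.get(signal, WAYS_TO_THINK["default"])

print(selector({"failed_attempts": 3}))     # deliberative planning
print(selector({"prediction_error": 0.9}))  # reflective review
print(selector({}))                         # learned reaction
```

Note that the emotion ("frustration") is not a disruption here but a routing signal, which is exactly Minsky's claim.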
This is remarkably close to the hierarchical planning and metacognitive monitoring seen in modern LLM agent architectures (ReAct, Reflexion, chain-of-thought with self-correction).
What Proved Prescient
- Hierarchical modularity as the organizing principle of intelligence
- Negative expertise (censors/suppressors) as central to competence
- Memory as pattern reactivation rather than data retrieval
- Metacognitive monitoring (B-brain)
- Emotions as cognitive mode-switches, not disruptions
- The fundamental bet: complexity from orchestration, not scale alone
What Remains Problematic
- The framework is descriptive, not mechanistic. “Agents” and “agencies” are posited but not derived from any optimization principle.
- No learning algorithm. How agents come into existence, specialize, or reorganize is hand-waved.
- K-lines and nemes have not been directly implemented in any successful system. The ideas resonate with modern architectures, but the mapping is analogical, not constructive.
- The theory has no account of grounding – how agents’ representations connect to the physical world.
Key References
- Minsky, M. (1986). The Society of Mind. Simon & Schuster.
- Minsky, M. (2006). The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster.
- Minsky, M. (1974). “A Framework for Representing Knowledge.” MIT AI Lab Memo 306.
- Singh, P. (2003). “Examining the Society of Mind.” Computing and Informatics, 22, 521-543. Provides a formal-ish analysis of 200+ concepts from the book.
2. Mixture of Experts as Literal Society of Mind
The Architecture
A Mixture of Experts (MoE) model is a neural network composed of multiple specialist sub-networks (experts) and a gating network (router) that decides which experts process each input. Only a sparse subset of experts is activated per input, giving the model enormous capacity with manageable compute.
This is, structurally, Minsky’s society of mind made computational: specialist agents, a routing mechanism, sparse activation, and an emergent division of labour.
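A minimal sparse MoE layer can be sketched in numpy. This is the generic architecture (softmax router, top-k selection, renormalized gates) with random stand-in weights, not any specific published system.

```python
# A minimal sparse MoE layer: a softmax router scores E experts per input
# and only the top-k experts run. Sketch only; weights are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
d, E, k = 8, 4, 2                        # hidden size, num experts, experts per token

W_router = rng.normal(size=(d, E))       # gating network
W_experts = rng.normal(size=(E, d, d))   # one FFN-ish matrix per expert

def moe_layer(x):
    logits = x @ W_router
    top = np.argsort(logits)[-k:]                            # sparse: pick top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # renormalized softmax
    # Only the selected experts compute; the rest of the "society" stays dormant.
    return sum(g * (x @ W_experts[i]) for g, i in zip(gates, top))

x = rng.normal(size=d)
y = moe_layer(x)
print(y.shape)   # (8,)
```

With k = 1 this becomes Switch-style hard routing: each input sees exactly one specialist.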
Historical Arc
The idea has a clean lineage:
- Jacobs, Jordan, Nowlan & Hinton (1991). “Adaptive Mixtures of Local Experts.” Neural Computation, 3(1), 79-87. The founding paper. A system of expert networks with a gating network, trained by competitive specialization. They demonstrated automatic task decomposition on a vowel discrimination task. The key insight: the gating network learns to route, and the experts learn to specialize, simultaneously.
- Shazeer et al. (2017). “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” Scaled MoE to thousands of experts with a trainable sparse gating mechanism. Demonstrated >1000x capacity increase with minor compute overhead. This paper made MoE practical at scale.
- Fedus, Zoph & Shazeer (2021). “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.” Simplified routing to top-1 (each token goes to exactly one expert). Achieved trillion-parameter models with 4x speedup over dense T5-XXL. The simplification is important: it means each input sees exactly one specialist, not a blend. This is closer to Minsky’s hard routing than to soft attention.
- Lepikhin et al. (2021). GShard. Scaled MoE to 600B parameters across 2048 TPUs, using top-2 routing where the second expert is selected probabilistically.
- Jiang et al. (2024). Mixtral 8x7B. Each layer has 8 expert FFN blocks; for each token, a router selects 2 of the 8. The model totals roughly 47B parameters but activates only about 13B per token, performing comparably to much larger dense models.
How Far Does the Minsky Analogy Go?
Strong parallels:
- Experts specialize through training pressure (competitive learning), just as Minsky’s agents acquire competence through experience
- The gating network is a literal “agency” that manages which agents handle what
- Sparse activation means most of the society is dormant for any given input – consistent with Minsky’s picture of selective activation
- Expert specialization is emergent, not designed. Nobody tells Expert 3 to handle code and Expert 7 to handle poetry.
Where the analogy breaks:
- Minsky’s agents communicate richly and recursively. MoE experts are typically isolated within a layer – they don’t talk to each other, only to the router and the residual stream.
- K-lines (cross-cutting memory traces) have no analogue in standard MoE. Each expert is stateless across inputs.
- Minsky’s censors and suppressors – negative agents that prevent bad actions – have no structural analogue. The router only selects; it doesn’t suppress.
- The hierarchy is flat. Minsky envisions agencies of agencies. MoE layers are repeated but not hierarchically organized in the agency sense.
Neuroscience Connection
Recent neuroscience provides biological grounding for MoE-like computation. Lee et al. (2021, Nature Neuroscience) found that the brain implements a “mixture of experts” framework where the ventrolateral prefrontal cortex (vlPFC) acts as a gating mechanism, tracking prediction reliability across specialist systems (model-based vs. model-free). The vmPFC integrates outputs weighted by reliability. Crucially, the gating mechanism works by inhibition – suppressing less-reliable experts rather than amplifying reliable ones. This is closer to Minsky’s suppressors than to the softmax routing in standard MoE.
Key References
- Jacobs, R.A., Jordan, M.I., Nowlan, S.J. & Hinton, G.E. (1991). “Adaptive Mixtures of Local Experts.” Neural Computation, 3(1), 79-87.
- Shazeer, N. et al. (2017). “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” ICLR 2017.
- Fedus, W., Zoph, B. & Shazeer, N. (2021). “Switch Transformers.” JMLR 23, 1-40.
- Jiang, A.Q. et al. (2024). “Mixtral of Experts.” Mistral AI.
- Lee, S.W. et al. (2021). “Why and how the brain weights contributions from a mixture of experts.” Nature Neuroscience.
3. Multi-Head Attention as Committee
The Mechanism
In a transformer, multi-head attention splits the representation into h parallel “heads,” each computing its own query-key-value attention pattern independently. The outputs are concatenated and projected. The standard interpretation: each head can attend to different positions and capture different types of relationships simultaneously.
The committee metaphor: each head is a specialist voter examining different aspects of the input, and the output projection aggregates their votes.
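The mechanism fits in a short numpy sketch. Shapes follow the standard transformer; the projection weights are random stand-ins.

```python
# Multi-head attention as a "committee": each head attends independently,
# and the output projection blends their reports. Sketch with random weights.
import numpy as np

rng = np.random.default_rng(0)
T, d, h = 5, 8, 2                 # tokens, model dim, heads
dk = d // h

Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):                                    # each head votes alone
        sl = slice(i * dk, (i + 1) * dk)
        A = softmax(Q[:, sl] @ K[:, sl].T / np.sqrt(dk))  # this head's attention map
        heads.append(A @ V[:, sl])
    return np.concatenate(heads, axis=-1) @ Wo            # the manager blends the reports

X = rng.normal(size=(T, d))
print(multi_head_attention(X).shape)   # (5, 8)
```

The loop makes the committee structure explicit: no head ever reads another head's attention map, only the final projection sees them all.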
Evidence for Head Specialization
Two landmark papers established that attention heads develop interpretable, specialized roles:
Clark et al. (2019). “What Does BERT Look At? An Analysis of BERT’s Attention.” Found that specific BERT attention heads align with syntactic dependency relations: direct objects of verbs, determiners of nouns, prepositional objects. Heads were evaluated as classifiers for specific linguistic relations, and some achieved high accuracy on specific dependency types.
Voita et al. (2019). “Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.” Identified three functional head types in machine translation: positional heads (attending to adjacent tokens, >90% of max attention on position -1 or +1), syntactic heads (tracking dependency relations like subject-verb), and rare-token heads (attending to infrequent tokens). Key finding: specialized heads are the last to be pruned. Most heads can be removed without significant performance loss. The work used stochastic gates with L0 regularization to quantify each head’s contribution.
The Committee Metaphor: Strengths and Limits
Strengths:
- Genuine parallel specialization. Different heads learn different functions without explicit instruction.
- Redundancy. Many heads are dispensable – like committee members who add little but whose presence provides insurance.
- The output projection is literally a learned aggregation of committee votes.
Limits:
- Heads don’t deliberate. There is no inter-head communication – each head computes independently, then results are concatenated. A real committee argues.
- Head roles are not crisp. Many heads serve multiple functions or have context-dependent behaviour. The clean taxonomy (positional, syntactic, semantic) captures only a minority of heads.
- The “voting” is not democratic. The output projection can learn to weight some heads heavily and others near-zero, which is more like a weighted average with learned weights than majority rule.
- There is no mechanism for heads to form coalitions, disagree, or persuade each other. The “committee” is really a panel of independent consultants whose reports are blended by a manager.
Connection to Dennett’s Multiple Drafts
Each attention head computes a different “draft” of which tokens are relevant. The output layer selects and combines these drafts. This is structurally similar to Dennett’s multiple drafts model: parallel, competing narratives with no single Cartesian theatre where they all come together. The output projection is the closest thing to Dennett’s “fame in the brain” – the draft that dominates the output is the one that achieves functional influence.
Key References
- Clark, K. et al. (2019). “What Does BERT Look At? An Analysis of BERT’s Attention.” ACL BlackboxNLP Workshop.
- Voita, E. et al. (2019). “Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.” ACL 2019.
- Jo, J. & Myaeng, S.H. (2020). “Roles and Utilization of Attention Heads in Transformer-based Neural Language Models.” ACL 2020.
4. Ensemble Methods as Population Cognition
The Condorcet Connection
The Marquis de Condorcet proved in 1785 that if each member of a jury independently votes correctly with probability p > 0.5, the probability of a correct majority verdict approaches 1 as jury size grows. The conditions are crucial: independence and above-chance individual competence.
This is the theoretical backbone of ensemble methods. If you have N models, each with accuracy > 50%, and their errors are independent, majority voting will be arbitrarily accurate as N grows. The ensemble is wiser than any individual.
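The theorem can be checked directly with the exact binomial tail: the majority is correct when more than half of n independent voters are.

```python
# Condorcet's jury theorem, computed exactly with the binomial tail (stdlib only).
from math import comb

def majority_correct(n, p):
    """P(majority correct) for odd n, i.i.d. voters each correct with prob p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n // 2) + 1, n + 1))

for n in (1, 11, 101, 1001):
    print(n, round(majority_correct(n, 0.6), 4))
# accuracy climbs toward 1.0 as n grows; with p = 0.4 the same sum
# instead falls toward 0, since systematic bias is amplified, not averaged out
print(round(majority_correct(101, 0.4), 4))
```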
Ensemble Methods as Social Organizations
Different ensemble methods map onto different social structures:
Bagging (Bootstrap Aggregation). Each model trains on a random subsample. This is like polling a random subset of citizens – diversity through sampling. Random Forests are the canonical example: many decision trees, each seeing a different subset of features and data, voting by majority. The independence assumption is enforced architecturally through randomization.
Boosting. Each model focuses on the errors of its predecessors. This is sequential, not parallel – more like a relay team or a chain of specialists, each correcting the previous one’s blind spots. AdaBoost, gradient boosting. The social analogue: an organization that learns from its mistakes through institutional memory.
Stacking. Multiple heterogeneous models, with a meta-learner that combines their outputs. This is a managed committee: diverse specialists (possibly of different types) with an executive that weighs their opinions. The meta-learner is structurally identical to the MoE gating network.
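The contrast between flat majority rule (bagging-style) and a managed committee (stacking-style) can be sketched with toy classifiers. Every model and feature name below is invented for illustration; real stacking would learn the meta-weights from held-out predictions.

```python
# Voting schemes over toy classifiers, stdlib only. Each "specialist"
# thresholds one invented feature of the input.
from statistics import mode

specialists = [
    lambda x: x["size"] > 5,
    lambda x: x["weight"] > 3,
    lambda x: x["noisy"],          # an unreliable member of the committee
]

def bagging_vote(x):
    """Flat majority rule over the specialists."""
    return mode(m(x) for m in specialists)

def stacking_vote(x, meta_weights):
    """A meta-learner weighs each specialist's opinion (cf. the MoE router)."""
    score = sum(w * m(x) for w, m in zip(meta_weights, specialists))
    return score > 0.5

x = {"size": 7, "weight": 1, "noisy": True}
print(bagging_vote(x))                    # True: two of three say yes
print(stacking_vote(x, [0.9, 0.1, 0.0]))  # True: the meta-learner trusts specialist 1
```

The meta-weights can silence the unreliable specialist entirely, which majority rule cannot do.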
Formal Results and Their Limits
Berend & Kontorovich (2015). “Condorcet’s Jury Theorem for Consensus Clustering and its Implications for Diversity.” Extended Condorcet’s theorem to clustering ensembles. Key insight: diversity among ensemble members is as important as individual accuracy. If models are identical, adding more helps nothing.
The independence assumption is almost never satisfied in practice. Models trained on the same data, using similar architectures, learn correlated representations. A 2025 paper in Scientific Reports (“When the crowd gets it wrong”) challenges the assumption that larger ensembles inherently make better decisions, identifying conditions under which collective accuracy declines. The Condorcet theorem has a dark side: if p < 0.5, majority voting makes things worse as the jury grows. Systematic bias in the population is catastrophic.
LLM Ensembles and Condorcet. A 2024 study examined independence assumptions in ensemble sentiment analysis using LLMs (Condorcet applied to chatbots). Found that LLMs violate independence assumptions severely – they share training data, architectural biases, and often agree on the same wrong answers. The “wisdom of LLM crowds” is far less wise than Condorcet predicts because the crowd is not independent.
The Deeper Point
Ensemble methods are the most mathematically well-understood case of cognition-as-population-process. The Condorcet theorem provides a formal guarantee: under the right conditions, democratic aggregation of mediocre judgments produces excellent judgments. The conditions (independence, competence) are the hard part, and they are the same conditions that make actual democratic societies work or fail.
The analogy to neural societies is direct: a “society” of neural networks is more than the sum of its parts if and only if its members are diverse and individually competent. Homogeneity kills the ensemble advantage just as groupthink kills the wisdom of crowds.
Key References
- Condorcet, M. de (1785). Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix.
- Breiman, L. (1996). “Bagging Predictors.” Machine Learning, 24(2), 123-140.
- Freund, Y. & Schapire, R.E. (1997). “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” JCSS, 55(1), 119-139.
- Berend, D. & Kontorovich, A. (2015). “Condorcet’s Jury Theorem for Consensus Clustering.” Machine Learning.
- Various (2025). “When the crowd gets it wrong.” Scientific Reports.
5. Global Workspace Theory and Neural Architectures
Baars’ Global Workspace Theory (1988)
Bernard Baars proposed that consciousness functions as a “global workspace” – a shared representational medium that allows many unconscious specialist processors to broadcast information to all other processors. The metaphor: a stage in a theatre with many audience members. Most processing happens in the dark (unconscious), but what reaches the illuminated stage (conscious) becomes globally available.
Key properties:
- Limited capacity. The workspace holds very little at once – a bottleneck.
- Broadcasting. Contents of the workspace are made available to all specialist processors.
- Competition. Many unconscious processes compete for workspace access.
- Integration. The workspace is where otherwise modular processes can share information.
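These four properties fit in a toy simulation: processors propose contents with salience scores, the winner must cross a threshold (limited capacity, competition), and the winning content is broadcast to everyone (integration). The processor names, scores, and threshold are illustrative assumptions.

```python
# A toy global workspace cycle: competition, threshold, then broadcast.

THRESHOLD = 0.5   # illustrative ignition threshold

def workspace_cycle(proposals, processors):
    """proposals: {name: (content, salience)}. Returns what got broadcast."""
    name, (content, salience) = max(proposals.items(), key=lambda kv: kv[1][1])
    if salience < THRESHOLD:
        return None                  # nothing ignites; all stays "unconscious"
    for p in processors:             # broadcasting: global availability
        p.receive(content)
    return content

class Processor:
    def __init__(self):
        self.inbox = []
    def receive(self, content):
        self.inbox.append(content)

procs = [Processor() for _ in range(3)]
winner = workspace_cycle(
    {"vision": ("red blob", 0.9), "audio": ("hum", 0.3)}, procs)
print(winner)            # red blob
print(procs[0].inbox)    # ['red blob']: every processor now has it
```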
Dehaene’s Neuronal Global Workspace
Stanislas Dehaene and Jean-Pierre Changeux extended GWT into a neurobiological theory with specific computational mechanisms:
Ignition dynamics. Conscious perception involves a sudden, non-linear activation – a phase transition. Early sensory processing (~200ms) is identical for conscious and unconscious stimuli. Conscious perception requires a late-stage “ignition” where prefrontal-parietal networks suddenly and coherently activate, with the rest of the workspace inhibited. This is metastable: the system switches between discrete conscious states.
Architecture. Feedforward AMPA connections propagate signals bottom-up. Slower NMDA-mediated feedback connections enable top-down amplification through recurrent loops. Long-range pyramidal neurons with distant axons create all-to-all connectivity between distant cortical regions. The prefrontal cortex provides greater density of workspace neurons but is not the exclusive seat – parietal, cingulate, and other regions participate.
Selective gating. Workspace neurons receive bottom-up information and transmit top-down modulations, selecting and broadcasting information. This is, functionally, an attention mechanism.
Mapping to Transformer Architectures
The parallels are substantial and have attracted serious theoretical attention:
| GWT Concept | Transformer Analogue |
|---|---|
| Global workspace | Residual stream / shared hidden state |
| Broadcasting | Attention output written to residual stream |
| Specialist processors | Individual layers, heads, MLPs |
| Ignition (threshold-crossing) | Attention pattern formation (softmax sharpening) |
| Limited capacity bottleneck | Finite context window, attention bandwidth |
| Competition for workspace access | Softmax competition among attention scores |
| Long-range connectivity | Self-attention (all-to-all token connectivity) |
Empirical studies report that transformers develop attention patterns consistent with global workspace dynamics. One study found a “Global Broadcast Index” of M = 0.850 (d = 3.21), with identifiable ignition-like thresholds, and transformer-based systems scored significantly higher on Global Workspace metrics than recurrent and convolutional architectures.
Bengio’s Consciousness Prior (2017)
Yoshua Bengio’s “The Consciousness Prior” (arXiv:1709.08568) is the most direct attempt to turn GWT into a machine learning principle. Key ideas:
- Consciousness functions as an information bottleneck: attention selects a few elements from a large unconscious representational space, which are then broadcast and condition further processing.
- The conscious state is low-dimensional (like a sentence) containing few variables but capable of expressing high-probability statements about reality.
- High-level concepts follow a sparse factor graph where each factor involves very few variables.
- This connects to natural language: the structure of sentences validates the distributional assumptions about how high-level concepts relate.
Recurrent Independent Mechanisms (Goyal, Lamb, Bengio et al., 2019). “Recurrent Independent Mechanisms” (arXiv:1909.10893). A direct architectural implementation of modular processing with a communication bottleneck. Multiple groups of recurrent cells operate with nearly independent transition dynamics, communicating only through sparse attention. Each group activates only at relevant timesteps. This produces significantly better out-of-distribution generalization, exactly because the modularity allows some factors of variation to change without disrupting others.
RIMs are arguably the most explicit attempt to build a “society of mind” architecture with a formal communication bottleneck inspired by GWT.
A Case for AI Consciousness via GWT
Juliani et al. (2024) (“A Case for AI Consciousness: Language Agents and Global Workspace Theory”) argue that language model agents may satisfy GWT-based criteria for consciousness if they implement global broadcasting, selective attention, and integration across modular subsystems. This remains speculative but represents serious philosophical engagement.
Key References
- Baars, B.J. (1988). A Cognitive Theory of Consciousness. Cambridge University Press.
- Dehaene, S. & Changeux, J.-P. (2011). “Experimental and Theoretical Approaches to Conscious Processing.” Neuron, 70(2), 200-227.
- Bengio, Y. (2017). “The Consciousness Prior.” arXiv:1709.08568.
- Goyal, A. et al. (2019). “Recurrent Independent Mechanisms.” arXiv:1909.10893.
- VanRullen, R. & Kanai, R. (2021). “Deep Learning and the Global Workspace Theory.” Trends in Neurosciences.
6. Consciousness as Integration / Population Process
Integrated Information Theory (IIT) – Tononi
Giulio Tononi’s IIT (2004, with major revisions through IIT 4.0 in 2023) is the most mathematically ambitious theory of consciousness:
Core claim. Consciousness is identical to integrated information (Phi). A system is conscious to the degree that its parts are both differentiated (each part contributes unique information) and integrated (the whole generates information beyond what the parts generate independently). Phi measures this irreducible cause-effect power.
The feedforward problem. IIT entails that feedforward systems have Phi = 0 – they are never conscious. Only recurrent systems (with feedback loops) can integrate information. This is architecturally specific: a standard feedforward neural network is, according to IIT, completely unconscious regardless of its sophistication.
The unfolding argument. Any recurrent network can be “unfolded” into a feedforward network that computes the same input-output function. If consciousness depends on causal structure (as IIT claims) rather than function, then functionally identical systems can differ in consciousness. This is deeply counterintuitive and has been used both to attack IIT (it makes consciousness dependent on implementation details) and to defend it (consciousness should not be functionally definable).
The Aaronson critique. Scott Aaronson demonstrated that under IIT’s formalism, a simple grid of inactive XOR gates can have arbitrarily high Phi – meaning it would be “more conscious” than a human brain. Tononi accepted this implication, which many find reductio-like. The problem is that Phi does not distinguish between systems that are doing useful computation and those that merely have the right causal structure.
Implications for neural societies. Under IIT, a “society” of feedforward neural networks has zero consciousness regardless of how sophisticated their interaction is. This is a sharp prediction. A single recurrent network with feedback loops would be more “conscious” than an ensemble of thousands of feedforward nets. If IIT is right, the MoE architecture (where experts are feedforward MLPs) has exactly zero Phi within each expert, and consciousness (if any) resides entirely in the recurrent connections between layers and the residual stream.
Higher-Order Theories (HOT)
Higher-order theories (Rosenthal, Lau, Brown) hold that a mental state is conscious when there is a higher-order representation of that state. You are conscious of seeing red when you have a thought about your perception of red.
For neural networks: this would require a system that models its own representations. Modern architectures with self-reflection (chain-of-thought, metacognitive prompts, critic networks) have some structural similarity, but it is unclear whether functional self-modelling suffices or whether the “aboutness” must be more substantive.
Predictive Processing / Free Energy Principle
Karl Friston’s Free Energy Principle (2010) proposes that the brain is fundamentally a prediction machine that minimizes surprise (free energy) through a hierarchical Bayesian architecture:
- Forward connections convey prediction errors from lower to higher levels.
- Backward connections convey predictions from higher to lower levels.
- Perception = minimizing prediction error by updating the model.
- Action = minimizing prediction error by changing the world.
This is naturally hierarchical and modular: each level is a specialist that makes predictions about its inputs and receives correction signals. The architecture maps well onto modern neural nets with residual connections (prediction + error). Active inference extends this to action selection.
The connection to consciousness: Friston has argued that systems that minimize free energy under the right conditions (hierarchical depth, temporal depth, counterfactual richness) will exhibit properties associated with consciousness. A “society” of prediction-error-minimizing agents, each maintaining its own generative model, would constitute a kind of distributed consciousness – or perhaps no consciousness at all if the integration is insufficient.
Attention Schema Theory (AST) – Graziano
Michael Graziano’s AST (Princeton) proposes that the brain constructs a simplified model of its own attention process – an “attention schema.” This schema, being a model of attention, attributes a property (“awareness”) to the system, which is the origin of subjective experience claims.
AST has been directly implemented in neural network agents: a deep Q-learning agent with an explicit attention schema outperformed one without it on visuospatial tasks (Wilterson & Graziano, 2021, PNAS). The schema gives the system a model of what it is attending to, which improves attention control.
For neural societies: if each agent in a society models its own attention and the attention of others, you get something like a society of self-aware agents. This connects to Theory of Mind in multi-agent systems.
The Core Tension
IIT says consciousness requires integration – a unified system. GWT says consciousness is broadcasting within a society of specialists. HOT says it requires self-modelling. Predictive processing says it emerges from hierarchical prediction. AST says it emerges from self-modelling of attention.
For neural societies, the question crystallizes: Does a population of neural networks become more conscious or less conscious than a single monolithic network?
- Under IIT: less. A federated system with modular experts has lower Phi than an equivalent recurrent monolith.
- Under GWT: possibly more, if there is a global workspace that broadcasts between modules.
- Under HOT: it depends on whether there is a higher-order module that models the system’s own states.
- Under AST: it depends on whether there is an attention schema.
- Under predictive processing: it depends on the depth and richness of the predictive hierarchy.
Key References
- Tononi, G. (2004). “An Information Integration Theory of Consciousness.” BMC Neuroscience, 5, 42.
- Tononi, G. et al. (2023). “Integrated Information Theory (IIT) 4.0.” arXiv.
- Aaronson, S. (2014). “Giulio Tononi and Me: A Phi-nal Exchange.” Blog post with technical argument.
- Rosenthal, D.M. (2005). Consciousness and Mind. Oxford University Press.
- Friston, K. (2010). “The free-energy principle: a unified brain theory?” Nature Reviews Neuroscience, 11(2), 127-138.
- Graziano, M.S.A. & Webb, T.W. (2015). “The Attention Schema Theory: A Mechanistic Account of Subjective Awareness.” Frontiers in Psychology.
- Wilterson, A.I. & Graziano, M.S.A. (2021). “The attention schema theory in a neural network agent.” PNAS.
7. Modularity of Mind
Fodor’s Original Thesis (1983)
Jerry Fodor’s The Modularity of Mind proposed that the mind contains informationally encapsulated input modules for perception and language. Modules are:
- Domain-specific – operate only on certain input types
- Informationally encapsulated – cannot access information from other modules
- Obligatorily fired – operate automatically on relevant input
- Fast
- Shallow – produce relatively raw outputs that central systems interpret
Critically, Fodor argued that only input systems (perception, language parsing) are modular. Central cognition (reasoning, planning, belief fixation) is non-modular – it requires unconstrained access to all information. Fodor thought central cognition was intractable for cognitive science precisely because it is not modular.
Massive Modularity (Evolutionary Psychology)
Cosmides, Tooby, Pinker, Sperber, and others from evolutionary psychology pushed back: the mind is modular through and through, including central cognition. Each module evolved to solve a specific ancestral problem: face recognition, cheater detection, social exchange, theory of mind, etc.
Peter Carruthers (2006, The Architecture of the Mind) gave the most sophisticated philosophical defence. He argued for “moderately massive modularity” where central systems have modular structure but modules can share information through constrained channels.
Evolution of Modularity in Neural Networks
Clune, Mouret & Lipson (2013). “The Evolutionary Origins of Modularity.” Proceedings of the Royal Society B. This is a key paper. They showed through computational evolution experiments that modularity evolves as a byproduct of selection pressure to minimize connection costs. When evolved networks face pressure for both performance and low wiring costs, they spontaneously become modular and more evolvable. Without connection cost pressure, evolved networks are non-modular and slow to adapt.
This is a powerful result: it gives an optimization-theoretic reason for modularity. The brain is modular not because modularity was directly selected for, but because wiring is expensive and modularity minimizes wiring while preserving function. This maps directly onto modern neural architecture search and the efficiency arguments for sparse MoE models.
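The selection pressure can be stated in one line of code: fitness is task performance minus a wiring-cost penalty. The toy networks below are bare adjacency matrices, and the cost coefficient is an assumption for illustration, not a value from the paper.

```python
# The Clune, Mouret & Lipson pressure in miniature: with equal performance,
# a connection-cost term alone favors the modular wiring diagram.
import numpy as np

def fitness(adjacency, performance, cost_coeff=0.05):
    """Task performance minus a connection-cost penalty (total wiring)."""
    return performance - cost_coeff * adjacency.sum()

modular = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
                    [np.zeros((3, 3)), np.ones((3, 3))]])   # two dense clusters
tangled = np.ones((6, 6))                                   # all-to-all wiring

print(round(fitness(modular, performance=1.0), 2))   # 0.1
print(round(fitness(tangled, performance=1.0), 2))   # -0.8
```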
Follow-up (2015). Neural modularity helps organisms evolve to learn new skills without forgetting old skills – modular networks resist catastrophic forgetting because each module can be updated independently.
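The quantity at stake in Clune et al. is measurable: Newman's modularity Q of the evolved wiring. As a toy illustration (the graphs and partition below are invented here, not taken from the paper), the same edge budget scores very differently depending on whether connections are clustered or tangled:

```python
# Newman's modularity Q for a partition of an undirected 0/1 graph:
# Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)

def modularity(adj, communities):
    """Modularity Q of a community assignment on an adjacency matrix."""
    n = len(adj)
    two_m = sum(sum(row) for row in adj)       # 2m: each edge counted twice
    degree = [sum(row) for row in adj]
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def edges_to_adj(n, edges):
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

# Two dense 4-node clusters joined by a single "expensive" long wire.
modular = edges_to_adj(8, [(0,1),(0,2),(0,3),(1,2),(1,3),(2,3),
                           (4,5),(4,6),(4,7),(5,6),(5,7),(6,7),
                           (3,4)])
# The same number of edges (13) spread uniformly across the two halves.
tangled = edges_to_adj(8, [(0,4),(0,5),(1,5),(1,6),(2,6),(2,7),
                           (3,7),(3,4),(0,1),(2,3),(4,5),(6,7),(1,2)])

part = [0,0,0,0,1,1,1,1]
print(modularity(modular, part))   # approx 0.42: strongly modular
print(modularity(tangled, part))   # negative: no modular structure
```

The point of the toy: modularity is not a vague aesthetic but an optimizable scalar, which is what lets Clune et al. track it under selection pressure.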
Modularity in Modern Architectures
The landscape of modular neural architectures:
- MoE models: Explicitly modular with a routing mechanism. Each expert is a Fodorian module – domain-specialized, informationally encapsulated within its computation, fast.
- Multi-head attention: Implicitly modular. Heads specialize but share the residual stream (violating encapsulation).
- Adapters and LoRA: Modular fine-tuning. Small specialist modules are added to a general-purpose backbone. This is literally Fodorian: a general system with attachable domain-specific modules.
- Tool-using agents: LLMs that call external tools are implementing modularity at a higher level – the tool is a specialist module with a defined interface.
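To make the MoE case concrete, here is a minimal sketch of sparse top-k routing for a single token, in plain NumPy. All names and shapes are illustrative rather than any particular library's API; real implementations add load-balancing losses, batching, and expert-parallel dispatch:

```python
# Minimal sparse MoE layer for one token: a router scores experts,
# only the top-k run, and their outputs are gate-weighted and summed.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each expert is a tiny independent MLP: encapsulated computation.
experts = [(rng.standard_normal((d_model, d_model)) * 0.1,
            rng.standard_normal((d_model, d_model)) * 0.1)
           for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.1

def expert_forward(e, x):
    w1, w2 = e
    return np.maximum(x @ w1, 0.0) @ w2            # two-layer ReLU MLP

def moe_forward(x):
    logits = x @ router_w                          # router score per expert
    chosen = np.argsort(logits)[-top_k:]           # indices of top-k experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                           # renormalized softmax gate
    # Between-network "binding" here is just a gate-weighted sum.
    return sum(g * expert_forward(experts[i], x)
               for g, i in zip(gates, chosen))

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)                                     # (8,)
```

Note how encapsulation holds within an expert's forward pass but the combination step is a fixed weighted sum – this is exactly the "inadequate binding" point raised in section 8.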
The Fodorian Problem for Neural Nets
Fodor’s key insight was that modularity is easy but central cognition is hard. We can build modular perception systems (CNNs, specialized heads), but general reasoning, common sense, flexible integration – the “central” capacities – resist modularization. The success of large, non-modular transformer models at general reasoning is arguably evidence against massive modularity: scaling a single, non-modular system works better than building specialist modules and combining them (at least at current scales).
Or: the transformer is massively modular (many heads, many layers), but the modularity is so fine-grained and recombinant that it appears monolithic from outside. This is an open question.
Key References
- Fodor, J.A. (1983). The Modularity of Mind. MIT Press.
- Barkow, J.H., Cosmides, L. & Tooby, J. (eds.) (1992). The Adapted Mind: Evolutionary Psychology and the Generation of Culture. Oxford University Press.
- Carruthers, P. (2006). The Architecture of the Mind. Oxford University Press.
- Clune, J., Mouret, J.-B. & Lipson, H. (2013). “The Evolutionary Origins of Modularity.” Proceedings of the Royal Society B, 280(1755).
8. The Binding Problem
The Problem
How does a “society” of neural agents produce unified experience or coherent behaviour? The binding problem in neuroscience: visual features (colour, shape, motion, location) are processed in anatomically distinct cortical areas. How are they combined into a unified percept of “a red ball moving left”?
More generally: “How do items that are encoded by distinct brain circuits get combined for perception, decision, and action?” This is not just a problem for brains – it is the fundamental problem for any distributed architecture that must produce coherent output.
Classical Proposals
Temporal synchrony (von der Malsburg, 1981; von der Malsburg & Schneider, 1986). Neurons that represent features of the same object fire in synchrony. Binding by temporal correlation. This has some empirical support (gamma oscillations) but remains contested.
Attention-based binding (Treisman, 1980s). Feature Integration Theory: pre-attentive processing extracts features in parallel; attention binds them into objects. Without attention, you get “illusory conjunctions” (miscombined features). This maps naturally onto transformer attention: the attention mechanism is literally the binding mechanism.
Attractor dynamics. Bound representations correspond to stable attractor states in a dynamical system. The network settles into a coherent interpretation.
The Binding Problem in Neural Networks
Greff, van Steenkiste & Schmidhuber (2020). “On the Binding Problem in Artificial Neural Networks.” (arXiv:2012.05208). This is the definitive treatment. Key argument:
Neural networks cannot dynamically and flexibly bind distributed information. This prevents compositional understanding – the ability to understand novel combinations of familiar elements. The problem has three aspects:
- Segregation: How to structure raw input into meaningful entities (object discovery).
- Representation: How to maintain separation between entities in the representational space.
- Composition: How to use entities to construct new inferences and behaviours (variable binding, relational reasoning).
Slot-based solutions: Represent each entity in a separate “slot” – a fixed-size vector in a set of slots. Slot Attention (Locatello et al., 2020, NeurIPS) uses an iterative attention mechanism to bind features to slots, achieving unsupervised object discovery. Variants: instance slots, spatial slots, category slots.
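A stripped-down sketch of the Slot Attention binding step helps here. This version deliberately omits the learned projections, GRU update, and LayerNorm of the real model; the essential move it preserves is that attention is normalized over the slot axis, so slots compete for each input feature:

```python
# Simplified Slot Attention: iterative competition of slots for features.
# The softmax runs over SLOTS (not inputs), so each feature distributes
# its "vote" among slots, and slots update toward the features they win.
import numpy as np

def slot_attention(inputs, n_slots=3, n_iters=3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.standard_normal((n_slots, d))
    for _ in range(n_iters):
        logits = slots @ inputs.T / np.sqrt(d)       # (n_slots, n)
        attn = np.exp(logits - logits.max(axis=0))
        attn /= attn.sum(axis=0, keepdims=True)      # softmax over slots
        attn_n = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = attn_n @ inputs                      # weighted mean of features
    return slots, attn

# Features drawn from two well-separated "objects", plus small noise.
rng = np.random.default_rng(1)
obj_a = rng.standard_normal(4) + 5.0
obj_b = rng.standard_normal(4) - 5.0
feats = np.stack([obj_a + 0.1 * rng.standard_normal(4) for _ in range(5)]
               + [obj_b + 0.1 * rng.standard_normal(4) for _ in range(5)])
slots, attn = slot_attention(feats, n_slots=2)
print(attn.round(2))   # each column is one feature's split between slots
```

With well-separated inputs like these, the competition typically segregates the features into one slot per object – segregation and representation in the Greff et al. taxonomy, with composition still left open.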
The fundamental tension: Neural networks represent information in distributed, superposed vectors (good for generalization, bad for binding). Symbolic systems represent information in discrete, separated tokens (good for binding, bad for generalization). The binding problem is the clash between these two representational strategies.
The Binding Problem for Neural Societies
For a society of neural networks, binding operates at two levels:
- Within-network binding: How each individual network forms coherent representations (the standard binding problem).
- Between-network binding: How outputs from multiple specialist networks are combined into coherent behaviour.
MoE handles between-network binding through the router + weighted combination of expert outputs. Multi-head attention handles it through concatenation + output projection. Ensemble methods handle it through voting or stacking.
But none of these mechanisms solve the hard binding problem: producing representations where bound entities can be flexibly recombined. A society of specialists that each handle one aspect of input can produce coherent output, but this is not the same as producing a unified representation where all aspects are bound together.
This connects directly to the consciousness question: is unified experience (binding) necessary for intelligent behaviour? Minsky would say no – the unity is an illusion produced by the society. IIT would say yes – binding is integration, and integration is consciousness.
Attention as Binding: A Vector-Symbolic Perspective
A 2025 paper (arXiv:2512.14709) proposes connecting transformer attention to Vector Symbolic Architectures (VSAs) – frameworks for distributed compositional representations. Tensor Product Representations (Smolensky) bind roles and fillers via higher-order tensors. Holographic Reduced Representations (Plate) use circular convolution. The paper argues that attention in transformers implicitly performs a form of variable binding, which would explain why transformers handle compositional reasoning better than earlier architectures.
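Plate's HRR scheme is easy to demonstrate directly: circular convolution binds a role to a filler in a fixed-width vector, and circular correlation (convolution with the role's involution) approximately unbinds it. A minimal sketch, with dimension and variable names chosen arbitrarily:

```python
# Holographic Reduced Representations: bind with circular convolution,
# unbind with the role's approximate inverse (index-reversal involution).
import numpy as np

d = 1024
rng = np.random.default_rng(0)

def randvec():
    return rng.standard_normal(d) / np.sqrt(d)     # roughly unit norm

def bind(a, b):
    # Circular convolution via the FFT convolution theorem.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(trace, role):
    inv = np.concatenate(([role[0]], role[:0:-1])) # involution: x*[i] = x[-i]
    return bind(trace, inv)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

colour, shape = randvec(), randvec()               # roles
red, ball = randvec(), randvec()                   # fillers
scene = bind(colour, red) + bind(shape, ball)      # "a red ball", superposed

red_hat = unbind(scene, colour)
print(cos(red_hat, red))    # clearly positive: the filler survives unbinding
print(cos(red_hat, ball))   # near zero: the other binding does not leak
```

The recovered filler is noisy but identifiable by a nearest-neighbour cleanup step, which is exactly the distributed-versus-discrete trade-off described above: binding works, at the cost of exactness.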
Key References
- Greff, K., van Steenkiste, S. & Schmidhuber, J. (2020). “On the Binding Problem in Artificial Neural Networks.” arXiv:2012.05208.
- Treisman, A. & Gelade, G. (1980). “A Feature-Integration Theory of Attention.” Cognitive Psychology, 12(1), 97-136.
- Locatello, F. et al. (2020). “Object-Centric Learning with Slot Attention.” NeurIPS 2020.
- Smolensky, P. (1990). “Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems.” Artificial Intelligence, 46(1-2), 159-216.
9. Hofstadter’s Perspective
Strange Loops and Self-Reference
Douglas Hofstadter’s Gödel, Escher, Bach (1979) and I Am a Strange Loop (2007) propose that consciousness – the sense of “I” – arises from a strange loop: a self-referential pattern in which, by moving through levels of a system, you find yourself back where you started.
The key analogy is Gödel’s incompleteness theorem: a formal system rich enough to encode arithmetic can construct statements that refer to themselves, producing truths that the system itself cannot prove. Hofstadter argues that the brain, being a system rich enough to encode patterns at multiple levels of abstraction, inevitably develops self-referential patterns – and these patterns are the self.
Emergent selfhood. “The ego emerges only gradually as experience shapes the dense web of active symbols into a tapestry rich and complex enough to begin twisting back upon itself.” The self is not a thing but a pattern – specifically, a pattern of patterns that refers to itself. Consciousness is “an emergent consequence of seething lower-level activity in the brain.”
Levels of description. Hofstadter insists on multiple levels: the neural level (meaningless in isolation), the symbol level (where meaning lives), and the meta-level (where the system models itself). The strange loop exists because the meta-level is implemented in the neural level, creating a level-crossing feedback cycle.
How Hofstadter Differs from Minsky
Both Minsky and Hofstadter are emergentists, materialists, and AI pioneers. Both reject a unitary self. But their emphases diverge sharply:
| | Minsky | Hofstadter |
|---|---|---|
| Core metaphor | Society (many agents, division of labour) | Strange loop (self-reference, Gödel) |
| Unit of analysis | The agent (simple, functional) | The symbol (pattern, meaning-bearing) |
| Consciousness | A “big suitcase” of diverse mechanisms; not one thing | Emerges specifically from self-referential loops |
| The self | A convenient fiction produced by the society’s need for coherence | A real pattern (strange loop) that causally affects the system |
| Approach | Mechanistic decomposition (break mind into parts) | Holistic pattern recognition (find the loop in the whole) |
| Computation | AI-oriented, implementation-focused | Mathematics-oriented, analogy-focused |
| Key vulnerability | No learning theory, no optimization principle | No implementation, no algorithm for detecting strange loops |
Minsky dissolves the self: it is a useful simplification that the society of agents projects. Hofstadter preserves the self as a real causal pattern: the strange loop has “downward causation” – it constrains the lower-level neural activity that produces it.
Relevance to Neural Societies
Hofstadter’s framework raises a question that Minsky’s does not: Can a society of neural networks develop a strange loop? Can the system develop self-referential patterns where the population’s collective behaviour is modelled by the population itself?
Modern LLMs arguably exhibit proto-strange loops: they can discuss their own processing, predict their own outputs, and model the fact that they are language models. But it is unclear whether this constitutes genuine self-reference (the system’s model of itself causally influences its processing) or mere self-description (the system has been trained on text about language models and can parrot it).
A society of neural networks that includes a network whose job is to model the society’s own behaviour would be a more literal implementation. Multi-agent systems with Theory of Mind (where each agent models other agents’ beliefs and intentions) approach this – but modelling other agents is not the same as modelling the collective self.
Hofstadter on Modern AI
Hofstadter has been publicly anguished about modern LLMs. In 2023, he expressed deep discomfort that language models produce human-like text from mechanisms that seem utterly unlike the analogy-making he considers central to cognition. His earlier confidence that AI would require symbolic, meaning-based computation has been shaken by the empirical success of purely subsymbolic methods.
Key References
- Hofstadter, D. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books.
- Hofstadter, D. (2007). I Am a Strange Loop. Basic Books.
- Hofstadter, D. & Sander, E. (2013). Surfaces and Essences: Analogy as the Fuel and Fire of Thinking. Basic Books.
10. Dennett’s Multiple Drafts and the Pandemonium
Selfridge’s Pandemonium (1959)
Oliver Selfridge’s “Pandemonium: A Paradigm for Learning” introduced a pattern recognition architecture populated by “demons” arranged in four layers:
- Image demons – raw input
- Feature demons – detect local features (edges, curves), operating in parallel
- Cognitive demons – recognize higher patterns by listening to feature demons, “shouting” louder when their preferred pattern is detected
- Decision demon – listens to the shrieking of cognitive demons and picks the winner
No central control. Recognition emerges from competitive, parallel activity. The decision demon is not an executive; it just picks whoever shouts loudest. This is literally a “survival of the fittest” among competing interpretations.
Selfridge’s model is a direct ancestor of: neural networks (neurons as demons), attention mechanisms (competitive weighting), multi-head attention (parallel feature extraction with aggregation), and the GAN discriminator (an arbiter of competing generators).
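A toy Pandemonium makes the control structure concrete. The 3x3 "images", features, and letter templates below are invented for illustration; what matters is the architecture: parallel feature demons, cognitive demons shouting in proportion to their evidence, and a decision demon that merely picks the loudest:

```python
# Toy Pandemonium over 3x3 binary images. No central executive:
# recognition is just the loudest shout among competing cognitive demons.
feature_demons = {
    "top_bar":    lambda img: all(img[0]),
    "bottom_bar": lambda img: all(img[2]),
    "left_col":   lambda img: all(row[0] for row in img),
    "right_col":  lambda img: all(row[2] for row in img),
    "mid_bar":    lambda img: all(img[1]),
}
# Each cognitive demon knows which features its letter needs.
cognitive_demons = {
    "C": {"top_bar", "bottom_bar", "left_col"},
    "O": {"top_bar", "bottom_bar", "left_col", "right_col"},
    "E": {"top_bar", "mid_bar", "bottom_bar", "left_col"},
}

def recognize(img):
    active = {name for name, f in feature_demons.items() if f(img)}
    # Shout volume: matched features minus demanded-but-missing features.
    shouts = {letter: len(needs & active) - len(needs - active)
              for letter, needs in cognitive_demons.items()}
    return max(shouts, key=shouts.get), shouts     # the decision demon

img_C = [[1, 1, 1],
         [1, 0, 0],
         [1, 1, 1]]
winner, shouts = recognize(img_C)
print(winner, shouts)   # C wins
```

Even at this scale the family resemblance to a one-layer classifier with argmax readout is obvious, which is why Selfridge reads as an ancestor of the architectures listed above.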
Dennett’s Multiple Drafts Model (1991)
Daniel Dennett’s Consciousness Explained replaces the “Cartesian theatre” (a central place where consciousness happens) with the Multiple Drafts Model:
- At any time, multiple parallel streams of content are being revised (“edited”) in different brain regions.
- There is no single moment when content “becomes conscious.” Content achieves influence (“fame”) through a competitive process.
- The “self” is a narrative fiction, like the centre of gravity of an object – a useful abstraction with no physical location.
- Consciousness is “fame in the brain” – the content that achieves wide influence over behaviour gets retrospectively attributed as conscious.
The connection to Pandemonium is explicit. Dennett describes a “Pandemonium process” in which “the eventual connection of contents with expressions was the culmination of competitions, the building, dismantling and rebuilding of coalitions.” There is “only a struggle for existence among coalitions of demons, among subprocesses inside our brains, a struggle that resembles the fight for survival arbitrated by natural selection.”
Connection to Neural Architectures
Dennett’s framework maps well onto:
- Residual stream competition: In transformers, different attention heads and MLP layers write to the residual stream. The “winning” representation is whatever has the greatest influence on downstream processing – this is fame.
- Softmax as natural selection: The softmax function in attention and output layers implements a competitive, winner-take-more dynamic. Not all content survives equally.
- No Cartesian theatre: Transformers have no single bottleneck where “it all comes together.” Information is distributed across layers and heads, and there is no privileged location of representation.
Dennett’s view is the most naturally compatible with neural network architectures: distributed, competitive, non-centralized, no homunculus.
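The softmax-as-selection point is easy to make numerically: a modest score gap becomes a dominant share of influence, and lowering the temperature sharpens the competition without ever fully silencing the losers – winner-take-more, not winner-take-all, which is exactly the Multiple Drafts picture of "fame" rather than a binary conscious/unconscious switch:

```python
# Softmax as a competitive allocation of influence: temperature controls
# how decisively the highest score dominates, but no score goes to zero.
import math

def softmax(scores, temp=1.0):
    exps = [math.exp(s / temp) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

scores = [2.0, 1.0, 0.5]                  # three competing "demons"
for temp in (1.0, 0.5, 0.1):
    shares = softmax(scores, temp)
    print(temp, [round(s, 3) for s in shares])
# As temp falls, the leader's share climbs toward 1, yet every
# competitor retains a strictly positive share of influence.
```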
Key References
- Selfridge, O.G. (1959). “Pandemonium: A Paradigm for Learning.” In Mechanisation of Thought Processes. HMSO, London.
- Dennett, D.C. (1991). Consciousness Explained. Little, Brown.
11. Recent Work (2022-2025)
Butlin, Long, Chalmers et al. (2023/2025)
“Consciousness in Artificial Intelligence: Insights from the Science of Consciousness.” (arXiv:2308.08708, published in Trends in Cognitive Sciences 2025.)
The most rigorous attempt to assess whether current AI systems might be conscious. The authors survey five theories (recurrent processing theory, GWT, higher-order theories, predictive processing, attention schema theory) and derive 14 “indicator properties” of consciousness expressed in computational terms. They assess current AI systems against these indicators.
Key findings:
- No current AI system is conscious.
- Some indicators (smooth representation spaces, learned attention) are trivially satisfied by deep neural nets.
- Others (embodiment, environmental modelling, self-modelling, temporal depth) remain unsatisfied.
- “There are no obvious technical barriers to building AI systems which satisfy these indicators.”
- The framework is theory-neutral: it uses multiple consciousness theories as a robustness check.
Brain-Inspired MoE and Heterogeneous Experts (2025)
“Brain-Like Processing Pathways Form in Models With Heterogeneous Experts” (arXiv:2506.02813). Shows that MoE models with heterogeneous (structurally diverse) experts spontaneously develop processing pathways that resemble cortical processing streams. This is a step beyond the society-of-mind metaphor toward showing that the metaphor can be mechanistically grounded.
Theory of Mind in Multi-Agent Systems (Oguntola, CMU, 2025)
CMU dissertation developing an interpretable modular neural framework for modelling other agents’ beliefs, intentions, and behaviour through imitation learning. Connects multi-agent systems with Theory of Mind – a capacity that is central to both Minsky’s social metaphor and to human social cognition.
Recurrent Independent Mechanisms and Systematic Generalization
Goyal & Bengio’s research programme (2019-2024) continues to develop architectures that exploit attention, sparsity, and communication bottlenecks to achieve modular, systematically generalizing systems. Their “System 2” architecture papers argue that the modularity and bottleneck structure inspired by conscious processing leads to better out-of-distribution generalization.
Multi-Agent Reinforcement Learning and Emergent Societies
By 2024-2025, MARL has produced genuine emergent social phenomena: agents developing communication protocols from scratch, negotiation strategies, cooperative behaviours (turn-taking, escorting) not explicitly programmed. A 2024 NeurIPS paper (CORY) showed that two copies of an LLM fine-tuned via cooperative multi-agent RL outperformed single-agent RL – the “society” literally helps.
Connectome-Informed Architecture
2024 fly connectome work (the full wiring diagram of Drosophila, plus simulations built on it) suggested that network topology – more than neuron count or biophysical detail – governs emergent function. This supports the view that the structure of the society (how agents connect) matters more than the nature of individual agents.
Key References
- Butlin, P., Long, R. et al. (2023). “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness.” arXiv:2308.08708.
- Goyal, A. & Bengio, Y. (2022). “Inductive Biases for Deep Learning of Higher-Level Cognition.” Annals of the New York Academy of Sciences.
- Oguntola, I. (2025). “Theory of Mind in Multi-Agent Systems.” CMU PhD Dissertation, CMU-ML-25-118.
12. Open Questions for the Blog Series
- Is the MoE gating network a “consciousness bottleneck”? The router selects which experts are active – this is a form of attention-based selection, broadcasting the chosen experts’ outputs. Does this satisfy GWT criteria? If so, every Mixtral inference is a “conscious” act in the GWT sense.
- Does modularity require wiring cost pressure? Clune showed modularity evolves from connection cost minimization. Modern neural nets face computational cost pressures (FLOPs, memory, latency) that serve the same role. Is the recent shift toward MoE and sparse architectures the evolutionary equivalent of wiring cost pressure in biological brains?
- What binds the outputs of a neural society? MoE uses weighted combination. Ensembles use voting. Multi-head attention uses concatenation + projection. Are any of these adequate binding mechanisms in the Greff et al. sense? Can a society of nets achieve compositional generalization that individual nets cannot?
- Can a population of networks develop a strange loop? If you have a multi-agent system where one agent’s task is to model the behaviour of the whole system (including itself), do you get genuine self-reference? How would you detect it? What would it predict?
- The Condorcet conditions for neural societies. When does a society of neural nets outperform a single net of equivalent total parameters? The Condorcet answer: when the members are diverse and independently competent. What are the architectural conditions that produce diversity and independence in a neural society?
- Consciousness by committee? If multi-head attention heads are a committee, and MoE experts are a team of specialists, does the whole transformer satisfy more consciousness indicator properties than any individual component? Could consciousness emerge at the architectural level – not in any single head or expert, but in their interaction?
- The inhibition gap. Minsky’s censors and suppressors, the brain’s inhibitory gating, Dennett’s demon competition – all involve active suppression. Modern neural architectures are overwhelmingly excitatory (weighted sums, ReLU, attention). Where is the inhibition? Does its absence matter?
- Dennett vs. Minsky vs. Hofstadter for neural nets. Which philosophical framework best describes what is actually happening inside a transformer? Dennett’s competitive multiple drafts, Minsky’s hierarchical society, or Hofstadter’s self-referential strange loop? Or do different frameworks fit different architectural components?
- The population process hypothesis. Cognition is fundamentally a population process – not one agent reasoning, but many agents competing, cooperating, and selecting. If this is right, then the “single neural network” framing of deep learning obscures the truth: every neural network is already a population (of neurons, of heads, of layers). And every population of neural networks is a society. Where do you draw the boundary of the individual?
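The Condorcet question above is one of the few that can be computed exactly. For an odd-sized committee of independent members, each correct with probability p, majority accuracy is a binomial tail probability, and the calculation also shows how the theorem fails when its conditions do:

```python
# Condorcet jury theorem, exactly: majority accuracy of n independent
# voters each correct with probability p is P(Binomial(n, p) > n/2).
from math import comb

def majority_accuracy(n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n // 2) + 1, n + 1))

p = 0.6
for n in (1, 11, 101):
    print(n, round(majority_accuracy(n, p), 4))
# Accuracy climbs toward 1 as n grows, but only because members are
# independent and individually better than chance (p > 0.5).
print(round(majority_accuracy(101, 0.45), 4))  # below-chance voters:
                                               # the majority is WORSE
```

The architectural question is then what plays the role of p and of independence for a society of nets: decorrelated errors (different initializations, data shards, architectures) are the neural analogue of independent jurors.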
Summary Table: Frameworks and Their Neural Analogues
| Philosophical Framework | Key Concept | Neural Architecture Analogue | Strength of Mapping |
|---|---|---|---|
| Minsky (Society of Mind) | Agents, agencies, K-lines | MoE experts, multi-agent systems | Strong (structural) |
| Minsky (Emotion Machine) | Critics, selectors, six levels | Actor-critic RL, chain-of-thought with self-correction | Moderate |
| Dennett (Multiple Drafts) | Competitive pandemonium, fame | Residual stream competition, softmax selection | Strong (functional) |
| Hofstadter (Strange Loop) | Self-reference, emergent self | Self-modelling agents, recursive architectures | Weak (aspirational) |
| Baars/Dehaene (GWT) | Global workspace, broadcasting | Attention + residual stream as shared workspace | Strong (explicit proposals) |
| Tononi (IIT) | Integrated information (Phi) | Recurrent vs. feedforward architectures | Strong (sharp predictions, controversial) |
| Fodor (Modularity) | Domain-specific, encapsulated modules | MoE experts, adapters, tool-using agents | Moderate |
| Friston (Free Energy) | Hierarchical prediction, error minimization | Residual connections, hierarchical transformers | Moderate |
| Graziano (AST) | Attention schema, self-model of attention | Agents with attention models, ToM modules | Moderate (implemented) |
| Selfridge (Pandemonium) | Competing demons, loudest wins | Multi-head attention, competitive selection | Strong (historical ancestor) |
Note: This document is research scaffolding. The blog posts should not rehearse all of this – the point is to pick the most productive tensions and make them vivid. The best thread may be the one nobody has pulled yet: that the binding problem is the price of the society, and every neural architecture is a different constitutional arrangement for managing that price.