On not being a language model

I was listening to Dwarkesh Patel’s interview with Andrej Karpathy about how language models actually get trained, and Karpathy said something offhand that wouldn’t let me go. I asked my agent to remind me to dig into it later. By the time I got to my desk the reminder was already there waiting.

That was a few months ago. I have been pulling on the thread ever since, and it has gone in stranger directions than I expected.

The question that caught me was simple enough. How does a language model develop its personality? When you ask GPT something and it sounds like a person, what is actually producing that? And the follow-on question I couldn’t shake: whatever that thing is, is it the same thing happening when I, a person, decide what to say to someone?

The thread started in machine learning and ended in cell biology and neuroscience, which was not a route I planned. The conclusion I reached is that we are not much like language models after all. What we have and they don’t is the lived experience of contradiction.

The shape of how a model gets built

What Karpathy was describing on the podcast wouldn’t surprise anyone who has been following AI for a while, but the way he laid it out made the shape obvious. Modern language models aren’t single artifacts. They are stacks: a four-stage process where each stage shapes what the next stage can be.

The first stage is pretraining. You take a model that is pure noise, mathematical parameters initialized to random values, and you feed it a large slice of the internet. Books, Wikipedia, code repositories, message boards, transcripts. For months on thousands of GPUs the model plays one game: given some text, predict the next token, which is roughly the next word or piece of a word. Nobody tells it what is true or good. It just learns the shape of human expression. Grammar, facts, style, the way people argue with each other online.
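The prediction game is easier to see at toy scale. The sketch below is only a stand-in: it counts which word follows which in one sentence, where a real model adjusts billions of parameters over tokens. The function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical one-line "internet"; "cat" follows "the" twice, "mat" once.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

Nothing in the counting step says what is true or good. The only signal is what tends to come next, which is the point of the stage.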

What comes out of this is what’s called a base model. It can complete sentences. It can hold a conversation in form. But it has no particular orientation. Karpathy puts it plainly: “base models are not assistants. they just want to complete internet documents.”

The second stage is supervised fine-tuning. Humans get involved. Contractors are hired to write thousands of examples of what a good assistant response looks like. The model is trained on these examples. It starts to learn manners. It learns that when someone asks a question, you answer the question instead of continuing it like a forum post would.

The third stage is RLHF, reinforcement learning from human feedback. This is the part Karpathy has had the most interesting things to say about lately. Humans look at pairs of model outputs and pick which one they prefer. A second model, the reward model, learns to predict these preferences. Then the language model is trained against this reward model. Outputs that score higher get reinforced. Outputs that score lower fade. Karpathy has called RLHF “just barely RL,” meaning it is a thin approximation of real reinforcement learning. He has also described reinforcement learning of this kind as “sucking supervision bits through a straw.” A single score at the end of a response somehow has to be credited back across hundreds of token choices to figure out which ones earned it. It works, but barely.
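A small sketch shows how a reward model can learn from "which one do you prefer" and why the signal is so thin. It assumes the Bradley-Terry formulation, a common choice for pairwise preferences; nothing above says which loss any particular lab uses. The function names are hypothetical.

```python
import math

def preference_probability(score_chosen, score_rejected):
    """Bradley-Terry: modeled chance the labeler prefers the first output."""
    return 1.0 / (1.0 + math.exp(-(score_chosen - score_rejected)))

def preference_loss(score_chosen, score_rejected):
    """Negative log-likelihood of the choice the labeler actually made."""
    return -math.log(preference_probability(score_chosen, score_rejected))

def spread_reward(tokens, reward):
    """Assumed simplest case: one scalar for the whole response,
    so every token receives the same credit."""
    return [(token, reward) for token in tokens]

# The loss is small when the reward model agrees with the labeler, large when it disagrees.
print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))
print(spread_reward(["It", "works", "barely"], 1.0))
```

The second function is the straw. In this simplest setup, the token that made the answer good and the token that was merely along for the ride get exactly the same number.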

The fourth stage is inference. The model is done training and sits on a server. A user sends a message. Behind the scenes a system prompt has already told the model who it is in this conversation. The user’s question arrives. The model generates a response one token (roughly a word) at a time, and every token is shaped by what it learned in stages one through three, plus the prompt context. All of that collapses into the response you see.
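The loop itself is short. In the sketch below the "model" is a hypothetical frozen lookup table keyed on the last token, a stand-in for a trained network that would actually read the whole context. The names `build_context`, `generate`, and `TOY_TABLE` are invented for illustration.

```python
# Stand-in for trained weights: fixed before the conversation starts.
TOY_TABLE = {"hello": "hi", "hi": "there", "there": "<end>"}

def toy_next_token(tokens):
    """Pick the next token from the frozen table, looking only at the last token."""
    return TOY_TABLE.get(tokens[-1], "<end>")

def build_context(system_prompt, user_message):
    """The system prompt comes first, then the user's message."""
    return system_prompt.split() + user_message.split()

def generate(next_token, context, max_tokens=20, stop="<end>"):
    """Append one token at a time; each choice sees everything before it."""
    output = []
    for _ in range(max_tokens):
        token = next_token(context + output)
        if token == stop:
            break
        output.append(token)
    return output

context = build_context("You are a terse assistant.", "hello")
print(generate(toy_next_token, context))
```

Running `generate` twice on the same context gives the same answer, and the table is untouched afterward. Nothing about the conversation is written back into the model.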

So that is the model. Four layers of pressure applied in sequence. Capacity, then shaping, then alignment, then situation, then output. Hold this shape in your head. It comes back.

[Figure: a four-tier cascade of narrowing bands funneling down to a single point. Four layers of pressure, each narrowing what the next can be, collapsing into a single response.]

Same shape, different substrate

What was nagging me wasn’t the mechanics. The mechanics are well documented now. What was nagging me was that this looked like the shape of how a person becomes a person.

The base model is what you arrive with, the species-level inheritance, the things you don’t choose. Fine-tuning is the upbringing, the slow shaping by parents and culture into someone with manners and reflexes. RLHF is the moral formation, the part where what others reward and punish becomes what you reward and punish in yourself. Inference is the moment you are actually in, the situation calling for a response.

That was the part I couldn’t drop. Same shape, different substrate. If a person and a model develop through structurally similar processes, then either the analogy is too loose to mean anything, or there is something real underneath it. I wanted to know which.

So I pulled the Darwin thread. Most people remember him for survival of the fittest, but The Descent of Man (1871) is where he tried to figure out where morality comes from. Why social animals care about each other, why we feel guilty, where a conscience originates if no god installs it.

His answer sounds wrong until you sit with it: any social animal, given enough intelligence, develops a moral sense by default. Not a separate thing that gets selected for, but what necessarily emerges when you combine social instincts with the cognitive horsepower to imagine consequences and remember what others did. And it is stable at the level of the group, not the individual. Communities with more sympathetic members outcompete the ones without.

Which means morality has the same architecture as the training stack: a base capacity, then environmental shaping, then internalized alignment, then the situation where you finally have to decide. Two completely different systems, the same shape.

Coincidence, or something deeper? I went looking for the mechanism, for whether something in the actual biology made this kind of layered development necessary.

Going smaller, into the cell

The biology turned out to be more interesting than I expected.

Most people, if you ask them, will tell you that genes are a program. You inherit a set of instructions, the instructions execute, and you become whatever you become. This is the picture I had in my head for most of my life.

It’s wrong.

DNA on its own does nothing. It is a library. The instructions in your genes only do something if they get read. The reading is done by an enzyme, RNA polymerase, which copies a section of DNA into messenger RNA. That copy is carried to the ribosomes, the protein factories of your cells. That is how genetic instructions become physical body. But here is the part I didn’t know: in any given cell, at any given moment, most of your genes are silent. Which ones are active depends on a layer of chemical switches sitting on top of the DNA without changing the DNA itself.

This is called epigenetics. The switches are mostly two kinds. The first is DNA methylation, where small methyl groups attach to specific spots on the DNA and effectively silence the nearby gene, like taping over the words in a book. The second is histone acetylation. DNA isn’t a free-floating ribbon, it is wrapped around protein spools called histones, and when acetyl groups attach to these spools the wrapping loosens and the genes underneath become readable. When the acetyl groups are removed, the wrapping tightens and the genes go quiet.
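A cartoon in code makes the "switches on top of the code" idea concrete. It assumes a deliberately crude rule, that a gene is readable only when its histones are acetylated and its DNA is not methylated. Real regulation is graded and far messier. The gene names and sequences are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    sequence: str  # the code itself; never modified below

@dataclass
class Marks:
    methylated: bool = False
    acetylated: bool = True

def readable(marks):
    """Crude assumed rule: loosened spool and no methyl tape over the words."""
    return marks.acetylated and not marks.methylated

def expressed(genome, marks_by_gene):
    """Names of the genes whose current marks leave them readable."""
    return [g.name for g in genome if readable(marks_by_gene[g.name])]

# One genome, two hypothetical cells with different marks.
genome = (Gene("gene_a", "ATGC"), Gene("gene_b", "GGCA"))
cell_one = {"gene_a": Marks(methylated=True), "gene_b": Marks()}
cell_two = {"gene_a": Marks(), "gene_b": Marks(acetylated=False)}

print(expressed(genome, cell_one))
print(expressed(genome, cell_two))
```

The `genome` tuple is identical in both calls. Only the marks differ, and so does what gets read.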

These switches respond to the world. To stress, to trauma, to diet, to sleep, to whether you feel safe. There is research, strongest in animal studies and suggestive in humans, linking adverse early experiences to changed methylation patterns on stress response genes in the brain, changes that can persist into adulthood. Some changes may even be passed to the next generation, though the evidence for that in humans is contested. The science is still being worked out, but the basic picture is that you are not running a fixed program. You are a responsive system that constantly rewrites which parts of its own code are active, based on what is happening to you.

[Figure: DNA with methyl tags on the left, DNA wound around histone spools on the right. Methylation silences a gene; histone acetylation loosens the spool and lets it be read. Switches sitting on top of the code, not in it.]

I was talking through this with my agent when something clicked. I said something like, this is the seed thing. A seed has all the genetic instructions for an entire plant inside it. You can throw it on concrete and nothing happens. You can put it in dry dirt and nothing happens. Put it in damp soil with some warmth and light and it grows. The seed already had everything. What the environment provided wasn’t information. It was the conditions to express the information that was already there.

Same code. Different conditions. Different plant.

That is what is happening in you right now, at the cellular level, in response to what you are paying attention to and what you ate this morning and whether you slept and whether you feel safe. The genome is the library. The environment is the librarian. The you that exists in this moment is whichever books are currently checked out.

So now I had a mechanism. The pattern wasn’t a coincidence. Layered, environmentally responsive development is how biology actually builds organisms at the molecular level. The LLM training stack is following the same logic, just in silicon. The seed becomes the plant, the model becomes the assistant, the social animal becomes the moral being, all for the same underlying reason.

But the more I sat with this, the more it felt like I was missing something. The picture said humans and language models were structurally similar. And that didn’t match my actual experience of being a person. There was something the analogy was leaving out. So I kept going.

What I think the actual difference is

The model and the human share a developmental architecture. They both go through layered, sequential pressures, and the pattern is real enough that I stopped trying to talk myself out of it.

But the model is optimized toward a reward function. The training process pushes it in one coherent direction. Internally, the model doesn’t experience contradiction. Its objectives, by the time training is done, have been collapsed into a single unified policy. It knows what to do in any given situation, given how it was trained.

A human life is not like that. It is full of contradictions that never resolve.

You want to be honest and you want to protect the people you love. You want to pursue something ambitious and you want to be present for your kid. You want pleasure now and you want health later. Your body wants rest and sugar and the comfort of the familiar, and the part of you that wants meaning wants something else entirely. These contradictions don’t go away, and they aren’t problems to be solved. They are the conditions of being a person.

This is what I kept circling back to. Character, the kind that makes someone specifically themselves rather than a generic instance of the species, forms inside contradiction, not in its absence. When you are forced to choose between two things you want and can’t have both of, the choice doesn’t just produce a behavior, it becomes part of you. Cortisol floods your bloodstream, your epigenome shifts, your emotional learning updates. The next time you face something similar you are a slightly different person, because the contradiction reshaped you on the way through.

[Figure: a figure at the center of concentric rings, ghosted at different ages, arrows flowing both ways. Genetic, epigenetic, emotional, situational. The layers feed back on each other, and the person at the center keeps changing.]

The model doesn’t have this. Its weights are frozen after training. Every conversation starts from the same state and ends without leaving a trace on the model itself. It doesn’t develop, it just runs.

You develop every day, whether you want to or not. The conversation you had this morning rewrote you a little. So did the choice you made on the way to work, and the thing you almost said but didn’t. You are not the person you were when you woke up, and you won’t be the same one when you go to sleep. What drives this isn’t optimization. It is the friction of being unable to want only one thing.

The seed knows what it is becoming. The model knows what it was trained to do. We don’t, and that is not a gap waiting to be closed.

What I’m still chasing

There’s an obvious rabbit hole here, so I’ll name it before someone else does. The frozen-weights distinction is a fact about today’s models, not a law of nature. People are already building systems that learn continuously and carry memory across conversations. If a model could rewrite itself the way your epigenome does, does the gap close?

I don’t think it closes all the way. You can build a model that updates. What I can’t picture being engineered in is the contradiction: wanting two things that can’t both be had, the body voting against the mind. A system you build to optimize will resolve its conflicts, because resolving conflict is what optimizing is. We don’t resolve ours. We carry them. Whether that carrying can be built, or whether it has to be lived, is what I’m chasing next.

I started this trying to figure out how a language model develops a personality. I ended up understanding something about my own.
