Creativity Isn't Probable
Aug 15, 2025
A few years ago, while scrolling through Instagram, I stumbled across a song that stopped me in my tracks. The first lyric is:
Once upon a time, I was a baker.
Pause for a second and think about how weird that sentence is. This person was a baker? What does that mean?
With LLM weights, we can quantify how unlikely the sentence is. “Once upon a time” is a common way to start. And then, “I was a” is a pretty logical way to follow that. For the first 7 words, each is among the top 3 most likely options.
Then we take a big turn. “Baker” is not a likely completion: it’s 445th on the list, with a probability of 0.037%. After a string of seven predictable words, each in the top 3, we jump all the way down to option 445.

The chances of sampling this completion at inference time are low: about 1 in 2,700, assuming a temperature of 1.0. If it happened, we'd call it a hallucination and move on.
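If you want to reproduce this kind of measurement yourself, here's a minimal sketch using the Hugging Face transformers library, with GPT-2 as a stand-in open-weight model. The post doesn't name the model behind the 445th-place figure, so your exact rank and probability will differ:

```python
# A sketch of the measurement above, using GPT-2 as a stand-in
# open-weight model. Exact numbers will vary by model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Once upon a time, I was a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)

# Use the first sub-token of " baker" (BPE may split the word further).
baker_id = tokenizer.encode(" baker")[0]
rank = int((probs > probs[baker_id]).sum()) + 1
p = probs[baker_id].item()
print(f"rank: {rank}, probability: {p:.3%}, odds: 1 in {1/p:,.0f}")
```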
But the artist chose it, so our model must be missing something! Let’s see where we go from here:
Once upon a time I was a baker
And everybody was impressed
But I didn’t need approval because
I already knew I was the best
Every dish was a masterpiece
It all tasted like heaven
But then unfortunately…
I turned 7.
Woah. This song isn’t about a baker who has amnesia or lost their inspiration or something. It’s about growing up.
Despite the unlikely start, there's universality in the resolution. Everyone is a “baker” in early childhood. You help your mom make a cake. Your parents and other adults praise your incredible baking abilities, and — being 5 — you believe them. But when you get older, you realize you aren't much help at all! There’s quite a lot to baking and, really, your mom is doing most of the work.
The experience repeats throughout life (and indeed, the artist goes on to more examples in the next verse: "once upon a time I was a painter...", etc.). Maybe you were the best chess player in your town, but lost in a statewide tournament. Maybe you were the best athlete on the rec team, but got cut from a select league. Maybe you were valedictorian in high school, but didn't get straight As at an elite university. Whatever the case, you've been through something similar.
It's a brilliant lyric. The initial idea seems random, unlikely. But when you play it out, you find something universal.
19o4
Imagine you go back in time to 1904. It's the year before Einstein's annus mirabilis. You train a model on all the published text up to that point. In the spirit of OpenAI's recent releases (o3, o4-mini, etc.), I'll call our model 19o4.
What would happen if you gave 19o4 the following prompt:
You are on a train moving at 99% of the speed of light and measure the speed of a beam of light moving parallel to the train. How fast will that beam of light be moving?
You'd get an answer consistent with classical mechanics: under Galilean velocity addition, a beam moving in your direction of travel would appear to crawl along at just 1% of the speed of light. The existing corpus of published text would make that answer most probable. There would be lots of textbooks and articles explaining classical mechanics, but none yet on special relativity.
So what about the special relativity completion? Would this be in the weights at all?
It would, because there would be training text to support that completion. Experimental evidence was pointing in this direction (the Michelson-Morley experiment had failed to detect the ether), and Maxwell's equations already treated the speed of light as a constant. The relativistic answer would be far less probable than the classical one, but it would be there, hidden in the lower-probability space.
In other words, it'd be a lot like “once upon a time I was a baker”.
It would seem random, unlikely. But if you played it out, you'd find something universal.
Recognition vs. Generation
There's a famous football play buried deep in the NFL archives. Legendary coach Bill Walsh describes it in "The Score Takes Care of Itself":
Early in the second half of our game with the Oakland Raiders, Bengals tight end Bob Trumpy (later a well-known announcer) came out of the huddle and lined up on the wrong end of the line of scrimmage—the left instead of the right side, as the play called for.
Trumpy recognized his mistake almost immediately and tried to correct it by sliding over to the right side before the ball snap. The Raiders were utilizing a complex pass defense at the time, so when they saw Bob shifting from one end of the line of scrimmage to the other—legal, but not done—all hell broke loose.
Oakland defensive backs began frantically flapping their arms and screaming, running around and creating havoc as they tried to react to the bizarre movement of Cincinnati’s wandering tight end. Three of them actually collided in the middle of the field. The whole scene was kind of funny, although nobody was laughing on either bench. We lost yardage on the play, and when Bob trotted to the sidelines with a sheepish look on his face, he muttered to head coach Paul Brown, “Sorry, Coach. It won’t happen again.” He was wrong.
When we got back to Cincinnati and the assistant coaches looked at the Oakland game film, Bill Johnson, the offensive line coordinator, ran Trumpy’s play over and over for us on the projector. At first the room was filled with laughter as we saw the mayhem Trumpy’s mistake had precipitated. One man, however, wasn’t laughing—Bill Johnson. He was thinking.
Finally, he stopped rerunning the play, turned to us, and asked, “Fellas, what would happen if we put Trumpy in motion intentionally and worked plays off it?”
There was silence in the room; everyone sitting in the darkness recognized the interesting possibilities this might offer. In fact, I was awake most of that night thinking up ideas that would let us capitalize on Bill Johnson’s revelation, his crazy idea of how to turn a lemon into lemonade, an accident into an asset.
Putting the tight end in motion caught on quickly around the NFL because it created new problems for the defense...
I've shared this passage with some friends. The response is generally "wow, I can't believe no one had thought of that yet!" At first glance, it is surprising. As best I can tell, the move had been legal for the entire history of football. Teddy Roosevelt's famous 1906 rules conference, which created "modern football", established that only one player could be in motion before the snap. It took 70 years under the new rules before a coach saw the potential of using the tight end as that player. Even then, the epiphany didn't come through thoughtful planning or careful consideration. They only noticed after it happened by accident!
This illustrates a frustrating truth of creativity: breakthroughs often come from mistakes. Penicillin, Coca-Cola, GLP-1 agonists, potato chips: all mistakes that worked.
And there's another truth under that. Recognition is easier than generation. Play me a beautiful tune and I can tell you it's great. But put me in front of a piano and nothing comes out. Maybe that's what makes originality so valuable. Although very few people produce it, everyone recognizes it.
The Originality Algorithm
So what's the point? Well, a common knock on LLMs is that they never produce anything creative; they only repeat what they've encountered in the past. But maybe that's not a fundamental limitation. Maybe it's because we are only asking them for the probable completions.
Fortunately, "Once upon a time I was a baker" and 19o4 show us that there are great ideas lying dormant in the LLM weights. The problem is, we can't get them with a probabilistic walk. Not efficiently, at least. If we want originality, we'll need a different algorithm.
I suspect this will manifest in the following way. Big labs with lots of compute will ask LLMs to solve hard problems in verifiable domains, like math, physics, and computing. They'll run the programs in a continual loop with instructions not to retry ideas that didn't work. Naturally, the programs will drift lower and lower into the probability space. They'll start with "once upon a time there was a princess", but end up at "once upon a time I was a baker". If the verification works, they'll find many brilliant, original ideas. We'll get many breakthroughs with this approach in the next 5 years.
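Nobody outside the labs knows exactly what that loop will look like. Purely as a sketch, with `generate_idea` and `verify` as hypothetical stand-ins, it might be shaped something like this:

```python
# A speculative sketch of the generate-and-verify loop described above.
# `generate_idea` and `verify` are hypothetical stand-ins: in practice,
# an LLM sampling at high temperature and a domain-specific checker
# (a proof assistant, a test suite, a physics simulator).

def originality_search(generate_idea, verify, budget=1_000_000):
    tried = set()        # ideas already attempted; the loop never retries them
    discoveries = []     # ideas that survived verification

    for _ in range(budget):
        idea = generate_idea(avoid=tried)  # steer sampling away from past
        if idea in tried:                  # failures, pushing it deeper into
            continue                       # low-probability space
        tried.add(idea)
        if verify(idea):
            discoveries.append(idea)
    return discoveries
```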
But you don't need unlimited compute. Open-weight LLMs give you everything you need to seek out new, original ideas. Try the following:
Pick a hard, unsolved problem in a domain you know well. (e.g., "write a song")
Construct a prompt that gets at the core of the problem. (e.g., "write a song starting with 'once upon a time I was a __'")
Load your prompt into an LLM and look through the most probable next tokens. Skip the first few; those won't be original. Take the others seriously. What do you find? (See the sketch below.)
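Here's a minimal sketch of that last step, again using GPT-2 as the stand-in model. The `skip` and `take` values are arbitrary knobs, not anything prescribed:

```python
# Browse the next-token distribution past its predictable head.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Once upon a time I was a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

skip, take = 10, 40  # skip the obvious candidates, then browse the next 40
top = torch.topk(probs, skip + take)
for i in range(skip, skip + take):
    token = tokenizer.decode([int(top.indices[i])])
    print(f"{i + 1:4d}. {token!r:<16} {top.values[i].item():.4%}")
```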
I did this for a problem I care about. My prompt was: "the next big consumer tech company will be an AI product that helps people __". I'm building an autocomplete app, so my answer is "type". That was 353rd on the list, with a probability of 0.025%. That's great. In Zero to One, Peter Thiel writes that all great startups are built around a secret. A rank of 353 suggests my idea remains a fairly good secret, at least as of the training cutoff date. Unfortunately, I suspect my answer will be higher on the list with newer models. AI coding tools have made the benefits of AI-assisted typing obvious to anyone who's paying attention. I don't have much time before my secret becomes commonplace. It may already be too late.
There are no guarantees of success. Think of that room of assistant coaches, watching Bob Trumpy's mistake over and over again. Everyone in that room had dedicated their lives to football and risen to the top of their field. Only one, Bill Johnson, recognized the potential; even Walsh, considered one of the most creative coaches in history, missed it. But hearing an idea can be really helpful. When Bill pointed out the opportunity, everyone immediately understood it. Recognition is easier than generation.
More importantly, it's much preferable to the alternatives. With open weights, we don't need to wait for divine inspiration or for fortuitous mistakes, which can take decades. And we don't need to alter our biology to find creative new ideas, as many artists do. Burning Man attendees will disagree, but I prefer machine hallucinations to biological ones.