In 1889, a French doctor named Francois-Gilbert Viault climbed down from a mountain in the Andes, drew blood from his arm and inspected it under a microscope. Viault’s red blood cells, which ferry oxygen, had surged 42 per cent. He had discovered a mysterious power of the human body: when it needs more of these crucial cells, it can make them on demand.
In the early 1900s, scientists theorised that a hormone was the cause. They called the theoretical hormone erythropoietin, or “red maker” in Greek. Seven decades later, researchers found actual erythropoietin after filtering 670 gallons of urine.
And about 50 years after that, biologists in Israel announced they had found a rare kidney cell that makes the hormone when oxygen drops too low. It’s called the Norn cell, named after the Norse deities who were believed to control human fate.
It took humans 134 years to discover Norn cells. Last summer, computers in California discovered them on their own in just six weeks.
The discovery came about when researchers at Stanford University, US, programmed the computers to teach themselves biology. The computers ran an artificial intelligence program similar to ChatGPT, the popular bot that became fluent with language after training on billions of pieces of text from the Internet. But the Stanford researchers trained their computers on raw data about millions of real cells and their chemical and genetic makeup.
The researchers did not tell the computers what these measurements meant. They did not explain that different kinds of cells have different biochemical profiles. They did not define which cells catch light in our eyes, for example, or which ones make antibodies.
The computers crunched the data on their own, creating a model of all the cells based on their similarity to each other in a vast, multidimensional space. When the machines were done, they had learned an astonishing amount. They could classify a cell they had never seen before as one of more than 1,000 different types. One of those was the Norn cell.
“That’s remarkable, because nobody ever told the model that a Norn cell exists in the kidney,” said Jure Leskovec, a computer scientist at Stanford who trained the computers.
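For readers who want a feel for what classifying a cell by position in a multidimensional space can mean, here is a minimal sketch. It is not the Stanford team's method. It assumes each cell has already been reduced to a short list of numbers, and that a reference set has been annotated with cell types. The three-number vectors, the labels and the function names are all invented for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(cell, labelled_cells):
    """Return the label whose centroid lies closest to `cell` (Euclidean distance)."""
    centroids = {label: centroid(vs) for label, vs in labelled_cells.items()}
    return min(centroids, key=lambda label: math.dist(cell, centroids[label]))

# Toy 3-dimensional "embeddings": invented numbers, not real measurements.
known = {
    "photoreceptor": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "B cell": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
    "Norn cell": [[0.1, 0.1, 0.9], [0.2, 0.0, 0.8]],
}

unseen = [0.15, 0.05, 0.85]  # a cell that is not in the reference set
predicted = classify(unseen, known)
print(predicted)
```

A real model works with far more dimensions and many more cells, but the principle in this sketch is the same: a new cell is assigned to whichever known group it sits nearest.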
The software is one of several new AI-powered programs, known as foundation models, that are setting their sights on the fundamentals of biology. As the models scale up, with ever more laboratory data and computing power, scientists predict that they will start making more profound discoveries.
The Stanford team got into the foundation-model business after helping to build one of the biggest databases of cells in the world, known as CellXGene. Beginning in August, the researchers trained their computers on the 33 million cells in the database, focussing on a type of genetic information called messenger RNA. They also fed the model the 3D structures of proteins, which are the products of genes.
From this data, the model — known as Universal Cell Embedding or UCE — calculated the similarity among cells, grouping them into more than 1,000 clusters according to how they used their genes. The clusters corresponded to types of cells discovered by generations of biologists.
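Grouping cells by how similarly they use their genes can be sketched in a few lines. This is an illustrative toy, not UCE's algorithm, which the article does not describe in detail. It assumes each cell is a list of gene-activity counts, measures similarity by the cosine of the angle between two lists, and puts any two cells above a chosen threshold in the same cluster. The counts, the threshold and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster(profiles, threshold=0.9):
    """Group profile indices; pairs with similarity >= threshold share a cluster."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the cluster's representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if cosine_similarity(profiles[i], profiles[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Invented activity counts for three genes in five cells.
profiles = [
    [5, 0, 1],
    [4, 1, 0],
    [0, 6, 1],
    [1, 5, 0],
    [0, 1, 7],
]
clusters = cluster(profiles)
print(clusters)
```

In this toy, the first two cells lean on the first gene, the next two on the second, and the last on the third, so they fall into three groups without anyone naming the groups in advance.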
UCE also taught itself some important things about how the cells develop from a single fertilised egg. For example, UCE recognised that all the cells in the body can be grouped according to which of three layers they came from in the early embryo.
“It essentially rediscovered developmental biology,” said Stephen Quake, a biophysicist at Stanford who helped develop UCE.
The model was also able to transfer its knowledge to new species. Presented with the genetic profile of cells from an animal that it had never seen before — a naked mole rat, say — UCE could identify many of its cell types.
“You can bring a completely new organism — chicken, frog, fish, whatever — you can put it in, and you will get something useful out,” Leskovec said.
After UCE discovered the Norn cells, Leskovec and his colleagues looked in the CellXGene database to see where they had come from. While many of the cells had been taken from kidneys, some had come from lungs or other organs. It was possible, the researchers speculated, that previously unknown Norn cells were scattered across the body.
Dr Katalin Susztak, a physician-scientist at the University of Pennsylvania, US, who studies Norn cells, said that the finding whetted her curiosity. “I want to check these cells,” she said. She is sceptical that the model found true Norn cells outside the kidneys, since the erythropoietin hormone hasn’t been found in other places. But the new cells may sense oxygen as Norn cells do. In other words, UCE may have discovered a new type of cell before biologists did.
Leskovec said that the models were improving as scientists trained them on more data. Scientists are also developing tools that let foundation models combine what they’re learning on their own with what flesh-and-blood biologists have already discovered. The idea would be to connect the findings in thousands of published scientific papers to the databases of cell measurements.
With enough data and computing power, the scientists say, they may eventually create a complete mathematical representation of a cell.
Quake suspects that foundation models will learn not just about the kinds of cells that currently reside in our bodies but also about the kinds of cells that could exist.
“I think these models are going to help us get some really fundamental understanding of the cell, which is going to provide some insight into what life really is,” Quake said.
NYTNS