
Baby Steps

Others claim that there are features built into the human mind that have shaped the forms of all language and are crucial to our learning

Oliver Whang Published 13.05.24, 07:23 AM
Brenden Lake and Tammy Kwan with their children Logan and Luna, whose pink hat has a camera that records things from her point of view


We ask a lot of ourselves as babies. Somehow, we must grow from sensory blobs into mobile, rational, attentive communicators in just a few years. There has been much research and heated debate around how babies accomplish this. Some scientists have argued that most of our language acquisition can be explained by associative learning, as we relate sounds to sensibilia, much like dogs associate the sound of a bell with food. Others claim that there are features built into the human mind that have shaped the forms of all language and are crucial to our learning. Still others contend that toddlers build their understanding of new words on top of their understanding of other words.

Tammy Kwan and Brenden Lake delivered blackberries from a bowl into the mouth of their one-year-old daughter. Luna was dressed in pink leggings and a pink tutu, with a silicone bib around her neck and a soft pink hat on her head. A lightweight GoPro-type camera was attached to the front.


For an hour each week over the past 11 months, Lake, a psychologist at New York University, US, whose research focuses on human and artificial intelligence, has been attaching a camera to Luna and recording things from her point of view as she plays. His goal is to use the videos to train a language model using the same sensory input that a toddler is exposed to — a LunaBot, so to speak. By doing so, he hopes to create better tools for understanding both AI and ourselves.

There are many roadblocks to using AI models to understand the human mind. The two are starkly different, after all. Modern language and multimodal models — such as OpenAI’s GPT-4 and Google’s Gemini — are assembled on neural networks with little built-in structure and have improved mostly as a result of increased computing power and larger training data sets. Meta’s most recent large language model, Llama 3, is trained on more than 10 trillion words; an average five-year-old is exposed to more like 3,00,000.

Such models can analyse pixels in images but are unable to taste cheese or berries or feel hunger, important kinds of learning experiences for children. Researchers can try their best to turn a child’s full sensory stream into code, but crucial aspects of their phenomenology will inevitably be missed. “What we’re seeing is only the residue of an active learner,” said Michael Frank, a psychologist at Stanford University, US, who for years has been trying to capture the human experience on camera. His lab is working with more than 25 children around the country, including Luna, to record their experiences at home and in social settings.

Humans are also not mere data receptacles, as neural nets are, but intentional animals. Everything we see, every object we touch, every word we hear couples with the beliefs and desires we have in the moment. “There is a deep relationship between what you’re trying to learn and the data that come in,” said Linda Smith, a psychologist at Indiana University, US. “These models just predict. They take whatever is put into them and make the next best step.” While you might be able to emulate human intentionality by structuring training data — something Smith’s lab has been attempting to do recently — the most competent AI models, and the companies that make them, have long been geared toward efficiently processing more data, not making more sense out of less.

There is also a more conceptual issue, which stems from the fact that the abilities of AI systems can seem quite human, even though they arise in nonhuman ways. Recently, dubious claims of consciousness, general intelligence and sentience have emerged from industry labs at Google and Microsoft after the release of new models. In March, Claude 3, the newest model from an AI research startup called Anthropic, stirred up debate when, after analysing a random sentence about pizza toppings hidden in a long list of unrelated documents, it expressed the suspicion that it was being tested. Such reports often smell like marketing ploys rather than objective scientific projects, but they highlight our eagerness to attribute scientific meaning to AI.

In February, Lake and his collaborators created the first AI model trained on the experiences of a child, using videos captured in Frank’s lab more than a decade ago. The model was published in the journal Science and, based on 60 hours of footage, was able to match different moments with words. Type in “sand” and the model will recall the moment, 11 years ago, when the boy whose experiences the model was trained on visited the beach with his mother. Type in “car” and the model brings up a first-person video of the boy sitting in his booster seat.

The training videos are old and grainy, and the data are fairly sparse, but the model’s ability to form some kind of conceptual mapping of the world suggests that it might be possible for language to be picked up mostly through association. “We had one reviewer on the paper who said, ‘Before I read this, I would’ve thought this was impossible,’” said Wai Keen Vong, a researcher at NYU, US, who helped lead the work.

For Lake, as well as for other investigators like him, these interlocking questions — How humanlike can we make AI? What makes us human? — present the most exciting research on the horizon.

NYTNS
