For less-talented portrait artists, drawing hands and feet has been a problem for centuries. Most artists get it right; some don't. But AI can't. We are talking about a technology that promises to change the world yet somehow cannot get a seemingly simple thing right.
Be it Midjourney, Stable Diffusion or DALL-E, all of them can create realistic landscapes and deliver perfect facial expressions. Last year, an AI-generated image even won first prize for digital art at the Colorado State Fair. But hands, those touching hands reaching out, look weird; the fingers look mutated. Even a mediocre artist can do a better job. The ineptitude of generative AI models at drawing hands and fingers has become a joke. At the moment, AI-generated fingers look like freaky appendages.
Why can’t AI art create fingers?
To an above-average artist, drawing fingers should come easy because we understand how fingers function and we are good at recognising patterns. What AI engines do instead is learn about objects from photographs.
Give AI software a picture of an apple, then pictures of more apples, and it finds similarities until the algorithm is able to point out an apple on its own. AI can learn from images in a museum or on the Net. It is not only images; AI models are also fed descriptions of objects. And an apple is a forgiving subject: rotate it and it remains roughly the same.
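Here is a minimal sketch of that learn-from-many-photos idea: fine-tuning a small image classifier until it can point out an apple on its own. The folder layout, model choice and training settings are illustrative assumptions, not how any particular image generator is actually built.

```python
# A minimal sketch of the "many labelled photos" idea: fine-tune a small
# image classifier so it can point out an apple on its own.
# Assumes a hypothetical folder layout like data/train/apple/*.jpg and
# data/train/not_apple/*.jpg; paths and hyperparameters are illustrative.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Each subfolder name ("apple", "not_apple") becomes a class label.
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Start from a general-purpose network and replace its final layer
# so it predicts only our labels.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few passes over the example photos
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

The machine never learns what an apple is for; it only learns which patterns of pixels tend to come with the label "apple".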
For artists, the process is different. To draw something as complicated as fingers, we begin with a simple sketch and then work on the thickness, the texture and the mechanics of how fingers move. We know when three fingers are held in a way that hides the other two. When we hold a stick, our fingers clasp it differently. The problem with AI is that it knows how fingers look but not how they work. AI algorithms are trapped inside museums and on the Net. Asking AI to show a man holding a phone and then a woman holding an ice cream cone can throw up hilarious results. There are many variations involved, from finger length and width to metacarpals and wrist joints.
A couple holding hands created using DALL-E
An illustration created on DALL-E
Take the example of an archaeologist translating Egyptian hieroglyphs from the Rosetta Stone. We can study what is there and then deduce meaning from scant evidence. A machine can't; it can only extrapolate from the patterns it has been fed, and if the input is flawed, the output will be flawed. Recently, linguists Noam Chomsky and Ian Roberts, with Jeffrey Watumull, a director of artificial intelligence at a science and technology company, wrote in a New York Times op-ed piece: "The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question. On the contrary, the human mind is a surprisingly efficient and even elegant system that operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations."
Hands look different from different angles. Then there are two more problems. First, while taking photos we don't usually focus on our hands or fingers, so AI systems have limited data to work with. If there are 90,000 pictures of faces, you may find only 1,000 that deal with hands, and even in those the clarity is often compromised. Second, when giving an AI engine a description of what needs to be drawn, we are not specific. At the moment, AI systems simply look for images that get as close to reality as possible.
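To see what "being specific" looks like in practice, here is a hedged sketch using the open-source diffusers library to drive a Stable Diffusion model. The model name, prompts and negative prompt are illustrative assumptions; a more detailed prompt narrows what the model searches for, but it does not guarantee correct anatomy.

```python
# A sketch of how prompt specificity interacts with image generation,
# using Hugging Face's diffusers library. Model name and prompts are
# illustrative; this assumes a machine with a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

vague = "a person holding an umbrella"
specific = (
    "close-up of a right hand gripping a wooden umbrella handle, "
    "five fingers clearly visible, natural anatomy, photo-realistic"
)

# negative_prompt lists what we do not want the model to produce.
for name, prompt in {"vague": vague, "specific": specific}.items():
    image = pipe(
        prompt,
        negative_prompt="extra fingers, fused fingers, deformed hands",
    ).images[0]
    image.save(f"umbrella_hand_{name}.png")
```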
You may say we look at pictures only at a glance. Yes, we do. If a shoulder is rendered slightly incorrectly, we won't notice it, because the error is spread over a larger surface. But if the ring finger turns out smaller than the thumb, we will notice it. If the hand holding an umbrella has no wrist, we will notice it.
Can things improve?
Last year, musician Patti Smith wrote in The New Yorker: “The hand is one of the oldest of icons, a direct correspondence between imagination and execution. Healing energy is channelled through our hands. We extend a hand in greeting and service; we raise a hand as a pledge.”
Yet there are not enough pictures for AI machines to learn from. With ChatGPT being integrated into many IT services, especially Microsoft's, the chatbot is getting cleverer. But the same is not happening with hands. Be it DALL-E or Midjourney, they have to involve humans to learn more about hands. If we are shown 10 pictures of AI-generated hands, we can tell the machine which ones are close to the real thing.
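A minimal sketch of that human-in-the-loop idea might look like the following: show a reviewer a batch of AI-generated hand images and record which ones look closest to the real thing. The directory, file names and storage format are hypothetical; real systems feed such preference data back into training.

```python
# Hypothetical sketch: collect yes/no judgements on AI-generated hand images
# so the feedback can later be used to steer the model.
import json
from pathlib import Path

def collect_feedback(image_dir: str, out_file: str = "hand_feedback.json") -> None:
    ratings = {}
    for image_path in sorted(Path(image_dir).glob("*.png")):
        # In practice the image would be displayed; here we just prompt.
        answer = input(f"Does {image_path.name} look like a real hand? [y/n] ")
        ratings[image_path.name] = (answer.strip().lower() == "y")
    Path(out_file).write_text(json.dumps(ratings, indent=2))

if __name__ == "__main__":
    collect_feedback("generated_hands")
```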
Over time, things will improve. At the moment, the situation is as ridiculous as paintings of horses in the 1700s. If you look at George Stubbs' painting titled Baronet, you will see a horse in full stride with all four limbs splayed out to the maximum. Artists couldn't work out how a horse's limbs actually function simply by watching the animal in motion. In 1877, English photographer Eadweard Muybridge captured photographic evidence of a horse's true gait, and our perspective changed.
Generative AI will get better at rendering pictures of hands and feet and teeth. It needs to understand what it is to be human and what we see. Till then, it's best we use generative AI in ways that let us hide hands. Think of the chronic poser Napoleon and how he kept his hand tucked into his vest in many paintings. The Emperor Napoleon in His Study at the Tuileries by Jacques-Louis David is a classic example. Of course, David knew how to paint, and Napoleon was doing what many did before him, from George Washington and Mozart to Francisco Pizarro and even the Greek orator Aeschines: showing gentlemanly restraint.
— Mathures Paul