DALL-E: The AI tool that lets you turn your thoughts into images

Here’s how the technology built by OpenAI, the artificial intelligence lab co-founded by Elon Musk, works

Cade Metz Published 18.04.22, 12:37 AM
Images generated by DALL-E in response to commands for “a teapot in the shape of an avocado” and “cats playing chess” NYTNS

Its creators call the system DALL-E in a nod to both “WALL-E,” the 2008 animated movie about an autonomous robot, and Salvador Dalí, the surrealist painter.

OpenAI, one of the world’s most ambitious artificial intelligence labs, backed by $1 billion in funding from Microsoft, is not yet sharing the technology with the general public. But on a recent afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.

When he asked for “a teapot in the shape of an avocado”, typing those words into a largely empty computer screen, the system created 10 distinct images of a dark green avocado teapot, some with pits and some without.

“DALL-E is good at avocados,” Nichol said.

A team of seven researchers spent two years developing the technology, which OpenAI plans to eventually offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code.

But for many experts, DALL-E is worrisome.

“You could use it for good things, but certainly you could use it for all sorts of other crazy, worrying applications, and that includes deepfakes,” like misleading photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University, US.

A half-decade ago, the world’s leading AI labs built systems that could identify objects in digital images and even generate images on their own, including flowers, dogs, cars and faces. A few years later, they built systems that could do much the same with written language, summarising articles, answering questions, generating tweets and even writing blog posts.

Now researchers are combining those technologies to create new forms of AI. DALL-E is a notable step forward because it juggles both language and images and, in some cases, grasps the relationship between the two.

“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, an artificial intelligence lab in Seattle, US.

The technology is not perfect. When Nichol asked DALL-E to “put the Eiffel Tower on the moon,” it did not quite grasp the idea. It put the moon in the sky above the tower. When he asked for “a living room filled with sand,” it produced a scene that looked more like a construction site than a living room. But when Nichol tweaked his requests a little, it provided what he wanted.

DALL-E is what artificial intelligence researchers call a neural network, which is a mathematical system loosely modelled on the network of neurons in the brain. That is the same technology that recognises the commands spoken into smartphones and identifies the presence of pedestrians as self-driving cars navigate city streets.

A neural network learns skills by analysing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognise an avocado. DALL-E looks for patterns as it analyses millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognise the links between the images and the words.
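
The idea of linking images and words can be illustrated with a toy sketch. It assumes each image and each caption has already been turned into a short list of numbers; the three-number vectors and the names `image_vectors`, `caption_vectors` and `best_caption` are invented for this example and are not taken from OpenAI's system, which learns far larger representations from millions of image-caption pairs.

```python
import math


def cosine(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical, hand-written vectors. A real system would learn these
# from data instead of having them typed in.
image_vectors = {
    "avocado_photo": [0.9, 0.1, 0.0],
    "teapot_photo": [0.1, 0.9, 0.1],
    "cat_photo": [0.0, 0.1, 0.9],
}
caption_vectors = {
    "an avocado": [1.0, 0.0, 0.1],
    "a teapot": [0.0, 1.0, 0.0],
    "a cat": [0.1, 0.0, 1.0],
}


def best_caption(image_name):
    """Pick the caption whose vector points most nearly the same way as the image's."""
    image = image_vectors[image_name]
    return max(caption_vectors, key=lambda c: cosine(image, caption_vectors[c]))


for name in image_vectors:
    print(name, "->", best_caption(name))
```

In this sketch an image and a caption count as "linked" when their vectors point in nearly the same direction. Training, which is not shown here, is what would push matching pairs together and mismatched pairs apart.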

When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.

Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realise these features. The latest DALL-E version, unveiled with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
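
The second step can be pictured, very loosely, as starting from random static and cleaning it up a little at a time. The sketch below is a deliberately simplified stand-in and not a real diffusion model: the function names, the four-number "image" and the `strength` setting are all hypothetical, and the denoising step cheats by being told the target, whereas a real diffusion model uses a trained neural network to estimate the noise.

```python
import random


def denoise_step(image, target, strength):
    """Remove a fraction of the difference between the image and the target.

    Assumption for illustration only: the 'noise' is computed directly as
    image - target. A real diffusion model has no access to the target and
    must predict the noise with a trained network.
    """
    return [p - strength * (p - t) for p, t in zip(image, target)]


def generate(target, steps=50, strength=0.2, seed=0):
    """Start from random noise and repeatedly denoise towards the target features."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # pure static to begin with
    for _ in range(steps):
        image = denoise_step(image, target, strength)
    return image


# Hypothetical 'key features' standing in for the first network's output.
features = [0.2, 0.8, 0.5, 0.1]
result = generate(features)
print([round(p, 3) for p in result])
```

Each pass keeps 80 per cent of the remaining error, so after 50 passes the static has all but vanished and the result sits very close to the requested features.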

Although DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.

They can also build more powerful systems by applying the same concepts to new types of data.

Experts believe researchers will continue to hone such systems. Ultimately, those systems could help companies improve search engines, digital assistants and other common technologies as well as automate new tasks for graphic artists, programmers and other professionals.

But there are caveats. The AI systems can show bias against women and people of colour, in part because they learn their skills from enormous pools of online text, images and other data that show bias. They could be used to generate pornography, hate speech and other offensive material. And many believe the technology will eventually make it so easy to create disinformation that people will have to be sceptical of nearly everything they see online.

“We can forge text. We can put text into someone’s voice. And we can forge images and videos,” Etzioni said. “There is already disinformation online, but the worry is that this scales disinformation to new levels.”

For now, OpenAI is keeping a tight leash on DALL-E.

NYTNS
