It's fun to see the early results of text-to-image AI capabilities, but it will take some time to achieve picture-perfect success.

Text-to-image artificial intelligence (AI) allows you to generate an image from scratch based on a text description.

Janelle Shane of AI Weirdness recently had fun with the generator, creating corporate logos out of text prompts like "the local Waffle House" and "the Pizza Hut logo". She did this using DALL-E 2, a new AI system that creates realistic images and art from a description in natural language.

Like the original DALL-E, it uses CLIP, a neural network that efficiently learns visual concepts from natural language supervision. OpenAI trained CLIP with a huge collection of internet images and accompanying text. And as Janelle demonstrates, it generates clear, coherent images. Partly and for now, anyways.

As with any new technology, it's not without its knots. We'll discuss what they are and how to untie them below, after digging deeper into what exactly text-to-image AI is to begin with.

We've already mentioned DALL-E 2, an AI that creates original, realistic images and art from a text description. Input any text you can think of, and the AI generates a surprisingly accurate picture that matches your description. Additionally, the images are generated in a range of styles, from oil paintings to CGI renders and even photographs.

DALL-E 2 also makes realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account, and it incorporates unique concepts, attributes, and styles.

Google also announced its own iteration of this technology – Imagen. It's "a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding." Imagen "builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation."

With both programs, you insert a text prompt, and a corresponding picture is generated. You type what you want to see, and the program generates it. Pretty cool, right? Well, it's almost too good to be true at this point. By Google's own admission, there are several ethical challenges facing text-to-image AI.

Early Issues with Text-to-image AI

Problem #1: Shallow Data Pool

These models need huge amounts of image data and image annotation to turn your text into an image. Specifically, they require captioned pictures so the AI can learn how to process your request. It's like asking someone to draw something in Pictionary: they recall what it looks like from experience and reproduce it on paper. The better the drawing, the easier it is to correctly identify.

The problem is that huge quantities of text-to-image AI data come from the web. Many corners of the web, in fact, meaning the output doesn't always come out as appropriately as you'd like. Why not? The models ingest (and learn to replicate) some abhorrent content you'd expect to find online.

Google's researchers summarize this problem in a recent paper: "The large scale data requirements of text-to-image models have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets. Dataset audits have revealed these datasets tend to reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups."

There's a need, therefore, for a tighter filtration process to remove questionable content, and for better curated, more comprehensive datasets.

Text-to-image AI offers remarkably creative renderings and innovative design opportunities. Another issue, though, is the replication of prevailing social biases and stereotypes. For example, Google acknowledges "an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes."

The output, therefore, is often racist, sexist, or toxic in some way. There's a reason why Imagen and DALL-E 2 haven't been made available for public use.

Early uses of DALL-E 2 have produced problematic results. For example, in an article from The Verge, it's noted that if you ask DALL-E to generate images of a "flight attendant", almost all the subjects will be women. Check this Twitter thread for some tangible examples.
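To give a rough feel for the CLIP idea mentioned above (learning to match text with captioned images), here is a toy sketch. It is not OpenAI's actual code: the captions, the three-number "embeddings", and the prompt vector are all invented for illustration, and real CLIP learns high-dimensional embeddings from hundreds of millions of image-text pairs.

```python
# Toy illustration of CLIP-style text-image matching: score a text prompt
# against captioned-image "embeddings" and pick the closest match.
import math

def cosine_similarity(a, b):
    """Angle-based similarity: ~1.0 means same direction, ~0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three captioned training images (invented values).
image_embeddings = {
    "a photo of a waffle house": [0.9, 0.1, 0.2],
    "a pizza restaurant logo":   [0.2, 0.9, 0.1],
    "a flight attendant":        [0.1, 0.2, 0.9],
}

# Hypothetical embedding for the user's text prompt, deliberately
# placed close to the waffle-house image.
prompt_embedding = [0.85, 0.15, 0.25]

best_caption = max(
    image_embeddings,
    key=lambda c: cosine_similarity(prompt_embedding, image_embeddings[c]),
)
print(best_caption)  # -> a photo of a waffle house
```

The point of the sketch is only the mechanism: both text and images live in one shared embedding space, so "understanding" a prompt reduces to geometry. It also hints at why the training data matters so much, a theme the rest of the article picks up.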