In January 2021, the artificial intelligence research laboratory OpenAI gave a limited release to a piece of software called Dall-E. The software allowed users to enter a simple description of an image they had in their mind and, after a brief pause, the software would produce an almost uncannily good interpretation of their suggestion, worthy of a jobbing illustrator or Adobe-proficient designer – but much faster, and for free. Typing in, for example, “a pig with wings flying over the moon, illustrated by Antoine de Saint-Exupéry” resulted, after a minute or two of processing, in something reminiscent of the patchy but recognisable watercolour brushes of the creator of The Little Prince.
A year or so later, when the software got a wider release, the internet went wild. Social media was flooded with all sorts of bizarre and wondrous creations, an exuberant hodgepodge of fantasies and artistic styles. And a few months later it happened again, this time with language, and a product called ChatGPT, also produced by OpenAI. Ask ChatGPT to produce a summary of the Book of Job in the style of the poet Allen Ginsberg and it would come up with a reasonable attempt in a few seconds. Ask it to render Ginsberg’s poem Howl in the form of a management consultant’s slide deck presentation and it would do that too. The abilities of these programs to conjure up strange new worlds in words and pictures alike entranced the public, and the desire to have a go oneself produced a growing literature on the ins and outs of making the best use of these tools, and particularly how to structure inputs to get the most interesting outcomes.
The latter skill has become known as “prompt engineering”: the technique of framing one’s instructions in terms most clearly understood by the system, so it returns the results that most closely match expectations – or perhaps exceed them. Tech commentators were quick to predict that prompt engineering would become a sought-after and well remunerated job description in a “no code” future, where the most powerful way of interacting with intelligent systems would be through the medium of human language. No longer would we need to know how to draw, or how to write computer code: we would simply whisper our desires to the machine and it would do the rest. The limits on AI’s creations would be the limits of our own imaginations.
Imitators of and advances on Dall-E followed quickly. Dall-E mini (later renamed Craiyon) gave those not invited to OpenAI’s private services a chance to play around with a similar, less powerful, but still highly impressive tool. Meanwhile, the independent commercial effort Midjourney and the open-source Stable Diffusion used a different approach to classifying and generating images, to much the same ends. Within a few months, the field had rapidly advanced to the generation of short videos and 3D models, with new tools appearing daily from academic departments and hobbyist programmers, as well as the established giants of social media and now AI: Facebook (aka Meta), Google, Microsoft and others. A new field of research, software and contestation had opened up.
The name Dall-E combines the robot protagonist of Disney’s Wall-E with the Spanish surrealist artist Salvador Dalí. On the one hand, you have the figure of a plucky, autonomous and adorable little machine sweeping up the debris of a collapsed human civilisation, and on the other a man whose most repeated bon mots include, “Those who do not want to imitate anything, produce nothing,” and “What is important is to spread confusion, not eliminate it.” Both make admirable namesakes for the broad swathe of tools that have come to be known as AI image generators.
In the past year, this new wave of consumer AI, which includes both image generation and tools such as ChatGPT, has captured the popular imagination. It has also provided a boost to the fortunes of major technology companies who have, despite much effort, failed to convince most of us that either blockchain or virtual reality (“the metaverse”) is the future that any of us want. At least this one feels fun, for five minutes or so; and “AI” still has that sparkly, science-fiction quality, redolent of giant robots and superhuman brains, which provides that little contact high with the genuinely novel. What’s going on under the hood, of course, is far from new.
There have been no major breakthroughs in the academic discipline of artificial intelligence for a couple of decades. The underlying technology of neural networks – a method of machine learning based on the way physical brains function – was theorised and even put into practice back in the 1990s. You could use them to generate images then, too, but they were mostly formless abstractions, blobs of colour with little emotional or aesthetic resonance. The first convincing AI chatbots date back even further. In 1964, Joseph Weizenbaum, a computer scientist at the Massachusetts Institute of Technology, developed a chatbot called Eliza. Eliza was modelled on a “person-centred” psychotherapist: whatever you said, it would mirror back to you. If you said “I feel sad”, Eliza would respond with “Why do you feel sad?”, and so on. (Weizenbaum actually wanted his project to demonstrate the superficiality of human communication, not to be a blueprint for future products.)
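The mirroring trick described above is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not Weizenbaum’s original program: the first rule reproduces the “I feel sad” exchange, while the second rule, the fallback reply and the names `RULES`, `FALLBACK` and `respond` are assumptions made for the example.

```python
import re

# Hypothetical reflection rules: (pattern, response template).
# Only the first mirrors the exchange quoted in the text; the second is illustrative.
RULES = [
    (re.compile(r"^i feel (.+)$", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"^i am (.+)$", re.IGNORECASE), "How long have you been {0}?"),
]

# Assumed catch-all reply for statements that match no rule.
FALLBACK = "Please tell me more."


def respond(statement: str) -> str:
    """Mirror the speaker's own words back as a question."""
    text = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Reuse the speaker's words verbatim inside the canned question.
            return template.format(match.group(1))
    return FALLBACK


print(respond("I feel sad"))
```

Nothing here understands sadness, or anything else. The program matches a surface pattern and hands the speaker’s own words back, which is rather the point Weizenbaum was making.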
Early AIs didn’t know much about the world, and academic departments lacked the computing power to exploit them at scale. The difference today is not intelligence, but data and power. The big tech companies have spent 20 years harvesting vast amounts of data from culture and everyday life, and building vast, energy-hungry data centres filled with ever more powerful computers to churn through it. What were once creaky old neural networks have become super-powered, and the gush of AI we’re seeing is the result.
AI image generation relies on the assembly and analysis of millions upon millions of tagged images; that is, images that come with some kind of description of their content already attached. These images and descriptions are then processed through neural networks that learn to associate particular and deeply nuanced qualities of the image – shapes, colours, compositions – with certain words and phrases. These qualities are then layered on top of one another to produce new arrangements of shape, colour and composition, based on the billions of differently weighted associations produced by a simple prompt. But where did all those original images come from?
The datasets released by LAION, a German non-profit, are a good example of the kind of image-text collections used to train large AI models (they provided the basis for both Stable Diffusion and Google’s Imagen, among others). For more than a decade, another non-profit web organisation, Common Crawl, has been indexing and storing as much of the public world wide web as it can access, filing away as many as 3bn pages every month. Researchers at LAION took a chunk of the Common Crawl data and pulled out every image with an “alt” tag, a line or so of text meant to be used to describe images on web pages. After some trimming, links to the original images and the text describing them are released in vast collections: LAION-5B, released in March 2022, contains more than five billion text-image pairs. These images are “public” images in the broadest sense: any image ever published on the internet may be gathered up into them, with exactly the kind of strange effects one may expect.
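To make the “alt” tag step concrete, here is a minimal sketch in Python of pulling image links and their descriptions out of a single web page, using only the standard library’s `html.parser`. It is not LAION’s actual pipeline: the class and function names, the sample page and the rule of keeping only images with a non-empty description are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (image URL, alt text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # An attribute written without a value arrives as None.
        alt = (attributes.get("alt") or "").strip()
        # Assumed filter: keep only images with both a link and a description.
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return every (src, alt) pair found in the given HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


# Hypothetical sample page: one described image, one without a description.
page = """
<p>Two images, one described and one not.</p>
<img src="https://example.org/pig.jpg" alt="A pig with wings flying over the moon">
<img src="https://example.org/spacer.gif">
"""
print(extract_pairs(page))
```

Note what the sketch does not do: it never asks who made the image, who is in it, or whether it was meant to be public. It keeps a link and a caption, and moves on.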
In September 2022, a San Francisco–based digital artist named Lapine was using a tool called Have I Been Trained, which allows artists to see if their work is being used to train AI image generation models. Have I Been Trained was created by the artists Mat Dryhurst and Holly Herndon, whose own work led them to explore the ways in which artists’ labour is coopted by AI. When Lapine used it to scan the LAION database, she found an image of her own face. She was able to trace this image back to photographs taken by a doctor when she was undergoing treatment for a rare genetic condition. The photographs were taken as part of her clinical documentation, and she signed documents that restricted their use to her medical file alone. The doctor involved died in 2018. Somehow, these private medical images ended up online, then in Common Crawl’s archive and LAION’s dataset, and were finally ingested into the neural networks as they learned about the meaning of images, and how to make new ones. For all we know, the mottled pink texture of our Saint-Exupéry-style piggy could have been blended, however subtly, from the raw flesh of a cancer patient.
“It’s the digital equivalent of receiving stolen property. Someone stole the image from my deceased doctor’s files and it ended up somewhere online, and then it was scraped into this dataset,” Lapine told the website Ars Technica. “It’s bad enough to have a photo leaked, but now it’s part of a product. And this goes for anyone’s photos, medical record or not. And the future abuse potential is really high.” (According to her Twitter account, Lapine continues to use tools like Dall-E to make her own art.)
The entirety of this kind of publicly available AI, whether it works with images or words, as well as the many data-driven applications like it, is based on this wholesale appropriation of existing culture, the scope of which we can barely comprehend. Public or private, legal or otherwise, most of the text and images scraped up by these systems exist in the nebulous domain of “fair use” (permitted in the US, but questionable if not outright illegal in the EU). As with most of what goes on inside advanced neural networks, it is all but impossible to understand from the outside how these systems work, rare encounters such as Lapine’s aside. But we can be certain of this: far from being the magical, novel creations of brilliant machines, the outputs of this kind of AI are entirely dependent on the uncredited and unremunerated work of generations of human artists.
AI image and text generation is pure primitive accumulation: expropriation of labour from the many for the enrichment and advancement of a few Silicon Valley technology companies and their billionaire owners. These companies made their money by inserting themselves into every aspect of everyday life, including the most personal and creative areas of our lives: our secret passions, our private conversations, our likenesses and our dreams. They enclosed our imaginations in much the same manner as landlords and robber barons enclosed once-common lands. They promised that in doing so they would open up new realms of human experience, give us access to all human knowledge, and create new kinds of human connection. Instead, they are selling us back our dreams repackaged as the products of machines, with the only promise being that they’ll make even more money advertising on the back of them.
The weirdness of AI image generation exists in the output as well as the input. One user tried typing in nonsense phrases and was confused and somewhat discomforted to discover that Dall-E mini seemed to have a very good idea of what a “Crungus” was: an otherwise unknown phrase that consistently produced images of a snarling, naked, ogre-like figure. Crungus was sufficiently clear within the program’s imagination that he could be manipulated easily: other users quickly offered up images of ancient Crungus tapestries, Roman-style Crungus mosaics, oil paintings of Crungus, photos of Crungus hugging various celebrities, and, this being the internet, “sexy” Crungus.
So, who or what is Crungus? Twitter users were quick to describe him as “the first AI cryptid”, a creature like Bigfoot who exists, in this case, within the underexplored terrain of the AI’s imagination. And this is about as clear an answer as we can get at this point, due to our limited understanding of how the system works. We can’t peer inside its decision-making processes because the way these neural networks “think” is inherently inhuman. It is the product of an incredibly complex, mathematical ordering of the world, as opposed to the historical, emotional way in which humans order their thinking. The Crungus is a dream emerging from the AI’s model of the world, composited from billions of references that have escaped their origins and coalesced into a mythological figure untethered from human experience. Which is fine, even amazing – but it does make one ask, whose dreams are being drawn upon here? What composite of human culture, what perspective on it, produced this nightmare?
A similar experience occurred to another digital artist experimenting with negative prompts, a technique for generating what the system considers to be the polar opposite of what is described. When the artist entered “Brando::-1”, the system returned something that looked a bit like a logo for a video game company called DIGITA PNTICS. That this may, across the multiple dimensions of the system’s vision of the world, be the opposite of Marlon Brando seems reasonable enough. But when they checked to see if it went the other way, by typing in “DIGITA PNTICS skyline logo::-1”, something much stranger happened: all of the images depicted a creepy-looking woman with sunken eyes and reddened cheeks, whom the artist christened Loab. Once discovered, Loab seemed unusually and disturbingly persistent. Feeding the image back into the program, combined with ever more divergent text prompts, kept bringing Loab back, in increasingly nightmarish forms, in which blood, gore and violence predominated.
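The “::-1” suffix in those prompts is a weight attached to the text before it. As a rough sketch, here is how such a prompt might be split into pieces of text and their weights in Python. The function name, the default weight of 1 for unweighted text and the exact parsing rules are assumptions; the sketch covers only the notation and says nothing about how any real image generator applies the weights internally.

```python
import re

# Text up to a "::" separator, optionally followed immediately by a signed number.
SEGMENT = re.compile(r"(.*?)::(-?\d+(?:\.\d+)?)?")


def parse_weighted_prompt(prompt):
    """Split 'text::weight' segments into (text, weight) pairs."""
    parts = []
    end = 0
    for match in SEGMENT.finditer(prompt):
        text = match.group(1).strip()
        # Assumed default: a separator with no number means a weight of 1.
        weight = float(match.group(2)) if match.group(2) else 1.0
        if text:
            parts.append((text, weight))
        end = match.end()
    # Any trailing text without a separator also gets the assumed default weight.
    tail = prompt[end:].strip()
    if tail:
        parts.append((tail, 1.0))
    return parts


print(parse_weighted_prompt("Brando::-1"))
print(parse_weighted_prompt("DIGITA PNTICS skyline logo::-1"))
```

A weight of -1 is the whole instruction: whatever the system associates with this phrase, head in the other direction.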
Here’s one explanation for Loab, and possibly Crungus: although it’s very, very hard to imagine the way the machine’s imagination works, it is possible to imagine it having a shape. This shape is never going to be smooth or neatly rounded: rather, it is going to have troughs and peaks, mountains and valleys, areas full of information and areas lacking many features at all. Those areas of high information correspond to networks of associations that the system “knows” a lot about. One can imagine the regions related to human faces, cars and cats, for example, being pretty dense, given the distribution of images one finds on a survey of the whole internet.
It is these regions that an AI image generator will draw on most heavily when creating its pictures. But there are other places, less visited, that come into play when negative prompting – or indeed, nonsense phrases – are deployed. In order to satisfy such queries, the machine must draw on more esoteric, less certain connections, and perhaps even infer from the totality of what it does know what its opposite may be. Here, in the hinterlands, Loab and Crungus are to be found.
That’s a satisfying theory, but it does raise certain uncomfortable questions about why Crungus and Loab look like they do; why they tip towards horror and violence, why they hint at nightmares. AI image generators, in their attempt to understand and replicate the entirety of human visual culture, seem to have recreated our darkest fears as well. Perhaps this is just a sign that these systems are very good indeed at aping human consciousness, all the way down to the horror that lurks in the depths of existence: our fears of filth, death and corruption. And if so, we need to acknowledge that these will be persistent components of the machines we build in our own image. There is no escaping such obsessions and dangers, no moderating or engineering away the reality of the human condition. The dirt and disgust of living and dying will stay with us and need addressing, just as the hope, love, joy and discovery will.
This matters, because AI image generators will do what all previous technologies have done, but they will also go further. They will reproduce the biases and prejudices of those who create them, like the webcams that only recognise white faces, or the predictive policing systems that lay siege to low-income neighbourhoods. And they will also up the game: the benchmark of AI performance is shifting from the narrow domain of puzzles and challenges – playing chess or Go, or obeying traffic laws – to the much broader territory of imagination and creativity.
While claims about AI’s “creativity” might be overblown – there is no true originality in image generation, only very skilled imitation and pastiche – that doesn’t mean it isn’t capable of taking over many common “artistic” tasks long considered the preserve of skilled workers, from illustrators and graphic designers to musicians, videographers and, indeed, writers. This is a huge shift. AI is now engaging with the underlying experience of feeling, emotion and mood, and this will allow it to shape and influence the world at ever deeper and more persuasive levels.
ChatGPT was introduced in November 2022 by OpenAI, and further shifted our understanding of how AI and human creativity might interact. Structured as a chatbot – a program that mimics human conversation – ChatGPT is capable of a lot more than conversation. When properly entreated, it is capable of writing working computer code, solving mathematical problems and mimicking common writing tasks, from book reviews to academic papers, wedding speeches and legal contracts.
It was immediately obvious how the program could be a boon to those who find, say, writing emails or essays difficult, but also how, as with image generators, it could be used to replace those who make a living from those tasks. Many schools and universities have already implemented policies that ban the use of ChatGPT amid fears that students will use it to write their essays, while the academic journal Nature has had to publish policies explaining why the program cannot be listed as an author of research papers (it can’t give consent, and it can’t be held accountable). But institutions themselves are not immune from inappropriate uses of the tool: in February, Peabody College, part of Vanderbilt University, a private university in Tennessee, shocked students when it sent out a letter of condolence and advice following a school shooting in Michigan. While the letter spoke of the value of community, mutual respect and togetherness, a note at the bottom stated that it was written by ChatGPT – which felt both morally wrong and somehow false or uncanny to many. It seems there are many areas of life where the intercession of machines requires some deeper thought.
If it would be inappropriate to replace our communications wholesale with ChatGPT, then one clear trend is for it to become a kind of wise assistant, guiding us through the morass of available knowledge towards the information we seek. Microsoft has been an early mover in this direction, reconfiguring its often disparaged search engine Bing as a ChatGPT-powered chatbot, and massively boosting its popularity by doing so. But despite the online (and journalistic) rush to consult ChatGPT on almost every conceivable problem, its relationship to knowledge itself is somewhat shaky.
One recent personal interaction with ChatGPT went like this. I asked it to suggest some books to read based on a new area of interest: multi-species democracy, the idea of including non-human creatures in political decision-making processes. It’s pretty much the most useful application of the tool: “Hey, here’s a thing I’m thinking about, can you tell me some more?” And ChatGPT obliged. It gave me a list of several books that explored this novel area of interest in depth, and described in persuasive human language why I should read them. This was brilliant! Except, it turned out that only one of the four books listed actually existed, and several of the concepts ChatGPT thought I should explore further were lifted wholesale from rightwing propaganda: it explained, for example, that the “wise use” movement promoted animal rights, when in fact it is a libertarian, anti-environment concept promoting the expansion of property rights.
Now, this didn’t happen because ChatGPT is inherently rightwing. It’s because it’s inherently stupid. It has read most of the internet, and it knows what human language is supposed to sound like, but it has no relation to reality whatsoever. It is dreaming sentences that sound about right, and listening to it talk is frankly about as interesting as listening to someone’s dreams. It is very good at producing what sounds like sense, and best of all at producing cliche and banality, which has composed the majority of its diet, but it remains incapable of relating meaningfully to the world as it actually is. Distrust anyone who pretends that this is an echo, even an approximation, of consciousness. (As this piece was going to publication, OpenAI released a new version of the system that powers ChatGPT, and said it was “less likely to make up facts”.)
The belief in this kind of AI as actually knowledgeable or meaningful is actively dangerous. It risks poisoning the well of collective thought, and of our ability to think at all. If, as is being proposed by technology companies, the results of ChatGPT queries will be provided as answers to those seeking knowledge online, and if, as has been proposed by some commentators, ChatGPT is used in the classroom as a teaching aid, then its hallucinations will enter the permanent record, effectively coming between us and more legitimate, testable sources of information, until the line between the two is so blurred as to be invisible. Moreover, there has never been a time when our ability as individuals to research and critically evaluate knowledge on our own behalf has been more necessary, not least because of the damage that technology companies have already done to the ways in which information is disseminated. To place all of our trust in the dreams of badly programmed machines would be to abandon such critical thinking altogether.
AI technologies are bad for the planet too. Training a single AI model – according to research published in 2019 – might emit the equivalent of more than 284 tonnes of carbon dioxide, which is nearly five times the lifetime emissions of the average American car, including its manufacture. These emissions are expected to grow by nearly 50% over the next five years, all while the planet continues to heat up, acidifying the oceans, igniting wildfires, throwing up superstorms and driving species to extinction. It’s hard to think of anything more utterly stupid than artificial intelligence, as it is practised in the current era.
So, let’s take a step back. If these current incarnations of “artificial” “intelligence” are so dreary, what are the alternatives? Can we imagine powerful information sorting and communicating technologies that don’t exploit, misuse, mislead and supplant us? Yes, we can – once we step outside the corporate power networks that have come to define the current wave of AI.
In fact, there are already examples of AI being used to benefit specific communities by bypassing the entrenched power of corporations. Indigenous languages are under threat around the world. The UN estimates that one disappears every two weeks, and with that disappearance go generations of knowledge and experience. This problem, the result of colonialism and racist assimilation policies over centuries, is compounded by the rising dominance of machine-learning language models, which ensure that popular languages increase their power, while lesser-known ones are drained of exposure and expertise.
In Aotearoa New Zealand, a small non-profit radio station called Te Hiku Media, which broadcasts in the Māori language, decided to address this disparity between the representation of different languages in technology. Its massive archive of more than 20 years of broadcasts, representing a vast range of idioms, colloquialisms and unique phrases, many of them no longer spoken by anyone living, was being digitised, but needed to be transcribed to be of use to language researchers and the Māori community. In response, the radio station decided to train its own speech recognition model, so that it would be able to “listen” to its archive and produce transcriptions.
Over the next few years, Te Hiku Media, using open-source technologies as well as systems it developed in house, achieved the almost impossible: a highly accurate speech recognition system for the Māori language, which was built and owned by its own language community. This was more than a software effort. The station contacted every Māori community group it could and asked them to record themselves speaking pre-written statements in order to provide a corpus of annotated speech, a prerequisite for training its model.
There was a cash prize for whoever submitted the most sentences – one activist, Te Mihinga Komene, alone recorded 4,000 phrases – but the organisers found that the greatest motivation for contributors was the shared vision of revitalising the language while keeping it in the community’s ownership. Within a few weeks, the station had created a model that recognised recorded speech with 86% accuracy – more than enough to get it started transcribing its full archive.
Te Hiku Media’s achievement cleared a path for other indigenous groups to follow, with similar projects now being undertaken by Mohawk peoples in south-eastern Canada and Native Hawaiians. It also established the principle of data sovereignty around indigenous languages, and by extension, other forms of indigenous knowledge. When international for-profit companies started approaching Māori speakers to help build their own models, Te Hiku Media campaigned against these efforts, arguing, “They suppressed our languages and physically beat it out of our grandparents, and now they want to sell our language back to us as a service.”
“Data is the last frontier of colonisation,” wrote Keoni Mahelona, a Native Hawaiian and one of the co-founders of Te Hiku Media. All of Te Hiku’s work is released under what it named the Kaitiakitanga License, a legal guarantee of guardianship and custodianship that ensures that all the data that went into the language model and other projects remains the property of the community that created it – in this case, the Māori speakers who offered their help – and is theirs to license, or not, as they deem appropriate according to their tikanga (Māori customs and protocols). In this way, the Māori language is revitalised, while resisting and altering the systems of digital colonialism that continue to repeat centuries of oppression.
The lesson of the current wave of “artificial” “intelligence”, I feel, is that intelligence is a poor thing when it is imagined by corporations. If your view of the world is one in which profit maximisation is the king of virtues, and all things shall be held to the standard of shareholder value, then of course your artistic, imaginative, aesthetic and emotional expressions will be woefully impoverished. We deserve better from the tools we use, the media we consume and the communities we live within, and we will only get what we deserve when we are capable of participating in them fully. And don’t be intimidated by them either – they’re really not that complicated. As the science-fiction legend Ursula K Le Guin wrote: “Technology is what we can learn to do.”
This article was adapted from the new edition of New Dark Age: Technology and the End of the Future, published by Verso