Can Machines Lie?
You have probably never heard of Liam Porr.
Liam was a college student at Berkeley when he decided to play around with GPT-3, the most powerful natural-language deep neural network model built to date (more on GPT-3 later). Using it, he created a number of blog posts that were completely machine-generated. His posts reached 26,000 people in two weeks, some of whom subscribed to “his” blog.
I would write the title and introduction, add a photo, and let GPT-3 do the rest. The blog has had over 26 thousand visitors, and we now have about 60 loyal subscribers…
And only ONE PERSON has noticed it was written by GPT-3.
Liam Porr, August 2020
In fact, the very first post made it to the number one spot on Y Combinator’s highly popular Hacker News website. The funny thing was that the few comments hinting that the blog post was artificially generated were actually down-voted by the community.
Surprised? Well, you shouldn’t be. People are lazy machines: we tend to believe what reinforces our prior beliefs, and we rarely try, or have the time and resources, to fact-check the enormous flow of information we are exposed to on a daily basis. It really comes down to our most basic architecture. Brains evolved to be as energy-efficient as possible; they had to.
Our brains like the familiar, the warm and cozy patterns already learned. Learning new patterns is an effort, calorie-wise. Cognitive psychology experiments have shown that when solving real-world problems we tend to prefer previously learned patterns, even in the presence of other (sometimes simpler) approaches. So, when presented with new data, textual or otherwise, we tend to fit it into well-known mental slots. When readers went through Porr’s posts, they were automatically comparing the text to similar texts they had read in the past. When this subliminal comparison did not come up with any oddities, they concluded the text was the legitimate work of a human.
But this is what GPT-3 is all about: replicating human texts, based on learning and modeling billions of existing ones. How could they have told the difference?
Lies Spread Faster than the Truth
There’s an old saying that a lie can travel halfway around the world before the truth can get its boots on. This, however, is no longer just a saying. It’s reality.
To test that, MIT researcher Soroush Vosoughi and his colleagues used a data set of rumor cascades on Twitter from 2006 to 2017. They reported that about 126,000 rumors were spread by ∼3 million people. False news reached more people than the truth: the top 1% of false news cascades diffused to between 1,000 and 100,000 people, whereas the truth rarely diffused to more than 1,000 people. Falsehood also diffused faster than the truth (see The spread of true and false news online, MIT).
So it seems we are very bad at discerning falsities from truths. It may even seem we don’t care (as long as they confirm our beliefs). Moreover, in the current world of rapid communications and personal publishing, we tend to spread falsities faster than factual data and truths, further amplified by biased algorithms that are intentionally tuned to promote popular data.
But it gets worse.
Companies are now providing fake news bot farms, false social network profiles, and amplification of falsities. This industry has been flying low under the radar, but make no mistake, it is flourishing. Fake social network profiles are built and maintained over time to resemble actual human profiles, human-like text generators are employed to create realistic-looking (but fake) news articles, and elaborate tools piggyback on social network and search engine algorithms in order to spread those articles massively.
What Is GPT-3?
The quest for artificial intelligence (AI) is centuries old, dating back to Mary Shelley’s Frankenstein or Karel Čapek’s R.U.R. Early hacks at building machines that think were abandoned in the early 1990s due to a lack of computing power and efficient algorithms, only to resurface with a vengeance in the last decade. Deep learning (using multiple layers of brain-like neurons) has become the de facto standard for creating machines that can solve domain-specific problems like driving a car or diagnosing a patient. However, the holy grail was, and still is, building a machine that can understand and act in the world as a human would, that is, possess general (yet artificial) intelligence.
GPT-3 is one of those hacks at the general AI problem, and a pretty good one. It is a deep neural network that has been trained on a large share of the text available online. In layman’s terms, it is a computer model, loosely mimicking the human brain, that has read (almost) everything on the Internet.
To understand GPT-3, one may fall back on the well-known search query auto-completion. Once you start typing a search query, the search engine tries to complete your sentence as best it can, based on queries previously typed by other people. This rudimentary (but efficient) mechanism is a primitive generative language model, a family whose current state of the art is OpenAI’s GPT-3.
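To make the auto-completion analogy concrete, here is a minimal sketch in Python. It is only an illustration: the query log and the function name complete are made up for this example, and real search engines use far more elaborate ranking.

```python
from collections import Counter

# Hypothetical log of previously seen queries (invented for illustration).
SEEN_QUERIES = [
    "can machines lie",
    "can machines think",
    "can machines lie",
    "can dogs eat grapes",
    "what is gpt-3",
]


def complete(prefix: str, seen: list[str], k: int = 3) -> list[str]:
    """Return up to k previously seen queries starting with prefix,
    ordered from most to least frequent."""
    prefix = prefix.lower()
    counts = Counter(q for q in seen if q.startswith(prefix))
    return [query for query, _ in counts.most_common(k)]


print(complete("can m", SEEN_QUERIES))
# → ['can machines lie', 'can machines think']
```

The more often a query was seen, the higher it ranks among the suggestions. That is the whole trick: predict what comes next from what came before.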
Once the model has been trained on billions of sentences (from either natural or programming languages), it has learned the intricate relations between single words as well as between sequences of words. It can then be fed a snippet of text and generate an appropriate “follow-up” snippet. Once this is done, it takes the generated snippet and goes on to create more text based on it. Simple as it may sound, this method has been shown to create realistic (fake) news articles and even artificially generated interviews, as well as working computer code.
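The generate-then-feed-back loop described above can be sketched with a toy model. To be clear, this is not how GPT-3 is built: GPT-3 is a very large neural network, while the sketch below swaps in simple word-pair (bigram) counts so that it fits in a few lines. The corpus and the function names train_bigrams and generate are assumptions made up for illustration.

```python
import random
from collections import defaultdict

# Tiny made-up training corpus (illustration only).
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]


def train_bigrams(corpus):
    """Map each word to the list of words seen right after it."""
    followers = defaultdict(list)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, prompt, max_new_words=10, seed=0):
    """Extend the prompt one word at a time, feeding each new word back in."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:  # the last word was never followed by anything
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


model = train_bigrams(CORPUS)
print(generate(model, "the cat", max_new_words=5))
```

Each new word is picked only from words that followed the previous one in the training text, so the output looks locally plausible even though the model understands nothing. A model like GPT-3 follows the same predict-append-repeat loop, but conditions on a much longer context using learned parameters rather than a lookup table.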
Not something you want to fall into the wrong hands, right? Well, too late.
GPT-3 is now publicly accessible: anyone can fill in a form and become a beta user (on top of the paid tier, which is also publicly available and not too pricey). In fact, that is exactly what Liam Porr did. All he had to do to create those blog posts was grab a number of popular headlines from existing blog posts, feed them to the machine, and publish the results on his (fake) blog. No one (well, almost none of the 26,000 readers) could tell the difference.
Now imagine an unstable billionaire, or a foreign power using such a tool at scale to skew a democratic election, or manipulate public opinion in order to wreak havoc on an already divided society.
In 2016 Microsoft released a Twitter bot named Tay that was supposed to learn “conversational understanding” from tweets and respond in a “casual and playful” manner. It took less than 24 hours for Twitter to convert the innocent AI chatbot into a full-fledged Nazi, racist, and misogynistic pig. And it was not due to any wrongdoing by the bot.
Microsoft quickly removed Tay from the network, but the fact remains: Tay, being essentially a robot parrot with an internet connection, simply repeated common human sentiments back to users. Tay in itself was not a bad machine. However, had it been left unattended and mistaken for a human, Tay could have amplified toxic discourse in ways that further divide an already torn society.
Now imagine hundreds of thousands of Tays or GPT-3 based engines, operated by a rogue government, a political or terrorist organization, or a large corporation against its foes. Such large-scale operations can (and most probably will) bring down economies, manipulate stocks and monetary instruments, create local strife, and may even contribute to nationwide misjudgments, as exemplified by the Russian involvement in electing an inapt former president.
It is the notions of fact and truth that are under attack, and by that our common grasp of reality. Once facts become narratives, with the help of large-scale AI, we as humans would lose what has allowed for society to exist, let alone our own personal grip on what is real and what is fake.
If there ever was a time to fight back, now is the time. And we need machines on our side.
An atom-blaster is a good weapon, but it can point both ways.
Isaac Asimov
Just a Little Pin Prick
How do we use AI to fight falsities, fake news, and manipulative machine-generated data? The answer is: we build adversarial machines. By adversarial I mean we build AI to fight AI. The notion of fighting fire with fire is quite old; however, it has a modern twist.
If machines are generating, publishing, and promoting falsities and manipulative campaigns aimed at our concept of truth and factual evidence, we need counter-machines to detect, prevent, and block those attacks on our society. It is an arms race, of course. As the rogue AI gets better, our counter-AI must too. As the algorithms and networks used to create fake content become more human-like, we’ll need to develop AI that detects the subtle differences and traits of those falsities and blocks them before they spread widely. It’s a data-driven immune system, and as such, it would need to constantly learn and battle new threats (the avid reader may find this article I published in Towards Data Science a few years ago interesting).
Having said that, we must understand what it would mean for the above to become reality. If machine-generated fake texts and data (e.g. images, deep-fake videos, etc.) become indiscernible from human-generated content to us mortals, and detectable only by counter-AI, we have an even bigger problem on our hands: that level of AI would have surpassed human intelligence (as Elon Musk and others occasionally remind us).
It may be the case that this becomes an interim period, much like the Cold War of the last century. At some point the human race may understand that those AI tools and counter-tools amount to MAD (mutually assured destruction) and decide to dismantle them. AI ethics may be formed and AI laws put in place to disallow malicious use of AI, as we have done with nuclear weapons.
We need to act, and we need to act sooner rather than later. Regretfully, as history has shown us, later is often our choice.