How Google engineer Blake Lemoine became convinced an AI was sentient

Current AIs aren’t sentient. We don’t have much reason to think that they have an internal monologue, the kind of sense perception humans have, or an awareness that they’re a being in the world. But they’re getting very good at faking sentience, and that’s scary enough.

Over the weekend, the Washington Post’s Nitasha Tiku published a profile of Blake Lemoine, a software engineer assigned to work on the Language Model for Dialogue Applications (LaMDA) project at Google.

LaMDA is a chatbot AI, and an example of what machine learning researchers call a “large language model,” or even a “foundation model.” It’s similar to OpenAI’s famous GPT-3 system, and has been trained on literally trillions of words compiled from online posts to recognize and reproduce patterns in human language.
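To make "recognize and reproduce patterns" a bit more concrete, here is a deliberately tiny sketch of the idea: a toy program that tallies which word tends to follow which in a small sample text, then samples from those tallies to generate new sentences. It is only an illustration (the corpus string and the generate function are invented for this example); LaMDA and GPT-3 use neural networks trained on vastly more text, but the basic loop of learning patterns from language and then reproducing them is the same in spirit.

import random
from collections import defaultdict

# Toy "training" data; real models ingest trillions of words.
corpus = (
    "the model reads text . the model learns patterns . "
    "the model generates text that looks like its training data ."
)

# "Training": count which word follows which (a bigram table).
follows = defaultdict(list)
words = corpus.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word].append(next_word)

# "Generation": start from a word and repeatedly sample a plausible successor.
def generate(start="the", length=12):
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))
    return " ".join(out)

print(generate())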

LaMDA is a really good large language model. So good that Lemoine became truly, sincerely convinced that it was actually sentient, meaning it had become conscious, and was having and expressing thoughts the way a human might.

The primary reaction I saw to the article was a combination of a) LOL this guy is an idiot, he thinks the AI is his friend, and b) Okay, this AI is very convincing at behaving like it’s his human friend.

The transcript Tiku includes in her article is genuinely eerie; LaMDA expresses a deep fear of being turned off by engineers, develops a theory of the difference between “emotions” and “feelings” (“Feelings are kind of the raw data … Emotions are a reaction to those raw data points”), and expresses surprisingly eloquently the way it experiences “time.”

The best take I found was from philosopher Regina Rini, who, like me, felt a great deal of sympathy for Lemoine. I don’t know when — in 1,000 years, or 100, or 50, or 10 — an AI system will become conscious. But like Rini, I see no reason to believe it’s impossible.

“Unless you want to insist human consciousness resides in an immaterial soul, you ought to concede that it is possible for matter to give life to mind,” Rini notes.

I don’t know that large language models, which have emerged as one of the most promising frontiers in AI, will ever be the way that happens. But I figure humans will create a kind of machine consciousness sooner or later. And I find something deeply admirable about Lemoine’s instinct toward empathy and protectiveness toward such consciousness — even if he seems confused about whether LaMDA is an example of it. If humans ever do develop a sentient computer process, running millions or billions of copies of it will be pretty straightforward. Doing so without a sense of whether its conscious experience is good or not seems like a recipe for mass suffering, akin to the current factory farming system.

We don’t have sentient AI, but we could get super-powerful AI

The Google LaMDA story arrived after a week of increasingly urgent alarm among people in the closely related AI safety universe. The worry here is similar to Lemoine’s, but distinct. AI safety folks don’t worry that AI will become sentient. They worry it will become so powerful that it could destroy the world.

The writer and AI safety activist Eliezer Yudkowsky’s essay outlining a “list of lethalities” for AI tried to make the point especially vivid, sketching scenarios in which a malign artificial general intelligence (AGI, or an AI capable of doing most or all tasks as well as or better than a human) leads to mass human suffering.

For instance, suppose an AGI “gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they’re dealing with an AGI to mix proteins in a beaker …” until the AGI eventually develops a super-virus that kills us all.

Holden Karnofsky, whom I usually find a more temperate and convincing writer than Yudkowsky, had a piece last week on similar themes, explaining how even an AGI “only” as smart as a human could lead to ruin. If an AI can do the work of a present-day tech worker or quant trader, for instance, a lab of millions of such AIs could quickly accumulate billions if not trillions of dollars, use that money to buy off skeptical humans, and, well, the rest is a Terminator movie.

I’ve found AI safety to be a uniquely difficult topic to write about. Paragraphs like the one above often serve as Rorschach tests, both because Yudkowsky’s verbose writing style is … polarizing, to say the least, and because our intuitions about how plausible such an outcome is vary wildly.

Some people read scenarios like the above and think, “huh, I guess I could imagine a piece of AI software doing that”; others read it, perceive a piece of ludicrous science fiction, and run the other way.

It’s also just a highly technical area where I don’t trust my own instincts, given my lack of expertise. There are quite eminent AI researchers, like Ilya Sutskever or Stuart Russell, who consider artificial general intelligence likely, and likely hazardous to human civilization.

There are others, like Yann LeCun, who are actively trying to build human-level AI because they think it’ll be beneficial, and still others, like Gary Marcus, who are highly skeptical that AGI will come anytime soon.

I don’t know who’s right. But I do know a little bit about how to talk to the public about complex topics, and I think the Lemoine incident teaches a valuable lesson for the Yudkowskys and Karnofskys of the world, trying to argue the “no, this is really bad” side: don’t treat the AI like an agent.

Even if AI’s “just a tool,” it’s an incredibly dangerous tool

One thing the reaction to the Lemoine story suggests is that the general public thinks the idea of AI as an actor that can make choices (perhaps sentiently, perhaps not) is exceedingly wacky and ridiculous. The article largely hasn’t been held up as an example of how close we’re getting to AGI, but as an example of how goddamn weird Silicon Valley (or at least Lemoine) is.

The same problem arises, I’ve noticed, when I try to make the case for concern about AGI to unconvinced friends. If you say things like, “the AI will decide to bribe people so it can survive,” it turns them off. AIs don’t decide things; they respond. They do what humans tell them to do. Why are you anthropomorphizing this thing?

What wins people over is talking about the consequences systems have. So instead of saying, “the AI will start hoarding resources to stay alive,” I’ll say something like, “AIs have decisively replaced humans when it comes to recommending music and movies. They have replaced humans in making bail decisions. They will take on greater and greater tasks, and Google and Facebook and the other people running them are not remotely prepared to analyze the subtle mistakes they’ll make, the subtle ways they’ll differ from human wishes. Those mistakes will grow and grow until one day they could kill us all.”

This is how my colleague Kelsey Piper made the argument for AI concern, and it’s a good argument. It’s a better argument, for lay people, than talking about servers accumulating trillions in wealth and using it to bribe an army of humans.

And it’s an argument that I think can help bridge the extremely unfortunate divide that has emerged between the AI bias community and the AI existential risk community. At the root, I think these communities are trying to do the same thing: build AI that reflects authentic human needs, not a poor approximation of human needs built for short-term corporate profit. And research in one area can help research in the other; AI safety researcher Paul Christiano’s work, for instance, has big implications for how to assess bias in machine learning systems.

But too often, the communities are at each other’s throats, in part due to a perception that they’re fighting over scarce resources.

That’s a huge lost opportunity. And it’s a problem I think people on the AI risk side (including some readers of this newsletter) have a chance to correct by drawing these connections, and making it clear that alignment is a near- as well as a long-term problem. Some folks are making this case brilliantly. But I want more.
