In a stark white browser tab, Sam — a young blonde woman with perfectly shaped lips — asks me for the solution to 2+2. I immediately think of the infamous Star Trek: The Next Generation episode in which a tortured Captain Picard is shown four lights. If he admits there are five lights, the ordeal will stop. “There are four lights!” Picard shouts defiantly. Of course, I’m not being tortured. I’m at home, staring at the future face of the metaverse and trying valiantly not to think about memes from a TV show known for its exploration of ethics and humanity.
Sam isn’t a real person — she’s a digital human created by Auckland-based tech company Soul Machines. Designed to have a short conversation with visitors about herself, she runs on a proprietary “digital brain” and studies my expressions via webcam. At one point Sam asks me to smile but can’t seem to register my biggest, brightest “cheese.” I wonder if I’m just bad at emoting. When she asks if I know what autonomous animation is, I respond “No, but you’re about to tell me.”
“Good answer!” Sam chirps. “You should come and do my job!” She then explains that it means her speech and action aren’t pre-recorded — she can respond to every moment like a natural interaction. Digital people running on Soul Machines’ dystopian-sounding Humans OS 2.0 can use their hands and will one day be able to use full-body motions. “Does that make me more relatable to you?” she asks.
When we think of the metaverse, popularly depicted as a game-like virtual environment where humans use avatars to live, work, and play, we’re often the main characters. But it’s Soul Machines which will be filling the metaverse with what co-founder Greg Cross refers to as a “digital workforce” — a stream of bespoke Sams who will form the equivalent of NPCs in nascent digital worlds, as well as extensions of ourselves.
“When we’re playing a game, we adopt a certain persona or personality, when we’re coaching our kids’ football team we adopt another persona, we have a different personality when we’re at the pub having a beer with our mates,” Cross explains. “As human beings, we’re always adjusting our persona and the role we have within those parameters. With digital people, we can create those constructs.”
Right now, Soul Machines mostly makes digital people for customer service and public outreach, but they’ve also worked with will.i.am and Carmelo Anthony; in will.i.am’s short promotional video for his digital twin, the Black Eyed Peas rapper observes a pimple that Soul Machines replicated on his face. The company has digital people repping the World Health Organization, Maryville University, Westpac bank, the New Zealand police, and SK-II skincare. Ruth is a digital baking coach who works for Nestlé. The company has been doing this for years, starting with BabyX in 2013 — a prototype AI that remains the core of its research arm. Cross’ co-founder is engineer Mark Sagar, who had an Oscar-winning digital effects career in Hollywood (including a stint at Weta Digital) before returning to the University of Auckland to create BabyX, which is modeled after his daughter.
There are, naturally, other players in the digital person industry, like the AI Foundation (AIF), which also boasts a mixed team of scientists and entertainment veterans, including AI-driven game creator Lars Buttler (Trion Worlds), visual effects artist Rami Hachache (creator of fake Quibi celebrity Kirby Jenner), and Hollywood executive Joe Drake from Lionsgate. AIF’s website offers even less insight into the nuts and bolts of its tech, aside from a similar general message about “bringing the potential of AI to everyone.”
Soul Machines stands apart for two reasons. Through Sagar’s groundbreaking research, it’s a pioneer in bringing hard neuroscience to the art of creating digital humans, which are a staple of the Hollywood special effects arsenal. Second, it’s already got digital people working in the field with “teachable” digital brains born of the work done with BabyX, who is now a toddler; the company can’t release a full list of its digital laborers without each client’s consent. Yet, poring over the company’s white paper doesn’t yield much more than an extended summary of how the digital brain animates cutting-edge CGI to allow its digital people to adapt to real-time interactions. At the moment its digital people still require guidance from a human “trainer.” Soul Machines’ end goal is to teach a digital person how to make goal-based (and one day, value-based) decisions, which is still a long way off.
“At some point in the future,” says Cross, “you might be able to create a digital version of yourself or multiple versions of yourself, and they can go out and do stuff, make money for you, make money for your company, while you’re doing something else that’s a whole lot more fun.”
My first thought is that this is going to be an absolute field day for MMORPG botters farming for resources. But Cross goes on to suggest using a digital person to play a game like Call of Duty. “Those types of digital people are what we call human-enabled or human-driven digital people, they’re mimicking — they’re under the instruction of real people,” he explains.
Call of Duty is a curious choice. It’s a skill-based multiplayer game where cheating is an open problem, and one where bragging rights hinge on having flesh-and-blood opponents. “A large part of the appeal of multiplayer games is playing another human, and knowing that you are matching wits and reflexes with another human brain,” says Mark Johnson, who studies digital culture and emerging forms of labor at the University of Sydney. “In this regard I’m not sure how digital people would be different from just a really strong AI … and while those are interesting, it’s telling how little people play against AIs when human players are available, no matter how savvy the AI competitor.”
Most MMOs don’t allow bots or third-party services; historically, one of the most straightforward ways to identify a botter in World of Warcraft was to strike up a conversation with a suspicious player and see if they would respond like a real person. “There are many obvious reasons for these sorts of rules and ‘digital people’ would massively upset these and force us to profoundly rethink ideas around play, work, fairness, the use of our time,” Johnson adds. “If it’s just a competitive multiplayer game, or even a single player game, the entire point is to play the game.”
Beyond games, Soul Machines has loftier aspirations for its digital people, which start to veer toward the kind of excitable Panglossianism that colored the early internet years. Cross is quick to clarify that Soul Machines wants to do good, and the company has publicly stated that it won’t let politicians use its services, in order to avoid amplifying extremism. “It’s important that companies and people learn from each generation of technology we’ve created and think about how we want to do things differently going forward,” Cross says, pointing to healthcare and education as classic examples where infrastructure and resources are a problem.
“We don’t see digital people replacing healthcare professionals and teachers, we see a means to augment and amplify them,” Cross says. “We just don’t attract people into those positions at the moment based on what we’re prepared to pay them.” Where a less advantaged person may not be able to afford one-on-one time with a doctor or teacher, Soul Machines’ digital people — each programmed and trained in the relevant field and deployed at scale — would theoretically allow anyone to experience a greater degree of personalized care (which still requires an internet connection and webcam). He goes on to explain that in a hypothetical banking situation, users might feel more comfortable talking to a digital person about their personal finances. “We have a lot of hard data now that there are many people in many interactions who prefer to speak to digital people over real people.”
Virtual and augmented reality are already being used in teaching environments, but the idea of using digital people to supplement a uniquely messy human experience is understandably controversial. “Our children learn how to be social, emotional, ethical, complex, ambiguous creatures in the world partly by watching what the adults in their lives model for them — and they are watching teachers for something like 30 hours per week for 12 years of their life,” says Nick Kelly, who specializes in design education and cognition at the Queensland University of Technology. He acknowledges that technology can improve schooling, but it’s a complex issue, and the idea of students using AI isn’t inherently dangerous.
“It doesn’t matter how well animated the ‘digital person’ is, how high their resolution is, or how well-programmed they are to provide ‘individual attention,’ they are unable to replace human teachers who have a monopoly on knowing how to be human — now and in the foreseeable future,” Kelly explains, pointing out the dangers of letting a for-profit company, working with governments, determine the shape of entire education systems.
Introducing digital people into the healthcare system, especially given tech’s notoriously poor record on digital privacy, raises a whole new set of problems, like whether it’s ethical to let users choose how a digital worker looks or behaves. Soul Machines’ Humans OS 2.0 platform can create a digital person in real-time; at best this means a Black user might feel more comfortable speaking to a Black digital person, and at worst it could mean that racists could opt for an artificial environment of white customer service. Women would be able to choose to speak to a female-presenting digital person about bra fittings or puberty problems, which could theoretically be nice for teens who don’t want to do that with a parent. But these options, for all their business-centric benefits, could potentially feed existing prejudices and create a false sense of reality.
This new digital workforce would also operate under a specific set of conditions. “When we create a digital brand representative for a big enterprise, of course, they’re not going to have the ability to express negative emotions,” Cross explains. “We’re expected to behave in a way that is consistent with the values of that particular role.” It’s a troubling standard when you consider how Big Tech dehumanizes its workers today, with Amazon leading the pack. While digital humans don’t need rest (or basic compassion), a corporate “positive vibes only” mandate means changing the way we perceive and interact with “customer service” even as it wears a human face.
“People are more aware of what data is getting captured and how it’s getting captured and how it’s getting used,” Cross says when I ask him about the ethics of metaverse construction, especially in the wake of the era when Google’s slogan was still “don’t be evil.” “Every wave of new technology has been used to make a huge difference in the world, in terms of productivity, democratization, our ability to travel… technology has always been used by most of us to do incredibly good things and by a few of us to do the things that aren’t very nice or simply plain evil. That is a reflection of the human condition.”
It’s an interesting ideal to hold up in an age where Facebook is being rightfully pilloried as an “indisputable” source of harm to its millions of users, especially when you consider its origins as a glorified “hot or not” for Mark Zuckerberg’s fellow Harvard students. Over at The Atlantic, Ian Bogost points out that tech-driven metaverse buildup is very much “a fantasy of power and control” that twists and romanticizes a sci-fi concept into an equally twisted escape route for billionaires who don’t need to contend with the reality of their capitalist legacies.
As we wade into more metaverse hype, it’s clear that companies like Soul Machines will be filling in customer service gaps across every industry, from games to healthcare. In fielding my questions, Cross exudes the serene patience of a man who knows that his work will take years, if not several decades, to really take off. “I would argue that the human touch is already being lost (something we see as accelerating),” he says in a follow-up email, pointing to the rise in transactional apps like the kind we see in online banking. “This is exactly why we see empathetic customer experiences as being a very important part of connecting with people in the future.”
But if digital people are going to form the backbone of this bold new universe that transcends the physical world, their evolution in games should invite the same level of scrutiny as it would in other areas like healthcare and education. Gaming isn’t just something you do to kick back and relax anymore — it’s a billion-dollar industry with whole ecosystems of developers, artists, producers, voice actors, QA professionals, marketers, and streamers. Microtransactions, “gamblification,” and pay-to-win features are often a big part of multiplayer and mobile games. Fortnite’s in-game purchases, for instance, reflect a very specific vision of a functioning metaverse economy that relies on cosmetic skins and topical pop culture. Big games are a distinctly corporate machine, and using digital people to humanize an already hyper-aggressive capitalist environment could have dire social and cultural consequences.
“The ‘blockbuster’ games industry has not exactly been renowned in recent years for its ethical practices, whether we’re looking at microtransactions, games as a service, loot boxes, skin betting, or whatever else,” says Johnson. “All of these phenomena have brought along major ethical questions that are only now really beginning to be interrogated and dealt with (research and legislation always, sadly, come after new developments) … I honestly can’t believe the big-money parts of the games industry are going to do anything ethically, frankly, unless they explicitly prove to us otherwise.”
Soul Machines’ work also raises questions about how digital people would work across different metaverses in an industry where console companies are notoriously insular; for instance, if you had a digital person assistant, how would they traverse seamlessly through different game franchises and IPs and retain a sense of continuity? How would a simulated human presence affect our relationship with games as a fantasy escape, a livelihood, a relaxing hobby, or all of the above? What kind of tensions would arise in the gaps between people who can afford digital personae and those who can’t or won’t use them?
Limiting digital people to NPC roles seems a benign enough start. “Creating big multiplayer games is a huge cost. It’s a huge amount of investment … all of the different animations, and all of the different scenarios that could play out,” says Cross. “What we’re looking at in autonomous animation is digital people that can respond in real time based on what they see, what they hear, and what’s going on around them.” He offers an example: more adaptable villains that lend a more lifelike sense of immersion.
While digital people aren’t murderous terminators here to take over the world, they’re already being used as a building block in our slowly (but surely) evolving new meta-world. “A digital workforce has more utility as the metaverse evolves, as it becomes an economy or a series of economies and industries and businesses,” Cross says. “So that’s why we think the sort of animation that we’re doing, the autonomous animation we’re doing with digital people is really, really going to be a core piece of metaverse construction.”
If we’re thinking about a workforce in terms of sheer utility, it makes sense to use the hell out of an omnipresent AI with the capacity to simulate human behavior. But given the cost of reaching that point — a distant point where digital people are indistinguishable from the real deal — it seems like they can only be used to turn profits, as vehicles for capitalism on steroids. Examining digital people holds a mirror up to our own faces: what do we want to see in our workforce and labor, and how do we see those doing the same work right now? I probably won’t be alive to see Soul Machines reach the peak of its journey, nor will Cross, but something tells me Sam — at least a small part of Sam — will.