How AI and Robotics Are Driving Next-Gen Collaboration

Depending on the generation in which you grew up, the topic of cutting-edge video collaboration might evoke such images as a hard-boiled detective talking to his watch, or a lazy sprocket maker being surprised by his tyrannical boss on a giant screen. And, though the portrayals of these technologies were way ahead of their time, they’re not that far off from what’s possible now. 

These days, we live in a world that’s so used to the video conferencing that was thrust upon us during the pandemic that, unless otherwise specified, “let’s get on a call” implies the use of video. It’s expected that video will be used in everything from the job recruitment process to regular meetings. And, this extends beyond business powwows to everyday interactions between friends and families of every generation. 

For instance, it is increasingly commonplace not only to artistically express ourselves on YouTube (and other creator platforms), but also to routinely embed various other video solutions into our lives. These include sending grandparents video messages of their grandkids opening presents, personally welcoming new patrons to our businesses and even receiving paid messages from celebrities.

Video is also a huge component in surveillance, which features cameras built into friendly robots that amble around, vacuum your house, or even fly throughout your home or business. And while these security solutions are certainly worthy of their own separate discussion, they highlight a powerful trend: the future of video conferencing will be defined by an amalgamation of artificial intelligence and robotics. 

Owl Labs’ birdlike device uses AI to process 360-degree audio and video, integrating remote colleagues and hybrid workers. Image: Owl Labs

Getting Everyone in the Picture 

One of the pitfalls of video calls is that while they may be fine for facilitating one-to-one virtual encounters, they tend to come up short when attempting to accommodate too many people. Inevitably, folks are required either to squish unnaturally close together or to place the camera so far back that they sound and appear distant. 

Instead of forcing us to crowd around the smaller screens of our phones and laptops, it makes sense that companies would look to our TVs, the audio-visual hub in our homes. 

Back in 2012, Biscotti created a camera (shaped like the hard Italian pastry for which it was named) to place atop the family TV set. For context, flatscreen TVs had recently become a staple back then, and Netflix had launched its ubiquitous streaming service only several years prior. Needless to say, families had become very much attached to their televisions. But as freeing as it was to be able to video chat with the whole family (which in my case at the time included a newborn and a toddler) while casually sprawled out on the couch and floor in front of the TV, the system was a little clunky in terms of interoperability and perhaps a bit ahead of its time. 

Fast forward to the present day: Boston-based Owl Labs has solved the camera-placement problem for businesses with an innovative 360-degree camera mounted atop a base unit that includes speakers and microphones. The highly portable unit, now in its third generation, looks like a wingless owl and is meant to nestle down in the middle of a conference room table. It can be used independently or paired with the company’s latest offering, a soundbar with a built-in camera and microphone. Whether the devices are used on their own or configured together, the solution’s secret sauce is how the AI parses out who’s speaking and syncs that to the video displayed. The result is that the meeting’s in-person contingent can sit around the table and converse naturally. From there, remote colleagues can view everyone at once in a super-wide strip or featured individuals in close-up panels based on who is speaking. 

Insta360’s Link webcam offers 4K resolution, AI-powered tracking and gesture controls. Photo Courtesy of Insta360

If you work from home, you probably don’t need a 360-degree web camera; but Insta360—a company known for developing 360-degree action cams—makes a forward-thinking webcam you should know about. The Link uses a relatively large 1/2-inch sensor to capture Ultra HD (4K resolution) video. Thanks to that sensor size, the camera is quite forgiving even in poor lighting conditions. It’s able to focus quickly, which minimizes or eliminates blurring during calls. And, unlike traditional fixed-position webcams, this one rocks an AI-powered, three-axis gimbal to automatically track, zoom and keep you in the frame no matter how much you’re moving around. Combining those capabilities, the Link has several useful party tricks up its sleeve, including modes that focus on your physical desktop (a bird’s-eye view, to highlight hand drawings and unboxings), shoot in portrait (for social media) and respond to gesture controls (when you’re standing away from your desk for a presentation). 

Is This the Droid You’re Looking For? 

Back in 2010, when an episode in the fourth season of “The Big Bang Theory” featured “ShelBot,” a robotic proxy for the show’s main character Sheldon, millions of nerd-loving fans were introduced to the concept of telepresence. Since then, many advancements have taken place, but the concept remains the same: an elevated mobile tablet on wheels that an off-site controller can use to see and hear—and likewise, be seen and heard by—folks who happen to be at that location. 
 
Although this technology presents strong use cases for healthcare, distance learning, senior care, manufacturing and warehouse management, it can also be used more generally for telecommuting. Instead of having to spend excessive time or money on traveling—or even scheduling an intricate series of meetings—a remote manager could casually roam a facility and engage with employees in a more organic way, without leaving their base of operation. 

Case in point, OhmniLabs’ Ohmni stands 4 feet 8 inches tall and uses 30-watt brushless motors to roll its steady trio of six-inch wheels at up to 2 miles per hour. (It’s not self-balancing, which helps conserve battery power.) With a 130-degree tilting screen, its 4K camera can view angles from floor to ceiling and capture still images during calls. Because the robot weighs just 25 pounds and folds, it transports and stores easily. It’s able to provide up to five hours of call time, during which it can be controlled by an authorized operator on a phone, tablet or computer using an encrypted browser connection. And, when it needs to recharge, the robot uses computer vision to automatically find its dock. 

Adding a Third Dimension 

As opposed to Owl Labs, which is deftly translating real people in 3D settings onto 2D screens, Swiss startup Copresence enables the creation of photorealistic 3D avatars. Within its app, you can scan your face from several angles. Then, the app’s AI morphs those scans into a surprisingly realistic representation of your head, complete with eye movements and facial expressions, that can then be dropped into a variety of gaming and mixed reality settings, offering a potentially wide gamut of applications. 

Additionally, some key companies have been driving the development of a glasses-free 3D video experience. Dutch company Dimenco has partnered with tech giants such as Asus, Dell, Microsoft and Intel to forge its Simulated Reality (SR) technology. Having witnessed it firsthand at CES 2023, I can say that this tech was uncanny in its ability to bring new depth to the video conferencing experience. Dimenco’s version uses two small cameras built right into the laptop monitor’s bezel, whereas Google’s Project Starline, a new version of which was just revealed at this year’s I/O event, utilizes three strategically placed external cameras (and some fancy software) to bring an added sense of realism to video calls. 

It’s All About the Hardware 

Instead of purely software-based solutions, Wehead sees hardware driving the future of video conferencing. They’ve developed a fully functional robotic head prototype with multiple screens around the “face,” meant to mimic the spatial sense and organic movement of a human head. And, while the initial product’s deconstructionist aesthetic may be a bit polarizing, the biopsychological reasoning behind the design is solid. 

With multiple screens on a robotic noggin, Wehead incorporates three dimensions and expressive motions. Image: Wehead.

According to a 2021 Stanford study, a few of the reasons we may find videoconferencing so fatiguing over long stretches include excessive eye contact, reduced mobility and increased cognitive load. “We must work harder to send and receive nonverbal cues that are crucial for expressing emotions, building rapport, and creating a connection,” says Wehead’s Founder and CEO, Ilya Sedoshkin. “With Wehead, there is no need to stare at the screen constantly. Both conversation participants have the freedom to move their heads naturally, enabling nonverbal communication.” 

In other words, there’s something freeing and ironically human about a robotic noggin that twists and bobs based on a colleague’s actual head movements. With a turn of their head (in real life), that remote colleague can rotate the Wehead (including the camera mounted at its top) wherever it’s located, allowing them to look around the room they’re calling into. Moreover, those who are in the room can mentally relax a bit by not having to continuously stare so intently at a flat screen while trying to ignore their own images. (To get a better sense of how the device works and feels, go to the Wehead site, scroll down and click on “Start host demo” or “Start guest demo.”) 

As Sedoshkin points out, “Currently, we see that many jobs primarily based on intellectual function can be done remotely. In the near future, jobs that involve communication will also be remote, such as work in concierge services, psychotherapy, HR, education, entertainment, and others.” He also hints that the next version they’re developing will even further enhance the feeling of a remote person having a physical presence. 

Holography: The Final Frontier 


Realistic full-body holograms have always felt like the ultimate endgame for remote communications. With scores of examples throughout the science fiction genre, communicating via 3D holograms has been a collective fantasy for decades. And right now, that reality may be closer than you think. 

Corporate communications stalwart Webex has developed Hologram, a collaboration system that uses a 12-camera array on one end to transmit a 3D image to a colleague wearing virtual reality goggles on the other end. Holoconnects and Proto each offer large projection boxes, roughly 88 inches tall, in which a life-sized person (broadcasting from a remote studio) can appear without the audience needing special goggles. (Proto also offers a smaller version that’s about a fourth the size, geared more towards tabletop presentations.) And DVE, a self-proclaimed pioneer in augmented reality presence, forgoes the box approach in favor of a large translucent panel on which to project its holographic images. 

But, while we may see some broadcasters, conference programmers, concert producers and high-end retailers using holography solutions, the technology is still pretty inaccessible and prohibitively expensive for most of us. At least for the moment then, we may just have to settle for using such familiar 2D video conferencing solutions as Zoom, Skype, Teams, and Meet. But the future of video collaboration feels (at least figuratively) within reach. 

SUMMARY 
Video is no longer merely a medium by which we passively consume entertainment in our living rooms. Increasingly, it’s being integrated as a method for communicating in business and expressing ourselves in leisure. This deep dive into video conferencing innovations past, present and future includes an all-seeing birdlike device, three-dimensional avatars, an expressive multi-screened humanoid head, and the holography that’s starting to peek over the not-too-distant horizon.