One day, Augmented Reality (AR) will live in our glasses or contact lenses, in the photons hacked on their way to our visual receptors, or embedded right into our brains. For now, it lives inside our iPhones and iPads. Apple seems determined to push the technology as fast and as far as possible on our current devices, so it's as mature as possible by the time we get to whatever comes next.
That suits me just fine.
A reminder of ARKit so far
ARKit is Apple's framework for understanding what the camera sees and tracking the position and orientation of that camera relative to the world around it. That includes surfaces, objects, and faces, and their movements and expressions.
ARKit 1.0, introduced just one short year ago this month, could handle single-user experiences and horizontal surfaces, as well as lighting conditions, and could render virtual 3D objects into the scene — using SpriteKit, SceneKit, or Metal templates — that felt almost like they belonged in the real world.
ARKit 1.5, which shipped earlier this year, included support for vertical and irregular surfaces, continuous autofocus, and relocalization so apps could resume from background.
ARKit 2.0, currently in beta, includes 4:3 video formats (think iPad); 2D image tracking and 3D object detection, so a photo can drag an augmentation with it, and a real-world model can come alive through AR; environment textures and mapping to make the augmented look more real; persistent experiences, so you can put that model away at home today, take it back out at school tomorrow, and continue right where you left off; improved face tracking, specifically for winking, gaze, and tongue — yes, tongue — movement; and… wait for it… multi-user experiences.
What's new in ARKit 2.0
There's a ton of cool tech in ARKit 2, including a new, standardized file format for AR called USDZ. It was launched in collaboration with Pixar and Adobe, so it should catch on.
The mesh for face tracking still looks totally sci-fi. ARKit lights it using estimates of intensity, direction, and ambient color temperature, and also provides spherical harmonic coefficients (yeah, I had to look it up, too) for apps that need them. The upshot is that rendered augmentations look like they fit into the real world around them.
Some find the new gaze-tracking creepy and the tongue detection a tad on the excessive side. I haven't explored the former enough to form an educated opinion on it, though I'll be doing just that before the fall. The latter is just plain fun, though. (It's also binary for now, so tongue in or out, no elaborate gymnastics just yet.)
Map saving and loading are what enable persistence and multi-person experiences. Instead of generating and then throwing away the 3D world map, ARKit now provides it to developers, complete with simple anchors and support for custom anchors, to keep and share.
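For the developers in the audience, here's roughly what keeping a world map around could look like. It's a minimal sketch rather than Apple's sample code: the `mapURL` location and the two function names are my own placeholders, while `getCurrentWorldMap`, `ARWorldMap`, and `initialWorldMap` come from the ARKit 2 API.

```swift
import ARKit

// Hypothetical location for the saved map; any writable file URL would do.
let mapURL = FileManager.default
    .urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("session.arworldmap")

/// Asks the running session for its current world map and writes it to disk.
func saveWorldMap(from session: ARSession) {
    session.getCurrentWorldMap { worldMap, error in
        guard let worldMap = worldMap else {
            // ARKit may not have mapped enough of the room yet.
            print("No world map yet: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        do {
            // ARWorldMap supports secure coding, so it can be archived to Data.
            let data = try NSKeyedArchiver.archivedData(withRootObject: worldMap,
                                                        requiringSecureCoding: true)
            try data.write(to: mapURL, options: .atomic)
        } catch {
            print("Could not save world map: \(error)")
        }
    }
}

/// Reads a previously saved map and restarts the session from it.
func restoreWorldMap(into session: ARSession) throws {
    let data = try Data(contentsOf: mapURL)
    guard let worldMap = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                from: data) else {
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = worldMap
    // Start fresh so the session relocalizes against the saved map.
    session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
```

Any anchors stored in the map come back with it, which is what lets an app put the virtual content right back where you left it.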
That multi-person though...
For multi-user, because the world map represents a single, real-world coordinate system, one person can initiate the experience and then share it with one or more other people. Each of them shares the same AR space from their own physical perspective. It helps that ARKit in iOS 12 is faster to initialize and has faster, more robust tracking and plane detection.
The experience is shared locally through multi-peer Bluetooth and Wi-Fi networking, so you're not relying on the cloud. That's great for both performance and privacy. You can use the internet when you want to, not because you have to.
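Here's a rough idea of how that local hand-off might be wired up with Apple's MultipeerConnectivity framework. Treat it as a sketch under assumptions: it presumes an `MCSession` that is already connected to nearby peers, and the two function names are hypothetical helpers of mine, not part of either framework.

```swift
import ARKit
import MultipeerConnectivity

/// Sends the host's current world map to every connected peer.
func shareWorldMap(from arSession: ARSession, over peerSession: MCSession) {
    arSession.getCurrentWorldMap { worldMap, _ in
        guard let worldMap = worldMap, !peerSession.connectedPeers.isEmpty else { return }
        do {
            let data = try NSKeyedArchiver.archivedData(withRootObject: worldMap,
                                                        requiringSecureCoding: true)
            // .reliable asks the framework to guarantee delivery of the data.
            try peerSession.send(data, toPeers: peerSession.connectedPeers, with: .reliable)
        } catch {
            print("Could not share world map: \(error)")
        }
    }
}

/// Meant to be called with the Data a peer receives in its MCSessionDelegate
/// callback session(_:didReceive:fromPeer:).
func joinSharedSession(with data: Data, in arSession: ARSession) throws {
    guard let worldMap = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                from: data) else {
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = worldMap
    // Restart tracking so this device relocalizes into the host's coordinate system.
    arSession.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
```

Nothing here touches a server. The map goes device to device, which is the whole point.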
There's also a fiendishly clever system for reflection mapping. While building the world map, ARKit also uses computer vision to build out a cube map or other representation based on scene texture. Machine learning is used to fill in any holes and create the "hallucination" of a complete map, as long as you don't look too closely. Reflection probes then detect objects and apply that texture when and as appropriate.
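Turning this on looks to be close to a one-liner. The snippet below is a minimal sketch: `runWithEnvironmentTexturing` is a hypothetical helper name, and I'm assuming the `.automatic` mode, where ARKit decides where to place the probes itself, rather than the manual alternative.

```swift
import ARKit

/// Starts world tracking with automatic environment texturing enabled.
func runWithEnvironmentTexturing(on session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    // Let ARKit create and update environment probes on its own.
    configuration.environmentTexturing = .automatic
    session.run(configuration)
}
```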
I had the chance to try the LEGO ARKit experience at WWDC and it was a ton of fun. It uses 3D object detection, which means the object needs to be both built in the real world and scanned into the digital one first. Those objects need to be textured, rigid, and non-reflective, at least for now.
The real-world LEGO building sat on a table but, with an iPad, I could crack it wide open, drop the Joker in, and watch as he promptly set it on fire. Then, another person I was playing with could call up The Batman, get him into the chopper, and rescue the clowns (?!) while a third positioned a firetruck to put the blaze out. I've built LEGO for years, including a pretty great Arkham Asylum set, and I'd like nothing more than to bring it to life with my godkids for a few hours every week.
I also played a bunch of the SwiftShot game, which is a code sample for ARKit. You have three catapults. Your opponent has three catapults. Last catapult standing wins. What made it so much fun, especially the tournament on the last day of WWDC, was that multiple people could join in and watch and, when that happens, it starts to blur the lines between augmented and reality.
More than just fun and games
Shared experiences are going to be transformative in everything from education, where a teacher can lead a classroom through augmented explorations and labs, to enterprise, where teams can collaborate on prototyping, production, repair, and iteration.
So, yeah, we're all still semi-awkwardly holding up our phones and tablets to do all this, for now. But, again, at some point in the near future we won't be. And Apple will have all this technology, and apps will have all of this content and these services, ready for us when that time comes.
Clever fruit company.