Behind The Scenes: The Technology
Commencement is a communal moment that isn't simply watched, but experienced. Unfortunately, this year's physical distancing poses a challenge for how we involve graduates.
Social media, screen sharing, and TV broadcasts, even at today's unprecedented scale, allow students to watch their graduation but not to take part in it.
For that reason we created Volaroid. We used 3D game engines to prototype and develop an AR (augmented reality) story, and volumetric photo capture technology to recreate each graduate and insert them into the cinematic storytelling experience.
By generating avatars of both MIT President L. Rafael Reif and the graduates themselves, the app turns the graduating student's would-be experience of receiving their diploma in Killian Court into a virtual production to be enjoyed from their own home and shared with the world.
A Three-Step Process
Like other cinema, Volaroid was created in a three-step process. We built a set, produced its characters, and added effects to transform our idea into a story.
THE ENVIRONMENT
Building an immersive set requires fleshing out its two core parts: its audio and visuals, and its rules and logic. Together these define how it feels and how it behaves. To create the immersive environment we turned to the popular game engines Unity and Unreal, platforms used to create cinema, games, and virtual and augmented reality experiences.
In the prototype phase, we developed a Unity scene in which one stock human model walks towards another until they finally meet and shake hands. This idea encapsulated what the experience could be: a graduate, or their avatar, receiving their diploma in a Covid-struck world.
Let's reverse-engineer this demo into its elements to understand how virtual experiences are created. The demo above has two cohesive modules: first, the scene's models or assets, which are generally modeled and rigged outside the game engine; and second, the environment itself, meaning its physics and scene lighting, which lives in the game engine.
Avatar assets are created in two stages: they are first modeled and then rigged. Models are your scene assets; every object or character with a likeness is a model. Rigging is the process of adding a skeleton and joints that allow those models to move. In this case, we took two human models from Sketchfab, rigged and animated them in Mixamo, and finally imported the full assets into Unity, where we added colliders to allow the two characters to interact with other game assets.
To take a familiar example: to create the classic Mario video games, professional game designers would first need to design and model Mario in 3D modelling software and rig him with a skeleton that allows his knees to bend when he jumps and extend when he falls. With this complete, the model is ready for the engine.
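To make the idea of a rig more concrete, here is a minimal Python sketch of forward kinematics on a two-bone leg: each joint angle rotates everything below it, which is how bending a knee moves the foot. It is only an illustration. The function name, bone lengths, and angles are hypothetical and are not taken from Mixamo, Unity, or our project files.

```python
import math

def forward_kinematics(bone_lengths, joint_angles_deg, root=(0.0, 0.0)):
    """Return the 2D position of every joint in a chain of bones.

    Each angle is relative to the parent bone; 0 degrees points straight down.
    """
    x, y = root
    heading = 0.0  # accumulated rotation inherited from parent joints
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles_deg):
        heading += math.radians(angle)
        x += length * math.sin(heading)
        y -= length * math.cos(heading)
        positions.append((x, y))
    return positions

# Hypothetical leg: thigh and shin of 0.5 units each, hip at height 1.0.
standing = forward_kinematics([0.5, 0.5], [0, 0], root=(0.0, 1.0))
crouched = forward_kinematics([0.5, 0.5], [45, -90], root=(0.0, 1.0))

print("standing hip, knee, foot:", standing)
print("crouched hip, knee, foot:", crouched)
```

With the knee bent, the foot ends up higher and still directly under the hip, which is the kind of pose an animator keys when a character jumps or crouches.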
When creating virtual worlds, it is important to consider how believable and familiar the digital world feels compared to the real one. The game engine's physics and lighting systems contribute to the immersive experience and convince us that an object is real and natural.
Every object that moves casts a shadow that conveys depth and movement, as you can see in the model below.
Lighting is also very important for making scenes realistic. It can give a scene, or even a whole game, an ominous feel or a glorious one, much like how our favorite times of day have so much to do with how much light there is outside.
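As a rough illustration of why the direction of light changes the mood of a scene, the Python sketch below computes Lambertian (diffuse) brightness, one of the simplest shading models: a surface is brightest when the light hits it head-on and dims as the light becomes more grazing. Game engines use far richer lighting than this, and the function names and light directions here are assumptions made for the example.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert_intensity(surface_normal, direction_to_light):
    """Diffuse brightness in 0..1: the cosine of the angle between the
    surface normal and the direction to the light, clamped at zero."""
    n = normalize(surface_normal)
    l = normalize(direction_to_light)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

ground_normal = (0.0, 1.0, 0.0)  # flat ground facing straight up

# Hypothetical sun positions: overhead at noon, low on the horizon in the evening.
noon = lambert_intensity(ground_normal, (0.0, 1.0, 0.0))
evening = lambert_intensity(ground_normal, (1.0, 0.2, 0.0))

print(f"noon: {noon:.2f}, evening: {evening:.2f}")
```

The low evening sun lights the ground much more dimly than the overhead noon sun, which is one reason the two times of day feel so different.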
The physics of a game engine determines the interactions between all the elements in the world and provides much of the scene's realism. As the graduate walks, gravity anchors them to the ground. We simulate the physics of the regalia cloth so that it sways and moves as the graduate walks.
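To show what "gravity anchors them to the ground" means in practice, here is a small Python sketch of a physics step: every frame, gravity changes the velocity, the velocity changes the height, and a ground check stops the object from falling through the floor. Real engines such as Unity and Unreal do this with full rigid-body and cloth solvers; the constants and function names below are assumptions chosen for the example.

```python
GRAVITY = -9.81  # m/s^2, acting along the vertical axis

def step(height, velocity, dt, ground=0.0):
    """Advance one frame: apply gravity, move, then clamp to the ground."""
    velocity += GRAVITY * dt
    height += velocity * dt
    if height <= ground:
        # Simplest possible collision response: rest on the ground.
        height, velocity = ground, 0.0
    return height, velocity

def simulate(height, seconds, dt=1 / 60):
    """Drop an object from rest and return its state after `seconds`."""
    velocity = 0.0
    for _ in range(int(round(seconds / dt))):
        height, velocity = step(height, velocity, dt)
    return height, velocity

# Hypothetical drop from one meter, simulated at 60 frames per second.
final_height, final_velocity = simulate(height=1.0, seconds=2.0)
print(final_height, final_velocity)
```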
Move around and explore the mechatronic model. Notice the shadows and lighting effects.
RAMPING UP TO PRODUCTION
While Unity is easily accessible for beginners and great for prototyping in small groups, we needed a platform with more open access to its source code so that a large team could design our own avatar creation pipeline. Therefore, in the execution phase, we used Unreal Engine. Unreal provides direct access to its source code, allowing custom implementations of libraries and other tools. Additionally, we found Unreal's graphical capabilities to be unparalleled by other game engines. It is often used in Hollywood as a regular tool of modern film-making, creating photo-realistic CGI scenes for productions such as the Star Wars series The Mandalorian and Finding Dory.
THE CHARACTERS - AVATAR GENERATION
To bring the graduate into this experience, we needed to consider how the avatar could be customized to create a real proxy for the graduate themselves.
Universality and accessibility were essential considerations for the project. Here we needed to consider two parts of the character: its head and its body. Design considerations ranged from the ability to recreate avatars for all facial structures and features to including options for students who can't walk.
After experimenting with various facial reconstruction approaches, we identified a few baseline requirements. The solution would need to take a single frontal photo of the user and recreate a 3D mesh of their head quickly, accurately, and performantly. Running on a mobile phone meant that we would need to leverage the cloud to do this. Our custom solutions were able to generate accurate face meshes, but extrapolating from them to hair and head structure was difficult.
To accomplish this, we partnered with ItSeez3d, whose team has years of academic research in computer vision and machine learning behind it. We built a custom Unreal Engine plugin integration for their cloud facial reconstruction software. It takes the input image, isolates the face, and reconstructs a face mesh. Using machine learning, it modifies existing head meshes to generate an accurate head mesh. Drawing on a large existing database of hair models, the software identifies the closest hair model, modifies it, and applies the appropriate texturing and coloring to the hair. With this, we have an accurate, performant avatar reconstruction pipeline.
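The "closest hair model" step can be pictured as a nearest-neighbor lookup. The Python sketch below is purely illustrative and is not ItSeez3d's actual method, which we do not describe here: it assumes each hair model is summarized by three made-up features (length, curliness, volume) and picks the library entry with the smallest Euclidean distance to the features estimated from the photo.

```python
import math

# Hypothetical hair library: name -> (length, curliness, volume), each in 0..1.
HAIR_LIBRARY = {
    "short_straight": (0.2, 0.1, 0.3),
    "long_wavy": (0.8, 0.5, 0.6),
    "medium_curly": (0.5, 0.9, 0.8),
}

def closest_hair_model(features, library=HAIR_LIBRARY):
    """Return the name of the library entry nearest to `features`."""
    return min(library, key=lambda name: math.dist(features, library[name]))

# Hypothetical features estimated from a user's photo.
match = closest_hair_model((0.75, 0.4, 0.5))
print(match)  # long_wavy
```

In a pipeline like the one described above, the matched model would then be adjusted and recolored to fit the individual rather than used as-is.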
You've probably used face meshing like ARCore's on Instagram, Snapchat, or TikTok, where face filters and various effects are applied on top of the face mesh. Volaroid takes a similar capability and uses it to reconstruct your face.
Effects (FX) and Scoring
To create the cinematography that turned the experience from a proof of concept into an immersive story, we added other models to give the experience an MIT setting and an originally composed score to immerse the audience aurally. You can read more about the score here. To move between visual sets, we used Unreal Engine's Niagara particle systems to create seamless transitions and a feeling of magical realism.
PLACING THE SCENE
Just as stories require a printing press and films require TV and broadcasting, augmented reality experiences require their own setup. They need technology for placing the app in the real world, and AR libraries like ARKit and ARCore use either marker-based or markerless tracking to anchor the experience to a fixed spot. In the prototype shown above, we used the MIT logo as a marker to center the world before starting the environment. Markerless technology works differently: when the device detects a reasonably level plane, the environment is anchored and fitted to the space. Using the phone's gyroscopes and camera-based computer vision feature mapping, the phone keeps the scene anchored even when the user moves. The app's final version does not require any kind of marker. You can open it up in your living room, on a kitchen counter, or even outside on your lawn.
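As a simplified picture of what "a reasonably level plane" could mean, the Python sketch below checks whether a detected surface's normal points close enough to straight up. ARKit and ARCore handle plane detection internally and expose it through their own APIs, which this sketch does not use; the function name and the 10-degree tolerance are assumptions made for illustration.

```python
import math

def is_level_plane(normal, up=(0.0, 1.0, 0.0), max_tilt_deg=10.0):
    """Return True if the plane's normal is within `max_tilt_deg` of `up`."""
    dot = sum(n * u for n, u in zip(normal, up))
    norm = math.sqrt(sum(n * n for n in normal)) * math.sqrt(sum(u * u for u in up))
    # Clamp to guard against floating-point values just outside [-1, 1].
    cos_tilt = max(-1.0, min(1.0, dot / norm))
    tilt_deg = math.degrees(math.acos(cos_tilt))
    return tilt_deg <= max_tilt_deg

# Hypothetical detected surfaces.
kitchen_counter = (0.05, 1.0, 0.0)  # tilted by about 3 degrees
wall = (1.0, 0.0, 0.0)              # vertical surface

print(is_level_plane(kitchen_counter))  # True
print(is_level_plane(wall))             # False
```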
In the span of a few weeks, our idea evolved into a prototype and finally matured into a story. A finished version will be released soon after graduation with even more fun-packed features. We can't wait for you to see and play with Volaroid.
KEYWORDS
#VR #AR #Mixed Reality #3D Modelling #Model Rigging #Unity #Unreal #ARCore
#Avatar Generation #OpenCV #Niagara Particle System #Colliders #Marker-basedAR #MarkerlessAR