The realm of live-action 4D (3D + time) encompasses a vast swath of digitization potential, much of which is already being realized today: from simultaneous localization and mapping systems for autonomous robots and self-driving cars to stadium-sized systems for digitizing live sporting events.
At Digital Air we're interested in standards for a very small subset of that realm: digitizing live-action human bodies and faces in motion. This is because the human body and face are the two most important assets in storytelling. Initially, we're focused on an even smaller subset: open format standards for the organization and description of live-action 4D human body and face data for multi-vendor workflows. We're interested in workflow standards because we operate at the starting point of those workflows: image acquisition.
Multi-vendor workflows are important because specialization drives progress. 4D workflows are a series of specialized, rapidly evolving steps, running from our area of expertise, camera array photography, through photogrammetry, compression, pose estimation, autorigging, retopology, reanimation, lighting, and rendering.
Current solutions for digitizing photographically derived live-action 4D human content fall into two principal categories: high-resolution discrete sequential 3D models (useful for visual effects production, where bandwidth is effectively unlimited) and low-resolution temporally compressed streamable assets (useful for interactive devices, where bandwidth is constrained). At each step of both workflows, best practices evolve through collaboration across and within disciplines. Open standards are needed at each step so that vendors can understand the data they receive and describe the data they generate: format, resolution, compression and decompression, scale, origin, and camera parameters. Such basic workflow standards are needed for collaboration and competition to flourish, and for multi-vendor workflows and solutions to answer the needs of clients today and in the future.
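To make those descriptors concrete, here is a minimal sketch in Python of what a per-delivery manifest might look like and how a downstream vendor could check it on ingest. Every field name, value, and the validation helper here is a hypothetical illustration for discussion, not an existing or proposed standard.

```python
import json

# Hypothetical manifest for one multi-vendor 4D capture delivery.
# Field names are illustrative assumptions, not an established schema.
manifest = {
    "asset": "take_042",
    "geometry_format": "obj_sequence",       # one mesh file per frame
    "texture_format": "png_sequence",
    "frame_rate_fps": 30,
    "frame_count": 300,
    "scale_units": "meters",                 # world scale of the geometry
    "origin": [0.0, 0.0, 0.0],               # world-space origin of the capture volume
    "cameras": [
        {
            "id": "cam_000",
            "resolution_px": [4096, 3000],
            "focal_length_px": 5200.0,                    # intrinsics
            "principal_point_px": [2048.0, 1500.0],
            "position_m": [1.5, 1.7, 2.0],                # extrinsics
            "rotation_quat_wxyz": [1.0, 0.0, 0.0, 0.0],
        }
    ],
}

# The minimum a downstream vendor needs before ingesting the data.
REQUIRED_KEYS = {"geometry_format", "frame_rate_fps", "scale_units", "origin", "cameras"}

def validate(m: dict) -> bool:
    """Return True if the manifest carries the minimum ingest metadata."""
    return REQUIRED_KEYS.issubset(m) and all("id" in cam for cam in m["cameras"])

print(validate(manifest))          # True for the example above
print(json.dumps(manifest)[:20])   # the manifest serializes as plain JSON
```

The point of even a toy schema like this is that each vendor in the chain can verify what it is receiving and document what it is emitting without bilateral negotiation.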
Step by Step
Despite the relatively high data density and complexity of 4D data compared with traditional 2D motion pictures, simple standards, official or unofficial, that facilitate multi-vendor workflows should be relatively easy to define. Doing so will in turn fuel opportunities for growth and collaboration across the specialized processes of 4D content creation, editing, and playback.
"Stage One" (Workflow), "Stage Two" (Format)
After "stage one" standards for workflows are established, "stage two" standards for temporally compressed, streamable, and eventually playable (rigged and reanimatable) assets will also be needed. Animated textures derived from photogrammetry are expensive in terms of bandwidth. Multiple competing solutions for temporal compression of both geometry and textures are needed so that natural market selection can promote the best solutions for their targeted platforms and use cases. Vendors developing such solutions need content producers to deliver professionally and predictably shot, formatted, and documented data for ingest into downstream compression and rigging processes. Additionally, because so much research and software has already been developed around retopology, compression, and autorigging, "stage two" standards will need to let industry participants bring their innovations to a multi-vendor workflow while preserving their opportunity to be paid for their software. That will require an open community whose members respect, communicate with, and engage with one another. Getting there sooner rather than later is why I see the relationships and collaboration that "stage one" open workflow standards can foster, as described above, as the first step toward the more complex "stage two" open standards needed for cross-platform, resolution-independent 4D temporal compression codecs and rigged playable assets.
"Stage two" standard content needs to be as portable across platforms as JPEG images and MPEG-4 videos are today.
Where We Are Now
Most of the machine vision and machine learning work required for rigged, playable 4D assets is already at an advanced stage of development, particularly in academia. The question is not whether machine vision can bring real-world 4D human performance-based assets into computer graphics syntax, but how fast and efficiently the industry as a whole can do so in a way that is portable across easy-to-use workflows, platforms, and devices. The answer is directly correlated with how long it takes to establish workflow standards and processes for collaboration that enable the entire community to engage in producing solutions.
Open Standards vs. Open Source
Open standards are not the same as open source. It's important that IP that solves a specific part of a process can remain closed source and be monetized, to incentivize and reward its creation. If your software performs a useful function, from compression to retopology to autorigging to editing, you should not have to own a film studio or a tech platform to monetize it. Open standards will expand the number of participants in the industry, and the number of customers for your product, by defining how proprietary functions connect to the overall workflow. The resulting new and better solutions will make the overall technology more useful and expand the market for your product and everyone else's as well.
4D standards will naturally be an outgrowth of 3D standards, and 3D standards are already well established, for example at the Khronos Group, which is responsible for the COLLADA, KTX, OpenCL, OpenGL, OpenXR, glTF, Vulkan, and WebGL standards, and at the World Wide Web Consortium (W3C) with WebXR.
The Value of Collaboration
Computer graphics, like all technology development, is a collection of collaborative processes. Open standards play a critical role in how the disparate elements of those processes fit together. Standards for 4D will be needed before widespread adoption of 4D technology can take place. Until standards are established we will likely continue to see only the shoots and sprouts of all of the constructive things that will eventually be done routinely with 4D.
Digital Air has been awarded the first phase of an Epic MegaGrant to produce Rights Free 4D Human Datasets for Open Source 4D Research and Standards Development. The resulting datasets are intended to be useful for training machine learning systems for pose estimation, as well as for comparing results across different downstream workflows and codecs. Datasets allow researchers to design and test new methods and experiment with new technology on real-world data before making the capital investment necessary to record real-world data themselves. This is particularly important for software researchers and developers who have little interest in camera arrays beyond the data they produce. Our aim with the datasets is to facilitate research, collaboration, and conversation between industry participants, and to help develop a framework for open standards for data recording and compression that includes provisions for existing and future IP rights in the space.
A discussion of the current status of the MegaGrant and the process of defining its contents and scope follows in the next blog post.
Update May 19, 2021
We've submitted a New Initiatives Proposal to the Khronos Group. If you would like to be included and are not listed, please contact us. Khronos Exploratory Groups can be joined later by non-members if the NIP is accepted, so there is no urgency to get on the list unless you wish to show support at this early stage or contribute resources.