Creating 3-dimensional worlds and objects from flat 2-dimensional photographs isn’t a new concept. It’s been happening for years. In the early days (and still often, today), we used photographs inside 3D modelling software as reference images for our 3D creations. A few years pass and along comes photogrammetry, allowing the automatic creation of 3D models and scenes from a couple of hundred photos. Photogrammetry often takes hours, but this new tech from NVIDIA can do it in seconds. And you don’t need hundreds of images in order to do it, either. Just a few dozen. The tech demo above illustrates the principle: NVIDIA reproduces an iconic image of Andy Warhol shooting with a Polaroid camera and, from just a relative handful of images, turns it into a 3D scene that a virtual camera can fly around with ease.

The new technology was revealed during a keynote presentation at NVIDIA GTC this week. Naturally, AI is what gets it to this level of performance, in the form of neural radiance fields (NeRF). It’s a relatively new technique, and NVIDIA’s implementation has been dubbed “Instant NeRF” thanks to the short amount of time it needs to synthesise the 3D scene. NVIDIA says it’s the fastest NeRF technique to date.
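
For a rough feel for what a NeRF actually is under the hood, here’s a deliberately tiny, hypothetical sketch in PyTorch. This isn’t NVIDIA’s code, and all of the names and sizes are made up for illustration – but the core idea is that a small neural network takes a 3D point and a viewing direction and spits out a colour and a density, and each pixel of a new view is rendered by compositing samples along its camera ray.

```python
# Minimal, illustrative NeRF sketch (not NVIDIA's code; names and sizes are made up).
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Input: 3D position + 3D viewing direction = 6 values per sample.
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # outputs: RGB (3) + density (1)
        )

    def forward(self, positions, directions):
        out = self.mlp(torch.cat([positions, directions], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # colour in [0, 1]
        sigma = torch.relu(out[..., 3:4])   # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, near=0.1, far=4.0, n_samples=64):
    """Volume-render one ray: sample points, query the field, composite."""
    t = torch.linspace(near, far, n_samples)       # depths along the ray
    points = origin + t[:, None] * direction       # (n_samples, 3)
    dirs = direction.expand(n_samples, 3)
    rgb, sigma = model(points, dirs)
    delta = t[1] - t[0]                            # spacing between samples
    alpha = 1.0 - torch.exp(-sigma[:, 0] * delta)  # opacity of each sample
    # Transmittance: how much light survives to reach each sample, front to back.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                        # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)     # final pixel colour

# Training boils down to rendering rays from the known camera poses and nudging
# the network so the rendered colours match the photographs.
model = TinyRadianceField()
pixel = render_ray(model, origin=torch.zeros(3), direction=torch.tensor([0.0, 0.0, 1.0]))
```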

The bit you want to watch is about the Omniverse Digital Twins, which starts at 51:04 in the video. The new technique offers speed increases of over 1,000x in some cases, with a model requiring just seconds to train on a few dozen still photos. It still requires some basic information about the cameras – like the angles from which the images were shot, just as regular photogrammetry does – but it can then render the 3D scene “within tens of milliseconds”.

A millisecond is a thousandth of a second. And “tens of milliseconds”… Well, at 24fps each frame gets roughly 42 milliseconds, so if it can do it in less than four tens of milliseconds (0.04 seconds), that’s pretty much real-time image generation from a 3D scene made from still photographs that itself only took a handful of seconds to create. That’s pretty instant!

To get to the level of speed that it does, NVIDIA says that it relies on a technique they developed called multi-resolution hash grid encoding which, naturally, is optimised to run on NVIDIA GPUs. The new encoding method allows researchers to achieve high quality 3D results using tiny, fast neural networks. It was developed using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library and can be trained and run on a single NVIDIA GPU – as long as it’s an RTX card with Tensor cores.

NVIDIA says that the Instant NeRF tech could be used to create avatars or scenes for virtual worlds, to capture video conference participants and their environments in 3D or to reconstruct scenes for 3D digital maps. I can also see this being combined with some of the Unreal Engine tech used in virtual sets – like those used to shoot The Mandalorian or the BBC Olympics coverage – to place actors and talent in projected 3D worlds that could have been shot mere minutes or seconds before being used.

Outside of creative uses and looking at more practical everyday applications, NVIDIA suggests that it might be used to train robots and self-driving cars to construct “real” 3D worlds from the flat 2D images they capture, giving them more context about their environment and ultimately making them safer and more reliable.

There’s no word yet on when something will be released that we can have a go with ourselves at home. I did ask NVIDIA and they said that they “can’t comment on any unannounced products”. So, it could be weeks or even years before something is released to the general public. But when something does come out, that’s gonna be a lot of fun to play with!

You can find out more and see the full keynote on the NVIDIA website.
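
For the technically curious, here’s a very rough, hypothetical sketch of the multi-resolution hash encoding idea in plain PyTorch. It’s not NVIDIA’s code – the real Instant NeRF implementation lives in fused CUDA kernels inside the Tiny CUDA Neural Networks library and interpolates between grid vertices rather than grabbing the nearest one, and every size below is made up for illustration – but it hints at why the trick is so fast: most of the scene’s “memory” lives in small trainable hash tables rather than in a big neural network.

```python
# Simplified illustration of multi-resolution hash encoding (not NVIDIA's code).
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    def __init__(self, n_levels=8, features_per_level=2,
                 table_size=2**14, base_res=16, growth=1.5):
        super().__init__()
        # Grid resolution grows geometrically from coarse to fine across levels.
        self.resolutions = [int(base_res * growth ** i) for i in range(n_levels)]
        self.table_size = table_size
        # One small, trainable hash table of feature vectors per resolution level.
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, features_per_level))
             for _ in range(n_levels)]
        )

    def forward(self, xyz):                    # xyz: (N, 3) points in [0, 1]
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            g = (xyz * res).long()             # nearest grid vertex at this level
            # Hash the integer vertex coordinates into the table (XOR with large
            # constants, loosely following the published Instant NeRF approach).
            h = (g[:, 0] ^ (g[:, 1] * 2654435761) ^ (g[:, 2] * 805459861)) % self.table_size
            feats.append(table[h])             # (N, features_per_level)
        return torch.cat(feats, dim=-1)        # concatenated multi-resolution features

# The concatenated features then feed a very small MLP, which is a big part of why
# training is so quick: the heavy lifting is table lookups, not huge matrix multiplies.
encoding = HashGridEncoding()
points = torch.rand(1024, 3)
print(encoding(points).shape)                  # torch.Size([1024, 16])
```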