I love digital painting, animation and clothes making.
I also love reading research papers, designing neural architectures and developing digital tools for cinematography.
During my PhD, I had the opportunity to marry my passions by researching computer vision tools for filmmaking.
Through this, I gained experience working on Virtual Production and 360-degree Metaverse filming stages and
collaborating with artists to develop digital tools and production workflows that solve real-world problems
and/or propose new approaches to content creation. Looking to the future, I would like to continue pursuing
my passion for digital tooling for artistic applications.
If you are curious about my work, looking for gaps in research or interested in hiring or collaborating with me,
feel free to email me: aazzarellib@gmail.com
To download my CV click here!
...
Intelligent Cinematography applies machine learning to real in-camera content acquisition for filmmaking.
Throughout my 3.5-year PhD, my focus was on adapting 3-D reconstruction techniques to filmmaking scenarios.
I explored various topics, from efficient/compact dynamic 3-D reconstruction and sparse multi-view camera problems
to relighting real scenes in Virtual Production scenarios (see the "Research" section for specific topics).
Key words: Neural Radiance Fields,
Local Light Fields, Gaussian Splatting, Differentiable Rendering, Dynamic 3-D, Relightable 3-D Content.
Industry collaborations: Condense Reality, LuxAeterna.
My thesis, A Decision Making Framework for Generalized Philosophical, Social and Legal Dilemma, explores a multi-model framework
for assessing and resolving complex social decisions. I developed algorithms based on the philosophy of logic, applied to philosophical, social and legal
dilemmas. The thesis was graded 86% (a high First Class under UK grading) and received the title of "award worthy".
Relevant Modules (graded First Class):
Advanced Programming, Robotic Systems, Advanced Image Processing, Numerical Methods, Mathematical Optimization,
Reinforcement and Online Learning, Engineering Management and Law
Co-Curricular:
Student Faculty President, 2020 & 2021 (COVID period)
Head of the Electronic and Computer Science Mentoring Scheme, 2020 & 2021 (COVID period)
I am French and Italian, and grew up in London, UK. I speak English and French with high proficiency; my Italian is conversational.
My MEng required me to learn Objective-C, C++, Python and MATLAB, applied to various domains including embedded systems, image processing and robotics. My PhD further developed my Python skills. I have also engaged in some personal projects (e.g. following Catlike Coding tutorials) to develop my understanding of C#, HLSL and C++ for graphics applications. I have also tinkered with Julia, JS and other languages for some personal tools.
I have experience using Blender, Unity, Unreal, Procreate, CSP, Photoshop and DaVinci for personal artistic projects. Whether it's for a cool effect or a smooth animation, I love exploring new ways to develop my artistic abilities. I also have work/coding experience with Blender-Python, Unreal-C++ and Unity-C#, as well as CUDA/OptiX, PyTorch and most other common Python-based libraries.
This paper investigates 3D scene reconstruction and relighting for virtual production (VP) stages, where real foreground objects are captured in front of LED walls that display virtual backgrounds and provide real image-based lighting (IBL) to the foreground. However, this fixes the scene's appearance at capture time, and because VP stage lighting is unique to each set, it is difficult to anticipate the scene's appearance. So, when the footage does not match the director's expectations, re-shoots are required. These are uniquely expensive due to the cost and logistical complexity of VP stages. Addressing this, we propose a VP 3D reconstruction and relighting (VSR) pipeline that synthesizes photorealistic 3D twins of VP scenes and enables changing LED wall content while propagating the corresponding lighting effects to the foreground.
Our VSR method relies on multi-view images of a static scene including a real in-camera IBL background, captured under variable background and lighting conditions, a capture setting for which no prior reconstruction or relighting method exists. We therefore establish a foundation for VSR by introducing a novel Gaussian Splatting (GS)-based pipeline, several proxy baselines and multiple datasets including real VP captures. The main technical problems concern designing a suitable representation and relighting scheme that samples IBL textures to change the scene's lighting on a per-primitive basis. The main contribution is a geometry-independent GS lighting model that represents IBL texture sampling coordinates and lighting intensity as view-dependent GS parameters. As this model avoids ray-based inverse rendering, it does not require depth or normal priors to learn transmission and reflection effects, it supports complex scenes including transparent objects, and it is implemented without custom CUDA or RTX code. The resulting representation is compact and efficient, requiring less than 5GB of RAM or VRAM to train 1080p scenes. In practice, our approach reduces the reliance on VP-specific hardware and enables greater creative flexibility in post-production. The code, datasets, documentation and video results are available online.
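To give a rough feel for per-primitive IBL texture sampling, here is a minimal sketch with assumed names (not the released implementation): each Gaussian carries learnable sampling coordinates into the LED-wall/IBL texture and a lighting intensity, so swapping the texture re-propagates lighting to the reconstructed foreground. For simplicity the parameters are treated as static here, whereas the model described above makes them view-dependent.

```python
# Minimal sketch (assumed names, not the released code): per-Gaussian IBL sampling.
import torch
import torch.nn.functional as F

def relight_gaussians(base_rgb, uv, intensity, ibl_texture):
    # base_rgb:    (N, 3) per-Gaussian base colour
    # uv:          (N, 2) learned sampling coordinates into the IBL texture, in [-1, 1]
    # intensity:   (N, 1) learned per-Gaussian lighting intensity
    # ibl_texture: (3, H, W) LED-wall / image-based-lighting content
    grid = uv.view(1, -1, 1, 2)                                                # (1, N, 1, 2)
    light = F.grid_sample(ibl_texture.unsqueeze(0), grid, align_corners=True)  # (1, 3, N, 1)
    light = light.squeeze(0).squeeze(-1).t()                                   # (N, 3)
    return base_rgb * intensity * light  # relit per-Gaussian colour

# Swapping `ibl_texture` for new LED-wall content changes the sampled lighting,
# which is how a relighting edit propagates to the foreground.
```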
Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is separately trained on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field, following common filmmaking practices: the foreground stage contains diverse dynamic features, so changes in color, position and rotation are learned, while the background, containing film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results; up to 3 dB higher PSNR with half the model size on 3-D scenes. Unlike the SotA, and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures.
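To make the foreground/background split concrete, here is a minimal sketch (hypothetical class and variable names, not the released code) of two deformation fields with different output parameterisations:

```python
# Minimal sketch (hypothetical names, not the released code): separate deformation
# fields for foreground and background canonical Gaussians.
import torch
import torch.nn as nn

class DeformField(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        # Maps (canonical position, time) to per-Gaussian offsets.
        self.mlp = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, xyz, t):
        # xyz: (N, 3) canonical Gaussian positions; t: scalar time in [0, 1]
        tcol = torch.full((xyz.shape[0], 1), float(t))
        return self.mlp(torch.cat([xyz, tcol], dim=-1))

# Foreground: diverse dynamics, so learn offsets for position (3), rotation (4) and color (3).
fg_deform = DeformField(out_dim=3 + 4 + 3)
# Background (film crew and equipment, dimmer and less dynamic): position offsets only.
bg_deform = DeformField(out_dim=3)
```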
Dynamic Novel View Synthesis (Dynamic NVS) enhances NVS technologies to model moving 3-D scenes. However, current methods are resource intensive and challenging to compress. To address this, we present WavePlanes, a fast and more compact hex-plane representation, applicable to both dynamic Neural Radiance Fields and Gaussian Splatting methods. Rather than modeling many feature scales separately (as done previously), we use the inverse discrete wavelet transform to reconstruct features at varying scales. This leads to a more compact representation and allows us to explore wavelet-based compression schemes for further gains. The proposed compression scheme exploits the sparsity of wavelet coefficients by applying hard thresholding to the wavelet planes and storing the nonzero coefficients and their locations on each plane in a hash map. Compared to the state-of-the-art (SotA), WavePlanes is significantly smaller, less resource demanding and competitive in reconstruction quality. Compared to small SotA models, WavePlanes outperforms them in both model size and novel view quality.
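To illustrate the compression idea (a sketch under assumed data layouts, not the released implementation): coefficients on each wavelet plane are hard-thresholded, only the survivors and their locations are stored, and the dense plane is rebuilt before applying the inverse discrete wavelet transform.

```python
# Illustrative sketch (assumed layout, not the released code): hard-threshold one
# wavelet plane and keep only nonzero coefficients plus their locations.
import numpy as np

def compress_plane(coeffs, threshold):
    # coeffs: 2-D array of wavelet coefficients for a single plane
    mask = np.abs(coeffs) > threshold            # hard thresholding
    idx = np.flatnonzero(mask)                   # flat locations of surviving coefficients
    return {"shape": coeffs.shape,
            "values": coeffs.flat[idx],          # nonzero coefficients
            "indices": idx}                      # their locations (hash-map style storage)

def decompress_plane(packed):
    out = np.zeros(packed["shape"], dtype=packed["values"].dtype)
    out.flat[packed["indices"]] = packed["values"]
    return out  # dense plane, ready for the inverse discrete wavelet transform
```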
As research on neural volumetric video reconstruction and compression flourishes, there is a need for diverse and realistic datasets, which can be used to develop and validate reconstruction and compression models. However, existing volumetric video datasets lack diverse content in terms of both semantic and low-level features that are commonly present in real-world production pipelines. In this context, we propose a new dataset, \name, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence in this database contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. To demonstrate the use of this database, we have benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms. The obtained results evidence the challenging nature of the proposed dataset and the limitations of existing datasets for both volumetric video reconstruction and compression tasks, highlighting the need to develop more effective algorithms for these applications.
The first (comprehensive) review of computer vision research in the context of real video content acquisition for entertainment. To establish a structure, we categorise work by General, Virtual, Live and Aerial production, and within each category we discuss various machine learning applications and their links to other forms of production. We also provide category-specific comments on future work and discuss the social responsibilities of conducting ethical research.
We provide an overview of Dynamic NeRF and Gaussian Splatting research in the context of cinematography and explore the use of these technologies (Nerfacto, 4D-GS and SC-GS) to produce a (very) short film. Topics discussed: (1) Dynamic representations, (2) Articulated models vs Scene-based modelling, (3) Data collection.