My full publication list is available on Google Scholar.
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence with industry-leading price performance. Amazon Nova Reel is our advanced video generation model, offering high-quality outputs, customization, and motion control. Nova Canvas enables professional-grade image generation with rich customization tools. Nova Pro and Nova Lite are multimodal models with cutting-edge capabilities across text, image, video, and document processing, while Nova Micro is a fast, low-cost text-only model.
We present Diffusion Soup, a compartmentalization method for text-to-image generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs. Our method achieves up to a 30% improvement over a paragon model trained on the union of all data shards, and has applications in anti-memorization and zero-shot style mixing.
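Weight averaging is what makes continual learning and unlearning training-free. Adding a shard model to a running sum, or subtracting one from it, changes the merged model without any retraining. The merged model is also the same size as a single shard model. The sketch below is illustrative only and is not the Diffusion Soup code. It assumes a uniform mean and represents each model's parameters as a dict mapping parameter names to lists of floats, as a stand-in for real tensors. The `ShardSoup` class and its methods are hypothetical. Unlearning a shard requires keeping that shard's weights available.

```python
from typing import Dict, List

# Hypothetical parameter format: name -> flat list of floats (stand-in for tensors).
Params = Dict[str, List[float]]


class ShardSoup:
    """Uniform weight average of models trained on separate data shards (illustrative)."""

    def __init__(self) -> None:
        self._sum: Params = {}                # running element-wise sum over shards
        self._shards: Dict[str, Params] = {}  # per-shard weights, kept for unlearning

    def add(self, shard_id: str, params: Params) -> None:
        # Continual learning: fold a newly trained shard model into the running sum.
        if shard_id in self._shards:
            raise ValueError(f"shard {shard_id!r} already present")
        for name, values in params.items():
            acc = self._sum.setdefault(name, [0.0] * len(values))
            if len(acc) != len(values):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, v in enumerate(values):
                acc[i] += v
        self._shards[shard_id] = params

    def remove(self, shard_id: str) -> None:
        # Unlearning: subtract the shard's weights so its data no longer contributes.
        params = self._shards.pop(shard_id)
        for name, values in params.items():
            acc = self._sum[name]
            for i, v in enumerate(values):
                acc[i] -= v

    def merged(self) -> Params:
        # The merged model is the element-wise mean over the shards still present.
        n = len(self._shards)
        if n == 0:
            raise ValueError("no shards in the soup")
        return {name: [v / n for v in acc] for name, acc in self._sum.items()}


soup = ShardSoup()
soup.add("a", {"w": [1.0, 2.0]})
soup.add("b", {"w": [3.0, 4.0]})
soup.add("c", {"w": [5.0, 9.0]})
print(soup.merged())  # {'w': [3.0, 5.0]}
soup.remove("c")      # forget shard "c" without retraining
print(soup.merged())  # {'w': [2.0, 3.0]}
```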
Amazon Titan Image Generator is a foundational generative model which enables content creators to generate and edit high-quality images using natural language prompts. The model allows users to create, modify, and repurpose images, and supports responsible AI use with built-in safeguards and invisible watermarking.
We tackle the problem of obtaining dense 3D human reconstructions from single and partially occluded views. In such cases, the visual evidence is usually insufficient to identify a 3D reconstruction uniquely, so we recover a set of plausible reconstructions that are consistent with the input image. We train using a best-of-M loss, to which we add flexibility with a novel quantization scheme based on normalizing flows.
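A best-of-M loss scores a set of M hypotheses only by the one closest to the ground truth. The other M−1 outputs are therefore not pulled toward that single answer and remain free to cover other plausible explanations of the image. The sketch below is minimal and makes two assumptions. Each hypothesis is a flat list of joint coordinates, and mean squared error is used as the per-hypothesis loss. Both choices are for illustration only. The function names are hypothetical, and the normalizing-flow quantization scheme is not shown.

```python
from typing import List, Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    # Per-hypothesis error: mean squared difference over all coordinates.
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def best_of_m_loss(
    hypotheses: List[Sequence[float]], target: Sequence[float]
) -> Tuple[float, int]:
    """Return the minimum loss over the M hypotheses and the index that achieved it."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    losses = [mse(h, target) for h in hypotheses]
    best = min(range(len(losses)), key=losses.__getitem__)
    # In an autodiff framework, gradients of this minimum flow only to hypothesis `best`.
    return losses[best], best


target = [0.0, 0.0, 1.0, 1.0]
hypotheses = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
print(best_of_m_loss(hypotheses, target))  # (0.25, 1)
```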
We introduce a fully automatic, end-to-end system for 3D dog reconstruction trained using only weak 2D supervision. We use SMBLD, a new 3D deformable template model that includes a detailed shape prior learnt during training using expectation maximization. We also release StanfordExtra: the largest dataset of 2D keypoints and segmentations for an animal category.
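The text above only says that the shape prior is learnt with expectation maximization (EM). To illustrate how EM fits such a prior, the sketch below *assumes* the prior is a Gaussian mixture. It fits a one-dimensional mixture to scalar shape coefficients. Real shape vectors are multi-dimensional, and in the full system EM would run inside the training loop rather than on a fixed list. The function names and the example data are hypothetical.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    # Density of a 1D Gaussian with the given mean and variance.
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(
    xs: List[float], init_means: List[float], iters: int = 50, min_var: float = 1e-6
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1D Gaussian mixture with EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    mu = sum(xs) / len(xs)
    # Start every component with the overall data variance and equal weight.
    variances = [max(sum((x - mu) ** 2 for x in xs) / len(xs), min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against every density underflowing to 0
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # component explains no data; leave it unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var
            )
    return weights, means, variances


coeffs = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]  # hypothetical shape coefficients
weights, means, variances = fit_gmm_em(coeffs, init_means=[-1.0, 1.0])
print([round(m, 3) for m in means])  # two modes, near -2 and +2
```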
We present a system that recovers the 3D shape and motion of a wide variety of quadruped animals from video. We overcome the limited availability of animal motion capture data and ensure generalization to real-world sequences by training on synthetic silhouettes. We apply our method to manually and automatically segmented monocular animal videos, and it requires no other form of user intervention.
We improve 3D body shape estimation for diverse body types. While existing methods successfully estimate 3D pose, reliably estimating precise shape remains challenging. To address this gap, we propose new loss functions and a test-time optimization routine that can be readily integrated into parametric 3D human reconstruction pipelines.
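The loss functions and the optimization routine are not detailed here. The sketch below therefore shows only the generic pattern of test-time optimization in a parametric pipeline, and every modelling choice in it is a stated assumption. A regressor's predicted shape coefficients `beta_init` are refined by gradient descent. The objective matches the output of a *hypothetical* linear model `a` to observed target quantities. An L2 term with weight `lam` keeps the result close to the initial prediction. All names, the linear model and the objective are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    # Matrix-vector product for nested lists.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def refine_shape(
    a: Matrix,
    target: Vector,
    beta_init: Vector,
    lam: float = 0.1,
    lr: float = 0.05,
    steps: int = 500,
) -> Vector:
    """Minimize ||a @ beta - target||^2 + lam * ||beta - beta_init||^2 by gradient descent."""
    beta = list(beta_init)
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(a, beta), target)]
        grad = [
            # Data term gradient: 2 * a^T residual; prior term gradient: 2 * lam * (beta - beta_init).
            2.0 * sum(a[i][j] * residual[i] for i in range(len(a)))
            + 2.0 * lam * (beta[j] - beta_init[j])
            for j in range(len(beta))
        ]
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical example: identity model, regressor predicted [0, 0], observations say [1, 2].
print(refine_shape([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [0.0, 0.0], lam=1.0))
# approaches [0.5, 1.0], halfway between the prediction and the observations
```

Because the refinement only needs the pipeline's own parameters and a differentiable objective, the pattern can sit after any regressor without retraining it.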
We propose deep implicit functions to reconstruct large-scale driving scenes. To avoid requiring watertight meshes for training, we instead use LiDAR to approximate ground truth occupancy labels. We evaluate on a real-world autonomous driving dataset and show that incorporating geometric priors improves reconstruction quality.
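The sketch below shows one common way to turn LiDAR returns into approximate occupancy labels. It is an assumption, not necessarily the recipe used in this work. Points sampled along a ray between the sensor and its return must be free space (label 0). Points a short distance beyond the return are treated as occupied (label 1). The function name, the uniform sampling scheme, and the `surface_thickness` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def occupancy_samples(
    origin: Sequence[float],
    hit: Sequence[float],
    n_free: int = 8,
    n_occupied: int = 2,
    surface_thickness: float = 0.2,
    rng: Optional[random.Random] = None,
) -> List[Tuple[Point, int]]:
    """Label points on one LiDAR ray: 0 = free space, 1 = (approximately) occupied."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is reproducible
    direction = [h - o for h, o in zip(hit, origin)]
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        raise ValueError("LiDAR return coincides with the sensor origin")
    unit = [d / length for d in direction]

    def point_at(t: float) -> Point:
        # Position at distance t from the sensor along the ray.
        return (origin[0] + t * unit[0], origin[1] + t * unit[1], origin[2] + t * unit[2])

    samples: List[Tuple[Point, int]] = []
    for _ in range(n_free):
        # The beam travelled through this point before reaching the surface, so it is empty.
        samples.append((point_at(rng.random() * length), 0))
    for _ in range(n_occupied):
        # Just behind the return we assume we are inside the surface (a heuristic).
        samples.append((point_at(length + rng.random() * surface_thickness), 1))
    return samples


for point, label in occupancy_samples((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), n_free=3, n_occupied=1):
    print(point, label)
```

The labelled points can then supervise an occupancy network directly, which is why no watertight mesh is needed.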
This thesis focuses on designing methods for animal reconstruction, making use of 3D morphable models. Topics covered include: training 3D animal reconstruction algorithms with synthetic silhouette data, refining 3D shape priors in-the-loop and handling ambiguous input images.
Our virtual try-on method generates a realistic digital model of a person based on an image and seamlessly applies clothing to it. By analyzing both the body and clothing images, we create a layer mask that determines how each pixel should be rendered, based on which part of the body it belongs to. The result is a highly accurate virtual try-on experience, suitable for rendering full outfits on a selected body. This was built as part of the Amazon Style project, an ML-powered physical fashion store.
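A layer mask can drive rendering by naming, for every pixel, the source layer that supplies that pixel. Example sources are the person image or a rendered garment. The sketch below is a minimal example of per-pixel compositing under that assumption. The label strings, the layer names and the `composite` function are hypothetical, and images are nested lists of RGB tuples so the example stays self-contained.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]
Mask = List[List[str]]  # hypothetical: each entry names the layer that owns the pixel


def composite(layers: Dict[str, Image], mask: Mask) -> Image:
    """Build the output image by copying each pixel from the layer named in the mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    if any(len(row) != width for row in mask):
        raise ValueError("mask rows must all have the same width")
    for name, img in layers.items():
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError(f"layer {name!r} does not match the mask size")
    # A KeyError here means the mask refers to a layer that was not supplied.
    return [[layers[mask[y][x]][y][x] for x in range(width)] for y in range(height)]


red, blue = (255, 0, 0), (0, 0, 255)
person = [[red, red]]   # hypothetical 1x2 person image
top = [[blue, blue]]    # hypothetical 1x2 rendered garment
print(composite({"person": person, "top": top}, [["person", "top"]]))
# [[(255, 0, 0), (0, 0, 255)]]
```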
Behaviour and keypoint predictions at ~15 fps by a deep learning architecture we call RodentNet. Results are shown on validation sequences from the SCORHE dataset.
A computer vision application for verifying regulatory gowning procedures, developed in collaboration with GlaxoSmithKline. It won the departmental award for best third-year dissertation at the University of Warwick.