TCNA’s Mobility AI/ML (machine learning) team used state-of-the-art generative AI models, such as the open-source Stable Diffusion and ControlNet architectures, to train proprietary text-to-image latent diffusion models that are personalized for Lexus and Toyota vehicles and generate images in a variety of artistic styles. The team also used the DreamBooth framework released by Google Research to fine-tune off-the-shelf text-to-image models. The coolest thing about these models is that they give customers a platform to visualize these vehicles in any dream setting, for example “Lexus RZ driving on Mars” or even “Digital art of Toyota BZ4X in manga style”.
TCNA developed an AI Art generator product for Lexus Marketing and officially launched the program at the 2023 New York International Auto Show.
Thousands of auto show attendees brought Lexus RZ and RX vehicles to life in their ideal settings in digital, painting, and futuristic art styles. Below are a few sample images, with their respective text prompts, that our team generated with the AI art tool we built.
Text-to-image models are computationally heavy: the latency to output a single image ranges between 3 and 5 seconds on a single NVIDIA GPU, depending on the device type. We used multiple GPUs to parallelize inference, which reduced overall latency by 75% while generating four outputs for every user input.
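One way to see where a figure like 75% can come from is a simple idealized latency model: generating four images one after another on one GPU takes about four times the single-image latency, while four GPUs each producing one image at the same time take about one image's worth. The sketch below assumes 4 seconds per image (the midpoint of the 3 to 5 second range) and ignores process startup, model loading, and data transfer, so it is an illustration of the arithmetic, not a measurement of the team's system.

```python
import math

def request_latency(num_images: int, seconds_per_image: float, num_gpus: int = 1) -> float:
    """Idealized latency for one request: images are split evenly across GPUs,
    each GPU generates its share sequentially, and all overhead is ignored."""
    rounds = math.ceil(num_images / num_gpus)
    return rounds * seconds_per_image

def latency_reduction(num_images: int, seconds_per_image: float, num_gpus: int) -> float:
    """Fractional latency saved by using num_gpus instead of a single GPU."""
    sequential = request_latency(num_images, seconds_per_image, num_gpus=1)
    parallel = request_latency(num_images, seconds_per_image, num_gpus=num_gpus)
    return 1 - parallel / sequential

# Four images per prompt at an assumed 4 s per image.
print(request_latency(4, 4.0))              # 16.0
print(request_latency(4, 4.0, num_gpus=4))  # 4.0
print(latency_reduction(4, 4.0, 4))         # 0.75
```

Under this model, 75% is the best case for four images on four GPUs; any real overhead pushes the achieved reduction below it.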
Image Generation Workflow:
Distributed Inference on Multi-GPU Instances
Traditional Inference Pipeline
Let's consider a simple illustrative example in which we generate image variations of the prompt “a photograph of an astronaut riding a horse.” This surreal prompt shows how powerful Stable Diffusion models are at combining unrelated concepts in creative ways. Let's take a closer look at the technical details of the inference pipeline implementation.
The traditional inference pipeline using Stable Diffusion 2.1 takes around 19 seconds to generate the following image.
In addition to the considerable inference time, we also observed that when we ran the pipeline with a larger batch size of four images, the workload was pinned to a single GPU, as shown in the snapshot of GPU usage below:
We found several opportunities for improvement, such as upgrading to PyTorch 2.0 for its optimized, memory-efficient attention implementation and using half-precision weights to speed up model loading and inference. On PyTorch 1.13, we found that the memory-efficient attention implementation from the xFormers library greatly improved inference times.
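Half precision helps load time because each float16 weight takes 2 bytes instead of the 4 bytes of a float32 weight, so there is half as much data to read from disk and copy to the GPU; the cost is reduced numerical precision. With the Hugging Face diffusers library, for example, half-precision weights are commonly requested by passing torch_dtype=torch.float16 to from_pretrained. The sketch below uses only Python's struct module to show both sides of the trade-off; the one-billion parameter count is a hypothetical round number, not the size of the team's models.

```python
import struct

# Bytes per element for IEEE 754 half precision ("e") and single precision ("f").
FP16_BYTES = struct.calcsize("e")  # 2
FP32_BYTES = struct.calcsize("f")  # 4

def weight_size_gib(num_params: int, bytes_per_param: int) -> float:
    """Raw size of a model's weights in GiB, ignoring file-format overhead."""
    return num_params * bytes_per_param / 2**30

def to_fp16(x: float) -> float:
    """Round-trip a Python float through float16 to expose the precision lost."""
    return struct.unpack("e", struct.pack("e", x))[0]

# Hypothetical parameter count, for illustration only.
NUM_PARAMS = 1_000_000_000
fp32_gib = weight_size_gib(NUM_PARAMS, FP32_BYTES)
fp16_gib = weight_size_gib(NUM_PARAMS, FP16_BYTES)
print(f"fp32: {fp32_gib:.2f} GiB, fp16: {fp16_gib:.2f} GiB")  # fp32: 3.73 GiB, fp16: 1.86 GiB
print(to_fp16(0.1))  # 0.0999755859375
```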
To assess the performance of different model variants, we evaluated model load time and inference time separately. Across our testing of different variants, model load time typically ranged between 2.5 and 3.5 seconds. Although inference times varied between variants, the optimization techniques described above significantly sped up inference.
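Timing load and inference separately can be done with a small harness like the one below. It is a minimal sketch, assuming the pipeline can be wrapped in a load function and an inference function; fake_load and fake_infer are hypothetical stand-ins that only sleep. Excluding a warm-up call from the average is our own choice here, since first calls often pay one-time costs.

```python
import time
from statistics import mean
from typing import Any, Callable

def profile_pipeline(
    load_fn: Callable[[], Any],
    infer_fn: Callable[[Any, str], Any],
    prompt: str,
    runs: int = 3,
) -> dict:
    """Time model loading and inference separately for one model variant."""
    start = time.perf_counter()
    model = load_fn()
    load_s = time.perf_counter() - start

    # Warm-up call, excluded from the average below.
    infer_fn(model, prompt)

    # On a real GPU, call torch.cuda.synchronize() before each timer read,
    # because CUDA kernels run asynchronously; the stubs here do not need it.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer_fn(model, prompt)
        timings.append(time.perf_counter() - start)
    return {"load_s": load_s, "infer_mean_s": mean(timings), "runs": runs}

# Hypothetical stand-ins for a real pipeline; they only sleep.
def fake_load() -> str:
    time.sleep(0.05)
    return "model"

def fake_infer(model: str, prompt: str) -> str:
    time.sleep(0.01)
    return f"{model}:{prompt}"

stats = profile_pipeline(fake_load, fake_infer, "a photograph of an astronaut riding a horse")
print(stats)
```

Running the same harness once per model variant gives directly comparable load and inference numbers.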
PyTorch 2.0 exposes this optimized and memory-efficient attention implementation through the torch.nn.functional.scaled_dot_product_attention function, which automatically selects among several optimized kernels depending on the inputs and the GPU type.
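For reference, scaled_dot_product_attention computes softmax(Q·Kᵀ / √d)·V, where d is the query/key dimension (1/√d is its default scale). The fused kernels it can dispatch to, such as FlashAttention and memory-efficient attention, produce the same result up to floating-point rounding without materializing the full attention matrix, which is where the speed and memory savings come from. The pure-Python function below is only a readable single-head reference for that math, not how PyTorch implements it.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V.

    q: Lq rows of length d; k: Lk rows of length d; v: Lk rows of length dv.
    Returns Lq rows of length dv.
    """
    scale = 1.0 / math.sqrt(len(q[0]))  # matches PyTorch's default scale
    out = []
    for qi in q:
        # One attention score per key, then normalize into weights.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Output row = weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

# Identical keys give uniform weights, so the output is the mean of the values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
# [[2.0, 4.0]]
```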
Optimized/Distributed Inference Pipeline
Even with these improvements, we still needed to generate several different images for each request. This prompted us to optimize further using the torch.multiprocessing package. Like Python's native multiprocessing module, which it wraps, it offers both local and remote concurrency, effectively sidestepping the Global Interpreter Lock by using subprocesses instead of threads. This multiprocessing approach allowed us to use all GPUs in parallel, as seen in the snapshot of GPU usage below:
The following code, written with torch.multiprocessing, is also 100% compatible with the native multiprocessing module packaged with Python. This approach let us further optimize our image generation workflow and generate four images in roughly 11 seconds.
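The general pattern, one worker process per GPU with each process producing one image, can be sketched with the standard library alone. This is a minimal illustration, not the team's code: fake_generate is a hypothetical stand-in for a Stable Diffusion pipeline call, and the seeding scheme (1000 plus the device index) and the cuda:N device naming are assumptions. torch.multiprocessing re-exports the same get_context, Process, and Queue API, so the structure carries over unchanged.

```python
import multiprocessing as mp

# Hypothetical stand-in for a Stable Diffusion pipeline call. A real worker
# would load the model onto f"cuda:{device_id}" and run the denoising loop.
def fake_generate(prompt: str, seed: int, device_id: int) -> str:
    return f"image(prompt={prompt!r}, seed={seed}, device=cuda:{device_id})"

def worker(device_id: int, prompt: str, seed: int, results) -> None:
    # Runs in its own process, with its own interpreter and GIL.
    results.put((device_id, fake_generate(prompt, seed, device_id)))

def generate_parallel(prompt: str, num_images: int = 4, start_method: str = "spawn") -> list:
    """Generate num_images images concurrently, one process (one GPU) per image."""
    ctx = mp.get_context(start_method)
    results = ctx.Queue()
    procs = [
        # Assumed seeding scheme: a distinct seed per image gives distinct variations.
        ctx.Process(target=worker, args=(device_id, prompt, 1000 + device_id, results))
        for device_id in range(num_images)
    ]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks while flushing its result.
    collected = [results.get() for _ in procs]
    for p in procs:
        p.join()
    # Sort by device id so the output order does not depend on which process finished first.
    return [image for _, image in sorted(collected)]
```

With real CUDA workloads, PyTorch's documentation notes that the CUDA runtime does not support the fork start method, which is why spawn is the default here; call generate_parallel from under an if __name__ == "__main__": guard, as spawn requires. Keeping one long-lived worker per GPU with its model already loaded, rather than starting processes per request, would also avoid paying the 2.5 to 3.5 second load time on every prompt.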
The generation of AI art promoting Toyota and Lexus vehicles is only the beginning. Future applications will enable both companies to create new levels of personalization for customers to experience their brands. After the success of the New York Auto Show, we continue to identify opportunities across several functions and products where we can apply generative AI technology, and we look forward to delivering these cutting-edge solutions.