nyris Discusses #TechWeLove: Stable Diffusion

In the first #TechWeLove article, nyris dives into Stable Diffusion with Alessandro Leone, exploring how this AI revolutionises image generation and synthetic data.

Image-based generative AI has gained significant popularity over the last few years, with several models becoming household names, including Stable Diffusion and OpenAI’s DALL-E.

From a user perspective, image generation AI seems fairly straightforward and pretty fun to use. We sat down with nyris’ Head of Synthetic Data, Alessandro Leone, as he explains that there is much more to it than meets the eye.

In nyris’ first instalment of #TechWeLove, let’s talk about Stable Diffusion.

First Things First: What is Stable Diffusion?

Stable Diffusion is an example of an open source Generative AI algorithm that generates images based on a specific input. These inputs can include text prompts, images, or a combination of both.

Professor Björn Ommer and the Computer Vision and Learning group at LMU Munich have played a crucial role in advancing Stable Diffusion models through their research in generative AI. By addressing key challenges in the training and optimisation of these models, their contributions have led to improvements in the stability and quality of diffusion processes.

Prof. Ommer’s model is deliberately small, aimed at being more financially and environmentally friendly: it needs less energy and can run on a standard PC without the need for energy-hungry data centres. nyris is proud to have Prof. Ommer as part of our Board of Technology.

Stable Diffusion Demystified

Alessandro explains that diffusion models involve two main phases: 

  1. The forward process, where random noise (i.e. static) is gradually added to a clean image, creating randomness in the pixels 
  2. The reverse process, where the algorithm progressively removes the noise in an attempt to restore the original image 

The model is trained to predict and subtract a calculated amount of noise in each step, but because the noise removal process is not deterministic, the final output is a newly generated image that is similar to, but not exactly the same as, the original - an inherently unique image created by AI!
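To make the two phases concrete, here is a minimal, purely illustrative sketch in Python using only NumPy. The linear noise schedule, the step count and the predict_noise placeholder are assumptions made for illustration; in the real model, that prediction is learned by a large neural network trained on millions of images.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50  # number of diffusion steps (illustrative choice)

def forward_process(image, t):
    """Blend a clean image with Gaussian noise; the larger t is, the noisier the result."""
    alpha = 1.0 - t / T                         # simple linear schedule (an assumption)
    noise = rng.normal(size=image.shape)
    return np.sqrt(alpha) * image + np.sqrt(1 - alpha) * noise

def predict_noise(noisy, t):
    """Placeholder for the trained network that estimates the noise added at step t."""
    return rng.normal(size=noisy.shape) * 0.1   # a real model learns this prediction

def reverse_process(noisy):
    """Progressively subtract the predicted noise to recover a clean-looking image."""
    x = noisy
    for t in reversed(range(T)):
        x = x - predict_noise(x, t) / T
    return x

clean = rng.random((64, 64))                    # stand-in for a training image
noisy = forward_process(clean, t=T)             # fully noised version
generated = reverse_process(noisy)              # a new image, similar but not identical
```

Because the noise prediction is only an estimate, the reverse process never lands exactly back on the original, which is why the result is a newly generated image rather than a copy.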

The Power of CLIP: Bridging Language and Vision 

CLIP (Contrastive Language-Image Pre-Training) is a neural network designed to understand images and their associated text descriptions by learning from a vast dataset of image-text pairs. Stable Diffusion uses CLIP’s text encoder to link the words in a prompt to visual concepts, guiding the generation process.

CLIP works by embedding both images and text into the same high-dimensional space, where their attributes can be easily compared. Images are represented by vectors that capture their key features in relation to their text descriptions in this shared space. This process enables the model to align images and texts, where vectors that are closer together represent more similar items. 
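As a hedged illustration of that shared embedding space, the short Python snippet below uses the openly released CLIP model through the Hugging Face transformers library to score one image against a few candidate captions. The checkpoint name, the example image path and the captions are assumptions made for this sketch, not part of Stable Diffusion itself.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Publicly available CLIP checkpoint (the specific choice is an assumption)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("spare_part.jpg")            # hypothetical example image
captions = [
    "a photo of a metal gear",
    "a photo of a rubber seal",
    "a photo of a circuit board",
]

# Both the image and the texts are embedded into the same vector space
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher scores mean the image and caption vectors sit closer together
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0]):
    print(f"{p:.2f}  {caption}")
```

The caption whose vector lies closest to the image vector receives the highest score, which is exactly the alignment described above.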

From Image to Embedding: The "Guess Who?" of AI

When we want to physically describe something, we naturally use a set of keywords that act as unique identifiers for whatever we are thinking of. To generate an image, users of Stable Diffusion can prompt the model with a caption containing keywords describing what they want to see.

How does this work in the algorithm? Alessandro explains with an interesting real-life comparison: the “Guess Who?” board game! 

While playing, we ask our opponent a series of questions using keywords to identify the attributes of the person on their card (e.g. Does your person have blonde hair? Blue eyes? A big nose?). By process of elimination using binary ‘yes/no’ questions, we build up a mental image of the character and inevitably Guess Who (or what) is on their card. 

Each character in “Guess Who?” can be represented as a vector of attributes in an embedding space, where visually similar characters are closer together. Asking questions works like filtering points in that space: each question divides the space up, narrowing down the possibilities and making it easier to identify the correct character from their features. 

This approach helps in efficiently comparing, clustering, and analysing data, much like how we work out our opponent’s character card in the game. 
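A tiny Python sketch of that intuition: each character is a vector of yes/no attributes, and every question filters the remaining candidates, just as a query narrows down a region of the embedding space. The characters and attributes here are invented purely for illustration.

```python
# Each character is a point in a tiny "embedding space" of binary attributes
characters = {
    "Anna":  {"blonde_hair": 1, "blue_eyes": 1, "glasses": 0},
    "Bernd": {"blonde_hair": 0, "blue_eyes": 1, "glasses": 1},
    "Carla": {"blonde_hair": 1, "blue_eyes": 0, "glasses": 1},
}

def ask(candidates, attribute, answer):
    """Keep only the characters whose attribute matches the yes/no answer."""
    return {name: attrs for name, attrs in candidates.items()
            if attrs[attribute] == answer}

remaining = characters
remaining = ask(remaining, "blonde_hair", 1)   # "Does your person have blonde hair?" - yes
remaining = ask(remaining, "blue_eyes", 1)     # "Blue eyes?" - yes
print(list(remaining))                         # only ['Anna'] is left
```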

When using a text prompt to generate an image, the text is encoded by a language model into numerical representations. Both processes involve progressively narrowing down possibilities through iterative refinement and guided input to achieve the desired output or “right answer”.

 

Fine-Tuning: The DJ Decks of AI 

Fine-tuning is a process in machine learning that takes a model pretrained on a large dataset and very slightly adjusts its internal parameters, or “weights”, to create a different output. Without creating a new model or significantly altering the overall architecture, the model becomes better at the new task, learning how to produce the desired output from a new input.

Alessandro draws an interesting parallel between fine-tuning AI models and adjusting a DJ's turntable. Just as a DJ fine-tunes their mix by adjusting various knobs, AI researchers can enhance their model’s performance by tweaking the weights in the neural network. This process allows Stable Diffusion to be customised for specific tasks or domains, improving its performance in targeted applications.

In fine-tuning, the focus is typically on the last layers of weights in a pre-trained model like Stable Diffusion. This involves providing new images and making slight adjustments to align the model with specific desired outcomes.
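As a rough, generic PyTorch sketch of that pattern (not nyris’ production code and not the full Stable Diffusion training setup), one might freeze every layer of a pretrained network except the last ones and train only those on the new images:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained model; in practice this would be a large
# diffusion network loaded with pretrained weights.
model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 256),                  # the "last layers" we want to adapt
)

# Freeze all weights, then unfreeze only the final layer
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# One illustrative training step on a dummy batch of "new" data
x = torch.randn(8, 512)                   # stand-in for encoded new images
target = torch.randn(8, 256)              # stand-in for the desired outputs
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
```

Only a small fraction of the weights moves, which is why fine-tuning is so much cheaper than training a model from scratch.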

Use Case for nyris

At nyris, our main goal is to help our customers reach their maximum efficiency by speeding up their product identification processes through visual search. 

When our customers lack image data for their search index, nyris can provide synthetic data services to create photorealistic models of their CAD data. Stable Diffusion offers interesting potential to generate high-quality images for clients who have imperfect or limited visual data. 
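As a hedged sketch of how a single product photo might be turned into several photorealistic variants (this is illustrative, not nyris’ production pipeline; the checkpoint, prompt and parameters are assumptions), the open-source diffusers library provides an image-to-image pipeline for Stable Diffusion:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Publicly available Stable Diffusion checkpoint (the choice is an assumption)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

init_image = Image.open("spare_part.jpg").convert("RGB").resize((512, 512))
prompt = "a photorealistic studio photo of an industrial spare part, white background"

# strength controls how far the generated images may drift from the original
variants = pipe(prompt=prompt, image=init_image,
                strength=0.6, guidance_scale=7.5,
                num_images_per_prompt=4).images

for i, img in enumerate(variants):
    img.save(f"variant_{i}.png")
```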

Alessandro explains that while nyris can build a search index from a single image per product, the more images we have the better. With Stable Diffusion, his team can create multiple high-quality images of spare parts from a single product image or detailed description, narrowing the data gap for our customers. However, this comes with limitations…

Challenges and Future Directions

While Stable Diffusion is incredibly powerful, we know every bit of technology has its limitations. Alessandro doesn't shy away from mentioning a few from his experiences:

  • Diffusion models often struggle to generate highly detailed images without very detailed descriptions 
  • They have trouble generalising to unseen objects or scenarios, as they rely heavily on the patterns in their training data 
  • They lack a “real world” understanding, limiting their ability to interpret complex scenes accurately 
  • They can sometimes produce artefacts or visual inconsistencies, particularly with complex structures, which can affect the overall quality and realism of the generated images

However, these challenges are driving innovation in the field. Professor Björn Ommer, his Computer Vision and Learning group, and the team at nyris are continually working to push the boundaries of what's possible for synthetic data generation. 

Want to learn more about how we're leveraging Stable Diffusion and other cutting-edge AI technologies to revolutionise visual search and synthetic data generation? Simply contact us on the nyris website and let’s explore the future of AI-driven solutions together!

Image generated by Alessandro using Stable Diffusion
#GenerativeAI #StableDiffusion #SyntheticData #AIInnovation #TechTrends #VisualSearch #MachineLearning
Christina Lynn
Christina Lynn is currently Marketing Manager at nyris, with a diverse background in both Sales and Science. Using her creative and analytical skills, she aims to drive awareness of visual AI.
