Harnessing LLMs for breakthrough AI Image Generation in Production in 2024

Thumbnail AI Image Generation
April 17, 20244 min
Adi Nisman photo
Adi Nisman

When looking at a large language model (LLM), avoiding the unwanted is just the beginning; the challenge lies in precisely creating the visuals with the items you intend to see, especially in a production environment. Let’s explore how we solved it and why it’s crucial.

Achieving precision in AI-generated images isn’t just an ambition—it’s essential. Our challenge is to produce flawlessly realistic images on the first attempt for any query, no matter how complex. This imperative led us to an innovative solution, leveraging advanced language models to craft our prompts, ensuring each image precisely captures the intended concept.

This article explores our journey towards elevating AI image generation to new heights, where we create accurate images about all of our generated concepts, directly enhancing user satisfaction and engagement.

The Challenge of Production-Grade AI Use


Most people playing with AI image generators have the luxury of trial and error. They can tweak prompts or switch models until they get the perfect image. The stakes are much higher in a production environment, where we use these models to generate images for any user query instantly. Our previous images were good, but “good” isn’t good enough. We needed to guarantee top-notch, realistic images for even the most specific and uncommon requests.

The AI Image Generation Problem

The solution required an innovative approach. We couldn’t just feed prompts into the model; we had to ensure these prompts were tailored to produce the best possible outcome every time. So, we integrated LLMs like ChatGPT to refine and structure the prompts for our image generator.

However, despite these efforts, the “no parameter” setting in these models does not inherently guide what to include, only what to exclude. This means they do not “understand” text how humans do (Midjourney No Parameter, n.d.). They give weight to each word, meaning our prompts must be precise and informative.

For instance, our approach has significantly enhanced the accuracy of image generation across a variety of scenarios. To clarify, two illustrative examples: Initially, when we requested images of french fries without ketchup (Image 1), the model persisted in including ketchup, revealing its limitations in understanding the text as humans do.

Image 1

AI Images

Prompt: french fries. without ketchup

A more intricate challenge arose with the idea of an innovative bagel (Image 2). Simply providing a basic prompt often resulted in ambiguity, particularly regarding whether the innovation pertained to the bagel dough or its filling.

Image 2

AI Images

Prompt: Ramp bagel. Plated

Two of the results are plain bagel and the other two show bagels as sandwiches.

Our AI Image Generation Solution: An LLM Innovation

To address these challenges, we explained to an LLM how to transform our ideas into precise, unambiguous prompts. This strategy enabled the image generation model to accurately produce visuals that matched our specific intentions, ranging from ketchup-free french fries (Image 3) to uniquely innovative bagels (Image 4).

These examples underscore the model’s occasional misinterpretation of our requests. A practical scenario further illustrating this point is when we suggest adding white chocolate to a cookie.

Image 3

AI Images

This image showcases the model’s ability to showcase realistic textures and variations in shape for a common food, and make it more interesting.

Image 4

AI Images

This image highlights the model’s capacity to create detailed textures for baked goods, balancing the filling inside and crispy exterior.

Image 5

AI Images

The model needs clear guidance on whether the white chocolate should flavor the cookie, serve as a topping, fill, or be integrated as chunks (Image 5 and Image 6).

Image 6

AI Images

Here, the model showcases its ability to change how the food looks based on prompts, where the white chocolate is integrated into the cookie itself.

Midjourney no parameter. (n.d.).

These refinements demonstrate our commitment to ensuring that every image precisely conveys our suggestions, enhancing user understanding and interaction with our content.

Looking Forward

Our work doesn’t stop here. The goal is to keep pushing the boundaries of what AI image generation can achieve, ensuring our images are realistic and portray what we want to show. We’re dedicated to continuous improvement, aiming always to surpass user expectations. In this case, we are just speaking about images but we have more interesting things coming down the line like videos and who knows what else?

In sum, having an LLM create our prompt for our AI image generation models is crucial to enhancing our visuals’ quality and realism. We need an excellent alignment between our text and images because it directly impacts user satisfaction and engagement.

What can food intelligence do for you?