The Art of Sushi and AI-Generated Images: Perfecting Skills Through Practice

Kaz Matsune
Jan 29, 2024

California Roll by Dall•E

I've been experimenting with ChatGPT and Dall•E, both from OpenAI. Dall•E's image generation is now built into ChatGPT, which is a significant step: I can create images without switching between platforms, because GPT-4 in ChatGPT generates an image whenever I ask for one. Here's a glimpse of what I've done so far.

Crafting Sushi Images with AI

For sushi imagery, what could be more fitting than the California Roll, the classic American inside-out roll? Here is the prompt I used:

“a realistic illustration of a California roll sushi, featuring an inside-out sushi roll with crab meat or imitation crab, avocado, and cucumber.”

GPT's sushi image isn't accurate, but it took only ten seconds to create, far faster than any human could manage. This first attempt shows that AI sushi image generation still needs refinement, so I began experimenting with different prompts to explore its potential further.

“a realistic illustration of a New York roll sushi, featuring an inside-out sushi roll with smoked salmon, apple, avocado, and cream cheese”

New York Roll by Dall•E

The New York roll typically has shrimp and avocado inside, with rice on the outside. Another variant resembles the Philly roll. Neither version has salmon on the outside, a detail the Dall•E image mistakenly added. Curious, I turned to the Rainbow roll next.

Rainbow Roll by Dall•E

The AI's rendition of a Rainbow roll was certainly colorful (like Skittles!) but not quite right.

The mysterious magenta (fish cake?), green (kabocha skin?), and yellow (mango, perhaps?) elements left me puzzled, as did the curious-looking fillings: lettuce, tofu or cream cheese-like objects, and candy cane-colored ingredients. Still, the illustration took only ten seconds to generate. The most striking feature was the whipped cream-style wasabi, a truly otherworldly interpretation by the AI.

I've noticed a pattern in the AI's sushi images:

GPT often includes multiple square fillings inside the rolls.
It portrays wasabi as if piped through a pastry bag with a wavy metal tip.

Curious about its understanding of more complex sushi, I asked GPT to illustrate a Dragon roll. Would the AI recognize that it typically consists of cooked unagi on top of a California or shrimp tempura roll?

Here's GPT's take on what a Dragon roll looks like:

Dragon Roll by Dall•E

As anticipated, GPT's interpretation of the Dragon Roll was quite literal – a whimsical, dragon-shaped creation!

Here is another one.

The AI's rendition of a realistic Dragon Roll was less appetizing than expected, featuring an actual dragon complete with sharp, hawk-like legs on the plate. It seems even the fish company has delved into the realm of mythical creatures!

Another one.

Why is Dall•E struggling to create accurate sushi images?

Alright, you've seen the examples. Now let's look at why Dall•E struggles to produce a more accurate Dragon roll. I asked ChatGPT, and here is its answer:

“The issue with inaccuracies in Dall•E generated images, especially concerning specific and detailed subjects like sushi, can be attributed to a few key factors:

Training Data Limitations:

Dall•E's training involves a vast but not exhaustive dataset of images and text. This dataset may not have enough examples of specific types of sushi, or may not represent them accurately. Thus, the AI might not have a precise understanding of every sushi variant.

Complexity of the Subject:

Sushi is a diverse and intricate culinary art. There are many types, each with unique ingredients, shapes, and presentation styles. Capturing these nuances accurately requires a deep, specialized understanding that Dall•E, a generalist tool, might not possess.

Text Interpretation Challenges:

The prompts given to Dall•E might be interpreted in various ways. If the prompt isn't exceedingly specific, Dall•E might generate images based on a broader or different interpretation of the text than intended.

Inherent AI Limitations:

AI, including Dall•E, sometimes struggles with tasks that require high fidelity to real-world objects and their accurate representations. This is particularly true for items that need precise details and proportions, like food.

Semantic Gap:

There is often a "semantic gap" between the user's expectations based on their expert knowledge (like yours in sushi) and the AI's interpretation and rendering of the prompt. What you consider a specific type of sushi might be represented differently in the AI's training data, leading to discrepancies.”

A Google search for “AI-generated sushi images” turned up some impressively realistic and accurate renditions. That confirmed current technology can produce precise sushi images, so I changed my approach and asked ChatGPT how to write better prompts.

How to prompt Dall•E to create more accurate sushi images

Me:

What are your recommendations and tips for writing a better prompt to generate sushi-related images using Dall•E or other AI tools?

ChatGPT:

Creating effective prompts for sushi-related images with AI image generators like Dall•E requires a combination of specificity, clarity, and creativity. Here are some recommendations and tips:

Be Specific:

Clearly describe the type of sushi you want to visualize. Include details such as the kind of sushi (e.g., nigiri, maki, sashimi), specific ingredients (e.g., salmon, avocado, rice), and any unique characteristics (e.g., topped with roe, wrapped in seaweed).

Use Descriptive Language:

Enhance the prompt with vivid, descriptive language. For example, instead of just saying "salmon sushi," you could say "fresh, bright orange salmon nigiri with a delicate slice of lemon and a small dab of wasabi."

Mention Presentation Style:

Describe how you want the sushi to be presented. Is it on a platter, a wooden sushi board, or perhaps in a bento box? Include details like garnishes, arrangement style, and even the type of plate or surface.

Consider Adding Context:

Sometimes providing a context or setting can help the AI generate more accurate images. For instance, you might say, "an elegant sushi arrangement ready for a high-end restaurant" or "a casual sushi meal for a family dinner."

Use Culinary Terms:

If you're familiar with specific culinary terms relevant to sushi, use them. This could include Japanese terms for certain fish, sushi styles, or preparation methods.

Balance Detail with Flexibility:

While specifics are good, too many details can confuse the AI. Aim for a balance where you provide enough information to guide the image generation but not so much that it restricts creativity.

Iterate and Experiment:

AI image generation can be unpredictable. Be prepared to iterate on your prompts, tweaking and adjusting based on the results you get.

Study Successful Examples:

Look at examples of well-crafted prompts and successful sushi images generated by AI. Analyzing these can give you insights into what works well.

Incorporate Feedback:

If you're working with a team or have access to sushi experts, get their feedback on the prompts and the generated images. They might offer valuable perspectives on making your prompts more effective.

Stay Updated:

AI technology is rapidly evolving. Stay informed about updates and improvements to Dall•E and other AI tools, as new features might enhance your ability to create more accurate sushi images.
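One way to apply these tips consistently is to treat a prompt as a set of named fields (sushi type, main ingredient, descriptive words, garnishes, presentation, context, things to leave out) and assemble them into a sentence. Below is a minimal sketch of that idea. The class name SushiPrompt, its fields, and the fixed opening "A realistic image of" are all hypothetical choices made for illustration. This is not part of Dall•E or any OpenAI library, and it only builds the text. Nothing is sent to an image generator.

```python
from dataclasses import dataclass, field


@dataclass
class SushiPrompt:
    """Hypothetical container for the prompt pieces the tips recommend."""
    kind: str                                              # sushi type: "nigiri", "maki", "sashimi"
    topping: str                                           # main ingredient, e.g. "salmon"
    descriptors: list[str] = field(default_factory=list)  # vivid, descriptive language
    garnishes: list[str] = field(default_factory=list)    # extra details on the piece
    presentation: str = ""                                 # platter, wooden board, bento box
    context: str = ""                                      # setting or occasion
    exclude: list[str] = field(default_factory=list)      # things the image should not show

    def render(self) -> str:
        # Subject first, e.g. "fresh, bright orange salmon nigiri".
        adjectives = ", ".join(self.descriptors)
        subject = " ".join(p for p in (adjectives, self.topping, self.kind) if p)
        sentence = f"A realistic image of {subject}"  # assumed fixed opening
        if self.garnishes:
            sentence += " with " + " and ".join(self.garnishes)
        if self.presentation:
            sentence += f", served on {self.presentation}"
        parts = [sentence + "."]
        if self.context:
            parts.append(f"Context: {self.context}.")
        if self.exclude:
            # Spell out exclusions explicitly, as in "no garnish, no wasabi".
            parts.append("No " + ", no ".join(self.exclude) + ".")
        return " ".join(parts)


salmon = SushiPrompt(
    kind="nigiri",
    topping="salmon",
    descriptors=["fresh", "bright orange"],
    garnishes=["a delicate slice of lemon", "a small dab of wasabi"],
    presentation="a wooden sushi board",
    context="an elegant sushi arrangement ready for a high-end restaurant",
)
print(salmon.render())
```

Keeping the fields separate makes it easy to change one detail at a time, which fits the "Iterate and Experiment" tip. Whether this structure actually produces better images is something you would still have to check by looking at the results.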

Based on these tips, here is the prompt I wrote:

Generate the image of "fresh, bright orange salmon nigiri with a delicate slice of lemon and a small dab of wasabi" in the style of "an elegant sushi arrangement ready for a high-end restaurant."

Salmon Nigiri by Dall•E

This is quite a departure from the result of my earlier, simpler prompt, “Create an illustration of salmon sushi.”

Now it’s time for tuna nigiri:

Generate the image of "fresh, bright red tuna nigiri, no garnish, no wasabi, no seaweed" in the style of "an elegant sushi arrangement ready for a high-end restaurant"
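The salmon and tuna prompts share the same template. Only the quoted subject changes, and the "in the style of" phrase stays fixed. The short sketch below captures that pattern so the next variation (mackerel, yellowtail, and so on) can reuse it. STYLE and styled_prompt are hypothetical names used only for illustration. The function just formats text and does not call Dall•E.

```python
# The style phrase reused for both nigiri prompts above.
STYLE = "an elegant sushi arrangement ready for a high-end restaurant"


def styled_prompt(subject: str, style: str = STYLE) -> str:
    """Hypothetical helper: wrap a subject in the 'in the style of' template."""
    return f'Generate the image of "{subject}" in the style of "{style}"'


# Change only the subject between attempts and keep the style constant,
# so differences in the results can be traced to the subject wording.
subjects = [
    "fresh, bright orange salmon nigiri with a delicate slice of lemon "
    "and a small dab of wasabi",
    "fresh, bright red tuna nigiri, no garnish, no wasabi, no seaweed",
]

for subject in subjects:
    print(styled_prompt(subject))
```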

Tuna Nigiri by Dall•E

This is the best image Dall•E has produced so far, showcasing the importance of precise prompts.

The key takeaway?

It mirrors the human learning process: trial and error. When one approach fails, try something new, analyze the result, and refine it. It's a continuous learning journey, and as with any skill, practice makes perfect.

Sushi making follows the same principle.

If you're keen on mastering the basics of sushi, consider joining our public sushi class. For a more tailored experience, you can also inquire about private, corporate team-building events.