A few months back, I came up with an idea for testing AI image generators: creating a mood meter graphic that teachers could use in their classrooms. I thought this would be a good way to test how well AI image generators create something with several different elements that also look visually cohesive.
At the time, I wrote this note:
I doubt any of the current AI image generators could make something like this today, especially with the text in it. The snowmen would not look visually consistent like this. And that visual consistency is important because you want the viewer to focus on the snowman’s expression, not the little differences that might appear between each individual snowman, like the number of berries, the length of its carrot, or the look of its hat.
That note references this free mood meter posted on Teachers Pay Teachers. Created in 2000, it keeps every element of the snowmen identical except for the mouths and eyebrows. Duplicating the snowman and changing only those two small elements helped the artist produce it quickly. Can AI generate a mood meter with that same degree of stylistic continuity?
Here is the draft prompt I wrote back in December to test that ability:
Please create a “How are you feeling today?” chart. The chart should feature a 3×3 grid of nine snowman faces expressing the following emotions:
- excited
- upset
- surprised
- frustrated
- happy
- mad
- worried
- sad
- something else
The snowman faces should be simple cartoon style that kids would like.
I managed to try this prompt on Dall-E 3 using ChatGPT back then, and here is the output.
OpenAI’s Dall-E 3 couldn’t follow the instructions to produce a classroom-ready poster. There were 16 snowmen instead of nine. There were no labels for each mood. And the snowmen weren’t visually consistent, either: hat styles and colors, noses, eyes, and mouths all differed. Maybe I could have gotten better output by experimenting with the prompt, but teachers don’t have time for that. AI tools shouldn’t add to teacher workloads. An ideal image generator should work on the first try, though even the best creative tools require some refinement to get the output just right. After this failure, I got sick and didn’t get a chance to try other image generators like Midjourney or Google’s Gemini.
A breakthrough
When OpenAI announced GPT-4o image generation a few weeks ago, I remembered this idea and immediately tested the original prompt.
Though rather plain, the mood meter is nearly perfect. The only quibble I have is with the sad snowman’s nose. It’s not a carrot or orange like the others.
The power of this new image generator isn’t just its ability to create a usable image on the first try. ChatGPT-4o also shines when you want to tweak the image. Photoshop isn’t required anymore; plain language is the only skill you need.
“Please add some red holly berries and a holly leaf to the snowman’s hat” produced this image.
The quality of the image has degraded somewhat. The berries aren’t round, the text isn’t crisp, and “UPSET” now reads “UPBET.” But my next prompt fixed all that.
“Please make the hat a lighter shade of blue with a dark blue band at the bottom of the hat’s crown.”
The text is crisp, the holly berries are round, and the holly leaves look better, too. What I think happened is that ChatGPT-4o regenerated the text so that its color would match the hat’s. Another unrequested improvement is the snowman’s face is now white, standing out better against the light beige background.
ChatGPT-4o’s image generator still has room to improve, though. Sometimes it stubbornly refuses to make minor changes. I tried prompting it to make the SAD snowman’s nose match the others, but it wouldn’t change. So while you might need some basic photo editing skills for a bit longer, OpenAI’s image generation is already a massive improvement. It’s genuinely useful because teachers can now make their classrooms more dynamic with far less effort. They can generate new mood meters to suit the seasons, holidays, or the unit they’re currently studying in class.
Testing other image generators
Having seen ChatGPT-4o’s strengths, let’s see how five other popular AI image generators stack up. I gave each one the same prompt. While tweaking prompts might squeeze out some improvement in the images, that’s not the goal here. I want to find out how useful AI image generators are in practice, which means giving them only one shot to prove their abilities. Truly useful tools should minimize teacher tinkering.
Claude Sonnet 3.7
Anthropic’s Claude doesn’t have a native image generator. However, it’s fantastic at generating code, so its workaround is to generate images as SVG files, which describe images as geometric shapes and paths rather than as pixels. An advantage of SVG files is that they can be enlarged to poster size without getting blurry. The downside is they don’t look kid-friendly or “realistic.” Claude does an admirable job with the tools it has available, but I doubt teachers would opt for this style unless they teach coding.
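To see why this approach naturally produces consistent snowmen, here is a hypothetical sketch (my own illustration, not Claude’s actual output): a Python function that assembles an SVG face from a few geometric primitives, varying only the mouth per emotion. Reuse of the same template is what guarantees the consistency the hand-made mood meter achieved by duplication.

```python
# Hypothetical sketch: compose a simple SVG snowman face from geometric
# primitives (circles for the head and eyes, a polygon for the nose).
# Scaling an SVG only rescales these shapes, so it stays sharp at poster size.

def snowman_face(mouth_path: str) -> str:
    """Return an SVG string; only the mouth path varies per emotion."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        '<circle cx="50" cy="50" r="45" fill="white" stroke="black"/>'  # head
        '<circle cx="35" cy="40" r="4" fill="black"/>'                  # left eye
        '<circle cx="65" cy="40" r="4" fill="black"/>'                  # right eye
        '<polygon points="50,48 50,58 68,53" fill="orange"/>'           # carrot nose
        f'<path d="{mouth_path}" stroke="black" fill="none" stroke-width="3"/>'
        '</svg>'
    )

# Every face shares the identical template; only the mouth curve differs.
happy = snowman_face("M 35 65 Q 50 78 65 65")  # upward curve
sad = snowman_face("M 35 72 Q 50 60 65 72")    # downward curve
```

Nine such faces dropped into a 3×3 grid would differ only in their mouths, which is exactly the kind of output Claude’s code-based approach is suited for.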
Midjourney
Midjourney is renowned for generating stunning images that are highly detailed and stylistic. However, poster-type graphics are not its strong suit. While it generally followed the instructions and generated a 3×3 grid, it couldn’t add text labels. Nor were the facial expressions distinct enough for a teacher to tell which emotion each snowman was expressing and add labels manually. I did notice, however, that Midjourney’s new V7 model produces more stylistically cohesive grids than V6.1.
Grok
xAI’s image generator couldn’t produce a 3×3 grid or text labels. It’s one of the least capable image generators at this point, though xAI is advancing rapidly.
Gemini
Google’s Gemini, using the Imagen 3 generator, is almost there. However, I first had to edit the prompt’s first sentence to, “Please create an image of a ‘How are you feeling today?’ chart.” If you don’t explicitly request an image, Gemini generates an ASCII “chart” or table of text and emojis. With the tweaked prompt, the style of the snowmen is fairly consistent, and the 3×3 grid was no problem. But the expressions aren’t good; they’re a bit creepy. Gemini created text labels, but some are garbled or placed in the wrong location, and they don’t match the expressions.
Canva
Canva is very popular in schools, especially because it makes graphic design more accessible to teachers and students. Dream Lab, its AI image generation tool, uses Leonardo’s Phoenix model. The story is much the same as with the other less sophisticated image generators: it can’t follow the instructions or reliably label the emotions, and the quality and consistency of the art rank near the bottom. Canva also offers image generation with Google’s Imagen 3 and OpenAI’s Dall-E 3, but we’ve already seen those aren’t ready for primetime. I recommend waiting until Canva integrates OpenAI’s new ChatGPT-4o generator.
Discussion
How did ChatGPT-4o get so good?
So what’s going on here? How did OpenAI pull ahead and unlock this level of control and text quality when generating images? Ethan Mollick’s recent post has a pretty good explanation of this multimodal image generation. Basically, this is the first time that a large language model (LLM) is generating the image. Before, ChatGPT would create the prompt and then pass it to a diffusion model (Dall-E 3) to produce the image. Diffusion models only do one thing: generate images. Midjourney and Stable Diffusion are other examples of diffusion models. Their weakness is that they don’t grasp language as well as LLMs, which makes their output difficult to control precisely.
OpenAI’s new generator works more like how it generates text: one token (word fragment) at a time, or in the case of images, one element at a time. Then it synthesizes all those elements into the final image, proceeding from left to right and top to bottom. In practice, this means you have more control over the details of an image. You can request changes to particular elements in the image without worrying about other parts of the image changing in unintended ways.
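The left-to-right, top-to-bottom idea can be sketched with a toy loop (purely illustrative; not OpenAI’s actual architecture): each “patch” of the image is emitted in reading order, conditioned on everything already placed, which is what makes targeted edits to one region possible.

```python
# Illustrative toy: a 3x3 grid of "patches" standing in for image tokens.
# A real autoregressive model conditions each new token on all prior ones;
# here the "context" is simply how many patches have been generated so far.

def autoregressive_fill(rows: int, cols: int) -> list[str]:
    """Emit patches left to right, top to bottom, in reading order."""
    patches = []
    for r in range(rows):
        for c in range(cols):
            context = len(patches)  # stand-in for conditioning on prior tokens
            patches.append(f"patch(r={r},c={c},seen={context})")
    return patches

grid = autoregressive_fill(3, 3)
# The first patch is generated with no context; the last has seen all eight
# others. A diffusion model, by contrast, refines the whole canvas at once,
# so a small change can ripple across the entire image.
```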
I realize this was a bit of a technical tangent, but the takeaway is that it probably won’t be long before Gemini, Grok, and maybe Claude add this capability and catch up to ChatGPT.
Creating your own AI tests
Developing good AI skills requires experimentation. And knowing how AI fails may be more important than knowing when it succeeds. Creating tests like mine helps you develop intuition for AI models and learn their jagged frontiers. AI is advancing so rapidly that when a test doesn’t quite work today, chances are it will within a matter of months. Less than four months passed between my initial and successful mood meter tests in ChatGPT. I suspect Gemini and Grok will pass this test before the end of 2025.
Begin creating your tests by thinking of ways generative AI can help you get work done. Then create prompts to see how well each AI performs. Does it work or not? That’s all the evaluation you need. If the prompt almost works, but not quite, set a reminder to test it again when the next versions of ChatGPT, Gemini, Claude, etc. become available.
You don’t have to use AI
Remember that just because you can generate a mood meter with AI, it doesn’t mean you have to. This post is more about learning to discover use cases rather than advocating for AI-generated mood meters. Students crafting their own mood meters with good old-fashioned crayons and paper can have a much bigger impact because they’re getting an opportunity to meaningfully contribute to the class. So while it’s important to look for where AI may be useful, try to consider the students first. A little extra work planning and prepping an art project for them would probably brighten their day more than letting a machine decorate the classroom.
The leader (for now)
ChatGPT is the only AI model capable of producing a classroom-ready mood meter in one shot. But as this example shows, don’t be discouraged if your chatbot of choice can’t yet do this or any other task you have in mind. The tech is advancing so fast that your ideal use case could be viable in a matter of months or even weeks. Keep experimenting to discover new use cases you hadn’t thought of to save time or spark creativity. At the same time, remember that we work with curious little humans who crave connection with us and their peers. Strive to ensure those interactions remain front and center over the race for efficiency and productivity.