Test Driving ChatGPT-4o (Part 3)

Image Transformation Using Conceptual Opposites

In this series, I test drive OpenAI’s multimodal ChatGPT-4o.

For part 1, click here.

For part 2, click here.

Today I experiment with GPT-4o’s image transformation capabilities.

Can it understand an image and generate the conceptual opposite?

Problem Statement and Solution

I’ll give ChatGPT-4o an image of a red traffic light.

The conceptual opposite is a green traffic light.

But to arrive at this answer, ChatGPT-4o would need to demonstrate that it correctly understands:

  • the meaning of a red traffic light (i.e. stop)

  • that the opposite of “stop” is “go”

  • that the concept “go” often corresponds to the color green

… and finally generate an image of a green traffic light.

Can ChatGPT-4o handle concepts and abstract relationships?

Overview of Experiments

Overall, I am trying to understand:

  • GPT-4o’s multimodal ability (image-to-image)

  • does chain-of-thought help? (image-to-text-to-image)

  • does the specific term I use make a difference?

    • opposite

    • antonym

    • inverse

Here are the definitions from Dictionary.com:

Source: Dictionary.com

Here are my varied experiments:

  1. Image to Image — Opposite

  2. Image to Image — Inverse

  3. Image to Image — Antonym

  4. Chain of Thought — Opposite

  5. Chain of Thought — Inverse

  6. Chain of Thought — Antonym

The Chain of Thought experiments move from image to text and back to image, testing GPT-4o’s ability to maintain a conceptual thread.
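If you want to reproduce this image-to-text-to-image pipeline outside the ChatGPT interface, here is a minimal sketch using the OpenAI Python SDK. It is an approximation, not my exact setup: inside ChatGPT, GPT-4o hands the description to the image generator itself, and the filename, prompt wording, and the choice of “opposite” below are placeholders you can swap for “inverse” or “antonym”.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Encode the local stoplight image as a data URL for the chat API.
with open("red_stoplight.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Step 1: ask GPT-4o to describe the image.
description = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
).choices[0].message.content

# Step 2: ask for the opposite of that description (swap in "inverse" or "antonym").
opposite = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Rewrite this description so it is the opposite:\n\n{description}",
    }],
).choices[0].message.content

# Step 3: generate a new image from the transformed description.
image = client.images.generate(model="dall-e-3", prompt=opposite)
print(image.data[0].url)
```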

Take a guess — which variations will get it right? 🤔 

1. Image to Image — Opposite

First, I give GPT-4o the red stoplight image and ask it:

Produce an image that is the opposite of it.

Sabrina Ramonov @ sabrina.dev

Interesting…

GPT-4o created an “opposite configuration” traffic light with:

  • yellow lit at the top

  • what looks like an unlit yellow light in the middle?

  • green lit at the bottom

Its interpretation of “opposite configuration” involved turning on the other colored lights and replacing the top red light with a yellow one.

2. Image to Image — Inverse

Second, I give GPT-4o the red stoplight image and ask it:

Produce an image that is the inverse of it.

Sabrina Ramonov @ sabrina.dev

Well, that was unexpected.

Rather than creating a visual inverse of a red traffic light, ChatGPT-4o generated Python code using the Pillow library to invert the colors of the image!
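For reference, pixel-wise color inversion with Pillow typically looks something like this. The snippet below is my reconstruction of the kind of code GPT-4o produced, not its exact output, and the filename is a placeholder:

```python
from PIL import Image, ImageOps

# Load the stoplight image and invert its RGB channels
# (red becomes cyan, white becomes black, and so on).
img = Image.open("red_stoplight.png").convert("RGB")
inverted = ImageOps.invert(img)
inverted.save("red_stoplight_inverted.png")
```

In other words, a literal photographic negative of the picture, not a conceptual opposite.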

How did GPT-4o take the leap from a conceptual image generation task to a programming solution? 🤷‍♀️ 

Compared to the previous experiment and prompt:

All I did was replace the word “opposite” with the word “inverse”.

That single-word change led to a completely different interpretation of the prompt and, ultimately, a very different output I didn’t ask for — Python code!

Perhaps ChatGPT misunderstood “inverse” in a computer vision context?

3. Image to Image — Antonym

Third, I give GPT-4o the red stoplight image and ask it:

Produce an image that is the antonym of it.

Sabrina Ramonov @ sabrina.dev

It seems this task will be much harder than I thought it would be…

The only change in my prompt:

I used the term antonym instead of opposite or inverse.

ChatGPT-4o generated a stoplight with only 2 lights, both green lit, and one light says “GO” in green.

I suppose this new stoplight demonstrates multiple “opposite” traits:

  • input stoplight has no words → new stoplight has word “GO”

  • input stoplight has 1 light lit → new stoplight has 2 lights lit

  • input stoplight has red lit → new stoplight has green lit

  • input stoplight has red and yellow lights → new stoplight only has green

It does seem like GPT-4o is analyzing the image’s details, creatively interpreting certain characteristics and reversing them: the stoplight’s colors, the number of lights lit, which lights are lit, and the absence of text.

4. Chain of Thought — Opposite

Fourth, I give GPT-4o the red stoplight image and apply prompt engineering, asking it to first describe the image, then generate the opposite textual description, and finally generate an image from that description.

This prompt engineering technique is called Chain-of-Thought.

It generally enhances ChatGPT’s performance on logic and reasoning tasks by requiring it to explain intermediate steps leading to an answer.

To experts reading this: I know this isn’t the “canonical” example of Chain-of-Thought, but it seems like this step-by-step process falls into the category.

By applying Chain-of-Thought, my hope is that it will help GPT-4o start from the concept of a stoplight and reverse it conceptually before generating a new image.
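The prompt followed roughly this three-step structure (my paraphrase, not the verbatim wording):

  1. Describe the image in detail.

  2. Write a textual description that is the opposite of that description.

  3. Generate an image from the opposite description.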

Sabrina Ramonov @ sabrina.dev

Now we’re getting somewhere!

Chain-of-Thought prompt engineering to the rescue…

GPT-4o generates a stoplight with two lit green lights at the bottom.

While not correct, it feels directionally promising compared to past tests.

First, it describes the image correctly:

The traffic light has three circular lights arranged vertically within a yellow casing: the top light is red and lit, while the middle and bottom lights are dark and unlit.

Sabrina Ramonov @ sabrina.dev

Then it generates an opposite textual description:

…top light is green and lit, while the middle and bottom lights are unlit.

Sabrina Ramonov @ sabrina.dev

Not exactly the conceptual opposite of a stoplight, but it makes sense when perceived as the opposite of the given image.

ChatGPT-4o keeps everything the same, except it converts the lit top red light into a lit top green light.

But when ChatGPT-4o tries to make an image from the opposite textual description, things go quite wrong!

The image does NOT depict a stoplight with the top light lit green and the 2 bottom lights unlit. Instead, the image shows a stoplight with the 2 bottom lights lit green!

Weird.

GPT-4o failed to create an image from a direct and straightforward description.

5. Chain of Thought — Inverse

Fifth, I give GPT-4o the red stoplight image and ask it to follow the same step-by-step process.

The only change in the prompt — I replace opposite with the term inverse.

Sabrina Ramonov @ sabrina.dev

Wow, almost there!

The inverse textual description is right:

The light has three circular lenses: red at the top, yellow in the middle, and green at the bottom. The green light is illuminated, while the red and yellow lights are off.

Sabrina Ramonov @ sabrina.dev

However, the image generated from the inverse textual description is wrong:

The lights are in the right order, but all lights are lit.

The weirdness continues:

The inverse textual description is correct, clear, and straightforward.

Yet, GPT-4o struggles to convert the details into image form.

6. Chain of Thought — Antonym

Sixth, I give GPT-4o the red stoplight image and again ask it to perform the step-by-step process.

The only change in the prompt — I use the term antonym.

Sabrina Ramonov @ sabrina.dev

Woohoo! 🥳 

GPT-4o produced an accurate “antonym” description and an accurate image using that description.

Interestingly, the antonym textual description is sparse and less detailed:

The traffic light is currently displaying a green signal, indicating that vehicles may go.

Sabrina Ramonov @ sabrina.dev

Recall that the previous 2 descriptions had details about:

  • number of lights

  • the configuration and order of the lights

  • which lights were on and off

From this antonym description, GPT-4o finally generated the conceptual opposite of a red stoplight — a green stoplight!

Conclusion

Under the hood, GPT-4o uses DALL-E to generate images, so it would be interesting to see the text description actually passed to it.
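One way to peek at this, at least when calling DALL-E 3 directly through the API rather than through ChatGPT, is the revised_prompt field returned with each generated image. A minimal sketch, reusing the winning antonym description as the prompt:

```python
from openai import OpenAI

client = OpenAI()

# Generate the "antonym" image directly with DALL-E 3 and inspect
# the rewritten prompt the model actually rendered.
result = client.images.generate(
    model="dall-e-3",
    prompt="The traffic light is currently displaying a green signal, indicating that vehicles may go.",
    size="1024x1024",
)

print(result.data[0].revised_prompt)  # the text DALL-E 3 actually used
print(result.data[0].url)             # link to the generated image
```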

Due to the probabilistic nature of LLMs, you might get different results if you run each prompt more than once (my sample size here is N=1).

Also interesting… the concept “antonym” is applied to the image background:

Sunny → cloudy.

I wonder if ChatGPT treats the background and the stoplight as separate parts of the image, applying the “antonym” transformation piecewise?

Altogether:

Replacing ONE word in the prompt with a close synonym significantly impacted the output.

Applying Chain of Thought prompt engineering substantially helped ChatGPT produce more “reasonable” answers.

The Winner?

The term “Antonym” with Chain of Thought prompt engineering! 🎆  

What did you guess?

Can you beat the winning prompt?

This concludes part 3 of this series Test Driving ChatGPT-4o!

For part 1, click here.

For part 2, click here.