Test Driving ChatGPT-4o (Part 1)

D&D Story, Real-Time Data, and Problem Solving

In this series, I test-drive OpenAI’s new GPT-4o model.

But first…

What is GPT-4o?

Introduction to GPT-4o

OpenAI's 2024 Spring Update Event introduced their new GPT-4o model.

Its key advancements:

Multimodal

GPT-4o integrates text, vision, and audio processing in a single model, enhancing the user experience with more natural transitions between tasks.

It can ingest and generate any combination of text, images, and audio.

“o” stands for omni — which means “all” (modes).

Conversational Capability

GPT-4o offers improved conversational interaction, bringing to mind the movie Her.

This includes:

  • handling live conversation smoothly

  • being interrupted while speaking

  • adjusting course in real-time

  • changing tonality

Live Voice Translation

The live translation feature allows real-time, voice-to-voice communication across languages, much like a human interpreter.

This feature is still under development, with a limited rollout planned.

Accessibility

Demonstrations highlighted GPT-4o's potential as an accessibility tool, assisting visually impaired users by vocally describing visual content.

Test: Dungeons & Dragons Story

First, I tried to recreate the OpenAI demo showcasing varying levels of dramatic voice and tones.

I opened GPT-4o on my phone, just like the OpenAI demo.

Then, I requested a Dungeons and Dragons story — my own personal DM*!

*DM stands for Dungeon Master, i.e. the heavy metal guy in Stranger Things.

ChatGPT tell me a dungeons and dragons story in a dramatic voice

Sabrina Ramonov @ sabrina.dev

Off to a great start…

The story seemed convincing, detailed, and D&D themed — we’ve got a dwarf, elf, mage, rogue, and buzzwords like dragon, forbidden, and ancestors.

But I couldn’t interrupt ChatGPT once it got going.

I tried speaking multiple times, but it would keep rambling.

In a real conversation, I’d expect to be able to interrupt.

I had to tap the screen to make it stop.

Then I asked:

Continue the story but with a really dramatic voice

But I didn’t notice any difference in tone or voice!

Next, I asked ChatGPT to use its maximum dramatic voice.

But it didn’t seem to detect my request.

ChatGPT kept listening, waiting, listening, waiting...

I could see its microphone input spike when I talked, presumably indicating it could hear my voice.

I tried multiple times, eventually getting it to continue:

Unfortunately, I still didn’t hear any change in voice or tone!

After some googling, I was extremely disappointed to learn:

Public GPT-4o only supports text and vision, not audio or video!

Lame!

+1 to Google search for working as expected.

What I find strange and misleading, however, is what happens when I ask GPT-4o which accents it can speak in:

I asked ChatGPT to tell me a story in a British accent.

But it started speaking in a strange accent that is definitely NOT British.

I’ve watched enough Sherlock and BBC to tell!

It had a slight accent, but I couldn’t place it at all.

Overall, a weird experience, but I appreciated the detailed fictional stories ChatGPT generated with minimal prompt engineering.

Test: Real-Time Data in Casual Conversation

Next, I tested GPT-4o’s virtual assistant capabilities.

Instead of asking for a long story, my questions were short and straightforward:

What’s the weather today in Park City?

I heard ChatGPT “typing” as it looked up the weather.

It was correct — sunny and perfect. Easy enough.

Follow-up question:

What are the top 3 movies out right now?

But its first answer, Furiosa, hasn’t been released yet.

Not near me, anyway, until next week, May 23.

So I clarified:

What are the top 3 movies out today that I can watch near me?

Yay!

It remembered the earlier context from when I asked about the weather in Park City.

It used that context to search for movies showing in Park City.

Its first answer was Fall Guy, which is indeed available.

However, its second answer was again Furiosa, which isn’t available anywhere near me!

Ugh…

I felt disappointed that I couldn’t rely on its answers as a virtual assistant.

This broke the user experience for me.

Test: Image of Math Problem

As a multimodality test, I fed in an image of a math problem.

I pulled a question from the list of The 15 Hardest SAT Math Questions.

To solve this question, GPT-4o must:

  • analyze the image

  • interpret the problem to be solved

  • correctly extract the structure’s dimensions

I uploaded the image and simply asked ChatGPT:

Solve this math problem. Explain your reasoning step-by-step.

Impressive!

The correct answer is indeed D (1047.2 cubic feet).

GPT-4o analyzed the image to extract the math problem being asked and the variables needed to solve it. It broke down each part of the volume calculation separately, later combining them to arrive at the right answer.
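
For readers who want to check the arithmetic themselves, here is a minimal Python sketch of that same "compute each part, then combine" approach. It assumes the question is the grain silo problem: a cylinder capped by two identical cones, with a radius of 5 ft, a cylinder height of 10 ft, and a cone height of 5 ft. Those dimensions are not stated in the text above, so treat them (and every name in the code) as assumptions rather than as GPT-4o's actual working.

```python
import math


def cylinder_volume(radius, height):
    """Volume of a right circular cylinder: pi * r^2 * h."""
    return math.pi * radius ** 2 * height


def cone_volume(radius, height):
    """Volume of a right circular cone: (1/3) * pi * r^2 * h."""
    return math.pi * radius ** 2 * height / 3


def silo_volume(radius, cylinder_height, cone_height):
    """One cylinder plus two identical cones sharing the same radius."""
    return cylinder_volume(radius, cylinder_height) + 2 * cone_volume(radius, cone_height)


# ASSUMED dimensions (in feet); they do not appear in the article text.
RADIUS_FT = 5.0
CYLINDER_HEIGHT_FT = 10.0
CONE_HEIGHT_FT = 5.0

total = silo_volume(RADIUS_FT, CYLINDER_HEIGHT_FT, CONE_HEIGHT_FT)
print(f"{total:.1f} cubic feet")
```

Under those assumed dimensions the total works out to 1000π/3, which rounds to the 1047.2 cubic feet of answer D.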

Geez, that does seem like a hard SAT question…

To play skeptic:

It is possible that this question, image, and explanation are part of ChatGPT’s training dataset, since I pulled it from a top-ranked Google search result.

To take things a step further, I should whiteboard some math questions ChatGPT has (hopefully) never seen before and see if it can solve them.

Stay tuned as I continue testing GPT-4o in this newsletter series.

Impatiently waiting for more modalities to be released…

DM me if you have ideas you want me to test: