- Sabrina Ramonov

# Can ChatGPT Solve this Math Riddle?

## Testing ChatGPT on a High School Math Puzzle

It’s a beautiful Saturday morning, perfect for ChatGPT and math puzzles!

# Problem Statement

Here’s the problem statement:

(we’re going to ask ChatGPT to solve the puzzle above)

But first, let’s talk about the solution…

When you get to **sin(Pi)**, do you assume it’s a **variable** or a **value**?

You’ve probably never seen a variable named Pi.

You’d treat Pi as a value.

This makes sin(Pi) = 0 and therefore the whole product equals zero.

**The correct answer is 0.**
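Here is a quick numeric check of that argument, as a minimal sketch. The puzzle image isn't reproduced in this text, so the code assumes (based on the Greek-alphabet expansion described later in this post) that the product runs over `sin` of every Greek letter from alpha to omega. The values for the other letters are arbitrary placeholders, and `GREEK_LETTERS` and `riddle_product` are names made up for illustration.

```python
import math
import random

# Assumption: the puzzle multiplies sin(letter) over the 24 Greek letters.
GREEK_LETTERS = [
    "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
    "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi",
    "rho", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
]


def riddle_product(angles):
    """Multiply sin(angle) over every entry in the mapping."""
    product = 1.0
    for value in angles.values():
        product *= math.sin(value)
    return product


# Every letter except pi is an unknown variable; random values stand in for them.
rng = random.Random(0)
angles = {name: rng.uniform(-10.0, 10.0) for name in GREEK_LETTERS}
angles["pi"] = math.pi  # pi is a value, not a variable

product = riddle_product(angles)

# Floating point gives sin(pi) of roughly 1.2e-16 rather than exactly 0,
# so compare against a small tolerance instead of testing product == 0.
print(math.isclose(product, 0.0, abs_tol=1e-12))
```

Whatever the other angles are, each factor has magnitude at most 1, so the single `sin(pi)` factor pins the whole product at (numerically) zero.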

But can ChatGPT figure it out?

**Here’s a YouTube version of this post:**

# Naive Prompt

Fail.

ChatGPT can’t answer the problem without specific values for the angles.

# Chain of Thought

Similarly, ChatGPT insists it can’t solve the problem without specific values for the angles.

# Plan and Execute

What if I ask ChatGPT to create a plan first, then follow the plan?

Still no luck!

# Agents

Here I try pseudo-agents.

Not really agents, but agents in spirit.

I ask ChatGPT to solve the problem, review the solution (i.e. give feedback), then solve it again using the “feedback” from step 2.
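The three steps can be sketched as a small loop. This is only an illustration of the pattern, not the exact prompts I used: `ask` is a hypothetical callable that sends one prompt to whatever chat model you use and returns its reply, and the prompt wording inside the function is assumed.

```python
from typing import Callable


def solve_review_revise(ask: Callable[[str], str], problem: str) -> str:
    """Solve, review the solution, then solve again using the review."""
    # Step 1: first attempt at the problem.
    draft = ask(f"Solve the following problem:\n{problem}")

    # Step 2: ask for a critique of that attempt (hypothetical wording).
    feedback = ask(
        "You are a professional mathematician. "
        "Review this solution and give feedback.\n"
        f"Problem:\n{problem}\n"
        f"Solution:\n{draft}"
    )

    # Step 3: solve again, this time with the feedback in the prompt.
    return ask(
        "Solve the problem again, taking the feedback into account.\n"
        f"Problem:\n{problem}\n"
        f"Previous solution:\n{draft}\n"
        f"Feedback:\n{feedback}"
    )


# Usage with a stand-in model that only records what it was asked.
calls = []


def fake_ask(prompt: str) -> str:
    calls.append(prompt)
    return f"reply {len(calls)}"


final = solve_review_revise(fake_ask, "<puzzle text goes here>")
print(len(calls), final)
```

Passing `ask` in as a parameter keeps the sketch independent of any particular API client; swap `fake_ask` for a real call to try it.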

Honestly, surprised that didn’t work better!

The professional mathematician’s review didn’t add much value. Overthinking it, ChatGPT!

# Give a Hint

Now I nudge ChatGPT to think about each multiplier in the problem statement and whether any multiplier is zero. This is a pretty big hint:

It seemed promising:

**“So, we should check if any of the angles is a multiple of pi. If at least one angle in the set … is a multiple of pi, then… [answer is] 0”**

But the answer just stopped there.

ChatGPT failed to proceed to check if one of the angles is a multiple of pi.

# Give Another Hint

Let’s make it even more explicit…

Hmm…

Why isn’t ChatGPT writing out each multiplier, like I ask it to?

# Force Expansion

Ok, I’m going to be as explicit as possible.

ChatGPT: write out each multiplier, without ellipsis, then solve the problem.

I throw in Chain of Thought too.
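To see why writing out every multiplier matters, here is a minimal sketch of the expansion itself. It assumes the puzzle covers all 24 Greek letters from alpha to omega (the original image isn't reproduced here); `GREEK_LETTERS` and `expand_product` are hypothetical names for illustration.

```python
# Assumption: the elided product runs over the full Greek alphabet.
GREEK_LETTERS = [
    "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
    "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi",
    "rho", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
]


def expand_product(letters, func="sin"):
    """Write out every factor explicitly, with no ellipsis."""
    return " * ".join(f"{func}({name})" for name in letters)


expanded = expand_product(GREEK_LETTERS)
print(expanded)

# Once every factor is spelled out, the sin(pi) factor is visible.
print("contains sin(pi):", "sin(pi)" in expanded)
```

With the ellipsis in place, the `sin(pi)` factor is hidden in the middle of the sequence; the expansion puts it on the page where the model can notice it.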

**This is the only prompt I’ve tried so far that gives both the correct answer and correct explanation:**

ChatGPT listed out the Greek alphabet.

Then, ChatGPT explicitly wrote out the product.

It noticed **pi is one of the angles in our product.**

**Hence, the product … is zero.**

Nice work, ChatGPT!

Really glad I won’t have to sit here all day 😫

**BUT it feels like I had to give overly specific instructions** for ChatGPT to finally get it. I don’t want to have to do that.

# Easier “Solution”: Give More Context

Seeking a shortcut with less handholding, I try giving ChatGPT some context:

**“Solve the following riddle”**

The answer is correct, but the reasoning is flawed:

**“omega, which is a multiple of pi”**

I don’t know why it assumes that.

But at least the answer is correct, with minimal hand-holding from me!

My hypothesis:

Using “riddle” makes ChatGPT more certain that there is a definitive answer.

**I run it again:**

Correct answer and correct explanation!

**Again:**

Correct answer and, I think, correct explanation. ChatGPT says **“pi is typically included in such series to indicate completeness”**, which is a bit vague. But I’ll take it.

Out of 3 runs with the “riddle” context, ChatGPT got the answer right 3 times.

But its reasoning was a bit shaky on that first run: **“omega, which is a multiple of pi”**.

It seems like contextualizing this problem as a “riddle” pushes ChatGPT to produce a specific answer rather than a generic solution.

# Changing “Riddle” to “Problem”

Now I change ONE word.

“Riddle” to “problem”.

Back to wrong, generic answers.
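If you want to repeat this comparison yourself, a small harness makes the one-word change explicit and counts correct runs. This is a sketch under assumptions: `ask` is a hypothetical callable wrapping your chat model, `is_correct` is whatever check you trust for a reply, and the stand-in model below always answers 0, so it says nothing about how ChatGPT actually behaves.

```python
from typing import Callable


def framed_prompt(noun: str, puzzle: str) -> str:
    """Build the prompt; only the framing word changes between variants."""
    return f"Solve the following {noun}:\n{puzzle}"


def count_correct(
    ask: Callable[[str], str],
    prompt: str,
    is_correct: Callable[[str], bool],
    runs: int = 3,
) -> int:
    """Send the same prompt `runs` times and count the correct replies."""
    return sum(1 for _ in range(runs) if is_correct(ask(prompt)))


puzzle = "<puzzle text goes here>"
riddle_prompt = framed_prompt("riddle", puzzle)
problem_prompt = framed_prompt("problem", puzzle)

# The two prompts differ in exactly one word.
changed = [
    (a, b)
    for a, b in zip(riddle_prompt.split(), problem_prompt.split())
    if a != b
]
print(changed)


# Stand-in model for demonstration only; replace with a real model call.
def fake_ask(prompt: str) -> str:
    return "The answer is 0."


score = count_correct(fake_ask, riddle_prompt, lambda reply: "0" in reply)
print(score)
```

Three runs per framing is a tiny sample, so treat any difference as a hint rather than a measurement.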

# Keep the Riddle, Change SIN to COS

I’m curious what effect the word “riddle” has.

Let’s change `sin` to `cos` and see what happens.

The answer is no longer 0: cos(Pi) = -1, so no factor is guaranteed to vanish.
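A short numeric sketch shows the difference between the two versions. The three "unknown" angles in each list are arbitrary placeholder values standing in for the other Greek letters, and `product_of` is a helper name made up for this example.

```python
import math


def product_of(func, angles):
    """Multiply func(angle) over all the angles."""
    result = 1.0
    for angle in angles:
        result *= func(angle)
    return result


# Two arbitrary assignments for the unknown angles, each with pi appended.
angles_a = [0.3, 1.1, 2.0, math.pi]
angles_b = [0.7, 0.2, 1.5, math.pi]

sin_a = product_of(math.sin, angles_a)
sin_b = product_of(math.sin, angles_b)
cos_a = product_of(math.cos, angles_a)
cos_b = product_of(math.cos, angles_b)

# The sin products are pinned near zero by the sin(pi) factor.
print(sin_a, sin_b)

# The cos products depend on the unknown angles, because cos(pi) is -1.
print(cos_a, cos_b)
```

With `sin`, the answer is the same no matter what the unknowns are. With `cos`, the pi factor only flips the sign, so the product has no single value without more information.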

Not much of a riddle, but let’s see what ChatGPT says.

Seems like the term “riddle” forces it to look for a closed-form solution.

I like this output even more because ChatGPT says it needs more information AND tries to come up with a plausible answer for the riddle.