A few days ago I was thinking about what you need to know to use ChatGPT (or Bing/Sydney, or any similar service). It’s easy to ask it questions, but we all know that these large language models frequently generate false answers. Which raises the question: if I ask ChatGPT something, how much do I need to know to determine whether the answer is correct?
So I did a quick experiment. As a short programming project a few years ago, I compiled a list of all the prime numbers less than 100 million. I used this list to generate a 16-digit number that was the product of two 8-digit prime numbers (99999787 times 99999821 is 9999960800038127). I then asked ChatGPT whether this number was prime, and how it determined whether the number was prime.
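(For the curious: a list like that is easy to build with a Sieve of Eratosthenes. The sketch below isn’t my original project code, just a minimal reconstruction, and it takes a while to run in pure Python.)

def primes_below(limit):
    """Sieve of Eratosthenes: return all primes less than limit."""
    sieve = bytearray([1]) * limit   # sieve[i] == 1 means i is still a candidate
    sieve[0:2] = b"\x00\x00"         # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # cross off every multiple of p, starting at p*p
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [i for i in range(limit) if sieve[i]]

primes = primes_below(100_000_000)
print(len(primes))  # the number of primes below 100 million
# 99999787 and 99999821, the two factors used above, both appear near the end of this list.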
ChatGPT correctly answered that this number is not prime. This is somewhat surprising because, if you’ve read much about ChatGPT, you know that math is not one of its strong points. (There’s probably a big list of prime numbers somewhere in its training data.) However, its reasoning was flawed, and that’s much more interesting. ChatGPT gave me a bunch of Python code that implemented the Miller-Rabin primality test, and told me that my number was divisible by 29. The code as given had a couple of basic syntax errors, but that wasn’t the only problem. First, 9999960800038127 isn’t divisible by 29 (I’ll let you prove that to yourself). After fixing the obvious bugs, the Python code looked like a correct implementation of Miller-Rabin, but the number that Miller-Rabin outputs isn’t a factor; it’s a “witness” that attests to the fact that the number you’re testing is not prime. And the number it output wasn’t 29, either. So ChatGPT didn’t actually run the program; as many commenters have pointed out, ChatGPT doesn’t execute the code it writes. It also misunderstood what the algorithm does and what its output means, which is a more serious mistake.
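To make the witness-versus-factor distinction concrete, here’s a minimal Miller-Rabin sketch (mine, not the code ChatGPT produced). When it declares a number composite, it returns the witness it found; that witness proves compositeness but, in general, does not divide the number.

import random

def miller_rabin(n, rounds=20):
    """Return (False, witness) if n is composite, or (True, None) if n is probably prime."""
    if n < 2:
        return False, None
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return (n == p), None
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            # a is a witness: it certifies that n is composite,
            # but it is not (in general) a factor of n.
            return False, a
    return True, None

print(miller_rabin(9999960800038127))
# prints (False, <some witness>); the witness is essentially never one of the actual factors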
I then asked it to reconsider the reasoning behind its previous answer. ChatGPT very politely apologized for the mistake and responded with another Python program. This program was correct from the start: it was a brute-force primality test that tried every integer (both odd and even!) smaller than the square root of the number being tested. Not elegant, not performant, but correct. But again, because ChatGPT doesn’t actually run the programs it writes, it gave me a new list of “prime factors,” none of which were correct. Interestingly, it even included its expected (and incorrect) output in the code:
n = 9999960800038127
factors = factorize(n)
print(factors) # prints [193, 518401, 3215031751]
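For comparison, here’s a minimal trial-division factorizer of the kind the second program implemented (my own sketch, not ChatGPT’s code). Actually running it on the number above recovers the two 8-digit primes from the original product, not the factors in ChatGPT’s comment.

def factorize(n):
    """Brute-force factorization by trial division up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors

print(factorize(9999960800038127))
# prints [99999787, 99999821], after a noticeably long wait in pure Python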
I’m not saying that ChatGPT is useless. Far from it. It is good at suggesting ways to attack a problem, and can lead you toward the right solution whether or not it gives you the correct answer itself. Miller-Rabin is interesting; I knew it existed, but I wouldn’t have bothered to look it up if it hadn’t been pointed out to me. (There’s a nice irony in that: I was, in effect, the one being prompted by ChatGPT.)
Let’s go back to the original question. ChatGPT is good at providing “answers” to questions, but if you need to know that an answer is correct, you have to be able either to solve the problem yourself or to do the research required to solve it. That’s probably still a win, but you have to be careful. Don’t put ChatGPT in situations where correctness is an issue unless you’re willing and able to do that hard work yourself.