As part of my AI coding evaluations, I run a standardized series of four programming tests against each AI. These tests are designed to determine how well a given AI can help you program. This is kind of useful, especially if you’re counting on the AI to help you produce code. The last thing you want is for an AI helper to introduce more bugs into your work output, right?
Test 1: Write a simple WordPress plugin
Wow. Well, this is certainly a far cry from how Bard failed twice and how Gemini Advanced failed back in February 2024. Quite simply, Gemini Pro 2.5 aced this test right out of the gate. The challenge is to write a WordPress plugin that provides a simple user interface: it randomizes the input lines and distributes (rather than removes) duplicates so they don't end up next to each other.
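The plugin itself is PHP wrapped in a WordPress admin page, so I won't reproduce it here, but the core requirement is easy to sketch. What follows is a minimal Python illustration of the line-shuffling idea, my own rough take rather than Gemini's actual output: shuffle the lines, then greedily lay them out so identical lines are kept apart whenever that's possible.

    import random
    from collections import Counter

    def shuffle_lines_spread_duplicates(lines):
        """Shuffle lines, then order them so duplicate lines are spread apart.

        Greedy approach: repeatedly place the line with the most remaining
        copies that differs from the previously placed line. If every
        remaining line matches the previous one, adjacency can't be avoided,
        so place it anyway.
        """
        lines = list(lines)
        random.shuffle(lines)            # randomize the overall order first
        counts = Counter(lines)
        result = []
        prev = None
        while counts:
            # Candidates that won't repeat the previous line.
            candidates = [line for line in counts if line != prev]
            pool = candidates if candidates else list(counts)
            choice = max(pool, key=lambda line: counts[line])
            result.append(choice)
            counts[choice] -= 1
            if counts[choice] == 0:
                del counts[choice]
            prev = choice
        return result

    if __name__ == "__main__":
        sample = ["alpha", "beta", "alpha", "gamma", "alpha", "beta"]
        print("\n".join(shuffle_lines_spread_duplicates(sample)))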
Test 2: Rewrite a string function
In the second test, I asked Gemini Pro 2.5 to rewrite some string-processing code that handles currency input. My initial test code only allowed integers (so, dollars only), but the goal was to accept both dollars and cents. This is a test that ChatGPT got right. Bard initially failed, but eventually succeeded.
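I'm not reproducing my original function or Gemini's rewrite here, but the shape of the problem is simple to show. Here's a rough Python equivalent, assuming the goal is to accept entries like "5", "5.5", or "$5.50" and normalize them to cents; the function name and regex are illustrative, not the code from the actual test.

    import re

    # Accepts "5", "5.5", or "5.50" style amounts, with an optional leading "$".
    _AMOUNT_RE = re.compile(r"^\$?(\d+)(?:\.(\d{1,2}))?$")

    def parse_dollars_to_cents(text):
        """Return the amount in whole cents, or None if the input is invalid."""
        match = _AMOUNT_RE.match(text.strip())
        if match is None:
            return None
        dollars, cents = match.groups()
        cents = (cents or "0").ljust(2, "0")   # "5" -> "50", missing -> "00"
        return int(dollars) * 100 + int(cents)

    if __name__ == "__main__":
        for value in ["5", "$5.5", "5.50", "5.505", "five"]:
            print(value, "->", parse_dollars_to_cents(value))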
Test 3: Find a bug
At some point during my coding journey, I was struggling with a bug. My code should have worked, but it did not. The issue was far from immediately obvious, but when I asked ChatGPT, it pointed out that I was looking in the wrong place.
Test 4: Writing a script
This last test isn’t all that difficult in terms of programming skill. What it tests is the AI’s ability to jump between three different environments, and just how obscure those environments can be. The test requires understanding Chrome’s internal object model, how to write AppleScript (itself far more obscure than, say, Python), and then how to write code for Keyboard Maestro, a macro-building tool written by one guy in Australia.
Conclusion
It was really just a matter of when. Google is filled with many very, very smart people. In fact, it was Google that kicked off the generative AI boom in 2017 with its "Attention is all you need" research paper. So, while Bard, Gemini, and even Gemini Advanced failed miserably at my basic AI programming tests in the past, it was only a matter of time before Google’s flagship AI tool caught up with OpenAI’s offerings.
FAQs
Q: Have you tried Gemini Pro 2.5 yet?
A: Yes, I have tried it, and it has performed well on my coding tasks.
Q: How did it perform on your own coding tasks?
A: Gemini Pro 2.5 was able to assist me in writing a simple WordPress plugin, rewriting a string function, and finding a bug in my code.
Q: Do you think it has finally caught up to, or even surpassed, ChatGPT when it comes to programming help?
A: Yes, I believe Gemini Pro 2.5 has caught up to ChatGPT in terms of programming help.
Q: How important is speed versus accuracy when you’re relying on an AI assistant for development work?
A: Accuracy is more important than speed when relying on an AI assistant for development work.
Q: Did Gemini Pro 2.5 surprise you the way it did here?
A: Yes, Gemini Pro 2.5 surprised me with how well it was able to help me write code and track down bugs in my projects.

