GitHub Copilot’s AI to the test

GitHub Copilot: A Mixed Bag of Results in AI Coding Tests

The thing I find most baffling about the programming tests I’ve been running is that tools based on the same large language model tend to perform quite differently.

For example, ChatGPT, Perplexity, and GitHub Copilot are all based on the GPT-4 model from OpenAI. But, as I’ll show you below, while ChatGPT and Perplexity’s pro plans performed excellently, GitHub Copilot failed as often as it succeeded.

Test 1: Writing a WordPress Plugin

This test involves asking the AI to create a fully functional WordPress plugin, complete with admin interface elements and operational logic. The plugin takes in a set of names, sorts them, and, if there are duplicates, separates the duplicates so they’re not side by side.
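To make the core logic concrete, here is a minimal Python sketch of the "separate the duplicates" step. It is not the plugin any of the AIs produced (those were PHP), and the function name `separate_duplicates` is hypothetical. The sketch assumes that keeping duplicates apart takes priority over strict alphabetical order, so it uses a greedy approach: always place the name with the most remaining copies, unless it matches the name just placed.

```python
import heapq
from collections import Counter


def separate_duplicates(names):
    """Reorder names so equal names are not side by side when that is possible.

    Assumption: separation matters more than strict alphabetical order.
    Ties between equally frequent names are broken alphabetically.
    """
    counts = Counter(names)
    # heapq is a min-heap, so store negative counts to pop the largest first.
    heap = [(-count, name) for name, count in counts.items()]
    heapq.heapify(heap)

    result = []
    while heap:
        count, name = heapq.heappop(heap)
        if result and result[-1] == name and heap:
            # Same as the previous name: use the next candidate instead
            # and put this one back for a later slot.
            next_count, next_name = heapq.heappop(heap)
            heapq.heappush(heap, (count, name))
            count, name = next_count, next_name
        result.append(name)
        if count + 1 < 0:
            # Copies of this name remain, so return it to the heap.
            heapq.heappush(heap, (count + 1, name))
    return result


print(separate_duplicates(["Bob", "Alice", "Bob", "Carol"]))
# ['Bob', 'Alice', 'Bob', 'Carol']
```

If one name makes up more than half the list, no ordering can keep every copy apart, and the sketch simply places the leftover copies next to each other.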

This was a real-world application that my wife needed as part of an involvement device she runs on her very active Facebook group as part of her digital goods e-commerce business.

Results

Most of the other AIs passed this test, at least partly. Five of the 10 AI models tested passed the test completely. Three of them passed part of the test. Two (including Microsoft Copilot) failed completely.

The thing is, I gave GitHub Copilot the same prompt I give all of them, but it only wrote PHP code. To be clear, this problem can be solved solely using PHP code. But some AIs like to include some JavaScript for the interactive features. GitHub Copilot included code for using JavaScript but never actually generated the JavaScript that it tried to use.

Test 2: Rewriting a String Function

This test is fairly simple. I wrote a function that was supposed to test for dollars and cents but wound up only testing for integers (dollars). The test asks the AI to fix the code.

Results

GitHub Copilot did rework the code, but the code it produced had several problems. It assumed the incoming value was always a string, and if that value was empty, the code would break. The revised regular expression would also break if the input had a trailing decimal point (e.g., "3."), a leading decimal point (e.g., ".3"), or leading zeros (e.g., "00.30").
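For illustration, here is a minimal Python sketch of a dollars-and-cents check that handles the edge cases listed above. It is not my original function or any AI's answer, and the name `is_dollar_amount` is hypothetical. The sketch assumes that "3.", ".3", and "00.30" should all count as valid amounts, that at most two digits may follow the decimal point, and that empty or non-string input should be rejected rather than crash.

```python
import re

# Optional dollar digits, then an optional decimal point with up to two
# cents digits. The lookahead requires at least one digit overall, so a
# lone "." is rejected.
_MONEY_RE = re.compile(r"(?=.*\d)\d*(?:\.\d{0,2})?")


def is_dollar_amount(value):
    """Return True if value looks like a dollars-and-cents amount."""
    if not isinstance(value, str):
        return False  # do not assume the caller always passes a string
    text = value.strip()
    if not text:
        return False  # empty input is invalid, not an error
    return _MONEY_RE.fullmatch(text) is not None


for sample in ["3", "3.", ".3", "00.30", "", ".", "3.145", "abc"]:
    print(repr(sample), is_dollar_amount(sample))
```

Whether leading zeros or a bare trailing decimal point should really be accepted depends on the application; tightening the pattern is a one-line change.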

Test 3: Finding an Annoying Bug

This test is another one pulled from my real-life coding escapades. What made this bug so annoying (and difficult to figure out) is that the error message isn’t directly related to the actual problem.

Results

GitHub Copilot got this right.

Test 4: Writing a Script

The challenge here is that I’m testing the AI’s ability to create a script that draws on AppleScript, the Chrome object model, and a little Mac-only third-party coding utility called Keyboard Maestro. To pass this test, the AI has to recognize that all three coding environments need attention and then tailor individual lines of code to each of those environments.

Results

Here, too, GitHub Copilot succeeded where Microsoft Copilot failed.

Conclusion

Given that GitHub Copilot uses GPT-4, I find the fact that it failed half of the tests discouraging. GitHub is just about the most popular source management environment on the planet, and one would hope that the AI coding support was reasonably reliable.

As with all things AI, I’m sure performance will get better. Let’s stay tuned and check back in a few months to see if the AI is more effective at that time.

FAQs

Q: What is GitHub Copilot?
A: GitHub Copilot is an AI-powered coding tool that uses the GPT-4 model from OpenAI.

Q: How does GitHub Copilot perform in coding tests?
A: GitHub Copilot produced mixed results in the coding tests, failing half of them and succeeding in the other half.

Q: Is GitHub Copilot reliable?
A: The reliability of GitHub Copilot is questionable, as it failed half of the coding tests.

Q: Can I use GitHub Copilot for my coding needs?
A: Yes, you can use GitHub Copilot as a coding tool, but be aware of its limitations and check the code it produces, since it failed some of these tests.
