I’ve been around technology long enough that very little excites me, and even less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it did, and the plugin worked, I was indeed surprised.
That was the beginning of my deep exploration into chatbots and AI-assisted programming. Since then, I’ve subjected 14 large language models (LLMs) to four real-world tests.
ChatGPT Plus
ChatGPT Plus with GPT-4 and GPT-4o passed all my tests. One of my favorite features is the availability of a dedicated app. When I test web programming, I keep my browser on one task, my IDE open, and the ChatGPT Mac app running on a separate screen.
Perplexity Pro
Perplexity Pro with GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, and Llama 3.1 405B passed all my tests. One of my favorite features is the ability to run multiple LLMs. While you can’t set an LLM for a given session, you can easily go into the settings and choose the active model.
GitHub’s Copilot
GitHub’s Copilot integrates quite seamlessly with VS Code. It makes asking for coding help very quick and productive, especially when working in context. That’s why it’s so disappointing that the code it writes can often be so very wrong.
Meta AI
Meta AI is Facebook’s general-purpose AI. As you can see above, it failed three of our four tests.
Meta Code Llama
Meta Code Llama is Facebook’s AI designed specifically for coding help. It’s something you can download and install on your server. I tested it running on a Hugging Face AI instance.
Claude 3.5 Sonnet
Anthropic claims the 3.5 Sonnet version of its Claude AI chatbot is ideal for programming. After failing all but one test, I’m not so sure.
Gemini Advanced
Gemini Advanced is Google’s $20 pro version of its Gemini (formerly Bard) chatbot. I expected the tool to do better than one out of four. Interestingly, it passed the one test that every AI other than GPT-4/4o failed: knowledge of a fairly obscure programming language produced by one programmer in Australia.
Microsoft Copilot
You’d think the company with the "Developers! Developers! Developers!" mantra in its DNA would have an AI that does better on the programming tests. Microsoft produces some of the best coding tools on the planet. And yet, Copilot did badly.
Conclusion
The results of my tests were fairly surprising, especially given the big investments of Microsoft and Google. But this area of innovation is improving at warp speed, so we’ll be back with updated tests and results over time. Stay tuned.
FAQs
Q: Which AI chatbots are best for programming?
A: The top performers for programming are ChatGPT Plus (with GPT-4 and GPT-4o) and Perplexity Pro (with GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, and Llama 3.1 405B). Both passed all four tests.
Q: Can I use GitHub’s Copilot for programming?
A: Yes. GitHub’s Copilot integrates seamlessly with VS Code and can be a useful tool for coding help, but the code it produces is often wrong.
Q: Which AI chatbot is best for my budget?
A: ChatGPT Plus and Perplexity Pro are both paid plans that offer more features and better performance. Choose the one that fits your budget.
Q: How do I use these AI chatbots for programming?
A: Each AI chatbot has its own interface and features. Follow the instructions and tutorials provided to learn how to use them effectively.