OpenAI Launches GPT-4.1 Models for Coding and Instruction Following
OpenAI has launched a new family of models called GPT-4.1, which excel at coding and instruction following. The models are available through OpenAI’s API but not in ChatGPT.
Key Features
- 1-million-token context window, allowing for the processing of approximately 750,000 words at once
- Three models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano
- Optimized for real-world use based on direct feedback from developers
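
The context-window figures above imply a ratio of roughly 0.75 words per token. A minimal sketch of that arithmetic follows, assuming the ratio holds for ordinary English prose; real token counts depend on the tokenizer and the text, and the function names here are hypothetical, not part of any OpenAI library.

```python
import math

# Ratio implied by the article: ~750,000 words in a 1,000,000-token window.
# This is an approximation, not a property of any specific tokenizer.
WORDS_PER_TOKEN = 750_000 / 1_000_000
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimate_tokens(word_count: int) -> int:
    """Roughly estimate how many tokens a text of `word_count` words uses."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Check whether the estimated token count fits in the context window."""
    return estimate_tokens(word_count) <= window


if __name__ == "__main__":
    for words in (90_000, 750_000, 800_000):
        print(words, estimate_tokens(words), fits_in_context(words))
```

An estimate like this is only useful for rough planning; an exact count requires running the actual tokenizer on the text.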
Competition and Goals
OpenAI’s GPT-4.1 models arrive as rivals like Google and Anthropic ramp up efforts to build sophisticated programming models. Google’s Gemini 2.5 Pro and Anthropic’s Claude 3.7 Sonnet have also shown high performance on popular coding benchmarks.
OpenAI’s grand ambition is to create an “agentic software engineer,” capable of performing complex software engineering tasks, such as programming entire apps end-to-end, handling quality assurance, bug testing, and documentation writing.
Benchmark Performance
GPT-4.1 has been tested on various benchmarks, including SWE-bench Verified, where it scored between 52% and 54.6%. That range is slightly below the scores Google and Anthropic have reported for Gemini 2.5 Pro and Claude 3.7 Sonnet, respectively.
Additional Evaluation
GPT-4.1 was also evaluated using Video-MME, a measure of a model’s ability to understand content in videos. The model achieved a chart-topping 72% accuracy on the “long, no subtitles” video category.
Limits and Challenges
GPT-4.1 shows promising performance, but it has limitations. The model becomes less reliable as the number of input tokens increases, and it may need more specific, explicit prompts to produce accurate results.
Conclusion
GPT-4.1 is a significant step towards OpenAI’s goal of creating an “agentic software engineer.” While it has its limitations, the model shows promising performance on various benchmarks and has the potential to improve the efficiency and accuracy of coding tasks.
FAQs
Q: What is GPT-4.1?
A: GPT-4.1 is a new family of models from OpenAI that excel at coding and instruction following.
Q: What are the key features of GPT-4.1?
A: The key features of GPT-4.1 include a 1-million-token context window, allowing for the processing of approximately 750,000 words at once, and three models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano.
Q: How does GPT-4.1 compare to other models?
A: GPT-4.1 has shown promising performance on various benchmarks. On SWE-bench Verified, it scored between 52% and 54.6%, slightly below the scores Google and Anthropic have reported for Gemini 2.5 Pro and Claude 3.7 Sonnet, respectively.
Q: What is the pricing for GPT-4.1?
A: Pricing is per token and depends on the model. GPT-4.1 costs $2 per million input tokens and $8 per million output tokens, while GPT-4.1 mini and GPT-4.1 nano are more affordable options.
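
To make the GPT-4.1 rates quoted above concrete, here is a minimal sketch of the cost arithmetic for a single request. It assumes simple linear per-token billing at the $2 and $8 figures and ignores any discounts or other pricing details; the function name is hypothetical.

```python
# Rates for GPT-4.1 as quoted in the article (USD per one million tokens).
INPUT_USD_PER_MILLION = 2.00
OUTPUT_USD_PER_MILLION = 8.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 500,000-token prompt that produces a 100,000-token response.
    print(f"${estimate_cost_usd(500_000, 100_000):.2f}")
```

Because output tokens cost four times as much as input tokens at these rates, long responses drive the cost more than long prompts of the same length.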

