Google Gemini 2.5 Pro I/O converts video to code, Apple and Anthropic's vibe coding tool, Qwen 3 model family, and more - Week #18
Hello AI Enthusiasts!
Welcome to the eighteenth edition of "This Week in AI Engineering"!
Google's Gemini 2.5 Pro claims the #1 spot for web development with an impressive Elo score of 1420, Gemini 2.0 Flash handles up to 1 million tokens with multimodal capabilities, Apple partners with Anthropic on a new AI-powered coding environment, and Alibaba's Qwen3 introduces an innovative hybrid thinking architecture with MoE models.
We'll also explore some under-the-radar tools that can supercharge your development workflow.
Don't have time to read the newsletter? Listen to it on the go!
Gemini 2.5 Pro is the Best Choice for Web Development
Google has released an early update to Gemini 2.5 Pro (I/O Edition) just weeks before Google I/O, featuring significant improvements to its already impressive coding capabilities. This update (05-06) represents a major leap forward in the model's ability to handle frontend and UI development tasks.
Performance Benchmarks
The updated model now leads several benchmarks, in coding and beyond:
#1 on WebDev Arena: Achieved an Elo score of 1420, surpassing Claude 3.7 Sonnet's 1357
Scientific Reasoning: 84% on GPQA Diamond, outperforming both OpenAI's o3-mini (79.7%) and Claude 3.7 Sonnet (78.2%)
Mathematics: 86.7% on AIME 2025, slightly ahead of o3-mini's 86.5% and significantly better than Claude 3.7 Sonnet's 49.5%
Video Understanding: 84.8% on VideoMME benchmark
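To put the WebDev Arena gap (1420 vs. 1357) in perspective, here is a minimal sketch that turns two ratings into an expected head-to-head win rate. It assumes the classic Elo logistic formula with a scale of 400; the arena's exact rating method may differ, so treat the number as a rough guide rather than an official figure. The function name is our own.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the classic Elo model (assumed here)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted above: Gemini 2.5 Pro (1420) vs. Claude 3.7 Sonnet (1357).
p = elo_expected_score(1420, 1357)
print(f"Expected win rate for the higher-rated model: {p:.1%}")
```

Under that assumption, a 63-point lead works out to the higher-rated model being preferred in roughly 59% of head-to-head comparisons: a clear edge, but far from a sweep.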
Key Strengths
The model demonstrates exceptional capabilities in several areas:
Video-to-Code Conversion: Can generate complete interactive applications from video inputs
Frontend Web Development: Produces aesthetically pleasing UIs with attention to details like animations and responsive design
Agentic Programming: Enhanced function calling with higher trigger rates and fewer errors
Feature Implementation: Simplified process of translating design specifications into working code
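For readers new to function calling, the "fewer errors" point is easier to picture with a toy example. The sketch below is not Google's API: the tool, the registry, and the JSON call format are all hypothetical stand-ins. It shows an application receiving a model-emitted call as JSON text and either running the matching tool or reporting why the call was unusable.

```python
import json


# Hypothetical tool for illustration; a real agent might edit files or call an API.
def word_count(text: str) -> int:
    return len(text.split())


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"word_count": word_count}


def dispatch(raw_call: str) -> dict:
    """Parse a model-emitted call and run the matching tool, or explain the failure."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return {"ok": False, "error": "malformed JSON"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "call must be a JSON object"}
    func = TOOLS.get(call.get("name"))
    if func is None:
        return {"ok": False, "error": "unknown tool"}
    try:
        # A TypeError here usually means missing or unexpected arguments.
        return {"ok": True, "result": func(**call.get("args", {}))}
    except TypeError:
        return {"ok": False, "error": "bad arguments"}


good = dispatch('{"name": "word_count", "args": {"text": "hello big world"}}')
bad = dispatch('{"name": "word_count", "args": {"txt": "oops"}}')
print(good, bad)
```

Each failure branch corresponds to a way a model can get a call wrong (broken JSON, a tool that does not exist, wrong argument names), which is the kind of error a more reliable function-calling model produces less often.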
Real-World Applications
Several companies are already leveraging the model's capabilities:
Replit: Using it for latency-sensitive tasks requiring high reliability
Cognition: Reported it was the first model to solve complex backend refactoring evaluations
Cursor: Powering their code agent
According to Michele Catasta, President of Replit, Gemini 2.5 Pro offers "the best frontier model when it comes to capability over latency ratio," while Cognition's founding team member Silas Alberti noted it "felt like a more senior developer because it was able to make correct judgment calls and choose good abstractions."
The update keeps the same pricing as the previous version. Existing users are upgraded automatically, because the earlier model ID (03-25) now points to the latest version (05-06).
Apple and Anthropic are Working on a Vibe Coding Tool
Apple is reportedly developing a new AI-powered development environment in collaboration with Anthropic, informally referred to as "vibe-coding" software. This project represents a significant evolution of Apple's developer tools and signals a strategic shift in the company's approach to AI integration.
Technical Details
According to Bloomberg's Mark Gurman, the tool is built on several key technologies:
Foundation: A revamped version of Xcode with deep AI integration
AI Model: Powered by Anthropic's Claude Sonnet model
Interface: Features a chat-based interaction system for natural language coding requests
Capabilities: Can write new code, debug existing applications, and test user interfaces
Strategic Context
This collaboration marks an important pivot in Apple's AI strategy:
Internal Testing: Currently limited to Apple's internal development teams
Previous Attempt: Follows Apple's unreleased Swift Assist tool that reportedly suffered from hallucinations and performance issues
External Partnership: Represents a departure from Apple's traditional preference for in-house solutions
Leadership Reorganization: Coincides with a restructuring that has John Giannandrea focusing on AI research while Craig Federighi oversees consumer-facing implementations
Potential Impact
If eventually released publicly, this tool could significantly alter the developer experience in the Apple ecosystem:
Developer Productivity: Streamlining code creation and testing processes
Competitive Positioning: Helping Apple catch up to Microsoft's GitHub Copilot and other AI coding tools
Anthropic Boost: Strengthening Anthropic's position alongside its existing partnership with Amazon
Hybrid Approach: Aligning with Tim Cook's recently stated strategy of balancing in-house development with external partnerships
The cautious internal-only rollout suggests Apple is taking a measured approach to ensure the reliability of the system before potentially making it available to the broader developer community.
Tools & Releases YOU Should Know About
JADBio is an automated machine learning (AutoML) platform designed to make advanced predictive modeling accessible to non-experts. It stands out from mainstream AutoML tools for its focus on biomedical and life sciences data, offering robust automation for feature selection, model training, and interpretation. Its user-friendly interface and transparent model explanations make it ideal for researchers and small teams who lack deep data science expertise.
Sweep is an AI-powered tool that automates the process of handling code reviews and pull requests. It can review code changes, suggest improvements, and even auto-fix simple issues. Sweep is a productivity booster for teams looking to maintain high code quality with minimal manual intervention, but it remains under the radar compared to mainstream code review bots.
Lalal.ai uses advanced AI to separate vocals and instrumental tracks from audio files, making it a powerful tool for musicians, podcasters, and content creators. Its deep learning models deliver high-quality stem separation, outperforming many mainstream alternatives. Despite its effectiveness, Lalal.ai remains relatively niche and is perfect for anyone needing quick, studio-grade audio isolation without expensive software.
Apidog MCP Server acts as a bridge between your backend APIs and AI coding assistants. By connecting your OpenAPI definitions, it enables AI tools to auto-generate API logic and DTOs, and lets AI assistants access real-time API documentation for smarter suggestions. It's especially valuable for teams managing frequently changing APIs or practicing domain-driven design, streamlining backend and frontend development workflows.
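If "generating DTOs from OpenAPI definitions" sounds abstract, here is a minimal sketch of the general idea using only the Python standard library. It is not Apidog's implementation or API: the schema fragment, the type mapping, and the build_dto helper are all hypothetical, and real generators handle far more (nested objects, arrays, formats, references).

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Hypothetical OpenAPI fragment, made up for this example.
SPEC = {
    "components": {
        "schemas": {
            "User": {
                "type": "object",
                "required": ["id", "email"],
                "properties": {
                    "id": {"type": "integer"},
                    "email": {"type": "string"},
                    "nickname": {"type": "string"},
                },
            }
        }
    }
}

# Minimal mapping from OpenAPI primitive types to Python types.
PY_TYPES = {"integer": int, "number": float, "string": str, "boolean": bool}


def build_dto(name: str, schema: dict) -> type:
    """Build a dataclass from a flat OpenAPI object schema (primitives only)."""
    required = set(schema.get("required", []))
    props = schema.get("properties", {})
    # Required fields go first: dataclasses reject a field without a default
    # placed after one that has a default.
    ordered = sorted(props.items(), key=lambda kv: kv[0] not in required)
    specs = []
    for prop, meta in ordered:
        py_type = PY_TYPES.get(meta.get("type"), object)
        if prop in required:
            specs.append((prop, py_type))
        else:
            specs.append((prop, Optional[py_type], field(default=None)))
    return make_dataclass(name, specs)


User = build_dto("User", SPEC["components"]["schemas"]["User"])
user = User(id=1, email="a@example.com")
print(user)
```

The appeal of wiring this into an AI assistant is that the DTOs are regenerated from the live spec, so suggestions track the API as it changes instead of relying on whatever the model remembers.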
And that wraps up this issue of "This Week in AI Engineering", brought to you by jam.dev, your flight recorder for AI apps! Non-deterministic AI issues are hard to reproduce, unless you have Jam! Instantly replay the session, prompt, and logs to debug ⚡️
Thank you for tuning in! Be sure to share this newsletter with your fellow AI enthusiasts and subscribe to get the latest updates directly in your inbox.
Until next time, happy building!