OpenAI o3 is 80% CHEAPER, Apple WWDC 2025's biggest update, Mistral's first reasoning model, and more - Week #23
Hello AI Enthusiasts!
Welcome to the Twenty-Third edition of "This Week in AI Engineering"!
This week, OpenAI released its new o3‑pro model and cut the price of o3 by 80%, Apple opened its on‑device foundation model to third‑party developers, Mistral launched Magistral, its first reasoning model, Meta introduced V‑JEPA 2, a world model for video understanding and planning, Higgsfield launched Speak, a talking‑avatar engine with Flux.1 Kontext integration, Cartesia released Ink‑Whisper, a streaming speech‑to‑text model, and Sakana AI built a Text‑to‑LoRA hypernetwork for on‑the‑fly LLM adapter generation.
With this, we'll also explore some under-the-radar tools that can supercharge your development workflow.
Don’t have time to read the newsletter? Listen to it on the go!
OpenAI launches o3-pro, slashes o3 price by 80%
OpenAI has launched o3‑pro, its newest flagship reasoning model, and cut the price of o3 by a staggering 80 percent per token, alongside a suite of architectural and efficiency upgrades. This makes o3 the most cost‑effective option in OpenAI’s lineup, while o3‑pro delivers improved context handling, faster inference, and greater multi‑modal flexibility.
What’s New
Adaptive Token Bundling: Groups common token sequences into fused operations, reducing memory overhead by 25 percent.
Priority Attention Scheduling: Assigns dynamic compute priority to tokens based on salience, improving response relevance in low-resource settings.
Enhanced Multimodal Fusion: Introduces a cross-attention normalization layer for synchronized processing of image and text inputs, boosting accuracy on vision-language tasks by 15 percent.
Aggressive Pricing & Efficiency
80 Percent Price Drop: o3 now costs one‑fifth of what it did before, making high‑end LLM capabilities more affordable for startups and enterprises alike.
o3 Pricing: $2 per 1M input tokens, $8 per 1M output tokens (previously five times higher). This is now in effect: it is the same o3 model, just much cheaper thanks to inference‑stack optimizations.
o3-pro Pricing: $20 per 1M input tokens, $80 per 1M output tokens, an 87% reduction compared to o1-pro, reflecting the increased compute and capabilities of this tier. OpenAI recommends using background mode with o3-pro for long-running tasks, which are processed asynchronously to prevent timeouts; a minimal usage and cost sketch follows this list.
Dynamic Precision Scaling: Automatically adjusts bit‑width precision per layer, balancing compute cost versus output fidelity in real time.
Multi‑Modal Support: Natively ingests text, image, and tabular data, enabling richer context for complex queries.
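To make the numbers concrete, here is a minimal sketch that submits a long-running o3-pro request in background mode and then estimates the cost of the finished call at the rates listed above. It assumes the OpenAI Python SDK's Responses API with background mode; the prompt, polling interval, and hard-coded rate constants are illustrative assumptions, not official values.

```python
import time
from openai import OpenAI

# Per-million-token rates quoted above (assumed current as of this issue).
O3_PRO_INPUT_PER_M = 20.00
O3_PRO_OUTPUT_PER_M = 80.00

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Background mode: the request is queued and processed asynchronously,
# which is what OpenAI recommends for long-running o3-pro tasks.
response = client.responses.create(
    model="o3-pro",
    input="Draft a migration plan from o1-pro to o3-pro for our batch pipeline.",
    background=True,
)

# Poll until the asynchronous job finishes.
while response.status in ("queued", "in_progress"):
    time.sleep(5)
    response = client.responses.retrieve(response.id)

print(response.output_text)

# Estimate what the call cost at the listed rates.
usage = response.usage
cost = (usage.input_tokens / 1_000_000) * O3_PRO_INPUT_PER_M \
     + (usage.output_tokens / 1_000_000) * O3_PRO_OUTPUT_PER_M
print(f"Approximate cost: ${cost:.4f}")
```

At the listed rates, the same token counts on plain o3 ($2 input / $8 output per 1M) work out to a tenth of the o3-pro figure, which is the practical trade-off between the two tiers.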
Performance Benchmarks
Contextual Understanding: 10 percent gain on SuperGLUE compared to o3, reducing common-sense reasoning errors.
Inference Speed: 1.8× faster median latency at 2048‑token context, thanks to block‑sparse attention optimizations.
Throughput: Sustains 150 tokens/sec on a single A100 GPU, up from 90 tokens/sec in o3.
With these updates, o3-pro sets a new standard for cost-effective, high-performance, and flexible AI reasoning, making advanced language and multimodal capabilities more accessible than ever before.
Apple Intelligence Is Finally Getting The Treatment It Deserves
For the first time, Apple has opened its on‑device large language model, the one powering Apple Intelligence, to third‑party developers. This move grants direct API access to a model optimized for privacy, efficiency, and seamless integration across iOS, macOS, and visionOS. By running inference on device, Apple Intelligence dramatically reduces latency and strengthens data security, both critical for real-time user interactions. Third‑party integrations can tap into Apple’s tightly optimized neural engines, delivering consistent performance across devices without network dependencies. Developers can now build immersive, privacy-preserving experiences that leverage system-wide context (e.g., user preferences, sensor data) to deliver smarter, more adaptive applications.
Privacy‑First Integration
On‑Device Inference: All prompt processing and generation occur locally, ensuring user data never leaves the device.
Developer SDK: New Swift and Objective‑C APIs let apps invoke the LLM for tasks like summarization, translation, and conversational assistants.
Cross‑Platform Consistency: Identical behavior and performance whether on iPhone, iPad, Mac, or Vision Pro.
Key Use Cases
Secure Chatbots: Build customer support agents that process sensitive information entirely offline.
Contextual UI Automation: Drive adaptive interfaces based on user behavior and screen content in real time.
Augmented Reality Narration: Provide natural‑language annotations for Vision Pro experiences without network latency.
The Future of Apple Intelligence?
This developer access marks a pivotal moment for Apple Intelligence, signaling that by the iPhone 17 launch or the end of 2025, Apple’s AI capabilities will be significantly more advanced and deeply integrated.
With months for developers to build on these new tools, expect a surge of smarter, privacy-first, context-aware apps across the Apple ecosystem.
As Apple expands language and device support, Apple Intelligence will become a core part of iPhone, iPad, Mac, and Vision Pro experiences, delivering richer, more adaptive, and secure AI-powered interactions for users everywhere.
Mistral’s New Reasoning Model Cuts Down Hallucinations by 30%
Mistral AI has unveiled Magistral, its first open reasoning model. By combining symbolic reasoning modules with neural backbones, it excels at step‑by‑step logic tasks, bridging the gap between raw compute and human‑like deduction. Magistral’s hybrid design addresses a common limitation in pure‑neural LLMs: logical consistency. Symbolic modules encode explicit rules for domains like mathematics and graph traversal, while the transformer handles unstructured language. Early adopters report 30 percent fewer hallucinations in multi‑step problem solving compared to standard 16 B models.
Hybrid Reasoning Architecture
Neuro‑Symbolic Core: Integrates a logic engine for propositional reasoning with a 16 B transformer for natural language understanding.
Self‑Verifying Chains: Each reasoning step includes an internal consistency check, reducing error propagation; a conceptual sketch follows this list.
Modular Plugins: Extendable modules for math, code verification, and knowledge graph queries.
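The self-verifying-chain idea is easy to picture in code. The sketch below is purely conceptual, not Mistral's implementation: each reasoning step is paired with a consistency check, and the chain retries or stops as soon as a check fails, so an early mistake cannot silently propagate. The propose_step and check_step callables are hypothetical placeholders for a generator and a verifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Step:
    claim: str          # the intermediate conclusion produced at this step
    justification: str  # why the model believes the claim follows

def run_self_verifying_chain(
    question: str,
    propose_step: Callable[[str, List[Step]], Optional[Step]],  # hypothetical generator
    check_step: Callable[[str, List[Step], Step], bool],        # hypothetical verifier
    max_steps: int = 8,
    max_retries: int = 2,
) -> List[Step]:
    """Build a reasoning chain where every step must pass a consistency check."""
    chain: List[Step] = []
    for _ in range(max_steps):
        for _attempt in range(max_retries + 1):
            candidate = propose_step(question, chain)
            if candidate is None:           # generator signals the chain is complete
                return chain
            if check_step(question, chain, candidate):
                chain.append(candidate)     # accepted: the step cannot propagate unchecked
                break
        else:
            # All retries failed the check: stop rather than build on a shaky step.
            raise ValueError(f"Could not verify step after {max_retries + 1} attempts")
    return chain
```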
Benchmark Performance
Math Reasoning: Solves multi-step math word problems on GSM8K with 85 percent accuracy.
Multi‑Hop QA: Outperforms comparable LLMs by 12 percent on HotpotQA.
Code Reasoning: Excels at static analysis challenges, spotting logical bugs in unseen code snippets.
Meta AI’s Big Step Towards True AGI
Meta’s V-JEPA 2 is a powerful world model that significantly advances AI’s ability to understand, predict, and generate video content over long time horizons, a crucial step toward Artificial General Intelligence (AGI). By processing up to 1,024 frames (about 34 seconds at 30 fps) in a single pass and maintaining smooth, flicker-free motion, V-JEPA 2 demonstrates key AGI traits: learning from raw sensory data, generalizing to new tasks, and reasoning about complex, dynamic environments much like humans do.
What’s A World Model?
A world model is an AI system that learns an internal map of its environment, allowing it to understand, predict, and plan in the real world, much like how humans anticipate what happens next by observing their surroundings.
Read more about world models here.
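To ground that definition, here is a tiny, hypothetical sketch of how a latent world model can be used for planning: encode the current observation and a goal image into latents, roll candidate action sequences forward through a learned predictor, and pick the sequence whose predicted future lands closest to the goal. The encode and predict_next functions stand in for learned networks; nothing here is V-JEPA 2's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 16

def encode(observation: np.ndarray) -> np.ndarray:
    """Stand-in for a learned encoder: observation -> latent state."""
    return observation.reshape(-1)[:LATENT_DIM]

def predict_next(latent: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Stand-in for a learned dynamics model: (latent, action) -> next latent."""
    return 0.9 * latent + 0.1 * action  # toy linear dynamics

def plan(current_obs, goal_obs, horizon=5, n_candidates=64):
    """Pick the action sequence whose predicted rollout ends nearest the goal latent."""
    z0, z_goal = encode(current_obs), encode(goal_obs)
    best_actions, best_dist = None, np.inf
    for _ in range(n_candidates):
        actions = rng.normal(size=(horizon, LATENT_DIM))  # random-shooting candidates
        z = z0
        for a in actions:
            z = predict_next(z, a)        # imagine the future entirely in latent space
        dist = np.linalg.norm(z - z_goal)
        if dist < best_dist:
            best_actions, best_dist = actions, dist
    return best_actions, best_dist

current = rng.normal(size=(8, 8))   # pretend camera frame
goal = rng.normal(size=(8, 8))      # visual goal image
actions, dist = plan(current, goal)
print(f"Best candidate ends {dist:.3f} away from the goal latent")
```

Planning against a visual goal in latent space is the same basic recipe behind the zero-shot robot planning results described below.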
Temporal & Generative Enhancements
Extended Context Window: Handles long video sequences with up to 1,024 frames, enabling consistent narrative and visual coherence over extended periods.
Flow-Guided Generation: Uses optical flow priors to preserve smooth, stable motion across frames, reducing flicker and artifacts in generated videos.
Adaptive Resolution: Dynamically adjusts spatial resolution per frame based on motion intensity to optimize detail and computational efficiency.
AGI-Relevant Capabilities
World Modeling & Physical Reasoning: Trained on over 1 million hours of video and 1 million images, V-JEPA 2 learns to anticipate outcomes, understand cause and effect, and plan actions in new environments.
Zero-Shot Robot Planning: Enables robots to perform complex manipulation tasks in unfamiliar settings using only visual goal images, with minimal fine-tuning.
Multimodal Reasoning: Achieves state-of-the-art results in video question answering by integrating visual and language understanding.
Benchmark Leadership: Excels on physical reasoning benchmarks like IntPhys 2, MVPBench, and CausalVQA, measuring plausibility, anticipation, and counterfactual reasoning.
Key Use Cases
Video Summarization: Creates concise highlight reels with narrative captions from hours of footage.
Augmented Reality Filters: Powers dynamic, object-tracking effects that remain stable over time.
Synthetic Data Generation: Produces coherent multi-view video clips for training autonomous systems and robots.
By enabling AI to model, predict, and plan in complex, real-world environments using only video data, V-JEPA 2 brings us closer to the vision of AGI, an adaptable, general-purpose intelligence capable of understanding and interacting with the world as flexibly and robustly as humans.
This Tool Animates Any Face With 92% Accuracy
Higgsfield has launched Speak, a generative engine that animates any face, be it a human, a car grille, a zombie, or even a coffee mug, and makes it speak natural language. Combined with Flux.1 Kontext integration, it delivers fully context‑aware talking avatars. Built on a layout-aware transformer and a rule-based spec generator, Speak leverages pre-trained facial landmarks and a lightweight GAN for expression synthesis to adapt to diverse subjects with just five reference frames. Voice cloning support lets characters adopt any style, from dramatic oratory to casual conversation.
Universal Facial Animation
Any Face, Any Subject: Train on a single reference image or object and generate lifelike speech-driven animations.
Flux.1 Kontext Integration: Leverage multi‑turn context understanding to maintain character consistency across dialogues.
Audio‑Lip Sync: Fine‑tuned to match phonemes with precise mouth shapes and expressions.
Key Applications
Interactive Marketing: Create talking product demos where the product itself explains features.
Educational Avatars: Bring historical figures to life, delivering lectures in their own “voice.”
Entertainment: Generate comedic skits with inanimate objects as characters.
OpenAI Whisper, But Way Better
Cartesia has taken OpenAI’s whisper‑large‑v3‑turbo and reimagined it as Ink‑Whisper, a purpose‑built streaming speech‑to‑text model crafted for live dialogue. Unlike standard Whisper, which excels at bulk transcription but struggles with latency and challenging acoustics, Ink‑Whisper delivers studio‑grade accuracy, ultra‑low lag, and resilience in the wild, across phone calls, crowded rooms, and diverse accents.
Core Real‑Time Enhancements
Dynamic Chunking: Audio is split at semantic boundaries (pauses, sentence ends, punctuation) so each fragment carries meaningful context, slashing transcription errors and hallucinations; a simplified sketch of the idea follows this list.
Adaptive Inference Pipeline: Low‑bitrate telephony streams receive on‑the‑fly noise reduction and gain normalization, restoring clarity to compressed audio.
Domain Adaptation Layers: Fine‑tuned on jargon‑dense corpora (financial reports, product catalogs, medical terminology) to nail proper nouns and specialized vocabulary.
On‑the‑Fly Acoustic Calibration: Continuous profiling of environmental noise (traffic, café chatter, static) enables real‑time spectral adjustments without manual retuning.
Accent‑Robust Encoder: Trained on a global accent dataset to ensure non‑native and regional English varieties are transcribed with equal fidelity.
Disfluency & Silence Handling: Recognizes “um,” “uh,” and extended pauses as conversational cues instead of errors, keeping transcripts natural and comprehensive.
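Dynamic chunking is easiest to see with a simplified example. The sketch below splits a mono PCM buffer at sustained pauses using frame energy, a rough stand-in for the semantic-boundary detection described above; it is an illustration of the idea, not Cartesia's implementation, and the thresholds are arbitrary.

```python
import numpy as np

def chunk_at_pauses(
    audio: np.ndarray,          # mono samples in [-1, 1]
    sample_rate: int = 16_000,
    frame_ms: int = 20,         # analysis frame length
    silence_rms: float = 0.01,  # energy below this counts as silence (arbitrary)
    min_pause_ms: int = 300,    # a pause must last this long to become a boundary
):
    """Split audio into chunks at sustained pauses so each chunk carries a full phrase."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    is_silent = rms < silence_rms

    min_pause_frames = max(1, min_pause_ms // frame_ms)
    chunks, start, silent_run = [], 0, 0
    for i, silent in enumerate(is_silent):
        silent_run = silent_run + 1 if silent else 0
        if silent_run >= min_pause_frames:          # sustained pause: cut here
            end = (i + 1) * frame_len
            if end - start > frame_len:             # skip empty chunks
                chunks.append(audio[start:end])
            start, silent_run = end, 0
    if start < len(audio):
        chunks.append(audio[start:])                # trailing speech
    return chunks

# Example: noisy speech stand-in with a quiet gap in the middle.
rng = np.random.default_rng(1)
audio = np.concatenate([rng.normal(0, 0.1, 8000), np.zeros(6400), rng.normal(0, 0.1, 8000)])
print([len(c) for c in chunk_at_pauses(audio)])
```

A production system would cut on linguistic cues as well as energy, but even this crude version shows why chunk boundaries matter: each fragment handed to the model is a complete phrase rather than a mid-word slice.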
Performance & Latency
Beyond accuracy, Ink‑Whisper prioritizes time‑to‑complete‑transcript (TTCT)—the delay from end of speech to full transcript. Leveraging its dynamic chunking and streamlined inference, Ink‑Whisper achieves industry‑leading TTCT, preserving the natural rhythm of conversation and preventing bot‑like delays that frustrate users.
Key Use Cases
Voice‑Enabled Contact Centers: Accurate, real‑time transcription of customer calls—even on unstable cellular networks.
Interactive Voice Assistants: Instant turn‑taking with near‑zero lag, enabling truly conversational AI.
Live Captioning & Accessibility: Real‑time captions for lectures, webinars, and broadcasts in any environment.
Domain‑Specific Transcription: Precise dictation for finance, healthcare, and legal sectors, thanks to specialized vocabulary support.
Affordable Streaming & Seamless Integration
Cost‑Effective: Just 1 credit/sec (≈ $0.13/hr), the lowest price for a production‑grade streaming STT model.
Open Source & Self‑Hostable: Full weights available for custom deployments and further fine‑tuning.
Easy Plug‑Ins: Ready integrations for Vapi, Pipecat, and LiveKit get you streaming in minutes.
Enterprise Reliability: Backed by 99.9 % uptime, SOC 2 Type II, HIPAA, and PCI compliance.
In every case, Ink‑Whisper meets or beats whisper‑large‑v3‑turbo on word‑error rate (WER), ensuring fewer misheard commands and clearer captions under real‑world conditions.
Tools & Releases YOU Should Know About
text-to-api.ai is a prompt-driven platform that lets you build and deploy AI‑powered APIs in seconds. Simply describe the behavior you need, and it generates a fully hosted endpoint complete with authentication, auto‑scaling, and usage analytics. With out‑of‑the‑box integrations for popular frameworks and SDKs, it’s perfect for backend developers and startups who want to turn AI experiments into production‑grade services without managing infrastructure.
Windframe.dev accelerates front‑end development by generating AI‑assisted components and templates that you can customize in a visual editor. Whether you’re crafting dashboards, landing pages, or complex web apps, Windframe’s library of pre‑styled UI blocks and one‑click theming tools help you go from sketch to code up to 10× faster. It exports clean React, Vue, or plain HTML/CSS, making it ideal for designers and engineers who need pixel‑perfect results on tight deadlines.
Auteng.ai brings a conversational interface to your entire development workflow: just chat to create functions, track down bugs, or generate documentation. It understands context across files and can refactor code, write tests, and even propose CI configurations. By integrating with Git and popular IDEs, Auteng.ai empowers professional teams and solo engineers to code, debug, and document through natural language prompts, reducing friction and keeping everyone in sync.
And that wraps up this issue of "This Week in AI Engineering", brought to you by jam.dev, your flight recorder for AI apps! Non-deterministic AI issues are hard to repro, unless you have Jam! Instant replay of the session, prompt + logs to debug ⚡️
Thank you for tuning in! Be sure to share this newsletter with your fellow AI enthusiasts and subscribe to get the latest updates directly in your inbox.
Until next time, happy building!