Google Launches Flex and Priority Inference Tiers for Gemini API
Google today introduced two new pricing tiers for the Gemini API: Flex Inference and Priority Inference. The new options give developers and enterprises significantly more control over balancing cost and reliability for AI workloads.
Flex Inference offers a 50% discount compared to standard API pricing. It leverages opportunistic, off-peak compute capacity and is designed for latency-tolerant workloads such as background CRM updates, large-scale research simulations, and agentic workflows. Requests may be preempted if standard traffic spikes, and users are responsible for implementing client-side retry logic.
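Since Flex requests can be preempted and the burden of retrying falls on the client, a common pattern is exponential backoff with jitter. A minimal sketch, assuming a hypothetical `PreemptedError` exception and a caller-supplied request function (neither is a real Gemini SDK name):

```python
import random
import time


class PreemptedError(Exception):
    """Hypothetical: raised when a Flex request is shed for standard traffic."""


def call_with_retry(request_fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry a latency-tolerant call with exponential backoff and jitter.

    request_fn is any zero-argument callable that issues the Flex request,
    e.g. a wrapper around your Gemini API client.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except PreemptedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the preemption to the caller
            # Backoff doubles each attempt (1s, 2s, 4s, ...) up to max_delay,
            # with up to 10% random jitter to avoid synchronized retry storms.
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay * 0.1))
```

Because Flex targets background jobs like CRM updates and research simulations, waiting tens of seconds between attempts is usually acceptable; the jitter matters when many workers retry after the same traffic spike.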
Priority Inference is the premium tier, priced 75 to 100% above standard rates, and guarantees the lowest latency and highest reliability. Priority traffic is never shed; if Priority capacity limits are reached, requests automatically fall back to the standard tier rather than being dropped. This tier is ideal for live customer chatbots, real-time fraud detection, and business-critical copilots. Priority Inference requires Tier 2 or Tier 3 paid projects.
For CIOs and engineering teams already using the Gemini API in production, this is a practical improvement: no more overpaying on every API call just to ensure reliability where it actually matters.
📬 Did you like this one?
AI news for leaders. Curated by a CIO who builds with it himself. Daily in your inbox.