Friday, April 24, 2026 · AI news, pre-filtered for leaders
Google · Gemini · API · Enterprise · CIO

Google Launches Flex and Priority Inference Tiers for Gemini API

Joachim Høgby · April 3, 2026 · 3 min read

Google today introduced two new pricing tiers for the Gemini API: Flex Inference and Priority Inference. The new options give developers and enterprises significantly more control over the trade-off between cost and reliability for AI workloads.

Flex Inference offers a 50% discount compared to standard API pricing. It leverages opportunistic, off-peak compute capacity and is designed for latency-tolerant workloads such as background CRM updates, large-scale research simulations, and agentic workflows. Requests may be preempted if standard traffic spikes, and users are responsible for implementing client-side retry logic.
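Since Flex requests can be preempted at any time, callers need their own retry layer. A minimal sketch of client-side retry with exponential backoff and jitter, assuming any callable that raises an exception on preemption (the function and parameter names here are illustrative, not part of Google's SDK):

```python
import random
import time


def call_with_retry(request_fn, max_attempts=5, base_delay=1.0):
    """Retry a preemptible Flex Inference call with exponential backoff.

    `request_fn` is any zero-argument callable that raises an exception
    when the request is preempted or fails transiently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Backoff doubles each attempt (1s, 2s, 4s, ...) plus random jitter
            # so that many preempted clients don't retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) + random.random()
            time.sleep(delay)
```

For production use, narrowing the `except` clause to the SDK's actual preemption/rate-limit error types is preferable to catching all exceptions.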

Priority Inference is the premium tier, priced 75 to 100% above standard rates. It guarantees the lowest latency and highest reliability. Priority traffic is never shed, and in cases where capacity limits are reached, requests automatically fall back to the standard tier. This tier is ideal for live customer chatbots, real-time fraud detection, and business-critical copilots. Priority Inference requires Tier 2 or Tier 3 paid projects.
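One practical way to use the three tiers is to classify each workload once and route requests accordingly, rather than paying the Priority premium everywhere. A sketch under the assumption of hypothetical tier identifiers ("flex", "standard", "priority" are placeholders; the real API parameter and values may differ):

```python
# Map workload categories from the article to a tier.
# Tier names here are hypothetical placeholders, not Google's API values.
TIER_FOR_WORKLOAD = {
    "batch_crm_update": "flex",        # latency-tolerant, 50% cheaper
    "research_simulation": "flex",
    "agentic_workflow": "flex",
    "customer_chatbot": "priority",    # latency-sensitive, 75-100% premium
    "fraud_detection": "priority",
    "critical_copilot": "priority",
}


def pick_tier(workload: str) -> str:
    """Return the inference tier for a workload, defaulting to standard."""
    return TIER_FOR_WORKLOAD.get(workload, "standard")
```

The default to the standard tier keeps unclassified traffic at baseline cost and reliability until someone makes a deliberate routing decision.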

For CIOs and engineering teams already running the Gemini API in production, this is a practical improvement: latency-tolerant traffic can move to the cheaper Flex tier, and the Priority premium is paid only where reliability actually matters, instead of overpaying on every API call.
