AWS makes SageMaker compatible with the OpenAI API

Breaking

CIO CISOBoardAWSSageMakerOpenAIAI AgentsEnterprise AIAI GovernanceCloudIAMVendor Risk

AWS makes SageMaker compatible with the OpenAI API

Joachim Høgby

21. mai 202621. mai 20265 min lesingKilde: AWS

Del

LinkedIn X Facebook E-post WhatsApp Telegram

AWS has made a small API change with large governance consequences: SageMaker AI now supports OpenAI-compatible endpoints for real-time inference.

Applications that already use the OpenAI SDK, LangChain or Strands Agents can call models on SageMaker by changing the endpoint URL. AWS says teams do not need a custom client, a SigV4 wrapper or application rewrites.

This is not mainly a model story. It is a control story. The OpenAI format has become the interface many developers build agents, chat systems and internal AI tools around. By making SageMaker speak the same language, AWS is shifting the conversation from model branding to where inference runs, who controls access, and where data and logs actually live.

The OpenAI format is becoming infrastructure

AWS says SageMaker endpoints now expose an /openai/v1 path that accepts Chat Completions requests and returns responses from the container, including streaming. OpenAI-compatible endpoints are enabled for all SageMaker endpoints and inference components using standard SageMaker APIs and SDKs.

For a CIO, that matters for a simple reason: much of the enterprise AI stack is already written around OpenAI clients. Internal tools, agent frameworks, gateways and evaluation pipelines often assume they can send chat-completion calls. If every provider or private model environment requires custom application code, portability becomes a slide-deck promise.

AWS is trying to remove some of that friction. The same application can point to a SageMaker endpoint running Llama, Mistral, Qwen or a fine-tuned internal model. That does not make migration effortless. But it lowers the threshold for testing whether workloads can move from an external model API to a company-controlled cloud account, dedicated GPU capacity and enterprise IAM policies.

Agents can run on owned infrastructure

AWS specifically highlights agentic workflows. Teams building multi-step agents with Strands Agents or LangChain can run those workflows on their own SageMaker endpoints while keeping the OpenAI-compatible interface the agents were built for.

That is the executive angle. The question is no longer just which model scores best. It is where the agent is allowed to run when it reads documents, searches internal systems, classifies customer requests, writes code or proposes actions.

If inference runs on an external model platform, risk has to be managed through data-processing terms, logging rules, retention, model terms and the vendor’s security regime. If inference runs in the company’s own AWS account, more of the control shifts into the existing cloud operating model: IAM, VPCs, CloudWatch, network controls, KMS, cost governance and internal audit. That is not automatically safer. It is more governable if the company already has mature cloud governance.

Tokens are part of the risk

The launch is not only a developer convenience. AWS also introduces time-limited bearer tokens for SageMaker endpoints. Tokens can be valid for up to 12 hours and are generated from existing AWS credentials. AWS says separate API keys are not required, but also stresses that the token carries the same authorization as the underlying AWS credentials.

That is a familiar governance trap. When a service becomes compatible with popular OpenAI clients, it can also become easier to paste tokens into notebooks, environment variables, logs or agent configuration files. AWS recommends short lifetimes, least-privilege IAM policies, no disk storage and no token logging.

For leaders, API compatibility must therefore come with explicit secrets-management rules. Who can generate tokens? Which SageMaker endpoints can they reach? Can agents refresh their own tokens? Is token use logged? Are there controls that prevent a broad developer role from becoming a general-purpose AI key?

This is also about vendor power

AWS is positioning SageMaker as a place where companies can host multiple models behind one interface. A team could run a general Llama model, a fine-tuned Mistral model for domain-specific work and a smaller classification model through the same OpenAI SDK. Each model can have its own resource allocation through inference components.

That is a practical product update. The larger point is strategic: the OpenAI API has become a de facto standard, even for competitors. That gives buyers more leverage if they design for portability. It also creates risk if the company locks its agent platform to one API pattern without cross-model evaluation, cost control, data requirements and observability.

Executives should read this kind of launch as a signal of where enterprise AI is heading. Models look more interchangeable at the surface. The control plane becomes more important. The strongest AI organizations will not only ask which model is best. They will ask where it runs, who can call it, how calls are logged, what it costs, and how quickly the workload can move if price, security or regulation changes.

That is less flashy than a model demo. It is also where enterprise AI becomes operations. And operations are where the bill, the risk and the accountability land.

Sources and media

Primary source: AWS, “Announcing OpenAI-compatible API support for Amazon SageMaker AI endpoints”, published May 20, 2026: https://aws.amazon.com/blogs/machine-learning/announcing-openai-compatible-api-support-for-amazon-sagemaker-ai-endpoints/
AWS links to a GitHub notebook for SageMaker/OpenAI-compatible inference: https://github.com/aws-samples/sagemaker-genai-hosting-examples/blob/main/03-features/openai/sagemaker-inference-openai-api.ipynb
Thumbnail: GPT/OpenAI Image 2 / hogby.ai.

📬 Likte du denne?

AI-nyheter for ledere. Kuratert av en CIO som bygger det selv. Daglig i innboksen.

Relaterte saker

Anthropic gjør Claude Opus 5 til ny toppmodell for agentarbeid

Breaking

AI-modellerAnthropicClaude

Anthropic gjør Claude Opus 5 til ny toppmodell for agentarbeid

Claude Opus 5 flytter Anthropic-kampen fra ren intelligens til styrbar kost, fart og sikkerhet i agentarbeid. Det er en tydelig CIO-sak, ikke bare en modellnyhet.

24. juli 20265 min lesing

Anthropic

Åpne saken

CIOCISOCTO

GitHub ruller Claude Opus 5 inn i Copilot for agentisk koding

Claude Opus 5 er tilgjengelig i GitHub Copilot for Pro+, Max, Business og Enterprise. GitHub fremhever agentiske kodeflyter, egenverifisering og strengere cyber-sperrer. For IT-ledere blir modellvalg i Copilot et spørsmål om styring, kostnad og sikkerhet – ikke bare autocomplete.

24. juli 20265 min lesing

GitHub

Åpne saken

AI-modellerGoogle AIGemini

Google gjør Gemini Flash raskere for agentarbeid

Google lanserer Gemini 3.6 Flash og 3.5 Flash-Lite med tydeligere fokus på hastighet, token-effektivitet og produksjonsklare AI-agenter.

24. juli 20264 min lesing

Google AI

Åpne saken