OpenAI API

The models behind AI features that answer your customers at 3am.

AI chatbots that answer customers around the clock
Semantic search that finds meaning, not just keywords
Pulling structured data out of contracts, emails, and documents
Tool calling that lets AI take real actions in your systems

Overview

The OpenAI API gives software access to large language and multimodal models over HTTP: send text, images, or audio with instructions, get generated output back. Applications authenticate with an API key, and official SDKs wrap the endpoints for common languages.

As of 2026, the flagship models are GPT-5.5 and GPT-5.5 Pro, with GPT-5.3-Codex specialized for agentic coding work. The Responses API is the modern endpoint for new integrations, and GPT-5.5 supports a context window of one million tokens, large enough to reason over entire document collections in a single request.

Read the official OpenAI API documentation for this topic →

Text Generation

Text generation is a structured conversation with the model: you provide instructions that set behavior and input that carries the actual request, and the model returns generated text. The Responses API is the recommended way to do this in new integrations, superseding the older Chat Completions pattern with a simpler request shape and built-in support for tools and multi-step workflows. Responses can be streamed token by token, so interfaces show text as it is written rather than after a long pause, and request parameters control how deterministic the output is, which matters for tasks that need consistency.

typescript

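import OpenAI from 'openai'

// With no arguments, the client reads the key from the OPENAI_API_KEY environment variable
const client = new OpenAI()

const response = await client.responses.create({
  model: 'gpt-5.5',
  // Guidance that sets the model's behavior
  instructions: 'You answer concisely.',
  // The actual request
  input: 'Summarize this customer request.',
})

// SDK convenience property: the response's text output as a single string
const text = response.output_text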

Function Calling

Function calling is how an AI assistant goes from talking to doing. The application describes available functions and their parameters, and when the model decides one is needed, it returns the function name and arguments instead of free text. The application runs the function, feeds the result back, and the model grounds its answer in real data or completes a real action. This is the mechanism behind assistants that check your inventory, look up an order, or book an appointment for a customer.
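
The round trip described above can be sketched without touching the network. This is a minimal illustration, not a definitive implementation: the local types mirror the function-tool, `function_call`, and `function_call_output` shapes used by the Responses API, while the `get_order_status` function, its `orderId` parameter, and the `handlers` table are hypothetical names invented for the example.

```typescript
// Local types mirroring the Responses API shapes for function tools.
type FunctionTool = {
  type: 'function'
  name: string
  description: string
  parameters: Record<string, unknown> // JSON Schema describing the arguments
  strict: boolean
}

type FunctionCallItem = {
  type: 'function_call'
  call_id: string
  name: string
  arguments: string // JSON-encoded by the model
}

type FunctionCallOutput = {
  type: 'function_call_output'
  call_id: string
  output: string
}

// Hypothetical tool definition: what the application tells the model it can call.
const tools: FunctionTool[] = [
  {
    type: 'function',
    name: 'get_order_status',
    description: 'Look up the current status of a customer order.',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
      additionalProperties: false,
    },
    strict: true,
  },
]

// Hypothetical handlers; a real one would query your order system.
const handlers: Record<string, (args: Record<string, unknown>) => unknown> = {
  get_order_status: (args) => ({ orderId: args.orderId, status: 'shipped' }),
}

// Runs one function call requested by the model and packages the result
// so it can be sent back to the model in the next request.
function runFunctionCall(call: FunctionCallItem): FunctionCallOutput {
  let result: unknown
  try {
    const known = Object.prototype.hasOwnProperty.call(handlers, call.name)
    result = known
      ? handlers[call.name](JSON.parse(call.arguments))
      : { error: `Unknown function: ${call.name}` }
  } catch (err) {
    // Malformed arguments or a failing handler are reported back to the model.
    result = { error: String(err) }
  }
  return {
    type: 'function_call_output',
    call_id: call.call_id,
    output: JSON.stringify(result),
  }
}
```

In a real integration, `tools` would be passed to `client.responses.create`, the items in `response.output` whose `type` is `'function_call'` would each go through `runFunctionCall`, and the resulting outputs would be sent back in the next request's `input` along with the prior conversation items.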

Embeddings and Search

An embedding turns a piece of text into a list of numbers that captures its meaning, so similar meanings land near each other even when the words differ. A customer searching for "money back" finds your refund policy because the meanings match, not the keywords. Stored in a vector index, embeddings retrieve the most relevant documents for any query, which is the foundation of retrieval-augmented generation. We pair the embeddings endpoint with the pgvector extension in PostgreSQL, keeping search and source data in one database.
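
To make "similar meanings land near each other" concrete, here is a small in-memory sketch of ranking documents by cosine similarity. The three-dimensional vectors are toy values standing in for real embeddings, and the document ids and the `topK` helper are hypothetical. In the setup described above the comparison would run inside PostgreSQL instead, where pgvector's `<=>` operator computes cosine distance.

```typescript
type Doc = { id: string; embedding: number[] }

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Scores every document against the query and returns the k best matches.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

// Toy vectors; real embeddings have hundreds or thousands of dimensions.
const docs: Doc[] = [
  { id: 'refund-policy', embedding: [0.9, 0.1, 0.0] },
  { id: 'shipping-times', embedding: [0.1, 0.9, 0.1] },
  { id: 'careers', embedding: [0.0, 0.1, 0.9] },
]

// Stand-in for the embedding of the query "money back".
const moneyBackQuery = [0.8, 0.2, 0.1]
const results = topK(moneyBackQuery, docs, 2)
```

With these toy values the refund policy ranks first even though the query shares no keywords with its id, which is the behavior the paragraph describes. A linear scan like this is only reasonable for small collections; a vector index is what keeps the lookup fast at scale.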

How Devyst Uses the OpenAI API

We integrate the OpenAI API behind a provider abstraction, so your AI features depend on an internal interface rather than directly on one vendor. Prompts are versioned in source control like any other code, and responses that feed downstream logic use structured output validated against a schema. Retrieval-augmented generation combines embeddings in PostgreSQL with GPT-5.5 generation so answers are grounded in your actual business data, not guesses. API keys live in server-side environment variables and never reach the browser.
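
As a rough sketch of how a provider abstraction and schema validation fit together, consider the following. Every name here (`TextProvider`, `TicketSummary`, `parseTicketSummary`, `summarizeTicket`, `stubProvider`) is hypothetical and chosen for illustration, and the hand-written checks stand in for whatever schema library a project actually uses.

```typescript
// The internal interface that features depend on, instead of a vendor SDK.
interface TextProvider {
  generate(request: { instructions: string; input: string }): Promise<string>
}

type TicketSummary = {
  category: 'billing' | 'shipping' | 'other'
  urgent: boolean
  summary: string
}

const CATEGORIES: readonly string[] = ['billing', 'shipping', 'other']

// Validates raw model output before any downstream logic sees it.
function parseTicketSummary(raw: string): TicketSummary {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    throw new Error('Model output is not valid JSON')
  }
  if (typeof data !== 'object' || data === null) {
    throw new Error('Expected a JSON object')
  }
  const { category, urgent, summary } = data as Record<string, unknown>
  if (typeof category !== 'string' || CATEGORIES.indexOf(category) === -1) {
    throw new Error('Invalid or missing "category"')
  }
  if (typeof urgent !== 'boolean') {
    throw new Error('Invalid or missing "urgent"')
  }
  if (typeof summary !== 'string' || summary.length === 0) {
    throw new Error('Invalid or missing "summary"')
  }
  return { category: category as TicketSummary['category'], urgent, summary }
}

// Feature code only knows about the interface and the validated type.
async function summarizeTicket(provider: TextProvider, ticket: string): Promise<TicketSummary> {
  const raw = await provider.generate({
    instructions: 'Return a JSON object with the keys category, urgent, and summary.',
    input: ticket,
  })
  return parseTicketSummary(raw)
}

// Stub provider for tests; a production one would wrap client.responses.create.
const stubProvider: TextProvider = {
  generate: async () => '{"category":"billing","urgent":true,"summary":"Refund request"}',
}
```

Because `summarizeTicket` accepts any `TextProvider`, swapping vendors or testing with `stubProvider` does not touch feature code, and a malformed model response fails at the validation boundary rather than deep inside business logic.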

Reliability and Cost Control

An AI feature that fails loudly or burns budget quietly is worse than no feature at all. Remote models bring remote failure modes, including rate limits, timeouts, and occasional errors, so we wrap every call with retries, backoff, sensible timeouts, and fallback behavior that keeps your user experience intact when a request fails. Cost scales with input and output length, so prompts stay tight, responses stay bounded, and token usage is tracked per feature. Caching repeated requests cuts both latency and spend.
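
The wrapping described above might look roughly like this. It is a sketch under stated assumptions: the helper names (`backoffDelayMs`, `withTimeout`, `withRetry`) and the defaults (3 attempts, 500 ms base delay, 8 s cap, 10 s timeout) are illustrative rather than our production values, the backoff has no jitter, and every error is treated as retryable.

```typescript
// Exponential backoff: attempt 0 waits baseMs, then 2x, 4x, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Rejects if the work does not settle within ms milliseconds.
// Note: this stops waiting but does not cancel the underlying request.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
  })
  try {
    return await Promise.race([work, timeout])
  } finally {
    clearTimeout(timer)
  }
}

// Retries a call with backoff, then falls back or rethrows the last error.
// A production version would retry only rate limits, timeouts, and 5xx errors.
async function withRetry<T>(
  call: () => Promise<T>,
  options: { attempts?: number; timeoutMs?: number; fallback?: () => T } = {},
): Promise<T> {
  const attempts = options.attempts ?? 3
  const timeoutMs = options.timeoutMs ?? 10000
  let lastError: unknown
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await withTimeout(call(), timeoutMs)
    } catch (err) {
      lastError = err
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts - 1) await sleep(backoffDelayMs(attempt))
    }
  }
  if (options.fallback) return options.fallback()
  throw lastError
}
```

A call site would then look like `withRetry(() => client.responses.create({ ... }), { fallback: () => cachedAnswer })`, where `cachedAnswer` is a hypothetical stand-in for whatever degraded response keeps the interface usable. The official Node SDK also accepts `maxRetries` and `timeout` options on the client, so a wrapper like this mainly earns its place when fallback behavior or per-feature policies are needed.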
