Tutorial
AI API Streaming with Server-Sent Events: Complete Guide
April 17, 2026 · 8 min read
Streaming lets your AI application display responses token-by-token as they are generated, rather than waiting for the entire response. This dramatically improves perceived latency and user experience. Under the hood, AI APIs use Server-Sent Events (SSE) to push tokens to your client in real time.
How SSE Streaming Works
When you set stream=True, the API sends a series of small JSON chunks over an HTTP connection instead of one large response. Each chunk contains one or more tokens. The connection stays open until the response is complete.
- Client sends a POST request with stream: true
- Server responds with Content-Type: text/event-stream
- Server pushes data: {...} events as tokens are generated
- A final data: [DONE] event signals completion
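The steps above can be sketched as a tiny parser over the raw wire format. This is a minimal illustration, not a production SSE client (it assumes each data: line arrives whole, and ignores the event:, id:, and retry: fields); the chunk payloads follow the OpenAI chat-completion chunk shape:

```python
import json

def parse_sse_events(raw: str):
    """Collect the JSON payloads of data: lines, stopping at [DONE]."""
    payloads = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines and other fields
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # sentinel: the stream is complete
        payloads.append(json.loads(data))
    return payloads

# Example wire format, shaped like OpenAI chat-completion chunks:
raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
tokens = [p["choices"][0]["delta"]["content"] for p in parse_sse_events(raw)]
print("".join(tokens))  # -> Hello
```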
Python: Basic Streaming
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="YOUR_AIPOWER_KEY",
)

# Enable streaming
stream = client.chat.completions.create(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "Explain how neural networks learn"}],
    stream=True,
)

# Process tokens as they arrive
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()  # Newline at the end
```

Node.js: Basic Streaming
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aipower.me/v1",
  apiKey: "YOUR_AIPOWER_KEY",
});

async function streamResponse() {
  const stream = await client.chat.completions.create({
    model: "deepseek/deepseek-chat",
    messages: [{ role: "user", content: "Explain how neural networks learn" }],
    stream: true,
  });
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
  console.log();
}

streamResponse();
```

Building a Real-Time Chat UI (FastAPI + SSE)
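One detail to watch in the endpoint below: it frames each token as `data: {content}`, but a token that itself contains a newline would split across lines and break SSE framing, since a data: field ends at the first newline. A common fix is to JSON-encode every payload before framing it. A minimal sketch (the helper name `sse_event` is our own, not part of any library):

```python
import json

def sse_event(payload) -> str:
    """Frame a payload as one SSE event. JSON-encoding escapes raw
    newlines, so a multi-line token stays on a single data: line."""
    return f"data: {json.dumps(payload)}\n\n"

# The embedded newline is escaped inside the JSON string, so the
# event still consists of exactly one data: line plus the blank line.
print(sse_event("line one\nline two"))
```

The client then calls `JSON.parse` on each data payload instead of concatenating it raw.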
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")

@app.post("/api/chat")
async def chat(messages: list):
    async def generate():
        stream = client.chat.completions.create(
            model="deepseek/deepseek-chat",
            messages=messages,
            stream=True,
        )
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                yield f"data: {content}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```

Frontend: Consuming SSE in JavaScript
```javascript
async function sendMessage(messages) {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(messages),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let result = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } handles multi-byte characters split across reads.
    // A production parser should also buffer partial events, rather than
    // assuming each read delivers only whole lines.
    const text = decoder.decode(value, { stream: true });
    const lines = text.split("\n").filter(line => line.startsWith("data: "));
    for (const line of lines) {
      const data = line.slice(6);
      if (data === "[DONE]") return result;
      result += data;
      updateChatUI(result); // Re-render the message in your UI
    }
  }
  return result;
}
```

Streaming with Error Handling
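One caveat about the retry generator below: it restarts the request from scratch on failure, so any tokens already yielded before the error will be generated and yielded again. A way to guard the consumer against that (a sketch of our own, not part of the OpenAI SDK; `attempts` is a hypothetical iterable where each element is one full attempt's token stream) is to track how many characters have already been delivered and emit only the unseen suffix:

```python
def dedupe_stream(attempts):
    """Skip characters the consumer already saw on an earlier attempt.

    `attempts` is an iterable of token iterables: each element is one
    (re)started attempt at the same stream, replayed from the start.
    """
    emitted = 0  # characters delivered to the consumer so far
    for attempt in attempts:
        seen = 0  # characters observed within this attempt
        for token in attempt:
            seen += len(token)
            if seen <= emitted:
                continue  # fully delivered on a previous attempt
            fresh = token[-(seen - emitted):]  # only the unseen suffix
            emitted = seen
            yield fresh

# First attempt dies after "Hel"; the retry replays from the start.
attempts = [["Hel"], ["Hell", "o"]]
print("".join(dedupe_stream(attempts)))  # -> Hello
```

This assumes the model regenerates the same prefix on retry, which deterministic sampling encourages but does not guarantee; re-rendering the whole message from the retry is the simpler, safer fallback.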
```python
def stream_with_retry(messages, model="deepseek/deepseek-chat", max_retries=3):
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model=model, messages=messages, stream=True,
            )
            full_response = ""
            for chunk in stream:
                content = chunk.choices[0].delta.content
                if content:
                    full_response += content
                    yield content
            return
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            print(f"Stream error (attempt {attempt + 1}): {e}")
```

Performance Tips
- First-token latency: Streaming shows the first token in 200-500 ms, versus 2-5 s of waiting for a complete non-streaming response
- Cost is identical: A streamed request bills the same input and output tokens as its non-streaming equivalent
- Abort early: Cancel the stream if the user navigates away, so you stop paying for output tokens no one will see
- Buffer for rendering: Batch UI updates every ~50 ms instead of per token to avoid jank
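The last tip can be sketched as a small time-based buffer. This is an illustrative shape, not a library API; `render` stands in for whatever callback updates your UI:

```python
import time

class RenderBuffer:
    """Accumulate streamed tokens and flush to the UI at most once per
    `interval` seconds, instead of re-rendering on every token."""

    def __init__(self, render, interval=0.05):
        self.render = render
        self.interval = interval
        self.buffer = []
        self.last_flush = 0.0

    def add(self, token: str):
        self.buffer.append(token)
        now = time.monotonic()
        if now - self.last_flush >= self.interval:
            self.flush(now)

    def flush(self, now=None):
        """Render whatever is buffered; call once more when the stream ends."""
        if self.buffer:
            self.render("".join(self.buffer))
            self.buffer.clear()
        self.last_flush = now if now is not None else time.monotonic()

# Usage: feed tokens as they arrive, then flush the tail.
rendered = []
buf = RenderBuffer(rendered.append, interval=0.05)
for token in ["Hel", "lo", ", ", "world"]:
    buf.add(token)
buf.flush()
print(rendered)
```

The same pattern applies on the JavaScript side with `setInterval` or `requestAnimationFrame` driving the flush.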
All 16 models on AIPower support streaming. Start building real-time AI UIs at aipower.me with 50 free API calls.