Tutorial
Batch AI Processing: How to Process Large Datasets with AI APIs
April 17, 2026 · 8 min read
Processing 10,000 customer support tickets through AI. Classifying 50,000 product descriptions. Summarizing 5,000 legal documents. Batch AI processing is one of the highest-value use cases — but doing it wrong means blown budgets, rate limit errors, and lost data. Here's how to do it right.
Batch Processing Architecture
- Load data — Read from CSV, database, or API
- Chunk — Split into batches that respect rate limits
- Process concurrently — Use async workers with backoff
- Collect results — Write to output as results arrive
- Handle failures — Retry failed items, log errors
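The chunking step above can be as simple as a generator that slices the input into fixed-size batches; a minimal sketch (the batch size you pick should come from your provider's rate limits, not from this example):

```python
from itertools import islice

def chunked(items, size):
    """Split any iterable into lists of at most `size` items (step 2 above)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# e.g. list(chunked(range(5), 2)) -> [[0, 1], [2, 3], [4]]
```

Yielding lists rather than materializing everything up front keeps memory flat even when the input is a streamed CSV or database cursor.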
Model Selection for Batch Jobs
| Task | Recommended Model | Cost per 10K items | Why |
|---|---|---|---|
| Classification | GLM-4 Flash | $0.04 | Cheapest, fast, good accuracy |
| Summarization | DeepSeek V3 | $2.10 | Good quality, affordable |
| Extraction | Doubao Pro | $0.34 | 256K context, very cheap |
| Translation | Qwen Plus | $0.80 | Best multilingual, cheap |
| Analysis | Claude Sonnet 4 | $67.50 | Best quality for complex tasks |
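One way to wire the table above into code is a simple task-to-model routing dict. Note that only `zhipu/glm-4-flash` appears elsewhere in this article; the other identifiers below are illustrative placeholders, so check your provider's model list for exact names:

```python
# Task-to-model routing based on the table above. All identifiers except
# "zhipu/glm-4-flash" are hypothetical examples, not confirmed model names.
MODEL_FOR_TASK = {
    "classification": "zhipu/glm-4-flash",
    "summarization": "deepseek/deepseek-v3",
    "extraction": "doubao/doubao-pro",
    "translation": "qwen/qwen-plus",
    "analysis": "anthropic/claude-sonnet-4",
}

def pick_model(task: str) -> str:
    """Route a task to a model, falling back to the cheapest option."""
    return MODEL_FOR_TASK.get(task, "zhipu/glm-4-flash")
```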
Production Batch Processor
```python
import asyncio
import csv
import json
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")

async def process_item(item, semaphore, model="zhipu/glm-4-flash"):
    """Process a single item with rate limiting."""
    async with semaphore:
        try:
            # The OpenAI client is synchronous, so run the call in a thread
            response = await asyncio.to_thread(
                client.chat.completions.create,
                model=model,
                messages=[
                    {"role": "system", "content": "Classify this text into: positive, negative, neutral"},
                    {"role": "user", "content": item["text"]},
                ],
                max_tokens=10,
            )
            return {
                "id": item["id"],
                "result": response.choices[0].message.content.strip(),
                "status": "success",
            }
        except Exception as e:
            return {"id": item["id"], "error": str(e), "status": "failed"}

async def batch_process(items, concurrency=20, model="zhipu/glm-4-flash"):
    """Process items with controlled concurrency."""
    semaphore = asyncio.Semaphore(concurrency)
    tasks = [process_item(item, semaphore, model) for item in items]
    results = []
    for i, coro in enumerate(asyncio.as_completed(tasks)):
        result = await coro
        results.append(result)
        if (i + 1) % 100 == 0:
            success = sum(1 for r in results if r["status"] == "success")
            print(f"Progress: {i + 1}/{len(items)} | Success: {success}")
    return results

# Run it
items = [{"id": i, "text": f"Customer review {i}..."} for i in range(10000)]
results = asyncio.run(batch_process(items))
```

Error Handling and Retry
```python
async def process_with_retry(item, semaphore, max_retries=3):
    """Process with exponential backoff retry."""
    for attempt in range(max_retries):
        result = await process_item(item, semaphore)
        if result["status"] == "success":
            return result
        if attempt < max_retries - 1:
            await asyncio.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
    return result  # Return last failure

async def batch_with_retry(items, concurrency=20):
    """Process all items, then retry failures."""
    results = await batch_process(items, concurrency)

    # Separate successes and failures
    successes = [r for r in results if r["status"] == "success"]
    failures = [r for r in results if r["status"] == "failed"]
    print(f"First pass: {len(successes)} success, {len(failures)} failed")

    # Retry failures with lower concurrency
    if failures:
        failed_ids = {f["id"] for f in failures}
        failed_items = [item for item in items if item["id"] in failed_ids]
        retry_results = await batch_process(failed_items, concurrency=5)
        successes.extend(r for r in retry_results if r["status"] == "success")
    return successes
```

Cost Optimization Tips
- Use the cheapest model that works — GLM-4 Flash at $0.01/M handles most classification
- Set max_tokens aggressively — classification needs 10 tokens, not 1000
- Batch similar items — process multiple items per API call when possible
- Cache duplicate inputs — hash inputs and skip duplicates
- Test on a sample first — run 100 items before committing to 100,000
- Use structured output — JSON mode prevents parsing errors
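The caching tip above can be implemented with a content hash in front of the API call. A sketch under stated assumptions: `call_api` below is a stand-in for whatever function actually hits the API (such as a synchronous wrapper around the classifier shown earlier):

```python
import hashlib

def dedupe_key(text: str) -> str:
    """Stable cache key: SHA-256 of the normalized input text."""
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

def process_with_cache(items, call_api, cache=None):
    """Call the API once per unique input; reuse the result for duplicates."""
    cache = {} if cache is None else cache
    results = []
    for item in items:
        key = dedupe_key(item["text"])
        if key not in cache:
            cache[key] = call_api(item)  # only uncached inputs hit the API
        results.append({"id": item["id"], "result": cache[key]})
    return results
```

Datasets like support tickets and product reviews often contain many near-identical entries, so even this naive normalization (strip plus lowercase) can cut real API volume.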
Monthly Cost Estimates for Batch Jobs
| Items/month | GLM-4 Flash | DeepSeek V3 | Claude Sonnet |
|---|---|---|---|
| 10,000 | $0.04 | $2.10 | $67.50 |
| 100,000 | $0.40 | $21.00 | $675.00 |
| 1,000,000 | $4.00 | $210.00 | $6,750.00 |
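The table's figures follow from simple per-token arithmetic. In the sketch below, the ~400 tokens per item is an illustrative assumption chosen to match the table against the $0.01/M rate quoted for GLM-4 Flash, not a published measurement:

```python
def batch_cost(items: int, tokens_per_item: int, price_per_m_tokens: float) -> float:
    """Estimated job cost in dollars: items x tokens each x price per million tokens."""
    return items * tokens_per_item * price_per_m_tokens / 1_000_000

# Assuming ~400 tokens per item, GLM-4 Flash at $0.01/M over 10,000 items:
estimate = batch_cost(10_000, 400, 0.01)  # 0.04
```

Running the same arithmetic on a 100-item sample first (as recommended above) lets you replace the assumed token count with a measured average before committing to the full run.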
Process millions of items affordably at aipower.me — GLM-4 Flash at $0.01/M tokens makes batch AI practically free. 50 free API calls to test your pipeline.