Tutorial

Batch AI Processing: How to Process Large Datasets with AI APIs

April 17, 2026 · 8 min read

Processing 10,000 customer support tickets through AI. Classifying 50,000 product descriptions. Summarizing 5,000 legal documents. Batch processing is one of the highest-value uses of AI APIs — but doing it wrong means blown budgets, rate-limit errors, and lost data. Here's how to do it right.

Batch Processing Architecture

  1. Load data — Read from CSV, database, or API
  2. Chunk — Split into batches that respect rate limits
  3. Process concurrently — Use async workers with backoff
  4. Collect results — Write to output as results arrive
  5. Handle failures — Retry failed items, log errors
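Steps 1 and 2 above can be sketched in a few lines before any API calls enter the picture. A minimal version, assuming a hypothetical CSV with `id` and `text` columns:

```python
import csv
from itertools import islice

def load_items(path):
    """Step 1: read rows from a CSV with 'id' and 'text' columns."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def chunked(items, size):
    """Step 2: yield fixed-size batches so each pass stays under rate limits."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Example: 10,250 items split into batches of 500
items = [{"id": i, "text": f"ticket {i}"} for i in range(10_250)]
batches = list(chunked(items, 500))
print(len(batches), len(batches[-1]))  # 21 250  (twenty full batches plus a 250-item tail)
```

Batch size is a knob, not a constant: start at your rate limit divided by your worst-case requests per second, then tune from there.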

Model Selection for Batch Jobs

| Task | Recommended Model | Cost per 10K items | Why |
|---|---|---|---|
| Classification | GLM-4 Flash | $0.04 | Cheapest, fast, good accuracy |
| Summarization | DeepSeek V3 | $2.10 | Good quality, affordable |
| Extraction | Doubao Pro | $0.34 | 256K context, very cheap |
| Translation | Qwen Plus | $0.80 | Best multilingual, cheap |
| Analysis | Claude Sonnet 4 | $67.50 | Best quality for complex tasks |

Production Batch Processor

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")

async def process_item(item, semaphore, model="zhipu/glm-4-flash"):
    """Process a single item with rate limiting."""
    async with semaphore:
        try:
            response = await client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "Classify this text into: positive, negative, neutral"},
                    {"role": "user", "content": item["text"]},
                ],
                max_tokens=10,
            )
            return {
                "id": item["id"],
                "result": response.choices[0].message.content.strip(),
                "status": "success",
            }
        except Exception as e:
            return {"id": item["id"], "error": str(e), "status": "failed"}

async def batch_process(items, concurrency=20, model="zhipu/glm-4-flash"):
    """Process items with controlled concurrency."""
    semaphore = asyncio.Semaphore(concurrency)
    tasks = [process_item(item, semaphore, model) for item in items]

    results = []
    for i, coro in enumerate(asyncio.as_completed(tasks)):
        result = await coro
        results.append(result)
        if (i + 1) % 100 == 0:
            success = sum(1 for r in results if r["status"] == "success")
            print(f"Progress: {i + 1}/{len(items)} | Success: {success}")

    return results

# Run it
items = [{"id": i, "text": f"Customer review {i}..."} for i in range(10000)]
results = asyncio.run(batch_process(items))
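Step 4 of the architecture says to write results as they arrive. The processor above collects everything in memory, which is fine for 10,000 items, but appending each result to a JSONL checkpoint file means a crash late in the run doesn't lose everything. A sketch (the `checkpoint.jsonl` filename is an assumption):

```python
import json

def append_result(result, path="checkpoint.jsonl"):
    """Append one result per line; JSONL survives partial writes better than one big JSON."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result) + "\n")

def load_completed_ids(path="checkpoint.jsonl"):
    """On restart, collect ids that already succeeded so they can be skipped."""
    done = set()
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record.get("status") == "success":
                    done.add(record["id"])
    except FileNotFoundError:
        pass
    return done
```

Call `append_result(result)` inside the `as_completed` loop, and filter `items` against `load_completed_ids()` before starting, and a killed job resumes where it left off.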

Error Handling and Retry

async def process_with_retry(item, semaphore, max_retries=3):
    """Process with exponential backoff retry."""
    for attempt in range(max_retries):
        result = await process_item(item, semaphore)
        if result["status"] == "success":
            return result
        if attempt < max_retries - 1:  # don't sleep after the final attempt
            await asyncio.sleep(2 ** attempt)  # wait 1s, then 2s, before retrying

    return result  # Return last failure

async def batch_with_retry(items, concurrency=20):
    """Process all items, then retry failures."""
    results = await batch_process(items, concurrency)

    # Separate successes and failures
    successes = [r for r in results if r["status"] == "success"]
    failures = [r for r in results if r["status"] == "failed"]
    print(f"First pass: {len(successes)} success, {len(failures)} failed")

    # Retry failures with lower concurrency
    if failures:
        failed_ids = {f["id"] for f in failures}
        failed_items = [item for item in items if item["id"] in failed_ids]
        retry_results = await batch_process(failed_items, concurrency=5)
        successes.extend(r for r in retry_results if r["status"] == "success")

    return successes
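One refinement worth considering for the backoff above is jitter: when twenty workers hit a rate limit together, a fixed `2 ** attempt` wait makes them all retry in lockstep and collide again. Randomizing the delay ("full jitter") spreads retries out. A sketch:

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter backoff: a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

In `process_with_retry`, use `await asyncio.sleep(backoff_delay(attempt))` in place of the fixed `2 ** attempt` wait.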

Cost Optimization Tips

  • Use the cheapest model that works — GLM-4 Flash at $0.01/M handles most classification
  • Set max_tokens aggressively — classification needs 10 tokens, not 1000
  • Batch similar items — process multiple items per API call when possible
  • Cache duplicate inputs — hash inputs and skip duplicates
  • Test on a sample first — run 100 items before committing to 100,000
  • Use structured output — JSON mode prevents parsing errors
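The "cache duplicate inputs" tip above is only a few lines of code: hash a normalized copy of each input and call the API only on a cache miss. A minimal in-memory sketch, where `fake_classify` stands in for the real API call:

```python
import hashlib

cache = {}

def cached_process(text, classify):
    """Hash the normalized input; call `classify` only on a cache miss."""
    key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = classify(text)
    return cache[key]

calls = 0
def fake_classify(text):
    global calls
    calls += 1
    return "positive"

# Three near-duplicate reviews cost one call instead of three
for text in ["Great product!", "great product!", "Great product! "]:
    cached_process(text, fake_classify)
print(calls)  # 1
```

For long-running jobs, swap the dict for a SQLite or Redis cache so deduplication survives restarts.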

Monthly Cost Estimates for Batch Jobs

| Items/month | GLM-4 Flash | DeepSeek V3 | Claude Sonnet |
|---|---|---|---|
| 10,000 | $0.04 | $2.10 | $67.50 |
| 100,000 | $0.40 | $21.00 | $675.00 |
| 1,000,000 | $4.00 | $210.00 | $6,750.00 |
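These figures come straight from per-token pricing: items × average tokens per item × price per million tokens. A quick sanity check against the GLM-4 Flash column, assuming roughly 400 tokens per item:

```python
def batch_cost(items, tokens_per_item, price_per_million):
    """Estimated spend: total tokens scaled by the per-million-token price."""
    return items * tokens_per_item * price_per_million / 1_000_000

# GLM-4 Flash at $0.01/M tokens, ~400 tokens per item (assumed average)
print(batch_cost(10_000, 400, 0.01))     # about $0.04 for 10K items
print(batch_cost(1_000_000, 400, 0.01))  # about $4.00 for 1M items
```

Measure your real average token count on a 100-item sample before trusting any estimate; prompts and outputs vary widely by task.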

Process millions of items affordably at aipower.me — GLM-4 Flash at $0.01/M tokens makes batch AI practically free. 50 free API calls to test your pipeline.

Ready to try?

50 free API calls. 16 models. One API key.

Create free account