Tutorial

Batch AI Processing: How to Process Large Datasets with AI APIs

April 17, 2026 · 8 min read

Processing 10,000 customer support tickets through AI. Classifying 50,000 product descriptions. Summarizing 5,000 legal documents. Batch processing is one of the highest-value uses of AI APIs — but doing it wrong means blown budgets, rate-limit errors, and lost data. Here's how to do it right.

Batch Processing Architecture

  1. Load data — Read from CSV, database, or API
  2. Chunk — Split into batches that respect rate limits
  3. Process concurrently — Use async workers with backoff
  4. Collect results — Write to output as results arrive
  5. Handle failures — Retry failed items, log errors
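Steps 1 and 2 above can be sketched in a few lines before any API calls enter the picture. A minimal version, assuming a hypothetical CSV with `id` and `text` columns:

```python
import csv
from itertools import islice

def load_items(path):
    """Step 1: read rows from a CSV with 'id' and 'text' columns."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def chunked(items, size):
    """Step 2: yield fixed-size batches so each pass stays under rate limits."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Example: 10,250 items split into batches of 500
items = [{"id": i, "text": f"ticket {i}"} for i in range(10_250)]
batches = list(chunked(items, 500))
print(len(batches), len(batches[-1]))  # 21 250  (twenty full batches plus a 250-item tail)
```

Batch size is a knob, not a constant: start at your rate limit divided by your worst-case requests per second, then tune from there.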

Model Selection for Batch Jobs

| Task | Recommended Model | Cost per 10K items | Why |
|---|---|---|---|
| Classification | GLM-4 Flash | $0.04 | Cheapest, fast, good accuracy |
| Summarization | DeepSeek V3 | $2.10 | Good quality, affordable |
| Extraction | Doubao Pro | $0.34 | 256K context, very cheap |
| Translation | Qwen Plus | $0.80 | Best multilingual, cheap |
| Analysis | Claude Sonnet 4 | $67.50 | Best quality for complex tasks |

Production Batch Processor

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")

async def process_item(item, semaphore, model="zhipu/glm-4-flash"):
    """Process a single item with rate limiting."""
    async with semaphore:
        try:
            response = await client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "Classify this text into: positive, negative, neutral"},
                    {"role": "user", "content": item["text"]},
                ],
                max_tokens=10,
            )
            return {
                "id": item["id"],
                "result": response.choices[0].message.content.strip(),
                "status": "success",
            }
        except Exception as e:
            return {"id": item["id"], "error": str(e), "status": "failed"}

async def batch_process(items, concurrency=20, model="zhipu/glm-4-flash"):
    """Process items with controlled concurrency."""
    semaphore = asyncio.Semaphore(concurrency)
    tasks = [process_item(item, semaphore, model) for item in items]

    results = []
    for i, coro in enumerate(asyncio.as_completed(tasks)):
        result = await coro
        results.append(result)
        if (i + 1) % 100 == 0:
            success = sum(1 for r in results if r["status"] == "success")
            print(f"Progress: {i + 1}/{len(items)} | Success: {success}")

    return results

# Run it
items = [{"id": i, "text": f"Customer review {i}..."} for i in range(10000)]
results = asyncio.run(batch_process(items))
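Step 4 of the architecture says to write results as they arrive. The processor above collects everything in memory, which is fine for 10,000 items, but appending each result to a JSONL checkpoint file means a crash late in the run doesn't lose everything. A sketch (the `checkpoint.jsonl` filename is an assumption):

```python
import json

def append_result(result, path="checkpoint.jsonl"):
    """Append one result per line; JSONL survives partial writes better than one big JSON."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result) + "\n")

def load_completed_ids(path="checkpoint.jsonl"):
    """On restart, collect ids that already succeeded so they can be skipped."""
    done = set()
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record.get("status") == "success":
                    done.add(record["id"])
    except FileNotFoundError:
        pass
    return done
```

Call `append_result(result)` inside the `as_completed` loop, and filter `items` against `load_completed_ids()` before starting, and a killed job resumes where it left off.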

Error Handling and Retry

async def process_with_retry(item, semaphore, max_retries=3):
    """Process with exponential backoff retry."""
    for attempt in range(max_retries):
        result = await process_item(item, semaphore)
        if result["status"] == "success":
            return result
        if attempt < max_retries - 1:  # don't sleep after the final attempt
            await asyncio.sleep(2 ** attempt)  # wait 1s, then 2s, before retrying

    return result  # Return last failure

async def batch_with_retry(items, concurrency=20):
    """Process all items, then retry failures."""
    results = await batch_process(items, concurrency)

    # Separate successes and failures
    successes = [r for r in results if r["status"] == "success"]
    failures = [r for r in results if r["status"] == "failed"]
    print(f"First pass: {len(successes)} success, {len(failures)} failed")

    # Retry failures with lower concurrency
    if failures:
        failed_ids = {f["id"] for f in failures}
        failed_items = [item for item in items if item["id"] in failed_ids]
        retry_results = await batch_process(failed_items, concurrency=5)
        successes.extend(r for r in retry_results if r["status"] == "success")

    return successes
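One refinement worth considering for the backoff above is jitter: when twenty workers hit a rate limit together, a fixed `2 ** attempt` wait makes them all retry in lockstep and collide again. Randomizing the delay ("full jitter") spreads retries out. A sketch:

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter backoff: a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

In `process_with_retry`, use `await asyncio.sleep(backoff_delay(attempt))` in place of the fixed `2 ** attempt` wait.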

Cost Optimization Tips

  • Use the cheapest model that works — GLM-4 Flash at $0.01/M handles most classification
  • Set max_tokens aggressively — classification needs 10 tokens, not 1000
  • Batch similar items — process multiple items per API call when possible
  • Cache duplicate inputs — hash inputs and skip duplicates
  • Test on a sample first — run 100 items before committing to 100,000
  • Use structured output — JSON mode prevents parsing errors
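The "cache duplicate inputs" tip above is only a few lines of code: hash a normalized copy of each input and call the API only on a cache miss. A minimal in-memory sketch, where `fake_classify` stands in for the real API call:

```python
import hashlib

cache = {}

def cached_process(text, classify):
    """Hash the normalized input; call `classify` only on a cache miss."""
    key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = classify(text)
    return cache[key]

calls = 0
def fake_classify(text):
    global calls
    calls += 1
    return "positive"

# Three near-duplicate reviews cost one call instead of three
for text in ["Great product!", "great product!", "Great product! "]:
    cached_process(text, fake_classify)
print(calls)  # 1
```

For long-running jobs, swap the dict for a SQLite or Redis cache so deduplication survives restarts.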

Monthly Cost Estimates for Batch Jobs

| Items/month | GLM-4 Flash | DeepSeek V3 | Claude Sonnet |
|---|---|---|---|
| 10,000 | $0.04 | $2.10 | $67.50 |
| 100,000 | $0.40 | $21.00 | $675.00 |
| 1,000,000 | $4.00 | $210.00 | $6,750.00 |
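These figures come straight from per-token pricing: items × average tokens per item × price per million tokens. A quick sanity check against the GLM-4 Flash column, assuming roughly 400 tokens per item:

```python
def batch_cost(items, tokens_per_item, price_per_million):
    """Estimated spend: total tokens scaled by the per-million-token price."""
    return items * tokens_per_item * price_per_million / 1_000_000

# GLM-4 Flash at $0.01/M tokens, ~400 tokens per item (assumed average)
print(batch_cost(10_000, 400, 0.01))     # about $0.04 for 10K items
print(batch_cost(1_000_000, 400, 0.01))  # about $4.00 for 1M items
```

Measure your real average token count on a 100-item sample before trusting any estimate; prompts and outputs vary widely by task.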

Process millions of items affordably at aipower.me — GLM-4 Flash at $0.01/M tokens makes batch AI practically free. 50 free API calls to test your pipeline.

Ready to try?

50 free API calls. 16 models. One API key.

Create free account