How Replicate Bills
Replicate’s pricing is decoupled from the specific model being run. Whether you’re generating an image with SDXL or running Llama 3, the billing is determined by the hardware tier and the duration of execution. This lets them host thousands of open-source models without needing a separate pricing plan for each one.
| Hardware | Price per Second | Price per Hour |
|---|---|---|
| CPU | $0.000100 | $0.36 |
| NVIDIA T4 GPU | $0.000225 | $0.81 |
| NVIDIA A40 GPU | $0.000575 | $2.07 |
| NVIDIA A40 (Large) GPU | $0.000725 | $2.61 |
| NVIDIA A100 (40GB) GPU | $0.001150 | $4.14 |
| NVIDIA A100 (80GB) GPU | $0.001400 | $5.04 |
- Hardware-Specific Rates: The cost per second varies based on the compute resources required. Each hardware tier has a different price point.
- Pure Usage-Based Model: There are no monthly fees, no overages, and no limits. Users are billed for exact compute time (e.g., “12.4 seconds on an A100”) rather than per-generation.
- Per-Second Granularity: Providers that bill by the hour or minute round short-lived tasks up to the next billing unit, so users pay for time they did not use. Per-second billing avoids that rounding for both small experiments and large production workloads.
Cold starts are also billable. The first request to a model often takes 10-30 seconds to load the model into memory. This loading time is billed at the same rate as execution time.
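As a worked example of the rule above, the sketch below multiplies the billable seconds (cold start plus execution) by the per-second rate of the tier. The helper name `runCost` and the 20-second cold start are illustrative assumptions; the rate is the A100 (80GB) price from the table.

```typescript
// Per-second rate for the A100 (80GB) tier, taken from the pricing table.
const A100_80GB_RATE_PER_SECOND = 0.0014;

// Hypothetical helper: cold-start time is billed at the same rate as execution.
function runCost(
  ratePerSecond: number,
  coldStartSeconds: number,
  executionSeconds: number
): number {
  const billableSeconds = coldStartSeconds + executionSeconds;
  return ratePerSecond * billableSeconds;
}

// A warm request: 12.4 seconds of execution only.
const warmCost = runCost(A100_80GB_RATE_PER_SECOND, 0, 12.4);

// The same request after an assumed 20-second cold start.
const coldCost = runCost(A100_80GB_RATE_PER_SECOND, 20, 12.4);

console.log(`warm: $${warmCost.toFixed(5)}, cold: $${coldCost.toFixed(5)}`);
```

With these numbers the warm request costs roughly $0.017 and the cold one roughly $0.045, so a cold start can account for most of the cost of a short prediction.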
What Makes It Unique
- Hardware-specific metering: The same model costs more on better hardware. Users choose between speed and cost. A T4 GPU works for non-time-sensitive tasks, while an A100 handles real-time applications.
- Per-second granularity: Usage is metered per second, so a short task is not rounded up to a full minute or hour.
- No subscription: Zero commitment to start. Cost grows in proportion to usage, making it a good fit for startups and developers experimenting with different models.
- Model-agnostic: The billing logic stays the same regardless of task type (image generation, text processing, audio transcription, or video synthesis). This lets the platform support a vast model ecosystem without complex pricing tables.
Build This with Dodo Payments
You can replicate this billing model using Dodo Payments’ usage-based billing features. The key is to use multiple meters to track different hardware tiers and attach them to a single product.
Create Usage Meters (One Per Hardware Class)
Create separate meters for each hardware tier. Each hardware type has a different cost per second, so independent metering lets Dodo price each tier differently and provide itemized billing.
| Meter Name | Event Name | Aggregation | Property |
|---|---|---|---|
| CPU Compute | compute.cpu | Sum | execution_seconds |
| GPU T4 Compute | compute.gpu_t4 | Sum | execution_seconds |
| GPU A40 Compute | compute.gpu_a40 | Sum | execution_seconds |
| GPU A40 Large Compute | compute.gpu_a40_large | Sum | execution_seconds |
| GPU A100 40GB Compute | compute.gpu_a100_40 | Sum | execution_seconds |
| GPU A100 80GB Compute | compute.gpu_a100_80 | Sum | execution_seconds |
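The meter table can be mirrored in application code as a lookup from hardware tier to event name. The sketch below also reproduces Sum aggregation locally to show what each meter computes. The tier keys, the `UsageEvent` shape, and `sumByMeter` are hypothetical names for illustration, not part of Dodo's API.

```typescript
// Hardware tiers used by this sketch (hypothetical keys).
type HardwareTier =
  | "cpu"
  | "gpu_t4"
  | "gpu_a40"
  | "gpu_a40_large"
  | "gpu_a100_40"
  | "gpu_a100_80";

// Event names copied from the meter table.
const METER_EVENT_NAME: Record<HardwareTier, string> = {
  cpu: "compute.cpu",
  gpu_t4: "compute.gpu_t4",
  gpu_a40: "compute.gpu_a40",
  gpu_a40_large: "compute.gpu_a40_large",
  gpu_a100_40: "compute.gpu_a100_40",
  gpu_a100_80: "compute.gpu_a100_80",
};

// Minimal event shape for this sketch (not the full event schema).
interface UsageEvent {
  eventName: string;
  executionSeconds: number;
}

// Local illustration of Sum aggregation: total execution seconds per event name.
function sumByMeter(events: UsageEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const event of events) {
    const previous = totals.get(event.eventName) ?? 0;
    totals.set(event.eventName, previous + event.executionSeconds);
  }
  return totals;
}

const periodTotals = sumByMeter([
  { eventName: METER_EVENT_NAME.gpu_t4, executionSeconds: 15 },
  { eventName: METER_EVENT_NAME.gpu_t4, executionSeconds: 5 },
  { eventName: METER_EVENT_NAME.gpu_a100_80, executionSeconds: 8 },
]);

console.log(periodTotals.get("compute.gpu_t4")); // 20
```

Keeping the mapping in one place means the code that emits events cannot drift from the event names configured on the meters.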
Sum aggregation on the execution_seconds property calculates total compute time per hardware tier over the billing period.
Create a Usage-Based Product
Create a new product in the Dodo Payments dashboard:
- Pricing type: Usage Based Billing
- Base Price: $0/month (no subscription fee)
- Billing frequency: Monthly
Set the Free Threshold to 0 for all meters. Every second of execution is billable.
| Meter | Price Per Unit (per second) |
|---|---|
| compute.cpu | $0.000100 |
| compute.gpu_t4 | $0.000225 |
| compute.gpu_a40 | $0.000575 |
| compute.gpu_a40_large | $0.000725 |
| compute.gpu_a100_40 | $0.001150 |
| compute.gpu_a100_80 | $0.001400 |
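To see how these prices turn into a monthly charge, the sketch below multiplies each meter's summed seconds by its per-second price and adds the line items. It only illustrates the arithmetic; `buildInvoice` and the example usage figures are assumptions, and the actual invoice is produced by Dodo.

```typescript
// Per-second prices keyed by meter event name, copied from the table above.
const PRICE_PER_SECOND: Record<string, number> = {
  "compute.cpu": 0.0001,
  "compute.gpu_t4": 0.000225,
  "compute.gpu_a40": 0.000575,
  "compute.gpu_a40_large": 0.000725,
  "compute.gpu_a100_40": 0.00115,
  "compute.gpu_a100_80": 0.0014,
};

interface LineItem {
  meter: string;
  seconds: number;
  amount: number;
}

// Hypothetical helper: with a free threshold of 0, every metered second is charged.
function buildInvoice(totalSeconds: Record<string, number>): {
  lines: LineItem[];
  total: number;
} {
  const lines = Object.entries(totalSeconds).map(([meter, seconds]) => {
    const price = PRICE_PER_SECOND[meter];
    if (price === undefined) {
      throw new Error(`No price configured for meter ${meter}`);
    }
    return { meter, seconds, amount: seconds * price };
  });
  const total = lines.reduce((sum, line) => sum + line.amount, 0);
  return { lines, total };
}

// Assumed month: 10,000 s on T4 and 2,000 s on A100 (80GB).
const invoice = buildInvoice({
  "compute.gpu_t4": 10000,
  "compute.gpu_a100_80": 2000,
});

console.log(invoice.total.toFixed(2)); // 5.05
```

The itemized lines ($2.25 for T4 and $2.80 for A100 in this example) are what separate meters make possible; a single combined meter could only report one total.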
Send Usage Events
Send usage events to Dodo whenever a model execution completes. Include a unique event_id for each prediction to ensure idempotency.
Measure Execution Time Precisely
Wrap your model execution with precise timing using performance.now(). Round to the nearest tenth of a second for billing.
Accelerate with the Time Range Ingestion Blueprint
The Time Range Ingestion Blueprint simplifies per-second compute tracking. Create one ingestion instance per hardware tier and use trackTimeRange for cleaner event submission.
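For reference, the measure-then-send flow that the blueprint wraps can be sketched in plain TypeScript. The event field names other than event_id and execution_seconds (`customer_id`, `timestamp`, `metadata`), the ID prefix, and the helper names are assumptions for illustration; the actual submission call (the ingestion API or trackTimeRange) is left out because its signature is not given here.

```typescript
// Assumed event shape; only event_id and execution_seconds come from the text.
interface UsageEvent {
  event_id: string;
  customer_id: string;
  event_name: string;
  timestamp: string;
  metadata: { execution_seconds: number };
}

// Round to the nearest tenth of a second, as suggested for billing.
function roundToTenth(seconds: number): number {
  return Math.round(seconds * 10) / 10;
}

// Build one event per prediction. Deriving event_id from the prediction ID
// means a retried submission carries the same ID and can be deduplicated.
function buildUsageEvent(
  predictionId: string,
  customerId: string,
  eventName: string,
  elapsedMs: number
): UsageEvent {
  return {
    event_id: `prediction_${predictionId}`,
    customer_id: customerId,
    event_name: eventName,
    timestamp: new Date().toISOString(),
    metadata: { execution_seconds: roundToTenth(elapsedMs / 1000) },
  };
}

// Time any async model call with performance.now() and return the result
// together with the usage event describing it.
async function runAndMeter<T>(
  predictionId: string,
  customerId: string,
  eventName: string,
  run: () => Promise<T>
): Promise<{ result: T; event: UsageEvent }> {
  const start = performance.now();
  const result = await run();
  const elapsedMs = performance.now() - start;
  return {
    result,
    event: buildUsageEvent(predictionId, customerId, eventName, elapsedMs),
  };
}

// Example with a fixed duration of 15,230 ms on the T4 meter.
const exampleEvent = buildUsageEvent("abc123", "cus_demo", "compute.gpu_t4", 15230);
console.log(exampleEvent.metadata.execution_seconds); // 15.2
```

Note that this sketch emits no event when `run` throws; whether failed predictions are billable is a product decision the caller would need to handle.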
Cost Estimation for Users
Since usage-based billing can be unpredictable, provide users with cost estimates before they run a model. This reduces surprise bills and builds trust.
Example Cost Calculations
| Model | Hardware | Avg Time | Cost Per Run |
|---|---|---|---|
| SDXL (image) | A100 80GB | ~8 sec | ~$0.0112 |
| Llama 3 (text) | A100 40GB | ~3 sec | ~$0.0035 |
| Whisper (audio) | GPU T4 | ~15 sec | ~$0.0034 |
Building a Cost Calculator
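A minimal calculator, assuming you keep an average run time per model, could look like the sketch below. The hardware labels, model profiles, and function names are illustrative; the rates and average times are the ones from the tables in this guide.

```typescript
// Per-second rates from the pricing table, keyed by a short hardware label.
const RATE_PER_SECOND = {
  "CPU": 0.0001,
  "GPU T4": 0.000225,
  "A40": 0.000575,
  "A40 Large": 0.000725,
  "A100 40GB": 0.00115,
  "A100 80GB": 0.0014,
};

interface ModelProfile {
  hardware: keyof typeof RATE_PER_SECOND;
  avgSeconds: number; // average observed run time
}

// Profiles taken from the example cost table.
const MODEL_PROFILES: Record<string, ModelProfile> = {
  "SDXL (image)": { hardware: "A100 80GB", avgSeconds: 8 },
  "Llama 3 (text)": { hardware: "A100 40GB", avgSeconds: 3 },
  "Whisper (audio)": { hardware: "GPU T4", avgSeconds: 15 },
};

// Estimated cost in dollars for a number of runs of a model.
function estimateCost(model: string, runs = 1): number {
  const profile = MODEL_PROFILES[model];
  if (!profile) {
    throw new Error(`Unknown model: ${model}`);
  }
  return RATE_PER_SECOND[profile.hardware] * profile.avgSeconds * runs;
}

// Human-readable estimate to show before the user starts a run.
function formatEstimate(model: string, runs = 1): string {
  return `~$${estimateCost(model, runs).toFixed(4)} for ${runs} run(s) of ${model}`;
}

console.log(formatEstimate("SDXL (image)"));
console.log(formatEstimate("Whisper (audio)", 100));
```

Because actual run time varies with input size and cold starts, it is safer to present the result as an estimate rather than a quote.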
Enterprise: Reserved Capacity
For enterprise customers who need guaranteed availability and no cold starts, Replicate offers “Private Instances” at a fixed hourly rate. With Dodo Payments, model this as a subscription product:
- Product Type: Subscription
- Price: Fixed monthly price (e.g., “Reserved A100 Instance - $500/month”)
- Billing Cycle: Monthly
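When choosing a reserved price, it can help to compare it with the usage-based rate. The sketch below computes how many hours of per-second usage would cost the same as the example $500/month plan, assuming the reserved instance corresponds to the A100 (80GB) tier; that pairing and the helper name are assumptions.

```typescript
// Assumed pairing: the example reserved plan vs. the A100 (80GB) usage rate.
const RESERVED_MONTHLY_PRICE = 500;
const USAGE_RATE_PER_SECOND = 0.0014;

// Hours of usage-based compute that cost the same as the fixed monthly price.
function breakEvenHours(monthlyPrice: number, ratePerSecond: number): number {
  const breakEvenSeconds = monthlyPrice / ratePerSecond;
  return breakEvenSeconds / 3600;
}

const hours = breakEvenHours(RESERVED_MONTHLY_PRICE, USAGE_RATE_PER_SECOND);
console.log(hours.toFixed(1)); // 99.2
```

Under these assumptions, a customer running more than about 99 hours of A100 compute per month would pay less on the reserved plan.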
Advanced: Heartbeat Metering
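A minimal sketch of heartbeat metering follows: instead of one event at the end, the job reports the seconds elapsed since the previous heartbeat at a fixed interval, plus a final flush when it stops. The `HeartbeatMeter` class, its `emit` callback, and the injectable clock are hypothetical; in practice `emit` would submit a usage event with its own unique event_id (for example, derived from the prediction ID and the sequence number).

```typescript
// Callback that receives each partial usage record.
type Emit = (executionSeconds: number, sequence: number) => void;

class HeartbeatMeter {
  private emit: Emit;
  private now: () => number;
  private lastMark: number;
  private sequence = 0;
  private timer: ReturnType<typeof setInterval> | undefined;

  // `now` returns milliseconds; it is injectable so the logic can be tested.
  constructor(emit: Emit, now: () => number = () => Date.now()) {
    this.emit = emit;
    this.now = now;
    this.lastMark = now();
  }

  // Emit the seconds elapsed since the previous heartbeat, then reset the mark.
  beat(): void {
    const current = this.now();
    const seconds = (current - this.lastMark) / 1000;
    this.lastMark = current;
    if (seconds > 0) {
      this.emit(seconds, this.sequence);
      this.sequence += 1;
    }
  }

  // Start periodic heartbeats (30 seconds by default).
  start(intervalMs = 30000): void {
    this.timer = setInterval(() => this.beat(), intervalMs);
  }

  // Stop the timer and flush the remaining partial interval.
  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    this.beat();
  }
}

// Demo with a simulated clock: one 30 s heartbeat, then a 15 s final flush.
let fakeNow = 0;
const demoEvents: number[] = [];
const meter = new HeartbeatMeter((seconds) => demoEvents.push(seconds), () => fakeNow);
fakeNow = 30000;
meter.beat();
fakeNow = 45000;
meter.stop();
console.log(demoEvents); // [ 30, 15 ]
```

Because each heartbeat carries only the delta since the last one, Sum aggregation on the meter still yields the correct total, and a crash loses at most one interval of usage.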
For tasks that take several minutes or hours, sending a single event at the end is risky. If the process crashes, you lose the usage data. A better approach is to send usage events every 30-60 seconds during execution.
Key Dodo Features Used
- Usage-Based Billing: Set up products that bill based on consumption.
- Meters: Define the metrics you want to track and bill for.
- Event Ingestion: Send usage data to Dodo in real time.
- Subscriptions: Manage recurring billing for reserved capacity and enterprise plans.
- Time Range Blueprint: Per-second compute tracking with duration helpers.