Retention Cost Model
Generic methodology to forecast storage volume and cost, then evaluate rolling retention options for a large dataset without double-counting growth.
Calibration window
4 months (post-optimization)
Total MoM growth (illustrative)
~2.2% (avg Δ / baseline)
Target dataset MoM growth (illustrative)
~3.2% (avg Δ / baseline)
What this model answers
Forecast baseline growth
Projects total volumes forward from a short, post-optimization calibration window so that pre-optimization months do not distort the trend.
Isolate target dataset
Models the retained dataset separately so retention can be applied without double-counting total growth.
Estimate unit cost
Derives an average $/GB from the calibration window and applies it consistently across scenarios.
Compare retention scenarios
Computes adjusted total volumes and costs under rolling retention windows (e.g., 3/6/12 months).
Explain the ‘so what’
Shows why some retention choices produce only small cost deltas once the target dataset is a small share of total volume.
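A quick way to see this is to bound the possible saving by the target dataset's share of total volume. The sketch below uses the 2025-04 row of Appendix A and assumes the Target Dataset column is that dataset's stored volume for the month; `max_saving_share` is a hypothetical helper name, not part of the model.

```python
# Figures from the 2025-04 row of Appendix A (illustrative inputs).
total_gb = 885_610_182
target_gb = 8_181_687  # assumption: stored volume of the target dataset that month


def max_saving_share(total: float, target: float) -> float:
    """Upper bound on the fraction of total volume that retention
    on the target dataset alone could remove."""
    return target / total


share = max_saving_share(total_gb, target_gb)
print(f"Target share of total: {share:.2%}")
```

With these inputs the share is under 1%, so at a flat $/GB even deleting the entire target dataset could not lower total cost by more than that fraction.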
Guardrails
In production, retention execution should include throttling and safety controls to avoid overwhelming the storage layer and downstream systems.
Backpressure + rate limits per dataset/partition
Concurrency caps for deletes / metadata updates
Dead-letter queue for poison messages
Idempotent execution (safe retries)
Audit logs + metrics for completeness and drift
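As a rough illustration of three of these controls (concurrency cap, idempotent retries, dead-letter handling), here is a minimal sketch. All names (`run_retention`, `fake_delete`, the `completed` set) are hypothetical, the delete call is a stand-in, and rate limiting, audit logging, and metrics are left out.

```python
import asyncio


async def run_retention(partitions, delete_fn, completed=None,
                        max_concurrency=4, max_attempts=3, backoff_seconds=0.01):
    """Delete partitions with a concurrency cap, bounded retries, and a dead-letter list.

    `completed` is a set of partitions already deleted in earlier runs
    (hypothetical persisted state); it makes re-running the job safe.
    """
    completed = set() if completed is None else completed
    semaphore = asyncio.Semaphore(max_concurrency)  # concurrency cap
    dead_letter = []                                # partitions that kept failing

    async def worker(partition):
        if partition in completed:                  # idempotency: skip finished work
            return
        async with semaphore:
            for attempt in range(1, max_attempts + 1):
                try:
                    await delete_fn(partition)
                    completed.add(partition)
                    return
                except Exception:
                    await asyncio.sleep(backoff_seconds * attempt)  # linear backoff
            dead_letter.append(partition)           # give up and park for inspection

    # dict.fromkeys de-duplicates while keeping order, so no partition runs twice.
    await asyncio.gather(*(worker(p) for p in dict.fromkeys(partitions)))
    return completed, dead_letter


async def fake_delete(partition):
    """Stand-in for a real delete call; one partition always fails."""
    if partition == "p-bad":
        raise RuntimeError("poison partition")


done, dead = asyncio.run(run_retention(["p1", "p2", "p2", "p-bad"], fake_delete))
print(sorted(done), dead)
```

A real job would persist `completed` and the dead-letter entries outside the process so that a restart neither repeats deletes nor loses failures.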
Method
Step 1 — Estimate baseline total growth (no retention)
- Use a short, recent calibration window (post-optimization) so that pre-optimization months do not distort the trend.
- Compute the average month-over-month (MoM) delta in total volume.
- Convert that average delta to a MoM growth rate relative to the baseline month.
Step 2 — Forecast future total volumes
- Apply the baseline MoM growth rate forward to forecast monthly total volumes.
- This produces the baseline projection (no retention applied).
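Steps 1 and 2 can be sketched as follows, using the Total Volume column from Appendix A. The function names are hypothetical, and seeding the projection from the last observed month (rather than the baseline month) is an assumption of this sketch.

```python
# Total Volume (GB) for 2025-01 .. 2025-04, from Appendix A.
total_vol = [831_640_814.1, 847_841_295, 867_477_568, 885_610_182]


def mom_growth_rate(volumes):
    """Average MoM delta divided by the baseline (first) month."""
    deltas = [later - earlier for earlier, later in zip(volumes, volumes[1:])]
    avg_delta = sum(deltas) / len(deltas)
    return avg_delta / volumes[0]


def forecast(seed_volume, rate, months):
    """Compound `rate` forward from `seed_volume` for `months` months."""
    projection = []
    current = seed_volume
    for _ in range(months):
        current *= 1 + rate
        projection.append(current)
    return projection


rate = mom_growth_rate(total_vol)
projection = forecast(total_vol[-1], rate, months=12)  # assumption: seed = last actual
print(f"MoM growth: {rate:.2%}")
print(f"First forecast month: {projection[0]:,.0f} GB")
```

Dividing by the baseline month instead of each prior month keeps the rate tied to a single reference point, which matches the "avg Δ / baseline" definition used in this model.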
Step 3 — Model the target dataset separately
- Compute the average MoM delta and MoM growth rate for the target dataset (e.g., Chronicle).
- Forecast both (a) target dataset monthly increments and (b) cumulative target dataset total volume.
- This isolates the dataset so retention can be applied cleanly.
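A minimal sketch of Step 3, assuming the Target Dataset column in Appendix A is the dataset's stored volume per month, so that monthly increments are the differences between consecutive cumulative values. `forecast_target` is a hypothetical name.

```python
# Target Dataset (GB) for 2025-01 .. 2025-04, from Appendix A.
target_vol = [7_472_975, 6_950_684, 8_101_004, 8_181_687]


def forecast_target(volumes, months):
    """Return (growth rate, monthly increments, cumulative volumes) for the forecast."""
    deltas = [later - earlier for earlier, later in zip(volumes, volumes[1:])]
    rate = (sum(deltas) / len(deltas)) / volumes[0]  # avg delta / baseline month
    increments, cumulative = [], []
    previous = volumes[-1]  # assumption: seed from the last observed month
    for _ in range(months):
        current = previous * (1 + rate)
        increments.append(current - previous)  # (a) monthly increment
        cumulative.append(current)             # (b) cumulative total, no retention
        previous = current
    return rate, increments, cumulative


target_rate, increments, cumulative = forecast_target(target_vol, months=6)
print(f"Target MoM growth: {target_rate:.2%}")
```

Keeping increments and cumulative totals as separate series is what lets a later step swap the cumulative value for a rolling sum without touching the total forecast.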
Step 4 — Estimate unit cost ($/GB) and baseline cost
- Derive a per-GB unit cost from the same calibration window.
- Multiply the average unit cost by the forecasted total volumes to estimate baseline monthly cost.
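The unit-cost step can be sketched with the Appendix A inputs. The forecast volume of 900,000,000 GB below is a made-up placeholder, not a model output.

```python
# Calibration inputs for 2025-01 .. 2025-04, from Appendix A.
total_vol = [831_640_814.1, 847_841_295, 867_477_568, 885_610_182]  # GB
cost_usd = [1_514_802, 1_560_000, 1_590_000, 1_620_000]             # USD

# Per-month $/GB, then a simple (unweighted) average across the window.
unit_costs = [cost / volume for cost, volume in zip(cost_usd, total_vol)]
avg_unit_cost = sum(unit_costs) / len(unit_costs)

# Hypothetical forecast volume, used only to show the multiplication.
forecast_volume_gb = 900_000_000
baseline_cost = forecast_volume_gb * avg_unit_cost

print(f"Avg unit cost: ${avg_unit_cost:.6f}/GB")
print(f"Baseline cost at {forecast_volume_gb:,} GB: ${baseline_cost:,.0f}")
```

An unweighted average of monthly ratios treats each month equally; dividing total cost by total volume would weight larger months more, so the two can differ slightly.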
Step 5 — Apply rolling retention to the target dataset
- For a retention window (e.g., 3/6/12 months), sum only the last N months of target dataset incremental volumes.
- This yields the target dataset ‘with retention’ volume per month.
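The rolling sum can be sketched as below. The increments are made-up round numbers, and `rolling_retention` is a hypothetical name. Early months have fewer than N increments available, so this sketch sums whatever exists; in practice historical increments would be prepended.

```python
def rolling_retention(increments, window):
    """For each month t, sum the last `window` increments up to and including t."""
    retained = []
    for t in range(len(increments)):
        start = max(0, t - window + 1)  # clamp when fewer than `window` months exist
        retained.append(sum(increments[start:t + 1]))
    return retained


increments = [100, 110, 120, 130, 140]  # hypothetical monthly increments (GB)
print(rolling_retention(increments, window=3))  # [100, 210, 330, 360, 390]
```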
Step 6 — Compute adjusted totals and costs
- Adjusted Total Volume = Total (baseline) − Target Total (baseline) + Target (with retention).
- Adjusted Cost = Adjusted Total Volume × Avg Unit Cost.
- Compare baseline vs scenarios to quantify savings and explain sensitivity.
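Step 6 reduces to one subtraction and one addition per month. The sketch below uses small made-up numbers so the arithmetic is easy to check by hand; `adjusted_scenario` and the dictionary keys are hypothetical.

```python
def adjusted_scenario(total_forecast, target_total, target_retained, avg_unit_cost):
    """Swap the target dataset's cumulative volume for its retained volume,
    then price baseline and scenario with the same unit cost."""
    rows = []
    for total, target, kept in zip(total_forecast, target_total, target_retained):
        adjusted = total - target + kept
        rows.append({
            "adjusted_gb": adjusted,
            "baseline_cost": total * avg_unit_cost,
            "scenario_cost": adjusted * avg_unit_cost,
            "savings": (total - adjusted) * avg_unit_cost,
        })
    return rows


# Hypothetical single month: 1000 GB total, 100 GB of it target data, 30 GB kept.
rows = adjusted_scenario([1000.0], [100.0], [30.0], avg_unit_cost=0.002)
print(rows[0])
```

Because the target's baseline volume is removed before the retained volume is added back, target growth is counted once, which is the double-counting the model is designed to avoid.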
Formulas (copy/paste)
Avg MoM delta (total volume)
Inputs: TotalVol[m] for each month m in calibration window
ΔTotal[m] = TotalVol[m] - TotalVol[m-1]
AvgΔTotal = average( ΔTotal[m] ) over calibration months excluding baseline month
MoM growth rate (total volume)
Baseline month: b (e.g., first month in window)
MoMGrowthTotal = AvgΔTotal / TotalVol[b]
Forecast total volume (baseline, no retention)
For forecast month t (t = b+1, b+2, ...):
TotalVolForecast[t] = TotalVolForecast[t-1] * (1 + MoMGrowthTotal)
Avg unit cost ($/GB)
UnitCost[m] = Cost[m] / TotalVol[m]
AvgUnitCost = average( UnitCost[m] ) over calibration months
Baseline monthly cost (no retention)
CostBaseline[t] = TotalVolForecast[t] * AvgUnitCost
Target dataset rolling retention volume
Retention window: N months
TargetInc[t] = forecasted monthly incremental volume for target dataset
TargetWithRetention[t] = sum( TargetInc[t - k] ) for k = 0..(N-1)
Adjusted total volume (retention scenario)
TargetTotalBaseline[t] = forecasted cumulative target dataset volume (no retention)
AdjustedTotal[t] = TotalVolForecast[t] - TargetTotalBaseline[t] + TargetWithRetention[t]
Adjusted cost (retention scenario)
CostScenario[t] = AdjustedTotal[t] * AvgUnitCost
Appendix A — Calibration inputs (edit these)
| Month | Total Volume (GB) | Target Dataset (GB) | Cost (USD) |
|---|---|---|---|
| 2025-01 | 831,640,814.1 | 7,472,975 | 1,514,802 |
| 2025-02 | 847,841,295 | 6,950,684 | 1,560,000 |
| 2025-03 | 867,477,568 | 8,101,004 | 1,590,000 |
| 2025-04 | 885,610,182 | 8,181,687 | 1,620,000 |
Tip: keep this appendix generic by labeling the target dataset as “Target Dataset” if you don’t want to expose internal dataset names.
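If the appendix table is kept as the single place to edit inputs, it can be read straight into the calculations. The parser below is a minimal sketch that assumes the four-column layout above with comma thousands separators; `parse_calibration` and the dictionary keys are hypothetical names.

```python
TABLE = """\
| Month | Total Volume (GB) | Target Dataset (GB) | Cost (USD) |
|---|---|---|---|
| 2025-01 | 831,640,814.1 | 7,472,975 | 1,514,802 |
| 2025-02 | 847,841,295 | 6,950,684 | 1,560,000 |
| 2025-03 | 867,477,568 | 8,101,004 | 1,590,000 |
| 2025-04 | 885,610,182 | 8,181,687 | 1,620,000 |
"""


def parse_calibration(table):
    """Turn the calibration table into a list of per-month dictionaries."""
    rows = []
    for line in table.strip().splitlines()[2:]:  # skip header and separator rows
        month, total, target, cost = [
            cell.strip() for cell in line.strip().strip("|").split("|")
        ]
        rows.append({
            "month": month,
            "total_gb": float(total.replace(",", "")),
            "target_gb": float(target.replace(",", "")),
            "cost_usd": float(cost.replace(",", "")),
        })
    return rows


calibration = parse_calibration(TABLE)
print(calibration[0])
```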