Model

Retention Cost Model

Generic methodology to forecast storage volume and cost, then evaluate rolling retention options for a large dataset without double-counting growth.

Calibration window
4 months (post-optimization)
Total MoM growth (illustrative)
~2.2% (avg Δ / baseline)
Target dataset MoM growth (illustrative)
~3.3% (avg Δ / baseline)

What this model answers

Forecast baseline growth
Projects total volumes forward using a short, post-optimization calibration window to avoid mixing eras.
Isolate target dataset
Models the retained dataset separately so retention can be applied without double-counting total growth.
Estimate unit cost
Derives an average $/GB from the calibration window and applies it consistently across scenarios.
Compare retention scenarios
Computes adjusted total volumes and costs under rolling retention windows (e.g., 3/6/12 months).
Explain the ‘so what’
Makes it clear why some retention choices produce small deltas when the target dataset becomes a small % of total.

Guardrails

In production, retention execution should include throttling and safety controls to avoid overwhelming the storage layer and downstream systems.

Backpressure + rate limits per dataset/partition
Concurrency caps for deletes / metadata updates
Dead-letter queue for poison messages
Idempotent execution (safe retries)
Audit logs + metrics for completeness and drift

Method

Step 1 — Estimate baseline total growth (no retention)

  • Use a short, recent calibration window (post-optimization) to avoid mixing eras.
  • Compute average month-over-month (MoM) delta in total volume.
  • Convert that average delta to a MoM growth rate relative to the baseline month.

Step 2 — Forecast future total volumes

  • Apply the baseline MoM growth rate forward to forecast monthly total volumes.
  • This produces the baseline projection (no retention applied).

Step 3 — Model the target dataset separately

  • Compute average MoM delta and MoM growth rate for the target dataset (e.g., Chronicle).
  • Forecast both (a) target dataset monthly increments and (b) cumulative target dataset total volume.
  • This isolates the dataset so retention can be applied cleanly.

Step 4 — Estimate unit cost ($/GB) and baseline cost

  • Derive a per-GB unit cost from the same calibration window.
  • Use the average unit cost and multiply by forecasted total volumes to estimate baseline monthly cost.

Step 5 — Apply rolling retention to the target dataset

  • For a retention window (e.g., 3/6/12 months), sum only the last N months of target dataset incremental volumes.
  • This yields target dataset ‘with retention’ volume per month.

Step 6 — Compute adjusted totals and costs

  • Adjusted Total Volume = Total (baseline) − Target Total (baseline) + Target (with retention).
  • Adjusted Cost = Adjusted Total Volume × Avg Unit Cost.
  • Compare baseline vs scenarios to quantify savings and explain sensitivity.
Formulas (copy/paste)

Avg MoM delta (total volume)

Inputs: TotalVol[m] for each month m in calibration window

ΔTotal[m] = TotalVol[m] - TotalVol[m-1]
AvgΔTotal = average( ΔTotal[m] ) over calibration months excluding baseline month

MoM growth rate (total volume)

Baseline month: b (e.g., first month in window)

MoMGrowthTotal = AvgΔTotal / TotalVol[b]

Forecast total volume (baseline, no retention)

For forecast month t (t = b+1, b+2, ...):

TotalVolForecast[t] = TotalVolForecast[t-1] * (1 + MoMGrowthTotal)

Avg unit cost ($/GB)

UnitCost[m] = Cost[m] / TotalVol[m]
AvgUnitCost = average( UnitCost[m] ) over calibration months

Baseline monthly cost (no retention)

CostBaseline[t] = TotalVolForecast[t] * AvgUnitCost

Target dataset rolling retention volume

Retention window: N months
TargetInc[t] = forecasted monthly incremental volume for target dataset

TargetWithRetention[t] = sum( TargetInc[t - k] ) for k = 0..(N-1)

Adjusted total volume (retention scenario)

TargetTotalBaseline[t] = forecasted cumulative target dataset volume (no retention)

AdjustedTotal[t] = TotalVolForecast[t] - TargetTotalBaseline[t] + TargetWithRetention[t]

Adjusted cost (retention scenario)

CostScenario[t] = AdjustedTotal[t] * AvgUnitCost
Appendix A — Calibration inputs (edit these)
MonthTotal Volume (GB)Target Dataset (GB)Cost (USD)
2025-01831,640,814.17,472,9751,514,802
2025-02847,841,2956,950,6841,560,000
2025-03867,477,5688,101,0041,590,000
2025-04885,610,1828,181,6871,620,000

Tip: keep this appendix generic by labeling the target dataset as “Target Dataset” if you don’t want to expose internal dataset names.