Retention Cost Model

Generic methodology to forecast storage volume and cost, then evaluate rolling retention options for a large dataset without double-counting growth.

Calibration window: 4 months (post-optimization)
Total MoM growth (illustrative): ~2.2% (avg Δ / baseline)
Target dataset MoM growth (illustrative): ~3.3% (avg Δ / baseline)

What this model answers

Forecast baseline growth

Projects total volumes forward from a short, post-optimization calibration window, so that growth measured before the optimization does not skew the forecast.

Isolate target dataset

Models the retained dataset separately so retention can be applied without double-counting total growth.

Estimate unit cost

Derives an average $/GB from the calibration window and applies it consistently across scenarios.

Compare retention scenarios

Computes adjusted total volumes and costs under rolling retention windows (e.g., 3/6/12 months).

Explain the ‘so what’

Shows why some retention choices produce only small cost deltas: savings can never exceed the target dataset's share of total volume, so a dataset that is a small % of total yields small deltas.
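A quick way to see the size of that ceiling is to compute the target dataset's share of total volume for one month. The sketch below uses the 2025-04 row from Appendix A and assumes the "Target Dataset (GB)" column is the dataset's stored volume for that month; the variable names are illustrative only.

```python
# 2025-04 row from Appendix A (assumed to be stored volume, not monthly ingest).
target_vol = 8_181_687      # GB
total_vol = 885_610_182     # GB
cost = 1_620_000            # USD for the month

# Share of total volume held by the target dataset.
share = target_vol / total_vol

# Even deleting the whole dataset cannot save more than this share of the bill.
max_monthly_saving = cost * share

print(f"target share of total: {share:.2%}")
print(f"upper bound on monthly savings: ${max_monthly_saving:,.0f}")
```

With these inputs the share is under 1% of total volume, so every retention window is comparing fractions of an already small number.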

Guardrails

In production, retention execution should include throttling and safety controls to avoid overwhelming the storage layer and downstream systems.

Backpressure + rate limits per dataset/partition

Concurrency caps for deletes / metadata updates

Dead-letter queue for poison messages

Idempotent execution (safe retries)

Audit logs + metrics for completeness and drift
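As a minimal sketch of how a few of these guardrails fit together, the loop below applies a crude rate limit, skips partitions it has already handled (idempotency), retries a bounded number of times, and parks persistent failures in a dead-letter list. The function name, parameters, and `delete_fn` callback are hypothetical; a real system would likely use a queue, a proper token bucket, and concurrency caps, none of which are shown here.

```python
import time

def run_retention(partitions, delete_fn, max_per_second=5.0, max_attempts=3):
    """Delete expired partitions with a simple throttle and safe retries."""
    done = set()          # partitions already deleted (idempotency record)
    dead_letter = []      # (partition, error) pairs that exhausted retries
    min_interval = 1.0 / max_per_second

    for partition in partitions:
        if partition in done:
            continue      # repeated message: nothing to do
        for attempt in range(1, max_attempts + 1):
            time.sleep(min_interval)   # crude rate limit between delete calls
            try:
                delete_fn(partition)
                done.add(partition)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    dead_letter.append((partition, repr(exc)))
    return done, dead_letter

# Example with a stand-in delete function that always fails for one partition.
def fake_delete(partition):
    if partition == "bad":
        raise RuntimeError("storage layer rejected delete")

done, dead_letter = run_retention(
    ["2024-01", "2024-02", "2024-01", "bad"], fake_delete, max_per_second=100.0
)
```

The returned `done` set and `dead_letter` list are also natural inputs for the audit log and completeness metrics.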

Method

Step 1 — Estimate baseline total growth (no retention)

—Use a short, recent calibration window (post-optimization) to avoid mixing eras.
—Compute average month-over-month (MoM) delta in total volume.
—Convert that average delta to a MoM growth rate relative to the baseline month.

Step 2 — Forecast future total volumes

—Apply the baseline MoM growth rate forward to forecast monthly total volumes.
—This produces the baseline projection (no retention applied).

Step 3 — Model the target dataset separately

—Compute average MoM delta and MoM growth rate for the target dataset (e.g., Chronicle).
—Forecast both (a) target dataset monthly increments and (b) cumulative target dataset total volume.
—This isolates the dataset so retention can be applied cleanly.
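One way to sketch Step 3 is shown below. It rests on assumptions the text does not spell out: the "Target Dataset (GB)" column in Appendix A is treated as the dataset's stored volume, the cumulative volume is compounded forward from the baseline month, and each monthly increment is the difference between consecutive cumulative values. The function name `forecast_target` is illustrative.

```python
# Target dataset volume (GB) per calibration month, from Appendix A.
target_vol = [7_472_975, 6_950_684, 8_101_004, 8_181_687]

# Average MoM delta, then growth rate relative to the baseline (first) month.
deltas = [cur - prev for prev, cur in zip(target_vol, target_vol[1:])]
growth_target = (sum(deltas) / len(deltas)) / target_vol[0]

def forecast_target(start_volume, growth_rate, months):
    """Return (increments, cumulative), one entry per forecast month."""
    increments, cumulative = [], []
    current = start_volume
    for _ in range(months):
        nxt = current * (1 + growth_rate)
        increments.append(nxt - current)   # (a) monthly increment
        cumulative.append(nxt)             # (b) cumulative volume
        current = nxt
    return increments, cumulative

target_inc, target_cum = forecast_target(target_vol[0], growth_target, months=6)
print(f"target MoM growth: {growth_target:.2%}")
```

With the Appendix A inputs this evaluates to roughly 3.2%, close to the illustrative figure quoted at the top of the page. Note that one calibration month has a negative delta, which the average absorbs.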

Step 4 — Estimate unit cost ($/GB) and baseline cost

—Derive a per-GB unit cost from the same calibration window.
—Use the average unit cost and multiply by forecasted total volumes to estimate baseline monthly cost.

Step 5 — Apply rolling retention to the target dataset

—For a retention window (e.g., 3/6/12 months), sum only the last N months of target dataset incremental volumes.
—This yields target dataset ‘with retention’ volume per month.

Step 6 — Compute adjusted totals and costs

—Adjusted Total Volume = Total (baseline) − Target Total (baseline) + Target (with retention).
—Adjusted Cost = Adjusted Total Volume × Avg Unit Cost.
—Compare baseline vs scenarios to quantify savings and explain sensitivity.

Formulas (copy/paste)

Avg MoM delta (total volume)

Inputs: TotalVol[m] for each month m in calibration window

ΔTotal[m] = TotalVol[m] - TotalVol[m-1]
AvgΔTotal = average( ΔTotal[m] ) over calibration months excluding baseline month

MoM growth rate (total volume)

Baseline month: b (e.g., first month in window)

MoMGrowthTotal = AvgΔTotal / TotalVol[b]
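The two formulas above can be checked against the Appendix A inputs with a few lines of Python. This is a sketch, and the helper names are illustrative rather than part of the model.

```python
# Total volume (GB) per calibration month, copied from Appendix A.
total_vol = [831_640_814.1, 847_841_295, 867_477_568, 885_610_182]

def avg_mom_delta(series):
    """Average month-over-month delta; the baseline month has no delta."""
    deltas = [cur - prev for prev, cur in zip(series, series[1:])]
    return sum(deltas) / len(deltas)

def mom_growth_rate(series, baseline_index=0):
    """Average delta divided by the baseline month's volume."""
    return avg_mom_delta(series) / series[baseline_index]

growth_total = mom_growth_rate(total_vol)
print(f"{growth_total:.2%}")  # prints 2.16%
```

The result agrees with the ~2.2% total MoM growth quoted for the calibration window.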

Forecast total volume (baseline, no retention)

For forecast month t (t = b+1, b+2, ...), seeded with the observed value TotalVolForecast[b] = TotalVol[b]:

TotalVolForecast[t] = TotalVolForecast[t-1] * (1 + MoMGrowthTotal)
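The recursion is a plain compounding loop. The sketch below starts from the baseline month in Appendix A and uses the rounded 2.2% rate; both the function name and the choice of a six-month horizon are assumptions for illustration.

```python
def forecast_total(seed_volume, growth_rate, months):
    """Compound seed_volume forward; returns one value per forecast month."""
    forecast = []
    current = seed_volume
    for _ in range(months):
        current *= 1 + growth_rate
        forecast.append(current)
    return forecast

# Seed: observed total volume (GB) for the baseline month 2025-01.
projection = forecast_total(seed_volume=831_640_814.1, growth_rate=0.022, months=6)
for offset, volume in enumerate(projection, start=1):
    print(f"b+{offset}: {volume:,.0f} GB")
```

Because the first three forecast months overlap the observed months in the calibration window, comparing them gives a quick sanity check on the fitted rate.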

Avg unit cost ($/GB)

UnitCost[m] = Cost[m] / TotalVol[m]
AvgUnitCost = average( UnitCost[m] ) over calibration months

Baseline monthly cost (no retention)

CostBaseline[t] = TotalVolForecast[t] * AvgUnitCost
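As a sketch of the unit-cost and baseline-cost formulas, the snippet below averages the per-month $/GB ratios from Appendix A and applies the result to a forecast series. The forecast values passed in at the end are placeholders, not model output.

```python
# Calibration inputs from Appendix A.
total_vol = [831_640_814.1, 847_841_295, 867_477_568, 885_610_182]  # GB
cost = [1_514_802, 1_560_000, 1_590_000, 1_620_000]                 # USD

# Per-month unit cost, then a simple (unweighted) average across months.
unit_costs = [c / v for c, v in zip(cost, total_vol)]
avg_unit_cost = sum(unit_costs) / len(unit_costs)

def baseline_cost(total_forecast, unit_cost):
    """Monthly baseline cost: forecast volume times the average unit cost."""
    return [volume * unit_cost for volume in total_forecast]

print(f"avg unit cost: ${avg_unit_cost:.5f}/GB")
example_costs = baseline_cost([900_000_000.0, 920_000_000.0], avg_unit_cost)
```

With these inputs the average lands near $0.0018 per GB. Averaging the monthly ratios weights each month equally; dividing total cost by total volume would weight larger months more.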

Target dataset rolling retention volume

Retention window: N months
TargetInc[t] = forecasted monthly incremental volume for target dataset

TargetWithRetention[t] = sum( TargetInc[t - k] ) for k = 0..(N-1)
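The rolling sum can be sketched as follows. The increments are made-up round numbers, and the handling of the first N-1 months (summing whatever history exists) is an assumption, since the formula does not say what to do before a full window is available.

```python
def rolling_retention(increments, window):
    """For each month t, sum the last `window` increments ending at t."""
    retained = []
    for t in range(len(increments)):
        start = max(0, t - window + 1)   # truncate at the start of the series
        retained.append(sum(increments[start:t + 1]))
    return retained

# Hypothetical monthly increments (GB) for the target dataset.
target_inc = [100.0, 110.0, 120.0, 130.0, 140.0]
print(rolling_retention(target_inc, window=3))
# [100.0, 210.0, 330.0, 360.0, 390.0]
```

From the third month onward the retained volume only grows as fast as the increments themselves, while the unretained cumulative total keeps accumulating every month.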

Adjusted total volume (retention scenario)

TargetTotalBaseline[t] = forecasted cumulative target dataset volume (no retention)

AdjustedTotal[t] = TotalVolForecast[t] - TargetTotalBaseline[t] + TargetWithRetention[t]

Adjusted cost (retention scenario)

CostScenario[t] = AdjustedTotal[t] * AvgUnitCost
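Putting the last two formulas together, a scenario comparison can be sketched like this. All input numbers are invented for readability, and the function name and tuple layout are illustrative choices.

```python
def scenario_costs(total_forecast, target_cumulative, target_retained, unit_cost):
    """Return (adjusted_total, scenario_cost, saving_vs_baseline) per month."""
    rows = []
    for total, cumulative, retained in zip(
        total_forecast, target_cumulative, target_retained
    ):
        # Swap the unretained target volume for the retained one.
        adjusted = total - cumulative + retained
        scenario_cost = adjusted * unit_cost
        saving = (total - adjusted) * unit_cost
        rows.append((adjusted, scenario_cost, saving))
    return rows

# Hypothetical two-month example with a unit cost of 2.0 per GB.
rows = scenario_costs(
    total_forecast=[1000.0, 1100.0],
    target_cumulative=[100.0, 210.0],
    target_retained=[100.0, 110.0],
    unit_cost=2.0,
)
for adjusted, scenario_cost, saving in rows:
    print(adjusted, scenario_cost, saving)
```

Subtracting the cumulative target volume before adding the retained volume is what prevents the target dataset's growth from being counted twice in the adjusted total.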

Appendix A — Calibration inputs (edit these)

Month	Total Volume (GB)	Target Dataset (GB)	Cost (USD)
2025-01	831,640,814.1	7,472,975	1,514,802
2025-02	847,841,295	6,950,684	1,560,000
2025-03	867,477,568	8,101,004	1,590,000
2025-04	885,610,182	8,181,687	1,620,000

Tip: keep this appendix generic by labeling the target dataset as “Target Dataset” if you don’t want to expose internal dataset names.
