Concepts
Point-in-Time
The core principle of SafeFeat is point-in-time correctness: when computing features for a prediction at cutoff time $t$, only use information available before time $t$.
Example: If you predict churn on 2024-02-15, don't use events from 2024-02-16.
Spine
The spine defines your prediction scenarios:
entity_id | cutoff_time
----------|------------
user_1 | 2024-01-31
user_1 | 2024-02-15
user_2 | 2024-02-15
Each row is an "as-of" moment: compute features for that entity looking back from that date.
Aggregations
WindowAgg
Count or summarize events in a rolling time window before the cutoff.
WindowAgg(
table="events",
windows=["7D", "30D", "3M", "1Y", None],
metrics={"*": ["count"], "amount": ["sum", "mean"]}
)
For cutoff 2024-02-15:
- 7D window: events after
2024-02-08up to2024-02-15 - 30D window: events after
2024-01-16up to2024-02-15 - 3M window: events after
2023-11-15up to2024-02-15 - 1Y window: events after
2023-02-15up to2024-02-15 - None window: all events up to
2024-02-15(no lookback limit)
Supported Aggregations
"count"— number of events (on wildcard"*"only)"sum"— total of a numeric column"mean"— average of a numeric column"nunique"— count distinct values
Window Units
| Unit | Example | Meaning |
|---|---|---|
D |
"30D" |
Exact days |
H |
"24H" |
Exact hours |
min |
"90min" |
Exact minutes |
s |
"30s" |
Exact seconds |
M |
"3M" |
Calendar months (respects month length) |
Y |
"1Y" |
Calendar years (respects leap years) |
None |
None |
All history before cutoff |
M and Y use calendar-aware arithmetic, meaning they respect actual
month and year lengths rather than assuming fixed days.
The window boundary is computed as:
window_start = cutoff - duration
Events are included if window_start < event_time <= cutoff
(window start is exclusive, cutoff is inclusive).
"1M"is not the same as"30D". For cutoff2024-02-03: -"1M"→ window_start =2024-01-03(31 days back) -"30D"→ window_start =2024-01-04(exactly 30 days back)An event on
2024-01-04is counted in"1M"but not in"30D".
RecencyBlock
Time since the last event:
RecencyBlock(
table="events",
filter_col="event_type",
filter_value="purchase"
)
Returns the number of days between the cutoff and the most recent matching event.
Useful for: - Churn modeling (days since purchase) - Fraud detection (days since login) - Activity recency
Data Quality & Leakage
Future Events
Events that occur after the cutoff are automatically excluded. This prevents leakage.
allowed_lag = "0s" # Default: strict point-in-time
# or
allowed_lag = "24h" # Allow events up to 24h after cutoff
Use allowed_lag if your event data has slight delays in ingestion.
Audit Report
The report tracks data quality by table:
features, report = build_features(..., return_report=True)
audit = report.tables["events"]
Available metrics:
- total_joined_pairs — events matched to entities/cutoffs
- kept_pairs — events within cutoff
- dropped_future_pairs — events after cutoff (leakage risk)
- max_future_delta — largest lateness of dropped events
High dropped_future_pairs may indicate:
- Ingestion delays
- Timezone issues
- Data quality problems
Entity-Cutoff Pairs
Each row in the spine creates an entity-cutoff pair. Events are grouped by: - entity_id — which user/customer/entity - cutoff_time — the prediction/cut-off moment
When computing features, we:
- Join events to entity-cutoff pairs by entity
- Filter to events up to the cutoff time
- Aggregate by entity-cutoff pair
Example:
Spine row: user_1, 2024-02-15
All events for user_1:
2024-01-10: 10 ← excluded (before 30D window start 2024-01-16)
2024-01-20: 20 ← included
2024-02-01: 15 ← included
2024-02-20: 5 ← excluded (after cutoff)
30D window (events after 2024-01-16 up to 2024-02-15):
Count: 2
Sum: 35
Output Column Names
Features are named automatically based on the table, metric, aggregation, and window:
| Pattern | Example |
|---|---|
{table}__n_events__{window} |
events__n_events__30d |
{table}__{col}__{agg}__{window} |
events__amount__sum__3m |
{table}__n_events__all |
events__n_events__all |
{table}__recency |
events__recency |
{table}__recency__{col}_{val} |
events__recency__event_type_purchase |
Window suffixes are lowercased: "30D" → 30d, "3M" → 3m, "1Y" → 1y, None → all.