Pre-deployment evaluation for models that run continuously
Source: Dev.to
Discussion
When working with models that run continuously, I’ve found it hard to reason about how performance degrades over time from a static train/test evaluation alone. For those of you who deploy long-lived models: how do you build intuition, before deployment, about how a model will behave under distributional change, if you do at all? What tools or practices do you rely on?
