RecSys Metrics Laddering

To optimize for CLTV, stop trying to iterate on CLTV.

An old colleague of mine, let's call him David, has just left a meeting with his CEO. She's given him a mandate to start optimizing his team's recommender system (RecSys) for Customer Lifetime Value (CLTV). He's baffled—models can only predict transaction-level outcomes and can't mathematically "optimize" for CLTV. She's frustrated—why won't he just follow her simple instructions and start building?

I want to contrast this with an example from a widely respected and successful team at Stitch Fix. This team built a probability-of-purchase model that was foundational to our recommender system, and it made big contributions to CLTV despite never using CLTV as a direct operating metric. The team did that by separating its objectives into different tiers:

  • Strategic goals. What is the company trying to achieve? Customer growth? Retention? CLTV? The team took its cue from senior leadership on strategic goals.

  • Product metrics. How is product success measured? Conversion? Revenue? The team had a strong influence in choosing these, but was always able to justify how investing in product metrics moved the strategic objectives.

  • Model backtesting metrics. How do we know if a model is good enough to A/B test? Log likelihood? AUC? Something else? The team developed several custom metrics for offline model evaluation, such as cAUC (within-client AUC, sketched below), that correlated with our product goals better than off-the-shelf metrics.

This was effective because strategic goals provided a north star, while the lower-level goals enabled rapid iteration.
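
Stitch Fix's exact implementation isn't public, but a within-client AUC might look something like this minimal sketch (the function name and structure are my assumptions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def within_client_auc(y_true, y_score, client_ids):
    """Mean AUC computed separately within each client's candidate items.

    A global AUC rewards separating eager clients from reluctant ones;
    a per-client AUC rewards ranking items well for each individual
    client, which is what the recommender actually has to do.
    """
    y_true, y_score, client_ids = map(np.asarray, (y_true, y_score, client_ids))
    aucs = []
    for client in np.unique(client_ids):
        mask = client_ids == client
        # AUC is only defined when a client has both positives and negatives.
        if len(np.unique(y_true[mask])) == 2:
            aucs.append(roc_auc_score(y_true[mask], y_score[mask]))
    return float(np.mean(aucs))
```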

Three Types of Goals

Measuring success is hard. Ultimately, executives are accountable for the success of the company broadly, which we could think of as enterprise value. To steer the company toward better outcomes, leaders turn to strategic goals like improved CLTV to rally teams to row in the same direction.

CLTV is a great example of a metric that makes an excellent strategic goal but is a poor goal to hand directly to a product team. And it's impossible to directly optimize CLTV with a RecSys model. Great goals form a ladder:

  • Strategic goals set the direction of the company.

  • Product metrics approximate the strategic goals while respecting the constraints of online testing.

  • Model backtesting metrics correlate with A/B testing metrics while respecting the constraints of ML model building and evaluation.

Strategic goals

The organization’s hopes and dreams. Strategic goals provide all teams a vision to strive for. 

Examples include practical objectives, like improved CLTV, or abstract ideas, like being recognized as the industry leader in personalization. For any given goal, there should be some way to distinguish success from failure, but the goal need not be measurable with an A/B test.

Strategic goals should be owned by senior leadership and clearly communicated to all teams at the company.

Product metrics

Product metrics are tactical. If the company launches a new recommender system, how do we know if it worked? Product metrics reflect strategic goals while providing effective tactical feedback to the product team.

CLTV is great at reflecting strategic goals, but horrible at providing effective tactical feedback. Lifetime value takes years to measure, but great teams need to get feedback and make decisions fast.

At Stitch Fix, we tended to look at shorter-term metrics like items sold as the primary measure, with revenue, margin, customer satisfaction, and retention as secondary measures. We placed the highest value on retention, but since churn is a relatively rare event, it was difficult to run experiments with enough power to detect changes in retention. Items sold, by contrast, let us run an experiment in a week or two and make a launch decision shortly thereafter.
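
A quick power calculation shows why. The numbers below are hypothetical, but the shape of the problem is real: a rare event like churn needs far more customers per arm than a common event like purchase.

```python
from scipy.stats import norm

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Sample size per arm for a two-sided, two-proportion z-test."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p_control + p_treatment) / 2
    se_null = (2 * p_bar * (1 - p_bar)) ** 0.5
    se_alt = (p_control * (1 - p_control) + p_treatment * (1 - p_treatment)) ** 0.5
    return (z_a * se_null + z_b * se_alt) ** 2 / (p_control - p_treatment) ** 2

# Hypothetical churn metric: 2% base rate, detecting a 10% relative drop.
print(round(n_per_arm(0.02, 0.018)))   # ~73,000 customers per arm
# Hypothetical conversion metric: 30% base rate, detecting a 5% relative lift.
print(round(n_per_arm(0.30, 0.315)))   # ~15,000 customers per arm
```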

Product metrics should be owned by the product team or ML team directly, with sign-off from the executive sponsor.

Model Backtesting Metrics

Unless you want to send your team into a tailspin, a recommender system's offline metrics must be technical details, not political calling cards. Evaluating backtesting metrics requires the ML expertise of practitioners, and the metrics themselves are unlikely to be intuitive to outsiders.

Most of the ML models used in RecSys are supervised: they take some objective (like clicks or purchases) and train to maximize that outcome. Although these objectives can look a lot like A/B test outcomes, it's better to think of the choice of objective function as a technical detail. For example, a model trained on clicks may be better at increasing purchases than a model trained on purchases, simply because there is often far more click data than purchase data. The team may opt for a hybrid model that uses both, although this increases complexity. The team will also choose offline evaluation metrics, such as NDCG, with an eye toward approximating the product goals.
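
As a refresher, NDCG compares the model's ranking of a set of items against the best possible ranking of those same items. A minimal sketch (the relevance labels and cutoff are illustrative):

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(relevances, k):
    """DCG of the model's ranking divided by the ideal ranking's DCG."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels in model-ranked order: 1 = clicked, 0 = not clicked.
print(ndcg_at_k([1, 0, 1, 0, 0], k=5))  # ~0.92: clicked items near the top
```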

CLTV is a great example of an outcome that cannot be trained on directly. RecSys models infer the causal impact of exposing a particular user to a particular item. It's hard enough to validate that exposing a user to an item caused that user to buy the item; it is completely impractical to validate that a single exposure increased the user's lifetime value. With so many other things contributing to that outcome, the signal is completely drowned out by noise.

Model design goals should be owned by the ML team that is building the model. That team should be accountable for great A/B test results.

How to get an ML team to optimize CLTV

Going back to our original example: how should a senior leader get an ML team to maximize CLTV? The answer is to ladder goals down through A/B test outcomes and ML model design. An example ladder might look like this:

  • Strategic goal: Improve customer CLTV. Executives communicate this goal to the entire company, the shareholders, and the board.

  • Product metrics: Improve conversion, with customer satisfaction as a secondary goal. This decision is owned by the product team.

  • Backtesting metric: NDCG focused on clicks. This metric is owned by the ML team.

By setting appropriate goals at different levels of granularity, every team has a metric they can move fast to improve. That’s the fastest route to strategic success, whether improved CLTV, growth, or whatever is right for your business.

Leveling up your RecSys

Ready to elevate your team’s recommender system? Schedule a free 30-minute RecSys consultation with Rubber Ducky Labs to talk through how these ideas can apply to your team.
