From Frustration to Function: How Great Teams Get Great Results from RecSys
As head of recommender systems at Stitch Fix, John saw firsthand how an effective recommender system can be a crucial pillar of an e-commerce business. But many organizations struggle to implement them effectively, leading to wasted cycles and missed opportunities. The most common issue is failing to move through projects and ideas quickly enough to get results. This post lays out common pitfalls and outlines the best practices of successful recommender systems teams.
Ineffective RecSys Iteration
To paraphrase Tolstoy: all happy ML teams are alike; every unhappy ML team is unhappy in its own way. Here are some examples of experiences teams might have before they've become highly effective:
Challenges proving success. The team built a prototype for their new recommender system months ago, but they're still stuck in analysis paralysis over whether to ship. Everyone thinks the recommendations look reasonable, but no one knows which aggregate metrics to look at.
Challenges with instrumentation. When the team analyzed the A/B test results for the new "recommended items" carousel, they couldn't separate the novelty effect of a new carousel from the effectiveness of the recommendations themselves. That left them unable to answer the question, “Is our recommender system actually working?”
Challenges investing wisely. With that launch complete, there are still dozens of problems. The CEO keeps asking why she's getting recommended sweaters in June, data quality issues are popping up everywhere, and the team still doesn't understand which of the multiple carousels is driving engagement. With so many competing concerns, it's difficult to pick the next investment.
This situation is stressful for everyone involved, and team members start to wonder if their work makes a difference for their customers at all.
Effective RecSys Iteration
Returning to our Tolstoy metaphor, effective teams tend to work in similar ways. Teams that ship experiments quickly and efficiently are able to iterate smoothly through a sequence that looks like this:
Planning. The team reflects on the organization's goals for the recommender system and proposes changes or experiments. They stack-rank changes by ROI, incorporating both deep domain expertise from practitioners and a shared vision for success developed by product managers and senior management.
Backtesting. For each proposed change to the recommender system, the team prototypes a new model. Backtesting against historical data both evaluates the model and surfaces bugs or conceptual errors early (see the backtest sketch after this list). The historical data for testing and simulations is easy to pull and matches what inference models will see in production. As a rule of thumb, about half of backtested models advance to the A/B testing stage.
A/B Testing. Promising prototypes are A/B tested. The A/B test is designed to give a clear answer as to whether the new algorithm makes a statistically significant improvement to key metrics (see the significance-test sketch after this list). Again, only roughly half of A/B tests have a favorable result.
Deciding. The team is decisive in its call about whether to launch an A/B tested model to production, thanks to their shared understanding of the success definition. They quickly decide if a test is successful, then ship it and move on to the next project. Between the two roughly 50% filters, about a quarter of prototypes end up shipping.*
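To make the backtesting step concrete, here's a minimal sketch in Python. It assumes a pandas DataFrame of historical interaction events with hypothetical user_id, item_id, and timestamp columns, and uses a naive popularity model as a stand-in for a real prototype; an actual backtest would swap in the candidate model and the team's own metrics.

```python
import pandas as pd

def backtest_recall_at_k(events: pd.DataFrame, cutoff: str, k: int = 10) -> float:
    """Train on events before `cutoff`, then measure recall@k on the rest."""
    train = events[events["timestamp"] < cutoff]
    test = events[events["timestamp"] >= cutoff]

    # Stand-in "model": recommend the k most popular training-window items.
    # A real backtest would call the prototype model here instead.
    top_k = set(train["item_id"].value_counts().head(k).index)

    # For each test-window user, check whether any held-out item was recommended.
    hits = test.groupby("user_id")["item_id"].apply(
        lambda items: bool(set(items) & top_k)
    )
    return float(hits.mean())

# Tiny illustrative dataset: train on May, evaluate on June.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": ["a", "b", "a", "c", "a"],
    "timestamp": ["2023-05-01", "2023-05-02", "2023-05-03", "2023-06-02", "2023-06-03"],
})
print(backtest_recall_at_k(events, cutoff="2023-06-01", k=2))  # 0.5
```

Because the split is by time rather than at random, the prototype is evaluated on exactly the kind of data a production model would see: the past predicting the future.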
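For the A/B testing step, the "clear answer" usually comes down to a significance test on a key metric. Below is a minimal sketch of a two-sided, two-proportion z-test on conversion counts, using only the standard library; the function name and inputs are illustrative, and many teams would reach for a stats package or their experimentation platform instead.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (arm B vs. arm A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf, doubled for a two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Example: 1.20% vs. 1.50% conversion over 10,000 users per arm.
z, p = two_proportion_z_test(120, 10_000, 150, 10_000)
ship = p < 0.05  # here p is roughly 0.066, so this test is not yet conclusive
```

Agreeing on the metric and the threshold before the test runs is what makes the readout a yes-or-no call rather than a debate.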
Our successful team doesn’t succeed by being right every time: Most ideas end up on the chopping block. What differentiates a successful team is the ability to experiment with ideas quickly to find the few that truly make a difference.
On the path to effectiveness, the team has overcome the challenges we mentioned at the beginning of this post:
Proving success is straightforward. The team has aligned on a definition of success, and a strong A/B testing practice lets everyone quickly get on the same page about what's working and what's not.
The team has mature instrumentation. Because the system is instrumented to make analysis, backtesting, and production launches easy, the team doesn't spend excessive cycles working around gaps in their infrastructure or waiting on other engineering teams.
The team can decisively choose investments. Thanks to alignment around success measurement and mutual trust, product managers and ML engineers can have productive conversations about where to invest time and resources.
Once you know what an effective setup looks like, it may still take significant investment to get there. But the result is improved visibility, rapid decision making, and a flywheel that consistently delivers better outcomes for customers.
Going Deeper
We wanted to share what we've learned helping teams create world-class recommender systems, so we're releasing a series of posts about what it takes to enable game-changing recommendations for your customers:
Was my launch successful? This will address challenges around measuring impact, improving decisiveness, and steering development in the right direction.
Instrumenting rapid iteration: The foundation you need to build to operate a successful recommender system. This will address common challenges to unblocking rapid iteration.
Investing in RecSys: How to think about RecSys investments and ROI, whether you’re just starting out or have a mature system in place.
Ready to elevate your team’s recommender system? Schedule a free 30-minute RecSys consultation with Rubber Ducky Labs to talk through how these ideas can apply to your team.
* If you are a nerd, you might be interested to note that this 4-step process is a kind of OODA loop: Observe, orient, decide, act.