Don’t Over-Invest: How to efficiently stand up ML capabilities

The Pareto Principle, commonly known as “80 / 20,” suggests that the majority of value is captured by the first small application of effort. Applied to the strategic management of data analytics projects, the principle yields two simple but helpful rules about efficiency:

Diminishing Returns: 80% of model accuracy can be discovered in 20% of the time.
In the time it takes to create one perfect model, we could have created five imperfect ones. Digging around for a better result and hand-tuning a model is a “luxury” activity, best reserved for situations where it clearly makes sense to burn time to eke out incremental quality. In most cases, even a basic linear regression with a few variables is a more systematic way to target a specific response than eyeballing or guessing.

As exemplified here, the quality of a model run within a progressively sampling AutoML tool converges quickly.
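The convergence described above can be reproduced in miniature. The following is a minimal sketch, not the AutoML tool referenced here: it assumes synthetic, made-up data (a single driver plus noise) and a hand-rolled one-variable least-squares fit, and the names `fit_line`, `make_data`, and `learning_curve` are hypothetical. It trains on progressively larger samples and scores each fit on the same holdout set.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def make_data(n, rng):
    """Synthetic data (an assumption for illustration): y = 3x + 5 plus noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 5 + rng.gauss(0, 4) for x in xs]
    return xs, ys


def learning_curve(sample_sizes, seed=0):
    """Fit on growing prefixes of the training data; score on a fixed holdout."""
    rng = random.Random(seed)
    train_x, train_y = make_data(max(sample_sizes), rng)
    test_x, test_y = make_data(2000, rng)
    scores = []
    for n in sample_sizes:
        slope, intercept = fit_line(train_x[:n], train_y[:n])
        scores.append((n, r_squared(test_x, test_y, slope, intercept)))
    return scores


if __name__ == "__main__":
    for n, score in learning_curve([20, 80, 320, 1280, 5120]):
        print(f"{n:>5} rows -> holdout R^2 = {score:.3f}")
```

On data like this, the holdout score should barely move once the sample reaches a few hundred rows, which is the practical signal to stop spending time on that model. Real data with more drivers will flatten later, so the sample sizes here are illustrative only.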

Fail Fast: 80% of value is found in 20% of potential modeling questions.
We need a process for quickly discovering the most valuable areas for time investment, since not all analyses end up being valuable. Some outcomes are just not predictable, either because we are missing the right drivers or simply because random chance plays too big a role. In either case, it is important not to have weighed the analysis down with administration and project management. Getting to a first answer efficiently helps prioritize which discussions should happen.

This model, built in the 30-second run above, is all you need to have a discussion with Marketing leadership.

These two propositions actually converge on the same advice: make many imperfect models. Rapid prototyping lets us quickly filter out and shoot down the bad models, and rapid deployment lets us start realizing value immediately. More work has diminishing returns, and because teams have limited bandwidth, investing further past some point is not advisable, since it will not yield results efficiently.
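One way to make the "filter and shoot down" step concrete is a quick triage pass over candidate questions. The sketch below is only an illustration under stated assumptions: the data is synthetic, a predict-the-average baseline stands in for gut instinct, the 10% lift threshold is arbitrary, and the names `quick_lift` and `triage` and the question labels are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def quick_lift(xs, ys, holdout_fraction=0.3):
    """Relative holdout RMSE reduction of a quick linear model over the naive baseline."""
    cut = int(len(xs) * (1 - holdout_fraction))
    slope, intercept = fit_line(xs[:cut], ys[:cut])
    baseline = sum(ys[:cut]) / cut  # stand-in for "gut instinct": always guess the average
    test_x, test_y = xs[cut:], ys[cut:]
    model_err = rmse(test_y, [slope * x + intercept for x in test_x])
    naive_err = rmse(test_y, [baseline] * len(test_y))
    return 1 - model_err / naive_err


def triage(questions, min_lift=0.10):
    """Split candidate questions into (keep, drop) by their quick lift."""
    keep, drop = [], []
    for name, (xs, ys) in questions.items():
        (keep if quick_lift(xs, ys) >= min_lift else drop).append(name)
    return keep, drop


# Two made-up questions: one with a real driver, one dominated by random chance.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(500)]
questions = {
    "driver_present": (xs, [2 * x + rng.gauss(0, 3) for x in xs]),
    "mostly_noise": (xs, [rng.gauss(0, 3) for _ in xs]),
}
keep, drop = triage(questions)
print("keep:", keep)
print("drop:", drop)
```

The point of the design is that every candidate question gets the same cheap test before anyone invests real time: questions where the quick model cannot beat the naive guess are parked, and the rest move on to deployment or deeper work.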

Certainly, some data science projects should be careful and deliberate: object identification for self-driving cars or regulated banking risk models, for instance. But for the vast majority of practical business applications, the bar to clear is not 100% accuracy but rather an improvement on gut instinct.

Ultimately, in most organizations, there is such a long list of decisions being made with limited data-driven guidance that even simple ML can easily deliver an incremental improvement. Moving swiftly through a range of topics, finding the “pretty good” answer, and widely democratizing AI/ML is a worthwhile data strategy for the immediate term.