In this post, we discuss the collaboration challenges facing data science teams today. Then, we share a case study of how one of our clients implemented live collaboration between business owners and data scientists.
Aligning business owners and data scientists is a recognized priority within most organizations. In particular, the pandemic has highlighted the importance of collaboration tools as a cornerstone of productivity, and forced us to find new ways of working together.
However, conceptual agreement about the need to collaborate has not ensured frictionless insight generation. One persistent challenge we’ve heard from data science teams is the gap in contextual knowledge. Business owners have subject matter expertise that is a necessary input to accurate analysis:
- Properly selecting the analysis population requires understanding of treatment nuances and potential outliers.
- Separating intuitive drivers from accidental correlation benefits from prior knowledge and experience.
- Recommendations should align with overall strategy, and the outcome variables selected as targets need to be tied to levers that the business can feasibly pull.
Most organizations have defaulted to the traditional collaboration tools: frequent conference calls and long emails. But while these are adequate for transferring high-level context, they do not scale well with the increasing complexity of data and systems. Legacy working models cannot keep up with today’s datasets, which can have dozens or even hundreds of dimensions.
Einblick was designed to natively support this kind of data science collaboration. We discuss our approach more fully in a prior post, but key features include:
- A live co-working canvas allows data scientists and business owners to collaborate in real time in a single environment.
  - Data science collaboration stops being an asynchronous process where context setting happens separately from output creation.
  - The business owner is no longer removed from the analysis creation process, which improves buy-in from the business and reduces errors from misinterpreted outputs.
- A progressive sampling engine means that Einblick supports highly responsive outputs even on very large datasets.
  - Data exploration becomes an interactive experience, even when there are no pre-aggregated dashboards.
  - Iteration is sped up, as model building does not require hours of machine time.
- Simple drag-and-drop descriptive analytics allow the business owner to take the lead in the exploratory pieces of the analytic process.
  - The analysis driver becomes the team member who best understands the subject matter domain, not the person who just happens to know how to make R plots.
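To make the idea behind progressive sampling concrete, here is a minimal Python sketch. It is not Einblick's engine, whose internals this post does not describe. It only illustrates the general technique of computing an estimate on a growing random sample so that a first answer appears quickly and is refined as more rows are read. The function name `progressive_mean` and its parameters are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def progressive_mean(
    values: List[float], batch_size: int, seed: int = 0
) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, running_mean) after each batch of a shuffled pass.

    Illustrative only: a generic progressive estimate, not Einblick's method.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    # Visit rows in random order so every prefix is an unbiased sample.
    order = list(range(len(values)))
    random.Random(seed).shuffle(order)

    total = 0.0
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        total += sum(values[i] for i in batch)
        seen = start + len(batch)
        # An early, approximate answer is available after the first batch.
        yield seen, total / seen


if __name__ == "__main__":
    data = [float(i % 97) for i in range(100_000)]
    for rows_seen, estimate in progressive_mean(data, batch_size=20_000):
        print(f"after {rows_seen} rows: mean is roughly {estimate:.3f}")
```

The design point is that the caller can render the first yielded estimate immediately and update the display as later batches arrive, rather than waiting for a full pass over the data.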
| Client’s Old Way | New Approach with Einblick |
| --- | --- |
| A call gets scheduled to define the analytic question and try to get through as much context as possible. | A call gets scheduled for the business owner and data scientist to log in to Einblick together, from their respective remote desks. |
| A CSV file gets sent to the data scientist, who spends some time running descriptive statistics. A list of questions about the data is created and sent to the business owner, who sends back a multi-page response. | The business owner, not the data scientist, runs the initial descriptive analytics. Einblick’s intuitive visual querying system allows users to quickly create outputs without code, and allows the context owner to drive the conversation. Then, discussion about the data can occur while looking at the dataset directly, not in the abstract. |
| For an initial run of modeling, the data scientist might take a kitchen-sink approach and use all possible data features. Sometimes the business owner gives directional guesses on relative feature importance based on contextual knowledge, but generally is hands-off in the first run. | To generate hypotheses in a systematic way, the data scientist can insert data into the Key Driver Analysis (KDA) operator. KDA helps scan through hundreds of candidate drivers of performance, and surfaces the most influential. The business owner can scan across the narrower range of highly rated candidate drivers to highlight which features are appropriate to use in a model, and which features might need to be excluded. |
| The data scientist sets up a model pipeline and takes an initial cut, creating a model using their best understanding of required inputs and outputs. The model is then shared with the business owner to validate whether the right variables were used or targeted. | The business owner answers a guided Q&A workflow to help define what the model aims to accomplish. The AutoML engine then searches for the right algorithm and tunes parameters to find a model that best fits the goal defined for the analysis. |
| Iteration requires firing off more emails and scheduling follow-up chats to determine how to change the variables included in modeling. Each run of the model takes a few hours, and back-and-forth is required, so the last 10% of the process ends up taking 50% of the time. | Einblick displays the top analysis drivers, so the business owner can confirm that the values make sense. The platform also gives the data science team the technical specifications of the model. If other drivers are needed, iteration can happen in seconds, as progressive computation allows Einblick to provide near-real-time model computation even on large datasets. Once aligned, the model’s Python code can be downloaded to be incorporated into an existing pipeline, or deployed as a data product. |
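For readers who want a feel for what a key driver scan does, the sketch below ranks candidate features by the strength of their linear relationship with an outcome. This is an assumption-laden stand-in: this post does not describe how Einblick's KDA operator scores drivers, so absolute Pearson correlation is used here purely for illustration, and the names `pearson` and `rank_drivers` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric columns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("columns must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_drivers(
    features: Dict[str, Sequence[float]],
    outcome: Sequence[float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k features ordered by absolute correlation with outcome.

    Illustrative stand-in for a key driver scan, not Einblick's KDA operator.
    """
    scores = [(name, pearson(col, outcome)) for name, col in features.items()]
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Made-up toy columns, for demonstration only.
    outcome = [1.0, 2.0, 3.0, 4.0, 5.0]
    candidates = {
        "price": [5.0, 4.0, 3.0, 2.0, 1.0],
        "visits": [1.0, 2.0, 3.0, 4.0, 6.0],
        "store_id": [7.0, 7.0, 7.0, 7.0, 7.0],
    }
    for name, score in rank_drivers(candidates, outcome, top_k=2):
        print(name, round(score, 3))
```

A ranking like this only narrows the list. As the case study shows, the business owner still decides which highly rated candidates are genuine drivers and which are accidental correlations.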
In short, a good collaborative model for data science requires active participation by people from multiple knowledge domains. Einblick was designed to do just that: bring data scientists and business owners onto the same canvas to work through problems visually, together.
Request a demo today and find out how you can transform analytic collaboration at your organization too.