
To improve your relationships with DS stakeholders, offer them a menu

Data science is hard. We invest great amounts of time and focus into developing our abilities to reason through models and apply hard-earned intuitions in practice. Interpret and summarize enough logistic regression models, and we start to forget the hard work it took to understand what a linear log-odds relationship is.

And that’s a relatively simple method. Imagine our poor stakeholders when we say things like, “Well, that first model made no assumptions on structure—it treated the dataset as completely flat. If we changed the generative model to assume mixed-membership in a set of clusters, we might improve performance, especially on cold starts.”

🥺

“Buddy, what?” I thought when I heard that in a meeting. And I was the one who said it.

Ultimately, we data scientists are only empowered to drive value via models if we understand the practical mechanisms, in our own organizations, by which we can deliver value. This necessarily entails mutual learning between us and our counterparts. How might your model be used? If you can better predict conversions or churn, how might teams that work on these problems use your work?

There are no easy answers on the road to productive collaborations and high-trust relationships. But here we’ll develop one shortcut, the concept of a menu of options, to make our stakeholder communications smoother and more transparent. We’ll focus on defining units of delivery, our repeatable project forms, and their associated costs to the organization. This will help everyone make informed tradeoff decisions.

If you’re a data science contributor or lead looking to speed up and smooth out project startup and execution, read on.

Describing your deliverables

Menus describe exactly what a customer receives. In our case, we want to preemptively outline the scope of possible data science deliverables for a team or individual. This can be doubly challenging for many modelers. One, we tend to approach every problem as though it were a blank slate. Approaching problems fresh is one of the great joys of the job, and thinking through the forms our projects take can make them feel cookie-cutter and ho-hum. Two, it’s a serious challenge to set aside the jargon that helps us communicate within our discipline and craft descriptions that resonate outside it. Nevertheless, these challenges can be fun opportunities to greet your counterparts with tangible examples they can touch.

Consider these examples, in the imagined context of a guide you’ve written for your non-ML-practitioner counterparts:

  • GLMs: This class of models lets us model relationships among different variables while controlling for others, and offers straightforward interpretations. For example, say you’re interested in individual customer spending and see that our high-earning customers spend more on our products, as do our older customers. But many of our high-earning customers are advanced in their careers and older. Controlling for one, does the other still have an effect on spend? These models can answer these sorts of questions. (A minimal code sketch follows this list.)
  • Black box neural networks: These are usually difficult, often practically impossible, to interpret. But they’re able to learn more complicated relationships and often make better predictions than simpler models. If we have a way to make profitable use of accurate predictions, these could be a wise choice.
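For fellow practitioners reading along, here’s a minimal sketch of the analysis the GLM bullet describes, using synthetic data and the statsmodels formula API. The variable names, effect sizes, and library choice are illustrative assumptions, not a prescription:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1_000

# Synthetic world: age and income are correlated (careers advance with
# time), but only income truly drives spend. Can the model untangle them?
age = rng.normal(45, 10, n)
income = 20_000 + 1_500 * age + rng.normal(0, 10_000, n)
spend = 50 + 0.002 * income + rng.normal(0, 20, n)
df = pd.DataFrame({"age": age, "income": income, "spend": spend})

# A Gaussian GLM with both predictors: each coefficient is the effect of
# one variable while controlling for the other.
model = smf.glm("spend ~ income + age", data=df,
                family=sm.families.Gaussian()).fit()
print(model.summary())
```

In this toy setup, the fit should attribute the effect on spend to income and report a near-zero, non-significant coefficient for age, which is exactly the “controlling for” story in the bullet above.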

“But,” I hear you worry, “there’s so much more we know how to do! Will this ever be comprehensive? Will this pigeonhole us?”

Fair concerns, and to the first, no: In practice you’ll probably never produce a comprehensive catalog of everything you and your team can do. Nor is a menu a comprehensive list of all the items a chef can cook. This can be a very good thing, provided you’re clear to your internal customers that this isn’t all you do, but a sample of things you’re well-practiced in. And, if you like, recommendations for common use cases. Becoming more familiar with what you’ve done will generally give your friends in other functions more ideas on what you could do. “Here’s a problem we’ve been facing,” many pleasant, productive conversations begin, “I took a look at your team’s ‘About Me’ page and didn’t see anything like it, but is this the kind of thing you can help with?”

To the second, I haven’t seen this lead to pigeonholing in practice. Typical DS projects include a call-out for next steps or future work, and it’s more typical that, for example, the first logistic regression model you deliver for a partner leads to follow-up requests that require different, more sophisticated methods. You can even sequence your options, if you want to get fancy.

Describing the cost

If crafting transparent, approachable descriptions of our work weren’t challenging enough, describing its cost can bring the pain. Here we stretch the bounds of our metaphor: Where the typical menus we’re all familiar with offer straightforward prices, the full cost of data science work is an elaborate mixture of efforts by both data science and partner teams. Our goal is to give counterparts a clear idea of what our units of work take. That’s crucial as our partners weigh what to ask of us against how they spend their own time, and as we make tradeoff decisions of our own.

Let’s return to our prior two examples:

  • Once we have (1) developed scope, in the form of the dependent and independent variables we’ve decided we would like to understand, and (2) built a data pipeline, a member of our team needs roughly 1-2 weeks of focused effort to turn around a first-pass GLM. This deliverable includes a report (slide deck) on which variables show significant effects, and which of those are substantial.
  • Benchmark neural networks can be simple or complex, depending on the problem. In simple classification cases, assuming a developed scope and an existing data pipeline, one data scientist can turn around a benchmark neural network in 2 to 4 focused weeks. This deliverable includes a description of the methods tried and a report of model performance. (A minimal sketch of such a benchmark follows this list.)
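For the curious, here’s a minimal sketch of what a benchmark-neural-network deliverable might compute under the hood, assuming a simple classification case. The synthetic dataset and scikit-learn’s MLPClassifier are stand-ins for whatever the real problem calls for:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the output of an existing data pipeline.
X, y = make_classification(n_samples=2_000, n_features=20,
                           n_informative=8, random_state=0)

# Benchmark a small neural network against a simple, interpretable baseline.
models = {
    "logistic baseline": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1_000)),
    "benchmark MLP": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(64, 32),
                                        max_iter=500, random_state=0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Reporting the baseline alongside the network keeps the comparison honest: if the MLP can’t beat logistic regression, that finding belongs in the deliverable too.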

Note the repeated references to focused effort. This is an important distinction that demands stress: Your message must be clear that these estimates assume no competing priorities. This can prompt conversations about what your team should be working on. Keep those conversations open and transparent, and that’s a very positive thing. Also note that the deliverables do not promise a particular level of performance.

“But wait,” you might wonder, “haven’t you side-stepped how long it takes to scope a project and build a pipeline?” That’s one way to put it, but I’d encourage you to frame this as increased transparency into what causes uncertainty in your turnaround times. List, to the fullest extent you can, the unknowns that cause variation. It’s natural and fair for our business friends to ask how long scoping takes. This is harder to answer, since it’s necessarily collaborative. Even if they have a very clear ask that’s perfectly suited to your DS team, communicating and documenting intent takes time. Likewise, building a model when there’s an existing, cleaned dataset—say one that’s used for monthly KPI reports—is typically much easier and faster. (Exceptions occur when a modeling use case can’t live with assumptions made for the purpose of reporting.) Again, this tends to prompt more communication of mutual interdependencies, itself a very good thing.

Again, the goal here is not to make perfectly calibrated estimates of turnaround time. It’s to share your challenges and opportunities so that you and your stakeholders can help each other remove the challenges and create newer, bigger opportunities. If it helps, use ranges or t-shirt sizes to broadcast the uncertainty and buy yourself some wiggle room.

Finally, an easy one: Your menu needs to be easily accessible. Publish it with your organization’s knowledge management tool of choice. GitHub Pages, a Google Site, a link to either in your Slack channel, whatever gets it done. There’s no need to overcomplicate things here; it just needs to be accessible, and ideally collaborative. (It’s much easier to ask questions inline than to write an email.)

Getting started

As I’ve laid out above, this approach has a few challenges, and developing your menu can prompt a little apprehension. In a sense, you’re committing to turnaround times. If you or your team fall into the trap of confusing this with an SLA of the form, “We turn around any model in two weeks,” you can quickly find yourselves over-promising and stressed out. It’s crucial to keep any implied commitments feasible and clear.

Here are a few steps you can take to develop a first draft:

  1. Start by brainstorming a list of projects you’ve recently completed.
  2. For each project, think through and jot down how long it took to do typical steps of identifying data, cleaning it, transforming it, and finally training models and measuring performance.
  3. For each project, tightly scope a description of the portion of work that is in your team’s control. Then, transparently describe the inputs required. It’s important to keep scope minimal. Most of us modelers would happily tinker with an interesting modeling problem until the heat death of the universe. Putting a cap on your first delivery is a very helpful way to keep the perfect from being the enemy of the good.
  4. Think through how well each step went, and whether any obstacles you hit would be likely to repeat themselves. Were your turnaround times greatly delayed? Or were they a happy accident, with everything going perfectly?
  5. For each item, using the information you gathered in the prior step, select the smallest amount of time you would be comfortable committing to for another project of the same form, and use that as the low end of a range. If you’re doing this for a team, be careful to account for variation in different members’ familiarity. Bias yourself toward the longer estimate.

Wrapping up with a minimal working example

To conclude, we’ll pull together the whole example menu below. Imagine it as a page on your organization’s internal network, written for your prospective internal customers.

I hope you find this method as helpful as I have. What other methods and communication hacks have you found helpful in collaborating on data science projects?

Team RaDSkills looks forward to working with you! We are a DS team boasting skills in a variety of technical disciplines. Here we’ve collected a sample of the kinds of projects we’ve completed, along with sketches of their input requirements and rough turnaround times. These aren’t the only things we can do, and we encourage folks to reach out with questions. We hope this menu prompts ideas on what we could work on together!

  • GLMs: This class of models lets us model relationships among different variables while controlling for others, and offers straightforward interpretations. For example, say you’re interested in individual customer spending and see that our high-earning customers spend more, as do our older customers. But many of our high-earning customers are advanced in their careers and older. Controlling for one, does the other still have an effect on spend? These models can answer these sorts of questions.

Once we have (1) developed scope, in the form of the dependent and independent variables we’ve decided we would like to understand, and (2) built a data pipeline, a member of our team needs roughly 1-2 weeks of focused effort to turn around a first-pass GLM. This deliverable includes a report (slide deck) on which variables show significant effects, and which of those are substantial.

  • Black box neural networks: These are usually difficult, often practically impossible, to interpret. But they’re able to learn more complicated relationships and often make better predictions than simpler models. If we have a way to make profitable use of accurate predictions, these could be a wise choice.

Benchmark neural networks can be simple or complex, depending on the problem. In simple classification cases, assuming a developed scope and an existing data pipeline, one data scientist can turn around a benchmark neural network in 2 to 4 focused weeks. This deliverable includes a description of the methods tried and a report of model performance.

This isn’t everything we can do, and if you have ideas that aren’t represented here, please reach out!

Sean M. Easter
Data Science Leader
