Photo by Jeremy Bishop on Unsplash

Using dogmatic Agile for data science courts disaster

If you’re interested in or forced into using Agile for DS, these tactics can keep tragedy at bay

Once the great liberator that would free us all from the onerous gated documentation of Waterfall, Agile is now de rigueur in most tech organizations.

Despite the sharp-elbowed title, this post will take an evenhanded look at Agile — both scrum and Kanban — its potential application to data science, and why you may or may not want to leverage it in your shop. And if you do choose to use it, you’ll come away with techniques to apply and traps to avoid to make yourself and your team successful.

“Data science” here will focus on the function of training, testing and comparing models, be they statistical or machine learning. We’ll assume some familiarity with Agile practice, and use terms like “sprint” and “sprint planning” without definition. If you’re a data scientist or product pro struggling to make Agile a valuable practice for data science teams, this post will help.

Should you use Agile for data science at all?

Sure, if you’re committed to taking a clear-eyed look at how the work of data science is done, and using Agile in a thoughtful way that supports the work.

If we forget for a second about the cottage industry of rent-seeking Agile consultants and the laborious processes that have emerged from it, and return to the original Agile manifesto principles, we in machine learning find much that resonates. Sure, many of us will read several of those principles as overly focused on software, e.g., “Working software is the primary measure of progress.” We machine learning pros have an urge to nudge this principle more toward model results or metrics. Until our models move the target needle, the value of our work is hypothetical. (Later we’ll discuss practices that can easily merge these perspectives.) “Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done[,]” is a principle that truly transcends disciplines.

Of course, rare is the real-world organization whose practice truly takes its motivation from these principles. More common — predominantly so, in my experience — are shops that are agile because, for example, “Our product owner likes scrum,” or, “The whole company is already on Jira.” It’s a moment of disillusionment that many developers share: when they realize that much of modern Agile practice is more concerned with tracking and reporting the activities of developers an organization doesn’t trust, and much less concerned with bringing joy to the pursuit of excellence.

There’s nothing inherently wrong with tracking or reporting; to the contrary, they can be very valuable to all parts of an organization. What’s critical is that an organization make sure that, after accounting for the cost of tracking, prioritizing and communicating work, those activities have a positive return on investment. When done well, this return can be substantial, in the form of lowered communication costs imposed on a team and the broader organization. That is, the less time you spend telling people what problems you’re solving, the more time you spend solving problems. For many data scientists, the feeling of productivity from focused effort on hard problems is the true joy of the job.

At this point, some readers might ponder the cause for, or even bristle at, my repeated mention of bringing joy to work. We might summarize the first perspective as, “I’m all about fun at work, but why the focus?” A more bearish perspective might grump, “It’s work. Stop complaining, fall in line and get it done.”

If you’re in the first group, I’d encourage you to read Better, Simpler Strategy or Love + Work. (Affiliate links, through which I earn a commission at no cost to you.) The former offers an argument that there’s strategic benefit in competing for talent on many vectors beyond just “paying market.” The latter will help readers design work that they and others love, and describe the many benefits of loving your work. Here’s author Marcus Buckingham in an article adapted from the book, note and emphasis mine:

But research does reveal that when you’re engaged in an activity you love, that same chemical cocktail [that occurs during romantic love] is present in your brain — along with anandamide, which brings you feelings of joy and wonder. Primed by this cocktail, you interact with the world differently. Research by neurobiologists suggests that these “love chemicals” lessen the regulatory function of your neocortex, widening your perspective on yourself and liberating your mind to accept new thoughts and feelings. You register other people’s emotions more intensely. You remember details more vividly. You perform cognitive tasks faster and better. You are more optimistic, more loyal, more forgiving, and more open to new information and experiences. One could say that doing what you love makes you more effective, but it’s so much more than that: You’re on fire without the burnout.

If you’re in the second, grumpish camp, read Roger Schwarz describe the Mutual Learning mindset in any of Smart Leaders, Smarter Teams, The Skilled Facilitator (affiliate links, ibid.), or the free article “Eight Behaviors for Smarter Teams”, and give special note to its third assumption: “I may be contributing to the problem.”

Organizations that design work people love are rewarded in the form of higher productivity, engagement and retention. And luckily for leaders of organizations that cannot compete on compensation or perks, designing such work doesn’t come with a financial cost: We can do it for free, just by doing our jobs better. The trick to doing this for data science and machine learning is to employ practices that account for their distinctions from standard product development, reduce the frustrations of micromanagement and bring your teams maximum joy.

What’s so special about data science and machine learning?

Uncertainty. The primary role of a data scientist is to grow an organization’s body of knowledge, typically through inferential statistics, where we quantify uncertainty and apply statistical tests; or through proof by construction, where we prove that machine learning can solve a given problem by constructing a black box that solves it. This production of knowledge is an inherently uncertain process.

This can create a few sources of friction among the varied functions on an interdisciplinary team. Product pros are primarily, and rightly, concerned with getting things to market faster than competitors. Faced with a newly machine-learning-powered feature, a product manager unaccustomed to working with machine learning will often ask the seemingly harmless question, “When will this model be ready for deployment?” Seeking clarity on dependencies and roadmapping them is, after all, their job. A well-meaning data scientist will often offer the factually true but organizationally naive response, “There’s no way to know.”

This can be jarring to folks accustomed to working in Agile shops, which these days is most. The premise of sprint cycles is to tune your work commitments as you learn more and more about what a team can deliver over that cycle. Estimating time to completion is a key skill for software ICs and leads alike, so when a PM hears a full stop “don’t know and can’t know,” they’re trained by experience to become a little skeptical. “We haven’t built a feature of this complexity before, so our estimate could be a little off” is a natural statement from an Agile developer; “No way to know” can smell to many PMs like incompetence.

This frequent micro-miscommunication isn’t insurmountable, though; it’s just the product of different mental patterns across disciplines. To overcome it, we have to shift our conception of DS units of delivery from modeling outcomes to artifacts of reproducible knowledge.

How can we use Agile to manage this uncertainty successfully?

If you’re a data scientist negotiating units of work in an Agile framework, only commit to estimable units of modeling work; never commit to modeling outcomes. If you’re a PM, never seek commitment in your stories, tasks and acceptance criteria to modeling outcomes. Note that this principle applies at the level of specific units of committed deliverables, and not the level of a program’s key objectives.

Perhaps there’s a devil on your shoulder whispering, “Well, never say never…” Please banish that voice to another room: That voice is the reason we can’t have nice things.

By “modeling outcomes,” I mean measurable improvements in model performance. For an example commitment to a modeling outcome, take the acceptance criterion, “Model updates will improve error on our conversion model by two percent, relative.” Authoring criteria like this is a mistake. It assumes an effect is present and of meaningful magnitude. Science is a process: Hypothesize, experiment, measure, repeat. Your committed project tasks should each capture an iteration through this cycle.

If you’re a data scientist, focus your tickets on acceptance criteria that are fully within your agency to deliver in the allotted time and can be observed to be incontrovertibly done. For example, contrast these two criteria:

  1. Model training pipeline will include three new, specific predictive features. Difference of predictive performance of model trained with and without new features will be reported.
  2. Model training pipeline will include new features. Features will improve model performance, as measured by a reduction in mean-square error by no less than three percent of its current value.

In (1), we have a fixed-scope unit of work whose input — in terms of a data scientist’s time — can be estimated just like any other technical task in a typical Agile software shop. (ML newcomers may be confused by the term “predictive features” here. It’s loosely the ML community’s version of “independent variables”: think “inputs to our model.”) Just like any other Agile shop, a team’s commitments could take tuning. As a data scientist grows in her experience with the team’s data infrastructure and workflows, she’ll be able to add more features in a single sprint. If the organization moves to a discoverable feature store, this sort of work will likely achieve true velocity. All of this will empower an organization to learn more about feature lift, without providing even a tiny sliver of guarantee of what we’ll learn or what the outcome will be.
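To make criterion (1) concrete, here’s a minimal sketch of what that deliverable might look like in practice: train the same pipeline with and without the candidate features and report the difference in held-out error. The file name, column names, model choice and metric below are all placeholder assumptions; your stack and framework will differ.

```python
# Minimal sketch of criterion (1): report the effect of candidate features
# on held-out error, without promising in advance what that effect will be.
# The file name, column names, model and metric are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")  # hypothetical training extract
target = "conversion_value"
baseline_features = ["visits", "tenure_days", "avg_order_value"]
candidate_features = ["days_since_last_email", "pages_per_session", "num_support_tickets"]

train, test = train_test_split(df, test_size=0.2, random_state=0)


def holdout_mse(feature_cols):
    """Fit on the training split and return mean-square error on the held-out split."""
    model = GradientBoostingRegressor(random_state=0)
    model.fit(train[feature_cols], train[target])
    return mean_squared_error(test[target], model.predict(test[feature_cols]))


mse_baseline = holdout_mse(baseline_features)
mse_candidate = holdout_mse(baseline_features + candidate_features)

# The committed deliverable is the comparison itself, whatever it shows.
print(f"Baseline MSE:            {mse_baseline:.4f}")
print(f"With candidate features: {mse_candidate:.4f}")
print(f"Relative change:         {(mse_candidate - mse_baseline) / mse_baseline:+.1%}")
```

The acceptance criterion is satisfied once the comparison is reported, whether the new features help, hurt or do nothing; that is exactly what keeps the scope estimable.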

In (2), we’ve committed to a task with a duration somewhere between a single sprint — if we’re magnificently lucky — and the time that will elapse between now and the heat death of the universe.

Recall our motivating principles of agency and observable status. The targeted lift is observable, but what else must be true in order to achieve that lift in a single sprint?

  • The organization must have infrastructure that enables the data scientist to retrieve the new features and compare model performance with and without them, and the data scientist must have sufficient skill to do so in the allotted time.
  • The features must actually provide lift within the hypothesis space of the existing model.

The first is something a team should be able to learn from experience. The second may or may not be true, and answering that question is the whole point of adding the features and performing the model comparison experiment.

Savvy readers will be noting that the scientific method includes a prediction step, wherein we derive a falsifiable prediction from knowledge and theory under test. What’s wrong, one might wonder, with putting that prediction in a story or ticket?

In essence, nothing. To use Agile successfully for data science, an organization must appreciate the distinction between expectation and commitment. “From prior experience on similar models and features, I wouldn’t be surprised if we saw three percent lift,” expresses an example expectation, and is a sane and rational thing to say in sprint planning or in a comment on a ticket. But the work of science is answering questions, and we can never guarantee what that answer will be: All we can commit to is that we’ll do the work to find out.

Which of scrum or Kanban is better for DS?

Now that we’ve reframed the tasks of data science as creating units of knowledge, we’ve set the foundation for successful DS in Agile. One naturally wonders whether there’s a big difference between scrum and Kanban styles of project management. Summarized, scrum plans units of work in fixed-length sprints, where teams choose how much work to commit to over that period. Kanban is a continuous-flow model that seeks to minimize work in flight and maximize throughput. (For more, this post from Atlassian is a helpful, thorough introduction.) The short answer is that either of these can work, but these targeted questions can help you figure out the best one for your teams:

  • Is one of these already in broad use at your organization? Is that use uniform? Is everyone on a shared two-week sprint cadence?
  • How well-developed are your MLOps and prototype-to-production deployment practices? (I’ve written about how Make, for example, can ease the hand off to friends in engineering here.)
  • Is your team more focused on value through deployed machine learning, or on value through inferential models that inform strategy?

Start by answering each of these, then consider how they play together. For example, say your dev teams are on a shared scrum cadence, and your team is focused on learning about customers through big data models. If your role is to inform product backlog decisions, and you should be learning all you can as fast as you can, one might argue that Kanban helps you do that better and sacrifices nothing.

Conversely, if you are reliant on development teams to build a microservice that serves your models’ predictions, you’ll likely derive great value from sharing the same cadence. If your MLOps practice is sufficiently mature that you’re able to train, validate, and deploy models quickly and independently — and able to monitor for data and model drift, and retrain and redeploy, self-reliantly — then this might change things. Perhaps you have the freedom to adopt a different cadence, or perhaps the teams that consume your model APIs would appreciate the ease of roadmapping that a shared cadence would afford you. If you have separate analytics teams, consider their cadence too, and how you can mutually amplify each other’s value. (I’ve written more about this here.)

This is very much a one-size-fits-one situation. But here’s another case where staff joy can guide us: If you remember that obstacles to delivery frustrate data scientists and that impact brings us joy, you’re likely to find the right answer.

Concluding and getting started

First, make sure to appreciate that getting this right can be a challenge. Again, the different vectors of uncertainty make for very different work from typical software, and an organization must shift its mindset collectively. This challenge can be compounded depending on personalities in both camps. There are product managers who infer from their role in prioritizing a backlog a certain permission to bring an “I am in charge” attitude into every interaction, and there are far too many data scientists who take their quantitative gifts as permission to lob microaggressions at anyone who can’t immediately interpret a log-odds relationship on a normalized variable. With that in mind, here’s how to get started:

  1. If you’re already in a scrum-style Agile environment, and the challenges described above have manifested on your team, bring them up in your next sprint planning or post-mortem. Be upfront that you can’t commit to modeling results, only to deliverables that record the answers you find.
  2. If you’re not on Agile and want to experiment with using it for data science, start with the most lightweight management system you can. If your organization is already on GitHub, their Projects feature can be a great option to kick off a lightweight Kanban process.
  3. If your stakeholders are confused about what to request for your backlog, consider offering them a menu of options describing what your team can deliver (full details here). Even better, start a conversation on embedding your data science team throughout the organization.

And if you have a more specific challenge that’s not covered here—or if you have other tips on building high-productivity data science teams—feel free to reach out via email or on LinkedIn.

Sean M. Easter
Data Science Leader

Posts may contain affiliate links; read more here