Tuesday, March 7, 2017

Kanban and the Post Office

This is a translated and modified version of my article "Wie bei Kanban die Post abgeht", which was published online at Projektmagazin in February 2015. If you prefer German, you can find the original German article here.

I was once working with a company that had just started using Kanban. Peter, the CEO, was happy with what they had already achieved and proudly presented their board to me. Indeed, I was impressed by how far they had come in such a short time without much external expert support! The board looked roughly like this:

Sipping our coffee, we chatted about Kanban in general and their board in particular. After a while, I asked about the problem with queues in the post office. My conversation partner looked at me in disbelief, so I started from the beginning.

"When I was a kid," I said, "every post office was organized so that each counter had its own queue. If you had a package to ship, you had to pick one of the queues and line up. Of course, it was always the wrong choice, because all the other queues turned out to be served much faster. Today, post offices are organized differently. Usually, you'll find only one central queue for all counters. Only shortly before it's your turn are you directed to the next available clerk. Why is this? This system offers at least three major advantages:
  1. Predictability improves significantly. In the old system, it happened quite frequently that I had lined up in a specific queue only to discover later that the person in front of me had a very complicated issue to discuss with the clerk. Bad luck! The same thing happened if my queue was served by a trainee, who (of course) was much slower than his/her colleagues. Certainly, these things happen in the new system as well. But here, trainees and complicated requests have different ramifications, because the delays are, so to speak, split among all the customers. This is called "using pooling to buffer variability". Needless to say, this improved predictability and the more uniform wait times lead to higher customer satisfaction.
  2. The new system is much more reliable. Imagine one of the clerks needs to leave his/her counter because he/she is needed elsewhere. In the old system, this caused severe disruption, because that counter's queue now had to be distributed among the other queues. But by what mechanism? Should the customers just line up at all the other queues? And wouldn't that be unfair, because they had already waited in the queue that had just dissolved? In the new system, it's still annoying when a counter is closed. But the impact on the overall system is rather low. Nobody has to be redirected. The wait time for all customers gets a little longer, but nobody feels treated unfairly, because the additional wait time is evenly distributed among all customers.
  3. The new system is less stressful for the clerks. In the old system, every clerk had to serve each of his/her customers before he/she could close the counter and leave work. Everyone had their "own" customers and was supposed to feel responsible for them individually. In the new system, all the clerks work as a team and share responsibility for all the customers. If someone is having a rough day (or difficult customers), his/her teammates will compensate for this automatically."
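To make the pooling effect a bit more tangible, here is a minimal Python sketch. It rests on simplifying assumptions of my own, not on any real post office data: all customers are already waiting when the counters open, in the old system customers are assigned to counters round-robin, and the service times are made up (one "complicated issue" of 10 minutes followed by five quick requests).

```python
import heapq

def separate_queues(service_times, clerks):
    """Old system: each customer commits to one counter's queue up front (round-robin here)."""
    free_at = [0] * clerks  # time at which each clerk becomes free
    waits = []
    for i, duration in enumerate(service_times):
        clerk = i % clerks
        waits.append(free_at[clerk])  # everyone arrives at t=0, so wait = start time
        free_at[clerk] += duration
    return waits

def pooled_queue(service_times, clerks):
    """New system: one central queue; the next customer goes to whichever clerk is free first."""
    free_at = [0] * clerks
    heapq.heapify(free_at)
    waits = []
    for duration in service_times:
        start = heapq.heappop(free_at)  # earliest available clerk
        waits.append(start)
        heapq.heappush(free_at, start + duration)
    return waits

# Hypothetical service times in minutes: one complicated request, then five quick ones.
times = [10, 1, 1, 1, 1, 1]
old = separate_queues(times, clerks=2)
new = pooled_queue(times, clerks=2)
print("separate:", old, "total wait:", sum(old), "max wait:", max(old))
print("pooled:  ", new, "total wait:", sum(new), "max wait:", max(new))
```

With two clerks, the two customers stuck behind the complicated request wait 10 and 11 minutes in the old system (waits `[0, 0, 10, 1, 11, 2]`), while in the pooled system the delay is spread out and nobody waits longer than 4 minutes (waits `[0, 0, 1, 2, 3, 4]`).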

Keeping an eye on the queue as a whole

During my little monologue, Peter had been nodding a lot, so he seemed to agree with all this. After I had finished, he turned his gaze back to the board and started thinking out loud:
"In our context, the tickets on our board are the customers in the post office, each with a different request. Our team members are the clerks who deal with the requests. A finished ticket is the equivalent of a served customer (hopefully a happy one!). At the moment, we assign people to tickets right from the beginning, so everyone has his/her own queue and is responsible for dealing with it. This means we are working with the old post office system. When I think about it, I have witnessed all the disadvantages you have just described. So for us, too, one joint queue should be a much better solution. But for that, our board would have to look different."
He grabbed a marker, went to a nearby flip chart and started to sketch a modified version of the board. 

For the columns "To Do", "Next" and "Done", the swim lanes had disappeared. They now only existed for the activity columns "Dev", "Test" and "Deploy". It seemed like a small change, but I think it was a big step in the right direction towards more flexibility, collaboration and knowledge sharing. 

Holes in the input queue

I then nudged Peter a little more and asked: "What about holes in the 'Next' column?" Again, he seemed puzzled, so I elaborated: "I guess the tickets in the 'Next' column are prioritized by some mechanism that makes sure the tickets with the highest value (or best value-cost ratio) are at the top of the queue and the less valuable ones are at the bottom, right?" Peter nodded, so I went on: "Now what happens when, let's say, Stefan is done developing a ticket and wants to pull the next one from the 'Next' column? Of course, he should pull the one at the top of the queue."
I grabbed a marker and wrote "1" on the top ticket, "2" on the next one, and so on. "Unfortunately, I am pretty sure that in many cases that will not happen. And the cause of this is specialization. That's probably why you have the swim lanes in the first place. So if ticket 1 requires a specific skill that Stefan lacks, he will leave the ticket where it is and pull ticket 2 instead. After a while, Inken wants to pull a new ticket. But again, she lacks the skill required for ticket 1. She doesn't feel comfortable pulling ticket 3, either, so she pulls ticket 4. Does this scenario sound realistic?"
Peter nodded thoughtfully. He knew what I was getting at and changed his sketch of the board again. At this point it looked like this:

Before I could go on, Peter started speaking, and he said exactly what I was about to say: "This is a disaster! From a business perspective, these 'holes' are an absolute no-go, because they mean we are delaying the most important tasks/projects while we are working on less important ones. We cannot have this!"
While he paused, I explained what I had seen many times before: "Yes, I agree. But here comes the catch: If you want to make sure the most important tickets are always being worked on, the company has to invest in this new way of working. And depending on the degree of specialization, the technology you use, your code base, etc., it might be quite a substantial investment. The board helps, because if designed appropriately, it will show you how far you've come at any time. But of course, visualization alone will not be enough. You will probably have to train your staff and collaboratively develop policies that make sure you gradually improve."
"Of course, I understand," Peter said (and he looked a little depressed). "I will start working on this tomorrow. Do you have any other advice on how the board can support us in this effort?"
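The way holes appear in the "Next" column can be sketched in a few lines of Python. This is a toy model under my own assumptions: each ticket needs exactly one (made-up) skill, each person pulls the highest-ranked ticket whose skill they have, and a "hole" is any ticket still waiting while a lower-ranked ticket has already been started. The skill names are hypothetical; Stefan and Inken behave as in the conversation above.

```python
def pull_next(next_column, skills):
    """Remove and return the highest-priority ticket this person feels able to work on."""
    for ticket in next_column:  # next_column is ordered by priority, rank 1 first
        if ticket["skill"] in skills:
            next_column.remove(ticket)
            return ticket
    return None  # nothing this person can pull

def holes(next_column, pulled):
    """Ranks of tickets still waiting although a lower-priority ticket was already started."""
    started = [t for t in pulled if t is not None]
    if not started:
        return []
    lowest_started = max(t["rank"] for t in started)
    return [t["rank"] for t in next_column if t["rank"] < lowest_started]

# Hypothetical skills; ranks follow the numbers written on the tickets.
next_column = [
    {"rank": 1, "skill": "payments"},
    {"rank": 2, "skill": "frontend"},
    {"rank": 3, "skill": "backend"},
    {"rank": 4, "skill": "frontend"},
]
pulled = [
    pull_next(next_column, {"frontend", "backend"}),  # Stefan
    pull_next(next_column, {"frontend"}),             # Inken
]
print([t["rank"] for t in pulled])  # [2, 4]
print(holes(next_column, pulled))   # [1, 3]
```

Counting holes like this over time could be one way to see whether the investment in spreading skills is paying off.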

Some more ideas for improvement

I paused for a moment and thought about this. Then I replied: "In my experience, it's usually unfavorable to have swim lanes (or columns, for that matter) named after specific people, because it tends to perpetuate the current division of work. It might be a good idea to label your swim lanes with the specialization instead. The specialization might be a technology, a project, a product, etc. And if you decide to do so, it might be a good idea to have avatars for each team member to indicate who's working on which item. The advantage of avatars compared to swim lanes is that they are more fluid. It's easier to move avatars from one item to another. And, probably even more important in your case, you can easily have two or even three avatars on the same ticket. And that's exactly what you want: several people working on the same task, so that the specialist knowledge can spread."
Again, Peter started to change the layout of the board, and now it looked like this:

I finished my thoughts by throwing a couple more ideas at Peter: "To make explicit what you want to achieve, you could introduce a policy like this: In every column, there have to be at least two different avatars at any time. You could easily track your improvements by counting the number of tickets on which two or more people have collaborated. It might also make sense to limit the number of avatars per person, to encourage collaboration across specializations... But I feel we're getting carried away. So before going any further, I think you should gather your team and explain to them what you want to achieve and why. I am sure they have plenty of ideas on how to get there themselves!"
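If the board is also kept digitally, the policy and the metric mentioned above could be checked with something like the following Python sketch. The board format (a list of tickets with a column and a list of avatars) is a hypothetical representation I chose for illustration; it only looks at a single snapshot of the board.

```python
def columns_violating_policy(board, columns):
    """Columns with fewer than two different avatars (the 'two avatars per column' policy)."""
    avatars = {column: set() for column in columns}
    for ticket in board:
        if ticket["column"] in avatars:
            avatars[ticket["column"]].update(ticket["avatars"])
    return [column for column in columns if len(avatars[column]) < 2]

def collaborative_tickets(board):
    """Number of tickets on which two or more different people have worked."""
    return sum(1 for ticket in board if len(set(ticket["avatars"])) >= 2)

# Hypothetical board snapshot.
board = [
    {"id": "A", "column": "Dev", "avatars": ["Stefan", "Inken"]},
    {"id": "B", "column": "Dev", "avatars": ["Stefan"]},
    {"id": "C", "column": "Test", "avatars": ["Inken"]},
]
print(columns_violating_policy(board, ["Dev", "Test", "Deploy"]))  # ['Test', 'Deploy']
print(collaborative_tickets(board))                                # 1
```

To track improvement over time rather than at one moment, you would record every avatar that was ever placed on a ticket and run the count on finished tickets.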


One of the biggest benefits of Kanban is that it focuses on queues. The board gives us visibility of the queues. And by limiting work in progress, we actively manage queues and thereby improve flow. Managing queues is a powerful, yet relatively simple way to decrease lead time dramatically. Of course, the board can only show us improvement opportunities if we are willing to look at them and if we know where to look. So the board should be designed in a way that shows queues immediately. Ideally, this visualization is accompanied by metrics that show the impact of (often growing) queues over time.
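The connection between queues and lead time can be made explicit with Little's Law: in a stable system, average lead time equals average work in progress divided by average throughput. Here is a tiny Python sketch; the team and its numbers are made up for illustration.

```python
def average_lead_time(avg_wip, avg_throughput_per_week):
    """Little's Law for a stable system: lead time = WIP / throughput (long-run averages)."""
    return avg_wip / avg_throughput_per_week

# Hypothetical team that finishes about 10 items per week.
print(average_lead_time(avg_wip=30, avg_throughput_per_week=10))  # 3.0 weeks
print(average_lead_time(avg_wip=15, avg_throughput_per_week=10))  # 1.5 weeks
```

Halving the queue at the same throughput halves the average lead time. This assumes throughput does not drop when WIP is limited, which is worth checking against your own data.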
In my experience, it's always a good idea to take a closer look at how queues are divided and ask questions like: How many different queues do we see? Why are they separated? What would be the advantages and disadvantages of merging several queues into a single one? How high would the investment be?
I am not saying it's always a good idea to have one common queue (the modern post office system). It's mainly a trade-off decision, which should be made considering different factors like risk, short-term costs, customer and employee satisfaction, etc. In fact, many post offices still have a separate queue for banking issues. This probably makes sense, because it would be very costly to train every clerk in banking tasks.
Another interesting point is to look at this problem at different levels. In this post, I've discussed the issue of individual specialists whose work is divided into separate queues. Merging these queues might lead to more collaboration and knowledge sharing. The same logic applies at a team or even department level. For instance, in product development it's worth asking: Do our development teams work on separate queues, or do they pull items from a common queue? And again: What are the advantages and disadvantages? What would it take to change this? At what cost? And then at an even higher level: What do the queues look like for our departments and business units? Would it make sense to have a common queue here as well (at least for some kinds of work)? To get a better understanding of these different levels, I find the concept of Kanban Flight Levels, as developed by Klaus Leopold, very useful.

What are your experiences with separate vs. common queues (and the post office)? Please leave a comment!


Like this post? Then you should check out my post Utilization as a proxy and my more recent post Keep the Ball Rollin'

Monday, February 20, 2017

Seriously, what is a Pull System?

Almost five years ago, I published a blog post called What the F*** is Pull? The distinction between Pull System and Pull Behavior, which we had come up with earlier at the Kanban Leadership Retreat, still makes a lot of sense to me. Yet I keep seeing a lot of confusion around the concept of pull, and I myself have often had trouble explaining it in a crisp, comprehensive way. A couple of months ago, I was fed up, freed up some time and thought about it a little more. After some scribbling and googling, I wrote down a short definition that I am quite happy with. And as with most things, I did not invent this definition; it had all been written down before. It's just that I hadn't read the right resources, and the wording of many texts did not convince me. Often, the definitions are too detailed for my taste and too focused on manufacturing systems. So here's how I define a Pull System as opposed to a Push System in the context of Kanban. Maybe it's useful for others as well.

Definition of Push vs. Pull

In a Push System, new input is determined by a plan or event. Output has to be adjusted accordingly.

In a Pull System, new input is determined by the system's capacity/capability. Input has to be adjusted to the output.
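The two definitions can be contrasted with a small Python simulation. This is a deliberately simplified model of my own: the team can finish a fixed number of items per day, in push mode a plan adds a fixed number of new items per day, and in pull mode exactly as many new items enter as have just been finished. All numbers are hypothetical.

```python
def simulate(mode, days, capacity_per_day, planned_per_day, initial_wip):
    """Track work in progress (WIP) day by day in a push or a pull system.

    push: new input is set by a plan, regardless of what the system finishes.
    pull: new input equals what the system just finished, i.e. its actual capacity.
    """
    wip = initial_wip
    history = []
    for _ in range(days):
        finished = min(wip, capacity_per_day)  # output is bounded by capacity
        wip -= finished
        if mode == "push":
            wip += planned_per_day  # input determined by the plan
        elif mode == "pull":
            wip += finished  # input adjusted to the output
        else:
            raise ValueError(f"unknown mode: {mode}")
        history.append(wip)
    return history

print(simulate("push", days=5, capacity_per_day=2, planned_per_day=3, initial_wip=4))  # [5, 6, 7, 8, 9]
print(simulate("pull", days=5, capacity_per_day=2, planned_per_day=3, initial_wip=4))  # [4, 4, 4, 4, 4]
```

In the push case, the plan asks for more than the team can finish, so WIP grows every day unless someone adjusts the output (overtime, extra people). In the pull case, WIP stays constant because input follows output.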

Pull Systems and WIP Limits

Now the connection between Pull and WIP limitation becomes evident. As Don puts it: "WIP limits are inherent to Pull Systems." If the input is to be determined by the system's capacity/capability, we 1) have to know this capacity/capability (hence Lean's notion of studying the system, and Understanding as one of Kanban's core values); and 2) have to make sure that we never load the system beyond its capacity/capability. The easiest way of doing this is to allow a new work item to enter the system only after another one has been finished. We have to "read" our system from right to left (just as we should "read" our Kanban board from right to left), hence the slogan Stop Starting, Start Finishing!
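The rule "a new item may only enter after another one has been finished" can be sketched as a tiny WIP-limited workflow in Python. The class and method names are hypothetical and only meant to illustrate the mechanism.

```python
from collections import deque

class PullSystem:
    """A minimal WIP-limited workflow: new work enters only when capacity is free."""

    def __init__(self, wip_limit, backlog):
        self.wip_limit = wip_limit
        self.backlog = deque(backlog)  # items waiting to be pulled, highest priority first
        self.in_progress = []
        self.done = []

    def pull(self):
        """Start the next backlog item, or return None if the WIP limit is reached."""
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None
        item = self.backlog.popleft()
        self.in_progress.append(item)
        return item

    def finish(self, item):
        """Finishing frees a slot, and only then is the next item pulled."""
        self.in_progress.remove(item)
        self.done.append(item)
        return self.pull()

system = PullSystem(wip_limit=2, backlog=["A", "B", "C", "D"])
print(system.pull(), system.pull(), system.pull())  # A B None: the limit is reached
print(system.finish("A"))                           # C: finishing A lets C in
```

Note that the only way to get new work started is through `finish`, which is "Stop Starting, Start Finishing" expressed in code.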

Pure Pull Systems?

It's worth mentioning that pure Pull Systems probably do not exist. As Don Reinertsen points out in his brilliant presentation The Science of WIP Constraints, even the leanest system has a push-pull boundary, meaning that the pull mechanism only starts after a certain process step. Before this step, work is pushed into the system. Even at Toyota, there is a minimum of planning and buffering involved: they don't melt new steel for every new car.
What's probably more important for knowledge work is the fact that we want to achieve as much pull as possible in our system, but we also want our system to be able to absorb some push. That sounds strange, but it enables us to cope with major unforeseen events. In Kanban lingo, most expedite tickets will be pushed into the system. Ideally, our system is under-utilized, so that it provides spare capacity to deal with this extra work. But even if it does not, we might be willing to accept the push, because the cost of waiting for a free slot would be much higher than the cost of temporarily overburdening the system. But that should be discussed further in a separate blog post...
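The trade-off described above can be expressed as a simple admission rule. This Python sketch is my own simplification, not an established formula: regular work only enters when there is spare capacity, and an expedite item may be pushed into a full system if the estimated cost of waiting for a free slot exceeds the estimated cost of temporarily overburdening the system. All cost figures are hypothetical estimates.

```python
def admit(current_wip, wip_limit, expedite=False,
          cost_of_delay_per_day=0.0, days_until_free_slot=0.0, cost_of_overburdening=0.0):
    """Decide whether a new item may enter the system."""
    if current_wip < wip_limit:
        return True  # spare capacity: the item is pulled as usual
    if not expedite:
        return False  # regular work waits for a free slot
    # Expedite item pushed into a full system: accept it only if waiting
    # for a free slot would cost more than temporarily overburdening the system.
    return cost_of_delay_per_day * days_until_free_slot > cost_of_overburdening

print(admit(4, 5))  # True: there is a free slot
print(admit(5, 5))  # False: regular work has to wait
print(admit(5, 5, expedite=True, cost_of_delay_per_day=2000,
            days_until_free_slot=4, cost_of_overburdening=3000))  # True: 8000 > 3000
```

In practice these numbers are rough guesses at best; the point is only that accepting a push is a conscious economic decision, not the default.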


Like this post? Then you should check out my previous post Keep the Ball Rollin'

Tuesday, February 7, 2017

Keep the Ball Rollin'

This is a translated (and slightly modified) version of my article "Der Ball muss rollen!", which was published online at Projektmagazin in May 2014. If you prefer German, you can find the original article here.

Yet another sports analogy

I was once at a company where every conference room was soccer-themed. Not only were the rooms named after famous soccer clubs, but they were also decorated with "devotional objects" like jerseys, balls, pennants, etc. Of course, you would also find those funny quotes like "Soccer is like chess, only without the dice" all over the place.
I recall one meeting where people were vividly discussing how effective software development teams should be set up. Probably influenced by the environment, I started thinking about soccer teams and what we could learn from them. Okay, sports analogies are not exactly new in software development, but let's give it a shot...

How (not) to manage a soccer team

Let's start with the composition of a soccer team. We'll find a goalie, defenders, midfielders and, of course, the strikers. This looks trivial at first glance, because in order to win a match, there are "tasks" that need to be done in each of these areas. And obviously, a goalie needs completely different skills than a striker.
But wait a minute! If we think about it a little harder, we'll find an incredible waste of resources here! Even if midfielders support the defense every now and then and strikers fall back occasionally, it's very clear that all players are extremely poorly utilized! What's the ball possession of an average striker? Two to three minutes? And if we look at the goalie, it's even worse! It looks like he's just standing there waiting for something to happen at least 99.9% of the time. Now think about the ridiculously high salaries of professional soccer players, and you'll probably be close to a nervous breakdown.
Needless to say, we as experienced project managers instantly understand the disastrous management we're dealing with here! If a striker is only needed, let's say, 30% of the time (we're even factoring in things like zone defense here), then why not have him play three matches in parallel? We would still have a 10% buffer if something goes wrong. Looking at the goalie, we'll find even more to optimize, because he's "needed" even less often. So he could easily play ten matches simultaneously. That's good, because our amateur teams are desperately looking for a better goalie...
Now let's take a quick look at the substitutes' bench! Here we'll find plenty of great resources that are not utilized at all. They are paid for doing nothing! So let's reduce the number of substitutes dramatically. How many of them do we really need? Three should be more than enough. All the others could be used much better elsewhere: they could play in other teams, train our junior teams, sign autographs at the mall, etc.

How (not) to manage a project

All this is, obviously, nonsense! Nobody would even consider optimizing a soccer team in the way just described. But why not? Why exactly is it common in professional soccer teams to under-utilize extremely expensive "resources", while the very idea of doing the same thing in knowledge work scares us to death? One major difference is that in soccer it's really easy to see the damage done when a player is not at the right spot at the right time: the other team scores! In professional soccer, the difference between scoring a goal and allowing the other team to score is probably worth a 6- or 7-digit number. Given this order of magnitude, who cares if a player is not fully utilized?
And a second point comes into play: In soccer, it's clear to everyone that what the coach can do is train the team and provide them with a viable strategy. What he cannot do, though, is come up with a detailed plan for the whole match, or even for the first ten minutes. If this worked, we could indeed create plans for every individual player, and we could even have them play several matches simultaneously. The plans would then read something like: "At 15:53 pass the ball to player 8... At 15:57 prevent the ball from being lost on field 3... At 15:59 score a goal on field 2..." Of course, it's ridiculous to even try this, because we know a soccer match is far too complex to plan at this level of detail (1). We all know that not everything goes according to plan in a soccer match, and we have no chance of predicting what the opposing players will do at any given time.

Comparing projects to soccer?

Let's summarize what we've got so far: When it comes to professional soccer, we've long accepted the fact that the high degree of uncertainty makes it useless to come up with detailed plans upfront. In addition, it's relatively easy to assess the risk of a player not being at the right spot at the right time. This enables us to make reasonable trade-off decisions: How high are the costs of under-utilized players compared to the cost of a delay, because the team has to wait for a player who's not ready to take over the ball or block an opponent? In soccer, this cost of delay is so enormous that dramatic under-utilization is accepted even for players who earn millions of euros.

If we keep this in mind, perhaps the comparison to project work is not that absurd after all! Just as in soccer, in many projects we have to deal with great uncertainty that often renders our beautiful plans useless. Also, in project work, the cost of under-utilization and the cost of delay are factors that should be taken into consideration (2).

A fresh view on resource planning

Just to be clear: under-utilization of people and machines does matter, because it can lead to lost opportunities: When a highly skilled expert is idle, she might be doing something of great value elsewhere instead. That's why it seems totally normal to us to come up with plans that make sure this person will never be idle. What we ignore, though, is the fact that costs also occur when a project (or even a supposedly small milestone) is blocked because an expert is not instantly available.
If we had more clarity on these two different types of costs, we would certainly make different decisions, and our resource planning would appear in a different light. For instance, in some contexts it might now make a lot of sense to build and keep stable, cross-functional product teams, consisting of developers, testers, analysts and designers. There might be times when a designer or a tester is not fully utilized. But when she is needed, she will be there to help the team immediately (just like the soccer player when a ball is passed to him). This is a major advantage and solves a lot of problems we witness in our daily project work: low quality due to frequent context switches; rework due to long feedback loops; and poor transparency on our project's progress, because all work packages are "80% done", just to name a few.
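Putting both types of costs side by side could look like the following Python sketch. It is a deliberately crude model with invented figures: the cost of idle time is treated as a lost opportunity per idle hour, and the cost of delay as a fixed amount per day the team is blocked.

```python
def total_cost(idle_hours, opportunity_cost_per_idle_hour, delay_days, cost_of_delay_per_day):
    """Cost of under-utilization plus cost of delay for one staffing option."""
    return idle_hours * opportunity_cost_per_idle_hour + delay_days * cost_of_delay_per_day

# Hypothetical figures for one month:
# a tester dedicated to the team is idle 40 hours but never blocks the team;
# a tester shared across projects is never idle but blocks the team for 10 days.
dedicated = total_cost(idle_hours=40, opportunity_cost_per_idle_hour=80,
                       delay_days=0, cost_of_delay_per_day=5000)
shared = total_cost(idle_hours=0, opportunity_cost_per_idle_hour=80,
                    delay_days=10, cost_of_delay_per_day=5000)
print(dedicated, shared)  # 3200 50000
```

With other numbers the comparison could easily tip the other way; the point is that the decision can only be made sensibly once both costs are on the table.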
It's true in project work as much as in soccer: We must keep the ball rolling (3)! If we manage to do this, it matters much less how many players are moving at what pace. This, by the way, is the difference between resource efficiency and flow efficiency: When we focus on resource efficiency, we make sure that everyone is busy; when we focus on flow efficiency, we make sure that we make progress on the most important tasks at any time. For several decades now, we have only taken resource efficiency into account. It's time to give flow efficiency priority now (4)!
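Both measures are simple ratios. Resource efficiency is the share of available time a person (or machine) is busy; flow efficiency is the share of a work item's lead time during which someone is actually working on it. A minimal Python sketch with made-up numbers shows how both can diverge:

```python
def resource_efficiency(busy_hours, available_hours):
    """Share of available time a person (or machine) is busy."""
    return busy_hours / available_hours

def flow_efficiency(active_days, lead_time_days):
    """Share of a work item's lead time during which someone is actually working on it."""
    return active_days / lead_time_days

# Hypothetical: everyone is 95% busy, yet a ticket is worked on for only
# 3 of the 20 days it spends in the system because it keeps waiting for someone.
print(resource_efficiency(38, 40))  # 0.95
print(flow_efficiency(3, 20))       # 0.15
```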

P.S. I've just learned that there's an old song called "Keep the Ball Rollin'" by a band called Jay & the Techniques. Looking at the lyrics, I don't think the song has much in common with this blog post ;-)

(1) Funnily enough, I've just finished reading the book Team of Teams by General Stanley McChrystal. To illustrate one of his points, McChrystal describes how the coach of a fictional basketball team tries exactly this level of detailed planning and fails miserably, despite the fact that the team consists of the world's finest athletes.
(2) For more details on cost of delay, see the brilliant analyses of Don Reinertsen.
(3) The idea is not new, nor is explaining it with sports metaphors :-) Years ago, Don Reinertsen coined the phrase: "Watch the baton, not the runner!"
(4) Niklas Modig brilliantly illustrates the difference between resource efficiency and flow efficiency in his book This is Lean: Resolving the Efficiency Paradox and in this TED Talk.


Like this post? Then you should check out my post Utilization as a proxy and my more recent post Seriously, what is a Pull System?