The penny game considered harmful

The lean/agile penny game has become a standard way to demonstrate the benefits of small batch sizes in a production flow. However, as a demonstration of the effects of agile transformation on a software development team it is woefully inadequate.

Penny game mechanics

The penny game works like this: We set up a "production line" consisting of a sequence of workers. Batches of work are given to the first worker in the line as a set of pennies. Each penny represents a unit of work in a project. The worker takes a batch of pennies, flips each of them once to represent completing that work item, and then passes the batch to the next worker downstream. When the final worker completes a batch, these are considered to be "delivered" to the customer.

The simulation is usually run first with a large batch size, typically 20 coins. Then it is run again, this time with a batch size of 5 coins. Metrics are calculated for each run, and these usually demonstrate the superior performance of working in small batches:

Value delivered:
The total value of the coins delivered through the whole process in each run. In the simulations below this is simply the total number of coins processed by each production line. In manual penny game demonstrations, which usually last only a few minutes, this metric shows a large difference in value delivered. However, if the simulations are run for long enough, which you can do for yourself below, the marginal increase in value delivered by small-batch working becomes negligible.
Cycle time:
The time it takes one coin to pass through the entire production process. In software development this could be the time it takes us to fix a production defect, for example. In the classic penny game, cycle time is much smaller for small-batch working.
Time to first value:
The period elapsed before the process delivers any usable value to the customer. Again, this is always lower (i.e. better) for small-batch working.
Work in process:
The total amount of work that has been started, but not yet delivered. This represents the investment in the work that is currently in flight, and is also representative of the levels of frustration often felt by those outside of the team. Yet again, in the classic penny game there is much less work in process in the small-batch variant.
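
The mechanics and metrics above can be sketched in a few lines of Python. This is only a minimal model, not the code behind the simulations on this page: it assumes four workers, 20 coins in total, one clock tick per flip, and that a batch handed downstream is picked up on the following tick. The names `Batch` and `play_penny_game` are made up for this illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Batch:
    size: int            # coins in the batch
    remaining: int = 0   # coins the current worker still has to flip
    started_at: int = 0  # tick on which the first worker picked the batch up


def play_penny_game(total_coins=20, batch_size=5, workers=4):
    """Run one production line until every coin is delivered."""
    if total_coins % batch_size:
        raise ValueError("total_coins must be a multiple of batch_size")
    n_batches = total_coins // batch_size
    # queues[i] feeds worker i; the extra last queue holds delivered batches.
    queues = [deque() for _ in range(workers + 1)]
    queues[0].extend(Batch(batch_size) for _ in range(n_batches))
    in_hand = [None] * workers
    cycle_times, wip_samples = [], []
    first_value_at = None
    tick = 0
    while len(queues[-1]) < n_batches:
        tick += 1
        # Visit workers back to front, so a batch handed on during this tick
        # is not picked up downstream until the next tick.
        for w in reversed(range(workers)):
            if in_hand[w] is None and queues[w]:
                in_hand[w] = queues[w].popleft()
                in_hand[w].remaining = in_hand[w].size
                if w == 0:
                    in_hand[w].started_at = tick
            batch = in_hand[w]
            if batch is None:
                continue
            batch.remaining -= 1      # flip one coin
            if batch.remaining == 0:  # whole batch flipped: pass it on
                queues[w + 1].append(batch)
                in_hand[w] = None
                if w == workers - 1:
                    cycle_times.append(tick - batch.started_at + 1)
                    if first_value_at is None:
                        first_value_at = tick
        # Work in process: coins that have been started but not delivered.
        started = [b for b in in_hand if b is not None]
        started += [b for q in queues[1:-1] for b in q]
        wip_samples.append(sum(b.size for b in started))
    return {
        "ticks": tick,
        "time_to_first_value": first_value_at,
        "mean_cycle_time": sum(cycle_times) / len(cycle_times),
        "mean_wip": sum(wip_samples) / len(wip_samples),
    }


if __name__ == "__main__":
    for size in (20, 5):
        print(size, play_penny_game(total_coins=20, batch_size=size))
```

Under these assumptions the 20-coin batch reaches the customer at tick 80, while the first 5-coin batch arrives at tick 20 and the last at tick 35. Note that value delivered is the same 20 coins in both runs; only the timing and the work in process differ.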

Criticisms

The penny game originated in manufacturing as a demonstration of the power of working in small batches. However, the game is often used to make the case for adopting agile working in software development projects. Unfortunately the game, as played in manual demonstrations, lacks many of the true features of software development:

Task size
In the penny game every worker takes roughly the same amount of time to flip each coin. But in real software development, there can be a huge difference between the amount of time required to write a story and the amount of time required to develop it. This means that, regardless of the batch size, analysed work is likely to pile up waiting to be developed. One antidote to this effect is to give the developers training and support in the XP practices, such as YAGNI, Merciless Refactoring and Continuous Delivery; another is to create effective pull systems, so that work is only done (by anyone) when there is a customer demand for it.
Coupling
As the codebase grows, coupling between logically unrelated areas also grows. This coupling exerts a braking effect on the team, causing tasks to take longer and longer over the course of the project. Usually this also means that many tasks become un-releasable on their own, so the effective batch size grows continually.
Defects
Any defects found by users or in downstream testing will disrupt later development tasks. The cost of interruption is high. (This is not currently represented in the simulation below.)
Multi-tasking
Developers often work on several tasks "simultaneously", or can be taken off development to work on the analysis of future tasks, or to attend meetings. All of these distractions increase the cost of developing any task, due to the high spin-up time required to get back in the zone after any interruption of more than a few minutes. (This is not currently represented in the simulation below.)
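
To put rough numbers on the Task size criticism, here is a back-of-the-envelope sketch. It assumes a single analyst and a single developer, fixed times per story, and that development is the slower step; the function name and the default figures (one tick to analyse a story, three to develop it) are invented for illustration only.

```python
def analysed_backlog(ticks, analysis_ticks_per_story=1, dev_ticks_per_story=3):
    """Stories that have been analysed but not yet developed after `ticks`."""
    analysed = ticks // analysis_ticks_per_story
    # Development cannot start until the first story has been analysed.
    dev_time = max(0, ticks - analysis_ticks_per_story)
    developed = min(analysed, dev_time // dev_ticks_per_story)
    return analysed - developed


if __name__ == "__main__":
    for t in (30, 60, 120):
        print(t, analysed_backlog(t))
```

With these made-up figures the pile of analysed work keeps growing for as long as the project runs, whatever batch size is used to hand it over. When both steps take one tick, as in the classic penny game, the backlog stays at a single story.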

The simulation

The simulation below consists of three production lines. Each line executes the same process in exactly the same way, but each has been configured with parameter values that represent different software development processes:

"Waterfall"
This is the classic penny game played with a batch size of 20 coins. It is usually intended to represent a "waterfall" project lifecycle, although as a representation of software development projects it lacks verisimilitude.
"Agile"
This is the classic penny game played with a batch size of 5 coins. It is usually intended to represent an "agile" project lifecycle. The Criticisms above show why I believe this to be naive.
"Scrum"
This production line is configured to reflect what often happens in "agile adoption" projects. If the developers are not given the same amount of training and long-term support that the managers and analysts are given, they will continue to use software development practices that are most appropriate to large-batch processes.

The production line configuration parameters are:

Batch size
The batch size used by all workers in the project, except for Development.
Development batch size
The initial batch size used by Development. This is intended to reflect the fact that many development teams find it difficult to work in small increments, without the up-front design to which they are accustomed.
Development batch increment
The amount by which the development batch size grows with each iteration. Intended to reflect the effects of coupling and legacy code. Paradoxically, this effect is often magnified by asking the developers to forgo up-front design without supporting them in adopting something like the XP practices.
Development task size
The number of clock ticks required to complete any development task. Reflects the fact that creating production software often takes longer than writing requirements. Also (for now) reflects the time lost due to interruptions such as defects.
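
As a rough sketch, the four parameters could be held in a small structure like the one below. The names are hypothetical, and so are most of the values: only the batch sizes of 20 and 5 come from the descriptions above. The "Waterfall" and "Agile" rows assume the classic game has no development handicap, the "Scrum" numbers are invented purely for illustration, and the additive growth in `dev_batch_size_at` is one plausible reading of "grows with each iteration".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineConfig:
    batch_size: int           # used by every worker except Development
    dev_batch_size: int       # initial batch size used by Development
    dev_batch_increment: int  # growth in Development's batch per iteration
    dev_task_size: int        # clock ticks per development task

    def dev_batch_size_at(self, iteration):
        """Development batch size after `iteration` iterations (assumed additive)."""
        return self.dev_batch_size + iteration * self.dev_batch_increment

    def dev_ticks_for_batch(self, iteration):
        """Ticks Development needs to finish one batch at that iteration."""
        return self.dev_batch_size_at(iteration) * self.dev_task_size


LINES = {
    "Waterfall": LineConfig(batch_size=20, dev_batch_size=20,
                            dev_batch_increment=0, dev_task_size=1),
    "Agile": LineConfig(batch_size=5, dev_batch_size=5,
                        dev_batch_increment=0, dev_task_size=1),
    # Illustrative values only: not the settings used by the simulation.
    "Scrum": LineConfig(batch_size=5, dev_batch_size=10,
                        dev_batch_increment=2, dev_task_size=3),
}

if __name__ == "__main__":
    for name, cfg in LINES.items():
        print(name, [cfg.dev_ticks_for_batch(i) for i in range(4)])
```

Even with modest invented numbers, the time Development needs per batch in the "Scrum" row climbs with every iteration, while the other two lines stay flat.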

Future additions

  1. I hope to soon add controls that will allow you to change the configurations as you run these simulations. This will allow you to test the impact of fixing the various problems affecting the "Scrum" development team.
  2. I also plan to add a factor for multi-tasking, which is not yet represented in the simulation.
  3. I may also add a "defect rate", sending coins backwards from testing into development.
  4. I would also like to add graphs, so that you can compare the way the various metrics of the three simulations change over time.

To run the simulations, press the play button below. You can pause at any time, or use the reset button to start again from an empty project.

Enjoy!