Do we learn from performance failures?

In the November issue of IEEE Computer, Bob Colwell’s column on Books Engineers Should Read caught my eye and led me to add several titles to my reading list. It also made me think about the contrasting messages of Vincenti’s What Engineers Know and of better-known books such as Henry Petroski’s To Engineer Is Human and Charles Perrow’s Normal Accidents.

The main theme of Colwell’s column is that we can learn from design failures: from the collapse of the Tacoma Narrows Bridge, from the Challenger shuttle disaster, from Three Mile Island. I can’t disagree, for I’ve been strongly influenced by Petroski’s and Perrow’s books, as well as by Peter Neumann’s Risks Digest. But I’ve been influenced even more by Vincenti’s book, which is much less concerned with design failures and much more with the research that enables engineers to set and meet design requirements.

One of Vincenti’s case studies, for example, describes how the NACA’s researchers in the 1930s studied the stability of aircraft in level flight. This led them to discover how stability could be achieved by designing the plane’s controls to require a certain stick force per g of acceleration. From then onwards it became relatively easy for engineers to design planes with the stability that their customers wanted.

Are engineers worried only about avoiding catastrophic failures? I think not. More often, I believe, they’re worried about improving on last year’s product, about outdoing their competitors, about achieving the performance and cost targets they’ve taken on. And that, I believe, is why the researchers in What Engineers Know focused on discovering what performance criteria were important and how to meet them – how to avoid performance failures.

There is little doubt that software systems often fail to deliver the performance their users need. The most celebrated case is the so-called Project Ernestine, in which the redesign of a phone operator’s workstation was found to increase the time operators took to handle calls. It’s celebrated not because of the size of the failure (the increase was only 3.4 percent) but because it was made public and was confirmed by an elegant cognitive model. The project showed not only that performance failures were occurring but also that HCI could help prevent them.

There haven’t been any more cases quite like Project Ernestine, and I believe the reason is simple: few of us know how to measure the performance of the systems we use. As a result, we can’t tell whether they’re getting better or getting worse. Or at least we can’t tell unless the difference is enormous, as in the case of a Windows-based system that British Telecom installed for its customer service staff; it was so slow that the staff would switch back to the previous DOS-based system to deal with any complex request. When systems get slightly worse, nobody can really tell.

After reading What Engineers Know and hearing about Project Ernestine, I began to see a connection. Engineers know what to measure; designers of interactive systems don’t. It’s as simple as that.

Posted December 6th, 2005