Recently I was asked to prepare a case study of the Apollo system for intelligence analysis, and the ongoing efforts to test its effectiveness. (Apollo is developed by IDI and HumRRO; I'm just reviewing it.) Apollo uses Bayesian networks to overcome known biases in human reasoning that affect intelligence analysis and other forms of argument and inference. The testing is going slowly, so I concentrated on the method and history, arguing that Apollo picks up where some promising work from around 1970 left off.
Comments welcome: [Slides] | [Draft Paper]
Update: I got to speak with Dave Schum recently (late October). I'd been wondering why Zlotnick et al. became interested in Bayes when they did, and what happened. It seems I missed a lot of back story. I'm working on it.
Background
The paper was one of several commissioned for the two-day National Research Council Workshop on Field Evaluation of Behavioral and Cognitive Sciences-Based Methods and Tools for Intelligence and Counterintelligence.
Apollo was developed by IDI and HumRRO, and combines software (mostly Netica), a modeling methodology, and an optional psychology subnet. There are a couple of papers on Apollo, such as this one. What is most remarkable, however, is that Apollo is being tested rather than just used.
To be clear: I had nothing to do with developing Apollo. I presume I was asked to do the review in part because I had reviewed a study design for a proposed test of Apollo a couple of years ago, when I was at IET, and in part because a committee member knew that I understood both Bayes nets and experimental design.
Executive Summary
Apollo builds Bayesian network models for predicting specific actions by foreign leaders. It combines software with a facilitated method. A modern version of work done by Zlotnick, Schweitzer, and Fisk in the 1970s, Apollo benefits both from ubiquitous computing power and the advent of graphical models – the “network” in “Bayesian network”. These advances allow the models to account for dependencies between variables, enabling much richer models, including a submodel for leader personality. The goal is to improve the accuracy (and calibration) of the predictions, by addressing some known biases in human reasoning.
The Apollo team is running experiments to measure the performance gain (or loss), but it has been difficult to get enough analysts for enough time. A randomized controlled trial was infeasible, so the authors have adopted a pre-test / post-test design. The following suggestions may enhance the usefulness of these experiments to the Intelligence community:
- Also compare the gains produced by merely averaging the analysts’ probability judgments.
- Run similar experiments with other methods like ACH, Argument Maps, and Prediction Markets, and compare the gains.
- Run tests in other settings where access is easier. For example, use similar but unclassified prediction problems, including business, economic, or political forecasting. This could be done at the intelligence universities and also in academe.
- Adopt continuous measurement. Individual analysts can keep their own running tallies, and eventually agencies can formally track forecasting performance to improve production methods.
Similar suggestions can be found in Johnston’s recent ethnography, and indeed, traced back to Sherman Kent.
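The averaging baseline and the running tally are easy to try on a small scale. Here is a minimal sketch in Python, assuming yes/no questions scored with the Brier score (mean squared error between the stated probability and the 0/1 outcome, lower is better). None of this comes from Apollo: the analyst names, the forecasts, and the outcomes are made up for illustration, and the Brier score is just one common choice of scoring rule.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


def average_forecasts(analyst_forecasts):
    """Unweighted per-question average of all analysts' probabilities."""
    return [mean(ps) for ps in zip(*analyst_forecasts.values())]


# Hypothetical data: three analysts, four resolved yes/no questions.
outcomes = [1, 0, 1, 0]
analyst_forecasts = {
    "analyst_a": [0.9, 0.4, 0.6, 0.2],
    "analyst_b": [0.6, 0.1, 0.8, 0.5],
    "analyst_c": [0.7, 0.3, 0.5, 0.1],
}

# Each analyst's running tally, and the tally for the simple average.
individual_scores = {
    name: brier_score(ps, outcomes) for name, ps in analyst_forecasts.items()
}
pooled_score = brier_score(average_forecasts(analyst_forecasts), outcomes)

for name, score in individual_scores.items():
    print(f"{name}: {score:.4f}")
print(f"simple average: {pooled_score:.4f}")
```

Because squared error is convex, the averaged forecast can never score worse than the mean of the individual scores, which is why it makes a useful baseline: a structured method should beat it, not just beat the typical analyst.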
There is broad sentiment that structured methods may be fine in theory, but that for some reason they can’t, don’t, or won’t work in daily use. I am partially sympathetic. Although there are many reasons to think something like Apollo must work, adapting theory to practice often hits unforeseen complications. For example, although structured methods are designed to avoid “hypothesis lock”, one can imagine a situation where they backfire: once the alternatives are fixed in the model, changing it on the fly becomes too difficult, so analysts lose the ability to add alternatives. Regardless of the merits of this example, the point is that many such hidden effects will likely reduce the field effectiveness of good lab techniques. That is why we must test on practical problems.
The Apollo team is to be commended for their diligent effort to test the method within the Intelligence Community. That community would be well-served to support such endeavors, internally or externally.
But if that is to happen, we have to address the deep resistance to structured methods generally, and experiments specifically. Johnston documented several reasons for resisting structure, from time pressure to analytic culture. These forces are real, they exist for a reason, and to neglect them is to fail. If we wish Kent’s “poets” to collaborate with the “mathematicians”, then we “mathematicians” had better take the time to understand the cognitive, social, and organizational objections.