---
id:
aliases: []
title: "Judgmental Decomposition: When Does It Work?"
tags:
  - authorship/other
  - exclude-from-word-count
  - type/media/article
dg-publish: false
---
# Judgmental Decomposition: When Does It Work?

Published in _International Journal of Forecasting_, 10 (1994), 495-506

Donald G. MacGregor _Decision Research, Eugene, OR_

J. Scott Armstrong _The Wharton School, University of Pennsylvania, Philadelphia, PA_

## Abstract

We hypothesized that multiplicative decomposition
would improve accuracy only in certain conditions.
In particular, we expected it to help for problems
involving extreme and uncertain values.
We first reanalyzed results from two published studies.
Decomposition improved accuracy for nine problems
that involved extreme and uncertain values,
but for six problems with target values that were not extreme and uncertain,
decomposition was not more accurate.
Next, we conducted experiments involving 10 problems with 280 subjects making 1078 estimates.
As hypothesized, decomposition improved accuracy
when the problem involved the estimation of extreme and uncertain values.
Otherwise, decomposition often produced less accurate predictions.

Keywords: Decision Analysis; Estimation; Extreme Values;
Forecasting; Multiplicative Decomposition; Uncertainty

## 1. Introduction

Consider the following question:
What is the estimated yearly circulation of a proposed new magazine on raising exotic animals?
People are likely to respond that they have no idea. But do they?
What are they likely to say if asked whether the number was greater than 100 million?
Would they say that it is less than 1000?
Most likely, people would say that the true value is somewhere between these two values.
Obviously, they know more than they think they do when first asked.

How well a person is able to forecast a quantity
is related to the relevant information that they have at their disposal,
either from information sources or from experts.
It is also a function of whether they can break the problem into parts
so that they can use their information effectively.
Forecasters frequently break a problem into parts,
make forecasts from each part,
then recombine the separate forecasts to make a forecast of the target value.
Howard Raiffa (1968) claimed that such a procedure,
_decomposition,_ is 'the spirit of decision analysis.'
Since then, research has seemed to support the view
that decomposition is a useful strategy
with wide applicability and little risk.

Prior literature on judgmental decomposition
(Armstrong et al., 1975, and MacGregor et al., 1988)
concluded that decomposition would be especially effective
for problems involving uncertain values.
However, we do not know much about the conditions
under which judgmental decomposition is most useful.
Armstrong et al. (1975) had suggested that the scale of the problem
might make further study worthwhile,
and our paper addresses that issue.
In examining the problem, we reanalyzed results from two studies.
In addition, we conducted experiments with new subjects.
We also examined alternative approaches for assessing uncertainty
to determine whether they would yield different recommendations
about when decomposition is appropriate.

## 2. Hypotheses

The basic idea behind decomposition is simple.
Given a target quantity that is difficult to estimate,
one breaks the problem down into subproblems that are easier to estimate.
The difficulty lies in translating this idea into practice.
For decomposition to be done successfully, certain conditions are desirable.
First, the target value should be _one_ that is difficult to estimate.
Second, estimation errors for each part should be less, relatively speaking,
than the errors for estimating the target value.
Third, estimation errors for the parts
should not have strong positive correlations between one another.
Negatively correlated errors are desirable so that one has offsetting errors.
These conditions are not easy to specify in operational terms.

Traditionally, the term decomposition
has been used to refer to the practice
of breaking a problem into multiplicative elements.
An additive breakdown is usually referred to as _disaggregation_ or _segmentation_.
Our paper is restricted to multiplicative decomposition
and we use the term decomposition to refer to this.

Decomposition is often viewed as a safe strategy.
Rather than putting all of one's eggs into a single basket, estimates are provided separately.
Errors in one element may compensate for errors in another.
However, when errors are positively correlated, they can be explosive.
For example, if the errors in two components are in the same direction and each equals 20%,
this would translate into an error of 44% in the target value (1.2 × 1.2 = 1.44).
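
The arithmetic above can be sketched in a few lines. This is only an illustration of how relative errors compound under multiplication; the function name `compounded_error` and the offsetting case are illustrative assumptions, not part of the original analysis.

```python
def compounded_error(component_errors):
    """Relative error of a product whose factors carry the given relative errors."""
    product = 1.0
    for error in component_errors:
        product *= 1.0 + error
    return product - 1.0


# Two components, each 20% too high (errors in the same direction).
same_direction = compounded_error([0.20, 0.20])

# One component 20% too high, the other 20% too low (offsetting errors).
offsetting = compounded_error([0.20, -0.20])

print(f"same direction: {same_direction:.0%}")  # 44%
print(f"offsetting:     {offsetting:.0%}")      # -4%
```

With offsetting errors the product is off by only about 4%, which is why negatively correlated component errors are desirable.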

Target values with extreme values
are likely to create difficulties for subjects
unless these numbers are well known.
For very large numbers, people might make estimates that are too small.
Lacking good intuition,
an estimator might assign a 'more reasonable number' to a quantity in question.
We would expect the converse for very small numbers, such as 'one in 10 million.'

We hypothesized that decomposition would improve accuracy for problems with extreme values
when subjects were highly uncertain about the target value.
The reasoning is simply that large numbers are confusing to many people.
With decomposition, the analyst might be able to avoid the extreme numbers associated with high uncertainty.
Uncertainty is an important aspect of this hypothesis.
Thus, we do not expect that decomposition would help to estimate well known numbers,
such as the distance from the Earth to the sun
(when most of the experts believe that the distance is about 93 million miles).

The operational definition of an extreme value is difficult to determine.
To provide a simple measure of an extreme value,
we initially defined it as any number having more than seven digits
(equal to or greater than 10 million).
Certainly, many people have difficulty grasping numbers of this magnitude.
For example, a book has been written with the sole purpose
of helping people to understand the magnitude of one million.
It consists of one million dots with comparisons at various points
where examples are given (Hertzberg, 1970).[^1]
Psychologists also refer to the ability of the human mind
to handle only seven things (plus or minus two).

[^1]: As an example of how difficult it is to think about extreme numbers, consider the following.
    A typographical error was made in Armstrong et al. (1975).
    The number of cards saying "Carefree Sugarless Gum"
    that were sent to a Philadelphia radio station
    was reported as 66.5 billion rather than the correct value,
    which was 66.5 million.
    We missed this in proofreading,
    and the number has subsequently been cited in other papers
    without any questions being raised.

The selection of the unit of measure causes problems.
For example, one could change the units from miles to inches
when asking someone to estimate the distance from New York to San Francisco.
However, some important quantities are not amenable,
either conceptually or computationally, to changes in scale.

We were also concerned about how best to assess uncertainty.
In particular, would different approaches lead to different conclusions
about when to use decomposition?

## 3. Reanalysis of prior studies

In an early study of judgmental decomposition,
Armstrong et al. (1975) concluded that multiplicative decomposition typically improves accuracy
and is unlikely to reduce accuracy.
The study involved such problems as estimating the number of packs of Polaroid film
that were used in the United States in 1970.
The results also supported the hypothesis that decomposition is especially useful for problems
where the estimator's perceived uncertainty about the true value is high.
A subsequent study by MacGregor et al. (1988)
also found that judgmental decomposition improves accuracy.
That study used similar problems, for example,
estimating the value of imported passenger cars sold in the U.S. the previous year.

Armstrong et al. (1975) examined uncertainty
by asking 151 subjects to rank problems according to the confidence
that they had in their ability to provide accurate answers.
MacGregor et al. (1988) addressed the same issue
by using the variability among 45 subjects in their estimates for each target value.
Specifically, they focused on the interquartile range.
The interquartile range represents the middle 50% of a distribution
and is calculated as the difference between
the point at the 75th percentile of the distribution (Q3)
and the point at the 25th percentile (Q1);
the median of the distribution is at the 50th percentile (Q2).
We expected that problems with extreme unknown values
would create uncertainty among estimators
and would therefore show up in the interquartile range.
We examined this hypothesis by comparing the number of digits
in each of the 16 problems in MacGregor et al. with the interquartile range of error ratios.
As expected, the number of digits was related to uncertainty.
The correlation between the number of digits in the actual values for each problem
and the corresponding interquartile range was about +0.75.

To examine whether decomposition improved accuracy for problems involving extreme unknown numbers,
we split the MacGregor et al. data according to magnitude and disagreement.
This yielded five problems where scale was not extreme
(using seven or fewer digits gave a roughly equal breakdown of the problems)
and where assessors were in agreement
(we used an interquartile range with a log<sub>10</sub> of 1.3 or less,
which means that the ratio between the lowest quartile and the highest quartile is less than two).
The five problems were the numbers of physicians, marriages,
alcoholics, university employees and hospital employees.
Six problems had extreme magnitude (over seven digits)
and high disagreement among estimators
(interquartile range of 1.75 or more,
implying a ratio of 5.6 of the largest to smallest quartile).
These problems involved the numbers of welfare cases, imported cars,
alcohol dollars, mail handled by post offices, gasoline and cigarettes.

We estimated the average improvement for decomposition in MacGregor et al. in two steps. First, geometric mean estimates were calculated for the group of subjects who used the decomposed version (this being the computed full algorithm from Table 6 in MacGregor et al.) and for those who used the global version. These estimates were then compared with the actual values for each problem.

### Table 1 Decomposition versus global errors: reanalysis of prior studies

| Conditions                     | Number of problems | Median error ratios |                   |                     |
| ------------------------------ | ------------------ | ------------------- | ----------------- | ------------------- |
|                                |                    | **Global**          | **Decomposition** | **Error reduction** |
|                                |                    |                     |                   |                     |
| _Not extreme, low uncertainty_ |                    |                     |                   |                     |
| MacGregor et al.               | 5                  | 1.8                 | 2.3               | -0.5                |
| Armstrong et al.               | 1                  | 5.4                 | 2.3               | 2.1                 |
|                                |                    |                     |                   |                     |
| _Extreme, high uncertainty_    |                    |                     |                   |                     |
| MacGregor et al.               | 6                  | 99.3                | 3.0               | 96.3                |
| Armstrong et al.               | 3                  | 18.0                | 5.7               | 12.3                |

Decomposition errors were smaller than global errors for each of the six problems where disagreement (interquartile range) was high and the actual values were extreme. Subjects who made global estimates were in error by a factor of 99.3 (9930%) on average. In contrast, the error ratio for the decomposed version for the same six problems was 3.0, or 300%.[^2] Thus, the median error was reduced by a factor of 96.3 (see bottom part of Table 1). For problems without extreme values and where disagreement was low, decomposition yielded less accurate estimates, as its error was 50% higher than that for the global approach. Table 1 summarizes these results.

[^2]: We calculated the geometric means of the two error ratios
    in the middle of the distribution for the global and decompositional conditions.

We did a similar analysis for the problems in Armstrong et al. (1975). Here, the analysis was based on individuals rather than groups. Error ratios were calculated for each subject's estimate for each problem by comparing their estimates with the actual values. The median error ratio was then obtained for each problem. Decomposition produced substantial gains (1230% error reduction) for the three extreme problems with the highest uncertainty.[^3] Decomposition also provided a lesser improvement for the one problem that did not involve an extreme number. Table 1 summarizes these results as well.

[^3]: We used the median error ratios across the new groups of subjects that were tested for the film and tobacco problems.
    Only two groups did the Contest problem, and here we used the geometric mean.

Averaging across the two studies (weighting according to the number of questions),
decomposition reduced error by a ratio of 68.3 for the nine problems involving extreme uncertain values.
However, decomposition had no overall effect for the other six problems.
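
The weighted averages can be checked with a short sketch. The problem counts and error reductions are those reported in Table 1; the helper name `weighted_mean` is an illustrative assumption.

```python
# (number of problems, error reduction) as reported in Table 1
extreme = [(6, 96.3), (3, 12.3)]     # MacGregor et al., Armstrong et al.
not_extreme = [(5, -0.5), (1, 2.1)]  # MacGregor et al., Armstrong et al.


def weighted_mean(rows):
    """Average the error reductions, weighting each study by its number of problems."""
    total_problems = sum(n for n, _ in rows)
    return sum(n * value for n, value in rows) / total_problems


print(round(weighted_mean(extreme), 1))      # 68.3
print(round(weighted_mean(not_extreme), 2))  # -0.07
```

The second figure is close to zero, consistent with the statement that decomposition had no overall effect for the six problems that were not extreme.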

## 4. An experiment on the effects of extreme uncertain values

We conducted an experiment to provide further evidence
on the effects of multiplicative decomposition
when applied to problems with extreme uncertain numbers.
This section describes the problems and the subjects.

### 4.1. Problems

We selected problems in which the magnitude of unknown numbers to be estimated varied.
Our extreme problems had seven or more digits, ranging in value from 3,540,940 to 4,243,000,000.
As noted earlier, this definition of _extreme_ is somewhat arbitrary.[^4]
Not extreme numbers in this set of problems had four digits or fewer,
in order to provide a marked distinction from extreme numbers.
Table 2 provides the 10 problems, along with the correct answers taken from almanacs and fact books.

[^4]: After analyzing the prior research (Table 1),
    we revised our definition of extreme for this study
    from 'more than seven digits' to 'seven or more digits.'
    Extremity could also be defined in terms of small numbers.
    An example would be,
    'What is the chance that a person in the U.S. will die next year because of botulism?'
    (The answer is 1/100,000,000.)

#### Table 2 Problems and magnitudes: actual versus estimated

%% TODO %%

All questions relate to the U.S. unless stated otherwise.

Because actual values would not be known to the subjects,
we first determined whether it would be possible
to identify problems that might involve extreme values.
We reasoned that typical subjects would not do well at such estimates.
Thus, we used the geometric mean of the upper quartile (top 25%) of the estimates.
That is, if the upper quartile of subjects expected this to be an extreme number,
then it was treated as such.
By this measure, the expected number of digits
was a good match of the actual number of digits, as shown in Table 2.
The largest estimate for the not extreme group
was that _Argentine immigrants_ would be a five-digit number,
and the smallest estimate for the extreme problems
was that the _Circulation of TV Guide_ would be a six-digit number,
so the classification of the problems was the same.
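
The screening step described above might be sketched as follows. The estimates are hypothetical (Table 2 is not reproduced here), the rule for choosing the size of the top 25% is an assumption, and the function names are illustrative only.

```python
from statistics import geometric_mean


def upper_quartile_geometric_mean(estimates):
    """Geometric mean of the top 25% of the estimates (all must be positive)."""
    ordered = sorted(estimates)
    k = max(1, len(ordered) // 4)  # assumed rule for the size of the top quartile
    return geometric_mean(ordered[-k:])


def digit_count(value):
    """Number of digits in the integer part of a positive value."""
    return len(str(int(value)))


def expected_extreme(estimates, min_digits=7):
    """Treat a problem as extreme if the upper quartile expects min_digits or more."""
    return digit_count(upper_quartile_geometric_mean(estimates)) >= min_digits


# Hypothetical global estimates from eight subjects for two problems.
large_problem = [5_000, 20_000, 100_000, 400_000,
                 2_000_000, 8_000_000, 30_000_000, 90_000_000]
small_problem = [10, 40, 200, 900]

print(expected_extreme(large_problem))  # True
print(expected_extreme(small_problem))  # False
```

The default of seven digits follows the revised definition of _extreme_ used for this study.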

To determine whether the large target values were uncertain,
we examined the interquartile ranges.
The smallest of these ranges for the group of problems having extreme values
indicated that the upper quartile mean
was more than 10 times as large as the lower quartile mean.

For each problem we constructed a global version and a decomposed version.
Table 3 summarizes the full set of 10 decomposed algorithms.
For the sake of brevity,
only the algorithm steps requiring subjects to make component estimates are provided;
intermediate arithmetic steps are omitted.
We also asked subjects to rate their knowledge about each target value,
their expected accuracy
and the probability that their answer would be within 10% of the true value.

#### Table 3 Abbreviated descriptions of algorithms for the ten estimation problems

> ##### Argentine immigrants
>
> * Population in the U.S.
> * Proportion of population that immigrated to U.S.
> * Percentage of U.S. immigrants from Argentina
>
> ##### Circulation of TV Guide
>
> * Households in the U.S.
> * Proportion of households with a TV
> * Proportion of households with a TV receiving TV Guide
>
> ##### Circumference of 50¢ coin
>
> * Diameter in inches of a 50¢ coin
> * Number of pieces of string the length of the diameter needed to wrap around circumference
>
> ##### Bushels of wheat
>
> * Population of the world
> * Number of bushels of wheat consumed per person per year
> * Proportion of wheat wasted per year
>
> ##### Bank failures in 1933
>
> * Current population of the U.S.
> * Population of U.S. in 1933 as a proportion of current population
> * Number of customers for a typical bank
> * Proportion of banks failed in 1933
>
> ##### U.S. presidents
>
> * Number of years U.S. has had presidents
> * Number of years the average president holds office
>
> ##### Men's pants
>
> * Number of men in the U.S.
> * Number of pairs of pants the average man buys each year
> * Number of women in the U.S.
> * Number of pairs of men's pants the average woman buys each year
> * Proportion of men's pants manufactured in the U.S. that are sold to U.S. customers
>
> ##### Athletic shoes
>
> * Population of the U.S.
> * Proportion of the population that wears athletic shoes
> * Pairs of athletic shoes each wearer buys per year
> * Proportion of athletic shoes manufactured in U.S. that are sold to U.S. customers
>
> ##### Auto accidents
>
> * Number of people in the U.S. of driving age
> * Proportion of people of driving age who drive
> * Number of accidents the average driver has per year
>
> ##### Area of U.S.
>
> * Distance in miles from San Francisco, CA to Washington, D.C.
> * Distance in miles from San Diego, CA to Seattle, WA
> * Proportion of the U.S. that would fit into a rectangle with an area equal to the product of the above dimensions

For some of the problems, such as _Athletic shoes_,
one of the components involved an extreme value.
However, we were reasonably confident that subjects would know this value.
Also, data on these values are readily available so that one could insert the known value.
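
As a minimal sketch of how the component estimates in Table 3 are recombined, the following multiplies the three components of the _Auto accidents_ algorithm. The component values are hypothetical, not data from the experiment, and treating the algorithm as a pure product is an assumption, since the intermediate arithmetic steps are omitted from Table 3.

```python
import math


def recombine(components):
    """Multiply the component estimates to obtain an estimate of the target value."""
    return math.prod(components.values())


# Hypothetical component estimates from one subject (illustrative values only).
auto_accidents = {
    "people_of_driving_age": 150_000_000,
    "proportion_who_drive": 0.8,
    "accidents_per_driver_per_year": 0.1,
}

estimate = recombine(auto_accidents)
print(f"{estimate:,.0f}")  # 12,000,000
```

None of the three components requires the subject to state the eight-digit target directly, although the first component is itself a large number that could be replaced with a known value.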

### 4.2. Subjects

Subjects for the experiment were individuals
who answered advertisements in the University of Oregon daily newspaper.
The advertisements called for participation in judgment and decision-making tasks.
Two hundred and eighty individuals participated in the experiment,
which was conducted in two sessions.
Subjects were randomly assigned to either the global or the decomposition treatment.
In the first session, the problems _50¢ coin, U.S. presidents, Argentine immigrants,_
_Bank failures, Circulation of TV Guide_ and _Bushels of wheat_ were administered.
Those subjects assigned to the global treatment received all six problems.
Because of time constraints, subjects assigned to the decomposition condition received half of the problems.
In the second session, the remaining four problems were administered.
Again, subjects in the global condition received all four of the remaining problems,
while decomposition subjects received half of the problems.

## 5. Results

As had been done in previous studies of judgmental forecasting
(Armstrong et al., 1975, and MacGregor et al., 1988),
we used the error ratio as an index of accuracy.
The error ratio is computed as the ratio of the individual's estimated value to the correct answer,
or the reverse, such that the result is greater than or equal to 1.0.
Estimates for a given problem were summarized across subjects
by computing the geometric mean of the error ratios.
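
The accuracy index can be sketched as below. The estimates and the correct answer are hypothetical, and the function names are illustrative assumptions. The same ratio applied to a group's geometric mean estimate gives the group-level variant used in the reanalysis of MacGregor et al.

```python
from statistics import geometric_mean


def error_ratio(estimate, actual):
    """Ratio of estimate to actual, or the reverse, so that the result is >= 1.0."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("estimate and actual must be positive")
    return max(estimate / actual, actual / estimate)


def summarize(estimates, actual):
    """Geometric mean of the individual error ratios for one problem."""
    return geometric_mean([error_ratio(e, actual) for e in estimates])


# Hypothetical estimates from four subjects; the correct answer is 1,000.
estimates = [500, 2_000, 4_000, 1_000]
print([error_ratio(e, 1_000) for e in estimates])  # [2.0, 2.0, 4.0, 1.0]
print(round(summarize(estimates, 1_000), 2))       # 2.0
```

An estimate that is half the true value and one that is double it both receive an error ratio of 2.0, so underestimates and overestimates are penalized symmetrically.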

We had hypothesized that decomposition would improve accuracy for problems having extreme uncertain values. The results, shown in Table 4, were consistent with this hypothesis. We summarized the problems into two groups: extreme problems (correct answer of 3,540,940 or greater) and not extreme problems (correct answer 4,004 or less). Accuracy was superior for decomposition in five of the six extreme problems, with an error reduction that ranged from a factor of 4.10 (_Athletic shoes_) to 91.47 (_Auto accidents_). Only the _Circulation of TV Guide_ problem suffered a decrease in accuracy with decomposition. This decrease was modest compared to the gains in accuracy for the other five extreme problems, and it was not statistically significant. Across all six problems, the median error was reduced by a factor of 19.78, approximately a 20-fold improvement in accuracy. Following Winer's method of adding _t_'s (as described in Rosenthal, 1978), these results were statistically significant at _p_ < 0.001 using a one-tail test.[^5]

[^5]: Because the ratios involved some extreme values, the _t_-tests were done on the logs of the error ratios rather than on the ratios themselves.

### Table 4 Error ratios for global versus decomposed estimates (for individuals)

| Problems                      | Sample size |        | Error ratios (geometric means) |        | Error reduction | t-test  |
| ----------------------------- | ----------- | ------ | ------------------------------ | ------ | --------------- | ------- |
|                               | Global      | Decomp | Global                         | Decomp |                 |         |
|                               |             |        |                                |        |                 |         |
| _Not extreme_                 |             |        |                                |        |                 |         |
| 50¢ coin                      | 64          | 62     | 1.82                           | 1.41   | 0.41            | 4.07**  |
| U.S. presidents               | 64          | 63     | 1.23                           | 1.35   | -0.12           | -1.55   |
| Argentine immigrants          | 65          | 54     | 4.89                           | 46.77  | -41.88          | -5.85** |
| Bank failures                 | 64          | 57     | 10.45                          | 19.50  | -9.03           | -1.69   |
| Median                        |             |        |                                |        | -4.58           |         |
| Combined experiments (z-test) |             |        |                                |        |                 | -2.49*  |
|                               |             |        |                                |        |                 |         |
| _Extreme_                     |             |        |                                |        |                 |         |
| Area of U.S.                  | 30          | 30     | 33.88                          | 1.70   | 32.18           | 6.00**  |
| Circulation of _TV Guide_     | 64          | 60     | 7.76                           | 10.96  | -3.20           | -1.11   |
| Athletic shoes                | 31          | 32     | 19.95                          | 15.85  | 4.10            | 0.47    |
| Auto accidents                | 31          | 30     | 93.33                          | 1.86   | 91.47           | 8.07**  |
| Men's pants                   | 31          | 31     | 17.38                          | 10.00  | 7.38            | 1.01    |
| Bushels of wheat              | 61          | 62     | 45.71                          | 6.92   | 38.79           | 4.57**  |
| Median                        |             |        |                                |        | 19.78           |         |
| Combined experiments (z-test) |             |        |                                |        |                 | 4.37**  |

<sup>\*</sup>Significant at _p_ < 0.05

<sup>\*\*</sup>Significant at _p_ < 0.001

By contrast, accuracy for not extreme problems was reduced with decomposition. Error reduction values for three of the four not extreme problems were negative, indicating a superiority of global estimation over decomposition. Decomposition increased the median error for these problems by 458%, an increase that was statistically significant at _p_ < 0.05. The test for the not extreme values was two-tailed because we had no directional hypothesis. Our analysis overstates the statistical significance; the various estimates are not completely independent of one another.
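
The two median error reductions can be reproduced from the per-problem values reported in Table 4. This is a sketch for checking the summary rows; the dictionary names and labels are illustrative.

```python
from statistics import median

# Error reduction (global minus decomposed error ratio) as reported in Table 4.
not_extreme = {
    "Circumference of 50¢ coin": 0.41,
    "U.S. presidents": -0.12,
    "Argentine immigrants": -41.88,
    "Bank failures": -9.03,
}
extreme = {
    "Area of U.S.": 32.18,
    "Circulation of TV Guide": -3.20,
    "Athletic shoes": 4.10,
    "Auto accidents": 91.47,
    "Men's pants": 7.38,
    "Bushels of wheat": 38.79,
}

median_extreme = median(extreme.values())
median_not_extreme = median(not_extreme.values())

print(f"{median_extreme:.2f}")      # 19.78
print(f"{median_not_extreme:.3f}")  # -4.575
```

With an even number of problems, the median is the mean of the two middle values, which gives 19.78 for the extreme problems and about -4.58 (458%) for the not extreme problems.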

### 5.1. Uncertainty of estimation

Whether decomposition is appropriate depends on some measure of uncertainty.
We propose that analysts first determine whether the problem
is subject to much uncertainty.
If so, decomposition may be appropriate,
especially if one can structure the problem to avoid extreme uncertain values.
Otherwise, global estimates should be used.

Uncertainty is lower to the degree that estimates from various assessors
exhibit a lower variance or a reduced range.
Table 5 shows the interquartile ranges for the global and decomposed estimates.
The entries consist of the logs of Q1 and Q3 as well as their differences.
Q1 corresponds to the 25th percentile of the distribution, while Q3 corresponds to the 75th percentile.
If decomposition reduces uncertainty, then a lower Q3-Q1 difference should result.
Computed in this way, the differences in Table 5 can be interpreted
as the number of digits by which the estimates of Q1 and Q3 differed.
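
A minimal sketch of this measure follows. The estimates are hypothetical, and the quartile interpolation rule (the default of `statistics.quantiles`) is an assumption, since the source does not state how the quartiles were computed.

```python
import math
from statistics import quantiles


def log_iqr(estimates):
    """Difference between log10(Q3) and log10(Q1) of a set of positive estimates."""
    q1, _, q3 = quantiles(estimates, n=4)  # assumed: default 'exclusive' method
    return math.log10(q3) - math.log10(q1)


def quartile_ratio(log_difference):
    """Convert a log10 difference back into the ratio Q3 / Q1."""
    return 10 ** log_difference


# Hypothetical estimates from five subjects, spread over several orders of magnitude.
estimates = [10, 100, 1_000, 10_000, 100_000]
print(round(log_iqr(estimates), 2))  # 3.0

# A Table 5 entry such as 2.30 (Area of U.S., global) implies Q3 is about 200 times Q1.
print(round(quartile_ratio(2.30)))   # 200
```

Working in logs makes the range comparable across problems whose target values differ by many orders of magnitude.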

#### Table 5 Analysis of interquartile ranges

| Problems                | Global |        |                     | Decomposed |        |                     |
| ----------------------- | ------ | ------ | ------------------- | ---------- | ------ | ------------------- |
|                         | Log Q3 | Log Q1 | Differences (Q3-Q1) | Log Q3     | Log Q1 | Differences (Q3-Q1) |
|                         |        |        |                     |            |        |                     |
| _Not extreme_           |        |        |                     |            |        |                     |
| 50¢ coin                | 0.48   | 0.20   | 0.28                | 0.63       | 0.42   | 0.21                |
| U.S. presidents         | 1.71   | 1.59   | 0.12                | 1.70       | 1.53   | 0.17                |
| Argentine immigrants    | 4.30   | 3.30   | 1.00                | 5.74       | 3.60   | 2.14                |
| Bank failures           | 3.70   | 2.08   | 1.62                | 4.92       | 3.18   | 1.74                |
|                         |        |        |                     |            |        |                     |
| _Extreme_               |        |        |                     |            |        |                     |
| Area of U.S.            | 6.30   | 4.00   | 2.30                | 6.76       | 6.39   | 0.37                |
| Circulation of TV Guide | 7.54   | 6.18   | 1.36                | 7.80       | 5.95   | 1.85                |
| Athletic shoes          | 8.00   | 6.00   | 2.00                | 8.84       | 7.75   | 1.09                |
| Auto accidents          | 6.18   | 5.00   | 1.18                | 7.80       | 6.08   | 1.72                |
| Men's pants             | 8.00   | 6.00   | 2.00                | 9.53       | 8.07   | 1.46                |
| Bushels of wheat        | 10.48  | 7.18   | 3.30                | 10.54      | 9.65   | 0.89                |

For not extreme problems, the interquartile ranges are higher for the decomposed estimates than the global estimates for three of the four problems. For one problem, _Argentine immigrants,_ the interquartile range for the decomposed version was higher than that for the global (2.14 versus 1.00). This occurred even though each part had the same interquartile range as the target value. This problem did not, then, meet the condition that the parts are easier to forecast than the target value, nor were the errors independent. Thus, it is not surprising that decomposition was not helpful for this problem.

For extreme problems, the range for the decomposed estimate was less than that for the global, except for the _Auto accidents_ and _Circulation of TV Guide_ problems. In other words, decomposition often improved confidence for difficult problems when the agreement among assessors' estimates was used to gauge confidence. Furthermore, the differences between the global and decomposed ranges for the four problems with improvements were substantial, being typically greater than one digit. Although the number of problems is not sufficient to assess the relationship between the interquartile ranges and errors, this result is consistent with that found in the seven problems examined by Aschenbrenner and Kasubek (1978).

A tenet of decomposition states that the parts of a problem are more tractable than the whole. This means that uncertainty in the estimates of a problem's components should be lower than that for the global estimate. We computed the interquartile ranges for each of the components of the six problems in Table 6. The parts were easier to estimate than the target value for three problems: _50¢ coin, U.S. presidents_ and _Bushels of wheat._ The first two of these had target values that were easy to assess directly, whereas _Bushels of wheat_ had an extreme value that was difficult to measure. The _Bushels of wheat_ problem met all conditions for decomposition. As expected, decomposition was successful for this problem. Conversely, decomposition was less accurate for four of the other five questions.

#### Table 6 Assessments of subjective confidence

| Problems              | Mean knowledge ratings<sup>a</sup> |               | Mean accuracy ratings<sup>a</sup> |               | Mean probability ratings that estimate is within 10% of true answer |               |
| --------------------- | ---------------------------------- | ------------- | --------------------------------- | ------------- | ------------------------------------------------------------------- | ------------- |
|                       | Global                             | Decomposition | Global                            | Decomposition | Global                                                              | Decomposition |
|                       |                                    |               |                                   |               |                                                                     |               |
| U.S. Presidents       | 6.16                               | 5.46          | 6.02                              | 5.38          | 64.4                                                                | 35.2          |
| 50¢ coin              | 5.50                               | 4.45          | 5.61                              | 4.45          | 54.9                                                                | 55.8          |
| Circulation/TV Guide  | 3.40                               | 2.35          | 3.46                              | 2.58          | 32.1                                                                | 24.6          |
| Bank failures         | 3.19                               | 2.17          | 3.06                              | 2.20          | 28.0                                                                | 18.9          |
| Argentine immigrants  | 2.15                               | 1.81          | 2.38                              | 2.32          | 24.4                                                                | 16.8          |
| Bushels of wheat      | 2.24                               | 2.02          | 2.16                              | 2.27          | 18.9                                                                | 19.9          |

<sup>a</sup> High scores imply greater knowledge and greater perceived accuracy (scale from 1 to 10).

### 5.2. Subjective confidence ratings

A second source of uncertainty estimates is the subjective confidence
that forecasters have in their knowledge about a problem.
We addressed three questions with respect to subjective uncertainty.
(1) Do alternative measures of uncertainty yield similar recommendations?
If yes, then we could use the least expensive approach to assessing uncertainty.
(2) Are judges more confident when they make decomposed estimates or global estimates?
(3) Does decomposition lead subjects to become better calibrated about their confidence?

As the simplest and least expensive approach, we asked subjects to provide judgments of their knowledge about each target value, and the degree to which they thought their estimate would be accurate. Self-ratings of knowledge and accuracy were obtained from the subjects before they made their estimates by using the following scales.

> "Before you begin, indicate on the scale below _how much you think you know about the topic_"
>
> (1 = know very little; 10 = know a great deal).
>
> "How _accurately_ do you think you will be able to estimate this quantity?"
>
> (1 = low accuracy; 10 = high accuracy).

Judgments were obtained for a subset of six problems.
Table 6 shows alternative assessments of accuracy for these problems.

After subjects had estimated the value for each of the six problems,
we asked them to indicate the probability that their estimate was within 10% of the correct answer.
These results are also presented in Table 6.
Finally, we calculated the interquartile ranges of the global estimate for each problem,
shown in the last column of Table 6.

With the exception of the interquartile range,
the different approaches to subjective confidence produced similar results.
The intercorrelations among the three measures across the six problems were all over 0.99.
Given the close correspondence among the three measures,
they were expected to be of roughly equal value in deciding when to use decomposition.
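
The correspondence among the three self-report measures can be checked roughly from the global-condition columns of Table 6. The sketch below is illustrative only: it assumes a Pearson product-moment correlation (the text does not say which coefficient was used) and uses the table values as transcribed here, so the results may differ slightly from the reported figure. The names `measures`, `pearson` and `correlations` are introduced for this example.

```python
from itertools import combinations
from math import sqrt

# Global-condition columns of Table 6, in row order
# (U.S. presidents first, Bushels of wheat last).
measures = {
    "knowledge": [6.16, 5.50, 3.40, 3.19, 2.15, 2.24],
    "accuracy": [6.02, 5.61, 3.46, 3.06, 2.38, 2.16],
    "probability": [64.4, 54.9, 32.1, 28.0, 24.4, 18.9],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (an assumed choice)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# One coefficient for each pair of measures across the six problems.
correlations = {
    (a, b): pearson(measures[a], measures[b])
    for a, b in combinations(measures, 2)
}

for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

With only six problems, each coefficient rests on six data points, so the check describes how the problems rank on the three measures rather than providing a precise estimate.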

We applied the same procedures to subjects who received the decomposed versions of the problems.
Across all six problems, subjects had higher self-ratings of problem knowledge
in the global condition than in the decomposition condition.
Because subjects in the decomposition condition received more than one estimation problem,
their self-ratings of problem knowledge
may have been influenced by the difficulties they experienced with the complexity of the problem.
Self-ratings of accuracy were also higher in the global condition, except for the _Bushels of wheat_ problem.
Similar results were obtained when we asked the questions about confidence
after subjects had completed their estimates.
In other words, the different assessments
each led to the conclusion that subjects in the decomposition condition
thought that the problems were more difficult than did subjects in the global estimation condition.
These results agree with the findings of Sniezek et al. (1990),
who had concluded that the increased processing
(for decomposed problems) leads to a reduction in confidence.
In retrospect, it might have been better for us
to have asked for estimates of the difficulty for each of the parts.
Henrion et al. (1993) did this, and their subjects reported
that the _components_ were easier to estimate than the global value.

Are subjects better calibrated when they use decomposition? Probability assessments are said to be externally calibrated if, for a given probability assessment (e.g., 0.6), exactly that proportion (e.g., 60%) turn out to be correct. We summarized the calibration results for global and decomposed estimates, across all ten problems. Mean probability assessments were generally higher than the proportion correct for both approaches, indicating overconfidence. On average, those making global estimates expected 38.9% of their answers to be within 10% of the true value, but only 10.9% were that accurate. Those using the decomposed approach expected 32.6% of their estimates to be within 10% of the true value, but only 9.0% were that accurate. In effect, decomposition reduced overconfidence from 28.0 percentage points in the global case to 23.6 percentage points, with the largest reduction occurring in those situations where subjects felt most confident, as shown in Fig. 1.
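
The overconfidence figures in this paragraph follow from simple subtraction: the mean stated probability minus the proportion of estimates that actually fell within 10% of the true value. A minimal sketch of that arithmetic, using a hypothetical helper named `overconfidence`:

```python
def overconfidence(mean_stated_probability, hit_rate):
    """Gap, in percentage points, between stated confidence and actual hit rate."""
    return mean_stated_probability - hit_rate


# Figures reported in the text (both arguments are percentages).
global_gap = overconfidence(38.9, 10.9)
decomposed_gap = overconfidence(32.6, 9.0)

print(f"global: {global_gap:.1f}, decomposed: {decomposed_gap:.1f}")
# prints "global: 28.0, decomposed: 23.6"
```

A perfectly calibrated group would have a gap of zero. Both groups remain well above that, so decomposition narrows overconfidence without removing it.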

### 5.3. Limitations

Two of the four problems in the not-extreme version
(_Argentine immigrants_ and _Bank failures_)
involved elements with extreme values.
Because each of the components had an element dealing with the U.S. population,
we assumed that the subjects would be familiar with these values.
To examine this assumption, we analyzed the population estimates for each of the problems.
The median population estimate for the _Argentine immigrants_ problem
was in error by a factor of 1.97 from the actual,
while for _Bank failures_ it was in error by a factor of 1.42.
For both problems, errors for the U.S. population component
were less than errors for the global quantities.
Nevertheless, we were surprised at the difficulty individuals had with estimating this value.
In practical problems, of course, one could simply use the actual value.
In their study of decomposition, Henrion et al. (1993) gave the U.S. population value to the subjects.

> ##### Fig. 1. Calibration of probability assessments that estimated answer is within 10% of true answer.
>
> %% ![[figure_1.jpeg]] %%

The issue of 'how extreme is extreme' has not been resolved.
We proposed a definition based on the number of digits (six or seven),
but we did not examine alternatives.
Nor did we resolve the issue of how to specify the unit of measure.

We expect that other conditions might affect decisions on when to use decomposition.
For example, question type may have some importance.
We do not know the extent to which our problem selection may have affected findings.

## 6. Discussion

Despite the improved accuracy it afforded,
decomposition did not increase subjects' confidence in the accuracy of their estimates.
However, the interquartile ranges were smaller for the decomposed estimates
and confidence in the accuracy of estimates was slightly more appropriate.

Perceived uncertainty measures are easy to obtain.
As shown in Table 6, self-assessments of uncertainty
provided similar rankings of the relative uncertainty for the problems.
The interquartile ranges provided somewhat different information than the self-assessments.
Interquartile ranges of the estimates are not expensive, but they do require a pretest.

The present study addresses the issue of whether estimates by individuals
can be improved when no other data are available.
However, we expect that other situational characteristics
or estimation-aiding strategies
would also affect the usefulness of decomposition.
For example, a forecaster could decompose a problem
to use different sources of information or different experts.
For some parts of the problem, known values may exist.
Alternative decomposition methods could be used to produce an estimate,
and resulting values for a quantity could be resolved in light of one another.
MacGregor and Lichtenstein (1991) attempted such an approach
and found that subjects tended to resolve estimates by applying an averaging model.
Revised estimates generally fell between two estimates of a target quantity,
where each judgmental estimate was produced by a different method.

Although our approach to decomposition was harmful for problems
that did not involve extreme and uncertain numbers,
there might be alternative approaches that are successful.
For example, decomposition might restructure a problem
so that it is easier for subjects to think about.

Decomposition tended to reduce estimators' confidence levels,
perhaps because of the increased processing involved.
This reduction in overconfidence and the improvements in accuracy
produced modest gains in calibration.

## 7. Conclusions

The theory behind decomposition is simple.
What is difficult is how to translate the theory into operational terms.
We examined some operational procedures
for identifying conditions under which decomposition should improve accuracy.

Extreme uncertain values are difficult for subjects to estimate.
We hypothesized that decomposition to remove extreme values would improve estimation accuracy.
This study examined nine "extreme value-high uncertainty" problems from two prior studies.
Decomposition proved useful for each of these nine problems,
and the typical gain in accuracy was substantial
(error ratio was reduced by 96.3 for the study with six problems,
and by 12.3 for the study with three problems).
In the present study, involving six problems with extreme values,
the error ratio was reduced by a factor of almost 20.[^6]
Decomposition failed for one extreme problem
because it was not successful in producing more accurate estimates of the parts.

[^6]: The results from Hora et al. (1993) also are consistent with our hypothesis.
    They found that decomposition was more accurate than global estimates
    for three quantities whose true values had at least eight digits
    (e.g., What were the sales for Long's Drug Stores in Hawaii in 1986?).

Decomposition was risky for problems
that did not involve extreme and uncertain values.
For six such problems from two prior studies,
decomposition had little overall effect on accuracy.
However, for four such problems in the current study,
decomposition yielded less accurate estimates
by an average error ratio of 458%.

Based on the limited evidence to date,
we suggest the following procedure for judgmental decomposition.
First, assess whether the target value is subject to much uncertainty
by using either a knowledge rating or an accuracy rating.
If the problem is an important one, obtain interquartile ranges.
For those items rated above the midpoint on uncertainty
(or above 10 on the interquartile range),
conduct a pretest with 20 subjects
to determine whether the target quantity is likely to be extreme.
If the upper quartile geometric mean has seven or more digits,
decomposition should be considered.
For these problems, compare the interquartile ranges for the target value
against those for the components and for the recomposed value.
If the ranges are less for the global approach, use the global approach.
Otherwise use decomposition.
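
The procedure above can be sketched as a small decision rule. This is one possible reading, not a definitive implementation, and it rests on several assumptions: the ratings use the 1 to 10 scales described earlier, so "above the midpoint on uncertainty" is taken to mean a mean knowledge or accuracy rating below 5.5; the pretest statistics are supplied already computed; and the global approach is preferred only when its interquartile range is smaller than that of every component and of the recomposed value. The names `Pretest`, `digit_count` and `recommend`, and the numbers in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

SCALE_MIDPOINT = 5.5   # assumed midpoint of the 1-10 rating scales
IQR_THRESHOLD = 10.0   # "above 10 on the interquartile range"
EXTREME_DIGITS = 7     # "seven or more digits"


@dataclass
class Pretest:
    mean_rating: float                  # mean knowledge or accuracy rating (1-10)
    global_range: float                 # interquartile range of global estimates
    upper_quartile_geomean: float       # from the 20-subject pretest
    component_ranges: Sequence[float]   # interquartile range of each component
    recomposed_range: float             # interquartile range of recomposed values


def digit_count(value: float) -> int:
    """Number of digits in the integer part of a value."""
    return len(str(int(abs(value))))


def recommend(p: Pretest) -> str:
    """Return 'global' or 'decomposition' under the assumptions stated above."""
    # Step 1: is the target value uncertain? (low rating or wide range)
    uncertain = p.mean_rating < SCALE_MIDPOINT or p.global_range > IQR_THRESHOLD
    if not uncertain:
        return "global"
    # Step 2: is the target value likely to be extreme?
    if digit_count(p.upper_quartile_geomean) < EXTREME_DIGITS:
        return "global"
    # Step 3: are the parts and the recomposed value easier to estimate?
    decomposed_ranges = [*p.component_ranges, p.recomposed_range]
    if all(p.global_range < r for r in decomposed_ranges):
        return "global"
    return "decomposition"


# Hypothetical pretest results for an uncertain, extreme target value.
example = Pretest(
    mean_rating=2.2,
    global_range=25.0,
    upper_quartile_geomean=3.0e9,
    component_ranges=[3.0, 4.5],
    recomposed_range=8.0,
)
print(recommend(example))  # prints "decomposition"
```

The sketch returns "global" as soon as any condition fails, which matches the cautious stance of the text: decomposition is recommended only when the target is uncertain, extreme, and not easier to estimate directly.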

The current study suggests that decomposition has more limited value than previously thought.
It improved accuracy only when the situation involved uncertain and extreme quantities.
Furthermore, decomposed elements needed to be easier to estimate than the global.
For problems that did not concern extreme values with high uncertainty
or where estimates of the parts were not more accurate than that of the target value,
decomposition produced less accurate estimates.

## Acknowledgements

This research was supported in part by the National Science Foundation under Contract SES-9013069 to Decision Research. Fred Collopy, George Loewenstein, Robin Hogarth and unidentified referees provided helpful comments on early drafts. Jennifer L. Armstrong, Suzanne Berman, Gina Bloom, Vanessa Lacoss, Phan Lam and Leisha Mullican provided editorial assistance.

## References

* Armstrong, J.S., W.B. Denniston and M.M. Gordon (1975),
    "The use of the decomposition principle in making judgments,"
    _Organizational Behavior and Human Performance_, 14, 257-263.

* Aschenbrenner, K.M. and W. Kasubek (1978),
    "Challenging the Cushing syndrome: Multiattribute evaluation of cortisone drugs,"
    _Organizational Behavior and Human Performance_, 22, 216-234.

* Henrion, M., G.W. Fischer and T. Mullin (1993),
    "Divide and conquer? Effects of decomposition on the accuracy and calibration of subjective probability distributions,"
    _Organizational Behavior and Human Decision Processes_, 55, 207-227.

* Hertzberg, H. (1970), _One Million_. Simon and Schuster, New York.

* Hora, S.C., N.G. Dodd and J.A. Hora (1993),
    "The use of decomposition in probability assessments of continuous variables,"
    _Journal of Behavioral Decision Making_, 6, 133-147.

* MacGregor, D.G. and S. Lichtenstein (1991),
    "Problem structuring aids for quantitative estimation,"
    _Journal of Behavioral Decision Making_, 4, 101-116.

* MacGregor, D.G., S. Lichtenstein and P. Slovic (1988),
    "Structuring knowledge retrieval: An analysis of decomposed quantitative judgments,"
    _Organizational Behavior and Human Decision Processes_, 42, 303-323.

* Raiffa, H. (1968), _Decision Analysis._ Princeton University Press, Princeton, New Jersey.

* Rosenthal, R. (1978), "Combining results of independent studies," _Psychological Bulletin_, 85, 185-193.

* Sniezek, J.A., P.W. Paese and F.S. Switzer, III (1990),
    "The effect of choosing on confidence in choice,"
    _Organizational Behavior and Human Decision Processes_, 46, 264-282.