---
id:
aliases: []
title: "Judgmental Decomposition: When Does It Work?"
tags:
- authorship/other
- exclude-from-word-count
- type/media/article
dg-publish: false
---
# Judgmental Decomposition: When Does It Work?
|
||
|
||
Published in _International Journal of Forecasting_, 10 (1994), 495-506
|
||
|
||
Donald G. MacGregor _Decision Research, Eugene, OR_
|
||
|
||
J. Scott Armstrong _The Wharton School, University of Pennsylvania, Philadelphia, PA_
|
||
|
||
## Abstract
|
||
|
||
We hypothesized that multiplicative decomposition
|
||
would improve accuracy only in certain conditions.
|
||
In particular, we expected it to help for problems
|
||
involving extreme and uncertain values.
|
||
We first reanalyzed results from two published studies.
|
||
Decomposition improved accuracy for nine problems
|
||
that involved extreme and uncertain values,
|
||
but for six problems with target values that were not extreme and uncertain,
|
||
decomposition was not more accurate.
|
||
Next, we conducted experiments involving 10 problems with 280 subjects making 1078 estimates.
|
||
As hypothesized, decomposition improved accuracy
|
||
when the problem involved the estimation of extreme and uncertain values.
|
||
Otherwise, decomposition often produced less accurate predictions.
|
||
|
||
Keywords: Decision Analysis; Estimation; Extreme Values;
|
||
Forecasting; Multiplicative Decomposition; Uncertainty
|
||
|
||
## 1. Introduction
|
||
|
||
Consider the following question:
|
||
What is the estimated yearly circulation of a proposed new magazine on raising exotic animals?
|
||
People are likely to respond that they have no idea. But do they?
|
||
What are they likely to say if asked whether the number was greater than 100 million?
|
||
Would they say that it is less than 1000?
|
||
Most likely, people would say that the true value is somewhere between these two values.
|
||
Obviously, they know more than they think they do when first asked.
|
||
|
||
How well a person is able to forecast a quantity
|
||
is related to the relevant information that they have at their disposal,
|
||
either from information sources or from experts.
|
||
It is also a function of whether they can break the problem into parts
|
||
so that they can use their information effectively.
|
||
Forecasters frequently break a problem into parts,
|
||
make forecasts from each part,
|
||
then recombine the separate forecasts to make a forecast of the target value.
|
||
Howard Raiffa (1968) claimed that such a procedure,
|
||
_decomposition,_ is 'the spirit of decision analysis.'
|
||
Since then, research has seemed to support the view
|
||
that decomposition is a useful strategy
|
||
with wide applicability and little risk.
|
||
|
||
Prior literature on judgmental decomposition
|
||
(Armstrong et al., 1975, and MacGregor et al., 1988)
|
||
concluded that decomposition would be especially effective
|
||
for problems involving uncertain values.
|
||
However, we do not know much about the conditions
|
||
under which judgmental decomposition is most useful.
|
||
Armstrong et al. (1975) had suggested that the scale of the problem
|
||
might make further study worthwhile,
|
||
and our paper addresses that issue.
|
||
In examining the problem, we reanalyzed results from two studies.
|
||
In addition, we conducted experiments with new subjects.
|
||
We also examined alternative approaches for assessing uncertainty
|
||
to determine whether they would yield different recommendations
|
||
about when decomposition is appropriate.
|
||
|
||
## 2. Hypotheses
|
||
|
||
The basic idea behind decomposition is simple.
|
||
Given a target quantity that is difficult to estimate,
|
||
one breaks the problem down into subproblems that are easier to estimate.
|
||
The difficulty lies in translating this idea into practice.
|
||
For decomposition to be done successfully, certain conditions are desirable.
|
||
First, the target value should be one that is difficult to estimate.
|
||
Second, estimation errors for each part should be less, relatively speaking,
|
||
than the errors for estimating the target value.
|
||
Third, estimation errors for the parts
|
||
should not have strong positive correlations between one another.
|
||
Negatively correlated errors are desirable so that one has offsetting errors.
|
||
These conditions are not easy to specify in operational terms.
|
||
|
||
Traditionally, the term decomposition
|
||
has been used to refer to the practice
|
||
of breaking a problem into multiplicative elements.
|
||
An additive breakdown is usually referred to as _disaggregation_ or _segmentation_.
|
||
Our paper is restricted to multiplicative decomposition
|
||
and we use the term decomposition to refer to this.
|
||
|
||
Decomposition is often viewed as a safe strategy.
|
||
Rather than putting all of one's eggs into a single basket, estimates are provided separately.
|
||
Errors in one element may compensate for errors in another.
|
||
However, when errors are positively correlated, they can be explosive.
|
||
For example, if the errors in two components are in the same direction and are each equal to 20%,
this would translate into an error of 44% in the target value (1.2 × 1.2 = 1.44).
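
The compounding can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function name `compounded_error` is hypothetical and does not come from the paper. It treats each component error as a relative error and multiplies the resulting factors.

```python
import math


def compounded_error(component_errors):
    """Relative error of a product, given the relative error of each factor.

    A component that is 20% too high contributes a factor of 1.20;
    one that is 20% too low contributes a factor of 0.80.
    """
    factor = math.prod(1.0 + e for e in component_errors)
    return factor - 1.0


# Two components, both 20% too high (the case described in the text).
same_direction = compounded_error([0.20, 0.20])

# Two components with errors of opposite sign, which partly cancel.
offsetting = compounded_error([0.20, -0.20])

print(f"same direction: {same_direction:+.0%}")
print(f"offsetting:     {offsetting:+.0%}")
```

With errors of opposite sign the factors partly cancel (1.2 × 0.8 = 0.96), which is the compensating case mentioned above.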
|
||
|
||
Target values with extreme values
|
||
are likely to create difficulties for subjects
|
||
unless these numbers are well known.
|
||
For very large numbers, people might make estimates that are too small.
|
||
Lacking good intuition,
|
||
an estimator might assign a 'more reasonable number' to a quantity in question.
|
||
We would expect the converse for very small numbers, such as 'one in 10 million.'
|
||
|
||
We hypothesized that decomposition would improve accuracy for problems with extreme values
|
||
when subjects were highly uncertain about the target value.
|
||
The reasoning is simply that large numbers are confusing to many people.
|
||
With decomposition, the analyst might be able to avoid the extreme numbers associated with high uncertainty.
|
||
Uncertainty is an important aspect of this hypothesis.
|
||
Thus, we do not expect that decomposition would help to estimate well known numbers,
|
||
such as the distance from the Earth to the sun
|
||
(when most of the experts believe that the distance is about 93 million miles).
|
||
|
||
The operational definition of an extreme value is difficult to determine.
|
||
To provide a simple measure of an extreme value,
|
||
we initially defined it as any number having more than seven digits
|
||
(equal to or greater than 10 million).
|
||
Certainly, many people have difficulty grasping numbers of this magnitude.
|
||
For example, a book has been written with the sole purpose
|
||
of helping people to understand the magnitude of one million.
|
||
It consists of one million dots with comparisons at various points
|
||
where examples are given (Hertzberg, 1970).[^1]
|
||
Psychologists also refer to the ability of the human mind
|
||
to handle only seven things (plus or minus two).
|
||
|
||
[^1]: As an example of how difficult it is to think about extreme numbers, consider the following. A typographical error was made in Armstrong et al. (1975). The number of cards saying "Carefree Sugarless Gum" that were sent to a Philadelphia radio station was reported as 66.5 billion rather than the correct value, which was 66.5 million. We missed this in proofreading, and the number has subsequently been cited in other papers without any questions being raised.
|
||
|
||
The selection of the unit of measure causes problems.
|
||
For example, one could change the units from miles to inches
|
||
when asking someone to estimate the distance from New York to San Francisco.
|
||
However, some important quantities are not amenable,
|
||
either conceptually or computationally, to changes in scale.
|
||
|
||
We were also concerned about how best to assess uncertainty.
|
||
In particular, would different approaches lead to different conclusions
|
||
about when to use decomposition?
|
||
|
||
## 3. Reanalysis of prior studies
|
||
|
||
In an early study of judgmental decomposition,
|
||
Armstrong et al. (1975) concluded that multiplicative decomposition typically improves accuracy
|
||
and is unlikely to reduce accuracy.
|
||
The study involved such problems as estimating the number of packs of Polaroid film
|
||
that were used in the United States in 1970.
|
||
The results also supported the hypothesis that decomposition is especially useful for problems
|
||
where the estimator's perceived uncertainty about the true value is high.
|
||
A subsequent study by MacGregor et al. (1988)
|
||
also found that judgmental decomposition improves accuracy.
|
||
That study used similar problems, for example,
|
||
estimating the value of imported passenger cars sold in the U.S. the previous year.
|
||
|
||
Armstrong et al. (1975) examined uncertainty
|
||
by asking 151 subjects to rank problems according to the confidence
|
||
that they had in their ability to provide accurate answers.
|
||
MacGregor et al. (1988) addressed the same issue
|
||
by using the variability among 45 subjects in their estimates for each target value.
|
||
Specifically, they focused on the interquartile range.
|
||
The interquartile range represents the middle 50% of a distribution
|
||
and is calculated as the difference between
|
||
the point at the 75th percentile of the distribution (Q3)
|
||
and the point at the 25th percentile (Q1);
|
||
the median of the distribution is at the 50th percentile (Q2).
|
||
We expected that problems with extreme unknown values
|
||
would create uncertainty among estimators
|
||
and would therefore show up in the interquartile range.
|
||
We examined this hypothesis by comparing the number of digits
|
||
in each of the 16 problems in MacGregor et al. with the interquartile range of error ratios.
|
||
As expected, the number of digits was related to uncertainty.
|
||
The correlation between the number of digits in the actual values for each problem
|
||
and the corresponding interquartile range was about +0.75.
|
||
|
||
To examine whether decomposition improved accuracy for problems involving extreme unknown numbers,
|
||
we split the MacGregor et al. data according to magnitude and disagreement.
|
||
This yielded five problems where scale was not extreme
|
||
(using seven or fewer digits gave a roughly equal breakdown of the problems)
|
||
and where assessors were in agreement
|
||
(we used an interquartile range with a log<sub>10</sub> of 1.3 or less,
|
||
which means that the ratio between the lowest quartile and the highest quartile is less than two).
|
||
The five problems were the numbers of physicians, marriages,
|
||
alcoholics, university employees and hospital employees.
|
||
Six problems had extreme magnitude (over seven digits)
|
||
and high disagreement among estimators
|
||
(interquartile range of 1.75 or more,
|
||
implying a ratio of 5.6 of the largest to smallest quartile).
|
||
These problems involved the numbers of welfare cases, imported cars,
|
||
alcohol dollars, mail handled by post offices, gasoline and cigarettes.
|
||
|
||
We estimated the average improvement for decomposition in MacGregor et al. in two steps. First, geometric mean estimates were calculated for the group of subjects who used the decomposed version (this being the computed full algorithm from Table 6 in MacGregor et al.) and for those who used the global version. These estimates were then compared with the actual values for each problem.
|
||
|
||
### Table 1 Decomposition versus global errors: reanalysis of prior studies

| Conditions                     | Number of problems | Median error ratios |                   |                     |
| ------------------------------ | ------------------ | ------------------- | ----------------- | ------------------- |
|                                |                    | **Global**          | **Decomposition** | **Error reduction** |
| _Not extreme, low uncertainty_ |                    |                     |                   |                     |
| MacGregor et al.               | 5                  | 1.8                 | 2.3               | -0.5                |
| Armstrong et al.               | 1                  | 5.4                 | 2.3               | 2.1                 |
|                                |                    |                     |                   |                     |
| _Extreme, high uncertainty_    |                    |                     |                   |                     |
| MacGregor et al.               | 6                  | 99.3                | 3.0               | 96.3                |
| Armstrong et al.               | 3                  | 18.0                | 5.7               | 12.3                |

Decomposition errors were smaller than global errors for each of the six problems where disagreement (interquartile range) was high and the actual values were extreme. Subjects who made global estimates were in error by a factor of 99.3 (9930%) on average. In contrast, the error ratio for the decomposed version for the same six problems was 3.0, or 300%.[^2] Thus, the median error was reduced by a factor of 96.3 (see bottom part of Table 1). For problems without extreme values and where disagreement was low, decomposition yielded less accurate estimates, as its error was 50% higher than that for the global approach. Table 1 summarizes these results.
|
||
|
||
[^2]: We calculated the geometric means of the two error ratios in the middle of the distribution for the global and decompositional conditions.
|
||
|
||
We did a similar analysis for the problems in Armstrong et al. (1975). Here, the analysis was based on individuals rather than groups. Error ratios were calculated for each subject's estimate for each problem by comparing their estimates with the actual values. The median error ratio was then obtained for each problem. Decomposition produced substantial gains (1230% error reduction) for the three extreme problems with the highest uncertainty.[^3] Decomposition also provided a lesser improvement for the one problem that did not involve an extreme number. Table 1 summarizes these results as well.
|
||
|
||
[^3]: We used the median error ratios across the new groups of subjects that were tested for the film and tobacco problems. Only two groups did the Contest problem, and here we used the geometric mean.
|
||
|
||
Averaging across the two studies (weighting according to the number of questions),
|
||
decomposition reduced error by a ratio of 68.3 for the nine problems involving extreme uncertain values.
|
||
However, decomposition had no overall effect for the other six problems.
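
The weighted averages quoted here can be reproduced from the error-reduction column of Table 1. The following is a minimal sketch of that arithmetic, with each study weighted by its number of problems; the helper name `weighted_mean` is ours and not from the paper.

```python
def weighted_mean(values, weights):
    """Average of `values`, weighting each entry by the matching weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Error reductions from Table 1, weighted by number of problems.
# Extreme, high uncertainty: MacGregor et al. (6 problems), Armstrong et al. (3).
extreme = weighted_mean([96.3, 12.3], [6, 3])

# Not extreme, low uncertainty: MacGregor et al. (5 problems), Armstrong et al. (1).
not_extreme = weighted_mean([-0.5, 2.1], [5, 1])

print(f"extreme, uncertain: {extreme:.1f}")
print(f"not extreme:        {not_extreme:.2f}")
```

The first figure matches the 68.3 reported for the nine extreme problems; the second is close to zero, in line with the statement that decomposition had no overall effect for the other six.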
|
||
|
||
## 4. An experiment on the effects of extreme uncertain values
|
||
|
||
We conducted an experiment to provide further evidence
|
||
on the effects of multiplicative decomposition
|
||
when applied to problems with extreme uncertain numbers.
|
||
This section describes the problems and the subjects.
|
||
|
||
### 4.1. Problems
|
||
|
||
We selected problems in which the magnitude of unknown numbers to be estimated varied.
|
||
Our extreme problems had seven or more digits, ranging in value from 3,540,940 to 4,243,000,000.
|
||
As noted earlier, this definition of _extreme_ is somewhat arbitrary.[^4]
|
||
Not extreme numbers in this set of problems had four digits or less,
|
||
in order to provide a marked distinction from extreme numbers.
|
||
Table 2 provides the 10 problems, along with the correct answers taken from almanacs and fact books.
|
||
|
||
[^4]: After analyzing the prior research (Table 1), we revised our definition of extreme for this study from 'more than seven digits' to 'seven or more digits.' Extremity could also be defined in terms of small numbers. An example would be, 'What is the chance that a person in the U.S. will die next year because of botulism?' (The answer is 1/100,000,000.)
|
||
|
||
#### Table 2 Problems and magnitudes: actual versus estimated
|
||
|
||
%% TODO %%
|
||
|
||
All questions relate to the U.S. unless stated otherwise.
|
||
|
||
Because actual values would not be known to the subjects,
|
||
we first determined whether it would be possible
|
||
to identify problems that might involve extreme values.
|
||
We reasoned that typical subjects would not do well at such estimates.
|
||
Thus, we used the geometric mean of the upper quartile (top 25%) of the estimates.
|
||
That is, if the upper quartile of subjects expected this to be an extreme number,
|
||
then it was treated as such.
|
||
By this measure, the expected number of digits
|
||
was a good match of the actual number of digits, as shown in Table 2.
|
||
The largest estimate for the small group
|
||
was that Argentine immigrants would be a five-digit number,
|
||
and the smallest estimate for the extreme problems
|
||
was that the circulation of _TV Guide_ would be a six-digit number,
|
||
so the classification of the problems was the same.
|
||
|
||
To determine whether the large target values were uncertain,
|
||
we examined the interquartile ranges.
|
||
The smallest of these ranges for the group of problems having extreme values
|
||
indicated that the upper quartile mean
|
||
was more than 10 times as large as the lower quartile mean.
|
||
|
||
For each problem we constructed a global version and a decomposed version.
|
||
Table 3 summarizes the full set of 10 decomposed algorithms.
|
||
For the sake of brevity,
|
||
only the algorithm steps requiring subjects to make component estimates are provided;
|
||
intermediate arithmetic steps are omitted.
|
||
We also asked subjects to rate their knowledge about each target value,
|
||
their expected accuracy
|
||
and the probability that their answer would be within 10% of the true value.
|
||
|
||
#### Table 3 Abbreviated descriptions of algorithms for the ten estimation problems
|
||
|
||
> ##### Argentine immigrants
>
> * Population in the U.S.
> * Proportion of population that immigrated to U.S.
> * Percentage of U.S. immigrants from Argentina
>
> ##### Circulation of TV Guide
>
> * Households in the U.S.
> * Proportion of households with a TV
> * Proportion of households with a TV receiving TV Guide
>
> ##### Circumference of 50¢ coin
>
> * Diameter in inches of a 50¢ coin
> * Number of pieces of string the length of the diameter needed to wrap around circumference
>
> ##### Bushels of wheat
>
> * Population of the world
> * Number of bushels of wheat consumed per person per year
> * Proportion of wheat wasted per year
>
> ##### Bank failures in 1933
>
> * Current population of the U.S.
> * Population of U.S. in 1933 as a proportion of current population
> * Number of customers for a typical bank
> * Proportion of banks failed in 1933
>
> ##### U.S. presidents
>
> * Number of years U.S. has had presidents
> * Number of years the average president holds office
>
> ##### Men's pants
>
> * Number of men in the U.S.
> * Number of pairs of pants the average man buys each year
> * Number of women in the U.S.
> * Number of pairs of men's pants the average woman buys each year
> * Proportion of men's pants manufactured in the U.S. that are sold to U.S. customers
>
> ##### Athletic shoes
>
> * Population of the U.S.
> * Proportion of the population that wears athletic shoes
> * Pairs of athletic shoes each wearer buys per year
> * Proportion of athletic shoes manufactured in U.S. that are sold to U.S. customers
>
> ##### Auto accidents
>
> * Number of people in the U.S. of driving age
> * Proportion of people of driving age who drive
> * Number of accidents the average driver has per year
>
> ##### Area of U.S.
>
> * Distance in miles from San Francisco, CA to Washington, D.C.
> * Distance in miles from San Diego, CA to Seattle, WA
> * Proportion of the U.S. that would fit into a rectangle with an area equal to the product of the above dimensions

For some of the problems, such as _Athletic shoes_,
one of the components involved an extreme value.
However, we were reasonably confident that subjects would know this value.
Also, data on these values are readily available so that one could insert the known value.
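
To make the recomposition step concrete, here is a minimal sketch of a multiplicative algorithm patterned on the three _Auto accidents_ components in Table 3. Every number in it is a hypothetical placeholder chosen for illustration, not data from the experiment, and the names `recompose`, `judged` and `known` are our own. It also shows how a known value could be substituted for a judged component.

```python
import math


def recompose(judged, known=None):
    """Multiply component values to recompose the target quantity.

    `judged` maps component names to a subject's estimates. `known`
    optionally maps some of the same names to values taken from a
    reference source; these override the judged values.
    """
    components = {**judged, **(known or {})}
    return math.prod(components.values())


# Hypothetical component estimates (illustrative placeholders only).
judged = {
    "people_of_driving_age": 150e6,
    "proportion_who_drive": 0.8,
    "accidents_per_driver_per_year": 0.1,
}

# Hypothetical "looked-up" value replacing the one extreme component.
known = {"people_of_driving_age": 180e6}

from_judgment = recompose(judged)
with_known_value = recompose(judged, known)

print(f"judged components only: {from_judgment:,.0f}")
print(f"with known component:   {with_known_value:,.0f}")
```

Only the extreme component is replaced here; the remaining components are small proportions or rates that a subject can reason about directly.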
|
||
|
||
### 4.2. Subjects
|
||
|
||
Subjects for the experiment were individuals
|
||
who answered advertisements in the University of Oregon daily newspaper.
|
||
The advertisements called for participation in judgment and decision-making tasks.
|
||
Two hundred and eighty individuals participated in the experiment,
|
||
which was conducted in two sessions.
|
||
Subjects were randomly assigned to either the global or the decomposition treatment.
|
||
In the first session, the problems _50¢ coin, U.S. presidents, Argentine immigrants,_
|
||
_Bank failures, Circulation of TV Guide_ and _Bushels of wheat_ were administered.
|
||
Those subjects assigned to the global treatment received all six problems.
|
||
Because of time constraints, subjects assigned to the decomposition condition received half of the problems.
|
||
In the second session, the remaining four problems were administered.
|
||
Again, subjects in the global condition received all four of the remaining problems,
|
||
while decomposition subjects received half of the problems.
|
||
|
||
## 5. Results
|
||
|
||
As had been done in previous studies of judgmental forecasting
|
||
(Armstrong et al., 1975, and MacGregor et al., 1988),
|
||
we used the error ratio as an index of accuracy.
|
||
The error ratio is computed as the ratio of the individual's estimated value to the correct answer,
|
||
or the reverse, such that the result is greater than or equal to 1.0.
|
||
Estimates for a given problem were summarized across subjects
|
||
by computing the geometric mean of the error ratios.
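
The two definitions above translate directly into code. The sketch below is one plausible implementation rather than the authors' own procedure; the function names and the sample estimates are hypothetical.

```python
import math


def error_ratio(estimate, actual):
    """Ratio of estimate to actual, or the reverse, so the result is >= 1.0."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("error ratios are defined for positive quantities only")
    return max(estimate / actual, actual / estimate)


def geometric_mean(values):
    """Geometric mean, computed through logs to avoid overflow on large ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical estimates of a quantity whose correct answer is 1,000.
actual = 1_000
estimates = [500, 2_000, 250, 8_000]

ratios = [error_ratio(e, actual) for e in estimates]
summary = geometric_mean(ratios)

print(ratios)
print(round(summary, 2))
```

An estimate that is half the true value and one that is double it both receive an error ratio of 2.0, so under- and overestimates are treated symmetrically.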
|
||
|
||
We had hypothesized that decomposition would improve accuracy for problems having extreme uncertain values. The results, shown in Table 4, were consistent with this hypothesis. We summarized the problems into two groups: extreme problems (correct answer of 3,540,940 or more) and not extreme problems (correct answer of 4,004 or less). Accuracy was superior for decomposition in five of the six extreme problems, with an error reduction that ranged from a factor of 4.10 (_Athletic shoes_) to 91.47 (_Auto accidents_). Only the _Circulation of TV Guide_ problem suffered a decrease in accuracy with decomposition. This decrease was modest compared to the gains in accuracy for the other five extreme problems, and it was not statistically significant. Across all six problems, the median error was reduced by a factor of 19.78, approximately a 20-fold improvement in accuracy. Following Winer's method of adding _t_s (as described in Rosenthal, 1978), these results were statistically significant at _p_ < 0.001 using a one-tail test.[^5]
|
||
|
||
[^5]: Because the ratios involved some extreme values, the _t_-tests were done on the logs of the error ratios rather than on the ratios themselves.
|
||
|
||
### Table 4 Error ratios for global versus decomposed estimates (for individuals)

| Problems                      | Sample size |        | Error ratios (geometric means) |        | Error reduction | t-test     |
| ----------------------------- | ----------- | ------ | ------------------------------ | ------ | --------------- | ---------- |
|                               | Global      | Decomp | Global                         | Decomp |                 |            |
| _Not extreme_                 |             |        |                                |        |                 |            |
| 50¢ coin                      | 64          | 62     | 1.82                           | 1.41   | 0.41            | 4.07\*\*   |
| U.S. presidents               | 64          | 63     | 1.23                           | 1.35   | -0.12           | -1.55      |
| Argentine immigrants          | 65          | 54     | 4.89                           | 46.77  | -41.88          | -5.85\*\*  |
| Bank failures                 | 64          | 57     | 10.45                          | 19.50  | -9.03           | -1.69      |
| Median                        |             |        |                                |        | -4.58           |            |
| Combined experiments (z-test) |             |        |                                |        |                 | -2.49\*    |
|                               |             |        |                                |        |                 |            |
| _Extreme_                     |             |        |                                |        |                 |            |
| Area of U.S.                  | 30          | 30     | 33.88                          | 1.70   | 32.18           | 6.00\*\*   |
| Circulation of _TV Guide_     | 64          | 60     | 7.76                           | 10.96  | -3.20           | -1.11      |
| Athletic shoes                | 31          | 32     | 19.95                          | 15.85  | 4.10            | 0.47       |
| Auto accidents                | 31          | 30     | 93.33                          | 1.86   | 91.47           | 8.07\*\*   |
| Men's pants                   | 31          | 31     | 17.38                          | 10.00  | 7.38            | 1.01       |
| Bushels of wheat              | 61          | 62     | 45.71                          | 6.92   | 38.79           | 4.57\*\*   |
| Median                        |             |        |                                |        | 19.78           |            |
| Combined experiments (z-test) |             |        |                                |        |                 | 4.37\*\*   |

<sup>\*</sup>Significant at _p_ < 0.05

<sup>\*\*</sup>Significant at _p_ < 0.001
|
||
|
||
By contrast, accuracy for not extreme problems was reduced with decomposition. Error reduction values for three of the four not extreme problems were negative, indicating a superiority of global estimation over decomposition. Decomposition increased the median error for these problems by 458%, an increase that was statistically significant at _p_ < 0.05. The test for the not extreme values was two-tailed because we had no directional hypothesis. Our analysis overstates the statistical significance; the various estimates are not completely independent of one another.
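
The combined test for the not extreme problems can be sketched as follows. As we understand the method of adding _t_s summarized by Rosenthal (1978), the combined statistic is Z = Σt / √Σ[df / (df − 2)]. The code applies that formula to the four _t_-values in Table 4 and assumes, since the text does not say, that each test has df = n<sub>global</sub> + n<sub>decomp</sub> − 2. The function name `combined_z_adding_ts` is hypothetical.

```python
import math


def combined_z_adding_ts(t_values, dfs):
    """Combine independent t-tests by the method of adding ts.

    Z = sum(t) / sqrt(sum(df / (df - 2))); each df must exceed 2.
    """
    if any(df <= 2 for df in dfs):
        raise ValueError("each test needs more than 2 degrees of freedom")
    return sum(t_values) / math.sqrt(sum(df / (df - 2) for df in dfs))


# t-values and (global, decomp) sample sizes for the not extreme problems,
# taken from Table 4.
t_values = [4.07, -1.55, -5.85, -1.69]
sample_sizes = [(64, 62), (64, 63), (65, 54), (64, 57)]

# Assumption: independent two-sample t-tests, so df = n1 + n2 - 2.
dfs = [n_global + n_decomp - 2 for n_global, n_decomp in sample_sizes]

z = combined_z_adding_ts(t_values, dfs)
print(round(z, 2))
```

Under these assumptions the result rounds to the -2.49 reported in Table 4 for the not extreme problems.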
|
||
|
||
### 5.1. Uncertainty of estimation
|
||
|
||
Whether decomposition is appropriate depends on some measure of uncertainty.
We propose that analysts first determine whether the problem
is subject to much uncertainty.
If so, decomposition may be appropriate,
especially if one can structure the problem to avoid extreme uncertain values.
Otherwise, global estimates should be used.

Uncertainty is lower to the degree that estimates from various assessors
exhibit a lower variance or a reduced range.
|
||
Table 5 shows the interquartile ranges for the global and decomposed estimates.
|
||
The entries consist of the logs of Q1 and Q3 as well as their differences.
|
||
Q1 corresponds to the 25th percentile of the distribution, while Q3 corresponds to the 75th percentile.
|
||
If decomposition reduces uncertainty, then a lower Q3-Q1 difference should result.
|
||
Computed in this way, the differences in Table 5 can be interpreted
|
||
as the number of digits by which the estimates of Q1 and Q3 differed.
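
A minimal sketch of this measure is given below. It assumes base-10 logs and uses the default quartile convention of Python's `statistics.quantiles`; the paper does not state which quartile convention was used, so small differences from the published figures would be expected. The function name `log_iqr` and the sample estimates are hypothetical.

```python
import math
import statistics


def log_iqr(estimates):
    """Interquartile range of the base-10 logs of a set of estimates.

    A value of 1.0 means the estimates at Q3 and Q1 differ by one digit,
    that is, by a factor of 10.
    """
    logs = [math.log10(x) for x in estimates]
    q1, _median, q3 = statistics.quantiles(logs, n=4)
    return q3 - q1


# Hypothetical estimates spread over several orders of magnitude.
estimates = [10, 100, 1_000, 10_000, 100_000]
spread_in_digits = log_iqr(estimates)

# Reading a Table 5 entry: a difference of 2.30 (Area of U.S., global)
# corresponds to a Q3/Q1 ratio of 10 ** 2.30, roughly 200.
ratio_for_area_global = 10 ** 2.30

print(spread_in_digits)
print(round(ratio_for_area_global))
```

Working in logs makes the measure scale-free, so problems with answers in the tens and in the billions can be compared on the same footing.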
|
||
|
||
#### Table 5 Analysis of interquartile ranges

| Problems                | Global |        |                     | Decomposed |        |                     |
| ----------------------- | ------ | ------ | ------------------- | ---------- | ------ | ------------------- |
|                         | Log Q3 | Log Q1 | Differences (Q3-Q1) | Log Q3     | Log Q1 | Differences (Q3-Q1) |
| _Not extreme_           |        |        |                     |            |        |                     |
| 50¢ coin                | 0.48   | 0.20   | 0.28                | 0.63       | 0.42   | 0.21                |
| U.S. presidents         | 1.71   | 1.59   | 0.12                | 1.70       | 1.53   | 0.17                |
| Argentine immigrants    | 4.30   | 3.30   | 1.00                | 5.74       | 3.60   | 2.14                |
| Bank failures           | 3.70   | 2.08   | 1.62                | 4.92       | 3.18   | 1.74                |
|                         |        |        |                     |            |        |                     |
| _Extreme_               |        |        |                     |            |        |                     |
| Area of U.S.            | 6.30   | 4.00   | 2.30                | 6.76       | 6.39   | 0.37                |
| Circulation of TV Guide | 7.54   | 6.18   | 1.36                | 7.80       | 5.95   | 1.85                |
| Athletic shoes          | 8.00   | 6.00   | 2.00                | 8.84       | 7.75   | 1.09                |
| Auto accidents          | 6.18   | 5.00   | 1.18                | 7.80       | 6.08   | 1.72                |
| Men's pants             | 8.00   | 6.00   | 2.00                | 9.53       | 8.07   | 1.46                |
| Bushels of wheat        | 10.48  | 7.18   | 3.30                | 10.54      | 9.65   | 0.89                |

For not extreme problems, the interquartile ranges are higher for the decomposed estimates than the global estimates for three of the four problems. For one problem, _Argentine immigrants,_ the interquartile range for the decomposed version was higher than that for the global (2.14 versus 1.00). This occurred even though each part had the same interquartile range as the target value. This problem did not, then, meet the condition that the parts are easier to forecast than the target value, nor were the errors independent. Thus, it is not surprising that decomposition was not helpful for this problem.
|
||
|
||
For extreme problems, the range for the decomposed estimate was less than that for the global, except for the _Auto accidents_ and _Circulation of TV Guide_ problems. In other words, decomposition often improved confidence for difficult problems when the agreement among assessors' estimates was used to gauge confidence. Furthermore, the differences between the global and decomposed ranges for the four problems with improvements were substantial, being typically greater than one digit. Although the number of problems is not sufficient to assess the relationship between the interquartile ranges and errors, this result is consistent with that found in the seven problems examined by Aschenbrenner and Kasubek (1978).
|
||
|
||
A tenet of decomposition states that the parts of a problem are more tractable than the whole. This means that uncertainty in the estimates of a problem's components should be lower than that for the global estimate. We computed the interquartile ranges for each of the components of the six problems in Table 6. The parts were easier to estimate than the target value for three problems: _50¢ coin, U.S. presidents_ and _Bushels of wheat._ The first two of these had target values that were easy to assess directly, whereas _Bushels of wheat_ had an extreme value that was difficult to measure. The _Bushels of wheat_ problem met all conditions for decomposition. As expected, decomposition was successful for this problem. Conversely, decomposition was less accurate for four of the other five questions.
|
||
|
||
#### Table 6 Assessments of subjective confidence

| Problems                | Mean knowledge ratings<sup>a</sup> |               | Mean accuracy ratings<sup>a</sup> |               | Mean probability ratings that estimate is within 10% of true answer |               |
| ----------------------- | ---------------------------------- | ------------- | --------------------------------- | ------------- | ------------------------------------------------------------------- | ------------- |
|                         | Global                             | Decomposition | Global                            | Decomposition | Global                                                              | Decomposition |
| U.S. presidents         | 6.16                               | 5.46          | 6.02                              | 5.38          | 64.4                                                                | 35.2          |
| 50¢ coin                | 5.50                               | 4.45          | 5.61                              | 4.45          | 54.9                                                                | 55.8          |
| Circulation of TV Guide | 3.40                               | 2.35          | 3.46                              | 2.58          | 32.1                                                                | 24.6          |
| Bank failures           | 3.19                               | 2.17          | 3.06                              | 2.20          | 28.0                                                                | 18.9          |
| Argentine immigrants    | 2.15                               | 1.81          | 2.38                              | 2.32          | 24.4                                                                | 16.8          |
| Bushels of wheat        | 2.24                               | 2.02          | 2.16                              | 2.27          | 18.9                                                                | 19.9          |

<sup>a</sup> High scores imply greater knowledge and greater perceived accuracy (scale from 1 to 10).
|
||
|
||
### 5.2. Subjective confidence ratings
|
||
|
||
A second source of uncertainty estimates is the subjective confidence
|
||
that forecasters have in their knowledge about a problem.
|
||
We addressed three questions with respect to subjective uncertainty.
|
||
(1) Do alternative measures of uncertainty yield similar recommendations?
|
||
If yes, then we could use the least expensive approach to assessing uncertainty.
|
||
(2) Are judges more confident when they make decomposed estimates or global estimates?
|
||
(3) Does decomposition lead subjects to become better calibrated about their confidence?
|
||
|
||
As the simplest and least expensive approach, we asked subjects to provide judgments of their knowledge about each target value, and the degree to which they thought their estimate would be accurate. Self-ratings of knowledge and accuracy were obtained from the subjects before they made their estimates by using the following scales.
|
||
|
||
> "Before you begin, indicate on the scale below _how much you think you know about the topic_"
|
||
>
|
||
> (1= know very little; 10 = know a great deal).
|
||
>
|
||
> "How _accurately_ do you think you will be able to estimate this quantity?"
|
||
>
|
||
> (1 = low accuracy; 10 = high accuracy).
|
||
|
||
Judgments were obtained for a subset of six problems.
|
||
Table 6 shows alternative assessments of accuracy for these problems.
|
||
|
||
After subjects had estimated the value for each of the six problems,
we asked them to indicate the probability that their estimate was within 10% of the correct answer.
These results are also presented in Table 6.
Finally, we calculated the interquartile ranges of the global estimate for each problem,
shown in the last column of Table 6.

With the exception of the interquartile range,
the different approaches to subjective confidence produced similar results.
The intercorrelations among the three measures across the six problems were all over 0.99.
Given the close correspondence among the three measures,
they were expected to be of roughly equal value in deciding when to use decomposition.

We applied the same procedures to subjects who received the decomposed versions of the problems.
Across all six problems, subjects had higher self-ratings of problem knowledge
in the global condition than in the decomposition condition.
Because subjects in the decomposition condition received more than one estimation problem,
their self-ratings of problem knowledge
may have been influenced by the difficulties they experienced with the complexity of the problem.
Self-ratings of accuracy were likewise higher in the global condition,
except for the _Bushels of wheat_ problem.
Similar results were obtained when we asked the questions about confidence
after subjects had completed their estimates.
In other words, the different assessments
each led to the conclusion that subjects in the decomposition condition
thought that the problems were more difficult than did subjects in the global estimation condition.
These results agree with the findings of Sniezek et al. (1990),
who had concluded that the increased processing
(for decomposed problems) leads to a reduction in confidence.
In retrospect, it might have been better for us
to have asked for estimates of the difficulty for each of the parts.
Henrion et al. (1993) did this, and their subjects reported
that the _components_ were easier to estimate than the global value.

Are subjects better calibrated when they use decomposition? Probability assessments are said to be externally calibrated if, for a given probability assessment (e.g., 0.6), exactly that proportion (e.g., 60%) turn out to be correct. We summarized the calibration results for global and decomposed estimates, across all ten problems. Mean probability assessments were generally higher than the proportion correct for both approaches, indicating overconfidence. On average, those making global estimates expected 38.9% of their answers to be within 10% of the true value, but only 10.9% were that accurate. Those using the decomposed approach expected 32.6% of their estimates to be within 10% of the true value, but only 9.0% were that accurate. In effect, decomposition reduced overconfidence from 28.0 percentage points in the global case to 23.6 percentage points, with the largest reduction occurring in those situations where subjects felt most confident, as shown in Fig. 1.

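
The overconfidence figures follow directly from the percentages reported above: the mean stated probability minus the proportion of estimates that actually fell within 10% of the true value. The sketch below reproduces that arithmetic and adds a minimal calibration table; the function names and the sample assessments are illustrative assumptions, not part of the study's materials.

```python
from collections import defaultdict


def overconfidence(mean_stated_pct, hit_rate_pct):
    """Overconfidence in percentage points: stated probability minus hit rate."""
    return mean_stated_pct - hit_rate_pct


def calibration_table(assessments):
    """Map each stated probability to the observed proportion correct.

    `assessments` is an iterable of (stated_probability, was_correct) pairs.
    Perfect external calibration means every key equals its value.
    """
    buckets = defaultdict(list)
    for stated, was_correct in assessments:
        buckets[stated].append(was_correct)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


# Percentages reported in the text.
global_gap = overconfidence(38.9, 10.9)
decomposed_gap = overconfidence(32.6, 9.0)
print(f"global: {global_gap:.1f}, decomposed: {decomposed_gap:.1f}")

# Hypothetical assessments, for illustration only.
sample = [(0.6, True), (0.6, False), (0.2, False)]
print(calibration_table(sample))
```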
### 5.3. Limitations
Two of the four problems in the not extreme version
(_Argentine immigrants_ and _Bank failures_)
involved elements with extreme values.
Because each of the components had an element dealing with the U.S. population,
we assumed that the subjects would be familiar with these values.
To examine this assumption, we analyzed the population estimates for each of the problems.
The median population estimate for the _Argentine immigrants_ problem
was in error by a factor of 1.97 from the actual,
while for _Bank failures_ it was in error by a factor of 1.42.
For both problems, errors for the U.S. population component
were less than errors for the global quantities.
Nevertheless, we were surprised at the difficulty individuals had with estimating this value.
In practical problems, of course, one could simply use the actual value.
In their study of decomposition, Henrion et al. (1993) gave the U.S. population value to the subjects.

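
An "error by a factor of" figure can be read as a symmetric ratio between the estimate and the actual value. The sketch below assumes that reading, taking the larger of estimate/actual and actual/estimate so that over- and underestimates are treated alike. The function name and the sample numbers are hypothetical and are not taken from the study.

```python
def error_factor(estimate: float, actual: float) -> float:
    """Symmetric error factor: always >= 1, where 1 means a perfect estimate."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("estimate and actual must both be positive")
    return max(estimate / actual, actual / estimate)


# Hypothetical values: doubling and halving give the same factor.
print(error_factor(500.0, 250.0))
print(error_factor(125.0, 250.0))
```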
> ##### Fig. 1. Calibration of probability assessments that estimated answer is within 10% of true answer.
>
> %% ![[figure_1.jpeg]] %%

The issue of 'how extreme is extreme' has not been resolved.
We proposed a definition based on the number of digits (six or seven),
but we did not examine alternatives.
Nor did we resolve the issue of how to specify the unit of measure.

We expect that other conditions might affect decisions on when to use decomposition.
For example, question type may have some importance.
We do not know the extent to which our problem selection may have affected findings.

## 6. Discussion
Despite the improved accuracy it afforded,
decomposition did not increase subjects' confidence in the accuracy of their estimates.
However, the interquartile ranges were smaller for the decomposed estimates
and confidence in the accuracy of estimates was slightly more appropriate.

Perceived uncertainty measures are easy to obtain.
As shown in Table 6, self-assessments of uncertainty
provided similar rankings of the relative uncertainty for the problems.
The interquartile ranges provided somewhat different information than the self-assessments.
Interquartile ranges of the estimates are not expensive, but they do require a pretest.

The present study addresses the issue of whether estimates by individuals
can be improved when no other data are available.
However, we expect that other situational characteristics
or estimation-aiding strategies
would also affect the usefulness of decomposition.
For example, a forecaster could decompose a problem
to use different sources of information or different experts.
For some parts of the problem, known values may exist.
Alternative decomposition methods could be used to produce an estimate,
and resulting values for a quantity could be resolved in light of one another.
MacGregor and Lichtenstein (1991) attempted such an approach
and found that subjects tended to resolve estimates by applying an averaging model.
Revised estimates generally fell between two estimates of a target quantity,
where each judgmental estimate was produced by a different method.

Although our approach to decomposition was harmful for problems
that did not involve extreme and uncertain numbers,
there might be alternative approaches that are successful.
For example, decomposition might restructure a problem
so that it is easier for subjects to think about.

Decomposition tended to reduce estimators' confidence levels,
perhaps because of the increased processing involved.
This reduction in overconfidence and the improvements in accuracy
produced modest gains in calibration.

## 7. Conclusions

The theory behind decomposition is simple.
What is difficult is how to translate the theory into operational terms.
We examined some operational procedures
for identifying conditions under which decomposition should improve accuracy.

Extreme uncertain values are difficult for subjects to estimate.
We hypothesized that decomposition to remove extreme values would improve estimation accuracy.
This study examined nine "extreme value-high uncertainty" problems from two prior studies.
Decomposition proved useful for each of these nine problems,
and the typical gain in accuracy was substantial
(error ratio was reduced by 96.3 for the study with six problems,
and by 12.3 for the study with three problems).
In the present study, involving six problems with extreme values,
the error ratio was reduced by a factor of almost 20.[^6]
Decomposition failed for one extreme problem
because it was not successful in producing more accurate estimates of the parts.

[^6]: The results from Hora et al. (1993) also are consistent with our hypothesis.
    They found that decomposition was more accurate than global estimates
    for three quantities whose true values had at least eight digits
    (e.g., What were the sales for Long's Drug Stores in Hawaii in 1986?).

Decomposition was risky for problems
that did not involve extreme and uncertain values.
For six such problems from two prior studies,
decomposition had little overall effect on accuracy.
However, for four such problems in the current study,
decomposition yielded less accurate estimates
by an average error ratio of 458%.

Based on the limited evidence to date,
we suggest the following procedure for judgmental decomposition.
First, assess whether the target value is subject to much uncertainty
by using either a knowledge rating or an accuracy rating.
If the problem is an important one, obtain interquartile ranges.
For those items rated above the midpoint on uncertainty
(or above 10 on the interquartile range),
conduct a pretest with 20 subjects
to determine whether the target quantity is likely to be extreme.
If the upper quartile geometric mean has seven or more digits,
decomposition should be considered.
For these problems, compare the interquartile ranges for the target value
against those for the components and for the recomposed value.
If the ranges are less for the global approach, use the global approach.
Otherwise use decomposition.

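
The procedure above can be sketched as a short decision rule. This is one possible reading under stated assumptions, not a definitive implementation: the `PretestSummary` fields, the function names, and the 5.5 midpoint of a 1-to-10 scale are hypothetical; the uncertainty rating is assumed to be oriented so that higher means more uncertain; and "ranges are less for the global approach" is interpreted as the global range being smaller than every component range and the recomposed range.

```python
from dataclasses import dataclass


@dataclass
class PretestSummary:
    uncertainty_rating: float       # 1-10; assumed: higher = more uncertain
    upper_quartile_geo_mean: float  # from the 20-subject pretest
    global_iqr: float               # interquartile range, global estimates
    recomposed_iqr: float           # interquartile range, recomposed estimates
    component_iqrs: list            # interquartile ranges, one per component


def digit_count(value: float) -> int:
    """Number of digits in the integer part of a positive value."""
    return len(str(int(abs(value))))


def recommend(p: PretestSummary, scale_midpoint: float = 5.5,
              iqr_threshold: float = 10.0, min_digits: int = 7) -> str:
    """Return 'decomposition' or 'global' following the steps in the text."""
    # Step 1: is the target value subject to much uncertainty?
    uncertain = (p.uncertainty_rating > scale_midpoint
                 or p.global_iqr > iqr_threshold)
    if not uncertain:
        return "global"
    # Step 2: is the target quantity likely to be extreme?
    if digit_count(p.upper_quartile_geo_mean) < min_digits:
        return "global"
    # Step 3: compare ranges; keep the global approach only if it is tighter.
    others = [p.recomposed_iqr, *p.component_iqrs]
    if all(p.global_iqr < r for r in others):
        return "global"
    return "decomposition"


# Hypothetical pretest: uncertain, eight-digit quantity, wide global range.
example = PretestSummary(8.0, 25_000_000, 40.0, 12.0, [3.0, 5.0])
print(recommend(example))
```

The thresholds are exposed as parameters because the text notes that the definition of "extreme" has not been settled.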
The current study suggests that decomposition has more limited value than previously thought.
It improved accuracy only when the situation involved uncertain and extreme quantities.
Furthermore, decomposed elements needed to be easier to estimate than the global.
For problems that did not concern extreme values with high uncertainty
or where estimates of the parts were not more accurate than that of the target value,
decomposition produced less accurate estimates.

## Acknowledgements
This research was supported in part by the National Science Foundation under Contract SES-9013069 to Decision Research. Fred Collopy, George Loewenstein, Robin Hogarth and unidentified referees provided helpful comments on early drafts. Jennifer L. Armstrong, Suzanne Berman, Gina Bloom, Vanessa Lacoss, Phan Lam and Leisha Mullican provided editorial assistance.
## References
* Armstrong, J.S., W.B. Denniston and M.M. Gordon (1975),
  "The use of the decomposition principle in making judgments,"
  _Organizational Behavior and Human Performance_, 14, 257-263.

* Aschenbrenner, K.M. and W. Kasubek (1978),
  "Challenging the Cushing syndrome: Multiattribute evaluation of cortisone drugs,"
  _Organizational Behavior and Human Performance_, 22, 216-234.

* Henrion, M., G.W. Fischer and T. Mullin (1993),
  "Divide and conquer? Effects of decomposition on the accuracy and calibration of subjective probability distributions,"
  _Organizational Behavior and Human Decision Processes_, 55, 207-227.

* Hertzberg, H. (1970), _One Million_. Simon and Schuster, New York.

* Hora, S.C., N.G. Dodd and J.A. Hora (1993),
  "The use of decomposition in probability assessments of continuous variables,"
  _Journal of Behavioral Decision Making_, 6, 133-147.

* MacGregor, D.G. and S. Lichtenstein (1991),
  "Problem structuring aids for quantitative estimation,"
  _Journal of Behavioral Decision Making_, 4, 101-116.

* MacGregor, D.G., S. Lichtenstein and P. Slovic (1988),
  "Structuring knowledge retrieval: An analysis of decomposed quantitative judgments,"
  _Organizational Behavior and Human Decision Processes_, 42, 303-323.

* Raiffa, H. (1968), _Decision Analysis_. Princeton University Press, Princeton, New Jersey.

* Rosenthal, R. (1978), "Combining results of independent studies," _Psychological Bulletin_, 85, 185-193.

* Sniezek, J.A., P.W. Paese and F.S. Switzer, III (1990),
  "The effect of choosing on confidence in choice,"
  _Organizational Behavior and Human Decision Processes_, 46, 264-282.