---
tags:
- topic/estimating
- topic/risk
- type/encyclopedia
title: Estimator Calibration
---

# Estimator Calibration

Calibration is the process of learning to compensate for one's biases
in order to produce more accurate estimates.

Generally speaking, people tend to underestimate [[risk]]
and tend to be overconfident in their estimates.

> [!note] Confidence
> **Confidence** is an estimate of the accuracy of another estimate.
> To be "overconfident" is to consistently rate one's confidence
> above the observed accuracy of one's estimates.

A well-calibrated estimator
can properly account for such bias.
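
The definition of overconfidence can be made concrete
by comparing mean stated confidence with the fraction of answers that were correct.
The following is a minimal sketch, not an established tool:
the function name `confidence_gap` and the sample data are hypothetical.

```python
from statistics import mean


def confidence_gap(responses):
    """Return mean stated confidence minus observed accuracy.

    `responses` is a list of (confidence, was_correct) pairs, where
    confidence is a probability in [0, 1]. A positive result suggests
    overconfidence; a negative one suggests underconfidence.
    """
    if not responses:
        raise ValueError("at least one response is required")
    stated = mean(conf for conf, _ in responses)
    observed = mean(1.0 if correct else 0.0 for _, correct in responses)
    return stated - observed


# Hypothetical data: four answers at 90% confidence, half of them correct.
sample = [(0.9, True), (0.9, False), (0.9, True), (0.9, False)]
gap = confidence_gap(sample)  # positive, so this estimator looks overconfident
```

A gap near zero over many answers is what this sketch treats as "well calibrated".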

## Calibration Questions

Calibration is generally achieved
by having the estimator make many estimates,
then immediately observe the results.

This is repeated in rounds of question and review
until the estimator's confidence matches their observed accuracy.
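
The rounds of question and review could be tracked as sketched below.
This rests on assumptions: each round is a list of `(confidence, was_correct)` pairs,
and the source does not say how close is close enough,
so the `tolerance` parameter and its default are invented for illustration.

```python
def first_calibrated_round(rounds, tolerance=0.1):
    """Return the 1-based number of the first round whose gap between
    mean stated confidence and observed accuracy is within `tolerance`,
    or None if no round qualifies."""
    for number, responses in enumerate(rounds, start=1):
        if not responses:
            continue  # an empty round carries no information
        stated = sum(conf for conf, _ in responses) / len(responses)
        observed = sum(1 for _, correct in responses if correct) / len(responses)
        if abs(stated - observed) <= tolerance:
            return number
    return None


# Hypothetical data: the estimator lowers their confidence after reviewing round 1.
rounds = [
    [(0.9, True), (0.9, False), (0.9, False), (0.9, True)],  # 90% stated, 50% correct
    [(0.7, True), (0.7, True), (0.7, False), (0.7, True)],   # 70% stated, 75% correct
]
result = first_calibrated_round(rounds)
```

With this data the first round is far outside the tolerance and the second is within it.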

### Writing Good Calibration Questions

#### Ideal Difficulty

Calibration requires that the estimator have some confidence,
but not total certainty, in their response.

> [!failure] Bad
> T/F: When rolling 2 dice, a roll of 7 is more likely than a 3.
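
The dice question is presumably a poor fit
because its answer can be worked out exactly,
which leaves no room for partial confidence.
A short enumeration, offered only as an illustration, shows the two probabilities:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def probability_of_sum(target):
    """Exact probability that the two dice add up to `target`."""
    hits = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(hits, len(outcomes))


p_seven = probability_of_sum(7)  # 6 of the 36 outcomes
p_three = probability_of_sum(3)  # 2 of the 36 outcomes
```

Anyone who can do this arithmetic would answer with 100% confidence,
so the question gives no information about their calibration.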

#### No "Trick" Questions

Questions should be unambiguously verifiable.

> [!failure] Bad
>
> > T/F: Any male pig is referred to as a hog.
>
> Referred to by whom?

> [!failure] Bad
>
> > T/F: In English, the word "quality" is more frequently used than the word "speed".
>
> Used more frequently where?

> [!success] Good
>
> > T/F: Pakistan shares a border with Russia.

> [!tip]
> Definitions, terminology, and language are _always_ contentious;
> questions based on them always feel deceptive.

#### Phrasing

Interval "questions" should describe the quantity
rather than phrase it as a question.

> [!failure] Bad
> Q: How many gold medals did Jesse Owens win at the 1936 Berlin Olympics?

> [!success] Good
> Q: Number of gold medals won by Jesse Owens in the 1936 Berlin Olympics
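
### Strategy for Answering Calibration Questions

Confidence should never be less than the probability of answering correctly by picking at random
(50% for a true/false question).
An estimator who is less than 50% confident that a statement is true
should answer "false" instead.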
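
For true/false questions, this rule amounts to flipping the answer
whenever the stated confidence drops below chance.
A minimal sketch, using a hypothetical helper named `normalize_answer`:

```python
def normalize_answer(answer, confidence):
    """Return an (answer, confidence) pair with confidence >= 0.5.

    For a true/false question, being 30% confident that a statement is
    true is the same as being 70% confident that it is false.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < 0.5:
        return (not answer, 1.0 - confidence)
    return (answer, confidence)


flipped = normalize_answer(True, 0.3)     # becomes "false" at roughly 70%
unchanged = normalize_answer(False, 0.8)  # already at or above chance
```

The same idea generalizes to multiple-choice questions,
where the floor would be one divided by the number of options.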

### Examples

#### Boolean

> The melting point of tin is higher than the melting point of aluminum.

> California's giant sequoia trees are named for an early 19th century leader of the Cherokee Indians.

reductive

> The Model T was the first car produced by Henry Ford.

reductive (Henry Ford didn't produce cars)

> No one has ever been reported to have been hit by any object that fell from space.

reductive (reported by whom?)

> Sir Christopher Wren was a British anthropologist.

> Pakistan does not border Russia.

unnecessary negative form, otherwise good.

> The Navy won the first Army-Navy football game.

perfect.

> The paperback version of the book "The Da Vinci Code",
> as of July 2007, still ranks in the top 500 bestselling books on Amazon.

obtuse phrasing, dated topic, otherwise good

> Italian has more words than any other language.

reductive (what is a word? what dialect?)

> The month of August is named after a Greek god.

borderline facile, reductive

> The deepest ocean trench is deeper than the Grand Canyon.

facile

> Abraham Lincoln was the first president born in a log cabin.

deceptive phrasing

> As of July of 2007, more people search Google for "Harry Potter" than "Hillary Clinton"
> (according to GoogleTrends).

obtuse phrasing, dated topic, otherwise good

> The population of Alabama is higher than the population of Arizona.

borderline facile, deceptive phrasing

> No category 5 hurricane hit the US in 2004.

> The UK is among the top 10 largest economies in the world (by GDP).

> The movie Forrest Gump has grossed more to date than E.T. the Extra-Terrestrial.

obtuse phrasing, dated topic, otherwise good

#### Interval

> What percentage of bronze is typically made of copper?

As written, the correct answer is 100%.
All bronze alloys contain copper.
Being more generous,
there is no standard composition of bronze.
The subject can only guess at the interrogator's intent.
Average by weight produced?
Over what time frame?
In the U.S. or globally?

> How many countries have at least one McDonald's?

As of when?

> How many employees did eBay have in the first quarter of 2006?

> What was the population of Miami (within the city limits, not the entire metropolitan area) in 1990?

> How many casualties did the French suffer in the Battle of Waterloo?

> What is the range in miles of a Minuteman Missile?

> What percentage of IT jobs in the US were unfilled in 1997?

> The Supremes' (with Diana Ross) song "Stop! In the Name of Love" was how long? (minutes, seconds)

> How many undergraduates attended Cambridge in 1990?

> If you could jump 50 feet straight up into the air, how many seconds would you be airborne before you landed?

> How many gallons are in a bushel (they are both measures of volume)?

I wonder if Hubbard had the same thought I did
while reading [[macgregor_1994_judgmental-decomposition|MacGregor et al. (1994)]].

> How many sovereign rulers has England had in the last thousand years?

> If the air temperature was 5 degrees below zero (Fahrenheit) and the wind speed was 15 mph, what would the temperature adjusted for wind-chill be?

> Average cost of testing in software development is what percentage of total project costs?

> On average, if a software development project was projected to take 17 months, it actually takes how many months?

> How many meters tall is the Sears Tower?

> How many gold medals did Jesse Owens win at the 1936 Berlin Olympics?

> In 2005, the average combined MPG for all US cars and light trucks on the road was how much?

> The average house in the United States uses how many gallons of water per day?

> What was the average price in the United States of a house sold in 2001?
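
Interval questions like those above are commonly answered
with a lower and an upper bound given at a stated confidence level.
The source does not specify a level, so the 90% target below is an assumption,
as are the function name `interval_hit_rate` and the sample figures,
which are invented and are not answers to any question listed here.

```python
def interval_hit_rate(answers):
    """Fraction of intervals that contain the true value.

    `answers` is a list of (lower, upper, actual) tuples.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    hits = sum(1 for lower, upper, actual in answers if lower <= actual <= upper)
    return hits / len(answers)


TARGET = 0.9  # assumed confidence level for every interval

# Invented bounds and outcomes, for illustration only.
answers = [
    (10, 50, 42),
    (100, 200, 250),  # miss: actual is above the upper bound
    (1, 5, 4),
    (0, 10, 12),      # miss: actual is above the upper bound
]
rate = interval_hit_rate(answers)
overconfident = rate < TARGET  # intervals were too narrow for the stated level
```

A hit rate well below the target suggests the estimator should widen their intervals;
a hit rate well above it suggests the intervals are wider than necessary.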