---
id:
aliases: []
tags:
- destiny/permanent
- status/incomplete
- topic/estimating
- topic/risk
- type/encyclopedia
- authorship/original
title: Estimator Calibration
---

# Estimator Calibration

Calibration is the process of learning to compensate for one's biases
in order to produce more accurate estimates.

Generally speaking, people tend to underestimate [[risk]]
and tend to be overconfident in their estimates.

> [!note] Confidence
> **Confidence** is an estimate of the accuracy of another estimate.
> To be "overconfident" is to consistently rate one's confidence
> above the observed accuracy of one's estimates.

An estimator who is well calibrated
can properly account for such bias.

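The definition of overconfidence above can be turned into a simple measurement.
The following is a minimal sketch, not a standard tool:
the function name `overconfidence_gap` and the sample data are hypothetical,
and it assumes each answer is recorded as a stated confidence between 0 and 1 plus whether it turned out correct.

```python
from statistics import mean


def overconfidence_gap(results):
    """Return mean stated confidence minus observed accuracy.

    `results` is a list of (stated_confidence, was_correct) pairs.
    A positive gap suggests overconfidence, a negative gap underconfidence.
    """
    if not results:
        raise ValueError("need at least one answered question")
    stated = mean(confidence for confidence, _ in results)
    observed = mean(1.0 if correct else 0.0 for _, correct in results)
    return stated - observed


# Hypothetical round: ten answers, each rated 90% confident, seven correct.
round_one = [(0.9, True)] * 7 + [(0.9, False)] * 3
gap = overconfidence_gap(round_one)
print(f"gap = {gap:+.2f}")  # gap = +0.20
```

A gap near zero over many questions is what a well-calibrated estimator would be expected to show.
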
## Calibration Questions

Calibration is generally achieved
by having the estimator make many estimates,
then immediately observe their results.

This is repeated in rounds of question and review
until the desired results are achieved.

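One possible way to run the review step of a round is to group answers by stated confidence
and compare each group with its observed hit rate.
The sketch below is illustrative only:
`calibration_table` and the sample answers are hypothetical,
and rounding confidence to one decimal place is an assumption made to keep the groups coarse.

```python
from collections import defaultdict


def calibration_table(results):
    """Group (stated_confidence, was_correct) pairs by stated confidence.

    Returns {confidence: (answer_count, observed_accuracy)},
    ordered from lowest to highest stated confidence.
    """
    buckets = defaultdict(list)
    for confidence, correct in results:
        buckets[round(confidence, 1)].append(bool(correct))
    return {
        confidence: (len(hits), sum(hits) / len(hits))
        for confidence, hits in sorted(buckets.items())
    }


# Hypothetical answers from one round of true/false questions.
answers = [
    (0.6, True), (0.6, False),
    (0.8, True), (0.8, True), (0.8, False),
    (1.0, True), (1.0, False),
]
table = calibration_table(answers)
for confidence, (count, accuracy) in table.items():
    print(f"stated {confidence:.0%}: {count} answers, {accuracy:.0%} correct")
```

In a table like this, a row whose observed accuracy sits well below its stated confidence shows where the estimator should adjust in the next round.
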
### Writing Good Calibration Questions

#### Ideal Difficulty

Calibration requires that the estimator have some confidence,
but not total certainty, in their response.

> [!failure] Bad
> T/F: When rolling 2 dice, a roll of 7 is more likely than a 3.

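The dice question can be settled exactly by enumeration,
which is presumably why it is flagged: anyone who knows the odds answers with total certainty.
The sketch below assumes two fair six-sided dice, which the question does not state;
`probability_of_sum` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def probability_of_sum(target, sides=6):
    """Exact probability that two fair dice with `sides` faces sum to `target`."""
    rolls = list(product(range(1, sides + 1), repeat=2))
    favorable = sum(1 for first, second in rolls if first + second == target)
    return Fraction(favorable, len(rolls))


p7 = probability_of_sum(7)  # six of the 36 rolls sum to 7
p3 = probability_of_sum(3)  # only (1, 2) and (2, 1) sum to 3
print(p7, p3)  # 1/6 1/18
```
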
#### No "Trick" Questions

Questions should be unambiguously verifiable.

> [!failure] Bad
>
> > T/F: Any male pig is referred to as a hog.
>
> Referred to by whom?

> [!failure] Bad
>
> > T/F: In English, the word "quality" is more frequently used than the word "speed".
>
> Used more frequently where?

> [!success] Good
>
> > T/F: Pakistan shares a border with Russia.

> [!tip]
> Definitions, terminology, and language are _always_ contentious;
> questions based on them always feel deceptive.

#### Phrasing

Interval "questions" should describe the quantity
rather than phrase it as a question.

> [!failure] Bad
> Q: How many gold medals did Jesse Owens win at the 1936 Berlin Olympics?

> [!success] Good
> Q: Number of gold medals won by Jesse Owens in the 1936 Berlin Olympics

### Strategy for Answering Calibration Questions

Confidence should never be less than the probability of picking correctly at random
(50% for true/false questions).
If confidence in one answer falls below 50%, the opposite answer is the better choice.

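The 50% floor for true/false questions can be expressed as a small normalization step:
an answer held with less than 50% confidence is equivalent to the opposite answer held with the complementary confidence.
This is a sketch under that assumption; `normalize_answer` is a hypothetical helper, not part of any established tool.

```python
def normalize_answer(answer, confidence):
    """Return an equivalent (answer, confidence) pair with confidence >= 0.5.

    `answer` is the True/False response and `confidence` is a probability
    between 0 and 1 that the response is correct.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < 0.5:
        # Being 25% sure of "true" is the same as being 75% sure of "false".
        return (not answer, 1.0 - confidence)
    return (answer, confidence)


print(normalize_answer(True, 0.25))  # (False, 0.75)
print(normalize_answer(False, 0.8))  # (False, 0.8)
```
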
### Examples

#### Boolean

> The melting point of tin is higher than the melting point of aluminum.

> California's giant sequoia trees are named for an early 19th century leader of the Cherokee Indians.

reductive

> The Model T was the first car produced by Henry Ford.

reductive (Henry Ford didn't produce cars)

> No one has ever been reported to have been hit by any object that fell from space.

reductive (reported by whom?)

> Sir Christopher Wren was a British anthropologist.

> Pakistan does not border Russia.

unnecessary negative form, otherwise good.

> The Navy won the first Army-Navy football game.

should specify the official event name, otherwise good.

> The paperback version of the book "The Da Vinci Code", as of July 2007, still ranks in the top 500 bestselling books on Amazon.

obtuse phrasing, dated topic, otherwise good

> Italian has more words than any other language.

reductive (what is a word? what dialect?)

> The month of August is named after a Greek god.

borderline facile, reductive

> The deepest ocean trench is deeper than the Grand Canyon.

facile

> Abraham Lincoln was the first president born in a log cabin.

deceptive phrasing

> As of July of 2007, more people search Google for "Harry Potter" than "Hillary Clinton" (according to Google Trends).

obtuse phrasing, dated topic, otherwise good

> The population of Alabama is higher than the population of Arizona.

borderline facile, deceptive phrasing

> No category 5 hurricane hit the US in 2004.

> The UK is among the top 10 largest economies in the world (by GDP).

> The movie Forrest Gump has grossed more to date than E.T. the Extra-Terrestrial.

obtuse phrasing, dated topic, otherwise good

#### Interval

> What percentage of bronze is typically made of copper?

reductive

> How many countries have at least one McDonald's?

As of when?

> How many employees did eBay have in the first quarter of 2006?

> What was the population of Miami (within the city limits, not the entire metropolitan area) in 1990?

> How many casualties did the French suffer in the Battle of Waterloo?

> What is the range in miles of a Minuteman Missile?

> What percentage of IT jobs in the US were unfilled in 1997?

> The Supremes' (with Diana Ross) song "Stop! In the Name of Love" was how long? (minutes, seconds)

> How many undergraduates attended Cambridge in 1990?

> If you could jump 50 feet straight up into the air, how many seconds would you be airborne before you landed?

> How many gallons are in a bushel (they are both measures of volume)?

> How many sovereign rulers has England had in the last thousand years?

> If the air temperature was 5 degrees below zero (Fahrenheit) and the wind speed was 15 mph, what would the temperature adjusted for wind-chill be?

> Average cost of testing in software development is what percentage of total project costs?

> On average, if a software development project was projected to take 17 months, it actually takes how many months?

> How many meters tall is the Sears Tower?

> How many gold medals did Jesse Owens win at the 1936 Berlin Olympics?

> In 2005, the average combined MPG for all US cars and light trucks on the road was how much?

> The average house in the United States uses how many gallons of water per day?

> What was the average price in the United States of a house sold in 2001?