The Crowd Within ctwardy | blog | SARBayes


According to Vul & Pashler, the "Wisdom of Crowds" happens within a single person. (Updated June 30, 2007 -- my guess wasn't as good as it first appeared.)

The Results

We know that averaging guesses from different people yields better estimates. The effect is called the Wisdom of Crowds. Surowiecki has a recent book on it.

Curiously, according to the paper described above, the effect holds when averaging your own guesses. That is, when answering numerical questions like, "What percentage of the world's airports are in the USA?" you do better by guessing twice and averaging.

I don't know what percentage of airports are in the USA. I know that I'm uncertain, so I know I have a distribution over probable values. If I "knew" that distribution, even unconsciously, then my "best guess" should be the mean, and averaging two guesses shouldn't help. But apparently it does. The authors suggest this means that my "best guess" is really a random draw from some internal "distribution".

How much does it help? If I make my second guess right away, averaging does about 6% better than either guess, or 1/10th as much as averaging with someone else's opinion. If I wait three weeks, it does about 16% better, or about 1/3 as much as averaging an outside opinion.
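To make the "random draw from an internal distribution" idea concrete, here is a minimal sketch. It is my own toy model, not anything from the paper: each guess is the truth plus a fixed personal bias plus independent Gaussian noise. The names `simulate`, `bias`, and `noise_sd`, and all the numbers, are assumptions chosen for illustration.

```python
import random

def simulate(truth=50.0, bias=10.0, noise_sd=15.0, trials=100_000, seed=0):
    """Mean squared error of a single guess vs. the average of two guesses.

    Assumed toy model: guess = truth + bias + Gaussian noise, with the two
    noise terms independent. Guesses are not clipped to [0, 100].
    """
    rng = random.Random(seed)
    se_single = 0.0
    se_avg = 0.0
    for _ in range(trials):
        g1 = truth + bias + rng.gauss(0, noise_sd)
        g2 = truth + bias + rng.gauss(0, noise_sd)
        se_single += (g1 - truth) ** 2
        se_avg += ((g1 + g2) / 2 - truth) ** 2
    return se_single / trials, se_avg / trials

mse_single, mse_avg = simulate()
print(f"single guess MSE:   {mse_single:.1f}")
print(f"averaged guess MSE: {mse_avg:.1f}")
```

In this model averaging halves the noise part of the error and leaves the bias untouched. The independence assumption is generous: the much smaller gains reported above suggest that two guesses from the same head are correlated, and less so after a three-week delay.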

Can we do better?

So just averaging two of my own guesses can improve my estimates. It helps to get at the uncertainty I intuitively know is there. But can we do better?

Njh asks whether we can use the shape of the distribution to improve our estimate of the mean. For example, could we fit a curve to our two points, and do better than averaging? We can estimate the shape on this task from Vul & Pashler's data. (Well, they could.) Is it robust?

What about asking about the uncertainty directly? Do I have any direct access to my internal distribution? Forcing myself to guess the percentage of airports in the USA, the first number that pops into mind is 50%. But I'm going to try to think through it. Hmmm... I know with 100% confidence it's in [0..100]%. And I know that the aviation industry is US-centric, for historical and economic reasons. Therefore I know the U.S. will have more than "its share" of airports. I'm moderately confident that the answer is at least 50%. To be more cautious, say at least 40%. Surely it is less than 90%, at least for airports used by airlines. So I'd say [40..90]%. Beyond that, I don't know. So I'll guess the midpoint, 65%.

Intuitively, I feel I'm at least 95% sure, but I know that when people say this, they only bracket the true value about 70% of the time. So let's say it's a 70% credible interval. That means there's about a 1 in 3 chance I've missed it. There's 10% above and 40% below, so probably if I've missed it, the real value is below 40%. How could that be? Maybe the aviation industry isn't "USA-centric". Maybe that's just my "home team" bias. The USA surely has proportionally more airports than smaller and train-savvy Europe, highly urban Australia, or the vast but economically challenged former Soviet Union. But maybe not to the extent I thought. So, maybe 30%. That feels too low, but not impossible.

Can I average these? They're not independent; they're negatively correlated. But that should make it more likely I've bracketed the true value, so averaging will help.
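Here is the standard identity behind that intuition, under an idealization that is mine and not the paper's: treat the two guesses $g_1, g_2$ as draws with a common mean $\mu$, common variance $\sigma^2$, and correlation $\rho$, and let $t$ be the true value.

```latex
\mathbb{E}\!\left[\left(\frac{g_1 + g_2}{2} - t\right)^{2}\right]
  = (\mu - t)^2 + \frac{\sigma^2}{2}\,(1 + \rho)
```

A single guess has expected squared error $(\mu - t)^2 + \sigma^2$, so averaging helps whenever $\rho < 1$, and a negative $\rho$ shrinks the variance term below the independent case $\sigma^2/2$. The bias term $(\mu - t)^2$ is untouched either way. My two guesses came from deliberately different lines of reasoning, so the common-mean assumption is only a rough fit here.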

Averaging 65% and 30% gets me 47.5%.

How did I do?

OK, time to check. (Seriously, I am only checking now.)

Look at the data from a site claiming to be from the CIA World Factbook. The US has nearly 15,000, and only 5 other countries have more than 1,000. My first thought is, "Wow, I was completely wrong. The US dominates. Maybe they counted private air strips." But I added the numbers. The total number of airports is 30,791.

By that count, the USA has 48% of the world's airports.

UPDATE: The real list shows the world has about 49,000 airports, not about 31,000. So the U.S. has 30% of the world's airports. In my case, averaging certainly improved my first guess, but my second guess was actually much better. When I first checked my answer, I just Googled for the list, and found a site claiming to use the data from the CIA World Factbook (the same source the authors used in their paper). But in fact the site omitted many countries, including the countries ranked 8-11, which alone added 3,500 airports. My guesses are more like bracketing than the guesses in the paper, where participants did not know they would be asked for a second estimate. Averaging still applies. I became aware of my error when reading Hal Finney's post, forwarded to the securitymetrics list. I also see that the paper has been picked up in the Economist and other places.
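For anyone who wants to check the arithmetic, here is a small sketch that scores my two guesses and their average against both the incomplete count and the corrected one. The US figure of 15,000 is an assumption (the list showed "nearly 15,000"), and `abs_errors` is just a helper name I made up.

```python
first_guess = 65.0   # midpoint of my [40..90]% interval
second_guess = 30.0  # the "what if I missed low" guess
average = (first_guess + second_guess) / 2

us_airports = 15_000    # assumption: "nearly 15,000", rounded up
partial_total = 30_791  # total from the incomplete list
full_total = 49_000     # "about 49,000" from the real list

share_partial = 100 * us_airports / partial_total
share_full = 100 * us_airports / full_total

def abs_errors(truth):
    """Absolute error, in percentage points, of each estimate."""
    return {
        "first": abs(first_guess - truth),
        "second": abs(second_guess - truth),
        "average": abs(average - truth),
    }

print(f"average of guesses: {average}")
print(f"US share, incomplete list: {share_partial:.1f}%")
print(f"US share, real list:       {share_full:.1f}%")
print("errors vs. 48%:", abs_errors(48.0))
print("errors vs. 30%:", abs_errors(30.0))
```

Against the incomplete list the average looks almost perfect (off by half a point). Against the real figure of about 30%, the average is off by 17.5 points: half the error of my first guess, but worse than the second guess on its own.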



