Our MapScore paper is now in press at Transactions in GIS! From the abstract:
The MapScore project described here provides a way to evaluate probability maps using actual historical searches. In this work we generated probability maps based on the statistical Euclidean distance tables from ISRID data (Koester, 2008), and compared them to Doke's (2012) watershed model. Watershed boundaries follow high terrain and may better reflect actual barriers to travel. We also created a third model using the joint distribution of Euclidean and watershed features. On a metric where random maps score 0 and perfect maps score 1, the ISRID Distance Ring model scored 0.78 (95%CI: 0.74-0.82, on 376 cases). The simple Watershed model by itself was clearly inferior at 0.61, but the Combined model was slightly better at 0.81 (95%CI: 0.77-0.84).
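The 0-to-1 scale quoted above is a rescaling in which a random map anchors 0 and a perfect map anchors 1. The underlying raw score (how a probability map is credited for the actual find location) is defined in the paper itself; as a minimal sketch, the rescaling step alone looks like:

```python
def rescale_score(raw, random_baseline, perfect):
    """Normalize a raw map score so a random map gets 0 and a perfect map gets 1.

    `raw`, `random_baseline`, and `perfect` are hypothetical raw-score values;
    the actual raw scoring rule comes from the MapScore paper.
    """
    return (raw - random_baseline) / (perfect - random_baseline)

# Example: a map scoring halfway between random and perfect rescales to 0.5
print(rescale_score(5.0, 0.0, 10.0))
```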
A logical extension of the Distance Rings model is to fit a smooth function to the distribution of distances found in ISRID. Examining the Euclidean distance data for different categories, we found that a lognormal curve roughly captured the shape of the data. The log-normal (LN) is a two-parameter distribution which assumes that the logarithm of the data follows a normal distribution. The probability density function of the LN curve is given by f(x) = (1 / (xσ√(2π))) exp(−(ln x − μ)² / (2σ²)), where μ and σ are the mean and standard deviation of the logarithm of distance.
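Because the maximum-likelihood estimates of μ and σ are just the mean and standard deviation of the log-distances, fitting the LN curve is straightforward. A minimal sketch (the distance values below are made up for illustration, not ISRID data):

```python
import math

def lognormal_pdf(x, mu, sigma):
    """PDF of the log-normal: f(x) = exp(-(ln x - mu)^2 / (2 sigma^2)) / (x sigma sqrt(2 pi))."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))

def fit_lognormal(distances):
    """MLE fit: mu and sigma are the mean and std dev of the log of the data."""
    logs = [math.log(d) for d in distances]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(var)

# Hypothetical distances (miles) from the initial planning point
distances = [0.3, 0.8, 1.2, 1.6, 2.5, 4.0]
mu, sigma = fit_lognormal(distances)
```

With μ and σ in hand, the fitted density can replace the stepwise ring probabilities with a smooth curve over distance.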
Lin & Goodrich at Brigham Young are working on Bayesian motion models for generating probability maps. They have an interesting model, but need GPS tracks to train it. It's a nice complement to our approach, and it will be interesting to see how they compare.
Originally a very cool review published in the first half of 2010. The review led to phone calls and a very productive collaboration on MapScore and other work.
Syrotuck's main study is his 1976 work, with N=242. But he gives much more detail about distance travelled in his 1975 paper, breaking distance down in 0.2-mile increments. Unfortunately, he reports only probabilities, not counts, and doesn't even report the total N. We know he gathered more data between 1975 and 1976, but not how much. Is the 1975 breakdown representative of the 1976 data? Unfortunately, no one has Syrotuck's original data. But we re-created it. (Spreadsheets available!)
Early 2003: Charles Twardy plans to reanalyze the Virginia data, correcting for some problems in last year's run. In February, we will analyze the Australian data for the draft report.
Dec 2001: In preparation for the Australian data, Adam Golding analyzed the Virginia data. Cluster analysis revealed only 4 or 5 types of lost person, assuming Gaussian (bell-shaped) clusters.
Adam Golding and Luke Hope then tested several machine-learned models, Syrotuck's model, and a simple model estimated by Rik Head. There were strong differences in predictive accuracy, but negligible differences in a more meaningful score, information reward. The most recent presentation of this work was Charles Twardy's talk at the NASAR 2002 conference in Charlotte, NC (June 2002).
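Information reward (Good, 1952) scores a probabilistic prediction by how much confidence it placed on what actually happened, rather than just whether the top pick was right. A minimal sketch for the binary case (the models above may well have used a multi-class variant; this is an illustration of the idea, not their exact scoring code):

```python
import math

def information_reward(p, outcome):
    """Binary information reward: 1 + log2 of the probability given to the actual outcome.

    p: probability assigned to the event occurring; outcome: True/False.
    A vacuous prediction (p = 0.5) scores 0; full confidence that proves
    correct scores 1 bit; overconfident errors are penalized without bound.
    """
    q = p if outcome else 1.0 - p
    return 1.0 + math.log2(q)

# A confident correct call beats a hedged correct call
print(information_reward(0.9, True), ">", information_reward(0.6, True))
```

Two models can have the same predictive accuracy yet very different information reward if one is better calibrated, which is why the score is more informative than accuracy alone.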
11 Dec 2002: We have 271 entries comprising about 200 separate cases from all states except New South Wales (and the Australian Capital Territory). We have sent each state a copy of their data and a request for corrections. We intend to release a draft report at the end of February 2003 summarizing the data. We expect a final report about six months later.