Comparing Weather APIs

Introduction

The SARBayes project uses the International Search & Rescue Incident Database (ISRID) [1] to study and forecast lost person behavior. To augment the predictive power of the project's models, we can supplement sparsely populated fields in ISRID with other sources of data. For instance, given an incident's date and location, we can pull data from online application programming interfaces (APIs) to fill in missing values for weather conditions such as temperature and precipitation.

We considered two historical weather data sources: the National Oceanic and Atmospheric Administration's (NOAA) Climate Data Online Web Service and Weather Services International's (WSI) Cleaned Observation API. This post compares how closely the measurements provided by these two services matched the values observed by search-and-rescue personnel on the ground and recorded in the database. WSI had better coverage, and its mean absolute error was lower by roughly 2 °C for maximum temperature, 1 °C for minimum temperature, and 3 km/h for wind speed.

Methods

Error Calculation

The features we compared are daily maximum temperature, daily minimum temperature, and wind speed. We found no quantitative measurements for snow or rain in ISRID, and excluded these features from the test. For each case with a valid date and latitude and longitude coordinates, we pulled "experimental" values from each API. If both an observed value and an experimental value for a particular feature were available, we calculated the error of that API's feature as

\[ \text{error} = \text{experimental value} - \text{observed value} \]

That is, the error measures how far off the experimental value is and in which direction. A positive error indicates overestimation by the API, and a negative value indicates underestimation. The error shares the same units as its feature.
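As a minimal sketch of the signed error described above, the function below subtracts the observed value from the experimental one and skips cases where either value is missing. The function name and the use of None for missing values are assumptions made for illustration, not the project's actual code.

```python
from typing import Optional


def signed_error(experimental: Optional[float],
                 observed: Optional[float]) -> Optional[float]:
    """Return experimental - observed, or None if either value is missing.

    A positive result means the API overestimated the observed value;
    a negative result means it underestimated.
    """
    if experimental is None or observed is None:
        return None
    return experimental - observed


# Hypothetical case: the API reports 21.5 °C and searchers recorded 18.0 °C.
print(signed_error(21.5, 18.0))  # 3.5, an overestimate by the API
print(signed_error(None, 18.0))  # None, no comparison is possible
```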

It is worth noting that the observations made and recorded by searchers are by no means perfect. For instance, we found several hundred instances where the high temperature was exactly the same as the low temperature, which may be an entry error. However, given how many cases occur in areas where conditions may vary greatly across small distances, such as mountainous terrain, we assume the data collected by searchers near the exact site of the incident best describe the actual conditions the lost person experienced.

API Access Implementation

Readers willing to trust the validity of our data collection process are free to skip to the Results section.

NOAA

To access the Climate Data Online Web Service [2], users must send HyperText Transfer Protocol (HTTP) requests to a valid API endpoint using the GET method. Each request requires an access token, which is initially limited to 1000 requests per day and is available free of charge to anyone who registers an email address with NOAA. The API returns the requested data as plain text in compressed JavaScript Object Notation (JSON) form.

Within the web service, measurements have associated data types, which are listed at the datatypes endpoint. We mapped the TMAX data type to the daily maximum temperature feature in ISRID, TMIN to daily minimum temperature, and AWND to wind speed. The API reports values for temperature in tenths of a degree Celsius and for wind speed in tenths of a meter per second.
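Because the API reports values in tenths of a unit, they need rescaling before they can be compared with observations. The sketch below shows one way to do that, assuming wind speed is compared in km/h as in the results tables. The feature names in the mapping and the function names are hypothetical.

```python
# Mapping from Climate Data Online data types to ISRID features.
# The feature names on the right are hypothetical labels for this sketch.
DATATYPE_TO_FEATURE = {
    "TMAX": "daily_max_temp",
    "TMIN": "daily_min_temp",
    "AWND": "wind_speed",
}


def tenths_celsius_to_celsius(raw: float) -> float:
    """Convert tenths of a degree Celsius to degrees Celsius."""
    return raw / 10.0


def tenths_mps_to_kph(raw: float) -> float:
    """Convert tenths of a meter per second to kilometers per hour.

    1 m/s is 3.6 km/h, so a raw value of 50 (5.0 m/s) is about 18 km/h.
    """
    return raw / 10.0 * 3.6
```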

For each case, we calculated the bounds enclosing a square region with the initial planning point (IPP) at the square's center. We arbitrarily chose 40 km as the length of each side, and assumed weather conditions would be similar within the region. Then, we queried the stations endpoint with the bounds and date to obtain a set of relevant station identifiers.
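The bounds of such a square can be approximated as follows. This is a sketch under stated assumptions: it treats one degree of latitude as about 111.32 km and scales longitude by the cosine of the latitude, which is reasonable for a 40 km square away from the poles and the antimeridian. The function name and return order are illustrative, not taken from the project.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def square_bounds(lat: float, lon: float, side_km: float = 40.0):
    """Return (south, west, north, east) for a square centered on (lat, lon).

    Uses a flat-earth approximation that is adequate for small regions.
    """
    half_side = side_km / 2.0
    dlat = half_side / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_side / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


# Hypothetical IPP in northern Virginia.
south, west, north, east = square_bounds(38.8, -77.2)
```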

To obtain the actual measurements, we sent the date, set of station identifiers, and set of needed datatypes to the data endpoint. If the service reported more than one value per data type, we took a simple average. Each case consumed two requests, at most.
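The averaging step might look like the sketch below. It assumes the response has already been parsed into a list of records that each carry a "datatype" and a "value" key; that record shape and the function name are assumptions for illustration. Values stay in the API's raw units.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def average_by_datatype(results: Iterable[Mapping]) -> Dict[str, float]:
    """Take a simple average of the reported values for each data type."""
    grouped = defaultdict(list)
    for record in results:
        grouped[record["datatype"]].append(record["value"])
    return {datatype: sum(values) / len(values)
            for datatype, values in grouped.items()}


# Hypothetical records from two stations, in tenths of a degree Celsius.
records = [
    {"datatype": "TMAX", "value": 200},
    {"datatype": "TMAX", "value": 220},
    {"datatype": "TMIN", "value": 100},
]
averages = average_by_datatype(records)
```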

WSI

Users may access the Cleaned Observation API [3] by sending GET requests with the required parameters and key to http://cleanedobservations.wsi.com/CleanedObs.svc/GetObs. We received a free key during a trial period to conduct this test.

Unlike with Climate Data Online, we merely needed to specify latitude and longitude coordinates to access historical data; the API manages data compilation internally. The API also offers measurements for every hour. We obtained daily maximum and minimum temperatures by taking the extremes of a list of values for the surfaceTemperatureCelsius data type, the dry bulb temperature two meters above the surface. To obtain the average daily wind speed, we averaged that day's values for windSpeedKph, the unobstructed wind speed 10 meters up. The API reported temperatures in degrees Celsius and wind speeds in kilometers per hour.
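Reducing the hourly values to daily features could be sketched as below. The field names surfaceTemperatureCelsius and windSpeedKph come from the API as described above, but the shape of each hourly record (a dictionary per hour) and the function name are assumptions made for this example.

```python
from typing import Dict, List, Optional


def daily_summary(hourly: List[dict]) -> Dict[str, Optional[float]]:
    """Reduce one day's hourly records to max/min temperature and mean wind."""
    temps = [h["surfaceTemperatureCelsius"] for h in hourly
             if h.get("surfaceTemperatureCelsius") is not None]
    winds = [h["windSpeedKph"] for h in hourly
             if h.get("windSpeedKph") is not None]
    return {
        "max_temp": max(temps) if temps else None,
        "min_temp": min(temps) if temps else None,
        "avg_wind": sum(winds) / len(winds) if winds else None,
    }


# Hypothetical records for three hours; one wind reading is missing.
sample = [
    {"surfaceTemperatureCelsius": 10.0, "windSpeedKph": 5.0},
    {"surfaceTemperatureCelsius": 15.0, "windSpeedKph": 7.0},
    {"surfaceTemperatureCelsius": 12.0, "windSpeedKph": None},
]
summary = daily_summary(sample)
```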

Results

We found 3049 observed values for daily high temperature, 1824 for daily low temperature, and 1456 for wind speed. We took the absolute value of each error and summarize the key statistics for each feature in the following tables.

NOAA API Error Statistics

Feature                      Count   Coverage (%)   Mean Absolute Error   Standard Deviation
Daily Maximum Temperature    2952    96.8           7.4 °C                7.1 °C
Daily Minimum Temperature    1750    95.9           8.3 °C                7.5 °C
Daily Average Wind Speed     428     29.4           13.1 km/h             13.5 km/h

WSI API Error Statistics

Feature                      Count   Coverage (%)   Mean Absolute Error   Standard Deviation
Daily Maximum Temperature    3048    99.9           5.5 °C                4.8 °C
Daily Minimum Temperature    1823    99.9           7.2 °C                7.1 °C
Daily Average Wind Speed     1455    99.9           10.3 km/h             11.9 km/h
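The statistics in the tables above could be computed along the lines of the sketch below. It assumes coverage is the number of computed errors divided by the number of observed values, and that the standard deviation is the sample standard deviation of the absolute errors; the post does not say which estimator was used, so that choice and the function name are assumptions.

```python
import statistics
from typing import Dict, List


def summarize(errors: List[float], observed_count: int) -> Dict[str, float]:
    """Summarize signed errors as count, coverage, and absolute-error stats."""
    abs_errors = [abs(e) for e in errors]
    return {
        "count": len(abs_errors),
        "coverage_pct": 100.0 * len(abs_errors) / observed_count,
        "mean": statistics.mean(abs_errors),
        # stdev needs at least two values; fall back to 0.0 otherwise.
        "std": statistics.stdev(abs_errors) if len(abs_errors) > 1 else 0.0,
    }


# Hypothetical signed errors for four of five observed cases.
stats = summarize([-2.0, 2.0, 4.0, -4.0], observed_count=5)
```

As a check on the coverage column, 2952 errors out of 3049 observed maximum temperatures gives 100 × 2952 / 3049, or about 96.8 percent.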

For each feature, we also sorted the signed error values into 50 equal-width bins and charted the frequency of each bin as a histogram. A normal distribution fitted to each list of error values is overlaid on the plot as a dashed curve. Distributions tightly clustered around 0, meaning no error, indicate better conformance to the observed values.
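The binning and the overlaid curve can be sketched with the standard library alone. This is an illustration rather than the plotting code used for the figures: the function names are hypothetical, and scaling the normal density by the sample size and bin width is one common way to put it on the same axis as the counts.

```python
import statistics
from typing import List, Tuple


def histogram(values: List[float], bins: int = 50) -> Tuple[List[int], List[float]]:
    """Count values in equal-width bins spanning min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all values identical; avoid dividing by zero
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


def normal_curve(values: List[float], edges: List[float]) -> List[float]:
    """Expected count per bin under a normal fit, evaluated at bin centers."""
    fit = statistics.NormalDist.from_samples(values)  # needs >= 2 values
    width = edges[1] - edges[0]
    centers = [(a + b) / 2.0 for a, b in zip(edges, edges[1:])]
    return [len(values) * width * fit.pdf(c) for c in centers]


# Hypothetical signed errors.
errors = [float(i) for i in range(100)]
counts, edges = histogram(errors)
curve = normal_curve(errors, edges)
```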

[Figure: histogram of daily maximum temperature error]

[Figure: histogram of daily minimum temperature error]

Both APIs tend to overestimate values for both types of temperature. Daily maximum temperature estimates from both APIs tend to exhibit less variance than minimum temperature estimates; perhaps we can attribute this behavior to the aforementioned cases where the maximum temperature may have been carried over into the minimum temperature field. Interestingly, for both temperature features, NOAA has a noticeable "tail" of high-error cases on the right, where many estimates were off by more than 20 °C.

[Figure: histogram of daily average wind speed error]

NOAA also has many overestimates for wind speed, which may be responsible for dragging the mean of the absolute error up.

Conclusion

For the purpose of populating ISRID, WSI is the better of the two services, offering nearly universal coverage, lower error, and greater ease-of-use.

NOAA may have underperformed because the cases we sampled were not representative of the entire database. Searchers in different geographic regions may have different procedures and resources for collecting weather data, so the test may have excluded areas where NOAA has more stations and performs particularly well. We may have also introduced error into NOAA's measurements by taking a simple average of values from multiple stations. Because few subjects actually reached the edges of the regions we sampled from, naïvely treating all stations equally may have harmed the API's accuracy. An average weighted by distance from the IPP could remedy this issue.
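One possible form of such a weighted average is inverse-distance weighting, sketched below. This was not part of the test described in this post. The choice of inverse-distance weights, the haversine distance, the minimum-distance floor, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_weighted_average(ipp: Tuple[float, float],
                              stations: List[Tuple[float, float, float]],
                              power: float = 1.0,
                              min_km: float = 0.1) -> Optional[float]:
    """Average station values, weighting each by 1 / distance**power.

    Each station is (lat, lon, value). Distances are floored at min_km so a
    station sitting on the IPP does not produce an infinite weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for lat, lon, value in stations:
        distance = max(haversine_km(ipp[0], ipp[1], lat, lon), min_km)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


# Hypothetical stations: the nearer one pulls the estimate toward its value.
estimate = distance_weighted_average((0.0, 0.0),
                                     [(0.0, 0.01, 10.0), (0.0, 0.1, 20.0)])
```

With equal distances this reduces to the simple average; a larger power makes nearby stations dominate more strongly.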

Future Work

Another API we may investigate is forecast.io, which is in use for SARCAT. Future work may include importing geospatial data and using the IPP and find location to populate fields such as elevation change and track offset. Also, survival models may benefit from measurements made closer to the time of the incident by using an hourly, rather than daily, time resolution.

References

[1] dbS Productions LLC. International Search & Rescue Incident Database. https://www.dbs-sar.com/SAR_Research/ISRID.htm, 2011. [Proprietary; provided for free for analysis purposes].

[2] National Oceanic and Atmospheric Administration. Web Services API (version 2) Documentation. https://www.ncdc.noaa.gov/cdo-web/webservices/v2. [Provided for free to the public].

[3] Weather Services International Corporation. Cleaned Observation API Documentation. PDF document. [Provided during a free trial].

Author: Jonathan Lee

Jonathan is a senior at Thomas Jefferson High School for Science and Technology in Northern Virginia. He joined the SARBayes project in the summer of 2014 and has worked on the MapScore website and survival modeling.