Weather or not…

John Hurley
6 min read · May 31, 2020

Simple daily weather summary with Python

Photo by David Watkis on Unsplash

As part of a larger project to automate Audubon’s annual Christmas Bird Count, I wanted to produce a short summary of weather conditions on a particular day. Ever since the original Christmas Bird Count in 1900, the requested information has included the local weather:

Such reports should be headed by the locality, hour of starting and returning, character of the weather, direction and force of the wind, and the temperature; the latter taken when starting.

The end result ideally would be something like this:

Weather was partly cloudy with a low of 45°F and a high of 62°F, winds from the NW gusting to 12 mph.

As with all engineering problems, the requirements also come with a set of constraints. The main constraints for the overall project were that it should be very easy to use, it should be free and it should be available anywhere in the country (or world, for that matter).

Raining on my Parade

There is free weather data from NOAA but it is not accessible through an API as far as I can tell. My first attempt at this was semi-successful, in that I mostly could produce the result that I wanted. Using Pandas read_html got me 80% of the way there, but the multi-row headers meant that I needed to use Beautiful Soup to slog my way through the HTML. It was still hard to use though, as the user would have to find the closest weather station and the output still had METAR codes like FEW038 for cloud cover, which would need to be decoded.
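
To give a sense of the decoding that this first approach would have required, here is a minimal sketch of a decoder for METAR cloud groups such as FEW038. It relies only on the standard METAR abbreviations and the convention that the three digits give the cloud base in hundreds of feet. The names CLOUD_COVER and decode_cloud_group are hypothetical and are not taken from the project.

```python
import re

# Standard METAR sky-cover abbreviations
CLOUD_COVER = {
    'SKC': 'sky clear',
    'CLR': 'clear',
    'FEW': 'few clouds',
    'SCT': 'scattered clouds',
    'BKN': 'broken clouds',
    'OVC': 'overcast',
}

# A cover abbreviation followed by a three-digit height
CLOUD_GROUP = re.compile(r'^(FEW|SCT|BKN|OVC)(\d{3})$')


def decode_cloud_group(code: str) -> str:
    """Turn a METAR cloud group such as 'FEW038' into readable text."""
    if code in ('SKC', 'CLR'):
        return CLOUD_COVER[code]
    match = CLOUD_GROUP.match(code)
    if match is None:
        # Unknown or more elaborate group: pass it through unchanged
        return code
    cover, height = match.groups()
    # Height is reported in hundreds of feet
    return f'{CLOUD_COVER[cover]} at {int(height) * 100:,} ft'


print(decode_cloud_group('FEW038'))  # few clouds at 3,800 ft
```

This covers only the cloud-cover part of a report. Wind, visibility and present-weather groups would each need similar treatment, which is part of why the approach felt like a slog.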

I was in the process of writing an article about how you should avoid feature creep and keep things simple when I decided to do one more search for possible weather data sources or associated libraries.

The Silver Lining

I found a British company, OpenWeatherMap, that has a free tier of service. Even better, they have a well-documented API that allows you to pass in a geolocation. From the geolocation, they find the nearest weather station, and they have hundreds of thousands of stations worldwide. You need to sign up and get an API key. Some sample code using the requests package is below:

import requests


def weather_at_location(latitude, longitude, api_key) -> dict:
    """Fetch weather data for a location; return an empty dict on failure."""
    results = {}
    try:
        url = 'https://api.openweathermap.org/data/2.5/onecall'
        exclusions = ','.join(['minutely'])
        # For temperature in Fahrenheit use units=imperial
        params = {
            'lat': str(latitude),
            'lon': str(longitude),
            'units': 'imperial',
            'exclude': exclusions,
            'appid': str(api_key)
        }
        rr = requests.get(url, params=params, timeout=30)
        # Raise on HTTP errors so they are reported below
        rr.raise_for_status()
        # The payload is nested JSON, so it is returned as a dict
        results = rr.json()

    except (requests.RequestException, ValueError) as ee:
        print(ee)

    return results

I didn’t see any way to use the header parameter to pass the API key, so it shows up as part of the URL.

The result is returned as JSON with various embedded lists and dictionaries. I initially had some rather clever and complex code to flatten these out into a single dataframe. When I ran tests on a variety of cities, the data for Marysville, WA caused some trouble. The weather description was a list of dictionaries, which meant that converting the JSON with pd.DataFrame crashed. This is when I learned about Pandas json_normalize, which then led me to rewrite a good chunk of the code. I ended up with simpler, more robust code as a result.
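
As a rough illustration of how json_normalize copes with a nested list of dictionaries, here is a minimal sketch. The sample payload is hypothetical and heavily trimmed; it only imitates the shape described above (hourly entries, each with a "weather" list), and the variable names are not taken from the project.

```python
import pandas as pd

# Hypothetical, trimmed-down payload: each hourly entry carries a
# "weather" list that may hold more than one condition
sample = {
    'lat': 48.05,
    'lon': -122.18,
    'hourly': [
        {'dt': 1590847200, 'temp': 52.05,
         'weather': [{'main': 'Clouds', 'description': 'scattered clouds'}]},
        {'dt': 1590858000, 'temp': 51.49,
         'weather': [{'main': 'Thunderstorm', 'description': 'thunderstorm with rain'},
                     {'main': 'Rain', 'description': 'heavy intensity rain'}]},
    ],
}

# One row per hour; the "weather" column still holds the raw lists
hourly = pd.json_normalize(sample, record_path='hourly', meta=['lat', 'lon'])

# One row per (hour, condition), carrying the hour's fields along as metadata
conditions = pd.json_normalize(sample['hourly'], record_path='weather',
                               meta=['dt', 'temp'])

# Collapse multiple conditions into one description per hour
descriptions = conditions.groupby('dt')['description'].agg(', '.join)
print(descriptions)
```

The second hour has two conditions, so it produces two rows in conditions and a single comma-separated description after the groupby.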

I ended up making two API calls to get the data that I wanted, as the call for current weather only shows hourly results on or after the time it is run. To get the conditions for the hours earlier in the day, I also had to make a call for “historical” data for the same day.
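
The second call might be set up along these lines. This is a sketch under assumptions the article does not confirm: that the historical data comes from the One Call "timemachine" endpoint, and that the moment of interest is passed as a Unix timestamp in a dt parameter. HISTORY_URL and history_params are hypothetical names.

```python
from datetime import datetime, timezone

# Assumption: historical endpoint of the One Call API as offered in 2020
HISTORY_URL = 'https://api.openweathermap.org/data/2.5/onecall/timemachine'


def history_params(latitude, longitude, when: datetime, api_key) -> dict:
    """Build query parameters for a historical request at a given moment."""
    return {
        'lat': str(latitude),
        'lon': str(longitude),
        'units': 'imperial',
        # Assumption: the API expects a Unix timestamp (seconds, UTC)
        'dt': str(int(when.timestamp())),
        'appid': str(api_key),
    }


# 7 AM UTC on the day of the report, as a timezone-aware datetime
morning = datetime(2020, 5, 30, 7, 0, tzinfo=timezone.utc)
params = history_params(48.0517637, -122.1770818, morning, 'YOUR_API_KEY')
print(params['dt'])
```

The resulting dictionary can be passed to requests.get in the same way as the params in weather_at_location, with HISTORY_URL as the URL.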

Code is in the cloud

The code for the project is in this GitHub repository:

One fun task was to produce a text representation of the wind direction in degrees, e.g. 340° is NNW. This makes use of the Pandas function cut:

import pandas as pd


def wind_direction_degrees_to_text(wind_direction: float) -> str:
    section_degrees = (360 / 16)
    half_bin = (section_degrees / 2)
    north_bin_start = 360 - (section_degrees / 2)

    # We rotate by a half_bin to make bins monotonically increasing
    wbins = [(north_bin_start + ix * section_degrees + half_bin) % 360 for ix in range(17)]
    # to make cut work
    wbins[-1] = 360

    dir_labels = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE',
                  'SSE', 'S', 'SSW', 'SW', 'WSW', 'W', 'WNW',
                  'NW', 'NNW']
    # The mod 360 keeps directions close to North inside the bins;
    # include_lowest=True keeps a rotated value of exactly 0 from becoming NaN
    rx = pd.cut([(wind_direction + half_bin) % 360],
                wbins, labels=dir_labels, include_lowest=True)
    return rx[0]

A visual representation of this is seen here. As mentioned above, Pandas json_normalize is also a good function to know about.
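
The binning done by wind_direction_degrees_to_text can also be cross-checked with plain arithmetic. The sketch below is an alternative that uses integer division rather than pd.cut; it is not the project's method, and the names DIR_LABELS and wind_direction_label are hypothetical.

```python
DIR_LABELS = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE',
              'SSE', 'S', 'SSW', 'SW', 'WSW', 'W', 'WNW',
              'NW', 'NNW']


def wind_direction_label(wind_direction: float) -> str:
    """Map a bearing in degrees to one of 16 compass points."""
    section_degrees = 360 / len(DIR_LABELS)  # 22.5 degrees per sector
    half_bin = section_degrees / 2
    # Rotate by half a sector so that N covers 348.75 up to 11.25 degrees
    rotated = (wind_direction + half_bin) % 360
    # The final modulo guards against floating-point results of exactly 360
    return DIR_LABELS[int(rotated // section_degrees) % len(DIR_LABELS)]


for degrees in (0, 90, 340, 348.75, 359.9):
    print(degrees, wind_direction_label(degrees))
```

Values that fall exactly on a sector boundary may land in the neighbouring sector compared with pd.cut, since this version assigns each boundary to the sector that starts there.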

The goal of simplicity was certainly met; the only information that needs to be provided is the geolocation of the reporter (and the API key):

reporting_location = (37.335471, -121.806204)
summary = create_weather_summary(reporting_location, api_key)

Hail no

This is the report for Marysville, WA after all the fixes. Note the multiple weather conditions in “Current” and at 10AM.

Current Conditions 2020-05-30 10:23:23
Temperature: 51.44 °F
Wind: 4.7 mph from ENE
Humidity: 93 %
Description: heavy intensity rain, thunderstorm with rain, mist, 90% cloudy
Sunrise: 2020-05-30 05:14:19
Sunset : 2020-05-30 20:58:45

Forecast 2020-05-30 10:23:23
Wind: 8.3 mph from NW
Rain: 20.44 mm
Humidity: 70 %
Description: heavy intensity rain, 98% cloudy

Conditions at 2020-05-30 07:00:00
Temperature: 52.05 °F
Wind: 4.7 mph from SSW
Humidity: 81 %
Description: scattered clouds, 40% cloudy

Conditions at 2020-05-30 10:00:00
Temperature: 51.49 °F
Wind: 4.7 mph from ENE
Humidity: 93 %
Description: thunderstorm with rain, heavy intensity rain, mist, 90% cloudy

Weather station location: (48.05, -122.18)
Reporting location : (48.0517637, -122.1770818)
Reporting location is 0.21 miles from weather station
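
The distance line at the end of the report could be produced with a great-circle calculation. Here is a minimal sketch using the haversine formula; whether the project uses this formula is an assumption, and haversine_miles is a hypothetical name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(point_a, point_b) -> float:
    """Great-circle distance in miles between two (latitude, longitude) pairs."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


station = (48.05, -122.18)
reporter = (48.0517637, -122.1770818)
print(round(haversine_miles(station, reporter), 2))
```

The station coordinates shown in the report appear to be rounded to two decimals, so this sketch is not expected to reproduce the 0.21 miles exactly.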

The final report is far more useful than my original goal. It gets data for the current day at 7AM, 10AM, 1PM, and 4PM, depending on when it is run. As part of the scientific data for the Christmas Bird Count, it might help explain why there were greater or lesser numbers of birds in a particular year.

Not trying to snow you…

Here are some lessons from this particular project.

Just as I was about to give up, I found a much better way of doing it that met all of the requirements: simple, free, ubiquitous. Persistence can pay off.

Do as much testing as you can. I found the Marysville issue and another bug where wind directions close to 360 (North) showed up as NaN (I forgot to do a mod 360). This came from testing hundreds of cities. Never test weather-related code just with California locations (sunny, sunny, sunny…). If this were a product, each one of the cases above would be added to the unit tests.

This also reminded me of the importance of factoring code into functions. It is easy, particularly with Jupyter notebooks, to end up with a bunch of code fragments that may get the job done, but are a complete rat’s nest. As I refactored the code, I found commonalities and ended up fixing bugs in one place, rather than all over.

This project involved a lot of exploratory data analysis, particularly for the first, failed attempt. Handling JSON with oddly nested subcomponents would be annoying to figure out without the interactivity of Jupyter notebooks.


John Hurley

Mathematician, data scientist, equestrian, photographer, birder. I enjoy looking for patterns. https://www.linkedin.com/in/johnhurleyphd/