Field Notes

Free Hydrology Data Without the Hassle: Where to Download and Start Working

The data is free. The hassle is the plumbing — accounts, formats, undocumented APIs, half a gigabyte of HDF5 you have to clip before you can even plot it. Most people lose a day here before they do any hydrology. This is the catalogue I wish I’d had: what’s free, and the client library that fetches it so you can skip the click-through and start working.

Streamflow and reservoirs

  • USGS NWIS — US streamflow, stage, peak flows, and rating curves. Don’t scrape the website; use the dataretrieval package in Python (dataRetrieval in R). One call returns a tidy dataframe.
  • GRDC — the Global Runoff Data Centre, the standard source for international discharge records.
  • CAMELS / CAMELS-IND — large-sample catchment datasets with streamflow plus catchment attributes already attached. CAMELS-IND covers Indian catchments and is what I use for regional machine-learning work; it saves you weeks of attribute assembly.
  • India-WRIS / CWC — the national portals for Indian discharge and water resources data.
  • GRanD and ResOpsUS / Global Reservoir Storage — reservoir locations, capacities, and operational storage time series. GRanD underpins my own work on global reservoir recovery.
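For NWIS, here is a minimal sketch of the dataretrieval route. It assumes the package’s `nwis.get_dv` function and its usual `00060_Mean` column for daily mean discharge (parameter code 00060, reported in ft³/s). The wrapper names and the conversion to m³/s are my own illustrative helpers, not part of the package.

```python
# Exact conversion: 1 ft = 0.3048 m, so 1 ft³ = 0.3048**3 m³
CFS_TO_CMS = 0.028316846592


def cfs_to_cms(q_cfs):
    """Convert discharge from cubic feet per second to cubic metres per second."""
    return q_cfs * CFS_TO_CMS


def daily_discharge_cms(site, start, end):
    """Daily mean discharge (m³/s) for one USGS site between ISO dates start and end."""
    # Imported lazily so this module loads even where dataretrieval isn't installed
    from dataretrieval import nwis

    # get_dv returns (DataFrame, metadata); 00060 is the USGS parameter code for discharge
    df, _metadata = nwis.get_dv(sites=site, start=start, end=end, parameterCd="00060")
    # Assumed column name <parameter code>_<statistic>; check df.columns for your site
    return cfs_to_cms(df["00060_Mean"]).rename("discharge_cms")
```

Call it with a USGS site number and two ISO dates, for example `daily_discharge_cms(site_no, "2020-01-01", "2020-12-31")`. You get back a date-indexed series already in SI units.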

Precipitation

  • NASA GPM IMERG — half-hourly, 0.1° global satellite precipitation. (I keep a small toolkit for downloading, clipping, and analysing it — see my other post.)
  • CHIRPS — daily rainfall at 0.05°, 1981 to present, covering 50°S to 50°N; excellent for drought and trend work in data-sparse regions.
  • ERA5 — ECMWF’s hourly, 0.25° reanalysis, distributed through the Copernicus Climate Data Store; precipitation plus every other forcing variable, fetched with the cdsapi client.
  • IMD gridded — India Meteorological Department gauge-based gridded rainfall (0.25°), the ground truth you validate satellite products against.
  • NOAA GHCN — global daily station records.
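For ERA5, the cdsapi call itself is a single `retrieve`. Most of the pain is getting the request dictionary right. The sketch below builds one month of hourly total precipitation over a bounding box. The key names follow the current CDS request format as I understand it (`data_format` and `download_format` rather than the older `format`), so treat them as an assumption and compare them with the API request snippet the CDS web form generates. The function names are hypothetical helpers. The download itself needs a CDS account with your key in `~/.cdsapirc`.

```python
import calendar


def era5_precip_request(bbox, year, month):
    """Build a CDS request for one month of hourly ERA5 total precipitation.

    bbox is (west, south, east, north) in degrees. The key names are an
    assumption based on the current CDS request format.
    """
    west, south, east, north = bbox
    n_days = calendar.monthrange(year, month)[1]  # handles leap years
    return {
        "product_type": ["reanalysis"],
        "variable": ["total_precipitation"],
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": [north, west, south, east],  # CDS order is N, W, S, E
        "data_format": "netcdf",
        "download_format": "unarchived",
    }


def download_era5_precip(bbox, year, month, target):
    """Fetch the request above to `target`; needs a CDS account and ~/.cdsapirc."""
    import cdsapi  # lazy import so the request builder works without the client

    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-single-levels", era5_precip_request(bbox, year, month), target)
    return target


def metres_to_mm(tp_m):
    """ERA5 total_precipitation is a depth in metres; convert to millimetres."""
    return tp_m * 1000.0
```

The easy mistake is the `area` order. CDS wants north, west, south, east, not the west, south, east, north that most GIS tools use, so the builder does the reordering in one place.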

Terrain, soils, and land

  • HydroSHEDS / MERIT Hydro — hydrologically conditioned DEMs, river networks, and basin boundaries. This is where catchment delineation starts; my own geomatics toolkit conditions DEMs and extracts stream networks from exactly this kind of data.
  • SRTM (60°N to 56°S) and Copernicus DEM (global) — elevation.
  • SoilGrids — global gridded soil properties.
  • ESA WorldCover and MODIS — land cover and vegetation.
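To make “delineation starts here” concrete: after conditioning a DEM, the usual first step is a flow-direction grid. Below is a plain-NumPy sketch of the classic D8 rule, in which each cell drains to its steepest downslope neighbour. Directions use the ESRI-style powers of two (1 = east, running clockwise to 128 = northeast). This is not my toolkit’s implementation. It assumes square cells and skips sink filling and flat resolution, which is exactly why you want an already-conditioned DEM such as HydroSHEDS or MERIT Hydro.

```python
import numpy as np

# D8 neighbours as (row, col) offsets, rows increasing southward, paired with
# ESRI-style direction codes: 1=E, 2=SE, 4=S, 8=SW, 16=W, 32=NW, 64=N, 128=NE
D8_NEIGHBOURS = [
    ((0, 1), 1), ((1, 1), 2), ((1, 0), 4), ((1, -1), 8),
    ((0, -1), 16), ((-1, -1), 32), ((-1, 0), 64), ((-1, 1), 128),
]


def d8_flow_direction(dem, cellsize=1.0):
    """Return the D8 code of each cell's steepest downslope neighbour (0 = none)."""
    dem = np.asarray(dem, dtype=float)
    rows, cols = dem.shape
    fdir = np.zeros((rows, cols), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            best_drop = 0.0  # only strictly downhill neighbours count
            for (dr, dc), code in D8_NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols):
                    continue  # neighbour falls off the grid
                # Diagonal neighbours are sqrt(2) cell widths away
                distance = cellsize * (np.sqrt(2.0) if dr and dc else 1.0)
                drop = (dem[r, c] - dem[rr, cc]) / distance
                if drop > best_drop:
                    best_drop = drop
                    fdir[r, c] = code
    return fdir
```

Pits and flats come out as 0, and an unconditioned DEM produces lots of them. The Python loops are fine for a toy grid. For a real basin, reach for a dedicated tool such as pysheds, WhiteboxTools, or TauDEM.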

Evapotranspiration, drought, and storage

  • MODIS ET (MOD16) — global evapotranspiration at 500 m, 8-day resolution.
  • GRACE / GRACE-FO — total water storage anomalies; indispensable for groundwater and drought.
  • SPEI / SPI global databases — precomputed drought indices if you don’t want to roll your own.
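If you do roll your own, the core of the SPI is short. Accumulate precipitation over the chosen window, fit a gamma distribution to the totals, and map each total’s cumulative probability onto a standard normal. This sketch is deliberately simplified. It fits one gamma to the whole record, whereas the standard SPI fits each calendar month separately. Zero totals get the usual mixed-distribution treatment. Use it to see what the precomputed databases are doing, not as a replacement for them. The `spi` name is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def spi(precip, window=3):
    """Simplified SPI: one gamma fit for the whole record (no per-month fitting)."""
    precip = np.asarray(precip, dtype=float)
    # Running totals over `window` time steps (e.g. 3 months for SPI-3)
    acc = np.convolve(precip, np.ones(window), mode="valid")
    # Probability of a zero total, handled separately because the gamma has no mass at 0
    p_zero = np.mean(acc == 0)
    shape, _, scale = stats.gamma.fit(acc[acc > 0], floc=0)
    cdf = p_zero + (1 - p_zero) * stats.gamma.cdf(acc, shape, loc=0, scale=scale)
    # Keep probabilities strictly inside (0, 1) so the normal quantile stays finite
    cdf = np.clip(cdf, 1e-6, 1 - 1e-6)
    return stats.norm.ppf(cdf)
```

The result has `len(precip) - window + 1` values, each aligned to the end of its window. By the usual convention, values below -1 read as moderately dry or worse.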

The four habits that kill the hassle

  1. Use the official client, not the browser. dataretrieval for USGS, cdsapi for Copernicus, earthaccess for anything behind NASA Earthdata. These turn a download session into one function call.
  2. Authenticate once, store credentials in the environment. A NASA Earthdata login in a .netrc or environment variable means your script runs unattended. Never hard-code a password into a notebook.
  3. Cache locally. Hit the API once, save the response, and read from disk afterwards. My flood-analysis utilities cache every USGS and NOAA pull so a re-run costs nothing and works offline.
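  4. Clip early. Subset to your basin boundary the moment the data lands, before you analyse anything, and reproject the shapefile to the grid’s CRS first (EPSG:4326 for lat/lon products such as IMERG, CHIRPS, and ERA5). A continental raster becomes a manageable basin-sized NetCDF, and everything downstream is faster.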
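The four habits fit in a few dozen lines. The sketch below is generic, not lifted from my flood utilities, and all four function names are hypothetical:

- `require_env` refuses to run unless credentials are in the environment.
- `earthdata_login` hands those credentials to earthaccess, whose `login(strategy="environment")` reads `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD`.
- `disk_cached` wraps any fetch function that returns JSON-serialisable data.
- `clip_to_bbox` makes a crude first cut to a (west, south, east, north) box in EPSG:4326 degrees.

```python
import hashlib
import json
import os
from pathlib import Path

import numpy as np


def require_env(*names):
    """Habit 2: read credentials from the environment, never from the notebook."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"set these environment variables first: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}


def earthdata_login():
    """Habit 1: let the official client handle auth, using the env vars from habit 2."""
    require_env("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD")
    import earthaccess  # imported lazily so the rest of this module works without it

    return earthaccess.login(strategy="environment")


def disk_cached(fetch, cache_dir="hydro_cache"):
    """Habit 3: call fetch(**params) once per unique params; later calls read from disk."""
    def wrapper(**params):
        # Stable key: the same parameters always hash to the same file name
        key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:16]
        path = Path(cache_dir) / f"{fetch.__name__}-{key}.json"
        if path.exists():
            return json.loads(path.read_text())
        result = fetch(**params)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result))
        return result
    return wrapper


def clip_to_bbox(grid, lats, lons, bbox):
    """Habit 4: subset a (lat, lon) grid to bbox = (west, south, east, north) in degrees.

    Mask-based, so it works whether the coordinate arrays ascend or descend.
    """
    west, south, east, north = bbox
    lat_mask = (lats >= south) & (lats <= north)
    lon_mask = (lons >= west) & (lons <= east)
    return grid[np.ix_(lat_mask, lon_mask)], lats[lat_mask], lons[lon_mask]
```

A bounding box is only the first cut. For an exact basin mask, rioxarray’s `rio.clip` with your basin geometry is one option. For dataframes, swap the JSON file for Parquet (`DataFrame.to_parquet` and `pd.read_parquet`).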

A growing share of these datasets is also hosted on AWS Open Data and Google Earth Engine, where you can query and compute without downloading anything at all. That is worth it once your study area gets large.

Free data is one of the quiet superpowers of modern hydrology. The barrier was never cost; it was the plumbing. Learn the clients, cache aggressively, clip early — and spend your day on the water, not the wrangling.