ERA5-Land provides global, hourly, high-resolution information on climate variables, produced by the Copernicus Climate Change Service (C3S) at the European Centre for Medium-Range Weather Forecasts (ECMWF). It contains most meteorological variables that we use, including wind, temperature, precipitation, and many more. The ERA5-Land dataset covers the period from 1950 to 5 days before the current date and is updated daily. A detailed description of the dataset can be found here: Overview.
However, ERA5-Land is a gridded dataset at 0.1° x 0.1° spatial resolution, which may not be what we need. Depending on the research question and the model, we often need to re-grid the data to a different spatial resolution.
Method 1: Re-grid GeoTIFF data using Python
Special thanks to Dr. Shanti Shwarup Mahto for his contribution to this section!
This method is intended to be used after downloading ERA5 data using Method 2 outlined in ERA5 Data Download.
This code allows you to re-grid the GeoTIFF data you downloaded and output it in netCDF format, which is what we use most of the time. The process runs locally, so remember to download all the data you need from your Google Drive and put it in the correct folder before running the code below. Here we show the process of re-gridding the ERA5-Land data to a 0.05° grid.
import rasterio
import numpy as np
from netCDF4 import Dataset
import os
from rasterio.warp import calculate_default_transform, reproject, Resampling
from tqdm import tqdm  # For progress bar

# Define the interval you want in degrees
interval = 0.05

# Define input and output folders
input_folder = 'ERA5_Hourly_raw'
output_folder = 'ERA5_Hourly_' + str(interval)

# Ensure the output directory exists
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Define the desired latitude and longitude range
lat_min, lat_max = 0.125, 29.975
lon_min, lon_max = 95.025, 109.975

# Create target longitude and latitude arrays
lon = np.arange(lon_min, lon_max + interval, interval)
lat = np.arange(lat_max, lat_min - interval, -interval)  # Reverse latitude to correct upside-down issue

# Process each .tif file in the input folder
tif_files = [f for f in os.listdir(input_folder) if f.endswith('.tif')]
for tif_file in tqdm(tif_files, desc="Converting files"):
    print(f"Processing {tif_file}...")
    input_path = os.path.join(input_folder, tif_file)
    output_path = os.path.join(output_folder, tif_file[:-4] + '.nc')
    year = tif_file[9:-8]  # Extract the year from file names like ERA5Land_YYYYMMDD.tif

    # Open the .tif file using rasterio
    with rasterio.open(input_path) as src:
        # Define the transform for the desired resolution and bounds
        dst_transform, width, height = calculate_default_transform(
            src.crs, src.crs, len(lon), len(lat),
            left=lon_min, bottom=lat_min, right=lon_max, top=lat_max)

        # Create an empty array for the reprojected data
        param = np.empty((src.count, height, width), dtype=np.float32)

        # Reproject each band using bilinear interpolation
        for i in tqdm(range(src.count), desc="Reprojecting bands", leave=False):
            reproject(source=src.read(i + 1),
                      destination=param[i],
                      src_transform=src.transform,
                      src_crs=src.crs,
                      dst_transform=dst_transform,
                      dst_crs=src.crs,
                      resampling=Resampling.bilinear)

    # Define time array (one entry per band)
    time = np.arange(param.shape[0])

    # Create the NetCDF file
    with Dataset(output_path, 'w', format='NETCDF4') as nc:
        # Create dimensions
        nc.createDimension('longitude', len(lon))
        nc.createDimension('latitude', len(lat))
        nc.createDimension('time', len(time))

        # Create variables
        longitude = nc.createVariable('longitude', 'f4', ('longitude',))
        latitude = nc.createVariable('latitude', 'f4', ('latitude',))
        times = nc.createVariable('time', 'i4', ('time',))
        param_var = nc.createVariable('2m_temperature', 'f4',
                                      ('time', 'latitude', 'longitude'),
                                      zlib=True, complevel=4)

        # Assign data to variables
        longitude[:] = lon
        latitude[:] = lat
        times[:] = time
        param_var[:, :, :] = param

        # Add attributes
        longitude.units = 'degrees_east'
        latitude.units = 'degrees_north'
        times.units = f'days since {year}-01-01'
        param_var.units = 'degree Celsius'

    print(f'NetCDF file created for {tif_file}')
Method 2: Re-grid the data using Climate Data Operators in Linux
Climate Data Operators (CDO) is a collection of command-line operators for manipulating and analyzing climate data, developed by the Max Planck Institute for Meteorology. CDO is an incredibly powerful tool for processing climate data, carrying out complex operations in just a line or two. For more information, check out Overview. If you are interested in a comprehensive guide to CDO, check out the tutorial here: User Guide. Unfortunately, CDO only works in a Linux environment, so you need to set up a Linux environment to use it and figure out a way to transfer files between Linux and Windows.
Installing CDO in Linux is easy:
sudo apt-get install cdo
In Windows, you will have to build it from source. Instructions are given here: CDO for Windows. I have not tested this, so feel free to give it a try and update the Lab Manual if it works!
In CDO, we use the remapbil function to re-grid the data, which performs a bilinear interpolation. This can be done in a single line:
cdo remapbil,targetgrid ifile ofile
The targetgrid part is slightly nuanced. Essentially, you will need a file (often .txt) that contains the number of rows and columns, the cell size, and the coordinates of the lower-left corner. Kindly approach me or Dr. Shanti for the grid description file. You can read more about it here: Re-gridding with CDO.
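For reference, the grid description file for a regular latitude-longitude grid uses CDO's standard keywords. A rough sketch matching the 0.05° domain from Method 1 (the values below are derived from that example; adjust them to your own domain) would look like this:

gridtype = lonlat
xsize    = 300
ysize    = 598
xfirst   = 95.025
xinc     = 0.05
yfirst   = 0.125
yinc     = 0.05

Save it as, for example, targetgrid.txt and pass the file name as the targetgrid argument in the command above.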
ERA5-Land provides global information on climate variables, produced by the Copernicus Climate Change Service (C3S) at the European Centre for Medium-Range Weather Forecasts (ECMWF). It is a gridded dataset at 0.1° x 0.1° spatial resolution and an hourly temporal resolution, and it contains most meteorological variables that we use, including wind, temperature, precipitation, and many more. The ERA5-Land dataset covers the period from 1950 to 5 days before the current date and is updated daily. A detailed description of the dataset can be found here: Overview.
Why do I need a script to download it?
Although there is a website for downloading the data, you will find that you cannot multi-select Year or Month using the website.
Method 1: Downloading data using Climate Data Store API
Disclaimer: Currently there seems to be a rather small limit on how much data you can download at once. If you need to bulk-download data, e.g., 10+ years, it is recommended to take the detour outlined in Method 2.
First, you will need an ECMWF account. You can register an account for free here: Registration. This will give you a personal API key. Install the Climate Data Store API in your local environment just like any other Python library:
pip install cdsapi
To set up the API, follow the instructions here: API Setup.
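In short, the setup boils down to creating a plain-text file named .cdsapirc in your home directory that holds the API URL and your personal key. The exact values depend on the current version of the Climate Data Store, so treat the following as a sketch and copy the real lines from your CDS profile page:

url: https://cds.climate.copernicus.eu/api
key: <your-personal-access-token>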
You can now request and download data using a Python script like the one below. The official website (Website) contains a helpful tool that generates the Python code for you, but you need to change some of the parameters due to the multi-select limit mentioned above.
import cdsapi

dataset = "reanalysis-era5-land"
request = {
    "variable": [
        "2m_temperature",
        "10m_u_component_of_wind",
        "10m_v_component_of_wind",
        "total_precipitation"
    ],  # Check the variable name on the official website
    "year": ["2021", "2022", "2023"],  # list of years
    "month": ["01", "02", "03", "04", "05", "06",
              "07", "08", "09", "10", "11", "12"],  # list of months
    "day": ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10",
            "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
            "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
            "31"],  # list of days
    "time": ["00:00", "01:00", "02:00", "03:00", "04:00", "05:00",
             "06:00", "07:00", "08:00", "09:00", "10:00", "11:00",
             "12:00", "13:00", "14:00", "15:00", "16:00", "17:00",
             "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"],  # list of timestamps
    "data_format": "netcdf",
    "download_format": "zip",
    "area": [90, -180, -90, 180]  # North, West, South, East
}

client = cdsapi.Client()
client.retrieve(dataset, request).download()
Method 2: Downloading data using Google Earth Engine
The Google Earth Engine method is slightly more complicated, but it allows us to bypass the download limit in Method 1. You can find out more about the dataset here: Catalog. Unfortunately, Google’s website only gives you instructions in JavaScript, so let’s take a look at how to download the data in Python.
Step 1: Create an Earth-Engine-Enabled Google Cloud Project
Google requires a Cloud Project to use the Google Earth Engine authentication flow. Create one here: Create Google Cloud Project. Remember the name of your project; it will be needed later when calling the API.
You will then need to enable the Google Earth Engine API for the project you just created here: Enabling API for Your Project. Make sure you are signed in and double-check the project name in the upper left-hand corner.
Step 2: Authenticate inside Python
First install the Google Earth Engine API in your Python environment:
pip install earthengine-api
You will then need to import and authenticate the API. Run the following code:
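A minimal authentication block using the standard Earth Engine calls looks like this; replace your-project-name with the Cloud project you created in Step 1:

import ee

ee.Authenticate()  # Prompts you to follow a URL and paste back an authorization code
ee.Initialize(project='your-project-name')  # The Cloud project from Step 1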
You will be prompted to follow a URL in your Python IDE to generate a code, and then paste it into a box in your Python IDE. After that, you are done! One thing to note: the authentication expires after being idle for one week, so if you come back to your project after a while, you may need to authenticate again.
Step 3: Download the data
Special thanks to Dr. Shanti Shwarup Mahto for his contribution to this section!
After authenticating the Google Earth Engine API, you can download the data using the code below. Remember to run the authentication block before this!
import ee
import os
from datetime import datetime, timedelta

# Define the Area of Interest (AOI)
geometry = [95, 30, 110, 0]  # Top-left (lon, lat) and Bottom-right (lon, lat) coordinates
geometry = ee.Geometry.Rectangle(geometry)

# Change this to your working directory
dtdr = '/'
os.chdir(dtdr)
data_download_directory1 = 'ERA5_Hourly_raw'  # Folder name in your Google Drive; if the folder does not exist, it will be created.

start_date = datetime(2009, 7, 1)
end_date = datetime(2018, 12, 31)

current_date = start_date
while current_date <= end_date:
    next_date = current_date + timedelta(days=1)
    dataset = ee.ImageCollection('ECMWF/ERA5_LAND/HOURLY') \
        .filterDate(current_date.strftime('%Y-%m-%d'), next_date.strftime('%Y-%m-%d')) \
        .filterBounds(geometry)
    count = dataset.size().getInfo()
    print(f"Number of images between {current_date.strftime('%Y-%m-%d')} and {next_date.strftime('%Y-%m-%d')}: {count}")

    # Select the variable and convert from Kelvin to degrees Celsius
    def process_image(image):
        layer1 = image.select('temperature_2m').subtract(273.15)
        return layer1.copyProperties(image, ['system:time_start'])

    dataset = dataset.map(process_image)
    dataset = dataset.map(lambda image: image.clip(geometry))

    # Make a stack by combining all 24 hourly images into a single multi-band image
    hourly_stack = dataset.toBands()
    print(current_date)

    # ================================= Daily stack export parameters
    export_params1 = {
        'image': hourly_stack,
        'description': f"ERA5Land_{current_date.strftime('%Y%m%d')}",
        'scale': 11000,
        'fileFormat': 'GeoTIFF',
        'region': geometry,
        'crs': 'EPSG:4326',
        'folder': data_download_directory1,
        'maxPixels': 1e13,
        'formatOptions': {'cloudOptimized': True}
    }

    # Export the image as a cloud-optimized GeoTIFF to Google Drive
    task = ee.batch.Export.image.toDrive(**export_params1)
    task.start()
    print(f"Exporting daily file: {current_date.strftime('%Y-%m-%d')}")

    current_date = current_date + timedelta(days=1)
A caveat of this is that the data is downloaded to your Google Drive instead of locally, so you will need to retrieve it from your Google Drive. Another caveat is that this code only creates a series of export requests; it does not download the files directly. This means that the files are not downloaded (yet!) when the code has finished running. Depending on the size of your request, your internet speed, and the Google Earth Engine server, you may need to wait a significant while (more than an hour) before all the requested files show up in your Google Drive. Finally, this code downloads files in GeoTIFF format, which may or may not be what you need. To post-process this, refer to the tutorial on How to Re-grid ERA5 Climate Data.
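If you would rather monitor the export tasks from Python than from the Earth Engine web console, a small sketch using the task-listing API looks like this:

import ee

# Assumes ee.Authenticate()/ee.Initialize() have already been run.
# Print the description and state (READY, RUNNING, COMPLETED, FAILED) of recent tasks.
for task in ee.batch.Task.list()[:10]:
    status = task.status()
    print(status.get('description'), '-', status.get('state'))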
A first introduction to Python virtual environments
Edited by: Phumthep Bunnak
The problem: dependency hell
Imagine you’re juggling two Python projects simultaneously. Your first project, analyzing historical stock data, relies on a specific version of the Pandas package (version 1.5.3, let’s say). Your second project, a cutting-edge machine learning model, demands the latest features of Pandas 2.1.0.
Here’s where things get tricky: a critical function in Pandas has changed between these two versions. In 1.5.3, the function expects a certain argument order; in 2.1.0, the order is completely different. Trying to run both projects in the same environment can lead to errors, crashes, and headaches. This is a classic example of “dependency hell”.
The solution: virtual environments
A virtual environment in Python is a self-contained directory that houses a specific Python installation (interpreter/version) and a set of packages independent of other Python environments. In other words, each virtual environment has its own:

Python interpreter. This makes it easy to have multiple Python versions on your machine simultaneously.

Specific versions of Python packages. A Python package (e.g., Pandas) is not shared across environments.
Python virtual environments are isolated workspaces for your projects, preventing package and version conflicts. This isolation extends to software installed in other virtual environments and the default Python installation that might come with your operating system. Each environment is disposable and not tracked by version control systems like Git [1]. You can customize each virtual environment to match a project’s specific requirements, ensuring ease of deployment and reproducibility across machines. Having a distinct virtual environment for each project helps maintain a clean and streamlined project workflow, free of dependency issues.
Several tools can create and manage virtual environments, but conda shines for our research group’s coding projects because it has a large community of users, making it easy to find support. Ultimately, the choice of virtual-environment manager depends on your project needs.
Why choose Anaconda over Pip (and when to use both)
If Python comes with pip, a perfectly functional package installer, why bother with conda? The short answer is that conda offers capabilities beyond pip’s scope. While pip solely manages Python packages, conda manages not only Python packages but also Python versions themselves, along with non-Python dependencies like C/C++ libraries often required by scientific computing or data analysis tools. Using pip outside a virtual environment can lead to conflicts with other system-wide Python applications, as pip installs packages globally. In contrast, conda ensures that your project’s dependencies remain isolated and don’t interfere with other installations. Additionally, conda’s intelligent solver automatically identifies and resolves conflicts among packages, saving time by avoiding the need to manually install and remove multiple dependencies.
It’s important to note that conda and pip are not mutually exclusive and can be used together effectively. In fact, a conda environment comes with pip pre-installed, enabling their simultaneous use in some workflows. The recommended practice is to first install all necessary packages using conda. Anaconda boasts curated channels with a wide selection of Python packages. However, if a specific package is not available through Anaconda channels, you can easily switch to the bundled pip within your conda environment to install the package from the PyPI ecosystem.
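As a sketch of this workflow (the environment and package names are only illustrative):

conda activate my_project_env
conda install -c conda-forge pandas
pip install some-pypi-only-package  # fall back to pip when a package is missing from conda channels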
Creating Virtual Environments with Anaconda (3 Ways)
We will now explore three cases for creating a virtual environment using Anaconda.
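For instance, the most basic case, creating a fresh environment from the command line, typically looks like this (the environment name and version numbers are illustrative):

conda create --name stock_analysis_v1 python=3.10
conda activate stock_analysis_v1
conda install pandas=1.5.3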
An environment.yml file is similar to requirements.txt but often contains more detailed dependency information.
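As an illustration, a minimal environment.yml (the names and versions are examples) might look like the following, and the environment can then be built with conda env create -f environment.yml:

name: stock_analysis_v1
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas=1.5.3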
Best Practices
Keep your projects organized and prevent conflicts by having one environment per project
Use descriptive names for environments (e.g., stock_analysis_v1) to avoid confusion
Keep an updated environment.yml or requirements.txt file when developing a project
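A common way to keep that file current is to export the active environment; the --from-history flag records only the packages you explicitly requested, which gives a leaner, more portable file:

conda env export > environment.yml
conda env export --from-history > environment.yml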
[Bonus] Using pip to install a local project to a conda environment
Instead of just installing packages from online sources, pip allows you to install Python projects directly from your local machine. This setup is especially useful when you have developed your own packages or are working with code not yet published online. Here’s how to do it within a conda environment.
Activate an environment (or create one if you haven’t)
conda activate <your_environment_name>
Navigate to your project folder. Use your terminal to move to the root directory of your local project containing the setup.py or pyproject.toml file.
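If your project does not have one yet, a minimal pyproject.toml for a setuptools-based project could look like this (the project name and version are placeholders):

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my_local_package"
version = "0.1.0"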
Install with pip using the following command:
pip install -e .
The -e flag (short for “editable”) installs your project in “development mode.” This means that any changes you make to your project’s code will be immediately reflected in the conda environment without needing to reinstall. With these steps, you can develop and use your local Python package!
Links
[1] For more details: https://docs.python.org/3/library/venv.html
A brief overview on my experience as a Section Editor for JWRPM
These months mark the end of my experience as a Section Editor for the Journal of Water Resources Planning and Management. Here’s a list of common issues I found in the papers I handled over the past years: