SoFunction
Updated on 2024-11-13

Zonal statistics and batch extraction of raster data with Python

A common task is to take raster data and compute its mean, or some other statistic, over a given region: for example, extracting many years of rainfall data for a province and then summarizing the values by district. A related task is to pull the values for one region out of a number of rasters to form a series that is easy to plot, for example extracting the values for the districts of one city. Both the zonal (partition) statistics and this kind of extraction can be done with the rasterstats library.

The data used in this experiment are rasters (*.tif) and vectors (*.shp); both the zonal statistics and the raster value extraction below work from these two kinds of data. To use rasterstats, I chose to run the script on Google Colab, because installing libraries there is very convenient: the library kept failing to install on Windows, but installed right away in a Google notebook. The data can also be stored on Google Drive and linked directly from the notebook.

Now for the test, using the raster and vector datasets.
Import the required modules:

import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import rasterio
import rasterstats
from rasterio.plot import show
# show() is used to display raster images
from rasterio.plot import show_hist
# show_hist() is used to display histograms
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter

Read vector and raster data using geopandas and rasterio respectively

# Use geopandas to read vector data
districts = gpd.read_file('/content/drive/MyDrive/Datashpraster/Data/Districts/')

# Use rasterio to read raster data; the raster and vector data must share the same coordinate reference system.
raster = rasterio.open('/content/drive/MyDrive/Datashpraster/Data/Rainfall Data Rasters/')
# Plot the vector and raster data together on the same axes
plt.rcParams['font.family'] = 'Times New Roman'
plt.rcParams['font.size'] = 20

fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,6))

show(raster, ax=ax1,title='Rainfall')
# The vector data can be drawn directly by calling the GeoDataFrame's plot() method.
districts.plot(ax=ax1, facecolor='None', edgecolor='red')
show_hist(raster,ax=ax2,title='hist')

plt.show()

Plotting the data first gives a quick look at it.

Read raster data:

# Extract the rainfall raster values into a numpy array
# Band numbers start at 1, following the GDAL convention, so this reads the first band
rainfall_data = raster.read(1)
rainfall_data
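
The array returned by reading a band is indexed by row and column only. It is the raster's affine transform that ties those indices to map coordinates, which is why the transform is passed along with the array in the next step. As a rough illustration of that mapping, here is a minimal sketch in plain Python, assuming the six-coefficient (a, b, c, d, e, f) ordering used by rasterio's transform. The function name `pixel_to_map` and the grid values are hypothetical, chosen only for the example.

```python
def pixel_to_map(coeffs, row, col):
    """Map a pixel's (row, col) to the (x, y) of its upper-left corner."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical north-up grid: 0.25-degree pixels, upper-left corner at (-10.0, 42.0).
# e is negative because rows run from north to south.
coeffs = (0.25, 0.0, -10.0, 0.0, -0.25, 42.0)

print(pixel_to_map(coeffs, 0, 0))  # (-10.0, 42.0)
print(pixel_to_map(coeffs, 2, 4))  # (-9.0, 41.5)
```

Without this information a bare numpy array cannot be lined up with the district polygons, since the array itself carries no coordinates.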

Start partition statistics:

# Get the affine transform that maps pixel positions to map coordinates
affine = raster.transform

# Compute the zonal statistics
# Arguments: the vector zones, the raster array, the affine transform, and the statistics to compute (here the mean)
avg_rallrain = rasterstats.zonal_stats(districts,rainfall_data,affine=affine,stats=['mean'],geojson_out=True)
# avg_rallrain

# Besides the mean, other statistics such as the maximum and minimum can also be computed.
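
To make concrete what a zonal statistic is, the following plain-Python sketch groups raster cells by the zone they fall in and summarizes each group. It only illustrates the idea and is not how rasterstats is implemented: the library works from vector geometries, whereas this sketch assumes the zones have already been turned into a grid of labels. The function `zonal_summary` and the sample grids are hypothetical.

```python
def zonal_summary(values, zones, nodata=None):
    """Summarize `values` per zone label; both are equally shaped 2-D lists."""
    grouped = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue  # cell lies outside every zone, or holds no data
            grouped.setdefault(zone, []).append(value)
    return {
        zone: {
            "mean": sum(cells) / len(cells),
            "min": min(cells),
            "max": max(cells),
            "count": len(cells),
        }
        for zone, cells in grouped.items()
    }


# Hypothetical 3x3 rainfall grid and a matching grid of zone labels.
rain = [[10.0, 12.0, 14.0],
        [11.0, -9999.0, 15.0],
        [20.0, 22.0, 24.0]]
zone_ids = [["A", "A", "B"],
            ["A", "A", "B"],
            [None, "B", "B"]]

summary = zonal_summary(rain, zone_ids, nodata=-9999.0)
print(summary["A"])  # {'mean': 11.0, 'min': 10.0, 'max': 12.0, 'count': 3}
```

With rasterstats itself, several statistics can be requested in one call by listing them, for example `stats=['mean', 'min', 'max']`.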

Plotting the result gives a simple map.
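
One possible way to draw the result, sketched under a few assumptions: with `geojson_out=True`, each returned item is a GeoJSON-like feature whose `properties` carry the original attributes plus the computed statistics, so the list can be turned back into a GeoDataFrame and colored by the `mean` column. The helper names, the `name_key` argument and the plot title below are hypothetical. The plotting libraries are imported inside the function so that the small lookup helper can be used on its own.

```python
def mean_by_zone(features, name_key):
    """Map each zone's name to its 'mean' statistic."""
    return {f["properties"][name_key]: f["properties"]["mean"] for f in features}


def plot_zone_means(features, crs=None):
    """Draw a choropleth of the per-zone mean and return the GeoDataFrame."""
    import geopandas as gpd
    import matplotlib.pyplot as plt

    gdf = gpd.GeoDataFrame.from_features(features, crs=crs)
    ax = gdf.plot(column="mean", legend=True, edgecolor="black", figsize=(8, 6))
    ax.set_title("Mean rainfall by district")  # hypothetical title
    plt.show()
    return gdf
```

Assuming the variables from the code above, a call might look like `plot_zone_means(avg_rallrain, crs=districts.crs)`.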

The second part is more interesting: extracting values from multiple separate rasters to form a series.

The inputs are all .tif files.

Loop over these raster datasets:
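
A minimal sketch of such a loop, under stated assumptions: the rasters sit in one folder as `.tif` files, each file name (without extension) can serve as the date label, and the zones GeoDataFrame has a column holding each district's name. The function names, `raster_dir` and `name_field` are hypothetical, and the column naming simply mirrors the `Average_RF_...` columns used later. When `zonal_stats` is given a file path, it reads the transform from the file itself, so no `affine` argument is passed here.

```python
from pathlib import Path


def make_row(date_label, zone_names, means):
    """Build one table row: the date label plus one mean per zone."""
    row = {"Date": date_label}
    for name, mean in zip(zone_names, means):
        row[f"Average_RF_{name}"] = mean
    return row


def zonal_means_for_rasters(raster_dir, zones, name_field):
    """Compute the per-zone mean of every .tif in raster_dir, one row per file."""
    # Imported here so that make_row can be used without rasterstats installed.
    from rasterstats import zonal_stats

    zone_names = list(zones[name_field])
    rows = []
    for tif in sorted(Path(raster_dir).glob("*.tif")):
        # One result dict per zone, in the same order as the zones.
        stats = zonal_stats(zones, str(tif), stats=["mean"])
        rows.append(make_row(tif.stem, zone_names, [s["mean"] for s in stats]))
    return rows
```

The rows can then be assembled with `pd.DataFrame(rows)`, which gives a `Date` column plus one `Average_RF_<name>` column per district.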

The extracted result is a series of values like this, which can then be plotted.

Converting data formats

# Convert the Date column to datetime
data['Date'] = pd.to_datetime(data['Date'], infer_datetime_format=True)

# print(data)

data['Date'] = data['Date'].
print(data)

The result is a simple graph.

# Prepare the figure
fig,(ax1,ax2)= plt.subplots(2,1,figsize=(18,6))
plt.rcParams['font.size'] = 15

data.plot(x='Date', y='Average_RF_Porto', ax=ax1, kind='bar', title='Avg_Rail_Porto')
data.plot(x='Date', y='Average_RF_Faro', ax=ax2, kind='bar', title='Avg_Rail_Faro',color='red')

# Automatically adjust the subplot layout
plt.tight_layout()
plt.show()

The result is a series plot like this. The idea is to pick out the study area from each raster, extract the raster values for it, and plot them.

The figures are not fancy, but the approach is practical, especially when extracting raster values in bulk. Because the work in Google Colab involved quite a few steps, some may have been left out along the way, but the important ones should all be in the text. The code can of course be moved to other environments. It is also worth reading the tutorials for these third-party libraries; for example, what read(1) means is explained in the official docs, which is very convenient.

That covers zonal statistics and batch extraction of raster data with Python.