eumap.parallel.blocks.RasterBlockReader
- class RasterBlockReader(reference_file=None)
Bases: object
Thread-parallel reader for large rasters.
If reference_file is not None, builds an R-tree index [1] of the block geometries read from the reference_file on initialization. All rasters read with the initialized reader are assumed to have identical geotransforms and block structures to the reference.
For full usage examples please refer to the block processing tutorial notebook [2].
References
[1] pygeos STRTree
[2] Raster block processing tutorial
Examples
>>> from eumap.parallel.blocks import RasterBlockReader
>>> from eumap.misc import ttprint
>>>
>>> fp = 'https://s3.eu-central-1.wasabisys.com/eumap/lcv/lcv_landcover.hcl_lucas.corine.rf_p_30m_0..0cm_2019_eumap_epsg3035_v0.1.tif'
>>>
>>> ttprint('initializing reader')
>>> reader = RasterBlockReader(fp)
>>> ttprint('reader initialized')
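The block index described above can be pictured with a small pixel-space sketch. It is not the eumap implementation: the real reader indexes georeferenced block geometries in an R-tree, whereas this example uses hypothetical helpers (block_windows, intersecting_blocks) and a brute-force overlap test on (col_off, row_off, width, height) tuples, assuming a regular block grid.

```python
from typing import Iterator, List, Tuple

# (col_off, row_off, width, height), mirroring the fields of a raster window.
Window = Tuple[int, int, int, int]


def block_windows(width: int, height: int,
                  block_w: int, block_h: int) -> Iterator[Window]:
    """Yield the windows of a regular block grid, row by row."""
    for row_off in range(0, height, block_h):
        for col_off in range(0, width, block_w):
            # Blocks on the right and bottom edges may be smaller.
            yield (col_off, row_off,
                   min(block_w, width - col_off),
                   min(block_h, height - row_off))


def intersecting_blocks(windows: List[Window], query: Window) -> List[Window]:
    """Brute-force stand-in for an R-tree query: keep overlapping windows."""
    qc, qr, qw, qh = query
    return [
        (c, r, w, h)
        for c, r, w, h in windows
        if c < qc + qw and qc < c + w and r < qr + qh and qr < r + h
    ]


# Hypothetical 1000 x 600 raster stored in 256 x 256 blocks.
windows = list(block_windows(width=1000, height=600, block_w=256, block_h=256))
# Only the blocks touching the query rectangle would need to be read.
hits = intersecting_blocks(windows, query=(200, 200, 100, 100))
print(len(windows), len(hits))
```

Building the index once and reusing it is what makes the assumption of identical block structures across rasters useful: the same set of windows can be applied to every file.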
Methods
read_overlay – Thread-parallel reading of large rasters within a bounding geometry.
- read_overlay(src_path, geometry, band=1, geometry_mask=True, max_workers=2, optimize_threadcount=True)
Thread-parallel reading of large rasters within a bounding geometry.
Only blocks that intersect with geometry are read. Returns a generator yielding (data, mask, window) tuples for each block, where data are the stacked pixel values of all rasters at mask==True, mask is the reduced (via bitwise and) block data mask for all rasters, and window is the rasterio.windows.Window [1] for the block within the transform of the reference_file. All rasters read with the initialized reader are assumed to have identical geotransforms and block structures to the reference_file used for initialization. If the reader was initialized with reference_file==None, the first file in src_path is used as the reference and the block R-tree is built before yielding data from the first block.
- Parameters
src_path (Union[str, Iterable[str]]) – Path(s) (or URLs) of the raster file(s) to read.
geometry (dict) – The bounding geometry within which to read raster blocks, given as a dictionary (with the GeoJSON geometry schema).
band (int) – Index of band to read from all rasters.
geometry_mask (bool) – Indicates whether or not to use the geometry as a data mask. If False, the block data will be returned in its entirety, regardless of whether some of it falls outside of the geometry.
max_workers (int) – Maximum number of worker threads to use, defaults to multiprocessing.cpu_count().
optimize_threadcount (bool) – Whether or not to optimize the number of workers. If True, the number of worker threads will be iteratively increased until the average read time per block stops decreasing or max_workers is reached. If False, max_workers will be used as the number of threads.
- Returns
Generator yielding (data, mask, window) tuples for each block.
- Return type
Iterator[Tuple[np.ndarray, np.ndarray, rasterio.windows.Window]]
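To make the relationship between data and mask concrete, here is a minimal sketch that uses plain nested lists in place of NumPy arrays. The helper names (reduce_masks, stack_valid) are hypothetical, and the layout of the stacked values (one row per raster) is an assumption, since the exact array shape is not stated here.

```python
from typing import List

Grid = List[List[int]]
Mask = List[List[bool]]


def reduce_masks(masks: List[Mask]) -> Mask:
    """AND the per-raster masks cell by cell.

    A pixel stays valid only if it is valid in every raster.
    """
    return [[all(cells) for cells in zip(*rows)] for rows in zip(*masks)]


def stack_valid(blocks: List[Grid], mask: Mask) -> List[List[int]]:
    """Return one row per raster holding its values where mask is True."""
    return [
        [value
         for row, mask_row in zip(block, mask)
         for value, keep in zip(row, mask_row)
         if keep]
        for block in blocks
    ]


# Two hypothetical 2 x 2 blocks covering the same window in two rasters.
block_a = [[1, 2], [3, 4]]
block_b = [[10, 20], [30, 40]]
mask_a = [[True, True], [False, True]]
mask_b = [[True, False], [True, True]]

mask = reduce_masks([mask_a, mask_b])
data = stack_valid([block_a, block_b], mask)
print(mask, data)
```

With this layout, each column of the stacked values corresponds to one valid pixel observed across all rasters.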
For full usage examples please refer to the block processing tutorial notebook [2].
References
[1] Rasterio Window
[2] Raster block processing tutorial
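The behaviour described for optimize_threadcount can be sketched as a simple search loop. This is an illustration under stated assumptions, not the library's code: the function name optimize_thread_count, the starting point of one worker, the step of one worker, and the avg_block_time callable (which stands in for timing real block reads) are all hypothetical.

```python
from typing import Callable


def optimize_thread_count(avg_block_time: Callable[[int], float],
                          max_workers: int) -> int:
    """Add workers one at a time until the average read time per block
    stops decreasing or max_workers is reached."""
    n_workers = 1
    best_time = avg_block_time(n_workers)
    while n_workers < max_workers:
        candidate_time = avg_block_time(n_workers + 1)
        if candidate_time >= best_time:
            # One more worker did not help, so keep the current count.
            break
        n_workers += 1
        best_time = candidate_time
    return n_workers


# Hypothetical average seconds per block for each worker count.
timings = {1: 0.40, 2: 0.22, 3: 0.15, 4: 0.16, 5: 0.10}

chosen = optimize_thread_count(lambda n: timings[n], max_workers=5)
capped = optimize_thread_count(lambda n: timings[n], max_workers=2)
print(chosen, capped)
```

Because the search stops at the first non-improving step, it settles on three workers in this example even though five would have been faster. That is the trade-off of a greedy search that avoids timing every candidate count.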
Examples
>>> geom = {
...     'type': 'Polygon',
...     'coordinates': [[
...         [4765389, 2441103],
...         [4764441, 2439352],
...         [4767369, 2438696],
...         [4761659, 2441949],
...         [4765389, 2441103],
...     ]],
... }
>>> block_data_gen = reader.read_overlay(fp, geom)
>>> data, mask, window = next(block_data_gen)