eumap.parallel.blocks.RasterBlockReader

class RasterBlockReader(reference_file=None)

Bases: object

Thread-parallel reader for large rasters.

If reference_file is not None, builds an R-tree index [1] of the block geometries read from the reference_file on initialization. All rasters read with the initialized reader are assumed to have identical geotransforms and block structures to the reference.

Parameters

reference_file (Optional[str]) – Path or URL of the reference raster.

For full usage examples please refer to the block processing tutorial notebook [2].

References

[1] pygeos STRTree

[2] Raster block processing tutorial

Examples

>>> from eumap.parallel.blocks import RasterBlockReader
>>> from eumap.misc import ttprint
>>>
>>> fp = 'https://s3.eu-central-1.wasabisys.com/eumap/lcv/lcv_landcover.hcl_lucas.corine.rf_p_30m_0..0cm_2019_eumap_epsg3035_v0.1.tif'
>>>
>>> ttprint('initializing reader')
>>> reader = RasterBlockReader(fp)
>>> ttprint('reader initialized')
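
The block index built on initialization exists so that only blocks touching a query geometry need to be read. The following is a minimal sketch of that idea, not eumap's implementation: a brute-force bounding-box test stands in for the R-tree, everything is in pixel coordinates rather than georeferenced units, and the names block_windows and intersecting_blocks are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# (col_off, row_off, width, height) in pixel coordinates
Block = Tuple[int, int, int, int]
# (min_col, min_row, max_col, max_row) in pixel coordinates
BBox = Tuple[float, float, float, float]


def block_windows(width: int, height: int, block_w: int, block_h: int) -> Iterator[Block]:
    """Yield the regular block grid of a raster, clipping blocks at the edges."""
    for row_off in range(0, height, block_h):
        for col_off in range(0, width, block_w):
            yield (
                col_off,
                row_off,
                min(block_w, width - col_off),
                min(block_h, height - row_off),
            )


def intersecting_blocks(blocks: Iterable[Block], bbox: BBox) -> List[Block]:
    """Brute-force stand-in for an R-tree query: keep blocks overlapping bbox."""
    min_col, min_row, max_col, max_row = bbox
    return [
        (c, r, w, h)
        for (c, r, w, h) in blocks
        if c < max_col and c + w > min_col and r < max_row and r + h > min_row
    ]


# Hypothetical 1000 x 600 raster stored in 256 x 256 blocks.
blocks = list(block_windows(width=1000, height=600, block_w=256, block_h=256))
hits = intersecting_blocks(blocks, bbox=(200, 100, 300, 300))
```

A spatial index answers the same query without scanning every block, which matters once a raster has many thousands of blocks.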

Methods

read_overlay(src_path, geometry, band=1, geometry_mask=True, max_workers=2, optimize_threadcount=True)

Thread-parallel reading of large rasters within a bounding geometry.

Only blocks that intersect with geometry are read. Returns a generator yielding (data, mask, window) tuples for each block, where data are the stacked pixel values of all rasters at mask==True, mask is the reduced (via bitwise and) block data mask for all rasters, and window is the rasterio.windows.Window [1] for the block within the transform of the reference_file. All rasters read with the initialized reader are assumed to have identical geotransforms and block structures to the reference_file used for initialization. If the reader was initialized with reference_file==None, the first file in src_path is used as the reference and the block R-tree is built before yielding data from the first block.
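
To make the relationship between data and mask concrete, here is a small sketch of the described semantics under stated assumptions. Plain Python lists stand in for NumPy arrays so that it has no dependencies, each block is flattened to one dimension, and the layout of data (one row per raster) is an assumption rather than something this page specifies. The helper names reduce_masks and stack_valid are hypothetical.

```python
from typing import List, Sequence, Tuple


def reduce_masks(masks: Sequence[Sequence[bool]]) -> List[bool]:
    """Combine per-raster masks with a logical AND: a pixel stays valid
    only if it is valid in every raster."""
    return [all(flags) for flags in zip(*masks)]


def stack_valid(
    blocks: Sequence[Sequence[float]],
    masks: Sequence[Sequence[bool]],
) -> Tuple[List[List[float]], List[bool]]:
    """Return (data, mask): data holds, for each raster, the values at the
    positions where the reduced mask is True."""
    mask = reduce_masks(masks)
    data = [[v for v, keep in zip(block, mask) if keep] for block in blocks]
    return data, mask


# Two hypothetical rasters, each block flattened to four pixels.
blocks = [[10, 11, 12, 13], [20, 21, 22, 23]]
masks = [[True, True, False, True], [True, False, True, True]]
data, mask = stack_valid(blocks, masks)
```

Here only the first and last pixels are valid in both rasters, so each raster contributes two values to data.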

Parameters
  • src_path (Union[str, Iterable[str]]) – Path(s) (or URLs) of the raster file(s) to read.

  • geometry (dict) – The bounding geometry within which to read raster blocks, given as a dictionary (with the GeoJSON geometry schema).

  • band (int) – Index of band to read from all rasters.

  • geometry_mask (bool) – Whether to use the geometry as a data mask. If False, the block data is returned in its entirety, even if part of it falls outside the geometry.

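  • max_workers (int) – Maximum number of worker threads to use, defaults to multiprocessing.cpu_count() (the value 2 shown in the signature is presumably that call as evaluated on the machine that built this documentation).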

  • optimize_threadcount (bool) – Whether to optimize the number of workers. If True, the number of worker threads is increased iteratively until the average read time per block stops decreasing or max_workers is reached. If False, max_workers is used as the number of threads.
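
The stopping rule described for optimize_threadcount can be sketched as follows. This is an illustration of the rule as worded above, not the library's code: choose_worker_count is a hypothetical helper, and the per-block timings are supplied by a caller-provided function instead of being measured during reading.

```python
from typing import Callable


def choose_worker_count(avg_block_time: Callable[[int], float], max_workers: int) -> int:
    """Increase the worker count while the average per-block read time
    keeps decreasing, stopping at max_workers at the latest."""
    best_n = 1
    best_time = avg_block_time(1)
    for n in range(2, max_workers + 1):
        t = avg_block_time(n)
        if t >= best_time:
            break  # adding a thread no longer helps
        best_n, best_time = n, t
    return best_n


# Hypothetical measurements: seconds per block for each thread count.
timings = {1: 0.80, 2: 0.45, 3: 0.30, 4: 0.31, 5: 0.25}
chosen = choose_worker_count(timings.__getitem__, max_workers=5)
```

With these made-up timings the search stops at three workers, because the fourth does not improve the average; a rule of this kind settles for the first local minimum and never tries five.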

Returns

Generator yielding (data, mask, window) tuples for each block.

Return type

Iterator[Tuple[np.ndarray, np.ndarray, rasterio.windows.Window]]

For full usage examples please refer to the block processing tutorial notebook [2].

References

[1] Rasterio Window

[2] Raster block processing tutorial

Examples

>>> geom = {
...     'type': 'Polygon',
...     'coordinates': [[
...         [4765389, 2441103],
...         [4764441, 2439352],
...         [4767369, 2438696],
...         [4761659, 2441949],
...         [4765389, 2441103],
...     ]],
... }
>>> # geometry is a required argument, so pass the polygon defined above
>>> block_data_gen = reader.read_overlay(fp, geom)
>>> data, mask, window = next(block_data_gen)
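
Since data only holds values where mask is True, placing a block back into a full-size grid means scattering those values to the masked positions, offset by the window. The sketch below shows one way to do that for a single raster. It is an assumption-laden illustration: the Window namedtuple is a stand-in that only mirrors the col_off, row_off, width and height attributes of rasterio.windows.Window, nested lists stand in for arrays, and values are assumed to be ordered row by row over the True cells, as NumPy boolean indexing would produce. scatter_block is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in mirroring the offset and size attributes of rasterio.windows.Window.
Window = namedtuple("Window", ["col_off", "row_off", "width", "height"])


def scatter_block(canvas, values, mask, window):
    """Write masked values back into a full-size grid.

    values: flat sequence of pixel values for one raster, ordered row by
            row over the True cells of mask (assumed layout).
    mask:   2D list of booleans with shape (window.height, window.width).
    """
    it = iter(values)
    for r in range(window.height):
        for c in range(window.width):
            if mask[r][c]:
                canvas[window.row_off + r][window.col_off + c] = next(it)
    return canvas


canvas = [[None] * 4 for _ in range(3)]  # 3 rows x 4 columns, initially empty
window = Window(col_off=1, row_off=1, width=2, height=2)
mask = [[True, False], [True, True]]
scatter_block(canvas, [7, 8, 9], mask, window)
```

Cells outside the window, and cells inside it where the mask is False, keep their initial value.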