eumap.misc.sample_groups
- sample_groups(points, *group_element_columns, spatial_resolution=None, temporal_resolution=None, date_column='date')
Construct group IDs for spatial and temporal cross-validation.
Groups point samples into square tiles of spatial_resolution width and height and/or time intervals of temporal_resolution length. Values from the columns named in group_element_columns are also concatenated into the final group ID of each sample.
- Parameters
points (GeoDataFrame) – GeoDataFrame containing point samples.
*group_element_columns – Names of additional columns to be concatenated into the final group IDs.
spatial_resolution (Union[int, float, None]) – Tile size (both x and y) for grouping, in sample CRS units.
temporal_resolution (Optional[timedelta]) – Interval size for grouping.
date_column (str) – Name of the column containing sample timestamps (as datetime objects).
- Return type
ndarray
- Returns
1D string array containing the group ID of each sample.
Examples
>>> import geopandas as gp
>>> import pygeos as pg
>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import cross_val_score, GroupKFold
>>>
>>> from eumap.misc import sample_groups
>>>
>>> # construct some synthetic point data
>>> coords = np.random.random((1000, 2)) * 4000
>>> dates = datetime.now() + np.array([*map(
...     timedelta,
...     range(1000),
... )])
>>>
>>> points = gp.GeoDataFrame({
...     'geometry': pg.points(coords),
...     'date': dates,
...     'group': np.random.choice(['a', 'b'], size=1000),
...     'predictor': np.random.random(1000),
...     'target': np.random.randint(2, size=1000),
... })
>>>
>>> # get the point groups
>>> groups = sample_groups(
...     points,
...     'group',
...     spatial_resolution=1000,
...     temporal_resolution=timedelta(days=365),
... )
>>>
>>> print(np.unique(groups))
>>>
>>> kfold = GroupKFold(n_splits=5)
>>>
>>> # cross validate a classifier
>>> print(cross_val_score(
...     estimator=LogisticRegression(),
...     X=points.predictor.values.reshape(-1, 1),
...     y=points.target,
...     scoring='f1',
...     groups=groups, # our groups go here
...     cv=kfold, # split with GroupKFold so the groups take effect
... ))
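The tiling and interval grouping described above can be illustrated with a minimal standard-library sketch. It is not the eumap implementation: the helper name `sketch_group_id`, the `EPOCH` reference date, and the underscore-joined ID format are all assumptions made for illustration, and the real function may build its IDs differently.

```python
from datetime import datetime, timedelta
from math import floor

# Hypothetical reference date for counting intervals (an assumption of this sketch).
EPOCH = datetime(1970, 1, 1)


def sketch_group_id(x, y, date, *extra_values,
                    spatial_resolution=None, temporal_resolution=None):
    """Build an illustrative group ID for one sample (not eumap's actual format)."""
    # Extra column values come first, converted to strings.
    parts = [str(v) for v in extra_values]
    if spatial_resolution is not None:
        # Index of the square tile containing the point, in CRS units.
        parts.append(str(floor(x / spatial_resolution)))
        parts.append(str(floor(y / spatial_resolution)))
    if temporal_resolution is not None:
        # Number of whole intervals between the reference date and the sample.
        parts.append(str((date - EPOCH) // temporal_resolution))
    return '_'.join(parts)


samples = [
    (120.0, 980.0, datetime(2020, 3, 1), 'a'),
    (870.5, 15.2, datetime(2020, 9, 15), 'a'),   # same tile and interval as the first
    (1200.0, 980.0, datetime(2020, 3, 1), 'a'),  # neighbouring tile
    (120.0, 980.0, datetime(2020, 3, 1), 'b'),   # different extra column value
]

ids = [
    sketch_group_id(x, y, date, label,
                    spatial_resolution=1000,
                    temporal_resolution=timedelta(days=365))
    for x, y, date, label in samples
]
print(ids)
```

Samples that share a tile, an interval, and the same extra column values receive the same ID, so a group-aware splitter such as GroupKFold keeps them together in either the training or the test fold.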