eumap.misc.sample_groups

sample_groups(points, *group_element_columns, spatial_resolution=None, temporal_resolution=None, date_column='date')

Construct group IDs for spatial and temporal cross-validation.

Groups point samples into square tiles of spatial_resolution width and height and/or into time intervals of temporal_resolution length. The values of any group_element_columns are also concatenated into the final group ID of each sample.
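The grouping described above can be sketched in plain Python. This is only an illustration, not eumap's implementation: the function name sketch_group_ids, the '_' separator, the 1970-01-01 time origin, and the order of the ID parts are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from math import floor


def sketch_group_ids(xs, ys, dates, extra=None, spatial_resolution=None,
                     temporal_resolution=None, origin=datetime(1970, 1, 1)):
    """Hypothetical sketch of tile/interval group IDs (not eumap's code)."""
    ids = []
    for i, (x, y, date) in enumerate(zip(xs, ys, dates)):
        parts = []
        if extra is not None:
            # value of an additional grouping column for this sample
            parts.append(str(extra[i]))
        if spatial_resolution is not None:
            # index of the square tile that contains the point
            parts.append(str(floor(x / spatial_resolution)))
            parts.append(str(floor(y / spatial_resolution)))
        if temporal_resolution is not None:
            # index of the time interval, counted from the assumed origin
            parts.append(str((date - origin) // temporal_resolution))
        ids.append('_'.join(parts))
    return ids


ids = sketch_group_ids(
    xs=[100.0, 900.0, 1500.0],
    ys=[200.0, 800.0, 200.0],
    dates=[datetime(2020, 1, 1), datetime(2020, 6, 1), datetime(2020, 1, 1)],
    extra=['a', 'a', 'a'],
    spatial_resolution=1000,
    temporal_resolution=timedelta(days=365),
)
print(ids)
```

In this sketch the first two samples fall into the same tile and the same interval, so they share a group ID, while the third lies in a neighbouring tile and gets a different one.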

Parameters
  • points (GeoDataFrame) – GeoDataFrame containing point samples.

  • *group_element_columns (str) – Names of additional columns to be concatenated into the final group IDs.

  • spatial_resolution (Union[int, float, None]) – Tile size (both x and y) for grouping, in sample CRS units.

  • temporal_resolution (Optional[timedelta]) – Interval size for grouping.

  • date_column (str) – Name of the column containing sample timestamps (as datetime objects).

Return type

ndarray

Returns

1D string array containing the group ID of each sample.

Examples

>>> import geopandas as gp
>>> import pygeos as pg
>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import cross_val_score, GroupKFold
>>>
>>> from eumap.misc import sample_groups
>>>
>>> # construct some synthetic point data
>>> coords = np.random.random((1000, 2)) * 4000
>>> dates = datetime.now() + np.array([*map(
...     timedelta,
...     range(1000),
... )])
>>>
>>> points = gp.GeoDataFrame({
...     'geometry': pg.points(coords),
...     'date': dates,
...     'group': np.random.choice(['a', 'b'], size=1000),
...     'predictor': np.random.random(1000),
...     'target': np.random.randint(2, size=1000),
... })
>>>
>>> # get the point groups
>>> groups = sample_groups(
...     points,
...     'group',
...     spatial_resolution=1000,
...     temporal_resolution=timedelta(days=365),
... )
>>>
>>> print(np.unique(groups))
>>>
>>> kfold = GroupKFold(n_splits=5)
>>>
>>> # cross validate a classifier
>>> print(cross_val_score(
...     estimator=LogisticRegression(),
...     X=points.predictor.values.reshape(-1, 1),
...     y=points.target,
...     scoring='f1',
...     cv=kfold,  # use the group-aware splitter defined above
...     groups=groups,  # our groups go here
... ))
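
To illustrate why the group IDs matter for cross-validation, the following sketch assigns whole groups to folds so that no group is split between training and test data. It is a simplified stand-in written for this example (greedy balancing by group size) and should not be read as scikit-learn's GroupKFold implementation; sketch_group_folds and the sample group IDs are hypothetical.

```python
from collections import Counter


def sketch_group_folds(groups, n_splits):
    """Assign every sample to a fold, keeping each group in a single fold."""
    counts = Counter(groups)
    fold_sizes = [0] * n_splits
    fold_of_group = {}
    # place the largest groups first, always into the currently smallest fold
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        smallest = fold_sizes.index(min(fold_sizes))
        fold_of_group[group] = smallest
        fold_sizes[smallest] += size
    return [fold_of_group[g] for g in groups]


# made-up group IDs, in the spirit of the ones returned by sample_groups
example_groups = ['a_0_0', 'a_0_0', 'b_1_0', 'b_1_0', 'a_2_1', 'b_0_3']
folds = sketch_group_folds(example_groups, n_splits=2)

# every group ends up in exactly one fold
for group in set(example_groups):
    assigned = {f for g, f in zip(example_groups, folds) if g == group}
    assert len(assigned) == 1
print(folds)
```

Because samples from the same tile and time interval always share a fold, a model is never tested on points that are spatial or temporal neighbours of its training data.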