Geometry and Series Blocks

Geometry-type blocks contain sets of geometries, optionally with 'start' and 'end' fields and other properties. Internally, geometry data is stored in GeoPandas GeoDataFrames.

API Specification

Module containing the base geometry block classes.

class dask_geomodeling.geometry.base.GeometryBlock(*args)

The base block for geometries

All geometry blocks must be derived from this base class and must implement the following attributes:

  • columns: a set of column names to expect in the dataframe

A geometry request contains the following fields:

  • mode: 'intersects' or 'extent'. Default 'intersects'.
  • geometry: limit returned objects to objects that intersect with this shapely geometry object
  • projection: projection to return the geometries in as WKT string
  • limit: the maximum number of geometries
  • min_size: geometries with a bbox that is smaller than this on all sides are left out
  • start: start date as UTC datetime
  • stop: stop date as UTC datetime
  • filters: dict of Django ORM-like filters on properties (e.g. id=598)

The data response contains the following:

  • if mode was 'intersects': a DataFrame of features with properties
  • if mode was 'extent': the bbox that contains all features
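Two of the rules above, the min_size filter and the 'extent' response, can be pictured with plain bounding boxes. The following is only a sketch, not the library's implementation: the (x1, y1, x2, y2) tuple layout, the function names, and returning None for an empty set are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed layout for this sketch: (x1, y1, x2, y2)
BBox = Tuple[float, float, float, float]


def filter_min_size(bboxes: List[BBox], min_size: float) -> List[BBox]:
    """Leave out boxes that are smaller than min_size on all sides."""
    kept = []
    for x1, y1, x2, y2 in bboxes:
        width, height = x2 - x1, y2 - y1
        # A box is dropped only if BOTH sides are below min_size.
        if width < min_size and height < min_size:
            continue
        kept.append((x1, y1, x2, y2))
    return kept


def total_extent(bboxes: List[BBox]) -> Optional[BBox]:
    """Return the bbox that contains all boxes (None if there are none)."""
    if not bboxes:
        return None
    xs1, ys1, xs2, ys2 = zip(*bboxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))


boxes = [(0.0, 0.0, 0.5, 0.5), (1.0, 1.0, 4.0, 1.2), (2.0, 2.0, 5.0, 6.0)]
kept = filter_min_size(boxes, min_size=1.0)
extent = total_extent(kept)
```

The second box is kept although it is only 0.2 units high, because it is not smaller than min_size on all sides.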

To be able to perform operations on properties, there is a helper type called SeriesBlock. This is the block equivalent of a pandas.Series. You can get a SeriesBlock from a GeometryBlock, perform operations on it, and set it back into a GeometryBlock.

class dask_geomodeling.geometry.base.SeriesBlock(*args)

A helper block for GeometryBlocks, representing one single field

class dask_geomodeling.geometry.base.GetSeriesBlock(source, name)

Get a column from a GeometryBlock.

Parameters:
  • source (GeometryBlock) – GeometryBlock
  • name (string) – name of the column to get
Returns:

SeriesBlock containing the property column

class dask_geomodeling.geometry.base.SetSeriesBlock(source, column, value, *args)

Set one or multiple columns (SeriesBlocks) in a GeometryBlock.

Parameters:
  • source (GeometryBlock) – source to add the extra columns to
  • column (string) – name of the column to be set
  • value (SeriesBlock, scalar) – series or constant value to set
  • args – string, SeriesBlock, …, repeated multiple times
Returns:

the source GeometryBlock with additional property columns
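The alternating column and value arguments can be thought of as a flat sequence of pairs. The sketch below shows one way such a sequence could be grouped and checked; pair_columns is a hypothetical helper written for this illustration and is not part of dask_geomodeling.

```python
from typing import Any, Dict


def pair_columns(column: str, value: Any, *args: Any) -> Dict[str, Any]:
    """Group (column, value, column, value, ...) into a name -> value mapping."""
    flat = (column, value) + args
    if len(flat) % 2 != 0:
        raise TypeError("column names and values must come in pairs")
    names, values = flat[0::2], flat[1::2]
    for name in names:
        if not isinstance(name, str):
            raise TypeError("column names must be strings")
    return dict(zip(names, values))


# Plain values stand in for SeriesBlocks or constants here.
pairs = pair_columns("column_1", 1.5, "column_2", "constant")
```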

Example:
>>> SetSeriesBlock(view, 'column_1', series_1, 'column_2', series_2)

dask_geomodeling.geometry.aggregate

Module containing geometry blocks that aggregate rasters.

class dask_geomodeling.geometry.aggregate.AggregateRaster(source, raster, statistic='sum', projection=None, pixel_size=None, max_pixels=None, column_name='agg', auto_pixel_size=False, *args)

Compute zonal statistics and add them to the geometry properties

Parameters:
  • source (GeometryBlock) – the source of geometry data
  • raster (RasterBlock) – the source of raster data
  • statistic (string) – the type of statistic to perform. Can be 'sum', 'count', 'min', 'max', 'mean', 'median', 'p<percentile>'.
  • projection (string or None) – the projection to perform the aggregation in
  • pixel_size (float or None) – the pixel size to perform aggregation in
  • max_pixels (int or None) – the maximum number of pixels to use for aggregation. Defaults to the geomodeling.raster-limit setting.
  • column_name (string) – the name of the column to output the results
  • auto_pixel_size (boolean) – determines whether the pixel_size is adjusted when a raster is too large. Default False.
Returns:

GeometryBlock with aggregation results in column_name

The currently implemented statistics are sum, count, min, max, mean, median, and percentile. If projection or pixel_size are not given, these are taken from the provided RasterBlock.

The count statistic calculates the number of active cells in the raster. A percentile statistic can be selected using a text value starting with ‘p’ followed by something that can be parsed as a float value, for example 'p33.3'.
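As an illustration of the 'p<percentile>' notation, the sketch below splits a statistic string into a name and an optional percentile. parse_statistic is a hypothetical helper, and the 0 to 100 range check is an assumption of this sketch; the library's own parsing may differ.

```python
from typing import Optional, Tuple

SIMPLE_STATISTICS = {"sum", "count", "min", "max", "mean", "median"}


def parse_statistic(statistic: str) -> Tuple[str, Optional[float]]:
    """Return (name, percentile); percentile is None for simple statistics."""
    if statistic in SIMPLE_STATISTICS:
        return statistic, None
    if statistic.startswith("p"):
        try:
            percentile = float(statistic[1:])
        except ValueError:
            raise ValueError(f"cannot parse a percentile from {statistic!r}") from None
        # Assumed for this sketch: percentiles lie between 0 and 100.
        if not 0 <= percentile <= 100:
            raise ValueError("percentile must be between 0 and 100")
        return "percentile", percentile
    raise ValueError(f"unknown statistic {statistic!r}")


name, percentile = parse_statistic("p33.3")
```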

Only geometries that intersect the requested bbox are aggregated. Aggregation is done in a specified projection and with a specified pixel size.

Should the combination of the requested pixel_size and the extent of the source geometry cause the requested raster size to exceed max_pixels, the pixel_size is adjusted automatically if auto_pixel_size = True, else a RuntimeError is raised.
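The decision described above can be sketched as follows. The source does not state how the pixel size is adjusted, so doubling it until the raster fits is purely an assumption of this sketch, as are the function names.

```python
import math


def pixel_count(width: float, height: float, pixel_size: float) -> int:
    """Number of pixels needed to cover a width x height extent."""
    return math.ceil(width / pixel_size) * math.ceil(height / pixel_size)


def resolve_pixel_size(width, height, pixel_size, max_pixels, auto_pixel_size=False):
    """Return a pixel size for which the raster stays within max_pixels."""
    if max_pixels < 1:
        raise ValueError("max_pixels must be at least 1")
    if pixel_count(width, height, pixel_size) <= max_pixels:
        return pixel_size
    if not auto_pixel_size:
        raise RuntimeError("the requested raster size exceeds max_pixels")
    # Assumed adjustment rule: coarsen by a factor of 2 until the raster fits.
    adjusted = pixel_size
    while pixel_count(width, height, adjusted) > max_pixels:
        adjusted *= 2
    return adjusted


# A 1000 x 1000 extent at pixel size 1 needs 1,000,000 pixels.
try:
    resolve_pixel_size(1000.0, 1000.0, 1.0, max_pixels=250_000)
    error_raised = False
except RuntimeError:
    error_raised = True

adjusted = resolve_pixel_size(
    1000.0, 1000.0, 1.0, max_pixels=250_000, auto_pixel_size=True
)
```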

The global raster-limit setting can be adapted as follows:
>>> from dask import config
>>> config.set({"geomodeling.raster-limit": 10 ** 9})

class dask_geomodeling.geometry.aggregate.AggregateRasterAboveThreshold(source, raster, statistic='sum', projection=None, pixel_size=None, max_pixels=None, column_name='agg', auto_pixel_size=False, threshold_name=None)

Aggregate raster values ignoring values below some threshold. The thresholds are supplied per geometry.

Parameters:
  • source (GeometryBlock) – the source of geometry data
  • raster (RasterBlock) – the source of raster data
  • statistic (string) – the type of statistic to perform. Can be 'sum', 'count', 'min', 'max', 'mean', 'median', 'p<percentile>'.
  • projection (string) – the projection to perform the aggregation in
  • pixel_size (float) – the pixel size to perform aggregation in
  • max_pixels (int) – the maximum number of pixels to use for aggregation
  • column_name (string) – the name of the column to output the results
  • auto_pixel_size (boolean) – determines whether the pixel_size is adjusted when a raster is too large. Default False.
  • threshold_name (string) – the name of the column with the thresholds
Returns:

GeometryBlock with aggregation results in column_name

See also:
dask_geomodeling.geometry.aggregate.AggregateRaster

dask_geomodeling.geometry.constructive

Module containing geometry block constructive operations

class dask_geomodeling.geometry.constructive.Buffer(source, distance, projection, resolution=16)

Buffer geometries.

Parameters:
  • source (GeometryBlock) – the geometry source
  • distance (float) – a distance measure in the given projection.
  • projection (string) – an EPSG or WKT string, e.g. EPSG:28992.
  • resolution (int) – quarter circle segments. Default is 16.
distance

Buffer distance.

The unit (e.g. m, °) is determined by the projection.

projection

Projection used for buffering.

resolution

Buffer resolution.
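To get a feel for the resolution parameter, the sketch below estimates the area of a buffered point. It assumes that a point buffer with k quarter circle segments is a regular polygon with 4 * k vertices on the circle; that is a simplification made for this illustration and not a statement about the library's internals.

```python
import math


def buffered_point_area(distance: float, resolution: int = 16) -> float:
    """Area of a regular polygon with 4 * resolution vertices on a circle."""
    n_vertices = 4 * resolution
    # Area of a regular n-gon inscribed in a circle of radius r:
    # 0.5 * n * r**2 * sin(2 * pi / n)
    return 0.5 * n_vertices * distance ** 2 * math.sin(2 * math.pi / n_vertices)


coarse = buffered_point_area(1.0, resolution=1)   # a square, area 2
fine = buffered_point_area(1.0, resolution=16)    # close to pi
relative_error = (math.pi - fine) / math.pi
```

Under this assumption the default resolution of 16 underestimates the area of a true circle by less than 0.2 percent.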

class dask_geomodeling.geometry.constructive.Simplify(source, tolerance=None, preserve_topology=True)

Simplify geometries up to given tolerance.

Parameters:
  • source (GeometryBlock) – the geometry source
  • tolerance (float) – the simplification tolerance. If no tolerance is given, the min_size request param is used.
  • preserve_topology (boolean) – whether to preserve topology. Default True.

dask_geomodeling.geometry.field_operations

Module containing geometry block operations that act on non-geometry fields

class dask_geomodeling.geometry.field_operations.Classify(source, bins, labels, right=True)

Classify a continuous-valued property into binned categories

Parameters:
  • source (SeriesBlock) – source data to classify
  • bins (list) – a 1-dimensional and monotonic list of bins. How values outside of the bins are classified, depends on the length of the labels. If len(labels) = len(bins) - 1, then values outside of the bins are classified to NaN. If len(labels) = len(bins) + 1, then values outside of the bins are classified to the first and last elements of the labels list.
  • labels (list) – the labels for the returned bins
  • right (boolean) – whether the intervals include the right or the left bin edge
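The relation between bins, labels and right can be made concrete with a small pure-Python sketch. It mirrors the rule described above for a single value; classify is a hypothetical function, bins are assumed to be increasing, and None stands in for NaN.

```python
import bisect
from typing import Any, Optional, Sequence


def classify(value: float, bins: Sequence[float], labels: Sequence[Any],
             right: bool = True) -> Optional[Any]:
    """Classify one value; None stands in for NaN."""
    n_bins, n_labels = len(bins), len(labels)
    if n_labels not in (n_bins - 1, n_bins + 1):
        raise ValueError("len(labels) must be len(bins) - 1 or len(bins) + 1")
    if right:
        # Intervals are (a, b]: a value equal to an edge belongs to the left bin.
        position = bisect.bisect_left(bins, value)
    else:
        # Intervals are [a, b): a value equal to an edge belongs to the right bin.
        position = bisect.bisect_right(bins, value)
    if n_labels == n_bins + 1:
        # First and last labels catch values outside of the bins.
        return labels[position]
    if 1 <= position <= n_bins - 1:
        return labels[position - 1]
    return None


bins = [0, 10, 20]
inner = [classify(v, bins, ["low", "high"]) for v in (0, 10, 15, 25)]
outer = [classify(v, bins, ["below", "low", "high", "above"]) for v in (0, 10, 15, 25)]
left_closed = classify(10, bins, ["low", "high"], right=False)
```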
See also:
https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.cut.html

class dask_geomodeling.geometry.field_operations.ClassifyFromColumns(source, value_column, bin_columns, labels, right=True)

Classify a continuous-valued property based on bins located in different columns.

Parameters:
  • source (GeometryBlock) – geometry source to classify
  • value_column (string) – the column name that contains values to classify
  • bin_columns (list) – column names in which the bins are stored. The bins values need to be sorted in increasing order.
  • labels (list) – specifies the labels for the returned bins
  • right (boolean) – whether the intervals include the right or the left bin edge. Default True.
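The difference with Classify is that every feature carries its own bin edges. The sketch below classifies rows represented as plain dicts. It assumes that the label-length rule is the same as for Classify, which the text above does not state explicitly; the column names and classify_row itself are made up for the example.

```python
import bisect


def classify_row(row, value_column, bin_columns, labels, right=True):
    """Classify row[value_column] using the bin edges stored in the same row."""
    bins = [row[name] for name in bin_columns]
    if any(a > b for a, b in zip(bins, bins[1:])):
        raise ValueError("bin values need to be sorted in increasing order")
    value = row[value_column]
    if right:
        position = bisect.bisect_left(bins, value)
    else:
        position = bisect.bisect_right(bins, value)
    if len(labels) == len(bins) + 1:
        return labels[position]
    if len(labels) == len(bins) - 1:
        inside = 1 <= position <= len(bins) - 1
        return labels[position - 1] if inside else None
    raise ValueError("unexpected number of labels")


# Hypothetical features: each has its own 'warn' and 'alarm' thresholds.
rows = [
    {"level": 1.2, "warn": 1.0, "alarm": 2.0},
    {"level": 1.2, "warn": 0.5, "alarm": 1.0},
    {"level": 0.3, "warn": 0.5, "alarm": 1.0},
]
classes = [
    classify_row(row, "level", ["warn", "alarm"], ["ok", "warning", "alarm"])
    for row in rows
]
```

The same level of 1.2 ends up in different classes for the first two rows because their thresholds differ.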
See also:
dask_geomodeling.geometry.field_operations.Classify

class dask_geomodeling.geometry.field_operations.Add(source, other)

Addition of series and other, element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.add.html

class dask_geomodeling.geometry.field_operations.Subtract(source, other)

Subtraction of series and other, element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.subtract.html

class dask_geomodeling.geometry.field_operations.Multiply(source, other)

Multiplication of series and other, element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.multiply.html

class dask_geomodeling.geometry.field_operations.Divide(source, other)

Floating division of series and other, element-wise.

The source series is always the dividend; it cannot be used as the divisor. To divide by source, use Power (with a negative exponent) instead.
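The arithmetic behind this workaround is that c / x equals c * x ** -1. The sketch below shows it on plain Python lists, which stand in for series; it does not call the blocks themselves.

```python
values = [2.0, 4.0, 8.0]   # stands in for the source series
constant = 10.0

# What Divide computes: source / other
divided = [v / constant for v in values]

# The other way around, constant / source, via a power of -1
# followed by a multiplication.
reciprocal = [v ** -1 for v in values]
swapped = [constant * r for r in reciprocal]
```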

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.divide.html

class dask_geomodeling.geometry.field_operations.FloorDivide(source, other)

Integer (floor) division of series and other, element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.floordiv.html

class dask_geomodeling.geometry.field_operations.Power(source, other)

Power (exponent) of series and other, element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pow.html

class dask_geomodeling.geometry.field_operations.Modulo(source, other)

Modulo of series and other, element-wise.
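Floor division and modulo are easy to confuse for negative values. The sketch below uses Python's own // and % operators on plain lists; that the FloorDivide and Modulo blocks round towards negative infinity in the same way is an assumption here, not something the text above states.

```python
values = [7, -7, 7.5]   # stands in for the source series
divisor = 2

quotients = [v // divisor for v in values]    # floor division
remainders = [v % divisor for v in values]    # modulo

# The two results are consistent with each other:
# quotient * divisor + remainder gives back the original value.
reconstructed = [q * divisor + r for q, r in zip(quotients, remainders)]
```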

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mod.html

class dask_geomodeling.geometry.field_operations.Equal(source, other)

Equality comparison (series == other), element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.eq.html

class dask_geomodeling.geometry.field_operations.NotEqual(source, other)

Inequality comparison (series != other), element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ne.html

class dask_geomodeling.geometry.field_operations.Greater(source, other)

Greater-than comparison (series > other), element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.gt.html

class dask_geomodeling.geometry.field_operations.GreaterEqual(source, other)

Greater-than-or-equal comparison (series >= other), element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ge.html

class dask_geomodeling.geometry.field_operations.Less(source, other)

Less-than comparison (series < other), element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.lt.html

class dask_geomodeling.geometry.field_operations.LessEqual(source, other)

Less-than-or-equal comparison (series <= other), element-wise.

See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.le.html

class dask_geomodeling.geometry.field_operations.And(source, other)

Logical AND between series and other.

class dask_geomodeling.geometry.field_operations.Or(source, other)

Logical OR between series and other.

class dask_geomodeling.geometry.field_operations.Xor(source, other)

Logical XOR between series and other.

class dask_geomodeling.geometry.field_operations.Invert(source, *args)

Logical NOT operation on a series.
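For reference, the element-wise truth tables of the four logical blocks can be written out on plain lists of booleans. This is only an illustration of the logic, with lists standing in for boolean series.

```python
a = [True, True, False, False]
b = [True, False, True, False]

logical_and = [x and y for x, y in zip(a, b)]
logical_or = [x or y for x, y in zip(a, b)]
logical_xor = [x != y for x, y in zip(a, b)]   # True when exactly one is True
logical_not = [not x for x in a]
```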

class dask_geomodeling.geometry.field_operations.Where(source, cond, other)

Replace values where the condition is False.

Parameters:
  • source (SeriesBlock) – source data
  • cond (SeriesBlock) – condition that determines whether to keep values from source
  • other (SeriesBlock, scalar) – entries where cond is False are replaced with the corresponding value from other.
See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html

class dask_geomodeling.geometry.field_operations.Mask(source, cond, other)

Replace values where the condition is True.

Parameters:
  • source (SeriesBlock) – source data
  • cond (SeriesBlock) – condition that determines whether to mask values from source
  • other (SeriesBlock, scalar) – entries where cond is True are replaced with the corresponding value from other.
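Where and Mask are each other's complement: Where keeps values where cond is True, Mask keeps them where cond is False. A minimal sketch on plain lists, with hypothetical where and mask functions standing in for the blocks:

```python
def _broadcast(other, length):
    """Repeat a scalar so that it lines up with the source values."""
    return list(other) if isinstance(other, (list, tuple)) else [other] * length


def where(source, cond, other):
    """Keep source where cond is True, take other elsewhere."""
    others = _broadcast(other, len(source))
    return [s if c else o for s, c, o in zip(source, cond, others)]


def mask(source, cond, other):
    """Take other where cond is True, keep source elsewhere."""
    others = _broadcast(other, len(source))
    return [o if c else s for s, c, o in zip(source, cond, others)]


source = [1, 2, 3, 4]
cond = [True, False, True, False]

kept = where(source, cond, 0)
masked = mask(source, cond, 0)
from_series = where(source, cond, [10, 20, 30, 40])
```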
See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mask.html

class dask_geomodeling.geometry.field_operations.Round(source, decimals=0)

Round each value in a SeriesBlock to the given number of decimals

Parameters:
  • source (SeriesBlock) – source data
  • decimals (int) – number of decimal places to round to (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point.
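The effect of positive, zero and negative decimals can be tried out with Python's built-in round, used here as a stand-in for the block. Whether the block treats exact halves the same way as the built-in is not stated above, so the example avoids such values.

```python
value = 1234.5678

two_decimals = round(value, 2)    # two places to the right of the point
no_decimals = round(value, 0)     # nearest whole number, still a float
hundreds = round(value, -2)       # two positions to the left of the point
```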
See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.round.html

dask_geomodeling.geometry.geom_operations

Module containing operations that return series from geometry fields

class dask_geomodeling.geometry.geom_operations.Area(source, projection)

Block that calculates the area of geometries.

Parameters:
  • source (GeometryBlock) – geometry data
  • projection (string) – projection as EPSG or WKT string to compute area in
Returns:

SeriesBlock with only the computed area

dask_geomodeling.geometry.merge

Module containing merge operations that act on geometry blocks

class dask_geomodeling.geometry.merge.MergeGeometryBlocks(left, right, how='inner', suffixes=('', '_right'))

Merge two GeometryBlocks into one by index

Parameters:
  • left (GeometryBlock) – left geometry data to merge
  • right (GeometryBlock) – right geometry data to merge
  • how (string) – type of merge to be performed. One of ‘left’, ‘right’, ‘outer’, ‘inner’. Default ‘inner’.
  • suffixes (tuple) – suffix to apply to overlapping column names in the left and right side, respectively. Default ('', '_right').
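How the how and suffixes parameters interact can be sketched with plain dicts keyed by index. This is a simplified illustration and not the library's implementation: merge_by_index is hypothetical, geometry columns are ignored, and None stands in for missing values.

```python
from typing import Any, Dict, List

Table = Dict[Any, Dict[str, Any]]  # index -> {column: value}


def column_names(table: Table) -> List[str]:
    """Collect column names in first-seen order."""
    names: List[str] = []
    for row in table.values():
        for name in row:
            if name not in names:
                names.append(name)
    return names


def merge_by_index(left: Table, right: Table, how: str = "inner",
                   suffixes=("", "_right")) -> Table:
    if how == "inner":
        index = [key for key in left if key in right]
    elif how == "left":
        index = list(left)
    elif how == "right":
        index = list(right)
    elif how == "outer":
        index = list(left) + [key for key in right if key not in left]
    else:
        raise ValueError(f"unknown merge type {how!r}")

    left_columns, right_columns = column_names(left), column_names(right)
    overlap = set(left_columns) & set(right_columns)

    merged: Table = {}
    for key in index:
        row = {}
        for name in left_columns:
            target = name + suffixes[0] if name in overlap else name
            row[target] = left.get(key, {}).get(name)
        for name in right_columns:
            # Only overlapping names receive a suffix.
            target = name + suffixes[1] if name in overlap else name
            row[target] = right.get(key, {}).get(name)
        merged[key] = row
    return merged


left = {1: {"name": "a", "value": 10}, 2: {"name": "b", "value": 20}}
right = {2: {"value": 200}, 3: {"value": 300}}

inner = merge_by_index(left, right)
outer = merge_by_index(left, right, how="outer")
```

With the default suffixes, the overlapping value column keeps its name on the left side and becomes value_right on the right side.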
See also:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

dask_geomodeling.geometry.parallelize

Module containing blocks that parallelize non-geometry fields

class dask_geomodeling.geometry.parallelize.GeometryTiler(source, size, projection)

Parallelize operations on a GeometryBlock by tiling the request

Parameters:
  • source (GeometryBlock) – GeometryBlock
  • size (float) – the max size of a tile in units of the projection
  • projection (string) – the projection as EPSG or WKT string in which to compute tiles

Only supports ‘centroid’ and ‘extent’ request modes.
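The idea of tiling a request can be sketched as splitting a bounding box into tiles that are at most size wide and high. The equal-sized grid below is an assumption of this sketch; the library may lay out its tiles differently.

```python
import math
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def tile_bbox(bbox: BBox, size: float) -> List[BBox]:
    """Split bbox into equal tiles, none larger than size on either side."""
    x1, y1, x2, y2 = bbox
    # Number of tiles per axis; at least one, also for a zero-size bbox.
    nx = max(1, math.ceil((x2 - x1) / size))
    ny = max(1, math.ceil((y2 - y1) / size))
    dx, dy = (x2 - x1) / nx, (y2 - y1) / ny
    tiles = []
    for i in range(nx):
        for j in range(ny):
            tiles.append((x1 + i * dx, y1 + j * dy,
                          x1 + (i + 1) * dx, y1 + (j + 1) * dy))
    return tiles


tiles = tile_bbox((0.0, 0.0, 25.0, 10.0), size=10.0)
widths = [t[2] - t[0] for t in tiles]
```

Each tile could then be requested separately, which is what makes the work parallelizable.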

dask_geomodeling.geometry.set_operations

Module containing geometry block set operations

class dask_geomodeling.geometry.set_operations.Difference(source, other)

Block that calculates the difference of two GeometryBlocks.

The resulting GeometryBlock will have all geometries in source, and if there are geometries with the same ID in other, the geometries will be adapted using the Difference operation.
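The matching by ID can be illustrated with Python sets standing in for geometries, where each set holds the grid cells a geometry covers. This stand-in is an assumption made only to keep the sketch free of geometry libraries.

```python
# Each 'geometry' is a set of (column, row) grid cells, keyed by feature ID.
source = {
    1: {(0, 0), (0, 1), (1, 0)},
    2: {(5, 5), (5, 6)},
}
other = {
    1: {(0, 1)},      # same ID as in source: subtracted from it
    3: {(9, 9)},      # ID not in source: ignored
}

difference = {
    feature_id: geometry - other[feature_id] if feature_id in other else geometry
    for feature_id, geometry in source.items()
}
```

All IDs of source survive; only the geometry with ID 1 is changed.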

class dask_geomodeling.geometry.set_operations.Intersection(source, other=None)

Block that intersects geometries with the requested geometry.

Parameters:
  • source (GeometryBlock) – the source of geometry data

dask_geomodeling.geometry.sources

Module containing geometry sources.

class dask_geomodeling.geometry.sources.GeometryFileSource(url, layer=None, id_field='id')

A geometry source that opens a geometry file from disk.

Parameters:
  • url – URL to the file. File paths have to be contained inside the current root setting. Relative paths are interpreted relative to this setting (but internally stored as absolute paths).
  • layer (string) – the layer_name in the json to use as source. If None, the first layer is used.
  • id_field (string) – the field name to use as unique ID. Default 'id'.
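The rule for the url parameter (relative paths resolved against the root, and nothing allowed outside of it) could be checked along the following lines. This sketch assumes plain POSIX-style paths without a URL scheme; resolve_in_root is a hypothetical helper and the error type is chosen for illustration only.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_in_root(url: str, root: str) -> str:
    """Return the absolute path for url, refusing paths outside of root."""
    root_path = PurePosixPath(posixpath.normpath(root))
    path = PurePosixPath(url)
    if not path.is_absolute():
        # Relative paths are interpreted relative to the root.
        path = root_path / path
    # Collapse '.' and '..' segments before checking containment.
    resolved = PurePosixPath(posixpath.normpath(str(path)))
    if resolved != root_path and root_path not in resolved.parents:
        raise ValueError(f"{url!r} is outside the root {root!r}")
    return str(resolved)


inside = resolve_in_root("shapes/parcels.geojson", "/my/data/path")

try:
    resolve_in_root("../secret.geojson", "/my/data/path")
    escaped = True
except ValueError:
    escaped = False
```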

The input of these blocks is by default limited to 10000 geometries.

Relevant settings can be adapted as follows:
>>> from dask import config
>>> config.set({"geomodeling.root": '/my/data/path'})
>>> config.set({"geomodeling.geometry-limit": 100000})

dask_geomodeling.geometry.text

Module containing text column operations that act on geometry blocks

class dask_geomodeling.geometry.text.ParseTextColumn(source, source_column, key_mapping)

Parses a text column into (possibly multiple) value columns.

Key, value pairs need to be separated by an equal (=) sign.

Parameters:
  • source (GeometryBlock) – data source
  • source_column (string) – existing column in source.
  • key_mapping (dict) – mapping of {key_name: column_name} pairs, where key_name is an existing key in the text to be parsed and column_name is the name of the new column that will contain the parsed value.
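A rough sketch of what such parsing could look like for a single text value is given below. The text above only specifies the equal sign between key and value; that pairs are separated by whitespace and that values stay strings are assumptions of this sketch, and parse_text is a hypothetical function.

```python
from typing import Dict, Optional


def parse_text(text: str, key_mapping: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Extract the values of the mapped keys from 'key=value' pairs."""
    # Every output column exists, even if its key is absent from the text.
    parsed: Dict[str, Optional[str]] = {column: None for column in key_mapping.values()}
    for token in text.split():          # assumed: whitespace-separated pairs
        key, separator, value = token.partition("=")
        if separator and key in key_mapping:
            parsed[key_mapping[key]] = value
    return parsed


key_mapping = {"height": "building_height", "material": "roof_material"}
row = parse_text("height=3.5 material=steel remark", key_mapping)
missing = parse_text("height=2.0", key_mapping)
```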