Blocks
The Block class
To write a new Block subclass, we need to write the following:
- the __init__ that validates the arguments when constructing the block
- the get_sources_and_requests that processes the request
- the process that processes the data
- a number of attributes such as extent and period
About the 2-step data processing
The get_sources_and_requests method of any block is called recursively from get_compute_graph and feeds the request from the block to its sources. It does so by returning a list of (source, request) tuples. During data evaluation, each of these 2-tuples is converted to a single data object, which is supplied to the process function.
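The two steps can be sketched without the library. The classes below (ToyBlock, Constant, AddNumber) are hypothetical stand-ins written for this illustration, not the dask_geomodeling API, and they evaluate eagerly instead of building a compute graph. The sketch only shows how each (source, request) tuple turns into one argument of process.

```python
class ToyBlock:
    """Hypothetical stand-in for a Block; not the dask_geomodeling class."""

    def __init__(self, *args):
        self.args = args

    def get_sources_and_requests(self, **request):
        # Step 1: forward the unchanged request to every argument.
        return [(arg, request) for arg in self.args]

    @staticmethod
    def process(*data):
        # Step 2: combine the evaluated sources; subclasses override this.
        raise NotImplementedError

    def get_data(self, **request):
        data = []
        for source, source_request in self.get_sources_and_requests(**request):
            if isinstance(source, ToyBlock):
                # A block source is evaluated with the request meant for it.
                data.append(source.get_data(**source_request))
            else:
                # For non-block sources (e.g. a number) the request is ignored.
                data.append(source)
        # Each (source, request) tuple has become exactly one argument.
        return self.process(*data)


class Constant(ToyBlock):
    """Leaf block: produces a list of `width` copies of one value."""

    def get_sources_and_requests(self, **request):
        # A leaf hands the request itself and its stored value to process.
        return [(request, None), (self.args[0], None)]

    @staticmethod
    def process(request, value):
        return [value] * request["width"]


class AddNumber(ToyBlock):
    """Adds a number to every element produced by its source block."""

    @staticmethod
    def process(values, number):
        return [v + number for v in values]


result = AddNumber(Constant(1.0), 2.5).get_data(width=3)
print(result)
```

Here AddNumber receives two arguments in process because its get_sources_and_requests returned two tuples: one for the Constant block and one for the plain number.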
First, an example in words. We construct a View
add = RasterFileSource('path/to/geotiff') + 2.4
and ask it the following:
- give me a 256x256 raster at location (138000, 480000)
We do that by calling get_data, which calls get_compute_graph, which calls get_sources_and_requests on each block instance recursively.
First, add.get_sources_and_requests would respond with the following:
- I will need a 256x256 raster at location (138000, 480000) from RasterFileSource('path/to/geotiff')
- I will need 2.4
Then, on recursion, RasterFileSource.get_sources_and_requests would respond:
- I will give you the 256x256 raster at location (138000, 480000)
These small subtasks are collected in a compute graph, which is returned by get_compute_graph. Then get_data feeds that compute graph to dask.
Dask evaluates this graph by calling the process method of each block:
- A raster is loaded using RasterFileSource.process
- This, together with the number 2.4, is given to Add.process
- The resulting raster is presented to the user.
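The walk-through above can be mimicked with a hand-written graph. This is a minimal sketch under stated assumptions: the key names, the "location" request field, load_raster, add and the tiny evaluate function are all made up for illustration. evaluate stands in for dask's scheduler and load_raster returns a constant raster instead of reading a file.

```python
def load_raster(path, request):
    # Stand-in for RasterFileSource.process: no file is read, a constant
    # raster of the requested size is returned as nested lists.
    return [[1.0] * request["width"] for _ in range(request["height"])]


def add(raster, number):
    # Stand-in for Add.process: add the number to every cell.
    return [[cell + number for cell in row] for row in raster]


def evaluate(graph, key):
    # Minimal evaluator for a dask-style graph: each value is a tuple
    # (function, *args), and an argument that is itself a key of the graph
    # is evaluated first.
    func, *args = graph[key]
    resolved = [
        evaluate(graph, arg) if isinstance(arg, str) and arg in graph else arg
        for arg in args
    ]
    return func(*resolved)


# Hypothetical request; the field names are chosen for this sketch only.
request = {"width": 256, "height": 256, "location": (138000, 480000)}

# Keys follow the name_token pattern; the tokens here are invented.
graph = {
    "RasterFileSource_1a2b": (load_raster, "path/to/geotiff", request),
    "Add_3c4d": (add, "RasterFileSource_1a2b", 2.4),
}

result = evaluate(graph, "Add_3c4d")
print(len(result), len(result[0]))
```

The entry for the Add step refers to the raster step by its key, so the raster is loaded first and its result is passed on together with the number 2.4.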
Implementation example
As an example, we use a simplified Dilate block, which adds a buffer of 1 pixel around all pixels of a given value:
from scipy import ndimage
# the two import locations below are assumed; adjust them to your installed version
from dask_geomodeling.raster import RasterBlock
from dask_geomodeling.raster.spatial import expand_request_pixels


class Dilate(RasterBlock):
    def __init__(self, source, value):
        # validate the arguments before handing them to the base class
        assert isinstance(source, RasterBlock)
        value = float(value)
        super().__init__(source, value)

    @property
    def source(self):
        return self.args[0]

    @property
    def value(self):
        return self.args[1]

    def get_sources_and_requests(self, **request):
        # ask the source for 1 extra pixel on every side
        new_request = expand_request_pixels(request, radius=1)
        return [(self.source, new_request), (self.value, None)]

    @staticmethod
    def process(data, value=None):
        # handle empty data cases
        if data is None or value is None or 'values' not in data:
            return data
        # perform the dilation
        original = data['values']
        dilated = original.copy()
        dilated[ndimage.binary_dilation(original == value)] = value
        # trim the extra pixel that was requested on every side
        dilated = dilated[:, 1:-1, 1:-1]
        return {'values': dilated, 'no_data_value': data['no_data_value']}

    @property
    def extent(self):
        return self.source.extent

    @property
    def period(self):
        return self.source.period
In this example, we see all the essentials of a Block implementation.
- The __init__ checks the types of the provided arguments and calls super().__init__, which further initializes the block.
- The get_sources_and_requests expands the request by 1 pixel, so that the dilation has no edge effects. It returns two (source, request) tuples.
- The process (static) method takes as many arguments as there are items in the list that get_sources_and_requests produces. It does the actual work and returns a data response.
- Some attributes, like extent and period, need manual specification, as they might change through the block.
- The class derives from RasterBlock, which sets the type of block, and through that its request/response schema and its required attributes.
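Why the request is expanded by 1 pixel can be seen in a one-dimensional toy version. The dilate helper below is written for this illustration in plain Python (it is not the scipy function used in the block), and the row of numbers is invented. The sketch compares dilating only the requested window with dilating an expanded window and trimming the margin afterwards.

```python
def dilate(row, value):
    """Set every cell next to a `value` cell to `value` (1-D, radius 1)."""
    out = list(row)
    for i, cell in enumerate(row):
        if cell == value:
            for j in (i - 1, i + 1):
                if 0 <= j < len(row):
                    out[j] = value
    return out


full_row = [0, 0, 0, 0, 0, 7, 0, 0]
start, stop = 2, 5  # the requested window covers indices 2, 3 and 4

# Reference: dilate the whole row, then cut out the window.
expected = dilate(full_row, 7)[start:stop]

# Naive: dilate only the requested cells; the 7 at index 5 is never seen.
naive = dilate(full_row[start:stop], 7)

# Expanded: request one extra cell on each side, dilate, trim the margin.
expanded = dilate(full_row[start - 1:stop + 1], 7)[1:-1]

print(expected, naive, expanded)
```

The naive result misses the buffer that spills in from the neighbouring cell just outside the window, while the expanded-and-trimmed result matches the reference.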
Block types specification
A block type sets three things:
- the response schema: e.g. “RasterBlock.process returns a dictionary with a numpy array and a no data value”
- the request schema: e.g. “RasterBlock.get_sources_and_requests expects a dictionary with the fields ‘mode’, ‘bbox’, ‘projection’, ‘height’, ‘width’”
- the attributes to be implemented on each block
This is not enforced at the code level; it is up to the developer to stick to this specification. The specification is written down in the type base class, RasterBlock or GeometryBlock.
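Because the schema is not enforced, a developer may want a small check while writing a block. The helpers below are hypothetical and not part of dask_geomodeling. The field names come from the request schema quoted above and the response keys from the Dilate example; the concrete request values ("vals", the bbox, the projection code) are illustrative assumptions.

```python
# Request fields named in the RasterBlock request schema above.
REQUIRED_RASTER_REQUEST_FIELDS = ("mode", "bbox", "projection", "height", "width")


def missing_request_fields(request):
    """Return the schema fields that are absent from a request dict."""
    return [f for f in REQUIRED_RASTER_REQUEST_FIELDS if f not in request]


def is_raster_response(response):
    """Check for the two keys used by the Dilate example's response."""
    return (
        isinstance(response, dict)
        and "values" in response
        and "no_data_value" in response
    )


# Illustrative request; every value here is an assumption for the sketch.
request = {
    "mode": "vals",
    "bbox": (138000, 480000, 138256, 480256),
    "projection": "EPSG:28992",
    "height": 256,
    "width": 256,
}

print(missing_request_fields(request))
print(missing_request_fields({"mode": "vals"}))
print(is_raster_response({"values": [[0]], "no_data_value": 255}))
```

A check like this could be called at the top of get_sources_and_requests, so that a malformed request fails before any computation starts.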
API specification
Module containing the core graphs.
class dask_geomodeling.core.graphs.Block(*args)
    A class that generates dask-like compute graphs for given requests.
    Arguments (args) are always stored in self.args. If a request is passed into the Block using the get_data or (the lazy version) get_compute_graph method, the Block figures out what args are actually necessary to evaluate the request, and what requests need to be sent to those args. This happens in the method get_sources_and_requests.
    After the requests have been evaluated, the data comes back and is passed into the process method.
    classmethod deserialize(val, validate=False)
        Deserialize this block from a dict containing version, graph and name.
    static from_import_path(path)
        Deserialize the Block by importing it from given path.
    classmethod from_json(val, **kwargs)
        Construct a graph from a json stream.
    get_compute_graph(cached_compute_graph=None, **request)
        Lazy version of get_data. Returns a compute graph dict that can be evaluated with compute (or dask's get function).
        The dictionary has keys in the form name_token and values in the form tuple(process, *args), where args are the precise arguments that need to be passed to process, with the exception that args may reference other keys in the dictionary.
    get_data(**request)
        Directly evaluate the request and return the data.
    get_graph(serialize=False)
        Generate a graph that defines this Block and its dependencies in a dictionary.
        The dictionary has keys in the form name_token and values in the form tuple(Block class, *args), where args are the precise arguments that were used to construct the Block, with the exception that args may also reference other keys in the dictionary.
        If serialize == True, the Block classes will be replaced by their corresponding import paths.
    classmethod get_import_path()
        Serialize the Block by returning its import path.
    get_sources_and_requests(**request)
        Adapt the request and/or select the sources to be computed. The request is allowed to differ per source.
        This function should return an iterable of (source, request). For sources that are not Block instances, the request is ignored.
        Exceptions raised here will be raised before the actual computation starts (at .get_compute_graph(request)).
    static process(data)
        Overridden to modify data from sources in unlimited ways.
        The default implementation passes single-source data through unaltered.
    serialize()
        Serialize this block into a dict containing version, graph and name.
    to_json(**kwargs)
        Dump the graph to a json stream.
    token
        Generates a unique and deterministic representation of this object.
dask_geomodeling.core.graphs.construct(graph, name, validate=True)
    Construct a Block with dependent Blocks from a graph and endpoint name.
dask_geomodeling.core.graphs.compute(graph, name, *args, **kwargs)
    Compute a graph ({name: [func, arg1, arg2, …]}) using dask.get_sync.