artemis.io.collector

The Collector monitors the Arrow memory pool, manages output file creation, spills data to disk on demand, and flushes the memory pool when required.

Module Contents

class artemis.io.collector.CollectorOptions
max_malloc = 2147483648
class artemis.io.collector.Collector(name, **kwargs)

Bases: artemis.core.algo.IOAlgoBase

initialize(self)
book(self)
execute(self)

Check the total allocated memory in Arrow and call collect. Collect does not ensure that the file is flushed. Tuning is based on the total allocated memory and the maximum output buffer size before a spill.
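The memory check described above can be sketched as follows. This is a minimal illustration, not the artemis implementation: `should_collect` and `allocated_bytes` are hypothetical names, and the strict `>` comparison is an assumption. The default limit is taken from `CollectorOptions.max_malloc`, which equals 2 GiB.

```python
MAX_MALLOC = 2147483648  # default from CollectorOptions.max_malloc (2 GiB)


def should_collect(allocated_bytes: int, max_malloc: int = MAX_MALLOC) -> bool:
    """Return True once the allocated memory exceeds the limit.

    Hypothetical helper: whether artemis compares with > or >= is an assumption.
    """
    return allocated_bytes > max_malloc
```

In practice the byte count could be read from `pyarrow.total_allocated_bytes()`, which reports the bytes currently allocated by Arrow's default memory pool.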

_collect(self)

Collect all batches from the leaves. This occurs after a single input source is chunked: each chunk is converted to a batch, and the batches on the leaves are collected. Input file -> output Arrow RecordBatches.
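One way to picture collecting batches from the leaves is a depth-first walk that drains each leaf. The sketch below is illustrative only: `Node`, `payload`, and `collect_from_leaves` are hypothetical names, and plain Python objects stand in for Arrow RecordBatches. How artemis actually stores its tree is not stated here.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """Hypothetical tree node; only leaves are expected to hold batches."""
    name: str
    children: List["Node"] = field(default_factory=list)
    payload: List[Any] = field(default_factory=list)


def collect_from_leaves(root: Node) -> List[Any]:
    """Drain the payload of every leaf under root, in depth-first order."""
    collected: List[Any] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Push in reverse so children are visited left to right.
            stack.extend(reversed(node.children))
        else:
            collected.extend(node.payload)
            node.payload.clear()  # the leaf no longer holds its batches
    return collected
```

Clearing each leaf after collection means a second call returns nothing, which mirrors the idea that collected batches leave the in-memory store.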

finalize(self)

Ensure the data store is empty. Spill any remaining Arrow buffers to disk.

_flush_buffer(self)