artemis.io.collector
Collector monitors the Arrow memory pool, manages file creation, spills data to disk on demand, and flushes the memory pool when required.
Module Contents
class artemis.io.collector.Collector(name, **kwargs)
    Bases: artemis.core.algo.IOAlgoBase
initialize(self)
book(self)
execute(self)
    Check the total allocated memory in Arrow and call collect. Collect does not ensure that the file is flushed. Tuning is based on the total allocated memory and the maximum output buffer size before a spill.
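The threshold check described for execute can be illustrated with a minimal sketch. It is not the artemis implementation: the class SpillingCollector and the names probe and max_bytes are hypothetical, chosen only for illustration. With PyArrow installed, pyarrow.total_allocated_bytes could plausibly serve as the probe; here a fake probe keeps the example self-contained.

```python
from typing import Callable, List


class SpillingCollector:
    """Hypothetical stand-in for a collector; not the artemis class."""

    def __init__(self, probe: Callable[[], int], max_bytes: int) -> None:
        self.probe = probe          # returns the bytes currently allocated
        self.max_bytes = max_bytes  # assumed tuning knob: threshold before a spill
        self.events: List[str] = []

    def collect(self) -> None:
        # Placeholder: a real collector would gather batches and write them out.
        self.events.append("collect")

    def execute(self) -> bool:
        # Check allocated memory and call collect only above the threshold.
        if self.probe() > self.max_bytes:
            self.collect()
            return True
        return False


# Fake memory readings stand in for the Arrow memory pool.
readings = iter([100, 2_000, 50])
collector = SpillingCollector(probe=lambda: next(readings), max_bytes=1_000)
results = [collector.execute() for _ in range(3)]
```

Injecting the probe as a callable keeps the threshold logic testable without allocating real Arrow buffers.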
_collect(self)
    Collect all batches from the leaves. This occurs after a single input source is chunked. Each chunk is converted to a batch, and the batches on the leaves are collected. Input file -> output Arrow RecordBatches.
finalize(self)
    Ensure the data store is empty. Spill any remaining Arrow buffers to disk.
_flush_buffer(self)