artemis.meta.cronus

Interface to the Artemis Metadata Store

Module Contents

class artemis.meta.cronus.MetaObject

Helper data class for accessing a content object's metadata. The returned class does not give access to the original protobuf, which is only accessible via the uuid (the content's hash).

name :str
uuid :str
parent_uuid :str
address :str
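
As a rough illustration, the four fields listed above map naturally onto a Python dataclass. The sketch below is a hypothetical stand-in, not the actual Artemis definition: the class name MetaObjectSketch, the field comments, and the example values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MetaObjectSketch:
    """Hypothetical stand-in mirroring the documented MetaObject fields."""

    name: str         # name of the content object
    uuid: str         # content identifier (the content's hash)
    parent_uuid: str  # assumed: identifier of the owning object
    address: str      # assumed: location of the content in the store


# Example values are made up for illustration only.
meta = MetaObjectSketch(
    name="example.log",
    uuid="abc123",
    parent_uuid="def456",
    address="/tmp/store/abc123",
)
print(meta.uuid)
```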
class artemis.meta.cronus.BaseObjectStore(root, name, store_uuid=None, storetype='hfs', algorithm='sha1', alt_root=None)

Bases: artemis.core.book.BaseBook

Base Object Store derives from an OrderedDict-like class

property store_name(self)
property store_uuid(self)
property store_info(self)
property store_aux(self)
_load_from_path(self, name, id_)
save_store(self)
register_content(self, content, info, **kwargs)

Returns a dataclass representing the content object. The content is the raw data, e.g. a serialized bytestream to be persisted. The bytestream is hashed; see, for example, github.com/dgilland/hashfs.

The info object can be used to call the correct register method and to validate that all the required inputs are received.

Metadata model includes:
  • Menu metadata (Menu protobuf)

  • Configuration metadata (config protobuf)

  • Dataset metadata

Dataset metadata include:
  • Partition keys

  • Job Ids

  • Dataset protobuf

  • Log file

  • Hists protobuf

  • Job protobuf

  • Data files

  • Table (Schema) protobuf

Parameters
  • buf (bytestream, object ready to be persisted) –

  • info (associated metadata object describing the content of buf) –

Other Parameters
  • dataset_id (required for logs, files, tables, hists)

  • partition_key (required for files and tables)

  • job_id (job index)

  • menu_id (uuid of a stored menu)

  • config_id (uuid of a stored configuration)

  • glob (pattern for selecting files in an existing directory)

  • content (pass a serialized blob to compute hash for uuid)

Returns

Return type

MetaObject dataclass
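
The constructor signature defaults to algorithm='sha1', and the MetaObject description states that the uuid is the content's hash. A minimal sketch of a content-derived identifier, using only hashlib from the standard library, might look like the following. The helper name content_id is hypothetical, and the real store may compute its hash differently (for example through hashfs).

```python
import hashlib


def content_id(content: bytes, algorithm: str = "sha1") -> str:
    """Return a hex digest of the raw bytes (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    hasher.update(content)
    return hasher.hexdigest()


blob = b"serialized bytestream"
first = content_id(blob)
second = content_id(blob)

# Identical content always yields the identical identifier.
print(first == second)
# A SHA-1 hex digest is 40 characters long.
print(len(first))
```

Because the identifier depends only on the bytes, registering the same content twice would resolve to the same key under this scheme.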

register_dataset(self, menu_id=None, config_id=None)

Dataset creation occurs before persisting. Storing information works as a datasink. Datasets are not persisted objects in the datastore.

Parameters
  • menu_id (uuid of a stored menu) –

  • config_id (uuid of a stored configuration) –

Returns

Return type

MetaObject dataclass describing the dataset content object

register_log(self, dataset_id, job_id)

Registers a log file content object.

Parameters
  • dataset_id (uuid of a dataset) –

  • job_id (index of job for this log) –

Returns

Return type

MetaObject dataclass describing the log content object

update_dataset(self, dataset_id, buf)
new_job(self, dataset_id)

Increment job counter of a dataset

Parameters

dataset_id (uuid of a registered dataset) –

new_partition(self, dataset_id, partition_key)

Add a partition key to a dataset. Artemis datastreams are associated with partitions via the graph leaf.

Parameters
  • dataset_id (uuid of dataset) –

  • partition_key (Leaf node name of menu) –
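
The bookkeeping described by new_job and new_partition can be pictured as a per-dataset job counter plus a collection of partition keys. The following is a sketch under assumptions: DatasetSketch and its attributes are hypothetical, and this reference does not state whether job indices start at 0 or how duplicate partition keys are handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetSketch:
    """Hypothetical in-memory stand-in for dataset metadata."""

    uuid: str
    job_count: int = 0
    partitions: List[str] = field(default_factory=list)

    def new_job(self) -> int:
        # Assumption: return the current index, then increment the counter.
        job_id = self.job_count
        self.job_count += 1
        return job_id

    def new_partition(self, partition_key: str) -> None:
        # Assumption: a partition key is recorded only once.
        if partition_key not in self.partitions:
            self.partitions.append(partition_key)


dataset = DatasetSketch(uuid="dataset-0")
first_job = dataset.new_job()
second_job = dataset.new_job()
dataset.new_partition("leaf_a")
dataset.new_partition("leaf_a")
print(first_job, second_job, dataset.partitions)
```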

put(self, id_, content)

Writes data to the kv store. Support for:
  • data wrapped as a pyarrow Buffer

  • protocol buffer message

Parameters
  • id_ (uuid of object) –

  • content (pyarrow Buffer or protobuf msg) –

get(self, id_, msg=None)

Retrieves data from the kv store. Support for:
  • pyarrow ipc file or stream

  • pyarrow input_stream, e.g. csv, fwf, …

  • bytestream protobuf message

Parameters
  • id_ (uuid of content) –

  • msg (protobuf message to be parsed into) –

Returns

  • In-memory buffer of data

  • Deserialized protobuf message in python class instance

  • Note – User must know protobuf message class to deserialize
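
The put/get contract above (raw buffers come back as bytes, while messages are parsed into an instance supplied by the caller) can be sketched without any dependencies. Everything here is a hypothetical stand-in: KVSketch is not the Artemis store, and FakeMessage only imitates the SerializeToString and ParseFromString methods that Python protobuf messages provide.

```python
from typing import Dict, Optional


class FakeMessage:
    """Hypothetical stand-in exposing two protobuf-style methods."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def SerializeToString(self) -> bytes:
        return self.text.encode("utf-8")

    def ParseFromString(self, data: bytes) -> None:
        self.text = data.decode("utf-8")


class KVSketch:
    """Hypothetical key-value store keyed by uuid."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, id_: str, content) -> None:
        # Messages are serialized first; raw bytes are stored as-is.
        if hasattr(content, "SerializeToString"):
            content = content.SerializeToString()
        self._data[id_] = bytes(content)

    def get(self, id_: str, msg: Optional[FakeMessage] = None):
        raw = self._data[id_]
        if msg is None:
            return raw  # caller receives the in-memory buffer
        msg.ParseFromString(raw)  # caller supplied the message class
        return msg


store = KVSketch()
store.put("id-1", b"raw bytes")
store.put("id-2", FakeMessage("menu"))
raw = store.get("id-1")
parsed = store.get("id-2", FakeMessage())
print(raw, parsed.text)
```

The sketch makes the note above concrete: the store holds only bytes, so the caller has to pass in an instance of the right message class to get a deserialized object back.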

open(self, id_)

Open a stream for reading. Enables chunking of data. Relies on the metaobject to determine how to read the file.

Parameters

id_ (uuid of object to open in kv store) –

Returns

Return type

pyarrow IO handler
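
Chunked reading from an opened handle can be illustrated with the standard library alone. In this sketch, io.BytesIO stands in for the pyarrow IO handler returned by open, and read_chunks is a hypothetical helper, not part of the Artemis API.

```python
import io
from typing import Iterator


def read_chunks(stream: io.BufferedIOBase, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# io.BytesIO stands in for the handle returned by open().
handle = io.BytesIO(b"0123456789")
chunks = list(read_chunks(handle, chunk_size=4))
print(chunks)
```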

list(self, prefix='', suffix='')
list_partitions(self, dataset_id)
list_jobs(self, dataset_id)
list_tdigests(self, dataset_id)
list_histograms(self, dataset_id)
_compute_hash(self, stream)
_register_menu(self, menu, menuinfo)
_register_config(self, config, configinfo)

Takes a config protobuf bytestream.

_register_partition_table(self, table, tableinfo, dataset_id, job_id, partition_key, file_id=None)

  • dataset uuid

  • job key

  • partition key

  • file uuid – optional for tables

Extracted from an input file or an output RecordBatchFile.

_register_partition_file(self, buf, fileinfo, dataset_id, job_id, partition_key)

Requires:
  • dataset uuid

  • partition key

  • job key

  • file uuid

_register_hists(self, hists, histsinfo, dataset_id, job_id)

  • Requires uuid of dataset

  • generate a hists uuid from buffer

  • job key common to all jobs in a dataset

  • keep a running index of hists?

  • extension hists.data

  • dataset_id.job_name.hists_id.dat

_register_tdigests(self, tdigests, tdigestinfo, dataset_id, job_id)

  • Requires uuid of dataset

  • generate a hists uuid from buffer

  • job key common to all jobs in a dataset

  • keep a running index of hists?

  • extension hists.data

  • dataset_id.job_name.hists_id.dat

_register_job(self, meta, jobinfo, dataset_id, job_id)

  • Requires uuid of dataset

  • generate a hists uuid from buffer

  • job key common to all jobs in a dataset

  • keep a running index of hists?

  • extension hists.data

  • dataset_id.job_name.hists_id.dat

_register_file(self, location, fileinfo, dataset_id, partition_key)

Returns the content identifier for a file that is already in a store. Requires a stream as bytes.

_register_dir(self, location, glob, fileinfo, dataset_id, partition_key)

Registers a directory of files in a store
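
Selecting files from an existing directory with a glob pattern can be sketched with pathlib. The helper select_files is hypothetical and only shows the selection step; how each matched file is then registered in the store is not covered by this reference.

```python
import tempfile
from pathlib import Path
from typing import List


def select_files(location: str, glob: str) -> List[str]:
    """Return sorted names of files in `location` that match `glob`."""
    return sorted(p.name for p in Path(location).glob(glob) if p.is_file())


# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        (Path(tmp) / name).write_text("x")
    selected = select_files(tmp, "*.csv")

print(selected)
```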

__setitem__(self, id_, msg)

book[key] = value. Enforces an immutable store.
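
One way an OrderedDict-like class could enforce immutability through __setitem__ is to reject writes to keys that already exist. This is a sketch under that assumption; ImmutableBookSketch is hypothetical, and the actual store may respond to an overwrite differently (for example by ignoring it or raising another exception type).

```python
from collections import OrderedDict


class ImmutableBookSketch(OrderedDict):
    """Hypothetical OrderedDict-like store that forbids overwriting keys."""

    def __setitem__(self, key, value):
        # Assumption: overwriting an existing key is an error.
        if key in self:
            raise KeyError(f"{key!r} already exists; store is immutable")
        super().__setitem__(key, value)


book = ImmutableBookSketch()
book["id-1"] = b"first"
try:
    book["id-1"] = b"second"
    overwritten = True
except KeyError:
    overwritten = False

print(book["id-1"], overwritten)
```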

_put_message(self, id_, msg)
_get_message(self, id_, msg)
_put_object(self, id_, buf)
_get_object(self, id_)
_parse_url(self, id_)
_open_ipc_file(self, id_)
_open_ipc_stream(self, id_)
_open_stream(self, id_)
class artemis.meta.cronus.JobBuilder(root, store_name, store_id, menu_id, config_id, dataset_id, job_id)

Class to simulate the functionality of Artemis.

execute(self)

Execute simulates:
  • creating data

  • creating the associated metaobject

  • storing data and metadata

Returns a serialized dataset object for updating a final store.