artemis.meta.cronus¶
Interface to the Artemis Metadata Store
Module Contents¶
-
class
artemis.meta.cronus.MetaObject¶ Helper data class for accessing the metadata of a content object. The returned class does not give access to the original protobuf, which is only accessible via its uuid (the content’s hash).
-
name:str¶
-
uuid:str¶
-
parent_uuid:str¶
-
address:str¶
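A minimal sketch of what a data class with these four fields might look like. The class name `MetaObjectSketch`, the field descriptions in the comments, and the example values are all assumptions made for illustration; the real `MetaObject` may define defaults or behavior not shown here.

```python
from dataclasses import dataclass


@dataclass
class MetaObjectSketch:
    """Hypothetical stand-in for artemis.meta.cronus.MetaObject."""

    name: str         # assumed: human-readable name of the content object
    uuid: str         # content hash used as the identifier
    parent_uuid: str  # assumed: identifier of the owning object, e.g. a dataset
    address: str      # assumed: location of the content in the store


# Illustrative values only; none of these come from a real store.
example = MetaObjectSketch(
    name="job_0.log",
    uuid="0a4d55a8d778e5022fab701977c5d840bbc486d0",
    parent_uuid="dataset-1234",
    address="/tmp/store/0a/4d55a8",
)
print(example.name, example.uuid)
```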
-
-
class
artemis.meta.cronus.BaseObjectStore(root, name, store_uuid=None, storetype='hfs', algorithm='sha1', alt_root=None)¶ Bases: artemis.core.book.BaseBook
Base Object Store derives from an OrderedDict-like class.
-
property
store_name(self)¶
-
property
store_uuid(self)¶
-
property
store_info(self)¶
-
property
store_aux(self)¶
-
_load_from_path(self, name, id_)¶
-
save_store(self)¶
-
register_content(self, content, info, **kwargs)¶ Returns a dataclass representing the content object. content is the raw data, e.g. a serialized bytestream to be persisted. The bytestream is hashed; see for example github.com/dgilland/hashfs.
The info object can be used to call the correct register method and to validate that all required inputs are received.
The metadata model includes: menu metadata (Menu protobuf), configuration metadata (config protobuf), and dataset metadata.
Dataset metadata include: partition keys, job ids, Dataset protobuf, log file, Hists protobuf, Job protobuf, data files, and Table (Schema) protobuf.
- Parameters
buf (bytestream, object ready to be persisted) –
info (associated metadata object describing the content of buf) –
- Other Parameters
dataset_id (required for logs, files, tables, hists)
partition_key (required for files and tables)
job_id (job index)
menu_id (uuid of a stored menu)
config_id (uuid of a stored configuration)
glob (pattern for selecting files in an existing directory)
content (pass a serialized blob to compute hash for uuid)
- Returns
- Return type
MetaObject dataclass
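The idea of deriving the uuid from a hash of the bytestream can be sketched with the standard library alone. This is not the Artemis implementation: `register_content_sketch`, `MetaObjectSketch`, and the address layout are hypothetical, and nothing is written to a store. The default `sha1` mirrors the `algorithm='sha1'` default in the `BaseObjectStore` signature.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MetaObjectSketch:
    """Hypothetical stand-in for the returned MetaObject dataclass."""

    name: str
    uuid: str
    parent_uuid: str
    address: str


def register_content_sketch(content: bytes, name: str, parent_uuid: str = "",
                            algorithm: str = "sha1") -> MetaObjectSketch:
    """Hash the bytestream and describe it; nothing is persisted here."""
    digest = hashlib.new(algorithm, content).hexdigest()
    # Assumed address layout for illustration; a real store decides this.
    address = f"{digest[:2]}/{digest[2:]}"
    return MetaObjectSketch(name=name, uuid=digest,
                            parent_uuid=parent_uuid, address=address)


obj = register_content_sketch(b"hello world", name="greeting.dat")
same = register_content_sketch(b"hello world", name="greeting_copy.dat")
# Identical bytes yield the same identifier, regardless of the name.
print(obj.uuid == same.uuid)
```

Because the identifier depends only on the bytes, registering the same content twice produces the same uuid, which is what makes a content-addressed store deduplicate naturally.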
-
register_dataset(self, menu_id=None, config_id=None)¶ Dataset creation occurs before persisting. Storing information works as a data sink. Datasets are not persisted objects in the datastore.
- Parameters
menu_id (uuid of a stored menu) –
config_id (uuid of a stored configuration) –
- Returns
- Return type
MetaObject dataclass describing the dataset content object
-
register_log(self, dataset_id, job_id)¶ log file content
- Parameters
dataset_id (uuid of a dataset) –
job_id (index of job for this log) –
- Returns
- Return type
MetaObject dataclass describing the log content object
-
update_dataset(self, dataset_id, buf)¶
-
new_job(self, dataset_id)¶ Increment job counter of a dataset
- Parameters
dataset_id (uuid of a registered dataset) –
-
new_partition(self, dataset_id, partition_key)¶ Add a partition key to a dataset. Artemis datastreams are associated with partitions via the graph leaf.
- Parameters
dataset_id (uuid of dataset) –
partition_key (Leaf node name of menu) –
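The bookkeeping described by `new_job` and `new_partition` can be pictured as a per-dataset job counter plus a list of partition keys. The sketch below is an assumption about the general shape, not the Artemis code: `DatasetSketch` and `DatasetRegistrySketch` are hypothetical names, and returning the new job index from `new_job` is a choice made here for convenience.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetSketch:
    """Hypothetical in-memory record for one dataset."""

    uuid: str
    job_count: int = 0
    partitions: List[str] = field(default_factory=list)


class DatasetRegistrySketch:
    """Hypothetical registry tracking jobs and partitions per dataset."""

    def __init__(self) -> None:
        self._datasets: Dict[str, DatasetSketch] = {}

    def register_dataset(self, dataset_id: str) -> None:
        self._datasets[dataset_id] = DatasetSketch(uuid=dataset_id)

    def new_job(self, dataset_id: str) -> int:
        """Increment the job counter and return the index of the new job."""
        dataset = self._datasets[dataset_id]
        job_id = dataset.job_count
        dataset.job_count += 1
        return job_id

    def new_partition(self, dataset_id: str, partition_key: str) -> None:
        """Add a partition key (a menu leaf node name) once per dataset."""
        dataset = self._datasets[dataset_id]
        if partition_key not in dataset.partitions:
            dataset.partitions.append(partition_key)

    def list_partitions(self, dataset_id: str) -> List[str]:
        return list(self._datasets[dataset_id].partitions)


registry = DatasetRegistrySketch()
registry.register_dataset("dataset-1")
first_job = registry.new_job("dataset-1")
second_job = registry.new_job("dataset-1")
registry.new_partition("dataset-1", "leaf_a")
registry.new_partition("dataset-1", "leaf_a")  # duplicate key is ignored
print(first_job, second_job, registry.list_partitions("dataset-1"))
```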
-
put(self, id_, content)¶ Writes data to the kv store. Supports: data wrapped as a pyarrow Buffer; protocol buffer message.
- Parameters
id_ (uuid of object) –
content (pyarrow Buffer or protobuf msg) –
-
get(self, id_, msg=None)¶ Retrieves data from the kv store. Supports: pyarrow ipc file or stream; pyarrow input_stream, e.g. csv, fwf, …; bytestream; protobuf message.
- Parameters
id_ (uuid of content) –
msg (protobuf message to be parsed into) –
- Returns
In-memory buffer of data
Deserialized protobuf message in python class instance
Note – User must know protobuf message class to deserialize
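The role of the optional `msg` argument can be illustrated with a small in-memory stand-in: without `msg` the raw buffer comes back, and with `msg` the bytes are parsed into the instance the caller supplies. `KVStoreSketch` and `FakeMessage` are hypothetical; `FakeMessage` only mimics the `SerializeToString` and `ParseFromString` methods of a protobuf message so that the example runs without protobuf or pyarrow installed.

```python
class FakeMessage:
    """Stand-in exposing the two protobuf Message methods used below."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def SerializeToString(self) -> bytes:
        return self.text.encode("utf-8")

    def ParseFromString(self, data: bytes) -> None:
        self.text = data.decode("utf-8")


class KVStoreSketch:
    """Hypothetical in-memory key-value store keyed by uuid."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, id_, content) -> None:
        # Message-like objects are serialized; anything else is stored as bytes.
        if hasattr(content, "SerializeToString"):
            content = content.SerializeToString()
        self._data[id_] = bytes(content)

    def get(self, id_, msg=None):
        buf = self._data[id_]
        if msg is None:
            return buf            # raw in-memory buffer
        msg.ParseFromString(buf)  # caller supplies the message class
        return msg


store = KVStoreSketch()
store.put("menu-uuid", FakeMessage("menu definition"))
raw = store.get("menu-uuid")
parsed = store.get("menu-uuid", FakeMessage())
print(raw, parsed.text)
```

The second call shows why the user must know the message class: the store holds only bytes, so the caller has to pass an instance of the right type to parse into.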
-
open(self, id_)¶ Open a stream for reading. Enables chunking of data. Relies on the metaobject to determine how to read the file.
- Parameters
id_ (uuid of object to open in kv store) –
- Returns
- Return type
pyarrow IO handler
-
list(self, prefix='', suffix='')¶
-
list_partitions(self, dataset_id)¶
-
list_jobs(self, dataset_id)¶
-
list_tdigests(self, dataset_id)¶
-
list_histograms(self, dataset_id)¶
-
_compute_hash(self, stream)¶
-
_register_config(self, config, configinfo)¶ Takes a config protobuf bytestream.
-
_register_partition_table(self, table, tableinfo, dataset_id, job_id, partition_key, file_id=None)¶ Inputs: dataset uuid, job key, partition key, file uuid (optional for tables extracted from an input file or an output RecordBatchFile).
-
_register_partition_file(self, buf, fileinfo, dataset_id, job_id, partition_key)¶ Requires: dataset uuid, partition key, job key, file uuid.
-
_register_hists(self, hists, histsinfo, dataset_id, job_id)¶ Requires the uuid of the dataset. Generates a hists uuid from the buffer. Job key: common to all jobs in a dataset. Keep a running index of hists? Extension: hists.data. File name: dataset_id.job_name.hists_id.dat
-
_register_tdigests(self, tdigests, tdigestinfo, dataset_id, job_id)¶ Requires the uuid of the dataset. Generates a hists uuid from the buffer. Job key: common to all jobs in a dataset. Keep a running index of hists? Extension: hists.data. File name: dataset_id.job_name.hists_id.dat
-
_register_job(self, meta, jobinfo, dataset_id, job_id)¶ Requires the uuid of the dataset. Generates a hists uuid from the buffer. Job key: common to all jobs in a dataset. Keep a running index of hists? Extension: hists.data. File name: dataset_id.job_name.hists_id.dat
-
_register_file(self, location, fileinfo, dataset_id, partition_key)¶ Returns the content identifier for a file that is already in a store. Requires a stream as bytes.
-
_register_dir(self, location, glob, fileinfo, dataset_id, partition_key)¶ Registers a directory of files in a store
-
__setitem__(self, id_, msg)¶ book[key] = value. Enforces an immutable store.
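One way to enforce immutability in an OrderedDict-like class is to reject assignment to a key that already exists. The sketch below shows that pattern with the standard library; `ImmutableBookSketch` is a hypothetical name, and the choice of `ValueError` is an assumption, since the exception raised by the real store is not stated here.

```python
from collections import OrderedDict


class ImmutableBookSketch(OrderedDict):
    """Hypothetical write-once mapping: keys can be added but not replaced."""

    def __setitem__(self, id_, msg):
        if id_ in self:
            # Assumed error type; the real store may signal this differently.
            raise ValueError(f"Key {id_!r} already exists; store is immutable")
        super().__setitem__(id_, msg)


book = ImmutableBookSketch()
book["abc123"] = b"first payload"

try:
    book["abc123"] = b"second payload"
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True

print(overwrite_rejected, book["abc123"])
```

Write-once semantics fit a content-addressed store: if the key is the hash of the content, a second write to the same key either repeats the same bytes or indicates an error.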
-
_put_message(self, id_, msg)¶
-
_get_message(self, id_, msg)¶
-
_put_object(self, id_, buf)¶
-
_get_object(self, id_)¶
-
_parse_url(self, id_)¶
-
_open_ipc_file(self, id_)¶
-
_open_ipc_stream(self, id_)¶
-
_open_stream(self, id_)¶
-
property
-
class
artemis.meta.cronus.JobBuilder(root, store_name, store_id, menu_id, config_id, dataset_id, job_id)¶ Class to simulate the functionality of Artemis.
-
execute(self)¶ Execute simulates creating data, creating the associated metaobject, and storing the data and metadata.
Returns a serialized dataset object for updating a final store.
-