artemis.generators.common

Base classes and utilities for a collection of data generators.

Refer to arrow/python/pyarrow/benchmarks/common.py in the Apache Arrow repository.

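The module is organized into three parts: common utilities, a common base class (GeneratorBase), and a builtins generator (BuiltinsGenerator).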

Module Contents

artemis.generators.common.KILOBYTE
artemis.generators.common.MEGABYTE
artemis.generators.common.DEFAULT_NONE_PROB = 0.0
artemis.generators.common.check_random_state(seed)

Turn seed into a numpy.random.RandomState instance. This ensures that, when several generators are used in the same code, they avoid repeatability problems.

https://scikit-learn.org/stable/developers/utilities.html#validation-tools

Parameters

seed: None | int | instance of RandomState
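Because the docstring points to scikit-learn's validation tools, here is a minimal sketch of how check_random_state probably handles each accepted seed type. It assumes artemis follows scikit-learn's convention. The name check_random_state_sketch is hypothetical and is not the artemis function.

```python
import numbers

import numpy as np


def check_random_state_sketch(seed):
    """Hypothetical sketch assuming scikit-learn's check_random_state semantics.

    None (or the np.random module) -> numpy's global RandomState singleton
    int                            -> a fresh RandomState seeded with that int
    RandomState                    -> returned unchanged
    """
    if seed is None or seed is np.random:
        # The singleton that backs the np.random.* convenience functions
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a numpy.random.RandomState instance")
```

Under these assumptions, passing the same int twice yields two independent but identical streams, while passing an existing RandomState shares that one stream between callers.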

artemis.generators.common._multiplicate_sequence(base, target_size)
artemis.generators.common.get_random_bytes(n, seed=42)

Generate a random bytes object of size n. Note that the result might be compressible.
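The module refers to pyarrow's benchmarks/common.py, so the following sketch shows how _multiplicate_sequence and get_random_bytes likely interact. It assumes artemis keeps that approach: draw a bounded block of random bytes once, then repeat it to reach n. Both the 100003-byte block size and the _sketch function names are hypothetical. In this sketch, the repetition is what makes large outputs compressible.

```python
import numpy as np


def multiplicate_sequence_sketch(base, target_size):
    """Hypothetical helper: chunks of `base` whose concatenation has length target_size."""
    q, r = divmod(target_size, len(base))
    # q full copies of base, followed by the first r elements of base
    return [base] * q + [base[:r]]


def get_random_bytes_sketch(n, seed=42):
    """Hypothetical sketch: n pseudo-random bytes built from one bounded random block."""
    rnd = np.random.RandomState(seed)
    base_size = 100003  # assumed cap on how many fresh random bytes are drawn
    if n < base_size:
        result = rnd.bytes(n)
    else:
        base = rnd.bytes(base_size)
        # Repeating the block keeps large requests cheap but makes them compressible
        result = b"".join(multiplicate_sequence_sketch(base, n))
    assert len(result) == n
    return result
```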

artemis.generators.common.get_random_ascii(n, seed=42)

Get a random ASCII-only unicode string of size n.
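One simple way to get an ASCII-only string from random bytes is to clear the high bit of every byte. Here is a minimal sketch of that technique, assuming get_random_ascii works along these lines. The function name is hypothetical. Note that the output can contain ASCII control characters, since every code in 0..127 is possible.

```python
import numpy as np


def get_random_ascii_sketch(n, seed=42):
    """Hypothetical sketch: random ASCII-only str of length n."""
    raw = np.random.RandomState(seed).bytes(n)
    # Masking with 0x7F maps every byte into the ASCII range 0..127
    codes = np.frombuffer(raw, dtype=np.uint8) & 0x7F
    result = codes.tobytes().decode("ascii")
    assert len(result) == n
    return result
```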

artemis.generators.common._random_unicode_letters(n, seed=42)

Generate a string of random unicode letters (slow).

artemis.generators.common._1024_random_unicode_letters
artemis.generators.common.get_random_unicode(n, seed=42)

Get a random non-ASCII unicode string of size n.
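The three unicode entries above suggest a two-stage design. A slow rejection sampler builds a fixed pool of letters once, stored in _1024_random_unicode_letters. get_random_unicode then only indexes into that pool. The sketch below illustrates this, assuming that design. LETTER_POOL and the _sketch functions are hypothetical names. Almost all sampled characters fall outside ASCII, because ASCII letters are a tiny fraction of the Unicode letter categories.

```python
import sys
import unicodedata

import numpy as np


def random_unicode_letters_sketch(n, seed=42):
    """Hypothetical sketch: n random characters whose Unicode category is a letter (L*)."""
    rnd = np.random.RandomState(seed)
    out = []
    while len(out) < n:
        # Rejection sampling over the whole code space: slow, so done only once
        for cp in rnd.randint(0, sys.maxunicode, size=n).tolist():
            ch = chr(cp)
            if unicodedata.category(ch).startswith("L"):
                out.append(ch)
                if len(out) == n:
                    break
    return out


# Hypothetical fixed pool, mirroring the module-level _1024_random_unicode_letters
LETTER_POOL = random_unicode_letters_sketch(1024)


def get_random_unicode_sketch(n, seed=42):
    """Hypothetical sketch: random str of length n drawn from LETTER_POOL."""
    rnd = np.random.RandomState(seed)
    # Fast path: pick pool indices instead of rejection sampling again
    indices = rnd.randint(0, len(LETTER_POOL), size=n)
    return "".join(LETTER_POOL[i] for i in indices)
```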

class artemis.generators.common.GeneratorBase(name, **kwargs)

Common base class for generators

property random_state(self)
property name(self)

Algorithm name

reset(self)
to_msg(self)
static from_msg(logger, msg)
generate(self)
initialize(self)
__iter__(self)
__next__(self)
sampler(self)
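The listing names the methods of GeneratorBase but does not say how they fit together. The following is a guess: a minimal sketch assuming that iterating a generator calls generate() once per step, and that reset() restarts the random stream. GeneratorBaseSketch, UniformBatches, and the nbatches stopping rule are all hypothetical. The to_msg, from_msg, initialize, and sampler methods are deliberately left out because their behavior is not documented here.

```python
import numpy as np


class GeneratorBaseSketch:
    """Hypothetical stand-in for GeneratorBase: only the iterator protocol is modelled."""

    def __init__(self, name, seed=None, nbatches=3):
        self._name = name
        self._seed = seed
        self._nbatches = nbatches  # hypothetical stopping criterion
        self.reset()

    @property
    def name(self):
        return self._name

    @property
    def random_state(self):
        return self._random_state

    def reset(self):
        # Restart both the random stream and the batch counter
        self._random_state = np.random.RandomState(self._seed)
        self._count = 0

    def generate(self):
        raise NotImplementedError  # subclasses produce one batch here

    def __iter__(self):
        return self

    def __next__(self):
        if self._count >= self._nbatches:
            raise StopIteration
        self._count += 1
        return self.generate()


class UniformBatches(GeneratorBaseSketch):
    """Hypothetical subclass: each batch is four uniform floats."""

    def generate(self):
        return self.random_state.uniform(size=4).tolist()
```

With a fixed seed, iterating, calling reset(), and iterating again replays the same batches in this sketch.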
class artemis.generators.common.BuiltinsGenerator(seed=None)

Bases: object

sprinkle(self, lst, prob, value)

Sprinkle value entries into list lst: each entry is replaced by value with probability prob.

sprinkle_nones(self, lst, prob)

Sprinkle None entries into list lst: each entry is replaced by None with probability prob.
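Sprinkling can be done with one uniform draw per entry. The entry is replaced whenever the draw falls below prob, so prob=0.0 changes nothing and prob=1.0 replaces everything. Here is a minimal sketch of that rule. It assumes the list is modified in place. The class name SprinkleSketch is hypothetical.

```python
import numpy as np


class SprinkleSketch:
    """Hypothetical helper mirroring BuiltinsGenerator.sprinkle / sprinkle_nones."""

    def __init__(self, seed=None):
        self.rnd = np.random.RandomState(seed)

    def sprinkle(self, lst, prob, value):
        # One draw in [0, 1) per entry; replace in place when the draw is below prob
        for i, p in enumerate(self.rnd.random_sample(size=len(lst))):
            if p < prob:
                lst[i] = value

    def sprinkle_nones(self, lst, prob):
        # Assumed to be sprinkle() specialised to value=None
        self.sprinkle(lst, prob, value=None)
```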

generate_int_list(self, n, none_prob=DEFAULT_NONE_PROB)

Generate a list of Python ints with none_prob probability of an entry being None.

generate_float_list(self, n, none_prob=DEFAULT_NONE_PROB, use_nan=False)

Generate a list of Python floats with none_prob probability of an entry being None (or NaN if use_nan is true).
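The use_nan flag only changes which marker stands in for a missing value. The following is a minimal sketch of that behavior, assuming values are drawn uniformly from [0, 1). The function name and the uniform range are hypothetical. The float() conversion matters because the docstring promises Python floats rather than numpy.float64 values.

```python
import numpy as np


def generate_float_list_sketch(n, none_prob=0.0, use_nan=False, seed=None):
    """Hypothetical sketch of generate_float_list; none_prob=0.0 matches DEFAULT_NONE_PROB."""
    rnd = np.random.RandomState(seed)
    # float() turns numpy.float64 into plain Python floats
    data = [float(x) for x in rnd.uniform(0.0, 1.0, n)]
    missing = float("nan") if use_nan else None
    for i, p in enumerate(rnd.random_sample(size=n)):
        if p < none_prob:
            data[i] = missing
    return data
```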

generate_bool_list(self, n, none_prob=DEFAULT_NONE_PROB)

Generate a list of Python bools with none_prob probability of an entry being None.

generate_decimal_list(self, n, none_prob=DEFAULT_NONE_PROB, use_nan=False)

Generate a list of Python Decimals with none_prob probability of an entry being None (or NaN if use_nan is true).
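For Decimals, the NaN marker is decimal.Decimal("NaN") rather than float("nan"). The sketch below illustrates this. It assumes each value is formatted through a string with nine fractional digits, so every Decimal carries a fixed exponent. The nine-digit precision and the function name are hypothetical.

```python
import decimal

import numpy as np


def generate_decimal_list_sketch(n, none_prob=0.0, use_nan=False, seed=None):
    """Hypothetical sketch: Decimals with an assumed 9 fractional digits, some entries missing."""
    rnd = np.random.RandomState(seed)
    # Going through "%.9f" fixes the scale of every Decimal at 9 digits
    data = [decimal.Decimal("%.9f" % x) for x in rnd.uniform(0.0, 1.0, n)]
    missing = decimal.Decimal("NaN") if use_nan else None
    for i, p in enumerate(rnd.random_sample(size=n)):
        if p < none_prob:
            data[i] = missing
    return data
```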

generate_object_list(self, n, none_prob=DEFAULT_NONE_PROB)

Generate a list of generic Python objects with none_prob probability of an entry being None.

_generate_varying_sequences(self, random_factory, n, min_size, max_size, none_prob)

Generate a list of n sequences of varying size between min_size and max_size, with none_prob probability of an entry being None. The base material for each sequence is obtained by calling random_factory(<some size>).
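A cheap way to get many varying-size sequences is to call random_factory once for a long base sequence and then slice it at random offsets. Here is a minimal sketch of that slicing approach. It assumes the offsets are drawn from a fixed base length of 10000. That length, the function name, and the seed parameter are hypothetical.

```python
import numpy as np


def generate_varying_sequences_sketch(random_factory, n, min_size, max_size,
                                      none_prob=0.0, seed=None):
    """Hypothetical sketch: n slices of one random base, lengths in [min_size, max_size]."""
    rnd = np.random.RandomState(seed)
    base_size = 10000  # assumed range for slice offsets
    # Build the base once, long enough for the largest offset plus max_size
    base = random_factory(base_size + max_size)
    data = []
    for _ in range(n):
        off = rnd.randint(base_size)
        size = min_size if min_size == max_size else rnd.randint(min_size, max_size + 1)
        data.append(base[off:off + size])
    for i, p in enumerate(rnd.random_sample(size=n)):
        if p < none_prob:
            data[i] = None
    return data
```

Under this design, a random-bytes factory produces varying binary lists, an ASCII or unicode string factory produces string lists, and min_size == max_size yields fixed-size sequences.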

generate_fixed_binary_list(self, n, size, none_prob=DEFAULT_NONE_PROB)

Generate a list of bytestrings with a fixed size.

generate_varying_binary_list(self, n, min_size, max_size, none_prob=DEFAULT_NONE_PROB)

Generate a list of bytestrings with a random size between min_size and max_size.

generate_ascii_string_list(self, n, min_size, max_size, none_prob=DEFAULT_NONE_PROB)

Generate a list of ASCII strings with a random size between min_size and max_size.

generate_unicode_string_list(self, n, min_size, max_size, none_prob=DEFAULT_NONE_PROB)

Generate a list of unicode strings with a random size between min_size and max_size.

generate_int_list_list(self, n, min_size, max_size, none_prob=DEFAULT_NONE_PROB)

Generate a list of lists of Python ints with a random size between min_size and max_size.

generate_tuple_list(self, n, none_prob=DEFAULT_NONE_PROB)

Generate a list of tuples with random values. Each tuple has the form (int value, float value, bool value).

generate_dict_list(self, n, none_prob=DEFAULT_NONE_PROB)

Generate a list of dicts with random values. Each dict has the form

{'u': int value, 'v': float value, 'w': bool value}
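The tuple and dict layouts above carry the same three fields, so one can be derived from the other. Here is a minimal sketch of both shapes, assuming each record is built from one int, one uniform float, and one fair-coin bool. The integer range and the function names are hypothetical. In this sketch a missing entry is a whole None record. The real methods might instead place None inside individual fields.

```python
import numpy as np


def generate_tuple_list_sketch(n, none_prob=0.0, seed=None):
    """Hypothetical sketch: (int, float, bool) tuples, some entries replaced by None."""
    rnd = np.random.RandomState(seed)
    ints = rnd.randint(-1000, 1000, size=n).tolist()  # assumed value range
    floats = rnd.uniform(0.0, 1.0, n).tolist()
    bools = (rnd.random_sample(n) < 0.5).tolist()
    data = list(zip(ints, floats, bools))  # .tolist() already gave Python scalars
    for i, p in enumerate(rnd.random_sample(size=n)):
        if p < none_prob:
            data[i] = None
    return data


def generate_dict_list_sketch(n, none_prob=0.0, seed=None):
    """Hypothetical sketch: {'u': int, 'v': float, 'w': bool} dicts built from the tuples."""
    return [None if t is None else dict(zip("uvw", t))
            for t in generate_tuple_list_sketch(n, none_prob, seed)]
```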

get_type_and_builtins(self, n, type_name)

Return an (arrow type, list) tuple where the arrow type corresponds to the given logical type_name, and the list contains n randomly generated Python objects compatible with that arrow type.