artemis.externals.tdigest

Package Contents

class artemis.externals.tdigest.TDigest(delta=0.01, K=25)

Bases: object

__add__(self, other_digest)
__len__(self)
__repr__(self)
__iter__(self)

Iterates over centroids in the digest.

_add_centroid(self, centroid)
_compute_centroid_quantile(self, centroid)
_update_centroid(self, centroid, x, w)
_find_closest_centroids(self, x)
_threshold(self, q)
update(self, x, w=1)

Update the t-digest with value x and weight w.
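To show what a weight means for a single centroid, here is a minimal sketch of a weighted running-mean update. The names SketchCentroid and absorb are hypothetical and are not part of this package. The sketch also leaves out how the digest chooses which centroid to update and when it creates a new one.

```python
from dataclasses import dataclass


@dataclass
class SketchCentroid:
    """Hypothetical stand-in for a centroid: a mean `m` and a weight `c`."""
    m: float
    c: float

    def absorb(self, x: float, w: float = 1.0) -> None:
        # Weighted running mean: grow the weight, then shift the mean
        # toward x in proportion to w relative to the new total weight.
        self.c += w
        self.m += w * (x - self.m) / self.c


centroid = SketchCentroid(m=1.0, c=1.0)
centroid.absorb(3.0, w=1.0)  # mean moves to (1 + 3) / 2
centroid.absorb(5.0, w=2.0)  # weight 2 counts as two observations of 5.0
```

Under these assumptions, a point with weight w has the same effect on the mean as w separate points with the same value.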

batch_update(self, values, w=1)

Update the t-digest with an iterable of values. This assumes all points have the same weight.

compress(self)
percentile(self, p)

Computes the value at percentile p, where p is in [0, 100].

cdf(self, x)

Computes the CDF at a specific value x, i.e., computes F(x), where F denotes the CDF of the distribution.
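The two methods above are approximate inverses of each other. The following sketch shows that relationship with exact, sort-based versions on a small sample. The functions exact_cdf and exact_percentile are hypothetical helpers written for this illustration, using a nearest-rank definition; they are not how the digest computes its estimates.

```python
from bisect import bisect_right


def exact_cdf(sorted_values, x):
    # Fraction of observations less than or equal to x.
    return bisect_right(sorted_values, x) / len(sorted_values)


def exact_percentile(sorted_values, p):
    # Nearest-rank percentile: smallest value whose CDF is at least p / 100.
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    rank = max(1, -(-p * len(sorted_values) // 100))  # ceiling division
    return sorted_values[int(rank) - 1]


data = sorted([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
median = exact_percentile(data, 50)  # value at the 50th percentile
fraction = exact_cdf(data, median)   # maps that value back to a probability
```

Note the different scales: the percentile argument runs over [0, 100], while a CDF value lies in [0, 1].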

trimmed_mean(self, p1, p2)

Computes the mean of the distribution between the two percentiles p1 and p2. This algorithm is a modification of the one presented in the original t-Digest paper.
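As a point of reference for the quantity being estimated, here is a minimal exact version that sorts the raw data and averages the observations whose rank falls between the two percentiles. The function exact_trimmed_mean is a hypothetical helper for illustration only. The digest works from centroids rather than raw data, so its result is an approximation and may differ slightly.

```python
def exact_trimmed_mean(values, p1, p2):
    # Keep only the observations ranked between the p1-th and p2-th percentiles.
    if not 0 <= p1 < p2 <= 100:
        raise ValueError("need 0 <= p1 < p2 <= 100")
    ordered = sorted(values)
    n = len(ordered)
    lo = int(n * p1 / 100)
    hi = int(n * p2 / 100)
    kept = ordered[lo:hi]
    if not kept:
        raise ValueError("no observations fall between p1 and p2")
    return sum(kept) / len(kept)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
plain_mean = sum(sample) / len(sample)         # pulled up by the outlier
trimmed = exact_trimmed_mean(sample, 20, 80)   # drops the lowest and highest value
```

Trimming the tails makes the mean far less sensitive to the single outlier in the sample.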

centroids_to_list(self)

Returns a Python list of the TDigest object’s Centroid values.

to_dict(self)

Returns a Python dictionary of the TDigest and internal Centroid values. To get a list of only the Centroid values, use centroids_to_list().

update_from_dict(self, dict_values)

Updates TDigest object with dictionary values.

The delta and K values are optional; include them only if you want to update them. The n value is not required because it is computed from the centroid weights.

For example, you can initialize a new TDigest:

digest = TDigest()

Then load dictionary values into the digest:

digest.update_from_dict({'K': 25, 'delta': 0.01, 'centroids': [{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}]})

Or update an existing digest where the centroids will be appropriately merged:

digest = TDigest()
digest.update(1)
digest.update(2)
digest.update(3)
digest.update_from_dict({'K': 25, 'delta': 0.01, 'centroids': [{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}]})

Resulting in the digest having merged similar centroids by increasing their weight:

{'K': 25, 'delta': 0.01, 'centroids': [{'c': 2.0, 'm': 1.0}, {'c': 2.0, 'm': 2.0}, {'c': 2.0, 'm': 3.0}], 'n': 6.0}
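The arithmetic behind that result can be reproduced with a short standalone sketch. It assumes the simplest case, where incoming centroids have exactly the same means as existing ones, so merging reduces to summing weights. The function merge_equal_means is a hypothetical helper; the real digest also handles nearby but unequal means, which this sketch does not attempt.

```python
def merge_equal_means(existing, incoming):
    # Sum the weights ("c") of centroids that share exactly the same mean ("m").
    weights = {}
    for centroid in list(existing) + list(incoming):
        weights[centroid["m"]] = weights.get(centroid["m"], 0.0) + centroid["c"]
    return [{"c": weights[m], "m": m} for m in sorted(weights)]


existing = [{"c": 1.0, "m": 1.0}, {"c": 1.0, "m": 2.0}, {"c": 1.0, "m": 3.0}]
incoming = [{"c": 1.0, "m": 1.0}, {"c": 1.0, "m": 2.0}, {"c": 1.0, "m": 3.0}]

centroids = merge_equal_means(existing, incoming)
result = {
    "K": 25,
    "delta": 0.01,
    "centroids": centroids,
    # n is not supplied by the caller; it is the sum of the centroid weights.
    "n": sum(c["c"] for c in centroids),
}
```

This also shows why n can be omitted from the input dictionary: it is fully determined by the centroid weights.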

Alternatively, you can provide only a list of centroid values with update_centroids_from_list().

update_centroids_from_list(self, list_values)

Add or update Centroids from a Python list. Any existing centroids in the digest object are appropriately updated.

Example

digest.update_centroids_from_list([{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}])