pyspark.sql.functions.hll_sketch_agg

pyspark.sql.functions.hll_sketch_agg(col, lgConfigK=None)

Aggregate function: returns the updatable binary representation of the DataSketches HllSketch, configured with the lgConfigK argument.

New in version 3.5.0.

Parameters
col : Column or column name
lgConfigK : Column or int, optional

The log-base-2 of K, where K is the number of buckets or slots for the HllSketch. When omitted, it resolves to 12, as the output column name in the first example shows.

Returns
Column

The binary representation of the HllSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([1, 2, 2, 3], "INT")
>>> df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value"))).show()
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+
>>> df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value", 12))).show()
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+
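To make the meaning of lgConfigK more concrete, the following plain-Python sketch converts a log-base-2 value into the bucket count K. It also prints the textbook HyperLogLog rule of thumb of roughly 1.04 / sqrt(K) for the relative standard error. That figure is an assumption borrowed from the classic HyperLogLog analysis, not a bound documented for the DataSketches HllSketch, and the helper names `hll_bucket_count` and `classic_hll_relative_error` are hypothetical and not part of PySpark.

```python
import math


def hll_bucket_count(lg_config_k: int) -> int:
    """Return K, the number of buckets, given lgConfigK = log2(K)."""
    return 1 << lg_config_k


def classic_hll_relative_error(lg_config_k: int) -> float:
    """Textbook HyperLogLog rule of thumb (assumption): about 1.04 / sqrt(K).

    The DataSketches implementation may behave differently; treat this
    only as a rough guide to how accuracy scales with K.
    """
    return 1.04 / math.sqrt(hll_bucket_count(lg_config_k))


# Compare a few settings, including 12, the value the examples above resolve to.
for lg_k in (8, 12, 16):
    k = hll_bucket_count(lg_k)
    err = classic_hll_relative_error(lg_k)
    print(f"lgConfigK={lg_k}  K={k}  approx. relative error={err:.4f}")
```

Each increment of lgConfigK doubles the number of buckets, so a larger value trades a bigger sketch for a tighter estimate.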