Approximate count distinct in Oracle 12c Release 1


The new and optimized SQL function, APPROX_COUNT_DISTINCT(), provides approximate count distinct aggregation. Processing of large volumes of data is significantly faster than the exact aggregation, especially for data sets with a large number of distinct values, with negligible deviation from the exact result.

Counting distinct values is a common operation in data analysis. Reducing processing time and resource consumption by orders of magnitude, while still returning almost exact results, speeds up existing processing and enables new levels of analytical insight.

APPROX_COUNT_DISTINCT returns the approximate number of rows that contain distinct values of expr.

This function provides an alternative to COUNT(DISTINCT expr), which returns the exact number of rows that contain distinct values of expr. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT(DISTINCT expr), with negligible deviation from the exact result.
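To see how close the approximation is on your own data, one option is to run both aggregates in the same query. The sketch below assumes the same table and column used in this post's example (ra_customer_trx_all, term_id); the column aliases are made up for illustration.

```sql
-- Exact and approximate distinct counts side by side,
-- so the deviation can be checked against real data.
SELECT COUNT(DISTINCT term_id)        AS exact_terms,
       APPROX_COUNT_DISTINCT(term_id) AS approx_terms
  FROM ra_customer_trx_all;
```

Running the two aggregates in separate queries instead makes it easier to compare their elapsed times.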

For expr, you can specify a column of any scalar data type other than BFILE, BLOB, CLOB, LONG, LONG RAW, or NCLOB.

APPROX_COUNT_DISTINCT ignores rows that contain a null value for expr. This function returns a NUMBER.
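The null handling can be checked with a small inline data set. This is only a sketch: the sample_terms name and its four rows are invented for illustration and do not come from any real table.

```sql
-- Four sample rows: the value 1 twice, the value 2 once, and one NULL.
WITH sample_terms AS (
  SELECT 1 AS term_id FROM dual UNION ALL
  SELECT 1            FROM dual UNION ALL
  SELECT 2            FROM dual UNION ALL
  SELECT CAST(NULL AS NUMBER) FROM dual
)
SELECT COUNT(*)                       AS total_rows,   -- counts every row, including the NULL one
       APPROX_COUNT_DISTINCT(term_id) AS approx_terms  -- the NULL row is ignored
  FROM sample_terms;
```

The NULL row contributes to COUNT(*) but is not passed to APPROX_COUNT_DISTINCT, so it does not appear as an extra distinct value.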

Example

-- The ANSI date literal avoids depending on the session's NLS_DATE_FORMAT.
SELECT APPROX_COUNT_DISTINCT(term_id) AS "Terms"
  FROM ra_customer_trx_all
 WHERE trx_date >= DATE '2017-08-01';

Experiment results

Across roughly 350 thousand transactions, the query returned 12 distinct payment terms in less than 75 seconds.
