Skip to main content

VARIANCE, VAR_SAMP, VAR_POP (SQL)

Aggregate functions that return the statistical variance of a data set.

Synopsis

VARIANCE([ ALL | DISTINCT [BY(col-list)] ] 
  expression 
  [ %FOREACH(col-list) ] [ %AFTERHAVING ])

VAR_SAMP([ ALL | DISTINCT [BY(col-list)] ] 
  expression 
  [ %FOREACH(col-list) ] [ %AFTERHAVING ])

VAR_POP([ALL | DISTINCT [BY(col-list)]] 
  expression 
  [%FOREACH(col-list)] [%AFTERHAVING])

Description

These three variance aggregate functions return the statistical variance of the values of expression, after discarding NULL values. That is, the amount of variation from the mean value of the data set, expressed as a positive number. The larger the return value, the more variation there is within the data set of values. InterSystems SQL also supplies aggregate functions to return the standard deviation corresponding to each of these variance functions.

There are slight variations in how this statistical variation is derived:

  • VARIANCE: Returns 0 if all of the values in the data set have the same value (no variability). Returns 0 if the data set consists of only one value (no possible variability). Returns NULL if the data set has no values.

    The VARIANCE calculation is:

    (SUM(expression**2) * COUNT(expression)) - SUM(expression**2)
    _____________________________________________________________
    COUNT(expression) * (COUNT(expression) - 1)
    
  • VAR_SAMP: Sample variance. Returns 0 if all of the values in the data set have the same value (no variability). Returns NULL if the data set consists of only one value (no possible variability). Returns NULL if the data set has no values. Uses the same variant calculation as VARIANCE.

  • VAR_POP: Population variance. Returns 0 if all of the values in the data set have the same value (no variability). Returns 0 if the data set consists of only one value (no possible variability). Returns NULL if the data set has no values.

    The VAR_POP calculation is:

    (SUM(expression**2) * COUNT(expression)) - (SUM(expression) **2)
    _____________________________________________________________
    (COUNT(expression) **2 )
    

These variance aggregate functions can be used in a SELECT query or subquery that references either a table or a view. They can appear in a SELECT list or HAVING clause alongside ordinary field values.

These variance aggregate functions cannot be used in a WHERE clause. They cannot be used in the ON clause of a JOIN, unless the SELECT is a subquery.

These variance aggregate functions return a value of data type NUMERIC with a precision of 36 and a scale of 17, unless expression is data type DOUBLE in which case the function returns data type DOUBLE.

These variance aggregate functions are normally applied to a field or expression that has a numeric value. They evaluate nonnumeric values, including the empty string (''), as zero (0).

These variance aggregate functions ignore NULL values in data fields. If no rows are returned by the query, or the data field value for all rows returned is NULL, they return NULL.

The statistical variance functions, like all aggregate functions, can take an optional DISTINCT clause. VARIANCE(DISTINCT col1) returns the variance of those col1 field values that are distinct (unique). VARIANCE(DISTINCT BY(col2) col1) returns the variance of the col1 field values in records where the col2 values are distinct (unique). Note however that the distinct col2 values may include a single NULL as a distinct value.

Arguments

ALL

An optional argument specifying that statistical variance functions return the variance of all values for expression. This is the default if no keyword is specified.

DISTINCT

An optional DISTINCT clause that specifies that statistical variance functions return the variance of the distinct (unique) expression values. DISTINCT can specify a BY(col-list) subclause, where col-list can be a single field or a comma-separated list of fields.

expression

Any valid expression. Usually the name of a column that contains the data values to be analyzed for variance.

%FOREACH(col-list)

An optional column name or a comma-separated list of column names. See SELECT for further information on %FOREACH.

%AFTERHAVING

An optional argument that applies the condition found in the HAVING clause.

Changes Made During the Current Transaction

Like all aggregate functions, the variance functions always returns the current state of the data, including uncommitted changes, regardless of the current transaction’s isolation level. For further details, refer to SET TRANSACTION and START TRANSACTION.

Examples

The following example uses VARIANCE to return the variance in the ages of the employees in Sample.Employee, and the variance in the distinct ages represented by one or more employees:

SELECT VARIANCE(Age) AS AgeVar,VARIANCE(DISTINCT Age) AS PerAgeVar
     FROM Sample.Employee

The following example uses VAR_POP to return the population variance in the ages of the employees in Sample.Employee, and the variance in the distinct ages represented by one or more employees:

SELECT VAR_POP(Age) AS AgePopVar,VAR_POP(DISTINCT Age) AS PerAgePopVar
     FROM Sample.Employee

See Also

FeedbackOpens in a new tab