Product of column values
Why there is no PRODUCT function among SQL aggregate functions?
This question is frequently asked in professional social networks. The matter is about product of column values while grouping a table. Function like PRODUCT is absent from SQL Standard; and I don’t know DBMS with support of this function. Good news is that we can express such function via three other functions which are supported by SQL servers. So.
Let’s multiply values in value column of the following table:
select value from (
values(2),(3),(4),(5)
) X(value);
[[ column ]] |
---|
NULL [[ value ]] |
value |
---|
2 |
3 |
4 |
5 |
The following logarithm property helps us: logarithm of product equals sum of logarithms. For our example this means
ln(2*3*4*5) = ln(2) + ln(3) + ln(4) + ln(5)
If we apply reverse function to natural logarithm (Ln), namely exponent (Exp), this gives
exp(ln(2*3*4*5)) = 2*3*4*5 = exp(ln(2) + ln(3) + ln(4) + ln(5))
Thus, we can replace values product with expression at the right part in the equality above.
All that we need to do to solve task is to write this formula in SQL language, taking into account that values are in the value column.
select exp(sum(log(value))) product from (
values(2),(3),(4),(5)
) X(value);
[[ column ]] |
---|
NULL [[ value ]] |
product |
---|
120 |
You can easyly verify the result in your brain or when using Excel :-).
The last solution is not universal. Because logarithm is not defined for the values less or equal zero, the query returns the following error:
Try this when running the query below.
select exp(sum(log(value))) product from (
values(2),(-3),(4),(-5)
) X(value);
[[ column ]] |
---|
NULL [[ value ]] |
Let’s rewrite the solution for account of “impermissible” values with use of the following algorithm:
- If there are zeros among calculated values, the result equals 0.
- If the number of negative values is odd, the product of absolute values of the column has to be multiplying by (-1).
- If the number of negative values is even, the result equals the product of absolute values of the column.
Here is solution with comments, which use this algorithm:
with T as(select * from (values(-2),(-3),(4),(-5)) X(value)),
P as (
select sum(case when value<0 then 1 else 0 end) neg, -- number of negative values
sum(case when value>0 then 1 else 0 end) pos, -- number of positive values
count(*) total -- total number of values
from T)
select case when total <> pos+neg /* there are zeros */ then 0 else
(case when neg%2=1 then -1 else +1 end) *exp(sum(log(abs(value))))
end product
from T,P
where value <> 0
group by neg, pos, total;
[[ column ]] |
---|
NULL [[ value ]] |
product |
---|
-120 |
Notice the condition value <>0 at the last line of the query. Its presence is due to logarithm expression which is being evaluated by SQL Server despite of unprocessing of the corresponding branch of CASE operator when zeros occur. As a result the query returns the error instead of 0.
Smarties have asked yet: “What about NULLs?”
Really, our solution gives 0 in the case of presence of NULLs among values of a column.
Let’s follow common logic of behaviour of aggregate functions - to ignore NULLs when computing.
Below is final solution, which has single difference in comparison with the previous one. Who will guess which is this?
with T as(select * from (values(-2),(-3),(4),(-5), (NULL) ) X(value)),
P as (
select sum(case when value<0 then 1 else 0 end) neg, -- number of negative values
sum(case when value>0 then 1 else 0 end) pos, -- number of positive values
count(value) total -- total number of values
from T)
select case when total <> pos+neg /* there are zeros */ then 0 else
(case when neg%2=1 then -1 else +1 end) *exp(sum(log(abs(value))))
end
product from T,P where value <> 0 group by neg, pos, total;
[[ column ]] |
---|
NULL [[ value ]] |