Boolean Expression
A Boolean expression is an Expression that returns true, false or null. Boolean expressions are used to describe Filters or Groups. They are also used inside
Measures for aggregation functions such as pct(<boolean expression>)
.
Examples of Boolean Expressions
country==1
segment in (1,2,3)
likely_to_recommend in (8:10)
age>=20 and age<30
s10_1==1 or s10_2==1 or s10_3==1 or s10_4==1
a==1 or (b==1 and c==2)
Useful syntax
In, Not in
<expr1> in (<list>)
returns true if expr1 matches any of the values or ranges specified by <list>
.
<expr1> not in (<list>)
returns true if expr1 matches none of the values or ranges specified by <list>
.
list
is a comma separated set of values or ranges.
If expr1 is null, the entire expression returns null (even if list contains a null, which would be pointless).
See Null Handling section below to include checks for null.
Values:
q2 in (1,2,9) // q2==1 or q2==2 or q2==9
q2 in (1) // q2==1 (a single value is fine)
q2 not in (1,2,9) // q2!=1 and q2!=2 and q2!=9
Range (inclusive) using ":" operator
A range is specified by min:max
and includes the min and max, (e.g., min ≤ x ≤ max).
However, min and max are optional. When omitted, the endpoint is unbounded.
q1 in (8:10) // equivalent to: q1 >= 8 and q1 <= 10
q1 in (:10) // equivalent to: q1 <= 10
q1 in (8:) // equivalent to: q1 >= 8
Range (exclusive max) using ".." operator
A partially exclusive range with is specified by min..emax
. It includes the min but not the max, (e.g., min ≤ x <
emax)
min and emax are optional. When omitted, the endpoint is unbounded.
age in (10..20) // equivalent to: age >= 10 and age < 20
age in (..20) // equivalent to: age < 20
age in (10..) // equivalent to: age >= 10
age not in (50..) // equivalent to: age < 50
Excluding the endpoint is convenient when declaring age ranges or income groups, keeping the syntax concise without double-counting, e.g.,
age in (10..20) // age 10 to 19.9999...
age in (20..30) // age 20 to 29.9999...
age in (30..40) // age 30 to 39.9999...
age in (50..) // age 50+
income in (..25000) // under $25k
income in (25000..50000) // $25k to less than $50k
income in (50000..75000) // $50k to less than $75k
income in (75000..100000) // $75k to less than $100k
income in (100000..) // $100k+
Mixing values and ranges
Comma-separated values and ranges may be mixed inside a list.
x in (1, 2, 3, 5:7) // x==1 or x==2 or x==3 or (x >= 5 and x <= 7)
x in (10, 40:) // x==10 or x >= 40
x in (:20, 99) // x <= 20 or x==99
age in (..18, 80..) // age is less than 18 or greater than or equal to 80
Null Handling (missing values)
The value null indicates a value that is missing/unknown or unassigned. It is not possible to compare an unknown value to another value using comparison operators such as (==, !=, >, in, etc).
To check for null values, use <expression> is null
.
Never use <expression> == null
(or any other comparison operator) to check for null values. The comparison will always return null if one of the values in the comparison is null.
// good
segment is null // correct way to check for null
// bad
segment == null // don't do this. It will always return null.
Consider if we allowed checking for null by using equals (versus an explicit 'is null'). A statement such as:
first_mention == favorite_brand
... would return true if both fields equal null. But that's not correct, because null values means that we don't know either first_mention or favorite_brand.
To check for valid values, use <expression> is not null
or <expression> is valid
.
Scenario 1 (Null Value Handling):
If we want a group to include every respondent where segment doesn't equal 5, we would use the syntax:
segment != 5
But it's important to note that we are excluding everyone where segment is null. This is because null reflects a missing or unknown value. If the value is unknown, we don't know if the respondent belongs to segment 5 or not, and we cannot include the record.
If we want a group to include every respondent where segment != 5 including respondents where segment is unknown, we need to explicity include the unknowns...
segment is null or segment != 5 // includes EVERY record where segment doesn't equal 5
Other ways to write it:
// includes all records:
ifnull(segment,0) != 5
segment ?? 0 != 5
Another example of this problem:
Suppose we want to include everyone that does NOT own a product. Some of the people that do not own the product are coded as own_product==0, but some of the people that do not own the product weren't even asked about it, so own_product is missing.
The following syntax doesn't give us everyone we need, because it only includes respondents with own_product==0 but excludes respondents where own_product is missing.
own_product==0 // this doesn't return true if own_product is missing!
own_product | expression | returns |
---|---|---|
1 | own_product==0 | false |
0 | own_product==0 | true |
null | own_product==0 | null |
The Solution:
If we want to include respondents where own_product is missing OR own_product==0, we convert missing values using ifnull(expr, fallback). The ifnull function returns fallback any time a missing value is encountered.
ifnull(own_product,0)==0 // replaces nulls with zero
own_product | expression | returns |
---|---|---|
1 | ifnull(own_product,0)==0 | false |
0 | ifnull(own_product,0)==0 | true |
null | ifnull(own_product,0)==0 | true |
Note for Rob: I think we should probably also allow the null coalesce operator as a replacement for ifnull.
own_product ?? 0 == 0
Another way to do it
own_product is null or own_product==0