Creating New Fields
A new field is defined as an <expression>.
Rollups
These syntax examples a chained ternary condition operator, which is an if-then-else statement.
// Recode age variable into groups
(age < 18) ? null // 0 to 17 is coded as null
: (age < 30) ? 1 // 18 to 29 is coded as 1
: (age < 40) ? 2 // 30 to 39 is coded as 2
: (age < 50) ? 3 // 40 to 49 is coded as 3
: (age < 60) ? 4 // 50 to 59 is coded as 4
5 // 60+ is coded as 5
// Recode income variable into groups
(hhincome < 25000) ? 1 // $0 to <$25k is coded as 1
: (hhincome < 50000) ? 2 // $25k to <$50k is coded as 2
: (hhincome < 75000) ? 3 // $50k to <$75k is coded as 3
: (hhincome < 100000) ? 4 // $75k to <$100k is coded as 4
: 5 // $100k+ is coded as 5
Field values can be numeric or strings. But in calculations, filters are likely more performant when groups are defined numerically.
// Recode state into integer Region (more performant)
state in ("AK", "HI") ? 1 // Pacific
: state in ("WA", "OR", "CA", "NV", "AZ", "UT", "ID", "WY") ? 2 // West
: state in ...
: null
// Recode state into string Region (less performant, but okay)
state in ("AK", "HI") ? "Pacific"
: state in ("WA", "OR", "CA", "NV", "AZ", "UT", "ID", "WY") ? "West"
: state in ...
: null
Rebasing
// Re-base own_product so that it includes all respondents, not just those answering
ifnull(own_product,0) // converts all missing values to 0
// Re-base like_product so that it only includes respondents who own it
own_product==1 ? like_product : null
Netting multiple vars
When netting, it's important to consider missing values. Be sure to understand how or
and and
work against missing values. (Rob to provide links for that)
// Respondent owns at least one product:
(own_product_1==1 or own_product_2==1 or own_product_3==1) ? 1 : 0
Unrolling picks
Consider that there exists a list of 20 brands in a study. A question "Q10" in the survey randomly picks 3 brands per respondent and asks the respondent about only those 3.
The survey data are stored as:
respid | pick1 | pick2 | pick3 | q10_pick1 | q10_pick2 | q10_pick3 |
---|---|---|---|---|---|---|
1 | brand 1 | brand 3 | brand 4 | 6 | 5 | 3 |
2 | brand 4 | brand 10 | brand 20 | 3 | 4 | 1 |
3 | brand 1 | brand 7 | brand 18 | 2 | 3 | 9 |
4 | brand 2 | brand 4 | brand 12 | 1 | 9 | 10 |
5 | brand 3 | brand 13 | brand 16 | 5 | 1 | 1 |
But to make calculations easier, we want to unroll the pick to give us one variable per brand:
respid | q10_brand1 | q10_brand2 | q10_brand3 | q10_brand4 | ... | q10_brand20 |
---|---|---|---|---|---|---|
1 | 6 | . | 5 | 3 | . | |
2 | . | . | . | 3 | 1 | |
3 | 2 | . | . | . | . | |
4 | . | 1 | . | 9 | . | |
5 | . | . | 5 | . | . |
I know some people like to create an additional restructured datafile with 3 fields: {"respid", "brand", "q10"}. That's fine too, but this platform seeks to handle multi-dimensional variables such as these in a way that restructuring isn't necessary. The Collections feature is helpful for this, because it allows an author to tell the system about the dimensions that exist in datasets.
The syntax to unroll:
// q10_brand1:
pick1==1 ? q10_pick1 :
pick2==1 ? q10_pick2 :
pick3==1 ? q10_pick3 :
null
// q10_brand2:
pick1==2 ? q10_pick1 :
pick2==2 ? q10_pick2 :
pick3==2 ? q10_pick3 :
null
...
Here's what it would look like in an addFields statement:
{
$addFields: {
q10_brand1: "(pick1==1) ? q10_pick1 : (pick2==1) ? q10_pick2 : (pick3==1) ? q10_pick3 : null",
q10_brand2: "(pick1==2) ? q10_pick1 : (pick2==2) ? q10_pick2 : (pick3==2) ? q10_pick3 : null",
q10_brand3: "(pick1==3) ? q10_pick1 : (pick2==3) ? q10_pick2 : (pick3==3) ? q10_pick3 : null",
q10_brand4: "(pick1==4) ? q10_pick1 : (pick2==4) ? q10_pick2 : (pick3==4) ? q10_pick3 : null",
q10_brand5: "(pick1==5) ? q10_pick1 : (pick2==5) ? q10_pick2 : (pick3==5) ? q10_pick3 : null",
q10_brand6: "(pick1==6) ? q10_pick1 : (pick2==6) ? q10_pick2 : (pick3==6) ? q10_pick3 : null",
q10_brand7: "(pick1==7) ? q10_pick1 : (pick2==7) ? q10_pick2 : (pick3==7) ? q10_pick3 : null",
q10_brand8: "(pick1==8) ? q10_pick1 : (pick2==8) ? q10_pick2 : (pick3==8) ? q10_pick3 : null",
q10_brand9: "(pick1==9) ? q10_pick1 : (pick2==9) ? q10_pick2 : (pick3==9) ? q10_pick3 : null",
q10_brand10: "(pick1==10) ? q10_pick1 : (pick2==10) ? q10_pick2 : (pick3==10) ? q10_pick3 : null",
q10_brand11: "(pick1==11) ? q10_pick1 : (pick2==11) ? q10_pick2 : (pick3==11) ? q10_pick3 : null",
q10_brand12: "(pick1==12) ? q10_pick1 : (pick2==12) ? q10_pick2 : (pick3==12) ? q10_pick3 : null",
q10_brand13: "(pick1==13) ? q10_pick1 : (pick2==13) ? q10_pick2 : (pick3==13) ? q10_pick3 : null",
q10_brand14: "(pick1==14) ? q10_pick1 : (pick2==14) ? q10_pick2 : (pick3==14) ? q10_pick3 : null",
q10_brand15: "(pick1==15) ? q10_pick1 : (pick2==15) ? q10_pick2 : (pick3==15) ? q10_pick3 : null",
q10_brand16: "(pick1==16) ? q10_pick1 : (pick2==16) ? q10_pick2 : (pick3==16) ? q10_pick3 : null",
q10_brand17: "(pick1==17) ? q10_pick1 : (pick2==17) ? q10_pick2 : (pick3==17) ? q10_pick3 : null",
q10_brand18: "(pick1==18) ? q10_pick1 : (pick2==18) ? q10_pick2 : (pick3==18) ? q10_pick3 : null",
q10_brand19: "(pick1==19) ? q10_pick1 : (pick2==19) ? q10_pick2 : (pick3==19) ? q10_pick3 : null",
q10_brand20: "(pick1==20) ? q10_pick1 : (pick2==20) ? q10_pick2 : (pick3==20) ? q10_pick3 : null",
}
}