Stack

Stage as JSON

Format:

{
// stack may contain resolutions to resolve stacking conflicts
}

ROUGH DRAFT NOTES Stacking files (in data flow)

** MOVE THIS TO STACK STAGE DOCS

When files from different waves are stacked, the data flow should compare their meta. It should not warn about every difference: only conflicts trigger a warning, while other differences are merely noted as discrepancies.

A conflict is:

  • A field label is different
  • A value label is different (same value but different label)

A noted discrepancy (not a conflict) is:

  • A new or dropped field
  • A new value label on an existing field
  • A dropped value label on a propagated field
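The two categories above can be sketched as a small comparison routine. This is only an illustration, not the stacker's implementation: the meta layout (field name mapped to `label` and `valueLabels`), the function name `compare_meta`, and the message strings are all assumptions made for the example. Value keys are assumed to be strings, as they would be in JSON.

```python
from typing import Dict, List, Tuple

# Hypothetical meta layout: field name -> {"label": str, "valueLabels": {value: label}}
Meta = Dict[str, dict]


def compare_meta(meta1: Meta, meta2: Meta) -> Tuple[List[str], List[str]]:
    """Return (conflicts, discrepancies) between the meta of two waves."""
    conflicts: List[str] = []
    discrepancies: List[str] = []

    for field in sorted(set(meta1) | set(meta2)):
        # A field present in only one wave is a discrepancy, not a conflict.
        if field not in meta2:
            discrepancies.append(f"{field}: dropped field")
            continue
        if field not in meta1:
            discrepancies.append(f"{field}: new field")
            continue

        f1, f2 = meta1[field], meta2[field]
        if f1.get("label") != f2.get("label"):
            conflicts.append(f"{field}: field label differs")

        v1 = f1.get("valueLabels", {})
        v2 = f2.get("valueLabels", {})
        for value in sorted(set(v1) | set(v2)):
            if value not in v2:
                discrepancies.append(f"{field}={value}: dropped value label")
            elif value not in v1:
                discrepancies.append(f"{field}={value}: new value label")
            elif v1[value] != v2[value]:
                # Same value, different label: this is a conflict.
                conflicts.append(f"{field}={value}: value label differs")

    return conflicts, discrepancies


# Made-up example data for two waves
wave1 = {
    "Q1": {"label": "Age", "valueLabels": {"1": "18-34", "2": "35+"}},
    "Q2": {"label": "Region", "valueLabels": {}},
}
wave2 = {
    "Q1": {"label": "Age group",
           "valueLabels": {"1": "18-34", "2": "35 and over", "3": "Refused"}},
    "Q3": {"label": "Income", "valueLabels": {}},
}
conflicts, discrepancies = compare_meta(wave1, wave2)
```

With this sample data, `conflicts` holds the changed field label and the changed label for value 2 of `Q1`, while `discrepancies` holds the new value label, the dropped `Q2`, and the new `Q3`.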

Should we also perform an inspection and notify the user of discrepancies?

  • Maybe, if a user wants to inspect manually

How are conflicts resolved?

  • When a field label is different:
    • Pick which label to use. The stack stage needs a conflictResolution section that specifies the choice.
  • When a value label is different (same value but different label):
    • The user could pick which label to use (in a conflictResolution spec).
    • If the meaning is different, the user should create a recode ahead of the stack stage.
{
  "$stack": {
    "ds1": "<datasource 1 id>",
    "ds2": "<datasource 2 id>",

    // metaPick is OPTIONAL.
    // It tells the stacker which meta to use and resolves conflicts (if any).
    // If not provided, the stacker will MERGE meta from both datasets
    // and stop only if there are conflicts.

    // use the second data source for all meta
    "metaPick": 2,

    // or:

    // object layout concept:
    "metaPick": {
      "Q1": 1, // take label and valueLabels from first dataset
      "Q3": {
        "label": 2 // take label from second dataset
        // note: valueLabels will be auto-merged
      },
      "Q4": {
        // note: label not affected
        "valueLabels": 2 // take all valueLabels from second dataset
      },
      "Q5": {
        "valueLabels": {
          "3": 2 // take value label for value 3 from second dataset
        }
      },
      "Q10": {
        "label": { "$merge": "{1} -- {2}" } // concatenate the two labels
      },
      "Q11": {
        "valueLabels": {
          "3": { "$merge": "pre 2022: {1}; 2022+: {2}" } // merge the labels
        }
      }
    },

    // array layout concept:
    "metaMerge": [ // OPTIONAL!

      // for Q1, take meta from first dataset
      { "$pickField": { "field": "Q1", "from": 1 } },

      // for Q2, use the label from first dataset
      { "$pickLabel": { "field": "Q2", "from": 1 } },

      // for Q2, concatenate the labels
      { "$pickLabel": { "field": "Q2", "concat": "{1} -- {2}" } },

      // for Q3's value 7, use the value label from the second dataset
      { "$pickValueLabel": { "field": "Q3", "value": "7", "from": 2 } },

      // Should there be a fallback?
      // i.e. "for all other conflicts, use ..."
      // No, because picks apply regardless of conflicts, not only to conflicts.

      // regardless of conflicts, take meta from second dataset
      // (this should be the only metaMerge statement, as it would override all others)
      { "$pickAll": { "use": 2 } } // NOT USING THIS

    ]
  }
}
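
To make the object layout concrete, here is a rough sketch of how a stacker might apply a `metaPick` spec when merging the meta of two datasets. Everything in it is an assumption for illustration: the function names (`choose`, `merge_one`, `resolve_field`, `stack_meta`), the meta layout, the choice to raise an error on an unresolved conflict, and the choice to keep fields that exist in only one dataset. It covers only the object layout, not the array layout.

```python
import copy
import re
from typing import Any, Dict, Optional

# Hypothetical meta layout: field name -> {"label": str, "valueLabels": {value: label}}
Meta = Dict[str, dict]


def choose(spec: Any, a: Any, b: Any) -> Any:
    """Apply one pick: 1 -> first dataset, 2 -> second, {"$merge": template} -> both."""
    if spec == 1:
        return a
    if spec == 2:
        return b
    if isinstance(spec, dict) and "$merge" in spec:
        values = {"1": str(a), "2": str(b)}
        # Single pass, so placeholder-like text inside a label is never re-expanded.
        return re.sub(r"\{([12])\}", lambda m: values[m.group(1)], spec["$merge"])
    raise ValueError(f"unsupported pick: {spec!r}")


def merge_one(name: str, a: Any, b: Any, spec: Any) -> Any:
    """Merge one label; without a pick, two differing labels are a conflict."""
    if spec is not None:
        return choose(spec, a, b)
    if a is None or a == b:
        return b
    if b is None:
        return a
    raise ValueError(f"unresolved conflict on {name}: {a!r} vs {b!r}")


def resolve_field(name: str, f1: dict, f2: dict, pick: Any = None) -> dict:
    """Merge the meta of one field that exists in both datasets."""
    if pick in (1, 2):  # take label and valueLabels from one dataset
        return copy.deepcopy(f1 if pick == 1 else f2)
    pick = pick or {}
    out = {"label": merge_one(name, f1.get("label"), f2.get("label"), pick.get("label"))}
    v1, v2 = f1.get("valueLabels", {}), f2.get("valueLabels", {})
    vpick = pick.get("valueLabels")
    if vpick in (1, 2):  # take all valueLabels from one dataset
        out["valueLabels"] = dict(v1 if vpick == 1 else v2)
    else:
        vpick = vpick or {}
        out["valueLabels"] = {
            value: merge_one(f"{name}={value}", v1.get(value), v2.get(value), vpick.get(value))
            for value in sorted(set(v1) | set(v2))
        }
    return out


def stack_meta(meta1: Meta, meta2: Meta, meta_pick: Optional[Any] = None) -> Meta:
    """Merge two metas; fields found in only one dataset are kept (an assumption)."""
    out: Meta = {}
    for name in sorted(set(meta1) | set(meta2)):
        if name not in meta2:
            out[name] = copy.deepcopy(meta1[name])
        elif name not in meta1:
            out[name] = copy.deepcopy(meta2[name])
        else:
            pick = meta_pick if meta_pick in (1, 2) else (meta_pick or {}).get(name)
            out[name] = resolve_field(name, meta1[name], meta2[name], pick)
    return out


# Made-up data reusing the Q10 and Q11 picks from the notes above
meta1 = {"Q10": {"label": "Satisfaction", "valueLabels": {"1": "Low"}},
         "Q11": {"label": "Plan", "valueLabels": {"3": "Basic"}}}
meta2 = {"Q10": {"label": "Overall satisfaction", "valueLabels": {"1": "Low"}},
         "Q11": {"label": "Plan", "valueLabels": {"3": "Starter"}}}
meta_pick = {
    "Q10": {"label": {"$merge": "{1} -- {2}"}},
    "Q11": {"valueLabels": {"3": {"$merge": "pre 2022: {1}; 2022+: {2}"}}},
}
stacked = stack_meta(meta1, meta2, meta_pick)
```

In this sketch, calling `stack_meta(meta1, meta2)` without a pick raises a `ValueError` on the first conflict, which mirrors the "stop only if there are conflicts" behaviour described in the comments; whether the real stacker should stop or merely warn is still open in these notes.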