Field Meta
Field meta is extra information that describes a field. For example:
{
"name": "censusreg",
"type": "number",
"label": "US Region",
"valueLabels": [
{ "value": 1, "label": "Northeast" },
{ "value": 2, "label": "Midwest" },
{ "value": 3, "label": "South" },
{ "value": 4, "label": "West" }
]
}
.name
(Is this technically part of field meta or not? It exists in child fields)
.type
Data type
.label
A description of the field
.valueLabels
Descriptions of values that the field may contain
.fields
Child fields (if type is an object type)
The SPSS file format (*.sav) stores field meta. So, importing an SPSS file as a data source means that field meta may exist. Other file formats such as CSV or Excel don't have field meta and you may need to add it manually.
Stages attempt to preserve field meta wherever possible. See the doc for each Stage to learn if/how each stage modifies field meta.
Field meta can be used by output visualizations, e.g., as column headers, legend labels, etc.
Name (.name)
The unique identifier for the field. It can probably contain any character, but unconventional characters or spaces must be quoted (and maybe encoded) in syntax, which can make the syntax more verbose.
Label (.label)
A string description of the field. Any character is probably allowed.
Data Type (.type)
One of the following values:
Numeric (number) - an integer or float
String (string)
DateTime (date)
Boolean (bool)
ValueCell (value-cell) - an object with { .value (default), .n, (.freq), (.wn), (.uwn) }
LabelCell (label-cell) - an object with { .label (default), (.syntax), (.name), (.valueFormat) }
Object
Unknown
Note for Rob:
- When inspector encounters a value-cell, it will ingest .value (actually no, nested inspection will occur)
- When inspector encounters a label-cell, it will ingest .label (actually no, nested inspection will occur)
- Also, when data writer writes a value-cell or label-cell, it will write .value or .label respectively and ignore the other properties. (Maybe it could provide the option to break those out into separate fields, or maybe author would need to break them out if they really want them). Actually yeah the .n is most useful for low-n indication in output visualizations, and/or filtering out data cells with low n. And the .syntax is mostly for debugging. So I think they can be ignored when writing out a data file - unless the author decides to pull them out using an addFields stage.
- In syntax, references to an object without the dot specifier will refer to the default property (.value or .label). For example: sort by q1 means sort by q1.value if q1 is a value-cell. (maybe not)
// sort stage example
["q1"]
// sort stage example
["q1.value"]
// add fields example
{
"gap": "seg2 - seg1"
}
// add fields example (same as above)
{
"gap": "seg2.value - seg1.value"
}
// example filtering on n
["q1.n > 50"]
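The default-property idea above could be sketched roughly as follows. This is only an illustration, not the actual syntax evaluator: `resolve_ref` and `DEFAULT_PROPERTY` are hypothetical names, cells are assumed to be plain dicts, and the type strings follow the `value-cell` / `label-cell` spelling from the Data Type list.

```python
# Hypothetical mapping from a cell type to its default property.
DEFAULT_PROPERTY = {"value-cell": "value", "label-cell": "label"}


def resolve_ref(record, fields_by_name, ref):
    """Resolve a reference such as "q1" or "q1.n" against one record."""
    name, _, prop = ref.partition(".")
    cell = record[name]
    if not prop:
        # No dot specifier: fall back to the type's default property, if any.
        prop = DEFAULT_PROPERTY.get(fields_by_name[name].get("type"))
    if prop is None:
        return cell  # plain number, string, etc.
    return cell[prop]


fields_by_name = {
    "q1": {"name": "q1", "type": "value-cell"},
    "age": {"name": "age", "type": "number"},
}
record = {"q1": {"value": 0.45, "n": 100}, "age": 37}

plain = resolve_ref(record, fields_by_name, "q1")       # same as "q1.value"
explicit = resolve_ref(record, fields_by_name, "q1.n")  # explicit property
```

Splitting on the first dot assumes field names never contain a dot themselves. Names with unconventional characters would need the quoting rules mentioned under Name (.name), which this sketch does not attempt.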
Unweighted:
.value is the unweighted value
.n is the unweighted n
Weighted:
.value is the weighted value
.n is the weighted n (maybe a user preference should dictate whether or not this is weighted)
.uwValue is the unweighted value
.uwN is the unweighted n
// results from an unweighted calc
{
"value": 0.45,
"freq": 45,
"n": 100
}
// results from a weighted calc
{
"value": 0.42, // weighted
"freq": 41.3, // weighted
"n": 98.3, // weighted (maybe a user preference dictates which is priority: weighted or unweighted)
// depending on the above priority, only one of the following would return
"uwN": 100, // unweighted N (uN or uwN?)
"wN": 98.3, // weighted N
}
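To make the relationship between value, freq and n concrete, here is a minimal sketch of how such a cell might be computed for a simple proportion. It assumes value = freq / n, which matches the numbers in the examples above, and it assumes the weighted n takes priority so that uwN is the extra property returned. `proportion_cell`, the `w` weight field and the sample records are all made up for illustration.

```python
def proportion_cell(records, is_hit, weight_field=None):
    """Build a value-cell for the share of records where is_hit(record) is true."""
    uw_n = len(records)
    if weight_field is None:
        freq = sum(1 for r in records if is_hit(r))
        value = freq / uw_n if uw_n else None
        return {"value": value, "freq": freq, "n": uw_n}
    # Weighted: every record counts as its weight instead of as 1.
    w_n = sum(r[weight_field] for r in records)
    w_freq = sum(r[weight_field] for r in records if is_hit(r))
    value = w_freq / w_n if w_n else None
    return {"value": value, "freq": w_freq, "n": w_n, "uwN": uw_n}


records = [
    {"q1": 1, "w": 0.5},
    {"q1": 1, "w": 0.5},
    {"q1": 0, "w": 1.0},
    {"q1": 0, "w": 2.0},
]
unweighted = proportion_cell(records, lambda r: r["q1"] == 1)
weighted = proportion_cell(records, lambda r: r["q1"] == 1, weight_field="w")
```

With these sample records the unweighted share is 2 of 4, while the weighted share drops to 1.0 of 4.0 because the two hits carry small weights. The unweighted n of 4 is still available on the weighted cell for low-n checks.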
The results pane, when rendering a numerical value (type number, valueCell, or maybe date), will search for a valueFormat:
- First, it will check its column definition for a valueFormat.
- Second, it will check its row for a labelCell containing a valueFormat, giving priority to the leaf cell (the latest occurrence) if more than one exists.
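The lookup order above might be sketched like this. It is a guess at the behaviour, not the results pane's real code: `find_value_format` is a hypothetical name, "latest occurrence" is taken to mean the last label-cell in field order, and the format strings in the sample data are invented placeholders.

```python
def find_value_format(column, row, fields):
    """Return the valueFormat for a numeric cell, or None if nothing defines one."""
    # First: the column definition wins if it carries a valueFormat.
    if column.get("valueFormat") is not None:
        return column["valueFormat"]
    # Second: scan the row's label-cells in field order. A later match
    # overwrites an earlier one, so the leaf (last) cell takes priority.
    found = None
    for field in fields:
        if field.get("type") != "label-cell":
            continue
        cell = row.get(field["name"])
        if isinstance(cell, dict) and cell.get("valueFormat") is not None:
            found = cell["valueFormat"]
    return found


fields = [
    {"name": "group", "type": "label-cell"},
    {"name": "metric", "type": "label-cell"},
    {"name": "total", "type": "value-cell"},
]
row = {
    "group": {"label": "Sales", "valueFormat": "0"},
    "metric": {"label": "Share", "valueFormat": "0.0%"},
    "total": {"value": 0.42, "n": 98.3},
}

from_row = find_value_format({"name": "total"}, row, fields)
from_column = find_value_format({"name": "total", "valueFormat": "0.00"}, row, fields)
```

Here `from_row` picks up the format of the `metric` cell because it is the later of the two label-cells, and `from_column` ignores the row entirely.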
Value Labels (.valueLabels)
An array of entries to provide a description of the values found in the field. Each entry has a .value and .label.
{
"name": "c_pref",
"label": "Favorite type of cheese",
"type": "number",
"valueLabels": [
{ "value": 1, "label": "American" },
{ "value": 2, "label": "Swiss" },
{ "value": 3, "label": "Provolone" },
{ "value": 4, "label": "Pepper Jack" }
]
}
// val can be a string
// I MIGHT NOT ALLOW THIS. SURE SPSS DOES, BUT MAYBE I WON'T.
// SPSS only allows a few chars for val, which doesn't seem amazingly helpful.
// But it could be helpful.
{
"label": "US State",
"valueLabels": [
{ "value": "AL", "label": "Alabama" },
{ "value": "AK", "label": "Alaska" },
{ "value": "AZ", "label": "Arizona" },
{ "value": "AR", "label": "Arkansas" },
// ...
{ "value": "WY", "label": "Wyoming" },
]
}
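As an illustration of how a consumer such as an output visualization could use valueLabels, the sketch below looks up the label for a raw value. `label_for` is a hypothetical helper, and falling back to the raw value when no label matches is an assumption, not documented behaviour.

```python
def label_for(field, value):
    """Return the label for value, or the value itself when no label is defined."""
    for entry in field.get("valueLabels", []):
        if entry["value"] == value:
            return entry["label"]
    return value


c_pref = {
    "name": "c_pref",
    "label": "Favorite type of cheese",
    "type": "number",
    "valueLabels": [
        {"value": 1, "label": "American"},
        {"value": 2, "label": "Swiss"},
        {"value": 3, "label": "Provolone"},
        {"value": 4, "label": "Pepper Jack"},
    ],
}
state = {
    "label": "US State",
    "valueLabels": [
        {"value": "AL", "label": "Alabama"},
        {"value": "AK", "label": "Alaska"},
    ],
}

cheese = label_for(c_pref, 2)
alaska = label_for(state, "AK")
unlabeled = label_for(c_pref, 9)
```

The same function works for numeric and string values because it only compares for equality. For large label sets a dict keyed by value would avoid the linear scan.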
Child Fields (.fields)
An array of child fields. These appear in the output of aggregation stages, either because multiple dimensions are used as columns, or because the output columns are valueCells that contain both a value and an n.
{
"name": "cell",
"label": "Result",
"type": "valueCell",
"fields": [
{ "name": "value", "type": "number" },
{ "name": "n", "type": "number" }
]
}
Load from another field
Actually this is just a placeholder, because loading from another field depends on the context. The possible contexts are:
- When using addFields or Select stage in pipeline (see below)
- When stacking files in a data flow ?? maybe not here
{
// todo
}
Example in Select Stage
// SELECT stage
[
{ "name": "Q1" }, // this will automatically inherit meta from Q1
{ "name": "mm", "syntax": "month" }, // syntax is smart enough to pull meta from month
{ "name": "S3", "syntax": "wave < 5 ? null : S3" }, // syntax evaluator pulls meta from S3
{
"name": "Q1_rebased",
"syntax": "ifnull(Q1,0)",
// this syntax could figure out it should still probably inherit meta from Q1, but should it?
// unless it's something like ifnull(Q1,Q2) -- then it wouldn't know the meta
// i think if syntax is provided, and the syntax isn't a columnExpression,
// do we ask for this?
// it would suck if you have a large bank of vars to rebase
"metaFrom": "Q1" // pulls label and valuelabels from Q1
// or:
"label": { "$fromField": "Q1" }, // pulls label from Q1
"valueLabels": { "$fromField": "Q1" } // pulls valuelabels from Q1
},
{
"name": "Q1_rollup",
"syntax": "Q1 in (3,4,5) ? 3 : Q1", // this syntax doesn't know how to pull meta
// meta goes here:
"label": "Recoded question about something",
"valueLabels": [
{ "value": 1, "label": "Item one" },
{ "value": 2, "label": "Item two" },
{ "value": 3, "label": "Item three, four or five" }
]
}
]
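One possible reading of the inheritance rules in the example above is sketched below. It is deliberately conservative and not the real evaluator: `resolve_meta` is a hypothetical name, only a syntax that is a single bare field name inherits automatically, and expressions such as `wave < 5 ? null : S3` are not parsed at all. The `month` label in the sample data is made up.

```python
import re

# Assumed shape of a syntax string that is just a plain field reference.
BARE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
META_KEYS = ("label", "valueLabels")


def resolve_meta(entry, source_fields):
    """Work out label/valueLabels for one select-stage entry."""
    syntax = entry.get("syntax")
    # Pick the field to inherit from: explicit metaFrom first, then the
    # entry's own name when there is no syntax, then a bare-name syntax.
    if "metaFrom" in entry:
        inherit = entry["metaFrom"]
    elif syntax is None:
        inherit = entry["name"]
    elif BARE_NAME.fullmatch(syntax):
        inherit = syntax
    else:
        inherit = None
    base = source_fields.get(inherit, {}) if inherit else {}

    out = {"name": entry["name"]}
    for key in META_KEYS:
        if key in entry:
            spec = entry[key]
            if isinstance(spec, dict) and "$fromField" in spec:
                # Per-property inheritance from another field.
                src = source_fields.get(spec["$fromField"], {})
                if key in src:
                    out[key] = src[key]
            else:
                out[key] = spec  # literal meta supplied by the author
        elif key in base:
            out[key] = base[key]
    return out


source_fields = {
    "Q1": {
        "name": "Q1",
        "label": "Question about something",
        "valueLabels": [{"value": 1, "label": "Item one"}],
    },
    "month": {"name": "month", "label": "Month"},
}
rebased = resolve_meta(
    {"name": "Q1_rebased", "syntax": "ifnull(Q1,0)", "metaFrom": "Q1"},
    source_fields,
)
```

Explicit `label` or `valueLabels` on the entry always win over inherited meta in this sketch, which keeps the Q1_rollup case working without any special handling.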
Pipeline Output
A pipeline returns: fields, data.
fields is an array of fields
data is an array of records
{
"fields": [
{ "name": "Q1", "label": "Question about something", "valueLabels": [] },
{ "name": "Q1_rebased", "label": "Question about something", "valueLabels": [] },
{ "name": "Q1_rollup", "label": "Recoded question about something", "valueLabels": [] },
// more fields here
],
"data": [
{ "Q1": 4, "Q1_rebased": 4, "Q1_rollup": 3 },
{ "Q1": 7, "Q1_rebased": 7, "Q1_rollup": 7 },
// more records here...
]
}
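A consumer of this output could use the field meta for column headers along these lines. This is a minimal sketch under the assumption that the output arrives as a dict with `fields` and `data` keys as shown above; `to_table` and the sample values are hypothetical.

```python
def to_table(output):
    """Turn {"fields": [...], "data": [...]} into a header row plus value rows."""
    fields = output["fields"]
    # Prefer the label for the header and fall back to the field name.
    header = [f.get("label", f["name"]) for f in fields]
    rows = [[record.get(f["name"]) for f in fields] for record in output["data"]]
    return [header] + rows


output = {
    "fields": [
        {"name": "Q1", "label": "Question about something", "valueLabels": []},
        {"name": "wave"},
    ],
    "data": [
        {"Q1": 4, "wave": 1},
        {"Q1": 2, "wave": 2},
    ],
}
table = to_table(output)
```

Column order follows the order of `fields`, not the key order of each record, so records with missing keys simply produce None in that position.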
Pipeline Output Nested Fields (from Aggregation Stage)
{
"fields": [ // top level fields
{ "name": "label", "label": "Some label" },
// from dim1
{
"name": "seg1",
"label": "Segment 1",
// from dim2
"fields": [
{ "name": "male", "label": "Male", "type": "valueCell" },
{ "name": "female", "label": "Female", "type": "valueCell" }
]
},
// from dim1
{
"name": "seg2",
"label": "Segment 2",
// from dim2
"fields": [
{ "name": "male", "label": "Male", "type": "valueCell" },
{ "name": "female", "label": "Female", "type": "valueCell" }
]
},
// from dim1
{
"name": "seg3",
"label": "Segment 3",
// from dim2
"fields": [
{ "name": "male", "label": "Male", "type": "valueCell" },
{ "name": "female", "label": "Female", "type": "valueCell" }
]
},
],
"data": [
// first row
{
"label": "Product A1000",
"seg1": {
"male": { "value": 12.34, "n": 1000 },
"female": { "value": 12.34, "n": 1000 },
},
"seg2": {
"male": { "value": 12.34, "n": 1000 },
"female": { "value": 12.34, "n": 1000 },
},
"seg3": {
"male": { "value": 12.34, "n": 1000 },
"female": { "value": 12.34, "n": 1000 },
},
},
// next rows go here
]
}
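To read nested output like this, a consumer has to walk the field tree and the record in parallel. The sketch below shows one way to do that, assuming nested fields are found under a `fields` key and records nest by field name exactly as in the example. `leaf_paths` and `get_cell` are hypothetical helpers.

```python
def leaf_paths(fields, prefix=()):
    """Yield the path to every leaf field, e.g. ("seg1", "male")."""
    for field in fields:
        path = prefix + (field["name"],)
        children = field.get("fields")
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def get_cell(record, path):
    """Walk a record along a path and return the cell found there."""
    cell = record
    for key in path:
        cell = cell[key]
    return cell


fields = [
    {"name": "label", "label": "Some label"},
    {
        "name": "seg1",
        "label": "Segment 1",
        "fields": [
            {"name": "male", "label": "Male", "type": "valueCell"},
            {"name": "female", "label": "Female", "type": "valueCell"},
        ],
    },
]
record = {
    "label": "Product A1000",
    "seg1": {
        "male": {"value": 12.34, "n": 1000},
        "female": {"value": 12.34, "n": 1000},
    },
}
paths = list(leaf_paths(fields))
cells = {".".join(p): get_cell(record, p) for p in paths}
```

If a valueCell field also lists its own child fields (value, n), as in the Child Fields example, this walk would descend one level further and yield paths such as ("seg1", "male", "value").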