Save
The Save stage provides a convenient way to export data to a file. Because the filename is specified in the stage itself, the user can simply click the "run" button to save, bypassing a file chooser window. The stage also accepts options if needed.
Example stage syntax (filename only):
"my file.zsav"
The file extension determines the file format. The following formats are supported:
- SPSS (.sav)
- Compressed SPSS (.zsav)
- Excel (.xlsx)
- CSV (.csv)
You may include a folder in the path, but the folder must already exist; otherwise the save fails with an error.
"out/myfile.sav" // the folder "out" must already exist
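Since the stage does not create the folder for you, one way to guarantee it exists is to create it beforehand, for example from a shell (a sketch; substitute your own folder name for "out"):

```shell
# Create the output folder used in the example above.
# -p creates any missing parent folders and is a no-op
# if the folder already exists, so it is safe to re-run.
mkdir -p out
```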
You may declare as an object to provide options:
{
  "filename": "my file.xlsx",
  "exportValueLabels": true
}
You may declare multiple output files using an array:
[
  "my file.sav",
  "my compressed file.zsav",
  {
    "filename": "my excel file.xlsx",
    // options
  }
]
To specify files in a cross-platform manner, use relative paths and forward slashes.
If you need to use an absolute path on Windows, remember to escape the backslashes in JSON:
{
  // if your OS uses backslashes in file paths,
  // remember to escape the backslashes in the JSON ("\\").
  "filename": "D:\\Projects\\ACME\\data\\myfile.zsav"
}
Use an absolute path (macOS/Linux):
{
  "filename": "~/Projects/ACME/data/myfile.sav"
}
Suggestion
In long-running data flows, saving intermediate outputs between major groups of stages improves efficiency and flexibility. These staging output files enable selective re-execution of the pipeline: only the affected sections need to be rerun, rather than the entire data flow.
For example, consider a data flow with three sections: (1) merging multiple waves of survey data, (2) unrolling picks, and (3) applying weighting and recoding variables. If each section writes a staging output, changes made to the final section (such as updating a recode) can be executed without recomputing earlier stages.
To configure staging outputs:
- Add a Save stage immediately after the section you want to persist.
- Specify an output path (for example, "staging/stage1.zsav").
- Create a data source stage that uses the saved file as its input.
- Connect this data source to the subsequent section of the pipeline.
This configuration allows downstream stages to consume previously saved results, reducing execution time and isolating changes to specific parts of the data flow.
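As a minimal sketch of the first step above, the Save stage placed after section 1 could be as simple as (the folder and filename are illustrative; the "staging" folder must already exist, as noted earlier):

```json
// Save stage placed after section 1 (merging waves).
// Sections 2 and 3 would each get a similar Save stage,
// e.g. "staging/stage2.zsav" and "staging/stage3.zsav".
"staging/stage1.zsav"
```

A data source stage pointing at the same file then feeds the next section, so edits to later sections start from this saved snapshot instead of recomputing the merge.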