Add Window Fields
Adds field(s) by performing operations on a specified span of records, known as a window.
Operations can include:
- Getting the rank of a record, given a specified sort order.
- Aggregating across the window to find mean, min, max, sum, etc.
Examples Below:
Calculate a Rank (No partitioning)
Windows are defined by partitioning. When a partition isn't specified, there is just one window and it contains all records.
Input records:
Product | Score |
---|---|
Product 01 | 0.3461 |
Product 02 | 0.3096 |
Product 03 | 0.7573 |
Product 04 | 0.8856 |
Product 05 | 0.3959 |
Stage definition:
We are defining a stage that will sort the input records by Score (highest first) and return a rank.
{
$addWindowFields: {
// note: No partitioning defined, so the window is all records.
// sort is required for ranking
sortBy: {
[ field: "Score", order: "descending" ]
},
// new output field to add
output: {
HighScoreRank: "rank()" // rank() is the operation
}
}
}
Output records:
Notice below that 'Product 04' has the highest score and received a rank of 1. 'Product 02' has the lowest score and received a rank of 5.
Product | Score | HighScoreRank |
---|---|---|
Product 01 | 0.3461 | 4 |
Product 02 | 0.3096 | 5 |
Product 03 | 0.7573 | 2 |
Product 04 | 0.8856 | 1 |
Product 05 | 0.3959 | 3 |
Calculate a Rank within Window
Input records:
Country | Product | Score |
---|---|---|
USA | Product 01 | 0.3461 |
USA | Product 02 | 0.3096 |
USA | Product 03 | 0.7573 |
USA | Product 04 | 0.8856 |
USA | Product 05 | 0.3959 |
Canada | Product 01 | 0.4744 |
Canada | Product 02 | 0.7804 |
Canada | Product 03 | 0.7071 |
Canada | Product 04 | 0.2232 |
Canada | Product 05 | 0.5828 |
Stage definition:
This stage definition is similar to the example above, but we are adding partitioning by Country. This means that there will be a window for each Country, and a separate ranking will be calculated for each window.
{
$addWindowFields: {
// this defines the window
paritionBy: "Country",
// sort is required for ranking
sortBy: {
[ field: "Score", order: "descending" ]
},
// new output field to add
output: {
HighScoreRankWithinCountry: "rank()" // this is the window operation
}
}
}
Output records:
Notice that the ranking occurs within country, because partitioning made each country a separate window.
Country | Product | Score | HighScoreRankWithinCountry |
---|---|---|---|
USA | Product 01 | 0.3461 | 4 |
USA | Product 02 | 0.3096 | 5 |
USA | Product 03 | 0.7573 | 2 |
USA | Product 04 | 0.8856 | 1 |
USA | Product 05 | 0.3959 | 3 |
Canada | Product 01 | 0.4744 | 4 |
Canada | Product 02 | 0.7804 | 1 |
Canada | Product 03 | 0.7071 | 2 |
Canada | Product 04 | 0.2232 | 5 |
Canada | Product 05 | 0.5828 | 3 |
Calculate a Net
In this example we will average across records (rows).
Taking an average across fields (columns) would be done using the AddFields stage.
Input records:
Country | Product | Score |
---|---|---|
USA | Product 01 | 0.3461 |
USA | Product 02 | 0.3096 |
USA | Product 03 | 0.7573 |
USA | Product 04 | 0.8856 |
USA | Product 05 | 0.3959 |
Canada | Product 01 | 0.4744 |
Canada | Product 02 | 0.7804 |
Canada | Product 03 | 0.7071 |
Canada | Product 04 | 0.2232 |
Canada | Product 05 | 0.5828 |
Stage definition:
This time we are defining two output fields. One calculates an average and one calculates a max. And since we are partitioning by Country, a separate calculation window will be created for each country.
{
$addWindowFields: {
// this defines the window (optional)
paritionBy: "Country",
// note: sort is not required because we aren't ranking
// new output field to add
output: {
AverageScore: "avg(Score)"
HighScore: "max(Score)"
}
}
}
Output records:
Notice that since the average and max calculations below were windowed by Country, one AverageScore is shared by all USA records, and a different AverageScore is shared by Canada records.
Country | Product | Score | AverageScore | HighScore |
---|---|---|---|---|
USA | Product 01 | 0.3461 | 0.5389 | 0.8856 |
USA | Product 02 | 0.3096 | 0.5389 | 0.8856 |
USA | Product 03 | 0.7573 | 0.5389 | 0.8856 |
USA | Product 04 | 0.8856 | 0.5389 | 0.8856 |
USA | Product 05 | 0.3959 | 0.5389 | 0.8856 |
Canada | Product 01 | 0.4744 | 0.55358 | 0.7804 |
Canada | Product 02 | 0.7804 | 0.55358 | 0.7804 |
Canada | Product 03 | 0.7071 | 0.55358 | 0.7804 |
Canada | Product 04 | 0.2232 | 0.55358 | 0.7804 |
Canada | Product 05 | 0.5828 | 0.55358 | 0.7804 |
As a next step, you could calculate a gap by appending an AddFields stage that subtracts a product's score from the average.