The Preparation nodes apply a range of functions to the tables in the flow diagram. These functions transform data and optimize tables and columns for end-users by generating additional logic and columns, such as creating new date-time columns based on various date parts, generating latitude and longitude columns, and adding random number columns. They are also used for data cleansing, such as sorting and filtering columns, removing duplicate rows, and transforming matrix grids into tabular ones so that they can be queried.
Configuring Preparation Functions
The Preparation nodes can be connected to Select, SQL Query, Bottom N, and Top N nodes. Connect the required Preparation node to the node representing the relevant table.
Once connected to the Data Flow, the Preparation node usually requires configuration from its Properties panel.
The following Preparation nodes can be connected to the Data Flow:
- Add Date/Time: generate a new date-time column based on a given date part.
- Add Sequence: add a UUID or a numeric sequence as an additional column at the beginning of the table.
- Distinct: remove duplicate rows from the table.
- Filter: filter a specified column.
- GeoCode: extract latitude and longitude columns from your existing geolocation columns.
- Masking: replace values in a specified column with a mask string.
- Modify Case: change the case of string values in a column.
- Random Number: add a column of random numbers.
- Remove Duplicates: remove user-defined duplicate rows.
- Sort: sort a specified column in ascending or descending order.
- Summarize: generate summary columns by applying an aggregate calculation or the Group By function.
- Time Intelligence: produce multiple date-time columns at different levels of granularity, based on a dateKey column in the data source.
- Unpivot: transform a matrix grid into a tabular grid.
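
To make a few of these operations concrete, the following is a minimal conceptual sketch in plain Python. It is not how the Preparation nodes are implemented; the function names (`unpivot`, `distinct`, `remove_duplicates`) and their parameters are hypothetical and exist only for illustration. It also assumes that "user-defined duplicate rows" means rows that match on a set of columns chosen by the user, which is an interpretation rather than something stated above.

```python
def unpivot(rows, id_column, attribute_name="attribute", value_name="value"):
    """Turn every non-id column of a matrix-style row into its own tabular row."""
    result = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on each output row instead
            result.append({
                id_column: row[id_column],
                attribute_name: column,
                value_name: value,
            })
    return result


def distinct(rows):
    """Drop rows that are identical in every column, keeping the first occurrence."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row.items()))  # whole-row fingerprint
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def remove_duplicates(rows, key_columns):
    """Drop rows that repeat the same values in key_columns (assumed meaning
    of 'user-defined' duplicates), keeping the first occurrence."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# A matrix grid: one row per region, one column per quarter.
matrix = [
    {"region": "East", "Q1": 10, "Q2": 12},
    {"region": "West", "Q1": 7, "Q2": 9},
]
tabular = unpivot(matrix, "region")
for row in tabular:
    print(row)
```

In this sketch, the two-row matrix becomes four tabular rows, one per region and quarter, which is the shape that can be filtered, sorted, or summarized column by column. The difference between `distinct` and `remove_duplicates` is only which columns are compared: all of them, or a chosen subset.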