Adding Joins to your Data Flow
The Join operations are used to combine tables together - either laterally or vertically. Join nodes and Union nodes can be added to the Data Flow to add them to the database schema. Joins can also be added and edited from Data Model, which is the semantic layer of logic that is used to describe the structure of the data model. Joins defined in this semantic layer are not added to the database schema; rather, they are used only in the querying process.
There are many circumstances in which you may want to add joins. Merge joins are useful when joining new machine learning tables to the original source table, for instance. Or you may want to join a new Summarize or Date Range table to its source table.
Alternatively, you may have tables in the data source that you want joined in a single table in the new data model, to make navigation and querying easier for end users.
Join Operations
- Join: Add a Join node between two tables, combining columns from each table and adding the join to the database schema.
- Union: Add a Union node between two or more tables, combining rows from each table and adding the union to the database schema.
Common Properties
The following fields are present in the Properties panel when you have any of the preceding nodes selected on the canvas. There are also Properties that are specific to the nodes described above, see the linked pages for more information.
Result Properties
The Result Properties panel contains only one field: Resulting Table Name. This is the name for the resulting table. You can change this value, if required, or leave its default name.
Column Selection
Expand the Column Selection panel to update the column selection for the given table. By default, all columns in the table are selected. You can, however, remove columns by clearing their checkboxes. Columns that are not selected will not be copied to the new data model.
The Data Type icon (blue arrow above) indicates the data type for this column.
Tip: Hover your cursor over the Data Type icon to view the current name of the column as a tooltip. This is useful where the name is too long to fit in the available space.
Searching your columns
If you are interested in particular columns, click Search (yellow arrow) to open a search field.
Renaming your Columns
To rename your column in this view, either double-click the current name or click Edit (purple arrow). The updated name will be used downstream of this node in your Data Flow.
Note: You can see that the name of this column has been updated in the example above, because the Info icon (green arrow) is visible. Hover your cursor over this icon to see the column's original name as a tooltip. Click Reset (orange arrow) to return the column to its original name.
Set Variable Values
Expand the Set Variable Values window to select variables that you want to pass to the node. Click the Plus (+) sign and then select the relevant variable, the aggregation type (update with), and the relevant column. For information about creating and editing variables, see Variables Panel.
Metadata
Expand the Metadata panel and add the following metadata details.
Description
Add a description for this node. This is useful for keeping track of the Data Flow (ETL) process, especially if multiple users are working with the same Data Flow. The description is visible only in the Model app.
Validate
If you do not want to validate this node when you run the validation process, for example because it is under construction and temporarily contains some invalid scripting, you can clear the selection of the Validate checkbox. Recommended: Always leave the Validate checkbox selected.
Node ID
The unique ID for this node. Click the Copy icon to copy this ID to your clipboard.
Previewing Joins
By default, the preview panel displays 50 rows, which may not be enough to preview joins. Use the Preview Size drop-down in the ribbon to show more rows in the preview panel.