The Query node is used to write custom SQL or SOQL queries against a data source. The Query node can only be connected to data source nodes that represent SQL databases (SQL Server, MySQL, PostgreSQL, Oracle, etc) or a Salesforce database. You can then inject an SQL or SOQL statement into the node in order to return the required data set from the source.
Configure the Query node
Connect the Query node to the source node; enter your SQL script in the SQL Query window (green arrow below) in the Properties panel. By default, the resulting table is called 'Custom Query' but this can be changed (red highlight). You can preview the resulting table from the Preview panel (blue highlight).
You can also click the SQL icon (yellow arrow) to open the SQL editor where you can easily access the data source's tables to write your script (see below for more on this).
In this example, the following Select statement was used to return a the profileKey, Income, Age, and Purchased Bike columns from the customerPofile table, wherever the number of children was greater than 2:
SELECT "profileKey", "Income", "Age", "Purchased Bike" FROM "customerProfile" WHERE "Children" > 2
Open the SQL Editor from the Properties panel. This editor exposes the data source's schemas; you can easily add tables and columns to the SQL statement by double clicking or dragging and dropping. The editor also exposes any variables that have been configured in the Data Flow, enabling you to inject variables into the SQL statement.
Preview Result Set
The result set of the SQL query can be previewed by clicking on the Query node, and then clicking the preview icon from the node's Properties panel, or from the Preview panel. By default, the preview is limited to the first 50 rows, but this can be changed from the Preview Size drop down in the ribbon.
- Click here to learn more about the Preview panel.
Expand the Column Selection window to update the column selection for the given table. By default, all columns in the table are selected but you can remove columns by deselecting them. Columns that have been deselected will not be copied to the new data model.
Variables can be injected into the query node in 2 ways: directly into the SQL script, or via the 'Set Variable Values' window. In the first scenario, the variable is used to retrieve the required result set. In the second scenario, the variable can be used to drive another node or process, for instance in the Master Flow.
To pass variable values to the node, expand the Set Variable Values window to pass a variable to the node. To do this, click the plus sign and then select the relevant variable, the aggregation type, and the relevant column.
In this example, the Data table contains a Sales column:
We only want to import rows where the sales were 2000 or more:
To do this, we create a variable with a value of 2000, names 'VarSales':
We then write a script that will return rows from the Data table only if the Sales value is higher than the variable:
SELECT * FROM [data] where [Sales] > @VarSales
Expand the Description window to add your own description to the node's properties. This is useful for keeping track of the ETL process, especially if multiple users are working with the same ETL.
Edit a Query Node
Edit the Query node from its Properties panel. You can change the table name, SQL statement, column selection, set incremental loading, and add a description.