Summarize

The Summarize node is used to construct a new summary table. The new table groups and aggregates the data in specific columns.

In the example below, the customerProfile Table is connected to the Summarize note and:

  • The Education column is added to the Summarize node and the Group By function is applied.
  • Then the Income column is added, and the Average function is applied.

The resulting table displays the five education groups and the average income for each group.

Configuring a Summarize node

Connect the Summarize node to the Select node representing the relevant table.

Step 1: Set up your Summarize node

With the Summarize node selected, go to the Properties panel to set up the Summarize properties:

  1. To add the columns to be summarized, do one of the following:
    • Click Add Column (blue arrow above) to add a single column. Repeat as required.
    • Click Add All (green arrow) to add all of the columns from the table in one move.
  2. Map each Column to an Action as required (purple selection).
  3. Note: If you added one column at a time, you need to create the mapping for each of your columns. If you added all columns in one action, you only need to change the action where "Group By" is not your required action. For details of the available actions, see Summarize Functions.

  4. Optionally, remove any columns that you don't want to include by hovering over the row and clicking the Delete icon (yellow arrow).

Rename your Summarized columns

By default, your summarized column will take the name: <Column Name>_<Action Name>. For example, if you are grouping by Education level, the first Summarized column is called Education_GroupBy.

You can rename your columns by doing one of the following:

  • You can add a Rename node to your Data Flow and connect it after the Summarize node.
  • You can rename the columns using the Edit option in the Column Selection panel (orange arrow above).

Note: You can see that one of the columns has been updated in the example above, because the Info icon (blue box) is visible. This icon is only shown where a column has been renamed. Hover your cursor over this icon to see the column's original name as a tooltip, and click Reset to return the column to its original name.

Step 2: Join your original table and the Summarized columns

To produce a table consisting of the columns from the original table and the Summarize columns:

  1. Add a Join node to the canvas and connect it to both the Table node and the Summarize node.
  2. In the Join node's Properties panel:
    1. Set Join Type as Right Outer Join.
    2. Set the Join Columns; both columns must exist in both the original table and in the Summarize table.
  3. Preview the Join node to see a preview of the table with the Summarize columns.
  • Click here to learn more about Joins.

Summarize Functions

The following functions may be used to generate the Summarize columns:

  • Group By: Groups the summarized columns by this column.
  • Count: Returns the number of rows in the column.
  • Sum: Sums the rows in the numeric column.
  • Count Non Null: Returns the number of rows in the column, excluding nulls.
  • Count Null: Returns the number of nulls in the column.
  • Average: Returns the column average for a numeric column.
  • Average, ignoring zeros: Returns the column average for a numeric column, excluding zero values.
  • Min: Returns the numeric column's smallest value.
  • Max: Returns the numeric column's largest value.

Join the Summarize Node

The Summarize node generates a new table which can then be joined to the original table. This can be done in Data Flow, using the Join nodes, or in Data Model. The Data Flow is the ETL toolset for Pyramid users; join operations added to the data flow are loaded into the target and stored in the database.

Data Model, on the other hand, is the semantic layer of logic that is used to describe the structure of the data model. Joins configured in Data Model are not added to the database schema, but are used only in the querying process.

Related information

Common Properties

There are a number of fields that are present in the Properties panel when you have any of the preceding nodes selected on the canvas. These fields include Result Properties, Column Selection, Set Variable Values, and Metadata.

  • Click here for more details about the Common Properties