The Data Flow is where you'll construct your data flow diagram, representing the flow of data from the datasource to the target. Your data flow will begin with at least one selected datasource; this could be a file, web source, relational database, unstructured database, or a scripted source. Once you select your datasource, Pyramid will prompt you to select which tables to copy from the source.
Some datasources support direct querying, meaning that they can be natively queried by Pyramid directly from Discover and Formulate. In this case, you don't need to copy any data into Pyramid or add a target server to the data flow, as Pyramid will query the source directly.
Finally, you'll need to add a target to the data flow. This is the server onto which the dataset will be loaded and where the data model will be stored. Once you've configured the target server, you can move on to Data Modeling.
Building the data flow involves the following steps:
- Add the datasource node and define the datasource. You can add multiple datasources to build a model based on tables from different sources.
- Import the required tables from the datasource.
- Optionally, define the column selection for each table.
- Set variables for tables to enable incremental loading. This will ensure that when the data model is reprocessed later, only new rows in the source will be appended to the model's tables, saving time and resources.
- Add any calculations, scripting, or other operations to cleanse and prepare the data. These are available from the following sub-menus: Select, Preparation, Column Operations, Join, Machine Learning, and Scripting.
- Add the target to the flow, into which the dataset will be loaded.
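The incremental loading step above can be illustrated with a small sketch. This is not Pyramid's implementation or API; it is a generic Python example of the idea, assuming each source row has an increasing `id` column. The names `ModelTable`, `last_loaded_id`, and `incremental_load` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelTable:
    """Hypothetical stand-in for a table in the data model."""
    rows: list = field(default_factory=list)
    last_loaded_id: int = 0  # the "variable" that records what was already loaded


def incremental_load(source_rows, table, key="id"):
    """Append only the source rows whose key is above the stored value."""
    new_rows = [row for row in source_rows if row[key] > table.last_loaded_id]
    table.rows.extend(new_rows)
    if new_rows:
        # Advance the variable so the next run skips these rows.
        table.last_loaded_id = max(row[key] for row in new_rows)
    return len(new_rows)


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
table = ModelTable()
first = incremental_load(source, table)   # initial load copies both rows

source.append({"id": 3, "amount": 40})    # a new row arrives in the source
second = incremental_load(source, table)  # reprocessing appends only the new row
print(first, second, len(table.rows))
```

On the second run only the row with `id` 3 is read into the table, which is why reprocessing an incrementally loaded table takes less time than copying the whole source again.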