You can load the output of the ETL into Parquet files and save them to a shared folder location. When the ETL is run, the data is loaded into Parquet files in the given folder. Each table in the data flow is loaded into a separate Parquet file, and a corresponding .crc file is generated for each one. To use Parquet as a target, add the Parquet node from the Targets panel to the data flow.
Configure a Parquet Target
From the target's Properties panel, name the new database that will be created, and provide a pointer to a shared folder where the file will be located:
- Database Name: name the new database that will be generated when the ETL is run.
- Shared Folder Path: provide a pointer to a shared folder where the new database will be saved.
- Create Folders: generate folders and save the database file within these folders:
  - Database Name: create a folder named according to the given database name, and save the database file inside this folder.
  - Date Time: create a folder named according to the date and time at which the ETL is run, and save the database file inside this folder. If a Database Name folder is also created, the Date Time folder will be a subfolder of it.
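To illustrate how the two Create Folders options combine, the following Python sketch builds the folder in which the output would be expected to land. It is not part of the product: the function name `target_folder`, the example path `/shared/DataModeling`, and in particular the date-time format are assumptions made for illustration, and the actual folder name generated at run time may be formatted differently.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def target_folder(
    shared_folder: str,
    database_name: str,
    database_name_folder: bool,
    date_time_folder: bool,
    run_time: Optional[datetime] = None,
    date_time_format: str = "%Y-%m-%d_%H-%M-%S",  # assumed format, for illustration only
) -> PurePosixPath:
    """Return the folder the output is expected to be saved in (hypothetical helper)."""
    path = PurePosixPath(shared_folder)
    if database_name_folder:
        # Database Name option: a folder named after the database.
        path = path / database_name
    if date_time_folder:
        # Date Time option: a folder named after the run time. It becomes a
        # subfolder of the Database Name folder when both options are enabled.
        stamp = (run_time or datetime.now()).strftime(date_time_format)
        path = path / stamp
    return path


run = datetime(2024, 1, 15, 9, 30, 0)
print(target_folder("/shared/DataModeling", "Customers", True, True, run))
```

With both options disabled, the function returns the shared folder itself, which matches the case where no extra folders are generated.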
Finally, click ‘Connect All’ to connect the target node to the data flow. As usual, you can add a description to the node's Properties panel.
Expand the Description window to add a description or notes to the node. The description is visible only from the Properties panel of the node, and does not produce any outputs. This is a useful way to document the ETL pipeline for yourself and other users.
Run the ETL
- Click here to learn how to process the ETL.
In this example, the ETL was loaded from a SQL Server into a Parquet file. As seen in the Target properties (green highlight below), the Parquet target database was named 'Customers' and saved to a shared folder called DataModeling. Both the Database Name and Date Time folders were enabled before connecting the tables to the target. The next step is to process the ETL.
Once the ETL is executed, the Parquet file is saved to the given folder location, inside the Database Name folder and Date Time subfolder. Here we see the database folder 'Customers':
Inside the 'Customers' folder is the Date Time subfolder:
The Parquet database is inside the Date Time subfolder. Each table in the ETL is loaded into a separate Parquet file, which is stored inside its own table subfolder (each subfolder is named after its table):
Inside each table folder is the table's Parquet file and the corresponding .crc file:
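To check the output without browsing the folders by hand, a short script can walk the Date Time subfolder and list what each table folder contains. The sketch below uses only the Python standard library. The function name `list_table_files` is hypothetical, and it assumes that files can be recognized by the `.parquet` and `.crc` endings; the exact file names written by the ETL are not specified here.

```python
from pathlib import Path
from typing import Dict, List


def list_table_files(run_folder: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each table subfolder to the Parquet and .crc file names it holds."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for table_dir in sorted(Path(run_folder).iterdir()):
        if not table_dir.is_dir():
            continue  # only table subfolders are of interest
        names = sorted(p.name for p in table_dir.iterdir() if p.is_file())
        result[table_dir.name] = {
            "parquet": [n for n in names if n.endswith(".parquet")],
            "crc": [n for n in names if n.endswith(".crc")],
        }
    return result
```

Calling `list_table_files` with the path of the Date Time subfolder returns one entry per table, so a table folder with a missing Parquet or .crc file is easy to spot. The listed Parquet paths can then be opened with any Parquet reader, for example `pandas.read_parquet`.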