There are two options for full data modeling: Model Lite and Model Pro. Use the Model Lite wizard to perform ETL in a series of simple, sequential steps. Use Model Pro to build a flow diagram that illustrates your data flow, and apply advanced data flow functions to build a custom ETL and data model. These advanced functions include machine learning, scripting engines, column operations, data preparation, and more. For an overview of Model Pro, continue reading.
Model Pro has three main components:
- Master Flow: where advanced users can construct complex master flows, incorporating multiple data flows and models, scripts, APIs, and more.
- Data Flow: a set of functions and tools for importing, cleaning, embellishing and preparing data for analysis.
- Data Model: a tool for describing the structure of your data so it can be queried and analyzed easily and correctly in analytics tools like Discover.
Data flow is designed as an end-user's "ETL" tool set. ETL (Extract, Transform, and Load) is the industry term for data preparation. ETL operations can be complex and detailed; the Pyramid data flow tool set is designed to make these capabilities easier to access and use.
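To make the three ETL stages concrete, here is a minimal sketch in Python. It is not Pyramid's API; the sample data, the `sales` table, and the function names are assumptions chosen for illustration, and only the Python standard library is used.

```python
import csv
import io
import sqlite3

# Hypothetical input; in practice this would come from a file or database.
RAW_CSV = """region,amount
north, 120.50
south,80
north,
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, drop rows with no amount, cast to float."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete rows
        cleaned.append((row["region"].strip(), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows to a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in order and return totals per region."""
    load(transform(extract(text)), conn)
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` runs the whole pipeline against an in-memory database. In Model Pro the same stages are assembled with point-and-click nodes rather than written by hand.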
The Model Pro module is designed as a multi-step flow-driven application, where you can design the various steps needed to bring your data into the application using point-and-click tools.
- Data Sources: choose to read or import your data from a variety of data sources, ranging from files (like Excel) and relational databases (like Oracle) to unstructured data sources (like JSON) and web-based content (like REST services).
- Data Targets: select the destination for your new data model. The application can push models (both data and schema) to multiple data source types, so you can choose the technology that suits you best.
- Preparation: choose from various advanced functions and wizards to condition and prepare your incoming data for analysis.
- Column Operations: use a variety of functions to manipulate your column data.
- Join: use these functions to merge datasets horizontally (join) or vertically (union).
- Machine Learning: use this set of algorithms to apply machine learning logic to your data and enrich your analyses with insights generated by well-defined, refined algorithms. You can also access a marketplace of ML scripts that can be applied to your data.
- Scripting: use this functionality to inject custom scripts into your data cleansing operations. Scripts can range from simple ETL operations not covered elsewhere in the application to specialized ML scripts that enrich your data and provide deeper insights into what could happen (predictive analysis).
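The difference between the two merge directions in the Join item can be sketched in plain Python. This is an illustrative assumption, not how Pyramid implements its Join nodes; the sample tables and the `union` and `inner_join` helpers are hypothetical.

```python
# Hypothetical sample tables, each a list of row dictionaries.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]
orders_2023 = [{"customer_id": 1, "total": 40.0}]
orders_2024 = [
    {"customer_id": 2, "total": 15.0},
    {"customer_id": 1, "total": 5.0},
]

def union(*tables):
    """Vertical merge: stack rows from tables that share the same columns."""
    return [dict(row) for table in tables for row in table]

def inner_join(left, right, key):
    """Horizontal merge: combine the columns of rows whose key values match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Union adds rows; join adds columns.
all_orders = union(orders_2023, orders_2024)
enriched = inner_join(all_orders, customers, "customer_id")
```

Here `all_orders` has three rows with the original two columns, while each row in `enriched` gains a `name` column from the matching customer.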
The Data Model tool leads you through the steps needed to describe the database structure you plan to query in Discover and elsewhere. The data model produced by this process includes instructions for:
- Configuration: the type of data model to be produced and its name.
- Relationships: how the various tables and columns "fit" together and how they should be joined when querying.
- Column Selections: which columns (attributes) are visible, and their type settings.
- Hierarchies: the virtual hierarchies constructed from the different columns.
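One way to picture what such a model definition holds is as a small set of records. The sketch below is an assumption for illustration only; the class names, fields, and the `"in-memory"` model type are hypothetical and do not reflect Pyramid's internal format.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    table: str
    name: str
    data_type: str
    visible: bool = True  # hidden columns stay in the model but are not shown

@dataclass
class Relationship:
    left: str   # "table.column" on one side of the join
    right: str  # "table.column" on the other side

@dataclass
class DataModel:
    name: str          # configuration: model name
    model_type: str    # configuration: kind of model to produce
    columns: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    hierarchies: dict = field(default_factory=dict)

    def visible_columns(self):
        """Return the qualified names of columns exposed to analysts."""
        return [f"{c.table}.{c.name}" for c in self.columns if c.visible]

model = DataModel(
    name="Sales",
    model_type="in-memory",
    columns=[
        Column("orders", "order_id", "integer", visible=False),
        Column("orders", "total", "decimal"),
        Column("customers", "country", "text"),
        Column("customers", "city", "text"),
    ],
    relationships=[Relationship("orders.customer_id", "customers.customer_id")],
    hierarchies={"Geography": ["customers.country", "customers.city"]},
)
```

Each field maps to one of the four items above: name and type (configuration), join paths (relationships), visibility and data types (column selections), and ordered column levels (hierarchies).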
The security settings let you control who can see your materialized databases, data models, and machine learning models. This allows you to share data models with other users while limiting who can see and edit your databases, models, and ML formulations.
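As a rough sketch of the idea, role-based access can be thought of as a mapping from roles to access levels. The role names, the `Access` levels, and the helper functions below are assumptions for illustration and are not Pyramid's security API.

```python
from enum import Enum

class Access(Enum):
    NONE = 0
    READ = 1
    WRITE = 2

# Hypothetical role assignments for a single data model.
model_roles = {"Analysts": Access.READ, "Data Engineers": Access.WRITE}

def can_view(user_roles, assignments):
    """A user can view the item if any of their roles has READ or WRITE."""
    return any(
        assignments.get(role, Access.NONE).value >= Access.READ.value
        for role in user_roles
    )

def can_edit(user_roles, assignments):
    """A user can edit the item only if one of their roles has WRITE."""
    return any(
        assignments.get(role, Access.NONE) is Access.WRITE
        for role in user_roles
    )
```

With these assignments, a member of "Analysts" can view the model but not edit it, and a user with no assigned role can do neither.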
Once you've assigned roles, process the data model.