There are two options for full data modeling: Model Lite and Model Pro. Model Lite is a wizard-style tool that steps the user through a series of stages to build a mash-up of data sets. Model Pro, on the other hand, is used to build a comprehensive pipeline of your data processing activities (the "flow") and then describe a semantic layer for querying (the "model"). These functions include machine learning, scripting engines, column operations, data preparation and more.
Model Pro features three main interfaces:
- Master Flow: where users can construct a master workflow, incorporating multiple data flows and models, scripts, APIs, and more.
- Data Flow: a set of functions and tools for importing, cleaning, embellishing and preparing data for analysis.
- Data Model: a tool for describing the structure of your data so it can be easily and properly queried and analyzed in the analytics tools like Discover.
Master flows let users build advanced pipeline and flow logic, combining multiple data flows, multiple data models, and interactions with other tools such as APIs, the command line and messaging.
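Conceptually, a master flow chains several sub-flows into one orchestrated pipeline. A minimal Python sketch of that idea (all names here are hypothetical illustrations, not the product's API):

```python
# Sketch of master-flow orchestration: each step is a callable that
# receives and returns a shared context, run in order.
from typing import Callable, Dict, List

def run_master_flow(steps: List[Callable[[dict], dict]], context: dict) -> dict:
    """Run each pipeline step in sequence, passing the context along."""
    for step in steps:
        context = step(context)
    return context

def data_flow(ctx: dict) -> dict:
    # Stands in for importing and cleaning raw data.
    ctx["rows"] = [{"region": "EU", "sales": 100}, {"region": "US", "sales": 250}]
    return ctx

def data_model(ctx: dict) -> dict:
    # Stands in for describing the semantic layer over the prepared data.
    ctx["model"] = {"tables": ["rows"], "measures": ["sales"]}
    return ctx

result = run_master_flow([data_flow, data_model], {})
```

The point is only the shape: a master flow is an ordered composition of flow and model steps sharing one pipeline state.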
The key elements of a master flow are its data flows and data models, together with optional scripts, API calls and other external interactions.
Data flow is designed as an end user's "ETL" tool set. ETL is the industry term for data preparation: Extract, Transform and Load. ETL operations can be quite complex and detailed; the Pyramid data flow tool set is designed to make these capabilities easier to access and use.
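To make the three ETL stages concrete, here is a minimal standalone sketch (the inline CSV string and table name are invented for illustration): extract rows from CSV, transform them in memory, and load them into a SQLite table.

```python
# Minimal ETL sketch: Extract from CSV, Transform in memory, Load to SQLite.
import csv
import io
import sqlite3

raw = "region,sales\nEU,100\nUS,250\n"  # stands in for a source file

# Extract: parse the CSV into dictionaries
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cast types and filter out non-positive sales
rows = [
    {"region": r["region"], "sales": int(r["sales"])}
    for r in rows
    if int(r["sales"]) > 0
]

# Load: write the prepared rows into a database table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, sales INTEGER)")
con.executemany("INSERT INTO sales VALUES (:region, :sales)", rows)
total = con.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
```

Real data flows add many more preparation steps, but every pipeline reduces to this extract-transform-load skeleton.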
The Model Pro module is a multi-step, flow-driven application in which you use point-and-click tools to design the various steps needed to bring your data in.
- Data Sources: choose to read or import your data from a variety of sources, from files (like Excel) to relational databases (like Oracle), unstructured data sources (like JSON) and web-based content (like REST services).
- Data Targets: select the destination for your new data model. The application can push models (both data and schema) to multiple data source types, so you can choose whichever technology suits you best.
- Preparation: choose from various advanced functions and wizards to condition and prepare your incoming data for analysis.
- Column Operations: use a variety of different functions to manipulate your column data.
- Join: use these functions to merge datasets horizontally (join) or vertically (union).
- Machine Learning: use this set of algorithms to apply machine learning logic to your data, enriching your analyses with insights generated by well-defined, honed algorithms. You can also access a marketplace of ML scripts that can be applied to your data.
- Scripting: use this functionality to inject custom scripts into your data cleansing operations. Scripts can cover simple ETL operations not handled elsewhere in the application, and extend to specialized ML scripts that enrich your data and provide deeper insight into what could happen (predictive analysis).
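The Join step above distinguishes two merge directions, which are easy to confuse. This small sketch (invented sample data) shows both: a horizontal join matches rows across datasets on a key, while a union stacks rows of the same shape vertically.

```python
# Two merge directions: join (horizontal, key-matched) vs union (vertical stack).
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders    = [{"id": 1, "total": 40}, {"id": 2, "total": 75}]

# Horizontal merge: join customers to orders on "id", widening each row
by_id = {o["id"]: o for o in orders}
joined = [
    {**c, "total": by_id[c["id"]]["total"]}
    for c in customers
    if c["id"] in by_id
]

# Vertical merge: union two result sets with the same columns, lengthening the table
q1 = [{"id": 1, "total": 40}]
q2 = [{"id": 3, "total": 20}]
unioned = q1 + q2
```

A join widens the table (more columns per row); a union lengthens it (more rows, same columns).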
Data Model is the tool that will lead you through the steps needed to describe the database structure you plan to query in Discover and elsewhere. The data model that is produced from this process includes the instructions for:
- Configuration: the type of data model to be produced and its name.
- Relationships: how the various tables and columns "fit" together and how they should be joined when querying.
- Column Selections: which columns (attributes) are visible, and their type settings.
- Hierarchies: how virtual hierarchies are constructed across the different columns.
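The four instruction groups above can be pictured as one declarative description of the model. The sketch below is purely illustrative (the keys and table names are invented, not the product's file format); it shows what each group covers.

```python
# Hypothetical declarative sketch of a data model's four instruction groups.
model = {
    # Configuration: the model type and its name
    "configuration": {"type": "in-memory", "name": "Sales Model"},
    # Relationships: how tables fit together when querying
    "relationships": [
        {"from": "orders.customer_id", "to": "customers.id", "join": "inner"},
    ],
    # Column selections: visibility and type settings per column
    "columns": {
        "customers.name": {"visible": True, "type": "text"},
        "orders.total":   {"visible": True, "type": "decimal"},
        "orders._etag":   {"visible": False, "type": "text"},
    },
    # Hierarchies: virtual drill paths built across columns
    "hierarchies": [
        {"name": "Geography", "levels": ["country", "city"]},
    ],
}

# Only visible columns would be exposed to query tools like Discover.
visible = [name for name, spec in model["columns"].items() if spec["visible"]]
```

Keeping the model declarative like this is what lets the same description drive querying in Discover and elsewhere.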
The security settings let you control who can see your materialized databases, data models and machine learning models. By governing access this way, you can share data models with other users while limiting who can see and edit your databases, models and ML formulations.
Once you've assigned roles, process the data model.
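The role assignment step amounts to a simple access rule: a user can see a model only if one of their roles is among its viewers or editors. A minimal sketch of that rule (the role names and helper are hypothetical, not the product's security API):

```python
# Hypothetical role-based visibility check for a data model.
roles = {
    "Sales Model": {
        "viewers": {"analysts", "sales"},
        "editors": {"modelers"},
    },
}

def can_view(model_name: str, user_roles: set) -> bool:
    """A user may view a model if any of their roles is a viewer or editor."""
    entry = roles.get(model_name, {"viewers": set(), "editors": set()})
    return bool(user_roles & (entry["viewers"] | entry["editors"]))

can_view("Sales Model", {"analysts"})   # True: "analysts" is a viewer role
can_view("Sales Model", {"marketing"})  # False: no matching role
```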