As described in the calculation overview, Custom Columns are calculations that are designed to virtually append calculated values to an existing data model as new columns of data. By virtue, they are calculations that run at the grain or transactional level, so they are sometimes referred to as 'granular' calculations.
Custom columns are available for both value metrics (orange) and hierarchies (blue). In Discover they are exposed together with the other types of calculations (quick formulas, quick lists, context calc's etc) and can be triggered from the hierarchy trees, element trees, visualizations and drop zones.
Custom columns are based around 3 main functional types:
- Aggregations: these functions are used to add or change the summation logic used on columns presented in the data model. The options vary between numeric and non-numeric columns
- Categorizations: these functions are used to generate and add new classification columns to the data model using existing data values - typically using mathematical logic or machine learning .
- Date Parts: these functions extract date-time groupings from raw date-time columns in a SQL database. This includes things like years, months, weeks and quarter.
Note: Custom Columns allow the user to add granular logic to an existing model. This functionality is not available on MS OLAP, Tabular and SAP BW cubes and models since this is not possible on a predefined model framework.
Static vs Context Definitions
Some custom column calc's that are determined by their contextual usage, so they share some of the same processing logic as context calculations - this includes binning. Other custom column calc's are calculated using static definitions that are executed in the query. This includes aggregations, date parts, and geo boundaries.
By creating or 'changing' the aggregation type of a metric or hierarchy column, the engine effectively adds a new virtual column to the data model that represents a new way to quantify query results. Being executed at the grain, these columns (usually) produce different results compared to changing the semantic operations on existing columns - which is desirable depending on the business or analytic problem being resolved.
Aggregation options vary depending on the data type of the raw data columns:
- Numeric Columns or Existing Measures: The aggregation can be set to count, distinct count, minimum, maximum, sum, average, variance (both sample and population), standard deviation (both sample and population), first quartile (25th percentile), median (50th percentile) and third quartile (75th percentile).
- Non-numeric Columns: The aggregation can be set to count, distinct count, minimum and maximum only.
- Click here for examples and more on custom column aggregations .
By creating a logical categorization of data in hierarchy column (metric or non-metric), the engine effectively adds a new virtual column to the data model that represents a new way to slice, dice and view query results. These columns generally use mathematical or machine learning algorithms to generate the classifications based on the context of the current query (when they are invoked in Discover) - so they are often "contextually driven". Different column data types dictate which categorization options are available and "Geo Boundaries" categorization is triggered by user interactions in bubble maps only.
Context Sensitive Categorizations
Context sensitive categorizations use the existing query values to determine the categories. Once generated, these then remain fixed regardless of subsequent changes to the query.
- Outlier: for numeric columns only, outliers mathematically resolves which data points in the query are statistical outliers and flags all points as "in" and "out".
- Bin by Rank: creates the specified number of bins (or bands) and assigns the data points to each bin according to position in the list (rank) when ordered by value,
- Bin by Value: creates the specified number of bins based on dividing the range of values in the query into the specified bands, and assigns the data points to each bin based on which band matches its value.
Static categorizations are not based on query values, but rather on simple mechanisms for selecting elements and adding them to a category. Once generated, these then remain fixed regardless of subsequent changes to the query.
- Geo Boundary: creates a categorization of map-based elements based on the user's manual selection using the geo-boundary lasso tool - demarcating those inside or outside the boundary. Click here to learn more.
- Bin by Category: creates a categorization of elements based on the captions of the elements using simple alphanumeric text groupings, so that the elements are evenly spread amongst the number of bins specified.
- Bin by Size: creates a dynamic number of bins based on the specified size and then assigns each granular row in the column to each bin based on its value. The number of bins is determined automatically by measuring the lowest and highest granular values.
- Click here for examples and more on custom column categorizations.
Date parts allow users to add logical date groupings of date-time data - like year, month, quarter and week. By building these groupings, time-based analysis of data becomes significantly easier and smarter. By adding date-part columns, the engine effectively adds a new virtual column to the data model that represents a new way to query results. Converting a simple date into a year or month can only be done at the grain and cannot be solved effectively using semantic calculations.
Date Parts available include:
- Date "Numbers": Year, Quarter Number, Month Number, Week Number, Day of Month, Hour, Minute
- Date "Text" : Full Quarter, Full Month, Full Week, Month Name
- Click here for examples and more on custom column date parts.
The process of creating and using Custom Columns is specifically designed to make general assumptions about the logic, naming and presentation of the item to keep the exercise as fast and as simple as possible for users. Custom Columns are also delivered as one-off logic elements since they are always specific to a report or visualization. However, they can be lightly edited and deleted.
- Click here for more detail on each of these capabilities.