Asking Granular Questions (NLQ)
Pyramid facilitates the use of granular data elements in NLQs when vectorization has been implemented on the related semantic model. End-users expect to ask any analytical or data-centric question using NLQ, via the Chatbot interface and AI agents, and receive an intelligent, insightful response. When using NLQ on corporate data, however, the ability to reference actual granular elements of data (at the member level) in questions can be quite limited unless certain infrastructural elements are added to the semantic operations of the underlying agents, in concert with the language processing of the associated LLMs.
The following topic explains how granular data elements can be surfaced in NLQ responses.
Note: This feature requires specific licensing options.
What is Vectorization?
Vectorization of structured data in Pyramid (also known as structured RAG) is the process by which selected attributes from a semantic model are processed through special LLM-related engines to 'embed' vectors for each item or member in the attribute. The vectors are stored alongside the data in the source database and can be used by the NLQ engine to find and retrieve relevant members from the database for querying and analysis. In effect, the vectorization process makes AI agents and LLMs aware of the bespoke, individual members or data elements in a customer's database, so those elements can be correctly queried and analyzed.
Embedding is supported on a limited number of data technologies. As of this writing, it can be applied only to write-capable Pyramid In-Memory Databases (IMDB).
- Click here to see how to deploy vectorization on your data
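The embed-and-store flow described above can be sketched in Python. This is an illustrative toy only: `toy_embed` stands in for the real LLM embedding engine, and the dictionary stands in for the vectors stored alongside the data in the source database; none of these names are actual Pyramid APIs.

```python
import math

def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Stand-in for a real LLM embedding engine: maps text to a
    deterministic unit vector from character codes (illustration only)."""
    vec = [0.0] * dims
    for i, ch in enumerate(text.lower()):
        vec[i % dims] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Vectorize selected attribute members; in Pyramid, the resulting vectors
# are stored alongside the data in the source database for later retrieval.
product_names = ["Hostess Cupcake", "Hostess Mini Muffin", "Road Bike"]
vector_store = {name: toy_embed(name) for name in product_names}
```

A real embedding model produces semantically meaningful vectors; the point here is only the shape of the process: each member of a selected attribute gets a vector, keyed to the member itself.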
Granular Questions
Looking at basic question structures for NLQ, users will note that the questions generally refer to the semantic layer only. To understand this clearly, let's use a semantic model with the following structure:
- Dimensions/Attributes
- Customer - Country
- Customer - State
- Customer - Job Type
- Customer - Name
- Customer - Education Level
- Product - Name
- Product - Category
- Product - Color
- Product - SKU number
- Product - Style
- Store - Name
- Store - Location
- Date - Year
- Date - Month
- Measures
- Sales
- Expenses
- Quantity
- Returns
With this model in mind:
- A user may ask "Show me sales and expenses by Country and Product Color". Notice that each of these elements refers to a semantic structure: sales, expenses, country and product color. In this scenario, NLQ, even without vectorization, will resolve this question very easily.
- If instead the question used specific, granular selections, like "Show me sales and expenses in the USA by Product Color for last year", agents will still be able to resolve it successfully without vectorization, because sales, expenses, and product color are semantic structures, while the granular elements are universally known: 'USA' is a universally recognized country name, and 'last year' is a universally recognized, easily resolved time period.
Without vectorization, however, the NLQ agents will likely fail to resolve this question: "Show me sales figures for blue Hostess items bought by Jo in the last 2 years in the USA." The NLQ agent and LLM will not understand what 'Hostess' is or what 'blue' refers to exactly, and will not be able to work out who or what 'Jo' is, even though they can resolve 'USA' and 'the last 2 years.'
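To see why the agent stalls, consider a minimal sketch of term resolution that knows only the semantic-layer names plus universally recognized values. The metadata sets below mirror the example model; the resolver itself is a hypothetical illustration, not Pyramid's actual logic.

```python
# Semantic-layer metadata visible to the agent (from the example model).
ATTRIBUTES = {"country", "state", "job type", "name", "education level",
              "category", "color", "sku number", "style", "location",
              "year", "month"}
MEASURES = {"sales", "expenses", "quantity", "returns"}
# Universally known values an LLM can resolve without any customer data.
UNIVERSAL = {"usa", "last year", "the last 2 years"}

def resolve_terms(terms: list[str]) -> tuple[dict, list]:
    """Classify each question term; member-level values like 'Hostess'
    or 'Jo' stay unresolved without vectorized member data."""
    resolved, unresolved = {}, []
    for term in terms:
        key = term.lower()
        if key in MEASURES:
            resolved[term] = "measure"
        elif key in ATTRIBUTES:
            resolved[term] = "attribute"
        elif key in UNIVERSAL:
            resolved[term] = "universal member"
        else:
            unresolved.append(term)
    return resolved, unresolved
```

Running `resolve_terms(["sales", "blue", "Hostess", "Jo", "USA", "last year"])` leaves 'blue', 'Hostess', and 'Jo' unresolved: they are member-level values that exist only in the customer's own data.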
Solving Granular Questions
How can a user get a strong response to the question "Show me sales figures for blue Hostess items bought by Jo in the last 2 years in the USA"?
If the customer-name, product-name, and product-color attributes were vectorized, then Pyramid, together with the AI agents and LLMs, will likely find strong candidates for 'blue Hostess' products and for customers named 'Jo' (like Joanne, Josephine, Jo, etc.), and ultimately feed their vectors into the query process. This, in turn, produces result sets and responses that accurately match the items the user was referring to.
If there are many ambiguous outcomes, the agents (via the Chatbot) may even prompt the user to pick a specific customer or product using actual member candidates from the customer's own data.
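The candidate-matching step can be sketched as a similarity search over stored members. In this toy, a character-bigram overlap stands in for comparing the stored embedding vectors, and the names and threshold are invented for illustration; it is not Pyramid's actual matching logic.

```python
def bigrams(text: str) -> set[str]:
    """Character bigrams of a lowercased string (whole string if too short)."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)} or {t}

def similarity(a: str, b: str) -> float:
    """Stand-in for vector similarity: Jaccard overlap of character
    bigrams (a real deployment compares stored LLM embeddings)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)

def candidate_members(term: str, members: list[str],
                      threshold: float = 0.05) -> list[str]:
    """Return stored members similar enough to the question term,
    best match first; multiple hits could trigger a disambiguation prompt."""
    matches = [m for m in members if similarity(term, m) >= threshold]
    return sorted(matches, key=lambda m: -similarity(term, m))

customers = ["Joanne Smith", "Josephine Lee", "Robert Diaz", "Jo Park"]
hits = candidate_members("Jo", customers)
```

Because 'Jo' matches several stored members, the agent has the raw material either to query all plausible candidates or to ask the user which one was meant.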
NLQ agents that resolve complex questions with granular elements and vectors take longer to process, given the large number of extra steps required to ascertain each aspect of the question.