Query Categorization
Query Categorization lets you predict the categories to which a search query belongs.
To do this, it uses an AI model to create categories from the records in your index.
Each index can have its own category hierarchy, depending on your needs: for example, product categories for an ecommerce website or genres for a movie app.
For example, in an online grocery shop, the query banana can be part of the category Food > Vegetables and fruits.
With Algolia’s Query Categorization feature, you have:
- A dedicated section in the Algolia dashboard to set up the AI model and explore its predictions
- Automatic filtering and boosting on predicted categories, without writing extra code, to help increase the relevance of your users’ results
- Analytics grouped by a predicted category to learn how the categories perform and to detect underperforming queries
- Access to category predictions at query time (with the Search API) so that you can provide a Search and Discovery experience customized for your users.
Set up Query Categorization
To set up Query Categorization, you must send click and conversion events and then configure the AI model. After setup, check the model output and the generated category tree.
You can also use the Query Categorization predictions in your frontend at query time.
Send click and conversion events
To use Query Categorization, you must send click or conversion events. Algolia AI uses this data to train its model to predict categories.
For a query to be part of the Query Categorization model training, it must:
- Be longer than 3 characters
- Have returned at least 10 records
- Have received events on at least 3 different records
The Query Categorization model is retrained automatically every 24 hours. It always uses events from the last 90 days, meaning it’s based on a sliding window of the most recent analytics data.
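The eligibility rules and the 90-day window can be expressed as a small filter. It is useful for estimating which of your own queries are likely to feed the model. This is a minimal sketch only, because Algolia applies these rules on its side. The QueryStats type and the function names are hypothetical, and the sketch assumes that the character count is taken on the query string as given.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Set

# Sliding window of analytics data used at each daily retraining.
TRAINING_WINDOW = timedelta(days=90)


@dataclass
class QueryStats:
    """Hypothetical per-query aggregate (not an Algolia API type)."""
    query: str
    nb_hits: int  # number of records the query returned
    # IDs of distinct records that received click or conversion events
    records_with_events: Set[str] = field(default_factory=set)


def is_training_candidate(stats: QueryStats) -> bool:
    """Apply the three documented eligibility rules to one query."""
    return (
        len(stats.query) > 3
        and stats.nb_hits >= 10
        and len(stats.records_with_events) >= 3
    )


def in_training_window(event_time: datetime, now: datetime) -> bool:
    """True if an event falls inside the 90-day window ending at `now`."""
    return now - TRAINING_WINDOW <= event_time <= now


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_training_candidate(QueryStats("banana", 12, {"r1", "r2", "r3"})))  # True
print(in_training_window(now - timedelta(days=120), now))  # False
```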
Configure the model
To set up Query Categorization, you must provide the facets for the model to use for predictions. These facets must accurately depict the hierarchy of your categories (up to a depth of five levels).
Once you’ve entered your facets, click Save to start the model-building process. Depending on the number of categories and traffic, this can take a few minutes to half an hour.
Supported hierarchical facet formats
- Assuming your records are structured like this:
  {
    "name": "banana",
    "description": "...",
    "price": 3.45,
    "hierarchicalCategories": {
      "lvl0": "Food",
      "lvl1": "Fruits"
    }
  }
  Set hierarchicalCategories.lvl0 as the first level used by the model and hierarchicalCategories.lvl1 as the second level.
- If your records are structured like this:
  {
    "name": "banana",
    "description": "...",
    "price": 3.45,
    "group": "Food",
    "section": "Fruits"
  }
  Set group as the first level used by the model and section as the second level.
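To check your records before you configure the model, you can read each configured level out of a record. The following is a minimal sketch that pairs the facet names you would enter in the dashboard with a record's values, in the order of depth. The helper names are hypothetical. The sketch assumes that a dotted name such as hierarchicalCategories.lvl0 refers to a nested object, as in the first example above. It doesn't describe how Algolia itself reads records.

```python
from typing import Any, Dict, List, Optional


def get_path(record: Dict[str, Any], attribute: str) -> Optional[Any]:
    """Resolve a dotted attribute name such as 'hierarchicalCategories.lvl0'."""
    value: Any = record
    for key in attribute.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def category_levels(record: Dict[str, Any], level_attributes: List[str]) -> List[Any]:
    """Return the facet value at each configured level, stopping at the first missing one."""
    levels = []
    for attribute in level_attributes:
        value = get_path(record, attribute)
        if value is None:
            break
        levels.append(value)
    return levels
```

For example, category_levels(record, ["group", "section"]) returns ["Food", "Fruits"] for the second record structure above.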
If your records belong to several categories simultaneously and you use arrays to represent each level of depth, the model expects shared prefixes (for example, use Food as the first-level facet value and Food > Fruits as the second-level facet value).
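You can verify the shared-prefix requirement with a short check. It confirms that every value at one depth extends a value at the depth above. This is a sketch with a hypothetical function name. It assumes that each level is stored as a list of strings and that levels are joined with " > ", as in the Food > Fruits example.

```python
from typing import List

SEPARATOR = " > "  # assumed from the "Food > Fruits" example


def has_shared_prefixes(levels: List[List[str]]) -> bool:
    """Check that every value at depth n starts with some depth n-1 value plus the separator.

    levels[0] holds the first-level values, levels[1] the second-level values, and so on.
    """
    for parents, children in zip(levels, levels[1:]):
        for child in children:
            if not any(child.startswith(parent + SEPARATOR) for parent in parents):
                return False
    return True
```

A record with levels [["Food"], ["Food > Fruits"]] passes. A record with [["Food"], ["Fruits"]] fails because the second level doesn't repeat its parent.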
The model doesn’t support records structured with only one attribute for all depth levels. For example:
{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "categories": ["Food", "Food > Fruits"]
}
Model output
After configuration, the AI model:
- Uses the provided categories to build a “categories tree” (a hierarchical representation of your categories) based on the different facet values of items in your index.
- Extracts the most likely categories for the most popular queries (by using the click and conversion events you sent to Algolia).
- Is trained to predict the categories associated with a query. Each prediction includes a confidence level (from very low to certain) and a type.
The confidence level can be:
- very low
- low
- high
- very high
- certain
The type can be:
- narrow. Queries associated with a single item in the categories tree. In other words, a “leaf” in the tree with no further sub-categories.
- broad. Queries associated with a category that has sub-categories.
- ambiguous. Queries associated with several unrelated categories.
- none. Used when the model can’t predict a category for the query.
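The following sketch relates the categories tree to the four prediction types. The tree is built as nested dictionaries from category paths. A set of predicted paths is then labeled narrow, broad, ambiguous, or none according to the definitions above. The Query Categorization model returns the type itself, so this is not Algolia's algorithm, and the function names are hypothetical. As a simplification, the sketch treats any two predictions that aren't on the same branch as "unrelated".

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]
Tree = Dict[str, "Tree"]  # each key is a category name, each value its sub-categories


def build_tree(paths: List[Path]) -> Tree:
    """Build a nested-dict categories tree from paths such as ('Food', 'Fruits')."""
    tree: Tree = {}
    for path in paths:
        node = tree
        for name in path:
            node = node.setdefault(name, {})
    return tree


def is_leaf(tree: Tree, path: Path) -> bool:
    """True if the category at `path` has no sub-categories."""
    node = tree
    for name in path:
        node = node[name]
    return not node


def is_prefix(a: Path, b: Path) -> bool:
    """True if `a` is `b` itself or one of its ancestors."""
    return len(a) <= len(b) and b[: len(a)] == a


def prediction_type(tree: Tree, predicted: List[Path]) -> str:
    if not predicted:
        return "none"
    deepest = max(predicted, key=len)
    # Some prediction sits on another branch: treat the categories as unrelated.
    if any(not is_prefix(p, deepest) for p in predicted):
        return "ambiguous"
    return "narrow" if is_leaf(tree, deepest) else "broad"
```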
Manage events source
Using a different source index for events lets you predict categories from another index’s events. For instance, you can apply events from a production index to a test index (which wouldn’t have had any user interactions). The current index must be a replica of the source index. To use a different source index, go to the Categories Settings tab, find the Events source index field, and select the source index from your app’s indices in the drop-down menu. Saving the configuration starts a training process, which regenerates the category tree and makes predictions using events from the source index.
Manage the categories tree
Once the model is trained, examine the newly generated categories tree by selecting the Categories Tree View tab in Algolia’s dashboard (Search > Configure > Query Categorization). This lets you confirm that the categories tree has been correctly generated.
You can choose to exclude some categories from the predictions. For example, you should exclude values that aren’t actual categories, like “Black Friday” or “On sale”. Removing these values from the categories tree increases the model’s performance.
Retrieve the Query Categorization predictions at query time
You can use the predictions directly in your frontend at query time to implement, for example:
- Query expansion. When you have limited results for a query, you can expand the results set with more items from the same category.
- Disambiguation. When the query is broad, you can suggest different categories to help narrow down the search.
- A tailored experience. Provide a custom experience based on user intent by having a specific layout for some categories.
Query Categorization populates your search results with the predicted categories for the search query. Predictions are based on the query as normalized by the engine, not on the raw query.
Turn on Query Categorization at query time
To retrieve Query Categorization results at query time, you must activate this option from the dashboard or in query parameters.
- In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization). Find the Categories with Search API toggle in the Categories Settings tab.
- In query parameters, as a JSON object or a URL-encoded string. For example:
  extensions%3D%7B%22queryCategorization%22%3A%7B%22enableCategoriesRetrieval%22%3Atrue%7D%7D
  Find Query Categorization parameters in the extensions field:
  {
    /* Other standard query parameters... */
    "extensions": {
      "queryCategorization": {
        "enableCategoriesRetrieval": true
        /* Other options to control Automatic Filtering and Boosting are available */
      }
    }
  }
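The following example produces both forms of the parameter with the Python standard library. It builds the JSON object first. It then percent-encodes the whole extensions=… pair, which reproduces the URL-encoded example above. This is a sketch only. Sending the request is left to your API client, and the "query" value is illustrative.

```python
import json
from urllib.parse import quote

# Query Categorization options nest under the `extensions` search parameter.
extensions = {"queryCategorization": {"enableCategoriesRetrieval": True}}

# JSON form: merge it with your other search parameters in the request body.
search_params = {"query": "banana", "extensions": extensions}

# URL-encoded form: compact JSON, then percent-encode the whole key=value pair.
compact = json.dumps(extensions, separators=(",", ":"))
encoded = quote("extensions=" + compact, safe="")
print(encoded)
# extensions%3D%7B%22queryCategorization%22%3A%7B%22enableCategoriesRetrieval%22%3Atrue%7D%7D
```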
Search response format
The search response has the usual format, with predictions in the extensions.queryCategorization attribute:
{
/* Regular search answer (like hits) */
"extensions": {
"queryCategorization": {
"normalizedQuery": "banana",
"count": 2,
"type": "narrow",
"categories": [
{
"bin": "very high",
"hierarchyPath": [
{
"facetName": "category.lvl0",
"facetValue": "Food",
"depth": 0
},
{
"facetName": "category.lvl1",
"facetValue": "Fruits",
"depth": 1
}
]
}
]
}
}
}
On rare occasions, extensions.queryCategorization can be an empty object for queries that the Query Categorization model didn’t categorize.
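On the frontend, you need to read the predictions and also tolerate an empty or missing extensions.queryCategorization. The following minimal sketch works on a response already decoded into a dictionary. It returns each prediction's confidence level and facet values, from shallowest to deepest. The function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def predictions(response: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Return (confidence level, facet values shallowest first) for each predicted category.

    Returns an empty list when extensions.queryCategorization is missing or empty.
    """
    qc = (response.get("extensions") or {}).get("queryCategorization") or {}
    result = []
    for category in qc.get("categories", []):
        steps = sorted(category.get("hierarchyPath", []), key=lambda step: step["depth"])
        result.append((category.get("bin", ""), [step["facetValue"] for step in steps]))
    return result
```

With the example response above, this returns [("very high", ["Food", "Fruits"])]. The same hierarchyPath entries also carry facetName, which you could combine with facetValue to build category filters for query expansion.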
How to override predictions
To change the predictions made by Query Categorization, navigate to the Predictions Explorer tab in the Query Categorization section.
- To edit an override or replace the predicted categories, click the edit (pencil) icon.
- To revert an override to the predicted categories, click the zap (lightning) icon.
- To remove a prediction, click the trash icon.
Changes are displayed in the predictions list. To confirm the changes, click Save changes at the bottom of the page.
Changing the index classification in the Categories Settings tab deletes any override affected by this change. For example, if you remove the second facet level from the index classification, overrides with two levels of depth, like Food > Fruits, are deleted, and the affected queries are reset to automatic predictions.
Automatic Filtering and Boosting
Automatic Filtering and Boosting is a search experience that applies filters for user queries based on Query Categorization predictions.
- Automatic filtering applies a search query filter to remove items not matching the predicted category.
- Automatic boosting applies an optional filter to the query to boost items matching the predicted category to the top.
Based on confidence levels, the Query Categorization model determines whether to apply predictions as filters, boosts, or not apply them at all.
By default, only automatic boosting is activated. See Configure the impact of Automatic Filtering and Boosting to find out how to enable automatic filtering as well.
Implement Automatic Filtering and Boosting
To use Automatic Filtering and Boosting, you must enable Query Categorization in the Algolia dashboard. You can preview the impact of Automatic Filtering and Boosting from the Query Categorization section.
You can turn on Automatic Filtering and Boosting for an index in the Automatic filtering & boosting Settings tab.
Once activated, filters and boosts are automatically injected into your search parameters at query time without requiring any frontend changes. In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization), you can exclude (ban) queries and categories that should never be automatically filtered or boosted. Anything specified here overrides your index’s configuration.
Override Automatic Filtering and Boosting at query time
You can override the default configuration for automatic filtering and boosting with query parameters:
{
  /* Other standard query parameters... */
  "extensions": {
    "queryCategorization": {
      "enableAutoFiltering": false /* or true */
    }
  }
}
To let users remove filters applied by Automatic Filtering and Boosting, explicitly turn off automatic filters and boosts in the search query targeting your index when users clear the automatic filter. Create an InstantSearch widget to implement a UI for this behavior.
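The query side of that behavior amounts to setting enableAutoFiltering to false once the user has cleared the automatic filter. Otherwise, you leave the parameter out so that the index’s default configuration applies. This is a sketch with a hypothetical helper. It assumes that your frontend (for example, the InstantSearch widget) tracks whether the user cleared the filter.

```python
from typing import Any, Dict


def build_search_params(query: str, user_cleared_auto_filter: bool) -> Dict[str, Any]:
    """Hypothetical helper: opt this query out of Automatic Filtering and Boosting on request."""
    params: Dict[str, Any] = {"query": query}
    if user_cleared_auto_filter:
        # Per-query override of the index's Automatic Filtering and Boosting configuration.
        params["extensions"] = {"queryCategorization": {"enableAutoFiltering": False}}
    return params
```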
Detect the impact of Automatic Filtering and Boosting at query time
When Automatic Filtering and Boosting is active for a query, the extensions.queryCategorization.autofiltering section has the following content:
{
/* Regular search answer (like hits...) */
"extensions": {
"queryCategorization": {
"normalizedQuery": "banana",
"count": 14870,
"type": "narrow",
"categories": [
{
"bin": "certain",
"hierarchyPath": [
{
"facetName": "categories.lvl0",
"facetValue": "Food",
"depth": 0
},
{
"facetName": "categories.lvl1",
"facetValue": "Food > Fruits",
"depth": 1
}
]
}
],
"autofiltering": {
"enabled": true,
"maxDepth": 5,
"facetFilters": [
[
"categories.lvl0:Food"
],
[
"categories.lvl1:Food > Fruits"
],
]
"optionalFilters": []
}
}
}
}
Automatic Filtering and Boosting can be active without affecting a particular query. In this case, the autofiltering section doesn’t appear in your search response.
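To show users what was applied, for example as a removable filter chip, you can read the autofiltering section and treat its absence as "no impact". The following sketch assumes the response has already been decoded into a dictionary, and the function name is hypothetical. In the example above, filters appear under facetFilters. Because automatic boosting applies optional filters, boosts would presumably appear under optionalFilters.

```python
from typing import Any, Dict, List


def applied_autofilters(response: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Return the filters and boosts injected by Automatic Filtering and Boosting.

    Both lists are empty when the autofiltering section is absent,
    meaning the feature had no impact on this query.
    """
    qc = (response.get("extensions") or {}).get("queryCategorization") or {}
    autofiltering = qc.get("autofiltering") or {}
    return {
        "facetFilters": autofiltering.get("facetFilters", []),
        "optionalFilters": autofiltering.get("optionalFilters", []),
    }
```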
Configure the impact of Automatic Filtering and Boosting
You can adjust the impact of Automatic Filtering and Boosting by modifying two parameters in the Automatic filtering & boosting Settings tab:
- The minimum expected confidence level for filtering
- The minimum expected confidence level for boosting
These parameters let you configure when to apply filters or boosts, based on the predictions’ confidence levels.
The feature:
- Boosts predictions with a confidence level equal to or above the confidence level for boosting but below that for filtering.
- Filters on the predictions with a confidence level equal to or above the confidence level for filtering.
For instance, if the boosting confidence level is high and the filtering confidence level is certain, Algolia boosts high and very high predictions and filters on certain predictions.
The confidence level for boosting must always be lower than the level for filtering.
You can turn off filtering or boosting using their respective disable options.
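The following sketch restates the threshold rules as a function. It orders the confidence levels listed under Model output, rejects a boosting level that isn't lower than the filtering level, and uses None for a disabled threshold. The function and parameter names are hypothetical. The sketch assumes that when filtering is disabled, every prediction at or above the boosting level is boosted.

```python
from typing import Optional

# Confidence levels from lowest to highest.
LEVELS = ["very low", "low", "high", "very high", "certain"]


def action_for(confidence: str, boost_min: Optional[str], filter_min: Optional[str]) -> Optional[str]:
    """Return "filter", "boost", or None for a prediction's confidence level.

    Pass None as boost_min or filter_min to model the corresponding disable option.
    """
    rank = LEVELS.index
    if boost_min is not None and filter_min is not None and rank(boost_min) >= rank(filter_min):
        raise ValueError("the boosting level must be lower than the filtering level")
    if filter_min is not None and rank(confidence) >= rank(filter_min):
        return "filter"
    if boost_min is not None and rank(confidence) >= rank(boost_min):
        return "boost"
    return None
```

With boost_min="high" and filter_min="certain", this reproduces the example above: high and very high are boosted, certain is filtered, and lower levels are left alone.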
Preview Automatic Filtering and Boosting
You can preview Automatic Filtering and Boosting for any index from the Automatic filtering & boosting Preview tab of the Query Categorization section in the dashboard.
As long as you have category predictions for the selected index, this screen lets you preview results for any query with predicted categories and see how Automatic Filtering and Boosting affects them (without activating Automatic Filtering and Boosting on your production traffic).
The Automatic Filtering and Boosting Preview also shows how Promotion Rules and Dynamic Re-Ranking impact the results. You can turn off the Rules and Dynamic Re-Ranking in the preview using the Rules and Dynamic Re-Ranking toggles.
A/B test Automatic Filtering and Boosting
You can use A/B testing to test Automatic Filtering and Boosting on an index and accurately measure the effect on your search.
To do this, click the Launch an A/B test button in the Automatic filtering & boosting Settings tab of the Query Categorization section in the dashboard.
Analytics grouped by categories
Once the Query Categorization model is set up, you can find all search queries under their predicted categories in the Grouped Searches tab of Algolia’s dashboard (under Observe > Analytics). This view doesn’t include browsing queries (empty queries filtered on a category).
You can compare categories or click them to inspect their queries. Inside a category, the queries with a significantly lower click-through or conversion rate are automatically flagged as “underperforming”.
For instance, the two queries blue jeans and denim are flagged as belonging to the same category (pants). Grouped analytics displays the performance of the pants category, aggregating data for blue jeans, denim, and other queries in the pants category. You can then compare the performance of the two queries. For example, the pants category’s click-through rate is 10%, but the click-through rate for blue jeans is only 4%, so the query is identified as underperforming. You can improve the performance of the query by, for example, adding a synonym or a Rule.
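The following sketch illustrates the aggregation behind grouped analytics. Per-query counts are summed into a category click-through rate, and queries that fall well below it are flagged. It is not Algolia's method. The documentation only says "significantly lower", so the 0.5 ratio is an assumption. CTR is simplified here to clicked searches divided by tracked searches, and the QueryAnalytics fields are hypothetical rather than Analytics API fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryAnalytics:
    """Hypothetical per-query counts (not Algolia Analytics API fields)."""
    query: str
    category: str  # predicted category
    searches: int  # tracked searches
    clicks: int    # tracked searches that led to at least one click


def category_ctr(rows: List[QueryAnalytics]) -> Dict[str, float]:
    """Aggregate click-through rate per predicted category."""
    totals: Dict[str, List[int]] = {}
    for row in rows:
        counts = totals.setdefault(row.category, [0, 0])
        counts[0] += row.clicks
        counts[1] += row.searches
    return {cat: clicks / searches for cat, (clicks, searches) in totals.items() if searches}


def underperforming(rows: List[QueryAnalytics], ratio: float = 0.5) -> List[str]:
    """Flag queries whose CTR is below `ratio` times their category's CTR (assumed rule)."""
    ctr = category_ctr(rows)
    return [
        row.query
        for row in rows
        if row.searches and row.clicks / row.searches < ratio * ctr[row.category]
    ]
```

With hypothetical counts chosen to match the example, blue jeans with 40 clicks in 1,000 searches and denim with 160 clicks in 1,000 searches, the pants category’s CTR is 10% and only blue jeans is flagged.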
With grouped analytics, you can aggregate your search analytics to gain new insights and optimize your Search and Discovery experience. It simplifies search analysis and helps manage the long tail of search queries.