How to group results
Algolia works differently than relational databases. When fetching data from a database, you can select what you need, perform complex operations to aggregate data from different tables together, and get data in a format that’s already close to how you want to display it on your frontend. With Algolia, every time you have a match within one or more of your records, the engine returns the full records ranked by relevance.
Sometimes your data contains records that are parts of a larger record. This can happen with a blog article that is broken up into one paragraph per record. It can also happen when several records share a common source, as in a hierarchy or one-to-many relationship. A good example is job openings, where each company has multiple openings.
As you’ll see, the solution is to flatten records and repeat some data. In the job offer example, you might want to show only the one or three most relevant offers per company, leaving room for other companies.
For more information about breaking up large texts, see Indexing long documents.
Dataset example: job offer
Before
Using a “traditional” approach to structuring records, the dataset could look like this:
[
  {
    "company": "Twilio",
    "job_openings": [
      "Staff Software Engineer - Cloud Platform",
      "Lead Frontend Engineer",
      "Senior Data Engineer",
      "Senior Software Engineer, Developer Experience"
    ]
  },
  {
    "company": "Algolia",
    "job_openings": [
      "Full-Stack Software Engineer",
      "Frontend Engineer",
      "Open Source Software Engineer (JavaScript)",
      "Senior Software Engineer - Core API",
      "Senior Systems Engineer - SRE"
    ]
  }
]
The problem with this structure is that whenever you have a match for any opening, the engine returns the full record for the company. If you want to show the best match per company, this data structure doesn’t work.
If you want to show a limited number of job openings per company, the right approach would be to split content into smaller records, by job opening, and repeat company data.
After
With the strategy of splitting records per job opening, you have a single record per job opening and repeat the company in each. Here’s what it might look like:
[
  {
    "company": "Twilio",
    "job_opening": "Staff Software Engineer - Cloud Platform"
  },
  {
    "company": "Twilio",
    "job_opening": "Lead Frontend Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Data Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Software Engineer, Developer Experience"
  },
  {
    "company": "Algolia",
    "job_opening": "Full-Stack Software Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Frontend Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Open Source Software Engineer (JavaScript)"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Software Engineer - Core API"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Systems Engineer - SRE"
  }
]
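If your source data already has the nested, one-record-per-company shape, you can derive the flat records mechanically before indexing. The following is a minimal PHP sketch, assuming the field names from the example dataset (shortened here). The flattenJobOpenings function is a hypothetical helper written for illustration; it isn't part of any Algolia API client.

```php
<?php
// Nested records, one per company, as in the "Before" dataset (shortened).
$companies = [
    [
        'company' => 'Twilio',
        'job_openings' => [
            'Staff Software Engineer - Cloud Platform',
            'Lead Frontend Engineer',
        ],
    ],
    [
        'company' => 'Algolia',
        'job_openings' => [
            'Full-Stack Software Engineer',
            'Frontend Engineer',
        ],
    ],
];

/**
 * Hypothetical helper: turn one record per company
 * into one record per job opening.
 */
function flattenJobOpenings(array $companies): array
{
    $records = [];
    foreach ($companies as $entry) {
        foreach ($entry['job_openings'] as $opening) {
            // Repeat the company data in every job opening record.
            $records[] = [
                'company' => $entry['company'],
                'job_opening' => $opening,
            ];
        }
    }

    return $records;
}

$records = flattenJobOpenings($companies);

// Print the flat records as JSON to compare them with the "After" dataset.
echo json_encode($records, JSON_PRETTY_PRINT), PHP_EOL;
```

Each element of $records has the same shape as the records in the “After” dataset, so the result can be sent to your index as is.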
This approach has many benefits. First, job openings are no longer intertwined, which allows for more granular search. Whenever someone searches for a position, for example, “engineer”, they no longer retrieve records representing a company with its full list of job openings. Instead, they get the individual best-matching positions, which can be ranked individually with custom ranking attributes.
You can also handle the duplicated company data with Algolia’s distinct feature. Enabling it lets you, for example, retrieve only the best-matching position per company.
Configuring attributeForDistinct and enabling distinct
Using the API
To use distinct, you first need to set company as attributeForDistinct during indexing time. Only then can you set distinct to true to deduplicate your results.
Note that setting distinct at indexing time is optional. If you want to, you can set it at query time instead.
$index->setSettings([
  'attributeForDistinct' => 'company',
  'distinct' => true
]);
Once attributeForDistinct is set, you can enable distinct by setting it to true. Note that you can set distinct to true or 1 interchangeably. If you wanted to show the three best positions per company, you could set distinct to 3.
$results = $index->search('query', [
  'distinct' => true
]);
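To get a feel for what distinct does to a result list, you can reproduce the idea locally. The following is a minimal PHP sketch that keeps at most a given number of hits per company from a list that is assumed to be already sorted by relevance. The keepTopPerGroup function and the sample ranking are hypothetical and only model the concept; the engine’s actual implementation, and the order in which it returns grouped hits, may differ.

```php
<?php
/**
 * Hypothetical helper, not part of the Algolia API client:
 * keep at most $limit hits per value of $attribute,
 * preserving the incoming ranking order.
 */
function keepTopPerGroup(array $rankedHits, string $attribute, int $limit = 1): array
{
    $seen = []; // Number of hits already seen for each attribute value.
    $kept = [];
    foreach ($rankedHits as $hit) {
        $key = $hit[$attribute];
        $seen[$key] = ($seen[$key] ?? 0) + 1;
        if ($seen[$key] <= $limit) {
            $kept[] = $hit;
        }
    }

    return $kept;
}

// Hits assumed to be sorted by relevance (this order is made up for illustration).
$rankedHits = [
    ['company' => 'Algolia', 'job_opening' => 'Frontend Engineer'],
    ['company' => 'Twilio', 'job_opening' => 'Lead Frontend Engineer'],
    ['company' => 'Algolia', 'job_opening' => 'Full-Stack Software Engineer'],
    ['company' => 'Twilio', 'job_opening' => 'Senior Data Engineer'],
];

// Comparable in spirit to distinct set to true (or 1): one hit per company.
$bestPerCompany = keepTopPerGroup($rankedHits, 'company', 1);

// Comparable in spirit to distinct set to 3: up to three hits per company.
$topThreePerCompany = keepTopPerGroup($rankedHits, 'company', 3);
```

With a limit of 1, only the first hit of each company survives, which is why flattened records combined with distinct leave room for other companies in the results.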
Using the dashboard
You can also set your attribute for distinct and enable distinct in your Algolia dashboard.
- Go to your dashboard and select your index.
- Click the Configuration tab, then click Deduplication and Grouping.
- Set Distinct to true.
- Select attribute “company” in the Attribute for Distinct drop-down menu.
- Save your changes.