Generate a sitemap from an Algolia index

A sitemap is an XML file that describes all pages of your website that are available for crawling by a search engine. Sitemaps contain the URLs of your pages along with metadata, such as when a page was last updated, how often a page is updated, or how important a page is relative to other pages. Sitemaps play a significant role in search engine optimization (SEO), especially when you’re using Algolia in your frontend.

Thanks to the flexibility of facets, Algolia can power navigation in addition to search results pages, which lets you implement dynamic category pages based on the data in your index. These are great candidates to add to your sitemap.

Before you start

This tutorial assumes that you have Node.js installed in your environment and that you know how to create and run Node.js scripts.

You also need an Algolia account. If you don’t have one already, you can create an account for free.

Dataset

For this tutorial, you’ll use an ecommerce dataset where each result is a product. All records have a categories attribute with one or more categories.

Download the dataset and import it into your Algolia application.

Install dependencies

Install algolia-sitemap as a dependency. This helper library lets you dynamically generate sitemaps from your Algolia indices.

npm install algolia-sitemap

Create a sitemap of all the records in your index

First, create a sitemap with all your products to let search engines know where to find them. You need your Algolia application ID and your API key. The API key must have browse permission.

const sitemap = require('algolia-sitemap');

// You need an API key with `browse` permission
const algoliaConfig = {
  appId: 'YourApplicationID',
  apiKey: 'YourAPIKey',
  indexName: 'YourIndexName',
};

sitemap({
  algoliaConfig,
  // ... 
});
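If you prefer not to hardcode credentials in the script, one option is to read them from environment variables. The following is a minimal sketch: the helper name loadAlgoliaConfig and the variable names ALGOLIA_APP_ID, ALGOLIA_API_KEY, and ALGOLIA_INDEX_NAME are assumptions made for illustration, not names required by algolia-sitemap.

```javascript
// Build the `algoliaConfig` object from environment variables.
// The helper and the variable names are hypothetical; use whatever your setup defines.
function loadAlgoliaConfig(env = process.env) {
  const config = {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_API_KEY, // needs the `browse` permission
    indexName: env.ALGOLIA_INDEX_NAME,
  };

  // Fail early with a clear message if a value is missing
  const missing = Object.keys(config).filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Algolia settings: ${missing.join(', ')}`);
  }

  return config;
}
```

With the variables set, const algoliaConfig = loadAlgoliaConfig(); can take the place of the hardcoded object.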

You also need to pass a hitToParams callback function to the sitemap function. This function turns a record from your index into a sitemap entry. The return value of the hitToParams function must be an object with attributes that match a <url> entry of a sitemap file:

  • loc (required): the URL of the page for the record
  • lastmod: the last modified date (ISO 8601)
  • priority: the priority of this page compared to other pages on your site (between 0 and 1)
  • changefreq: describes how often the page is likely to change
  • alternates: alternate versions of this link
  • alternates.languages: an array of enabled languages for this link
  • alternates.hitToURL: a function that transforms a language into a URL

In this tutorial, you’ll create a simple sitemap, only specifying the loc attribute.

// Turn a record into a sitemap entry
function hitToParams({ url }) {
  return { loc: url };
}

sitemap({
  algoliaConfig,
  hitToParams,
  // The URL of the sitemaps directory
  sitemapLoc: 'https://example.com/sitemaps',
  // The directory with all sitemaps (default: `sitemaps`)
  outputFolder: 'sitemaps',
});
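For comparison, a callback that also fills in the optional attributes listed above might look like the following sketch. The record fields objectID and updatedAt (a Unix timestamp in seconds), the fixed priority and changefreq values, and the per-language subdomain URL scheme are all assumptions for illustration; adapt them to your own records and site.

```javascript
// Hypothetical record fields: `url`, `objectID`, and `updatedAt` (Unix timestamp in seconds)
function hitToParams({ url, objectID, updatedAt }) {
  return {
    loc: url,
    // ISO 8601 date of the last modification
    lastmod: new Date(updatedAt * 1000).toISOString(),
    // Relative importance between 0 and 1
    priority: 0.5,
    changefreq: 'weekly',
    alternates: {
      // Assumed languages and URL scheme
      languages: ['fr', 'de'],
      hitToURL: (lang) => `https://${lang}.example.com/products/${objectID}`,
    },
  };
}
```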

Save the script as sitemap.js. In your project directory, create a sitemaps/ directory, then run the script:

node sitemap.js

The sitemaps/ directory now has two types of files:

  • The sitemap-index.xml file with links to each sitemap file
  • The sitemap.0.xml file with links to your products

You can validate your sitemaps with external tools. To inspect the sitemap, you can use a tool like xmllint.

xmllint --format sitemaps/sitemap.0.xml
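If you want a quick check from Node.js as well, the following sketch counts the URLs in a generated sitemap file and reports how many are unique. The helper name extractLocs is hypothetical, and the regular expression is a rough check that assumes plain <loc> elements; it is not a substitute for a real XML validator.

```javascript
const fs = require('fs');

// Extract every <loc> value from a sitemap XML string
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>([^<]*)<\/loc>/g)].map((match) =>
    match[1].trim()
  );
}

// Only runs if the sitemap from the previous step exists
const file = 'sitemaps/sitemap.0.xml';
if (fs.existsSync(file)) {
  const locs = extractLocs(fs.readFileSync(file, 'utf8'));
  console.log(`${locs.length} URLs, ${new Set(locs).size} unique`);
}
```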

Create a sitemap for categories

You should add your category pages to the sitemap. If you’re using the sample dataset, use the categories attribute for the URL:

{
  "categories": ["Mobile Phones", "Phones & Tablets"]
}

In this example, the product belongs to two categories. Usually, category pages have URLs like the following: https://example.com/{CATEGORY_NAME}.
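Category names often contain spaces or characters such as &, so they need to be encoded before they become part of a URL. The sketch below uses a hypothetical categoryToUrl helper to compare the two built-in encoders: encodeURI, which the code in this tutorial uses, leaves characters such as & and / untouched, while encodeURIComponent escapes them. Which one fits depends on how your site routes category pages.

```javascript
const BASE_URL = 'https://example.com';

// Hypothetical helper: build a category page URL from a category name
function categoryToUrl(category, encode = encodeURI) {
  return `${BASE_URL}/${encode(category)}`;
}

console.log(categoryToUrl('Phones & Tablets'));
// → https://example.com/Phones%20&%20Tablets

console.log(categoryToUrl('Phones & Tablets', encodeURIComponent));
// → https://example.com/Phones%20%26%20Tablets
```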

You need to modify the hitToParams callback function to return an array with an entry for each category of a given record. Since a category applies to many records, make sure to add each one to your sitemap only once. The following example tracks the categories it has already added in a lookup object; an ES6 Set works as well.

// Turn categories into sitemap entries
// Tracks the categories that are already in the sitemap
const alreadyAdded = {};

function hitToParams({ categories = [] }) {
  // Keep only the categories that aren't in the sitemap yet
  const newCategories = categories.filter(
    (category) =>
      category !== null && category !== undefined && !alreadyAdded[category]
  );

  // Skip records that don't add a new category
  if (!newCategories.length) return;

  return newCategories.map((category) => {
    alreadyAdded[category] = category;
    return { loc: `https://example.com/${encodeURI(category)}` };
  });
}
// ...

The callback checks each record for categories that aren’t in the sitemap yet and adds only those. Once the script has processed every record, the sitemap lists each category page exactly once.
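To see the deduplication at work, here is a small self-contained sketch of the same idea written with a Set, run against two sample records. The name seenCategories and the sample records are made up for illustration, and unlike the code above this variant also ignores a category that is repeated within a single record.

```javascript
// Categories that already have a sitemap entry
const seenCategories = new Set();

function hitToParams({ categories = [] }) {
  const locs = [];

  for (const category of categories) {
    // Skip empty values and categories that were added before
    if (category == null || seenCategories.has(category)) continue;

    seenCategories.add(category);
    locs.push({ loc: `https://example.com/${encodeURI(category)}` });
  }

  // Return nothing when the record adds no new category
  return locs.length ? locs : undefined;
}

// Two sample records that share the "Mobile Phones" category
const first = hitToParams({ categories: ['Mobile Phones', 'Phones & Tablets'] });
const second = hitToParams({ categories: ['Mobile Phones', 'Tablets'] });

console.log(first.length, second.length); // → 2 1
```

The second record contributes only the Tablets page, because Mobile Phones was already added by the first record.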

Create a sitemap for categories and all records

You can modify the code for adding category pages to your sitemap to create a combined sitemap for both records and categories.

// ...
function hitToParams({ categories = [], url }) {
  // Keep only the categories that aren't in the sitemap yet
  const newCategories = categories.filter(
    (category) =>
      category !== null && category !== undefined && !alreadyAdded[category]
  );

  // Always add the record's own URL, exactly once
  const locs = [{ loc: url }];

  // Add an entry for each category that isn't in the sitemap yet
  newCategories.forEach((category) => {
    alreadyAdded[category] = category;
    locs.push({ loc: `https://example.com/${encodeURI(category)}` });
  });

  return locs;
}
// ...

Notify search engines of sitemap changes

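Finally, you can let search engines know that your sitemap has changed. Search engines have traditionally offered a ping endpoint for this, which you can call directly from your script. Google and Bing have since deprecated their ping endpoints, so check each search engine’s current documentation before relying on the following example.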

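// Requires Node.js 18 or later, which provides a global fetch function
// The URL of the sitemap index, encoded for use as a query parameter
const sitemapUrl = encodeURIComponent(
  'https://example.com/sitemaps/sitemap-index.xml'
);

const endpoints = [
  `https://www.google.com/webmasters/sitemaps/ping?sitemap=${sitemapUrl}`,
  `https://www.bing.com/webmaster/ping.aspx?siteMap=${sitemapUrl}`,
];

// Call fetch with the URL only (map also passes the index and the array)
Promise.all(endpoints.map((endpoint) => fetch(endpoint)))
  .then(() => console.log('Done'))
  .catch((error) => console.error('Ping failed:', error));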