When sending data to Algolia, send several records per request instead of one at a time. Batching reduces network calls and speeds up indexing, especially when you have many records, so send indexing operations in batches whenever possible.
For example, you might decide to send all the data from your database and end up with a million records to index. That’s too big to send all at once: Algolia limits you to 1 GB per batch request, and in practice a payload that large would fail before reaching the API. You could instead loop over each record and send it individually with the saveObjects method, but then you would perform a million network calls, which would take far too long and saturate your Algolia cluster with indexing jobs.
A leaner approach is to split your collection of records into smaller chunks and send them sequentially. For optimal indexing performance, aim for a batch size of about 10 MB, which represents between 1,000 and 10,000 records, depending on the average record size.
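The splitting itself is plain list slicing. Here is a minimal sketch in Python; the chunk_records helper and the sample records are illustrative, and as explained below, the official API clients already do this chunking for you:

```python
# Illustrative sketch of manual chunking. The Algolia API clients
# batch records automatically when you call saveObjects, so you
# rarely need to write this yourself.
def chunk_records(records, batch_size=1000):
    """Split a list of records into consecutive batches of at most batch_size."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

# Hypothetical data: 2,500 small records
records = [{"objectID": str(i), "name": f"Actor {i}"} for i in range(2500)]

batches = chunk_records(records)
print(len(batches))      # 3 batches: 1,000 + 1,000 + 500 records
print(len(batches[-1]))  # 500
```

Each batch would then be sent in its own saveObjects call, so the number of network calls equals the number of batches, not the number of records.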
Batching records doesn’t reduce your operations count. Algolia counts indexing operations per record, not per method call, so from a pricing perspective, batching records is the same as indexing records individually.
Be careful when approaching your plan’s maximum number of records. If you’re close to the record limit, batch operations may fail with the error “You have exceeded your Record quota”, because the engine can’t know in advance whether a batch operation will update existing records or add new ones. If this happens, upgrade to a plan with a higher record limit or reduce your batch size.
Using the API
When using the saveObjects method, the API client automatically chunks your records into batches of 1,000 objects.
If you want to upload large files, consider using the Algolia CLI with the algolia objects import command.
$client = new \AlgoliaSearch\Client('YourApplicationID', 'YourWriteAPIKey');
$index = $client->initIndex('actors');

$records = json_decode(file_get_contents('actors.json'), true);

// Batching is done automatically by the API client
$index->saveObjects($records, ['autoGenerateObjectIDIfNotExist' => true]);
require 'json'
require 'algolia'

client = Algolia::Search::Client.create('YourApplicationID', 'YourWriteAPIKey')
index = client.init_index('actors')

file = File.read('actors.json')
records = JSON.parse(file)

# The API client automatically batches your records
index.save_objects(records, { autoGenerateObjectIDIfNotExist: true })
import json
from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourApplicationID', 'YourWriteAPIKey')
index = client.init_index('actors')

with open('actors.json') as f:
    records = json.load(f)

# Batching is done automatically by the API client
index.save_objects(records, {'autoGenerateObjectIDIfNotExist': True})
val client = ClientSearch(ApplicationID("YourApplicationID"), APIKey("YourWriteAPIKey"))
val index = client.initIndex(IndexName("actors"))

val string = File("actors.json").readText()
val actors = Json.plain.parse(JsonObjectSerializer.list, string)

index.apply {
    actors.chunked(1000).map { saveObjects(it) }.wait() // Wait for all indexing operations to complete.
}
using System;
using System.IO;
using System.Collections.Generic;
using Algolia.Search.Clients;
using Newtonsoft.Json;

public class Actor
{
    public string Name { get; set; }
    public string ObjectId { get; set; }
    public int Rating { get; set; }
    public string ImagePath { get; set; }
    public string AlternativePath { get; set; }
}

public class AlgoliaIntegration
{
    private SearchClient client;
    private SearchIndex index;

    public AlgoliaIntegration(string applicationID, string apiKey)
    {
        client = new SearchClient("YourApplicationID", "YourWriteAPIKey");
        index = client.InitIndex("actors");

        // Assuming the actors.json file is in the same directory as the executable
        string json = File.ReadAllText("actors.json");

        var settings = new JsonSerializerSettings
        {
            ContractResolver = new Newtonsoft.Json.Serialization.CamelCasePropertyNamesContractResolver()
        };
        IEnumerable<Actor> actors = JsonConvert.DeserializeObject<IEnumerable<Actor>>(json, settings);

        // Batching/chunking is done automatically by the API client
        bool autoGenerateObjectIDIfNotExist = true;
        index.SaveObjects(actors, autoGenerateObjectIDIfNotExist);
    }
}

// To use the class above, instantiate it in your application startup logic.
// Example:
// var algoliaIntegration = new AlgoliaIntegration("YourApplicationID", "YourWriteAPIKey");
import java.io.FileInputStream;
import java.io.InputStream;
import com.fasterxml.jackson.databind.ObjectMapper;

public class Actor {
    // Getters/setters omitted
    private String name;
    private String objectId;
    private int rating;
    private String imagePath;
    private String alternativePath;
}

// Synchronous version
SearchClient client = DefaultSearchClient.create("YourApplicationID", "YourWriteAPIKey");
SearchIndex<Actor> index = client.initIndex("actors", Actor.class);

ObjectMapper objectMapper = Defaults.getObjectMapper();
InputStream input = new FileInputStream("actors.json");
Actor[] actors = objectMapper.readValue(input, Actor[].class);

// Batching/chunking is done automatically by the API client
boolean autoGenerateObjectIDIfNotExist = true;
index.saveObjects(Arrays.asList(actors), autoGenerateObjectIDIfNotExist);
package main

import (
    "encoding/json"
    "io/ioutil"

    "github.com/algolia/algoliasearch-client-go/v3/algolia/search"
)

type Actor struct {
    Name            string `json:"name"`
    Rating          int    `json:"rating"`
    ImagePath       string `json:"image_path"`
    AlternativeName string `json:"alternative_name"`
    ObjectID        string `json:"objectID"`
}

func main() {
    client := search.NewClient("YourApplicationID", "YourWriteAPIKey")
    index := client.InitIndex("actors")

    var actors []Actor
    data, _ := ioutil.ReadFile("actors.json")
    _ = json.Unmarshal(data, &actors)

    // Batching is done automatically by the API client
    _, _ = index.SaveObjects(actors)
}
With this approach and batches of 10,000 records, you would make 100 API calls for a million records instead of 1,000,000. Depending on your records’ sizes and your network speed, you can create bigger or smaller chunks.