This repository has been archived by the owner on Aug 1, 2023. It is now read-only.

algolia/algoliasearch-client-java-legacy


Algolia Search API Client for Java

Algolia Search is a hosted full-text, numerical, and faceted search engine capable of delivering realtime results from the first keystroke.

The Algolia Search API Client for Java lets you easily use the Algolia Search REST API from your Java code.


DEPRECATION WARNING: Version 1.x is no longer under active development. It will still be supported for bug fixes, and for new query parameters and index settings.

Migration note from v1.x to v2.x: In June 2016, we released v2 of our Java client. If you were using version 1.x of the client, read the migration guide to version 2.x.

WARNING: The JVM has an infinite cache on successful DNS resolution. As our hostnames point to multiple IPs, the load might not be spread evenly among our machines, and you might also target a dead machine.

You should change this TTL by setting the property networkaddress.cache.ttl. For example, to set the cache to 60 seconds:

java.security.Security.setProperty("networkaddress.cache.ttl", "60");
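As a minimal sketch of where this call could live, the helper below wraps it so that it can be invoked once at application startup. The class and method names (DnsCacheConfig, setTtlSeconds) are hypothetical and not part of the client. The JVM typically reads this property once, so it is safest to set it before any network activity takes place.

```java
import java.security.Security;

// Hypothetical helper (not part of the Algolia client): configures the
// JVM-wide DNS cache TTL for successful lookups.
final class DnsCacheConfig {
    private DnsCacheConfig() {}

    /** Sets the DNS cache TTL in seconds and returns the value now stored. */
    static String setTtlSeconds(int seconds) {
        if (seconds < 0) {
            throw new IllegalArgumentException("TTL must be >= 0 seconds");
        }
        // Same property as in the snippet above, set programmatically.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }
}
```

Calling DnsCacheConfig.setTtlSeconds(60) as the first statement of your main method has the same effect as the one-liner above.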

Install the Java API client

Install

With Maven, add the following dependency to your pom.xml file:

<dependency>
    <groupId>com.algolia</groupId>
    <artifactId>algoliasearch</artifactId>
    <version>[1,]</version>
</dependency>

Language-specific notes

Instantiate Client/Index

Init Index

To begin, you will need to initialize the client. In order to do this you will need your Application ID and API Key. You can find both on your Algolia account.

APIClient client = new APIClient("YourApplicationID", "YourAPIKey");
Index index = client.initIndex("your_index_name");

Warning: If you are building a native app on mobile, be sure to not include the search API key directly in the source code. You should instead consider fetching the key from your servers during the app's startup.

Search

Building search UIs

If you are building a web application, we recommend using one of our frontend search UI libraries instead of the API client directly.

For example, here is what Algolia's Instant Search UI offers:

  • An out of the box, good-looking Search UI, easily customizable, with instant results and unlimited facets and filters, and many other configurable features
  • Better response time because the request does not need to go through your own servers, but instead is communicated directly to the Algolia servers from your end-users
  • As a consequence, your servers will be far less burdened by real-time searching activity

To get started with building search UIs, take a look at these tutorials:

  • Building an instant search result page
  • Autocomplete

Indexing

Creating indices

You don't need to explicitly create an index, as it will be automatically created the first time you add an object.

Objects are schemaless so you don't need any pre-configuration to start indexing.

If you wish to configure your index, the settings section provides details about advanced settings.

Make sure you don’t use any sensitive or personally identifiable information (PII) as your index name, including customer names, user IDs, or email addresses. Index names appear in network requests and should be considered publicly available.

Index Objects

Schemaless

The objects sent to our Indexing methods are schemaless: your objects can contain any number of fields, of any definition and content.

The engine has no expectations of what your data will contain, other than some formatting concerns, and the objectID.

The Object ID

That said, every object (record) in an index eventually requires a unique ID, called the objectID. This is the only field you are sure to see in an index object.

You can create the ID yourself or let Algolia generate it for you, which means that you are not required to send us an objectID.

Whether sent or generated, once a record is added, it will have a unique identifier called objectID.

This ID will be used later by any method that needs to reference a specific record, such as Update Objects or Partial Updates.

Add, Update and Partial Update differences

Add Objects

The Add Objects method does not require an objectID.

  • If you specify an objectID:
    • If the objectID does not exist in the index, the record will be created
    • If the objectID already exists, the record will be replaced
  • If you do not specify an objectID:
    • Algolia will automatically assign an objectID, which will be returned in the response

Update Objects

The Update Objects method requires an objectID.

  • If the objectID exists, the record will be replaced
  • If the objectID is specified but does not exist, the record is created
  • If the objectID is not specified, the method returns an error

Note: Update Object is also known as Save Object. In this context, the terms are used interchangeably.

Partial Update Objects

The Partial Update Objects method requires an objectID.

  • If the objectID exists, the attributes will be replaced
  • If the objectID is specified but does not exist, the record is created
  • If the objectID is not specified, the method returns an error

Note: Partial Update does not replace the whole object; it only adds, removes, or updates the attributes mentioned, and the remaining attributes are left untouched. This is different from Add Object and Update Object, both of which replace the whole object.
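To make the differences concrete, here is a toy in-memory model of the three operations. It is not the Algolia client: the class IndexModel and its methods are hypothetical, generated IDs are random UUIDs chosen for illustration, and attribute removal through partial updates is not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy in-memory model of Add, Update and Partial Update; NOT the Algolia client.
final class IndexModel {
    private final Map<String, Map<String, Object>> records = new HashMap<>();

    /** Add: objectID is optional, generated when absent; replaces an existing record. */
    String addObject(Map<String, Object> object) {
        Object id = object.get("objectID");
        String objectID = (id != null) ? id.toString() : UUID.randomUUID().toString();
        Map<String, Object> copy = new HashMap<>(object);
        copy.put("objectID", objectID);
        records.put(objectID, copy); // whole record is created or replaced
        return objectID;
    }

    /** Update (save): objectID is required; the whole record is replaced or created. */
    void saveObject(Map<String, Object> object) {
        requireId(object);
        addObject(object);
    }

    /** Partial update: objectID is required; only the given attributes change. */
    void partialUpdateObject(Map<String, Object> attributes) {
        String objectID = requireId(attributes);
        Map<String, Object> record = records.computeIfAbsent(objectID, k -> new HashMap<>());
        record.putAll(attributes); // other attributes are left untouched
        record.put("objectID", objectID);
    }

    Map<String, Object> getObject(String objectID) {
        return records.get(objectID);
    }

    private static String requireId(Map<String, Object> object) {
        Object id = object.get("objectID");
        if (id == null) {
            throw new IllegalArgumentException("objectID is required");
        }
        return id.toString();
    }
}
```

In this model, a partial update that sets only "city" keeps an existing "name" attribute, whereas saveObject with the same input drops it.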

For all three

  • The method for all three can be singular or plural.
    • If singular (e.g. AddObject), the method accepts only one object as a parameter
    • If plural (e.g. AddObjects), the method can accept one or many objects

Note: See the individual methods for more information on syntax and usage.

Terminology

Object = Record

We use the words "object" and "record" interchangeably, sometimes within the same sentence. While they can certainly be different within the field of computer science, for us, they are the same. So don't place any significance on their usage:

  • indices contain "objects" or "records"
  • JSON contains "objects" or "records"

Indexes = Indices

We use these words interchangeably. The former is the American spelling, while the API often uses the British spelling.

In our documentation, we try to always use "indices".

Don't place any significance on their usage.

Attribute

All objects and records contain attributes. Sometimes we refer to them as fields, or elements. Within the search and indexing contexts, we often speak of settings and parameters. Again, these terms are mostly interchangeable.

Some attributes are simple key/value pairs. But others can be more complex, as in Java or C#, where they are often a collection or an object.

Asynchronous methods

Most of these methods are asynchronous. What you are actually doing when calling these methods is adding a new job to a queue: it is this job, and not the method, that actually performs the desired action. In most cases, the job is executed within seconds if not milliseconds. But it all depends on what is in the queue: if the queue has many pending tasks, the new job will need to wait its turn.

To help manage this asynchronicity, each method returns a unique task id which you can use with the waitTask method. Using the waitTask method guarantees that the job has finished before proceeding with your new requests. You will want to use this to manage dependencies, for example, when deleting an index before creating a new index with the same name, or clearing an index before adding new objects.

This is used most often in debugging scenarios where you are testing a search immediately after updating an index.
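The queueing behaviour described above can be pictured with a small model. This is a sketch only: TaskQueueModel is a hypothetical class built on a single-threaded executor, not the client's implementation, and its waitTask merely mirrors the idea of blocking on a task id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of a job queue with task ids; NOT the Algolia client.
final class TaskQueueModel implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<Long, Future<?>> tasks = new ConcurrentHashMap<>();
    private final AtomicLong nextTaskId = new AtomicLong(1);

    /** Enqueues a job and returns immediately with its task id. */
    long submit(Runnable job) {
        long taskId = nextTaskId.getAndIncrement();
        tasks.put(taskId, worker.submit(job));
        return taskId;
    }

    /** Blocks until the job identified by taskId has finished. */
    void waitTask(long taskId) {
        Future<?> task = tasks.get(taskId);
        if (task == null) {
            throw new IllegalArgumentException("Unknown task: " + taskId);
        }
        try {
            task.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

A dependent operation, such as adding objects after clearing an index, would call waitTask on the first task id before submitting the second job.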

Settings

The scope of settings (and parameters)

Settings are set on the index and/or during a particular query. In both cases, they are sent to Algolia using parameters.

  • For the index, we use the set settings method.
  • For the search, we use the search method.

Importantly, each parameter has a scope (see API Parameters). There are 3 scopes:

settings

Parameters with a settings scope can only be used in the set settings method, meaning that they are not available as search parameters.

Index settings are built directly into your index at indexing time, and they impact every search.

search

Individual queries can be parameterized. To do this, you pass search parameters to the search method. These parameters affect only those queries that use them; they do not set any index defaults.

Both settings and search

When applying both, you create a default + override logic: with the settings, you set an index default using the set settings method. These settings can then be overridden by your search method. Only some settings can be overridden. You will need to consult each setting to see its scope.

Note: If you do not apply an index setting or search parameter, the system will apply an engine level default.

Example

Just to make all of this more concrete, here is an example of an index setting. In this example, all queries performed on this index will use a queryType of prefixLast:

index.setSettings({
  queryType: 'prefixLast'
});

So every query will apply a prefixLast logic. However, this can be overridden. Here is a query that overrides that index setting with prefixAll:

index.search({
  query: 'query',
  queryType: 'prefixAll'
});
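The default + override logic can be summarized as a lookup order: search parameter first, then index setting, then engine default. The sketch below models only that precedence with plain maps; ParameterResolution is a hypothetical class, and the engine default values passed in are supplied by the caller for illustration.

```java
import java.util.Map;

// Sketch of the precedence between search parameters, index settings and
// engine defaults. Hypothetical helper; NOT part of the Algolia client.
final class ParameterResolution {
    private ParameterResolution() {}

    /** Returns the effective value of a parameter that has both scopes. */
    static Object resolve(String name,
                          Map<String, Object> searchParameters,
                          Map<String, Object> indexSettings,
                          Map<String, Object> engineDefaults) {
        if (searchParameters.containsKey(name)) {
            return searchParameters.get(name); // query-time override
        }
        if (indexSettings.containsKey(name)) {
            return indexSettings.get(name);    // index default
        }
        return engineDefaults.get(name);       // engine level default
    }
}
```

With an index setting of queryType = prefixLast and a search parameter of queryType = prefixAll, resolve returns prefixAll for that query and prefixLast for any query that does not pass the parameter.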

Categories

As you start fine-tuning Algolia, you will want to use more of its settings. Mastering these settings will enable you to get the best out of Algolia.

To help you navigate our list of settings, we've created the following setting categories:

For a full list of settings, see the API reference: Settings API parameters.

Manage Indices

Create an index

You don’t need to explicitly create an index; it will be automatically created the first time you add an object or set settings.

Make sure you don’t use any sensitive or personally identifiable information (PII) as your index name, including customer names, user IDs, or email addresses. Index names appear in network requests and should be considered publicly available.

Asynchronous methods

All the manage indices methods are asynchronous. What you are actually doing when calling these methods is adding a new job to a queue: it is this job, and not the method, that actually performs the desired action. In most cases, the job is executed within seconds if not milliseconds. But it all depends on what is in the queue: if the queue has many pending tasks, the new job will need to wait its turn.

To help manage this asynchronicity, each method returns a unique task id which you can use with the waitTask method. Using the waitTask method guarantees that the job has finished before proceeding with your new requests. You will want to use this to manage dependencies, for example, when deleting an index before creating a new index with the same name, or clearing an index before adding new objects.

This is used most often in debugging scenarios where you are testing a search immediately after updating an index.

Analytics data

Analytics data is based on the index; to access analytics data, it is therefore necessary to use the index name. See the common parameters of our analytics methods.

We collect analytics data on a separate server, using separate processes. In parallel, your main indices are updated and searched asynchronously. It is important to keep in mind that there is no hard link between your indices and the collection and storage of their analytics data. They are two sets of data on separate servers. Therefore, actions like deleting or moving an index will have no impact on your Analytics data.

As a consequence, Analytics is not impacted by indexing methods. We do not remove analytics data: whether you have removed or changed the name of an index, its analytics can always be accessed using the original index name - even if the underlying index no longer exists.

Additionally, copying or moving an index will not transfer Analytics data from source to destination. The Analytics data stays on the source index, which is to be expected; and the destination index will not gain any new Analytics data.

Keep in mind, then, that if you overwrite an existing index (one that already has Analytics data), the overwritten index does not lose its Analytics data; any new Analytics data is mixed in with the old.

API keys

Adding and Generating API keys

It is important to understand the difference between the Add API Key and Generate secured API Key methods.

For example:

  • Add API key is executed on the Algolia server; Generate Secured API key is executed on your own server, not Algolia's.
  • Keys added appear on the dashboard; keys generated don't.
  • You add keys that are fixed and have very precise permissions. They are often used to target specific indices, users, or application use-cases. They are also used to generate Secured API Keys.
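The first point above, that a secured key is computed locally from an existing key, can be illustrated with a keyed hash. The sketch below is an assumption-laden illustration, not the client's method: the class DerivedKeySketch, the restriction string, and the output layout (Base64 of the hex HMAC-SHA256 digest followed by the restrictions) are all chosen for the example. Use the client's own Generate Secured API Key method in real code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustration of deriving a key locally, without any network call.
// The output layout is an ASSUMPTION made for this example.
final class DerivedKeySketch {
    private DerivedKeySketch() {}

    /** Derives a key from a parent key and a restriction string. */
    static String derive(String parentKey, String restrictions) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(parentKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(restrictions.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            // Signature first, then the restrictions it covers.
            String payload = hex + restrictions;
            return Base64.getEncoder().encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }
}
```

Because the derivation only needs the parent key and the restrictions, no request reaches the Algolia servers, which is why keys produced this way do not appear on the dashboard.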

For a full discussion, see Algolia Concepts: API Keys.

Synonyms

Query Rules

Overview

Query Rules lets you perform pre- and post-processing on queries matching specific patterns. For more details, please refer to our Rules guide.

Miscellaneous

As its name implies, Query Rules is applied at query time. Therefore, some search parameters can be used to control how the rules are applied.

Most of the methods manipulate queryRule objects, as described in detail in the different Query Rules methods.

Just like for objects or synonyms, write methods for rules are asynchronous: they return a taskID that can be used by Wait for operations.

A/B Test

MultiClusters API Client

A Brief Technical Overview

How to split the data (Logical Split)

The data is split logically. We decided not to go with a hash-based split, which requires the aggregation of answers from multiple servers and adds network latency to the response time. Normally, the data will be user-partitioned - split according to a user-id.

Uses a single appID

If we were to follow the logic of using one appID per cluster, multi-clusters would require many appIDs. However, this would be difficult to manage, especially when moving data from one cluster to another in order to balance the load. Our API therefore relies on a single appID: the engine routes requests to a specific destination cluster, using a new HTTP header, X-ALGOLIA-USER-ID, and a mapping that associates a userID to a cluster.
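The routing idea can be sketched as a lookup from the header value to a cluster name. This is a toy model under stated assumptions: ClusterRouterModel, its methods, and the cluster names are hypothetical, and the real mapping lives inside the engine rather than in your code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of routing by userID under a single appID; NOT the engine's code.
final class ClusterRouterModel {
    static final String USER_ID_HEADER = "X-ALGOLIA-USER-ID";

    private final Map<String, String> userToCluster = new HashMap<>();

    /** Records which cluster holds the data of a given user. */
    void assign(String userID, String clusterName) {
        userToCluster.put(userID, clusterName);
    }

    /** Returns the destination cluster for a request, based on its headers. */
    String route(Map<String, String> headers) {
        String userID = null;
        for (Map.Entry<String, String> header : headers.entrySet()) {
            // HTTP header names are case-insensitive.
            if (USER_ID_HEADER.equalsIgnoreCase(header.getKey())) {
                userID = header.getValue();
            }
        }
        if (userID == null) {
            throw new IllegalArgumentException("Missing header " + USER_ID_HEADER);
        }
        String cluster = userToCluster.get(userID);
        if (cluster == null) {
            throw new IllegalStateException("No cluster assigned to user " + userID);
        }
        return cluster;
    }
}
```

Moving a user to another cluster then only changes one entry in the mapping; the appID used by the caller stays the same.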

What MCM doesn't do

As mentioned, the data is broken up logically. The split is done in such a way that requires only one server to perform a complete search. This API doesn't aggregate the response from multiple clusters. We designed the multi-clusters feature in order to stay fast even with a lot of clusters in multiple regions.

Shared configuration

With MCM, all the settings, rules, synonyms and API keys operations are replicated on all the machines in order to have the same configuration inside the clusters. Only the records stored in the index are different between two clusters.

Shared data

For some use cases, there are two types of data:

  • Public data
  • Private user data

The public data can be searched at the same time as private user data. With MCM, it's possible to create public records by using the special userID value *, which replicates the record on all the clusters and makes it available for search. We show this in our Public / Private data tutorial.

ObjectIDs

The objectIDs need to be unique across userIDs, to prevent a record of one userID from overriding the record of another userID. The objectID also needs to be unique because of the shared data, which can be retrieved at the same time as the data of one specific customer. We recommend appending the userID of the specific user to the objectID to be sure the objectID is unique.
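One possible way to follow this recommendation is a small helper that builds the objectID from a per-user identifier and the userID. The class ObjectIds and the "-" separator are assumptions made for this sketch; any scheme works as long as the result is unique across users.

```java
// Hypothetical helper for building objectIDs that are unique across users.
final class ObjectIds {
    private ObjectIds() {}

    /** Appends the userID to a per-user identifier, e.g. "invoice-7" + "-" + "user-42". */
    static String forUser(String localId, String userID) {
        if (localId == null || localId.isEmpty()) {
            throw new IllegalArgumentException("localId must not be empty");
        }
        if (userID == null || userID.isEmpty()) {
            throw new IllegalArgumentException("userID must not be empty");
        }
        // The separator is an assumption; if identifiers may themselves end or
        // start with it, pick a separator that cannot occur in them.
        return localId + "-" + userID;
    }
}
```

Two users who both own a record with the local identifier "invoice-7" then get distinct objectIDs, so neither record overrides the other.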

Number of indices

MCM is designed to work with a small number of indices (< 100). This limitation mainly preserves the performance of user migration. To migrate a user from one cluster to another, the engine needs to enumerate all the records of that user in order to send them to the destination cluster, and so it loops over all the indices. The cost of the operation is therefore directly linked to the number of indices.

A small number of indices also allows the engine to further optimize indexing operations by batching the operations of one index together.

Check out our Tutorial

Perhaps the best way to understand the MultiClusters API is to check out our MCM tutorial, where we explain, with code samples, the most important endpoints.

Limitation v0.1

For v0.1, the assignment of users to clusters won't be automatic: if a user is not properly assigned, or not found, the call will be rejected.

Warning: As you will notice, the documentation is actually using the REST API endpoints directly. We will soon be rolling out our API client methods.

How to get the feature

MCM needs to be enabled on your cluster. You can contact support@algolia.com for more information.

MultiCluster usage

With a multi-cluster setup, the userID needs to be specified for each of the following methods:

Each of these methods allows you to pass any extra header to the request. We'll make use of the X-Algolia-User-ID header.

Here is an example of the search method, but the principle is the same for all the methods listed above:

search_multi_cluster

You can find an example of how to pass extra headers for the other methods in their respective documentation.

Advanced

Retry logic

Algolia's architecture is heavily redundant, to provide optimal reliability. Every application is hosted on at least three different servers (called clusters). As a developer, however, you don't need to worry about those details. The API Client handles them for you:

  • It leverages our dynamic DNS to perform automatic load balancing between servers.
  • Its retry logic switches the targeted server whenever it detects that one of them is down or unreachable. Therefore, a given request will not fail unless all servers are down or unreachable at the same time.

Note: Application-level errors (e.g. invalid query) are still reported without retry.
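The failover behaviour described above can be sketched as a loop over candidate hosts. This is a simplified illustration, not the client's implementation: RetrySketch, HostCall, and the host names are hypothetical, network failures are represented by IOException, and application-level errors are represented by unchecked exceptions that propagate without retry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Simplified failover loop; NOT the Algolia client's implementation.
final class RetrySketch {
    private RetrySketch() {}

    /** A request that can be sent to a given host and may fail at the network level. */
    @FunctionalInterface
    interface HostCall<T> {
        T call(String host) throws IOException;
    }

    /** Tries each host in order and returns the first successful response. */
    static <T> T callWithFailover(List<String> hosts, HostCall<T> request) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("At least one host is required");
        }
        IOException lastFailure = null;
        for (String host : hosts) {
            try {
                return request.call(host);
            } catch (IOException e) {
                lastFailure = e; // host down or unreachable: switch to the next one
            }
            // Unchecked exceptions (application errors) are not caught,
            // so they are reported to the caller without any retry.
        }
        throw new UncheckedIOException("All hosts are down or unreachable", lastFailure);
    }
}
```

The request only fails with a network error when every host in the list has failed, which matches the guarantee stated above.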

Error handling

Requests can fail for two main reasons:

  1. Network issues: the server could not be reached, or did not answer within the timeout.
  2. Application error: the server rejected the request.

In the latter case, the error reported by the API client contains:

  • message: an error message indicating the cause of the error
  • status: an HTTP status code indicating the type of error

Here's an example:

{
  "message":"Invalid Application ID",
  "status":404
}

Caution: The error message is purely informational and intended for the developer. You should never rely on its content programmatically, as it may change without notice.
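In practice, this means branching on the status code and keeping the message for logs only. The sketch below shows one way to do that; ApiError is a hypothetical stand-in for the exception reported by the client, and the classification into client and server errors is an assumption based on standard HTTP status ranges.

```java
// Hypothetical stand-in for the error reported by the API client.
final class ApiError extends RuntimeException {
    private final int status;

    ApiError(String message, int status) {
        super(message);
        this.status = status;
    }

    int getStatus() {
        return status;
    }
}

// Decides how to react to an application error using only its status code.
final class ErrorHandlingSketch {
    private ErrorHandlingSketch() {}

    enum Kind { CLIENT_ERROR, SERVER_ERROR, OTHER }

    /** Classifies an error by HTTP status range; the message is never inspected. */
    static Kind classify(ApiError error) {
        int status = error.getStatus();
        if (status >= 400 && status < 500) {
            return Kind.CLIENT_ERROR; // the request itself must be fixed
        }
        if (status >= 500 && status < 600) {
            return Kind.SERVER_ERROR;
        }
        return Kind.OTHER;
    }

    /** Builds a log line; the message is informational and meant for developers. */
    static String describe(ApiError error) {
        return classify(error) + " (status " + error.getStatus() + "): " + error.getMessage();
    }
}
```

For the example above, classify reports a client error from the 404 status alone, so the code keeps working if the wording of the message changes.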