Note: the Vietnamese version of this article is available at the link below.

https://duongnt.com/elasticsearch-api-client-kotlin-vie

Query data with Kotlin and Elasticsearch API Client

The Elasticsearch Java API Client is now the recommended library for interacting with Elasticsearch. It provides a fluent way to build requests and parse responses, with strong typing and good integration with JSON object mappers. And thanks to the interoperability between Java and Kotlin, we can use it almost as is in Kotlin projects.

Today, we will use the Elasticsearch API Client and Kotlin to query data from the cluster we set up in the previous article. We will also explore the two main ways to create a request with this new API.

You can download the sample code in this article from the link below.

https://github.com/duongntbk/elasticsearchclient-demo

Prerequisites

Set up and populate an Elasticsearch cluster locally

Please follow this guide to set up an Elasticsearch cluster on your local machine. Also, we will use the exact same test data as in our previous article.

PUT footballer/_bulk
{ "create": { } }
{ "name": "Ronaldo","position":"fw", "age": 38, "salary": 4430}
{ "create": { } }
{ "name": "Messi","position":"fw", "age": 36, "salary": 1440}
{ "create": { } }
{ "name": "Sancho","position":"lw", "age": 23, "salary": 373}
{ "create": { } }
{ "name": "Antony","position":"lw", "age": 23, "salary": 200}
{ "create": { } }
{ "name": "Salah","position":"rw", "age": 30, "salary": 350}
{ "create": { } }
{ "name": "Vinicius Junior","position":"lw", "age": 22, "salary": 354}
{ "create": { } }
{ "name": "Mahrez","position":"rw", "age": 32, "salary": 160}
{ "create": { } }
{ "name": "Rashford","position":"fw", "age": 25, "salary": 247}
{ "create": { } }
{ "name": "Bukayo Saka","position":"rw", "age": 21, "salary": 70}
{ "create": { } }
{ "name": "Gnabry","position":"rw", "age": 27, "salary": 365}

Install the necessary dependencies

As can be seen in the build.gradle.kts file, we only need to add three libraries.

implementation("org.elasticsearch.client:elasticsearch-rest-client:8.8.1")
implementation("co.elastic.clients:elasticsearch-java:8.8.1")
implementation("com.fasterxml.jackson.core:jackson-databind:2.12.3")

Connect to the cluster

The code to create a connection to our cluster is in the ElasticsearchClientWrapper class. Below are some of the important parts.

First, we provide the CA fingerprint, which is used to verify the identity of the Elasticsearch cluster. We also need to specify a username and password here.

val sslContext = TransportUtils.sslContextFromCaFingerprint(fingerprint)
val credsProv = BasicCredentialsProvider()
credsProv.setCredentials(AuthScope.ANY, UsernamePasswordCredentials(login, password))

Our local cluster only accepts HTTPS connections, which is why we need to specify the https protocol here.

.builder(HttpHost("localhost", 9200, "https"))

As our certificate is self-signed, we need to bypass the hostname verification process. Needless to say, we should never do this in production.

.setSSLHostnameVerifier { _, _ -> true } // DANGER!!

Our wrapper class implements the Closeable interface. An interesting detail is that instead of closing the ElasticsearchClient object, we need to close the RestClientTransport object.

transport.close()
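To show how the fragments above fit together, here is a minimal sketch of such a wrapper. It is not the class from the sample repository: the name ClientWrapperSketch, the constructor parameters, and the shape of the search method are assumptions made for illustration. The library calls themselves (TransportUtils, RestClient, RestClientTransport, ElasticsearchClient) are the ones used in the fragments above.

```kotlin
import co.elastic.clients.elasticsearch.ElasticsearchClient
import co.elastic.clients.elasticsearch.core.SearchRequest
import co.elastic.clients.elasticsearch.core.SearchResponse
import co.elastic.clients.json.jackson.JacksonJsonpMapper
import co.elastic.clients.transport.TransportUtils
import co.elastic.clients.transport.rest_client.RestClientTransport
import org.apache.http.HttpHost
import org.apache.http.auth.AuthScope
import org.apache.http.auth.UsernamePasswordCredentials
import org.apache.http.impl.client.BasicCredentialsProvider
import org.elasticsearch.client.RestClient
import java.io.Closeable

// Hypothetical wrapper; the real ElasticsearchClientWrapper may be organised differently.
class ClientWrapperSketch(fingerprint: String, login: String, password: String) : Closeable {
    private val transport: RestClientTransport
    private val client: ElasticsearchClient

    init {
        // Trust the cluster's CA based on its fingerprint and set up basic authentication.
        val sslContext = TransportUtils.sslContextFromCaFingerprint(fingerprint)
        val credsProv = BasicCredentialsProvider()
        credsProv.setCredentials(AuthScope.ANY, UsernamePasswordCredentials(login, password))

        val restClient = RestClient
            .builder(HttpHost("localhost", 9200, "https"))
            .setHttpClientConfigCallback { hc -> hc
                .setSSLContext(sslContext)
                .setDefaultCredentialsProvider(credsProv)
                .setSSLHostnameVerifier { _, _ -> true } // DANGER!! local testing only
            }
            .build()

        // The transport owns the low-level REST client and the JSON mapper.
        transport = RestClientTransport(restClient, JacksonJsonpMapper())
        client = ElasticsearchClient(transport)
    }

    // Forward the request to the underlying client.
    fun <T> search(request: SearchRequest, documentClass: Class<T>): SearchResponse<T> =
        client.search(request, documentClass)

    // Closing the transport also releases the underlying REST client.
    override fun close() = transport.close()
}
```

Because the sketch implements Closeable, it can be used with Kotlin's use function, which closes the transport even if a request throws.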

Use the wrapper to send a request and read the response

Send a request

As can be seen here, our wrapper simply forwards the request to the ElasticsearchClient object. The response will be in JSON format, so we also need to provide a POJO class to parse each document in the response. We will use the data class Footballer in our example.

val response = client.search(request, Footballer::class.java)
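For reference, below is one possible shape for the Footballer class. This is a sketch, not the definition from the sample repository. It assumes the field names mirror the test data and that only jackson-databind is on the classpath (no jackson-module-kotlin), which is why every property is a var with a default value: that gives Jackson a no-argument constructor and setters to work with. The helper parseFootballer is hypothetical and only exists to try the mapping without a cluster.

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper

// Hypothetical document class; field names follow the bulk request in the prerequisites.
data class Footballer(
    var name: String = "",
    var position: String = "",
    var age: Int = 0,
    var salary: Int = 0
)

// Parse a single document the same way a Jackson-based mapper would.
fun parseFootballer(json: String): Footballer =
    ObjectMapper().readValue(json, Footballer::class.java)

fun main() {
    val doc = """{ "name": "Rashford", "position": "fw", "age": 25, "salary": 247 }"""
    // prints "Footballer(name=Rashford, position=fw, age=25, salary=247)"
    println(parseFootballer(doc))
}
```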

Note: the wrapper only accepts requests as SearchRequest objects. The ElasticsearchClient class also accepts a lambda that builds the request, but that is out of scope for today’s article.

Read the response

The response is of type SearchResponse<TDocument>, or SearchResponse<Footballer> in our case. I wrote this simple function to print information retrieved from the response.

Print the total number of matching documents.

println("Hits: ${response.hits().total()?.value()}")

For each document, we print its name and score.

for (hit in response.hits().hits()) {
    println("Name: ${hit.source()?.name}, Score: ${hit.score()}")
}

Write a few simple queries

Using the provided DSL

The default method to create a SearchRequest is via a Domain Specific Language (DSL). Its syntax is very similar to the HTTP requests we sent via Kibana’s developer console in the last article.

For example, here is an HTTP request to find documents whose name matches Rashford.

GET /footballer/_search
{
  "query" : {
    "match" : { "name": "Rashford" }
  }
}

And here is how we write the same request for the Elasticsearch API client.

val request = SearchRequest.of { s -> s
    .index("footballer")
    .query { q -> q
        .match { t -> t
            .field("name")
            .query("Rashford")
        }
    }
}

This is the response from the Elasticsearch cluster, printed with the function printResults. It matches our expectations.

Hits: 1
Name: Rashford, Score: 2.1382177

Using QueryBuilder classes

The DSL is nice and all, but sometimes using the QueryBuilder classes is more straightforward, especially for people who are used to the old RestHighLevelClient. Fortunately, the new client also supports building queries with QueryBuilder classes.

We will build a SearchRequest object equivalent to the following HTTP request.

GET /footballer/_search
{
  "query" : {
    "bool" : {
      "should": [
        { "term": { "position": "lw" }},
        { "term": { "position": { "value": "rw", "boost": 2 }}}
      ]
    }
  }
}

As we can see, it is composed of two TermQuery objects grouped in a should clause under a BoolQuery, and the second TermQuery has a boost factor of 2. Let’s recreate it with QueryBuilder classes.

val term1 = TermQuery.Builder().field("position").value("lw").build()._toQuery()
val term2 = TermQuery.Builder().field("position").value("rw").boost(2F).build()._toQuery()
val boolQuery = BoolQuery.Builder()
    .should(term1, term2)
    .build()
    ._toQuery()

val request = SearchRequest.Builder()
    .index("footballer")
    .query(boolQuery)
    .build()

The request above should print the following result to the console. As we can see, players who are right wingers received a higher score thanks to the boost.

Hits: 7
Name: Salah, Score: 1.7876358
Name: Mahrez, Score: 1.7876358
Name: Bukayo Saka, Score: 1.7876358
Name: Gnabry, Score: 1.7876358
Name: Sancho, Score: 1.1451323
Name: Antony, Score: 1.1451323
Name: Vinicius Junior, Score: 1.1451323

Mixing both methods together

After reading all this, we might ask which approach is better. Should we use the DSL or the QueryBuilder classes? Well, only a Sith deals in absolutes. It turns out that mixing both methods together helps our code become less verbose and more readable.

Below is the final HTTP request we used in the previous article.

GET /footballer/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "term": { "position": { "value": "rw", "boost": 2 }}},
              { "term": { "position": "lw" }}
            ]
          }
        },
        {
          "bool": {
            "should": [
              { "range": { "age": { "lte": 23 }}},
              { "range": { "salary": {"lte": 200, "boost": 2 }}}
            ]
          }
        }
      ]
    }
  }
}

It is possible to create an equivalent SearchRequest object using only the DSL or only the QueryBuilder classes. But as we can see in those links, the DSL-only version is hard to read, and we cannot see the overall structure of the request in the QueryBuilder-only version.

Instead, we can have the best of both worlds by combining those methods together. First, we create the leaf nodes using QueryBuilder classes.

val positionTerm1 = TermQuery.Builder().field("position").value("rw").boost(2F).build()._toQuery()
val positionTerm2 = TermQuery.Builder().field("position").value("lw").build()._toQuery()
val ageRange = RangeQuery.Builder().field("age").lte(JsonData.of(23)).build()._toQuery()
val salaryRange = RangeQuery.Builder().field("salary").lte(JsonData.of(200)).boost(2F).build()._toQuery()

Then we use the DSL to create the overall structure of our request, substituting in the leaf nodes created above.

val request = SearchRequest.of { s -> s
    .index("footballer")
    .query { q -> q
        .bool { b -> b
            .must { m -> m
                .bool { b -> b
                    .should(positionTerm1)
                    .should(positionTerm2)
                }
            }
            .must { m -> m
                .bool { b -> b
                    .should(ageRange)
                    .should(salaryRange)
                }
            }
        }
    }
}
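Mixing works because both styles produce the same Query type, so a leaf built with a Builder class can be dropped anywhere the DSL expects a query. If the leaf declarations start to feel repetitive, they can be wrapped in small factory functions. The sketch below is only an illustration: termOf and atMost are hypothetical helpers, not part of the client library or the sample repository, and it assumes version 8.8.1 of the client as declared in the dependencies.

```kotlin
import co.elastic.clients.elasticsearch._types.query_dsl.Query
import co.elastic.clients.elasticsearch._types.query_dsl.RangeQuery
import co.elastic.clients.elasticsearch._types.query_dsl.TermQuery
import co.elastic.clients.json.JsonData

// Hypothetical helper: a term query on a field, with an optional boost.
fun termOf(field: String, value: String, boost: Float? = null): Query =
    TermQuery.Builder().field(field).value(value).boost(boost).build()._toQuery()

// Hypothetical helper: a range query matching values less than or equal to a limit.
fun atMost(field: String, limit: Int, boost: Float? = null): Query =
    RangeQuery.Builder().field(field).lte(JsonData.of(limit)).boost(boost).build()._toQuery()

// The same four leaf nodes as above, expressed with the helpers.
val positionRw = termOf("position", "rw", boost = 2F)
val positionLw = termOf("position", "lw")
val youngPlayer = atMost("age", 23)
val lowSalary = atMost("salary", 200, boost = 2F)
```

The resulting values can be passed to .should(...) in the DSL exactly like the leaf nodes created by hand.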

And the result is similar to what we got last time.

Hits: 5
Name: Bukayo Saka, Score: 4.787636
Name: Antony, Score: 4.145132
Name: Mahrez, Score: 3.7876358
Name: Sancho, Score: 2.1451323
Name: Vinicius Junior, Score: 2.1451323
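These numbers can be checked by hand. A bool query adds up the scores of its matching clauses, so each score above is the position term score plus the contribution of the range clauses. The sketch below replays that arithmetic in plain Kotlin, without a cluster. It rests on two assumptions: the term scores are copied from the output of the previous query rather than computed, and each matching range clause contributes a constant 1.0 multiplied by its boost. The names Player, expectedScore and rankedHits are made up for this illustration.

```kotlin
import java.util.Locale

data class Player(val name: String, val position: String, val age: Int, val salary: Int)

// Same test data as in the bulk request from the prerequisites.
val players = listOf(
    Player("Ronaldo", "fw", 38, 4430), Player("Messi", "fw", 36, 1440),
    Player("Sancho", "lw", 23, 373), Player("Antony", "lw", 23, 200),
    Player("Salah", "rw", 30, 350), Player("Vinicius Junior", "lw", 22, 354),
    Player("Mahrez", "rw", 32, 160), Player("Rashford", "fw", 25, 247),
    Player("Bukayo Saka", "rw", 21, 70), Player("Gnabry", "rw", 27, 365)
)

// Term scores copied from the output of the previous query (rw already includes its boost of 2).
const val RW_SCORE = 1.7876358
const val LW_SCORE = 1.1451323

// Returns null when the player does not satisfy both "must" clauses.
fun expectedScore(p: Player): Double? {
    val positionScore = when (p.position) {
        "rw" -> RW_SCORE
        "lw" -> LW_SCORE
        else -> return null // first must clause: position is neither rw nor lw
    }
    var rangeScore = 0.0
    if (p.age <= 23) rangeScore += 1.0     // assumed constant score, no boost
    if (p.salary <= 200) rangeScore += 2.0 // assumed constant score times boost 2
    if (rangeScore == 0.0) return null     // second must clause: no range matched
    return positionScore + rangeScore
}

// Matching players with their expected scores, highest first.
fun rankedHits(): List<Pair<String, Double>> =
    players.mapNotNull { p -> expectedScore(p)?.let { p.name to it } }
        .sortedByDescending { it.second }

fun main() {
    val hits = rankedHits()
    println("Hits: ${hits.size}")
    for ((name, score) in hits) {
        println("Name: $name, Score: ${String.format(Locale.ROOT, "%.4f", score)}")
    }
}
```

For example, Bukayo Saka is a right winger (1.7876358), is at most 23 years old (1.0), and earns at most 200 (2.0), which adds up to the 4.787636 reported by the cluster.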

Conclusion

The new API Client makes querying data from an Elasticsearch cluster a breeze. Although it might take some time to get used to, the DSL syntax can help us preserve the overall structure of a request. And the good old QueryBuilder classes help us keep our code readable as our queries grow in complexity.

A software developer from Vietnam, currently living in Japan.
