Apache Solr Tutorial

28/12/2020
Chưa phân loại
In this lesson, we will see how we can use Apache Solr to store data and how we can run various queries upon it.

What is Apache Solr

Apache Solr is one of the most popular NoSQL databases which can be used to store data and query it in near real-time. It is based on Apache Lucene and is written in Java. Just like Elasticsearch, it supports database queries through REST APIs. This means that we can use simple HTTP calls and use HTTP methods like GET, POST, PUT, DELETE etc. to access data. It also provides an option to get data in the form of XML or JSON through the REST APIs.

Architecture: Apache Solr

Before we can start working with Apache Solr, we must understand the components that constitutes Apache Solr. Let’s have a look at some components it has:

Apache Solr Architecture

Note that only major components for Solr are shown in above figure. Let’s understand their functionality here as well:

  • Request Handlers: The requests a client makes to Solr are managed by a Request Handler. The request can be anything from adding a new record to update an index in Solr. Handlers identify the type of request from the HTTP method used with the request mapping.
  • Search Component: This is one of the most important component Solr is known for. Search Component takes care of performing search related operations like fuzziness, spell checks, term queries etc.
  • Query Parser: This is the component which actually parses the query a client passes to the request handler and breaks a query into multiple parts which can be understood by the underlying engine
  • Response Writer: This component is responsible for managing the output format for the queries passed to the engine. Response Writer allows us to provide an output in various formats like XML, JSON etc.
  • Analyzer/Tokenizer: Lucene Engine understands queries in the form of multiple tokens. Solr analyzes the query, breaks it into multiple tokens and passes it to the Lucene Engine.
  • Update Request Processor: When a query is run and it performs operations like updating an index and data related to it, the Update Request Processor component is responsible for managing the data in the index and modifying it.

Getting Started with Apache Solr

To start using Apache Solr, it must be installed on the machine. To do this, read Install Apache Solr on Ubuntu.

Make sure you have an active Solr installation if you want to try examples we present later in the lesson and admin page is reachable on localhost:

Apache Solr Homepage

Inserting Data

To start, let us consider a Collection in Solr which we call as linux_hint_collection. There is no need to explicitly define this collection as when we insert the first object, the collection will be made automatically. Let’s try our first REST API call to insert a new object into the collection named linux_hint_collection.

Inserting Data

curl -X POST -H ‘Content-Type: application/json’
‘http://localhost:8983/solr/linux_hint_collection/update/json/docs’ –data-binary
{
"id": "iduye",
"name": "Shubham"
}’

Here is what we get back with this command:

Command to insert data into Solr

Data can also be inserted using the Solr Homepage we looked at earlier. Let’s try this here so that things are clear:

Insert Data via Solr Homepage

As Solr has an excellent way of interaction with HTTP RESTful APIs, we will be demonstrating DB interaction using the same APIs from now onwards and won’t focus much on inserting data through the Solr Webpage.

List All Collections

We can list all collections in Apache Solr using a REST API as well. Here is the command we can use:

List All Collections

curl http://localhost:8983/solr/admin/collections?actions=LIST&wt=json

Let’s see the output for this command:

We see two collections here which exist in our Solr installation.

Get Object by ID

Now, let us see how we can GET data from Solr collection with a specific ID. Here is the REST API command:

Get Object by ID

curl http://localhost:8983/solr/linux_hint_collection/get?id=iduye

Here is what we get back with this command:

Get All Data

In our last REST API, we queried data using a specific ID. This time, we will get all data present in our Solr collection.

Get Object by ID

curl http://localhost:8983/solr/linux_hint_collection/select?q=*:*

Here is what we get back with this command:

Notice that we have used ‘*:*’ in query parameter. This specifies that Solr should return all data present in the collection. Even if we have specified that all data should be returned, Solr understands that the collection might have large amount of data in it and so, it will only return first 10 documents.

Deleting All Data

Till now, all APIs we tried were using a JSON format. This time, we will give a try to XML query format. Using XML format is extremely similar to JSON as XML is self-descriptive as well.

Let’s try a command to delete all data we have in our collection.

Deleting All Data

curl "http://localhost:8983/solr/linux_hint_collection/update?commit=true" -H "Content-Type: text/xml" –data-binary "*:*"

Here is what we get back with this command:

Delete all data using XML query

Now, if we again try getting all data, we will see that no data is available now:

Get All data

Total Object Count

For a final CURL command, let’s see a command with which we can find the number of objects which are present in an index. Here is the command for the same:

Total Object Count

curl http://localhost:8983/solr/linux_hint_collection/query?debug=query&q=*:*

Here is what we get back with this command:

Count number of Objects

Conclusion

In this lesson, we looked at how we can use Apache Solr and pass queries using curl in both JSON and XML format. We also saw that the Solr admin panel is useful in same manner as all curl commands we studied.

ONET IDC thành lập vào năm 2012, là công ty chuyên nghiệp tại Việt Nam trong lĩnh vực cung cấp dịch vụ Hosting, VPS, máy chủ vật lý, dịch vụ Firewall Anti DDoS, SSL… Với 10 năm xây dựng và phát triển, ứng dụng nhiều công nghệ hiện đại, ONET IDC đã giúp hàng ngàn khách hàng tin tưởng lựa chọn, mang lại sự ổn định tuyệt đối cho website của khách hàng để thúc đẩy việc kinh doanh đạt được hiệu quả và thành công.
Bài viết liên quan

Best GIF Maker Apps for Linux

GIF animations are a popular way to share fun animations, jokes and short clips of just about anything and it is hard to...
29/12/2020

How to Delete Remote Branch in Git

In this article, I am going to show you how to delete or remove a remote Git branch. I will use a GitHub repository as...
29/12/2020

How HTTPS Works? — Beginner’s Guide

Certificate Authorities are one of the single most important cornerstone for Internet security. A certificate authority...
29/12/2020
Bài Viết

Bài Viết Mới Cập Nhật

Reliable IPv4 and IPv6 Subnet Rental Services: The Perfect Solution for Global Businesses
23/12/2024

Tìm Hiểu Về Thuê Proxy US – Lợi Ích và Cách Sử Dụng Hiệu Quả
11/12/2024

Mua Proxy V6 Nuôi Facebook Spam Hiệu Quả Tại Onetcomvn
03/06/2024

Hướng dẫn cách sử dụng ProxyDroid để duyệt web ẩn danh
03/06/2024

Mua proxy Onet uy tín tại Onet.com.vn
03/06/2024