Best Self-Hosted Search Engines  

29/12/2020
Does your boss know that you’re looking for another job? Have you told your significant other that you can’t decide whether you want to have children? Do your parents know about your sexual orientation? Well, Google and other major search engines do.

“Most users search Google while signed in, so all of the information on their online life is available: YouTube searches, emails, and past search history,” says Adam Tauber, the lead developer of privacy-respecting metasearch engine Searx.

Of course, you could use Tor for anonymity and delete all traces of your activity after every search, but that routine would most likely get old pretty quickly. Instead, you should consider installing a self-hosted search engine capable of retrieving information for you without disclosing anything sensitive about you.

We have selected two such search engines, and we also introduce three additional search engines to show you that excellent alternatives to proprietary search engines such as Google or Bing already exist and are easier to install and use than you might think.

1. YaCy

YaCy is a free distributed peer-to-peer search engine whose core component is written in Java. Because all YaCy users are equal, and because the search engine doesn’t store user search requests, censorship is simply not possible.

Currently, the YaCy index holds about 1.4 billion documents thanks to the activity of more than 600 peer operators who contribute to it each month. For comparison, the Google Search index contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size.

While YaCy still has a long way to go before it can rival the largest centralized search engines in the world, it’s already usable as a search portal for private intranets and project-specific applications because YaCy can operate as a single search appliance without networking with other peers.

YaCy can be easily integrated into any web page thanks to its simple code snippets that can be effortlessly copied and pasted without any modification.
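Besides the copy-and-paste snippets, a YaCy peer can also be queried over HTTP from your own scripts. The following is a minimal sketch, not an official client. It assumes a peer running locally on the default port 8090 and the `yacysearch.json` endpoint with the `query`, `maximumRecords`, and `resource` parameters. The function names are made up for this example, and the `channels[0].items` response layout is an assumption you should check against your own peer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_yacy_search_url(query, base_url="http://localhost:8090",
                          max_records=10, resource="local"):
    """Build a search URL for a YaCy peer (base_url is an assumed local install)."""
    params = urlencode({
        "query": query,
        "maximumRecords": max_records,
        "resource": resource,  # "local" searches only this peer's own index
    })
    return f"{base_url}/yacysearch.json?{params}"


def extract_yacy_links(payload):
    """Return (title, link) pairs, assuming results live in channels[0].items."""
    channels = payload.get("channels", [])
    items = channels[0].get("items", []) if channels else []
    return [(item.get("title", ""), item.get("link", "")) for item in items]


def search_yacy(query, **kwargs):
    """Send the query to the peer and return the extracted (title, link) pairs."""
    with urlopen(build_yacy_search_url(query, **kwargs), timeout=10) as response:
        return extract_yacy_links(json.load(response))
```

With a peer running, `search_yacy("linux")` would return a list of title and link pairs. Setting `resource="global"` should ask the peer to include results from other peers as well.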

2. Searx

Searx is described as a privacy-respecting, hackable metasearch engine. It’s available under the GNU Affero General Public License version 3, and its main goal is to protect the privacy of its users by never sharing users’ IP addresses or search history with the search engines from which it gathers results.

“When using Searx, the IP address of Searx, a random User-Agent and a search query is sent to Google by default,” says Adam Tauber, aka asciimoo, explaining how his metasearch engine works. “Of course, you can customize Searx to forward other extra parameters like search language or the page number of the requested result page.”

Searx automatically blocks all tracking cookies served by the search engines it queries, which prevents those engines from tailoring results to a profile they have built of the user. Searx is 100 percent free, and anyone can modify it as needed. You can even take the Searx code and run the metasearch engine on your own server, which should address any concerns you might have regarding logs.
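If you run your own instance, you can also query it from a script instead of the browser. The sketch below assumes an instance listening on `http://localhost:8888` with JSON output enabled in its settings. Both are assumptions about your setup, and the helper names are hypothetical. It uses the `q`, `format`, `language`, and `pageno` query parameters and reads the `results` list from the response.

```python
from urllib.parse import urlencode


def build_searx_url(query, instance="http://localhost:8888",
                    language="en", page=1):
    """Build a JSON search URL for a self-hosted Searx instance (assumed address)."""
    params = urlencode({
        "q": query,
        "format": "json",      # the instance must allow the JSON output format
        "language": language,
        "pageno": page,
    })
    return f"{instance}/search?{params}"


def summarize_results(payload, limit=5):
    """Format the first few entries of a decoded Searx JSON response."""
    results = payload.get("results", [])[:limit]
    return [f"{r.get('title', '')} <{r.get('url', '')}>" for r in results]
```

Fetching the URL with any HTTP client and passing the decoded JSON to `summarize_results` gives a short, readable list of hits.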

3. ElasticSearch

ElasticSearch is a search engine based on Lucene, a free and open-source information retrieval library that is supported by the Apache Software Foundation and released under the Apache License.

ElasticSearch provides a full-text search engine with an HTTP web interface. The search engine can be used to search all kinds of documents, and it can be easily distributed across multiple nodes.
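To give an idea of what the HTTP interface looks like, here is a minimal sketch that prepares a full-text `match` query for the `_search` endpoint. It assumes a node on `http://localhost:9200`; the index name, field name, and helper names are placeholders chosen for this example.

```python
import json
from urllib.request import Request


def build_es_search_request(index, field, text,
                            host="http://localhost:9200", size=10):
    """Prepare a POST request that runs a full-text match query on one field."""
    body = {"query": {"match": {field: text}}, "size": size}
    return Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_sources(payload):
    """Return the stored documents from a decoded _search response."""
    hits = payload.get("hits", {}).get("hits", [])
    return [hit.get("_source", {}) for hit in hits]
```

Passing the prepared request to `urllib.request.urlopen` and decoding the JSON reply gives a payload that `extract_sources` can reduce to the matching documents.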

It’s possible to build a self-hosted search engine using ElasticSearch and Docker, and you can find a tutorial that describes the process here.

4. Ambar

Ambar is an open-source document search engine with many useful features. It supports automated crawling, tagging, and instant full-text search, just to give a few examples. One of the most exciting features of Ambar is its ability to perform OCR on images and PDF files. The supported languages include English, German, Russian, Italian, French, Spanish, Polish, and Dutch.

Ambar can be easily deployed with a single docker-compose file, and you can learn how to do it here.

5. Apache Solr

Written in Java, Apache Solr is an enterprise search platform that includes full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, and many other important features. It was created in 2004 for an in-house project at CNET Networks. CNET Networks kindly donated it to the Apache Software Foundation in 2006, where it graduated from incubation status into a standalone top-level project in 2007.
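Features such as hit highlighting and faceted search are switched on per request through query parameters. The sketch below builds a URL for the `select` handler, assuming a Solr instance on the default port 8983. The core name, field names, and function name are hypothetical.

```python
from urllib.parse import urlencode


def build_solr_select_url(core, query, highlight_field=None, facet_field=None,
                          base_url="http://localhost:8983/solr", rows=10):
    """Build a select URL, optionally enabling highlighting and faceting."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if highlight_field:
        # hl turns highlighting on; hl.fl names the field to highlight
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_field:
        # facet turns faceting on; facet.field names the field to count by
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/{core}/select?{urlencode(params)}"
```

For example, `build_solr_select_url("articles", "title:linux", highlight_field="body", facet_field="category")` produces a single URL that asks for matching documents, highlighted snippets from `body`, and per-category counts.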

Today, Solr is a highly reliable, scalable, and fault-tolerant enterprise search platform that powers the search and navigation features of many of the world’s largest internet sites, including DuckDuckGo, eHarmony, and BestBuy.

How to Install and Configure YaCy

The installation of YaCy is very simple, and it takes only a couple of minutes because you don’t need to install an external database or web server—YaCy comes with everything needed.

  1. Go to the official website of YaCy and download the latest package for Linux.
  2. Install the OpenJDK 8 runtime environment.
    • If you’re using a Debian-based distribution, use the following command: $ sudo apt-get install openjdk-8-jre
    • If not, follow the instructions specific for your distribution.
  3. Extract the downloaded package to your preferred location.
  4. Go to the new folder and run the “startYACY.sh” script in a terminal.
  5. You should see a confirmation message informing you that YaCy has started as a daemon.
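Once the script reports that the daemon has started, one way to confirm that YaCy is reachable is to check whether its web interface port accepts connections. This is a minimal sketch that assumes the default port 8090 on the local machine; adjust it if you changed the port. The function name is made up for this example.

```python
import socket


def is_port_open(host="127.0.0.1", port=8090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    if is_port_open():
        print("YaCy appears to be listening on port 8090")
    else:
        print("Nothing is listening on port 8090 yet")
```

If the check succeeds, opening `http://localhost:8090` in a browser should bring up the YaCy interface. The daemon can take a little while to open the port after starting, so a failed check right after launch is worth repeating.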

Conclusion

Search engines know more about us than most people would like to admit. If you would like to stop feeding big corporations with juicy data, you can take things into your own hands and set up a self-hosted search engine to protect your privacy. Although self-hosted search engines still have a long way to go before they are fully usable, the potential for them to outperform the likes of Google is there, and realizing it is just a matter of attracting more users.
