Install Apache Hadoop on Ubuntu 17.10!

28/12/2020

Apache Hadoop is a big data solution for storing and analyzing large amounts of data. In this article, we will walk through the setup steps for Apache Hadoop on an Ubuntu 17.10 machine so that you can get started with it as quickly as possible.

Ubuntu Version

For this guide, we will use Ubuntu version 17.10 (GNU/Linux 4.13.0-38-generic x86_64).

Updating existing packages

To start the Hadoop installation, we first need to update our machine with the latest available software packages. We can do this with:

sudo apt-get update && sudo apt-get -y dist-upgrade

As Hadoop is based on Java, we need to install it on our machine. Hadoop 3.0 requires Java 8 as its minimum version, so we will be using Java 8 here:

sudo apt-get -y install openjdk-8-jdk-headless
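Before moving on, it can help to confirm that Java is actually reachable from the shell. The following is a minimal sketch, not part of the original steps; the variable name java_status is only illustrative:

```shell
# Check whether a Java runtime is on the PATH and report its version.
if command -v java >/dev/null 2>&1; then
    # java prints its version banner on stderr, so redirect it to stdout.
    java -version 2>&1 | head -n 1
    java_status="found"
else
    echo "java not found; install openjdk-8-jdk-headless first"
    java_status="missing"
fi
echo "Java status: $java_status"
```

If the status is "missing", re-run the install command above before continuing.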

Downloading Hadoop files

All the necessary packages now exist on our machine. We’re ready to download the required Hadoop TAR files so that we can start setting them up and run a sample program with Hadoop as well.

In this guide, we will be installing Hadoop v3.0.1. Download the corresponding files with this command:

wget http://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz

Depending on the network speed, this can take a few minutes as the file is large:

Downloading Hadoop
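Because the archive is large, a download can occasionally arrive truncated or corrupted. One way to catch this is to compare the file's SHA-256 hash with the value published alongside the release. The sketch below assumes such a hash is available to you; the helper name verify_sha256 is hypothetical, and the demonstration runs on a throwaway file rather than the real archive:

```shell
# verify_sha256 FILE EXPECTED_HASH: succeeds only if the hashes match.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Demonstration on a temporary file. For the real download, pass
# hadoop-3.0.1.tar.gz and the hash published with the release instead.
demo_file=$(mktemp)
printf 'hello\n' > "$demo_file"
demo_hash=$(sha256sum "$demo_file" | awk '{print $1}')

if verify_sha256 "$demo_file" "$demo_hash"; then
    verify_result="match"
else
    verify_result="mismatch"
fi
echo "Checksum: $verify_result"
rm -f "$demo_file"
```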

The latest Hadoop binaries are listed on the Apache Hadoop releases page. Now that we have the TAR file downloaded, we can extract it in the current directory:

tar xvzf hadoop-3.0.1.tar.gz

This will take a few seconds to complete due to the large size of the archive:

Hadoop Unarchived
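If we want to know which directory an archive will create before extracting it, tar can list the contents without unpacking them. The sketch below builds a tiny stand-in archive (the name demo-1.0 is made up for illustration) and prints its top-level entry; the same `tar tzf` call works on hadoop-3.0.1.tar.gz:

```shell
# Build a small throwaway archive to demonstrate on.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo-1.0/bin"
: > "$workdir/demo-1.0/README.txt"
tar czf "$workdir/demo.tar.gz" -C "$workdir" demo-1.0

# List the archive (t) instead of extracting it (x), then keep only
# the first path component of each entry.
top_level=$(tar tzf "$workdir/demo.tar.gz" | cut -d/ -f1 | sort -u)
echo "Top-level entry: $top_level"

rm -rf "$workdir"
```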

Adding a new Hadoop User Group

As Hadoop operates over HDFS, a new file system can disturb our own file system on the Ubuntu machine as well. To avoid this collision, we will create a completely separate user group and assign it to Hadoop so that it has its own permissions. We can add a new user group with this command:

addgroup hadoop

We will see something like:

Adding Hadoop user group

We are ready to add a new user to this group:

useradd -G hadoop hadoopuser

Please note that we run all of these commands as the root user. With the above command, we added a new user to the group we created.
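To double-check that the user really ended up in the group, we can query its group list. This is a small sketch under the assumption that the names hadoopuser and hadoop from the commands above were used; the variable names are illustrative:

```shell
target_user=hadoopuser
target_group=hadoop

if id "$target_user" >/dev/null 2>&1; then
    # id -nG prints the user's group names separated by spaces.
    if id -nG "$target_user" | tr ' ' '\n' | grep -qx "$target_group"; then
        membership="member"
    else
        membership="not-member"
    fi
else
    membership="no-such-user"
fi
echo "$target_user in $target_group: $membership"
```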

To allow the Hadoop user to perform administrative operations, we need to give it sudo privileges as well. Open the /etc/sudoers file with this command:

sudo visudo

Before we add anything, the file will look like:

Sudoers file before adding anything

Add the following line to the end of the file:

hadoopuser ALL=(ALL) ALL

Now the file will look like:

Sudoers file after adding Hadoop user

This was the main setup for providing Hadoop a platform to perform actions. We are ready to set up a single node Hadoop cluster now.

Hadoop Single Node Setup: Standalone Mode

The real power of Hadoop shows when it is set up across multiple servers, so that it can scale over the large datasets stored in the Hadoop Distributed File System (HDFS). A single node setup, by contrast, is usually fine for debugging environments and is not used in production. To keep the process simple, we will explain how to do a single node setup for Hadoop here.

Once we’re done installing Hadoop, we will also run a sample application on it. As of now, the Hadoop directory is named hadoop-3.0.1. Let’s rename it to hadoop for simpler usage:

mv hadoop-3.0.1 hadoop

The directory now looks like:

Moving Hadoop

Time to make use of the hadoop user we created earlier and assign the ownership of this directory to that user:

chown -R hadoopuser:hadoop /root/hadoop

A better location for Hadoop will be the /usr/local/ directory, so let’s move it there:

mv hadoop /usr/local/
cd /usr/local/

Adding Hadoop to Path

To execute Hadoop scripts from anywhere, we will add the Hadoop bin directory to the path now. To do this, open the .bashrc file:

vi ~/.bashrc

Add these lines to the end of the .bashrc file so that the path contains the directory of the Hadoop executables:

# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export PATH=$PATH:$HADOOP_HOME/bin

The file now looks like:

Adding Hadoop to Path
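Changes to .bashrc only apply to shells started afterwards, so an already-open session needs to reload the file (for example with `. ~/.bashrc`) before the hadoop command is found. The sketch below applies the same exports directly in the current shell and checks that the Hadoop bin directory is on the path; the variable path_ok is only illustrative:

```shell
# Same exports as in .bashrc, applied to the current shell.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# Wrap PATH in colons so the match also works for the first and last entry.
case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) path_ok="yes" ;;
    *)                      path_ok="no" ;;
esac
echo "Hadoop bin directory on PATH: $path_ok"
```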

As Hadoop makes use of Java, we need to tell the Hadoop environment file hadoop-env.sh where Java is located. The location of this file can vary between Hadoop versions. To find it, run the following command from the directory that contains the Hadoop directory (here, /usr/local):

find hadoop/ -name hadoop-env.sh

We will get the output for the file location:

Environment file location

Let’s edit this file to inform Hadoop about the JDK location. Insert this as the last line of the file and save it:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
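The JDK path above is the usual one for OpenJDK 8 on 64-bit Ubuntu, but it can differ on other architectures or Java versions. As a rough sketch, assuming the JDK's javac binary is on the path, we can derive the JDK directory from it rather than typing it by hand; the variable names here are illustrative:

```shell
# Derive a candidate JAVA_HOME from the javac binary on the PATH.
if command -v javac >/dev/null 2>&1; then
    # Resolve symlinks such as /usr/bin/javac -> /etc/alternatives/javac.
    javac_path=$(readlink -f "$(command -v javac)")
    # Strip the trailing /bin/javac to get the JDK root directory.
    detected_java_home=$(dirname "$(dirname "$javac_path")")
    echo "Detected JDK home: $detected_java_home"
else
    detected_java_home=""
    echo "javac not found; is the JDK installed?"
fi
```

If the detected directory differs from the one used in this guide, use the detected one in both .bashrc and hadoop-env.sh.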

The Hadoop installation and setup is now complete. We are ready to run our sample application now. But wait, we never made a sample application!

Running Sample application with Hadoop

Actually, the Hadoop installation comes with built-in sample applications that are ready to run once we are done installing Hadoop. Sounds good, right?

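Run the following command to run the wordcount example from the bundled examples JAR. As we moved Hadoop to /usr/local/ earlier, the JAR and the input file are referenced from there:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar wordcount /usr/local/hadoop/README.txt /root/Output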

Hadoop will show how much processing it did at the node:

Hadoop processing stats

Once the above command finishes, we will see the file part-r-00000 in the output directory /root/Output. Go ahead and look at its content:

cat /root/Output/part-r-00000

You will get something like:

Word Count output by Hadoop
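To get a feel for what the wordcount job computes, here is a rough single-machine equivalent built from standard shell tools. It is only an illustration on a two-line made-up input, not how Hadoop implements the job: it splits the text on whitespace, counts each distinct word, and prints the word and its count separated by a tab, which is the same shape as the lines in part-r-00000:

```shell
# Split on whitespace, sort so equal words are adjacent, count them,
# then reorder the columns to "word<TAB>count".
counts=$(printf 'hadoop stores data\nhadoop analyzes data\n' |
    tr -s '[:space:]' '\n' |
    sort |
    uniq -c |
    awk '{print $2 "\t" $1}')
printf '%s\n' "$counts"
```

Hadoop does the same counting, but spreads the splitting and summing across map and reduce tasks so that it also works on inputs far too large for one machine.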

Conclusion

In this lesson, we looked at how we can install and start using Apache Hadoop on an Ubuntu 17.10 machine. Hadoop is great for storing and analyzing vast amounts of data, and I hope this article will help you get started using it on Ubuntu quickly.
