Apache Spark is a data analytics tool that can be used to process data from HDFS, S3 or other data sources in memory. In this post, we will install Apache Spark on an Ubuntu 17.10 machine.
For this guide, we will use Ubuntu version 17.10 (GNU/Linux 4.13.0-38-generic x86_64).
Apache Spark is part of the Hadoop ecosystem for Big Data. Try installing Apache Hadoop and making a sample application with it as well.
Updating existing packages
To start the installation for Spark, we first need to update our machine with the latest software packages available. We can do this with:
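On Ubuntu this is typically done with apt (the exact commands may vary with your setup):

sudo apt-get update
sudo apt-get -y upgrade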
As Spark is based on Java, we need to install it on our machine. Spark 2.3 requires at least Java 8, so that is the version we will install:
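One way to get Java 8 is through the OpenJDK packages (package names can differ between Ubuntu releases):

sudo apt-get -y install openjdk-8-jdk

We can confirm the installation afterwards with java -version.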
Downloading Spark files
All the necessary packages now exist on our machine. We’re ready to download the required Spark TAR file so that we can set it up and run a sample program with Spark as well.
In this guide, we will be installing Spark v2.3.0 available here:
Download the corresponding files with this command:
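A typical way to fetch the archive is with wget from the Apache archive (the mirror URL shown here is an assumption; use whichever mirror the download page suggests):

wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz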
Depending upon the network speed, this can take up to a few minutes as the file is quite large.
Now that we have the TAR file downloaded, we can extract it in the current directory:
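Assuming the file name from the download step above, the extraction command would be:

tar xvzf spark-2.3.0-bin-hadoop2.7.tgz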
This will take a few seconds to complete due to the large size of the archive.
When it comes to upgrading Apache Spark in the future, path updates can create problems. These issues can be avoided by creating a softlink to Spark. Run this command to make a softlink:
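Assuming the directory name produced by the extraction step, the softlink can be created like this:

ln -s spark-2.3.0-bin-hadoop2.7 spark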
Adding Spark to Path
To execute Spark scripts, we will now add it to the path. To do this, open the bashrc file:
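Any text editor will do; here we use nano as an example:

nano ~/.bashrc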
Add these lines to the end of the .bashrc file so that the path contains the Spark executable file path. Here, SPARK_HOME points at the directory where we set up Spark (adjust it if yours differs):

SPARK_HOME=/root/LinuxHint/spark
export PATH=$SPARK_HOME/bin:$PATH
The end of the .bashrc file should now contain these lines.
To activate these changes, run the following command for the .bashrc file:
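Sourcing the file makes the current shell pick up the new variables:

source ~/.bashrc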
Launching Spark Shell
Now, when we are right outside the spark directory, run the following command to open a Spark shell:
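Since the spark softlink sits in the current directory, the shell can be started like this (or simply with spark-shell, as the bin directory is now on the PATH):

./spark/bin/spark-shell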
We will see that the Spark shell is now open.
We can see in the console that Spark has also opened a Web Console on port 4040. Let’s give it a visit at localhost:4040.
Though we will be operating on the console itself, the web environment is an important place to look at when you execute heavy Spark jobs, so that you know what is happening in each Spark job you execute.
Check the Spark shell version with a simple command:
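Assuming the default SparkContext is available as sc, the version can be queried directly:

scala> sc.version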
We will get back the version string, which in this case is 2.3.0.
Making a sample Spark Application with Scala
Now, we will make a sample Word Counter application with Apache Spark. To do this, first load a text file into Spark Context on Spark shell:
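Here we read the README.md file that ships with Spark; the path matches the location where we set it up, and the variable name Data is the one that appears in the output below:

scala> val Data = sc.textFile("/root/LinuxHint/spark/README.md")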
Data: org.apache.spark.rdd.RDD[String] = /root/LinuxHint/spark/README.md MapPartitionsRDD[1] at textFile at <console>:24
scala>
Now, the text present in the file must be broken into tokens which Spark can manage:
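Splitting each line on spaces (one simple way to tokenize) with flatMap gives us the word tokens:

scala> val tokens = Data.flatMap(s => s.split(" "))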
tokens: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:25
scala>
Now, initialise the count for each word to 1:
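A map call pairs every token with the number 1:

scala> val tokens_1 = tokens.map(s => (s, 1))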
tokens_1: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:25
scala>
Finally, calculate the frequency of each word of the file:
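This is a reduceByKey over the pairs; the variable name sum_each used here is just illustrative:

scala> val sum_each = tokens_1.reduceByKey((a, b) => a + b)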
Time to look at the output for the program. Collect the tokens and their respective counts:
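Calling collect() on the reduced RDD from the previous step brings the results back to the driver:

scala> sum_each.collect()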
res1: Array[(String, Int)] = Array((package,1), (For,3), (Programs,1), (processing.,1), (Because,1), (The,1), (page](http://spark.apache.org/documentation.html).,1), (cluster.,1), (its,1), ([run,1), (than,1), (APIs,1), (have,1), (Try,1), (computation,1), (through,1), (several,1), (This,2), (graph,1), (Hive,2), (storage,1), (["Specifying,1), (To,2), ("yarn",1), (Once,1), (["Useful,1), (prefer,1), (SparkPi,2), (engine,1), (version,1), (file,1), (documentation,,1), (processing,,1), (the,24), (are,1), (systems.,1), (params,1), (not,1), (different,1), (refer,2), (Interactive,2), (R,,1), (given.,1), (if,4), (build,4), (when,1), (be,2), (Tests,1), (Apache,1), (thread,1), (programs,,1), (including,4), (./bin/run-example,2), (Spark.,1), (package.,1), (1000).count(),1), (Versions,1), (HDFS,1), (D…
scala>
Excellent! We were able to run a simple Word Counter example using the Scala programming language with a text file already present in the system.
Conclusion
In this lesson, we looked at how we can install and start using Apache Spark on an Ubuntu 17.10 machine and run a sample application on it as well.
Read more Ubuntu-based posts here.
