How to install and set up Spark on Amazon Web Services (AWS) on Ubuntu
Quick Links:
- Spark Installation on Windows
- Spark Installation on AWS EC2 VM
- Spark Architecture
- Spark With Python Lab 1
- Spark Lab 2
- We have already set up an AWS EC2 instance (virtual machine) and can SSH to it from the local machine.
- To set up an AWS EC2 instance (click here for the installation setup)
- We are able to connect to AWS via PuTTY.
Install Components (Python, Scala, Jupyter, Java) to set up Spark on EC2
- Update the EC2 instance first; this ensures Python, pip3, and the other packages below install cleanly
- command: sudo apt-get update
- Now we will install pip3 so we can install Python packages
- Command: sudo apt install python3-pip
- It might ask for permission to continue; answer "Y" for yes
- Let's install Jupyter
- Command: pip3 install jupyter
- Now we will install Java before setting up Spark
- Java is required for Scala, and Scala is required for Spark
- Command: sudo apt-get install default-jre
- Now let’s install Scala
- Command: sudo apt-get install scala
Let's verify the installed versions
- Java: Command: java -version
- Scala: Command: scala -version
- Python: Command: python3 --version
- To connect Python with Java, we need the Py4J library; let's install it
- Command: pip3 install py4j
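Py4J lets a Python process create and call objects inside a Java virtual machine over a local socket; PySpark uses it under the hood to drive the JVM-based Spark engine. As a minimal sketch (not part of the Spark setup, purely to illustrate the bridge), you can launch a JVM and call Java from Python like this:

    from py4j.java_gateway import JavaGateway

    # Launch a JVM and connect to it (requires Java on the PATH)
    gateway = JavaGateway.launch_gateway()

    # Instantiate a Java object and call its method from Python
    random = gateway.jvm.java.util.Random()
    print(random.nextInt(100))  # prints a random int between 0 and 99

    gateway.shutdown()

You will never need to do this manually for Spark; PySpark manages its own gateway.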
- Now let's set up Spark
- Download the Spark tar file into the EC2 instance. You can browse and download any version from "http://archive.apache.org/dist/spark/"
- You can use any version, but this guide uses Spark 2.1.1 (the commands and paths below assume it)
- Command: wget http://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
- Let's extract the tar file to unpack Spark
- Command: sudo tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz
Let's run a few Linux commands and note the Spark folder path
- Command (List files): ls
- Command (go to spark folder): cd spark-2.1.1-bin-hadoop2.7/
- Command (check present working directory): pwd
- Output: “/home/ubuntu/spark-2.1.1-bin-hadoop2.7”
- Command (come back to ubuntu home): cd
- Install the "findspark" utility; it will help us connect Python with Spark
- Command: pip3 install findspark
- Generate the Jupyter configuration file (this creates ~/.jupyter/jupyter_notebook_config.py)
- Command: jupyter notebook --generate-config
- Create a folder certs and generate a self-signed .pem certificate inside it.
- Command: cd
- Command: mkdir certs
- Command: cd certs
- Command: sudo openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
- This will prompt for some information; fill it in or just press Enter to accept the defaults
- Now edit the config file
- Command: cd ~/.jupyter/
- Command (open the config file to edit): vi jupyter_notebook_config.py
- This will open the editor
- Press "i" (to enter insert mode)
- Enter the below content in the config file
- c = get_config()
- c.NotebookApp.certfile = u'/home/ubuntu/certs/mycert.pem'
- c.NotebookApp.ip = '*'
- c.NotebookApp.open_browser = False
- c.NotebookApp.port = 8888
- Press Escape
- Type ":wq!" to write, quit, and return to the console
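If you prefer not to edit the file in vi, here is a minimal sketch that appends the same settings from Python; it assumes the default config path that --generate-config creates:

    from pathlib import Path

    # Append the notebook settings to the generated Jupyter config file
    config = Path.home() / '.jupyter' / 'jupyter_notebook_config.py'
    with config.open('a') as f:
        f.write("\nc = get_config()\n")
        f.write("c.NotebookApp.certfile = u'/home/ubuntu/certs/mycert.pem'\n")
        f.write("c.NotebookApp.ip = '*'\n")
        f.write("c.NotebookApp.open_browser = False\n")
        f.write("c.NotebookApp.port = 8888\n")

Either way, the result is the same config file.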
Open a Jupyter notebook session
- Command: cd
- Command: jupyter notebook --ip=*
- Copy the URL printed in the console and replace localhost with your EC2 instance's public DNS name.
Copy the URL into the browser. Since we configured a self-signed certificate, the notebook is served over https, and the browser will warn about the certificate; accept the warning to proceed.
- The URL looks like: https://ec2-18-224-213-152.us-east-2.compute.amazonaws.com:8888/?token=2a18626548b9de30668c11255d92ab3948223486cf626c47
Open a new Python notebook
- Run the below commands to load Spark from the installed Spark folder:
Let's run our first set of commands in PySpark
Write the below code
- import findspark
- findspark.init('/home/ubuntu/spark-2.1.1-bin-hadoop2.7')
- import pyspark
Now you are good to run any Spark code. You can set up any Spark version with these same steps; just adjust the version in the paths.
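As a quick end-to-end check, here is a minimal smoke test you can paste into a notebook cell; the app name and the sample numbers are just for illustration:

    import findspark
    findspark.init('/home/ubuntu/spark-2.1.1-bin-hadoop2.7')

    import pyspark

    # Start a SparkContext, run a trivial distributed job, and stop it
    sc = pyspark.SparkContext(appName='smoke_test')
    rdd = sc.parallelize([1, 2, 3, 4, 5])
    print(rdd.map(lambda x: x * 2).collect())  # expected: [2, 4, 6, 8, 10]
    sc.stop()

If this prints the doubled list, Spark is wired up correctly.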