Steps to Install Spark on Windows 10 (Applicable to all versions)
Quick Links:
- Spark Installation on Windows
- Spark Installation on AWS EC2 VM
- Spark Architecture
- Spark With Python Lab 1
- Spark Lab 2
Prerequisite:
- At least 4 GB RAM, i5 processor
- GOW: lets you use Linux commands on Windows (Click here to install/update GOW)
- Java: version 8 is good (Click here to update or install Java)
- Jupyter: interface to write code (installed in a later step)
- Python / Scala: coding language (Click here to install Python)
Grant Permissions
Please grant “Full Control” permission on the folder “C:\ProgramData” to the user used for installing
Download Spark
Once all prerequisites are done, you can download the Spark software from http://spark.apache.org/downloads.html
- Choose a Spark release: pick the latest stable release
- Choose a package type: Pre-built for Hadoop 2.6
- Choose a download type: (click on the highlighted link)
- Spark is now downloaded.
- Copy the file to the folder where you want to set up the engine
Install Spark
- Now we have the file “spark-2.2.0-bin-hadoop2.7.tgz”
- You can use WinZip or the commands below to extract it, first into a .tar file and then into the folder “spark-2.2.0-bin-hadoop2.7” containing the Spark files
- gzip -d spark-2.2.0-bin-hadoop2.7.tgz
- tar xvf spark-2.2.0-bin-hadoop2.7.tar
- You don’t need to execute any installer; you just have to place this folder in a specific location
I have created the folder “C:\Sparkinstall” and copied the files there
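If you prefer to stay in Python, the extraction above can also be done in one step with the standard-library tarfile module (a sketch; the archive and folder names follow this tutorial's layout):

```python
import tarfile

# One-step alternative to the gzip + tar commands above, using only the
# Python standard library: extract a .tgz archive into a destination folder.
def extract_spark(archive, dest):
    """Extract a .tgz archive such as spark-2.2.0-bin-hadoop2.7.tgz into dest."""
    with tarfile.open(archive, "r:gz") as tf:
        tf.extractall(dest)

# For this tutorial's layout you would run:
# extract_spark("spark-2.2.0-bin-hadoop2.7.tgz", r"C:\Sparkinstall")
```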
Now download windows utility
- Download winutils.exe from https://github.com/steveloughran/winutils/blob/master/hadoop-2.6.0/bin/winutils.exe
- Copy it to “C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin”
- Or execute “curl -k -L -o winutils.exe https://github.com/steveloughran/winutils/blob/master/hadoop-2.6.0/bin/winutils.exe?raw=true” from the command prompt
Install Spark (Setup environment variables)
Now set the environment variables by running the commands below in a command prompt
- setx SPARK_HOME C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7
- setx HADOOP_HOME C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7
- setx PYSPARK_DRIVER_PYTHON ipython
- setx PYSPARK_DRIVER_PYTHON_OPTS notebook
OR you can do the same from the GUI (refer to the screenshot)
- My Computer --> right click --> Properties --> Advanced system settings --> Environment Variables
- Add this to the system variable Path --> C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin
- Now reboot your machine (just a recommendation from me), and your Spark is installed
- Along with Spark, you now also have Python installed
- Let’s check whether we really have Spark installed or have missed a step
- Go to the command prompt and change directory to the Spark bin location “C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin”
- cd C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin
Now run the command “spark-submit --version”
Congratulations, you have installed the Spark engine if you get the above screenshot; otherwise, go through the steps again
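You can also confirm the environment variables from Python. Note that setx does not affect an already-open command prompt, so run this check in a new one (a small sketch using the variable names set earlier in this tutorial):

```python
import os

# Sanity check for the setx commands above: look up each Spark-related
# environment variable and report its value, or None if it is unset.
def check_spark_env(env=None):
    """Return each Spark-related variable and its value, or None if unset."""
    env = os.environ if env is None else env
    names = ("SPARK_HOME", "HADOOP_HOME",
             "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS")
    return {name: env.get(name) for name in names}

for name, value in check_spark_env().items():
    print(name, "=", value if value is not None else "<not set>")
```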
Setup PySpark (install)
- The Spark shell for Python is known as “PySpark”
- PySpark lets data scientists interface with Resilient Distributed Datasets (RDDs) in Apache Spark from Python.
- Py4J is a popular library integrated within PySpark that lets Python interface dynamically with JVM objects (RDDs).
- Apache Spark comes with an interactive shell for Python, as it does for Scala.
Install pyspark
- Go to the command prompt and run “pip install pyspark”
- Go to the command prompt and run “pip install jupyter”
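To verify the pip installs without starting anything up, you can check whether Python can find the packages (a sketch using the standard library; find_spec only looks the module up, it does not import it):

```python
import importlib.util

# Check whether a module is available on this Python installation
# without importing it (so nothing heavy gets loaded).
def installed(module_name):
    return importlib.util.find_spec(module_name) is not None

# py4j is pulled in as a dependency of pyspark
for mod in ("pyspark", "py4j", "jupyter_core"):
    print(mod, "installed:", installed(mod))
```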
Open PySpark
- To open the console for PySpark
- Open a command prompt and change directory to “C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin” using the command cd C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin
- Run the command “pyspark”
- A new tab will open in your browser with the URL http://localhost:8888/tree
- Click New --> Python 3
- Now you are ready to write your first program
- Type your first command and press Ctrl+Enter to see the output
Congratulations, you can now start working with PySpark