Install Spark on Windows 10

Steps to Install Spark on Windows 10 (Applicable to all versions)

Let's verify versions

  • Java: java -version
  • Python: python --version
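These checks can also be scripted. Here is a minimal Python sketch that reports the Python version and whether `java` is on the PATH (the exact version requirements depend on your Spark release, so check the Spark documentation):

```python
import shutil
import sys

def check_prereqs():
    """Return the Python version string and the path to `java`, if any."""
    py_version = sys.version.split()[0]
    java_path = shutil.which("java")  # None if Java is not on PATH
    return py_version, java_path

py_version, java_path = check_prereqs()
print("Python:", py_version)
print("java on PATH:", java_path or "<not found>")
```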

Grant Permissions

Change hidden folder settings in Windows

Grant "Full control" permission to the user (the one used for installing) on the folder "C:\ProgramData".

Download Spark

Once all prerequisites are done, you can download the Spark software from the official Apache Spark downloads page:

  • Choose a Spark release: pick the latest stable release
  • Choose a package type: Pre-built for Apache Hadoop 2.7
  • Choose a download type: (click on the highlighted link)

  • Spark is now downloaded.
  • Copy the file to the folder where you want to set up the engine.

Install Spark

  • Now we have the file “spark-2.2.0-bin-hadoop2.7.tgz”
  • You can use WinZip or the commands below to extract the archive into a .tar file, and then further into the folder “spark-2.2.0-bin-hadoop2.7” containing the Spark files:
  • gzip -d spark-2.2.0-bin-hadoop2.7.tgz
  • tar xvf spark-2.2.0-bin-hadoop2.7.tar
  • You don’t need to run any installer; you just have to place these files in a specific folder

I have created the folder “C:\Sparkinstall” and copied the files there.
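If you prefer to script the extraction instead of using WinZip or gzip/tar, Python's standard tarfile module handles both the gzip and tar layers in one pass. A minimal sketch (the paths in the comment are the example paths from this guide; adjust them to your own download location):

```python
import tarfile

def extract_spark(archive_path, target_dir):
    """Extract a Spark .tgz archive; tarfile handles gzip + tar in one pass."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)

# Example paths from this guide; adjust to your own download location:
# extract_spark(r"C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7.tgz", r"C:\Sparkinstall")
```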

Now download the Windows utility (winutils.exe)

    • Or run “curl -k -L -o winutils.exe” (followed by the download URL for a winutils.exe matching your Hadoop version) from the command prompt

Install Spark (Setup environment variables)

Now set the environment variables by running the commands below in the command prompt:
  • setx SPARK_HOME C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7
  • setx HADOOP_HOME C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7
  • setx PYSPARK_DRIVER_PYTHON ipython

Alternatively, you can do the same from the GUI (refer to the screenshot):

  • My Computer --> right-click --> Properties --> Advanced system settings --> Environment Variables
  • Add this to the system variable Path --> C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin
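Because setx writes to the registry, only newly opened command prompts see the new values. A quick Python check, run from a fresh shell, to confirm the variables took effect:

```python
import os

def check_spark_env():
    """Return the values (or None) of the variables set above."""
    return {name: os.environ.get(name)
            for name in ("SPARK_HOME", "HADOOP_HOME", "PYSPARK_DRIVER_PYTHON")}

for name, value in check_spark_env().items():
    print(name, "=", value if value else "<not set>")
```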

  • Now reboot your machine (just a recommendation from me), and your Spark is installed
  • Along with Spark, you now also have Python installed (verified earlier)
  • Let’s check whether we really have Spark installed or whether we missed a step
  • Go to the command prompt and change directory to the Spark location “C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin”:
    • cd C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin

Now run the command “spark-submit --version”

Congratulations, you have installed the Spark engine if you see output like the screenshot above; otherwise, go through the steps again.

Setup PySpark (install)

  • The Spark shell for Python is known as “PySpark”
  • PySpark helps data scientists interface with Resilient Distributed Datasets (RDDs) in Apache Spark from Python.
  • Py4J is a popular library integrated within PySpark that lets Python interface dynamically with JVM objects (such as RDDs).
  • Apache Spark comes with an interactive shell for Python, as it does for Scala.

Install pyspark

  • Go to the command prompt and run “pip install pyspark”

  • Go to the command prompt and run “pip install jupyter”

Open Pyspark

  • To open the console for PySpark:
  • Open the command prompt and change directory to “C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin” using the command cd C:\Sparkinstall\spark-2.2.0-bin-hadoop2.7\bin
  • Run the command “pyspark”

  • A new tab opens in your browser at http://localhost:8888/tree (this requires the driver to be Jupyter, i.e. PYSPARK_DRIVER_PYTHON=jupyter with PYSPARK_DRIVER_PYTHON_OPTS=notebook; with ipython you get an IPython terminal instead)
  • Click New --> Python 3

  • Now you are ready to code your first program

  • Type your first command and press Ctrl+Enter to see the output

Congratulations, you can now start working with PySpark.
