Friday, May 5, 2017

Installation of Apache Spark on Windows 10

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.

Please follow the instructions below to install Apache Spark on Windows 10.

Prerequisites:

Please ensure that JDK 1.8 or above is installed in your environment.

Steps:

Installation of Scala 2.12.2
  • Scala can be downloaded from here.
  • The download is a .msi installer. Run it and follow the instructions to install Scala.

Installation of Spark


  • Spark can be downloaded from here.
  • I chose version 2.1.1, prebuilt for Hadoop. Please note that I will be running it without Hadoop.
  • Extract the downloaded .tgz file into a folder called C:\Spark
  • The extracted folder contains the Spark distribution, including the bin folder used later to launch Spark.

Download Winutils


  • Download the 64-bit winutils.exe from these links.
  • Create a folder C:\Spark\Winutils\bin and copy winutils.exe into it.
  • The resulting path will be C:\Spark\Winutils\bin\winutils.exe

Setup Environment Variables


  • Set up the following environment variables:
    • JAVA_HOME: C:\jdk1.8.0_91
    • SCALA_HOME: C:\Program Files (x86)\scala
    • _JAVA_OPTIONS: -Xms128m -Xmx256m
    • HADOOP_HOME: C:\Spark\WinUtils (the folder that contains bin\winutils.exe)
    • SPARK_HOME: C:\Spark (the extracted Spark folder that contains bin)
  • Create a folder C:\tmp\hive and give all users read/write/execute permissions on it, for example by running C:\Spark\WinUtils\bin\winutils.exe chmod 777 C:\tmp\hive

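
Before launching Spark, it can help to confirm that the variables above are actually visible. The script below is a minimal sketch meant to be run with the Scala installed in the first step (for example scala check-spark-env.scala, where the file name is just a hypothetical choice). It only checks that the variables are non-empty and that winutils.exe and C:\tmp\hive exist; it does not validate the paths themselves. Open a new command prompt first, since windows that were already open do not see newly set variables.

```scala
// check-spark-env.scala (hypothetical file name); run with: scala check-spark-env.scala
import java.nio.file.{Files, Paths}

// Variables from the list above; _JAVA_OPTIONS is optional and not checked here.
val required = Seq("JAVA_HOME", "SCALA_HOME", "HADOOP_HOME", "SPARK_HOME")

// Returns the names of required variables that are unset or blank in `env`.
def missingVars(env: Map[String, String]): Seq[String] =
  required.filter(name => env.get(name).forall(_.trim.isEmpty))

val missing = missingVars(sys.env)
if (missing.isEmpty) println("All required environment variables are set.")
else println("Missing or empty: " + missing.mkString(", "))

// winutils.exe is expected at %HADOOP_HOME%\bin\winutils.exe.
sys.env.get("HADOOP_HOME").foreach { home =>
  val winutils = Paths.get(home, "bin", "winutils.exe")
  println(s"$winutils found: ${Files.isRegularFile(winutils)}")
}

// The Hive scratch folder created in the last step.
val hiveDir = Paths.get("C:\\tmp\\hive")
println(s"$hiveDir found: ${Files.isDirectory(hiveDir)}")
```

If a variable is reported as missing, set it again and reopen the command prompt before rerunning the script.
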
Test Spark Environment

  • Navigate to %SPARK_HOME%\bin and run the command spark-shell
You should now be ready to use Spark.
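
Once spark-shell has started, a couple of tiny jobs can confirm that Spark really runs work locally. The lines below are a sketch to type at the scala> prompt. They assume only the sc (SparkContext) and spark (SparkSession) values that spark-shell creates on startup.

```scala
// Type at the spark-shell prompt; `sc` and `spark` are predefined by spark-shell.
println(spark.version)              // the Spark version in use

// Distribute the numbers 1..100 across partitions and add them up.
val total = sc.parallelize(1 to 100).sum()
println(total)                      // should print 5050.0

// A small Dataset job through the SparkSession.
println(spark.range(1000).count())  // should print 1000
```

If both jobs finish and print the expected values, the installation is working.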