Note: To install Spark on CentOS, check out our guide on how to install Spark on CentOS.

Prerequisites:

  1. Install Java (Spark 3.0 runs on Java 8 or 11)
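You can verify that a suitable JDK is already on your PATH:

java -version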

Install Spark

  1. Download Spark
wget http://mirrors.ibiblio.org/apache/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
  2. Untar the Spark tgz file and move it to /usr/local (writing to /usr/local may require sudo)
tar -xvf spark-3.0.0-bin-hadoop3.2.tgz
mv spark-3.0.0-bin-hadoop3.2 /usr/local/spark
  3. Set the SPARK_HOME path
echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
echo 'export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
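To confirm the environment took effect, check that the variable resolves and that the Spark binaries are reachable:

echo $SPARK_HOME
spark-submit --version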
  4. Start a Spark standalone master server; its web UI runs at http://127.0.0.1:8080/
$SPARK_HOME/sbin/start-master.sh

Open a browser and go to http://127.0.0.1:8080/; you should see the Spark master web UI.
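Optionally, you can attach a worker to the master and shut everything down when finished. This is a minimal sketch assuming the default master port 7077 (use the spark:// URL shown at the top of the web UI); note that in Spark 3.0.x the worker scripts are named start-slave.sh/stop-slave.sh, and were renamed to start-worker.sh/stop-worker.sh in later releases.

# attach a worker to the running master
$SPARK_HOME/sbin/start-slave.sh spark://127.0.0.1:7077

# stop the worker and the master when done
$SPARK_HOME/sbin/stop-slave.sh
$SPARK_HOME/sbin/stop-master.sh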

Test Spark Scala Shell

Run the following command:

$SPARK_HOME/bin/spark-shell

The above command starts the Scala shell. Run a few commands to verify that everything works:

scala> val textFile = spark.read.textFile("/usr/local/spark/README.md")
textFile: org.apache.spark.sql.Dataset[String] = [value: string]

scala> textFile.count()
res1: Long = 108
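As a further check, you can run a simple transformation, for example counting the lines that mention "Spark" (this follows the standard Spark quick start; the result depends on the README shipped with your version):

scala> textFile.filter(line => line.contains("Spark")).count()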

Exit the Scala shell by pressing Ctrl+D.

Test Spark Python Shell

Run the following command:

$SPARK_HOME/bin/pyspark

The above command starts the PySpark shell. Run a few commands to check it:

>>> textFile = spark.read.text("/usr/local/spark/README.md")
>>> textFile.count()
108
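As with the Scala shell, you can run a quick transformation. Note that spark.read.text returns a DataFrame with a single value column, so you can filter on it (again, the result depends on your README):

>>> textFile.filter(textFile.value.contains("Spark")).count()

Exit the PySpark shell by pressing Ctrl+D.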