Introduction to the Hadoop Big Data Course

  • Introduction to the Course

Top 50 Ubuntu commands

Understand NameNode, DataNode, YARN and Hadoop Infrastructure

Quiz 1: 20 Questions


Hadoop Install

  • Hadoop Installation & HDFS Commands
  • Java-based MapReduce: 2 examples

Hadoop 2.7 and Hadoop 3

Top 20 HDFS commands

Setting up Java for MapReduce



  • SQL, Hive and Pig Installation (RDBMS world and NoSQL world)
  • More Hive and Sqoop (Cloudera: first Sqoop and Hive on Cloudera; Heena; notes from Tutorialspoint; JDBC drivers; ask Heena for the code)
  • Pig
  • Impala
  • Intro to NoSQL, MongoDB, HBase Installation (planned)


Creating a database in SQL and running GROUP BY and JOIN queries

Understanding different databases


Hive:

  1. Hive Partitions and Bucketing
  2. Hive External and Internal Tables

Spark, Scala, and Python

  • Spark Installations and Commands
  • Spark Scala: Scala Sheets
  • Hadoop Streaming with Python MapReduce
  • PySpark (R and Python basics), RDDs (Heena)
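
As an illustration of the Hadoop Streaming topic above, here is a minimal word-count sketch in Python. It is not taken from the course material. The function names `map_lines`, `reduce_sorted`, and `run_local` are hypothetical, and `run_local` only imitates the sort-by-key step that Hadoop performs between the map and reduce phases. In a real Streaming job the mapper and reducer would be separate scripts reading standard input, passed to the streaming jar through its `-mapper` and `-reducer` options.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_sorted(records):
    """Reducer step: sum the counts per word. Records must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Local stand-in for a Streaming job: map, sort by key, then reduce."""
    return reduce_sorted(sorted(map_lines(lines)))


if __name__ == "__main__":
    # Made-up sample input, only for trying the sketch locally.
    for record in run_local(["the cat", "The dog"]):
        print(record)
```

Keeping the mapper and reducer as plain generator functions makes them easy to check locally before wiring them to standard input on a cluster.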


Running spark-shell and importing data from CSV files

PySpark: running RDDs


Mid Term Exam

50 interview questions: search for each online and write a two-line answer.


Mid Term Projects

  1. Pull data from an online CSV file and load it into Hive using Hive import
  2. Pull data in spark-shell and run MapReduce on the Fox News first page
  3. Create data in MySQL and move it to HDFS using Sqoop
  4. Using Jupyter (Anaconda) and SparkContext, run a count on a file containing the Fox News first page
  5. Save raw data using comma, space, tab, and pipe delimiters and load it into SparkContext and spark-shell
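
For project 5, one possible way to prepare the raw files is sketched below using only Python's standard `csv` module. The helper names `to_delimited` and `from_delimited` and the sample rows are made up for illustration. Loading the resulting text into Spark is not shown and would depend on the reader used in the course.

```python
import csv
import io

# The four delimiters named in project 5.
DELIMITERS = {"comma": ",", "space": " ", "tab": "\t", "pipe": "|"}


def to_delimited(rows, delimiter):
    """Serialize rows to text with the given single-character delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def from_delimited(text, delimiter):
    """Parse delimited text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    # Made-up sample data; real project data would come from the course files.
    rows = [["id", "headline"], ["1", "markets"], ["2", "local weather"]]
    for name, delimiter in DELIMITERS.items():
        text = to_delimited(rows, delimiter)
        # Fields that contain the delimiter are quoted, so the round trip is lossless.
        assert from_delimited(text, delimiter) == rows
        print(name, repr(text))
```

Writing through `csv.writer` rather than `str.join` matters most for the space delimiter, where a field such as "local weather" would otherwise be split into two columns.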

Broadcasting Data – stream of data

Kafka Message Broadcasting / Flume



Intro to Cloudera Hadoop & studying for Cloudera certification



  • AWS Intro: EC2, SSH access, keys; Droplet on DigitalOcean; Oracle VM