Big Data Hadoop Bootcamp for Non-Programmers


Introduction to the Hadoop Big Data Course

  • Introduction to the Course

Top 50 Ubuntu commands

Understand NameNode, DataNode, YARN and Hadoop Infrastructure

Quiz 1: 20 Questions


Hadoop Install

  • Hadoop Installation & HDFS Commands
  • Java-based MapReduce: 2 examples

Hadoop 2.7 and Hadoop 3

Top 20 HDFS commands

Setting up Java for MapReduce



  • SQL, Hive and Pig Installation (RDBMS world and NoSQL world)
  • More Hive and Sqoop (Cloudera: first Sqoop and Hive on Cloudera; Heena; notes from Tutorialspoint; JDBC drivers; ask Heena for the code)
  • Pig
  • Impala
  • Intro to NoSQL, MongoDB, HBase installation (planned)


Creating a database in SQL and running GROUP BY and JOIN queries
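As a rough illustration of this exercise, here is a minimal sketch that creates a small database and runs a join with a group-by. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption; the class may use MySQL or another RDBMS. The table names, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database (assumption: the class may use MySQL instead).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, for illustration only.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5)],
)

# JOIN the two tables, then GROUP BY customer to count and total their orders.
cur.execute(
    """
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
)
totals = cur.fetchall()
conn.close()

for name, order_count, total_amount in totals:
    print(name, order_count, total_amount)
```

The same SELECT statement should carry over to most SQL databases with little or no change.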

Understanding different databases


Hive:

  1. Hive Partitions and Bucketing
  2. Hive External and Internal Tables

Spark, Scala, and Python

  • Spark Installations and Commands
  • Spark Scala: Scala Sheets
  • Hadoop Streaming: Python MapReduce
  • PySpark (R and Python basics), RDDs (Heena)


Running spark-shell and importing data from CSV files

PySpark: running RDDs


Mid Term Exam

50 interview questions: search for each one online and write a two-line answer.


Mid Term Projects

  1. Pull data from an online CSV file and move it to Hive using hive import
  2. Pull data from spark-shell and run MapReduce on the Fox News first page
  3. Create data in MySQL and move it to HDFS using Sqoop
  4. Using Jupyter (Anaconda) and SparkContext, run a count on a file containing the Fox News first page
  5. Save raw data using comma, space, tab, and pipe delimiters, and load it into SparkContext and spark-shell
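For the delimiter project, the following is a minimal sketch of writing and re-reading the same rows with each of the four delimiters, using only Python's standard csv module. The sample rows and helper names are hypothetical, and loading the result into Spark is not shown here.

```python
import csv
import io

# The four delimiters named in the project.
DELIMITERS = {"comma": ",", "space": " ", "tab": "\t", "pipe": "|"}

# Hypothetical sample rows, for illustration only.
rows = [["id", "headline"], ["1", "Markets"], ["2", "Weather"]]


def write_delimited(rows, delimiter):
    """Serialize rows to text using the given single-character delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_delimited(text, delimiter):
    """Parse delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Write the rows once per delimiter, then read each version back.
encoded = {name: write_delimited(rows, d) for name, d in DELIMITERS.items()}
decoded = {name: read_delimited(encoded[name], DELIMITERS[name]) for name in DELIMITERS}

print(encoded["pipe"])
```

In practice each version would be written to its own file (for example with open(path, "w", newline="")) before being loaded elsewhere.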

Broadcasting Data – stream of data

Kafka Message Broadcasting / Flume



Intro to Cloudera Hadoop and studying for the Cloudera certification



  • AWS Intro: EC2, SSH, keys; Droplet on DigitalOcean; Oracle VM


This workshop is for anyone who wants to learn more about Big Data and the tools used to work with it.


You need a Windows or Ubuntu computer with 8 GB of RAM and 200 GB of disk space.

*Macs are not supported, although extra laptops can be provided for installation upon advance request (a week before).


Installing Ubuntu, Hadoop, Python, and SQL. Understanding MapReduce and various tools.

Part 1: (1 hour)

Oracle VirtualBox

Ubuntu Basics

Terminal Commands

Nano and gedit editors

Environment Variables

Break: 1 hour

Part 2: (2 hours)

Hadoop Install and MapReduce

Introduction: MapReduce

Overview of Big Data and the Hadoop ecosystem

HDFS – Hadoop Distributed File System examples

The concept of MapReduce
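To make the concept concrete before the Java examples, here is a minimal single-machine sketch of the three MapReduce stages (map, shuffle, reduce) in plain Python. It is not Hadoop code; the function names and the "year,temperature" records are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical input records in "year,temperature" form.
records = ["1950,22", "1950,31", "1951,18", "1951,27", "1951,25"]


def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    year, temp = record.split(",")
    yield year, int(temp)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, max(values)


mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle(mapped)
result = dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

print(result)
```

On a real cluster the map and reduce steps run in parallel on different machines, and the framework handles the shuffle between them.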

MapReduce examples in Java

Break: 30 minutes

Part 3: (2 hours)

Learn about different tools and install:




Python Hadoop Streaming
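Hadoop Streaming passes data to a mapper and a reducer as lines of text, with key and value separated by a tab, and sorts the mapper output by key before the reducer sees it. The sketch below imitates that flow locally with a word count. The function names are hypothetical, and in an actual streaming job the mapper and reducer would be separate scripts reading sys.stdin and printing to standard output.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same word.

    Assumes the input is already sorted by key, as it is between the
    map and reduce stages of a streaming job.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate map -> sort -> reduce on one machine, for testing."""
    return list(reducer(sorted(mapper(lines))))


demo = run_locally(["big data big tools", "Big Data"])
for line in demo:
    print(line)
```

Testing the logic locally like this, before submitting a job to the cluster, tends to make debugging much quicker.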


Class notes:

Class Quiz:

More on Big Data, the Hadoop ecosystem, and MapReduce for Data Science enthusiasts.

Instructor: Shivgan Joshi