Big Data Hadoop Bootcamp for Non-Programmers

 

Introduction to the Hadoop Big Data Course

  • Introduction to the Course

Top 50 Ubuntu commands

Understand NameNode, DataNode, YARN and Hadoop Infrastructure
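Here is a rough picture of how the NameNode and DataNodes share the work. HDFS splits each file into fixed-size blocks (128 MB by default in Hadoop 2.x and 3.x, set by `dfs.blocksize`), the DataNodes store several copies of each block (3 by default, set by `dfs.replication`), and the NameNode keeps track of where every block lives. The Python sketch below does that arithmetic for one file. `hdfs_footprint` is a hypothetical helper, and it ignores checksum and metadata overhead.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes stored) for one HDFS file.

    Hypothetical helper: defaults mirror dfs.blocksize and dfs.replication;
    checksum files and NameNode metadata are ignored.
    """
    # Ceiling division: a partial last block still counts as one block,
    # but it only occupies its actual size on disk (it is not padded).
    blocks = (file_size_bytes + block_size - 1) // block_size
    replicas = blocks * replication
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
blocks, replicas, raw = hdfs_footprint(300 * MB)
print(blocks, replicas, raw // MB)  # 3 9 900
```

On a single-node practice install, `dfs.replication` is usually set to 1. With `replication=1`, the raw footprint equals the file size.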

Quiz 1 (20 questions)

 

Hadoop Install

  • Hadoop Installation & HDFS Commands
  • Java-based MapReduce (2 examples)
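To preview what a MapReduce job does, here is the same map, shuffle, and reduce flow written as a word count in plain Python and run in memory on one machine. It sketches the model only. It is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    # Shuffle/sort: group every value emitted for the same key, sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["the cat sat", "The dog sat"]))
# {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

On a real cluster, many mappers and reducers run in parallel on different blocks of the input, and the shuffle moves data between machines. The three-stage shape stays the same.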

Hadoop 2.7 and Hadoop 3

Top 20 HDFS commands

Setting up Java for MapReduce

 

SQL and NoSQL

  • SQL, Hive and Pig Installation (RDBMS world and NoSQL world)
  • More Hive and Sqoop (Cloudera: first Sqoop and Hive on Cloudera; JDBC drivers; notes from Tutorialspoint. Ask Heena for the code.)
  • Pig
  • Impala
  • Intro to NoSQL, MongoDB, HBase Installation (planned)

 

Creating a database in SQL and running GROUP BY and JOIN queries
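The course uses MySQL. The sketch below uses Python's built-in `sqlite3` module instead, so it runs without a database server. The `CREATE TABLE`, `GROUP BY`, and `JOIN` statements are standard SQL that MySQL also accepts. The table names, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
""")

# GROUP BY: total order amount per customer id.
totals = cur.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# JOIN + GROUP BY: the same totals, labelled with customer names.
named_totals = cur.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(totals)        # [(1, 50.0), (2, 15.0)]
print(named_totals)  # [('Asha', 50.0), ('Ben', 15.0)]
conn.close()
```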

Understanding different databases

 

Hive:

  1. Hive Partitions and Bucketing
  2. Hive External and Internal Tables
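Conceptually, a partitioned Hive table stores each partition value in its own subdirectory named `column=value` under the table's directory. A bucketed table hashes the bucketing column into a fixed number of files. The sketch below models where a single row would land. Two parts of it are assumptions: `/user/hive/warehouse` is the usual default warehouse path, and `key % num_buckets` is a simplified stand-in for Hive's real bucketing hash, which differs between Hive versions. The function name is hypothetical.

```python
def hive_row_location(table, partition_col, partition_value, bucket_key, num_buckets,
                      warehouse="/user/hive/warehouse"):
    """Return (partition directory, bucket number) for one row.

    Simplified model: bucket_key is assumed to be an integer, and the plain
    modulo stands in for Hive's own bucketing hash function.
    """
    # Partitioning: each distinct partition value gets its own subdirectory.
    partition_dir = f"{warehouse}/{table}/{partition_col}={partition_value}"
    # Bucketing: the same key always maps to the same one of num_buckets files.
    bucket = bucket_key % num_buckets
    return partition_dir, bucket

print(hive_row_location("sales", "country", "IN", bucket_key=42, num_buckets=4))
# ('/user/hive/warehouse/sales/country=IN', 2)
```

For an external table, the partition directories sit under the table's `LOCATION` rather than the warehouse directory. Passing a different `warehouse` argument models that case. Dropping an external table leaves those files in place, while dropping an internal (managed) table deletes them.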

Spark, Scala, and Python

  • Spark Installation and Commands
  • Spark with Scala and Scala Sheets
  • Hadoop Streaming: MapReduce in Python
  • PySpark: R and Python basics, RDDs (Heena)
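Hadoop Streaming lets any program that reads lines from stdin and writes lines to stdout act as the mapper or the reducer. By default, each output line is split into key and value at the first tab, and the framework sorts the records by key between the two stages. The sketch below is a word count written that way. The function names and the `run` helper are hypothetical. In a real job, the mapper and reducer would typically be separate scripts passed to the hadoop-streaming jar with `-mapper` and `-reducer`.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Map stage: turn each raw text line into "word<TAB>1" records.
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"

def reducer(lines):
    # Reduce stage: Hadoop delivers records sorted by key, so identical
    # words arrive next to each other and groupby can sum each run.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run(stage, infile=None, outfile=None):
    # Connect a stage to stdin/stdout, which is all Hadoop Streaming expects.
    infile = sys.stdin if infile is None else infile
    outfile = sys.stdout if outfile is None else outfile
    for record in stage(infile):
        outfile.write(record + "\n")

# Local dry run: sorted() plays the role of Hadoop's shuffle-and-sort step.
mapped = sorted(mapper(["the cat sat", "the dog sat"]))
print(list(reducer(mapped)))  # ['cat\t1', 'dog\t1', 'sat\t2', 'the\t2']
```

A hypothetical `mapper.py` would just call `run(mapper)`, and `reducer.py` would call `run(reducer)`. You can test the pipeline without Hadoop as `cat input.txt | python mapper.py | sort | python reducer.py`.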

 

Running spark-shell and importing data from CSV files

PySpark: Working with RDDs

 

Midterm Exam

50 interview questions: research each one online and write a two-line answer.

 

Midterm Projects

  1. Pull a CSV file from online and load it into Hive using a Hive import
  2. Pull data into spark-shell and run MapReduce on the Fox News front page
  3. Create data in MySQL and move it to HDFS using Sqoop
  4. Using Jupyter (Anaconda) and a SparkContext, run a count on a file containing the Fox News front page
  5. Save raw data using comma, space, tab, and pipe delimiters, and load it into a SparkContext and spark-shell
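You can prototype Project 5 with Python's standard `csv` module before the files go into Spark. The sketch writes the same sample rows (made-up data) with each of the four delimiters and reads them back. With a space delimiter, any field that contains a space gets quoted, which is one reason tab or pipe is often preferred for free text. On the Spark side, the DataFrame CSV reader takes the delimiter through its `sep` option.

```python
import csv
import io

rows = [["id", "headline"], ["1", "Markets rally"], ["2", "Storm warning"]]
DELIMITERS = {"comma": ",", "space": " ", "tab": "\t", "pipe": "|"}

def to_delimited(rows, delimiter):
    # Write rows as text using the given single-character delimiter;
    # fields containing the delimiter are quoted automatically.
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def from_delimited(text, delimiter):
    # Parse the text back into rows, honouring any quoted fields.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

for name, delim in DELIMITERS.items():
    text = to_delimited(rows, delim)
    assert from_delimited(text, delim) == rows  # the round trip loses nothing
    print(name, repr(text.splitlines()[1]))
```

To keep the data as raw files for `spark-shell` or `SparkContext.textFile`, write each `text` value to its own file (for example `data.psv` for pipes) instead of printing it.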

Broadcasting Data – stream of data

Kafka Message Broadcasting

Flume

 

Intro to Cloudera Hadoop & Preparing for the Cloudera Certification

 

AWS

  • 12. AWS Intro: EC2, SSH access, and key pairs; Droplets on DigitalOcean; Oracle VM

 

This workshop is for anyone who wants to learn more about Big Data and the tools used to work with it.

Prerequisite*:

You need a Windows or Ubuntu computer with 8 GB of RAM and 200 GB of free disk space.

*Macs are not supported. However, spare laptops can be provided for the installation if requested in advance (at least a week before).

Agenda:

Installing Ubuntu, Hadoop, Python, and SQL; understanding MapReduce and various tools.

Part 1: (1 hour)

Oracle Virtual Box

Ubuntu Basics

Terminal Commands

nano and gedit editors

Environment Variables
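Hadoop's scripts need `JAVA_HOME` to point at the Java installation. Most Ubuntu install guides also set `HADOOP_HOME` and add its `bin` and `sbin` folders to `PATH`, usually with `export` lines in `~/.bashrc`. The sketch below checks an environment for those settings. The list of required variables is an assumption to adjust for your own setup, and `check_hadoop_env` is a hypothetical helper.

```python
import os

REQUIRED = ("JAVA_HOME", "HADOOP_HOME")  # assumption: adjust to your install

def check_hadoop_env(env=None):
    """Return (missing variable names, Hadoop folders not yet on PATH)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    path_entries = env.get("PATH", "").split(os.pathsep)
    wanted = []
    if env.get("HADOOP_HOME"):
        home = env["HADOOP_HOME"].rstrip("/")
        # bin holds hdfs/hadoop/yarn; sbin holds start-dfs.sh and similar scripts.
        wanted = [f"{home}/bin", f"{home}/sbin"]
    not_on_path = [d for d in wanted if d not in path_entries]
    return missing, not_on_path

missing, not_on_path = check_hadoop_env()
print("missing:", missing or "none")
print("add to PATH:", not_on_path or "nothing")
```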

Break: 1 hour

Part 2: (2 hours)

Hadoop Install and MapReduce

Introduction: MapReduce

Overview of Big Data and the Hadoop ecosystem

HDFS – Hadoop Distributed File System examples

The concept of MapReduce

MapReduce examples in Java

Break: 30 minutes

Part 3: (2 hours)

Learn about different tools and install:

Hive

Spark

Scala

Python Hadoop Streaming

Pig

Class notes: https://docs.google.com/presentation/d/1UHKVynVNaLDbMAFxQcu2LEC46BGDIZFj1TBKYdx8WuA/edit?usp=sharing

Class Quiz:

https://docs.google.com/document/d/1-5kkaP3l4WOw1V8RuChQYgm6184ygjKFMhiJrn1NGyE/edit?usp=sharing

More on Big Data, the Hadoop ecosystem, and MapReduce for Data Science enthusiasts.

Instructor: Shivgan Joshi