Big Data Hadoop Bootcamp for Non-Programmers
Introduction to the Hadoop Big Data Course
- Introduction to the Course
Top 50 Ubuntu commands
Understand NameNode, DataNode, YARN and Hadoop Infrastructure
Quiz 1: 20 questions
- Hadoop Installation & HDFS Commands
- Java-based MapReduce: 2 examples
Hadoop 2.7 and Hadoop 3
Top 20 HDFS commands
Setting up Java for MapReduce
SQL and NoSQL
- SQL, Hive and Pig Installation (RDBMS world and NoSQL world)
- More Hive and Sqoop on Cloudera: first Sqoop and Hive jobs, JDBC drivers (notes from Tutorials Point; code available from Heena)
- Intro to NoSQL, MongoDB, HBase installation (planned)
Creating a database in SQL and running GROUP BY and JOIN queries
Understanding different databases
- Hive Partitions and Bucketing
- Hive External and Internal Tables
Spark Scala Python
- Spark Installations and Commands
- Spark with Scala: Scala sheets
- Hadoop Streaming: MapReduce in Python
- PySpark (R and Python basics) and RDDs (Heena)
Running spark-shell and importing data from CSV files
PySpark: running RDDs
Mid Term Exam
50 interview questions: research each one online and write a two-line answer.
Mid Term Projects
- Pull data from an online CSV file and load it into Hive using Hive import
- Pull data into spark-shell and run MapReduce on the Fox News front page
- Create data in MySQL and move it to HDFS using Sqoop
- Using Jupyter (Anaconda) and SparkContext, run a count on a file containing the Fox News front page
- Save raw data using comma, space, tab, and pipe delimiters, then load it into SparkContext and spark-shell
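The delimiter project above can be rehearsed before touching Spark. The following is a minimal sketch that uses only Python's standard `csv` module (not Spark itself) to write the same rows with each of the four delimiters and read them back. The helper names `write_delimited` and `read_delimited` and the sample rows are illustrative assumptions, not part of the course material.

```python
import csv
import io

# The four delimiters named in the project description.
DELIMITERS = {"comma": ",", "space": " ", "tab": "\t", "pipe": "|"}


def write_delimited(rows, delimiter):
    """Serialize rows (lists of strings) into text using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_delimited(text, delimiter):
    """Parse delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical sample data, used only for illustration.
    rows = [["id", "name"], ["1", "hadoop"], ["2", "spark"]]
    for label, delim in DELIMITERS.items():
        text = write_delimited(rows, delim)
        # Each format should round-trip back to the original rows.
        assert read_delimited(text, delim) == rows
        print(label, repr(text))
```

Writing to an in-memory buffer keeps the sketch self-contained; in the project itself the text would be saved to files and then loaded from Spark.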
Broadcasting Data – stream of data
Kafka Message Broadcasting / Flume
Intro to Cloudera Hadoop & studying Cloudera Certification
- AWS intro: EC2, SSH, keys; Droplet on DigitalOcean; Oracle VM
This workshop is for anyone who wants to learn more about Big Data and the tools used to work with it.
You need a Windows or Ubuntu computer with 8 GB of RAM and 200 GB of free disk space.
*Macs are not supported, although extra laptops can be provided for installation on advance request (one week before).
Installing Ubuntu, Hadoop, Python, and SQL; understanding MapReduce and various tools.
Part 1: (1 hour)
Oracle Virtual Box
Nano and gedit editors
Part 2: (2 hours)
Hadoop installation and MapReduce
Overview of Big Data and the Hadoop ecosystem
HDFS – Hadoop Distributed File System examples
The concept of MapReduce
MapReduce examples in Java
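For readers who have not programmed before, the MapReduce concept listed above may be easier to see in a few lines of plain Python than in Java. The following is a minimal in-memory sketch of a word count, not the Java examples used in class and not Hadoop code. The function names and sample sentences are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into one result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["big data hadoop", "hadoop big data course", "data"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps run in parallel on many machines and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.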
Break: 30 minutes
Part 3: (2 hours)
Learn about different tools and install:
Python Hadoop Streaming
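Hadoop Streaming lets a mapper and a reducer be written as ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, with the framework sorting the mapper output by key before the reducer sees it. The following is a minimal sketch of that pattern for a word count. To stay runnable without a cluster, it simulates the sort step locally instead of reading `sys.stdin`. The function names and sample lines are assumptions, not course code.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Hypothetical sample input. In a real streaming job the mapper and
    # reducer would each loop over sys.stdin and print their output lines.
    sample = ["big data", "data tools", "big data hadoop"]
    mapped = sorted(mapper(sample))  # stands in for Hadoop's sort/shuffle
    for line in reducer(mapped):
        print(line)
```

The reducer relies on `itertools.groupby`, which only groups adjacent items, so it works precisely because the input arrives sorted by key.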
Class notes: https://docs.google.com/presentation/d/1UHKVynVNaLDbMAFxQcu2LEC46BGDIZFj1TBKYdx8WuA/edit?usp=sharing
More on Big Data, the Hadoop ecosystem, and MapReduce for Data Science enthusiasts.
Instructor: Shivgan Joshi