Course Overview

Course Duration: 5 Days

This course provides practical, foundation-level training that enables immediate and effective participation in big data and other analytics projects. It establishes a baseline of skills that can be further enhanced with additional training and real-world experience. The course provides an introduction to big data and a Data Analytics Lifecycle process for addressing business challenges that leverage big data. It provides grounding in basic and advanced analytic methods and an introduction to big data analytics technology and tools, including MapReduce and Hadoop. Extensive labs throughout the course provide practical opportunities to apply these methods and tools to real-world business challenges. In a final lab, students address a big data analytics challenge by applying the concepts taught in the course in the context of the Data Analytics Lifecycle. The course prepares the student for the Proven™ Professional Data Scientist Associate (EMCDSA) certification exam.


  • Immediately participate and contribute as a Data Science Team Member on big data and other analytics projects by:
      • Deploying the Data Analytics Lifecycle to address big data analytics projects
      • Reframing a business challenge as an analytics challenge
      • Applying appropriate analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
      • Selecting appropriate data visualizations to clearly communicate analytic insights to business sponsors and analytic audiences
      • Using tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions
  • Explain how advanced analytics can be leveraged to create competitive advantage and how the data scientist role and skills differ from those of a traditional business intelligence analyst

Who Should Attend

This course benefits managers of business intelligence, analytics, and big data professionals; data and database professionals adding big data analytics to their skills; and recent college graduates and graduate students in related disciplines looking to move into data science.




Prerequisites

  • A strong quantitative background with a solid understanding of basic statistics, as would be found in a Statistics 101 level course
  • Experience with a scripting language, such as Java, Perl, or Python (or R). Many of the lab examples taught in the course use R (through the RStudio environment), an open source statistical tool and programming language
  • Experience with SQL


Course Objectives

You will learn basic and advanced analytic methods and get an introduction to the Data Analytics Lifecycle for addressing business challenges that leverage big data, as well as to big data analytics technology and tools, including MapReduce and Hadoop.

Course Content

Module 1: Introduction to Big Data Analytics

  • Big Data Overview
  • State of the Practice in Analytics
  • The Data Scientist
  • Big Data Analytics in Industry Verticals

Module 2: Data Analytics Lifecycle

  • Discovery
  • Data Preparation
  • Model Planning
  • Model Building
  • Communicating Results
  • Operationalizing

Module 3: Review of Basic Data Analytic Methods Using R

  • Using R to Look at Data – Introduction to R
  • Analyzing and Exploring the Data
  • Statistics for Model Building and Evaluation

Module 4: Advanced Analytics – Theory and Methods

  • K-means Clustering
  • Association Rules
  • Linear and Logistic Regression
  • Naïve Bayesian Classifier
  • Decision Trees
  • Time Series Analysis
  • Text Analysis

Module 5: Advanced Analytics – Technologies and Tools

  • Analytics for Unstructured Data – MapReduce and Hadoop
  • The Hadoop Ecosystem
  • In-database Analytics – SQL Essentials
  • Advanced SQL and MADlib for In-database Analytics

Module 6: The Endgame, or Putting It All Together

  • Operationalizing an Analytics Project
  • Creating the Final Deliverables
  • Data Visualization Techniques
  • Final Lab Exercise on Big Data Analytics