Course Code: 5625

20776 Engineering Data with Microsoft Cloud Services

Class Dates:
1/21/2019
3/18/2019
Length:
5 Days
Cost:
$2495.00
Class Time:
Technology:
Microsoft
Delivery:

Overview

  • Course Overview
  • The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.
  • Audience
  • The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

Prerequisites

  • In addition to their professional experience, students who attend this course should have:
    Programming experience using R, and familiarity with common R packages.
    Knowledge of common statistical methods and data analysis best practices.
    Basic knowledge of the Microsoft Windows operating system and its core functionality.
    Working knowledge of relational databases.
  • Recommended Courses:

Course Details

  • After completing this course, students will be able to:
  • Deploy HDInsight clusters.
  • Authorize users to access resources.
  • Load data into HDInsight.
  • Troubleshoot HDInsight.
  • Implement batch solutions.
  • Design batch ETL solutions for big data with Spark.
  • Analyze data with Spark SQL.
  • Analyze data with Hive and Phoenix.
  • Describe Stream Analytics.
  • Implement Spark Streaming using the DStream API.
  • Develop big data real-time processing solutions with Apache Storm.
  • Build solutions that use Kafka and HBase.
  • Module 1: Getting Started with HDInsight
  • This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
  • What is Big Data?
  • Introduction to Hadoop
  • Working with the MapReduce function
  • Introducing HDInsight
  • Lab : Working with HDInsight
  • Provision an HDInsight cluster and run MapReduce jobs
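As a taste of the MapReduce paradigm this module introduces, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases applied to a word count (the classic introductory job). The sample lines are invented; a real HDInsight job would distribute these same phases across the cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)

# Invented sample input standing in for files in cluster storage.
lines = ["big data on HDInsight", "data engineering with big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["data"])  # 3
```

On HDInsight, the map and reduce functions are the parts you write; partitioning, shuffling, and fault tolerance are handled by the framework.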
  • Module 2: Deploying HDInsight Clusters
  • Identifying HDInsight cluster types
  • Managing HDInsight clusters by using the Azure portal
  • Managing HDInsight Clusters by using Azure PowerShell
  • Lab : Managing HDInsight clusters with the Azure Portal
  • Create an HDInsight cluster that uses Data Lake Store storage
  • Customize HDInsight by using script actions
  • Delete an HDInsight cluster
  • Module 3: Authorizing Users to Access Resources
  • Non-domain-joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters
  • Lab : Authorizing Users to Access Resources
  • Prepare the Lab Environment
  • Manage a non-domain joined cluster
  • Module 4: Loading data into HDInsight
  • Storing data for HDInsight processing
  • Using data loading tools
  • Maximizing value from stored data
  • Lab : Loading Data into your Azure account
  • Load data for use with HDInsight
  • Module 5: Troubleshooting HDInsight
  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations management suite
  • Lab : Troubleshooting HDInsight
  • Module 6: Implementing Batch Solutions
  • Apache Hive storage
  • HDInsight data queries using Hive and Pig
  • Operationalize HDInsight
  • Lab : Implement Batch Solutions
  • Deploy HDInsight cluster and data storage
  • Use data transfers with HDInsight clusters
  • Query HDInsight cluster data
  • Module 7: Design Batch ETL solutions for big data with Spark
  • What is Spark?
  • ETL with Spark
  • Spark performance
  • Lab : Design Batch ETL solutions for big data with Spark
  • Create an HDInsight cluster with access to Data Lake Store
  • Use an HDInsight Spark cluster to analyze data in Data Lake Store
  • Analyze website logs using a custom library with an Apache Spark cluster on HDInsight
  • Manage resources for an Apache Spark cluster on Azure HDInsight
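The extract-transform-load pattern this module covers can be sketched in plain Python. The log lines and field names below are invented examples; in Spark each stage would be a DataFrame transformation (roughly `spark.read.csv`, `filter`, and `write`) running across the cluster rather than over an in-memory list.

```python
# Invented web-server log lines standing in for raw files in storage.
raw_logs = [
    "2019-01-21,GET,/index.html,200",
    "2019-01-21,GET,/missing,404",
    "2019-01-22,POST,/api/orders,200",
]

# Extract: parse each CSV line into a record (stand-in for spark.read.csv).
fields = ["date", "method", "path", "status"]
records = [dict(zip(fields, line.split(","))) for line in raw_logs]

# Transform: keep only successful requests (stand-in for df.filter).
successes = [r for r in records if r["status"] == "200"]

# Load: write the cleaned records to a target (stand-in for df.write).
loaded = {r["path"]: r["date"] for r in successes}
print(len(successes))  # 2
```

The value of Spark here is that each stage runs in parallel over partitions of the data, so the same three-stage shape scales from a list to terabytes.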
  • Module 8: Analyze Data with Spark SQL
  • Implementing iterative and interactive queries
  • Perform exploratory data analysis
  • Lab : Performing exploratory data analysis by using iterative and interactive queries
  • Build a machine learning application
  • Use Zeppelin for interactive data analysis
  • View and manage Spark sessions by using Livy
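To illustrate the kind of iterative SQL exploration Spark SQL enables, the sketch below runs an exploratory aggregate using Python's built-in sqlite3 as a stand-in for a Spark temporary view. The table and data are invented; the point is the interactive query pattern, not Spark's API.

```python
import sqlite3

# In Spark SQL you would register a DataFrame as a temporary view and
# query it with SQL; sqlite3 stands in here on invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (page TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("/home", 120), ("/docs", 45), ("/home", 30)],
)

# An exploratory aggregate, the kind of query you would iterate on
# interactively in a Zeppelin or Jupyter notebook session.
rows = conn.execute(
    "SELECT page, SUM(hits) FROM visits GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('/docs', 45), ('/home', 150)]
```

In a notebook against Spark SQL, you would refine this query repeatedly against the same cached view, which is what makes iterative analysis cheap.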
  • Module 9: Analyze Data with Hive and Phoenix
  • Implement interactive queries for big data with Interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
  • Lab : Analyze data with Hive and Phoenix
  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
  • Module 10: Stream Analytics
  • Stream Analytics
  • Process streaming data from Stream Analytics
  • Managing Stream Analytics jobs
  • Lab : Implement Stream Analytics
  • Process streaming data with Stream Analytics
  • Manage Stream Analytics jobs
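The core of many Stream Analytics queries is a windowed aggregate. As a rough illustration, this pure-Python sketch counts simulated sensor events per 10-second tumbling window, analogous to a `GROUP BY TumblingWindow(second, 10)` clause in a Stream Analytics query (the event data is invented).

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping
    windows and count events per window - the behavior of a tumbling
    window aggregate in a streaming query."""
    counts = defaultdict(int)
    for timestamp, _value in events:
        # Each event belongs to exactly one window, keyed by its start.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

# Invented sensor events: (seconds since start, temperature reading).
events = [(1, 20.5), (4, 21.0), (9, 20.9), (12, 22.1), (19, 21.7), (25, 20.2)]
print(tumbling_window_counts(events, 10))  # {0: 3, 10: 2, 20: 1}
```

A real Stream Analytics job expresses this declaratively in SQL and handles late and out-of-order events; the batch version above only shows the windowing arithmetic.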
  • Module 11: Implementing Streaming Solutions with Kafka and HBase
  • Building and Deploying a Kafka Cluster
  • Publishing, Consuming, and Processing data using the Kafka Cluster
  • Using HBase to store and Query Data
  • Lab : Implementing Streaming Solutions with Kafka and HBase
  • Create a virtual network and gateway
  • Create a Storm cluster for Kafka
  • Create a Kafka producer
  • Create a streaming processor client topology
  • Create a Power BI dashboard and streaming dataset
  • Create an HBase cluster
  • Create a streaming processor to write to HBase
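The publish/consume flow at the heart of a Kafka pipeline can be sketched without a broker. Here an in-process Python queue stands in for a topic, with one producer thread and one consumer thread; the names and messages are illustrative and this is not the Kafka client API.

```python
import queue
import threading

# An in-process FIFO queue standing in for a Kafka topic.
topic = queue.Queue()

def producer():
    """Publish five messages, then a sentinel marking end of stream."""
    for i in range(5):
        topic.put(f"event-{i}")   # analogous to a producer send()
    topic.put(None)

consumed = []

def consumer():
    """Pull messages off the topic until the sentinel arrives."""
    while True:
        message = topic.get()     # analogous to a consumer poll()
        if message is None:
            break
        consumed.append(message)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(consumed))  # 5
```

What Kafka adds beyond this picture is a durable, partitioned log, so consumers can lag, replay, and scale out independently of producers.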
  • Module 12: Develop big data real-time processing solutions with Apache Storm
  • Persist long term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm
  • Lab : Developing big data real-time processing solutions with Apache Storm
  • Stream data with Storm
  • Create Storm Topologies
  • Module 13: Create Spark Streaming Applications
  • Working with Spark Streaming
  • Creating Spark Structured Streaming Applications
  • Persistence and Visualization
  • Lab : Building a Spark Streaming Application
  • Installing Required Software
  • Building the Azure Infrastructure
  • Building a Spark Streaming Pipeline