Sunday, August 30, 2020

Azure Data Engineer Tutorial Content, Databricks Training Content

 

Azure Data Engineer + Databricks Developer

 

Azure Data Engineer


1)      Overview of the Microsoft Azure Platform

A.      Introduction to Azure

B.      Basics of Cloud computing

C.      Azure Infrastructure

D.      Walkthrough of Azure Portal

E.       Overview of Azure Services

2)      Azure Data Architecture

A.      Traditional RDBMS Workloads

B.      Data Warehousing Approach

C.      Big Data Architectures

D.      Transferring data to and from Azure

3)      Azure Storage options

A.      Blob Storage

B.      ADLS Gen1 & Gen2

C.      RDBMS

D.      Hadoop

E.       NoSQL

F.       Disk

4)      Blob Storage

A.      Azure Blob Resources

B.      Types of Blobs in Azure

C.      Azure storage account data objects

D.      Azure storage account types and Options

E.       Replication and Data Distribution

F.       Secure access to an application's data

G.     Azure Import/Export service

H.      Storage Explorer

I.        Practical section on Blob Storage

5)      Azure Data Factory

A.      Azure Data Factory Architecture

B.      Creating an ADF Resource and Using It in the Azure Cloud

C.      Pipeline Creation and Usage Options

D.      Using the Copy Data Tool in the ADF Portal

E.       Linked Service Creation in ADF

F.       Dataset Creation, Connection Reuse

G.     Staging Dataset with Azure Storage

H.      ADF Pipeline Deployments

I.        Pipeline Orchestration using Triggers

J.        ADF Transformations and Integration with Other Tools

K.      Processing Different File Types Using ADF

L.       Integration Runtime

M.    Monitoring ADF Jobs

N.     Managing IRs and Linked Services

6)      Azure SQL Database Service

A.      Introduction to Azure SQL Database

B.      Relational Data Services in the Cloud

C.      Azure SQL Database Service Tiers

D.      Database Throughput Units (DTU)

E.       Scalable performance and pools

F.       Creating and Managing SQL Databases

G.     Azure SQL Database Tools

H.      Migrating data to Azure SQL Database

7)      Azure Data Lake Gen1 & Gen2

A.      Explore the Azure Data Lake enterprise-class security features.

B.      Understand storage account keys.

C.      Understand shared access signatures.

D.      Understand transport-level encryption with HTTPS.

E.       Understand Advanced Threat Protection.

F.       Control network access.

G.     Differences between Gen1 & Gen2

8)      Azure HDInsight Cluster

A.      Creating an HDInsight Cluster

B.      Understanding HDInsight Architecture

C.      Using Spark in HDInsight

D.      Using Hadoop in HDInsight

E.       Understanding Ambari Views

F.       Pricing Structure and Calculations

G.     Monitoring and Management

 

Azure Databricks Concepts



1)      Azure Databricks Introduction

A.      Databricks Architecture

B.      Databricks Components overview

C.      Benefits for data engineers and data scientists

 

2)      Azure Databricks concepts

A.      Workspace – creating and managing workspaces.

B.      Notebook – creating notebooks, calling and managing different notebooks.

C.      Library – installing and managing libraries.

D.      Experiment – ML experiments and use of dependency libraries.

 

3)      Data Management

A.      Databricks File System (DBFS) - using DBFS commands to copy and manage files.

B.      Database - creating databases and tables, and managing them.

C.      Table - creating tables, dropping tables, loading data.

D.      Metastore - managing metadata; creating and managing Delta tables.

 

4)      Computation Management

A.      Cluster - creating and managing clusters.

B.      Pool - creating pools and using pools for autoscaling.

C.      Databricks Runtime - understanding and choosing Databricks runtimes based on requirements.

D.      Jobs - creating jobs from notebooks and assigning types of clusters for jobs.

E.       Workload - monitoring jobs and managing loads.

F.       Execution Context – understanding context.

 

5)      Security

A.      User – creating users.

B.      Group – creating groups.

C.      Managing Access – managing access for users and groups.

 

DELTA LAKE

 


 

1)      Delta Lake usage in Databricks.

A.      Delta Lake Architecture

B.      Delta Lake Storage Understanding

C.      Delta Lake Table Creation

D.      Using Delta Lake DML Operations

E.       Delta Lake Snapshots

 

PySpark Content



•  Introduction to the Basics of Python

•  How to Use Jupyter Notebooks for Python Development

•  Installing Python & Spark on a Local System for Development

•  Sequence and File Operations

•  Functions, Sorting, Errors and Exceptions, Regular Expressions, and Packages

•  Introduction to Big Data and Apache Spark

•  Python for Spark

•  Python for Spark: Functional and Object-Oriented Model

•  Apache Spark Framework and RDDs

•  PySpark SQL and DataFrames

•  Need for Spark SQL

•  What Is Spark SQL

•  Spark SQL Architecture

•  SQL Context in Spark SQL

•  User-Defined Functions

•  DataFrames

•  Interoperating with RDDs

•  Loading Data through Different Sources

•  RDD Transformations & Actions

•  DataFrame Transformations & Actions

•  Performance Tuning

•  Spark-Hive Integration

 

 
