Azure Data Engineer + Databricks Developer
Azure Data Engineer
1) Overview of the Microsoft Azure Platform
A. Introduction to Azure
B. Basics of Cloud Computing
C. Azure Infrastructure
D. Walkthrough of Azure Portal
E. Overview of Azure Services
2) Azure Data Architecture
A. Traditional RDBMS Workloads
B. Data Warehousing Approach
C. Big Data Architectures
D. Transferring Data to and from Azure
3) Azure Storage Options
A. Blob Storage
B. ADLS Gen1 & Gen2
C. RDBMS
D. Hadoop
E. NoSQL
F. Disk
4) Blob Storage
A. Azure Blob Resources
B. Types of Blobs in Azure
C. Azure Storage Account Data Objects
D. Azure Storage Account Types and Options
E. Replication and Distribution
F. Secure Access to an Application's Data
G. Azure Import/Export Service
H. Storage Explorer
I. Practical Section on Blob Storage
5) Azure Data Factory
A. Azure Data Factory Architecture
B. Creating an ADF Resource and Using It in the Azure Cloud
C. Pipeline Creation and Usage Options
D. Using the Copy Data Tool in the ADF Portal
E. Linked Service Creation in ADF
F. Dataset Creation and Connection Reuse
G. Staging Datasets with Azure Storage
H. ADF Pipeline Deployments
I. Pipeline Orchestration Using Triggers
J. ADF Transformations and Integration with Other Tools
K. Processing Different File Types Using ADF
L. Integration Runtime
M. Monitoring ADF Jobs
N. Managing IRs and Linked Services
6) Azure SQL Database Service
A. Introduction to Azure SQL Database
B. Relational Data Services in the Cloud
C. Azure SQL Database Service Tiers
D. Database Transaction Units (DTU)
E. Scalable Performance and Pools
F. Creating and Managing SQL Databases
G. Azure SQL Database Tools
H. Migrating Data to Azure SQL Database
7) Azure Data Lake Gen1 & Gen2
A. Explore the Azure Data Lake enterprise-class security features.
B. Understand storage account keys.
C. Understand shared access signatures.
D. Understand transport-level encryption with HTTPS.
E. Understand Advanced Threat Protection.
F. Control network access.
G. Differences between Gen1 & Gen2
8) Azure HDInsight Cluster
A. Creating an HDInsight Cluster
B. Understanding HDInsight Architecture
C. Using Spark in HDInsight
D. Using Hadoop in HDInsight
E. Understanding Ambari Views
F. Pricing Structure and Calculations
G. Monitoring and Management
Azure Databricks Concepts
1) Azure Databricks Introduction
A. Databricks Architecture
B. Databricks Components Overview
C. Benefits for Data Engineers and Data Scientists
2) Azure Databricks Concepts
A. Workspace - creating and managing workspaces.
B. Notebook - creating notebooks, calling and managing different notebooks.
C. Library - installing and managing libraries.
D. Experiment - ML and dependency library usage.
3) Data Management
A. Databricks File System - DBFS commands; copying and managing files using DBFS.
B. Database - creating databases and tables; managing databases and tables.
C. Table - creating tables, dropping tables, loading data.
D. Metastore - managing metadata; creating and managing Delta tables.
4) Computation Management
A. Cluster - creating and managing clusters.
B. Pool - creating pools and using pools for autoscaling.
C. Databricks Runtime - understanding and choosing Databricks runtimes based on requirements.
D. Jobs - creating jobs from notebooks and assigning cluster types to jobs.
E. Workload - monitoring jobs and managing loads.
F. Execution Context - understanding execution contexts.
5) Security
A. User - creating users.
B. Group - creating groups.
C. Managing Access - managing access for users and groups.
DELTA LAKE
1) Delta Lake Usage in Databricks
A. Delta Lake Architecture
B. Understanding Delta Lake Storage
C. Delta Lake Table Creation
D. Delta Lake DML Operations
E. Delta Lake Snapshots
PySpark Content
- Introduction to the Basics of Python
- How to Use Jupyter & Notebooks for Python Development
- Installing Python & Spark on a Local System for Development
- Sequence and File Operations
- Functions, Sorting, Errors and Exceptions, Regular Expressions, and Packages
- Introduction to Big Data and Apache Spark
- Python for Spark
- Python for Spark: Functional and Object-Oriented Model
- Apache Spark Framework and RDDs
- PySpark SQL and DataFrames
- Need for Spark SQL
- What Is Spark SQL
- Spark SQL Architecture
- SQLContext in Spark SQL
- User-Defined Functions
- DataFrames
- Interoperating with RDDs
- Loading Data from Different Sources
- RDD Transformations & Actions
- DataFrame Transformations & Actions
- Performance Tuning
- Spark-Hive Integration



