Implementing an Azure Data Solution
- Course prerequisites
- A good working knowledge of Azure services (AZ-900 level) is recommended
- Get your e-books
- Sign up with the Azure portal
Hello and welcome to DP-200: Implementing an Azure Data Solution. The focus of this learning path is to prepare you for Microsoft’s DP-200 exam. If you pass the DP-200 and DP-201 exams, then you’ll earn the Microsoft Certified Azure Data Engineer Associate certification. The provided resources should help you with your first steps in each area and can be used as an initial learning path.
Who is it for?
- Developers, Data Engineers, Cloud Architects
You can find all Lab Files and Instructions here.
Download as ZIP.
Please go through this Case Study to complete the Labs.
The first and last labs of the course are group exercises that involve discussion, giving context for the labs the students take in between. The last lab gives the students the opportunity to reflect on what they have achieved and what they overcame in delivering the case study requirements. The remaining labs are hands-on and implement Azure data platform capabilities to meet the AdventureWorks business requirements.
The following is a summary of the lab objectives for each module:
The students will take the information gained in the lessons and from the case study to scope out the deliverables for a digital transformation project within AdventureWorks. They will first identify how the evolving use of data has presented new opportunities for the organization. The students will also explore which Azure Data Platform services can be used to address the business needs and define the tasks that will be performed by the data engineer. Finally, students will finalize the data engineering deliverables for AdventureWorks.
In this lab, the students will be able to determine the appropriate storage type to implement against a given set of business and technical requirements. They will be able to create Azure Storage accounts and a Data Lake Storage account and explain the difference between Data Lake Storage Gen1 and Gen2. They will also be able to demonstrate how to perform data loads into the data storage of choice.
By the end of this lab, the student will be able to explain why Azure Databricks can be used to help in data science projects. The students will provision an Azure Databricks instance and then create a workspace that will be used to perform a simple data preparation task on data from a Data Lake Storage Gen2 store. Finally, the student will walk through performing transformations using Azure Databricks.
The students will be able to describe and demonstrate the capabilities that Azure Cosmos DB can bring to an organization. They will be able to create a Cosmos DB instance and show how to upload and query data through the portal and through a .NET application. They will then be able to demonstrate how to enable global scale for the Cosmos DB database.
The students will be able to provision an Azure SQL Database and Azure Synapse Analytics and issue queries against one of the instances that are created. They will also be able to integrate Azure Synapse Analytics with a number of other data platform technologies and use PolyBase to load data from one data source into a data warehouse.
The students will be able to describe what data streams are and how event processing works and choose an appropriate data stream ingestion technology for the AdventureWorks case study. They will provision the chosen ingestion technology and integrate this with Stream Analytics to create a solution that works with streaming data.
In this module, students will learn how Azure Data Factory can be used to orchestrate data movement from a wide range of data platform technologies. They will be able to explain the capabilities of the technology and set up an end-to-end data pipeline that ingests data from SQL Database and loads it into SQL Data Warehouse. The student will also demonstrate how to call a compute resource.
The students will be able to describe and document the different approaches to security that can be taken to provide defence in depth. This will involve the student documenting the security that has been set up so far in the course. It will also enable the students to identify any gaps in security that may exist for AdventureWorks.
The students will be able to define a broad monitoring solution that can help them monitor issues that can occur in their data estate. The student will then experience common data storage and data processing issues that can occur in a cloud data solution. Finally, they will implement a disaster recovery approach for a data platform technology.
DP-200 Presentation Slides Extra
Azure Storage Explorer
- If you’re using your own local machine for the labs, you can download Storage Explorer to manage your cloud storage resources
SQL Server Management Studio (SSMS)
Azure Synapse Analytics: the Next Evolution of SQL Data Warehouse
Azure Stream Analytics Lab Demo
Summary Mind Map
Git Locations for Demo
- Exam demo Questions can be found here!
Disclaimer: Please note that the link above for sample questions points to a third-party site with which I have no affiliation.
Create Data Lake Storage and Ingest Data to Data Lake:
- Using azcopy
- Create Storage Account from Create a Resource
- On the Advanced tab, enable Hierarchical namespace to create a Data Lake Storage Gen2 account.
- Once the Data Lake Storage account is created, go to Access Control (IAM) > Role assignments > Add role assignment
- Select the Storage Blob Data Contributor role, select your AAD user, and then select Save
- Open Cloud Shell from the Azure portal (use Bash)
- Follow the step-by-step guideline to ingest the data into Data Lake Storage.
- Using Azcopy.exe
- Download AzCopy. Go here and download the latest version, which is V10. Copying to Data Lake Storage Gen2 is supported from V10 onwards.
- Open a command prompt and navigate to the location where AzCopy.exe is saved.
- Add yourself as a ‘Storage Blob Data Contributor’ by navigating to Access Control (IAM)
- Go back to the command prompt and type ‘azcopy login’ in the folder where azcopy.exe is stored. This step gives you a code that needs to be entered at https://microsoft.com/devicelogin
- After a successful login, use a command of the following form to upload a folder or files into Azure Data Lake Storage:
azcopy copy '<local-folder-path>' 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>' --recursive
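The upload command above can also be assembled programmatically, which helps avoid quoting mistakes. The following is a minimal Python sketch, not the course's own tooling: the function name `build_azcopy_upload` and the values `./Data`, `mydatalake`, and `labfiles` are hypothetical, and it assumes the Gen2 `dfs` endpoint is the intended destination.

```python
from typing import List


def build_azcopy_upload(local_folder: str, account: str, container: str) -> List[str]:
    """Return the argument list for a recursive AzCopy v10 upload.

    Assumption: the destination is a Data Lake Storage Gen2 account,
    addressed through its dfs endpoint.
    """
    destination = f"https://{account}.dfs.core.windows.net/{container}"
    return ["azcopy", "copy", local_folder, destination, "--recursive"]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    command = build_azcopy_upload("./Data", "mydatalake", "labfiles")
    print(" ".join(command))
    # To actually run it (requires azcopy on PATH and a prior `azcopy login`):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` rather than a single string sidesteps shell quoting differences between Windows and Bash.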
- Using Azure Storage Explorer
- Download and install Azure Storage Explorer
- Start by adding your Azure account.
- There are multiple options for connecting to your storage account. Sign in with your Azure account to provide access to all your subscriptions
- Once you sign in, you can select the subscriptions you want to work with. Make sure to select the one in which you created the Azure Storage account.
- Select the New Folder button from the menu running across the top
- Create a folder named “Data”
- In the top menu, select Upload. This gives you the option to upload a folder or a file.
- Select Upload Files to Upload your desired files.
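The portal steps earlier in this section (creating a storage account with Hierarchical namespace enabled, then assigning the Storage Blob Data Contributor role) could also be scripted with the Azure CLI from Cloud Shell. The sketch below only builds the command lines and does not run them. The helper names and every value (`mydatalake`, `dp200-rg`, `westeurope`, the subscription ID, and the user) are hypothetical, and the `Standard_LRS` SKU is an assumption rather than something the course specifies.

```python
from typing import List


def create_gen2_account_cmd(account: str, resource_group: str, location: str) -> List[str]:
    """Build an `az storage account create` call with hierarchical namespace on."""
    return [
        "az", "storage", "account", "create",
        "--name", account,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", "Standard_LRS",          # assumed SKU; pick what your lab requires
        "--kind", "StorageV2",
        "--enable-hierarchical-namespace", "true",  # makes this a Data Lake Storage Gen2 account
    ]


def assign_blob_contributor_cmd(user: str, subscription_id: str,
                                resource_group: str, account: str) -> List[str]:
    """Build an `az role assignment create` call scoped to the storage account."""
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )
    return [
        "az", "role", "assignment", "create",
        "--assignee", user,
        "--role", "Storage Blob Data Contributor",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(create_gen2_account_cmd("mydatalake", "dp200-rg", "westeurope")))
    print(assign_blob_contributor_cmd(
        "user@example.com", "00000000-0000-0000-0000-000000000000",
        "dp200-rg", "mydatalake"))
```

Scoping the role assignment to the single storage account, rather than the resource group or subscription, keeps the permission as narrow as the lab needs.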
Data Lake Storage Gen1 to Data Lake Storage Gen2 (using Azure Data Factory)
Azure Data Factory is a cloud-based data integration service that creates workflows in the cloud for orchestrating batch data movement and transformations. Using Azure Data Factory, you can create and schedule workflows (called pipelines) to ingest data from disparate data stores. The data can then be processed and transformed with services such as:
- Azure HDInsight Hadoop
- Azure Data Lake
- Azure Machine Learning
There are many data orchestration tasks that can be conducted using Azure Data Factory. In this exercise, we’ll copy data from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2.
- Create an Azure Data Factory instance
- Create a Data Lake Storage Gen1 account
- Upload a file into the Data Lake Storage Gen1 account
- Set permissions on the Data Lake Storage Gen1 account
- You need to set permissions so that the Azure Data Factory instance can access the data in your Data Lake Storage Gen1 account
- From Data Lake Storage Gen1 > Access control (IAM) > + Add role assignment > select Owner > under Select, type the Azure Data Factory instance name > Save
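Once the permissions are in place, the copy itself is expressed in Data Factory as a pipeline containing a Copy activity. The sketch below builds one plausible pipeline definition as a Python dictionary and prints it as JSON. It is an illustration under assumptions, not the lab's actual pipeline: the pipeline, activity, and dataset names are hypothetical, the two datasets are assumed to exist already in the factory, and it uses the legacy `AzureDataLakeStoreSource` and `AzureBlobFSSink` copy types.

```python
import json
from typing import Any, Dict


def build_gen1_to_gen2_pipeline(source_dataset: str, sink_dataset: str) -> Dict[str, Any]:
    """Return a pipeline definition with one Copy activity (Gen1 -> Gen2).

    Assumption: both dataset names refer to datasets already defined
    in the Data Factory instance.
    """
    copy_activity = {
        "name": "CopyGen1ToGen2",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Read recursively from Data Lake Storage Gen1.
            "source": {"type": "AzureDataLakeStoreSource", "recursive": True},
            # Write to Data Lake Storage Gen2.
            "sink": {"type": "AzureBlobFSSink"},
        },
    }
    return {
        "name": "Gen1ToGen2Pipeline",  # hypothetical pipeline name
        "properties": {"activities": [copy_activity]},
    }


if __name__ == "__main__":
    # Hypothetical dataset names, for illustration only.
    pipeline = build_gen1_to_gen2_pipeline("Gen1SourceDataset", "Gen2SinkDataset")
    print(json.dumps(pipeline, indent=2))
```

In the portal, the Copy Data tool generates an equivalent definition for you; seeing the JSON form mainly helps when reviewing or versioning the pipeline.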
Stream Analytics Resources
- Lab Resources for Stream Analytics can be found here https://github.com/arifmarias/AzureStreamAnalytics