Microsoft DP-900: Azure Data Fundamentals Study Notes
Data Types & Formats
Structured, semi-structured, and unstructured data
Domain Weight
Core Data Concepts accounts for 25 to 30% of the DP-900 exam.
Data Classification
| Type | Description | Format Examples | Azure Store Examples |
| --- | --- | --- | --- |
| Structured | Data with a defined schema and fixed format | Relational tables, CSV | Azure SQL Database, Azure Synapse |
| Semi-structured | Data with a flexible schema: some structure, but variable fields | JSON, XML, YAML | Azure Cosmos DB, Blob Storage |
| Unstructured | Data with no predefined schema | Images, video, audio, documents | Azure Blob Storage, Azure Data Lake |
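Example (illustrative only): a minimal Python sketch of the difference between structured and semi-structured data, using only the standard library. The sample records and variable names are made up for this note; they are not taken from the exam or from any Azure service.

```python
import csv
import io
import json

# Structured: every row follows the same fixed set of columns.
# (Sample data is hypothetical.)
csv_text = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: each record describes itself, and fields may vary per record.
json_text = (
    '[{"id": 1, "name": "Ana", "phones": ["555-0100"]},'
    ' {"id": 2, "name": "Ben", "city": "Oslo"}]'
)
docs = json.loads(json_text)

# Compare the set of field names from record to record.
csv_fields = [set(r.keys()) for r in rows]
json_fields = [set(d.keys()) for d in docs]

same_schema_csv = all(f == csv_fields[0] for f in csv_fields)
same_schema_json = all(f == json_fields[0] for f in json_fields)

print(same_schema_csv)   # True: all CSV rows share one schema
print(same_schema_json)  # False: the JSON documents carry different fields
```

Unstructured data (images, audio, free-form documents) has no field names to compare at all, which is why it typically lands in blob or data lake storage.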
File Formats
CSV
Comma-separated values. Simple, human-readable. Good for tabular data exchange.
JSON
JavaScript Object Notation. Flexible, hierarchical. Common for APIs and NoSQL stores.
Parquet
Columnar format. Highly compressed, efficient for analytics. Common in big data.
Avro
Row-based format with schema embedded. Good for data streaming and Kafka.
ORC
Optimized for Hive. Columnar with advanced compression. Less common than Parquet.
Delta
Parquet + transaction log. Used in Delta Lake for ACID transactions on big data.
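Example (illustrative only): a rough sketch of why columnar formats such as Parquet and ORC suit analytics, while row-based formats such as Avro suit record-at-a-time streaming. It uses plain Python lists and dictionaries as stand-ins for the on-disk layouts; real file formats add encoding, compression, and metadata that are not modeled here. The order records are hypothetical.

```python
# Hypothetical order records used only for illustration.
records = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Row-oriented layout: one complete record stored after another.
row_layout = [tuple(r.values()) for r in records]

# Column-oriented layout: all values of one field stored together.
column_layout = {key: [r[key] for r in records] for key in records[0]}


def total_from_rows(rows):
    # Has to walk every full record, even though only one field is needed.
    return sum(row[2] for row in rows)


def total_from_columns(columns):
    # Reads just the "amount" column and ignores the others.
    return sum(columns["amount"])


print(total_from_rows(row_layout))        # 250.0
print(total_from_columns(column_layout))  # 250.0
```

Both layouts give the same answer; the columnar one simply reads less data for an aggregate over a single field, and values of the same type stored together tend to compress better.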
Batch vs Streaming Workloads
Transactional (OLTP) vs Analytical (OLAP) processing
Processing Models
| Aspect | Batch Processing | Stream Processing |
| --- | --- | --- |
| When data is processed | Collected over time, processed as a group | Processed continuously as it arrives |
| Latency | High (minutes to hours) | Low (milliseconds to seconds) |
| Azure services | Azure Data Factory, Synapse Pipelines | Azure Stream Analytics, Event Hubs |
| Use cases | Nightly reports, ETL loads, billing runs | Fraud detection, IoT alerts, real-time dashboards |
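Example (illustrative only): a small Python sketch contrasting the two processing models. The sensor readings, the threshold, and the function names are assumptions made for this note; a real pipeline would use services such as Data Factory or Stream Analytics rather than in-memory lists.

```python
from typing import Iterable, Iterator, List

# Hypothetical sensor temperatures and alert threshold.
readings = [21.5, 22.0, 35.2, 21.8, 36.1]
THRESHOLD = 30.0


def batch_average(collected: List[float]) -> float:
    # Batch: wait until all data is collected, then process it as one group.
    return sum(collected) / len(collected)


def stream_alerts(events: Iterable[float], threshold: float) -> Iterator[str]:
    # Stream: handle each event as it arrives and react immediately.
    for position, value in enumerate(events):
        if value > threshold:
            yield f"alert: reading {position} = {value}"


average = batch_average(readings)
alerts = list(stream_alerts(iter(readings), THRESHOLD))

print(round(average, 2))
print(alerts)
```

The batch function cannot produce a result until the whole list exists, which is where the higher latency comes from. The streaming function emits an alert the moment a single reading crosses the threshold.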
OLTP vs OLAP
| Aspect | OLTP (Transactional) | OLAP (Analytical) |
| --- | --- | --- |
| Purpose | Record day-to-day business transactions | Analyze large volumes of historical data |
| Operations | INSERT, UPDATE, DELETE individual rows | Aggregations, GROUP BY, complex queries |
| Schema | Normalized (3NF), reduces redundancy | Denormalized (Star/Snowflake), optimized for reads |
| Azure service | Azure SQL Database, Azure SQL MI | Azure Synapse Analytics (Dedicated SQL Pool) |
| Row count per query | Small: a few rows | Large: millions/billions of rows |
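Example (illustrative only): the two query styles side by side, using Python's built-in sqlite3 module as a local stand-in. SQLite is not an Azure service and is used here only because it runs anywhere; the orders table and its values are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: small writes that touch individual rows.
insert_sql = "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)"
conn.execute(insert_sql, (1, "EU", 120.0))
conn.execute(insert_sql, (2, "US", 80.0))
conn.execute(insert_sql, (3, "EU", 50.0))
conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (90.0, 2))
conn.commit()

# OLAP-style work: one read that scans many rows and aggregates them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(totals)  # [('EU', 170.0), ('US', 90.0)]
conn.close()
```

At exam scale the difference is the workload shape rather than the SQL syntax: many short single-row transactions point to Azure SQL Database, while large scans with GROUP BY over historical data point to Azure Synapse Analytics.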
Exam Tip
Know the difference between OLTP (transactional, Azure SQL) and OLAP (analytical, Synapse) workloads. A common exam scenario asks which service to use for a given workload type.
Data Roles & Responsibilities
Database Administrator, Data Engineer, Data Analyst
Key Data Roles
Database Administrator (DBA)
Manages database infrastructure. Responsible for backup/restore, security, performance tuning, and availability. Typically works with Azure SQL Database and Azure SQL Managed Instance (SQL MI).
Data Engineer
Builds and maintains data pipelines and infrastructure. Works with ETL/ELT, Data Factory, Synapse, Databricks. Ensures data is available for analysis.
Data Analyst
Explores data, creates reports and dashboards, derives insights. Uses Power BI, SQL, Excel. Consumes data prepared by data engineers.
Also Know
Data Scientists build ML models. Application Developers use data services via APIs. These roles may overlap — the exam tests your ability to match a task to the appropriate role.