Anshuman Mandal Anshuman-02905

| Layer | AWS | Azure / Fabric | Role |
| --- | --- | --- | --- |
| Data Lake | S3 | OneLake | Store raw & curated data |
| Query Engine | Athena / Redshift | Fabric SQL Engine | Interactive querying |
| Governance | Lake Formation | Purview | Secure access & lineage |
| BI Tool | QuickSight | Power BI | Visualize and share insights |

| Capability | AWS Stack | Azure / Fabric Stack |
| --- | --- | --- |
| Data Lake Storage | S3 | OneLake (ADLS Gen2 under the hood) |
| Ad-hoc SQL Querying | Athena | Fabric SQL Endpoint / Synapse Serverless |
| Data Warehouse | Redshift | Fabric Warehouse / Synapse Dedicated Pool |
| Governance & Security | Lake Formation + Glue Catalog | Purview |
| BI & Dashboards | QuickSight | Power BI |

| Service | Purpose | Key Highlights |
| --- | --- | --- |
| Amazon Athena | Serverless SQL querying directly on data lakes | Query data in Amazon S3 using standard SQL. Pay only for scanned data. Ideal for ad-hoc exploration, data validation, or feeding BI dashboards. Integrates with Glue Catalog for schema metadata. |
| Amazon Redshift | Cloud data warehouse for analytics | MPP-based (massively parallel) SQL engine optimized for analytical workloads. Supports columnar storage, compression, and fede |

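
The "pay only for scanned data" model in the Athena row can be turned into a back-of-envelope estimate. The sketch below is illustrative only: the per-terabyte price and the per-query minimum are assumptions that do not come from the table above and should be checked against current Athena pricing for your region. The function name `estimate_query_cost` is hypothetical.

```python
# Rough Athena cost estimate based on bytes scanned.
# ASSUMPTIONS (not from the source table; verify against current pricing):
PRICE_PER_TB_USD = 5.00               # assumed on-demand price per TB scanned
MIN_BYTES_PER_QUERY = 10 * 1024**2    # assumed 10 MB billing minimum per query
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD for a single query."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Small queries are billed at the assumed minimum.
    billed_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billed_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = 2 * BYTES_PER_TB      # e.g. scanning 2 TB of raw CSV
    pruned_scan = full_scan // 10     # e.g. reading one tenth after partition pruning
    print(f"full scan:   ${estimate_query_cost(full_scan):.2f}")
    print(f"pruned scan: ${estimate_query_cost(pruned_scan):.2f}")
```

Because cost scales with bytes scanned, columnar formats and partition pruning reduce the bill directly, which is why they matter more for Athena than for a provisioned warehouse.
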
| Layer | AWS Tool | Azure Equivalent | Analogy |
| --- | --- | --- | --- |
| Workflow Engine | Step Functions | ADF Control Flow | "Director" – tells everyone when to act |
| Job Scheduler | MWAA | Fabric Pipelines | "Assistant Director" – manages dependencies |
| Monitoring | CloudWatch | Azure Monitor | "Camera Crew" – records everything that happens |
| Alerts & Notifications | SNS + EventBridge | Logic Apps + Event Grid | "Stage Manager" – alerts when something goes wrong |

| Capability | AWS Stack | Azure Stack |
| --- | --- | --- |
| Visual Pipeline Orchestration | Step Functions / Glue Workflows | Azure Data Factory (ADF) |
| Code-based DAG Orchestration | MWAA (Apache Airflow) | Fabric Pipelines / Synapse Pipelines |
| Event-driven Automation | EventBridge + Step Functions | Logic Apps + Event Grid |
| Monitoring & Metrics | CloudWatch | Azure Monitor |
| Logging & Telemetry | CloudWatch Logs, X-Ray | Log Analytics, Application Insights |
| Governance & Lineage | CloudTrail + Lake Formation | Purview |

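
To make the "Event-driven Automation" row more concrete, here is a minimal sketch of how an EventBridge-style event pattern selects events, for example to route a failed Glue job to an alert. The `matches` function is a hypothetical, simplified matcher written for illustration: it handles exact-value matching only, whereas the real service supports more operators. The job name in the sample events is made up.

```python
# Simplified illustration of EventBridge-style pattern matching.
# NOTE: `matches` is a hypothetical helper, not part of any AWS SDK; it
# covers exact-value matching only.

def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field in `pattern` is satisfied by `event`."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


# Pattern that selects failed Glue job runs.
FAILED_GLUE_JOB = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED"]},
}

if __name__ == "__main__":
    event = {
        "source": "aws.glue",
        "detail-type": "Glue Job State Change",
        "detail": {"jobName": "nightly-load", "state": "FAILED"},  # made-up job
    }
    print(matches(FAILED_GLUE_JOB, event))
```

In a real deployment the pattern would be attached to a rule whose target is an SNS topic or a Step Functions state machine; the matching itself is done by the service, not by user code.
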
| Service | Purpose | Key Highlights |
| --- | --- | --- |
| Azure Data Factory (ADF) | Primary Data Orchestrator | The powerhouse for orchestrating ingestion, transformation, and loading across hybrid and cloud data systems. Equivalent to Step Functions + MWAA combined. Supports dependency chaining, conditional logic, triggers, parameters, and CI/CD integration. |

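
The dependency chaining, conditional logic, and parameters mentioned for ADF can be sketched as a pipeline definition built in Python. This is a rough outline of the pipeline JSON shape as commonly exported from ADF, not a deployable definition: all names (`IngestAndTransform`, `IngestRaw`, `TransformCurated`, `IfProd`, the `env` parameter) are hypothetical, and the `typeProperties` of the copy and data flow activities are deliberately left empty.

```python
import json


def build_pipeline(default_env: str = "dev") -> dict:
    """Build an illustrative ADF-style pipeline definition (hypothetical names)."""
    return {
        "name": "IngestAndTransform",
        "properties": {
            # Pipeline parameter with a default value.
            "parameters": {"env": {"type": "String", "defaultValue": default_env}},
            "activities": [
                # Placeholder: a real Copy activity needs source/sink settings.
                {"name": "IngestRaw", "type": "Copy",
                 "dependsOn": [], "typeProperties": {}},
                # Dependency chaining: runs only after IngestRaw succeeds.
                {"name": "TransformCurated", "type": "ExecuteDataFlow",
                 "dependsOn": [{"activity": "IngestRaw",
                                "dependencyConditions": ["Succeeded"]}],
                 "typeProperties": {}},
                # Conditional logic driven by the pipeline parameter.
                {"name": "IfProd", "type": "IfCondition",
                 "dependsOn": [{"activity": "TransformCurated",
                                "dependencyConditions": ["Succeeded"]}],
                 "typeProperties": {
                     "expression": {
                         "value": "@equals(pipeline().parameters.env, 'prod')",
                         "type": "Expression",
                     },
                     "ifTrueActivities": [],
                 }},
            ],
        },
    }


def unresolved_dependencies(pipeline: dict) -> list:
    """Return dependency names that do not refer to an activity in the pipeline."""
    activities = pipeline["properties"]["activities"]
    names = {a["name"] for a in activities}
    return [dep["activity"]
            for a in activities
            for dep in a["dependsOn"]
            if dep["activity"] not in names]


if __name__ == "__main__":
    pipeline = build_pipeline()
    assert unresolved_dependencies(pipeline) == []
    print(json.dumps(pipeline, indent=2))
```

Keeping the definition as data makes it easy to lint in CI before deployment, which fits the CI/CD integration the table mentions.
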
| Service | Purpose | Key Highlights |
| --- | --- | --- |
| AWS Step Functions | Workflow Orchestration | Serverless workflow engine to chain Lambda, Glue Jobs, EMR steps, ECS tasks, or any API call. Supports branching, retries, error handling, and parallel execution. Equivalent to ADF pipelines. |
| AWS Managed Workflows for Apache Airflow (MWAA) | Data Pipeline Orchestration (Python DAGs) | Fully managed Apache Airflow. Ideal for co |

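
The branching, retries, and error handling listed for Step Functions are expressed in an Amazon States Language definition. The sketch below builds one as a Python dict. The state names and the `$.rowCount` field are hypothetical; in particular, it assumes an earlier step places a row count in the state input, which a Glue job run does not do on its own. Parallel execution is omitted to keep the example short.

```python
import json


def build_state_machine(glue_job_name: str) -> dict:
    """Build an illustrative state machine definition (hypothetical state names)."""
    return {
        "Comment": "Run a Glue job, then branch on a row count",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # Service integration that waits for the Glue job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # Retries: up to 2 more attempts with exponential backoff.
                "Retry": [{"ErrorEquals": ["States.ALL"],
                           "IntervalSeconds": 30,
                           "MaxAttempts": 2,
                           "BackoffRate": 2.0}],
                # Error handling: route remaining failures to a Fail state.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
                "Next": "CheckRowCount",
            },
            "CheckRowCount": {
                "Type": "Choice",
                # Branching: `$.rowCount` is an assumed field in the state input.
                "Choices": [{"Variable": "$.rowCount",
                             "NumericGreaterThan": 0,
                             "Next": "Done"}],
                "Default": "NotifyFailure",
            },
            "NotifyFailure": {"Type": "Fail",
                              "Error": "PipelineFailed",
                              "Cause": "Glue job failed or produced no rows"},
            "Done": {"Type": "Succeed"},
        },
    }


def referenced_states(definition: dict) -> set:
    """Collect every state name referenced by StartAt, Next, Default, Catch, or Choices."""
    refs = {definition["StartAt"]}
    for state in definition["States"].values():
        for key in ("Next", "Default"):
            if key in state:
                refs.add(state[key])
        for rule in state.get("Catch", []) + state.get("Choices", []):
            refs.add(rule["Next"])
    return refs


if __name__ == "__main__":
    definition = build_state_machine("nightly-load")  # made-up job name
    assert referenced_states(definition) <= set(definition["States"])
    print(json.dumps(definition, indent=2))
```

The `referenced_states` check is a cheap way to catch a misspelled transition target before the definition is deployed.
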
| Capability | AWS Service | Azure Equivalent | Notes |
| --- | --- | --- | --- |
| Data Lake Storage | S3 | ADLS Gen2 | Core storage for raw → curated data |
| Data Warehouse | Redshift | Synapse (Dedicated) | MPP engine for structured analytics |
| Serverless Querying | Athena | Synapse Serverless | Query directly on data lake |
| Catalog & Governance | Lake Formation + Glue Catalog | Purview / Unity Catalog | Fine-grained security, metadata lineage |

| Scenario | Best AWS Service | Azure Equivalent |
| --- | --- | --- |
| Metadata-driven, serverless ETL | Glue | ADF Data Flows |
| Heavy Spark-based transformation | EMR | Databricks |
| Small real-time transformations | Lambda | Azure Functions |
| Multi-step workflow orchestration | Step Functions | ADF Pipelines |

| Step | Service | Purpose | Azure Counterpart |
| --- | --- | --- | --- |
| 1️⃣ | AWS Glue | Serverless ETL with PySpark | ADF Mapping Data Flows |
| 2️⃣ | Amazon EMR | Scalable, managed Spark/Hive clusters | Azure Databricks |
| 3️⃣ | AWS Lambda | Event-driven or micro-transformations | Azure Functions |
| 4️⃣ | Step Functions (optional) | Pipeline orchestration & retries | ADF Pipelines / Logic Apps |
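
As an illustration of the event-driven micro-transformation role listed for Lambda, the sketch below shows a handler that reads an S3 event notification and extracts the objects to process. The helper name `extract_s3_objects`, the bucket, and the key are hypothetical, and the actual transformation step is left out; only the event parsing is shown.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point: list the objects a real transformation would process."""
    objects = extract_s3_objects(event)
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


if __name__ == "__main__":
    sample_event = {  # made-up bucket and key
        "Records": [
            {"s3": {"bucket": {"name": "raw-zone"},
                    "object": {"key": "sales/2024/daily+report.csv"}}}
        ]
    }
    print(handler(sample_event, None))
```

Keeping the parsing in a separate pure function makes the handler easy to unit test without any AWS dependencies.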