The Snowflake to Azure Synapse Analytics migration wasn’t something I planned. I was forced into it during one of our monthly “credit burn” meetings, watching Snowflake credits evaporate on a dashboard while the data team stayed silent.
For two years, I was a Snowflake zealot. The separation of storage and compute? Genius. The zero-copy cloning? Magic. It was the Ferrari of data warehouses.
But owning a Ferrari is expensive.
More importantly, I realized I wasn’t just driving the car; I was paying for a separate garage (ETL tool), a separate mechanic (orchestration tool), and a separate navigation system (BI tool).
My data stack was fragmented.
I had Fivetran moving data, Snowflake storing it, dbt transforming it, and Power BI visualizing it.

Every hop cost money.
Every integration point was a potential failure.
Then I looked at Azure Synapse Analytics, and that’s when the migration started to make architectural and financial sense.
I didn’t move because I wanted a “cheaper Snowflake.” I moved because I found one specific feature that promised to kill the fragmentation entirely.

Here is why I migrated, and how it changed my data architecture forever.


The “Tax” on Curiosity

Before I explain the solution, let me explain the friction.
In the Snowflake world (at least how we used it), if I wanted to query data, I had to load it.
“Hey, can we analyze those 50TB of raw logs?”
“Sure, let me spin up an Extra-Large warehouse and load them. It’ll cost $200 just to ingest it.”
This created a “Tax on Curiosity.” We hesitated to explore raw data because we didn’t want to pay the storage and compute costs to load it into the warehouse first.
I wanted a world where I could query my data where it lived, without moving it, and only pay for the bytes I actually read.


Snowflake to Azure Synapse Analytics: The Game-Changing Feature That Made Me Switch

The feature that convinced me to switch is the Synapse Unified Workspace powered by Serverless SQL Pools.
Synapse isn’t just a data warehouse. It’s a “Studio” that combines:

  1. Data Integration (Pipelines/Data Factory)
  2. Big Data (Apache Spark)
  3. Warehousing (SQL Pools)
  4. Visualization (Power BI)

But the killer capability is Serverless SQL.

Why Serverless SQL is a Superpower

With Synapse Serverless SQL, I don’t provision clusters. I don’t turn on a warehouse. I don’t load data.
I simply point my SQL query at my Azure Data Lake Storage (ADLS Gen2) and run it.
It allows me to treat my CSV, JSON, and Parquet files as if they were tables. This is the Data Lakehouse dream realized.

The Code: Querying Files Without Loading

In Snowflake, I’d have to create a file format, a stage, and then COPY INTO a table.
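For contrast, the Snowflake-style ingestion path looked roughly like this; every object name here is a placeholder, and your stage and integration setup will differ:

-- Snowflake: define a file format and a stage, then load the files into a table
CREATE FILE FORMAT parquet_format TYPE = PARQUET;

CREATE STAGE raw_sales_stage
  URL = 'azure://datalake.blob.core.windows.net/data/sales/'
  STORAGE_INTEGRATION = azure_int;

CREATE TABLE raw_sales (region STRING, amount NUMBER);

COPY INTO raw_sales
FROM @raw_sales_stage
FILE_FORMAT = (FORMAT_NAME = 'parquet_format')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;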
In Synapse Serverless, I just write T-SQL.

-- Read a Parquet file directly from the Data Lake
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://datalake.dfs.core.windows.net/data/sales/year=2025/*.parquet',
        FORMAT = 'PARQUET'
    ) AS [result]
WHERE
    [result].Region = 'NorthAmerica'

The moment I run this, I pay roughly $5 per Terabyte scanned. If I scan 1GB, I pay a fraction of a penny.

This removed the barrier to entry. I could now give my data analysts access to raw data immediately. No pipelines. No waiting. No loading costs.

The Epiphany: I stopped building ETL pipelines for data that might never be used. I shifted to “ELT” (Extract, Load, Transform) and, in many cases, just “Leave it there and Query it.”
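And when a transformation is genuinely worth materializing, Serverless can write the result straight back to the lake with CETAS (CREATE EXTERNAL TABLE AS SELECT). A minimal sketch, assuming placeholder names for the data source, file format, and output path:

-- One-time setup: point Serverless at the lake (names are placeholders)
CREATE EXTERNAL DATA SOURCE lake_ds
WITH (LOCATION = 'https://datalake.dfs.core.windows.net/data');

CREATE EXTERNAL FILE FORMAT parquet_ff
WITH (FORMAT_TYPE = PARQUET);

-- CETAS: run the aggregation once and land the result as Parquet in the lake
CREATE EXTERNAL TABLE dbo.sales_by_region
WITH (
    LOCATION = 'curated/sales_by_region/',
    DATA_SOURCE = lake_ds,
    FILE_FORMAT = parquet_ff
)
AS
SELECT [result].Region, COUNT(*) AS order_count
FROM OPENROWSET(
    BULK 'sales/year=2025/*.parquet',
    DATA_SOURCE = 'lake_ds',
    FORMAT = 'PARQUET'
) AS [result]
GROUP BY [result].Region;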

The “Everything in One Tab” Experience

The secondary part of this “One Feature” is the integration.

In my old stack, if a Spark job failed:

  1. Open Airflow (or Databricks) to check the logs.
  2. Fix the Python code.
  3. Re-run the job.
  4. Log into Snowflake to check if the data arrived.

In Synapse:

  1. I write a PySpark notebook to clean complex data.
  2. I write the output to the Data Lake.
  3. I switch to a SQL script in the same window to query that output using Serverless SQL.
  4. I click the “Visualize” button to turn that query into a Power BI dataset.

The mental overhead of context switching vanished.


My Migration Journey

Moving from Snowflake to Synapse is a substantial undertaking. Snowflake uses its own SQL dialect; Synapse uses T-SQL (mostly). Here is how I survived the move.

Phase 1: The Data Migration (The Easy Part)

Since our data was already in cloud storage (S3), moving it to Azure Data Lake Gen2 was a bandwidth problem, not a logic problem.

I used Synapse Pipelines (which is basically Azure Data Factory) to pull data from Snowflake tables and dump them into Parquet format in the Data Lake. This gave me my “Lakehouse” foundation immediately.
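Synapse Pipelines handled the copy for us, but the same export can also be driven from the Snowflake side with COPY INTO a location, if you would rather push than pull. A rough sketch, with placeholder table, integration, and path names:

-- Snowflake: unload a table to ADLS Gen2 as Parquet
COPY INTO 'azure://datalake.blob.core.windows.net/data/sales/'
FROM analytics.public.sales
STORAGE_INTEGRATION = azure_lake_int
FILE_FORMAT = (TYPE = PARQUET)
HEADER = TRUE;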

Phase 2: Translating SQL Dialects

This was the friction point.

  • Snowflake: SELECT * FROM table LIMIT 10
  • Synapse: SELECT TOP 10 * FROM table
  • Snowflake: try_parse_json(col)
  • Synapse: JSON_VALUE(col, '$.property')

I spent about two weeks rewriting our core dbt models. It wasn’t fun, but it forced us to optimize some terrible legacy SQL we had written years ago.
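As a flavor of the rewrites (table and column names here are illustrative, not our actual models):

-- Snowflake version (simplified):
--   SELECT event_id, try_parse_json(payload):user.id::string AS user_id
--   FROM raw_events
--   LIMIT 10;

-- Synapse T-SQL version:
SELECT TOP 10
    event_id,
    JSON_VALUE(payload, '$.user.id') AS user_id
FROM raw_events;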

Phase 3: Security & Access

In Snowflake, I managed users in Snowflake.
In Synapse, I manage users in Azure Active Directory (Entra ID).

This was a massive upgrade. I could grant access to a specific folder in the Data Lake, and the Synapse Serverless SQL pool respects those ACLs (Access Control Lists). If a user doesn’t have read permission on the file, they can’t query it via SQL. It is Identity Passthrough at its finest.
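Onboarding an analyst now looks roughly like this on the SQL side (the principal name is a placeholder); whether their queries actually return data still depends on the ACLs on the underlying folders:

-- Serverless SQL database: add an Entra ID user and let them read the database objects
CREATE USER [analyst@contoso.com] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [analyst@contoso.com];
-- File-level access is still decided by the Data Lake ACLs on the folders they query.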


Real Results: The Numbers

After six months of running on Azure Synapse Analytics, here is the breakdown.

Metric              | Snowflake Era                      | Synapse Era
Ad-Hoc Query Cost   | High (Warehouse Credits)           | -40% (Serverless Pay-per-TB)
Tool Count          | 4 (Fivetran, Snowflake, dbt, BI)   | 2 (Synapse, Power BI)
Time to Insight     | Hours (Load first)                 | Seconds (Query in Lake)
Data Silos          | Spark and SQL separate             | Unified Workspace

1. 40% Cost Reduction on Exploration

We still use Dedicated SQL Pools (formerly SQL DW) for our massive, high-performance dashboards. But for 80% of our daily work—data exploration, ad-hoc analysis, and data science—we use Serverless SQL. The cost difference is staggering.

2. Elimination of “Shadow IT”

Because Power BI is integrated directly into Synapse Studio, the people building reports work alongside the data engineers. The feedback loop is instant. We stopped having dashboards break because a schema changed and the analyst didn’t know.

3. Spark & SQL Harmony

We have a machine learning model that runs in Spark. It outputs predictions to the Data Lake.
Five seconds later, a SQL analyst queries those predictions using T-SQL.
No data movement. No JDBC drivers. The two engines simply share the underlying storage.
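One way to keep analysts off raw file paths is a thin view over the folder the Spark job writes to; a minimal sketch, with an assumed path and object names:

-- Serverless SQL: expose the Spark job's Parquet output as a queryable view
CREATE VIEW dbo.latest_predictions AS
SELECT *
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/ml/predictions/*.parquet',
    FORMAT = 'PARQUET'
) AS predictions;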


Is Synapse Perfect?

No. Let me be honest.

The Concurrency Limits: Serverless SQL is great, but it’s not meant for high-concurrency dashboarding (e.g., 1,000 concurrent users). For that, you still need the Dedicated SQL Pool, which brings back the “provisioned cluster” cost model.

The Cold Start: Serverless pools can take a few seconds to “warm up” on the first query of the day.

The UI: The Synapse Studio is a web app. It can sometimes feel heavy compared to the snappy, text-centric Snowflake UI.


Final Thoughts

I’m talking to myself here, and to anyone who is looking at their data stack and seeing a jigsaw puzzle of vendors.
Snowflake is an amazing database. But Synapse is an amazing Platform.
The move wasn’t about finding a better SQL engine. It was about finding a better workflow.
The ability to query my data lake directly, without loading it, broke down the walls between my Data Engineers (who work in files) and my Data Analysts (who work in SQL).
I didn’t just save money on credits. I bought back the time I used to spend managing the plumbing between four different tools. And that, to me, is the real definition of a “Modern Data Stack.”
