AndersASorensen
New Contributor III

SCD Type 2 Using MERGE in Delta

What's the best practice when I want to both update AND insert a record into a target table in one operation, ensuring atomicity?

 

newRecord

ID, Product, Color

123, A, Blue

 

myTable

ID, Product, Color, is_current

123, A, Yellow, TRUE

 

I basically want to do an atomic operation that gives the following result in myTable

 

ID, Product, Color, is_current

123, A, Yellow, FALSE

123, A, Blue, TRUE

 

I basically want to do a .whenMatchedUpdate AND a .whenMatchedInsert in a MERGE statement, but I know that's not supported.
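To make the target state concrete, here is a minimal plain-Python sketch (not Spark or Delta, just an illustration) of the two actions a single atomic operation must perform on the example rows above: expire the current version and append the new one. The function name `scd2_upsert` is hypothetical, and the columns mirror the example.

```python
# Plain-Python illustration of the SCD Type 2 transition described above.
# 'ID', 'Product', 'Color', 'is_current' mirror the example columns; this
# only shows the intended end state, not Delta's MERGE mechanics.

def scd2_upsert(table, new_record, key="ID"):
    """Expire the current row for the key, then append the new version."""
    result = []
    for row in table:
        if row[key] == new_record[key] and row["is_current"]:
            expired = dict(row)
            expired["is_current"] = False  # close out the old version
            result.append(expired)
        else:
            result.append(row)
    new_row = dict(new_record)
    new_row["is_current"] = True  # the new version becomes current
    result.append(new_row)
    return result

my_table = [{"ID": 123, "Product": "A", "Color": "Yellow", "is_current": True}]
new_record = {"ID": 123, "Product": "A", "Color": "Blue"}
print(scd2_upsert(my_table, new_record))
```

Running this yields exactly the two-row result shown above: the Yellow row with is_current = FALSE and the Blue row with is_current = TRUE.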

 

 

1 ACCEPTED SOLUTION

Hello @AndersASorensen 

 

Give it a try:

 

from pyspark.sql.functions import lit, current_date
from delta.tables import DeltaTable

# Assuming 'target_table' is your existing Delta table
target_table = DeltaTable.forName(spark, "target_table")

# 'source_df' is your new data
merge_condition = "target.natural_key = source.natural_key"

(target_table.alias("target")
    .merge(source_df.alias("source"), merge_condition)
    .whenMatchedUpdate(set={
        "attribute1": "source.attribute1",
        "attribute2": "source.attribute2",
        "is_current": lit(True),
        "update_date": current_date()
    })
    .whenNotMatchedInsert(values={
        "natural_key": "source.natural_key",
        "attribute1": "source.attribute1",
        "attribute2": "source.attribute2",
        "is_current": lit(True),
        "update_date": current_date()
    })
    .execute())

 

 

https://iterationinsights.com/article/implementing-a-hybrid-type-1-and-2-slowly-changing-dimension-i...

 

Hope this helps.

 

Please accept the answer if this works. 

Thanks

 

 


5 REPLIES
nilendraFabric
Honored Contributor

Hello @AndersASorensen 

The `MERGE` statement is not natively supported in Microsoft Fabric's Data Warehouse as of now; users need to rely on PySpark or SQL workarounds for similar functionality.

 

from delta.tables import DeltaTable

# Define source and target
target_table = DeltaTable.forName(spark, "Lakehouse.TargetTable")
source_df = spark.read.table("Lakehouse.SourceTable")

# Perform the merge
(target_table.alias("target")
    .merge(source=source_df.alias("source"), condition="target.key = source.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

 

https://aventius.co.uk/2024/03/28/microsoft-fabric-using-pyspark-to-dynamically-merge-data-into-many...

 

Currently, the `MERGE` statement is not supported in Fabric's Data Warehouses. As a workaround:
โ€ข Use a combination of `ROW_NUMBER()` for deduplication and `INSERT`/`UPDATE` statements to achieve similar functionality:

WITH DeduplicatedSource AS (
    SELECT *
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
        FROM SourceTable
    ) AS ranked
    WHERE row_num = 1  -- a window alias cannot be filtered in the same SELECT, so filter one level up
)
-- Merge logic
MERGE INTO TargetTable AS target
USING DeduplicatedSource AS source
ON target.ID = source.ID
WHEN MATCHED THEN UPDATE SET target.Column = source.Column
WHEN NOT MATCHED THEN INSERT (Column) VALUES (source.Column);
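The `ROW_NUMBER()` step above keeps only the latest row per ID before merging. The same idea can be shown in plain Python (a sketch, assuming a `LOAD_DATE` field as in the SQL; the function name `dedupe_latest` is hypothetical):

```python
# Plain-Python equivalent of
# ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) ... row_num = 1:
# keep only the most recent row per ID.

def dedupe_latest(rows, key="ID", order_by="LOAD_DATE"):
    latest = {}
    for row in rows:
        k = row[key]
        # Equivalent to keeping row_num = 1 after ordering LOAD_DATE descending
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())

source = [
    {"ID": 1, "Color": "Blue", "LOAD_DATE": "2024-01-02"},
    {"ID": 1, "Color": "Red", "LOAD_DATE": "2024-01-01"},
    {"ID": 2, "Color": "Green", "LOAD_DATE": "2024-01-01"},
]
print(dedupe_latest(source))
# keeps the 2024-01-02 row for ID 1 and the single row for ID 2
```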

Hope this helps 

Thanks

I am happy to use PySpark, and I want to do it on a lakehouse, not a DW.

 

However, your .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute() suggestion will not work, as it will only update the matched row in place.

 

Yes, I would like it to update an is_current column when matched, but the merge will not also insert the record as a new row.
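For reference, the usual way to get both actions out of one MERGE is to stage each changed record twice: once with the real key, so the match expires the current row, and once with a NULL merge key, so it falls through to the insert branch. Here is a plain-Python simulation of that staging idea (not Delta's actual MERGE; `merge_key`, `stage_updates`, and `merge` are illustrative names, with columns assumed from the example above):

```python
# Sketch of the SCD2 "staged source" trick: each changed record appears twice
# in the staging set -- once keyed (matches and expires the old row) and once
# with merge_key=None (never matches, so it is inserted as the new version).

def stage_updates(new_records):
    staged = []
    for rec in new_records:
        staged.append({"merge_key": rec["ID"], **rec})  # drives the UPDATE branch
        staged.append({"merge_key": None, **rec})       # drives the INSERT branch
    return staged

def merge(target, staged):
    """Simulate MERGE ON target.ID = source.merge_key with SCD2 branches."""
    out = [dict(row) for row in target]
    for src in staged:
        matched = False
        for row in out:
            if src["merge_key"] is not None and row["ID"] == src["merge_key"] and row["is_current"]:
                row["is_current"] = False  # WHEN MATCHED: expire the current version
                matched = True
        if not matched and src["merge_key"] is None:
            out.append({"ID": src["ID"], "Product": src["Product"],
                        "Color": src["Color"], "is_current": True})  # WHEN NOT MATCHED: insert
    return out

target = [{"ID": 123, "Product": "A", "Color": "Yellow", "is_current": True}]
new = [{"ID": 123, "Product": "A", "Color": "Blue"}]
print(merge(target, stage_updates(new)))
```

In real PySpark the staging step would be a union of the source with a copy whose merge key is nulled out, and the MERGE condition would compare the target key to that staged merge key.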

Hi again 🙂

 

The code provided won't work, but the article you linked to had a working solution 🙂

Thanks!

This comment saved my life. Thank you @AndersASorensen 
