Jeanxyz
Valued Contributor

partitioning concept

I'm studying Apache Spark, and the concept of partitioning caught my interest. I would like to check whether my understanding below is correct:

Partitioning only works when writing data out as files, using the PySpark code below:

df.write \
  .mode("overwrite") \
  .partitionBy("department") \
  .parquet(output_path)

 

However, if I write the data into a table, it seems I am not allowed to partition it. The code below throws an error:

df.write.mode("overwrite").format("delta").partitionBy("department").saveAsTable("data_withoutpart")

So if my understanding is correct, partitioning is not useful in my case at all, since all my data needs to be written into Delta tables in the end for use by Power BI reports.

 

Please correct me if I got it wrong or refer me to relevant tutorials.

 

 
 
6 REPLIES
Ugk161610
Contributor

Hi @Jeanxyz ,

Your understanding is close, but there's one important detail missing: you can partition Delta tables; it just depends on where and how the table is created. The error you're seeing doesn't mean Spark doesn't support partitioning; it usually means the target table already exists without partitions, or that Fabric created the table with a different structure than what your write command expects.

 

If the table already exists as a non-partitioned Delta table, Spark won't let you overwrite it with a partitioned version. In that case, you'd have to drop the table first and then recreate it with partitionBy(). But writing a partitioned Delta table itself is completely supported.
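
For example, a drop-and-recreate could look like the sketch below. It reuses the table and column names from the earlier snippets, and df is assumed to be your source DataFrame. (Delta also has an overwriteSchema write option that can replace the partition layout on overwrite, but drop-and-recreate is the simplest to illustrate.)

# Drop the existing non-partitioned table, then recreate it partitioned.
spark.sql("DROP TABLE IF EXISTS data_withoutpart")

df.write \
  .mode("overwrite") \
  .format("delta") \
  .partitionBy("department") \
  .saveAsTable("data_withoutpart")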

Also, even for Power BI scenarios, partitioning can help, especially when your dataset grows and you want faster refresh or more efficient incremental loads. Power BI doesn't care whether the underlying data is partitioned; it only reads the Delta table through the SQL endpoint or Spark.

 

So in short:
Partitioning is definitely not useless for Delta tables. The issue comes from trying to overwrite an existing table with a different partition structure. If you create the Delta table fresh with partitionBy(), it will work normally.

 

Hope this helps. If so, please give a Kudos 👍 or mark as Accepted Solution ✔️

 

– Gopi Krishna

Jeanxyz
Valued Contributor

You are right. I made a syntax mistake when writing partitioned tables. Below is the corrected version:

df2.write.mode("overwrite").format("delta").saveAsTable("data_withoutpart2", partitionBy="department")

Ugk161610
Contributor

Hi @Jeanxyz ,

Your understanding is close, but there's an important detail to clarify. Partitioning does work when writing Delta tables; it doesn't only apply to writing files. The issue you hit is just the syntax.

 

In PySpark, the usual pattern is to call the partitionBy() method on the writer before saveAsTable(), so your corrected syntax is one valid way to express it. The writer-method form looks like this:

 

df.write \
  .mode("overwrite") \
  .format("delta") \
  .partitionBy("department") \
  .saveAsTable("data_with_partition")

 

Spark will create a fully valid partitioned Delta table, and this works perfectly fine with Power BI, Fabric Lakehouse, and SQL endpoint queries.
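
If you want to double-check the layout, one quick sketch is to inspect the table's partitionColumns field with Delta's DESCRIBE DETAIL command (the table name matches the snippet above):

# partitionColumns should list ["department"] for the table created above.
spark.sql("DESCRIBE DETAIL data_with_partition") \
     .select("partitionColumns") \
     .show(truncate=False)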

 

So partitioning is still useful even if your final destination is a Delta table. As long as your table has a natural grouping column with low-to-moderate cardinality (like department, date, region, or category), partitioning can improve performance for large datasets.
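
As a small illustration of the benefit, here is a sketch that reuses the data_with_partition table from above (the 'HR' value is made up): a read that filters on the partition column lets Spark prune straight to the matching department folders instead of scanning the whole table.

# Partition pruning: only the department='HR' partition is read.
hr_only = spark.table("data_with_partition").filter("department = 'HR'")
hr_only.show()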

 

Your updated example using:

 

saveAsTable("data_withoutpart2", partitionBy="department")

is also valid; PySpark's saveAsTable() accepts partitionBy as a keyword argument.

 

So yes, your data can be written as Delta and still be partitioned without any problem. You weren't running into a conceptual limitation, just a small syntax detail.

 

Hope this helps. If it does, please give a Kudos 👍 or mark as Accepted Solution ✔️

 

– Gopi Krishna

Jeanxyz
Valued Contributor

Thanks for the clarification, @Ugk161610.

One last question: if my Delta table is partitioned by 'department' but I'm running a merge query on 'employee_id', the partitioning won't help in that case. Is that correct?

 

Ugk161610
Contributor

Hi @Jeanxyz ,

 

Yes, you're absolutely right. If your Delta table is partitioned by department but your merge condition is based on employee_id, then the partitioning doesn't really help during the merge. Spark will still need to scan all partitions to find the matching employee_id values, because the partition column isn't part of the predicate.

 

Partitioning only speeds things up when your filter, join, or merge condition includes the partition column itself. For example, a merge or query that filters on department = 'HR' would benefit immediately, but a merge on employee_id won't take advantage of the department partitions.
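
One common workaround, sketched below with the delta-spark Python API and illustrative names (an employees table partitioned by department, and an updates_df source DataFrame), is to add the partition column to the merge condition whenever the source rows carry it. That lets Delta prune partitions during the merge:

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "employees")

# Adding t.department = s.department lets Delta skip partitions whose
# department values don't appear in the source.
target.alias("t") \
    .merge(updates_df.alias("s"),
           "t.employee_id = s.employee_id AND t.department = s.department") \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()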

 

So the general rule is:
Pick a partition column only if your downstream workloads actually filter or process data using that column. Otherwise, partitioning doesn't provide much benefit.

 

Hope this helps, and glad the earlier correction was useful. If this answers your question, please mark it as Accepted Solution ✔️

 

– Gopi Krishna

