I'm studying Apache Spark, and the concept of partitioning caught my interest. I would like to check whether my understanding below is correct:
Partitioning only works when writing the data into files, using the PySpark code below:
df.write \
.mode("overwrite") \
.partitionBy("department") \
.parquet(output_path)
However, if I write data into tables, I am not allowed to partition the data. The code below would throw an error:
df.write.mode("overwrite").format("delta").partitionBy("department").saveAsTable("data_withoutpart")
So if my understanding is correct, partitioning is not useful in my case at all, since my data needs to be written into Delta tables in the end for use by Power BI reports.
Please correct me if I got it wrong or refer me to relevant tutorials.
Hi @Jeanxyz,
Your understanding is close, but there's one important detail missing: you can partition Delta tables; it just depends on where and how the table is created. The error you're seeing doesn't mean Spark doesn't support partitioning; it usually means the target table already exists without partitions, or Fabric created the table with a different structure than what your write command expects.
If the table already exists as a non-partitioned Delta table, Spark won't let you overwrite it with a partitioned version. In that case, you'd have to drop the table first and then recreate it with partitionBy(). But writing a partitioned Delta table itself is completely supported.
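As a minimal sketch of that drop-and-recreate approach (assuming an active Spark session, and reusing the df and table name from your example):
# The existing non-partitioned table blocks a partitioned overwrite,
# so drop it first, then recreate it with the partition spec.
spark.sql("DROP TABLE IF EXISTS data_withoutpart")
df.write \
.mode("overwrite") \
.format("delta") \
.partitionBy("department") \
.saveAsTable("data_withoutpart")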
Also, even for Power BI scenarios, partitioning can help, especially when your dataset grows and you want faster refresh or more efficient incremental loads. Power BI doesn't care whether the underlying data is partitioned; it only reads the Delta table through the SQL endpoint or Spark.
So in short:
Partitioning is definitely not useless for Delta tables. The issue comes from trying to overwrite an existing table with a different partition structure. If you create the Delta table fresh with partitionBy(), it will work normally.
Hope this helps. If so, please give a Kudos 👍 or mark as Accepted Solution ✔️
- Gopi Krishna
You are right. I made a syntax mistake when writing partitioned tables. Below is the correct one:
df.write.mode("overwrite").format("delta").saveAsTable("data_withoutpart2", partitionBy="department")
Hi @Jeanxyz,
Your understanding is close, but there's an important detail to clarify. Partitioning does work when writing Delta tables; it doesn't only apply to writing files. The issue you hit is just the syntax.
In Spark, partitionBy() is normally chained on the writer before calling saveAsTable, so your corrected syntax is the right one. When you write:
df.write \
.mode("overwrite") \
.format("delta") \
.partitionBy("department") \
.saveAsTable("data_with_partition")
Spark will create a fully valid partitioned Delta table, and this works perfectly fine with Power BI, Fabric Lakehouse, and SQL endpoint queries.
So partitioning is still useful even if your final destination is a Delta table. As long as your table has a natural grouping column with low-to-moderate cardinality (like department, date, region, or category), partitioning can improve performance for large datasets.
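As a quick illustration of where that benefit shows up (a sketch; it assumes the data_with_partition table created above):
# Partition pruning: a filter on the partition column lets Spark read
# only the matching department folder instead of scanning the whole table.
hr_df = spark.table("data_with_partition").where("department = 'HR'")
hr_df.show()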
Your updated example using:
saveAsTable("data_withoutpart2", partitionBy="department")
is also valid; it's just an alternative syntax that Spark supports.
So yes, your data can be written as Delta and still be partitioned without any problem. You weren't running into a conceptual limitation, just a small syntax detail.
Hope this helps. If it does, please give a Kudos 👍 or mark as Accepted Solution ✔️
- Gopi Krishna
Thanks for the clarification, @Ugk161610.
One last question: if my Delta table is partitioned by 'department' but I'm running a merge query on 'employee_id', the partitioning won't help in that case. Is that correct?
Hi @Jeanxyz,
Yes, you're absolutely right. If your Delta table is partitioned by department but your merge condition is based on employee_id, then the partitioning doesn't really help during the merge. Spark will still need to scan all partitions to find the matching employee_id values, because the partition column isn't part of the predicate.
Partitioning only speeds things up when your filter, join, or merge condition includes the partition column itself. For example, a merge or query that filters on department = 'HR' would benefit immediately, but a merge on employee_id won't take advantage of department partitions.
So the general rule is:
Pick a partition column only if your downstream workloads actually filter or process data using that column. Otherwise, partitioning doesnโt provide much benefit.
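To make the contrast concrete, here is a minimal sketch using the delta-spark merge API (updates_df is a hypothetical DataFrame of incoming rows; the table name reuses the earlier example):
from delta.tables import DeltaTable

# updates_df: hypothetical DataFrame of incoming employee rows
target = DeltaTable.forName(spark, "data_with_partition")

# Merge keyed on employee_id alone: the partition column is not in the
# predicate, so Spark has to scan every department partition.
target.alias("t") \
.merge(updates_df.alias("s"), "t.employee_id = s.employee_id") \
.whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()

# Including the partition column in the predicate lets Delta prune partitions.
target.alias("t") \
.merge(updates_df.alias("s"), "t.department = s.department AND t.employee_id = s.employee_id") \
.whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()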
Hope this helps, and glad the earlier correction was useful. If this answers your question, please mark it as Accepted Solution ✔️
- Gopi Krishna