Anonymous
Not applicable

Number of partitions created when I output a parquet file

How can I control the number of partitions created when I output a parquet file?

1 ACCEPTED SOLUTION
chetnachaudhari
New Contributor III

Hi @Anonymous,

  If you are using PySpark, you can control this by calling the repartition method or the coalesce method on your DataFrame before writing it to Parquet. Spark writes the output in parallel, typically one file per partition, so the number of partitions at write time determines the number of Parquet files generated. repartition(n) performs a full shuffle and can either increase or decrease the partition count; coalesce(n) avoids a full shuffle and can only decrease it.

Thanks,

Chetna
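
To make the accepted answer concrete, here is a minimal sketch of the repartition/coalesce approach. The helper name write_parquet, its parameters, the choice of "overwrite" mode, and the /tmp path in the demo are assumptions for illustration, not part of the original answer. The resulting file count is usually equal to num_files, but it can differ, for example if some partitions are empty or if the write also uses partitionBy.

```python
def write_parquet(df, path, num_files, shuffle=True):
    """Write df as Parquet using roughly num_files output files.

    Hypothetical helper: df is expected to be a PySpark DataFrame.
    shuffle=True  -> repartition(): full shuffle, can raise or lower the count.
    shuffle=False -> coalesce(): no full shuffle, can only lower the count.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    resized = df.repartition(num_files) if shuffle else df.coalesce(num_files)
    # "overwrite" is an assumption here; pick the save mode that fits your job.
    resized.write.mode("overwrite").parquet(path)


def demo():
    """Example usage; requires PySpark to be installed. Not run automatically."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("demo").getOrCreate()
    df = spark.range(1000)
    print("partitions before:", df.rdd.getNumPartitions())
    write_parquet(df, "/tmp/demo_parquet", num_files=4)  # path is a placeholder
    spark.stop()
```

As a rule of thumb, coalesce is the cheaper choice when you only need fewer files, while repartition is the one to reach for when you need more partitions or more evenly sized files.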

