Scott_Powell
Contributor II

Easy way to copy Lakehouse tables into SQL DB?

Hi, just wondering if there's an easy way to copy tables from a Lakehouse into a SQL database? Pipeline copy activities don't seem to be able to write to a Fabric SQL database, and when I tried a gen2 dataflow it failed with an error.

 

Thanks,

Scott

4 REPLIES
dlevy
Contributor

@Scott_Powell - Sorry to hear that!

 

For pipelines, were you doing something like this: https://learn.microsoft.com/en-us/fabric/database/sql/load-data-pipelines#create-data-pipeline? I'd expect it to work. You can't create a new connection in pipelines yet, but you should be able to pick your database from the OneLake Data Hub.

 

Do you have an error you can share for data flows?

sukkaur
New Contributor II

Hi @Scott_Powell - Here's a short demo on how to move Lakehouse data to a SQL database in Fabric: https://youtu.be/K8n_0ez31Oo. Please let me know if this is helpful.

Thanks

Sukhwant

Scott_Powell
Contributor II

Hi all, I was able to get this working using a copy activity in a pipeline. The issue I originally had was that when I chose a SQL database as the destination connection, I couldn't figure out how to populate the connection fields in a way that worked.

 

However, when I just chose the name of the Fabric SQL DB from the "OneLake" section, it prompted me to create a connection, which worked perfectly.

 

The only thing I'll add is that this copy process is extremely slow... I'm not sure what's going on behind the scenes, but I've had some copy jobs running 3+ hours to move tables that have < 20 million rows. I'll need to find a much faster way of doing this.

 

Thanks for the assistance!

Scott

divyed
Contributor III

Hello @Scott_Powell ,

 

Have you tried using staging before loading to the destination? I would suggest a couple of things here:

 

1. Use staging before loading to a destination
2. Vacuum your Lakehouse data destination (see the maintenance sketch after the link below)
3. Handle nullable columns
4. Check data type conversions and upscaling

 

For detailed information, check out the link below:

Dataflow Gen2 data destinations and managed settings - Microsoft Fabric | Microsoft Learn
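On point 2, here's a minimal maintenance sketch for a Lakehouse Delta table, run from a Fabric notebook (where the spark session is already available). The table name and retention window are placeholders, so adjust them to your Lakehouse:

# Compact small files so downstream reads and copies scan fewer, larger files
spark.sql("OPTIMIZE my_lakehouse_table")

# Remove unreferenced data files older than the retention window (168 hours = 7 days, the default minimum)
spark.sql("VACUUM my_lakehouse_table RETAIN 168 HOURS")

Compacting small files before a large copy tends to reduce the number of files the copy has to read, which can help with the slow transfers mentioned above.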

 

If you are using a pipeline, you can check a few things:

 

  • Check Data Movement Options: If you're using Azure Data Factory (or the Fabric pipeline equivalent), you might want to experiment with different data movement strategies, like adjusting the copy behavior (e.g., batch size, partitioning). Sometimes splitting the data into chunks and processing those in parallel can dramatically speed things up (see the sketch after this list).

  • Optimize Source and Destination Configurations:

    • Ensure that your source is optimized for reading (here the source is a Lakehouse, so keep the Delta tables compacted and maintained; for a SQL source, appropriate indexing and up-to-date statistics).
    • For the destination SQL database, ensure you have optimized tables (e.g., clustered indexes, proper data types).
  • Compression: Depending on the data you're moving, using compression can help reduce the time and network bandwidth needed for the transfer.

  • Staging Data: In some cases, copying the data first to a staging area (like Azure Blob storage) and then moving it to the destination database can improve performance, especially for large volumes of data.

  • Incremental Loads: If this is something you're doing regularly, setting up incremental data loads (only copying changed or new data) can speed up the overall process.

  • Monitor Throughput: Keep an eye on resource usage (CPU, memory, network) during the copy. Sometimes, bottlenecks can come from resource limits on the source, destination, or even on the pipeline itself.
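On the batch size, partitioning, and incremental points, here's a minimal notebook sketch of a parallel JDBC write from a Lakehouse table into a SQL database. The endpoint, credentials, table names, and watermark column are placeholders, and the options shown are standard Spark JDBC writer settings rather than anything Fabric-specific, so treat it as a starting point, not the recommended method:

# Minimal sketch (Fabric notebook, spark session already defined):
# read a Lakehouse table and write it to a SQL database over JDBC,
# using several parallel partitions and an explicit batch size.
jdbc_url = ("jdbc:sqlserver://<your-sql-endpoint>:1433;"
            "database=<your_sql_db>;encrypt=true")

df = spark.read.table("my_lakehouse_table")        # Lakehouse source (placeholder name)

# Optional incremental load: only push rows newer than a watermark you track elsewhere
# df = df.filter(df["modified_at"] > last_watermark)

(df.repartition(8)                                 # number of parallel writers, tune to your capacity
   .write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.my_target_table")       # destination table (placeholder name)
   .option("user", "<user>")                       # or wire up your preferred authentication
   .option("password", "<password>")
   .option("batchsize", 10000)                     # rows per JDBC batch insert
   .mode("append")
   .save())

Writing from several partitions in parallel and batching the inserts is usually much faster than a single-threaded, row-by-row load, though the SQL database's capacity will ultimately cap the throughput.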

 

 

Please give these a try; I hope this helps.

 

Cheers

 

LinkedIn : https://www.linkedin.com/in/neeraj-kumar-62246b26/
