Scott_Powell
Contributor II

Easy way to copy Lakehouse tables into SQL DB?

Hi, just wondering if there's an easy way to copy tables from a Lakehouse into a SQL database? Pipeline copy activities don't seem to be able to write to a Fabric SQL database, and when I tried a gen2 dataflow it failed with an error.

 

Thanks,

Scott

4 REPLIES
dlevy
Contributor

@Scott_Powell - Sorry to hear that!

 

For pipelines, were you doing something like this: https://learn.microsoft.com/en-us/fabric/database/sql/load-data-pipelines#create-data-pipeline? I'd expect it to work. You can't create a new connection in pipelines yet, but you should be able to pick your database from the OneLake Data Hub.

 

Do you have an error you can share for data flows?

sukkaur
New Contributor II

Hi @Scott_Powell - Here's a short demo on how to move Lakehouse data to a SQL database in Fabric: https://youtu.be/K8n_0ez31Oo. Please let me know if this is helpful.

Thanks

Sukhwant

Scott_Powell
Contributor II

Hi all, I was able to get this working using a copy activity in a pipeline. The issue I originally had was that when I chose a SQL database as the destination connection, I couldn't figure out how to populate the connection fields in a way that worked.

 

However, when I just chose the name of the Fabric SQL DB from the "OneLake" section, it prompted me to create a connection, which worked perfectly.

 

The only thing I'll add is that this copy process is extremely slow... I'm not sure what's going on behind the scenes, but I've had some copy jobs running 3+ hours to move tables that have < 20 million rows. I'll need to find a much faster way of doing this.

 

Thanks for the assistance!

Scott

divyed
Contributor III

Hello @Scott_Powell ,

 

Have you tried using staging before loading to the destination? I would suggest a couple of things here:

 

1. Use staging before loading to a destination
2. Vacuum your Lakehouse data destination (see the maintenance sketch after the link below)
3. Handle nullable columns
4. Check data type conversions and upscaling

 

For detailed information, check out the link below:

Dataflow Gen2 data destinations and managed settings - Microsoft Fabric | Microsoft Learn
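On point 2, here's a minimal maintenance sketch for a Lakehouse Delta table, run from a Fabric notebook (where the spark session is already available). The table name and retention window are placeholders, so adjust them to your Lakehouse:

# Compact small files so downstream reads and copies scan fewer, larger files
spark.sql("OPTIMIZE my_lakehouse_table")

# Remove unreferenced data files older than the retention window (168 hours = 7 days, the default minimum)
spark.sql("VACUUM my_lakehouse_table RETAIN 168 HOURS")

Compacting small files before a large copy tends to reduce the number of files the copy has to read, which can help with the slow transfers mentioned above.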

 

If you are using a pipeline, you can check a few things:

 

  • Check Data Movement Options: If you're using Azure Data Factory (or the Fabric pipeline equivalent), you might want to experiment with different data movement strategies, like adjusting the copy behavior (e.g., batch size, partitioning). Sometimes splitting the data into chunks and processing those in parallel can dramatically speed things up (see the sketch after this list).

  • Optimize Source and Destination Configurations:

    • Ensure that your source is optimized for reading (here the source is a Lakehouse, so keep the Delta tables compacted and maintained; for a SQL source, appropriate indexing and up-to-date statistics).
    • For the destination SQL database, ensure you have optimized tables (e.g., clustered indexes, proper data types).
  • Compression: Depending on the data you're moving, using compression can help reduce the time and network bandwidth needed for the transfer.

  • Staging Data: In some cases, copying the data first to a staging area (like Azure Blob storage) and then moving it to the destination database can improve performance, especially for large volumes of data.

  • Incremental Loads: If this is something you're doing regularly, setting up incremental data loads (only copying changed or new data) can speed up the overall process.

  • Monitor Throughput: Keep an eye on resource usage (CPU, memory, network) during the copy. Sometimes, bottlenecks can come from resource limits on the source, destination, or even on the pipeline itself.
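On the batch size, partitioning, and incremental points, here's a minimal notebook sketch of a parallel JDBC write from a Lakehouse table into a SQL database. The endpoint, credentials, table names, and watermark column are placeholders, and the options shown are standard Spark JDBC writer settings rather than anything Fabric-specific, so treat it as a starting point, not the recommended method:

# Minimal sketch (Fabric notebook, spark session already defined):
# read a Lakehouse table and write it to a SQL database over JDBC,
# using several parallel partitions and an explicit batch size.
jdbc_url = ("jdbc:sqlserver://<your-sql-endpoint>:1433;"
            "database=<your_sql_db>;encrypt=true")

df = spark.read.table("my_lakehouse_table")        # Lakehouse source (placeholder name)

# Optional incremental load: only push rows newer than a watermark you track elsewhere
# df = df.filter(df["modified_at"] > last_watermark)

(df.repartition(8)                                 # number of parallel writers, tune to your capacity
   .write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.my_target_table")       # destination table (placeholder name)
   .option("user", "<user>")                       # or wire up your preferred authentication
   .option("password", "<password>")
   .option("batchsize", 10000)                     # rows per JDBC batch insert
   .mode("append")
   .save())

Writing from several partitions in parallel and batching the inserts is usually much faster than a single-threaded, row-by-row load, though the SQL database's capacity will ultimately cap the throughput.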

 

 

Please give these a try; I hope this helps.

 

Cheers

 

LinkedIn : https://www.linkedin.com/in/neeraj-kumar-62246b26/
