Zoe_Guest
Regular Visitor

spark.sql returns old data that was deleted from the Lakehouse, whereas spark.read.load doesn't

I have data in a Lakehouse and I have deleted some of it. I am trying to load it from a Fabric Notebook.

 

When I use spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`"), I get the old data I deleted from the lakehouse.

 

When I use spark.read.load("<abfs_path>/Tables/<table_name>") I don't get this deleted data.

 

I have to use the abfs path because I am not setting a default lakehouse, and I can't set one to solve this.

 

Why is this old data coming up when I use spark.sql when the paths are exactly the same?

1 ACCEPTED SOLUTION

Solved by changing the format qualifier to delta:

 

spark.sql("SELECT * FROM delta.`<abfs_path>/Tables/<table_name>`")
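For context on why the delta. prefix matters: a Delta table's folder contains every parquet data file ever written, plus a _delta_log of JSON commits recording which files are currently part of the table. Querying with parquet. reads all files on disk (including logically deleted ones), while delta. replays the log. A minimal pure-Python sketch of that log replay (file names and action shapes are illustrative, not real Delta internals):

```python
# Simulated _delta_log: each commit is a list of actions.
# "add" marks a parquet file as part of the table; "remove" tombstones it.
commits = [
    [{"add": {"path": "part-0001.parquet"}},
     {"add": {"path": "part-0002.parquet"}}],
    # A DELETE rewrites data: the old file is removed, a compacted one added.
    [{"remove": {"path": "part-0001.parquet"}},
     {"add": {"path": "part-0003.parquet"}}],
]

def active_files(commits):
    """Replay the log: the table is the set of added-but-not-removed files."""
    files = set()
    for commit in commits:
        for action in commit:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files

# delta.`path` behaves like the log replay; parquet.`path` behaves like a raw
# directory listing, and tombstoned files stay on disk until VACUUM runs.
all_files_on_disk = {"part-0001.parquet", "part-0002.parquet", "part-0003.parquet"}
print(sorted(active_files(commits)))  # current table data only
print(sorted(all_files_on_disk))      # includes the deleted file
```

This is why clearing the cache didn't help: the parquet. read was faithfully returning every file in the folder, deleted data included.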


6 REPLIES
wardy912
Solution Sage

The paths are the same, but you're using a different method to query them:

spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`")

Spark SQL: may be using cached metadata.

 

spark.read.load("<abfs_path>/Tables/<table_name>")

 

DataFrame API: reads the current state of the files.

 

You could add a cell to your notebook that clears the cache if you want to use the Spark SQL code:


spark.catalog.clearCache()
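The caching idea above can be pictured with a toy sketch (pure Python, purely illustrative; this is not Spark's actual catalog code): a cached file listing keeps serving stale results until it is cleared.

```python
class ToyCatalog:
    """Illustrative stand-in for a catalog that caches table file listings."""

    def __init__(self, storage):
        self.storage = storage  # dict: table path -> current list of files
        self._cache = {}        # cached listings, keyed by table path

    def query(self, path):
        # First query caches the listing; later queries reuse the cached copy.
        if path not in self._cache:
            self._cache[path] = list(self.storage[path])
        return self._cache[path]

    def clear_cache(self):
        # Rough analogue of spark.catalog.clearCache()
        self._cache.clear()

storage = {"/Tables/t": ["part-1", "part-2"]}
cat = ToyCatalog(storage)
cat.query("/Tables/t")                 # caches ["part-1", "part-2"]
storage["/Tables/t"].remove("part-1")  # data deleted underneath the cache

stale = cat.query("/Tables/t")         # still shows the deleted file
cat.clear_cache()
fresh = cat.query("/Tables/t")         # re-lists: deleted file is gone
```

If stale cached metadata were the problem, clearing the cache would fix the query, which is why it's worth ruling out first.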

 

Please give a thumbs up if this helps, thanks

 

Unfortunately, clearing the cache doesn't work.

 

However, this also returns the deleted data, so I think the issue is in specifying parquet:

spark.read.format("parquet").load(_table_abfs)

I want to be able to use a SQL query and the abfs path to load the data. Any ideas on how I can do this?
wardy912
Solution Sage

df = spark.sql("""
    SELECT *
    FROM <lakehouse>.<schema>.<table>
""")
 
You can also drag the table from the left-hand side of the lakehouse into a cell, and it will automatically add a SQL query for that table.

I can't set it as the default lakehouse, which is why I want to use the abfs path. How do you do this with the abfs path?


v-prasare
Community Support

@Zoe_Guest Thanks for being part of the Fabric community and making it grow.

@wardy912 Thanks for your prompt response.

 

 

 

Thanks,

Prashanth Are

MS Fabric community support
