joshuaking
New Contributor II

write_deltalake with Python Notebook is creating an "Unidentified" folder in my lakehouse.

Hi there, all!

 

Due to the smaller size of our data at my organization, we decided to use Polars and PyArrow for our data transformation in lieu of PySpark, Spark SQL, and other distributed data processing applications.

 

After making a transformation and attempting to write a DataFrame, the output drops into this "Unidentified" folder instead of registering as a Delta table. I am able to query it from the SQL endpoint and read it from the "/lakehouse/default/Tables/" directory using pl.read_delta, but the lakehouse doesn't seem to recognize it as a Delta table like it should (see image below).

[Screenshot: the written table appears under an "Unidentified" folder in the lakehouse explorer]

 

I have tried to write this code in numerous ways, such as:

- Referencing both the ABFSS path and the local path (/lakehouse/default/Tables).

- Keeping the DataFrame as a Polars DataFrame, and also switching it to a PyArrow Table and a pandas DataFrame.

- Using Polars' native write_delta and the write_deltalake function in the deltalake package.

- Using Microsoft Fabric's "Write data to delta table" code snippet for Python notebooks, which also places items under "Unidentified".

 

What am I doing wrong? I'm beginning to run out of ideas.

Thanks a bunch for any help you all can offer.


9 REPLIES
nilendraFabric
Honored Contributor

Hello @joshuaking 

 

Here is what I think could have happened.

When writing a Delta table with Python functions like `write_deltalake` or Polars' `write_delta`, the underlying data files are correctly created and queryable, yet the lakehouse metadata may not recognize or register the table properly. This can lead to the table being shown under an "Unidentified" folder.

Writing data to a specific path does not always automatically update the lakehouse catalog.

Try running a DDL command like this in your notebook:
`CREATE TABLE student USING DELTA LOCATION '/lakehouse/your_schema/your_path/student';`
to inform the system about the logical table name.
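For example (a minimal sketch, assuming you run the DDL from a Spark notebook in the same workspace, since a plain Python notebook has no predefined `spark` session; the table name and path are the placeholders from above):

# Hypothetical registration sketch: point the catalog at an existing Delta folder.
spark.sql("""
    CREATE TABLE IF NOT EXISTS student
    USING DELTA
    LOCATION '/lakehouse/your_schema/your_path/student'
""")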

 

If this helps, please accept the solution and give kudos.

I'll give this a try and see what happens. While I'm working on it, do you know if the lakehouse catalog updates on its own periodically? If not, what would be the best way to register new tables automatically?

Yes, it is automatic in Fabric.

When files in Delta format (Parquet + a `_delta_log` folder) are placed in the managed area of a lakehouse, Fabric automatically registers them as tables in the catalog.

 

In most cases, a properly structured Delta table in the Unidentified folder eventually moves to Tables.
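One way to sanity-check that a write produced a properly structured Delta table (so auto-discovery can pick it up) is to look for the `_delta_log` folder next to the Parquet files. A minimal sketch, assuming the default lakehouse mount and a placeholder table name `my_table`:

import os

table_path = "/lakehouse/default/Tables/my_table"
print(os.listdir(table_path))  # should list Parquet files plus `_delta_log`
print("_delta_log present:", os.path.isdir(os.path.join(table_path, "_delta_log")))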

v-tsaipranay
Honored Contributor II

Hi @joshuaking,

Thanks for reaching out to the Microsoft Fabric community forum.

 

Could you confirm whether @nilendraFabric's suggestion resolved your issue? Please feel free to reach out if you have any further questions. If the response addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.

 

Thank you.

v-tsaipranay
Honored Contributor II

Hi @joshuaking,

 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems find the answer faster.

 

Thank you.

v-tsaipranay
Honored Contributor II

Hello @joshuaking,

 

Could you please confirm whether the issue has been resolved? It would be greatly appreciated if you could share your insights. Feel free to reach out if you have any further questions. If a reply resolved the issue, please accept it as a solution and give a 'Kudos' so other members can easily find it.

 

Thank you.

Rags2596

When writing to Delta tables in a lakehouse from a Python notebook, you have to provide the full path of the table, including the schema name. So if you want to register your table under `dbo`, your full path should be:

abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse/Tables/dbo/<table-name>

Otherwise, it registers under an unidentified schema.
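As a minimal sketch of that pattern with the `deltalake` package (assuming, as in the Polars example further down, that no explicit credentials are needed inside a Fabric notebook; `<ws>`, `<lh>`, and `<table-name>` are placeholders):

import polars as pl
from deltalake import write_deltalake

df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})

# Full OneLake path including the schema segment (`dbo` here), so the table
# registers under that schema instead of landing in "Unidentified".
path = "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse/Tables/dbo/<table-name>"

# write_deltalake accepts Arrow data, so convert the Polars frame first.
write_deltalake(path, df.to_arrow(), mode="overwrite")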
v-tsaipranay
Honored Contributor II

Hi @Rags2596,

Thank you for your helpful response.

Thibauld_c
New Contributor

For those wanting to use Polars, you can write to your lakehouse with write_delta by using the full URL. For example:

 

import polars as pl

# Build a small demo frame.
data = {"a": [1, 2], "b": [3, 4]}
df = pl.DataFrame(data)

# Write directly to OneLake using the full abfss URL of the target table.
df.write_delta("abfss://workspace-name@onelake.dfs.fabric.microsoft.com/lakehouse-name.Lakehouse/Tables/table-name")
