BoSe
New Contributor II

Unable to get list of files with ABFS path

Hello,

I added multiple lakehouses to one notebook.

Now I want to find the latest file in a specific folder in each of the lakehouses.

 

I'm able to access the data using the 'File API path' of the default Lakehouse:

 

import glob
import os

list_of_files = glob.glob('/lakehouse/default/Files/.../input/*')
last_modified_file = max(list_of_files, key=os.path.getmtime)
last_modified_file

 

 

 

However, when I try to do the same with the ABFS path, list_of_files is just an empty list:

 

list_of_files = glob.glob('abfss://.../input/*')
list_of_files

 

 

If I read data with the ABFS path, it works without any issue, so it cannot be a problem with the path or permissions:

 

df = pd.read_csv('abfss://...input/example.csv')

 

 

 

Any idea how to list files not only in the default lakehouse but also in the other lakehouses that were added to the notebook as data sources?

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @BoSe ,

Thanks for using Fabric Community.
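It looks like glob cannot process the ABFS path. glob only matches paths on the local file system, so an abfss:// URI returns an empty list instead of an error. This is a limitation of glob, not a path or permission problem.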
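
The empty result can be reproduced outside Fabric. The following is a minimal sketch, assuming a made-up abfss URI (the workspace and lakehouse names are hypothetical and do not come from this thread):

```python
import glob

# Hypothetical ABFS URI, for illustration only; not a real workspace or lakehouse.
abfs_pattern = (
    "abfss://example-workspace@onelake.dfs.fabric.microsoft.com/"
    "example.Lakehouse/Files/input/*"
)

# glob interprets the pattern as a path on the local file system.
# No local directory has this name, so nothing matches and no error is raised.
matches = glob.glob(abfs_pattern)
print(matches)
```

Because glob fails silently here, an empty list says nothing about whether the remote folder exists or is readable.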

You can use mssparkutils.fs.ls instead (see "Introduction to Microsoft Spark utilities - Azure Synapse Analytics" on Microsoft Learn):

 

# List the entries in the Files folder of the lakehouse
files = mssparkutils.fs.ls("abfss://5e****dd/Files")
# Collect the full path of each entry
file_paths = [f.path for f in files]
print(file_paths)

 


Hope this is helpful. Please let me know in case of further queries.

4 REPLIES

BoSe
New Contributor II

I was able to get the latest file (with all the info), with this addition:

 

latest_file = max(files, key=lambda file: file.modifyTime)

Thanks for the input.
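
For the original goal (the latest file in a folder of each lakehouse), the two snippets above could be combined roughly as follows. This is only a sketch: the helper name latest_file_per_folder, the FileInfo stand-in, and the fake listing function are hypothetical and exist so the example runs outside a notebook. It assumes the entries returned by mssparkutils.fs.ls expose isFile and modifyTime attributes.

```python
from collections import namedtuple

# Stand-in for the entries returned by mssparkutils.fs.ls (assumed fields).
FileInfo = namedtuple("FileInfo", ["name", "path", "isFile", "modifyTime"])


def latest_file_per_folder(folders, ls):
    """Return {folder: most recently modified file entry, or None if no files}."""
    result = {}
    for folder in folders:
        # Keep files only; subdirectories are skipped.
        files = [f for f in ls(folder) if f.isFile]
        result[folder] = max(files, key=lambda f: f.modifyTime, default=None)
    return result


# Fake listing function so the sketch is runnable without Fabric.
def fake_ls(folder):
    listings = {
        "folder_a": [
            FileInfo("a.csv", "folder_a/a.csv", True, 100),
            FileInfo("b.csv", "folder_a/b.csv", True, 300),
            FileInfo("archive", "folder_a/archive", False, 900),
        ],
        "folder_b": [],
    }
    return listings[folder]


latest = latest_file_per_folder(["folder_a", "folder_b"], fake_ls)
```

In a Fabric notebook, the call would pass mssparkutils.fs.ls in place of fake_ls, together with a list of the abfss folder paths of the attached lakehouses.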

 

Anonymous
Not applicable

Glad to know your query got resolved. Please continue using Fabric Community for your further queries.

zzzsharepoint
New Contributor III

You get the path, but it does not work with an abfss path.
