
msprog
Contributor

Getting createdDateTime of files in OneLake

Hey team,

We have a set of CSV files in a folder in OneLake. We want to write PySpark code that returns each file's name and its creation date-time.

How do we achieve this? Is there a function that can give us the file creation dates, please?

 

Thanks

 

1 ACCEPTED SOLUTION
spencer_sa
Contributor III

I'm not sure it's possible to get the creation date of a OneLake file, given that OneLake is an abstraction layer over ADLS Gen2.
You can obtain the last-modified date (which, for a file that has never been modified, equals its creation date) from the Get Metadata activity of a pipeline; see the documentation for the ADF version below for which metadata fields are available.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity


In a PySpark notebook you'd use mssparkutils (or notebookutils on newer Fabric runtimes) and its .fs.ls function:

files = mssparkutils.fs.ls('Your directory path')
for file in files:
    # Each entry exposes name, isDir, isFile, path, size, and modifyTime
    print(file.name, file.isDir, file.isFile, file.path, file.size, file.modifyTime)

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=pro...
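Note that `modifyTime` comes back as an epoch timestamp in milliseconds rather than a readable date. A minimal pure-Python sketch of converting it (the helper name `to_datetime` and the millisecond unit are assumptions for illustration):

```python
from datetime import datetime, timezone

def to_datetime(modify_time_ms: int) -> datetime:
    """Convert a modifyTime value (epoch milliseconds, assumed) to an aware UTC datetime."""
    return datetime.fromtimestamp(modify_time_ms / 1000, tz=timezone.utc)

# Example: 1,700,000,000,000 ms since the epoch
print(to_datetime(1_700_000_000_000).isoformat())  # 2023-11-14T22:13:20+00:00
```

You could apply this to each `file.modifyTime` from the loop above to build a list of (name, datetime) pairs.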


If this helps, please consider Accepting as a Solution to help others find it more easily


Anonymous
Not applicable

Hi @msprog,
Thank you @spencer_sa for the valuable input!
As spencer_sa suggested, mssparkutils.fs.ls() is an efficient way to retrieve file metadata in OneLake, and it should help you resolve the issue.

If this helps, please give us Kudos and consider accepting it as a solution so that other members can find it more quickly.
Thank you for being a valued member of the Microsoft Fabric Community Forum!

Regards,
Pallavi.
