xefere
New Contributor

Fabric tutorial failing on files path

Hi

 

I am doing this Fabric tutorial (Lakehouse tutorial - Prepare and transform data in the lakehouse - Microsoft Fabric | Microsoft Lear...)

 

When I use the code as provided,

from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

df = spark.read.parquet('Files/wwi-raw-data/full/fact_sale_1y_full')
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

df.write.mode("overwrite").format("delta").partitionBy("Year", "Quarter").save("Tables/" + table_name)

 

I get the following error:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[104], line 5
      1 from pyspark.sql.functions import col, year, month, quarter
      3 table_name = 'fact_sale'
----> 5 df = spark.read.parquet('Files/wwi-raw-data/full/fact_sale_1y_full')
      6 df = df.withColumn('Year', year(col("InvoiceDateKey")))
      7 df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:531, in DataFrameReader.parquet(self, *paths, **options)
    520 int96RebaseMode = options.get("int96RebaseMode", None)
    521 self._set_opts(
    522     mergeSchema=mergeSchema,
    523     pathGlobFilter=pathGlobFilter,
    (...)
    528     int96RebaseMode=int96RebaseMode,
    529 )
--> 531 return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [PATH_NOT_FOUND] Path does not exist: abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/99c8f3be-4e9e-4f83-83f1-cc325343cf6b/Files/wwi-raw-data/full/fact_sale_1y_full.

 

xefere_0-1710114140024.png

 

I've now noticed that the abfss path in the error message is not the same as the path of my lakehouse.
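As I understand it (this is my assumption from the error message, not documented behaviour), a relative 'Files/...' path is expanded against the default lakehouse's OneLake root, roughly like this. The GUIDs below are placeholders, not real workspace/lakehouse IDs:

```python
# Sketch of how a relative OneLake path appears to expand into an absolute
# abfss URI (illustration only; the GUIDs are placeholders).

def onelake_uri(workspace_id: str, lakehouse_id: str, relative_path: str) -> str:
    """Build the absolute abfss URI that a relative 'Files/...' path maps to."""
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse_id}/{relative_path}"
    )

# Placeholder IDs, not real GUIDs from my workspace.
uri = onelake_uri(
    "00000000-0000-0000-0000-000000000000",  # workspace GUID
    "11111111-1111-1111-1111-111111111111",  # lakehouse GUID
    "Files/wwi-raw-data/full/fact_sale_1y_full",
)
print(uri)
```

If the notebook resolves the relative path against a different lakehouse GUID than the one holding the files, that would explain the PATH_NOT_FOUND error above.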

 

When I run the code with the abfss path copied from my lakehouse, it works perfectly:

 

from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

df = spark.read.parquet('abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/cbbd6d1f-0ac3-402a-ab8f-fbc7093b6ccc/Files/wwi-raw-data/full/fact_sale_1y_full')
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

df.write.mode("overwrite").format("delta").partitionBy("Year", "Quarter").save("abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/cbbd6d1f-0ac3-402a-ab8f-fbc7093b6ccc/Tables/" + table_name)
 
What am I doing wrong, or what is wrong in my setup?
1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @xefere ,

Apologies for the delay in replying from our side.
Based on the screenshot you provided, I can see that the lakehouse is not set as the default lakehouse in your case.
Once you set it as the default lakehouse, you will be able to use the relative file path, i.e. 'Files/wwi-raw-data/full/fact_sale_1y_full'.

vgchennamsft_0-1710313804660.png


Hope this is helpful. Please let me know in case of further queries.
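If for some reason you cannot pin a default lakehouse, a small helper like the one below can fall back to the absolute OneLake path. This is an illustrative sketch, not an official API; the parameter names and GUIDs are placeholders:

```python
from typing import Optional

def resolve_lakehouse_path(relative_path: str,
                           workspace_id: Optional[str] = None,
                           lakehouse_id: Optional[str] = None) -> str:
    """Return relative_path unchanged when a default lakehouse is assumed,
    otherwise build the absolute abfss URI from the given GUIDs."""
    if workspace_id is None or lakehouse_id is None:
        # No GUIDs supplied: rely on the default lakehouse to resolve it.
        return relative_path
    return (f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
            f"{lakehouse_id}/{relative_path}")

# With a default lakehouse attached, the relative form is enough:
print(resolve_lakehouse_path("Files/wwi-raw-data/full/fact_sale_1y_full"))

# Without one, pass the GUIDs copied from the lakehouse URL (placeholders here):
print(resolve_lakehouse_path("Files/wwi-raw-data/full/fact_sale_1y_full",
                             "00000000-0000-0000-0000-000000000000",
                             "11111111-1111-1111-1111-111111111111"))
```

The returned string can then be passed to spark.read.parquet(...) in either case.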

 

View solution in original post

4 REPLIES 4

Anonymous
Not applicable

Hi @xefere ,

We haven't heard back from you on the last response and were checking to see whether you have found a resolution yet.
If you have a resolution, please share it with the community, as it can be helpful to others.
Otherwise, we will respond with more details and try to help.

xefere
New Contributor

Thank you. I've added the lakehouse in the Sources panel of the notebook, and the relative path worked perfectly.

Anonymous
Not applicable

Glad to know that your query is resolved. Please continue using the Fabric community for your further queries.
