xefere
New Contributor

Fabric tutorial failing on files path

Hi

 

I am doing this Fabric tutorial (Lakehouse tutorial - Prepare and transform data in the lakehouse - Microsoft Fabric | Microsoft Lear...)

 

When I use the code as provided,

from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

df = spark.read.parquet('Files/wwi-raw-data/full/fact_sale_1y_full')
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

df.write.mode("overwrite").format("delta").partitionBy("Year", "Quarter").save("Tables/" + table_name)

 

I get the following error:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[104], line 5
      1 from pyspark.sql.functions import col, year, month, quarter
      3 table_name = 'fact_sale'
----> 5 df = spark.read.parquet('Files/wwi-raw-data/full/fact_sale_1y_full')
      6 df = df.withColumn('Year', year(col("InvoiceDateKey")))
      7 df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:531, in DataFrameReader.parquet(self, *paths, **options)
    520 int96RebaseMode = options.get("int96RebaseMode", None)
    521 self._set_opts(
    522     mergeSchema=mergeSchema,
    523     pathGlobFilter=pathGlobFilter,
    (...)
    528     int96RebaseMode=int96RebaseMode,
    529 )
--> 531 return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [PATH_NOT_FOUND] Path does not exist: abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/99c8f3be-4e9e-4f83-83f1-cc325343cf6b/Files/wwi-raw-data/full/fact_sale_1y_full.

 

xefere_0-1710114140024.png

 

I've now noticed that the abfss path in the error message is not the same as the path of my lakehouse.
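As I understand it (this is my assumption from the error message, not documented behaviour), a relative 'Files/...' path is expanded against the default lakehouse's OneLake root, roughly like this. The GUIDs below are placeholders, not real workspace/lakehouse IDs:

```python
# Sketch of how a relative OneLake path appears to expand into an absolute
# abfss URI (illustration only; the GUIDs are placeholders).

def onelake_uri(workspace_id: str, lakehouse_id: str, relative_path: str) -> str:
    """Build the absolute abfss URI that a relative 'Files/...' path maps to."""
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse_id}/{relative_path}"
    )

# Placeholder IDs, not real GUIDs from my workspace.
uri = onelake_uri(
    "00000000-0000-0000-0000-000000000000",  # workspace GUID
    "11111111-1111-1111-1111-111111111111",  # lakehouse GUID
    "Files/wwi-raw-data/full/fact_sale_1y_full",
)
print(uri)
```

If the notebook resolves the relative path against a different lakehouse GUID than the one holding the files, that would explain the PATH_NOT_FOUND error above.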

 

When I run the code with the abfss path copied from my lakehouse, it works perfectly:

 

from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

df = spark.read.parquet('abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/cbbd6d1f-0ac3-402a-ab8f-fbc7093b6ccc/Files/wwi-raw-data/full/fact_sale_1y_full')
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

df.write.mode("overwrite").format("delta").partitionBy("Year", "Quarter").save("abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/cbbd6d1f-0ac3-402a-ab8f-fbc7093b6ccc/Tables/" + table_name)
 
What am I doing wrong, or what is wrong in my setup?
1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @xefere ,

Apologies for the delay in replying from our side.
Based on the screenshot you provided, I can see that the lakehouse is not set as the default lakehouse in your case.
Once you set it as the default lakehouse, you will be able to use the relative file path, i.e. 'Files/wwi-raw-data/full/fact_sale_1y_full'.

vgchennamsft_0-1710313804660.png


Hope this is helpful. Please let me know in case of further queries.
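If for some reason you cannot pin a default lakehouse, a small helper like the one below can fall back to the absolute OneLake path. This is an illustrative sketch, not an official API; the parameter names and GUIDs are placeholders:

```python
from typing import Optional

def resolve_lakehouse_path(relative_path: str,
                           workspace_id: Optional[str] = None,
                           lakehouse_id: Optional[str] = None) -> str:
    """Return relative_path unchanged when a default lakehouse is assumed,
    otherwise build the absolute abfss URI from the given GUIDs."""
    if workspace_id is None or lakehouse_id is None:
        # No GUIDs supplied: rely on the default lakehouse to resolve it.
        return relative_path
    return (f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
            f"{lakehouse_id}/{relative_path}")

# With a default lakehouse attached, the relative form is enough:
print(resolve_lakehouse_path("Files/wwi-raw-data/full/fact_sale_1y_full"))

# Without one, pass the GUIDs copied from the lakehouse URL (placeholders here):
print(resolve_lakehouse_path("Files/wwi-raw-data/full/fact_sale_1y_full",
                             "00000000-0000-0000-0000-000000000000",
                             "11111111-1111-1111-1111-111111111111"))
```

The returned string can then be passed to spark.read.parquet(...) in either case.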

 

View solution in original post

4 REPLIES 4

Anonymous
Not applicable

Hi @xefere ,

We haven't heard back from you on the last response and were checking to see whether you have found a resolution yet.
If you have a resolution, please share it with the community, as it can be helpful to others.
Otherwise, we will respond with more details and try to help.

xefere
New Contributor

Thank you. I've added the lakehouse in the Sources panel of the notebook, and the relative path worked perfectly.

Anonymous
Not applicable

Glad to know that your query is resolved. Please continue using the Fabric community for your further queries.
