Solved: Notebook parameter inaccessible in next code cell

jnickell · ‎07-19-2024

I'm new to Data Engineering / Notebooks and am trying to follow a YouTube video to set up a notebook for schema validation after data has landed in Lakehouse files.

I have a parameter cell with a 'fileToTest' and an 'output_table_name' parameter. When I try to use the 'fileToTest' parameter outside of the parameter code cell, it doesn't work and I get a "NameError: name 'fileToTest' is not defined" error.

--- Updated with some additional findings

I only have 3 code cells.

Cell 1: (Parameters)

fileToTest = "Files/folder/file.csv"
output_table_name = 'raw_users'

Cell 2: (attempting to install Great Expectations)

%pip install --q great_expectations

df = spark.read.format("csv").option("header","true").load(fileToTest)
display(df)

UPDATE:
If I comment out the pip install command, the spark.read operation works. I'm not sure what this means. Of course, in my notebook that means Cell 3 fails when it tries to import great_expectations.

Cell 3: (attempting to create validations within a Great Expectations context)

import great_expectations as gx
gxContext = gx.get_context()
validator = gxContext.sources.pandas_default.read_csv(fileToTest)

Either of the references above to 'fileToTest' fails with the NameError.

If I move this code to Cell 1, it works without issue:

df = spark.read.format("csv").option("header","true").load(fileToTest)
display(df)

For reference the original video is here:

https://youtu.be/wAayC-J9TsU?si=D25oMc7oZfpGFrxc

jnickell · ‎07-19-2024

Found the cause of my issue. Learning newbie here.

spark.read works with the Files/.... path

Great Expectations (via pandas, I think) requires /lakehouse/default/ to be prepended to the path
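
For anyone hitting the same thing, here is a minimal sketch of the path fix described above. The helper name to_local_path and the LAKEHOUSE_MOUNT constant are made up for illustration; the only thing taken from this thread is that the Files/... path needs /lakehouse/default/ in front of it for the pandas-based read.

```python
from pathlib import PurePosixPath

# Assumption: the prefix reported in this thread for pandas-style file access.
LAKEHOUSE_MOUNT = "/lakehouse/default"


def to_local_path(relative_path: str) -> str:
    """Turn a Spark-style relative path ("Files/...") into a local file path.

    Hypothetical helper, not part of Fabric or Great Expectations.
    """
    # Leave paths alone if they already carry the prefix.
    if relative_path.startswith(LAKEHOUSE_MOUNT + "/"):
        return relative_path
    # Strip any leading slash so the join cannot discard the prefix.
    return str(PurePosixPath(LAKEHOUSE_MOUNT) / relative_path.lstrip("/"))


fileToTest = "Files/folder/file.csv"
local_path = to_local_path(fileToTest)
print(local_path)  # /lakehouse/default/Files/folder/file.csv
```

With this, spark.read would keep using fileToTest as-is, and the read_csv call in Cell 3 would take local_path instead.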

frithjof_v · ‎07-20-2024

I don't have experience with parameter cells. However, my initial thought when you got the "NameError: name 'fileToTest' is not defined" error was that you had not executed (run) Cell 1 before you tried to use the fileToTest variable in another cell.

In that case, the 'fileToTest' variable didn't exist at the moment you executed the other cell.
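
To illustrate the point, here is a small sketch in plain Python (no Fabric needed). It treats a dict as the notebook session and a string as a cell; run_cell is a hypothetical helper made up for this example. A name used before the cell that defines it has run raises exactly this NameError, and the same would happen if something cleared the session between cells. Whether that is what happened in this notebook is an assumption, not something confirmed here.

```python
def run_cell(source: str, namespace: dict) -> str:
    """Execute one 'cell' against a shared namespace and report the outcome.

    Hypothetical helper that mimics how notebook cells share session state.
    """
    try:
        exec(source, namespace)
        return "ok"
    except NameError as err:
        return f"NameError: {err}"


session = {}  # a fresh session: nothing has been defined yet

# Using the variable before the parameter cell has run fails.
print(run_cell("print(fileToTest)", session))

# Run the parameter cell, then the same code succeeds.
run_cell('fileToTest = "Files/folder/file.csv"', session)
print(run_cell("print(fileToTest)", session))

# Clearing the session brings the error back.
session.clear()
print(run_cell("print(fileToTest)", session))
```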