BryanCarmichael
New Contributor III

Load data from Semantic Model to Lakehouse where semantic model column names start with capital letters

Hi there - I'm trying to load a table from a semantic model into a lakehouse, but the table I want to load from the semantic model has column names that start with capital letters (for ease of use).

Obviously this breaks the naming rules for lakehouse tables. How do I fix this, ideally dynamically?

Code below:

import sempy.fabric as fabric

#Dev
#workspace_ID = "XXXXXX"
#Stage
#workspace_ID = "XXXXXX"
#Prod
workspace_ID = "XXXXXX"

workspace_Safe_ID = workspace_ID.replace("-", "_")

# Read the _Environments table from the semantic model
df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)

# Write it to the lakehouse as a Delta table
df_table.to_lakehouse_table("Environments", "overwrite")

And the error log:

Cell In[8], line 13
     10 workspace_Safe_ID = workspace_ID.replace("-","_")
     11 df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments",workspace=workspace_ID)
---> 13 df_table.to_lakehouse_table("Environments", "overwrite")

File ~/cluster-env/trident_env/lib/python3.10/site-packages/sempy/_utils/_log.py:273, in mds_log.<locals>.get_wrapper.<locals>.log_decorator_wrapper(*args, **kwargs)
    270     raise
    272 try:
--> 273     result = func(*args, **kwargs)
    275     # The invocation for get_message_dict moves after the function
    276     # so it can access the state after the method call
    277     message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/sempy/fabric/_dataframe/_fabric_dataframe.py:559, in FabricDataFrame.to_lakehouse_table(self, name, mode, spark_schema)
    550 with SparkConfigTemporarily(spark, "spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED"):
    551     spark_df = spark.createDataFrame(df_converted, schema=converted_schema)
    553 (spark_df.write
    554     .option("parquet.vorder.enabled", True)
    555     .mode(mode)
    556     .format("delta")
    557     # enable column mapping to support special characters common w/ Power BI (e.g. [])
    558     .option("delta.columnMapping.mode", "name")
--> 559     .saveAsTable(name))

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:1521, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options)
   1519 if format is not None:
   1520     self.format(format)
-> 1521 self._jwrite.saveAsTable(name)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317           self.command_header +\
   1318           args_command +\
   1319           proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

frithjof_v
Honored Contributor

ChatGPT helped me with this. The key line is this:

df_lower = df.toDF(*[c.lower() for c in df.columns])

ChatGPT created this example:

# Example DataFrame
data = [("John", 28), ("Alice", 23), ("Bob", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Convert all column names to lowercase
df_lower = df.toDF(*[c.lower() for c in df.columns])

# Show the resulting DataFrame
df_lower.show()

Explanation:

  • df.columns: Retrieves the list of column names.
  • [c.lower() for c in df.columns]: Converts each column name to lowercase.
  • df.toDF(*[c.lower() for c in df.columns]): Creates a new DataFrame with the lowercase column names.

This will rename all columns in the DataFrame to their lowercase equivalents.
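
If you prefer, the same rename can be written as a select with per-column aliases - a small sketch that should be equivalent for a Spark DataFrame:

from pyspark.sql import functions as F

# Alias every column to its lowercase name in a single select
df_lower = df.select([F.col(c).alias(c.lower()) for c in df.columns])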

Here is a StackOverflow thread mentioning the same method. Perhaps ChatGPT learned from it:

https://stackoverflow.com/questions/43005744/convert-columns-of-pyspark-data-frame-to-lowercase

So in your case, the code could be like this:

import sempy.fabric as fabric

#Dev
#workspace_ID = "XXXXXX"
#Stage
#workspace_ID = "XXXXXX"
#Prod
workspace_ID = "XXXXXX"

workspace_Safe_ID = workspace_ID.replace("-", "_")
df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)

# Convert all column names to lowercase before writing
df_lower = df_table.toDF(*[c.lower() for c in df_table.columns])

df_lower.to_lakehouse_table("Environments", "overwrite")
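
One caveat I'm not certain about: fabric.read_table returns a sempy FabricDataFrame, which subclasses pandas.DataFrame, so the Spark toDF method might not be available on it. If you get an AttributeError, a pandas-style rename should achieve the same thing (an untested sketch):

import sempy.fabric as fabric

df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)

# Rename in place so df_table stays a FabricDataFrame and keeps to_lakehouse_table
df_table.columns = [c.lower() for c in df_table.columns]

df_table.to_lakehouse_table("Environments", "overwrite")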

Anonymous
Not applicable

Hi @BryanCarmichael 

Did you have a chance to try @frithjof_v 's solution? Did it work for you, or did you find another solution? Let us know if you have any questions.

Best Regards,
Jing
