BryanCarmichael
New Contributor III

Load data from Semantic Model to Lakehouse where semantic model column names start with capital letters

Hi there - I'm trying to load a table from a semantic model into a lakehouse, but the table I want to load from the semantic model has column names that start with capital letters (for ease of use).

Obviously this breaks the naming rules for lakehouse tables. How do I fix this, ideally dynamically?

Code below:

import sempy.fabric as fabric

#Dev
#workspace_ID = "XXXXXX"
#Stage
#workspace_ID = "XXXXXX"
#Prod
workspace_ID = "XXXXXX"

workspace_Safe_ID = workspace_ID.replace("-", "_")

# Read the _Environments table from the semantic model
df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)

# Write it to the lakehouse as a Delta table
df_table.to_lakehouse_table("Environments", "overwrite")

And the error log:

Cell In[8], line 13
     10 workspace_Safe_ID = workspace_ID.replace("-","_")
     11 df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments",workspace=workspace_ID)
---> 13 df_table.to_lakehouse_table("Environments", "overwrite")

File ~/cluster-env/trident_env/lib/python3.10/site-packages/sempy/_utils/_log.py:273, in mds_log.<locals>.get_wrapper.<locals>.log_decorator_wrapper(*args, **kwargs)
    270     raise
    272 try:
--> 273     result = func(*args, **kwargs)
    275     # The invocation for get_message_dict moves after the function
    276     # so it can access the state after the method call
    277     message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/sempy/fabric/_dataframe/_fabric_dataframe.py:559, in FabricDataFrame.to_lakehouse_table(self, name, mode, spark_schema)
    550 with SparkConfigTemporarily(spark, "spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED"):
    551     spark_df = spark.createDataFrame(df_converted, schema=converted_schema)
    553 (spark_df.write
    554     .option("parquet.vorder.enabled", True)
    555     .mode(mode)
    556     .format("delta")
    557     # enable column mapping to support special characters common w/ Power BI (e.g. [])
    558     .option("delta.columnMapping.mode", "name")
--> 559     .saveAsTable(name))

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:1521, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options)
   1519 if format is not None:
   1520     self.format(format)
-> 1521 self._jwrite.saveAsTable(name)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317           self.command_header +\
   1318           args_command +\
   1319           proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

frithjof_v
Honored Contributor

ChatGPT helped me with this. The key line is this:

df_lower = df.toDF(*[c.lower() for c in df.columns])

ChatGPT created this example:

# Example DataFrame
data = [("John", 28), ("Alice", 23), ("Bob", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Convert all column names to lowercase
df_lower = df.toDF(*[c.lower() for c in df.columns])

# Show the resulting DataFrame
df_lower.show()

Explanation:

  • df.columns: Retrieves the list of column names.
  • [c.lower() for c in df.columns]: Converts each column name to lowercase.
  • df.toDF(*[c.lower() for c in df.columns]): Creates a new DataFrame with the lowercase column names.

This will rename all columns in the DataFrame to their lowercase equivalents.
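
If you prefer, the same rename can be written as a select with per-column aliases - a small sketch that should be equivalent for a Spark DataFrame:

from pyspark.sql import functions as F

# Alias every column to its lowercase name in a single select
df_lower = df.select([F.col(c).alias(c.lower()) for c in df.columns])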

Here is a StackOverflow thread mentioning the same method. Perhaps ChatGPT learned from it:

https://stackoverflow.com/questions/43005744/convert-columns-of-pyspark-data-frame-to-lowercase

So in your case, the code could be like this:

import sempy.fabric as fabric

#Dev
#workspace_ID = "XXXXXX"
#Stage
#workspace_ID = "XXXXXX"
#Prod
workspace_ID = "XXXXXX"

workspace_Safe_ID = workspace_ID.replace("-", "_")
df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)

# Convert all column names to lowercase before writing
df_lower = df_table.toDF(*[c.lower() for c in df_table.columns])

df_lower.to_lakehouse_table("Environments", "overwrite")
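
One caveat I'm not certain about: fabric.read_table returns a sempy FabricDataFrame, which subclasses pandas.DataFrame, so the Spark toDF method might not be available on it. If you get an AttributeError, a pandas-style rename should achieve the same thing (an untested sketch):

import sempy.fabric as fabric

df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)

# Rename in place so df_table stays a FabricDataFrame and keeps to_lakehouse_table
df_table.columns = [c.lower() for c in df_table.columns]

df_table.to_lakehouse_table("Environments", "overwrite")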

Anonymous
Not applicable

Hi @BryanCarmichael 

Did you have a chance to try @frithjof_v 's solution? Did it work for you, or did you find another solution? Let us know if you have any questions.

Best Regards,
Jing
