I need to create a Delta table that will be upserted many times over time. My code is below:
from delta.tables import DeltaTable

# Create the table once; "code" is declared as CHAR(3)
DeltaTable.createIfNotExists(spark) \
    .tableName("test") \
    .addColumn("id", "VARCHAR(15)") \
    .addColumn("code", "CHAR(3)") \
    .execute()

dt_test = DeltaTable.forName(spark, "test")

# Rows to upsert into the table
dt_test_update = spark.createDataFrame(
    [
        ("1", "001"),
        ("2", "002"),
    ],
    schema=["id", "code"],
)

# Upsert: insert new ids, update existing ones
dt_test.alias('test') \
    .merge(
        dt_test_update.alias('updates'),
        'test.id = updates.id'
    ) \
    .whenNotMatchedInsertAll() \
    .whenMatchedUpdateAll() \
    .execute()

df = spark.sql("select * from test")
display(df)
I got the error below:
"Resolved attribute(s) id#33730,code#33731 missing from id#33631,code#33633,_metadata#33735 in operator !Project [id#33730, staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, code#33731, 3, true, false, true) AS code#33732]. Attribute(s) with the same name appear in the operation: id,code. Please check if the right attribute(s) are used.;"
But if I change '.addColumn("code", "CHAR(3)")' to '.addColumn("code", "VARCHAR(3)")' when I create the Delta table, it works fine.
Any idea about this?
I think it would be better to use the STRING data type instead of CHAR.
Hi @Winnie2024
Thanks to @SachinNandanwar for the suggestion. Based on my test, it is very helpful!
PySpark doesn't have a direct CHAR or VARCHAR type. You can use StringType() or "STRING" to represent the VARCHAR(15) and CHAR(3) columns.
Try
from delta.tables import DeltaTable
from pyspark.sql.types import StringType

# Create a Delta table with two string columns
DeltaTable.createIfNotExists(spark) \
    .tableName("test") \
    .addColumn("id", StringType()) \
    .addColumn("code", StringType()) \
    .execute()
or
from delta.tables import DeltaTable

# Same table, with the column types given as SQL type strings
DeltaTable.createIfNotExists(spark) \
    .tableName("test") \
    .addColumn("id", "STRING") \
    .addColumn("code", "STRING") \
    .execute()
In my test, both versions work with your merge code without any error.
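One thing to keep in mind when switching from CHAR(3) to STRING: the error message mentions readSidePadding, which is the step where Spark pads CHAR(n) values with trailing spaces up to the declared length when they are read. A STRING column has no such padding and no length limit. The plain-Python sketch below is only a rough model of that behavior, not Spark's actual implementation; the helper names read_side_pad and fits_length are made up for illustration.

```python
def read_side_pad(value, length):
    """Rough model of CHAR(n) read-side padding: right-pad with spaces.

    Values already at or beyond `length` are returned unchanged.
    """
    if value is None:
        return None
    return value.ljust(length)


def fits_length(value, length):
    """Hypothetical check you might run yourself once the column is STRING,
    since the CHAR(n) length limit no longer applies."""
    return value is None or len(value) <= length


# Sample rows in the same (id, code) shape as the thread's update DataFrame
rows = [("1", "01"), ("2", "002")]

# What a CHAR(3) column would roughly hand back on read
padded = [(row_id, read_side_pad(code, 3)) for row_id, code in rows]

# Codes that would not fit CHAR(3) but are accepted by STRING
too_long = [code for code in ["001", "0001"] if not fits_length(code, 3)]
```

If your downstream code relies on fixed-width codes, a check like fits_length on the incoming rows before the merge is one way to keep that guarantee with STRING columns.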
Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!