pmscorca
Valued Contributor

Using Spark SQL to alter a column of an existing lakehouse table

Hi,

I'm trying to use Spark SQL to alter the data type and the length of a column of an existing lakehouse table.

This syntax raises an error:

%%sql
ALTER TABLE MYTABLE ALTER COLUMN mycolumn varchar(10)

Any help, please? Thanks.

Unfortunately, I cannot use an equivalent T-SQL statement against a lakehouse.

6 REPLIES
Anonymous
Not applicable

Hi @pmscorca ,

 

In Apache Spark SQL, you cannot directly change the data type of an existing column using the ALTER TABLE command.


 

You need to create a new DataFrame with the changed column type and then overwrite the existing table with the new DataFrame.

 

Here is sample PySpark code:

 

from pyspark.sql.functions import col

# Load the existing table into a DataFrame
df = spark.table("MYTABLE")

# Cast the column to the desired type
df = df.withColumn("mycolumn", col("mycolumn").cast("string"))

# Overwrite the existing table; overwriteSchema lets the Delta schema change
df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable("MYTABLE")

 

 

Replace "MYTABLE" and "mycolumn" with the name of the table and column, and "string" with the desired data type. Note that the length parameter does not apply when casting to a string in Spark.

 

One small note: remember to back up your data before proceeding, since the overwrite replaces the table's contents.

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!

pmscorca
Valued Contributor

Hi,

I've tried to use the cast method, and I've noticed that if I try to go from a string data type to an int data type on an empty table, I get an error about data type incompatibility.
I'd like to know which data types the cast method supports.

I'd like to cast to a varchar with a specified length.

I'd also like the official MS documentation to provide more useful and reusable Spark code for implementing notebooks.

Thanks

ACCEPTED SOLUTION

Anonymous
Not applicable

Hi @pmscorca ,

 

Thanks for the reply from @frithjof_v.

 

The types supported by the Cast() method include: "string", "boolean", "byte", "short", "int", "long", "float", "double", "decimal", "date", "timestamp".

Here is the official documentation for the Cast() method:

Column.Cast(String) Method (Microsoft.Spark.Sql) - .NET for Apache Spark | Microsoft Learn
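
As a small illustration of how cast behaves (a sketch, assuming Spark's default non-ANSI mode, in which unparseable values become null instead of raising an error):

from pyspark.sql.functions import col

# Hypothetical two-row example: "abc" cannot be parsed as an int, so it becomes null
df = spark.createDataFrame([("123",), ("abc",)], ["mycolumn"])
df.withColumn("as_int", col("mycolumn").cast("int")).show()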

 

In Spark SQL, there is no direct way to cast a string to a varchar with a specified length.

 

Here is an alternative:

 

You can use withColumn and cast:

from pyspark.sql.functions import col

# Load the table, then cast the column to the desired type
# (replace "new_data_type" with one of the supported type names above)
df = spark.sql("SELECT * FROM MYTABLE")
df = df.withColumn("mycolumn", col("mycolumn").cast("new_data_type"))
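
If the goal is to enforce a maximum length such as varchar(10), one workaround (just a sketch using the built-in substring function; MYTABLE and mycolumn are the placeholders from above) is to truncate the values yourself and keep the column as a string:

from pyspark.sql.functions import col, substring

# Emulate varchar(10) by truncating values to 10 characters
df = spark.sql("SELECT * FROM MYTABLE")
df = df.withColumn("mycolumn", substring(col("mycolumn"), 1, 10))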

 

If you have any other questions, please feel free to contact me.

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!

pmscorca
Valued Contributor

Hi,

However, is there a way to cast to a varchar with a specified length using PySpark, Scala, etc.? Thanks

pmscorca
Valued Contributor

Hi,

I can use this PySpark code successfully (note the TYPE keyword, which my original statement was missing):

sql_sample_alter_table = """
ALTER TABLE sql_sample_table ALTER COLUMN descr TYPE VARCHAR(100)
"""
spark.sql(sql_sample_alter_table)
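
As a quick check (sql_sample_table is the table from the snippet above), DESCRIBE can confirm the new column type:

# Show the table's columns and their data types
spark.sql("DESCRIBE TABLE sql_sample_table").show()
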
frithjof_v
Honored Contributor

This worked for me when I tried it a couple of months ago:

 

from pyspark.sql.functions import col

# Read the table, cast the column, and overwrite the table in place,
# allowing the schema to change via overwriteSchema
spark.read.table("tableName") \
    .withColumn("columnName", col("columnName").cast("columnType")) \
    .write \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("tableName")

 

https://community.fabric.microsoft.com/t5/General-Discussion/Dropping-and-recreating-lakehouse-table...
