How to use Snowflake VARIANT in Dataiku
If you are using Dataiku Data Science studio together with Snowflake, you might have faced situation how to load data into column which uses VARIANT data type in Snowflake. DSS by default does not support variant type but there is a way how to do it. In this quick tip I am going to show you how.
VARIANT is semi-structured data type which can store values of any other type, including OBJECT or ARRAY. Usually it is used for storing JSON documents in Snowflake. In my case I use variant columns to store encrypted data — result of ENCRYPT_RAW function.
I use Dataiku Data Science Studio for building the data pipelines and I’ve found out that VARIANT is not supported by Dataiku and according to their support there was no ETA to add the support. Let’s go through a workaround how to store results of recipes into VARIANT data type.
It is fair to mention that there is no way how to make it work if you want DSS to manage that dataset, meaning that DSS automatically generates table creation scripts for you. What you need to do is overwrite that automatically generated CREATE table script on dataset settings and manage it manually.
Go to Dataset → Settings → Advanced.
There is SQL section where you need to change Table creation mode from Automatically generate to Manually define, then Table creation SQL form entry become editable and you can manually update the DDL script for the table. Change data type of required columns into VARIANT and save the changes.
There is an obvious drawback of this solution — every time when there is a change in table structure (new or removed column), you need to go into this script definition and manually update it because DSS does not manage the table for you in this case. I forget to modify it every time but don’t worry, DSS is going to notify you by failing the recipe with error that schema is not in sync with recipe output. I think this small complication is acceptable if VARIANT data type support is crucial for you. Hope it helps. 🙂