Dataiku is one of the leaders in the AI/ML segment, and its tool, Data Science Studio, is a real gem. But can you use it as your one and only ELT/ETL tool for your cloud data warehouse, no matter whether you build it on top of Snowflake, Cloudera, or any other cloud-based technology? Sure thing! We have done it, and not only once. Read about my experience building extensive data pipelines with Dataiku, based on the various use cases we have worked on over the last almost 3 years.

Dataiku was founded in 2013 and…


Maybe you are preparing for your Snowflake certification and trying to learn what you can clone or share. Or maybe during your daily work you are wondering, "Can I clone this and that?" What about my new stream or task? Can I share it with my reader account? Of course, all of that can be found in the documentation, but it is scattered over multiple pages. During my certification preparation I was looking for some kind of overview that would combine all those features in one place and present them in an easy-to-read, easy-to-remember format…


Have you been thinking about getting Snowflake certified but do not know where to start or what resources to use for learning? I recently passed both available certifications: the Snowflake Core Designer Certification and the SnowPro Advanced Architect certification. I would like to share my learning journey and provide an overview of what resources are available, which are a must, which are nice to check, and of course which ones you can skip and still be successful. There are materials provided directly by Snowflake, but there are also third-party courses and trainings. If you want, it is possible to prepare for the certifications…


If you are using Dataiku Data Science Studio together with Snowflake, you might have faced the question of how to load data into a column that uses the VARIANT data type in Snowflake. DSS does not support the variant type by default, but there is a way to do it. In this quick tip I am going to show you how.

How?

VARIANT is a semi-structured data type that can store values of any other type, including OBJECT or ARRAY. It is usually used for storing JSON documents in Snowflake. In my case, I use variant columns to store encrypted data, the result of the ENCRYPT_RAW function.
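As a rough sketch of one common workaround (the table and column names here are hypothetical, not from the original post): let DSS write the payload into a plain string column first, then cast it to VARIANT in a SQL step, since Snowflake's PARSE_JSON function accepts a string and returns a VARIANT value.

```sql
-- Hypothetical target table with a VARIANT column
CREATE OR REPLACE TABLE target_tbl (
    id      NUMBER,
    payload VARIANT
);

-- staging_tbl.raw_json is an ordinary string column that DSS can write to;
-- PARSE_JSON converts it to VARIANT on the way into the target table
INSERT INTO target_tbl (id, payload)
SELECT id, PARSE_JSON(raw_json)
FROM staging_tbl;
```

The same pattern works with TO_VARIANT for non-JSON values; the key idea is that the variant conversion happens inside Snowflake rather than in DSS itself.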


Learn how to monetize data and build data-intensive applications with the Snowflake Data Cloud.

In September 2019 I shared One success story of the cloud delivery, a brief description of a small data project being developed with cutting-edge cloud BI tools and technologies like Snowflake, AWS, Dataiku, and Apache Superset, a project I am glad to be part of. Today I have an update for you. We have not been sleeping for those 14 months but working intensively, with a focus on bringing more value to customers, making the solution more secure thanks to the latest available Snowflake features, and following best practices for data app architecture…


In the middle of November 2020, the Snowflake Data Cloud Summit conference took place. Naturally, it was virtual this year. There were 40+ sessions divided into several tracks covering migration to Snowflake, data lake modernization, analytics and ML, data apps, industry solution spotlights, a bunch of sessions with Snowflake data heroes about mobilizing your data, and last but not least the keynote of the day and a couple more "headline" sessions. All in all, it was a pretty packed day with lots of interesting sessions.

In this post I would like to provide my summary of, and view on, the recent Snowflake announcements…


Data is one of the most valuable commodities a company has, and such a valuable commodity should be protected in the right manner. Today's world is full of data protection regulations (GDPR, CCPA, HIPAA, etc.) whose main aim is protecting users' data. These regulations specify restrictions and rules to be followed by data platform providers and data processors. Data security should always be the number one priority in every data-related project.

Snowflake offers many security features by default, but we decided to take security to the next level from a data access perspective. We want to protect our users' data…


Recently a few people have asked me how I see the trends in the BI & data area: what to focus on, what might be useful to know or learn. I decided to write it down and share the link with everyone who might be interested. So, what kind of trends do I see in BI (data) projects nowadays? I will look at it from a general architecture perspective. There are so many different technologies that might be used that it is not my intention to cover them.

I think the main points have stayed more or less the same already…


You might come across a situation where you have DSS scenarios that are supposed to run at some interval (daily, weekly, etc.) but have not been running for a while, and you do not know about it because there is no out-of-the-box monitoring for this in DSS. It becomes more and more difficult to manually keep track of scenario setups when you have tens of projects and scenarios. …


I have been using Snowflake DWH as a target DB for more than 7 months, developing data pipelines in Dataiku DSS. During that time I have come across small pitfalls and differences compared to HDFS datasets in terms of functionality. Generally speaking, these are mainly things related to handling particular options of different Snowflake commands. It is good to be aware of them and not be surprised, so let's go through them.

Dataiku has a support page that contains all the details about Snowflake support in DSS. …

Tomáš Sobotík

Lead data engineer @TietoEVRY. Currently obsessed with cloud technologies and solutions related to data & analytics. ☁️ ❄️
