AWS re:Invent 2019 summary

Let’s start!
Keynote time!

The data warehouse is an almost outdated technology

I mean, everyone is building data lakes. There were so many sessions about building data lakes and migrating to the cloud. But sessions about building data warehouses? DWHs have more and more trouble handling new requirements. There are new data formats (semi-structured data, non-relational data, etc.). A legacy DWH is almost impossible to scale, and the cost is high. It was clearly visible how hot a topic data lakes are: all the sessions were packed, and some of them, especially those related to AWS Lake Formation, were almost impossible to attend, even though I spent around 80 minutes in the queue.

Waiting in the queue for the Lake Formation workshop. Arriving 80 minutes in advance was not enough to get in.

Redshift on steroids

So many new releases were announced, and Redshift got a lot of space in the keynote as well. Last but not least, there were tens of different sessions, workshops and chalk talks related to Redshift. It is clearly visible that it is an important AWS product. There are 10,000+ Redshift customers worldwide.
I have a feeling that with the new features Redshift is trying to catch up with Snowflake, because some of them are exactly the ones we were missing a couple of months ago, when it was decided to use Snowflake instead of Redshift for one of the projects I have been working on.
Redshift should now have separated storage and compute, where you pay only for the storage you use. There is a new node type (RA3). What looks interesting is federated queries, which should make it possible to directly query data that lives outside the Redshift cluster (Aurora, S3, etc.). There is much more. Things like concurrency scaling or the Advanced Query Accelerator (AQUA) look more than interesting on paper. I think more than 50 new features for Redshift were introduced at re:Invent.

Legacy ETL tools like Informatica might have problems

Informatica is the leader in ETL tools in the “legacy world”. I have been working with PowerCenter for more than 9 years. Informatica had just a small booth at the Expo, and when I asked them what they could offer, they just said you can install PowerCenter on EC2 instances. OK, that is nice, but is that all? Really?
When I was listening to stories about building data lakes from companies like Comcast, it was mentioned many times that there is nothing like Informatica for the cloud. You have to combine several tools, but first you have to find the right combination of them.
If I compare Informatica PowerCenter with, for example, Dataiku DSS, it is a totally different experience. The speed of development in DSS is incomparable with PowerCenter. What would take you a week in PowerCenter you can handle in a day or two in DSS. And of course, there are things which you can’t do in Informatica.
I have a feeling that Informatica and similar companies which are still heavily focused on legacy DWHs might have problems in a couple of years.


Machine learning

I recognized machine learning as another important AWS product area. It got a lot of space in the keynote as well. So many new features were introduced for SageMaker, the AWS machine learning tool. I got the feeling that there is always an effort to add some machine learning on top of every use case. Even though ML is not my primary field of interest, I attended one session introducing a new ML-related AWS service called Amazon Kendra.

AWS everywhere 👍🏻

What else: Apache Hudi, Delta Lake

Last but not least, I want to mention two technologies which were interesting to me, mainly because of their ability to perform change data capture operations. The first one is Apache Hudi, which is still in the incubating phase; it is an open source framework which should simplify incremental data processing. Simply said, you get transactional operations on top of a data lake and can perform an INSERT, UPDATE or DELETE on an existing dataset based on the presence of the record in it.
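To make the "based on presence of the record" idea more concrete, here is a minimal sketch in plain Python. It is not Hudi's API and does not touch any storage; it only illustrates the decision logic under the assumption that every record has a unique key. The names `Change` and `apply_changes` are hypothetical, and a change with an empty payload is assumed to mean a delete.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One incoming change record (hypothetical shape); payload=None marks a delete."""
    key: str
    payload: Optional[dict]


def apply_changes(dataset: Dict[str, dict], changes: Iterable[Change]) -> Dict[str, int]:
    """Apply changes to `dataset` in place and count how each one was resolved."""
    counts = {"insert": 0, "update": 0, "delete": 0}
    for change in changes:
        exists = change.key in dataset
        if change.payload is None:
            # Delete only when the key is present; otherwise it is a no-op.
            if exists:
                del dataset[change.key]
                counts["delete"] += 1
        elif exists:
            # Key already in the dataset: overwrite the stored record.
            dataset[change.key] = change.payload
            counts["update"] += 1
        else:
            # Key not seen before: add a new record.
            dataset[change.key] = change.payload
            counts["insert"] += 1
    return counts
```

Calling `apply_changes` with a batch that contains an existing key, a new key and a delete marker resolves them to one update, one insert and one delete. A real framework additionally has to handle files, commits and concurrent writers, which this sketch leaves out on purpose.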



Tomáš Sobotík



Lead data engineer @Tietoevry. Currently obsessed by cloud technologies and solutions in relation to data & analytics. ☁️ ❄️