AWS re:Invent 2019 summary
I had a chance to visit the AWS re:Invent conference in December 2019. It has been almost a month since it ended, and I wanted to keep some distance before writing a review or saying anything about my experiences — simply because I wanted to reflect on things slowly, not in the rush of spending 12 hours a day around the conference campus.
It was my first re:Invent and it was a truly amazing one! I did not have any expectations, but I was pleasantly surprised by the content, which was strongly focused on developers and architects. The number and quality of hands-on workshops was outstanding. There were always two or three sessions at the same time that interested me, and I had to choose just one. The possibility to discuss your own issues with AWS experts, and their willingness to listen and help, kept surprising me all week.
I was also surprised by the openness of AWS customers who shared their stories, including failures. It was nice to hear that other companies have been solving similar problems or going through similar challenges. I was amazed by the stories from Comcast and Nasdaq about their journeys building a data lake. And there were many more.
Last but not least, I have to say how well the whole event was organized. There were more than 65,000 attendees, and I have a feeling that everything worked smoothly. There were enough volunteers to help you find the right room or fill a gap in your schedule. The people controlling the queues were always nice and smiling. Thumbs up for the organization!
And now the trends: where the market is going and what I learned there. I want to stress that this is my personal opinion, not anything that was clearly stated at the conference as a general truth. My main interest is data and the BI world in general, so the trends are related to it. Let's start!
The data warehouse is almost a legacy technology
I mean, everyone is building data lakes. There were so many sessions about building data lakes and migrating them into the cloud — but sessions about building data warehouses? Hardly any. Traditional DWHs struggle more and more to handle new requirements: there are new data formats (semi-structured data, non-relational data, etc.), a legacy DWH is almost impossible to scale, and the cost is high. It was clearly visible how hot a topic data lakes are: all the sessions were packed, and some of them, especially those related to AWS Lake Formation, were almost impossible to attend even though I spent around 80 minutes in the queue.
Speaking of high costs, there is one more thing, mentioned also by AWS CEO Andy Jassy: the popularity of their relational database service, Amazon Aurora. It is supposed to be one of the most successful AWS products nowadays. Customers are moving to Aurora quickly from traditional relational databases like Oracle and Microsoft SQL Server, and again, there are massive cost savings behind it.
Redshift on steroids
So many new releases were announced, and Redshift got a lot of space in the keynotes as well. On top of that, there were tens of different sessions, workshops, and chalk talks related to Redshift. It is clearly visible that it is an important AWS product, with more than 10,000 Redshift customers worldwide.
I have a feeling that with the new features Redshift is trying to catch up with Snowflake, because some of them are exactly the ones we were missing a couple of months ago when it was decided to use Snowflake instead of Redshift for one of the projects I have been working on.
Redshift now separates storage and compute, so you pay only for the storage you actually use, and there is a new node type (RA3). What looks interesting are federated queries, which make it possible to directly query data that lives outside the Redshift cluster (Aurora, S3, etc.). And there is much more: things like concurrency scaling or the Advanced Query Accelerator (AQUA) look more than interesting on paper. I think more than 50 new Redshift features were introduced at re:Invent.
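To get a feel for what a federated query means in practice, here is a toy sketch in Python that mimics the idea with SQLite's ATTACH: a "warehouse" database joins live rows from a separate "operational" database without copying them in first. All file, table, and column names here are made up for illustration; real Redshift federated queries use an external schema pointing at Aurora or RDS, not SQLite.

```python
import sqlite3

# Stand-in for the operational database (Aurora plays this role for Redshift).
op = sqlite3.connect("operational.db")  # hypothetical file name
op.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, customer TEXT, amount REAL)")
op.execute("DELETE FROM orders")
op.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "acme", 120.0), (2, "globex", 75.5)])
op.commit()
op.close()

# Stand-in for the warehouse (Redshift). ATTACH plays the role of an
# external schema: the query below reads the operational data in place,
# without an ETL step copying it into the warehouse first.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE customers (name TEXT, region TEXT)")
wh.executemany("INSERT INTO customers VALUES (?, ?)",
               [("acme", "EU"), ("globex", "US")])
wh.execute("ATTACH DATABASE 'operational.db' AS ops")

rows = wh.execute("""
    SELECT c.region, SUM(o.amount)
    FROM ops.orders AS o
    JOIN customers AS c ON c.name = o.customer
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('EU', 120.0), ('US', 75.5)]
```

The point is the join across two separately owned stores in a single SQL statement — that is the convenience federated queries promise.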
Legacy ETL tools like Informatica might have problems
Informatica is the leader in ETL tools in the "legacy world". I have been working with PowerCenter for more than 9 years. Informatica had just a small booth at the Expo, and when I asked what they could offer, they just said you can install PowerCenter on EC2 instances. OK, that is nice, but is that all? Really?
When I listened to the stories about building data lakes from companies like Comcast, it was mentioned many times that there is nothing like Informatica for the cloud. You have to combine multiple tools, but first you have to find the right combination of them.
If I compare Informatica PowerCenter with, for example, Dataiku DSS, it is a totally different experience. The speed of development in DSS is incomparable with PowerCenter: what you would be doing for a week in PowerCenter, you can handle in a day or two in DSS. And of course, there are things which you simply can't do in Informatica at all.
I have a feeling that Informatica and similar companies which are still heavily focused on the legacy DWH might be in trouble in a couple of years.
Machine learning everywhere

Machine learning is another important area for AWS. It got a lot of space in the keynotes as well, and many new features were introduced for SageMaker, the AWS machine learning platform. I got the feeling that there is always an effort to add some machine learning on top of every use case. Even though ML is not my primary field of interest, I visited one session introducing a new AWS ML service called Amazon Kendra.
Kendra should bring ML into enterprise search and deliver powerful natural language search capabilities. I have seen a demo, it worked pretty well, and I can see many useful use cases where this can help. We can start with intranet search, which really sucks in our company — finding what you need is often a big pain. This was also mentioned in the demo, including numbers on how much money it costs companies when employees are not able to find what they need for their work. It seems we are not the only company with this issue. 😀
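To make the enterprise-search idea concrete, here is a deliberately naive Python sketch that ranks intranet pages by word overlap with a natural-language question. This has nothing to do with Kendra's actual ML models (which understand context and can extract direct answers); it only illustrates the question-in, ranked-documents-out interface such a search exposes. All page titles and texts are made up.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase, split into a set of words, and drop common stop words."""
    stop = {"how", "do", "i", "the", "a", "an", "to", "is", "of", "for"}
    return {w.strip("?.,") for w in text.lower().split()} - stop

# Made-up intranet pages standing in for an indexed document corpus.
pages = {
    "VPN setup guide": "install the vpn client and connect to the office network",
    "Expense policy": "how to submit travel expenses for reimbursement",
    "Onboarding checklist": "accounts, laptop, badge and first week tasks",
}

def search(question: str) -> list[str]:
    """Rank pages by word overlap with the question (a crude relevance score)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(body)), title) for title, body in pages.items()]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]

results = search("How do I submit travel expenses?")
print(results[0])  # "Expense policy"
```

Even this crude score surfaces the right page here — the gap a service like Kendra aims to close is everything this toy cannot do: synonyms, phrasing variations, and pulling the answer itself out of the document.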
What else — Apache Hudi, Delta Lake
Last but not least, I want to mention two technologies which were interesting for me mainly because of their ability to perform change data capture operations. The first one is Apache Hudi, still in the incubating phase, an open-source framework which should simplify incremental data processing. Simply said, you get transactional operations on top of a data lake and can perform an INSERT, UPDATE, or DELETE on an existing dataset based on the presence of the record in it. The second one is Delta Lake, an open-source storage layer which brings similar ACID transactions to data lakes.
Both tools impressed me in the sessions I attended because they are trying to solve problems which we currently have.
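The upsert semantics described above can be sketched in a few lines of Python. This is not Hudi or Delta Lake code — just a toy model of merging a batch of changes into a dataset keyed by record id: records already present get updated (or deleted), absent ones get inserted. The field names and the `op` marker are my own invention for the illustration.

```python
# Toy model of data-lake upserts: each record is keyed by "id", and an
# incoming change either inserts, updates, or deletes based on presence.
def apply_changes(dataset: dict, changes: list) -> dict:
    out = dict(dataset)
    for change in changes:
        key = change["id"]
        if change.get("op") == "delete":
            out.pop(key, None)                          # DELETE: drop if present
        elif key in out:
            out[key] = {**out[key], **change["data"]}   # UPDATE: merge new fields
        else:
            out[key] = change["data"]                   # INSERT: brand-new record
    return out

existing = {1: {"name": "alice", "city": "Prague"},
            2: {"name": "bob", "city": "Brno"}}
batch = [
    {"id": 2, "data": {"city": "Ostrava"}},                 # update
    {"id": 3, "data": {"name": "carol", "city": "Plzen"}},  # insert
    {"id": 1, "op": "delete"},                              # delete
]
result = apply_changes(existing, batch)
print(result)
# {2: {'name': 'bob', 'city': 'Ostrava'}, 3: {'name': 'carol', 'city': 'Plzen'}}
```

The hard part that Hudi and Delta Lake actually solve is doing this atomically and efficiently over millions of immutable files on object storage — the in-memory merge above is only the semantics, not the engineering.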
And that’s it. I could continue with more and more new services and features, but I think this is enough. I have to say once again: it was worth spending 24 hours traveling to Vegas to attend this conference. 💪🏻
I hope I will have another chance to visit re:Invent.