Recently a few people have asked me how I see the trends in the BI and data area: what to focus on, what might be useful to know or learn. I have decided to write it down and share the link with everyone who is interested. So, what kind of trends do I see in BI (data) projects nowadays? I will look at it from a general architecture perspective. There are so many technologies that might be used that it is not my intention to cover them all.
I think the main points have stayed more or less the same for some years now. New, mainly cloud-based technologies are improving over time, becoming more capable and cheaper than the "legacy" technologies companies have used for years or decades. It is no longer only startups that follow these new approaches in data projects; companies of any size, from Fortune 500 enterprises to small businesses, have discovered the benefits of cloud-based solutions.
If I had to summarize the main attributes driving the change, I would split them into four areas. I also consider these four points key for IT providers that want to help companies with their new challenges and be a trustworthy partner in the coming years. Without proper knowledge of the following areas, you will sooner or later become an IT partner maintaining legacy solutions with no potential to innovate.
The main points are:
- moving from a fixed cost model to a dynamic, pay-as-you-go approach
- deconstruction of monolithic BI solutions into a micro-service model -> decoupling
- infrastructure as code
- ongoing support for self-service BI and a collaborative way of working in the development process
I am mentioning these points without reference to a concrete technology stack, as the choice can be very individual, based on the current customer environment or plenty of other reasons. One benefit of today's decoupled approach is the wide selection of tools, platforms, and libraries from which customers can build a technology stack according to their needs and budget.
IT service providers should not be fully focused on a single technology stack (Azure/AWS/Google Cloud, etc.) but should be prepared and able to help regardless of the technology.
Moving from a fixed cost model to a dynamic, pay-as-you-go approach
This scenario often comes into play when companies are reaching the end of life of their current setup, which probably means:
- the current servers are too old: they lack performance and you would need to buy new ones
- license contracts for the current tools are ending: do I need to extend them?
- the current product (DWH) has reached its limits and it is not possible (or very expensive) to update it to meet the new needs and requests coming from today's world (semi-structured, unstructured, or streaming data processing, dynamic scalability, etc.)
In such situations companies start thinking about which way to go, and they may realize that the old model (buying servers, licenses, and everything else needed in advance) is no longer the best option. They will tend to replace it with a pay-as-you-go model, where you pay only for the real usage and capacity you need. This model brings many benefits:
- scalability: you are no longer tied to purchased server capacity
- cost savings: you pay only for running time
- freedom: you are not tied to contracts running for years -> easier to change the platform, provider, tool, etc.
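To make the difference concrete, here is a minimal sketch comparing the two cost models. All prices and workload figures below are invented purely for illustration; they are not real vendor numbers:

```python
# Hypothetical cost comparison: fixed upfront capacity vs. pay-as-you-go.
# Every number here is made up for illustration only.

def fixed_cost(server_price: float) -> float:
    """Upfront purchase: you pay for peak capacity whether you use it or not."""
    return server_price  # one-off capital expense covering the whole period

def pay_as_you_go(hourly_rate: float, hours_per_month: float, years: int) -> float:
    """Usage-based: you pay only for the hours the platform actually runs."""
    return hourly_rate * hours_per_month * 12 * years

# Example workload: a nightly ETL batch running ~3 hours a day (~90 hours/month).
fixed = fixed_cost(server_price=50_000)          # sized for peak, idle most of the time
dynamic = pay_as_you_go(hourly_rate=4.0, hours_per_month=90, years=3)

print(f"fixed: ${fixed:,.0f}, pay-as-you-go: ${dynamic:,.0f}")
```

The point is not the specific numbers but the shape of the model: with low or bursty utilization, paying per hour of actual runtime can undercut a capacity sized for peak load.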
Decoupling a monolithic BI solution into a micro-service model
The micro-service model is very popular in the world of "standard" apps (web, mobile, internal enterprise apps, etc.). I can see the same model coming into the BI area, where this architectural approach brings more freedom in the choice of tools, libraries, or technologies in general: it is easier to replace one small block than to completely change the architecture of a monolithic solution. This approach brings benefits like:
- faster development: in a component-based solution it is easier to change or work on just one block without affecting the rest of the solution
- cost savings: no need to buy one "big and expensive tool" to rule them all
- a wider range of available tools, where a specialized tool/library can be used for each particular operation/task -> the right tool for each task
- security: better control of communication between components
Thanks to the micro-service model we can also move from big releases once in a while to continuous deployment, where we deliver small changes very often. Of course, you need more things in place to be able to do this, such as automated testing or infrastructure as code.
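The decoupling idea can be sketched in a few lines: each pipeline stage is an independent component behind a common interface, so a single block can be swapped without touching the rest. The stage names and data below are purely illustrative, not a reference to any particular tool:

```python
# Minimal sketch of a decoupled (micro-service style) data pipeline:
# each stage is an interchangeable component with the same interface.
from typing import Callable, Iterable

Stage = Callable[[list[dict]], list[dict]]

def extract_csv(rows: list[dict]) -> list[dict]:
    # Could be swapped for extract_api, extract_stream, ... without
    # touching the stages downstream.
    return rows

def clean(rows: list[dict]) -> list[dict]:
    # Drop records with a missing amount.
    return [r for r in rows if r.get("amount") is not None]

def enrich(rows: list[dict]) -> list[dict]:
    # Add a converted amount (conversion rate is a made-up example).
    return [{**r, "amount_eur": r["amount"] * 0.9} for r in rows]

def run_pipeline(rows: list[dict], stages: list[Stage]) -> list[dict]:
    for stage in stages:
        rows = stage(rows)
    return rows

data = [{"amount": 100}, {"amount": None}]
result = run_pipeline(data, [extract_csv, clean, enrich])
```

Replacing `extract_csv` with a streaming source, or `enrich` with a call to a specialized library, changes one block and leaves the rest of the pipeline untouched; that is the whole argument for decoupling.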
Infrastructure as code
IaC strongly supports the micro-service architecture model and allows the development team to do continuous deployments faster. When you manage your infrastructure as code, you have one place to maintain it, you can see the infrastructure changes over time (git), anyone with the proper knowledge can modify it, and it is visible to everyone.
It also improves security, because you mitigate the risk of human error in configurations. All of these aspects improve efficiency and reliability. When organizations start a new project, they should think about integrating IaC into the product lifecycle, no matter what tool is used for it (Terraform, CloudFormation, etc.).
IaC has many benefits; let me emphasize the main ones for me:
- cost savings
- a versioned history of infrastructure changes (git)
- fewer human errors in configurations, which also improves security
- more efficient and reliable deployments
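As a hypothetical illustration of what "infrastructure in git" looks like in practice, a minimal Terraform definition might declare a single storage bucket for raw data. The provider, region, and resource names here are examples only, not a recommendation:

```hcl
# Hypothetical Terraform sketch: the infrastructure definition lives in a
# versioned text file, so every change is reviewable and traceable in git.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "eu-central-1" # example region
}

# Example resource: a bucket for raw data files (name is illustrative).
resource "aws_s3_bucket" "raw_data" {
  bucket = "example-raw-data-bucket"
}
```

Because the desired state is declared in text rather than clicked together in a console, the same definition can be reviewed, reapplied, and rolled back like any other code.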
Ongoing support for self-service BI and a collaborative approach in the development process
Self-service BI is something that has been "trendy" or "cool" for some time, and it still is. Leveraging self-service BI helps organizations gain insights from data faster and improves the decisions made on top of it. Business users do not have to wait for IT or data scientists to build reports for them; they can do it themselves. Software vendors have been supporting this with new collaborative tools that are supposed to make it even easier.
Tools like Dataiku Data Science Studio bring a unified platform where all roles involved in the project development process use the same tool, which makes communication and knowledge sharing between IT and business significantly easier. All roles share the same tools, use the same formal language, and can collaborate on the same models together. All of this naturally speeds up development.
I think the importance of such collaborative features and tools will keep increasing, as they contribute to the "democratization" of data and data access itself.