Many applications today are data-intensive rather than compute-intensive. CPU power is rarely the limiting factor for these applications; the bigger problems are usually:
- amount of data
- complexity of the data
- the speed at which the data changes
Most data-intensive applications need to:
- store data – databases
- remember the result of expensive operations – caching
- allow users to search and filter – search indexes
- process continuous streams of data in real time – stream processing
- process accumulated data together in groups – batch processing
There are many database systems with different characteristics, because different applications have different requirements. The same holds for the other building blocks: there are various approaches to caching, several ways of building search indexes, and so on.
When a single tool cannot do the job, several tools can be combined to provide a service, and the complexity is usually hidden behind the service’s interface or application programming interface (API). Once you build a system that combines tools and offers certain guarantees (e.g. that the cache is invalidated or updated on writes so that clients see consistent results), you are no longer only an application developer, but also a data system designer.
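As a rough illustration of that last point, here is a minimal sketch of a service that hides a database and a cache behind one small API and invalidates the cache on every write. Everything in it is an assumption made for illustration: the `UserStore` class, its `get`/`put` methods, and the `db_reads` counter are hypothetical names, and plain dictionaries stand in for a real database and a real cache.

```python
from typing import Dict, Optional


class UserStore:
    """Hypothetical facade that combines a 'database' and a cache behind one API."""

    def __init__(self) -> None:
        self._db: Dict[str, str] = {}     # stand-in for a real database
        self._cache: Dict[str, str] = {}  # stand-in for a real cache
        self.db_reads = 0                 # how many reads had to go to the database

    def get(self, key: str) -> Optional[str]:
        # Serve from the cache when possible.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Write to the database first, then invalidate the cached entry
        # so that the next read cannot return the old value.
        self._db[key] = value
        self._cache.pop(key, None)


store = UserStore()
store.put("user:1", "Ada")
first = store.get("user:1")    # cache miss, reads the database
second = store.get("user:1")   # cache hit, no database read
store.put("user:1", "Grace")   # the write invalidates the cached entry
third = store.get("user:1")    # cache miss again, sees the new value
```

Callers only see `get` and `put`; the decision to invalidate on write (rather than update the cache in place) is the kind of data system design choice described above, and a real system would also have to consider concurrent writers and partial failures.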
Inspired by “Designing Data-Intensive Applications” – Martin Kleppmann