The co-evolution of realtime and back office data stores — the two in a tango
In 1990, Tim Berners-Lee wrote the first web browser, WorldWideWeb. Over the decade that followed, the Internet became widely available and online commerce started becoming the norm. In 1995, Stanford Federal Credit Union became the first financial institution in the US to offer internet banking to all of its customers. Jeff Bezos quietly launched Amazon.com, and a host of massive internet companies emerged over the next 30 years. This has led to rapid evolution in customer-facing and back office technologies. Naturally, it also pushed database and storage technology forward at a furious pace, as users and interactions grew from the order of millions to billions.
A bit of History
(If you want to jump right into the meat of the article, go straight to "Stages of database co-evolution".)
When I started my career in the early 2000s, building applications for banks, stores and insurance companies was all the buzz. These were well-established brick-and-mortar companies. Technology was just another channel for their core business, and the 'real business' was king. These businesses were mainly tied to a single country and even a single target segment. Bank of America mostly did banking in America, and Target mostly sold clothing merchandise. Google was still a search engine and navigator, and Facebook was still a social network for connecting with friends. Then, somewhere around the close of the first decade of the 21st century, multi-portfolio Internet companies with global business models emerged. 'Everything stores' like Amazon appeared, and PayPal spread to 200 countries, offering a wide range of products.
Facebook and Google became giant advertising and marketing companies, opening up the untapped market of personal branding as well. Companies like Airbnb, Zomato and Uber thrived. Businesses became global, high in scale and tech-centric, and rode on 4G/5G to conquer the world. Tech centrality meant that large-scale investments flowed into building rock-solid technological systems. These companies, having no brick-and-mortar existence, asked do-or-die questions of technologists. And technologists responded with a resounding 'yes'.
It was of great interest to me to watch how technology evolved through these tectonic shifts in the way of doing business. Node.js and single-page apps emerged on the front end. APIs and microservices became the buzzwords of the middleware world. And finally, for databases, engineers started thinking beyond traditional RDBMSs such as DB2 and slowly moved towards commodity hardware, which established itself for the data storage needs of high-scale tech systems. Nothing could match those needs except going distributed or using massively parallel processing systems like Hadoop. The various data stores faced this heat and co-evolved to keep up with the rapid growth. I found the co-evolution of these interdependent database systems quite interesting and want to drill down into it here.
Stages of database co-evolution
Disclaimer: the complexities of these systems are heavily simplified to keep the focus on the main point.
Stage 1: The early days
In an enterprise, there are diverse storage needs. In the early days of e-commerce and payment processing systems, the world was a much simpler place. If I may oversimplify, we had just two kinds of data stores. (These were the years when engineers built the bread-and-butter systems for brick-and-mortar enterprises.)
Real-time databases: These databases were mostly monolithic and catered to applications serving the realtime needs of the customer.
- Transactional systems that serve customers in real time: checkout, flight reservations, etc.
- Non-transactional systems that serve customers in real time: catalogue search, social networking feeds, etc.
Back office databases: These databases were mostly monolithic and supported post-sales activities.
- Analytical systems for financial planning/projections, marketing, market studies and so on
- Systems for accounting, settlement, reporting, etc.
There was a clear distinction between customer-facing databases and back office databases. This era was dominated by RDBMS systems for realtime applications. The databases on the application side further split into read-optimized and write-optimized databases to serve the sub-second needs of customers with transactional integrity. On the back office side, only purely analytical applications were categorised as 'back office'. Many back office transactional data needs sat in a no man's land, where a few use cases were served from the application side and a few others from the analytical side.
Stage 2: Realtime applications become distributed
The customer base grew: e-commerce's year-over-year growth rate in the US charted a consistent 13%+ through the 2010s, and social media websites comfortably crossed a billion users. Amazon had arrived at profitability after a long, arduous journey, and Facebook and Google were seen as more than 'social media' by industry watchers. With increasing transaction volumes and the demand for low response times, realtime applications evolved into distributed systems. These systems started catering to various domains and functions, and Jeff Bezos's famous API mandate to build all services as reusable APIs was implemented in pretty much every tech company. In the frontend world, React.js, single-page applications and iframes re-imagined UI layers in a distributed fashion, moving on from the older HTML/CSS/JSP days. In parallel, middle-tier systems also evolved. The emergence and prevalence of microservices backed by REST/SOAP/native APIs, pub-sub models, queuing systems and the like created highly decoupled, microservice-based systems in the realtime application world. Meanwhile, in the back office world, ETL, BI and warehousing players like Informatica, MicroStrategy, Cognos and Teradata solidified their presence in enterprise data warehouses.
Stage 3: Distributed data processing goes mainstream
By this time, Google's paper on MapReduce was well known to even a vaguely curious engineer. A decade after Doug Cutting named his big data system Hadoop, after his son's toy elephant, Big Data became mainstream. With the superpower of Big Data systems, storage technologies took off. NoSQL databases ruled the roost. The NoSQL movement, whose name spread through a Twitter hashtag, heralded a new era for database systems. Application systems started adopting NoSQL databases like MongoDB, Cassandra, Elasticsearch and so forth. And analytical systems were fast-tracked into Big Data adoption.
Stage 4: An evolving area, the datastore with scale and integrity
But slowly things got complicated as the internet bug spread to realtime transactional businesses like e-commerce and payment systems. Transactional systems started boasting data on the order of petabytes, and the need for transactional integrity at scale began to rise in the back office data store space. And there is hardly any tried and tested, out-of-the-box solution for this in the industry right now.
Main characteristics of transactional back office databases
- High-volume structured data: Massive data consolidated across various application domains. While the domain application systems handled volume by divide and conquer, here the need itself is to consolidate the data and make sense of it.
- High data integrity: Data reliability and quality are paramount. There is no tolerance for missing or inaccurate data.
- Mix and match of data across various domains and products
- Update friendliness: Because of the move to distributed applications described above, data flows in from various upstream systems at various speeds and needs to update an existing view.
- Moderate speed: These are typically offline batch systems that need to process millions of records in a few minutes.
- High resilience and availability: The business use case cannot tolerate data loss.
- Cloud friendliness: As data grows in volume and companies expand into various geographies, the on-premise mode is not very viable because of long procurement and physical installation times.
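Counting the characteristics above in order, RDBMS systems score well on points 2, 3, 4 and 5 (integrity, cross-domain mix and match, update friendliness and moderate speed), largely struggle with 1 and 6 (volume, and resilience and availability) and are still evolving on 7 (cloud friendliness). Many Big Data systems are good on points 1, 3, 4, 5 and 6, but struggle with 2 (data integrity).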
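To make the combination of update friendliness and data integrity a little more concrete, here is a minimal Python sketch, not a production design. It assumes each upstream record carries a version number that increases per order, and that some upstream control total is available for reconciliation. `LedgerEvent`, `apply_events`, `reconcile` and all field names are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEvent:
    """One record from an upstream system (all field names are hypothetical)."""
    order_id: str
    source: str       # upstream domain, e.g. "checkout" or "payments"
    version: int      # assumed to increase with every change to an order
    amount: Decimal


def apply_events(view, events):
    """Upsert events into the consolidated view, keeping the highest version per order.

    Replayed or late events never overwrite newer data, so re-running the
    same batch leaves the view unchanged.
    """
    for event in events:
        current = view.get(event.order_id)
        if current is None or event.version > current.version:
            view[event.order_id] = event
    return view


def reconcile(view, expected_total):
    """Integrity check: the consolidated total must equal the upstream control total."""
    actual = sum((e.amount for e in view.values()), Decimal("0"))
    return actual == expected_total


view = {}
batch = [
    LedgerEvent("o-1", "checkout", 1, Decimal("20.00")),
    LedgerEvent("o-2", "payments", 1, Decimal("5.50")),
    LedgerEvent("o-1", "payments", 2, Decimal("18.00")),  # later correction
    LedgerEvent("o-1", "checkout", 1, Decimal("20.00")),  # late duplicate
]
apply_events(view, batch)
```

The sketch uses `Decimal` rather than floats because monetary totals have to reconcile exactly. A real back office store would need the same two properties, replay-safe updates and a verifiable control total, at a scale far beyond an in-memory dictionary.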
I frankly see a gap in the industry for a database system that fulfils the above needs. I see the ongoing evolution of back office datastores as the next big opportunity for database evolution. I welcome everyone to share their thoughts on this, and if you are working in this space, please share your experience, useful links or papers in the comments section.