[updated May 2, 2022]
I originally wrote this in 2019, and I'm amazed at how much things have changed in just 3 years, from a technological and societal point of view. The cloud is going through a great generational shift before our eyes.
Put simply, we're now entering a new phase of the cloud where its architecture actually prevents it from solving emerging data processing problems. The shift is from seeing and using data as a historical record (seeing the world in the past tense) to seeing and using data as a dynamic set of events happening continuously (seeing the world in the present tense).
The cloud is built on the model of centralization: collect data from everywhere and centralize it in one big pile so that we can do something useful with it. But moving that data takes time and energy: the time cost usually shows up as latency and the energy cost as bandwidth. Meanwhile, nearly 2.5 quintillion bytes of data are generated every day. By 2025, some estimate there will be 75 billion IoT devices in the world, all generating unbelievable amounts of data, much of it requiring real-time analysis. At this rate, the world will essentially be priced out of the centralized model.
Data needs to be ingested, processed, and analyzed where it is generated. Faster data processing requires it.
Fast data and the centralized cloud
Data has time value: it tends to decay in value over time if not acted on. Data has location value when seen through the lens of ownership, privacy, and data regulation. Data has concurrency value in situations where it's changing fast because a lot of people are acting on a common piece of data (like in auctions, where the price of something changes quickly). And finally, data has actuation value: it's only useful if acted on immediately (like placing a buy or sell order for a stock).
So take the types of data that have a mix of time, location, concurrency, and actuation value, move them to a centralized cloud model, and you quickly realize the folly. The data becomes less valuable by the time it's moved and stored, it loses context by being moved, it can't be changed as quickly because all the moving around gets in the way, and by the time you act on it, it might be too late (in stock trading, your order doesn't fill, costing you profits). Something has to change or evolve, making new assumptions about the real-time world, for this to work. Fast data and the centralized cloud are like a race car meeting a wall at peak speed: lots of projectile debris and a wreck of twisted metal and rubber.
So how then do we solve this problem of fast data?
Geo-distributed fast data processing is the answer, edge computing is the architecture
Centralized cloud is the architecture of Big Data, but Fast Data needs its own architecture - an architecture that embraces distribution and decentralization. When we place data processing closer to the producers and consumers of data, we remove the cost of time and energy to move the data to a centralized location. The added step of analyzing this data in some central cloud is no longer necessary – we have what we need to do this at the edge. Centralized cloud at this point is a legacy system that we use despite its best efforts to stall or hinder us.
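To put rough numbers on the cost of distance, here is a back-of-the-envelope sketch in Python. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum) and ignores routing, queuing, and processing delays, so real round trips will be slower. The function name and the example distances are illustrative assumptions, not measurements.

```python
# Assumption: signals in optical fiber propagate at roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, in milliseconds.

    Only propagation delay is counted; real networks add routing,
    queuing, and processing time on top of this.
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    one_way_seconds = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    # Hypothetical distances: a nearby edge location vs. a far-away central region.
    for label, km in [("nearby edge location", 50), ("distant central region", 8000)]:
        print(f"{label} ({km} km): at least {min_round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions, no amount of server-side optimization gets a request to a region 8,000 km away and back in under about 80 ms, while a location 50 km away has a floor of about half a millisecond.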
An edge architecture is not an extension of what worked in the cloud; it has to be built on the philosophy, physics, and mathematics of geo-distribution and latency. You can't simply take the primitives of data processing on the cloud, like historically eventually consistent object storage (S3 on AWS), block and clustered file storage (EBS, EFS, or even Lustre for that matter), and coordination-heavy consensus protocols (looking at you, Paxos and Raft), put them on the edge, and expect them to work at scale. None of these systems were designed to provide high levels of accuracy (database consistency) at rapid rates of data change across large distances and a large number of locations.
Will the Edge Native Approach for Fast Data problems please stand up?
Cloud native was designed to work with the centralized cloud. We can't simply apply cloud native to our new edge architecture and expect it to work. That is a half measure. A real edge native architecture is one that is coordination-free and provides high levels of accuracy at rapid rates of data change. Edge native allows developers to work with data as events, not mere state changes. This edge native architecture is event-driven, reactive to the real world and, most importantly, geo-distributed. The real edge is a brand new architecture, built on purpose-built components that solve the problems of dynamic network behavior and accurate data synchronization without the use of consensus.
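One well-known family of techniques for keeping replicas accurate without consensus is conflict-free replicated data types (CRDTs). The sketch below is a grow-only counter, a generic textbook example rather than a description of how Macrometa or any other platform works internally. The class name and the location names are hypothetical.

```python
from __future__ import annotations


class GCounter:
    """Grow-only counter CRDT: each replica only ever increments its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Record a local event; no coordination with other replicas is needed."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: GCounter) -> None:
        """Fold in another replica's state using an element-wise maximum.

        The maximum is commutative, associative, and idempotent, so replicas
        converge no matter how often or in what order states are exchanged.
        """
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        """Total across all replicas seen so far."""
        return sum(self.counts.values())


if __name__ == "__main__":
    # Two hypothetical edge locations accept writes independently.
    london, tokyo = GCounter("london"), GCounter("tokyo")
    london.increment(3)
    tokyo.increment(2)

    # They exchange state later, in any order, and still agree.
    london.merge(tokyo)
    tokyo.merge(london)
    print(london.value(), tokyo.value())
```

Each location accepts writes locally and reconciles afterwards, so no round trip to a leader or quorum sits in the write path. The trade-off is that only operations with this kind of merge function can be handled this way; a counter that must never exceed a limit, for example, needs additional machinery.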
The edge is a real opportunity to solve hard problems that the cloud cannot or will not solve. The edge is a hot and exciting space for new ideas, startups, and breakthrough business models. And along with all the new interest will inevitably come the great hordes of legacy tools vendors, rebranding their old school big data, storage, and container products and services as "Now available with exciting new edge capabilities!"
Folks, we've seen that movie before. Every on-prem system vendor suddenly became cloud native with a PowerPoint slide, remember? The bar for edge computing is pretty damn high and it's not going to happen on the old cloud platforms.
Macrometa - the planet-scale, geo-distributed, Fast Data cloud
Macrometa provides the programming primitives for building low-latency request-response and event-driven applications for fast data:
- A Global Data Network - cross-region, multicloud by default, with 175+ PoP locations around the world
- A modern NoSQL streaming database offering Key-Value, Documents, Graphs, Geolocation, and Time Series
- Strong and adaptive distributed consistency (accuracy) to handle various distributed concurrency scenarios
- A compute runtime that lets you write event-driven code as pipelines that run with data locality across a set of user-defined locations
Macrometa is first and foremost a platform for developers: it's meant for people who want to solve the type of fast data problems that only edge architecture can solve. In building Macrometa, we have focused on providing the right abstractions to hide the complexity of distributed programming, accuracy, and correctness. You don't need to know distributed databases or concurrent programming to write real-time, event-driven apps and APIs with Macrometa. You simply consume our APIs and let our platform do the orchestration, distribution, and scheduling of your data and code. Sign up for a forever free dev account to get started.