
Cloud Native Is Not Edge Native

Today, we have many hyper-scale, distributed data platforms that can handle big data workloads with elasticity and simplicity. This simplicity comes from the fact that these workloads run within a single data center in a single, centralized region. As long as the resources needed to scale are available within that same data center, everything scales as it should. But we’re entering a new age of data decentralization enabled by powerful next-gen devices, 5G networks, advances in IoT/IIoT, AI/ML, and the current geopolitical trends for regulating sensitive data.

This new world requires a new cloud, one that is distributed rather than centralized. The problem is that cloud native approaches that worked well in our centralized architectures won’t work as well in this new paradigm of dispersion and decentralization.

Edge cloud needs an edge native approach that is designed to meet the challenges of the edge, built for distributed architectures, and optimized for decentralization.

The five challenges of edge cloud

The centralized approach doesn’t work at the edge because it never had to face the barriers that distributed architectures face. Unlike single-region architectures, decentralized architectures need to deal with the following five challenges:

  1. Large network latencies in the 100s of milliseconds
  2. Unreliable & jittery networks
  3. Data write costs
  4. Data transmission costs
  5. Complex service orchestration

For any data platform architecture to be truly edge native, it has to effectively address these five primary challenges of edge.

A diagram for a centralized architecture

Cloud native and the network latency and unreliability problem

Current cloud native data platforms are primarily coordination-based systems. This means that they rely heavily on data center class networks (i.e., very low network latency and high reliability) for their throughput, performance, and availability. The primary reason is the need for consensus.

Consensus is the task of getting all nodes in a group to agree on some specific values based on each node’s votes. For example, in election algorithms, the goal of consensus is to pick a leader. With distributed transactions, the goal of consensus is to get an agreement on whether to commit or not. In short, any algorithm that relies on multiple processes or nodes maintaining a common state relies on consensus. 

While consensus algorithms solve many distributed systems problems, these algorithms are very chatty and require synchronous coordination. Coordination creates three primary penalties for distributed systems:

  • Increased latency (due to stalled execution)
  • Decreased throughput
  • Unavailability

For example, if a transaction takes d seconds to execute, the maximum throughput of concurrent transactions operating on the same data items is 1/d transactions per second. If a data platform is within a single availability zone or a single data center, this is not usually a problem. The network delays are slight (i.e., hundreds of microseconds or low single-digit milliseconds), permitting from a few 100s to a few 1000s of concurrent transactions per second.
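To make the 1/d bound concrete, here is a minimal Python sketch. The function name and the sample durations are illustrative assumptions, not measurements from any real platform.

```python
def max_conflicting_throughput(d_seconds: float) -> float:
    """Upper bound (tx/s) when transactions on the same data items run one at a time."""
    if d_seconds <= 0:
        raise ValueError("transaction duration must be positive")
    return 1.0 / d_seconds


# Assumed durations inside a single data center (illustrative only).
ASSUMED_DURATIONS = {
    "500 microseconds": 0.0005,
    "5 milliseconds": 0.005,
}

for label, d in ASSUMED_DURATIONS.items():
    print(f"d = {label}: at most {max_conflicting_throughput(d):.0f} tx/s")
```

With these assumed durations the bound lands between a few 100s and a few 1000s of transactions per second, which is the range described above.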

However, if the data platform is spread across multiple regions/PoPs, the penalties increase dramatically. The read & write delays are lower-bounded by the network latencies, which run into 100s of milliseconds. This leads to a rise in request latencies and a reduction in throughput.

For example, if you have an eight-region cluster, the system must coordinate across those eight regions, bringing down the system’s throughput to 2-3 concurrent transactions per second. This is assuming the network is reliable. If the network is not reliable, then the system degrades to unavailability. Neither outcome is practical.
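A rough back-of-the-envelope model shows how a figure in that range can arise. The sketch below assumes a coordinator that must wait for the slowest of its seven peer regions, two round trips per transaction (for example, a prepare round and a commit round), and made-up round-trip times. The region names, the latencies, and the two-round assumption are all hypothetical.

```python
from typing import Mapping


def coordinated_commit_seconds(peer_rtt_ms: Mapping[str, float], rounds: int = 2) -> float:
    """Time for one transaction if every round waits for the slowest peer region."""
    slowest_rtt_ms = max(peer_rtt_ms.values())
    return rounds * slowest_rtt_ms / 1000.0


# Hypothetical round-trip times (ms) from the coordinator to its seven peers.
ASSUMED_PEER_RTT_MS = {
    "region-2": 30.0,
    "region-3": 70.0,
    "region-4": 90.0,
    "region-5": 120.0,
    "region-6": 150.0,
    "region-7": 180.0,
    "region-8": 200.0,
}

d = coordinated_commit_seconds(ASSUMED_PEER_RTT_MS)
throughput = 1.0 / d  # same 1/d bound for transactions on the same data items
print(f"d = {d:.2f} s, at most {throughput:.1f} tx/s on conflicting data")
```

Under these assumptions, a transaction takes about 0.4 seconds, so conflicting transactions are capped at roughly 2.5 per second. Real systems differ in how many rounds they need and which regions they must wait for, so treat this as an order-of-magnitude illustration only.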

Cloud native and rising data write costs

Everyone talks about the data tsunami that is on the horizon (or already here, depending on who we talk to). Then the focus switches to questions such as: How do we optimize the read path for cost and performance? How do we analyze data effectively? How do we serve the ML models? These are certainly valid issues.

A big omission is that first we need to write the data, and that’s at least 50% of the data tsunami problem. This is an area where current distributed data platforms underperform. 

For example, let’s take a sample eCommerce store. This reference architecture from AWS Blueprints shows a fairly standard model for an eCommerce store. The blue text shows my annotations.

Source: aws-samples/aws-bookstore-demo-app (github.com)

Like most enterprise apps, the store requires:

  1. A DynamoDB/document store for its users, items, and order information.
  2. A search store to provide search capability for users.
  3. A graph store to record its user activity and items for recommendations.
  4. A cache store to speed up data access. 
  5. An API gateway and lambdas to wire all these together.

While this architecture will technically work, it still presents a few problems:

  1. You have to store the same data in multiple formats in multiple stores. In essence, your writes are amplified as they fan out across various stores.
  2. You give up on consistency & transaction semantics across these multiple stores. You will also have to spend a lot of time and money to address the pain from these problems.
  3. Your developers will spend 80% of their time learning & writing extra glue code to integrate these stores, while spending only 20% on business logic that actually moves the needle.

In short, you have expensive writes, loss of developer velocity, and solution complexity.
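The first two problems can be sketched in a few lines of Python. The in-memory `InMemoryStore` class and the `fan_out_write` helper below are hypothetical stand-ins for the document, search, graph, and cache stores; they are not the AWS sample's code.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for one backing store."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.records: List[Dict[str, str]] = []

    def write(self, record: Dict[str, str]) -> None:
        if self.fail:
            raise ConnectionError(f"{self.name} store unavailable")
        self.records.append(record)


def fan_out_write(order: Dict[str, str], stores: List[InMemoryStore]) -> int:
    """Write one logical order to every store; return the number of physical writes."""
    writes = 0
    for store in stores:
        # Each store keeps its own copy, tagged with the shape it would need.
        store.write({"shape": store.name, **order})
        writes += 1
    return writes


order = {"order_id": "o-1", "user": "u-42", "item": "book-7"}
names = ("document", "search", "graph", "cache")

# Problem 1: one logical write becomes one physical write per store.
healthy = [InMemoryStore(n) for n in names]
amplification = fan_out_write(order, healthy)

# Problem 2: with no cross-store transaction, a mid-way failure leaves the
# stores disagreeing about whether the order exists.
flaky = [InMemoryStore(n, fail=(n == "search")) for n in names]
try:
    fan_out_write(order, flaky)
except ConnectionError:
    pass
stores_with_order = [s.name for s in flaky if s.records]

print(amplification, stores_with_order)
```

In this sketch, a single order costs four writes, and a failure in the search store leaves the order visible only in the document store. The glue code needed to detect and repair that kind of divergence is where the developer time in the third problem goes.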

Cloud native and data transmission costs

It’s estimated that IoT devices will generate 90 zettabytes of data by 2025. In the current cloud native system, all this data needs to be transferred to centralized cloud locations for processing. That increases not only the data transmission costs but the data write costs as well. In addition to being costly, the latencies involved with this approach make it difficult to take advantage of data in real time, let alone react in a timely manner with recommendations, predictions, or alerts.

One option is to optimize the data for transfer by compressing it through some intelligent means to reduce the data transfer costs. But that ends up just trading one problem for another with the increase in processing costs. The rest of the problems still remain.

In reality, over 90% of the data is noise, depending on the use case. A better approach is to just process data at the location near the data source and only transfer the signals. There are some solutions that provide this capability at the edge, but the problem is that they are mostly stateless solutions.

Many real-time analytics & Machine Learning solutions require and benefit tremendously from the state (aka context) around the data event that is being processed. But maintaining state at the edge requires a data platform that can run in 10s and 100s of locations around the world. In essence, one needs an edge native data platform to provide state.
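As a minimal sketch of what "process near the source and only transfer the signals" can look like with state, the Python example below keeps a per-device baseline at the edge and forwards only readings that deviate sharply from it. The `EdgeFilter` class, the threshold, the smoothing factor, and the sample readings are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EdgeFilter:
    """Keeps a running baseline per device and flags readings far from it."""

    threshold: float = 5.0   # assumed deviation that counts as a signal
    alpha: float = 0.2       # assumed smoothing factor for the baseline
    baseline: Dict[str, float] = field(default_factory=dict)

    def process(self, device_id: str, value: float) -> bool:
        """Return True if the reading should be forwarded to the central cloud."""
        previous = self.baseline.get(device_id)
        if previous is None:
            # The first reading only seeds the state for this device.
            self.baseline[device_id] = value
            return False
        is_signal = abs(value - previous) > self.threshold
        # Exponential moving average: this is the state (context) kept at the edge.
        self.baseline[device_id] = (1 - self.alpha) * previous + self.alpha * value
        return is_signal


readings: List[Tuple[str, float]] = [
    ("sensor-a", 20.0),
    ("sensor-a", 20.5),
    ("sensor-a", 19.8),
    ("sensor-a", 35.0),  # the only reading far from the baseline
    ("sensor-a", 20.1),
]

edge = EdgeFilter()
forwarded = [(dev, val) for dev, val in readings if edge.process(dev, val)]
print(f"forwarded {len(forwarded)} of {len(readings)} readings: {forwarded}")
```

A stateless filter could not make this decision, because whether 35.0 is noise or signal depends on what the same device reported earlier. Keeping that baseline available in every location is the part that calls for an edge native data platform.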

Cloud native and complex service orchestration

Think of the last time you worked on a distributed system paired with another region for active-active or active-passive replication. This could be a hybrid cloud or a multicloud solution. I don’t need to tell you about the complexities inherent in this, the multidimensional expertise required from the engineers, or the effort involved. Those are all pain points we know too well, not to mention the deployments, upgrades, and monitoring. You have to account for so many factors as part of the development of your service using these cloud models. Now multiply these factors by 10x or 20x to account for the 10s, if not 100s, of PoPs required for the edge.

Over the last two decades, Content Delivery Networks (CDNs) have perfected the art of providing a Single System Image (SSI) that addresses these challenges for website developers. We need similar SSI capabilities for backend developers working with data and streaming platforms that deal with dynamic data.

Macrometa GDN provides a Single System Image (SSI) of the platform to the developer while running in 10s and 100s of PoPs, just as CDNs provide an SSI of their network for a website developer.

Conclusion

To summarize, we don’t use a hammer to chop wood! Hammers are a perfectly useful tool, but if we need to solve the challenges at the edge, we need an edge native approach.

When my friend and co-founder Chetan and I discussed edge native in late 2015, it was a far-fetched idea with its fair share of skeptics. We benefited from those who provided constructive feedback and ignored those who were bound to the status quo. It is nice to see the industry trending towards the edge these last few years. The five challenges of the edge mentioned in this post guide all the fundamental principles for stateful edge computing. 

Building technically complex data platforms like Macrometa isn’t easy, but I believe there will be more data platform providers in the future who will address these principles natively in their architecture. Similarly, consumers will want these capabilities from their data platform providers as they learn the hard way that the cloud of today is ill-equipped for the edge of tomorrow.

Learn more about Macrometa GDN technology, including our Globally Distributed Multi-Model Database and Real-Time Stream Processing. If you want to see Macrometa in action, sign up for a free developer account or talk to an expert.

Photo by Cristian Palmer on Unsplash
