July 5, 2022

Embracing Cloud Native – Big Nerd Ranch

Cloud infrastructure has pushed software toward abstracting the developer away from the underlying hardware: global networks and copious amounts of computing power are available over APIs, and large swaths of the lower tiers of the tech stack are managed by autonomous software. Gone are the days of buying bulky servers to own; instead, we rent pieces of a data center to host our applications. But how does designing for a cloud environment change your application? How do software teams take advantage of all the advancements that come with this new infrastructure? This article covers three pillars of a “Cloud-Native” application and how you can embrace them in your own software.

A common framing for this shift is the Pets vs. Cattle analogy. It contrasts two ways of treating application servers: as pets, machines we love and care for and never want to see die or be replaced, or as cattle, machines that are numbered and, if one goes down, another takes its place. It may sound cold and disconnected, but it embraces failure and accepts it with the same methodology as “turning it off and on again”. This aligns with the cloud mentality of adding virtual machines and disposing of them at will, rather than the old way of babying a limited number of in-house servers because you didn’t have a whole data center available to you.

To utilize this methodology, your app must be easy to restart. One way to reflect this is to make your server stateless, meaning it doesn’t persist state on its own disk: it delegates state to a database or a managed service that handles state in a resilient way. For connections or stateful attachments to dependencies, don’t fight to reconnect when something goes down: just let the application exit and let the initialization logic connect again on restart. Where even that isn’t possible, the orchestration software will kill the application, seeing that it’s unhealthy (which it is), and try to restart it, giving you a faux-exponential-backoff loop.
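The restart-first approach above can be sketched as a small startup helper. This is a minimal illustration, not code from the article: `connect` stands in for whatever dependency-connection call your app makes at initialization.

```python
import sys


def connect_with_fail_fast(connect, logger=print):
    """Attempt a startup connection exactly once.

    On failure, log and exit non-zero instead of hand-rolling reconnect
    loops: the orchestrator (e.g. Kubernetes) restarts the whole process,
    and its restart policy provides backoff for free.
    """
    try:
        return connect()
    except Exception as exc:
        logger(f"dependency unreachable, exiting so the orchestrator restarts us: {exc}")
        sys.exit(1)
```

Because the process simply exits, all reconnection logic lives in one place: the startup path.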

The above treats failure as binary: either the application is working or it isn’t, and the orchestration software handles the unhealthy parts. But there’s another method to complement these failure states, and that’s handling degraded functionality. In this scenario, some of your servers are unhealthy, but not all of them. If you’re already using an orchestration layer, you likely have something to handle this: the software managing your application sees that certain instances are down, reroutes traffic to healthy instances, and returns traffic once the instances are healthy again. But when entire chunks of functionality are down, you can handle that state and plan for it. For example, you can return both data and errors in a GraphQL response:

      "name": "James",
      "favoriteFood": "omelettes",
    "comments": null,
  "errors": [
      "path": [
      "locations": [
          "line": 2,
          "column": 3
      "message": "Could not fetch comments for user"

Here parts of the application were able to return user data, but comments weren’t available, so we return what we have, accepting that failure and working with it rather than returning no data. Just because parts of your application aren’t healthy doesn’t mean the user can’t still get things done with the other parts.
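A sketch of how a resolver might assemble such a partial response. The helper names (`get_user`, `get_comments`) and the exact response shape are illustrative assumptions, not the article’s code:

```python
def fetch_user_with_comments(user_id, get_user, get_comments):
    """Build a GraphQL-style partial response: return the data we could
    fetch, and record what failed under "errors" instead of failing the
    whole request."""
    user = get_user(user_id)
    response = {
        "data": {
            "user": {
                "name": user["name"],
                "favoriteFood": user["favoriteFood"],
                "comments": None,
            }
        }
    }
    try:
        response["data"]["user"]["comments"] = get_comments(user_id)
    except Exception:
        # Degrade gracefully: keep the user data, surface the failure.
        response["errors"] = [
            {
                "path": ["user", "comments"],
                "message": "Could not fetch comments for user",
            }
        ]
    return response
```

The caller still gets a usable user object even while the comments dependency is down.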

If your app keeps state such as sessions in process memory, move it to a database or an external caching solution. All in all, the idea is to look for the logical factors that prevent you from running a second or third instance of your application in parallel. Ask yourself what the downsides or complications of adding one more instance of your app would be, and make a list of those barriers. Once you’ve removed them, you’ll find that running tens or hundreds of instances in parallel becomes possible.
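One common barrier is per-instance session state. Here is a minimal sketch of delegating that state to a shared backend; the `SessionStore` class is an illustrative assumption, with a dict standing in for what would be a shared cache client (e.g. Redis) in production:

```python
import json


class SessionStore:
    """Keep session state in a shared backend so any app instance can
    serve any request, which keeps instances disposable."""

    def __init__(self, backend):
        # `backend` needs dict-style item assignment and .get(); a plain
        # dict works for demonstration, a shared cache in production.
        self.backend = backend

    def save(self, session_id, data):
        self.backend[session_id] = json.dumps(data)

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw is not None else None
```

With state externalized, two instances behind a load balancer can pick up each other’s sessions transparently.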
