Mad world – creating a new stateful world with Kubernetes

March 22, 2024
Eddie Wassef

You should never run stateful systems on Kubernetes. Or should you? Kubernetes has come a long way since its inception as the way to orchestrate modern stateless applications. In this episode, Vinay and Eddie take a trip through its journey toward increasing adoption in stateful workloads.

Like the technology itself, their discussion is wide-ranging. It covers how Kubernetes’s continued development, along with changes in software development and operations thinking, has supported this adoption; how application, environmental, and organizational considerations should guide your decision to implement it; and more.

If you’re looking to get the ins and outs of Kubernetes and stateful workloads today, and where they’re going, this is your episode.

Key insights

DBaaS is not the turnkey, “low ops” choice it promised to be

DBaaS has taken a lot of the heavy lifting off the DBA’s shoulders and is considered the low-ops choice for stateful workloads, but Vinay and Eddie question this assumption. They argue that, although it has made aspects of the operational milieu more efficient and reliable, it is not the turnkey solution it promises to be; it still requires a lot of expert administration.

Kubernetes is the great environment equalizer for modern workloads

Thanks to its ability to act as a standardized base layer across environments, its sophisticated tools such as operators, its various interfaces and services, and its robust community, Kubernetes is fast becoming the way to implement modern applications and their databases. Its increasing adoption in on-prem and cloud environments, and across varied workloads, bears this out.

Kubernetes illustrates that software is more like gardening than engineering

In engineering, you’re expected to get the same result as the next person if you follow the blueprint. In gardening, the same inputs can produce different results, and software is the same. Kubernetes isn’t for everything, always, databases included. Instead, its selection should be determined by your needs and context, whether that context is application stage, team capabilities, or something else.

Episode highlights

Eddie’s elevator pitch for placing databases on Kubernetes [00:04:24]

Vinay asks Eddie to give his elevator pitch on why you should run your databases on Kubernetes today. Beginning his response with, “The answer is, why wouldn’t you run your database on Kubernetes,” Eddie explains that, just as VMs abstracted away bare metal, Kubernetes provides a standard that allows you to forget everything below your applications. So whether you’re talking about networking, monitoring, resiliency, storage, etc., Kubernetes can provide that, and more, out of the box for your stateful systems.

K8s has developed to satisfy stateful workload needs [00:08:47]

After establishing that Kubernetes is historically associated with stateless systems but is now increasingly being adopted for stateful ones as well, Vinay and Eddie explore what has actually changed to make this shift possible. A lot of work, such as the creation of StatefulSets, the adoption of storage classes, and the integration of CNIs and CSIs, has ultimately made Kubernetes environments more suitable for database workloads by increasing their reliability.

K8s is your great equalizer across environments [00:22:23]

Eddie discusses how Kubernetes enables organizations to efficiently move workloads across environments by acting as a common denominator, highlighting operators as an example through their codification of best practices and detailing mechanics such as how reconcile loops work. He then speculates on how they can be extended via AI in the future.

Kubernetes is a sharp knife, not a silver bullet [00:29:17]

When should an organization not use Kubernetes for its stateful systems? Eddie suggests that, at minimum, you need the right skill sets in-house or a managed offering such as Kubernetes as a Service (KaaS), and you shouldn’t have special hardware requirements. To underline the point, Vinay and Eddie review a few cases where simple changes can have catastrophic consequences for your databases.

Scale is still a limiter, but not completely [00:31:38]

Kubernetes tops out at 5,000 nodes per cluster and 110 pods per node. However, Eddie suggests that the days of gigantic clusters are nearing their end and that micro-clusters will take their place. He discusses not only how this continues the “micro” trend, but also how K8s enables it and why it might be better, e.g. reduced bottlenecks, smaller blast radius, etc.

Want Day 2? Go with operators [00:40:58]

Although they aren’t the only way to do so, Eddie believes that operators are the way to go when running databases on Kubernetes. Thanks to their unique mechanics, they handle not only Day 1 but Day 2 ops as well, and they can be combined in various patterns to handle complex operational workflows.

Here’s the full transcript: 

Vinay: Hello, and welcome to another episode of Sovereign DBaaS Decoded, brought to you by Severalnines. I’m Vinay Joosery, co-founder and CEO of Severalnines. Our guest today is Eddie Wassef, Vice President and Chief Architect at Vonage.

Thank you for joining us today, Eddie. 

Eddie: Thank you very much. Pleasure to be here. 

Vinay: So, Eddie, can you tell us a little bit about you and what you do? 

Eddie: Absolutely. So, I’m currently the chief architect and VP at Vonage, an Ericsson company, and my role is really evolving the architecture, looking at cloud native technologies, and looking at ways to adopt new patterns and practices and engineering disciplines to evolve our architecture.

We’re in the business of being on the cutting edge. We’re in the business of communication, and you do not want technology to be a roadblock for that. So, me and the team, we kind of investigate, and we make the hard decisions of how much technical debt we’re gonna take on with any new technology, and we present it to the company. 

Vinay: So I guess, you know, telecom, what, 20 years ago, a bit more than that, I was at Ericsson. And I guess, availability is still a thing for telecoms? 

Eddie: Absolutely. Availability. I mean, you think that, you know, drop calls are just drop calls, but sometimes you have emergency calls or you have these really critical situations or situations like what we’re in. We’re recording something live. It could be a broadcast for a podcast or it could be a live broadcast. 

All of these things require, you know, severalnines of availability and reliability there. So, it’s all about being able to have redundancy, being able to have the fallback, but still not be a dinosaur, still be at the cutting edge and take advantage of all the innovations that are coming out.

Vinay: And I guess, you know, being in telecom, it’s been at the forefront of, you know, tech in a way. It’s always been, you know, a highly technical industry, maybe not as, let’s say, conservative as other industries.

So Kubernetes, what’s the adoption of Kubernetes like in, you know, within telecoms? 

Eddie: Well, I mean, Kubernetes is an amazing toolset. It’s an amazing orchestrator that solves a lot of problems. And, you know, telecom is a very broad, you know, industry, and there are areas that are very much risk averse and still require the bare metal capabilities. And they’re still, you know, even building their own OS, to manage the metal.

But in other cases, for all the software on top of that, I’ve seen an amazing adoption of Kubernetes. You see things on public clouds, working with the cloud vendors, working with smaller cloud providers, and even working with private clouds, or what you’d call ‘edge clusters’, as well.

So, it really solves a problem because it brings you that common baseline and allows you to have, you know, a common language, if you would. A universal language of how to deploy your application, how to use applications, and how to service your customers. And so there’s been a huge uptick in it.

It’s not an easy, you know, tool to master. So there’s been a lot of variety, but I think, with the advancements, a lot of things are moving in the right direction. 

Vinay: So, you know, this talk is about, you know, Kubernetes and databases, and I know you are a Kubernetes advocate, so give us the elevator pitch. Why would I run my databases on Kubernetes today?

Eddie: Well, you know, that’s the one that I’ve been asked about quite a few times.   And the answer is “why wouldn’t you run your database on Kubernetes?”. Kubernetes gives you the ability to forget about everything that’s below your application. 

Just like we stopped worrying about bare metal when virtual machines became the norm and we had that standard, you don’t even have to worry about storage. You don’t have to worry about networking, and you don’t have to worry about monitoring and resiliency. Kubernetes gives that to you, and it’s been cutting its teeth on stateless applications for years. 

And data has kinda lagged a little bit behind, mostly because we had a monolithic application and a monolithic data store, and things were very much tuned for each one. But as the monolith on the application side has started breaking down, we need to start breaking down our data into micro databases and take advantage of all of the elasticity, the observability, and the declarative nature of deploying these things that Kubernetes gives us, and put our data on there.

There’s no reason why your database can’t elastically scale both, you know, vertically and horizontally. Kubernetes gives you that, and we should take advantage of it. 

Vinay: So let’s look at the landscape in terms of environment and workload. We know that Kubernetes was born out of the cloud operating model, which historically has been tied to public clouds. Now, is that still the case? Or do we see more on-prem infrastructure adopting Kubernetes?

Eddie: You know, you see a lot of on-prem, as they call them, edge clusters that are coming out because of data sovereignty, because of the geopolitical landscape, and just because, honestly, I think Kubernetes has commoditized a lot of cloud native technologies to be able to come into your data center.

You know, before if you wanted any kind of PaaS systems, you had to pick the right, you know, cloud provider that had what you needed and you kinda stuck with them. But now the business landscape, I mean the world’s gotten to be a smaller place, I mean, we’re speaking right now, you know, across half of the world, across the Atlantic, and, that’s how our customers are. So, maybe in the country that we’re trying to service customers, that cloud does not exist.

So now I have to be in multiple clouds. In other cases, there are no clouds. I mean, there’s still not a data center for AWS, Azure, Google, Alibaba, and so on in every country in the world. What if I have customers there? And so you start seeing a lot of these, you know, maybe bespoke vendors that create the private clouds or you have other software providers that give you Kubernetes-as-a-service or KaaS, that you can install your systems on top of.

And that’s a great thing because now you’ve got a common layer that I can run the same service, the same quality of that service on any cloud or in my data center or in my home lab. 

Vinay: I guess it’s like this common denominator in multiple environments. Now moving on to workloads. And I think you kind of alluded to that earlier.

Looking at how workloads and Kubernetes have evolved over time, it was originally designed to run stateless. And I’ll be honest here. I have been very, very skeptical about running databases on Kubernetes. I remember the early days that put me off.

You’d have a new version every month, and it was hard to build something on something which was constantly changing. So, but today, as you advocate, right, it is increasingly sort of used to run databases and other stateful workloads.  

We do see, you know, organizations even moving, you know, existing databases on Kubernetes. In your opinion, what was the initial issue?  

And I think you mentioned there that one of it was mainly the database as a monolith, but there must be other things. And how has that been resolved? And has something, you know, fundamentally changed? 

Eddie: Well, Kubernetes jumped into the orchestrator world, I would say, with a lot of incumbents.

And so, to get adoption, they really had to meet the service fabrics and the cloud foundries and all these other, you know, the TIBCOs and the big service bus models. And there was a lot of churn, because, you know, it was born out of an existing project. To become a mainstream project, there was a lot of movement that happened. Docker was still relatively young. You know, the APIs weren’t necessarily stabilized.

But as it matured, it has really done a good job. I wouldn’t say it’s slowing down because there’s still a release every quarter, which is amazing. And that’s one of the things I actually like about Kubernetes. 

But they’ve managed their backwards compatibility and their API structure really well. They’ve also created, I think, the key extension points, allowing developers to create custom resources and operators to extend Kubernetes and effectively codify their platform engineers or their Day 2 engineers into the cluster itself.

So I think, as a product, it’s matured; absolutely, you know, everything runs on it today. So it’s very much prime time. But I would say the biggest changes that have come to Kubernetes that helped the data side were things like StatefulSets, things like the adoption of storage classes, and the integration of CNIs and CSIs onto the containerization platform, using containerd especially.

And so that allowed some of the things that are specific for stateful workloads, to be applied. Things like having a consistent network name.

Being able to consistently reach a particular pod or a particular storage that’s attached, that you really didn’t care about in the stateful set. Sorry, in a stateless application. 

Before, if you had, let’s say, a MySQL cluster and that service went away and then showed up somewhere else on another machine, things wouldn’t work. But with StatefulSets, you can guarantee that at least it’ll look the same on the network. It’ll be attached to the same storage that it had before. And so it’s effectively like you just restarted the process.
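To make that concrete, here is a minimal sketch of a StatefulSet expressed with the upstream Kubernetes Go API types. The names used (“mysql”, “data”, “fast-ssd”) are hypothetical placeholders, not anything from the episode; the point is the stable pod identity and the per-replica volume claim Eddie describes.

```go
// Illustrative only: a tiny StatefulSet built with the upstream Kubernetes Go types.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func mysqlStatefulSet() *appsv1.StatefulSet {
	replicas := int32(3)
	storageClass := "fast-ssd" // hypothetical StorageClass name

	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "mysql"},
		Spec: appsv1.StatefulSetSpec{
			// ServiceName gives every pod a stable DNS identity:
			// mysql-0.mysql, mysql-1.mysql, mysql-2.mysql.
			ServiceName: "mysql",
			Replicas:    &replicas,
			Selector:    &metav1.LabelSelector{MatchLabels: map[string]string{"app": "mysql"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "mysql"}},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "mysql", Image: "mysql:8.0"}},
				},
			},
			// Each replica gets its own PersistentVolumeClaim, and that claim is
			// re-attached to the same pod identity after a reschedule.
			// (A real claim would also request a storage size.)
			VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
				ObjectMeta: metav1.ObjectMeta{Name: "data"},
				Spec: corev1.PersistentVolumeClaimSpec{
					AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
					StorageClassName: &storageClass,
				},
			}},
		},
	}
}

func main() {
	sts := mysqlStatefulSet()
	fmt.Printf("pods will be named %s-0 .. %s-%d\n", sts.Name, sts.Name, *sts.Spec.Replicas-1)
}
```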

And so that helped a lot, and vendors took advantage of that. We saw a lot of NewSQL vendors come out, which are kind of the hybrid between relational and, you know, document store, column store, vector store, and so on and so forth, and take advantage of some of the best of the stateless world, like Raft protocols and consensus and so on, but still give you the atomicity and the ACID nature of, like, relational or something else.

So it’s really been evolving, and it’s building on top of each other: this house of cards from Kubernetes to the vendors, and then they built something that took advantage of Kubernetes. They saw an advantage and so on, and it’s really elevated the bar across the board.

Vinay: And we can see, I mean, you know, there are huge organizations, but even pretty much all the database vendors have released operators for their databases, and that’s, you know, a blueprint way of, you know, operating, of getting there.

So, I saw your talk at KubeCon. I’ve been watching a bunch of your talks there. Very, very, you know, sort of educational for me. I really love the way that, you know, you describe some of the Kubernetes components involved in persistent storage in terms of Marvel characters. So, you know, for the benefit of our audience, can you give us an abbreviated version of that, and of how you demonstrate, you know, this sort of feasibility of running databases on Kubernetes in those talks?

Eddie: First and foremost, thank you so much for your kind words. I appreciate you going back and watching it. I encourage all of you all to take a look at it. It was a really fun experience for me and hopefully very educational for you. The gist of it is, you know, when I look at Kubernetes, it’s a big wide world of structures and concepts and things that are very difficult.

I try to relate it to something that is easy for me to digest, and therefore, this is kinda where this whole Marvel Universe came from. And so I look at Kubernetes like the Avengers.  And they are there to, you know, save mankind from whatever it is, whether it’s Thanos or so on and so forth.

They can also destroy your city, which Kubernetes has done in a lot of cases. But, you know, if we look at the Marvel characters in the Avengers, you have your well known characters. You have your deployments, which are, like, you know, maybe Captain America or Iron Man. You have your replica sets.

These are things that have been known, that got all of the, I would say, all of the clout in all of the publications. They do a great job at scheduling workloads, stateful and stateless capabilities, but they treat everybody as equal. There’s no personal connection that’s created.

And so when I look at how data is on Kubernetes using concepts like persistent volumes and storage classes and, stateful sets, I see in my mind the “Guardians of the Galaxy”, which are kinda like the quirky Avenger types. 

And, in this presentation, I kind of relate Star-Lord to a StatefulSet, Groot to a persistent volume, and Rocket Raccoon to your storage class. And, you know, abbreviated version: Star-Lord is an Avenger, but he does create a personal connection with everyone he’s trying to save. He gets to know their name, which is similar to how StatefulSets provide an actual name, not just some hash, to the pod. In that way, even if it moves around, it’s still referred to by that name, which is something that all databases or data applications need. They need to have that network identity.

And they also keep them in the same place the longest. They don’t want them to get up and move. And because Star-Lord was an orphan and he was stolen by the Ravagers, I know we might be getting a little too deep into the Marvel lore here, but, you know, he understands what it’s like to have a place to call yours, even if it’s a small place. And so they do that with pods.

They let the pods stay there as long as possible instead of moving them around whenever there’s a, you know, scheduling event.

Persistent volumes are the elastic storage capabilities inside of Kubernetes, and they can effectively grow or shrink. You can take snapshots. 

And it’s just like Groot: you can have a baby Groot or a gigantic Groot that kinda grows as you need it, and if Groot is destroyed, you can take a little piece of him and grow him again, and he comes back. You can do that with snapshots and so on inside of persistent volumes and persistent volume claims.

But you can’t understand Groot. You have to have that special language, and Rocket Raccoon is that translator, the universal translator. And that’s kinda where storage classes come in. They give you a very simple, you know, what do you wanna do to the storage? I wanna attach this storage, I talk to the storage class, and so on.

So that was, you know, the very abbreviated version of the talk. Please take a look at it. It was a lot of fun to do, and hopefully, I kinda got the idea across there. 

Vinay: I would recommend, you know, watching that talk. That was last November. I thought it was a very down-to-earth, you know, kind of, description. And, you know, stuff you can relate to because otherwise, the infrastructure can get so dry and boring, you know, and, especially when you’re trying to learn some new big thing, big framework.

But I think, maybe these comparisons kind of put a lot of life into this, and it does get interesting. 

Eddie: I think software in and of itself is a model of how we’re trying to model how our life works in the real world, but in computers. And everything, regardless of what you name it, has some parallels in the real world. But if you can make that connection, it’ll hit home for you. 

Vinay: So let’s see, let’s see if Kubernetes is right for you.

One of the most, you know, popular ways of running your database today is a public DBaaS.   You know, Amazon RDS, Cloud SQL. I guess the reason is, you know, organizations, they hate databases.  

The DBA teams are kind of special in a way. You know, traditionally, they didn’t collaborate that much. Whenever you were building applications, you had to go to the database guy or girl and, you know, get a database and try to find a way to, you know, model your data. 

But with DBaaS, I mean, you still have to model your data, but in a way, you can just click and then you’ve outsourced the problem.

You click and you get a database, and all the life cycle kind of thing is taken care of.  So you give it to somebody else. And I guess, you know, that’s been a big advantage, you know, that convenience that, you know, companies have gone to DBaaS.

So DBaaS, you know, I guess it’s supposed to be the low ops choice. Taking care of deployment, maintenance, tasks like backups, patching, scaling, or is it? 

Eddie: Well, I mean, it’s closer than in the old days, like you said, where you had, you know, the folks that were taking care of your crown jewels. I mean, your data is your gold, and you need to make sure that it’s there, it’s secured, it’s available.

And that’s why we had the gatekeepers, or the DBAs, back in the day. DBaaS promised that, but I think they stopped short of really making it turnkey. So, sure, you can provision hardware and you can provision the service out in the cloud. But 9 out of 10 of them are not going to set up your security groups. They’re not going to do any kind of backup for you. They give you the tools to do it.

But they’re not going to take that from soup to nuts. So as an organization, you still need to have someone who has some knowledge of what your data is gonna do and how you’re going to store it and maintain it. 

And the number of times that I’ve seen where somebody forgot to turn on backup, snapshot backups, and then disaster happens because you never really know or have an issue with it until something bad happens. And then you go back to the vendor and they’re like, well, you didn’t, you didn’t click that checkbox when you were supposed to.

And then that becomes, you know, a bad taste in the mouth, and they have to pay for some kind of recovery. I think it’s because data is not easy. It’s a very hyper-customized kind of discipline. But I am pretty disappointed. I think the DBaaS world has not been the promise that we’ve all been promised.

I’m personally not a big fan. I’m much more of a “you put it on Kubernetes”, but I understand the value that customers see from DBaaS. But I would argue it is not the turnkey solution that was promised. 

Vinay: What is the role of DBaaS in a Kubernetes world? I mean, would, like, you know, Kubernetes operators be a viable alternative, right, to run your database anywhere? The thing is, you know, one of the big advantages with Kubernetes is the portability. Now we know that the cloud is an operating system.

It’s not a destination. So, why would we continue to lock ourselves into an RDS or Aurora, which only exists in, you know, in Amazon, and not just use, you know, standard, you know, sort of binaries? Because that’s, you know, that’s kind of the question.

So, I mean, where do you see DBaaS going in a world where, you know, more and more we are maybe moving infrastructure to Kubernetes? 

Eddie: Well, so I think that the term DBaaS is very broad. Database as a service. And that is absolutely where I think the industry is moving to.

But it’s not what we know today. Today, you’ve got the big vendors. Like, you may have mentioned Aurora, Cloud SQL, and Azure SQL. These are all some flavor that they’ve customized to run on their infrastructure well.

And that works great if you hire someone who gets trained on those, knows how to do it, and gets you implemented there, and you don’t have to leave that cloud.

For example, I think that’s still a viable option, but if you ever wanna get to a point where you could accelerate and potentially automate your DBAs, you’re gonna have to look at something else. You’re gonna have to look at operators or tools that provide DBaaS on Kubernetes, because Kubernetes runs anywhere. And, you know, you’d mentioned that it was a common denominator.

I like to think of Kubernetes as the great equalizer. And it equalizes both your data center, your lab setup, and the different clouds. And you can move things, you know, across any of those almost seamlessly. If you have a network connection, you can do it.

And then the tools are there for you to do so. And so I see more and more vendors creating operators. An operator in Kubernetes is simply a way to codify best practices for that piece of software, and to simplify installing it onto Kubernetes. So the key thing is that an operator in Kubernetes gets something called a “reconcile loop”, which, every so often and at whatever interval, reconciles the state of what you asked it to do.

And that could be, “give me a database that’s across 3 continents that is this big, and it has this level of, you know, reliability, let’s say”. And it can go in every so often and say, okay. Let me check the health of these machines. Let me check this storage, you know, capacity. Let me check the networking.
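As an illustration of that loop, here is a bare-bones sketch in Go with the Kubernetes machinery stripped away. The DesiredSpec and ObservedState types and the observe/fix callbacks are hypothetical stand-ins; real operators are usually built on frameworks such as controller-runtime, but the shape of the cycle (observe, compare, correct, repeat) is the same.

```go
// A stripped-down reconcile loop: wake up on an interval, compare actual state
// to desired state, and apply the smallest fix that closes the gap.
package main

import (
	"context"
	"log"
	"time"
)

type DesiredSpec struct {
	Replicas  int
	StorageGi int
	Regions   []string
}

type ObservedState struct {
	HealthyReplicas int
	StorageGi       int
}

func reconcile(ctx context.Context, want DesiredSpec, observe func() ObservedState, fix func(action string)) {
	ticker := time.NewTicker(30 * time.Second) // "every so often, at whatever interval"
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			got := observe()
			switch {
			case got.HealthyReplicas < want.Replicas:
				fix("add or repair database replicas")
			case got.StorageGi < want.StorageGi:
				fix("expand persistent volumes")
			default:
				log.Println("actual state matches desired state; nothing to do")
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	want := DesiredSpec{Replicas: 3, StorageGi: 100, Regions: []string{"eu", "us"}}
	observe := func() ObservedState { return ObservedState{HealthyReplicas: 2, StorageGi: 100} }
	fix := func(action string) { log.Println("would:", action) }

	reconcile(ctx, want, observe, fix)
}
```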

And depending on whoever wrote that operator, it could be yourself, it could be a vendor, they will have much, you know, much more power to fix things for you. And, you know, I’m speaking at KubeCon Paris here next week, but there’s a lot of AI that can start coming into this, where now you can do the tuning that a DBA used to do by looking at screens.

I was a DBA. I used to look at screens and traces for hours and try to figure out where the bottlenecks were. I see a future where that now becomes something packaged in your operator, and they can give you very specific capabilities for your workloads.

And say, look, here’s what we recommend, and even apply it. And so I wouldn’t say it eliminates the role of a DBA. I think it elevates and changes it to a higher level of capabilities, which include training the AI or authorizing the operator.

So you’re becoming a DBA that manages a team of virtual DBAs that live inside your cluster. So that’s kinda where I see DBaaS in the future. 

Vinay: So DBaaS as a model, as opposed to DBaaS as a product, the way we usually think of it: a public DBaaS, RDS, or Aurora.

So how do you determine whether to run your database on Kubernetes or VMs or bare metal? Because, arguably, you know, it’s not gonna be the answer for everybody. I mean, are there any workload characteristics that, you know, that fits its use case? Is it a matter of architectural decisions? For instance, migrating maybe to a platform engineering organization, right, for the business with exceptions for a legacy? 

Eddie: You know, like anything in software there is no, you know, recipe or silver bullet. Software should not be called engineering, because engineering is a blueprint that can be repeated.

This is more like gardening. And we can garden, and I can take the same seeds that you take and sow them in my yard, but there are different weather conditions and drought conditions and so on. And so it is a care and feeding, and that’s how software is. And so you can’t just say, look, run your database on Kubernetes. Or run it straight on VMs.

You have to look at what you need. That could be: you have a huge amount of data on storage you’ve chosen for its particular read/write or resiliency characteristics, and it doesn’t have a storage interface, like a CSI interface, that can work with Kubernetes, so you should run it on VMs. In other cases, your application is already built.

It’s already tied to some technology, and you need that technology that doesn’t support Kubernetes, so you run it on bare metal. And sometimes they even have specific hardware requirements. 

If you’re greenfield, I think you have the opportunity to look at data differently. You don’t have to build a mono database anymore like we used to. And if you look, you know, back in my day, when databases started adding JSON or XML or, you know, other non-relational capabilities, it was because you had one database, and it had to do everything.

But now with the microservices patterns on the application side, we should be looking at micro databases with different technologies to see which one of them is specific for that use case. In that case, the decision becomes a lot easier. The decision becomes, look, instead of a terabyte or a petabyte of data, I’m now looking at 10 or 20 gigs. That’s pretty easy. I can put that in this technology in Kubernetes. This other one might be a vector database or time series database that needs a lot faster storage.

Maybe I put that on machines and connect and expose it to my cluster so that my applications can use it. That’s, I think, the way to do it. It’s really to be incremental and iterative and try to look at the problem with the toolkit that you have with you.

Do I need something that gives me resiliency and scalability, both up and out? Because one thing you can do, and I’m gonna take a sidebar. One thing you can do with Kubernetes is you can change the machine that your database runs on. You don’t have to have a giant database all the time. And I gave this example, it was right around Thanksgiving. And Thanksgiving in the US, we have Black Friday, which is a crazy shopping holiday. 

And so you can run your database all year on relatively commodity machines; 2 weeks before, you can scale that up to giant machines, move your application there and your data there, right, for the duration of Black Friday, and then come back down.
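One way to picture that resize, sketched with the Kubernetes Go types; the names and sizes are hypothetical, and in practice you would apply the updated object (with kubectl or client-go) and let the StatefulSet roll its pods onto bigger capacity.

```go
// Illustrative only: patching resource requests on a StatefulSet's pod template
// to scale a database up for a peak event and back down afterwards.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// resize sets new CPU/memory requests on every container in the pod template.
func resize(sts *appsv1.StatefulSet, cpu, mem string) {
	req := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse(cpu),
		corev1.ResourceMemory: resource.MustParse(mem),
	}
	for i := range sts.Spec.Template.Spec.Containers {
		sts.Spec.Template.Spec.Containers[i].Resources.Requests = req
	}
}

func main() {
	var sts appsv1.StatefulSet
	sts.Spec.Template.Spec.Containers = []corev1.Container{{Name: "mysql"}}

	resize(&sts, "16", "64Gi") // two weeks before Black Friday: go big
	fmt.Println("peak requests:", sts.Spec.Template.Spec.Containers[0].Resources.Requests)

	resize(&sts, "2", "8Gi") // afterwards: back to commodity sizing
	fmt.Println("normal requests:", sts.Spec.Template.Spec.Containers[0].Resources.Requests)
}
```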

And so the huge amounts of money that we used to spend on databases in the past, that’s not an issue anymore, because we’ve been able to take advantage of the portability and the capabilities of Kubernetes, in our toolkit. So, I know I gave you a long answer, and it’s not a prescriptive one.

But, with architecture, with software, you have to look at the problem. And like I tell people, my job as an architect is to give you the least amount of technical debt because there will be technical debt with any decision, for your use case. So pick the one with the least amount of technical debt for you. 

Vinay: And then it comes to the other, you know, to the other side, is there any time when it is not a good idea to run, you know, on Kubernetes? So, you know, it could be organizations with multiple environments or maybe specific workloads for specific things. Or it could even be, you know, competence and supportability, team size.

Eddie: Of course. If you don’t have someone who can manage your Kubernetes, or you’re not purchasing Kubernetes as a service, you should not use it. If you have a requirement to run on special hardware, don’t try to shim Kubernetes in there. There are a lot of other options that you can run your databases on, from bare metal and virtual machines to other, you know, virtualization technologies and orchestrators besides Kubernetes.

But there are absolutely cases where you should not use Kubernetes, and using it anyway could mean slowing down your databases. And if it’s not configured correctly and you don’t have the right administrators, you can lose your data.

So, you wanna make sure that you have all of the checkboxes from that side before you decide to put something critical like your data in Kubernetes. 

Vinay: I still remember some of those, you know, horror stories of, oh, I changed the name of the database, and suddenly Kubernetes was reinstalling the database. Or, you know, I changed a couple of configuration parameters. So I guess, it’s like a sharp knife. You can be very efficient with it, but you can also…

Eddie: And that is a very real scenario that happens every day. Especially with this platform engineering discipline that’s starting to happen where you give a little bit more self-service tooling to your end users, your developers. And they think of changing a name as a simple thing, and that’ll reinstall their services.  

And that’ll fully uninstall everything, reinstall it because that was the key that it was using to determine the state. And if that key has changed, it’s a different application. So, you get away with it when it’s stateless. You can’t get away with it with data. 

Vinay: So in terms of scale and cluster size, is it not a limiting factor? I read somewhere a cluster of up to, I don’t know, 5,000 nodes, 110 pods per node. 

Eddie: It’s still a factor. I used to be an advocate of these gigantic clusters. I thought it was cool to have the world’s biggest cluster. My previous job, that was kinda where we started.

But the more experience that I, you know, have had with Kubernetes, the more of an advocate I am for microclusters. You’re starting to see a theme. Everything is micro: microservices, micro databases, microclusters. Your blast radius is smaller.

And we have the technologies now to effectively mesh these clusters together. So you can have a virtual cluster of 1,000 nodes. But you also have multiple control planes. That’s the bottleneck: you have a control plane that sits on top of your machines that controls where everything goes, how the networking works, the DNS, the security policies, and so on.

That is a limit. There’s always gonna be a limit to how many machines and how many pods you can manage. So, break it up. Now that we can have these clusters anywhere, we can plumb them together, whether you’re using a service mesh or something else. You can use things like Cilium. You can use things like Linkerd.

You can use things like Stuffer to create these virtual networks, these effectively programmable VPN tunnels. And take advantage of whatever dedicated network you have and connect them. So, I’m more of a fan of that, but you’re right. You can hit an upper limit on your cluster size.

And I think that’s the case with any technology. 

Vinay: I picked up on the blast radius comment. Looking a little bit at how infrastructure is being consumed in the world, a lot of infrastructure is concentrated with the hyperscalers. And, you know, we all read in the news pretty much every week that, “oh, this went down and thousands of services were taken down”.

Eddie: And you should get, you know, yourself or your folks trained on understanding composite SLA and how to measure that. Because a lot of the hyperscalers, I mean, they have very strict rules. Like, “we’ll give you 3 nines if you’re in these 3 regions, and each of them together is this one 9, or so on”. 

And so if you ever have a 4 nines or 5 nines application, there is a way for you to achieve it, right. But you can’t achieve it in one region or one hyperscaler. There’s just no way to do it.

And so, there’s a simple, you know, a simple formula. You have to figure out how many 3-nines regions you have to be in to give you the 5 nines, and it’s 3. You know, if you want higher than that, you have to go to 4. But now you’re talking about active, active, active.
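The back-of-the-envelope version of that formula, assuming regions fail independently (real outages can be correlated, and majority-quorum replication needs at least three regions to survive losing one, which is one reason three is the usual rule of thumb):

```go
// Composite availability of n independent regions running active-active,
// where the service is up if at least one region is up.
package main

import "fmt"

func compositeAvailability(regionAvailability float64, regions int) float64 {
	allDown := 1.0
	for i := 0; i < regions; i++ {
		allDown *= 1 - regionAvailability
	}
	return 1 - allDown
}

func main() {
	for _, n := range []int{1, 2, 3} {
		fmt.Printf("%d region(s) at 99.9%%: %.9f\n", n, compositeAvailability(0.999, n))
	}
	// Output (roughly):
	// 1 region(s) at 99.9%: 0.999000000  (three nines)
	// 2 region(s) at 99.9%: 0.999999000  (six nines)
	// 3 region(s) at 99.9%: 0.999999999  (nine nines)
}
```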

And that’s where these technologies come in. That’s where I think microclusters and these cluster meshes really show their power where you can effectively create, you know, 4 small database clusters, mesh them together, and add replication factors across it.

So if you lose, you know, what is it? It’s almost 9 o’clock, so US East 1 is gonna go down, its scheduled, you know, failure every week or whatever. You can have that backup, and your data doesn’t go away. That’s kind of it: once you have these tools, you can start playing that game.

And providing a better service for your customers. 

Vinay: So, you know, the only issue I would say we have when we talk about multi-cloud, multi-region is the cost. You know, even just the egress costs can get pretty hairy unless you have some kind of a good deal.

Eddie: I think egress is probably not the biggest cost. I would say compute is a bigger cost. But I love playing that game with the different vendors when I am not tied to them. I’m Egyptian originally, and we love to haggle. And so one of the best things you can do to someone at a store, at a vendor, is you can go and say, you’re gonna give me this price or I’m gonna walk away.

And you start walking away. Do that with your vendors, now that you can with multi-cloud, and they will give you some deals. Hopefully, I don’t get blacklisted by anybody watching this, but that is the capability that you have. You have no more vendor lock-in. You have the power now to negotiate.

Vinay: So moving into, you know, kind of nuts and bolts here. We talk about operators, and that’s as a way to extend the functionality of, you know, Kubernetes. And all the database vendors have released, you know, their, you know, their own versions.

And looking at the operators out there, you know, the question is, do they provide everything you need to actually run your database? I mean, Day 1, to make sure that you have a proper deployment with monitoring and alarming and all that. And then Day 2 with backups, upgrades, you know, scaling, or maybe you have to fork it. And what does that entail when you do that?

Eddie: So, I mean, it’s a software product and all the software products aren’t created equally, and you really need to look at your vendor and the technology that they’re providing. Because for each technology, there are multiple operators, and each of them has a different definition of Day 2.  So, yeah, they’re probably all gonna give you Day 1.

They’re gonna give you the monitoring and alerting and the installation and the networking and all that stuff out of the box. Some will just shake your hand and say, we’ll see you later, and others will take it to the next step and do the Day 2. So you need to do your due diligence there if you’re gonna pick a vendor, just like you would with any other software product that you would buy.

Forking is never a good idea because it’s hard to keep up with the upstream. Especially since some of these vendors are really, really fast moving. I would say, if you have the capability and your company allows it, contribute to that product rather than forking it for yourself. As soon as you fork it, you stop getting the upstream upgrades and changes. So, I would advocate for contribution. You know, I’m a big open source software advocate that loves to contribute.

I would say that’s a better path than forking it. Or, you can build it yourself.

If you have the expertise in-house with that database technology, you can implement that framework yourself and really custom-tailor it for what you need if you have special Day 2, you know, requirements. So you can have it installed with the vendor operator, and then your operator takes the Day 2 portion, whether that’s doing special kinds of backups or, you know, replicating to multiple regions or whatever.

That’s something you can do as well. It’s not an easy answer, but I would say just treat it like any other piece of software. Treat it like you’re interviewing a DBA. You want to know the capabilities of that person or software and then decide if you wanna install it.

And the good thing is you can start small. Kubernetes can run in your local home lab. You can test it and see if it satisfies what you need before you take it out into production. 

Vinay: The thing is, you know, then you need to know Golang. 

Eddie: No, not at all. 

So, Golang is probably the biggest and the most used language in Kubernetes. But we, for example, write everything in C# and .NET. There are some great .NET and even Java frameworks and Rust frameworks that let you implement an operator, because the operator is just a container that you deploy into Kubernetes, and you specify types and you extend the API, and then you can call it.

So, whatever capabilities that you have from a development perspective, you can build an operator on. And that’s the great thing about the equalizing capabilities of Kubernetes.

If you’re a Java shop and you have hundreds of Java developers: you know, back in the day, there was the concept of a SQL developer, where you had your developers understand SQL. Take that, put it in an operator written in Java, and deploy it. And now you can take that expertise and use it. Most vendors, I would say, probably use Golang, so you would need to know Golang for that.

Some do Rust for, like, the really high-performance stuff. But, again, it’s just like interviewing a DBA or a software developer in your company. Are they a fit? If you’re going to contribute to it or if you’re going to install it, do you understand how to look at crash dumps from Golang or .NET or Java?

Those are kind of the things you need to look at as well. 

Vinay: So is an operator the only way to run a database on Kubernetes? Or, you know, you talked about stateful sets.

Eddie: It’s a means to automate the installation. There are a lot of databases that are installed with Helm charts. Some are installed with just vanilla manifest and vanilla YAML files.

At the end of the day, something is going to need to give Kubernetes YAML, its objects. Now, Helm is a templating package that lets you give it some variables, and it’ll create that and apply it. That’s a once-and-done.

That’s kind of like, “Hey, we installed it. We set it up for you. Thank you very much. See you later.”

Operators ultimately create the same manifest, but then they have that loop where they can go back and they can start updating it. So you can do it in other ways. I would recommend going with operators. I think operators are the way that gives you Day 2 capabilities. And it could be a group of operators. Sometimes there are operators that create cron jobs for backups that create, you know, other operators to manage a different life cycle of it.

I think operators are the way to go. It’s not the only way, but I think it’s the way to go. 
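For a sense of what an operator keeps reconciling, here is a hypothetical custom resource spec in Go. The fields are invented for illustration; real operators, the database vendors’ included, define their own schemas, but the idea is the same: you declare the end state, including Day 2 concerns like backups, and the operator’s loop keeps enforcing it.

```go
// A hypothetical custom resource spec of the kind an operator watches.
package example

type DatabaseClusterSpec struct {
	Version        string   `json:"version"`        // engine version to run
	Replicas       int32    `json:"replicas"`       // desired replica count
	StorageClass   string   `json:"storageClass"`   // which storage tier to claim
	BackupSchedule string   `json:"backupSchedule"` // cron expression: a Day 2 concern
	Regions        []string `json:"regions"`        // where the data should live
}
```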

Vinay: So we touched on, you know, portability and we generally agree that it’s something that is considered, you know, important, right, among organizations.

When talking about Kubernetes, you know, different cloud vendors, especially, have varying levels of conformance, API compatibility, and feature support. Then you have your on-prem platforms.

You have that in your, maybe, your OpenShift. In OpenStack, you have some other components, Magnum, you know, to do that. So ideally you want to run applications in as standardized an environment as possible.  Images in a repo, standardized software, you know, Helm charts.

But is it a case of, you know, write once and debug everywhere? How portable is it really across all these different, let’s say, you know, Kubernetes installations, you know, different environments? 

Eddie: So Kubernetes is a unique piece of software, and I think it gives everybody the opportunity to come to the table and play. That’s why competing companies are adopting Kubernetes and contributing to the project. And that started giving us drift in the very early days. 

Like you said, Amazon’s version is very different from Google’s version, which is very different from Azure’s version. But, thankfully, the CNCF, which is the kind of governing organization, the foundation around it, has started certifying Kubernetes distributions, and that gives you kind of the baseline.

It’s certified Kubernetes. It’ll work with your vanilla Kubernetes, whether that is your home lab, whether that is, like, a Tanzu, or whether that’s AWS, and so on and so forth. So with that, there’s some level of conformance. You’re still not gonna be able to build once and deploy everywhere or debug everywhere, but you’re going to tweak very minor things.

With concepts like I mentioned in my talk, things like storage classes, that’s a simple abstraction that the vendor can create. And now you can basically talk to storage, fast storage, without having to know the details of the EBS or EFS volume or the Azure tables or the storage disk that are underneath, or whatever. You can just tell your manifest, I want this kind of storage class, the default or whatever. Same with networking. Networking is very different.

CNI is a huge topic, and it can vary from one to another, but you have things like kube-dns, and you have services that live on top that give you that standardization across. So there was a really good model introduced back in, I think it was 2017, called the Open Application Model, OAM.

If you go to oam.dev, they kind of break out the roles into 3 separate areas. So you have infrastructure operators; they’re the people that are focused on the infrastructure and configuring that. You have the application operators.

Those are like your SREs that configure how many replicas and so on and so forth. Then you have the application developer. So, following that, you can truly build maybe 75% once, and then the last 25% is the variations between the different environments, and that could be handled by another team, a platform engineering team or your ops team. So you’re still gonna have to debug everywhere.

That’s kind of the given with any piece of software that you write, even data, because of the quirks. So, you’re gonna need that 25% to be really good at their job.

But now that 25% is not with every single team; it can at least be optimized for your organization that way.

Vinay: And I guess it takes some skill to make sure it’s organized so that it’s only 25%.

Yes. Exactly. Yeah. Because if it’s all over the place, then, yeah, then you’re gonna take more time. 

Eddie: That’s where, you know, things like, you know, platform engineering really evolved from you know, we had DevOps and people started, you know, looking at that, and there was a divergence, I would say, towards SRE and DevOps. And you realize that you had these teams that had the same kind of people doing the same work across the board, and they weren’t collaborating. You had an SRE engineer or somebody. Every team had a Kubernetes expert who was a little bit different than the other Kubernetes expert, and you ended up paying extra.

And whether that’s time or, you know, physical resources, you weren’t seeing that benefit. Platform engineering kinda came in and said, “Hey, why don’t we take all these folks who are building the common tool sets and the tool chains and provide that as a service, right, platform engineering as a service, internally?” And, you know, the big poster child of platform engineering is your internal developer platforms, your IDPs, that are like your portals that say, here’s everything that we offer.

You can pick and choose any kind of index yourself there. Do you need a cluster? We’ll give you a cluster. Do you need a, you know, storage array?

Here’s a storage array. And it’s the experts in the company that have come together. That’s when you can start seeing some of the efficiencies. 

Vinay: So transitioning and looking a little bit more, you know, future, what next?

Where does Kubernetes go? First, you know, let’s say more broadly, and then maybe for stateful workloads. You know, I mean, will we see more organizations like, for example, Starbucks, right, you know, basing their internal, you know, database as a service on Kubernetes? You know, I mean, for public DBaaS, we have, you know, MariaDB SkySQL. We have EDB, you know, BigAnimal, it’s called. Where do you see things, you know, going?

Eddie: Well, I mean, I definitely see Kubernetes getting into many, many more areas.   And we’ve seen it put in cars. Now they have clusters in cars. They have clusters on airplanes.

They have clusters in fast food chains, and they’re absolutely running the applications on their edge clusters, if you would. And absolutely, they already are putting databases on there, but I think it’s gonna become a lot more, you know, the norm where you start seeing these databases. Even if it’s local store, the local storage that you have, they’re gonna be put in a Kubernetes, and that might potentially then be set up into higher and higher levels of warehousing that could be in the cloud.

So I think that’s what we’re gonna see, with the geopolitical, you know, events that have been happening for the last few years, and then a lot of the new security things that are coming. There’s just an AI law that was passed in Europe. NIS 2 is happening, with the vulnerability indexing and the SBOM. Same with the US around the executive order.

Just yesterday, there was a bill passed by Congress about TikTok and who owns it. So the world is shifting away from the “well, I just put it in the cloud, good.” Well, whose cloud is it, and who has access to it?

So I think there are going to be a lot of cloud players in the future. I think Kubernetes is going to take a top position there because nobody wants to build their application to work on one cloud that may not be there or may not be available for me to service my customers. 

And so, definitely a big application push on there, but data has to follow. Data is probably the biggest, you know, red line in any of these, components in any of these laws or regulations.

Where is my data stored? AI is highlighting data even more. Now we need to be able to see it. Kubernetes is gonna give you the ability to abstract it, show it, and add monitoring capabilities on top that maybe you didn’t have before.

I think it’s going to be a first class citizen by the end of this decade. 

Vinay: So what do we need to get there, you know, from the community and vendors? I mean, we need, as you mentioned, first-class citizenship, and then, you know, there’s more development needed there.

And, I mean, I don’t know if you can draw an analogy to OpenStack, you know, because OpenStack grew into this gigantic kind of thing with, I don’t know, vendors maybe pushing in different directions. You know, how do you see what’s needed for Kubernetes? And is there, in a way, a risk of falling into this gigantic OpenStack situation?

Eddie: I don’t think Kubernetes necessarily is going to be the highlight there. I think there are gonna be standards

that are gonna be highlighted, whether that is things like SPIFFE or OTel, OpenTelemetry, for example. Things that are gonna start standardizing, where the customers are gonna say, look, I’ve already got 95% of my stack monitored using this, you know, this tool. Why can’t I look at my database? Why do I have to buy this bespoke product?

You know, or, hang on a second. I’ve got all the security policies. Why do I have to go learn another language? I wanna use Rego. Whatever.

I think that’s where we’re gonna start seeing the push probably in the latter half of the decade, especially with AI and especially with a lot of these tools that are coming out. And I think operators are gonna start becoming more mature. You’re gonna start seeing things that are, look, we can run it on Kubernetes.

We’re gonna give you Day 2. We’re gonna give you the auditing. We’re gonna start raising security alerts based on your data. We’re gonna start showing patterns, behavioral pattern matching, your queries and so on and so forth. And that’s gonna be where the competitive angle is gonna be.

Because if your organization is gonna invest in Kubernetes and they’re going to hire people that understand it and care and feed for it, I wanna put my data on there. Why not? I just invested this huge sum of money. 

And so I think the vendors are gonna see it. I think we’re gonna start. There was an effort a long time ago with ODBC, right, or JDBC, which was, like, kinda a universal language for databases. I don’t know if that’s gonna happen. I think it would be nice to have it instead of everything being Mongo wire-compatible or Postgres wire-compatible. That’s effectively a standard, but it’s an unwritten standard.

I think we’re gonna start seeing some stuff from the CNCF, maybe some other foundations as well, start becoming, you know, the winners. And I think when there’s a lot of competition, it pushes it all to a better place, and then eventually, maybe one wins.  We saw that, you know, we have all the different Telemetry providers.

Now they’ve all standardized on one standard. And so maybe we’ll see something like that for data. I don’t know. But I’m really excited to watch the next 6 years and see what happens there.

Vinay: So to summarize, you know, Eddie, what would be your recommendation, to enterprises when it comes to devising a new, you know, Kubernetes-based database strategy?

Eddie: I would say, look, Kubernetes is here to stay. Invest in it. Invest in the people that you have. Have them understand the capabilities of the tool, and not necessarily how to be experts at it, but the capabilities. And then from there, look at your software, you know, just like we had the shift from it being, you know, just an enabler of, you know, some processes to being the business.

Your data is now in that pool. So look at it in the same way that you look at your applications, because it’s always been an afterthought in organizations. Look at it and see how it can fit with the general Kubernetes landscape that you have so you can get the efficiencies there.

But, also, keep in mind that it may not fit or may not be exactly the right fit in that Kubernetes. So keep an open mind. You don’t want an organization to ever say this is it, and we’re going this way, period, one way or the other.  Look at it as a tool in your tool belt.

Figure out what the best use is. Educate your folks on it, and keep up with the whirlwind that’s going on in the cloud native space. Kubernetes is really going to give you that freedom to pursue your customers wherever they are and deliver a product. Because, to put it another way, nobody’s paying you because you’re the best at CI/CD or you’re the best at, you know, orchestrating VMs. People are paying you for your product.

So take advantage of what’s already out there that can help you make your product the best that you can make it.

Vinay: Well, thank you, Eddie. It’s been great talking Kubernetes with you. Wise words there. And, folks, that’s it for today. So see you all for the next episode. Thank you.

Guest-at-a-Glance

Name: Eddie Wassef
What he does: Eddie is a VP and Cloud Architect for Vonage
Website: Vonage
Noteworthy: Eddie is a regular speaker at KubeCon due to his expert knowledge and passionate advocacy of Kubernetes, often taking the form of relatable analogies (find his talk on Kubernetes StatefulSets as the Guardians of the Galaxy here: https://www.youtube.com/watch?v=hNRW0P8Zv0o).
You can find Eddie Wassef on LinkedIn