Our inaugural guest is Marianne Bellotti, author of the acclaimed book Kill it with Fire. Thanks to our partners at IBM, we’re going to spend 90 minutes diving into the challenges of legacy systems, technical culture, and modernization.
Want a taste of what we’ll be discussing? If you haven’t had a chance to read the book, we’ve recorded an interview with Marianne and FWD50 chair Alistair Croll. They look at the patterns of legacy modernization, the history of IT infrastructure, technical culture, pitfalls, and what leaders can do to cultivate continuous improvement.
[00:00:00] Alistair Croll: [00:00:00] Hi and welcome to another episode of FWDThinking. I’m FWD50 Conference Chair Alistair Croll. FWDThinking is a production of the FWD50 Digital Government Conference, and this episode is made possible in partnership with IBM. Today, I’m going to talk to author and IT modernization expert Marianne Bellotti. Marianne has a storied career in public service and also the private sector; she’s worked on legacy modernization, getting some of the massive monolithic systems on which society functions to run smoothly and enter the modern age. And what I found most interesting about her new book, Kill It with Fire, is how little of it is about technology and how much is about human factors: calculating risk, giving people agency and permission to get context on systems, the management of teams, how to use crises as an advantage. It’s a fascinating discussion that ran a [00:01:00] little long, and we’re incredibly fortunate to have had this chance to talk candidly with Marianne.
[00:01:04] So please join me in this journey down IT modernization with Kill It with Fire author Marianne Bellotti. And I have to be candid, Marianne, in reading this book, I kind of got PTSD because all of my memories of large system integration and big company innovation and mainframe connectivity to PC clients and cloud migration, all these things kind of came to a head. And it’s a fantastic book. Thank you so much for joining.
[00:01:29] Marianne Bellotti: [00:01:29] Well, thank you for inviting me. I’m so thrilled to be here.
[00:01:32] Alistair Croll: [00:01:32] So first of all, why don’t you tell folks a little bit about yourself, and then I’ll get into my many, many questions in my heavily scrawled-on book. And if you haven’t already bought the book, you should probably go get it, because it’s kind of full of flashbacks and I spent the last few days consuming it. So who are you? And what’s the book about?
[00:01:49] Marianne Bellotti: [00:01:49] Yeah, so I have sort of an interesting, indirect path into technology and IT modernization specifically. I sometimes like to refer to myself as a [00:02:00] relapsed anthropologist, because all of my formal education is in anthropology and organizational science. I got into programming because my father was a computer programmer, he’s retired now. He worked on old COBOL mainframe systems, and a little bit of Java towards the end of his career, in big banks. And so from a very early age, like the late nineties, even my youngest memories are about interacting with very, very old computers. Computers were always in my house, stacked up three deep in my basement. And so from about the age of 14 onward, I started building computers from spare parts and surfing around on the internet and writing code and those sorts of things. So by the time I got into a college environment, where one would normally formally decide to become a computer scientist and [00:03:00] go through all of that highly disciplined development work, I was kind of bored of it. I had been doing it for a while and I wanted to go do other things. I wanted to see the world; I had a very classic, I-want-to-go-out-there-and-save-the-world type attitude. So I spent a huge amount of my twenties kind of running away from technology, trying to establish my career in a different field. But as a result I came to realize that the core skill set that I had, interacting with computers and being technically literate, was often the most valuable thing I’d bring to the table with a lot of those organizations.
[00:03:35] And so I gradually started doing more and more technical work for a lot of the nonprofits I was working for, international development groups, and that eventually led me to make a commitment to become a full-time professional software engineer, and eventually take a job with the UN working on their Humanitarian Data Exchange project from the very beginning. [00:04:00] Its origin story, basically. And I really enjoyed my work at the UN, and I had been doing some work on legacy modernization for the city of New York and for some other large organizations. Then my friend told me about this thing going on in DC called the United States Digital Service. But USDS was maybe a year old and didn’t have a lot of processes yet. So I looked on the website and I’m like, I don’t know what this is, and I don’t know what they’re doing. But my contract was coming up at the UN for renewal, so I figured: maybe I apply, they call me back, I have some interviews, I’ll figure out what they’re doing from the interviews. And maybe I get a competing offer and I use it as leverage in my renegotiation at the UN. And then, four or five interviews later with USDS, I still had no idea what they were doing, but it became very clear to me that there were some really, really interesting, compelling people in that [00:05:00] organization. And there were also some really interesting, compelling problems with some really large old systems. So I spent about three and a half years with USDS, which is about the maximum; I think the term limits cap it at four years. And it was an incredible experience dealing with systems that were this large and this old. One of the things that I got the most benefit out of was realizing that I had all of these skills that I didn’t realize that I had.
Some of them were technical, like knowing how old systems functioned, but the vast majority of those hidden skills were actually more in line with those organizational science principles.
[00:05:41] Like, I remember coming into USDS very early and talking with a colleague I was taking over a project from, and he was telling me that he was going around to government people, and this is early October, he’d gone out to government people and was pitching them on the cloud and how much money it was going to save them. And I was just like, wait a [00:06:00] minute. You’re talking to government people about saving money in October. And he was like, yeah. I was like, that’s not the pitch you want to give them on cloud, like, these are the things you want to…
[00:06:11] Alistair Croll: [00:06:11] November is gonna be a terrible month for you.
[00:06:12] Marianne Bellotti: [00:06:12] He was like, why? I was like, because this is the end of the fiscal year, and they’re on general accounting principles for the public sector. If they don’t spend their money, they lose it the next year. No one cares about saving money in September and October. People want to light money on fire; whatever’s left in their budget they want to burn to the ground so that their budget doesn’t get cut next year. You can’t walk around telling CIOs you’re gonna save them money in October. And it was things like that, things where I had come up from a different perspective and had a different set of principles that I was navigating this world from, and that was super valuable, and I hadn’t even considered or appreciated it.
[00:06:57] Alistair Croll: [00:06:57] So that’s one of the things that I took away from the book: [00:07:00] context is so important, and letting your engineers achieve context with a system. And when you’re talking about technology, context is monitoring, and it’s being able to really understand: what was the world like when this was written? Was there some kind of engineer who wanted to advance their career because Java was the new thing back then, and that’s why this one thing is written in Java? What were the price incentives? How long did they think the system would be in place, and as a result, what data storage did they use? So much of this is taking the time to, I know you’re an anthropologist, but this is almost like archeology: go in and be the Jane Goodall that lives with the mainframe until you understand it, right? And that context can be incentives, like you said, like fiscal year budgets, but it can also be context like whether someone wanted to present at a conference. So much of what I took away from this book is that we don’t take the time to allow our technical workers to get the [00:08:00] context required to make wise decisions in the context of the world in which those legacy systems were created. Yeah.
[00:08:07] Marianne Bellotti: [00:08:07] I think a lot of times people would observe my work from the outside and kind of say, Oh, well you have really great social skills. And I’m like, Well, thank you. But that’s also not exactly true, right?
[00:08:18] It’s not the social skills that are the difference maker. It’s thinking about things in terms of systems, right? Not technical systems, but human systems: how are all these people incentivized, and how do those incentives conflict? What are the rules and regulations that are at play here? That’s not a social skills issue; that’s a skill set that you need to be taught, and you need to know that you need it coming in. And so my main motivation for writing this book is that that’s true no matter the age of the system. Like, I build systems brand new now, and a lot of these principles are still true, right? Because we’re still dealing with groups of people, we’re still dealing with the kind of cultural [00:09:00] norms within the organization, the incentives and the way they conflict with one another. So I thought what would be really interesting is to write a technical book that was not so technical that it was inaccessible to people for whom it’s maybe been a couple of years since their hands were on a keyboard, or who are in a supplemental role, not directly in line with the IT stuff, but was technical enough to get into the minds of software engineers. But its primary focus is not really technical in nature. I don’t think that there’s really any code in this book at all. There are some diagrams and there’s some technical jargon, but I tried very hard to make it accessible to technically literate, but not necessarily software engineering, readers, precisely for that point. Because your success or failure with any technical project is more about how you align incentives across your organization than it is about the technology you choose or the brilliant engineers you [00:10:00] hire to build it. And that’s something that everybody in the organization needs to understand.
[00:10:05] Alistair Croll: [00:10:05] So the term Kill It with Fire. I mean, I’ve heard that as a meme, I’ve seen it many times, I think it’s been on the Simpsons, but your book doesn’t actually say that you should kill systems with fire. In fact, quite the opposite. Can you explain? First of all, there’s this misconception, and I think this comes through very clearly in the book, that a lot of people believe that building a technology product ends when you launch it. You have this line at the tail end of the book, and the whole way through I’m like, oh, I bet she says this, and then you have this line in the last chapter: you say, stop saying the website is done. Right. And I think we miss the simplicity of: this is a tangible object that I don’t have to worry about, I can move on to new things. Which is a very atomic idea, that we build a thing and then it decays over time. But it lives. This is a living being, and as a result you would need to kill it with fire, because otherwise it’ll just keep on living. Right. [00:11:00] But you don’t actually think we should kill these things. In fact, I think one of your big prescriptions is: “don’t rewrite things from scratch because you’re almost certainly going to die”. So the title is a contradiction. Was that intentional?
[00:11:11] Marianne Bellotti: [00:11:11] Yeah, somewhat. The title and the cover, which for people who haven’t seen it is a dumpster fire, were very deliberate. That for me was sort of: well, let’s start with the people I want to talk to, let’s start at their level, and what are their assumptions? Right. The feedback that I would get most often from software engineers when we would start working together is: this is awful and frustrating and gross, and not a good use of my time, and I’m not going to get better as an engineer. In particular, there was a huge concern among engineers who would enter USDS, or any form of federal service, that their technical skills were going to atrophy as a result of doing this work and that they were not going to be hireable afterwards, because they would have lost their [00:12:00] finger on the pulse of modern technology. Which I thought was really silly, right? Because I always felt like I learned way more about technology doing this work than I ever would have as just a normal software engineer. So I wanted to start at the level of addressing, acknowledging, sort of winking and nodding to those basic underlying assumptions of the people I’m trying to reach, which are people who think: “I will never do modernization work because modernization work is awful and I’m just gonna avoid it”. And then sort of pulling them into what I would describe as the beauty of modernization work. Like, I fell in love with it because I find it an interesting reflection and exploration of a group of people rather than a set of code. Right? People think about code and technology as being math- and science-based and therefore completely and totally objective, stripped of any culture whatsoever. And that’s fundamentally not [00:13:00] true, right?
It is very much an artifact of culture, the same way poetry or music or art is, it’s just involving computers and executing for some particular purpose.
[00:13:12] Alistair Croll: [00:13:12] So in government, we have this battle of culture versus technology, which you described very well, and that’s true in both the private sector and the public sector. Although I think in the private sector there’s more of a sense of, I want to prove myself so I can get a better job, whereas there’s more risk aversion and let’s-be-careful in government. But in government, there’s this problem of public sentiment versus priority. I’ve talked to a number of people in government over the years since we launched FWD50 who tell me that they get a budget, and sometimes that budget is so big it’s a ruinous amount of money. Right. And you know exactly what I mean by that: you have so much money that you now think you’re invincible, or that you don’t need to prioritize. The gating factor about money is: if you have a little [00:14:00] bit, you need to be judicious; if you have a lot, you can be reckless. But the ruinous-amount-of-money problem in government, I think, is amplified by the fact that money is an indication of political will. So if your government says, we need to fix climate change, we’re going to devote a billion dollars to climate change, and you were to say, no, no, no, a billion dollars will never get anything done, give me a hundred million and we’ll do something, or something like that, different numbers, you’re rejecting the sort of levers of power that indicate to voters that you’re serious about your promises or your policy platforms.
[00:14:34] How do you deal with a ruinous amount of money? Like, you don’t wanna say no to a ruinous amount of money, but it will ruin you. How do you balance that desire to demonstrate political will with effectiveness?
[00:14:44] Marianne Bellotti: [00:14:44] Yeah, I mean, it’s money and it’s also time, right? So the US Census, every decennial, so every 10 years, kind of [00:15:00] struggles with their technology. Let’s be generous here. And I think largely it has to do with the fact that they have a lot of money to spend, because the decennial is extremely important, but it also has to do with the fact that they have 10 years to do it. Right. And so I was chatting with them about their 2020 technology and looking to help them out, and they were talking to me about, well, why don’t you help us out with 2030? And I’m like, hold on a second. Why are we pretending that nothing will change in 10 years of technology? Right? The technology that we build is generally out of date within three to six months, right?
[00:15:33] Alistair Croll: [00:15:33] The lifespan of a phone these days is like four years.
[00:15:37] Marianne Bellotti: [00:15:37] Yeah. It’s this idea that we’re going to do our critical architecture work in 2020 for 2030. It’s like, what? You can’t think about things that way, but they’re incentivized to think about things that way, because it looks like good governance, right? If you go to your auditors and to Congress and oversight authorities and you say, look, we’re doing all of this planning [00:16:00] for 2030, it looks really good. And it’s the same thing with money. If you have a good meaty budget behind something, people take it seriously. I have a friend who was over at DHS who told me that every small project comes in at $5 million, because if it’s less than $5 million, nobody takes it seriously. So that’s the minimum it has to come in at, which is just ridiculous. But as for how to keep a program efficient in that kind of environment, I would say that when it comes to your actual on-the-ground software engineers, I don’t think they have much visibility into the budget, right? I’ve worked at startups that have lots and lots of money, and you feel it, because the money is just being thrown around. But in government, generally speaking, we’re not setting up kegs in the break room or anything like that. There are no fancy parties or things like that. People’s awareness of how much money is actually in their coffers is much, much less than in the private sector. And so I think you can sort of leverage [00:17:00] that. You can take advantage of that fact and scope your plan as if you don’t have all of that money, and as if you don’t have all of that time, and kind of keep things structured. So really it’s on the management level that that burden sits, to keep that awareness in check.
[00:17:16] Alistair Croll: [00:17:16] So since you bring up the private and public sector: last year I had a fantastic conversation on FWDThinking with Kathy Pham, Katherine Benjamin and Ayushi Roy, who all have backgrounds in product management and thoughts on product management in the public sector. And I’m a private sector product manager. One of the big differences is that in the private sector, you’re sort of building for the target market with the most money that has similar needs, and in the public sector you’re building for the widest possible market, often for the margins, because then the middle will be served. And so the challenge, I guess, is that you have to go talk to everybody who can say no, and that can lead to this sort of consensus and [00:18:00] analysis paralysis that freaks out many people who try to take some time in the public service from the private sector. There are so many differences. Do you have experiences with people who have been brought in, you know, weaned on the startup world or brought in from big enterprise, who then have to recalibrate themselves for the public sector, with its new scarcities and abundances?
[00:18:20] Marianne Bellotti: [00:18:20] Yeah. And I think appreciating the fact that it’s a different environment, and that exposure to a different environment ultimately benefits you, is a key part of getting through that, right? People have trouble when they think about things in terms of, well, my way of doing it is just better, so why can’t we just do it my way? When you realize that there are many paths to the center, and that there are really solid, legitimate concerns behind some of the ways things are configured, it helps you learn what those benefits are, and it deepens your understanding of the context that technology works in. [00:19:00] So for example, there was a project I worked on towards the beginning of my USDS career that was about the renewal of student loans. The Department of Education would make people fill out the same form every single year to renew their student loans, and if you forgot one year, you didn’t get your student loans that year, right? It wasn’t like: I’m going to do a four-year degree, I’ll fill out this form once and I’ll get my loans for all four years. The main reason that was necessary is that you would have to authorize the Department of Education to request your tax data from the IRS. And there was this huge internal battle within USDS about that idea: well, why can’t we just authorize them for multiple years? The IRS gets your data and sends it over to the Department of Ed, and the Department of Ed automatically goes, okay, here’s what you’re getting in student loans for this year, right? Why can’t we do multi-year authorization? And there are [00:20:00] pros and cons to both sides of it, but fundamentally it comes down to: yeah, it’s probably inefficient to do it this way, but also, do you really want the IRS to feel comfortable sending your private financial data to other government agencies?
The reason that rule exists is that Richard Nixon’s administration abused the IRS, and the data it held, for political benefit.
[00:20:23] Alistair Croll: [00:20:23] This seems to be the case with all these systems: there are very good reasons for all of these things being the way they are, but they’re such threaded needles that you wind up having to rethread them every time. And again, it’s almost like they’ve accrued legislative debt instead of technical debt. Like, there needs to be a thing behind that that goes: hey, this is how you authorize, annually, all your public information to be communicated, and get notified when someone looks at it. That would solve the problem, right? But that’s legislative debt, not technical debt, and it’s so big. As you say in the book, when a yellow team gets [00:21:00] called in, and we’ll get to yellow teams in a minute, it’s because the problem spans multiple departments. This one spans the IRS and the Department of Education, so you kind of need a legislative yellow team to go: we need to rethink how information sharing happens, with informed consent and minimum effort on behalf of citizens. But there’s no government version of yellow teaming laws, right?
[00:21:22] Marianne Bellotti: [00:21:22] We did try, on the White House level, but the US government is really persnickety about separation of those branches, right? I had a friend who was checking people’s public websites for some particular vulnerability, I cannot remember if it was Heartbleed or DROWN or whatever, but he found that the Library of Congress had an active vulnerability on their website. And so he sent them an email, and got an email back saying: why is a person in the White House, which is part of the executive branch, doing security testing on the Library of Congress? This is a constitutional violation of separation of powers. And nothing came of it, because [00:22:00] ultimately people appreciate somebody trying to improve their security, but it was a very amusing email.
[00:22:06] Alistair Croll: [00:22:06] Yeah. That sounds like a great example of separation. You know, the tripartite government, which I think the whole world has had a crash course on in the last year, has all kinds of interesting checks and balances.
[00:22:16] Marianne Bellotti: [00:22:16] Bringing it back to product management for a minute here, I would say that the core difference to understand between the private sector and government is that because government is not looking to make a profit, you have to adjust how you iterate and expand the market, right? Generally in the private sector, people think: what is the largest, most profitable segment of people that might use this thing? Let’s build it specifically for them, get all the kinks out of the user interface, and then move on to another group, and another group, and another group. In government, I find it’s often beneficial to think about it in terms of the people who have the greatest need. So we had this huge argument about 508 compliance, [00:23:00] which is the United States accessibility standard. Can screen readers interact with your website appropriately? All of these tools for people who are deaf, blind, what have you. A lot of the private sector software engineers would be like, well, we’ll get to that later. Let’s do it for the normal person first, and all those edge cases we’ll do later. And I’m like, really, I feel it is more important for us to address those groups of people first and then expand out our capability to the more normal group, largely because this is a government service, and who is the most…
[00:23:37] Alistair Croll: [00:23:37] And you get support for the “normal” group, quote unquote, you get support for the mass market by default if you design for the edges, which is just a philosophical difference, right? So I have a ton of questions and I want to get into some of the substance here. You mention entropy a lot. And I think that, not so much for the technologists, because they [00:24:00] understand systems and so on in most cases, but the executives need a much better understanding of concepts like the fact that any untended system grows moss, that entropy happens if you don’t maintain things, that systems are living beings that need nurturing, and you’ve got to go in and understand that. But also things like design thinking, incentives, game theory: there are so many liberal arts components to managing technology. If you had a month to train a Deputy Minister, or the equivalent in another government, on how to manage their technology roadmap, what would you tell them to do or
[00:24:41] read, other than your book, in terms of design thinking and incentives and game theory and all these things? What are the non-tech things? Because most of them say, oh, I’m just going to learn to code, or I’m going to learn agile, or whatever. What are the things that would help them to really [00:25:00] look at the problems in new ways?
[00:25:01] Marianne Bellotti: [00:25:01] Well, what I think is really interesting about agile is that everybody seems to take the framework of it and completely miss the philosophy behind it. Right? Agile is all about making sure that the people implementing the technology are empowered to make decisions about it and run experiments, and if those experiments fail, to throw them away and start over. Right? And people tend to ignore that part of it and just go: well, we have our scrum master and we have our sprints. They just take all the accessories of it.
[00:25:35] Exactly. People like their frameworks and their structure, and they just completely ignore what we’re trying to accomplish with agile. And so the thing that is most important to me for somebody that high up to understand is: what can your people who are doing implementation actually do? What are they actually empowered to do? And if there is a problem in the system, who would they go [00:26:00] to? How can they escalate? How does that get resolved? Because that is the number one cause of dysfunction in any system: the people who are on the ground are not empowered to make a change to it, they’re not empowered to resolve a conflict, and if they escalate, they get punished for escalating. The thing that I did most effectively was actually go to the engineering level, figure out that there was a problem, and then escalate it up for them in a way that protected them, that kept them anonymous, so that there was no blowback towards them later on. And that’s really important. That’s a principle that the private sector, which is not perfect, does way better on than the government. People in government generally gravitate towards this idea of: well, if we make the punishment really bad, then obviously people will do the right thing. And it’s the exact opposite. We know the exact opposite is true.
[00:26:59][00:27:00] Alistair Croll: [00:26:59] One of the things that you talk about in the book is this idea of breaking up monoliths, so first of all, can you quickly explain what a monolith is outside of 2001?
[00:27:09] Marianne Bellotti: [00:27:09] Yeah. So a monolith is basically any kind of computer program that combines multiple pieces of functionality into a single operational unit: a single code base, typically a single application.
[00:27:26] Alistair Croll: [00:27:26] So you talk about how sometimes a thing can be a function, and it’s bad to take a thing that’s a function and try to make it into a service. And then you say sometimes monoliths are too big and need to be broken up into services. It seems like in the book, this idea of understanding what’s a function, what’s a service, and what’s a monolith, and when each of those three matters, is really a vernacular. Are there outcomes, razors, for knowing what should be a function, what should be a service, and what should be a monolith?
[00:27:57] Marianne Bellotti: [00:27:57] Yeah. So I’d say first, people are really scared [00:28:00] of monoliths, unnecessarily scared of monoliths, I think. They have a bad rap, and nobody wants to build the monolith; it’s seen as a shame if you end up building one. But I think that’s just actually the way code evolves, right? When there are small features that you want to test out, you tend not to build separate services for them; you tend to just attach them to a place where they seem to logically fit. And then as you build them out and they evolve, they grow and they grow, and now suddenly you have kind of a monolithic structure going on. So what I emphasize is this idea that you can’t come into any technology project thinking: I’m going to build this this way, and it will never have to change. You have to change your idea of failure: if five years from now you realize you have to rearchitect this or break it up, that’s not a failure. In many cases, that’s a success, because it means people are using it. You built the right thing and it was successful. The standard that I always use, in terms of whether people should be building out separate services for [00:29:00] something or should build a monolith for right now and break it up into separate services later, is really about maintenance. In particular, I use on-call rotations as a benchmark, because first of all, we should be on call for things that we are running in production, but also because it becomes a really tangible numbers game really quickly when you think about it that way. So for people who are not familiar, an on-call rotation basically means that we take a set of software engineers and, we used to actually give them a literal pager, but now we have this thing called PagerDuty that’s on our phones.
So we load the application on our phone and we say, okay, Marianne, for this week you are responsible for making sure that this service is online and performing well. So whenever our monitoring system tells us that there's something going on with this service, or it's down, you're going to get paged. If that happens at three o'clock in the morning, you have to get up, log onto your computer, SSH in, and fix it. If it happens in the [00:30:00] workday, then, you know, it's the same process, but you're responsible for the system 24 hours a day for typically a week. Right? And then at the end of that week, I hand that over to the next person in the on-call rotation, and we go through that. And realistically speaking, I don't like to run on-call rotations with fewer than six people. And I don't like to run them with fewer than six people because what I like to have is essentially two different rotations: the first one being the real rotation, and then the second one being a backup list, so that if, God forbid, your phone's out of battery, or for some reason you're unavailable on your on-call, we don't wait around forever with the service down. We just page the person who's your backup. And so six is about the minimum number that we can do without people being on call all the time and burning out. And so then if we think about it that way, then it's like, okay, how many people do we have? How many rotations of six people can we do without burning [00:31:00] people out, without overloading people with all of this extra work? And we'll use that as a reference point for how many separate services we can run. So if we're a really small team and we can really only handle one on-call rotation of six people, then we should be building a monolith. Right? But if we're a larger team and we can handle more than one on-call rotation of people, okay, so now we have like two services or three services. We can think about it that way.
And I find that to be a really effective way of thinking about what the right architecture is right now, for the team size that we have, for the resources that we have. And then as we grow, when are we thinking about breaking it up? Because what a lot of people do when they get into the world of microservices is that they build a whole bunch of little services that they never maintain. And that is not better than building a monolith.
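Marianne's on-call benchmark lends itself to a quick back-of-the-envelope calculation. The sketch below is illustrative rather than from the book: the six-person minimum (a primary rotation plus a backup) follows her description, while the function name and sample team sizes are invented for the example.

```python
# Back-of-the-envelope sizing: how many separately operated services can a
# team sustain, using ~6 engineers per on-call rotation (primary + backup)
# as the benchmark described above?

MIN_ROTATION_SIZE = 6  # fewer than this and people are on call constantly

def max_services(team_size: int) -> int:
    """Services a team can realistically keep on call for; below six
    people, the answer is one, i.e. build a monolith."""
    return max(team_size // MIN_ROTATION_SIZE, 1)

for team in (4, 6, 13, 25):
    n = max_services(team)
    shape = "a monolith" if n == 1 else f"up to {n} services"
    print(f"{team:>2} engineers -> {shape}")
```

The point isn't the arithmetic; it's that the service count is bounded by the people you can sustainably keep on call, not by architectural fashion.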
[00:31:48] Alistair Croll: [00:31:48] So that segues perfectly into my next question, which is about microservices. So I have explained the idea of a really good atomic microservice as Amazon's [00:32:00] S3 storage system, for example, but this is true of Ceph or any other shared distributed storage model. And all it does is two things. If I give it a URL, it gives me an object, like the image that I stored there. And if I give it an object, it gives me a URL, which is how I get that image later. And everything else is details, right? I go, here's a movie. It goes, here's the URL for that movie. I go, here's the URL for the movie. It plays me the movie. Like, that's all it does. And Bezos talks about a two-pizza team, which is roughly the size of a couple of rotations, depending on how greedy your engineers are. But it seems like those things are connected, that there's this right size for an atomic service. But knowing what those atomic services are is critical. I know in the UK, the GDS is trying to build basic services like publish and subscribe, or notification platforms, or a forms system, you know, basic building blocks that a small team can make absolutely resilient, and then have a very clearly defined contract [00:33:00] between them and other services. And if I took one lesson away for, like, the road to long-term sustainable legacy management, it's: take your monoliths, break them up into these atomic services that can be maintained by a two-pizza team with a little on-call rotation, recognize that they're going to keep living forever, and define really good contracts between those services so that you're able to fix any one part of it independent of the other parts, forever. And that's kind of like Nirvana that will never happen, but it's an aspiration. So did I get that right, first of all?
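The two-operation contract Alistair describes can be sketched minimally. This is not the real S3 API, and the class name and URL scheme here are invented; it just illustrates an "atomic" service whose entire public contract is object-in/URL-out and URL-in/object-out.

```python
import hashlib

class TinyObjectStore:
    """Toy content-addressed store with an S3-like two-call contract."""

    def __init__(self, base_url: str = "https://store.example/objects"):
        self.base_url = base_url
        self._objects: dict[str, bytes] = {}  # stand-in for real storage

    def put(self, data: bytes) -> str:
        """Give it an object, it gives you a URL."""
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return f"{self.base_url}/{key}"

    def get(self, url: str) -> bytes:
        """Give it a URL, it gives you the object back."""
        key = url.rsplit("/", 1)[-1]
        return self._objects[key]

store = TinyObjectStore()
url = store.put(b"here's a movie")
assert store.get(url) == b"here's a movie"  # everything else is details
```

Because the contract is this small, the team behind it can change anything internal, storage layout, replication, whatever, without breaking callers; that is the clearly defined contract between services Alistair is describing.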
[00:33:36] Marianne Bellotti: [00:33:36] So I think that is the general consensus. Right. And I don't think that it's a bad strategy at all, but what I emphasize is trade-offs. Like, essentially we're talking about efficiency. If you have product-market fit, and you know what you're building, and you know what the various services you need are, and you can separate them out into these atomic services, that is a more [00:34:00] efficient way to spend your technical resources than other forms of architecture. But the reality of it is that sometimes it is to your benefit to run things inefficiently, because there are other factors in play. And so people will go, well, I have to build services because I need to scale. You can scale a monolith. Like, it is possible to scale monoliths. I can name companies doing billions of transactions per second that are fundamentally just a monolith at the end of the day.
[00:34:32] Alistair Croll: [00:34:32] Like Google. Yeah.
[00:34:34] Marianne Bellotti: [00:34:34] Google has a monolith at the level of the code base. Right. But it's pulled out into separate services on deploy. So that's kind of an interesting scenario right there. But the industry that I just came out of, identity as a service: Okta is a monolith, Auth0 is a monolith. These companies are scaling monoliths. The thing is, it's going to cost you more. Right. And so that's [00:35:00] a business trade-off: the amount of cost that it will take to scale up that monolith for your peak transaction times, is that worth some of the other factors of breaking it up into microservices? And so I think that's the key with technology: understanding that there's no golden rule. There are no silver bullets. There's no one architecture or one technology that is going to lead you in the right direction a hundred percent of the time. You have to really fairly evaluate the context of where you're putting this technology and how you're using it, and then be honest about what those trade-offs are and why. It may end up being the case that you think to yourself, wow, we have a set of COBOL that's reasonably well-maintained, that we understand, on a mainframe that, like, we just bought new from IBM. And so that's what suits us, that's what's going to get the job done for us, and that's where our trade-offs align. And I embrace those choices. I don't think there's [00:36:00] anything wrong with that, because at the end of the day, technology is a tool.
[00:36:03] Alistair Croll: [00:36:03] Nice of you to mention IBM since IBM put this on. So thank you very much. You obviously love Conway and Conway’s law. Can you tell me why Conway’s law is not a curse? And what Conway’s law is maybe?
[00:36:15] Marianne Bellotti: [00:36:15] I think it will. So I encourage people to read Melvin Conway's original paper that kicked off Conway's law, because first of all, it's not terribly difficult to read, it's not terribly long, and it's really interesting. People talk about Conway's law in a very weird way, as if it's some sort of curse, but what he's really talking about is people designing technology based on their communication pathways. Right? So you don't improve the architecture of your technology by moving the org chart boxes around on a Visio diagram. Conway's law doesn't kick in because you reorg and then everybody just rearchitects the [00:37:00] technology. It's all about who talks to who. Like, how do they relate to one another? How do they report up to different people, and how do they talk to one another? And that's a really critical insight. He also talks a little bit about incentives. He basically says, hey, if you've got this organizational unit that needs to justify its existence, the easiest way for them to justify their existence is to build the technology out as if it is separate, in a way that indicates their presence in the organization. So I think Conway's law is really a fascinating accomplishment to me, because he was essentially just making observations, and yet they are so relevant to technology across all different types of organizations. I really encourage people to go back to the original paper and actually read it; it's on his website, so it's the easiest thing in the world.
[00:37:57] Alistair Croll: [00:37:57] Well, we'll put the link in the video for sure. A [00:38:00] couple more quick questions. I know we're almost out of time, which is awful because I have so many more questions. Another thing that really came through to me as an underlying pattern in your book, in addition to, like, entropy and stuff, is creating agency: that you have to do so much to make the team feel like they have agency, giving them authority commensurate with the responsibility, even things like murder boards that aren't intended to tear them down, but to show them that they are resilient. How important are self-confidence and agency in the team, and what does a leader have to do to break a bad pattern there?
[00:38:34] Marianne Bellotti: [00:38:34] Yeah, I mean, dealing with a team that has been burned many, many times by bad leadership is one of the more difficult things to do. And it's just purely establishing trust at that point. And so I think, for me, trust and authenticity go hand in hand. I have a slight advantage in the sense that I have a very unique personality. I curse like a [00:39:00] sailor, and I just show up and I'm very informal and offhand. And people find that jarring sometimes, but when they get accustomed to it, it creates a greater form of trust, because, like, I'm not going to BS them.
[00:39:13] Right. And I'm going to work hard, and my intentions are going to be honest and upfront. I think that works really well. The murder board stuff, actually, I was surprised at how effective that ended up being. So for those who are not familiar and haven't yet read the book, a murder board is something we do really more on the policy side, especially if we have people who are going through Senate confirmations. We basically pull a group of people together and have them think of the worst questions that could be asked, and the most aggressive way they could be asked. And then we sit down the person who has to go through whatever this experience is, and we just hit them with it over and over and over again. And so it's called a murder board because it's like your colleagues trying to murder you. [00:40:00]
[00:40:00] Alistair Croll: [00:40:00] That sounds like something that might take off in the Valley as a form of, like, radical candor meets extreme therapy intervention.
[00:40:08] Marianne Bellotti: [00:40:08] I think it could, I think it could. I was just exposed to it in the policy world, and I was dealing with a team that had literally just been... they were terrified. They were terrified of doing things that they knew how to do, because every time they failed, they were so critical to the infrastructure of the product that it was like they were failing in front of an entire 500-person organization. And I was like, all right, let's bring in the best software engineers we have, the principal engineers, the people who are the organization's cream of the crop, and let's run this murder board. Because as long as we do it with the communication of, we are doing this to make you stronger, and because we believe in you, not to pull you down or nitpick you, then what it becomes is an [00:41:00] experience where you were essentially being vetted by the top people in the organization. And that gives you a little bit of, I'm going to say, plausible deniability, which I'm not sure is the correct term, but what it means is that if it does fail, you can sort of go back and point to that and say, well, yeah, the greatest engineers we've hired looked at this plan and threw their worst at it and couldn't find a problem with it. Right? And that gives people confidence.
[00:41:28] Alistair Croll: [00:41:28] You mentioned shifting people's mentality from "find the right answer" to "come up with as many possible answers," and then that changes the incentives. And again, it comes back to human psychology. I have, like, three or four more things I'd like to quickly cover, if you don't mind sticking around for a couple more minutes. First of all, code yellow. And for anybody watching this, just go buy the book, because there's so much truth in here, and there's so much prescriptive stuff, which is what I love. It's not the sort of chicken soup for the government soul; it's like, here's a way to do [00:42:00] this. Like a murder board. I thought the code yellow idea was great, and you're right, it's something that I've heard from ex-Google employees, but very few other people talk about what that is. You know, they know chaos monkeys and that kind of stuff, but code yellow is not something that's been widely explained, maybe because it's so easily abused if you don't understand what it's intended for. Yeah.
[00:42:20] Marianne Bellotti: [00:42:20] And I think it's kind of deep, geeky, software-engineer stuff. And I think a lot of the books that are written about Google's process come from more of the product management and business side of things than the deep, geeky, software-engineer stuff. But I found it to be immediately effective the first time I was put through a code yellow process, and I advocate for it wherever I go. It is essentially giving people escalated privileges during a crisis, for a very specific goal. Right? And the way the crisis is defined, it doesn't have to be an [00:43:00] outage necessarily. I think Google used it for the first time because their latency numbers in Southeast Asia were just really, really bad, and they were losing a huge amount of business. So it can also be things that are critical to the business. But it's basically taking a tiny team and saying, you are now empowered, you're getting the VIP treatment; whatever you need, whatever you want, people have to stop what they're doing and immediately give it to you. And just letting them work on a very specific goal is highly effective. So this is the thing that I definitely try to bring to engineering teams as I start to work with them: what are the norms around that?
[00:43:41] Alistair Croll: [00:43:41] So how do you... I mean, it's great to create the double-oh departments, you know, MI6, to solve a particular problem, but you don't give someone a lifetime license to kill or let them fly around in, you know, luxury airplanes. There is a sort of lifespan to this. How do you make sure someone doesn't just do that because their project's late?
[00:43:59] Marianne Bellotti: [00:43:59] Well, so [00:44:00] I think the first thing is to have a good culture around incident response, you know, so the on-call rotations that we already talked about, but then, what happens when something does go down? What's the process and procedure for that? Because you can sort of use the incident response process as a model for a code yellow team. Right? We don't run incident response all day, every day, just because something went down once. We run it until we have gotten the system back up and everyone has signed off on the fact that the actual incident is over, even if the problem isn't a hundred percent fixed and needs more work.
[00:44:35] But everybody doesn't need to be on high alert anymore. And if you've got that down (and there are lots of great reference materials on how to run incidents, some put out by Google, some put out by companies that do monitoring, like New Relic or Honeycomb, things like that), and you feel comfortable with it, then code yellow is much easier to implement responsibly than if you're still struggling [00:45:00] with that sort of process.
[00:45:01] Alistair Croll: [00:45:01] Awesome. So, obviously FWD50 speaks to the whole world; we've had, I think, nearly 30 countries attend last year, but we do see a lot of focus within Canada, and we have our own very big legacy systems. The Global Case Management System, for example, GCMS, is a huge tool that many departments use, but it has been fraught with issues and outages and stuff like that, along with many others. I love your idea about risk calculus: that the lowest risk always seems to be not to do something, but that legacy calculus tends to treat the future as an externality, right? This idea that that's actually not your biggest risk; you're just deluded because of the information that's afforded to you. And everyone's heard the old IT Crowd "Have you tried turning it off and on again?" joke, but you have a different version of turning it off and on to kind of find out where your risks are. Can you get into that? Yeah.
[00:45:56] Marianne Bellotti: [00:45:56] So, I mean, with legacy systems that haven't really [00:46:00] been maintained well, there are always these services and nooks and crannies that no one can really figure out what they do. And there's only so far you can get with research and interviewing people and, nose to the grindstone, trying to figure out what something does without touching it or interfering with it. And, you know, Mikey Dickerson from USDS had a very pragmatic and unforgiving attitude towards this. He's like, well, if you turn it off, you'll figure out what it does real quick, right? And if nothing bad happens and no one comes complaining, then maybe we didn't need it in the first place. And when you talk about these really large systems, the problem with maintaining them is their inherent complexity. So any time you can minimize some of that complexity, you probably end up in a better state than you had before. So you get the benefits both ways: if you didn't really need it, now you've minimized your complexity; if you did really need it, now you know what that part of the system does. [00:47:00]
[00:47:00] Alistair Croll: [00:47:00] And you make the point that there's a temptation in modernization to add new features rather than achieving feature parity, and this myth that because you have a working system, you have a blueprint for a new one, and then you can improve it. But you are a strong advocate of building it to just match the old one first, before you start adding your features, maybe even building less than the old one, and finding what the real MVP was.
[00:47:24] Marianne Bellotti: [00:47:24] Yeah. Well, old systems are not specifications for new systems, and they're frequently treated that way. But I think that the private sector, at least, technology as a whole, has become very comfortable with the idea that you will never build a perfect system that will never fail. So from that perspective, when you turn something off, that seems reckless, but the reality of it is that you should have the process and infrastructure in place to handle a colossal failure of your system. So a system simply being turned off shouldn't be anything out of the norm; you should be prepared [00:48:00] to handle an outage of that system. And if you are prepared to handle an outage of that system, you can turn it off whenever you want. Right?
[00:48:04] Alistair Croll: [00:48:04] Well, you make the point, and I've read the numbers, that there are, like, two full-time employees just swapping out hard drives in a typical Google data center, because with the mean time between failures on hard drives, you're just going to be constantly replacing little hard drives.
[00:48:18] Marianne Bellotti: [00:48:18] So they get very good at replacing hard drives. Nobody notices when a hard drive fails.
[00:48:22] Alistair Croll: [00:48:22] In fact, several hard drives have failed during this call. Right? And that's an amazing thing to wrap your head around: literally during this session, the internet hasn't gone down, and several hard drives have failed that were probably involved in delivering these services. Okay, last question. This is fascinating, and I'm thrilled that we've had this much time. Thank you so much. You talk about using crises to produce change. Often those crises are performance crises, and they're sort of, you know, a matter of opinion, but in some cases they're security crises that cannot be overlooked. What is SolarWinds doing to [00:49:00] legacy modernization and government systems right now?
[00:49:03] Marianne Bellotti: [00:49:03] SolarWinds is a really weird case. First of all, in order to leverage a crisis to the benefit of legacy modernization, we have to be honest about the fact that there is a crisis, and about the impacts that it has. I'm no longer in government, so I don't have a great view into how SolarWinds is changing the way government IT is configured, but I have gotten several emails over the last couple of mornings, press releases of the government declaring success. And I am a little skeptical of that, because I've seen miracles before, but from the initiation of this crisis, to the scope it must have had in government, to where we are now, that timeline is very short in terms of real change in government. So I'm a tiny bit skeptical, and that gives me a little bit of concern that we're wasting a crisis by essentially just getting so anxious about [00:50:00] the optics of it. The other thing that's interesting about SolarWinds is that there were agencies that did not have good upgrade and patching strategies that were saved from SolarWinds, because they missed the vulnerable version. So some of our friends that are still in government had a conversation about, how do you make sure that the lesson they learn isn't "we need to stop upgrading our systems," or "we need to put this completely ridiculous audit process on top of the upgrade of our systems"? Because this is like the black swan event, where keeping your system well-patched made you more vulnerable rather than less. Right? So I think that there is a huge advantage to be had in SolarWinds, to learn from it and to improve the process of security in government.
I am optimistic and hopeful that it will be used that way, but there are several places where there is potential for it to [00:51:00] go wrong: because the government might shorten the timeline in an attempt to improve the optics, and because people might actually take away the idea that they shouldn't upgrade their systems.
[00:51:12] Alistair Croll: [00:51:12] Very fascinating. So, Marianne, I really, really enjoyed this book, and I read a lot of books, and I'm not just saying that. I did not expect it to be such a human book. You have technology ideas, functions and services and monoliths and code yellow and all these kinds of things, but you also have a tremendous amount of stuff about humans: about managing expectations, about entropy and sort of the philosophy of systems, about risk and externalities and bad calculus, and just agency, and recognizing that this is continuous work. That it's much closer to farming, where you've got to go tend to the crops and recognize the cycles, than it is to manufacturing, where you sort of ship it off the production line and hope the customer never calls. And if anything can come of this, [00:52:00] it's that, culturally, I think leaders need to recognize that engineering teams are farmers tending the crops, and not factories pumping out things they hope never to see again. So thank you so much, first of all, for being here; thanks to IBM for making this possible; and thank you for writing this book, because it is a truly wonderful book. And I know our friend Sean Boots, whom I actually asked to join us this morning, but selfishly I had other questions, and he's up in the Yukon, so it's way too early for him. But so many people like Sean, who have been working at the coal face for a decade now, folks at 18F and the Presidential Innovation Fellows and the CDS and the GDS, they're all just saying this book captures things they've been trying to say, perhaps less eloquently, or perhaps in ways that were ignored because they were not politically appropriate to say. And you've managed to capture a tremendous amount of that stuff in what is a fairly short and entertaining read.
So thank you so much for putting this together, and it's been great getting to know you a bit and poking at, or [00:53:00] pulling on, some of the threads that you've raised in the book.
[00:53:02] Marianne Bellotti: [00:53:02] Well, I really enjoyed talking today. Thank you so much for inviting me.
[00:53:07] Alistair Croll: [00:53:07] Great to see you.