154. Releasing Software in Big Bang Fashion -- What to avoid...

When it comes to making changes and updates to software, there are two options: either you make incremental changes or you roll out a large update in a big bang fashion. We’re not talking about life-or-death software like pacemakers, or high-impact software for rocket launches; of course, those need to be treated with extra care and come with a host of regulations! We’re talking about your everyday applications like Slack, for example. Companies like Facebook have made the mistake of suddenly making big changes to their platforms, creating a sink-or-swim experience for users, who, as you well know, raged at it. It’s in our human nature to resist change, so how does a software engineer go about implementing much-needed updates without offending the heck out of users? In this episode, we unpack the advantages and disadvantages of doing big bang rollouts versus smaller, systematic changes that people hardly notice, pointing out the considerations for each approach and giving you pointers for getting it right.

 

Key Points From This Episode:

 

  • What we mean when we talk about a big bang release and an example of this approach.
  • The high cost of testing and updating when it comes to medical devices such as pacemakers.
  • The challenge of making big updates to products that are widely used and people are comfortable with.
  • Hear how Slack reverted to an earlier version after getting negative feedback about a change.
  • Dave’s experience developing pharmacy software and the strict regulations for updates.
  • The appropriate approach to changing and updating high-risk, high-impact software.
  • Mitigating the fear of releasing for places that don’t have CI/CD tooling.
  • An overview of the various risks and disadvantages involved in doing a big bang release.
  • The implication of big changes for companies like Amazon and how it affects customer relationships. 
  • Using the strangler pattern and other advice for introducing new features to a product.
  • The benefits (including pizza and beer!) of doing a bug bash to try and test every new feature.

Transcript for Episode 154. Releasing Software in Big Bang Fashion -- what to avoid...

 

[INTRODUCTION]

 

[0:00:01.9] MN: Hello and welcome to The Rabbit Hole, the definitive developer’s podcast. Live from the boogie down Bronx. I’m your host, Michael Nunez. Our co-host today.

 

[0:00:09.3] DA: Dave Anderson.

 

[0:00:11.0] MN: Today, we’re talking about releasing a product in big bang fashion.

 

[0:00:15.7] DA: Yeah, big bang, it’s only good when you’re creating the universe or riding a rocket.

 

[0:00:24.8] MN: Exactly. We’ll talk about big bang releases, we’ll have stories, we’ll talk about what it actually is, why it happens and what are some alternatives to the big bang release.

 

[0:00:36.3] DA: That sounds good.

 

[0:00:38.1] MN: Let’s start, the big bang release, what is it?

 

[0:00:42.0] DA: I mean, in the beginning, there was nothing, right?

 

[0:00:46.2] MN: Bible class? Bible studies now? In the beginning was nothing and then –

 

[0:00:51.6] DA: Well, like, no. I guess a big bang release for the Bible would be like, you know, God was working really hard for seven days and on the seventh day, he pushed it all into production and –

 

[0:01:04.1] MN: Came push-app.

 

[0:01:07.1] DA: Yeah, there’s like all these bugs and like what?

 

[0:01:10.1] MN: Humans and all sorts of stuff.

 

[0:01:11.7] DA: Do you want that? YAGNI.

 

[0:01:15.6] MN: Right. The idea of a big bang release is pretty much if you were building a set of features that you were holding off on and then you delivered this set of features straight into production and everything was brand new.

 

[0:01:33.8] DA: Right, like you had no website and all of a sudden you have the entire website. I never thought about it before but actually, the creation myth from the Bible is actually an incremental delivery if you think about it because every day, he’s like he’s checking how things are going, he’s like okay, I made this and it’s good, I made the other thing and that’s good. People are liking this, I need a little bit more of that.

 

[0:02:00.1] MN: Git commit, today was a good day, then he – shut down the laptop.

 

[0:02:05.2] DA: It’s like CI/CD. You’re building features over time as opposed to just making everything all at once.

 

[0:02:14.2] MN: Right. I was trying to think of when one would actually do a big bang release. The only solid example I could think of is probably things that are doctor related or medical related. The thing that comes to mind is pacemaker software.

 

[0:02:35.2] DA: Sure.

 

[0:02:36.5] MN: The thing that is checking and ensuring the heart is continuing to beat, I don’t think that’s something that you can just like we’re going to release a thing that’s in your heart and let it rock for a couple of weeks and see how the users feel about this pacemaker in their heart.

 

[0:02:53.8] DA: Yeah, I guess there’s a high cost of testing in that scenario. If you get it wrong then the cost is very high. I guess air traffic control software too – there’s one production and it’s got some really real things happening. There might be a way to model that where maybe you can have a simulation, but there’s only one reality. It’s not like a website like Amazon.

 

Like even though Amazon is a very big website and when things go wrong there, there are impacts, there’s a lower barrier to rolling back the problem. When there’s a problem with the pacemaker, there’s a really high barrier to roll back because –

 

[0:03:45.6] MN: Right, you can’t be just like hey, I need you to lay down now. Pacemaker off your chest.

 

[0:03:51.9] DA: Let me take that out or you’re dead now, sorry.

 

[0:03:57.3] MN: My bad. Yeah, I don’t even know the current state of pacemaker solutions or software, like whether it can be updated over the air. I’m sure there’s science on all of that happening right now.

 

I sleep with a CPAP and I’m sure that information goes up to a website and my doctor can see that stuff too, but when it’s in your heart, I imagine you do as much rigorous testing as possible and then that is 1.0, and you’re not expected to do a 1.1 or 1.2 for a very long time.

 

[0:04:34.8] DA: There are implications of over-the-air updates on that kind of hardware too, where maybe you wouldn’t want someone to be able to push new code to run arbitrary code on your pacemaker. Someone’s mining Bitcoin on your pacemaker and it’s like, damn.

 

[0:04:53.0] MN: My chest is hurting, what’s going on? I got .00003 bitcoin a day from Bobby’s pacemaker.

 

[0:05:02.6] DA: The other example that we have, besides doctor-related or life-risking things, is like NASA for a rocket launch. Once you get that satellite up into orbit, maybe you can push an over-the-air update to the software, but you’ve got to make sure that you have certain things right before.

 

[0:05:24.0] MN: Right, you have to make sure you can get it up there successfully and like there’s no – I imagine the way to test that, you do rigorous testing simulation as you mentioned Dave and whatnot but it is the rocket at the end of the day that needs to go up. If it fails, yeah, you got to wait a long time to redo that process.

 

[0:05:43.6] DA: Yeah. Has this ever happened to you before?

 

[0:05:48.0] MN: Yeah, I think I worked on a project with a big bang release. If people are using a product that they really like, the part of the product that they’re facing often is probably their accounts page. So, we had a release that dealt with updating the accounts page. Even though it felt like a very rigid way of going through your account, people were used to it. Updating the accounts page took like a year and a half, and when we released it, there were a lot of complaints along the lines of “I can’t find anything.”

 

It was easy to find things but people weren’t used to finding those things.

 

[0:06:30.2] DA: It’s like old Facebook user groups where it’s like one million strong and bring back the old Facebook.

 

[0:06:38.0] MN: Yeah, exactly.

 

[0:06:39.3] DA: Fundamentally changed some kind of large aspect of the user experience.

 

[0:06:45.5] MN: Yeah, exactly.

 

[0:06:47.1] DA: Do they still have those now? I feel like Facebook’s gotten a little bit better about incremental delivery, where you wouldn’t even notice that anything is changing for the most part.

 

[0:06:58.1] MN: I think I definitely uninstalled Facebook a long time ago although I still have Instagram though. That’s my social media choice. There’s sometimes where you can see like the incremental change and you’re like hey, something’s wrong here, what’s going on? And then you kind of realize what happened.

 

But some people notice it and complain and some people change it. For example, a recent one was Slack. Slack has been out for some time and they’ve been making incremental changes, and if you do /feedback and you really give it to them, Slack will listen, which is pretty funny.

 

[0:07:32.7] DA: I’m curious about the story there.

 

[0:07:35.9] MN: The incremental change story: I think they changed the searching feature and they put search of everything into command K, which is usually where you search for a channel name, and I was like, no, I don’t want to search words when I do command K, I just want to search for channels and people.

 

Why would I want to search for words? If I want to search for words, I’d do command F like everywhere else. They said, “It’s okay, we’re still testing it out, we’ll revert it back if there are enough complaints,” and I was like, “All right, cool, you better,” and then I guess they didn’t do it.

 

[0:08:04.2] DA: One million strong.

 

[0:08:06.6] MN: One million strong. You have any stories or thoughts?

 

[0:08:11.7] DA: Yeah, I used to do software development in pharma. There was a lot of regulation that we had to follow about how we were documenting and going about making changes to the software. You know, there were a lot of steps that we had to go through in order to make even a simple change to the software.

 

It kind of was a forcing function for us to make larger changes, or if there was a big release then it was a big deal to have a rollback plan, and if you had to roll it back then that was really awful because then you had to go through the whole process all over again.

 

[0:08:58.1] MN: Right, I imagine the senior engineers, the senior execs, knowing that they have to go through this process, try to cram everything in so that they just do that one time and then release that. That kind of leads into the next part, which is why it happens. Why would you deliver software like this? Regulation or process is definitely one of those things where you are unable to change it and be agile about that.

 

[0:09:29.7] DA: Yeah, like the more friction you have in the process to make a change, the more you want to ball up all those changes into – well, it’s not – if you need to change like 20 things, it’s not 20 changes, it’s just one big change of all those 20 things kind of rolled up all in one.

 

[0:09:47.3] MN: Right. If any of those 20 things break when you do that release then you may have to sift through all those changes like one by one to make sure you know which one’s which and fix that and that’s a whole other process in general.

 

The next one is, like you mentioned before with riding the rocket to space, you can only do that once every so often. I’m sure sending rockets up to the sky costs a lot of money.

 

[0:10:14.2] DA: Yeah, if they blow up, that costs even more money.

 

[0:10:18.9] MN: Exactly.

 

[0:10:20.8] DA: Yeah, when there’s a lot of risk, then you know, you’re going to try to get it right that first time as opposed to having a more kind of devil-may-care attitude of pushing and reverting.

 

I can see it also happening with a fixed delivery date where there’s maybe a contract. It’s like, okay, we’re obligated to deliver all of these features to this customer by this date. Or maybe it’s more arbitrary. Someone with a lot of power in the organization just says, okay, this is the day that it’s got to be done, and then you just got to do a death march and get across the line.

 

[0:11:03.4] MN: Right, then, since it’s that end date, you try to fit as many features as you can into this entire application that will be released on this date because someone mentioned that it has to be released by then.

 

I think releasing can be scary in places that don’t have any CI/CD tooling, which is why they maybe have like a developer etch and do all this manual testing to make sure that everything is fine before deploying it to production.

 

Just to mitigate some of that fear, definitely look into getting some continuous delivery pipeline, continuous integration and whatnot.

 

[0:11:49.6] DA: That kind of turning it around from like okay, this is the one day of the week that we do a deployment and whoever is on deployment duty, like Bobby just sits down and he like buckles up his seat and he’s like white knuckled the whole time as he’s like copying files from folder to folder.

 

But I guess like sometimes, you may not have like an easy way to do CI/CD as well. If you’re doing like an app development on phone, like the feedback cycle’s a bit longer like you have to get approval for a release from the Apple store or –

 

[0:12:28.4] MN: Yeah, the Apple store is a big one too. I mean, there is like tooling out there that you can use and like you do have the option of doing smaller releases but you do have that friction that kind of makes it like more important for you to get it more correct.

 

[0:12:48.0] DA: Right. On the first go.

 

[0:12:51.0] MN: When doing a big bang release, there’s definitely a lot of risk that one may run into because of releasing a product in big bang fashion. You want to start us off on one, Dave?

 

[0:13:06.4] DA: Yeah, I guess a big one is that you could have bugs in your application that you didn’t know about and then you just have to deal with it all at once like it’s like hard to figure out which part of the application is broken or why it’s broken because you just released all of the things all at once.

 

[0:13:32.4] MN: Right. One that I’ve seen is definitely – I think a combination with a fixed release date that is far in advance will result in some form of YAGNI. If you’re like, “We have two months, we should be able to do A, B, C, D,” when you may only need A and B to release, then you could do it earlier. Always be on the lookout for YAGNI if you don’t know 100% what the users are trying to do or would want from this feature. Just be mindful that it can definitely bite you. YAGNI, always remember the YAGNI.

 

[0:14:14.5] DA: Right, and I guess even though you are trying to do big bang delivery, you can still do incremental development of user-facing value, like, “Hey, I am going to develop this part of this page and then demo it,” and be like, “Okay, what do you think?” You’re really not flipping a switch.

 

[0:14:35.4] MN: Like Dave mentioned before, you are going to run into bugs. It is going to be hard to identify the source of the problem, and with that it is going to be hard to roll back because you have to know exactly at what point in time it broke. Let’s say you delivered seven features and one of them is broken: you need to identify at what point in time feature X broke in order to roll that back and leave the other six. Or do you roll back all seven? That is going to be super hairy, is my thinking.

 

[0:15:05.7] DA: What if the data is screwed up forever now? Like, what do I even do?

 

[0:15:10.1] MN: Yeah, that is going to be pretty difficult to do, and I think having small commits and small releases will definitely help with rolling certain things back because then you know exactly, “Okay, it was this particular feature that did it. Let’s roll that bad boy back and figure out how to fix it and then re-release it.”

 

[0:15:31.6] DA: Right, it is a way smaller surface area for figuring it out. Although even when you do CI/CD, sometimes the changes get batched together, it is a rolling ball of changes that are going out, but even then it is a smaller amount of things that could be the problem than if you waited a month or a year to do a release.
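
A minimal sketch of why a smaller batch of changes is easier to debug, in Python. It binary-searches an ordered batch for the first change that breaks things, similar in spirit to `git bisect`. The names `first_bad_change` and `is_healthy` are hypothetical, and the sketch assumes the full batch is known to be broken and that once a breaking change is applied, every later state stays broken.

```python
from typing import Callable, Sequence, Tuple


def first_bad_change(
    changes: Sequence[str],
    is_healthy: Callable[[Sequence[str]], bool],
) -> Tuple[str, int]:
    """Return the first breaking change and how many health checks it took.

    Assumes `changes` is non-empty, ordered, and broken as a whole.
    """
    lo, hi = 0, len(changes) - 1  # the first bad change is somewhere in [lo, hi]
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        # Test the system with only the changes up to and including `mid`.
        if is_healthy(changes[: mid + 1]):
            lo = mid + 1  # still healthy, so the culprit comes later
        else:
            hi = mid  # already broken, so the culprit is at or before `mid`
    return changes[lo], checks


# Hypothetical batch of 20 changes where "c13" is the one that breaks things.
batch = [f"c{i}" for i in range(1, 21)]
culprit, checks = first_bad_change(batch, lambda applied: "c13" not in applied)
print(culprit, checks)
```

Each check here stands in for a deploy-and-verify cycle. A release with one change needs no search at all, while a year of batched changes needs several cycles just to find the culprit.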

 

[0:15:56.9] MN: Right, and it depends on your website and the product that you have. I think you mentioned before, Dave, if you are running Amazon.com or the equivalent of that and you do this big bang change, it is going to change something huge on the application. Many users are going to have feedback for you, and making those changes all in one go is definitely going to change the user experience of your application and can cause a lot of friction between you and your customers.

 

[0:16:29.8] DA: Yeah, and look, if you are completely cutting over the accounts page all in one go for all of the users, then that is all of the people being angry or frustrated or experiencing the same bugs. There are definitely tools out there that you can use to do a feature flag rollout for just certain sets of users, incrementally exposing people to that change.
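
A minimal sketch of the percentage-based feature flag rollout described here, in Python. The names `in_rollout`, `render_accounts_page`, and the flag `new-accounts-page` are hypothetical, and hashing the user ID into 100 buckets is one common approach assumed for illustration, not how any particular feature flag tool works.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage."""
    # Hash the flag name together with the user ID so each user gets a
    # stable bucket per feature (the same input always hashes the same way).
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket is in the range 0..99
    return bucket < percent


def render_accounts_page(user_id: str, percent: int) -> str:
    """Serve the new accounts page only to users inside the rollout."""
    if in_rollout(user_id, "new-accounts-page", percent):
        return "new accounts page"
    return "old accounts page"


# Expose the new page to roughly 10% of users; everyone else sees the old one.
for user in ["alice", "bob", "carol"]:
    print(user, "->", render_accounts_page(user, percent=10))
```

Because the bucket depends only on the flag name and the user ID, raising `percent` from 10 to 50 keeps the earlier users on the new page and adds more, so nobody flips back and forth between versions. Setting it back to 0 acts as the rollback.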

 

[0:17:02.8] MN: Or even running it in parallel and doing A/B testing, because if you get that information faster then you can use that information for the thing that you’re building. If it is positive feedback that you get from users when you A/B test, then you continue on that path, and if you have negative feedback you know it as early as possible rather than waiting X amount of time before you big bang that release and customers may not be excited for it.
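
A minimal sketch of the A/B testing idea, in Python: users are split into two stable groups and the share of positive feedback is compared per group. The names `assign_variant` and `positive_rate`, the experiment name, and the feedback data are all made up for illustration, and a real experiment would also need enough users to trust the difference.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically put a user in variant "A" (old) or "B" (new)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def positive_rate(feedback: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the share of positive feedback for each variant seen."""
    totals: Counter = Counter()
    positives: Counter = Counter()
    for variant, liked in feedback:
        totals[variant] += 1
        if liked:
            positives[variant] += 1
    return {variant: positives[variant] / totals[variant] for variant in totals}


# Hypothetical feedback: (variant the user saw, whether they liked it).
feedback = [("A", True), ("A", False), ("B", True), ("B", True)]
print(assign_variant("alice", "accounts-page-redesign"))
print(positive_rate(feedback))
```

If variant B keeps getting better feedback, you continue on that path; if not, you find out while only half of a small group has seen the change.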

 

[0:17:35.9] DA: And I guess if you are developing a pacemaker or something like that then you obviously have some medical regulations you need to follow to do a smaller rollout first, or if you are doing a rocket release, you might fire a smaller rocket that doesn’t cost you millions of dollars before you fire the big rocket.

 

[0:17:57.3] MN: Yeah, definitely fire the small rocket first before you fire the big ones.

 

[0:18:03.6] DA: Like what are some of the mitigations that we can do for big bang releases?

 

[0:18:09.1] MN: I think the one that I had in mind is probably like the strangler pattern, the idea that you slowly make the changes necessary as you are building the application. So for example, if we are using the accounts page, I would think about what parts of the page should I cut up so that the new implementation would have smaller sets of features and then you slowly roll out the rest of the application so that sooner or later 100% of your application is using the new account page.

 

But you can start out with the billing and shipping page, where people can edit and add new addresses for example, and then go to the credit card page, but if people were to go to the account home page they would probably see the old one until that slowly gets rolled out. I think the one – this is probably going to be a shout out, but the one that I am currently seeing right now that is doing an account page incrementally is Libsyn.

 

If you use Libsyn, you’d probably see that the home page will have information and then you click on “Oh, more details” and then it looks brand new. So shout out to Libsyn, who are doing this change, which is pretty dope. That is definitely one example of this strangler pattern of taking certain things and making those changes and slowly rolling the new features into the application. Do not put strangler pattern and pacemaker in the same sentence because that sounds horrible.
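
A minimal sketch of the strangler pattern for the accounts page example, in Python: a small router sends already-migrated sections to the new implementation and everything else to the old one. The paths and the names `route`, `legacy_account_app`, and `new_account_app` are hypothetical and are not taken from any real product.

```python
from typing import Sequence

# Sections of the accounts area that have been rebuilt so far (assumed paths).
MIGRATED_PREFIXES = ["/account/addresses", "/account/payment-methods"]


def legacy_account_app(path: str) -> str:
    """Stand-in for the old accounts implementation."""
    return f"legacy:{path}"


def new_account_app(path: str) -> str:
    """Stand-in for the new accounts implementation."""
    return f"new:{path}"


def route(path: str, migrated: Sequence[str] = MIGRATED_PREFIXES) -> str:
    """Send migrated sections to the new app and everything else to the old."""
    for prefix in migrated:
        # Match the section itself or anything nested under it.
        if path == prefix or path.startswith(prefix + "/"):
            return new_account_app(path)
    return legacy_account_app(path)


print(route("/account"))               # the home page is still on the old app
print(route("/account/addresses/42"))  # address editing is on the new app
```

Migrating another section means adding one more prefix to the list. Once every section is listed, the old implementation receives no traffic and can be removed.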

 

[0:19:44.1] DA: You go I get choked, no.

 

[0:19:45.6] MN: Yeah, you don’t want to strangle anybody with the pacemaker. I am not saying you could do that, but I am sure there are many other ways to mitigate some of these issues for pacemaker software.

 

[0:19:55.5] DA: Right, I mean, another strategy we were thinking about is running in parallel. Maybe you could have one pacemaker you are testing out and another one that is tried and true. You could have the good steady pacemaker, low risk, and then, “Oh okay, we will have another one.” I don’t know. Don’t listen to me, I am not a pacemaker engineer, but I am just trying to think this through.

 

[0:20:21.0] MN: Yeah, this pacemaker is a rocket, I don’t know exactly. That’s only half –

 

[0:20:24.5] DA: Well you have a smaller rocket on top of the big rocket and then wait, yeah sure okay.

 

[0:20:31.1] MN: You shoot just gone. Shoot up.

 

[0:20:32.3] DA: Great, yeah I am going to shoot a rocket now I got it.

 

[0:20:35.6] MN: BRB. I think one way, and I guess this doesn’t apply to pacemakers and rockets like all the mitigation strategies that we have – I do think the idea of doing demos, showing a demo of the new thing that you are adding to your application and having the users experience the demos and look at some of the changes, is definitely a big one because, as I think you mentioned before, you will get the positive or constructive feedback right then and there rather than when it is released and all of your users see it.

 

[0:21:10.4] DA: Right, and if you can’t have a real-world situation, at least you will have a simulation or something approximating it, but at least keep in mind that you are trying to fulfill the need for a user and so therefore you have to show that it is working or doing a thing that is useful.

 

[0:21:33.4] MN: I worked at a place before that did a bug bash, and for those who don’t know what a bug bash is, it is when you have you and a team of people going through the feature and just trying to break the feature however you see fit. That way you can find out all the edge cases that you may need to catch and whether it would be a good user experience for individuals. You spend 30 minutes, even up to an hour depending on the feature, just crushing it.

 

And just making sure that that stuff is bug free, thumbs up, that is always good. So a bug bash party is pretty cool. Some people get like pizza, pizza is always good, pizza and beer. Do it at 4:00 on a Friday, bug bash away and make sure that everything works fine.

 

[0:22:24.0] DA: Yeah, I am trying to put together a bug bash right now.

 

[0:22:28.6] MN: Oh nice. Yeah pizza and beer I guess is where it’s at. I mean doing it remotely I imagine you may have to just order your own pizza and beer but.

 

[0:22:38.5] DA: Right, yeah just let me know where you live and then I am going to send some pizza and beer over, you know?

 

[0:22:46.3] MN: Yeah, contactless delivery.

 

[0:22:48.0] DA: Like a slice for sure, not a whole pie. You don’t eat a whole pie. I am just going to get you a whole slice.

 

[0:22:53.2] MN: Oh man, shout out to the pizza shops that will send two slices to your house and some beer, but do a bug bash. Yeah, the real heroes, shout out to the – yeah, you know, bug bashes are dope. I enjoy them, just allow me and you just got to feed me, that is pretty much what it comes down to.

 

[0:23:10.0] DA: I feel like we have a whole episode on bug bashes but yeah, trying to put as much pressure on the software before it is actually out in the wild will make sure that you don’t need to do that costly rollback or have that big risky failure at the worst possible time.

 

[0:23:33.9] MN: At the worst possible time.

 

[0:23:36.1] DA: And I am riding that rocket. I do not want it to play out.

 

[0:23:39.8] MN: Yeah no, definitely. You need to make sure that that rocket is going to ride all the way to space.

 

[0:23:45.4] DA: Sulking on straight to Mars, meeting Elon Musk.

 

[0:23:50.4] MN: Oh yeah.

 

[0:23:51.5] DA: The one thing you definitely want to try to avoid is having all the pieces coming together at the last possible moment when you are also trying to get out the door to the customer. Sometimes you can’t avoid a big bang release, like, you know, the customer is only going to get the finished product. It is going to be packaged and shiny and beautiful, or it’s going to be a rocket and it is going to go to Mars.

 

But you know, you don’t want to test that all the pieces are working together properly as you are firing the rocket or whatever.

 

[0:24:33.1] MN: Yeah as the rocket gets ignited that is not the first time that it is communicating to other parts of the rocket.

 

[0:24:38.1] DA: Right, so especially with software, when you are developing it through slices of user-facing value you can avoid that big bang integration, because it always has to do something. It has to work together, so that can help you avoid having these larger, harder-to-understand bugs.

 

[0:25:00.3] MN: Yeah I mean if you are going to do – if you are planning to release as a big bang release just make sure before you do that that everything is intercommunicating with each other so that it does happen. There is some testing beforehand when it is doing that rather than when it is released all at once. That definitely is not a rocket I want to be on that’s for sure. That is not a pacemaker I want to install in my heart nor a rocket I want to ride. That is the –

 

[0:25:33.1] DA: If you have a pacemaker you probably shouldn’t be on a rocket, I feel.

 

[0:25:37.2] MN: And if you are a rocket engineer or a pacemaker engineer please tweet at us because I am really curious.

 

[0:25:44.5] DA: Tell us what we got wrong.

 

[0:25:45.5] MN: Well yeah, I really need to know. Tell your friends if you’re not – if you happen to know one, because you know rockets are cool but how do you develop them? How does it all work? Hopefully there is some testing that happens, there are some incremental changes that happen before that rocket gets sent to space.

 

[0:26:03.6] DA: Just hopefully.

 

[0:26:07.0] MN: I hope.

 

[0:26:08.3] DA: I hope, okay well maybe it will happen.

 

[0:26:11.7] MN: Yeah maybe it will happen, I am hoping. Prove me wrong rocket engineers please, I really would like to know.

 

[0:26:16.8] DA: Just like big rocket cowboys just like shooting rockets up left and right.

 

[0:26:21.9] MN: Just bashing on the keyboards and shipping rockets, literally that is what’s happening. But yeah, I think making incremental changes to a product is something I personally would do. If you can avoid the big bang release, do so and make incremental changes. I think it will be good for your customer, for your product and for your own sanity if something does go wrong.

 

[0:26:44.6] DA: Totally.

 

[END OF INTERVIEW]

 

[0:26:45.1] MN: Follow us now on Twitter @radiofreerabbit so we can keep the conversation going. Like what you hear? Give us a five star review and help developers like you find their way into The Rabbit Hole and never miss an episode, subscribe now however you listen to your favorite podcast. On behalf of our producer extraordinaire, William Jeffries and my amazing co-host, Dave Anderson and me, your host, Michael Nunez, thanks for listening to The Rabbit Hole.

 

[END]

Links and Resources:

The Rabbit Hole on Twitter

Slack

Facebook

Amazon