97. Mocks vs Classical with Jacob O’Donnell

On the show today we are welcoming back our friend Jacob O’Donnell to talk about some different approaches to testing. The two main ones we will be discussing are mocks and the classical approach but we also chat about spies and stubs and everything in between. With some help from Martin Fowler and a few other resources we break these concepts down as best we can and share personal experiences on the topics. It does seem that there are different contexts for the different approaches and that there are definitely some overlaps between them but in the end you just want tests that are not going to break and that are going to best serve the work you are doing! For all this and much more be sure to join us here on The Rabbit Hole!

Key Points From This Episode:

  • Defining the difference between Classical and Mockist testing.
  • Why Jacob is wary of too many mocks.
  • Mounting and testing with language.
  • Avoiding breaking all the tests!
  • When is mocking the best way to go for a test?
  • The value of integration tests and catching bugs.
  • End to end tests versus unit tests.
  • The GOOS book’s approach to testing.
  • Putting spies up against mocks!
  • Defining what a stub actually is with the help of Martin Fowler.
  • And much more!

Transcript for Episode 97. Mocks vs Classical with Jacob O’Donnell

[0:00:01.9] MN: Hello and welcome to The Rabbit Hole, the definitive developer’s podcast in fantabulous Chelsea Manhattan. I’m your host, Michael Nunez. Our co-host today.

[0:00:09.8] DA: Dave Anderson.

[0:00:10.8] MN: Our producer.

[0:00:12.0] WJ: William Jeffries.

[0:00:12.0] MN: Today, we’ll be talking about Classical versus Mockist testing.

[0:00:16.4] DA: The most Mockist testing.

[0:00:18.8] MN: The most Mockist ever. I imagine there’s the classical way of testing and then mocking all the things. I think we’ll get right into the crux of it, but before we start, we’ve got a special guest, recurring, going down, Jacob O’Donnell. What’s up?

[0:00:35.1] JD: Guys, it’s been a long time since I’ve seen you guys.

[0:00:37.7] MN: Yeah, it’s been a while.

[0:00:39.7] WJ: Been too long man, you got to come by more often.

[0:00:42.4] JD: I should do like a triple whammy sometimes.

[0:00:44.7] DA: Whatever that is. Got that year, known as the Beethoven of Classical testing, right?

[0:00:53.4] JD: Yeah, actually. I hear that quite often and you know –

[0:00:57.8] WJ: I always thought of you as more of a Mozart.

[0:00:59.8] JD: Kind of a sort of a smorgasbord of classical composers, I think I’m sort of - out of ones I know but I’m kind of all of them mushed into one. I mean, I guess in a way of introduction, I’m in my mid 30s and I don’t think there’s much else to say.

[0:01:15.3] MN: And a programmer.

[0:01:16.1] JD: Yeah, I live in Brooklyn.

[0:01:17.6] MN: Yes.

[0:01:20.1] WJ: That’s far out.

[0:01:20.9] MN: No, I don’t know about that, get that out of here. BX all day, stand up, sitting. What is Classical testing, not just like the Mozart and the Beethoven? Let’s define the two and then we’ll go over the versus and why one may be better than the other and vice versa.

[0:01:37.8] JD: Okay, this is based off like a Martin Fowler blog post, it’s fantastic, if you google TDD Classical versus Mockist, Martin Fowler, you’ll find it. It’s a great article.

[0:01:48.2] DA: 27th January, 2007. That was the day, yeah.

[0:01:53.4] WJ: Man.

[0:01:53.7] MN: I felt bad, that guy’s old.

[0:01:56.5] WJ: Facts.

[0:01:57.1] DA: 2nd of January, see if it’s older. Gosh. It’s like new year. I need to get this down.

[0:02:03.6] JD: That’s a good point, yeah, he was not celebrating, he was busily hammering out a giant blog post. The gist of what it’s talking about is in the Mockist style TDD, you have your subject under test, which for the sake of it, let’s talk about an object, and every collaborator is a mock.

You wouldn’t ever pass in anything – like, if your object itself constructed its own object, that would be a problem. You would want to instead use dependency injection, so every single collaborator is passed in, everything is mocked out, and all of your tests are asserting that some action happens, that this is called on this other object.

Your unit test is really a unit test. It is only testing the subject under test. Versus Classical TDD where you don’t worry so much about mocking. I mean, mocks still have their place in Classical TDD if you have a very long – usually it’s time based if you have like a network call that could either be flaky or take a long time then you mock it out but you certainly let your – you don’t worry so much about mocking every dependency.
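A minimal sketch of the two styles Jacob describes, written in Python with the standard library's unittest.mock. The Order and Warehouse classes are hypothetical names chosen for illustration and are not from the episode. The first test replaces the collaborator with a mock and asserts on the call that was made; the second uses a real collaborator and asserts on the resulting state.

```python
from unittest.mock import Mock


class Warehouse:
    """Real collaborator (hypothetical): tracks stock levels in memory."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def has_inventory(self, item, quantity):
        return self.stock.get(item, 0) >= quantity

    def remove(self, item, quantity):
        self.stock[item] -= quantity


class Order:
    """Subject under test: the collaborator is injected, never constructed here."""

    def __init__(self, item, quantity):
        self.item = item
        self.quantity = quantity
        self.filled = False

    def fill(self, warehouse):
        if warehouse.has_inventory(self.item, self.quantity):
            warehouse.remove(self.item, self.quantity)
            self.filled = True


def test_fill_mockist():
    # Mockist style: the collaborator is a mock, and the assertion is about
    # which call was made on it.
    warehouse = Mock()
    warehouse.has_inventory.return_value = True
    order = Order("widget", 5)
    order.fill(warehouse)
    warehouse.remove.assert_called_once_with("widget", 5)
    return order.filled


def test_fill_classical():
    # Classical style: use the real collaborator and assert on the outcome.
    warehouse = Warehouse({"widget": 8})
    order = Order("widget", 5)
    order.fill(warehouse)
    return order.filled, warehouse.stock["widget"]
```

Both tests exercise the same Order.fill method; they differ only in what they treat as evidence that it worked.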

[0:03:20.5] DA: Yeah, I guess, what you're saying is like in the Classical form of testing, it’s – you’re going to be more focused on the overall outcome that you want to happen. As a user, I want to see like you know, my widget get updated or something like that or you’ll see some side effect in the screen or something like that.

[0:03:42.7] JD: Yeah, if the side effect happens sort of outside of yeah, I mean, I guess, that’s typically what a side effect means, it’s sort of happening outside of - out of your particular thing under test.

I’m personally a much bigger believer of Classical TDD. In general, I actually haven’t really seen anyone go full blown mock. I have actually never really seen that but in general, I do see a lot of mocks plaguing, I mean, obviously I use a strong word, I don’t feel great about mocks when I don’t think they are required, especially front end where you shallow mount your React object and every single sort of thing, you have a ton of mocks.

Every time you’re, like, clicking on a button and dispatching something in Redux, you’re just mocking something. The whole kind of system is lost in your test. You’re not saying okay, when I click this button, a modal pops up. You’re saying, when I click this button, dispatch is called with open modal.

[0:04:45.1] DA: Right. You just trust that dispatch is the thing that you think it is and that is wired into something called Redux and something called Redux will change the state. In the end, there’ll be a real modal.

[0:04:57.9] WJ: But there is an advantage there which is that now, when your code fails, you know it has to do with this particular model under test and not any of the dependencies, right? Because presumably you have test coverage over those other models or those other components that it’s interacting with.

[0:05:16.1] JD: Yeah, that’s true but on the other hand, what can happen, and what I believe happens much more frequently, is you’re refactoring and you change some meaning but like things still work, right?

Maybe you’re not dispatching show modal but the system is still doing the correct thing. The mock fails because mocks typically sort of say, ‘assert the code that I wrote is the code that I wrote’.
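The refactoring problem above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the Store class is a tiny hypothetical stand-in for a Redux-like store, and the action names SHOW_MODAL and OPEN_MODAL are made up. The two click handlers produce the same visible outcome, but only the outcome-based test survives the rename.

```python
from unittest.mock import Mock


class Store:
    """Tiny stand-in for a Redux-like store (hypothetical, for illustration)."""

    def __init__(self):
        self.modal_open = False

    def dispatch(self, action):
        # Both action names open the modal, so renaming one to the other
        # does not change what the user sees.
        if action in ("SHOW_MODAL", "OPEN_MODAL"):
            self.modal_open = True


def click_button_v1(store):
    store.dispatch("SHOW_MODAL")


def click_button_v2(store):
    # Refactored handler: different action name, same visible outcome.
    store.dispatch("OPEN_MODAL")


def interaction_test_passes(click):
    """Mock-based test: pins the exact dispatch call."""
    store = Mock()
    click(store)
    try:
        store.dispatch.assert_called_once_with("SHOW_MODAL")
        return True
    except AssertionError:
        return False


def outcome_test_passes(click):
    """State-based test: only checks that the modal ends up open."""
    store = Store()
    click(store)
    return store.modal_open
```

Running both checks against click_button_v2 shows the mock-based test failing even though the modal still opens.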

[0:05:46.6] DA: Venetian bookkeeping.

[0:05:49.4] WJ: It’s tautological.

[0:05:50.4] DA: Classical. Invented in Venice in the, you know, renaissance period.

[0:05:56.3] JD: What?

[0:05:58.0] WJ: Venetian bookkeeping, fantastic invention. You know, for accounting and maybe for code.

[0:06:05.1] MN: You propose that people not use shallow and use mount or render when they do React tests? I guess that’s a question because to my knowledge, shallow is just faster, right? To test.

[0:06:14.8] WJ: Significantly.

[0:06:16.3] JD: Yeah, I actually, I don’t know the answer. I know that one of our co-workers, Simon Chen, has mentioned in one of our Slack channels a way that he’s done it, which I assume uses mounting, but I haven’t explored it.

What I do know, I think, is that it would be prohibitively slow if you mounted everything. Because I think you’re right. I think it’s like way slower, it might even take like a matter of seconds for each test.

[0:06:43.2] MN: Then you don’t want to have multiple asserts to a test when you mount; you may be led to do that, which may lead to bad testing practices and that kind of stuff. I mean, I don’t know, I often find myself using shallow but I think you brought up a good point that I should be testing what that button is actually doing, not just that I simulate a click and something is happening, which has to depend on other things to ensure it’s working.

[0:07:09.7] JD: Well, I mean, just speaking of React with Redux, like mounting’s actually a little bit hard, you have to solve a thing or two because you need to have like a provider around your – you can’t just –

[0:07:20.2] DA: Mount.

[0:07:21.5] JD: Yeah, it makes things a little bit harder which I’m sure it’s surmountable but teams would need to go to –

[0:07:27.0] MN: Pun intended?

[0:07:28.9] JD: Nice.

[0:07:29.0] WJ: Mountable.

[0:07:30.3] JD: Yeah, I did that on purpose. Yeah, I mean, I actually thought of a kind of an analogy for how I see that type of testing versus writing tests that assert more meaning than just what a mock does. If you’re like thinking of language and you have a sentence like this sentence is false. On one end, your test just says, literally, this sentence is false and if you change any word, the test breaks.

[0:08:01.0] MN: Right.

[0:08:01.5] JD: It’s not a terribly useful test. Back to the double accounting, you’re just saying, I wrote literally what I wrote. Versus – there’s kind of multiple levels of meaning going on there. Obviously, there’s the sentence speaking about itself and the contradiction, and there is just like this sentiment of these words together kind of mean this thing. Where if you were to change your sentence to like, this sentence is definitely false.

The meaning is the same and so you want your test to still pass. You still - you want to be able to make changes to your code easily without everything failing because it’s not written exactly as you wrote it.

[0:08:45.7] DA: Like having [inaudible] in such a way where your code is guided by the test but not constricted by it.

[0:08:54.2] JD: Yes. In general, what I want more than anything is I want to be able to refactor willy-nilly, have confidence and not break all the tests. Because that will happen. If you go to a React Redux code base and you make a lot of changes, even if your app works still perfectly, you’re going to need to rewrite a lot of tests.

[0:09:17.3] MN: Yeah.

[0:09:17.5] JD: If you’re like moving like functionality from one component to another, extracting components, you’re going to need to rewrite all your tests and I think that’s kind of a shame. Part of the promise of tests is easy refactoring.

[0:09:37.9] MN: What I find ends up happening is that people get frustrated because they wrote tests but then they have to change these components and they’re like – just write the least – write a smoke test and then we’ll figure it out from there, and that causes problems all around. But I think, like, React kind of leads to ensuring that you write as confident a test as possible, knowing that they’re probably going to break when you have to refactor the implementation.

Besides time as I mentioned before, where you know, using shallow rendering is the optimal thing to do when you want to unit test. Do you know any other times where like, mocking is the ultimate way to go, I’m going to say?

[0:10:18.2] JD: Yeah, I think that’s a good question. I mean, I thought about that and I think part of how that breaks down is what you’re considering a unit. It certainly – you know, a function is a very easy place to be like, this is a unit. A React component, a class, all easy things to consider a unit, and how deep that code will go sort of factors in to what makes sense as a unit.

Because I think there is a point where if you don’t mock anything out, then you’re basically dealing with an end to end test and the combinatorial – like, amount of things that can happen in that code flow has burst up to millions and now that’s a problem.

[0:11:07.9] DA: Yeah, your test might fail for completely unrelated reasons. Like, you are back to the frustrating aspect that you talked about earlier, where if you are mocking too heavily and you try to refactor, you are going to break a bunch of things. The same thing can happen if your selectors are brittle or what have you; you can still run into the same problem with a broader scope test.

[0:11:30.8] JD: Yeah, I don’t have a solid answer there but I do believe it is part of it. Like, you can imagine back when people wrote classes more often that had private methods, which I guess actually just personally I’ve been working in languages that don’t have private. They would say okay, you don’t test your private methods, partly because unless you are using Ruby you can’t, and so you test your public interface and there’s a set of code behind there and that piece of code does a thing.

It is sort of finding a good set of logical stuff that isn’t too large, because then you get into too many possibilities, but also isn’t too small. I think if you are obsessively only testing the smaller things, the other thing you will see happen a lot is people write all of their unit tests and they’re burned out. They just spent so much time writing tests and they’re tired and so they don’t really write any integration-y tests, which I think is a shame because I think integration tests catch a lot of bugs.

And that is like, you talk about the testing pyramid, but usually the top layers of the pyramid sort of suffer because people are always writing all the unit tests.

[0:12:47.3] DA: Right, I know that is something that we’re thinking about right now in the project I am on. Where we want to explore the top of the pyramid a little bit more but we’re spending a lot of time writing a lot of unit tests and it is a question of what do we value and how do we make that tradeoff. Like what is the right mix. Is it a pyramid shape? Maybe.

[0:13:12.7] WJ: I think it is easier if you start at the top of the pyramid and you write one happy path end to end test and let that drive maybe a couple of integration tests, which drive a ton of unit tests. And if you start at the end to end level then you know you are going to get there, and I think the only thing that makes people resistant to that is that it leaves an open loop open for a while. Because that end to end test is going to be red until the feature is done.

And I think that is a good thing. I think that you want a little bookmark that says, “Hey you know we got to come back to this. This is not done yet.” And that shouldn’t go green until the feature is actually done but it is a little bit stressful to have a failing test.

[0:13:54.8] DA: Right, I think it is also taking the discipline to actually write the test in a meaningful way before it is actually - the feature is completed or before you know entirely what it is going to look like.

[0:14:05.9] JD: That is sort of Classical TDD though. Anytime you’re like, “Oh okay I haven’t written this feature yet so what does this test say?” It does take an extra level.

[0:14:14.6] DA: That is like something that is a little bit scary for people.

[0:14:17.9] WJ: Yeah that’s true. Although I think it is easier for end to end test than it is for unit tests. Because with the unit test you really have to think, “What kind of an interface am I going to design for this?” Whereas for an end to end test you probably have a mock like some mock up that a designer gave you and you know, “Oh there is going to be a button and it says submit so I don’t know as a placeholder I’ll just say like, click the submit button.”

[0:14:42.4] JD: The book, GOOS, ‘Growing Object-Oriented Software through tests’.

[0:14:49.1] DA: Guided by tests.

[0:14:49.5] JD: Yeah, it’s guided by tests.

[0:14:51.3] MN: The goose is loose.

[0:14:52.3] JD: The goose is loose, yeah. They advocate that same approach where you write your end to end test first and then I think even integration tests, which is also another part of the pyramid that often gets forgotten, followed by the low level, like now I am actually implementing code with unit tests. And that book was actually written by, I think, some of the people that pioneered mocks, so maybe they had it right all along.

[0:15:17.7] WJ: Yeah, it feels like integration tests basically just get forgotten or they get confused with end to end tests, or the two terms become synonymous.

[0:15:25.1] MN: Yeah I think it is the latter as you mentioned.

[0:15:27.1] JD: It is tricky. You know, if you write all of those tests, that’s a lot of testing.

[0:15:31.7] MN: Oh yeah.

[0:15:32.3] JD: It is a lot of testing. You are definitely going to spend more time on it than you are writing production code.

[0:15:36.5] WJ: Yeah for sure.

[0:15:37.3] MN: And definitely the way that you are testing is also different, so that could be a little exhausting as well. With a unit test you are thinking about the particular changes that may happen within the unit, but the end to end covers, like, everything. So to start from the top of the pyramid, as you all mentioned, and work your way down can be exhausting.

[0:15:59.1] WJ: Yeah, and it’s tough when there are other people on your team who are shipping code with no tests and they get their tickets done way faster.

[0:16:06.9] DA: Those saviors, the rock stars.

[0:16:11.3] WJ: The cowboy-ninjas.

[0:16:12.9] DA: Cowboy-ninja. Yeah, space pirate.

[0:16:17.3] MN: So, as you mentioned before, a recommendation that covers the testing paradigm that we mentioned is GOOS, as Jacob has called out. If you haven’t read it, it is a pretty solid book. I will recommend it myself.

[0:16:28.1] DA: Growing Object-Oriented Software, Guided by Tests, yeah.

[0:16:31.6] JD: Yeah why is it goose?

[0:16:32.6] DA: Goosey.

[0:16:33.5] MN: I think it is yeah, goose. The goose, every time I hear the goose I got to say the goose is loose. Any thought on spies over mocks?

[0:16:46.5] JD: Spies versus mocks, that is – I am trying to actually think of what the distinction is, because these days they blur together in a lot of libraries. Okay, I think the traditional difference is that spies don’t stub out the behavior where mocks do. So with a spy, you can say how many times it was called, but whatever the initial implementation is still happens.

[0:17:08.4] DA: So it just passes through?

[0:17:09.7] JD: Yeah. I guess I feel like if you are going to do it do it, you know mock that puppy out. You know if you want to be like, yeah I think if you want to leave the initial implementation in there then maybe it is better to have your assertion around something that happened because of that method call. Where you want to test everything right? Because that method call would mean a hundred different things happened and certainly you don’t want –

Because then you’ve got these rigid tests where you change one thing and your whole test suite fails. But maybe you pick one thing that that method does that is sort of at its core what it’s about, like something in that. There is an art there about picking what that is and asserting that that happened.
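The spy versus mock distinction Jacob gives can be sketched with Python's unittest.mock, assuming a hypothetical Greeter class. Passing wraps= to Mock gives spy-like behavior: calls are recorded and still reach the real implementation. A plain Mock with a return_value records calls too, but the real code never runs.

```python
from unittest.mock import Mock


class Greeter:
    """Hypothetical collaborator used only for this illustration."""

    def greet(self, name):
        return f"Hello, {name}!"


real = Greeter()

# Spy: records the call and passes it through to the real implementation.
spy = Mock(wraps=real)
spy_result = spy.greet("Jacob")
spy.greet.assert_called_once_with("Jacob")

# Mock with a stubbed return value: records the call, real code never runs.
mock = Mock()
mock.greet.return_value = "stubbed"
mock_result = mock.greet("Jacob")
mock.greet.assert_called_once_with("Jacob")
```

Here the same Mock class plays both roles, which is one example of the blurring between the terms that comes up in the conversation.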

[0:17:58.7] MN: So if you’re going to do it, just do it. Go full mock.

[0:18:01.6] JD: I think so but it is actually a lot. I think a lot of testing frameworks these days they use the words interchangeably.

[0:18:09.0] DA: Yeah, like your mock will have the ability to spy.

[0:18:13.2] JD: Yeah or you know your spies stub out the behavior or - I love a good spy.

[0:18:19.7] MN: You love a good spy, yeah.

[0:18:20.9] JD: I’m not going to think about that I love spies.

[0:18:24.5] DA: All right so the title of the article that you were talking about before is ‘Mocks Aren’t Stubs’. So we’re talking about mocks and spies but what is a stub?

[0:18:34.7] JD: Man, what is a stub? That is actually – man, there is too much terminology. I think the stub is actually when you stub out the thing. Which leaves the question, what is the mock? I think maybe a mock stubs and records behavior. There is too much differentiation, like tiny little differences that people don’t really use in their everyday speak. But what I certainly do know is that I just hate Steven Nunez. He is one of the worst programmers I have ever met.

[0:19:08.6] MN: I’ll be sure to relay that message to him along with the stubs and the mocks and the spies.

[0:19:13.3] JD: If you are listening Steven, I hate you.

[0:19:17.9] WJ: Do we have a Martin Fowler official definition of the stub?

[0:19:22.3] JD: I think we do. He’s even got dummy in there oh like Steven Nunez.

[0:19:28.6] DA: Fake?

[0:19:29.2] JD: That’s – my God this is good about Steven.

[0:19:34.9] DA: So stubs provide canned answers to the calls made during the test, usually not responding at all to anything outside of what they’re programmed for in the test. So your stub would expect to be called with five and respond with six. But he called it with four.
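The stub definition read out above, including the five and six example, can be sketched as a hand-rolled Python class. The names CannedStub and describe are hypothetical, and raising an error for unprogrammed inputs is one possible choice assumed here; a stub could just as well return nothing.

```python
class CannedStub:
    """Stub: returns canned answers only for the inputs it was programmed with."""

    def __init__(self, answers):
        self._answers = dict(answers)

    def __call__(self, arg):
        # Anything outside the programmed inputs gets no answer at all.
        if arg not in self._answers:
            raise LookupError(f"stub has no canned answer for {arg!r}")
        return self._answers[arg]


def describe(lookup, n):
    """Code under test: depends on a lookup function that is passed in."""
    return f"answer is {lookup(n)}"


# Programmed to respond with six when called with five, and nothing else.
stub = CannedStub({5: 6})
```

Unlike a mock, the stub makes no assertions about how it was called; the test only checks what describe returns.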

[0:19:50.4] MN: There you go, definitions. We’ll be sure to drop that link, I guess, in the show notes and let our listeners have at it with the spies and the stubs and the fakes and the dummies and the what have you.

[0:20:05.1] WJ: Thank you to papa Marty for explaining all of them.

[0:20:07.5] DA: There you go, yeah if you enjoyed listening to this podcast you will enjoy reading this incredibly long blog as well.

[0:20:16.6] MN: Jacob always a pleasure having you on.

[0:20:19.6] JD: Thank you guys, it was a lot of fun.

[END OF EPISODE]

[0:20:21.6] MN: Follow us now on Twitter @radiofreerabbit so we can keep the conversation going. Like what you hear? Give us a five star review and help developers just like you find their way into The Rabbit Hole and never miss an episode, subscribe now however you listen to your favorite podcast. On behalf of our producer extraordinaire, William Jeffries and my amazing co-host, Dave Anderson and me, your host, Michael Nunez, thanks for listening to The Rabbit Hole.

Links and Resources:

The Rabbit Hole on Twitter

Jacob O’Donnell

Martin Fowler

Mocks Aren't Stubs