Avoiding APIpocalypse: API Resiliency Testing FTW!

Presenter: Naresh Jain
Event: Selenium Conf 2024
Location: Online

Presentation summary

As we build more complex distributed applications, the resilience of APIs can be the linchpin of application reliability and user satisfaction. This talk will delve into practical tools and techniques used to enhance the resilience of APIs. We will explore how we utilise API specifications for simulating various input data, network conditions and failure modes to test how well the API handles unexpected situations.

 

Our session will begin with an overview of API resilience—why it matters, and what it means to build robust APIs that gracefully handle flaky dependencies in real-world operations. We’ll discuss the role of contract testing in achieving resilience and how to turn API specifications into executable contracts that can be continuously validated.

 

Following the introduction, we will dive into hands-on strategies for implementing these techniques into your day-to-day development and testing workflow. This includes setting up practices to run resilience tests, such as testing endpoints for handling latency, errors, and abrupt disconnections. We’ll provide examples of how to configure tools to simulate these conditions, interpret test results, and iteratively improve API designs.

 

Additionally, the session will cover how to integrate with CI/CD pipelines and practices that foster better collaboration between architects, developers, QA engineers, and DevOps stakeholders through shared understanding and executable documentation.

 

To wrap up, we will discuss best practices for scaling API testing strategies across larger projects and teams, ensuring that resilience testing becomes a cornerstone of your API strategy rather than an afterthought.

 

This talk is designed for architects, tech leads, software developers, QA engineers, and DevOps engineers who are keen on enhancing API resilience. Attendees will leave with actionable insights and tools to implement robust API testing strategies that can withstand the pressures of real-world usage. Join us to learn how you can transform your API testing approach and help build systems that last.


Transcript

Hey guys. Welcome everyone. We have Naresh Jain with us for today’s topic, that is, avoiding the APIpocalypse, or API resilience testing. Over to you, Naresh.

Naresh Jain:
Thanks Shallabh for the wonderful introduction. Thanks everyone for joining in. So cool. I would like to keep this interactive and make sure that you’re able to get the most out of this session. So please do, you know, keep your chat windows open. I’m going to keep asking you questions and hoping that you can put comments in the chat window, and that’s how we will be able to interact. Also, I think you would be able to show hands when I ask questions, and that would give me feedback on whether everyone’s following along with me. So cool. Fancy background generated by ChatGPT, that’s kind of become a must-have for every conference talk these days.

Naresh Jain:
So that’s what I have to do. But this talk is really about my experience of building really resilient APIs that work at really large scale. And so this is kind of some of the background around how to make that happen in your own organization. So with that, I’m going to quickly jump in. I’m going to take a quick example here to just explain the architecture of an application, and then we will have a few questions in terms of how we will make these things resilient. Okay, so I’m going to take again something that all of you must be very familiar with. You have an app, the app makes a request to a backend for frontend. This backend for frontend may depend on one or more actual domain services.

Naresh Jain:
So it makes a request to the domain service, it gets a response back from the domain service, it does some business logic, and then it basically posts a message on a Kafka topic so that an analytics service could pick it up and do its thing, and then it gets the response back to the application. Right? Like this is a very simple application, but there are a lot of things that actually can go wrong in this simple application. So I want you to think about it from a resiliency point of view. Let’s say I am thinking about this BFF layer over here. Let me quickly highlight that. So let’s say I have this BFF layer and I want to make sure that this BFF layer is very resilient. Subsequently, we will also make sure that this domain service is also very resilient. And finally, we also want to make sure that this Kafka guy is also very resilient. So these are all the pieces that technically could have some kind of a fault or some kind of a problem, and that could make the overall experience for the user who’s using the app not so pleasant.

Naresh Jain:
Right. So our job here, you know, as quality engineers, is to make sure that the experience here on the app is seamless, which means these various moving parts that we depend on are actually very resilient. Right. So in your chat windows, if you can quickly put in what kinds of things can actually go wrong? Of course here I’m just highlighting that these have API specifications and AsyncAPI specifications as well. But what are the kinds of things, from a resiliency point of view, that you think can go wrong in this case? If you can just pop open your chat windows and put them in there, it’ll be helpful. I see network latency. Network latency between these services could cause a problem. Certainly a server could go down. There are many servers here, and any one of them could go down, and we don’t really seem to have redundancy in place, at least in this simplistic diagram that I’ve shown you.

Naresh Jain:
So that’s certainly a good possibility. The response time and network latency, both of those could certainly be an issue. Kafka could choke, the topics on Kafka could get overwhelmed, and it may not be able to process; too many requests at a time could get it choked. So you could have a runaway success, and your app could maybe be downloaded by billions of people, and they all try and hit this thing, and then there might be too many requests that are basically causing a problem. There could be disk issues: when you’re trying to write logs or to the database or things like that, the disk usage may cause trouble.

Naresh Jain:
So very good. I mean, there are several such problems that, as you guys have highlighted, can occur and can cause problems from a resiliency point of view for your services. So what kinds of testing would you now perform to make sure that your application is actually resilient? A lot of you talked about the kinds of problems that can occur, but now what kind of testing would you do to ensure that your service is resilient? Can you go ahead and put that in chat? Okay, so I see chaos testing. I see load testing. Very good. What else do you anticipate? What happens if only Kafka is down, then how do we resolve it? Yeah, I mean, but what kinds of testing would you do? I see Neha is saying contract testing, Jeetu is saying chaos and resiliency testing. Okay, so load testing, a couple of chaos testing, contract testing; any other kinds of things, like something even simpler that I think is important to make sure that we test? Okay, maybe let me jump ahead and get to it.

Naresh Jain:
So here’s, in my opinion, not an exhaustive list, but a kind of more pragmatic list that you would use on a day-to-day basis when you’re testing resiliency of your services. The first and the very simple one that I think a lot of us already practice is negative functional testing, where you’re doing boundary value testing, where you’re doing equivalence partitioning, where you’re testing invalid data types, schema validations, format validations, underflow/overflow kinds of conditions and stuff like that. These are all in the realm of functional testing, but more negative functional testing. So this is like bread and butter in my opinion. We would do this very often. A little bit more sophistication on top of that could be service dependency testing. And I think Neha already pointed out contract testing. But backward compatibility testing is also very important from a resiliency point of view.

Naresh Jain:
If a new version of your app or BFF API was released and that made a backward-breaking change, then basically the apps will face an issue. So that also will make your service non-resilient or unavailable. And a lot of people talked about chaos engineering and chaos testing in general. So under chaos engineering, you have several different kinds of testing. The first one is fault injection testing. So you may inject or induce a fault by bringing down a service to see how resilient the service is. You may want to do failover testing, which is basically, if you have multiple pods, you bring down one of them and see if the traffic fails over to the other pods without causing an outage. Right.

Naresh Jain:
You may also want to test, for example, recovery from a database. So you might want to, you know, try and restore from your database backup and see if that recovery testing is working fine. You may want to do partial failures in your network, and you may want to see if it’s still responding within the given SLA from a response time perspective. You may do a lot of chaos engineering and related chaos testing in this context. Let’s move a little bit more. Some people already talked about performance testing. In performance testing, also, we have several different types, starting with load testing, then stress testing, soak testing. You know, things perform well for a few hours, and then maybe a few hours later or a few days later, suddenly things start becoming slow and there may be memory leak issues.

Naresh Jain:
You may be running out of file descriptors or disk space or other kinds of things that may cause these kinds of issues. So soak testing becomes again very important from that perspective. Latency testing, concurrency testing. You know, what happens if the exact same user is logged in from two devices and is trying to make a request, which one gets honored? How does it work? So those kinds of concurrency testing, and just bombarding the service with a lot of requests as well, but that is already covered under load testing. Of course, security testing is very important, from SQL injection to cross-site scripting to unauthorized access to session expiry, whether the sessions are expiring correctly. You may then have a host of penetration testing or pen tests that you might do. You may want to do DDoS attacks to make sure your firewalls and other kinds of things are resilient and they are holding things up. And of course people may do vulnerability scans to try and find known vulnerabilities that they can exploit and so forth.

Naresh Jain:
So again, that’s important from a testing point of view. And finally, of course, in spite of all of this testing, you may still want to make sure that you test your observability, monitoring and alerting itself: that it does give you the alert at the right time, whether the right data is visible on the dashboards, and whether you’re able to do deep tracing and other kinds of observability-related practices. So all of these things, in my opinion, again, not an exhaustive list, but something that we use on a very regular basis at work, is what I would categorize under resiliency testing. Of course, this session unfortunately is only 45 minutes long, and I won’t be able to cover each and every topic in detail. But my hope is to cover the things on the left side and show you some actual demos of how we would go about doing things: negative functional testing, the dependency pieces, contract testing specifically, and then a little bit of fault injection related testing. So we will cover the ones on the left; for the ones on the right, I think we might need a separate session, but I just wanted, from a completeness point of view, to call these things out. Okay.

Naresh Jain:
So jumping in, right, like the first thing that I want to kind of tackle today is both the negative functional API testing and service dependency testing. And here I want to introduce you to the concept of contracts, API contracts, and how you could actually leverage API contracts to tick off these two boxes for you, right? But often people say, hey, what is a contract? There is a lot of confusion in terms of what an actual contract means. So I’m going to take a quick minute and just kind of set the stage, so we make sure that we’re all on the same page when it comes to what a contract is, right? So let’s imagine I want to evaluate this expression, right: 30.1 times 43.74, plus 22 divided by 7. So if you think about it, I might first make a request for 30.1 times 43.74, so I would evaluate the multiplication, then I would evaluate the division, and then I would take the results from both, and then I would add the two, and that would give me the final answer, right? Like, so this is a very simple interaction. I can now represent this more like an API call that all of us can relate to. So this is, you know, I’m posting a message to slash calculator with a left-hand side, a right-hand side and an operator, and essentially doing the same for the three operations: multiplication, division and addition. I’m hoping this is something everyone can relate to, right? So now from an API testing point of view, something that all of you are familiar with, what are the kinds of tests one would consider doing on this, right? So I might send, you know, 30.1 and 43.74, multiply, and I would expect a 200 response back with this kind of a result, right? That is what I would validate or assert in my test.

Naresh Jain:
Then of course I would also store the result because I want to add this later. I might also want to do negative numbers and see if negative numbers are being handled correctly or not. I should get back a negative result in this case. I might want to do a little bit of, now we’re getting into, data type validation and boundary conditions and things like that, where I might want to send ABC and see what happens. Of course you should be getting a 400 bad request saying invalid left-hand-side value. The HTTP response type, or rather the HTTP status, and the error message are all important to be validated in your tests. You may also want to play around with the operator itself. The operators are fixed and they are of a specific type.

Naresh Jain:
But if you try and send something that’s completely invalid, again, you should get a 400 response back with a valid reason for it. So these are all, I’m hoping, kinds of things that one would perform. These are more on the functional side of things, positive and negative, that one would perform. In this case, of course, you can chain these methods together and then do API workflow tests. But here, let’s just focus on the API test point of view. Now, coming back to the original question, what is the contract? I think someone mentioned that we can do contract testing. So what does a contract mean in this case? Right, there is a specific agreement between the consumer of the service and the provider of the service in terms of the data types, the schema, the API signatures, the possible response codes and so forth that has been agreed upon. That is the API signature, or the API contract, if you will.
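Before moving on to the contract itself, a hand-written version of one of those negative tests might look roughly like this. This is only a sketch: the endpoint URL, the field names (lhs, rhs, op) and the error text are assumptions, not the speaker's exact demo code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class CalculatorApiNegativeTest {
    private final HttpClient client = HttpClient.newHttpClient();

    @Test
    void invalidLeftHandSideIsRejectedWithA400() throws Exception {
        // "abc" is not a number, so the provider should reject the request.
        String body = "{\"lhs\": \"abc\", \"rhs\": 43.74, \"op\": \"*\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/calculator")) // assumed local endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Validate both the HTTP status and the error message, as discussed above.
        assertEquals(400, response.statusCode());
        assertTrue(response.body().toLowerCase().contains("invalid"));
    }
}
```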

Naresh Jain:
There is one very popular specification format for capturing this. It’s called the OpenAPI specification. The older name for this is Swagger. A lot of you might be familiar with Swagger. OpenAPI has been around for a while as well. So OpenAPI is one way in which you can basically say, hey, this is my OpenAPI 3.0 version of the specification and I have a path called slash calculator. It has a post method and it can have possible 200 responses. It can have 400 responses.

Naresh Jain:
And here is the request body that it can take, which basically has a left-hand side, a right-hand side and an operator. So as you can see, all three are required. But you’ll also notice that the op, which is the operation, is a type of enum which can be only these four possible values. Now this, I would say, is a pretty decent contract that basically would allow what the provider and the consumer have agreed upon to be captured pretty nicely in this document. Now once you have this, the advantage is that a lot of these tests that you’re looking at, in terms of basically: if I send two valid numbers in the left-hand side and right-hand side, and I pick one of the operations from the enum that is there, then I expect a 200 result back with a positive number. Right. Similarly, I could generate these tests, and I could also generate these other tests.
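For reference, a rough sketch of what such a calculator contract could look like in OpenAPI 3.0 YAML. The field names, response descriptions and error shape here are assumptions for illustration, not the speaker's exact specification.

```yaml
openapi: 3.0.0
info:
  title: Calculator API
  version: 1.0.0
paths:
  /calculator:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [lhs, rhs, op]
              properties:
                lhs: { type: number }
                rhs: { type: number }
                op:
                  type: string
                  enum: ["+", "-", "*", "/"]
      responses:
        "200":
          description: Result of evaluating lhs op rhs
        "400":
          description: Invalid input (wrong data type, unknown operator, missing field)
```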

Naresh Jain:
So what ends up happening is now, from a resiliency testing point of view and from a contract testing point of view, a lot of this can be taken care of for you without you having to write all of these things. So that much feedback is shifted left to the developers and they can get this feedback right away in their IDE. Right? So that’s kind of quickly just explaining what a contract test can do. Now, how does doing this help you improve your resiliency? So there’s two parts to it. One is now you don’t really need to keep sitting and manually verifying or writing automated API tests for things like these. These automated tests can be generated right in the developer’s IDE and they could get this feedback in terms of basically ensuring that they’re sticking to the contract. Right. And if things outside the contract are presented to them, then the provider can gracefully handle them.

Naresh Jain:
The provider can gracefully handle them on the API side and respond back with valid status codes and response messages. That reduces the number of things you have to worry about from a resiliency point of view. Now you can take this even further and try and do a lot of the boundary case conditions. You can do several other kinds of testing, including fault injection. And that’s kind of the next section once we understand this. So far, what you will notice in this specific example that I explained to you is that we’ve taken inspiration from a couple of different styles of testing, and I think it’s important to understand those different styles of testing that were at play. You will notice here that we tried to intentionally introduce a variant of the request that was not a valid variant, and we wanted to make sure that that came back with a 400 bad request.

Naresh Jain:
Right. So that’s kind of a little bit of what mutation testing is all about. Right. Quick show of hands if you’re familiar with mutation testing or it’s something you practice at your work. Okay, I see two people. That’s great. Perfect. So that’s fine.

Naresh Jain:
Don’t worry. I’m going to take a few minutes to explain what mutation testing is. I also have a sample example to kind of walk you through. But the idea is very simple. In fact, I think it’s worth a little side tour here to talk about some of the genesis of this. Right? I don’t know if you guys remember back in the days when Agile was in its glory days and everyone was like, you know, agile and scrum everything. One of the things that caught on like wildfire in a lot of companies was basically having code coverage targets. Everything should have at least 70% code coverage; some organizations went even crazier and they said 80% or 90%.

Naresh Jain:
And what ended up happening in a lot of organizations is engineers were under pressure to basically ship things, but the criterion was you had to have 80 or 90% code coverage. What they ended up doing is they just started writing tests without having assert statements or without really checking things. And that would give you the code coverage, which is what the management wanted, but you really didn’t get much benefit because the assertions were not good, the quality of the tests was not great, and so forth. People just wrote them to tick a box and move forward, right? And then the leadership woke up after some time and they’re like, hey, you know, we’ve invested so much, we’re getting people to write all these unit tests, and we have like 80% coverage, but still the bug leakage is very high. You know, we’re still not able to catch things going wrong, right? And that’s where I remember going to a few organizations and saying, well, how do you know the quality of your code and the quality of your tests is good? And they said, of course it is good because the coverage is very high. I was like, coverage just tells you, intentionally or accidentally, what got covered. It doesn’t tell you anything about the quality of the test.

Naresh Jain:
And that’s kind of when mutation testing was introduced into a lot of organizations. So what mutation testing does is it takes your source code and it mutates your source code. It basically changes certain things: for example, if there’s an and condition, it’ll turn it into an or condition. It’ll try to mutate your code, right? And it’ll produce mutants of that particular code. And then it’ll take your test cases and run them against the mutants. And it would expect that you should be able to kill all these mutants. You know, what does that mean? That basically means that the test should fail when you run it against the mutant, which shows that the test is actually able to catch silly mistakes or able to catch problems in the code, right? And if the mutant survived, that basically means your tests are no good, right? Like, they’re not able to catch these mutants. These are intentionally introduced bugs or errors in the code, and you would expect that your tests should catch them.
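As a minimal, hypothetical illustration (not the speaker's code), this is the kind of single-operator change a mutation tool makes before re-running your tests:

```java
class LastSeen {
    // Original: minutes elapsed since the last activity timestamp.
    static long minutesSince(long lastSeenMillis, long nowMillis) {
        return (nowMillis - lastSeenMillis) / 60_000;
        // A typical mutant: "replaced subtraction with addition".
        // If your tests still pass against this version, the mutant has
        // survived and the tests are too weak.
        // return (nowMillis + lastSeenMillis) / 60_000;
    }
}
```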

Naresh Jain:
And ideally, if you’ve written high quality code and high quality tests, then when you run this mutation test, no mutants will survive and everything will be caught. That’s what gives you confidence, and that is what mutation testing is about. Let’s look at this example. Okay? I have a very simple class here. Let me show you the tests. What I’m doing here, essentially, is I’m basically, you know, creating durations, and I’m verifying that it’s giving me the closest matching duration for a given set of inputs. What we will do here is we’ll quickly run these tests and we will see what happens.

Naresh Jain:
It’s a very small piece of code. It has only about five tests. And at this point, you will see all five tests have passed. The code here is just trying to find you the closest matching duration for a given last seen time. In a lot of apps, you would see this; in WhatsApp, for example, you’ll see last seen 1 minute ago or 5 hours ago or five days ago. So it gives you the closest matching duration based on your last activity time. Right. That’s what this little program is doing.

Naresh Jain:
So now let me basically run this guy, okay, and let’s see what happens. I’m basically running mutation testing. I’m using a tool called PIT for mutation testing. And then I’m also getting JaCoCo to produce my coverage report. All right, so it’s done its magic. I hope you can see the screen. So here it says, well, I have 100% branch coverage. I have 100% line coverage.

Naresh Jain:
So everything basically in this code is fully covered. So this should mean that essentially the code that I have written is very high quality, of course, and it’s covered, so there shouldn’t be any problem at all. And I should be able to ship this into production, correct? I’ve got 100% line coverage. I’ve got 100% branch coverage. What could possibly go wrong? Well, let’s quickly look at, you know, what the mutation testing report says. You know, PIT in this case generates a very similar report.

Naresh Jain:
And you would see that it’s saying, well, sure, you have full coverage in terms of line coverage, 100%, but your mutation coverage is actually only 17%. Right? So out of six mutants that it produced, your tests are able to catch only one of the mutants, which means the quality of the tests is pretty poor. Right. So let’s look at what actually went on here. Okay, I’m going to zoom in on this a little bit. And here it gives you the list of mutations that it performed, and it highlights the ones that survived in red, which is bad news, and the ones that it killed in green, which is good news. So you’d see "replaced long subtraction with addition". Right.

Naresh Jain:
So there is a subtraction that it replaced with an addition over here. And then it tried to see if, after making the change, your test would fail, which ideally it should have. Right. But in this case, my test did not fail. You know, it still continued to pass. So what do you think went wrong? Okay, well, what went wrong is basically, let’s go look at the tests. You would see that, you know, by mistake, instead of saying assert equals, I’ve basically said assert not null.

Naresh Jain:
So basically, whatever came back, just make sure that it’s not null, right? And I’m not really asserting it should be 1 minute ago, which is what it should have done, but I’ve just asserted it’s not null. And while this might look silly, I can tell you there’s so many places where you would see tests written this way. So let’s basically try and fix this. I’m just going to uncomment this out. And then let’s basically run all our tests. Let’s make this assert equals, and then I’m going to do the same thing. Just run this. Notice that it’s pretty fast.

Naresh Jain:
It gives you the mutation coverage pretty quickly. And okay, branch coverage is of course still a hundred. But now you’ll see the mutation score has also gone to 100%, which means that of all the different mutants it produced, our tests were able to catch and kill every one of them. So this is one example of how you basically ensure that your code is very resilient. You’re intentionally introducing these mutants and trying to verify whether your tests catch them or not. We are doing this more from a unit test point of view, but the concept applies across. The point of explaining this is to make sure that you understand the concept. That’s the first concept that we took inspiration from.
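To make the weak-versus-strong assertion concrete, here is a small self-contained sketch; the class name and logic are made up, not the demo's actual code. With the weak assertion the operator mutants survive, with the strong one they are killed. The mutation report itself comes from running PIT, for example with the PIT Maven plugin via "mvn org.pitest:pitest-maven:mutationCoverage".

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import org.junit.jupiter.api.Test;

class ClosestDurationTest {
    // Condensed stand-in for the class under test (hypothetical logic).
    static String closestDuration(long millisAgo) {
        long minutes = millisAgo / 60_000;
        return minutes < 60 ? minutes + " minutes ago" : (minutes / 60) + " hours ago";
    }

    @Test
    void weakAssertionLetsMutantsSurvive() {
        // Passes for any non-null result, so a mutated operator or boundary
        // goes unnoticed: the mutant survives.
        assertNotNull(closestDuration(5 * 60_000));
    }

    @Test
    void strongAssertionKillsMutants() {
        // Pins the expected value, so an operator mutant makes this test fail.
        assertEquals("5 minutes ago", closestDuration(5 * 60_000));
    }
}
```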

Naresh Jain:
There is a second concept that I’m going to introduce that we can take inspiration from. I’m hoping everyone’s clear with mutation testing. So I’m going to move ahead. I might run short of time. So the second concept that is important to understand is essentially property based testing. In property based testing, what we do is we basically define the system as a set of properties that it adheres to, and then generate lots and lots of different data to make sure that under all those conditions, those properties still hold true for the system. What does that mean? This seems a bit of a mouthful, right? So let’s try and look at an example and see if that kind of makes sense. So let’s look at this.

Naresh Jain:
I’m going to try and increase the font a little bit. Maybe too much. Okay, so I have a simple sorting test, right? Maybe before that, let’s look at this test. This is a little bit easier for people to grok. So what I’m trying to do is I’m trying to reverse a list, and I want to make sure that my reverse method is actually, you know, doing what it’s supposed to do. So one way that a lot of you might be used to writing unit tests is something like this, right? I would basically say, here’s a list: one, two, three. You know, this is my collection that I’m interested in testing, and specifically the reverse method inside that. And then I would say reverse this and then assert that it contains exactly three, two, one and is equal to this ordered list.

Naresh Jain:
Exactly three, two, one after reversing. This is typically how one would write unit tests for testing things like this, but this may not be sufficient. Then you might think, okay, what happens if the list contains only one element? Or what happens if the list is empty? Or what happens if the list has hundreds of elements? Like, do I need to sit and manually test each of these combinations? That might be too much to write. So what ends up happening is there is a property that you can state: when I reverse the list twice, I should get back the original list. That’s a property of reverse. Reversing something twice gives you back the original thing. So that’s what we are trying to do here: we are trying to say that reverse of reverse should be equal to the original list.
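In code, that round-trip property might look something like this. This is a sketch assuming a JVM property-based testing library such as jqwik; the demo's actual library and code may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;

class ReverseProperties {
    // Round-trip property: reversing any list twice yields the original list.
    // The library generates the inputs for us: empty, single-element,
    // large lists, and so on.
    @Property
    boolean reversingTwiceGivesBackTheOriginal(@ForAll List<Integer> original) {
        List<Integer> copy = new ArrayList<>(original);
        Collections.reverse(copy);
        Collections.reverse(copy);
        return copy.equals(original);
    }
}
```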

Naresh Jain:
And the original list now could be an empty list, could be a single item list, could be a multi item list. And essentially, if you run this test, then it would test all these different combinations for you. I’m just going to quickly run this. This is really not showing you the power of property based testing yet, but this is just showing you how to think in terms of properties. This is a round trip test, we would say, where you’re basically doing something twice to get back the original thing. So it’s a round trip in that sense. And that’s one of the common ways in which people generally do this. Sorry, I ended up clicking the wrong one.

Naresh Jain:
So you should see four tests and all four tests passing. This is fine, but I still had to write all these different combinations myself. So can I do something even better than this? So let’s go look at a slightly different property test, which will kind of make this even more interesting for you. I have a property that, you know, I should be able to sort a list, and after I sort the list, everything should be in a sorted order. So what I’m doing is I’m giving an unsorted list, I’m saying sort ascending, and then I’m verifying it’s sorted. Now, in this case, when I run this test, it’s going to go ahead and test it with several different combinations over here. And then it will tell me that with all those combinations, your sorting is working fine. But for some reason, let’s say I ended up making a mistake in my algorithm.

Naresh Jain:
Let’s say I did something like this. That’s my main method that I’m sorting. Then what should happen? Of course, this one still passed because I’m not really checking for the zero condition. And you can see that this guy is warning me, right? But like, the mistake could be me actually putting here zero, which means they are equal. Okay, will this catch it? And certainly it did. What was the error message? So it says it tried 22 different combinations. It tried one of these combinations, and after sorting, you know, it got this result back versus it should have got minus 10 and one back. And it’s tried several different combinations for you.

Naresh Jain:
And finally, it kind of distills down to one example that can demonstrate where things went wrong. So again, notice here, unlike in the previous case, I didn’t write all those different combinations myself. I simply defined this as a property, and this allowed it to simply go ahead and generate several different combinations for me. So I’m just going to fix this back and make sure that I leave the test in a working state. But that’s an example of property-based testing, where we are basically defining a property. In the previous case, it was a round trip.
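The sort property itself might look roughly like this; again a sketch assuming jqwik, whereas the demo sorted with a custom comparator, which is where the deliberate bug was introduced.

```java
import java.util.List;
import java.util.stream.Collectors;
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;

class SortingProperties {
    // Property: after sorting ascending, each element is <= the next one.
    // On a failure, the library shrinks the input down to a minimal
    // counterexample, as described above.
    @Property
    boolean sortedAscendingIsOrdered(@ForAll List<Integer> unsorted) {
        List<Integer> sorted = unsorted.stream().sorted().collect(Collectors.toList());
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i - 1) > sorted.get(i)) {
                return false;
            }
        }
        return true;
    }
}
```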

Naresh Jain:
In this case, we are saying: after sorting ascending, this is how the result should look. And this will then allow you to basically generate a whole bunch of different examples and make sure that under all kinds of conditions, your sorting is still working as expected. So this is like a second inspiration, from a testing and optimizing-your-testing point of view, to make things more resilient. Now, how do these two come together? So now let’s try and put all of these together, and then we should be able to quickly wrap up the session at the end of this. How much time do I have left, Shallabh?

Speaker A:
We have 14 minutes left.

Naresh Jain:
14 minutes. Perfect.

Speaker A:
Yes.

Naresh Jain:
Awesome. Okay, so that is good. I am at the right time.

Speaker A:
Yeah.

Naresh Jain:
So now remember, we go back to this example, and I want to take some of these learnings that I showed you, in terms of mutation testing and in terms of property based testing, and try and bring them back to a more real-world example where we are trying to test APIs, and see how that can be leveraged. So let’s say my system under test is the BFF layer. This is what I’m interested in testing. These are my dependencies for the BFF, and I want to basically abstract the dependencies away. Then the app itself is what would drive the test, but I don’t really want the app; I’m going to replace that with a different thing that would basically test the BFF for me. And I’m going to stub out the domain service and the Kafka piece so that I’m able to test under all different conditions and make sure that the BFF itself is resilient.

Naresh Jain:
Right. So by doing this, I would be able to inject faults, because that’s under my control now, and see if the BFF works fine. I might be able to give it all kinds of boundary case inputs and make sure that it performs correctly. But something has to guide us in terms of how to come up with all these boundary case tests and so forth. And that’s where, you know, we leverage an OpenAPI specification and we derive properties out of it. Right. So in the previous case, you saw that you had to write the property by hand, which is fine.

Naresh Jain:
But in this case, what we’re going to do is we’re going to take the OpenAPI specification or the AsyncAPI specification, we’re going to derive properties out of it at runtime, and then generate tests on top of that to be able to test whether your BFF is resilient or not. Okay, so let’s quickly jump into a live demo and see what happens. So I’m going to quickly go here to this example, and I’m going to run my contract test, and I’ve introduced an error intentionally here so that we can see what happens. Let me just show you the test that I have. Hopefully you can see this. This is the test that I have. So this is basically saying where the application is running, the host and the port.

Naresh Jain:
I am basically saying where my stubs for the dependencies are and where Kafka is running. And of course, a real Kafka broker is not running; we’ve stubbed that out. We have an in-memory stubbed-out version of the Kafka broker, and then we start our application, and then we have basically a teardown. That’s it. But where are the tests? I have a setup and I have a teardown. I don’t see any tests here. Okay, that’s interesting. So let’s see what happened here.

Naresh Jain:
Nothing should have run. Oh, because I’ve not written any tests, I expected nothing to run. But I see 90 tests have run here. You see, 90 tests have run, 81 have passed and nine have failed. Where did all these tests come from? Let’s look at one of these tests, which is basically saying I’m testing a positive scenario on post slash products and I’m testing for a 201 success case. Okay, so what we will see here is our test has made a post request to slash products, and essentially it has sent this payload as the input to the request, and then it’s got a response back from the server saying 200 OK, and it’s given an ID of eight. And it’s basically said, well, if that’s the case, then all of this looks good and I’m going to pass this test.

Naresh Jain:
And then it’s got another test where it’s kind of doing something very similar: iPhone, book, one. If you notice, in the previous case we were doing iPhone, gadget, 100; in this case we are doing iPhone, book, inventory one. And again we got a result back and everything looks good. So this guy is saying, well, I’ll pass this test. So it’s generated a whole bunch of these request payloads and it’s verified the response payload and made sure that it is as per the properties that we have derived from the specification. Where is our specification? Let’s look at it.

Naresh Jain:
So what you will see is this is my specification that I’ve opened, that is, the BFF API specification. Like I highlighted earlier, this is an OpenAPI specification. And essentially in this you would see there are three different paths: find available products, orders, and slash products. Slash products has only one method, which is a post method, and it can respond back with 201, 400 and 503. Okay, so this is really what, you know, this testing tool that we’re using here has gone and looked at, and then it’s derived certain properties from it, and based on that it’s generated a bunch of positive tests for us. But it didn’t just stop at a bunch of positive tests. You would also see that it’s generated a bunch of negative tests for us. So let’s look at one of the negative tests and see what happened here.

Naresh Jain:
Right. So again, we’re looking at post slash products, and it’s made a request, and notice here this time, as the name also suggests, the request body name string was mutated to null. Okay, so it’s intentionally mutating the name to null, but name itself, as you will see here in the request body, is a mandatory field and it’s a non-nullable field. Okay? So it cannot be null. But from what we had learned from mutation testing, I’m actually producing a mutant, not of the source code in this case, but a mutant of my request. And I’m basically sending that request and seeing whether the validations the developer has implemented are correct or not. The API developer, have they implemented the correct validations for this or not? And sure enough, when I send a null, the server responded back with a 400 bad request, and it’s given me the error message in the format I’m expecting. And it’s basically given a parse error saying you cannot instantiate this, which is fine.

Naresh Jain:
I mean, maybe it’s a little too leaky an abstraction at this point; it maybe could have summarized this a little better, but that’s fine. At least I’m happy that there is a validation in place. Right.
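In effect, each of those generated negative tests boils down to something a tester would otherwise write by hand, along these lines. The endpoint, port and payload fields here are assumptions based on the demo's description, not the tool's actual output.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class ProductsMutationTest {
    @Test
    void nullNameIsRejectedBecauseNameIsMandatoryAndNonNullable() throws Exception {
        // The generator mutated the mandatory "name" field to null.
        String body = "{\"name\": null, \"type\": \"book\", \"inventory\": 1}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/products")) // assumed BFF address
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The provider's validation should reject the mutated request.
        assertEquals(400, response.statusCode());
    }
}
```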

Shallabh Dixitt:
So Naresh, I think we are running out of time; we’re at 46 minutes already. Okay, yeah, sorry to interrupt. And we have one question from Neha, if you’d like to take it up in between.

Naresh Jain:
Sure, sure. Yeah.

Shallabh Dixitt:
Neha says that sometimes the backend for frontend is passing an incorrect response even though the correct data is saved in Redis, and they are also getting the correct response from Kafka itself. So how can this be identified?

Naresh Jain:
Yeah, so that’s a great question. I’ll answer the second piece first. Essentially, when a downstream dependency like Kafka or Redis or whatever is giving something back to you here, because you are able to stub it out, you have control over what it is going to give you. But if you’re using a real Kafka, then you don’t have control over it. And that’s generally the problem, where you won’t be able to do all kinds of resiliency testing. But in this case you’re actually able to stub it out, and in your test you’d be able to set expectations, like I showed you.

Naresh Jain:
Let me quickly jump here. If I go into my test resources, you will see there’s a whole bunch of stubs that have actually been generated. And in one of these stubs you will see I’ve intentionally put a delay of 5 seconds. Now I’m controlling how the domain service responds back. Or in this case, if Kafka were to not create a topic and I’m still putting something on that topic, or things like that, I’m intentionally introducing a delay over here and making sure that my service, when it times out, is handling it gracefully. So hopefully that’s kind of the answer: don’t rely on the actual database or the actual service or things like that, because you don’t have control over them. Instead, you’d be able to stub them out and then induce the fault and have control over that.

Naresh Jain:
And those can be controlled through your tests.
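As an illustration of that idea, here is a rough stand-alone equivalent using WireMock, which is not the tool used in the demo: the stub for the dependency adds a fixed delay so a test can verify that the BFF times out gracefully.

```java
import com.github.tomakehurst.wiremock.WireMockServer;
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.post;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

public class SlowDependencyStub {
    public static void main(String[] args) {
        // Stand-in for the domain service the BFF depends on (port is arbitrary).
        WireMockServer domainServiceStub = new WireMockServer(9090);
        domainServiceStub.start();

        // Respond correctly, but only after 5 seconds, to simulate a slow or
        // choked dependency and exercise the BFF's timeout handling.
        domainServiceStub.stubFor(post(urlEqualTo("/products"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"id\": 8}")
                        .withFixedDelay(5000)));
    }
}
```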
