Understanding Automated Testing

Introduction

Hello there! Today’s topic is a complicated one that is often misunderstood, even by mid-level and senior developers: Automated Testing. I struggled to understand its value and purpose for almost my entire career. If you feel lost when it comes to testing, don’t worry, you’re not alone. Of course, these topics are complex by nature, and it’s not easy to make sense of them immediately. But I don’t think that’s the main issue. The actual problem comes from learning from people who don’t understand it themselves. The best available testing resources are very technical and hard to digest. As a result, companies and developers apply automated testing without understanding it. They’re stuck with technical terms and bad examples. Well, it doesn’t have to be that complicated if done right.

In our “Understand Domain-Driven Design” series, I briefly explained the importance of behavior testing. In this series, we’ll delve deeper into descriptions, examples, code illustrations, and best practices. We’re going to question what we know about testing and understand more.

Shout out to my colleague Sebastien Garcia, for his efforts in teaching automated testing. He inspired me to write this article!

What is Automated Testing?

Automated testing is proof of your software’s expected behavior. It’s documentation in which you state that your instructions lead to an expected outcome. A test either passes or fails, and if a test case fails, it means that your code is not producing the expected result. Properly designed test cases can help pinpoint the underlying issues. Poorly designed test cases, on the contrary, can hide potential issues.

Can’t we just test our software manually? Yes, we can, and manual testing has its place, such as User Acceptance Testing (UAT) or User Interface (UI) testing, where we ensure everything looks and functions fine. But this type of testing is time-consuming and expensive. It also stays on the surface, so it’s almost impossible to ensure the stability of the system being tested, especially in large enterprise projects with dozens, if not hundreds, of use cases. Those use cases are complicated and require many different steps. You might argue, “Well, test cases are created by developers who can make mistakes.” Yes, they can. Our goal is to understand why those mistakes happen and how to reduce their frequency. To do this, we need to answer some fundamental questions about testing.

What’s The Purpose Of Testing?

As I explained in the Understanding DDD series, programming is about solving real-world problems. That’s why we focus on Use Cases rather than technical terms. We want to get feedback fast when we’re implementing those use cases. Getting that feedback from users is pretty expensive and, most of the time, means a terrible User Experience (UX). Imagine you go to a gym where the equipment works only half the time, breaks while you’re using it, or is already broken. Would you want to go to that gym again? The same goes for software products. If your product fails to meet expectations, your users will be frustrated and look for alternatives. We want to avoid that, so we want to get that feedback as early as the development stage. Think about times when your work was stressful. The stress usually comes from uncertainty and insecurity about your or your company’s development approach. The purpose of automated testing is to eliminate that uncertainty and insecurity. We want to solve our problems confidently. We also want to bring better UX to the table.

No Value Equals No Tests

Early in my career, I couldn’t quite grasp the idea of automated testing. I joined a company where people practiced automated testing, and I was pretty happy to learn. I quickly started to read the existing tests. There were tests for basic ValueObjects (VO) without behavior, DataTransferObjects (DTO), and configurations. Tests with many mocks only asserted the output we had configured on those mocks ourselves. There were a lot of assertions about every possible detail. If we were testing a database query, we wouldn’t only assert against the result but also against the underlying SQL string and the row count. Testing felt like a burden. It didn’t add any confidence, and I didn’t feel great writing those tests. I mostly didn’t even read testing-related code when I reviewed merge/pull requests because it was insanely complicated every time. I didn’t understand why we created those cases, but of course, I followed along and created similar ones. I thought that was testing.

The truth is nobody knew what they were doing. This type of testing doesn’t bring any value for developers or users and, therefore, no value for the business. You don’t need to write tests for every little detail. Sometimes it’s redundant. You don’t need to write a unit test for a DTO that will already be validated when you validate the behavior of the higher-level components. Even for higher-level components, you don’t need to test against implementation details such as the SQL string. That detail will already be validated when you test against the expected outcome. The result is what matters: running the query should produce what we expect. If your SQL string is wrong in the first place, you won’t get the desired outcome anyway, so you don’t need to assert specifically against the SQL string.

Don’t Overlook Small Functions

There is something else about the value of testing. Sometimes it feels like tests don’t add value when the functionality is pretty small. That’s not always the case. The real value of testing lies in catching potential issues before they make their way into production and are discovered by users. It’s true that complicated solutions solve complicated problems, and their tests bring more value. But even a small function can benefit from testing. Let’s imagine the following JavaScript function:

function sum(a, b) {
   return a + b;
}

I love this example because it’s so simple: the function is supposed to add two numbers. It looks like we don’t really need to validate its behavior since it’s obvious, right? Well, that’s not the case. If you pass a string instead of a number, JavaScript concatenates instead of adding, so sum('1', 2) returns the string '12' rather than the number 3. Even though the function is tiny, there can be hidden bugs.

What’s Test-Driven Development (TDD)?

Test-Driven Development (TDD) is a programming technique. It helps us create a small feedback window, so we can verify at the earliest stage whether our scenarios lead to the expected outcomes. It’s not only about writing the tests first or covering every line with tests. It’s about getting fast feedback on your design. That’s why TDD is both a testing and a design approach. Initially, you define a list of scenarios without implementation details. You then build those scenarios one after the other: write a test for the scenario, anticipate possible problems with your design, and adjust the design until the test case passes. Repeat the steps until you cover all the scenarios. This approach is often misunderstood, even by experienced folks. TDD is about understanding what your system is supposed to do and validating its behavior in tiny steps. It’s a specification technique where you verify all your assumptions.

What’s the Arrange-Act-Assert (AAA) Approach?

The Arrange-Act-Assert (AAA) or 3A is a testing technique that focuses on isolating the test’s state and expected outcome. Each “A” represents a different step. It’s similar to the Given-When-Then approach.

Arrange refers to the initial setup of your case. It could be something like “I need a repository to save a post” or “I need an author to create a post”. Essentially, it’s about the requirements to execute the task.
Act involves an action or set of actions, such as logging in as a user and dispatching a command.
Assert means comparing the expected outcome to the actual result.

I prefer the AAA approach because it helps me focus on the behavior rather than the implementation, which reduces the coupling between tests and implementation details. It also makes test cases easier to formulate.

Types of Software Testing

Software systems are complex structures. There are different environments, layers, and building blocks. In an ideal world, we would have unit tests for every building block to validate their behavior, ensuring that all units are verified individually and work when they’re combined. However, that’s not the reality. The reality is we use open-source libraries and configurations from other sources. Even though you might have a perfect unit testing suite, it’s hard to ensure the integration of different components. Imagine you’re using a database or framework. How can you ensure that integration works as intended by only unit testing your blocks? You can’t. Therefore, you need a different type of validation: a testing type where you can boot up the framework configuration and test that configuration altogether. Sometimes, you want to test your system through a specific protocol and need a client to perform that test.

Testing types are mainly categorized as White-box testing, Black-box testing, and Gray-box testing.

White-box Testing

White-box testing is mainly concerned with a single unit. The tester has access to the internal design. This type of testing focuses on individual components to ensure they function correctly in isolation. It’s like validating individual pieces of computer hardware without assembling the whole computer.

Black-box Testing

Black-box testers only have access to the inputs and outputs of the application, not to the internal design. This type of testing ensures that the application behaves as expected for the end user and meets its requirements. Functional testing and end-to-end testing are some of the most common types of Black-box testing.

Gray-box Testing

Gray-box testing combines White-box testing and Black-box testing. The tester partially knows about the internal design and validates the outcome using that knowledge. One of the best-known forms of Gray-box testing is Integration testing. For example, a tester might have access to the application container and focus on the interaction between the dispatched command and other components. Another example is when a tester knows about the database and focuses on specific data interactions between different parts.

Differences Between Testing Types

All types of testing are validating the behavior. They validate it from different perspectives. The differences come from how we build our test cases and what we want to validate about our design. All tests should be isolated and use shared fixtures only for static data like postal codes, countries, currencies, device types (e.g., desktop, tablet, mobile), file types, user roles (e.g., admin, editor, viewer), etc. Be cautious about overusing shared fixtures, as it might lead to unintended dependencies between tests.

Let’s create a functional test case to understand what Black-box testing means. We’ll also see examples from other types. The examples will be simplified versions for demo purposes.

Black-box Testing: Functional Test Case

#[Test]  
public function it_books_flight_ticket(): void  
{  
    // Arrange  
    $departureDate = '2024-02-17';  
    $returnDate = '2024-02-17';  
    $origin = ['airportId' => 'BER', 'city' => 'Berlin'];  
    $destination = ['airportId' => 'ADB', 'city' => 'İzmir'];  

    self::createAvailableFlightsBetween(
        $departureDate,
        $returnDate,
        $origin,
        $destination,
    );  

    $client = new HttpClient();  
    $request = new Request(
        url: self::BOOK_FLIGHT_ENDPOINT,  
        method: RequestMethod::POST,  
        body: [  
            'origin' => $origin,  
            'destination' => $destination,  
            'depart' => $departureDate,  
            'return' => $returnDate,  
        ],  
    );  

    // Act  
    self::loginAsCustomer();
    $actualResponse = $client->send($request);  

    // Assert  
    $expectedResponse = new Response(  
        ['message' => 'BOOKING_SUCCESS', 'context' => [...]],  
        200,  
    );  
    self::assertEquals($expectedResponse, $actualResponse);  
}

In the example above, we don’t know how the internals work. We act as a client, sending an input and receiving an output.

Gray-box Testing: Integration Test Case

On the other hand, Gray-box testers know about parts of the internal design. For example, they can access the application container. This type of testing ensures that the application is functioning at the code level and identifies bugs or issues that may not be visible from the client’s perspective. Imagine a scenario where you want to ensure a specific notification is triggered when a new account is created.

#[Test]  
public function it_sends_new_account_notification_on_account_creation(): void  
{  
    // Arrange
    $role = self::sharedFixtures()->defaultRole();

    $command = new CreateAccount(  
        username: 'SomeCoolUsername',  
        password: 'Som#3Pa$$%word',
        role: $role,  
    );

    // Decorate the NotificationDispatcher, so we can trace the dispatched notifications.  
    $spyNotificationDispatcher = new TraceableNotificationDispatcher(  
        self::getService(NotificationDispatcher::class),  
    );  
    // Replace the actual service with the decorated one, in the application container.
    self::setService(NotificationDispatcher::class, $spyNotificationDispatcher);  

    // Act
    self::dispatchCommand($command);

    // Assert  
    $expectedNotification = new AccountCreatedNotification(username: 'SomeCoolUsername');
    self::assertTrue($spyNotificationDispatcher->hasNotificationBeenDispatched($expectedNotification));  
}

Here, we have access to the application container. Instead of validating the created account, we focus on the secondary behavior, the correct notification dispatch through integration.

White-box Testing: Unit Test Case

This type of testing focuses on single-unit behavior. Because we don’t focus on integrations here, we provide test doubles, such as a Stub, Spy, or InMemoryRepository, wherever the unit relies on other dependencies. That helps us verify the behavior of the given unit in isolation. We don’t need a container, framework, or anything else that you would need to run an application. That’s why unit tests are pretty fast.

#[Test]  
public function it_creates_post(): void  
{  
    // Arrange
    $postRepository = new InMemoryPostRepository();  

    $time = new DateTimeImmutable();  
    $frozenClock = new StubClock($time);

    $author = Author::create(  
        email: 'test@example.com',  
        name: 'Test User',  
    );

    $useCase = new CreatePost(  
        $postRepository,  
        new SpyEventDispatcher(),  
        $frozenClock,
    );  

    // Act  
    $useCase->create($author, 'Post Title');  

    // Assert
    $actualPost = $postRepository->getLastCreatedPost();  

    $expectedPost = Post::createWithId(  
        id: $actualPost->id,  
        author: $author,  
        title: 'Post Title',  
        createdAt: $time,  
    );  

    self::assertEquals($expectedPost, $actualPost);  
}

What Would Be Considered a Bad Test?

Imagine you have a console command that converts CSV files into database entries.

#[AsCommand(name: 'app:migrate-users')]  
final class StoreCsvFileCommand extends Command  
{  
    protected function execute(InputInterface $input, OutputInterface $output): int  
    {
        $entry = $this->csvToDbConverter->convert($input->getArgument('csvPath'));
        $this->bookingRepository->save($entry);

        $output->writeln('CSV content successfully converted!');

        return Command::SUCCESS;
    } 
}

Typically, if you don’t understand the behavior’s importance, the test becomes an implementation test.

#[Test]
public function test_execute(): void ❌ (A)
{
    $applicationContainer = // ...;  
    $command = $applicationContainer->get(StoreCsvFileCommand::class); 
    $commandTester = new CommandTester($command);  
    $commandTester->execute(['csvPath' => 'someFile.csv']);

    // A private assertion method to validate command name
    self::assertLastExecutedCommandName('app:migrate-users'); ❌ (B)

    $actualOutput = $commandTester->getDisplay();  
    $expectedOutput = 'CSV content successfully converted!';  
    self::assertSame($expectedOutput, $actualOutput); ❌ (C)
}

Let’s review this test case together. I’ve already marked some problems with the ❌:

A: The name doesn’t provide any information about the use case and its behavior.
B: We are already retrieving the command by its class name. We don’t care about its configuration name. The configuration name doesn’t validate anything about the command’s behavior.
C: The output text of the command is irrelevant when it comes to its behavior. The behavior is to store CSV entries in the database, not to output a particular text.

How could we change this test case?

#[Test]
public function it_converts_csv_file_into_database_entries(): void ✅ (A)
{
    $applicationContainer = // ...;  
    $command = $applicationContainer->get(StoreCsvFileCommand::class);
    $commandTester = new CommandTester($command);  
    $commandTester->execute(['csvPath' => 'someFile.csv']); 

    $expectedEntries = [...]; // Array of expected entries
    $actualEntries = $applicationContainer->get(ImportedBookingRepository::class)->findAll();
    self::assertEquals($expectedEntries, $actualEntries); ✅ (C)
}

A: We added a name that describes what the use case is about.
B: The command name is not that important. It doesn’t impact the behavior; it’s a configuration detail. Once you validate this command’s behavior, you already know you’re testing the correct command, implicitly confirming the command name.
C: We asserted against the expected outcome, not the console output. Once again, we can change this output message at any time without affecting the use case itself. It’s an implementation detail.

Conclusion

I’ll list some of the principles of effective testing that I regularly use. I follow them with rare exceptions.

Test behavior, not implementation details.
Test abstractions, not implementations.
Test the public API, meaning what a unit exposes to the outside.
Don’t test private methods; they’re implementation details.
Isolate all tests, and use shared fixtures only for static data such as postal codes, countries, and currencies.

Great job finishing the article! I know it’s a lot of information to digest, but it gets easier once you start applying it. We’ve learned about Automated Testing, the purpose of testing, Test-Driven Development (TDD), and the importance of behavior testing. Feel free to share your thoughts in the comments! I hope to publish every week on various software-related topics. I’m pretty sure I’ll share a lot more about testing!

If you enjoyed this article, let’s connect on https://twitter.com/akmandev and https://www.linkedin.com/in/ozanakman for more content like this!

Ozan Akman