I’ve been working on one of my personal C++ projects, one that deals with processing bitmaps. Recently I’ve been thinking about how I use unit tests when I develop software, and in particular in this project, where the data is all about 2D maps and the algorithms that operate on them.
In my younger days of writing code I’d just write “stuff that worked”. No tests, no version control. It got stuff done, but in the long run it was hard to maintain. When I was first introduced to writing unit tests it didn’t quite “click”. It felt slow and awkward. I felt it got in the way of “getting stuff done”.
These days I see the value of tests very differently. Not only how it helps long term maintenance, but also because it forces you to slow down and think before committing to writing the business logic. I like test driven development as a design tool to get a better feel of whether the interface makes sense. It also makes it clear when you have undesired dependencies.
The anatomy of tests
I’m a fan of very small tests. Atomic tests. Tests whose name will tell me exactly what failed. Tests that are easy to debug when stuff goes wrong. Tests where I can set a breakpoint early in the test and quickly step into the heart of the problem. It’s when tests fail that they are useful, not when they succeed. I don’t like debugging tests like this:
TEST(Length2DTest, Operations)
{
// Operator +
auto length = Length2D(1.2f) + Length2D(2.5f);
EXPECT_DOUBLE_EQ(3.7f, length.Value());
// Operator -
length = Length2D(2.5f) - Length2D(1.2f);
EXPECT_DOUBLE_EQ(1.3f, length.Value());
// Operator *
length = Length2D(1.2f) * Length2D(2.5f);
EXPECT_DOUBLE_EQ(3.0f, length.Value());
// Operator /
length = Length2D(6.4f) / Length2D(2.0f);
EXPECT_DOUBLE_EQ(3.2f, length.Value());
// Operator ==
EXPECT_EQ(Length2D(3.2f), Length2D(3.20001f));
// Operator !=
EXPECT_NE(Length2D(3.2f), Length2D(3.21f));
}
If any one of these expectations should fail, the log of the test run won’t immediately tell you which operation failed. You do get a file and line number so you see it quickly when you look up the source, but that’s just half the story. Now you need to debug; set a breakpoint and step through the code. It is particularly in this scenario that these test-all-the-things tests fall short, as you might want to set a breakpoint deeper into the code. You end up with noise as the other things being tested trigger your breakpoint, and you then have to carefully check the call stack until it breaks where you want it to, or you set a breakpoint higher up the call chain and then enable the deeper one while repeatedly hitting continue. This is getting better as IDEs are starting to support chained breakpoints, but it’s still churn.
Another issue is when you need to use ASSERT instead of EXPECT and the test terminates early. That might hide failures in other functions from your test reports. Having that overview from your test automation system can be useful as it provides more context to the failure.
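As a small, generic illustration of why that matters (this is not code from the project): a failing ASSERT_* returns from the test function immediately, so the remaining EXPECT_* checks never run and never show up in the report:

#include <gtest/gtest.h>

#include <vector>

TEST(AssertVsExpectTest, FatalAssertStopsTheTest)
{
    const std::vector<int> values{1, 2, 3};

    // A failing ASSERT_* returns from the test function right here, so the
    // EXPECT_* checks below never run and never get reported.
    ASSERT_EQ(4u, values.size());

    EXPECT_EQ(1, values.front());
    EXPECT_EQ(3, values.back());
}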
That is why I prefer tests that look more like this, atomic:
TEST(Length2DTest, OperatorPlus)
{
const auto length = Length2D(1.2f) + Length2D(2.5f);
EXPECT_DOUBLE_EQ(3.7f, length.Value());
}
TEST(Length2DTest, OperatorMinus)
{
const auto length = Length2D(2.5f) - Length2D(1.2f);
EXPECT_DOUBLE_EQ(1.3f, length.Value());
}
TEST(Length2DTest, OperatorMultiply)
{
const auto length = Length2D(1.2f) * Length2D(2.5f);
EXPECT_DOUBLE_EQ(3.0f, length.Value());
}
TEST(Length2DTest, OperatorDivide)
{
const auto length = Length2D(6.4f) / Length2D(2.0f);
EXPECT_DOUBLE_EQ(3.2f, length.Value());
}
TEST(Length2DTest, OperatorEqual)
{
EXPECT_EQ(Length2D(3.2f), Length2D(3.20001f));
}
TEST(Length2DTest, OperatorNotEqual)
{
EXPECT_NE(Length2D(3.2f), Length2D(3.21f));
}
These are simple tests for a simple type, so it’s not the worst example, but as complexity grows the value of atomic tests increases.
It’s exactly the same rationale for having small functions that do one thing. They are easier to reason about. And the headline-like comments in the first example are, like in any other function, a hint that something should be broken down into smaller units.
In the same vein of simplicity and single responsibility I prefer to write tests that don’t require separate setup or teardown. That’s often a source of slow and complex tests. It quickly adds slowness when the setup creates everything for every test, even though each test only needs a fraction of it. When stepping through a test with the debugger you don’t want to guess which test case variables are relevant. Instead I often use helper functions in the test itself to set up the preconditions per test.
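As a minimal sketch of the idea (the helper and the values below are made up for illustration, not taken from the project):

#include <gtest/gtest.h>

#include <cstddef>
#include <vector>

namespace {

// Hypothetical helper local to the test file: each test builds exactly the
// data it needs instead of relying on a shared fixture's SetUp().
std::vector<int> MakeThresholdRow(std::size_t width, int value)
{
    return std::vector<int>(width, value);
}

}  // namespace

TEST(SetupHelperTest, RowIsFilledWithValue)
{
    const auto row = MakeThresholdRow(4, 255);
    ASSERT_EQ(4u, row.size());
    EXPECT_EQ(255, row.front());
    EXPECT_EQ(255, row.back());
}

Each test calls only the helpers it needs, so there is no shared state to reason about when stepping through with the debugger.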
The extended purpose of tests
Debug-friendly tests are, I feel, somewhat underrated. Often the focus is on getting coverage and a green build, ignoring that the tests are several hundred lines long and test everything in a single test. Good tests help you understand the code and data you are working with. They are easier to step through with a debugger should they fail. And using them in test driven development also makes them useful design tools.
Example: Analysing bitmaps
In my project, where I’m tracing features from the content of bitmaps, there is first a filtering step that compares the pixels against a threshold and returns a new set of data. This is represented as a ThresholdMap type that dictates whether each pixel is within the threshold or not. Then each island, or cluster, of connected pixels is extracted into an individual data unit. Below is an example of a test for this logic:
TEST(TraceMapTest, FindClustersOnePolygon)
{
const auto map = ThresholdMapFromASCII({8, 6},
" "
" xxxx "
" xxxxx "
" xxx "
" "
" ");
const auto clusters = FindClusters(map, 0);
ASSERT_EQ(1, clusters.size());
// clang-format off
const std::vector<Point2D> expected_points{
{1, 1}, {2, 1}, {3, 1}, {4, 1},
{2, 2}, {3, 2}, {4, 2}, {5, 2}, {6, 2},
{3, 3}, {4, 3}, {5, 3}
};
// clang-format on
const auto& cluster = clusters.at(0);
ASSERT_EQ(expected_points.size(), cluster.Pixels().size());
ASSERT_TRUE(std::ranges::is_permutation(expected_points, cluster.Pixels()));
}
The setup here is done by a helper function that lets me create the ThresholdMap using a small ASCII art string. I find this a much easier and faster way to create test data than typing out a raw array or vector of values. I could of course have created separate bitmaps, loaded those and passed them to the threshold logic, but I prefer to have a full overview of the input data in the test logic. From experience I find that it quickly becomes difficult to keep track of which tests are using which set of test files. It’s easy to break a test because you made a tweak to a file for another test. Inlining the data into the tests keeps it isolated and in context. This works well for smaller data sets, which are also the data sets that are easiest to debug.
The ThresholdMapFromASCII helper function helps to avoid pulling in the logic of loading a bitmap and parsing it. It just transforms the sequence of characters into the data set needed for ThresholdMap.
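For illustration, here is a rough sketch of how such a helper could look. The real helper builds the project’s ThresholdMap from a dimension and a string, so treat the simplified signature below, which returns a flat row-major vector of booleans, as an assumption rather than the actual implementation:

#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <vector>

// Simplified stand-in for the real helper: any non-space character marks the
// pixel as being within the threshold. The project's version returns a
// ThresholdMap; here a flat row-major vector of booleans plays that role.
std::vector<bool> ThresholdMapFromASCII(std::size_t width, std::size_t height,
                                        std::string_view ascii)
{
    if (ascii.size() != width * height)
    {
        throw std::invalid_argument("ASCII data does not match the dimensions");
    }
    std::vector<bool> map;
    map.reserve(ascii.size());
    for (const char c : ascii)
    {
        map.push_back(c != ' ');  // index = y * width + x
    }
    return map;
}

Whatever the exact return type, the point is that the string in the test is the data, laid out the same way the bitmap would be.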
Visualizing the test data
Before I created the ASCII helpers for this project, the data was hard to reason about as it was laid out linearly in the source code. An example of this can be seen in the expected data set:
// clang-format off
const std::vector<Point2D> expected_points{
{1, 1}, {2, 1}, {3, 1}, {4, 1},
{2, 2}, {3, 2}, {4, 2}, {5, 2}, {6, 2},
{3, 3}, {4, 3}, {5, 3}
};
// clang-format on
Note that clang-format was turned off to prevent all the items from being reflowed into one consecutive series. The data here doesn’t have the exact shape of the expected data when viewed as a bitmap, but at least it’s easier to see which points belong to which row. Compare that to a purely linear representation:
const std::vector<Point2D> expected_points{{1, 1}, {2, 1}, {3, 1}, {4, 1},
{2, 2}, {3, 2}, {4, 2}, {5, 2},
{6, 2}, {3, 3}, {4, 3}, {5, 3}};
Same data, but a whole lot harder to reason about at a glance. It’s the kind of scenario where I’d end up sketching out the data on a piece of paper. If that test should fail I’d much rather see my data visualized in a way that is closer to what it represents.
Visualizing the test results
Once the clusters of features are extracted the pixels can be classified. The tests for this are similarly visual:
TEST(ClassifyTest, ClassifyPixelLShape)
{
const auto map = ThresholdMapFromASCII({8, 7},
" "
" xx "
" xx "
" xxxxx "
" xxxxx "
" xxxxx "
" ");
const auto expected = PixelTypeGridFromASCII({8, 7},
" "
" CC "
" EE "
" CEECE "
" EIIIE "
" CEEEC "
" ");
ASSERT_TRUE(IsExpectedClassification(map, expected));
}
Again I’m making use of test helpers that allow me to visualize the input and the expected result. But how to make sense of the data when the test fails? Originally IsExpectedClassification was a simple loop that compared every pixel, using GoogleTest’s EXPECT_ assertions to report any failures:
bool IsExpectedClassification(const ThresholdMap& map, Grid<PixelType> expected)
{
bool success = true;
for (size_t i = 0; i < map.size() - 1; ++i)
{
const auto position = expected.Position(i);
const auto type = ClassifyPixel(map, position);
const auto expected_type = expected.at(i);
EXPECT_EQ(expected_type, type) << "Position: " << position;
if (type != expected_type)
{
success = false;
}
}
return success;
}
That yielded a failure report like this:
C:\Users\Thomas\SourceTree\traceup\cext\tests\image\classify_tests.cpp(23): error: Expected equality of these values:
expected_type
Which is: PixelType::Edge
type
Which is: PixelType::Corner
Position: Point2D(4, 3)
C:\Users\Thomas\SourceTree\traceup\cext\tests\image\classify_tests.cpp(23): error: Expected equality of these values:
expected_type
Which is: PixelType::Edge
type
Which is: PixelType::Corner
Position: Point2D(5, 4)
This precisely reports which pixel failed the expectation, but I found that I still needed to better understand what the full data looked like.
A quick revision converted it into a custom assertion function:
testing::AssertionResult IsExpectedClassification(const ThresholdMap& map,
Grid<PixelType> expected)
{
bool success = true;
Grid<PixelType> actual(expected.Dimension());
for (size_t i = 0; i < map.size() - 1; ++i)
{
const auto position = expected.Position(i);
const auto type = ClassifyPixel(map, position);
actual[i] = type;
const auto expected_type = expected.at(i);
EXPECT_EQ(expected_type, type)
<< "Position: " << position;
if (type != expected_type)
{
success = false;
}
}
if (!success)
{
std::string out;
for (int y = 0; y < actual.Height(); ++y)
{
for (int x = 0; x < actual.Width(); ++x)
{
const auto type = actual.at({x, y});
switch (type)
{
case PixelType::Empty:
out.push_back(' ');
break;
case PixelType::Corner:
out.push_back('C');
break;
case PixelType::Edge:
out.push_back('E');
break;
case PixelType::Interior:
out.push_back('I');
break;
}
}
out.push_back('\n');
}
return testing::AssertionFailure() << "Actual:\n" << out;
}
return testing::AssertionSuccess();
}
The failure report is now displaying an ASCII map of the actual data:
C:\Users\Thomas\SourceTree\traceup\cext\tests\image\classify_tests.cpp(23): error: Expected equality of these values:
expected_type
Which is: PixelType::Edge
type
Which is: PixelType::Corner
Position: Point2D(4, 3)
C:\Users\Thomas\SourceTree\traceup\cext\tests\image\classify_tests.cpp(23): error: Expected equality of these values:
expected_type
Which is: PixelType::Edge
type
Which is: PixelType::Corner
Position: Point2D(5, 4)
C:\Users\Thomas\SourceTree\traceup\cext\tests\image\classify_tests.cpp(153): error: Value of: IsExpectedClassification(map, expected)
Actual: false (Actual:
CC
CE
CECCE
EIIIE
CEEEC
)
Expected: true
This allows me to better understand the context of the failures and the data I was working with. It saves me from having to manually visualize this on a piece of paper.
The lifespan of tests
When tests are written as something that aids both design and debugging, their usefulness improves. It takes into account that code is a living thing that must be maintained over time, often by many different people.
If it’s not easy to add tests, you’re less likely to have as many as you should, and they are probably not as good either. When tests are part of the design process they feel less like an obstacle and more like a helpful tool.
It quickly makes more sense to think about the ergonomics of tests when they are not just an afterthought to please the coverage report and achieve a passing build. How the input and expected conditions are presented, as well as how failures are reported, matters for test ergonomics. Writing small utilities for this is well worth it in the long run.
When practicing test driven development you need the same qualities in a good test as when you debug a failing one: you want tests that are small and simple to understand.
Most programming tools for running unit tests are text based, and in most cases that is sufficient. But the more you deal with 2D and 3D data, the more likely you are to wish you could easily visualize it. In many cases this is solved by dumping a bitmap which can then be inspected. Dealing with 2D and 3D data in some form is not that uncommon, however, and I think it makes sense to have better tooling and IDE integration for visualizing it in context. For instance, a unit test reporting scheme that can include references to external output, such as an image, and embed it inline in the test result presentation. It would be interesting to see what tooling would appear with a general scheme like that available in the most common code editors.
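As a rough sketch of that bitmap-dumping workaround (the helper below is illustrative, not part of any framework or of my project), a test could write the actual data as a PGM image and mention the file in the failure message, for instance via GoogleTest’s ADD_FAILURE():

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Illustrative helper: dump a grayscale grid (width * height bytes, row-major)
// as a binary PGM image that any image viewer can open.
void WritePgm(const std::string& path, std::size_t width, std::size_t height,
              const std::vector<unsigned char>& pixels)
{
    std::ofstream out(path, std::ios::binary);
    out << "P5\n" << width << ' ' << height << "\n255\n";
    out.write(reinterpret_cast<const char*>(pixels.data()),
              static_cast<std::streamsize>(pixels.size()));
}

// In a test:
//   WritePgm("actual.pgm", width, height, actual_pixels);
//   ADD_FAILURE() << "Actual data written to actual.pgm";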
I’m curious what solutions people are currently using to better understand the data in their applications. Links or references are welcome!