Nanook: Open-Source Test Data Generation with Equivalence Classes

Manual test data wastes time and misses edge cases. Nanook generates systematic test data from spreadsheets — open source, CI/CD-ready, and used by Deutsche Bahn.

Author

xhub.io Team

Published

April 10, 2026

The Problem: Test Data Is Software Development's Blind Spot

Every developer knows this story. The feature is done, unit tests pass, everything is green. Then it hits production and breaks. Not because the code was wrong, but because nobody thought of that one edge case. The order with a negative quantity. February 29 in a leap year. The customer name with special characters.

The problem is rarely the code. It's the test data. Or more precisely: the lack of systematic test data.

Most teams create test data manually. A few standard cases, the obvious error scenarios, and then we hope it's enough. Spoiler: it never is.

The Method: Equivalence Class Tables

Before we talk about the tool, let's talk about the method. Because Nanook isn't just another data generator like Faker.js. It's the implementation of a proven testing methodology.

Equivalence class partitioning is a technique from systematic software testing. The idea: instead of testing every possible input value (impossible), we divide input values into classes. Values within a class are expected to behave identically, so if one representative works, the others should too.

An example. An age field accepts values from 0 to 150:

Class	Values	Expected
Valid	0–150	Accepted
Invalid (too small)	< 0	Rejected
Invalid (too large)	> 150	Rejected
Invalid (not a number)	"abc"	Rejected
Boundary	0, 150	Accepted

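Instead of testing all 151 valid values and an unbounded number of invalid ones, we test one representative per class plus the boundary values. Systematically. Completely. Traceably.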
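
As a minimal sketch of the age example, assume a hypothetical function under test called isValidAge. Neither the function nor the chosen representative values come from Nanook; they only illustrate how one value per class, plus both boundaries, exercises every row of the table:

```javascript
// Hypothetical function under test: accepts integer ages from 0 to 150.
function isValidAge(value) {
  return Number.isInteger(value) && value >= 0 && value <= 150;
}

// One representative per equivalence class, plus both boundary values.
const ageCases = [
  { cls: "valid", input: 42, accepted: true },
  { cls: "invalid (too small)", input: -1, accepted: false },
  { cls: "invalid (too large)", input: 151, accepted: false },
  { cls: "invalid (not a number)", input: "abc", accepted: false },
  { cls: "boundary (lower)", input: 0, accepted: true },
  { cls: "boundary (upper)", input: 150, accepted: true },
];

// Collect every case whose actual result differs from the expectation.
const failures = ageCases.filter((c) => isValidAge(c.input) !== c.accepted);
console.log(failures.length === 0 ? "all classes pass" : failures);
```

If the validator later changes, for example to an upper limit of 120, only the cases in the affected classes need to be revisited.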

The problem: creating these tables is easy. Generating matching test data from them is tedious. That's exactly where Nanook comes in.

What Nanook Does

Nanook connects two things that have traditionally been separate: planning test cases and generating test data.

Step 1: Plan Test Cases — in a Spreadsheet

You define your equivalence classes in a regular spreadsheet. Excel, LibreOffice, Google Sheets — whatever you prefer. No new software to learn, no special syntax. Just rows and columns.

Each column is an input field. Each row defines a class of values. Nanook understands the structure and knows which combinations need to be tested.
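
To make the idea of combining classes concrete, here is a rough sketch. It assumes a hypothetical in-memory form of the table and a full cartesian product over all fields; Nanook's actual spreadsheet layout and combination strategy are not described in this article and may differ:

```javascript
// Hypothetical in-memory form of an equivalence class table:
// one key per input field, one array entry per class of that field.
const table = {
  age: ["valid", "too small", "too large"],
  email: ["valid", "missing @"],
};

// Build every combination of classes (full cartesian product).
function combineClasses(fields) {
  return Object.entries(fields).reduce(
    (rows, [field, classes]) =>
      rows.flatMap((row) => classes.map((cls) => ({ ...row, [field]: cls }))),
    [{}]
  );
}

const combinations = combineClasses(table);
console.log(combinations.length); // 3 age classes x 2 email classes
console.log(combinations[0]);
```

A full product grows quickly as fields are added, which is one reason to plan the combinations in a table instead of enumerating them by hand.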

Step 2: Assign Data Generators

For each equivalence class, you define how concrete test data should be generated. Nanook includes generators for common data types:

Email addresses (valid and invalid)
Names (various character sets and lengths)
Dates (with boundary values and formats)
Numbers (ranges, decimals, signs)

Need something specific? No problem. Write custom generators in JavaScript. An IBAN generator, an article number generator matching your internal schema, a generator for your domain-specific codes — all possible.
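
As an illustration of what such a custom generator might compute, the sketch below builds German IBANs with correct check digits. Nanook's generator interface is not shown in this article, so these are plain JavaScript functions with hypothetical names (mod97, germanIban, invalidGermanIban), not a Nanook plugin:

```javascript
// Compute n mod 97 for a decimal string of arbitrary length, digit by digit,
// so the intermediate value always stays small.
function mod97(digits) {
  let remainder = 0;
  for (const ch of digits) {
    remainder = (remainder * 10 + Number(ch)) % 97;
  }
  return remainder;
}

// Build a German IBAN from an 18-digit BBAN (bank code + account number).
// The country code is appended as numbers (D=13, E=14) followed by "00";
// the check digits are 98 minus the remainder mod 97.
function germanIban(bban) {
  if (!/^\d{18}$/.test(bban)) {
    throw new Error("BBAN must be exactly 18 digits");
  }
  const check = 98 - mod97(bban + "131400");
  return "DE" + String(check).padStart(2, "0") + bban;
}

// Value for an "invalid IBAN" class: same BBAN, deliberately wrong check digits.
function invalidGermanIban(bban) {
  const valid = germanIban(bban);
  const wrongCheck = (Number(valid.slice(2, 4)) % 97) + 1; // never the valid check
  return "DE" + String(wrongCheck).padStart(2, "0") + bban;
}

console.log(germanIban("370400440532013000")); // DE89370400440532013000
console.log(invalidGermanIban("370400440532013000")); // DE90370400440532013000
```

Pairing a valid and an invalid generator like this gives each equivalence class of the IBAN field its own source of concrete values.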

Step 3: Generate and Export

One command. Nanook reads the table, combines the equivalence classes, invokes the generators, and writes the results. Choose JSON, CSV, or a custom format via a pluggable writer.

npm install @xhubio/nanook-table

The result: a complete set of test data that systematically covers all relevant equivalence classes. Reproducible. Versionable. CI/CD-ready.
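
To show roughly what a writer has to do, the following sketch serializes a few records to CSV and JSON. The record shape and the function names (csvField, toCsv, toJson) are assumptions made for illustration; Nanook's own writer interface is documented separately and is not reproduced here:

```javascript
// Hypothetical generated records; in practice these would come from the generators.
const records = [
  { age: 42, name: "Müller, Anna", expected: "accepted" },
  { age: -1, name: 'Say "hi"', expected: "rejected" },
];

// Quote a CSV field when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Serialize records to CSV, using the keys of the first record as the header.
function toCsv(rows) {
  if (rows.length === 0) return "";
  const header = Object.keys(rows[0]);
  const lines = rows.map((row) => header.map((key) => csvField(row[key])).join(","));
  return [header.join(","), ...lines].join("\n");
}

// Serialize the same records to pretty-printed JSON.
function toJson(rows) {
  return JSON.stringify(rows, null, 2);
}

console.log(toCsv(records));
console.log(toJson(records));
```

Proper quoting matters here because test data for special characters is exactly the kind of data that breaks a naive comma join.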

Why Not Just Use Faker.js?

Faker.js is great at what it does: generating realistic random data. But Faker.js solves a different problem.

	Faker.js	Nanook
Approach	Random data	Systematic data
Foundation	No methodology	Equivalence classes
Goal	Realistic dummy data	Complete test coverage
Boundary values	Random, if at all	Explicitly defined
Reproducible	Only with seed	Always
Planning	In code	In spreadsheet

Faker.js fills a database with test data. Nanook ensures the right test cases exist. Both have their place — but they solve different problems.
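
What "explicitly defined" boundary values means can be sketched in a few lines. The helper below is hypothetical and belongs to neither Nanook nor Faker.js; it lists the values just outside, on, and just inside each edge of an inclusive integer range:

```javascript
// Explicit boundary values for an inclusive integer range [min, max]:
// the values just outside, on, and just inside each edge.
function boundaryValues(min, max) {
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  // Remove duplicates that occur for very narrow ranges.
  return [...new Set(candidates)];
}

console.log(boundaryValues(0, 150)); // -1, 0, 1, 149, 150, 151
```

By comparison, a single uniform random draw from the integers 0 to 150 hits one specific boundary with a probability of only 1 in 151.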

In Production: Deutsche Bahn

Nanook isn't a hobby project. The toolkit is used at Deutsche Bahn, where systematic testing isn't optional — it's a necessity. When booking systems, timetable data, and customer information need to work together, random test data doesn't cut it.

The combination of structured test planning in spreadsheets and automated data generation has proven itself in one of Germany's most complex IT environments.

CI/CD Integration

Nanook is a Node.js module. That means it runs everywhere Node.js runs. In your pipeline, in your pre-commit hook, in your nightly build.

# In your CI pipeline
npx nanook generate --input tests/equivalence-tables/ --output tests/data/

Generated test data becomes part of your repo. Changes to equivalence class tables automatically produce updated test data. No manual steps. No forgotten updates.
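
On the consuming side, a pipeline step could run every generated case against the code under test and fail the job on a mismatch. The sketch below assumes a hypothetical case shape (input plus expected result) and a hypothetical checkOrder function; in a real setup the cases would be loaded from the generated files:

```javascript
// Hypothetical shape of generated test data: one object per test case.
// In a real pipeline this would be loaded from the generated JSON file.
const cases = [
  { input: { quantity: 1 }, expected: "accepted" },
  { input: { quantity: 0 }, expected: "rejected" },
  { input: { quantity: -5 }, expected: "rejected" },
];

// Hypothetical function under test.
function checkOrder(order) {
  return Number.isInteger(order.quantity) && order.quantity > 0 ? "accepted" : "rejected";
}

// Run every generated case and return the ones that do not match.
function runCases(testCases, fn) {
  return testCases
    .map((c) => ({ ...c, actual: fn(c.input) }))
    .filter((c) => c.actual !== c.expected);
}

const failed = runCases(cases, checkOrder);
if (failed.length > 0) {
  console.error(failed);
  process.exitCode = 1; // a non-zero exit code fails the CI job
}
```

Because the runner only depends on the data, new rows in an equivalence class table turn into new test cases without any change to the test code.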

Open Source and MIT Licensed

Nanook is MIT licensed. Free for personal and commercial use. No hidden costs, no enterprise-tier restrictions.

The complete source code is on GitHub. Fork it, extend it, contribute your own generators, or just use it.

Who Is Nanook For?

QA teams that need systematic test coverage, not just spot checks
Developers who want to generate test data as part of their CI/CD pipeline
Test managers who want to plan test cases in an understandable format (spreadsheet, not code)
Enterprise teams that need documented, reproducible test coverage

Get Started

Nanook is ready to use in minutes:

npm install @xhubio/nanook-table

Full documentation, tutorials, and API reference at nanook.xhub.io.

Source code on GitHub.

Questions about using Nanook in your project? Talk to us — we're happy to help, even if you're using the toolkit for free.