Nanook: Open-Source Test Data Generation with Equivalence Classes
Manual test data wastes time and misses edge cases. Nanook generates systematic test data from spreadsheets — open source, CI/CD-ready, and used by Deutsche Bahn.

The Problem: Test Data Is Software Development's Blind Spot
Every developer knows this story. The feature is done, unit tests pass, everything green. Then it hits production — and it breaks. Not because the code was wrong, but because nobody thought of that one edge case. The order with a negative quantity. The date in February with 29 days. The customer name with special characters.
The problem is rarely the code. It's the test data. Or more precisely: the lack of systematic test data.
Most teams create test data manually. A few standard cases, the obvious error scenarios, and then we hope it's enough. Spoiler: it never is.
The Method: Equivalence Class Tables
Before we talk about the tool, let's talk about the method. Because Nanook isn't just another data generator like Faker.js. It's the implementation of a proven testing methodology.
Equivalence class partitioning is a technique from systematic software testing. The idea: instead of testing every possible input value (impossible), we divide input values into classes. Values within a class are expected to be handled the same way by the system under test, so if one representative works, the rest should too.
An example. An age field accepts values from 0 to 150:
| Class | Values | Expected |
|---|---|---|
| Valid | 0–150 | Accepted |
| Invalid (too small) | < 0 | Rejected |
| Invalid (too large) | > 150 | Rejected |
| Invalid (not a number) | "abc" | Rejected |
| Boundary | 0, 150 | Accepted |
Instead of testing all 151 valid values and an unbounded number of invalid ones, we test 5 classes with a handful of representative values. Systematically. Completely. Traceably.
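The age table above can be sketched in a few lines of JavaScript. This is only an illustration of the method, not Nanook code: `classifyAge` and the class labels are hypothetical names chosen for this example.

```javascript
// Hypothetical helper (not part of Nanook): map a raw input to its equivalence class.
function classifyAge(input) {
  if (typeof input !== 'number' || !Number.isFinite(input)) return 'invalid: not a number';
  if (input < 0) return 'invalid: too small';
  if (input > 150) return 'invalid: too large';
  if (input === 0 || input === 150) return 'boundary';
  return 'valid';
}

// One representative per class, plus both boundary values.
const representatives = [-1, 0, 42, 150, 151, 'abc'];

const results = representatives.map((value) => {
  const cls = classifyAge(value);
  // Only the "valid" and "boundary" classes are expected to be accepted.
  const expected = cls === 'valid' || cls === 'boundary' ? 'accepted' : 'rejected';
  return { value, class: cls, expected };
});

console.log(results);
```

Six inputs cover all five rows of the table, because the boundary class contributes two values.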
The problem: creating these tables is easy. Generating matching test data from them is tedious. That's exactly where Nanook comes in.
What Nanook Does
Nanook connects two things that have traditionally been separate: planning test cases and generating test data.
Step 1: Plan Test Cases — in a Spreadsheet
You define your equivalence classes in a regular spreadsheet. Excel, LibreOffice, Google Sheets — whatever you prefer. No new software to learn, no special syntax. Just rows and columns.
Each column is an input field. Each row defines a class of values. Nanook understands the structure and knows which combinations need to be tested.
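To make the column-and-row idea concrete, here is a minimal sketch of how such a table could be turned into a data structure. The layout (header row with field names, one class per cell, empty cells ignored) and the function `classesByField` are assumptions for illustration; Nanook's actual table format may differ, so check the documentation.

```javascript
// Rows as they might come out of a spreadsheet export (assumed layout):
// a header row with field names, then one row per equivalence class.
const rows = [
  ['age', 'email'],
  ['valid', 'valid'],
  ['too small', 'missing @'],
  ['too large', ''],
];

// Hypothetical helper: collect the classes defined for each field (column).
function classesByField(table) {
  const [header, ...body] = table;
  const result = {};
  header.forEach((field, col) => {
    result[field] = body
      .map((row) => row[col])
      .filter((cell) => cell !== undefined && cell !== ''); // skip empty cells
  });
  return result;
}

const fields = classesByField(rows);
console.log(fields);
```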
Step 2: Assign Data Generators
For each equivalence class, you define how concrete test data should be generated. Nanook includes generators for common data types:
- Email addresses (valid and invalid)
- Names (various character sets and lengths)
- Dates (with boundary values and formats)
- Numbers (ranges, decimals, signs)
Need something specific? No problem. Write custom generators in JavaScript. An IBAN generator, an article number generator matching your internal schema, a generator for your domain-specific codes — all possible.
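As a rough idea of what a custom generator might look like, the sketch below produces article numbers for a made-up schema (`ART-` followed by six digits). It is plain JavaScript and does not use Nanook's generator interface, which is documented separately; the function names, the schema, and the seeded random source (the well-known mulberry32 PRNG) are all assumptions of this example.

```javascript
// Small seeded PRNG (mulberry32) so that generated values are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical generator: one function per equivalence class of the article number field.
function createArticleNumberGenerator(seed) {
  const random = mulberry32(seed);
  const digits = (count) =>
    Array.from({ length: count }, () => Math.floor(random() * 10)).join('');
  return {
    valid: () => `ART-${digits(6)}`,       // matches the assumed schema
    tooShort: () => `ART-${digits(5)}`,    // invalid: one digit missing
    wrongPrefix: () => `XYZ-${digits(6)}`, // invalid: unknown prefix
  };
}

const first = createArticleNumberGenerator(42);
const second = createArticleNumberGenerator(42);
console.log(first.valid() === second.valid()); // true: same seed, same sequence
```

Seeding the random source is one way to keep generated data stable between runs, which matters once the data is checked into a repository.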
Step 3: Generate and Export
One command. Nanook reads the table, combines the equivalence classes, invokes the generators, and writes the results. Choose JSON, CSV, or a custom format via a pluggable writer.
Install the package from npm:
npm install @xhubio/nanook-table
The result: a complete set of test data that systematically covers all relevant equivalence classes. Reproducible. Versionable. CI/CD-ready.
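The three stages of this step (combine, generate, write) can be sketched as follows. This is a simplified stand-in, not Nanook's implementation: it uses the full Cartesian product as the simplest possible combination strategy, and `combine`, `generate`, `toCsv`, and the `generators` map are hypothetical names with fixed example values.

```javascript
// Every combination of one class per field (full Cartesian product).
function combine(fields) {
  return Object.entries(fields).reduce(
    (combos, [field, classes]) =>
      combos.flatMap((combo) => classes.map((cls) => ({ ...combo, [field]: cls }))),
    [{}]
  );
}

// Hypothetical generators keyed by field and class; fixed values keep the output repeatable.
const generators = {
  age: { valid: () => 42, 'too small': () => -1, 'too large': () => 151 },
  email: { valid: () => 'ada@example.com', 'missing @': () => 'ada.example.com' },
};

// Replace each class name in a combination with a concrete generated value.
function generate(fields) {
  return combine(fields).map((combo) =>
    Object.fromEntries(
      Object.entries(combo).map(([field, cls]) => [field, generators[field][cls]()])
    )
  );
}

// Minimal CSV writer: quote every cell and double any embedded quotes.
function toCsv(records) {
  const header = Object.keys(records[0]);
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  return [header, ...records.map((record) => header.map((key) => record[key]))]
    .map((line) => line.map(escape).join(','))
    .join('\n');
}

const records = generate({
  age: ['valid', 'too small', 'too large'],
  email: ['valid', 'missing @'],
});

console.log(JSON.stringify(records, null, 2)); // JSON export
console.log(toCsv(records));                   // CSV export
```

Three age classes and two email classes yield six records. A real tool can prune combinations that add no coverage, which is where a planned table pays off over a blind product.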
Why Not Just Use Faker.js?
Faker.js is great at what it does: generating realistic random data. But Faker.js solves a different problem.
| | Faker.js | Nanook |
|---|---|---|
| Approach | Random data | Systematic data |
| Foundation | No methodology | Equivalence classes |
| Goal | Realistic dummy data | Complete test coverage |
| Boundary values | Random, if at all | Explicitly defined |
| Reproducible | Only with seed | Always |
| Planning | In code | In spreadsheet |
Faker.js fills a database with test data. Nanook ensures the right test cases exist. Both have their place — but they solve different problems.
In Production: Deutsche Bahn
Nanook isn't a hobby project. The toolkit is used at Deutsche Bahn, where systematic testing isn't optional — it's a necessity. When booking systems, timetable data, and customer information need to work together, random test data doesn't cut it.
The combination of structured test planning in spreadsheets and automated data generation has proven itself in one of Germany's most complex IT environments.
CI/CD Integration
Nanook is a Node.js module. That means it runs everywhere Node.js runs. In your pipeline, in your pre-commit hook, in your nightly build.
# In your CI pipeline
npx nanook generate --input tests/equivalence-tables/ --output tests/data/
Generated test data becomes part of your repo. Changes to equivalence class tables automatically produce updated test data. No manual steps. No forgotten updates.
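One way to catch forgotten updates in a pipeline is to regenerate the data and compare it with what is committed. The sketch below shows only the comparison; it is independent of Nanook's CLI, and `canonical` and `isStale` are hypothetical helpers for this example.

```javascript
// Sort object keys recursively so that key order does not count as a difference.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value)
        .sort()
        .map((key) => [key, canonical(value[key])])
    );
  }
  return value;
}

// Hypothetical check: true if the committed JSON no longer matches freshly generated JSON.
function isStale(committedJson, regeneratedJson) {
  const committed = JSON.stringify(canonical(JSON.parse(committedJson)));
  const regenerated = JSON.stringify(canonical(JSON.parse(regeneratedJson)));
  return committed !== regenerated;
}

console.log(isStale('[{"age":42,"email":"a@b.c"}]', '[{"email":"a@b.c","age":42}]')); // false
console.log(isStale('[{"age":42}]', '[{"age":43}]')); // true
```

In a CI job, the two strings would come from the committed file and a fresh generation run, and the job would fail when `isStale` returns `true`.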
Open Source and MIT Licensed
Nanook is MIT licensed. Free for personal and commercial use. No hidden costs, no enterprise-tier restrictions.
The complete source code is on GitHub. Fork it, extend it, contribute your own generators, or just use it.
Who Is Nanook For?
- QA teams that need systematic test coverage, not just spot checks
- Developers who want to generate test data as part of their CI/CD pipeline
- Test managers who want to plan test cases in an understandable format (spreadsheet, not code)
- Enterprise teams that need documented, reproducible test coverage
Get Started
Nanook is ready to use in minutes:
npm install @xhubio/nanook-table
Full documentation, tutorials, and API reference at nanook.xhub.io.
Source code on GitHub.
Questions about using Nanook in your project? Talk to us — we're happy to help, even if you're using the toolkit for free.