Using Instancio to populate test data

Instancio is a Java library that automatically creates and populates objects for your unit tests.

Writing exhaustive, maintainable, and relevant unit tests is essential to the software delivery lifecycle, but it can quickly become tedious, especially when working with large message payloads that have many attributes. Moreover, what counts as relevant test data depends on many factors, such as patterns, mandatory or optional attributes, and value ranges.

One option is to source this data directly from production or a production-like environment. Depending on the application you are working on, though, data governance rules may make that a poor choice for testing.

While exploring options to automate test data generation, I came across Instancio. It integrates well with the Java testing ecosystem via JUnit or Spock. In this post, I will cover a few common use cases and configurations that I have been using for the past couple of weeks.

Setup

Before we begin, let’s add Instancio to your project by updating the relevant dependency configuration file (e.g., for Maven or Gradle). You can refer to the project's GitHub page for the latest version.

<dependency>
    <groupId>org.instancio</groupId>
    <artifactId>instancio-core</artifactId>
    <version>2.16.0</version>
    <scope>test</scope>
</dependency>
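
For Gradle, the same coordinates can be declared as a test dependency. This is a minimal sketch assuming the Groovy DSL and the same version as the Maven snippet above:

```groovy
dependencies {
    // same group, artifact and version as the Maven dependency
    testImplementation 'org.instancio:instancio-core:2.16.0'
}
```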

Test data generation

Once you have set up Instancio, you can start generating test data for your tests (JUnit/Spock). Instancio populates each attribute with a random value appropriate to its type, whether that is a primitive, a standard Java type, or a custom class.

Consider the following data model for which we need to generate the test data:

public record BlogDetails(String url, String author, String blogName, LocalDate createdDate, List<String> posts) {
}

The simplest way to create a populated instance is to call the create() method. It returns a BlogDetails instance with every attribute populated according to its type.

Instancio.of(BlogDetails.class).create();
// BlogDetails[url=FCVVESYK, author=AGMZRKDU, blogName=HYNMUEC, createdDate=2089-05-03]

Although this one-liner is a good starting point for quickly testing a scenario, the real power of the library lies in the customizations it provides. The default, randomly generated data may not be relevant to the testing context; for example, the url attribute is populated with the value FCVVESYK, which is not a valid URL. We can customize the behavior as follows:

// Assumed imports: org.instancio.Instancio, static org.instancio.Select.field,
// and StringUtils / RandomStringUtils from org.apache.commons.lang3
public static BlogDetails sampleBlogDetails() {
    return Instancio.of(BlogDetails.class)
        // fixed (non-random) expected value
        .set(field(BlogDetails::createdDate), LocalDate.now())
        // built-in generator: each #c becomes a random lowercase letter
        .generate(field(BlogDetails::url), gen -> gen.text().pattern("https://#c#c#c#c#c#c.com"))
        // custom logic: two capitalized random words, e.g. "Pmby Wegmj"
        .supply(field(BlogDetails::author),
            () -> StringUtils.capitalize(RandomStringUtils.random(4, true, false).toLowerCase())
                    + " "
                    + StringUtils.capitalize(RandomStringUtils.random(5, true, false).toLowerCase()))
        .create();
}

// customized result
// BlogDetails[url=https://ovmnbp.com, author=Pmby Wegmj, blogName=RED, createdDate=2023-06-05]

  1. The set method assigns a fixed value to the matching target. If the matching target is a Collection, the value is mapped to all of its entries.
  2. The generate method lets you use the built-in generators (org.instancio.generators.Generators) for common use cases. Instancio provides generators for arrays, strings, numbers, and other types.
  3. The supply method lets you provide your own logic for producing a value. Instancio does not update or populate the fields of the supplied instance.
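
To make the pattern template used for the url field more concrete, here is a small standard-library sketch of how such a template could be expanded. It is not Instancio's implementation: the class name PatternSketch and the expand method are hypothetical, and only three placeholders are handled, under the assumption that #c means a lowercase letter, #C an uppercase letter, and #d a digit.

```java
import java.util.Random;

// Illustrative only: mimics a small subset of pattern-template expansion.
// This is NOT how Instancio implements gen.text().pattern(...).
final class PatternSketch {

    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String DIGITS = "0123456789";

    // Replaces #c, #C and #d with a random character and copies everything else.
    static String expand(String pattern, Random random) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char ch = pattern.charAt(i);
            if (ch == '#' && i + 1 < pattern.length()) {
                char kind = pattern.charAt(++i);
                switch (kind) {
                    case 'c': out.append(pick(LOWER, random)); break;
                    case 'C': out.append(pick(UPPER, random)); break;
                    case 'd': out.append(pick(DIGITS, random)); break;
                    // placeholder not handled by this sketch: keep it unchanged
                    default: out.append('#').append(kind);
                }
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    private static char pick(String alphabet, Random random) {
        return alphabet.charAt(random.nextInt(alphabet.length()));
    }

    public static void main(String[] args) {
        // Same template as the url field; the output differs on every run.
        System.out.println(expand("https://#c#c#c#c#c#c.com", new Random()));
    }
}
```

Passing a seeded Random to expand makes the output repeatable, which is handy when a failing test needs to be reproduced.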

Working with collections and streams

Instancio also supports generating lists and streams of data:

// generate a list of 10 records
List<BlogDetails> blogDetails = Instancio.ofList(BlogDetails.class).size(10).create();

// define a model (a reusable template) and use it to
// generate an unbounded stream of instances;
// limit(5) truncates the stream to five elements
Model<BlogDetails> model = Instancio.of(BlogDetails.class)
                .set(field(BlogDetails::createdDate), LocalDate.now())
                .toModel();
Stream<BlogDetails> firstFive = Instancio.stream(model).limit(5);
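
Because the stream is unbounded, it has to be truncated before a terminal operation such as collect is called, otherwise that operation never finishes. The following standard-library sketch shows the same idea with Stream.generate. The supplier stands in for the object generator, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: an infinite, supplier-backed stream bounded with limit().
final class UnboundedStreamSketch {

    // Stream.generate never ends on its own, so limit(n) is what makes
    // the collect() call terminate.
    static <T> List<T> firstN(Supplier<T> supplier, int n) {
        return Stream.generate(supplier)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] counter = {0};
        // the supplier plays the role of the generator; here it just counts
        List<Integer> values = firstN(() -> ++counter[0], 5);
        System.out.println(values.size()); // prints 5
    }
}
```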

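Instancio also populates collection fields and nested objects that relate one type to another. For example, assume a User class with a List<Address> addresses field and an Address address field (both are illustrative and not defined in this post). You can control the size of the generated list, while the single Address needs no configuration:

// generates a list of exactly two Address objects; the single address field is populated by default
User user = Instancio.of(User.class)
        .generate(field(User.class, "addresses"), gen -> gen.collection().size(2))
        .create();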

Customizations and nested instances

Instancio can be configured to ignore certain attributes which we don’t want to be populated during testing. Similarly, we can mark some attributes as nullable, and Instancio will randomly nullify those attributes while generating test data.

var blog = Instancio.of(BlogDetails.class)
                // blogName is left unpopulated
                .ignore(field(BlogDetails::blogName))
                // createdDate is randomly either populated or null
                .withNullable(field(BlogDetails::createdDate))
                .create();

Instancio can handle complex data structures and nested objects. You can define nested classes and configure the field generation accordingly.

public class User {
    private Profile profile;
    // other attributes
}

public class Profile {
    private String bio;
    private String avatar;
}

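// In the test, customize the generated bio data (here: 10 to 50 characters long)
User user = Instancio.of(User.class)
        .generate(field(Profile.class, "bio"), gen -> gen.string().minLength(10).maxLength(50))
        .create();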

Similarly, we can use the subtype method to specify a concrete implementation for an interface or parent type. This lets us choose an appropriate type based on the test context:

Instancio.of(BlogDetails.class)
  // records expose accessors without a get prefix: posts(), not getPosts()
  .subtype(field(BlogDetails::posts), LinkedList.class)
  .create();

By default, Instancio uses ArrayList for List attributes, but subtype lets us override that and use LinkedList instead.


That is all for this post. If you have any feedback, please drop me an email or contact me on any social platform; I’ll try to respond as soon as I can. Also, please consider subscribing to the feed for regular updates.