Spring Batch – How to read lists of lists (Part II)

Sometimes for a batch job you need to process some input where the final list of records to process depends on another list. For example, if you have a folder containing zip files and you need to process each file contained in each zip, or you have a list of accounts from one database and need to process the customer records for these accounts in another database.

There are a number of ways to do this. Here are three ways I have found useful:

  • Create a custom reader
  • Create an intermediate list
  • Use a decider to loop back and process the next item in the high level list

This blog post covers the last of these; the other two were covered in a previous post.

Decider to loop back to earlier step

This method involves storing the initial (high level) list in the job’s execution context and having a decider determine whether every item in that list has been processed.

For more information on deciders take a look here.

Using the same scenario as in part 1 (processing a list of customers for account codes retrieved from another data source), the job could contain:

  • a step to read the account codes and store them in the execution context
  • a step to read the customers for the first account code in the execution context and process them
  • a decider to update the stored list of account codes and return control to the preceding step if there are still codes to be processed

The first step is a tasklet that reads the account codes and stores them in the job execution context like this:

[prism field=Lists_code_14 language=java]
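
As a minimal sketch of the storing logic, here is a plain-Java version that runs without Spring on the classpath. The `Map<String, Object>` stands in for the job execution context, and the class name `AccountCodeLoader`, the key `accountCodes` and the `Supplier` that fetches the codes are all assumptions made for illustration. In a real job this logic would sit inside the tasklet's `execute` method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Loads the account codes once and stores them under a well-known key. */
class AccountCodeLoader {

    /** Key shared with the reader and the decider (hypothetical name). */
    static final String ACCOUNT_CODES_KEY = "accountCodes";

    /** Hypothetical source of account codes, e.g. a query on another database. */
    private final Supplier<List<String>> codeSource;

    AccountCodeLoader(Supplier<List<String>> codeSource) {
        this.codeSource = codeSource;
    }

    /** The Map stands in for the job execution context. */
    void load(Map<String, Object> jobContext) {
        // Copy into an ArrayList so the stored value does not depend on
        // whatever list type the source happened to return.
        List<String> codes = new ArrayList<>(codeSource.get());
        jobContext.put(ACCOUNT_CODES_KEY, codes);
    }
}
```

Keeping the key in one constant makes it harder for the tasklet, the reader configuration and the decider to drift apart on the name.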

The second step has a reader that retrieves the customers for the current account code like this:

[prism field=Lists_code_15 language=java]
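
A rough plain-Java sketch of what such a reader does: it looks only at the first account code in the list, fetches that account's customers on the first call, and returns `null` once they are exhausted, which is the convention Spring Batch item readers use to signal the end of input. The class name, the `customerLookup` function and the use of `String` for a customer are assumptions; a real reader would implement `ItemReader` and query the customer database.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Reads the customers for the first account code in the list only. */
class CurrentAccountCustomerReader {

    private final List<String> accountCodes;
    /** Hypothetical lookup: account code -> customers for that account. */
    private final Function<String, List<String>> customerLookup;
    /** Populated lazily on the first call to read(). */
    private Iterator<String> customers;

    CurrentAccountCustomerReader(List<String> accountCodes,
                                 Function<String, List<String>> customerLookup) {
        this.accountCodes = accountCodes;
        this.customerLookup = customerLookup;
    }

    /** Returns the next customer, or null when this account is exhausted. */
    String read() {
        if (customers == null) {
            List<String> found = accountCodes.isEmpty()
                    ? Collections.<String>emptyList()
                    : customerLookup.apply(accountCodes.get(0));
            customers = found.iterator();
        }
        return customers.hasNext() ? customers.next() : null;
    }
}
```

The sketch relies on a fresh reader instance being created each time the step runs, so that the lazily loaded customers always belong to the current first code.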

This requires the list of account codes stored in the job execution context to be passed in to the reader. This can be done in the configuration:

[prism field=Lists_code_16 language=java]

It is vital to set the scope of the reader bean to “step”; otherwise you won’t be able to reference the jobExecutionContext value, because that expression can only be resolved once the step has started. It is also important to specify the same key that was used to store the codes in the LoadAccountCodesTasklet class.
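
For readers using Java-based configuration, the wiring might look something like the fragment below. This is a sketch under assumptions rather than the configuration used in the original job: `CustomerReader` is a hypothetical reader class with a constructor that takes the list of codes, and `accountCodes` is an assumed key name. The fragment needs Spring Batch on the classpath.

```java
import java.util.List;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class CustomerReaderConfig {

    // Step scope delays creation of the bean until the step starts, which is
    // when the late-binding expression below can be resolved.
    @Bean
    @StepScope
    CustomerReader customerReader(
            @Value("#{jobExecutionContext['accountCodes']}") List<String> accountCodes) {
        // CustomerReader is a hypothetical class; the key must match the one
        // used when the codes were stored.
        return new CustomerReader(accountCodes);
    }
}
```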

Lastly, a decider is required to remove the first code from the stored list of account codes and return a status that will be used to determine which step to execute next. Something like this:

[prism field=Lists_code_17 language=java]
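
The core of that decision can be sketched in plain Java as follows. The `Map<String, Object>` again stands in for the job execution context, and the key `accountCodes` and the status strings `CONTINUE` and `COMPLETED` are assumed names. In a real job this logic would live in a `JobExecutionDecider`, with the returned string wrapped in a `FlowExecutionStatus`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Decides whether the customer step should run again. */
class AccountCodeDecision {

    static final String ACCOUNT_CODES_KEY = "accountCodes"; // assumed key
    static final String CONTINUE = "CONTINUE";   // loop back to the customer step
    static final String COMPLETED = "COMPLETED"; // move on to the next step

    /** The Map stands in for the job execution context. */
    @SuppressWarnings("unchecked")
    static String decide(Map<String, Object> jobContext) {
        List<String> codes = (List<String>) jobContext.get(ACCOUNT_CODES_KEY);
        if (codes == null || codes.isEmpty()) {
            return COMPLETED;
        }
        // The first code has just been processed, so keep only the rest.
        // A fresh list is stored instead of mutating the old one in place.
        List<String> remaining = new ArrayList<>(codes.subList(1, codes.size()));
        jobContext.put(ACCOUNT_CODES_KEY, remaining);
        return remaining.isEmpty() ? COMPLETED : CONTINUE;
    }
}
```

Storing a new list on each pass means the reader is handed the shortened list the next time the customer step starts.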

The status returned from the decider will be used to control the processing in the job.

The job definition for the decider might include something like this:

[prism field=Lists_code_18 language=java]
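
With Java-based configuration, a job definition along these lines could be sketched with the flow builder. The class, method and parameter names are hypothetical, and the status strings `CONTINUE` and `COMPLETED` are assumed values that must match whatever the decider returns. The fragment needs Spring Batch on the classpath and leaves the creation of the `JobBuilder`, the steps and the decider to the surrounding configuration.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

class AccountJobFlow {

    /** Wires step1 -> step2 -> decider, looping back to step2 on CONTINUE. */
    static Job build(JobBuilder jobBuilder,
                     Step step1,   // loads the account codes
                     Step step2,   // processes customers for the current code
                     Step step3,   // whatever follows once all codes are done
                     JobExecutionDecider decider) {
        return jobBuilder
                .start(step1)
                .next(step2)
                .next(decider).on("CONTINUE").to(step2)  // more codes: loop back
                .from(decider).on("COMPLETED").to(step3) // none left: move on
                .end()
                .build();
    }
}
```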

This configuration will return control to step2 in order to process the next account code if there is still another code to process. If all the codes have been processed, the job will move on to step3.

With this approach, the Spring Batch metadata tables will contain multiple step execution records for the step that processes the customers: one for each account code.
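
To make the resulting sequence concrete, the loop can be walked through in plain Java. This is only a simulation of the control flow, not Spring Batch code; the step names and status strings are the assumed ones used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java walk-through of the step2 / decider loop. */
class DeciderLoopSimulation {

    /** Returns the names of the steps in the order they would execute. */
    static List<String> run(List<String> accountCodes) {
        List<String> executed = new ArrayList<>();
        executed.add("step1"); // load the account codes
        List<String> remaining = new ArrayList<>(accountCodes);

        String status;
        do {
            // step2 processes the customers for the first remaining code
            String current = remaining.isEmpty() ? "none" : remaining.get(0);
            executed.add("step2[" + current + "]");

            // the decider drops that code and chooses the next transition
            if (!remaining.isEmpty()) {
                remaining.remove(0);
            }
            status = remaining.isEmpty() ? "COMPLETED" : "CONTINUE";
        } while ("CONTINUE".equals(status));

        executed.add("step3"); // reached only when no codes are left
        return executed;
    }
}
```

Calling `DeciderLoopSimulation.run(Arrays.asList("A1", "B2", "C3"))` returns `[step1, step2[A1], step2[B2], step2[C3], step3]`: the customer step appears once per account code, which is what the multiple step execution records reflect.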

Conclusion

This may seem a complicated way to process a list of customers for a list of account codes.

As described here, only one step is executed for each account code. However, if the requirement is to take the customers for each account code through a number of steps before progressing to the next account code, this could be a suitable approach.

I have also used this principle when I needed one job to call another multiple times with different parameters. We had one job that processed records for a particular account. We then needed to process records for a list of accounts, so we created a simple second job that stored the list of account codes in the execution context and had one main step that invoked the original job.

Over the two posts for this topic I have described three ways to process a list of lists:

  • Create a custom reader
  • Create an intermediate list
  • Use a decider to loop back and process the next item in the high level list

The first two options are described in part 1.

The first option is appropriate when the data is stable and the job will not need to be stopped and restarted.

The second option is best if the data is volatile, as you have taken a snapshot of the records to be processed. This is the option I usually choose.

The third option is appropriate if the records for one list need to be taken through a number of steps before processing the next list, such as processing the customers for one account before processing those for another account.
