This is one of a series of blog posts about the Spring Batch framework, based on lessons learned from building a number of batch jobs.

For a description of the Spring Batch framework, please take a look here. 

With some Spring Batch jobs you end up processing thousands, if not millions, of records. It can be useful to log the progress, for two reasons:

  • Firstly, if you don’t have the admin web app or something similar, you can use the log file to check progress.
  • Secondly, you can review the historic logs to compare performance.

Thankfully, it’s quite easy to set up using the listener interfaces provided by the Spring Batch framework. For some background info on the available listeners, take a look here.

What we need is a listener class to count the number of records read and output a log message every time the count reaches a multiple of a certain value, such as one thousand. The ChunkListener provides a means to be notified during the processing of records within a step.

Here’s an example:

package my.package;

import java.text.MessageFormat;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

/**
 * Log the count of items processed at a specified interval.
 * 
 * @author Jeremy Yearron
 *
 */
public class ChunkCountListener implements ChunkListener{
	
	private static final Logger log = LogManager.getLogger(ChunkCountListener.class);

	private MessageFormat fmt = new MessageFormat("{0} items processed");

	private int loggingInterval = 1000;
		
	@Override
	public void beforeChunk(ChunkContext context) {
		// Nothing to do here
	}

	@Override
	public void afterChunk(ChunkContext context) {
		
		int count = context.getStepContext().getStepExecution().getReadCount();
		
		// If the number of records processed so far is a multiple of the logging interval then output a log message.			
		if (count > 0 && count % loggingInterval == 0) {
			// Autoboxing replaces the deprecated Integer constructor.
			log.info(fmt.format(new Object[] { count }));
		}
	}
	
	@Override
	public void afterChunkError(ChunkContext context) {
		// Nothing to do here		
	}
	
	public void setItemName(String itemName) {
		this.fmt = new MessageFormat("{0} " + itemName + " processed");
	}

	public void setLoggingInterval(int loggingInterval) {
		this.loggingInterval = loggingInterval;
	}
}

This class allows you to specify the interval at which messages will be written to the log. I find that 1000 is a good default, but this can be changed to suit your circumstances. It also allows you to provide a name for the items that are being counted, so you can count the records for a number of steps and identify each set clearly in the log. The class uses the count of items read, rather than the count of items written, because you may have a processor that filters out some of the records, in which case the number of items written will not increase in a consistent manner.

If you’re configuring your Spring app using XML, then you add a bean like this:

<bean id="myCountListener" class="my.package.ChunkCountListener">
    <property name="itemName" value="Customers" />
    <property name="loggingInterval" value="10000" />
</bean>

Using this config, the listener will write out something like this:

10,000 Customers processed
20,000 Customers processed
...
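
The comma in "10,000" comes from MessageFormat, which formats numeric arguments using locale-specific grouping. The following is a minimal sketch of that behaviour, not part of the listener itself: the class name ProgressMessageDemo and its format method are made up for illustration, and the locale is passed explicitly so the difference is visible.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class ProgressMessageDemo {

    // Builds the same pattern as ChunkCountListener.setItemName, but with an explicit locale.
    static String format(String itemName, int count, Locale locale) {
        MessageFormat fmt = new MessageFormat("{0} " + itemName + " processed", locale);
        return fmt.format(new Object[] { count });
    }

    public static void main(String[] args) {
        // The grouping separator depends on the locale used by the MessageFormat.
        System.out.println(format("Customers", 10000, Locale.US));
        System.out.println(format("Customers", 10000, Locale.GERMANY));
    }
}
```

If the log will be parsed by a program later, it may be worth fixing the locale in this way so the number format does not change between machines.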

Then you add the listener to the step:

<step id="myStep">
    <tasklet>
        <chunk reader="myReader" processor="myProcessor" writer="myListWriter" commit-interval="500"/>
    </tasklet>
    <listeners>
        <listener ref="myCountListener"/>
    </listeners>
</step>

Note that the loggingInterval value for the listener must be a multiple of the commit-interval value for the step. The listener only checks the count after each chunk, so otherwise the number of items read may never be a multiple of loggingInterval and the messages may never be written to the log.
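
To see why the two values need to line up, here is a small simulation of the counts the listener would see. It is only a sketch: the class LoggingIntervalCheck is hypothetical, and it assumes that every chunk reads exactly commit-interval items (apart from the last one) and that nothing is skipped.

```java
import java.util.ArrayList;
import java.util.List;

final class LoggingIntervalCheck {

    // Returns the read counts at which afterChunk would write a log message.
    // commitInterval must be greater than zero.
    static List<Integer> loggedCounts(int totalItems, int commitInterval, int loggingInterval) {
        List<Integer> logged = new ArrayList<>();
        int count = 0;
        while (count < totalItems) {
            // The read count only advances one chunk at a time; the last chunk may be short.
            count = Math.min(count + commitInterval, totalItems);
            if (count > 0 && count % loggingInterval == 0) {
                logged.add(count);
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        // loggingInterval is a multiple of commitInterval: every interval is logged.
        System.out.println(loggedCounts(2000, 500, 1000));
        // Not a multiple: only counts divisible by both values are logged.
        System.out.println(loggedCounts(6500, 300, 1000));
        // Not a multiple and a short run: nothing is logged at all.
        System.out.println(loggedCounts(2500, 300, 1000));
    }
}
```

Under these assumptions the first call reports 1000 and 2000, the second only 3000 and 6000, and the third reports nothing.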

So now you can run your job and the progress will be recorded in the log file. But what if you want to take a closer look at the performance of the job?

How you approach this depends on your circumstances, but creating a class to parse the log file is straightforward. If you configure the logging so that the timestamp is included in each log record then you can calculate the time taken to process each thousand records or whatever your logging interval is. Then the time taken to process each interval could be plotted in a graph to illustrate how the speed of processing changed over the course of the execution and highlight any changes in performance.
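
As a starting point, a parser along these lines might do the job. It is a sketch under stated assumptions rather than a finished tool: the class ProgressLogParser and the sample lines are invented for illustration, each line is assumed to start with a timestamp in the form yyyy-MM-dd HH:mm:ss,SSS, the item name is assumed to be a single word, and the count is assumed to use commas as the grouping separator.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProgressLogParser {

    // Assumed layout: "<timestamp> <anything> <count> <itemName> processed"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}).*\\s(\\d[\\d,]*) \\S+ processed\\s*$");

    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // The time taken to reach a given count, measured from the previous progress message.
    static final class Interval {
        final long count;
        final Duration elapsed;

        Interval(long count, Duration elapsed) {
            this.count = count;
            this.elapsed = elapsed;
        }
    }

    static List<Interval> parse(List<String> lines) {
        List<Interval> intervals = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // Not a progress message
            }
            LocalDateTime timestamp = LocalDateTime.parse(m.group(1), TIMESTAMP);
            long count = Long.parseLong(m.group(2).replace(",", ""));
            // The first message has no predecessor, so no interval is reported for it.
            if (previous != null) {
                intervals.add(new Interval(count, Duration.between(previous, timestamp)));
            }
            previous = timestamp;
        }
        return intervals;
    }

    public static void main(String[] args) {
        // Hypothetical log lines; a real run would read them from the log file.
        List<String> lines = Arrays.asList(
                "2017-09-13 10:00:05,000 INFO  ChunkCountListener - 10,000 Customers processed",
                "2017-09-13 10:00:07,100 INFO  SomeOtherClass - an unrelated message",
                "2017-09-13 10:00:09,500 INFO  ChunkCountListener - 20,000 Customers processed",
                "2017-09-13 10:00:15,250 INFO  ChunkCountListener - 30,000 Customers processed");
        for (Interval interval : parse(lines)) {
            System.out.println(interval.count + " reached after " + interval.elapsed.toMillis() + " ms");
        }
    }
}
```

The elapsed times could then be written out as CSV and plotted. To time the first interval as well, you would need the start time of the step from elsewhere in the log.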

Jeremy Yearron September 13, 2017

4 thoughts on “Spring Batch – Log the record count during processing”

  • I am curious if this would be a good approach for calculating summary statistics of data. For example, summing the number of instances of a status seen, or calculating percent of total sales by month. It seems like you could use a set of listeners to perform the calculations if storing the whole dataset in memory is not an option.

    • Yes, using listeners would be a good way to accumulate statistics. The results could be stored either in the JobExecutionContext or the StepExecutionContext, depending on what suits your situation best.

    • Hi Vi, using this method you choose which steps are counted, by adding a ChunkCountListener to those steps that you want to have the count logged. If you don’t add a listener, then there will be no counting.
      If you just want the total number of records processed in the previous step, then this can be retrieved using a StepExecutionListener and adding code to the afterStep method to retrieve the count from the stepExecution argument.
