Parallel Array sorting

The Java 8 API adds methods to java.util.Arrays to make use of the fork/join framework which has been introduced in Java 7 for sorting arrays. The new methods have the same signature as the existing sort() methods, with a parallel prefix. All the real work like breaking the array in sub arrays, sorting the sub arrays by distributing the sort algorithm to the available number of CPUs using the fort/join framework and merging the sorted sub arrays to the final result array is done in the background. To use the methods, simply use


instead of


The following code measures the performance difference between the two approaches by sorting a 10 MiB array of random numbers:

package com.example.java8;

import java.util.Arrays;
import java.util.Random;
import com.example.RealTimeCounter;

public class ParallelSort {
   final static int ARR_SIZE = 10 * 1024*1024;    // 10 MB
   final static int LOOPS = 100;                  // execute 100 times

   public static void main(String[] args) {

      // create an array with random numbers
      int[] array = new int[ARR_SIZE];
      Random rand = new Random(1);  // make sure to use the same random sequence each time, to ensure reproduceability
      for (int i = 0;  i < array.length;  i++) {
         array[i] = rand.nextInt(9999999);

      // create the timer probe
      RealTimeCounter rtc = new RealTimeCounter();

      // sort the array - note: inline sorting, need to copy the array first
      for (int i = 0;  i < LOOPS;  i++) {
         int[] toSort = Arrays.copyOf(array, array.length);
      System.err.println("Normal sort: " + rtc);

      for (int i = 0;  i < LOOPS;  i++) {
         int[] toSort = Arrays.copyOf(array, array.length);
      System.err.println("Parallel sort: " + rtc);

RealTimeCounter is a simple class which measures the elapsed real time between start() and stop. On my dual core machine, the result is like

Normal sort: Real Time: 112416000us (112416ms)
Parallel sort: Real Time: 52465000us (52465ms)

So, as expected, distributing the sort to two CPUs is approximately twice as fast as the single CPU sort. Note that the measurement also includes the copyOf() operation, but that is the same effort in both loops.