Java 8 Lambdas explained #1: Map/Reduce with Fork/Join and the Beauty of a One-Line Lambda

Lambdas are cool. Lambdas are sexy. Lambdas are expressive and let you write less code. Is all of that a cliché? No! It is all true…

Recently we ran a local session explaining lambdas through side-by-side comparisons of coding patterns written in Java 7 and in Java 8. We used the best that each version of the language and its APIs has to offer, including lambdas, stream processing, the new Date and Time API… It was fun to see people’s reactions to how dramatically code shrinks when moving from Java 7 to Java 8 style.

In this post I want to show one example of the above: how a simple Map/Reduce pattern can be radically simplified if you switch from the Java 7 Fork/Join API to parallel stream processing in Java 8.

Setting up the project

Let’s start by setting up the project. This is a very simple Maven project configuration that explicitly sets Java 8 as source and target for compilation and adds a dependency on the JavaTuples library:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>deors.demos</groupId>
  <artifactId>deors.demos.java8</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>deors.demos.java8</name>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <verbose>true</verbose>
          <compilerVersion>1.8</compilerVersion>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.javatuples</groupId>
      <artifactId>javatuples</artifactId>
      <version>1.2</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
</project>

The Problem

This is the problem we want to solve: given a list of integer tuples (pairs of integers), calculate the sum of the products of each pair. As we want the solution to scale, it should be computed in parallel, so that the JVM running the code can take advantage of multi-core environments, split the problem into simpler pieces and process it faster.
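To make the target concrete, here is a minimal sequential sketch of the calculation. It assumes plain int arrays in place of JavaTuples pairs so that it has no dependencies, and the class and method names are only illustrative.

```java
public class SumProductSequential {

  // Each inner array is one pair {a, b}; a stand-in for Pair<Integer, Integer>.
  static int sumProduct(int[][] pairs) {
    int total = 0;
    for (int[] pair : pairs) {
      total += pair[0] * pair[1];
    }
    return total;
  }

  public static void main(String[] args) {
    int[][] pairs = { {10, 1}, {12, 2}, {14, 3} };
    // 10*1 + 12*2 + 14*3 = 10 + 24 + 42 = 76
    System.out.printf("the sum-product is %d%n", sumProduct(pairs));
  }
}
```

Everything that follows computes this same value; only the way the work is distributed changes.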

In Java 7 fashion, a clever programmer could choose the Fork/Join API and create a task that, given the list of tuples, splits it in halves when its size exceeds some defined threshold, processes each half in parallel and recursively, and aggregates the results from each piece until the final result is obtained. Not complex, but a bit verbose even with the Fork/Join API. This is the code (no need to type it in: it is available on GitHub at https://github.com/deors/deors.demos.java8):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import org.javatuples.Pair;

public class SumProductCalculationJava7 extends RecursiveTask<Integer> {
  private static final long serialVersionUID = 6939566748704874245L;
  private int threshold = 10;
  List<Pair<Integer, Integer>> pairList;

  public SumProductCalculationJava7(List<Pair<Integer, Integer>> pairList) {
    super();
    this.pairList = pairList;
  }

  @Override
  protected Integer compute() {
    System.out.printf("fragment %s size %d\n", this, pairList.size());
    if (pairList.size() <= threshold) {
      return computeDirect();
    }
    int split = pairList.size() / 2;

    List<Pair<Integer, Integer>> forkedList1 = pairList.subList(0, split);
    SumProductCalculationJava7 forkedTask1 = new SumProductCalculationJava7(forkedList1);
    forkedTask1.fork();

    List<Pair<Integer, Integer>> forkedList2 = pairList.subList(split, pairList.size());
    SumProductCalculationJava7 forkedTask2 = new SumProductCalculationJava7(forkedList2);
    forkedTask2.fork();

    return forkedTask1.join() + forkedTask2.join();
  }

  private Integer computeDirect() {
    Integer sumproduct = 0;
    for (Pair<Integer, Integer> pair : pairList) {
      sumproduct += pair.getValue0() * pair.getValue1();
    }
    System.out.printf("fragment %s total %s\n", this, sumproduct);
    return sumproduct;
  }
}

In short, the compute method from the RecursiveTask contract checks the size of the list. If the size does not exceed the threshold, it calculates the sum-product directly, using a private method with a for-each loop. Otherwise, it splits the list in two pieces and recursively processes the two fragments. The result of processing the two fragments is, as expected, the sum of both fragment results.
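The task above forks both halves and then joins them. A common variant forks only one half and computes the other in the current thread, which is the shape used in the RecursiveTask Javadoc example. The following is a sketch of that variant, not the code from the repository: it assumes int arrays instead of JavaTuples pairs, passes index ranges instead of sublists, and uses an illustrative class name.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumProductForkOnce extends RecursiveTask<Integer> {
  private static final long serialVersionUID = 1L;
  private static final int THRESHOLD = 10;

  private final int[][] pairs; // each row is one pair {a, b}
  private final int from;      // inclusive
  private final int to;        // exclusive

  SumProductForkOnce(int[][] pairs, int from, int to) {
    this.pairs = pairs;
    this.from = from;
    this.to = to;
  }

  @Override
  protected Integer compute() {
    if (to - from <= THRESHOLD) {
      // Small enough: compute directly.
      int total = 0;
      for (int i = from; i < to; i++) {
        total += pairs[i][0] * pairs[i][1];
      }
      return total;
    }
    int split = from + (to - from) / 2;
    SumProductForkOnce left = new SumProductForkOnce(pairs, from, split);
    SumProductForkOnce right = new SumProductForkOnce(pairs, split, to);
    left.fork();                       // schedule the left half asynchronously
    int rightResult = right.compute(); // process the right half in this thread
    return left.join() + rightResult;  // wait for the left half and combine
  }

  static int sumProduct(int[][] pairs) {
    return ForkJoinPool.commonPool().invoke(new SumProductForkOnce(pairs, 0, pairs.length));
  }

  public static void main(String[] args) {
    // Fourteen pairs (10, 1), (12, 2), ..., (36, 14).
    int[][] pairs = new int[14][];
    for (int k = 1; k <= 14; k++) {
      pairs[k - 1] = new int[] { 8 + 2 * k, k };
    }
    System.out.printf("the final result is %d%n", sumProduct(pairs));
  }
}
```

The result is the same either way; the difference is only in how many tasks are handed to the pool at each split.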

It is verbose, yes, but it does a lot of work behind the scenes: using the best concurrency patterns, running the threads, waiting for each fragment to finish before aggregating results bottom-up and returning the final glorious result. Try to do the same with raw threads in Java 2 style and watch your hair turn grey in the process!

However, Java 8 can still beat this by a large margin.

Testing the Problem

Now let’s run this task from a main method or a JUnit test:

  public static void main(String[] args) {
    List<Pair<Integer, Integer>> thePairList = new ArrayList<>();
    thePairList.add(new Pair<Integer, Integer>(10, 1));
    thePairList.add(new Pair<Integer, Integer>(12, 2));
    thePairList.add(new Pair<Integer, Integer>(14, 3));
    thePairList.add(new Pair<Integer, Integer>(16, 4));
    thePairList.add(new Pair<Integer, Integer>(18, 5));
    thePairList.add(new Pair<Integer, Integer>(20, 6));
    thePairList.add(new Pair<Integer, Integer>(22, 7));
    thePairList.add(new Pair<Integer, Integer>(24, 8));
    thePairList.add(new Pair<Integer, Integer>(26, 9));
    thePairList.add(new Pair<Integer, Integer>(28, 10));
    thePairList.add(new Pair<Integer, Integer>(30, 11));
    thePairList.add(new Pair<Integer, Integer>(32, 12));
    thePairList.add(new Pair<Integer, Integer>(34, 13));
    thePairList.add(new Pair<Integer, Integer>(36, 14));

    SumProductCalculationJava7 theTask = new SumProductCalculationJava7(thePairList);
    ForkJoinPool thePool = new ForkJoinPool();
    Integer result = thePool.invoke(theTask);
    System.out.printf("the final result is %s\n", result);
  }

Well done! As expected, the final result is 2870!

Try now with different lists of tuples or just copy and paste to have a very long list in a few seconds. Change the threshold, measure execution times and fine-tune the program until it is as good as it can be.

You still cannot be sure that it will perform as well on a different JVM, though. With different CPU and RAM available, performance may change, and the best threshold is likely to depend on the JVM, machine resources, OS, workload and so on. Even so, you can deliver this with confidence and fine-tune it for production later.
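One way to run that experiment is to make the threshold a constructor parameter and time a few values. The sketch below is an assumption-laden illustration rather than a benchmark: the class name, the generated data and the threshold values are made up, it uses int arrays instead of JavaTuples pairs, and a single System.nanoTime measurement is crude (a harness such as JMH gives more trustworthy numbers).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ThresholdExperiment extends RecursiveTask<Long> {
  private static final long serialVersionUID = 1L;

  private final int[][] pairs; // each row is one pair {a, b}
  private final int from;      // inclusive
  private final int to;        // exclusive
  private final int threshold;

  ThresholdExperiment(int[][] pairs, int from, int to, int threshold) {
    this.pairs = pairs;
    this.from = from;
    this.to = to;
    this.threshold = Math.max(1, threshold); // a threshold below 1 would never stop splitting
  }

  @Override
  protected Long compute() {
    if (to - from <= threshold) {
      long total = 0;
      for (int i = from; i < to; i++) {
        total += (long) pairs[i][0] * pairs[i][1];
      }
      return total;
    }
    int split = from + (to - from) / 2;
    ThresholdExperiment left = new ThresholdExperiment(pairs, from, split, threshold);
    ThresholdExperiment right = new ThresholdExperiment(pairs, split, to, threshold);
    invokeAll(left, right); // runs both halves and returns when both are done
    return left.join() + right.join();
  }

  static long run(int[][] pairs, int threshold) {
    return ForkJoinPool.commonPool()
        .invoke(new ThresholdExperiment(pairs, 0, pairs.length, threshold));
  }

  public static void main(String[] args) {
    // Hypothetical workload: one million generated pairs.
    int[][] pairs = new int[1_000_000][];
    for (int i = 0; i < pairs.length; i++) {
      pairs[i] = new int[] { i % 100, 2 };
    }
    for (int threshold : new int[] { 100, 10_000, 1_000_000 }) {
      long start = System.nanoTime();
      long result = run(pairs, threshold);
      long micros = (System.nanoTime() - start) / 1_000;
      System.out.printf("threshold %d: result %d in %d us%n", threshold, result, micros);
    }
  }
}
```

The result must be identical for every threshold; only the elapsed time should vary, and it will vary again on a different machine.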

Same Problem. Different Approach

Now let’s do the same calculation with Java 8 parallel streams and lambda expressions. Don’t expect anything elaborate: it’s really as simple as it looks. In fact, it is so simple that I coded everything in the main method:

import java.util.ArrayList;
import java.util.List;
import org.javatuples.Pair;

public class SumProductCalculationJava8 {
  public static void main(String[] args) {
    List<Pair<Integer, Integer>> thePairList = new ArrayList<>();
    thePairList.add(new Pair<Integer, Integer>(10, 1));
    thePairList.add(new Pair<Integer, Integer>(12, 2));
    thePairList.add(new Pair<Integer, Integer>(14, 3));
    thePairList.add(new Pair<Integer, Integer>(16, 4));
    thePairList.add(new Pair<Integer, Integer>(18, 5));
    thePairList.add(new Pair<Integer, Integer>(20, 6));
    thePairList.add(new Pair<Integer, Integer>(22, 7));
    thePairList.add(new Pair<Integer, Integer>(24, 8));
    thePairList.add(new Pair<Integer, Integer>(26, 9));
    thePairList.add(new Pair<Integer, Integer>(28, 10));
    thePairList.add(new Pair<Integer, Integer>(30, 11));
    thePairList.add(new Pair<Integer, Integer>(32, 12));
    thePairList.add(new Pair<Integer, Integer>(34, 13));
    thePairList.add(new Pair<Integer, Integer>(36, 14));

    Integer result = thePairList.parallelStream().
      mapToInt(p -> p.getValue0() * p.getValue1()).sum();
    System.out.printf("the final result is %s\n", result);
  }
}

Just one statement to solve the same problem. Same result, same Fork/Join machinery behind the scenes, but simple, expressive and productive!
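The mapToInt(...).sum() pair is a map step followed by a built-in reduction. To make the Map/Reduce shape explicit, the same calculation can be spelled out with map and reduce. This is a sketch that assumes int arrays in place of JavaTuples pairs; the class name is illustrative.

```java
import java.util.Arrays;
import java.util.List;

public class SumProductMapReduce {

  static int sumProduct(List<int[]> pairs) {
    return pairs.parallelStream()
        .map(p -> p[0] * p[1])    // map: one product per pair
        .reduce(0, Integer::sum); // reduce: fold the products into a total
  }

  public static void main(String[] args) {
    List<int[]> pairs = Arrays.asList(
        new int[] {10, 1}, new int[] {12, 2}, new int[] {14, 3});
    // 10*1 + 12*2 + 14*3 = 76
    System.out.printf("the sum-product is %d%n", sumProduct(pairs));
  }
}
```

The mapToInt version is preferable in practice because it avoids boxing every product into an Integer; the explicit form is shown only to expose the pattern.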

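Moreover, the splitting strategy is no longer your problem. The stream library decides how to partition the list and schedules the pieces on a shared Fork/Join pool sized from the available processors, so there is no hand-picked threshold to carry from one machine to another. It remains worth measuring, though: on a list this small, the overhead of going parallel can easily outweigh the gain.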
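To see what the runtime has to work with on a given machine, you can print the processor count and the parallelism of the common Fork/Join pool, which is where current OpenJDK implementations run parallel streams. The sketch below also checks that the sequential and parallel streams agree; the class name is illustrative and the pairs are generated instead of built with JavaTuples.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamPool {

  // Sum of (8 + 2k) * k for k = 1..14: the pairs (10, 1), (12, 2), ..., (36, 14).
  static int sumProduct(boolean parallel) {
    IntStream ks = IntStream.rangeClosed(1, 14);
    if (parallel) {
      ks = ks.parallel();
    }
    return ks.map(k -> (8 + 2 * k) * k).sum();
  }

  public static void main(String[] args) {
    System.out.printf("available processors: %d%n",
        Runtime.getRuntime().availableProcessors());
    System.out.printf("common pool parallelism: %d%n",
        ForkJoinPool.getCommonPoolParallelism());
    System.out.printf("sequential: %d, parallel: %d%n",
        sumProduct(false), sumProduct(true));
  }
}
```

The first two numbers depend on the machine; the last line reports 2870 for both streams.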

This is just one example of how Java 8 lets you write better code, so don’t wait for others to ask you to do it. Try to adopt Java 8 today!

Resource Redirection in Spring Pet Clinic Application for Tomcat 7 and Cloud Platforms

Over the past few months I’ve been playing with various cloud platforms, learning the basics and what is and isn’t different between them, and comparing them with more ‘traditional’ developments.

When I start working with a new framework or tool, I tend to begin with the same set of reference applications. Simple stuff for a simple start. That way I can concentrate on the specifics of the framework or tool at hand, without having to deal at the same time with whatever idea I had and was building.

The first app, as you can see in previous posts, is the simplest of Spring+Hibernate use cases: CRUD on a simple, two-field entity. It is good to start with but too simple to be really representative of an actual development.

For the second iteration I work with the Spring Pet Clinic reference application: an exemplary use of Spring Framework created by the Spring team a few years ago. To my surprise, Pet Clinic didn’t work out of the box with the latest Tomcat release, and while looking into what was happening I found a few things worth sharing about the latest and greatest Spring and Tomcat.

In this post I will walk through my findings with Pet Clinic and the enhancements I made to get it ready for 2012 and beyond.


First Steps with Heroku – The New-Old Boy in the Cloud

Since my previous posts about Java cloud platforms I have wanted to spend some time with Heroku and compare it with the others.

Heroku is a veteran among the cloud platforms, but it was only a few months ago that it launched a Java offering.

In this post I will share my experiences starting with Heroku and making an existing application work on it.


First Steps with Micro Cloud Foundry

Micro Cloud Foundry is a complete Cloud Foundry installation shipped in a ready-to-use virtual machine.

With Micro Cloud Foundry you can work locally on your applications and test how they integrate with Cloud Foundry services.

Using Micro Cloud Foundry during development is highly recommended for any serious work. It is not practical to have multiple people working on the same application and constantly deploying to the same public or private Cloud Foundry instance (e.g. hosted in Amazon EC2 or hosted in a VMware vSphere environment). Instead, developers would use local Micro Cloud Foundry instances for build and test and then a dedicated Micro Cloud Foundry or Cloud Foundry instance for integration testing.

In this post I will show how to get, configure and start to work with Micro Cloud Foundry.


Red Hat OpenShift: Freedom of Choice

1 Introduction

After we finished writing the post on the VMware Cloud Foundry platform, it seemed natural to write a follow-up on Red Hat OpenShift. OpenShift is a Java-based Platform-as-a-Service (PaaS) offering from Red Hat, the ‘giant’ of Open Source Software with a well-deserved reputation that comes from a wide range of products including operating systems (Fedora, Red Hat Enterprise Linux), application servers / middleware (JBoss AS, JBoss ESB), frameworks (Hibernate, Seam) and tools (JBoss Tools, Arquillian).

As a PaaS offering, the ultimate goal of OpenShift is to reduce the effort needed to write and deploy highly scalable and highly available Java applications. Under your dedicated “application space”, the platform components run to ensure your application is able to respond to users’ requests, while isolating your application code from the infrastructure and all the complexity usually associated with distributed deployments.

Let’s jump into OpenShift!


VMware Cloud Foundry: A Cloud Primer

1 Introduction

It’s cloudy today. But I’m not speaking about the weather in the northern hemisphere autumn. Cloud Computing, two words that cannot be explained in just a sentence, is now a reality, born to change our (digital) lives.

Cloud Computing may mean different things to different people. For some, it would be having their personal documents, photos and music synchronized “in the cloud” and accessible through their PC or mobile device. For others, it would be being able to run their favorite games and to do their favorite stuff anywhere, anytime. It may be a cost-effective, agile hosting solution, an elastic production environment for databases and application servers, and many, many other things. Google Docs, Picasa, iCloud, Facebook, Twitter, Foursquare, Amazon EC2…

In this article, though, we will be discussing Cloud Computing as a platform where developers can deploy their highly scalable, highly available solutions. We call this Platform-as-a-Service, or PaaS for short. PaaS also means a lot of things, but for us it means forgetting about dealing with the OS, the application server, the disk or the memory in the server. You just have your “application space”, where you can upload your solutions, execute them, monitor how they behave and control the amount of resources available to them.

You don’t know where your application physically is, the type of host server, the operating system configuration or even how the application server is configured at all. You may certainly know about some or all of this, but it is really not needed to achieve your targets: develop your idea and make it available to your clients (no matter how many or where they are) without hassle.

VMware Cloud Foundry is one such platform.
