Java SE 8 introduces a new concept of streams (java.util.stream). As the name suggests, data is still treated as a sequence of elements, similar to the familiar I/O streams from the java.io package. However, java.util.stream offers efficient, declarative data processing and even allows data to be processed in parallel without writing any explicit multithreading code.
A stream from the java.util.stream package can be created from collections, arrays, generator functions and I/O resources. It does not store any elements, but rather computes them on demand. The input data is manipulated through operations known from functional programming: filter, map, findFirst, reduce, etc. Many of these operations return a stream themselves. This allows the operations to be chained in a pipeline, which enables certain optimizations, such as “laziness” and “short-circuiting”, both of which are explained below.
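The different kinds of sources mentioned above can be sketched as follows. This is only an illustration: the class name StreamSources, its method names and the sample values are made up for this example, and a StringReader stands in for a real I/O resource such as a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: one method per kind of stream source.
class StreamSources {

    // Source: a collection (here a List).
    static List<String> fromCollection() {
        return Arrays.asList("a", "b").stream().collect(Collectors.toList());
    }

    // Source: an array; Arrays.stream(int[]) yields an IntStream.
    static int fromArray() {
        return Arrays.stream(new int[] {1, 2, 3}).sum();
    }

    // Source: a generator function; iterate() is infinite, so limit() bounds it.
    static List<Integer> fromGenerator() {
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    // Source: an I/O resource; BufferedReader.lines() streams the lines lazily.
    static List<String> fromReader() {
        try (BufferedReader reader = new BufferedReader(new StringReader("first\nsecond"))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromCollection()); // [a, b]
        System.out.println(fromArray());      // 6
        System.out.println(fromGenerator());  // [1, 2, 4, 8, 16]
        System.out.println(fromReader());     // [first, second]
    }
}
```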
Stream operations are divided into two main groups: intermediate operations and terminal operations. Intermediate operations return a stream as output and can be connected together in a pipeline, e.g. filter, map, sorted, etc. Terminal operations close the pipeline and return the final result of the data manipulation performed by the intermediate operations, e.g. collect. They produce a result of a non-stream type: a collection, a number, etc. It is important to mention that intermediate operations are executed only once a terminal operation is called. This feature is referred to as the “laziness” of intermediate operations and allows certain optimizations, such as examining only a part of the input data. Further optimization is achieved through “short-circuiting”: an operation such as limit() lets the pipeline stop as soon as the requested number of elements has been produced, instead of processing the whole input.
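Laziness and short-circuiting can be made visible with a small experiment. The sketch below is not part of the Stream API itself: the class LazinessDemo, the counter inspected and the predicate isEven are hypothetical helpers that only count how often the filter is evaluated. The counting side effect is there purely for demonstration and should not be used in real pipelines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class that counts how many elements the filter inspects.
class LazinessDemo {

    static final AtomicInteger inspected = new AtomicInteger();

    // Predicate with a counting side effect (for demonstration only).
    static boolean isEven(int n) {
        inspected.incrementAndGet();
        return n % 2 == 0;
    }

    // Builds a pipeline but never calls a terminal operation.
    static int inspectedWithoutTerminalOperation(List<Integer> numbers) {
        inspected.set(0);
        Stream<Integer> pipeline = numbers.stream().filter(LazinessDemo::isEven).limit(2);
        return inspected.get(); // still 0: nothing has been executed yet
    }

    // Same pipeline, now executed by the terminal operation collect().
    static List<Integer> firstTwoEven(List<Integer> numbers) {
        inspected.set(0);
        return numbers.stream()
                .filter(LazinessDemo::isEven)
                .limit(2) // short-circuits once two elements have passed the filter
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(inspectedWithoutTerminalOperation(numbers)); // 0
        System.out.println(firstTwoEven(numbers));                      // [2, 4]
        System.out.println(inspected.get() < numbers.size());           // true
    }
}
```

The counter stays at zero until collect() is called, and after the call it remains below the size of the list, because limit(2) stops the traversal early.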
The following example shows how streams work and how stream-based code differs from the equivalent Java SE 7 code.
Example 1.
Given a list of programming languages, only the first three whose names have a length of 6 should be returned. In Java SE 7 that works as follows:
// requires java.util.ArrayList, java.util.Arrays and java.util.List
List<String> languages = Arrays.asList("Scheme", "Ruby", "Python", "Scala", "Java", "Oracle", "Assembler");
List<String> result = new ArrayList<>();
for (String s : languages) {
    if (s.length() == 6) {
        result.add(s);
        if (result.size() == 3) {
            break; // stop once three matches have been collected
        }
    }
}
The list of languages is explicitly iterated and the result is stored in an accumulator.
Streams offer the following solution:
List<String> result = languages.stream().filter(a -> a.length() == 6).limit(3).collect(Collectors.toList());
First, the list of languages is converted into a stream by stream(). Then, the languages with a length of 6 are filtered and the result is limited to the first three of them. The iteration itself is handled internally by the library and remains hidden from the user. filter() and limit() are intermediate operations. Each of them returns a stream which serves as input for the next operation. The final result is computed only once the terminal operation collect() is called. It converts the resulting stream into a list.
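For reference, the stream solution can be wrapped into a small self-contained program. The class name FirstThreeLanguages and the helper method firstOfLength are hypothetical and only serve this sketch. The pipeline itself is the one described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical wrapper class around the pipeline from Example 1.
class FirstThreeLanguages {

    // Returns at most 'count' names whose length equals 'length', in list order.
    static List<String> firstOfLength(List<String> names, int length, int count) {
        return names.stream()
                .filter(s -> s.length() == length) // intermediate: keep matching names
                .limit(count)                      // intermediate: stop after 'count' matches
                .collect(Collectors.toList());     // terminal: run the pipeline, build a list
    }

    public static void main(String[] args) {
        List<String> languages = Arrays.asList(
                "Scheme", "Ruby", "Python", "Scala", "Java", "Oracle", "Assembler");
        System.out.println(firstOfLength(languages, 6, 3)); // [Scheme, Python, Oracle]
    }
}
```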
The example above can be extended to compute the total length of all language names in the list:
int languagesLength = languages.stream().map(String::length).reduce(0, (a, b) -> a + b);
map() is used to apply a certain operation (here: length()) on each element in the list. Then, with reduce() the sum of all lengths is computed. reduce() takes two arguments: an initial value and a binary operator, used on two operands to produce a new value.
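To make the role of the initial value and the binary operator more concrete, the following sketch places the reduce() call next to an explicit loop that performs the same accumulation. The class TotalLength and its method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class comparing reduce() with an explicit loop.
class TotalLength {

    // Stream version: map each name to its length, then fold the lengths with +.
    static int withReduce(List<String> names) {
        return names.stream()
                .map(String::length)
                .reduce(0, (a, b) -> a + b); // 0 is the initial value, (a, b) -> a + b the operator
    }

    // Loop version: the same accumulation written out by hand.
    static int withLoop(List<String> names) {
        int sum = 0;                 // corresponds to the initial value
        for (String s : names) {
            sum = sum + s.length();  // corresponds to applying the binary operator
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> languages = Arrays.asList(
                "Scheme", "Ruby", "Python", "Scala", "Java", "Oracle", "Assembler");
        System.out.println(withReduce(languages)); // 40
        System.out.println(withLoop(languages));   // 40
    }
}
```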
Streams also offer operations that convert a stream of objects into a stream of primitive values, so that the methods of that specialized stream type become available. For example, mapToInt, mapToLong and mapToDouble produce an IntStream, a LongStream and a DoubleStream, respectively, which provide numeric operations such as sum(), average() and max().
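As a brief illustration, the sum of the name lengths can also be computed with mapToInt, without an explicit reduce(). The class PrimitiveStreams and its method names are hypothetical. The sample list is the one from Example 1.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class using the primitive stream type IntStream.
class PrimitiveStreams {

    // mapToInt yields an IntStream, which offers sum() directly.
    static int totalLength(List<String> names) {
        return names.stream().mapToInt(String::length).sum();
    }

    // max() returns an OptionalInt because the stream may be empty.
    static int longestLength(List<String> names) {
        return names.stream().mapToInt(String::length).max().orElse(0);
    }

    public static void main(String[] args) {
        List<String> languages = Arrays.asList(
                "Scheme", "Ruby", "Python", "Scala", "Java", "Oracle", "Assembler");
        System.out.println(totalLength(languages));   // 40
        System.out.println(longestLength(languages)); // 9
    }
}
```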
A big advantage of streams is the ability to parallelize code without writing explicit multithreading code. Replacing stream() with parallelStream() is enough for the pipeline to be executed in parallel: the data is divided into blocks which are processed concurrently, and the partial results are then combined. For the result to be correct, the operations in the pipeline must not depend on shared mutable state.
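A minimal sketch of this switch, using the list from Example 1, could look as follows. The class ParallelDemo and its method names are made up for the example. With such a small list no speed-up should be expected. The point is only that the results match those of the sequential version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: the same pipelines, once sequential and once parallel.
class ParallelDemo {

    static int totalLengthSequential(List<String> names) {
        return names.stream().mapToInt(String::length).sum();
    }

    // Only the stream creation changes; the rest of the pipeline stays the same.
    static int totalLengthParallel(List<String> names) {
        return names.parallelStream().mapToInt(String::length).sum();
    }

    // collect(toList()) keeps the list order even when run in parallel.
    static List<String> namesOfLengthParallel(List<String> names, int length) {
        return names.parallelStream()
                .filter(s -> s.length() == length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> languages = Arrays.asList(
                "Scheme", "Ruby", "Python", "Scala", "Java", "Oracle", "Assembler");
        System.out.println(totalLengthSequential(languages));     // 40
        System.out.println(totalLengthParallel(languages));       // 40
        System.out.println(namesOfLengthParallel(languages, 6));  // [Scheme, Python, Oracle]
    }
}
```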
In conclusion, the new Stream API in Java SE 8 allows declarative and more concise data processing. In addition to execution optimizations via "laziness" and "short-circuiting", parallelization is easily achieved by changing a single method call.