Wednesday, April 22, 2015

A Smarter Stream Builder

Background


We are going to provide the "user-land" API using Java 8 Streams. So you can obtain a stream of all Hares this way:

Hare.stream().filter(h->h.getAge()>8).forEach(System.out::println);

The problem is that if you use standard Java 8 streams, then you will iterate over all Hares. If we use a SQL storage engine, this will correspond to

select * from Hare;

If we have 1 000 000 hares, this is not a viable solution.

The Stream Builder

We have developed a smarter Stream Builder concept that implements Stream, IntStream, LongStream and DoubleStream. The Stream Builder acts like a builder: instead of executing intermediate operations (like map and filter) right away, it records them in a Pipeline. When a terminal operation is encountered, a pluggable StreamTerminator can inspect the Pipeline, optionally modify it, and select a different upstream data source before the real Stream is actually started.

If we take the example above:

Hare.stream().filter(h->h.getAge()>8).forEach(System.out::println);

the StreamTerminator can, in theory, inspect the filter Predicate and see that it can be translated to a SQL condition. Thus, it removes the filter from the Pipeline and modifies the upstream data source like this:

select * from Hare where age>8;
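To make the idea a bit more concrete, here is a minimal sketch of how a builder could record operations and hand the Pipeline to a terminator before any data is read. All names in it (StreamBuilderSketch, Pipeline, StreamTerminator, Builder, prepare, fullScan) are made up for this illustration and are not the actual Speedment classes, and only filter() and count() are modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical names throughout; this is not the real Speedment API.
final class StreamBuilderSketch {

    /** The recorded intermediate operations. Only filter() is modelled here. */
    static final class Pipeline<T> {
        final List<Predicate<T>> filters = new ArrayList<>();
    }

    /** Pluggable hook: may edit the pipeline and decide which source to open. */
    interface StreamTerminator<T> {
        Stream<T> prepare(Pipeline<T> pipeline, Supplier<Stream<T>> fullScan);
    }

    static final class Builder<T> {
        private final Pipeline<T> pipeline = new Pipeline<>();
        private final Supplier<Stream<T>> fullScan;
        private final StreamTerminator<T> terminator;

        Builder(Supplier<Stream<T>> fullScan, StreamTerminator<T> terminator) {
            this.fullScan = fullScan;
            this.terminator = terminator;
        }

        // Intermediate operation: recorded, not executed.
        Builder<T> filter(Predicate<T> predicate) {
            pipeline.filters.add(predicate);
            return this;
        }

        // Terminal operation: the terminator gets its say before any data is read.
        long count() {
            Stream<T> stream = terminator.prepare(pipeline, fullScan);
            // Apply whatever filters the terminator chose to leave in place.
            for (Predicate<T> remaining : pipeline.filters) {
                stream = stream.filter(remaining);
            }
            return stream.count();
        }
    }
}
```

A terminator that simply returns fullScan.get() behaves like a plain Java 8 stream. A smarter one can clear the recorded filters and return a narrower source, such as the result of a query with a where clause, so the full scan is never opened.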

Terminal Operation Driven

Each terminal operation will have its own callback in the StreamTerminator so it can act in its best interest. Let's take an example with Stream::count.

Suppose we have:
Hare.stream().filter(h->"gray".equals(h.getColor()))
.sorted().map(h->h.getName()).count();

as input. Then the StreamTerminator's count() callback will determine that sorted() and map() do not change the number of items in the stream and thus can be removed from the Pipeline altogether. Then it can modify the upstream source to reflect the predicate. So we will end up with:

select count(*) from Hare where color='gray';
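The pruning step in the count() callback can be sketched roughly as follows. The Op enum and the reduceForCount method are hypothetical names used only for this illustration. To stay on the safe side, the sketch only strips operations from the end of the Pipeline: a map() that is followed by a filter() cannot be dropped, because that filter operates on the mapped values.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a Pipeline as a list of operation kinds.
final class CountOptimizerSketch {

    enum Op { FILTER, MAP, SORTED, LIMIT, DISTINCT }

    // Operations that never change how many elements pass through.
    private static final Set<Op> SIZE_PRESERVING = EnumSet.of(Op.MAP, Op.SORTED);

    /**
     * Returns a copy of the pipeline without its trailing size-preserving
     * operations. Those steps cannot affect the result of count().
     */
    static List<Op> reduceForCount(List<Op> pipeline) {
        int end = pipeline.size();
        while (end > 0 && SIZE_PRESERVING.contains(pipeline.get(end - 1))) {
            end--;
        }
        return new ArrayList<>(pipeline.subList(0, end));
    }
}
```

For the example above, the Pipeline filter, sorted, map is reduced to just filter, which is the part that can then be pushed into the where clause.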

For in-memory storage engines, the StreamTerminator can be made very efficient. It may evaluate a number of options to reduce the Pipeline, such as short-circuiting combinations of Predicates, and determine which upstream source to select so that the optimized stream/pipeline requires the lowest number of iterations.

Check It Out

Check out the com.speedment.util.stream.builder.demo package.


The Hard Part

In the beginning, we are going to use specific predicates that we can recognize easily. One way is to have support classes/methods like Predicates.equals("columnName", value) that we can detect in the Pipeline. If we want to cover the generic case like the one above, we have to decompile the predicates and find out how they operate on the fly.
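A recognizable predicate of this kind could look something like the sketch below. It is only an assumption about the shape of such a support class: the names RecognizablePredicates, ColumnEquals, columnEquals and toSql are invented here, and the extra getter parameter is added so that the predicate still works as an ordinary in-memory Predicate when it is not translated.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical support class; not the actual Speedment Predicates API.
final class RecognizablePredicates {

    /** A predicate that carries its column name and value as plain data. */
    static final class ColumnEquals<T> implements Predicate<T> {
        final String column;
        final Object value;
        private final Function<T, ?> getter;

        ColumnEquals(String column, Function<T, ?> getter, Object value) {
            this.column = column;
            this.getter = getter;
            this.value = value;
        }

        // Still usable as a normal predicate when no translation takes place.
        @Override
        public boolean test(T entity) {
            return Objects.equals(getter.apply(entity), value);
        }

        // SQL fragment with a bind placeholder; the value is bound separately.
        String toSqlCondition() {
            return column + " = ?";
        }
    }

    static <T> ColumnEquals<T> columnEquals(String column, Function<T, ?> getter, Object value) {
        return new ColumnEquals<>(column, getter, value);
    }

    /** Returns a where fragment for recognized predicates, otherwise empty. */
    static Optional<String> toSql(Predicate<?> predicate) {
        return (predicate instanceof ColumnEquals)
                ? Optional.of(((ColumnEquals<?>) predicate).toSqlCondition())
                : Optional.empty();
    }
}
```

An arbitrary lambda yields an empty Optional from toSql, so the StreamTerminator would leave it in the Pipeline and evaluate it in Java as usual.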

Future Work

In the future, the Stream Builder could also be applied to normal Java collections so that they may be optimized using redundant step removal, reordering of the Pipeline, and upstream data source modifications. For example, this operation, which seemingly has to visit (and even sort) all N elements:

collection
    .stream()
    .map((User u)-> u.getName())
    .sorted()
    .limit(limit)
    .count();

would be reduced by the StreamTerminator to:


Math.min(limit, collection.size());

which is an O(1) operation.
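The equivalence is easy to check by hand. Below is a small sketch that puts the two variants side by side. The class and method names are made up for the illustration, a collection of strings stands in for the User objects, and limit is assumed to be non-negative (Stream.limit rejects negative values).

```java
import java.util.Collection;

// Hypothetical helper comparing the full pipeline with its reduced form.
final class LimitedCountSketch {

    // What the code says literally: map, sort, limit, then count.
    static long naiveCount(Collection<String> names, long limit) {
        return names.stream()
                .map(String::trim)
                .sorted()
                .limit(limit)
                .count();
    }

    // What a StreamTerminator could reduce it to: no iteration at all.
    static long optimizedCount(Collection<?> collection, long limit) {
        return Math.min(limit, collection.size());
    }
}
```

Neither map() nor sorted() changes the number of elements, and limit() only caps it, so both methods return the same number for any collection and any non-negative limit.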





Friday, November 21, 2014

Hazelcast support!

Speedment Partners with Hazelcast for SQL Based In-Memory Operational Data Store

Hazelcast is the hot new scale out system for Big Data for the Java enterprise segment in Silicon Valley. Now Speedment’s SQL Reflector makes it possible to integrate your existing relational data with continuous updates of Hazelcast data-maps in real-time.

Typically, new systems for Big Data are built to work with NoSQL databases only, where information elements often are stored as key/value pairs. This has excluded the majority of the market that still uses SQL databases from using real-time big query systems. With Speedment’s SQL Reflector, traditional data saved in SQL databases can be reflected, integrated and presented in Hazelcast’s NoSQL format. The result is that from now on, all data can be accessed instantly and you are able to query your entire dataset, regardless of its current format and size.

“Real-Time In-Memory SQL is a holy grail for In-Memory Data Management” said Miko Matsumura, VP of Marketing at Hazelcast “enabling real-time synchronization with SQL RDBMS and Hazelcast provides good value for Enterprise customers who need SQL.”

Together with Hazelcast, Speedment will run a webinar later this year. It will be a presentation on how easily the SQL Reflector can be installed on an SQL database and be seamlessly integrated with Hazelcast to gain the ultimate benefit from the two technologies.


Per Åke Minborg, CTO Speedment says: “It was a very straight-forward integration and a good match since both companies are using the same basic ConcurrentMap structure for storing data. Both products are a perfect match for Java 8’s functional API, allowing developers to remain in a strict object oriented world.”

Hazelcast has not previously been able to offer real-time SQL reflection as part of its solution. The match with Speedment fills this gap and gives customers with all kinds of data storage access to the Hazelcast platform.

“The automatic synchronization between database and the Hazelcast based cache is exactly what we would have needed on one of my old employers since we had legacy applications that wrote directly to the database.” says Hazelcast evangelist Christoph Engelbert.

Carina Dreifeldt, CEO at Speedment comments: “With our technology, the majority of the world’s more than 50 million RDBMS installations can gain access to the advantages of NoSQL solutions.”

For more information, please contact:
Carina Dreifeldt (CEO)
Speedment AB
carina@speedment.com

Miko Matsumura (VP Marketing and Developer Relations)
Hazelcast Inc.
miko@hazelcast.com


How it works: 
Screencast: http://youtu.be/6tXW7x6QLiY
Presentation: http://www.speedment.com/doc/SpeedmentSqlReflector...


Thursday, October 30, 2014

Initial Meetup in Palo Alto

The announcement of an Open Source version of the Speedment ORM was made today at a meetup here in Palo Alto. Please leave your comments on this or any other post on this blog.

You can download your own free technology preview at:

http://www.speedment.com

Please try it and give me your feedback! Get involved!

Best, Per-Åke Minborg