Friday, November 30, 2012

Merging Two Binary Search Trees in O(logn) Space

In a recent coding interview, I was asked to merge two Binary Search Trees (BSTs) without modifying the given trees. This was the first time I had been exposed to this question, and for an hour or so I tried to come up with a linear-time algorithm of O(1) space complexity. The defeat was inevitable. (To tell the truth, had a hint been given that O(m+n) storage is allowed, I could have easily figured out the solution without breaking a sweat.) As usual, I could not sleep that night and scratched dozens of papers to come up with a better solution. But as the night passed, I started to realize that the space complexity is lower-bounded by the tree height. This blog post is set to keep a cyber-record of this pursuit.

Related Work

You can first start by checking the Google results. But, as usual, I am kind enough to provide you with a tl;dr version: in order to merge two BSTs of m and n nodes, there are a couple of common approaches with O(m+n) processing time and space complexity, some of which are listed as follows.

  • Insert given BSTs into a newly created BST.
  • Create an array of elements of the first BST in sorted order, and use this array to merge the results to a new BST while traversing the second BST.

Further exposure to higher doses of the problem is available through this and this StackOverflow thread.

On Complexity

You will definitely need O(m+n) processing time to visit each node, that's for sure. But what about O(m+n) space complexity? It means that you need to store one (or both) of the given trees in a vector in order to proceed with the merge. As it will turn out in the following paragraphs, the space complexity is actually lower-bounded by the height of the tree, that is, O(h), where h=logn for a balanced tree.

The Trick

In its most basic form, we flatten both trees into two separate vectors. Next, at each step, we consume the smaller of the two head elements. This scheme deserves a figure of its own.

Actually, we don't need all the elements of a tree packed into a single vector at once. At each step, all we ask for is the next smallest element. That is, we just need a stream of nodes traversed in-order.

Let's further investigate the possibility of implementing such a stream of nodes. In order to consume the elements of a binary search tree in sorted order, we need to traverse the tree in left-center-right (that is, in-order) fashion. Assume that we have the below traversal function. (Yes, it is in C++, and I shamelessly used templates.)

template <class T>
void traverse(Node<T>* root, queue<T>& items) {
  if (root) {
    traverse(root->left, items);   // Visit the left subtree,
    items.push(root->data);        // then the node itself,
    traverse(root->right, items);  // and finally the right subtree.
  }
}
What if I can suspend the execution at any point in time while pushing the data to a queue? In that case, what would be the maximum possible height of a recursive traverse() call stack? I know you like figures, so I took another photo of the board.

That is, the maximum call stack depth of a traverse() recursion is upper-bounded by the tree height. Coupled with the fact that successive traverse() calls are sufficient to consume the nodes of a tree in sorted order, we should be able to stream the nodes of a tree with at most O(logn) node pointers.

Streaming Nodes

Since the actual traverse() call stack is bounded, we can emulate the recursive traverse() using an explicit stack of the nodes traversed so far from the root. The outline of the streaming algorithm is as follows.

The Prestige

Now we can stream a tree in sorted order using at most O(logn) storage. The rest is easy: Stream both trees and merge them while streaming.

The Code

I have implemented a streamer (NodeStream), stream merger (MergeNodeStream), and a vector merger (MergeNodeVector) in C++ and Scala. Code is accessible through this Gist. You can also find implementations of the algorithm in C++ using Boost Coroutines and Haskell written by Remko.

Wednesday, November 28, 2012

Setting up a Scala Development Environment for Intellij IDEA and SBT

Resisting the Functional Programming Principles in Scala buzz was meaningless. After all, one way or another, I knew I would be repeating those steps on my own regardless of the presence of such a lecture. To warm up, I looked around for an appropriate development environment for Scala. The last time I went through this (~2 years ago), I ended up in a desperate search. Fortunately, reading dozens of blog posts, forum/mailing-list threads, and GitHub READMEs, and repeating try-and-fail procedures led me to a working setup. To mitigate the friction at least to an extent for newcomers, I put up this blog post listing the steps to set up a working Scala development environment on top of IntelliJ IDEA with SBT integration.

Is there a Scala plugin available for IDEA?

Good news: yes. It is under active development, works way better than its alternatives, and has a responsive maintainer and an active community. The plugin lets you create Scala projects and compile/run/debug Scala source files, scripts, and worksheets. Navigation, refactoring, tab-completion, and code snippets are included as well. (Note that it is strongly advised to use an EAP version for a smooth experience.)

Is there a quick start guide for IDEA Scala plugin?

Yes, see Getting Started with IntelliJ IDEA Scala Plugin.

How do I manage project dependencies?

While one can set up a Maven/Ivy/Ant configuration for a Scala project, SBT is known to be the de facto build tool throughout the Scala community. (Otherwise, you will need to set explicit scala-compiler and scala-library dependencies in XML mazes.) Fortunately, there is an SBT plugin for IDEA. It offers a console where SBT commands can be entered interactively, and a Before Launch task to delegate project compilation to SBT, as an alternative to the built-in IntelliJ Make.

Is there a quick start guide for SBT?

Yes, see Hello, World in SBT documentation.

How can I integrate libraries installed by SBT to IDEA?

At its core, SBT uses Apache Ivy, which has its own nasty ways of dealing with downloaded JARs under ~/.ivy2. Instead of manually defining IDEA module dependencies for each JAR a project uses, lucky for us, there exists an SBT plugin for this purpose: sbt-idea. Basically, sbt-idea enhances SBT with a new task, called gen-idea, which creates IDEA project files with necessary module dependencies induced by SBT. That is,

  1. Add your dependencies to your SBT configuration,
  2. Call sbt update to download/update project dependencies,
  3. Call sbt gen-idea to create IDEA project files,
  4. Open created project from IDEA.

What are the IDEA modules created by sbt-idea?

In addition to the below directories, SBT dependencies are added to the IDEA module configuration.

  • Source Folders: src/main/{scala,java,resources}
  • Test Source Folders: src/test/{scala,java,resources}

What about testing?

SBT supports a couple of testing frameworks, e.g., specs2, ScalaCheck, and ScalaTest. See the Testing section of the SBT documentation for a detailed discussion.

What about my .gitignore?

Here you go.


I read enough, gimme the code!

Create a project directory.

$ export PROJECT_DIR=~/scala/ScalaHelloWorld
$ mkdir $PROJECT_DIR

Create $PROJECT_DIR/build.sbt as follows. (In this example, I used ScalaTest framework.)
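(The embedded gist is gone from this export; a build.sbt along these lines would match the session log below. The name, version, and exact dependency versions are my assumptions. Note that sbt of this era requires blank lines between settings.)

```scala
name := "ScalaHelloWorld"

version := "1.0"

scalaVersion := "2.9.2"

libraryDependencies += "org.scalatest" %% "scalatest" % "1.8" % "test"
```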

Create a $PROJECT_DIR/project directory and add the below lines to $PROJECT_DIR/project/plugins.sbt to add the sbt-idea plugin.
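(Again, the gist is missing; the plugins.sbt addition would look roughly like this. The plugin version is an assumption; pick whichever sbt-idea release matches your sbt.)

```scala
addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.1.0")
```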

Run sbt in $PROJECT_DIR and execute gen-idea task.

$ sbt
[info] Loading project definition from /home/vy/scala/ScalaHelloWorld/project
[info] Updating {file:/home/vy/scala/ScalaHelloWorld/project/}default-70d248...
[info] Resolving org.scala-sbt#precompiled-2_10_0-m7;0.12.1 ...
[info] Done updating.
[info] Set current project to ScalaHelloWorld (in build file:/home/vy/scala/ScalaHelloWorld/)
> gen-idea
[info] Trying to create an Idea module ScalaHelloWorld
[info] Updating {file:/home/vy/scala/ScalaHelloWorld/}default-3005c4...
[info] Resolving org.scalatest#scalatest_2.9.2;1.8 ...
[info] Done updating.
[info] Resolving org.scalatest#scalatest_2.9.2;1.8 ...
[info] Excluding folder target
[info] Created /home/vy/scala/ScalaHelloWorld/.idea/IdeaProject.iml
[info] Created /home/vy/scala/ScalaHelloWorld/.idea
[info] Excluding folder /home/vy/scala/ScalaHelloWorld/target
[info] Created /home/vy/scala/ScalaHelloWorld/.idea_modules/ScalaHelloWorld.iml
[info] Created /home/vy/scala/ScalaHelloWorld/.idea_modules/ScalaHelloWorld-build.iml

In src/main/scala/Main.scala, create a main() stub with a testable function in it.

In src/test/scala/MainSuite.scala, create a sample test suite.

Enjoy the rest by either creating your own IDEA run configurations, or manually running tasks within the SBT console. (As a final note, while creating IDEA run configurations, you can use SBT Before Launch task provided by IDEA SBT plugin.)

Sunday, September 30, 2012

Sequential Chaining of Nested Callbacks

Back in the old days of programming, the days we used to have blocking function calls, things were easy: first do this, and then do that, and finally clean things up. While this sequential style of programming has its own advantages in terms of human perception, it imposes certain concurrency-related limitations. That is, each expression in a program is supposed to wait for the completion of the execution flow up until the line above, regardless of whether it has anything to do with the current expression or not. Asynchronous programs are known to solve this concurrency problem (at least, to some extent) at the cost of sacrificing the sequential program flow. In its most basic form, functions are provided interfaces to specify dependencies between each other, and hence independent functions are allowed to execute concurrently. These interfaces are generally provided in the form of callback functions; that is, when f() completes do g(), and when g() completes do h(), and so on. The premise of this blog post is to investigate whether it is possible to preserve the dependencies between functions while still allowing the programmer to syntactically structure the program sequentially.

In the context of asynchronous programming, JavaScript is an exceptional example, where almost every functionality in the language is shaped around asynchronous callbacks, which eventually forces you to program in a totally top-down, callback-oriented style. In this post, I preferred to use CoffeeScript (a programming language compiled to JavaScript) to enhance the explanations.

Ok, enough talk. Let's start with working on a small sequential program snippet.

Simple and intuitive. You grasp the logic behind it at first sight: first, connect to the database; then query the rows of a table; finally, close the connection and return the results. And as a bonus, you can catch errors that occur during this sequence. Here is its asynchronous, callback-driven counterpart:

Um... not as intuitive as its sequential counterpart. And the nested callback chain expands the code to the right, which makes it harder to understand as well. But, not that bad... except for a serious glitch: orphan exceptions. That is, for instance, who is supposed to catch a connection error exception raised after the surrounding block completes gracefully and execution has already passed over the try-catch? While the code gets polluted a little, this problem can be tackled by returning an error instead of raising an exception.

Better now, at least in terms of correctness.

So far, we always knew beforehand which callbacks would be nested into each other. That is, we knew that a simple query would follow right after the database connection was established. What if we didn't? What if the next callback is to be determined dynamically according to a runtime variable? Think about this scenario: you need to query the database multiple times depending on the input passed by the user. A pretty common day-to-day practice. Terrorizing the code with an unknown number of nested callbacks would buy us no credit.

On the other hand, forming nested callbacks using a recursive function solves the problem.

I admit that this is not intuitive, and it is more error-prone as well. (I could not even be sure whether I wrote it right. But anyway, you get the idea.) There must be some other way. Wouldn't it be cute if there existed some sort of sequencer mechanism that allowed me to sequentially chain my nested callbacks?

This Aha! moment helped me to come up with below tiny helper class.

SequentialExecutor lets you push your functions into an array and executes them in order for you. Specifically, it passes you a pointer to the next function (i.e., next) that is supposed to execute after the current one. So it is up to you to execute it or not. Here is an example using this cute SequentialExecutor class:

Yes, now we have something! Let's also try to implement the case where the total number of queries is determined dynamically at runtime.

Oops! That is not what we were expecting. The database connection is supposed to be closed at the end of the execution flow. Hrm... can't we enhance SequentialExecutor to label tasks with priorities? Here is the poor man's sequential executor with priority support.

Let's give our new gear, PrioritizedSequentialExecutor, a try.

Mission accomplished! Now we have a fully-fledged sequencer where we can dynamically push tasks with different priorities.

Note that while PrioritizedSequentialExecutor is quite good at doing what it is advertised for, especially given the few lines of code it takes, there exist other libraries (e.g., seq, chainsaw, futures, async, windjs, streamlinejs, etc.) with similar flow-control purposes. While you are at it, you might want to check them out too.

Sunday, September 16, 2012

An Authentication Service for AngularJS

I am too tired to form explanatory, coherent sentences, but at the same time I really would like to share a user authentication service that I wrote for AngularJS. Hence, pardon my brevity.

In this example I use RailwayJS with CoffeeScript both at the client- and server-side. (See my previous post on auto-compiling assets with connect-assets.) Here is the scenario: You have assignments.html such that only authenticated users are allowed.

First things first, here is our /config/

Then we implement our controller /app/controllers/ as follows.

Note that the authenticate used in login action handler is meant to be provided by you.

Later, we write /app/views/home/index.ejs to fire up AngularJS:

We first start by implementing app.js of AngularJS in /assets/js/

The extra bit listening on $rootScope for $routeChangeStart is there to check access to pages that require authentication.

After app.js, we implement /assets/js/

For each controller, we implement a view, that is, /public/partials/login.html and /public/partials/assignments.html:

And here goes the magic, /assets/js/

Hope it works for you as well.

Auto-Compiling Assets with connect-assets

I have been using RailwayJS for a couple of weeks with great satisfaction and joy. Its built-in support for CoffeeScript files is fantastic. That being said, like a majority of other NodeJS web frameworks (e.g. Geddy, Tower.js, SocketStream, FlatIron) it doesn't provide a way to ship client-side CoffeeScript files. (In this context, Meteor represents a notable exception, where it is capable of handling CoffeeScript files both at the server- and client-side.)

To establish a complete CoffeeScript experience at both the server- and client-side, one can use connect-assets by Trevor Burnham to auto-compile assets (i.e., CoffeeScript, Stylus, and LESS files) on-the-fly. For this purpose, you just need to (1) add an app.use require('connect-assets')() line to your, (2) create js and css folders under the assets directory, and you are ready to go.

One minor glitch you need to pay attention to: while using connect-assets, you must include asset files using the accessor functions ‐ js() and css() ‐ provided by connect-assets. These functions, in addition to generating the necessary <script src="..."></script> tag, also register the file with the asset processor service. connect-assets auto-compiles an asset the first time it is called and caches the produced output in memory; later calls are served from the cache. (It also provides options to write produced outputs to disk and to watch file changes.) For instance, say you want to use /assets/js/ in the /app/views/main/index.ejs file. You need to add <%- js('main') %> into your view to let connect-assets serve your file.

As a final note, since connect-assets is a Connect plugin, you should be able to run it on any other web framework that runs on Connect or ExpressJS.

Thursday, June 21, 2012

RPC (Remote Procedure Call) with Hazelcast

Despite the fact that Hazelcast is a data distribution (particularly, peer-to-peer) platform for Java, its group communication mechanics can be utilized for remote procedure calls, in a fashion similar to JGroups remote procedure calls. Further, to ease this workflow, Hazelcast provides a distributed executor service to execute Callable and Runnable instances on remote cluster members. Now, let's glue things together to make remote procedure calls on Hazelcast cluster members.

Assuming that you have some familiarity with executor services, check out the below SimpleRPC class, which makes use of the Hazelcast executor service to invoke a Call instance on every cluster member.

Since SimpleRPC is a singleton, whose instance is accessible via its getInstance() method, foreign classes can easily invoke its callMe() instance method. Now let's implement the Call class, which is anticipated to encapsulate the necessary mechanics to make a call to SimpleRPC.getInstance().callMe() on each cluster member.

Simple, eh?

Monday, June 4, 2012

Performance of Linux IP Aliased Network Interfaces: Lessons Learnt

In an earlier post, I had put together a set of benchmarks to measure the performance of IP-aliased network interfaces in Linux. While I was expecting a performance degradation due to IP aliasing related bookkeeping at the kernel level, surprisingly the results pointed out that the overall network throughput measured over the physical interface increases proportionally to the number of aliased interfaces. In this post, I re-investigate the subject with the lessons I learnt from my previous try.

To begin with, I want to address Chris Siebenmann's concerns regarding the raw performance of the used network interface card. For this purpose, I first replaced the NIC with a real one (RTL8111/8168B) that is capable of achieving gigabit rates. Then, I started playing with iperf parameters. After a couple of tries, I figured out that the game changer for me is the TCP MSS (maximum segment size). Setting the MSS to 1448 helped me boost my speed to gigabit rates. Other configurations (e.g., disabling TCP delays, i.e., Nagle's algorithm) did not change the results much, but I present them here anyway.

New results are as follows. (Each pass is set to run for 60 seconds.)


Ok, the results are cool. But how do we explain them? Finding his name on the Wikipedia IP aliasing page, I got in touch with Juan José Ciarlante ‐ the author of the first IP aliasing support in the Linux kernel in 1995 ‐ and he kindly replied to my questions. Below, I quote directly from his own words.

you may be "just" exploiting the fact that you're using more TCP connections (==N_WORKERS) as number of aliases increases, and thus increasing the "parallelism of your transferring pipes", much in a way p2p networks do (spread the downloading between zillion of connections), suggest reading about TCP congestion window control semantics.

Finally, we have a valid answer!

Sunday, May 27, 2012

OpenVSwitch Installation Notes: OpenFlow Toolbelt

In the lab, I quite often find myself installing Open vSwitch instances, configuring network parameters for certain functionality, and probing network interfaces to check connectivity. You know, the daily routines in an OpenFlow research laboratory. To ease the workflow, I put together a small toolbelt.

Usage: <ACTION>
Actions: ofctl
$ dl_ovs
$ install_ovs openvswitch-1.4.1.tar.gz
$ db_create
$ db_start
$ vsctl add-br br0
$ set_dpid br0 1
$ get_dpid br0
$ get_stp br0
$ set_stp br0 true
$ insmod
$ vswitchd
$ ethtool_help
ethtool eth0
ethtool -s eth0 speed 10 duplex full autoneg off
ethtool -s eth0 autoneg on
ethtool -s eth0 autoneg on adver 0x001    # 10 Half
                                 0x002    # 10 Full
                                 0x004    # 100 Half
                                 0x008    # 100 Full
                                 0x010    # 1000 Half(not supported by IEEE standards)
                                 0x020    # 1000 Full
                                 0x8000   # 2500 Full(not supported by IEEE standards)
                                 0x1000   # 10000 Full
                                 0x03F    # Auto
$ sudo ethtool -s eth0 speed 10 duplex half autoneg off
$ vsctl add-port br0 eth0

You can find the sources in the following gist. Feel free to use it for your own purposes. (Free as in free beer.) No need to say, but here it goes: comments/contributions are highly welcome.

Friday, May 11, 2012

Performance of Linux IP Aliased Network Interfaces

TL;DR ‐ I put together a setup to measure the performance of IP aliasing in Linux. As the numbers at the bottom of the post describe, the observed throughput increases as the number of aliases increases. WTF?

For a couple of months I have been putting together some fancy hacks using IP aliasing feature in Linux, that is, associating more than one IP address to a network interface. The limits of IP aliasing are endless...

$ sudo ifconfig eth0 netmask
$ for I in `seq 0 254`; do sudo ifconfig eth0:$I 192.168.2.$I; done

But obviously there is a price (overhead) to pay for this at kernel level. To shed some more light into the problem at hand, for experimentation purpose, I setup a simple network as follows.

First, I set up two identical Linux boxes with gigabit ethernet cards (RTL-8169 Gigabit Ethernet [10ec:8169] rev 10) connected through a Huawei gigabit switch. (The cable is CAT6 262M of length 1 meter.) Then, I started creating iperf instances bound to particular IP-aliased interfaces. That is, the first iperf instance is bound to 192.168.2.1 at eth1:1, the second to 192.168.2.2 at eth1:2, and so on. In other words, the Nth iperf instance is bound to 192.168.2.N at eth1:N.

To ease the workflow, I put together a script as follows.

Using, I'll be able to start as many iperf instances (and necessary IP aliases for them) as I want. Next, I write as follows.

Then the workflow becomes relatively simple.

server$ ./ eth1:%d 192.168.2.%d 32
client$ ./ 192.168.2.%d 32 30

Having come this far, nobody could stop me from writing a Gnuplot script to visualize the results.

Ok, too much talk so far. Let's get to results. (Timeout is set to 60 seconds.)


As the numbers suggest, Linux IP aliasing does a fairly good job in that the overhead imposed by the aliases is nearly negligible. (At least, I hope that is what I succeeded in measuring.) But the strange thing is, there is an improvement in throughput as the number of network interfaces increases. What might be the explanation for this observation? Is my setup mistaken? I will be happy to hear your ideas.

Friday, April 27, 2012

JGroups IPv6 Broadcast Problem

Here is a quick tip: Do you regularly observe below harmless JGroups message send failures?

As Bela Ban says,

Your message are sent to an IPv6 multicast address; is this what you want? If not, and you want to force use of IPv4, use If you want to use IPv6, make sure your routing table is set up correctly.

Sunday, April 8, 2012

RPC (Remote Procedure Call) with JGroups

JGroups forms a perfect medium for RPC between nodes in a cluster. Here, I share a very basic JGroups RPC example.

I think (at least, hope) the code is self-explanatory. A minor but important point here is the RPC channel. As you can see, I use the ChannelFactory.getInstance().create() method to create the channel. ChannelFactory is a shortcut class to create multiple channels sharing the same transport. (See my Shared Transport in JGroups post for details.) That is, you need to have a separate JGroups channel reserved for RPC communication. All other messages sent/received over this channel will be consumed and ignored by the RpcDispatcher.

Build Scripts for LaTeX-BibTeX-XFig-GnuPlot Combo

Over the past 8 years, I have needed to compile LaTeX documents for this or that reason. As time passed, I enhanced them with BibTeX, XFig, and Gnuplot. Further, I generally needed to produce PDF files validated by IEEE PDF eXpress. Here, I share some scripts I developed along the road to ease the pain.

First, note that there is a certain directory structure I stick to.

  • / -- LaTeX (.tex) and BibTeX (.bib) files are placed here.
  • /constants.tex -- LaTeX file for constants (variables, definitions, commands, etc.) shared between LaTeX and XFig files.
  • / -- Compiles the whole project.
  • /figs -- XFig (.fig), Gnuplot (.gnu), and EPS (.eps) files go here.
  • /figs/
  • /figs/ -- Compiles the XFig and Gnuplot into EPS format.

Below is the entry point, / (Make sure you have the latex, bibtex, dvips, and ps2pdf commands available.)

In /, I follow the LaTeX->DVI->PostScript->PDF path. The main reason for preferring this path over LaTeX->PDF is to properly process the scalable EPS figures produced by XFig. (Personally, I hate to see broken figures in published articles.)

Next, here goes the /figs/ script. (Naturally, gnuplot is required during execution.)

Nothing fancy in /figs/ First, we process .fig files; second, we process .gnu files.

Finally, below goes /figs/ script. (Make fig2dev, pdflatex, pdf2ps commands ready. Note that /figs/ requires /constants.tex for shared LaTeX variables.)

After all this fuss, the whole project boils down to

$ ./ paper bibtex
$ ./ paper latex

and you are ready to go.

The scripts need a little bit more cleaning, and they are probably not the most correct ones. Anyway, they have served me well until now, and I hope they will for you as well.

Sunday, March 25, 2012

Cleaning up the BufferedReader Mess in a Proxy Server

A couple of weeks ago, friends from the university knocked on my door. They were given an assignment to implement an HTTP proxy server. I tried to do my best and told them the basics. That is, they should first simply read the HTTP headers line by line, and then read the rest of the stream in bytes. After that, the mechanics are easy:
  1. Pass the request headers from the browser to the server, which is identified by the Host header in the browser request,
  2. Pass back the response sent by the server to the browser.
Easy, right? Just before the homework was due, my door was knocked on again. Surprise! Surprise! They couldn't properly read the server data after reading the headers. It sounded like a trivial problem at first, but as hours passed while I tried to fix the code, it turned out not to be. Since you are here for the code, let me first show you the working draft.
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class Main {
    static final int PROXY_PORT = 8080;
    static final String PROXY_HOST = "localhost";
    static final String NEWLINE = "\r\n";

    public static void main(String[] args) throws Exception {
        ServerSocket proxySocket = new ServerSocket(
                PROXY_PORT, 32, InetAddress.getByName(PROXY_HOST));

        while (true) {
            // Accept the incoming connection.
            Socket clientSocket = proxySocket.accept();
            BufferedInputStream clientInputStream = new BufferedInputStream(
                    new DataInputStream(clientSocket.getInputStream()));
            OutputStream clientOutputStream = new DataOutputStream(
                    clientSocket.getOutputStream());
            // Read client headers.
            List<String> clientHeaders = readHeaders(clientInputStream);
            display("Client Headers", clientHeaders);

            // Locate the web server.
            String hostHeader = getHeader(clientHeaders, "Host");
            display("HostHeader", hostHeader);
            String[] hostHeaders = hostHeader.split(":");
            String hostName = hostHeaders[0];
            display("HostName", hostName);
            int hostPort = hostHeaders.length > 1
                    ? Integer.parseInt(hostHeaders[1]) : 80;
            display("HostPort", hostPort);

            // Connect to the web server.
            Socket serverSocket = new Socket(hostName, hostPort);
            BufferedInputStream serverInputStream = new BufferedInputStream(
                    new DataInputStream(serverSocket.getInputStream()));
            OutputStream serverOutputStream = new DataOutputStream(
                    serverSocket.getOutputStream());
            // Pass the client request to the web server.
            writeHeaders(serverOutputStream, clientHeaders);
            serverSocket.shutdownOutput();  // Signal end of request; the server starts responding.
            display("Sent server headers.");

            // Read web server response headers.
            List<String> serverHeaders = readHeaders(serverInputStream);
            display("ServerHeaders", serverHeaders);

            // Read web server response data.
            byte[] serverData = readData(serverInputStream);
            display("ServerDataLength", serverData.length);

            // Try to sign the response data.
            byte[] signedData = sign(serverHeaders, serverData);

            // Pass the web server response to the client.
            writeHeaders(clientOutputStream, serverHeaders);
            clientOutputStream.write(signedData);
            clientOutputStream.flush();

            // Clean up per-request resources.
            serverSocket.close();
            clientSocket.close();
        }
    }

    static byte[] readData(InputStream stream) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        BufferedInputStream bufferedStream = new BufferedInputStream(stream);
        byte[] buf = new byte[8192];
        int len;
        while ((len = > 0)
            baos.write(buf, 0, len);
        return baos.toByteArray();
    }

    static void writeHeaders(OutputStream stream, List<String> headers)
            throws IOException {
        StringBuilder builder = new StringBuilder();
        for (String header : headers)
            builder.append(header).append(NEWLINE);
        builder.append(NEWLINE);    // Headers are terminated by an empty line.
        stream.write(builder.toString().getBytes());
        stream.flush();
    }

    static List<String> readHeaders(BufferedInputStream stream)
            throws Exception {
        List<String> lines = new ArrayList<String>();
        stream.mark(1 << 16);   // Remember where the headers begin.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream));
        String line;
        long nRead = NEWLINE.length();  // For the last empty line.
        while ((line = reader.readLine()) != null && !(line.isEmpty())) {
            nRead += line.getBytes().length + NEWLINE.length();
            if (!line.startsWith("Accept-Encoding"))    // Avoid compressed pages.
                lines.add(line);
        }
        // The reader buffered past the headers; rewind and skip exactly the
        // bytes the headers occupy.
        stream.reset();
        long nSkipped = stream.skip(nRead);
        assert (nSkipped == nRead);
        return lines;
    }

    static String getHeader(List<String> headers, String name) {
        for (String line : headers)
            if (line.startsWith(name))
                return line.split(": ")[1];
        return null;
    }

    static byte[] sign(List<String> headers, byte[] data) {
        String header = getHeader(headers, "Content-Type");
        if (header == null || !header.startsWith("text/html"))
            return data;
        String content = new String(data);
        Pattern pattern = Pattern.compile("^(.*<title>)(.*</title>.*)$",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        return pattern.matcher(content).replaceFirst("$1[VY] $2").getBytes();
    }

    static void display(String title, List<String> lines) {
        System.out.println("### <" + title + "> ###");
        for (String line : lines)
            System.out.println("'" + line + "'");
        System.out.println("### </" + title + "> ###");
    }

    static void display(String title, Object obj) {
        System.out.println("### " + title + ": '" + obj + "'");
    }

    static void display(Object obj) {
        System.out.println("### " + obj);
    }

    static void display() {
The truth is, it took me almost four hours to find and squash the bug. serverSocket.shutdownOutput() was a low-hanging one; it solved the problem of the web server waiting to start sending data. But take a closer look at the readHeaders() method. You see the mess? Particularly the stream arithmetic with the mark(), reset(), and skip() calls. The problem was, in order to make readLine() requests on an InputStream, you first need to wrap it with a BufferedReader. But once you wrap it up and make a call to readLine(), the BufferedReader buffers the input stream to a position much farther ahead than you generally expect it to be. Hence, after you finish reading the headers and continue reading the response data in bytes, read() tells you that there is nothing left to read. To avoid such nasty bugs, after reading from an InputStream through some sort of buffered reader, do not forget to reset the stream to the position where you expect it to be.
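The trick boils down to this: mark the raw stream before handing it to a BufferedReader, read the lines, then reset and skip exactly the header bytes so the byte-level position is where you expect. Below is a minimal, self-contained sketch of the pattern (the "Header:"/"BODY" payload is made up purely for illustration):

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        byte[] response = "Header: value\n\nBODY".getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream stream =
                new BufferedInputStream(new ByteArrayInputStream(response));

        // Remember the current position before wrapping with a BufferedReader.
        stream.mark(response.length);

        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        long nRead = 1;  // For the empty line terminating the headers.
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty())
            nRead += line.getBytes().length + 1;  // +1 for the '\n'.

        // The reader has buffered far past the headers; rewind the raw stream
        // and skip exactly the header bytes instead.
        stream.reset();
        stream.skip(nRead);

        byte[] body = new byte[16];
        int n =;
        System.out.println(new String(body, 0, n, StandardCharsets.US_ASCII));  // prints "BODY"
    }
}
```

Without the mark()/reset() pair, the final read() would return -1, because the BufferedReader had already drained the underlying stream into its own buffer.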

Monday, March 19, 2012

Shared Transport in JGroups

A transport in a JGroups application is a heavy-weight concept and a majority of the JGroups entities (remote procedure call, replicated hash map, etc.) generally require their dedicated channels for communication. However, by sharing a transport between multiple channels, it is possible to optimize available resources. (Remember that each transport occupies a thread pool and a socket.) For this purpose, one just needs to set the singleton_name attribute for the transport.

Unfortunately, it is not possible to alter the singleton_name attribute of a transport using something like:
String singletonName = "yabba";
JChannel channel = new JChannel();
channel.getProtocolStack().getTransport()
        .setValue("singleton_name", singletonName);
At the moment, for this purpose, you have to alter the XML specification of the channel you are using. (See Bela Ban's reply to Setting Up a Shared Transport post in JGroups mailing-list.)
For this purpose, I put together a ChannelFactory as follows.
final class ChannelFactory {
    private static ChannelFactory instance = null;
    private static final String singletonName = "shared";
    private List<JChannel> channels;

    private ChannelFactory() {
        channels = new ArrayList<JChannel>();
    }

    synchronized JChannel create() throws Exception {
        JChannel channel = new JChannel("META-INF/channel.xml");
        channels.add(channel);
        return channel;
    }

    synchronized void clear() {
        for (JChannel channel : channels)
            channel.close();
        channels.clear();
    }

    static synchronized ChannelFactory getInstance() {
        if (instance == null)
            instance = new ChannelFactory();
        return instance;
    }
}
Later, I copied (say) udp.xml from the JGroups sources to META-INF/channel.xml and added a singleton_name="shared" attribute to the transport protocol specification in that file.
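For reference, the top of META-INF/channel.xml ends up looking roughly like this. (This is a sketch based on udp.xml; every attribute except singleton_name is an illustrative default.)

```xml
<config xmlns="urn:org:jgroups">
    <!-- singleton_name makes this UDP transport shared across all
         channels created from this configuration. -->
    <UDP singleton_name="shared"
         mcast_addr="${jgroups.udp.mcast_addr:}"
         mcast_port="${jgroups.udp.mcast_port:45588}" />
    <!-- ... rest of the protocol stack from udp.xml ... -->
</config>
```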

Now, my individual classes make calls to ChannelFactory.getInstance().create() to get a channel dedicated to them. And when the program shuts down, I just make a call to ChannelFactory.getInstance().clear() to tidy things up.

Sunday, March 11, 2012

Chat Application with Play Framework and KnockoutJS

For a geography-based social web application, I chose to work on a Single Page Application template. For this purpose, I needed a Javascript framework to do the necessary plumbing for synchronizing server-side and client-side model objects. It took me nearly 3–4 weeks to finish the implementation. I was very happy with its final state and was excited to use it in the rest of the project. The next morning, a friend of mine pointed me to Backbone.js. Damn! My implementation was so close to Backbone.js that almost every function name was identical. I didn't know whether to cry or smile. Following that path, it didn't take long for me to discover KnockoutJS. I found it astonishingly beautiful, dived straight into the depths of the tutorial, and spent half a day reading and working through the exercises. Since the best way to learn something is to use it, in this blog post I will try to walk you through a chat application using Play Framework and KnockoutJS.

In order to save some effort, I'll use the Facebook Login mechanics described in my Facebook Login and Secure Module Integration post. Since I put together the necessary pieces into a GitHub project (play-facebook), I will bootstrap directly from there.
$ git clone git:// play-chat
$ cd play-chat
$ play deps
$ emacs conf/application.conf            # Edit
$ emacs app/controllers/    # Edit FBOAuth
$ play run
So far, so good. Now, we need a Room model to store the posted messages.
public class Room {
    private static Room instance = null;
    public static final int EVENT_STREAM_SIZE = 100;
    private final ArchivedEventStream<Event> eventStream =
            new ArchivedEventStream<Event>(EVENT_STREAM_SIZE);

    public void publish(Event event) {
        eventStream.publish(event);
    }

    public Promise<List<IndexedEvent<Event>>> nextMessages(long lastReceived) {
        return eventStream.nextEvents(lastReceived);
    }

    public static abstract class Event {
        public final String date;
        public final String user;
        public final Type type;

        public final static DateFormat dateFormat =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        public enum Type { JOIN, LEAVE, MESSAGE }

        public Event(User user, Type type) {
   = dateFormat.format(new Date());
            this.user =;
            this.type = type;
        }
    }

    public static class JoinEvent extends Event {
        public JoinEvent(User user) {
            super(user, Type.JOIN);
        }
    }

    public static class LeaveEvent extends Event {
        public LeaveEvent(User user) {
            super(user, Type.LEAVE);
        }
    }

    public static class MessageEvent extends Event {
        public final String text;

        public MessageEvent(User user, String text) {
            super(user, Type.MESSAGE);
            this.text = text;
        }
    }

    public static Room getInstance() {
        if (instance == null)
            instance = new Room();
        return instance;
    }
}
Below is the list of major components provided by the Room model.
  • eventStream, which is of type ArchivedEventStream<Event>, is the stream of messages that are published in the room. We bound the size of the stream window with EVENT_STREAM_SIZE.
  • nextMessages returns the list of messages published since the point given by the passed index number, lastReceived. The crux here is that we return a Promise, which is a tiny piece of magic provided by ArchivedEventStream. Hence, the web server can process these requests asynchronously without blocking a whole thread, which results in a huge increase in the number of simultaneous requests we can handle.
  • Event is the base type that is fed to eventStream. We represent each user action (join, leave, message) by corresponding subclasses of Event, that is, JoinEvent, LeaveEvent, and MessageEvent. As a side note, these classes will be serialized to their JSON counterparts while getting transferred to the client side. Hence, try to keep them as compact as possible.
Since we have a Room, now we will implement our Chat controller.
@With(Secure.class)
public class Chat extends Controller {
    public static void index() {
        renderArgs.put("username", Security.getSessionUser().name);
        render();
    }

    public static void join() {
        Room.getInstance().publish(
                new Room.JoinEvent(Security.getSessionUser()));
    }

    public static void leave() {
        Room.getInstance().publish(
                new Room.LeaveEvent(Security.getSessionUser()));
    }

    public static void say(String text) {
        Room.getInstance().publish(
                new Room.MessageEvent(Security.getSessionUser(), text));
    }

    public static void waitMessages(long lastReceived) {
        List<IndexedEvent<Room.Event>> messages = await(
                // Here we use continuation to suspend the execution until a
                // new message arrives.
                Room.getInstance().nextMessages(lastReceived));
        renderJSON(messages,
                new TypeToken<List<IndexedEvent<Room.Event>>>() {}.getType());
    }
}
See the @With(Secure.class) annotation? Yes, only signed-in Facebook users are allowed. The index(), join(), leave(), and say() methods are self-explanatory. The significant bit here is the waitMessages method. Each client makes long XHR polls to the server and waits for incoming messages, and waitMessages renders the pending messages from the Room event stream as JSON. Since Room.nextMessages() returns a Promise, waitMessages works in an asynchronous (non-blocking) manner.

Let's modify app/views/Application/index.html as follows to provide a link to the chat room for the signed in users.
#{extends 'main.html' /}
#{set title:'Home' /}

#{if user}
<h1>Welcome, ${}! #{if user.isAdmin}(admin)#{/if}</h1>
<a href="@{Secure.logout()}">Logout</a>
<a href="@{Chat.index()}">Chat Room</a>
#{/if}
#{else}
#{fbLogin text:'Log In', perms:'publish_stream' /}
#{/else}
It is time for the client-side implementation of the chat room. I'll first present app/views/Chat/index.html as is, and then explain the bits in it. Here we go.
#{extends 'main.html' /}
#{set title:'Chat Room' /}
#{set 'moreScripts'}
    #{script 'jquery.scrollTo.js' /}
    #{script 'knockout.js' /}
#{/set}
#{set 'moreStyles'}
    #{stylesheet 'chat.css' /}
#{/set}

<button data-bind="enable: !isJoined(), click: join">Join</button>
<button data-bind="enable: isJoined, click: leave">Leave</button>

<div id="messages" data-bind="foreach: messages">
    <span class="date" data-bind="text: date"></span>
    <span class="message" data-bind="if: type == 'MESSAGE'">
        <span class="user"
              data-bind="text: user,
                         css: { you: user == '${username}' }"></span>&gt;
        <span class="text" data-bind="text: text"></span>
    </span>
    <span class="join" data-bind="if: type == 'JOIN'">
        <span class="user" data-bind="text: user"></span> joins the room.
    </span>
    <span class="leave" data-bind="if: type == 'LEAVE'">
        <span class="user" data-bind="text: user"></span> leaves the room.
    </span>
</div>

<form data-bind="submit: say">
    <label for="inputText">Type your text here:</label>
    <input id="inputText" type="input"
           data-bind="enable: isJoined, value: messageText" />
    <input type="submit" value="Send" data-bind="enable: isJoined" />
</form>
<script type="text/javascript">
    $(document).ready(function() {
        var XHR = {
            join: #{jsAction @join() /},
            leave: #{jsAction @leave() /},
            say: #{jsAction @say() /},
            waitMessages: #{jsAction @waitMessages(':lastReceived') /}
        };

        var RoomModel = function() {
            var self = this;
            var lastReceived = 0;

            self.isJoined = ko.observable(false);
            self.messages = ko.observableArray([]);
            self.messageText = ko.observable("");

            self.join = function() {
                $.post(XHR.join(), {}, function() {
                    self.isJoined(true);
                    getMessages();
                });
            };

            self.leave = function() {
                $.post(XHR.leave(), {}, function() {
                    self.isJoined(false);
                });
            };

            self.say = function() {
                $.post(XHR.say(), {text: self.messageText()});
                self.messageText("");
            };

            var getMessages = function() {
                if (self.isJoined())
                    $.getJSON(XHR.waitMessages({lastReceived: lastReceived}), {},
                            function(events) {
                                $(events).each(function() {
                                    lastReceived =;
                                    self.messages.push(;
                                });
                                $('#messages').scrollTo('max');
                                getMessages();  // Issue the next long poll.
                            });
            };

            $(window).unload(function() {
                if (self.isJoined())
                    $.post(XHR.leave());
            });
        };

        var roomModel = new RoomModel();
        ko.applyBindings(roomModel);
    });
</script>
Before diving into the sources, you will need to get jquery.scrollTo.js and knockout.js files.
$ cd public/javascripts
$ wget -O jquery.scrollTo.js
$ wget -O knockout.js
And a simple CSS (public/stylesheets/chat.css) for the chat room.
#messages {
    height: 200px;
    overflow: auto;
    font-family: Arial, Verdana, sans-serif;
    font-size: 10pt;
    padding-bottom: 8px;
    padding-top: 8px;
}

#messages .user { font-style: oblique; }
#messages .message .user { font-style: normal; }
#messages .you { font-weight: bold; }
#messages .join { color: green; }
#messages .leave { color: red; }
#messages .date { font-family: monospace; padding-right: 6px; }
Let me introduce you to KnockoutJS with a very simple snippet from the above mess.
  1. There is a javascript object, called RoomModel, and it has a boolean field called isJoined.
  2. There is a submit button to send messages to the room and we want the button to be enabled and disabled appropriately according to the RoomModel.isJoined flag.
  3. We instantiate RoomModel and let KnockoutJS bind it to the page.
    var roomModel = new RoomModel();
    ko.applyBindings(roomModel);
  4. We set isJoined to an observable as well.
    self.isJoined = ko.observable(false);
  5. Add data-bind="enable: isJoined" attribute to the button.
After these easy steps, the send button is auto-magically bound to the isJoined flag. Whenever the isJoined variable changes, the change is propagated to all dependent objects.

Another good thing about KnockoutJS observables is that the change propagation channel is bidirectional. That is, when an observable variable changes, the change gets propagated to the rest of the bound elements. Moreover, a change in the bound elements gets reflected back to the variables as well. Let's see this in action.
  1. There is an observable text field called messageText.
  2. The message input box is bound to this observable through the data-bind="enable: isJoined, value: messageText" attribute.
Here, the messageText variable and the value of the input box are bidirectionally synchronized by KnockoutJS observables. Another cool KnockoutJS feature is its foreach looping functionality over observable arrays. We see it in action while printing out the incoming messages. That is,
  1. Messages are stored in an observable array via
    self.messages = ko.observableArray([]);
  2. We loop over the messages observable via
    <div ... data-bind="foreach: messages">
  3. Inside the loop body, current array element is accessed as is. (Remember the list of JSON-inified Room.Event subclasses?)
There are other cool KnockoutJS tricks available in the above chat room sources (e.g., if: type == 'MESSAGE', css: { you: user == '${username}' }, etc.), but explaining all of them is beyond the scope of this blog post. I strongly advise you to take a look at the KnockoutJS tutorials and documentation.

Near the end, I also would like to take your attention to the XHR singleton in the code.
var XHR = {
    join: #{jsAction @join() /},
    leave: #{jsAction @leave() /},
    say: #{jsAction @say() /},
    waitMessages: #{jsAction @waitMessages(':lastReceived') /}
Thank God, Play handles the server-side and client-side connectivity of the XHR functions for me. (Look ma, I can even pass parameters!)

Ok, enough talking. Let's do it!
And here I join to the room.

To sum up, I would definitely suggest that everyone check out KnockoutJS. (Did I mention that KnockoutJS provides one of the best tutorial experiences I have ever seen?) It was a really exciting experience to work with KnockoutJS and, to be honest, I found it so impressive and expressive that I would never again code a single dynamic web page without it.

Anyway, let's finish this post here. Thanks for reading this far, and I hope this post helps you as well.

Note: Sources are available through the play-chat GitHub project.

Thursday, February 23, 2012

Synchronization Between JGroups.ReplicatedHashMap's

ReplicatedHashMap of JGroups provides a distributed ConcurrentHashMap implementation. In its simplest form, it propagates every local change to all other instances in the cluster, and you get a replicated HashMap across the available cluster members.

In a cluster environment, new node arrivals are highly anticipated, but ReplicatedHashMap doesn't really keep up with late joiners. See the execution flow below for a late-joiner scenario, where Node B joins the cluster after Node A has already added an entry.

Time | Node A     | Node B
-----|------------|---------
  2  | adds 1     |
  3  | gets 1     |
  4  | sees {1}   |
  6  |            | sees {}
  7  | adds 2     |
  8  | gets 2     | gets 2
  9  | sees {1,2} | sees {2}

In the above figure, the nodes are out of sync at time steps 6 and 9. To avoid such cases, we need to force ReplicatedHashMap to sync during initialization. For this purpose, quoting Bela Ban (the author of JGroups):
  1. Move JChannel.connect() after the creation of ReplicatedHashMap,
  2. I also suggest to call ReplicatedHashMap.start(), or else there won't be any state transfer.
See the ReplicatedHashMap Updates for Late Joiners mailing-list thread for the related discussion.
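Put together, Bela Ban's two steps boil down to an initialization order like the following. (This is a sketch against the JGroups 2.x API; the configuration file, cluster name, and timeout are placeholders.)

```java
JChannel channel = new JChannel("udp.xml");

// Create the map *before* connecting, so it can take part in the
// state transfer that happens on connect.
ReplicatedHashMap<String, String> map =
        new ReplicatedHashMap<String, String>(channel);

channel.connect("mycluster");

// Without start() there won't be any state transfer from existing members.
map.start(5000 /* state-transfer timeout, ms */);
```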

Monday, February 20, 2012

Facebook Login and Secure module integration in Play Framework

Play has a really neat module, called Secure, to ease the authentication mechanics in a web application. Unfortunately, nowadays none of us is really willing to fill in yet another registration form, which is where Facebook Login (and its kind) comes into play. The problem is, Facebook Login is not trivial, and integrating it with Secure poses another challenge. In this post, I will try to walk you through a Play web application with the Secure module enabled and wrapped around Facebook Login. We will use the OAuth2 support provided by Play to get an access token from Facebook, and later issue queries using RestFB.

Let's start with creating the project directory.
play new PlayTest

Next, we introduce module dependencies to conf/dependencies.yml file.
require:
    - play 1.2.4
    - play -> secure
    - com.restfb -> restfb 1.6.9

Let Play resolve the dependencies for us.
play dependencies

As a first step, let's start with creating an entrance page. Here goes our app/views/Application/index.html.
#{extends 'main.html' /}
#{set title:'Home' /}

#{if flash.error}<p class="error">&{flash.error}</p>#{/if}
#{if flash.success}<p class="success">&{flash.success}</p>#{/if}

#{if user}
<h1>Welcome, ${}</h1>
<p>id: ${user.uid},
name: ${},
isAdmin: ${user.isAdmin}</p>
<p><a href="@{Secure.logout()}">Logout</a></p>
#{/if}
#{else}
<h1>Hello, stranger!</h1>
<a href="@{Secure.login()}">Login</a>
#{/else}

Oops! But we still didn't add secure module routes. Add below lines to your conf/routes file.
# Import Secure routes
*       /                                       module:secure

Before typing play run in the console, I will first add the http.port=80 line to conf/application.conf and the line to the /etc/hosts file. I want my application to be served at from localhost. This small trick will enable me to work with Facebook Login on localhost before going public. (Remember that Facebook Apps require a proper domain name to redirect issued requests.)

Let's start our application.
sudo play run

And browse to

So far so good. The Login link on the above page redirects us to /secure/login; hence, we will need to override the login.html contents (and layout.html, for altering the <html> tag attributes).
play secure:ov --login
play secure:ov --layout

First add xmlns:fb="" attribute to <html> tag in app/views/Secure/layout.html. Later, override app/views/Secure/login.html as follows.
#{extends 'Secure/layout.html' /}

#{if flash.error}<p class="error">&{flash.error}</p>#{/if}
#{if flash.success}<p class="success">&{flash.success}</p>#{/if}

<div id="fb-root"></div>
<script type="text/javascript" src=""></script>
<fb:login-button perms="publish_stream">
    <a href="@{Security.auth()}" class="fb_button fb_button_medium">
        <span class="fb_button_text">Log In</span>
    </a>
</fb:login-button>

Click the Login link and go to /secure/login.

Yay! Now we have a Facebook Login button that will redirect Facebook-approved login requests to the Security.auth() method. (See the <a href="@{Security.auth()}" class="fb... line.) Before going into the details of the Security controller, we first need to create a Facebook App and a User model to store registered users. For this purpose, go to the Facebook Apps page and create a new application. Don't forget to note down the App ID/API Key and App Secret of the created app. You also need to set the Site URL to Later, create app/models/ as follows.
@Entity
@Table(name = "users")
public class User extends Model {
    public String uid;

    public String name;

    public boolean isAdmin;

    public User(String uid, String name) {
        this.uid = uid; = name;
    }

    public static User getById(String uid) {
        // Looks up a user by the Facebook UID. (Used by the Security and
        // Application controllers.)
        return find("byUid", uid).first();
    }
}

Now here comes the trick: the Security controller. Let's first write the code. Here goes app/controllers/

public class Security extends Secure.Security {
    static OAuth2 FBOAuth = new OAuth2(
            "",   // Facebook OAuth endpoints.
            "17014613976****",  // App ID/API Key
            "04bf6165527ec9e30a7d2aa380e5****"  // App Secret
    );

    public static void onAuth() {
        if (OAuth2.isCodeResponse()) {
            OAuth2.Response response = FBOAuth.retrieveAccessToken(onAuthURL());
            FacebookClient fbClient = new DefaultFacebookClient(response.accessToken);
            User fbUser = fbClient.fetchObject("me", com.restfb.types.User.class);
            models.User user = models.User.getById(fbUser.getId());
            if (user == null)
                user = new models.User(fbUser.getId(), fbUser.getName()).save();
            session.put("username", user.uid);  // Required by Secure.
            session.put("uid", user.uid);
            flash.success("You are logged in.");
        }
        Application.index();
    }

    static String onAuthURL() {
        return play.mvc.Router.getFullUrl("Security.onAuth");
    }

    public static void auth() {
        FBOAuth.retrieveVerificationCode(onAuthURL());
    }

    static void onDisconnected() {
        flash.success("You have been logged out.");
    }
}

In login.html, we have prompted Facebook Login to redirect issued requests to Security.auth(). Here, in auth(), we tell FBOAuth to retrieve the verification code (provided by the OAuth2 protocol) via the onAuth() method. In onAuth(), we first retrieve the access token and use it to construct a Facebook client using RestFB. (In other words, the access token is the key that lets us talk with Facebook on behalf of the logged-in user.) Later, using the created Facebook client, we retrieve the user information into a com.restfb.types.User object. Next, we try to locate the current user in the available list of users, or create a new one if no such user exists. Finally, we store the user ID in the uid and username session variables. The significant bit here is the username session variable, which is required by the Secure module to figure out whether a user is logged in. The other uid session variable is for our own use.

Now we have necessary plumbing to authenticate a Facebook user. Do you remember the #{if user} line in index.html? Yep, now it's time to put the user as a render argument for Application.index view. For this purpose, edit app/controllers/ as follows.
public class Application extends Controller {
    @Before
    static void setRenderArgs() {
        if (session.contains("uid"))
            renderArgs.put("user", User.getById(session.get("uid")));
    }

    public static void index() {
        render();
    }
}

After creating the Security controller and updating the Application.index() method, click on the Facebook Log In button on the /secure/login page. When you finish granting the related access rights to the application, you should be forwarded to the Application.index view, which should look as follows.

Let's give Logout link a try.

Now you can just use @With(Secure.class) annotations to define access control for your controllers. (As a next step, you might want to create a Users CRUD controller with access control.)

Edit: Don't forget to check out play-facebook, which is a ready-to-go Play Framework application with Facebook Login integration. It follows the mechanics described here and provides an archetype to bootstrap custom applications.