Friday, November 30, 2012

Merging Two Binary Search Trees in O(logn) Space

In a recent coding interview, I was asked to merge two Binary Search Trees (BSTs) without modifying the given trees. This was the first time I was exposed to this question and for an hour or so I tried to come up with a linear time algorithm of O(1) space complexity. The defeat was inevitable. (To tell the truth, if a hint would be given like O(m+n) storage is allowed, I could easily figure out the solution without breaking a sweat.) As usual, I could not sleep that night and scratched dozens of papers to come up with a better solution. But as night passed, I started to realize the fact that the space complexity is lower bounded by the tree height. This blog post is set to keep a cyber-record of this pursuit.

Related Work

You can first start by checking the Google results. But, as usual, I am kind enough to provide you a tl;dr version: In order to merge two BSTs of size m and n nodes, there are a couple of common approaches of fixed O(m+n) processing time and space complexity, some of which are listed as follows.

  • Insert given BSTs into a newly created BST.
  • Create an array of elements of the first BST in sorted order, and use this array to merge the results to a new BST while traversing the second BST.

Further exposure to the higher doses of the problem is available through this and this StackOverflow threads.

On Complexity

You will definitely need an O(m+n) processing complexity to visit each node, that's for sure. But what about O(m+n) space complexity? It means that you need to store one (or both) of the given trees in a vector in order to proceed with the merge. As it will turn out in the following paragraphs, actually space complexity is lower-bounded by the height of the tree, that is, O(h), where h=logn for a balanced tree.

The Trick

In its most basic form, we flatten both trees into two separate vectors. Next, we consume one element at a time from either of the trees with the smallest element. This scheme deserves a figure of its own.

It is certain that we effectively don't need the whole elements of a tree packed into a single vector at once. At each step, what we ask for is the next smallest element. That is, we just need a stream of nodes traversed in-order.

Let's further investigate the possibility of implementing a stream of nodes. In order to consume the elements of a binary search tree in sorted order, we need to traverse the tree in left-center-right node order. Assume that we have below traversal function. (Yes, it is in C++ and I shamelessly used templates.)

template <class T>
void traverse(Node<T>* root, queue<T>& items) {
  if (root) {
    traverse(root->left());
    items.push(root->data());
    traverse(root->right());
  }
}

What if I can suspend the execution at any point in time while pushing the data to a queue? In that case, what would be the maximum possible height of a recursive traverse() call stack? I know you like figures, so I took another photo of the board.

That is, the maximum call stack depth of a traverse() recursion is upper-bounded by the tree height. Coupled with the fact that successive traverse() calls are sufficient to consume the nodes of a tree in sorted order, we should be able to stream the nodes of tree with at most O(logn) node pointers.

Streaming Nodes

Since actual traverse call stack is bounded, we can emulate the recursive traverse using a stack of the nodes traversed so far from the root. The outline of the streaming algorithm is as follows.

The Prestige

Now we can stream a tree in sorted order using at most O(logn) storage. The rest is easy: Stream both trees and merge them while streaming.

The Code

I have implemented a streamer (NodeStream), stream merger (MergeNodeStream), and a vector merger (MergeNodeVector) in C++ and Scala. Code is accessible through this Gist. You can also find implementations of the algorithm in C++ using Boost Coroutines and Haskell written by Remko.

Wednesday, November 28, 2012

Setting up a Scala Development Environment for Intellij IDEA and SBT

Resisting against the Functional Programming Principles in Scala buzz was meaningless. After all, this or that way I know I would be repeating those steps on my own regardless of the presence of such a lecture. To warm up, I looked around for an appropriate development environment for Scala. The last time (~2 years ago) I repeated this same step ended up with a desperate search. Fortunately, reading dozens of blog posts, forum/mailing-list threads, GitHub README's, repeating try-and-fail procedures leaded me to a working setup. To mitigate the friction at least to an extent for new comers, I put up this blog post to make a list of steps in order to setup a working Scala development environment on top of IntelliJ IDEA with SBT integration.

Is there a Scala plugin available for IDEA?

Good news, yes. It is under active development, works way better than its alternatives, has a responsive maintainer and an active community. Plugin lets you create Scala projects, compile/run/debug Scala source files, scripts, and worksheets. Navigation, refactoring, tab-completion, code snippets are included as well. (Note that it is strongly advised to use an EAP version for a smooth experience.)

Is there a quick start guide for IDEA Scala plugin?

Yes, see Getting Started with IntelliJ IDEA Scala Plugin.

How do I manage project dependencies?

While one can setup a Maven/Ivy/Ant configuration for a Scala project, SBT is known to be the de-facto tool for build management throughout the Scala community. (Otherwise, you will need to set explicit scala-compiler and scala-library dependencies in XML mazes.) Fortunately, there is an SBT plugin for IDEA. It offers a console where SBT commands can be entered interactively, and a Before Launch task to delegate project compilation to SBT, as an alternative to the built in IntelliJ Make.

Is there a quick start guide for SBT?

Yes, see Hello, World in SBT documentation.

How can I integrate libraries installed by SBT to IDEA?

At its core, SBT uses Apache Ivy, which has its own nasty ways of dealing with downloaded JARs under ~/.ivy2. Instead of manually defining IDEA module dependencies for each JAR a project uses, lucky for us, there exists an SBT plugin for this purpose: sbt-idea. Basically, sbt-idea enhances SBT with a new task, called gen-idea, which creates IDEA project files with necessary module dependencies induced by SBT. That is,

  1. Add your dependencies to your SBT configuration,
  2. Call sbt update to download/update project dependencies,
  3. Call sbt gen-idea to create IDEA project files,
  4. Open created project from IDEA.

What are the IDEA modules created by sbt-idea?

In addition to below directories, SBT dependencies are added to the IDEA module configuration.

  • Source Folders: src/main/{scala,java,resources}
  • Test Source Folders: src/test/{scala,java,resources}

What about testing?

SBT supports a couple of testing frameworks, i.e., specs2, ScalaCheck, and ScalaTest. See Testing section of the SBT documentation for a detailed discussion.

What about my .gitignore?

Here you go.

/.idea
/.idea_modules
target/

I read enough, gimme the code!

Create a project directory.

$ export PROJECT_DIR=~/scala/ScalaHelloWorld
$ mkdir $PROJECT_DIR

Create $PROJECT_DIR/build.sbt as follows. (In this example, I used ScalaTest framework.)

Create $PROJECT_DIR/project directory and add below lines to $PROJECT_DIR/project/plugins.sbt to add sbt-idea plugin.

Run sbt in $PROJECT_DIR and execute gen-idea task.

$ sbt
[info] Loading project definition from /home/vy/scala/ScalaHelloWorld/project
[info] Updating {file:/home/vy/scala/ScalaHelloWorld/project/}default-70d248...
[info] Resolving org.scala-sbt#precompiled-2_10_0-m7;0.12.1 ...
[info] Done updating.
[info] Set current project to ScalaHelloWorld (in build file:/home/vy/scala/ScalaHelloWorld/)
> gen-idea
[info] Trying to create an Idea module ScalaHelloWorld
[info] Updating {file:/home/vy/scala/ScalaHelloWorld/}default-3005c4...
[info] Resolving org.scalatest#scalatest_2.9.2;1.8 ...
[info] Done updating.
[info] Resolving org.scalatest#scalatest_2.9.2;1.8 ...
[info] Excluding folder target
[info] Created /home/vy/scala/ScalaHelloWorld/.idea/IdeaProject.iml
[info] Created /home/vy/scala/ScalaHelloWorld/.idea
[info] Excluding folder /home/vy/scala/ScalaHelloWorld/target
[info] Created /home/vy/scala/ScalaHelloWorld/.idea_modules/ScalaHelloWorld.iml
[info] Created /home/vy/scala/ScalaHelloWorld/.idea_modules/ScalaHelloWorld-build.iml
>

In src/main/scala/Main.scala, create a main() stub with a testable function in it.

In src/test/scala/MainSuite.scala, create a sample test suite.

Enjoy the rest by either creating your own IDEA run configurations, or manually running tasks within the SBT console. (As a final note, while creating IDEA run configurations, you can use SBT Before Launch task provided by IDEA SBT plugin.)