Jake McCrary

Generating test cases in Clojure

Recently I was writing some data mining Clojure code which needed to parse a log file and do some transforms of the data. Some of the transforms were dependent on data found across multiple lines. There was no ordering or proximity guarantees to these lines.

This required the code to handle a variety of situations. After writing a couple simple tests and getting those passing I wanted to more extensively test my solution. I was lazy though and did not want to hand code all of the potential orderings. Enter permutations.

permutations is a function out of clojure.contrib.combinatorics. As the name suggests, you give it a collection and it returns a lazy sequence containing all the different permutations of the elements in that collection. An example is below.

1
2
3
4
5
user>(ns generate)
generate>(use '[clojure.contrib.combinatorics :only [permutations]])
nil
generate> (permutations [:a :b :c])
((:a :b :c) (:a :c :b) (:b :a :c) (:b :c :a) (:c :a :b) (:c :b :a))

You can already see where this is going. I was able to use permutations to generate all the potential different orderings of the input. This saved me the trouble of having to do that by hand.

One difficulty of generating test inputs pragmatically is telling what sort of inputs caused it to fail. To get around this I used the rarely used (at least in code I’m working on) second argument of clojure.test’s is. This second argument is a message that prints on a failure.

Below is a contrived example of using permutations to test an obviously wrong silly-add function. silly-add is defined below.

1
2
3
4
5
6
generate> (defn silly-add
              [x & xs]
              (if (zero? x)
                  (apply + 40 xs)
                  (apply + x xs)))
#'generate/silly-add

Below is a test that uses permutations to exercise silly-add with all the potential orderings three input numbers. Note that it takes advantage of the second argument to is. Without this we would not know what input caused the failure.

1
2
3
4
5
6
7
generate> (use 'clojure.test)
nil
generate> (deftest generate-some-tests
            (doseq [input (permutations [1 0 9])]
                   (is (= 10 (apply silly-add input))
                       (str "Failed on input: " (seq input)))))
#'generate/generate-some-tests

Running the test we see that there is clearly an error.

1
2
3
4
5
6
7
8
9
10
11
12
generate> (run-tests)
Testing generate

FAIL in (generate-some-tests) (NO_SOURCE_FILE:1)
Failed on input: (0 1 9)
expected: (= 10 (apply silly-add input))
  actual: (not (= 10 50))

FAIL in (generate-some-tests) (NO_SOURCE_FILE:1)
Failed on input: (0 9 1)
expected: (= 10 (apply silly-add input))
  actual: (not (= 10 50))

permutations saved me a bit of time and let me test some situations that I otherwise would not have tested. This actually exposed a subtle bug in my code. Hopefully it can do the same for you.

Quickly starting a powerful Clojure REPL

I often find myself browsing the Internet and then suddenly I want to have a Clojure REPL at my fingertips. As I’ve become better with emacs and paredit I’ve become dependent on the powerful editing this combo affords. The rest of this post details how I changed my five step process into a two step process. It does not explain basic emacs/slime setup but rather explains how I cut a few steps out of a suboptimal workflow for getting a powerful Clojure REPL up and running in emacs.

My previous workflow was the following:

  1. Open a terminal
  2. Change to the root of Clojure project where I use Leiningen and have swank-clojure as a dependency.
  3. Run the command lein swank
  4. Start emacs
  5. Run M-x slime-connect

This five step process was terrible. From me seeing something interesting to try to having a REPL open took too much time.

Today I changed my process so it on takes two steps. They are:

  1. Start emacs
  2. Run M-x clojure-swank

This is a much better. I’ll admit had a lot of room for improvement so it wasn’t too hard to make it better. Below are the steps I took to cut three steps.

First, using Leiningen 1.4.0, I ran lein install swank-clojure 1.3.0-SNAPSHOT. This installed a script called swank-clojure into $HOME/.lein/bin. When run, this script starts a swank server waiting for connections on port 4005.

Next I wrote a function in elisp that gives emacs the ability to call the newly installed swank-clojure script, wait for the swank server to start, and then connect to it. This function, clojure-swank, can be seen below. It creates a buffer named *clojure-swank*, runs the newly installed script, and captures the output in the freshly created buffer. When the “Connection opened” line appears slime-connect is called, connecting emacs to the freshly started swank server. After this we are at the REPL with all the advantages that emacs and paredit give us.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
(defun clojure-swank ()
  "Launch swank-clojure from users homedir/.lein/bin"
  (interactive)
  (let ((buffer (get-buffer-create "*clojure-swank*")))
    (flet ((display-buffer (buffer-or-name &optional not-this-window frame) nil))
          (bury-buffer buffer)
          (shell-command "~/.lein/bin/swank-clojure &" buffer))
    (set-process-filter (get-buffer-process buffer)
                        (lambda (process output)
                           (with-current-buffer "*clojure-swank*" (insert output))
                           (when (string-match "Connection opened on local port +\\([0-9]+\\)" output)
                             (slime-connect "localhost" (match-string 1 output))
                             (set-process-filter process nil))))
    (message "Starting swank.. ")))

I’ve also written a clojure-kill-swank function for stopping the swank server.

1
2
3
4
5
6
7
8
9
10
11
12
(defun clojure-kill-swank ()
  "Kill swank process started by lein swank."
  (interactive)
  (let ((process (get-buffer-process "*clojure-swank*")))
    (when process
      (ignore-errors (slime-quit-lisp))
      (let ((timeout 10))
        (while (and (> timeout 0)
                    (eql 'run (process-status process)))
          (sit-for 1)
          (decf timeout)))
      (ignore-errors (kill-buffer "*clojure-swank*")))))

Both of those functions need to be added to a location where they will get defined on emacs start-up. Once this is done the powerful REPL you are used to emacs providing can be at your finger tips in practically no time at all.

Trampolining through mutual recursion with Clojure

The other day I stumbled across some Clojure code that used mutual recursion. Mutual recursion can be a valuable tool when solving a problem. Unfortunately because of the lack of tail call optimization on the JVM this can be a dangerous technique when writing Clojure code. It can be easy to forget about this limitation and end up writing code that blows the stack.

Take the classic even/odd checking code from the Wikipedia page. If we just translate it to Clojure it will cause a stack overflow error when we pass in a large number. The massive number of function calls require before returning causes too much memory to be consumed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n)
    true
    (my-odd? (dec (Math/abs n)))))

(defn my-odd? [n]
  (if (zero? n)
    false
    (my-even? (dec (Math/abs n)))))

user> (my-even? 1000000)
; Evaluation aborted. <- this is a result of java.util.StackOverflowError

Luckily since Clojure 1.0 there has been a useful function for dealing with this. trampoline, with minor modifications to your code, can be used to get around the lack of tail call optimizations (docs here).

trampoline takes a function (and, if needed, arguments to pass into the function) and calls it. If the function returns a function then trampoline calls that. As long as functions are returned trampoline will continue calling them. When a non-function value is returned trampoline returns, passing through the value.

To make our sample code work with trampoline we simply change our functions to return a closure which wraps the call that was previously being executed. This just entails putting a # before the final s-exp. This takes advantage of Clojure’s anonymous function syntax to change the function call into a closure which is returned.

1
2
3
4
5
6
7
8
9
(defn my-even? [n]
  (if (zero? n)
    true
    #(my-odd? (dec (Math/abs n)))))

(defn my-odd? [n]
  (if (zero? n)
    false
    #(my-even? (dec (Math/abs n)))))

By doing this we’ve changed how the caller interacts with my-even? and my-odd?. It now needs to be called by trampoline.

1
2
user> (trampoline my-even? 1000000)
true

Now we no longer suffer from the stack overflow error.

I think we can still do better though, because now the caller of my-even? and my-odd? suffers since they are forced to remember to use trampoline. By forcing this on the caller, we’ve pushed what should be hidden implementations details into the callers code. We can fix this by pushing the use of trampoline into our functions.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
(defn my-even? [n]
  (letfn [(e? [n]
              (if (zero? n)
                true
                #(o? (dec (Math/abs n)))))
          (o? [n]
              (if (zero? n)
                false
                #(e? (dec (Math/abs n)))))]
    (trampoline e? n)))

(defn my-odd? [n]
  (not (my-even? n)))

user> (my-even? 1000000)
true
user> (my-odd? 1000000)
false

Now we have the best of both worlds. Mutual recursion without the worry of a stack overflow and functions that don’t force the caller to be aware of the implementation details.

Inserting values into a nested map in Clojure

Recently I was writing some Clojure with a coworker and we needed to insert values into a nested map structure. Our first solution (and example of using it at the repl) looked something like this.

1
2
3
4
5
6
7
8
9
(defn add-to-cache [cache key1 key2 data]
  (let [entry (get cache key1 {})
        new-entry (assoc entry key2 data)]
    (assoc cache key1 new-entry)))

user> (-> (add-to-cache {} :chicago :lakeview :jake)
          (add-to-cache :sf :mission :dan)
          (add-to-cache :chicago :wickerpark :alex))
{:sf {:mission :dan}, :chicago {:wickerpark :alex, :lakeview :jake}}

This worked but seemed overly verbose for doing what (in our minds) should have been a simple operation. After some digging around in the docs we found the function assoc-in. This useful function allowed us to greatly simplify the code.

1
2
3
4
5
6
7
(defn add-to-cache [cache key1 key2 data]
  (assoc-in cache [key1 key2] data))

user> (-> (add-to-cache {} :chicago :lakeview :jake)
          (add-to-cache :sf :mission :dan)
          (add-to-cache :chicago :wickerpark :alex))
{:sf {:mission :dan}, :chicago {:wickerpark :alex, :lakeview :jake}}

Much simpler and easier to read. The next person to look at the code will be able to quickly skim and tell what the code is doing.

assoc-in can also be used with nested associative structures like vectors.

1
2
3
4
user> (assoc-in [[0 1] [:a :b]] [0 1] :z)
[[0 :z] [:a :b]]
user> (assoc-in [[0 1] [:a :b]] [1 1] :z)
[[0 1] [:a :z]]

Hopefully this post makes searching for how to insert into nested maps slighly easier for the next person who thinks there must be a better way for doing this.

Plotting time series data with Incanter

Recently I found myself wanting to plot some time series data and wanted to do this in Clojure. Unfortunately Incanter, a good statistical and graphics library for Clojure, did not provide a way to plot data where the x-axis is a time value. A quick fork on github and a pull request later and now Incanter does. Since I added this functionality I thought I would write up a short example of using it.

The example time series data I’m using I took from Yahoo’s finance section. Here is a link to the csv file I used.

I’m using the read-dataset function provided by Incanter. This procedure reads a delimited file (or URL) and returns an Incanter dataset.

1
(def yhoo (read-dataset "table.csv" :header true))

Yahoo stores the date in a yyyy-mm-dd format. I need to convert that to milliseconds from the epoch so it can be used in time-series-plot as the x-axis data. To do this I wrote a function which takes the string representation of the date, splits in on “-”, then use the joda-date and to-ms functions from incanter.chrono to get the number of milliseconds from the epoch.

1
2
3
4
(defn to-milliseconds-from-epoch [x]
  (to-ms
    (apply joda-date (map #(Integer/parseInt %)
                           (.split x "-")))))

Now that we have a function which takes the string representation and get the milliseconds it is time to get the data I want from the dataset. The below code selects the :Close and :Date column while mapping the :Date column to a millisecond from epoch representation of date.

1
2
3
4
5
6
(def mod-data
     (col-names
      (conj-cols
       ($ :Close yhoo)
       ($map to-milliseconds-from-epoch :Date yhoo))
     [:Close :Date]))

The next step is to use the time-series-plot function to actually create the plot. Because the data we have is in a dataset, we can pass in the column names as the x and y parameters and provide the data set as the value to the :data key in the optional parameters.

1
2
3
4
5
(def chart (time-series-plot :Date :Close
                             :x-label "Date"
                             :y-label "Closing Price"
                             :title "Closing price over time for Yahoo"
                             :data mod-data))

Then we use the Incanter function view to actually see the chart.

1
(view chart)

Chart of historical YHOO closing prices