Jake McCrary

My recommended Clojure testing setup

Occasionally, either on Stack Overflow or in the Clojurians Slack group, someone will ask what tools they should use to test Clojure code. Below is what I would currently recommend. I’ve come to this recommendation through observing teams using a variety of testing tools and through my own use them.

Use clojure.test with humane-test-output and lein-test-refresh.

Use clojure.test

clojure.test is ubiquitous and not a big departure from other languages' testing libraries. It has its warts but your team will be able to understand it quickly and will be able to write maintainable tests.

Use humane-test-output

You should use clojure.test with humane-test-output. Together they provide a testing library that has minimal additional syntax and good test failure reporting.

Use lein-test-refresh

If you’re not using a tool that reloads and reruns your tests on file changes then you are wasting your time. The delay between changing code and seeing test results is drastically reduced by using a tool like lein-test-refresh. Nearly everyone I know who tries adding lein-test-refresh to their testing toolbox continues to use it. Many of these converts were not newcomers to Clojure either, they had years of experience and had already developed workflows that worked for them.

Use lein-test-refresh’s advanced features

lein-test-refresh makes development better even if you don’t change any of its settings. It gets even better if you use some of its advanced features.

Below is a stripped down version of my ~/.lein/profiles.clj. The :test-refresh key points towards my recommended lein-test-refresh settings.

1
2
3
4
5
6
7
{:user {:dependencies [[pjstadig/humane-test-output "0.8.0"]]
        :injections [(require 'pjstadig.humane-test-output)
                     (pjstadig.humane-test-output/activate!)]
        :plugins [[com.jakemccrary/lein-test-refresh "0.16.0"]]
        :test-refresh {:notify-command ["terminal-notifier" "-title" "Tests" "-message"]
                       :quiet true
                       :changes-only true}}}

These settings turn on notifications when my tests finish running (:notify-command setting), make clojure.test’s output less verbose (:quiet true), and only run tests in namespaces affected by the previous code change (:changes-only true). These three settings give me the quickest feedback possible and free me from having the terminal running lein test-refresh visible.

Quick feedback lets you make changes faster. If you’re going to write tests, and you should write tests, having them run quickly is powerful. After years of writing Clojure, this is my current go-to for testing Clojure code and getting extremely fast feedback.

Tips for working from home

I’ve been working remotely since October 2013. I can barely believe that nearly three years have passed and I’ve probably spent about two weeks in a traditional office.

It took me a bit of time to adjust to working remotely. I’m better at it now than I was three years ago. The rest of this post describes some of my learnings.

But first, some background. My remote experience comes from working at Outpace and Lumanu. Both are remote first companies, almost everyone works remotely with a few people occasionally working in a San Francisco office. The work is remote but it is not asynchronous. Both companies value pair programming and real time collaboration and, as a result, employees tend to work a core set of hours. My observations are probably most applicable in a similar environment.

Setup a home workspace

Before working remotely I did not have a great home computing setup. I worked using a 13-inch MacBook Air either sitting on my couch or at my dinner table. This worked fine for the occasional work-from-home day and for evening and weekend programming. It didn’t work fine when I was spending extended hours at a computer every day.

I’ve written about my setup before. Below is a list of the improvements I made to my home working environment.

  1. Outpace provided a beefy MacBook Pro and two 27-inch Apple Cinema displays.
  2. I upgraded my chair to to a Herman Miller Setu.
  3. I bought a mStand Laptop Stand to raise my laptop to a better viewing height.
  4. I upgraded my desk to a sit-stand desk.
  5. I built my own ErgoDox keyboard.
  6. I switched to an adjustable monitor arm so I could adjust my monitor height.

With each change my working experience improved. Each improvement left me feeling better at the end of my day. Many, if not all, of these improvements are things I’d expect to have in a traditional office. Don’t skimp on your setup because it is in your home. Part of your home is your office now.

Introduce a habit that delineates work from after-work

One of the great things about working from home is that you no longer have a commute. You don’t have to dodge cars on your bicycle, squeeze into a train, or sit in traffic while driving. This is a huge benefit.

A downside of not having a commute is that you lose a forced intermission between work and non-work. My commute was either a 30-minute bicycle ride or a 30-minute public transit ride. That time acted as a forced decompression period where I focused on something that wasn’t computing.

It took me months to realize that not having this intermission was stressing me out. The intermission helped me shift between working and non-working mindsets.

I reintroduced a decompression activity between work and non-work and became less stressed and generally happier. I’ve replaced my commute intermission with reading, cooking, or riding my bicycle. I’ve found doing a non-computer activity benefits me the most.

Stop working

When I first started working from home I was very guilty of overworking. It was so easy to just keep working. I would get invested in a problem and all sudden realize it was time to go to bed. Or I’d actually stop working, only to find myself checking our application’s monitoring or pulling up the codebase when I originally sat down to do some personal task.

In an office you have signals; your coworkers leaving, the cleaning crew vacuuming, air conditioning turning off, etc., that provide a hint that you should think about stopping. You don’t have these signals when you are working remotely. There also isn’t that spatial boundary between your office and your home.

You can search online and find many articles about how overwork is detrimental. You and your employer benefit if you are not overworked.

Get out of your house

I live in Chicago. During the winter the weather is very cold. Chicago is also a big city, so all sorts of food delivery options (both cooked and uncooked) exist. The cold weather combined with food delivery makes it easy to stay inside. During the winter, I’ve realized many times that I haven’t left my apartment for days. My girlfriend can tell when I haven’t gotten outside because I’m grumpier.

If I get out of the apartment for a while I almost always come back feeling better. It barely matters what I do when I leave, after an extended period of time inside of my home just getting outside and doing anything helps. A change of scenery is good for you.

Don’t just talk about business with your remote coworkers

When you work in an office you are pretty much forced to have non-work related chats with coworkers. You should do the same with your remote team.

Having non-work related conversations helps you make better connections with your coworkers. These better connections can lead to better communication, both with voice and text, and humanize the person on the other side of the screen.

Its even better if you can do this in a video conference. Then you get to learn the facial expressions and tone of voice of your coworker. This can make it easier to interpret text communication in the way they actually mean it.

Meet in person occasionally

It is great that technology and Internet speeds have progressed enough that working remotely works well. If you want, you can make it feel like your pair is sitting right next to you. This is great.

There are still benefits to meeting in person though. The main one is that it helps you make connections with coworkers. You can eat a meal together or play board games (though, you can do this online as well but it is a different experience). It can also be easier to have certain types of group discussions (video conferences do have limitations).

When you meet in person, I’d recommend doing something different than your normal day-to-day work. Don’t just exchange remote pairing for local pairing. Try to identify that are difficult to do remotely and do them in person.

I don’t have a concrete recommendation for how often your remote company should meet but I think it should be infrequent enough where you don’t feel pressure to do normal work.

End

Working from home has its challenges but with those challenges come many benefits. It is a different experience than working in an office and that experience isn’t for everyone. The above recommendations are things that have helped me adjust to working remotely. Some of these tips are actionable at an individual level and some require buy in from the company. Hopefully this list can help give guidance towards improving your remote work situation.

Use Google to get a site's favicon

A few months ago I was implementing some changes to Lumanu’s user interface. Lumanu is a tool I’ve been working on that helps its users create, discover, and curate engaging content.

This interface change was to our discovery view. This is the view that surfaces interesting content to our users. The change involved showing the favicon of content’s origin in our interface.

I often browse the Internet with the network tab of the Chrome Developer Tools open. I do this because I find it interesting to see what services other web applications are using. I had the network tab open while browsing a site that displayed many favicons and noticed a lot fetches from google.com. This surprised me, so I took a deeper look at the requests and saw they were hitting a URL that appeared to provide favicons. It turns out you can query Google for favicons.

Example

Let’s pretend we want to get the favicon for jakemccrary.com. You simply construct a URL that looks like https://www.google.com/s2/favicons?domain=jakemccrary.com and all of a sudden you have the favicon. Just replace jakemccrary.com with the domain you care about and you’ll be rewarded with that domain’s favicon.

My favicon from Google

This definitely isn’t a new feature. If you search online you’ll see people talking about it years ago. I had never heard of it before and discovering it saved us an unknown amount of time. It allowed us to iterate on our interface without having to figure out the nuances of favicons. We were able to quickly try out the interface change and then throw it away without costing us too much time.

Speeding up my blog

I was recently reading Jeff Ramnani’s about page and I was somewhat inspired by it. It loads quickly and links to Designing blogs for readers, an interesting essay by Matt Gemmmell. Reading that essay inspired me to think about my own site and what experience I want to deliver to readers.

I can’t imagine what every reader wants but I know what I want to experience when I read an article online. Reading high quality content is my highest priority. Beyond that I enjoy when a page loads fast and the visual design doesn’t get in the way. I think a great example of these two requirements is zen habits (along with Jeff Ramnani’s and Matt Gemmell’s).

My own site sort of achieves those goals. I like to think I’m writing well-written content that helps others. I know it has helped me. With regards to visual design I think there is room for improvement. I don’t think my site’s design is actively distracting from the content though, so I’ve decided to focus on improving the page load time first.

The optimization process

As with any optimization problem it is important figure what you’re going to measure, how you’re going to measure it and your starting point. I decided to focus on my page load time, as measured by Web Page Test. I used Google’s PageSpeed Insights to score and provide helpful tips for improving page speed. Unfortunately I didn’t capture my starting point with PageSpeed Insights but I think I was scoring around a 66/100 for mobile and 79/100 for desktop.

Starting point from Web Page Test

As measured by Web Page Test, the first load of my main page took five seconds and it wasn’t fully loaded for another second. This is ridiculous. My page is almost entirely static content and most of my assets are served from CloudFlare. It should be blazing fast.

Next I looked at what was actually being loaded. Google’s PageSpeed Insights identified that I had three render-blocking script tags. The offending scripts were Modernizr, jQuery, and octopress.js. PageSpeed Insights recommends inlining JavaScript required to render the page or make loading asynchronous. I decided to go a step further and remove the need for the JavaScript.

Removing octopress.js

It turns out octopress.js was the reason Modernizr and jQuery were required. Most of what octopress.js did were things that I don’t need; some sort of flash video fallback, adding line numbers to GitHub Gists, rendering delicious links, and toggling the sidebar visibility. I was able to delete all that code.

Next up was the mobile navigation octopress.js provided. This feature enabled navigation through a <select> element when the reader’s view port was tiny. Restyling my navigation bar to fit better on small screens allowed me to remove this feature. ocotpress.js also did some feature detection for Modernizr. I stopped using image masks and was able to remove that code as well.

The remaining code in octopress.js was a workaround for an iOS scaling bug. This JavaScript was inlined into my html. At this point octopress.js was empty and with it empty the requirements for jQuery and Modernizer disappeared. This let me remove three render-blocking script tags.

Remaining JavaScript

At this point the remaining JavaScript used for my blog was enabling comments with Disqus and showing recent tweets in my sidebar. I still enjoy having comments on my blog so I’m keeping Disqus around. I doubt that readers care what my most recent tweets are so I removed Twitter’s JavaScript. Removing my tweets also cleans up my sidebar and helps keep the focus on my writing.

Nearly no JavaScript, now what?

At this point Google’s PageSpeed Insight was suggesting that I up my cache times, inline my css, and move my web fonts lower on my page. Bumping up my cache times was trivial; I simply tweaked a CloudFlare setting.

I opted to not inline my css. This would require me to modify my site’s generation and I just didn’t feel like diving down that rabbit hole. I also didn’t move the web fonts lower on the page. I find fonts re-rendering jarring and as a result kept them loading1 in my <head>.

The results

I used Web Page Test to measure again and now the page load time is down to 2.5 seconds. Page load times are cut in half from the starting point. My PageSpeed Insights scores are also higher; up to 79/100 for mobile and 92/100 for desktop.

Web Page Test after optimization

Honestly, that number still seems high2 to me and I’m sure I could get it lower. But for now it is good enough3. As a result of doing this I’ve learned more about my blogging setup and managed to speed up my page load. Now it is time to focus on researching for future posts (and at some point restyling).

Update on 2016-05-03

I completely removed web font loading from my site. Getting rid of the fonts reduced my load time, as measured by Web Page Test, by a second. Google’s PageSpeed Insights now scores this site at 90/100 for mobile and 96/100 for desktop.

Web Page Test after font removal


  1. When I first wrote this I didn’t change anything about my web fonts. After thinking about it for a few days I ended up removing them completely. Details are in the update at the end of the post.

  2. I’m asking Web Page Test to load my page using IE10. I get much faster load times using Chrome or Firefox locally which is what most of my readers use. This is good enough for now.

  3. I mean, the starting point was probably good enough but if I admitted that then I wouldn’t have had the excuse to dig into my site’s load time.

The usefulness of Clojure's cond->

Clojure’s cond-> (and cond->>) is a versatile macro. It isn’t a new macro, it has been around since version 1.5, but I finally discovered and started using it sometime last year. It isn’t a workhorse macro, you won’t be using it everyday, but it comes in handy.

What is cond->?

Let’s start by looking at the docstring.

1
2
3
4
5
6
Usage: (cond-> expr & clauses)

Takes an expression and a set of test/form pairs. Threads expr (via ->)
through each form for which the corresponding test
expression is true. Note that, unlike cond branching, cond-> threading does
not short circuit after the first true test expression.

So what does the docstring mean? Let’s break it down with an example.

1
2
3
(cond-> 10
  false inc)
=> 10

In the above example 10 is the expr mentioned in the docstring and everything after it are the clauses. Each clause is a pair made up of a test and a form. In this example there is a single clause with the value false as the test the function inc as the form. Since the test evaluates to a false value the expression is not threaded into the form. As a result the original expression, 10, is returned.

Let’s look at an example with a truthy test.

1
2
3
(cond-> 10
  true (- 2)
=> 8

Once again, 10 is the starting expression. The single clause has a test that evaluates to true so the expression is threaded into the first position of the form (- 2). The result is 8 and this is returned.

Next is an example of a cond-> with multiple clauses. Explanations are inline with the code.

1
2
3
4
5
6
7
8
9
10
(cond-> 10 ; start with 10
  ;; test evaluates to true, so apply inc to 10. Current value is now 11.
  true inc

  ;; (zero? 1) evaluates to false, do not perform action. Current value stays 11.
  (zero? 1) (+ 2)

  ;; (pos? 4) evaluates to true, thread 11 into first position of form.
  (pos? 4) (- 5))
=> 6 ; The result of (- 11 5) is 6.

If you understand the above example then you have a good grasp of cond->. But when is this functionality useful?

When do I use cond->?

Looking through the codebases I work on, I almost primarily see cond-> being used with the initial expression being a hash-map. It is being used in situations where we want to selectively assoc, update, or dissoc something from a map.

If cond-> did not exist you would accomplish those selective modifications with code similar to below.

1
2
3
(if (some-pred? q)
  (assoc m :a-key :a-value)
  m)

You can rewrite the above with cond->.

1
2
(cond-> m
  (some-pred? q) (assoc :a-key :a-value))

If you’re not used to seeing cond-> the above transformation might seem like a step backwards. I know it felt that way to me when I first saw cond->. Give yourself time to get familiar with it and you’ll be glad you’re using it.

A meatier example of using cond-> is demonstrated below. Here we’re manipulating data structures designed for use with honeysql to generate SQL statements. We start with a base-query and selectively modify it based on incoming parameters.

1
2
3
4
5
6
7
8
(defn query [req-params]
  (let [and-clause (fnil conj [:and])
        base-query {:select [:name :job]
                    :from [:person]}]
    (cond-> base-query
      (:job req-params) (update :where and-clause [:= :job (:job req-params)])
      (:name req-params) (update :where and-clause [:= :name (:name req-params)])
      (:min-age req-params) (update :where and-clause [:> :age (:min-age req-params)]))))

Hopefully this gives you a taste of cond->. I’ve found it to be quite useful. It has a place in every Clojure developer’s toolbox.

Book review: Serverless Single Page Apps

I’ve read Ben Rady’s Serverless Single Page Apps twice now. As an early technical reviewer, I was able to watch and take part in the book’s evolution. The early draft was good but the most recent near-final draft was better.

Serverless Single Page Apps walks you through building a low-cost, highly-available, serverless single page web application. It does this on top of various Amazon web services (DynamoDB, Cognito, Lambda, API Gateway, S3). If you follow along you’ll end up with a simple web application with authentication.

The book is very enjoyable. The examples are clear and the book is well written. The book uses JavaScript to implement the serverless application. For the user interface it uses plain JavaScript with a bit of jQuery and for the AWS Lambda functions you dip into some Node.js. Serverless doesn’t distract you from learning about serverless applications by forcing you to learn new JavaScript frameworks or libraries.

One of my favorite parts of the book is Ben’s use of test driven development. The examples provided give the reader a decent taste of the benefits of test-first development. Having the tests helped me when I made some silly mistakes in later parts of the book.

Overall I’d recommend this book to developers who are interested in learning what a serverless application might look like. If you follow along you’ll know how to build one by the end and will have a good starting point for diving deeper into the topic.

Reading in 2015

At the beginning of the year I generally take the time to reflect on my reading in the previous year. I’m nearly three months but I’m finally taking a look at 2015. Here are my summaries of my 2014 and 2013 reading.

I’ve continued to keep track of my reading by using Goodreads. My profile contains the full list and reviews of books I’ve read since 2010. Here is my 2015 list.

2015 Goals

2015 did not have an easily measured goal. I set the vague goal of increasing the quality of my reading by attempting to think deeper about what I’ve read.

2015 Results

I have no idea if I achieved my goal. Some books have stuck with me and I’ve thought quite a bit about the stories. Others I’ve forgotten already.

Looking at raw numbers I read 51 books in 2015 for a total of about 21,790 pages. When compared to 2014 these numbers are lower by 19 books and about 1300 pages.

In terms of star ratings, 2015 was a better year. I had three more five star books and one more four star book. The 19 book difference between 2014 and 2015 is entirely found in two and three star books.

Recommendations

I awarded ten books a five star rating. This is more five stars than any other year. Each of the five star books I’d recommend without hesitation. Below is my list of five star books. The titles are affiliate links to Amazon and the my review text links to Goodreads.

One of the great things about writing this post is that it forces me to pause and reflect on the previous years books. Its great seeing this list of great books and remembering the stories. Of these ten books the ones I remember most fondly are Stoner, Snow Crash, and The Pale King.

There were also a ton of great four star books this year. One that stands out is Joseph Heller’s Something Happened (my review). Kurt Vonnegut wrote a brilliant review of this book which I encourage you to read.

Dave MacLeod’s Make or Break: Don’t Let Climbing Injuries Dictate Your Success (my review) deserves a mention. I highly recommend this book to any climber. We push our bodies hard and this book will help you prevent and recover from injuries. I’ve used it as a reference so many times over the past year. It probably deserves five stars.

Other Stats

Unsurprisingly, I’m continuing to mostly read ebooks.

1
2
3
4
5
|           | 2014 | 2015 |
|-----------+------+------|
| ebook     |   64 |   47 |
| hardcover |    1 |    1 |
| paperback |    4 |    3 |

My average rating went up.

1
2
3
4
5
6
7
| Year | Average Rating |
|------+----------------|
| 2011 |           3.84 |
| 2012 |           3.66 |
| 2013 |           3.67 |
| 2014 |           3.48 |
| 2015 |           3.86 |

Last year I had many repeat authors. This year I had fewer. Neal Stephenson and Donna Tart really stood out this year. I read multiple books from both of them and rated every book five stars.

1
2
3
4
5
6
| Author               | My Average Rating | Number of Pages | Number of books |
|----------------------+-------------------+-----------------+-----------------|
| Neal Stephenson      |                 5 |            2693 |               3 |
| Donna Tartt          |                 5 |            1427 |               2 |
| Paolo Bacigalupi     |       3.666666667 |            1113 |               3 |
| Bret Easton Ellis    |               3.5 |             590 |               2 |

2016 Goals

In 2016 I’m planning on reading one or two biographies. That isn’t a genre I typically read. It should be a pretty easy goal to hit. If you have any recommendations please leave them in a comment.

ClojureScript: Treat warnings as errors

Recently my team deployed a new version of our ClojureScript UI and it had a minor bug. It was trivial to fix the problem, a ClojureScript build warning pointed us to the cause. As a result we started thinking it would be nice to have build warnings count as errors and fail our ClojureScript build.

We use Leiningen (version 2.5.3) and lein-cljsbuild (version 1.1.1). After some searching we found that lein-cljsbuild supports specifying custom warning handlers as the value to the :warning-handlers key. The lein-cljsbuild README even provides an example, which we took and added a (System/exit 1) to the end of it. This resulted in a build configuration that looked similar to below.

1
2
3
4
5
6
7
8
9
10
11
{:id "prod"
 :warning-handlers [(fn [warning-type env extra]
                      (when-let [s (cljs.analyzer/error-message warning-type extra)]
                        (binding [*out* *err*]
                          (println "WARNING:" (cljs.analyzer/message env s)))
                        (System/exit 1)))]
 :source-paths ["src/cljc" "src/cljs"]
 :compiler {:output-to "resources/public/js/compiled/ui.js"
            :externs ["resources/intercom-externs.js"
                      "resources/mixpanel-externs.js"]
            :optimizations :advanced}}

This worked! Well, it sort of worked. Our build failed whenever there was a warning but now we were seeing spurious warnings. We saw “Use of undeclared Var” warnings when functions created in a letfn where calling each other. Definitely not a situation that warrants a warning and definitely not a build failure.

We weren’t seeing this warning before so we opened ClojureScript’s source and found the default warning handler. The default handler checks that warning-type has a truthy value in the map *cljs-warnings*. Inspired by the default handler we added the same check to the start of our warning handler.

1
2
3
4
5
6
:warning-handlers [(fn [warning-type env extra]
                     (when (warning-type cljs.analyzer/*cljs-warnings*)
                       (when-let [s (cljs.analyzer/error-message warning-type extra)]
                         (binding [*out* *err*]
                           (println "WARNING:" (cljs.analyzer/message env s)))
                         (System/exit 1))))]

Success! Now we no longer get incorrect warnings when compiling our letfn form and our build still fails if a warning occurs. Now we can build and deploy with a little more confidence.

Even quicker feedback from your Clojure tests

I was recently inspired by a post on a mailing list to make the TDD cycle with clojure.test and lein-test-refresh even faster. lein-test-refresh is a Leiningen tool that monitors your Clojure project’s source, reloads changes, and then runs your tests. Tools like it provide some of the fastest feedback cycles possible.

To make the feedback cycle even faster I added the option to only run tests in changed namespaces. This means you’re running the minimum number of tests after a change. Version 0.12.0 of lein-test-refresh was released earlier this week with this feature.

To use it add [com.jakemccrary/lein-test-refresh 0.12.0] as a plugin to your profiles.clj or project.clj. An example project.clj can be found in the project’s GitHub repo.

Once you’re on the latest version you can toggle this feature from the command line by providing a :changes-only flag, lein test-refresh :changes-only, or by adding :changes-only true to your :test-refresh configuration section in your project.clj or profiles.clj. When the feature is on you can still run all your tests by hitting enter in the terminal running lein test-refresh.

Below is an example of the time difference between running all my tests and the tests in a single namespace.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Ran 49 tests containing 219 assertions.
0 failures, 0 errors.

Passed all tests
Finished at 14:42:41.655 (run time: 2.006s)
*********************************************
*************** Running tests ***************
:reloading (lumanu.utils-test)

Ran 1 tests containing 3 assertions.
0 failures, 0 errors.

Passed all tests
Finished at 14:43:12.648 (run time: 0.085s)

I’ve been using this feature for about a week now and am enjoying it. My whole test suite isn’t particularly slow but even still I’ve been enjoying the faster feedback.

SQL: Aggregate a set of values together

Lately I’ve been working on projects that use Postgres as our relational database. This has allowed us to simplify some of our Clojure code by leaning on some built-in features of Postgres. One SQL function supported by Postgres which has greatly simplified our code is the array_agg aggregate function.

What is array_agg?

The array_agg function takes an argument and returns an array of the argument type. That sentence will make more sense after an example. The snippet below shows a simplified schema for a blog’s database. There is a table called blog_posts that contains details about posts, a table called categories that has labels that can be applied to blog posts, and a join table called post_categories that links the two previous tables together.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
blog=# select id, title from blog_posts;
 id |    title
----+--------------
  1 | SQL Post
  2 | Clojure Post

blog=# select * from categories;
 id |   name
----+----------
  1 | sql
  2 | emacs
  3 | clojure
  4 | postgres

blog=# select * from post_categories;
 blog_post_id | category_id
--------------+-------------
            1 |           1
            2 |           2
            1 |           4
            2 |           3

Before I learned about array_agg, if I wanted to know how each blog post had been categorized I might have written the following query.

1
2
3
4
5
6
7
8
9
10
11
12
13
select title, name as category
  from blog_posts bp
  join post_categories pc on pc.blog_post_id = bp.id
  join categories c on c.id = pc.category_id
  order by title;


    title     | category
--------------+----------
 Clojure Post | emacs
 Clojure Post | clojure
 SQL Post     | sql
 SQL Post     | postgres

The result is readable but as the number of posts and categories grow it becomes harder to read. The query also doesn’t answer the question, “How are my posts categorized?”, well. The ideal answer is a single row per post that shows the post’s categories. You can use array_agg to get that ideal answer.

1
2
3
4
5
6
7
8
9
10
select title, array_agg(name) as categories
  from blog_posts bp
  join post_categories pc on pc.blog_post_id = bp.id
  join categories c on c.id = pc.category_id
  group by title;

    title     |   categories
--------------+-----------------
 SQL Post     | {sql,postgres}
 Clojure Post | {emacs,clojure}

I find the array_agg version much nicer to read. The result answers the question in a very direct fashion and the query expresses the question well. Everything about the query expresses the question, you no longer have an extra order by clause to make the result more readable by human eyes.

How did it make my Clojure code simpler?

The above is great and it makes everything more readable for a human. Most of the time I’m not querying a SQL database so that a human can directly read the results; instead I’m using Clojure to manipulate results of a query. Fortunately, array_agg simplifies my Clojure code as well.

I’m working with a schema that has many relationships similar to the above relationship. Continuing with the example from above the snippet below shows the data shape we’d get back from clojure.java.jdbc prior to using array_agg. The data shape we actually want follows.

1
2
3
4
5
6
7
8
9
;; data shape you get from the non-array_agg query.
[{:title "Clojure Post" :category "emacs"}
 {:title "SQL Post" :category "sql"}
 {:title "Clojure Post" :category "clojure"}
 {:title "SQL Post" :category "postgres"}]

;; data shape you want
[{:title "Clojure Post" :categories ["emacs" "clojure"]}
 {:title "SQL Post" :categories ["sql" "postgres"]}]

Since we’re not getting data in our desired shape we need to write code that combines rows. One way of doing that is to use reduce and map.

1
2
3
4
(defn squash-by-title [rows]
  (->> rows
       (reduce (fn [r row] (update r (:title row) conj (:category row))) {})
       (map (fn [[title categories]] {:title title :categories categories}))))

I’ve been writing Clojure for a long time and when I see code like above it still takes me a bit of time to figure out what is happening. Not only that, but eventually your project has different squash operations depending on what data you’re pulling back from the database. They are probably mostly similar and eventually you abstract the differences and feel great. Then you come back months later and have to figure out how it all works. Luckily, if you’re using a database that supports array_agg, there is a better way.

The first step is to change your queries to use array_agg. The second step is to extend the clojure.java.jdbc/IResultSetReadColumn protocol to the type returned by your jdbc driver. For my project that looks like the following code:

1
2
3
4
5
6
;; clojure.java.jdbc has been required as jdbc

(extend-protocol jdbc/IResultSetReadColumn
  org.postgresql.jdbc4.Jdbc4Array
  (result-set-read-column [pgobj metadata i]
    (vec (.getArray pgobj))))

By changing my queries to use array_agg and adding those four lines of code I’m able to delete all of my squashing functions and get data from my database in the shape I want. I also end up with easier to understand code and more expressive queries. Awesome.

Thanks to Timothy Pratley for providing feedback on earlier versions of this post.