Lately I’ve been working on projects that use
Postgres as our relational database.
This has allowed us to simplify some of our Clojure code by leaning on
some built-in features of Postgres. One SQL function supported by
Postgres which has greatly simplified our code is the array_agg
aggregate
function.
What is array_agg
?
The array_agg
function takes an argument and returns an array of the
argument type. That sentence will make more sense after an example.
The snippet below shows a simplified schema for a blog’s database.
There is a table called blog_posts
that contains details about
posts, a table called categories
that has labels that can be applied
to blog posts, and a join table called post_categories
that links
the two previous tables together.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
Before I learned about array_agg
, if I wanted to know how each blog
post had been categorized I might have written the following query.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
The result is readable but as the number of posts and categories grow
it becomes harder to read. The query also doesn’t answer the question,
“How are my posts categorized?”, well. The ideal answer is a single
row per post that shows the post’s categories. You can use array_agg
to get that ideal answer.
1 2 3 4 5 6 7 8 9 10 |
|
I find the array_agg
version much nicer to read. The result answers
the question in a very direct fashion and the query expresses the
question well. Everything about the query expresses the question, you
no longer have an extra order by
clause to make the result more
readable by human eyes.
How did it make my Clojure code simpler?
The above is great and it makes everything more readable for a human.
Most of the time I’m not querying a SQL database so that a human can
directly read the results; instead I’m using Clojure to manipulate
results of a query. Fortunately, array_agg
simplifies my Clojure
code as well.
I’m working with a schema that has many relationships similar to the
above relationship. Continuing with the example from above the snippet
below shows the data shape we’d get back from clojure.java.jdbc
prior to using array_agg
. The data shape we actually want follows.
1 2 3 4 5 6 7 8 9 |
|
Since we’re not getting data in our desired shape we need to write
code that combines rows. One way of doing that is to use reduce
and map
.
1 2 3 4 |
|
I’ve been writing Clojure for a long time and when I see code like
above it still takes me a bit of time to figure out what is happening.
Not only that, but eventually your project has different squash
operations depending on what data you’re pulling back from the
database. They are probably mostly similar and eventually you abstract
the differences and feel great. Then you come back months later and
have to figure out how it all works. Luckily, if you’re using a
database that supports array_agg
, there is a better way.
The first step is to change your queries to use array_agg
. The
second step is to extend the clojure.java.jdbc/IResultSetReadColumn
protocol to the type returned by your jdbc driver. For my project that
looks like the following code:
1 2 3 4 5 6 |
|
By changing my queries to use array_agg
and adding those four lines
of code I’m able to delete all of my squashing functions and get data
from my database in the shape I want. I also end up with easier to
understand code and more expressive queries. Awesome.
Thanks to Timothy Pratley for providing feedback on earlier versions of this post.