In a previous post, we used Google’s BigQuery and the public GitHub
dataset to discover the most used Clojure testing library. The answer
wasn’t surprising. The built-in clojure.test was by far the most used.
Let’s use the dataset to answer a less obvious question: what are the
most used libraries in Clojure projects? We’ll measure this by
counting references to libraries in project.clj and build.boot
files.
Before we can answer that question, we’ll need to transform the
data. First, we create the Clojure subset of the GitHub dataset. I did
this by executing the following queries and saving the results to
tables.
```sql
-- Save the results of this query to the clojure.files table
SELECT *
FROM [bigquery-public-data:github_repos.files]
WHERE RIGHT(path, 4) = '.clj'      -- also matches project.clj
   OR RIGHT(path, 5) = '.cljc'
   OR RIGHT(path, 5) = '.cljs'
   OR RIGHT(path, 10) = 'build.boot' -- Boot build scripts

-- Save the results of this query to the clojure.contents table
SELECT *
FROM [bigquery-public-data:github_repos.contents]
WHERE id IN (SELECT id FROM clojure.files)
```
Next we extract the dependencies from build.boot and project.clj
files. Fortunately for us, both of these files specify dependencies in
the same format, so we’re able to use the same regular expression on both types.
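Both file types list dependencies as vectors holding a library name followed by a quoted version string, so one pattern covers both. The article’s own regular expression isn’t reproduced here. The Python snippet below is a minimal sketch that uses an assumed pattern of our own, which matches [group/artifact "version"] or [artifact "version"] and returns (library, version) pairs. The sample lines are made up for illustration.

```python
import re

# Assumed pattern (not the article's original regex): matches a dependency
# vector like [group/artifact "version"] or [artifact "version"].
DEP_PATTERN = re.compile(
    r'\[\s*([A-Za-z0-9_.\-]+(?:/[A-Za-z0-9_.\-]+)?)\s+"([^"]+)"'
)


def extract_deps(line):
    """Return the (library, version) pairs found on one line of text."""
    return DEP_PATTERN.findall(line)


# Made-up sample lines: one from a project.clj, one from a build.boot.
lein_line = '  :dependencies [[org.clojure/clojure "1.8.0"] [clj-time "0.12.0"]]'
boot_line = """(set-env! :dependencies '[[org.clojure/clojure "1.8.0"] [clj-http "3.1.0"]])"""

print(extract_deps(lein_line))
# [('org.clojure/clojure', '1.8.0'), ('clj-time', '0.12.0')]
print(extract_deps(boot_line))
# [('org.clojure/clojure', '1.8.0'), ('clj-http', '3.1.0')]
```

The outer vector in a line like [[org.clojure/clojure "1.8.0"] ...] is skipped automatically because a library name cannot start with "[", so matching resumes at the inner vector.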
The dependency query identifies project.clj and build.boot files,
splits each file into lines, and extracts referenced library names and
versions using a regular expression. Additional filtering is done to get
rid of some spurious results.
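The same steps can be sketched locally in Python. This is a minimal sketch, not the BigQuery query itself. It assumes file contents are available as a dict that maps each path to its text. It treats a file as a build file only when its name is exactly project.clj or build.boot. It also uses an assumed dependency pattern. Skipping commented-out lines is a hypothetical stand-in for the article’s unspecified "additional filtering".

```python
import re
from collections import Counter

# Assumed dependency pattern; the article's actual regex may differ.
DEP_PATTERN = re.compile(
    r'\[\s*([A-Za-z0-9_.\-]+(?:/[A-Za-z0-9_.\-]+)?)\s+"([^"]+)"'
)


def is_build_file(path):
    """True for Leiningen project.clj and Boot build.boot files."""
    name = path.rsplit("/", 1)[-1]
    return name in ("project.clj", "build.boot")


def count_libraries(files):
    """Count (library, version) references across build files.

    `files` maps a repository path to that file's text content.
    """
    counts = Counter()
    for path, content in files.items():
        if not is_build_file(path):
            continue
        for line in content.splitlines():
            # Hypothetical filter: ignore commented-out lines, which are
            # one plausible source of spurious matches.
            if line.lstrip().startswith(";"):
                continue
            counts.update(DEP_PATTERN.findall(line))
    return counts


# Made-up sample data for illustration only.
sample = {
    "app/project.clj": (
        '(defproject app "0.1.0"\n'
        '  :dependencies [[org.clojure/clojure "1.8.0"]\n'
        '                 ;; [clj-http "2.0.0"]\n'
        '                 [clj-time "0.12.0"]])\n'
    ),
    "site/build.boot": (
        "(set-env!\n"
        ' :dependencies \'[[org.clojure/clojure "1.8.0"]])\n'
    ),
    "app/src/app/core.clj": "(ns app.core)\n",
}

for (lib, version), n in count_libraries(sample).most_common():
    print(lib, version, n)
# org.clojure/clojure 1.8.0 2
# clj-time 0.12.0 1
```

Counting (library, version) pairs rather than names alone means a single library can occupy several rows, one per version. Summing over versions would give one total per library instead.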
Clojure and ClojureScript are at the top, which isn’t surprising. I’m
surprised to see tools.nrepl in the next five results (rows 3-7). It
is the only library near the top that I haven’t used.
What testing library is used the most? We already answered this in my
last article, but let’s see if we get the same answer when we’re
counting how many times a library is pulled into a project.
Before doing this research, I tried to predict which libraries I’d see
in the top 10. I thought that clj-time and clj-http would be up
there. I’m happy to see my guess was correct.
It was pretty pleasant using BigQuery to do this analysis. Queries
took at most seconds to execute. This quick feedback let me play
around in the web interface without feeling like I was waiting for
computers to do work. This made the research into Clojure library
usage painless and fun.