Previously, we used Google’s BigQuery and the public GitHub dataset
to discover the most used Clojure testing library. The answer wasn’t
surprising. The built-in clojure.test was by far the most used.
Let’s use the dataset to answer a less obvious question: what are the
most used libraries in Clojure projects? We’ll measure this by
counting references to libraries in project.clj and build.boot files.
Before we can answer that question, we’ll need to transform the
data. First, we create the Clojure subset of the GitHub dataset. I did
this by executing the following queries and saving the results to
tables.

    -- Save the results of this query to the clojure.files table
    SELECT
      *
    FROM
      [bigquery-public-data:github_repos.files]
    WHERE
      RIGHT(path, 4) = '.clj'
      OR RIGHT(path, 5) = '.cljc'
      OR RIGHT(path, 5) = '.cljs'
      OR RIGHT(path, 10) = 'build.boot'

    -- Save the results to clojure.contents
    SELECT
      *
    FROM
      [bigquery-public-data:github_repos.contents]
    WHERE
      id IN (SELECT id FROM clojure.files)

Next we extract the dependencies from build.boot and project.clj
files. Fortunately for us, both of these files specify dependencies in
the same format, so we’re able to use the same regular expression on both types.
The query below identifies project.clj and build.boot files,
splits each file into lines, and extracts referenced library names and
versions using a regular expression. Additional filtering is done to get
rid of some spurious results.
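
The same steps (pick out the build files, split them into lines, pull out library names and versions with a regular expression, count) can be sketched locally in Python. This is only a rough illustration and not the BigQuery query: the pattern, the function names, and the choice to count every matching reference are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the regex from the BigQuery query):
# an opening bracket, a library symbol, whitespace, then a quoted version,
# e.g. [org.clojure/clojure "1.8.0"]. Excluding quotes and brackets from the
# library name skips entries such as ["clojars" "https://..."].
DEP_PATTERN = re.compile(r'\[([^\s\[\]"]+)\s+"([^"\s]+)"')


def extract_dependencies(content):
    """Return (library, version) pairs found in one build file's text."""
    pairs = []
    for line in content.splitlines():
        # A single line can hold several dependency vectors.
        pairs.extend(DEP_PATTERN.findall(line))
    return pairs


def count_libraries(files):
    """Count library references across project.clj and build.boot files.

    `files` maps a path to that file's content. Every matching reference
    is counted (an assumption about how references are tallied).
    """
    counts = Counter()
    for path, content in files.items():
        if not path.endswith(("project.clj", "build.boot")):
            continue
        counts.update(library for library, _version in extract_dependencies(content))
    return counts
```

Calling `count_libraries` with a dict of paths and file contents returns a `Counter`, so `most_common(10)` gives a top-10 list in the same spirit as the analysis here.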
Before doing this research I tried to predict what libraries I’d see
in the top 10. I thought that clj-time and clj-http would be up
there. I’m happy to see my guess was correct.
It was pretty pleasant using BigQuery to do this analysis. Queries
took at most seconds to execute. This quick feedback let me play
around in the web interface without feeling like I was waiting for
computers to do work. This made the research into Clojure library
usage painless and fun.