Jake McCrary

Using my phone's voice control for a month

From May 6th to June 2nd, my phone's screen was cracked. I have an Android phone, and the crack ran through the software buttons at the bottom of the screen. As a result, I could not touch the back, home, or overview (app switching) buttons. For nearly a month I couldn't see my home screen, go back, or switch apps by touching my phone. I relied on incoming notifications to give me opportunities to open apps.

It took me some time, but I realized I could use voice commands to replace some of the missing functionality. Using voice commands, I could open apps and no longer be at the whim of notifications.

Here is an example of my phone usage during this month. My thoughts are in [brackets]. Italics indicate actions. Talking is wrapped in “ ”.

  1. [Alright, I want to open Instagram] “Ok Google, open Instagram.”
  2. [Sweet, it worked] scrolls through feed
  3. WhatsApp notification happens [Great, a notification, I can click it to open WhatsApp]
  4. I read messages in WhatsApp.
  5. [Time to go back to Instagram] “Ok Google, open Instagram”
  6. [sigh, voice command failed, let's try again] “Ok Google, open Instagram”
  7. Instagram opens [Great, time to scroll through more pictures]

As you can see, it is a bit more painful than clicking buttons to switch between different apps. Voice commands fail sometimes and, at least for me, generally take more effort than tapping the screen. That’s ok though; I was determined to embrace voice commands and experience what a future of only voice commands might feel like.

Below are some observations from using my voice to control my phone for a month.

It is awkward in public

My phone usage in public went way down. There was something about having to talk to your phone to open an app that made me not want to pull out my phone.

It is much more obvious that you are using your phone when you control it with your voice. Casual glances at your phone while hanging out with a group become impossible. You can't sneak a quick look at Instagram when opening it requires saying “Ok Google, open Instagram,” announcing to everyone around you that you are no longer paying attention.

This also stopped me from using my phone in Ubers/Lyfts/cabs. I often talk to the driver or other passengers anyway, but this cemented that habit. I realize it is completely normal to ignore the other people in a car, but I felt like a (small) asshole audibly announcing that I was ignoring them.

You become more conscious of what apps you use

When you have to say “Okay Google, open Instagram” every time you want to open Instagram, you become way more aware of how often you use Instagram. Using your voice instead of tapping a button on your screen is a much bigger hurdle between having the urge to open something and actually opening it. It gives you more time to observe what you are doing.

You become more conscious of using your phone

Using your phone becomes a lot harder. This increased difficulty helped highlight when I was using my phone. My phone’s functionality dropped drastically and, as a result, I stopped reaching for it as much.

This reminded me of when I used a dumb (feature) phone for a couple of months a few years ago. Using a non-smartphone after years of using a smartphone was weird. It helped me rein in my usage1.

Voice control can be pretty convenient

Even after repairing my screen, I still find myself using some voice commands. While making my morning coffee, I often ask my phone for the weather forecast. This is more convenient than opening an app and it lets me continue to use both hands while making coffee.

Setting alarms, starting countdown timers, adding reminders, and checking the weather are all things I do through voice commands now.

I wish it worked all the time

I suppose this is an argument for getting a Google Home or Amazon Echo. I have to wake up my phone to use voice commands with it, which limits their usefulness since I need to be within reach of my phone.

I wish it could do more

At some point, I got used to asking my phone to do things. Then I started giving it more complicated commands, and it would fail. I found myself giving it multi-stage commands such as “Ok Google, turn on Bluetooth and play my playlist Chill on Spotify.” That doesn’t work but it would be amazing if it did.

Recommendations

I recommend that you force yourself to use voice commands for some period of time. Pretend your home button is broken and you have to use voice control to move around your phone. You’ll become more aware of your phone usage and you’ll learn some useful voice commands that will make your technology usage nicer.


  1. My non-smartphone experiment four years ago is what resulted in me no longer using Facebook or Twitter on my phone. It also is the reason I silenced most notifications, including email, on my phone.

Speeding up this site by optionally loading Disqus comments

Earlier this month I took another look at what was required for reading an article on this site. What else could I do to make this site load faster?

To do this, I loaded up WebPageTest and pointed it towards one of my posts. To my shock, it took 113 requests for a total of 721 KB to load a single post. This took WebPageTest 6.491 seconds. The document complete event triggered after 15 requests (103 KB, 1.6 seconds).

113 requests to load a static article was ridiculous. Most of those requests were the result of loading the Disqus JavaScript. I find comments valuable and want to continue including them on my site, so I couldn't just remove Disqus. Instead, I made loading Disqus optional.

After making the required changes, it only takes 11 requests for 61 KB of data to fully load the test post. The document complete event only required 8 requests for 51 KB of data. Optionally loading the Disqus javascript resulted in a massive reduction of data transferred.

How did I do it? The template that generates my articles now only inserts the Disqus javascript when a reader clicks a button. My final template is at the bottom of this post.

The template adds an insertDisqus function that inserts a <script> element when a reader clicks a button. This element contains the original JavaScript that loads Disqus. Once the element is inserted into the page, the Disqus JavaScript loads and the comments appear.

My exact template might not work for you, but I’d encourage you to think about optionally loading Disqus and other non-required JavaScript. Your readers will thank you.

{% if site.disqus_short_name and page.comments == true %}
  <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
  <div id="disqus_target">
    <script>
     var insertDisqus = function() {
       var elem = document.createElement('script');
       elem.innerHTML =  "var disqus_shortname = '{{ site.disqus_short_name }}'; var disqus_identifier = '{{ site.url }}{{ page.url }}'; var disqus_url = '{{ site.url }}{{ page.url }}'; (function () {var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);}());"
       var target = document.getElementById('disqus_target');
       target.parentNode.replaceChild(elem, target);
     }
    </script>
    <button class="comment-button" onclick="insertDisqus()"><span>ENABLE COMMENTS AND RECOMMENDED ARTICLES</span></button>
  </div>
{% endif %}

Adding a JSON Feed to an Octopress/Jekyll generated site

I went to a coffee shop this last weekend with the intention of writing up a quick article on comm. I sat down, sipping my coffee, and wasn’t motivated. I didn’t feel like knocking out a short post, and I didn’t feel like editing a draft I’ve been sitting on for a while. I wanted to do some work though, so I decided to add a JSON Feed to this site.

JSON Feed is an alternative to Atom and RSS that uses JSON instead of XML. I figured I could add support for it in less time than it would take to finish my coffee, and maybe some readers would find it useful. I'd be shocked if anyone actually finds it useful, but it was a fun little exercise anyway.

An old version of Octopress (2.something), which uses an old version of Jekyll (2.5.3), generates this site. Despite this, I don’t think the template would need to change much if I moved to a new version. The template below is saved as source/feed.json in my git repository.

---
layout: null
---
{
  "version": "https://jsonfeed.org/version/1",
  "title": {{ site.title | jsonify }},
  "home_page_url": "{{ site.url }}",
  "feed_url": "{{site.url}}/feed.json",
  "favicon": "{{ site.url }}/favicon.png",
  "author" : {
      "url" : "https://twitter.com/jakemcc",
      "name" : "{{ site.author | strip_html }}"
  },
  "user_comment": "This feed allows you to read the posts from this site in any feed reader that supports the JSON Feed format. To add this feed to your reader, copy the following URL - {{ site.url }}/feed.json - and add it your reader.",
  "items": [{% for post in site.posts limit: 20 %}
    {
      "id": "{{ site.url }}{{ post.id }}",
      "url": "{{ site.url }}{{ post.url }}",
      "date_published": "{{ post.date | date_to_xmlschema }}",
      "title": {% if site.titlecase %}{{ post.title | titlecase | jsonify }}{% else %}{{ post.title | jsonify }}{% endif %},
      {% if post.description %}"summary": {{ post.description | jsonify }},{% endif %}
      "content_html": {{ post.content | expand_urls: site.url | jsonify }},
      "author" : {
        "name" : "{{ site.author | strip_html }}"
      }
    }{% if forloop.last == false %},{% endif %}
    {% endfor %}
  ]
}

I approached this problem by reading the JSON Feed Version 1 spec and cribbing values from the template for my Atom feed. The trickiest part was filling in the "content_html" value. It took me a while to figure out that jsonify needed to be at the end of {{ post.content | expand_urls: site.url | jsonify }}. That translates the post’s HTML content into its JSON representation. You’ll notice that any template expression ending in jsonify also isn’t wrapped in quotes. This is because jsonify adds the quotes for me.
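
To make the quoting behavior concrete, here is a small hypothetical example (the title value is made up). A Liquid expression like

{{ 'My "favorite" post' | jsonify }}

renders as

"My \"favorite\" post"

The surrounding quotes and the escaping both come from jsonify, which is why adding quotes around these expressions in the template would produce invalid JSON.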

The {% if forloop.last == false %},{% endif %} is also important. Without this, the generated JSON has an extra , after the final element in items. This isn’t valid JSON.

I caught that by using the command line tool json. If you ever edit JSON by hand or generate it from a template then you should add this tool to your toolbox. It will prevent you from creating invalid JSON.
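
If you want to try it, the json tool is distributed through npm (this assumes you have Node.js installed):

npm install -g json
echo '{"valid": true}' | json --validate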

How did I use it? I’d make a change in the feed.json template and generate an output file. Then I’d cat that file to json --validate. When there was an error, I’d see a message like below.

0 [last: 5s] 12:43:47 ~/src/jakemcc/blog (master *)
$ cat public/feed.json | json --validate
json: error: input is not JSON: Expected ',' instead of '{' at line 25, column 5:
            {
        ....^
1 [last: 0s] 12:43:49 ~/src/jakemcc/blog (master *)
$

And there would be zero output on success.

0 [last: 5s] 12:45:25 ~/src/jakemcc/blog (master)
$ cat public/feed.json | json --validate
0 [last: 0s] 12:45:30 ~/src/jakemcc/blog (master)
$

It was pretty straightforward to add a JSON Feed. Was it a good use of my time? ¯\_(ツ)_/¯. In the process of adding the feed, I learned more about Liquid templating and figured out how to embed Liquid tags in a blog post. Even adding redundant features can be a useful exercise.

Using comm to verify file content matches

I recently found myself in a situation where I needed to confirm that a process took in a tab-separated file, did some processing, and then output a new file containing the original columns with some additional ones. The feature I was adding allowed the process to die and restart while processing the input file, picking up where it left off.

I needed to confirm the output had data for every line in the input. I reached for the command line tool comm.

Below is a made-up input file.

UNIQUE_ID    USER
1 38101838
2 19183819
3 19123811
4 10348018
5 19881911
6 29182918

And here is some made-up output.

UNIQUE_ID    USER    MESSAGE
1 38101838    A01
2 19183819    A05
3 19123811    A02
4 10348018    A01
5 19881911    A02
6 29182918    A05

With files this size, it would be easy enough to check visually. In my testing, I was dealing with files that had thousands of lines. That is too many to check by hand but a perfect amount for comm.

comm reads two files as input and outputs three columns. The first column contains lines found only in the first file, the second column contains lines found only in the second, and the last column contains lines found in both. If it is easier for you to think about it as set operations, the first two columns are similar to performing two set differences and the third is similar to a set intersection. Below is an example adapted from Wikipedia showing its behavior.

$ cat foo.txt
apple
banana
eggplant
$ cat bar.txt
apple
banana
banana
zucchini
$ comm foo.txt bar.txt
                  apple
                  banana
          banana
eggplant
          zucchini

So how is this useful? Well, you can also tell comm to suppress specific output columns. If we feed the shared columns (UNIQUE_ID and USER) from the input and output files to comm and suppress comm’s third column, then anything printed to the screen is a problem: it appears in one file but not the other. We’ll select the shared columns using cut and, since comm expects sorted input, sort them with sort. Let’s see what happens.

$ comm -3 <(cut -f 1,2 input.txt | sort) <(cut -f 1,2 output.txt | sort)
$

Success! Nothing was printed to the console, so there is nothing unique in either file.
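
For contrast, here is what a failure would look like. If output.txt were missing the row for UNIQUE_ID 4, that line would be unique to input.txt, and comm would print it in its first column (hypothetical output):

$ comm -3 <(cut -f 1,2 input.txt | sort) <(cut -f 1,2 output.txt | sort)
4 10348018
$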

comm is a useful tool to have in your command line toolbox.

Send a push notification when your external IP address changes

I need to know when my external IP address changes. Whenever it changes, I need to update an IP whitelist and log back in to a few sites. I sometimes don’t notice the change for a couple of days and, during that time, some automated processes fail.

After the last time this happened, I whipped up a script that sends me a push notification when my IP address changes.

The script uses Pushover to send the push notification. Pushover is great. I have used it for years to get notifications from my headless computers. If you use the below script, replace ${PUSHOVER_TOKEN} and ${PUSHOVER_USER} with your own details.

#!/bin/bash

set -e

previous_file="${HOME}/.previous-external-ip"

if [ ! -e "${previous_file}" ]; then
    dig +short myip.opendns.com @resolver1.opendns.com > "${previous_file}"
fi

current_ip=$(dig +short myip.opendns.com @resolver1.opendns.com)

previous_ip=$(cat "${previous_file}")

if [ "${current_ip}" != "${previous_ip}" ]; then
    echo "external ip changed"
    curl -s --form-string "token=${PUSHOVER_TOKEN}" \
         --form-string "user=${PUSHOVER_USER}" \
         --form-string "title=External IP address changed" \
         --form-string "message='${previous_ip}' => '${current_ip}'" \
         https://api.pushover.net/1/messages.json
fi

echo "${current_ip}" > "${previous_file}"

What are the most used Clojure libraries?

In a previous post, we used Google’s BigQuery and the public GitHub dataset to discover the most used Clojure testing library. The answer wasn’t surprising. The built-in clojure.test was by far the most used.

Let’s use the dataset to answer a less obvious question: what are the most used libraries in Clojure projects? We’ll measure this by counting references to libraries in project.clj and build.boot files.

Before we can answer that question, we’ll need to transform the data. First, we create the Clojure subset of the GitHub dataset. I did this by executing the following queries and saving the results to tables1.

-- Save the results of this query to the clojure.files table
SELECT
  *
FROM
  [bigquery-public-data:github_repos.files]
WHERE
  RIGHT(path, 4) = '.clj'
  OR RIGHT(path, 5) = '.cljc'
  OR RIGHT(path, 5) = '.cljs'
  OR RIGHT(path, 10) = 'build.boot'

-- Save the results to clojure.contents
SELECT *
FROM [bigquery-public-data:github_repos.contents]
WHERE id IN (SELECT id FROM clojure.files)

Next we extract the dependencies from build.boot and project.clj files. Fortunately for us, both of these files specify dependencies in the same format, so we’re able to use the same regular expression on both types.
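
For reference, here is what that shared format looks like in a made-up project.clj. The regular expression in the query below matches dependency vectors like these:

(defproject example-project "0.1.0"
  :description "A made-up project used only to illustrate the format"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [compojure "1.5.0"]
                 [cheshire "5.7.0"]])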

The query below identifies project.clj and build.boot files, splits each file into lines, and extracts referenced library names and versions using a regular expression. Additional filtering is done to get rid of some spurious results.

SELECT
  REGEXP_EXTRACT(line, r'\[+(\S+)\s+"\S+"]') AS library,
  REGEXP_EXTRACT(line, r'\[+\S+\s+"(\S+)"]') AS version,
  COUNT(*) AS count
FROM (
  SELECT
    SPLIT(content, '\n') AS line
  FROM
    [clojure.contents]
  WHERE
    id IN (
    SELECT
      id
    FROM
      [clojure.files]
    WHERE
      path LIKE '%project.clj'
      OR path LIKE '%build.boot')
      HAVING line contains '[')
GROUP BY
  library, version
HAVING library is not null and not library contains '"'
ORDER BY
  count DESC

The first five rows from the result are below. Let’s save the entire result to a clojure.libraries table.

| library             | version | count |
|---------------------+---------+-------|
| org.clojure/clojure | 1.6.0   | 7015  |
| org.clojure/clojure | 1.5.1   | 4251  |
| org.clojure/clojure | 1.7.0   | 4093  |
| org.clojure/clojure | 1.8.0   | 3016  |
| hiccup              | 1.0.5   | 1280  |

Now we can start answering all sorts of interesting questions.

What is the most referenced library put out under the org.clojure group?

SELECT library, sum(count) count
FROM clojure.libraries
WHERE library CONTAINS 'org.clojure'
GROUP BY library
ORDER BY count desc

| Row | library                        | count |
|-----+--------------------------------+-------|
|   1 | org.clojure/clojure            | 20834 |
|   2 | org.clojure/clojurescript      |  3080 |
|   3 | org.clojure/core.async         |  2612 |
|   4 | org.clojure/tools.logging      |  1579 |
|   5 | org.clojure/data.json          |  1546 |
|   6 | org.clojure/tools.nrepl        |  1244 |
|   7 | org.clojure/java.jdbc          |  1064 |
|   8 | org.clojure/tools.cli          |  1053 |
|   9 | org.clojure/tools.namespace    |   982 |
|  10 | org.clojure/test.check         |   603 |
|  11 | org.clojure/core.match         |   578 |
|  12 | org.clojure/math.numeric-tower |   503 |
|  13 | org.clojure/data.csv           |   381 |
|  14 | org.clojure/math.combinatorics |   372 |
|  15 | org.clojure/tools.reader       |   368 |
|  16 | org.clojure/clojure-contrib    |   335 |
|  17 | org.clojure/data.xml           |   289 |
|  18 | org.clojure/tools.trace        |   236 |
|  19 | org.clojure/java.classpath     |   199 |
|  20 | org.clojure/core.cache         |   179 |

Clojure and ClojureScript are at the top, which isn’t surprising. I’m surprised to see tools.nrepl in the next five results (rows 3-7); it is the only library in that group that I haven’t used.

What testing library is used the most? We already answered this in my last article but let’s see if we get the same answer when we’re counting how many times a library is pulled into a project.

SELECT library, sum(count) count
FROM [clojure.libraries]
WHERE library in ('midje', 'expectations', 'speclj', 'smidjen', 'fudje')
GROUP BY library
ORDER BY count desc

| Row | library                | count |
|-----+------------------------+-------|
|   1 | midje                  |  1122 |
|   2 | speclj                 |   336 |
|   3 | expectations           |   235 |
|   4 | smidjen                |     1 |

Those results are close to the previous results. Of the non-clojure.test libraries, midje still ends up on top.

What groups (as identified by the Maven groupId) have their libraries referenced the most? The top 12 are below, but the full result is available.

SELECT REGEXP_EXTRACT(library, r'(\S+)/\S+') AS group, sum(count) AS count
FROM [clojure.libraries]
GROUP BY group
HAVING group IS NOT null
ORDER BY count DESC

| Row | group                 | count |
|-----+-----------------------+-------|
|   1 | org.clojure           | 39611 |
|   2 | ring                  |  5817 |
|   3 | com.cemerick          |  2053 |
|   4 | com.taoensso          |  1605 |
|   5 | prismatic             |  1398 |
|   6 | org.slf4j             |  1209 |
|   7 | cljsjs                |   868 |
|   8 | javax.servlet         |   786 |
|   9 | com.stuartsierra      |   642 |
|  10 | com.badlogicgames.gdx |   586 |
|  11 | cider                 |   560 |
|  12 | pjstadig              |   536 |

And finally, the question that inspired this article, what is the most used library?

SELECT library, sum(count) count
FROM [clojure.libraries]
WHERE library != 'org.clojure/clojure'
GROUP BY library
ORDER BY count desc

| Row | library                     | count |
|-----+-----------------------------+-------|
|   1 | compojure                   |  3609 |
|   2 | lein-cljsbuild              |  3413 |
|   3 | org.clojure/clojurescript   |  3080 |
|   4 | org.clojure/core.async      |  2612 |
|   5 | lein-ring                   |  1809 |
|   6 | cheshire                    |  1802 |
|   7 | environ                     |  1763 |
|   8 | ring                        |  1678 |
|   9 | clj-http                    |  1648 |
|  10 | clj-time                    |  1613 |
|  11 | hiccup                      |  1591 |
|  12 | lein-figwheel               |  1582 |
|  13 | org.clojure/tools.logging   |  1579 |
|  14 | org.clojure/data.json       |  1546 |
|  15 | http-kit                    |  1423 |
|  16 | lein-environ                |  1325 |
|  17 | ring/ring-defaults          |  1302 |
|  18 | org.clojure/tools.nrepl     |  1244 |
|  19 | midje                       |  1122 |
|  20 | com.cemerick/piggieback     |  1096 |
|  21 | org.clojure/java.jdbc       |  1064 |
|  22 | org.clojure/tools.cli       |  1053 |
|  23 | enlive                      |  1001 |
|  24 | ring/ring-core              |   995 |
|  25 | org.clojure/tools.namespace |   982 |

Compojure takes the top slot. Full results are available.

Before doing this research I tried to predict what libraries I’d see in the top 10. I thought that clj-time and clj-http would be up there. I’m happy to see my guess was correct.

It was pretty pleasant using BigQuery to do this analysis. Queries took at most a few seconds to execute. This quick feedback let me play around in the web interface without feeling like I was waiting for computers to do work. It made the research into Clojure library usage painless and fun.


  1. I did this in early March 2017.

Which Clojure testing library is most used?

I’ve always assumed that the built-in clojure.test is the most widely used testing library in the Clojure community. Earlier this month I decided to test this assumption using Google’s BigQuery GitHub dataset.

The BigQuery GitHub dataset contains over three terabytes of source code from more than 2.8 million open source GitHub repositories. BigQuery lets us quickly query this data using SQL.

Below is a table with the results (gathered in early March 2017) of my investigation. Surprising no one, clojure.test comes out as the winner, and it wins by a lot.

| Library      | # Repos Using |
|--------------+---------------|
| clojure.test |         14304 |
| midje        |          1348 |
| expectations |           429 |
| speclj       |           207 |
| smidjen      |             1 |
| fudje        |             1 |

23,243 repositories were identified as containing Clojure (or ClojureScript) code. This means there were about 6,953 repositories (23,243 minus the 16,290 library usages counted above) that didn’t use any testing library1. This puts the “no tests or an obscure other way of testing” group in a pretty solid second place.

You should take these numbers as ballpark figures and not exact answers. I know from using GitHub’s search interface that there are three public projects using fudje2.

So, why don’t all three of those projects show up? The dataset only includes projects that Google could identify as open source, and the GitHub licenses API is used to do that3. Two of those three projects probably couldn’t be identified as having an appropriate license.

Another small problem is that since expectations is an actual word, it shows up outside of ns declarations. I ended up using a fairly simple query to generate this data and it only knows that expectations shows up somewhere in a file. I experimented with some more restrictive queries but they didn’t drastically change the result and I wasn’t sure they weren’t wrong in other ways. If you subtract a number between 100 and 150 you’ll probably have a more accurate expectations usage count.
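
For example, a more restrictive variant might require expectations to appear inside a require form rather than anywhere in a file. Below is a sketch of that idea; this exact query is an illustration of the approach, not one of the queries I ran.

SELECT COUNT(DISTINCT files.repo_name) AS repos
FROM clojure.contents AS contents
JOIN clojure.files AS files ON files.id = contents.id
WHERE REGEXP_MATCH(contents.content, r'\(:require[^)]*\bexpectations\b')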

Keep reading if you want to hear more about the steps to come up with the above numbers.

If you have other Clojure questions you think could be answered by querying this dataset, let me know in the comments or on Twitter. I have some more ideas, so I wouldn’t be surprised if at least one more article gets written.

The Details

The process was pretty straightforward. Most of my time was spent exploring the tables, figuring out what the columns represented and which queries worked well, and manually confirming some of the results. BigQuery is very fast. Very little of my time was spent waiting for results.

1. Set up the data

You get 1 TB of free BigQuery usage a month. You can blow through this in a single query. Google provides sample tables that contain less data but I wanted to operate on the full set of Clojure(Script) files, so my first step was to execute some queries to create tables that only contained Clojure data.

First, I queried the github_repos.files table for all the Clojure(Script) files and saved that to a clojure.files table.

SELECT
  *
FROM
  [bigquery-public-data:github_repos.files]
WHERE
  (RIGHT(path, 4) = '.clj'
    OR RIGHT(path, 5) = '.cljc'
    OR RIGHT(path, 5) = '.cljs')

The above query took only 9.2 seconds to run and processed 328 GB of data.

Using the clojure.files table, we can select the source for all the Clojure code from the github_repos.contents table. I saved this to a clojure.contents table.

SELECT *
FROM [bigquery-public-data:github_repos.contents]
WHERE id IN (SELECT id FROM clojure.files)

This query processed 1.84 TB of data in 21.5 seconds. So fast. In just under 30 seconds, I’ve blown through the free limit.

2. Identify what testing library (or libraries) a repo uses

We can guess that a file uses a testing library if it contains certain strings. The strings we’ll search for are the namespaces we’d expect to see required or used in a ns declaration. The query below does this for each file and then rolls up the results by repository. It took 3 seconds to run and processed 611 MB of data.

SELECT
  files.repo_name,
  MAX(uses_clojure_test) uses_clojure_test,
  MAX(uses_expectations) uses_expectations,
  MAX(uses_midje) uses_midje,
  MAX(uses_speclj) uses_speclj,
  MAX(uses_fudje) uses_fudje,
  MAX(uses_smidjen) uses_smidjen,
FROM (
  SELECT
    id,
    contents.content LIKE '%clojure.test%' uses_clojure_test,
    contents.content LIKE '%expectations%' uses_expectations,
    contents.content LIKE '%midje%' uses_midje,
    contents.content LIKE '%speclj%' uses_speclj,
    contents.content LIKE '%fudje%' uses_fudje,
    contents.content LIKE '%smidjen%' uses_smidjen,
  FROM
    clojure.contents AS contents) x
JOIN
  clojure.files files ON files.id = x.id
GROUP BY
  files.repo_name

Below is a screenshot of the first few rows in the result.

BigQuery results for test library usage by repo

3. Export the data

At this point, we could have continued the analysis using SQL and the BigQuery UI, but I opted to explore the data using Clojure and the REPL. There were too many rows to directly download the query results as a csv file, so I had to save the results as a table, export that to Google Cloud Storage, and download the file from there.

The first few rows of the file look like this:

files_repo_name,uses_clojure_test,uses_expectations,uses_midje,uses_speclj,uses_fudje,uses_smidjen
wangchunyang/clojure-liberator-examples,true,false,false,false,false,false
yantonov/rex,false,false,false,false,false,false

4. Calculate some numbers

The code takes the csv file and does some transformations. You could do this in Excel or in any language of your choice. My original code isn’t interesting enough to include, but a rough sketch of the idea follows.
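
Here is a minimal sketch (not my original code) of one way to compute the counts in Clojure. It assumes the export is saved as results.csv with the header shown above.

(require '[clojure.string :as str])

;; Count how many repos use each testing library, given the csv
;; exported from BigQuery.
(defn usage-counts [csv-file]
  (let [[header & rows] (map #(str/split % #",")
                             (str/split-lines (slurp csv-file)))]
    (into {}
          (map-indexed
           (fn [i library]
             ;; Column i + 1 holds "true"/"false" for this library.
             [library (count (filter #(= "true" (nth % (inc i))) rows))])
           (rest header)))))

(usage-counts "results.csv")
;; => {"uses_clojure_test" 14304, "uses_midje" 1348, ...}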

BigQuery thoughts

This was my first time using Google’s BigQuery. This wasn’t the most difficult analysis to do, but I was impressed by BigQuery’s speed and ease of use. The web UI, which I used exclusively for this, is neither really great nor extremely terrible. It mostly just worked, and I rarely had to look up documentation.

I don’t feel comfortable making a judgment call on whether the cost is expensive, but this article cost a bit less than seven dollars to write. That doesn’t seem too outrageous to me.

Based on my limited usage of BigQuery, it is something I’d look into further if I needed its capabilities.


  1. Probably higher, as projects can and do use more than one testing library.

  2. And those projects are jumarko/clojure-random, dpassen1/great-sort, and jimpil/fudje.

  3. Source: a Google Developer Advocate’s response on an old HN post.

Using lein-test-refresh with expectations

The 2.2.0 release1 of expectations adds a clojure.test-compatible syntax. The release adds the defexpect macro, which forces you to name your test but then generates code that is compatible with clojure.test.

Why would you want this? Because clojure.test is the built-in testing library for Clojure, an entire ecosystem has been built around it. Tool support for clojure.test is always going to be ahead of support for the original expectations. By using the new clojure.test-compatible syntax, expectations can take advantage of all the tools built for clojure.test.

Using lein-test-refresh with expectations

If you move to the new clojure.test-compatible syntax, you can start using lein-test-refresh to automatically rerun your tests when your code changes. lein-test-refresh is a fork of the original expectations autorunner, lein-autoexpect, but it has grown to have more features than its original inspiration. Now you can use it with expectations2.

Below is a sample project.clj that uses lein-test-refresh with the latest expectations.

(defproject expectations-project "0.1.0-SNAPSHOT"
  :description "Sample project using expectations"
  :dependencies [[org.clojure/clojure "1.8.0"]]
  :plugins [[com.jakemccrary/lein-test-refresh  "0.18.1"]]
  :profiles {:dev {:dependencies [[expectations "2.2.0-beta1"]]}})

Here is an example test file.

(ns expectations-project.core-test
  (:require [expectations :refer :all]
            [expectations.clojure.test :refer [defexpect]]))

(defexpect two
  2 (+ 1 1))

(defexpect three
  3 (+ 1 1))

(defexpect group
  (expect [1 2] (conj [] 1 5))
  (expect #{1 2} (conj #{} 1 2))
  (expect {1 2} (assoc {} 1 3)))

And here is the result of running lein test-refresh.

$ lein test-refresh
*********************************************
*************** Running tests ***************
:reloading (expectations-project.core-test)

FAIL in (group) (expectations_project/core_test.clj:11)
expected: [1 2]
  actual: [1 5] from (conj [] 1 5)

FAIL in (group) (expectations_project/core_test.clj:11)
expected: {1 2}
  actual: {1 3} from (assoc {} 1 3)

FAIL in (three) (expectations_project/core_test.clj:8)
expected: 3
  actual: 2 from (+ 1 1)

Ran 3 tests containing 5 assertions.
3 failures, 0 errors.

Failed 3 of 5 assertions
Finished at 11:53:06.281 (run time: 0.270s)

After some quick edits to fix the test errors and saving the file, here is the output from the tests re-running.

*********************************************
*************** Running tests ***************
:reloading (expectations-project.core-test)

Ran 3 tests containing 5 assertions.
0 failures, 0 errors.
:reloading ()

Ran 3 tests containing 5 assertions.
0 failures, 0 errors.

Passed all tests
Finished at 11:53:59.045 (run time: 0.013s)

If you’re using expectations and switch to the new clojure.test-compatible syntax, I’d encourage you to start using lein-test-refresh.


  1. As of 2016-02-27 2.2.0 isn’t out yet, but 2.2.0-beta1 has been released and has the changes.

  2. In fact, you have to use it if you use Leiningen and the new syntax and want your tests to run automatically.

Reading in 2016

The time has come for another end-of-year summary of my reading from the previous year. Here are links to my previous end-of-year reflections: 2013, 2014, 2015.

I’ve continued to keep track of my reading using Goodreads. My profile contains the full list and reviews of books I’ve read since 2010. Here is my full 2016 list.

2016 Goal

My 2016 goal was to read one or two biographies. It is a genre that I typically don’t read, and I felt like branching out. I’m going to consider this goal achieved. I read Open (my review) by Andre Agassi (well, ghostwritten for him) and Medium Raw by Anthony Bourdain. Both have been tagged as memoirs, so in the strictest sense maybe I shouldn’t count them towards my goal, but I’m counting them. Open tells Andre Agassi’s life story and claims to have been fact-checked, so it is pretty close to a biography.

Of the two, I drastically preferred Open. I would not recommend Medium Raw unless you know you like Bourdain’s writing and the book description sounds interesting to you. Open has a broader appeal and tells an interesting story of a man’s conflict, struggle and success.

2016 Numbers

I read 59 books in 2016 for a total of 22,397 pages. I also read every issue of Amazon’s Day One weekly periodical (as I have every year since Day One started being published). Overall, my rating distribution is pretty similar to 2015’s, with my three-, four-, and five-star categories each containing two or three more books than the previous year.

Recommendations

I awarded twelve books a five-star rating. I’ve listed them below in no particular order. The title links are affiliate links to Amazon, and the review links go to my reviews on Goodreads.

While I enjoyed all of the above books, a few stand out.

Chasing the Scream by Johann Hari

This book is about the war on drugs. It brings together studies, history, and anecdotes to present a compelling read on addiction and drug policy. I think it would be hard to finish this book and not be persuaded to vote for policies of drug decriminalization or legalization and more humane addiction treatments. The Goodreads reviews of this book are generally extremely positive, with 61% of them being five stars. Go read the book’s summary and a few reviews and you’ll want to read this book.

Deep Work by Cal Newport

This book is a game changer for those of us who have hobbies or work in fields where distraction-free concentration is beneficial or required. Cal Newport makes the argument that the ability to perform deep work is critical for mastering complicated information and producing great results. The book is split into two parts. The first defines and makes the argument for deep work. The second prescribes rules for enabling yourself to perform more deep work. I highly recommend reading this book and implementing the recommendations2.

A Little Life by Hanya Yanagihara

This is a depressing and challenging book. As this review says, this book is about a terribly broken character, Jude, and his struggles. The writing is excellent. If you feel up to reading a long, difficult book that exposes you to some terrible experiences, then read this book. I’m having a hard time saying this was my favorite fiction book of last year, but it is the only five-star fiction book I’m specifically calling out.

Moral Tribes by Joshua Greene

I didn’t give this book five stars, but I’m still going to recommend Moral Tribes by Joshua Greene. This book explains how conflict arises when two moral groups meet and interact (an “Us” vs “Them” situation) and proposes utilitarianism as a solution to this problem. While some parts of the book were a chore to finish, I’m glad I read it. My review highlights more of what I found interesting in this book.

There is a section of the book that presents both sides of the abortion debate, which I’ve shown to friends on both sides of the debate. All of them seemed to enjoy reading this small section of the book.

Series read this year

I finish almost every book I start reading. I read quickly enough that I find it worthwhile to finish marginal books just in case they turn out to be good. With a book series, the end of each book provides a checkpoint for me to reevaluate whether I want to continue the series. While none of the below series earned five stars, I’m highlighting them because each one represents my choice to keep returning to an author’s world.

I continued The Expanse series this year and read books 4-6. This is just great space opera and I’ll continue reading it until it stops being written.

I started and finished Don Winslow’s The Power of the Dog series. It tells the fictionalized tale of drug cartels in Mexico. Parts are brutal and violent and unfortunately often based on real events.

I read the three main books of Ann Leckie’s Imperial Radch. It is a neat science fiction story that deals with interesting topics. As a warning, I found the first book in the series to be the weakest. Keep pushing and read the second before giving up on this story.

Scott Meyer’s Magic 2.0 was a fun read in which the main character realizes he can edit a file and change reality. It is a silly premise and it delivers good, funny, lighthearted reading.

I also read Neil Gaiman’s American Gods and Anansi Boys. I devoured both of these books.

Technical books read

I didn’t read many books on software in 2016. In fact, I read the lowest number of software-related books since I started keeping track. I finished Ben Rady’s Serverless Single Page Apps and Ron Jeffries' The Nature of Software Development. I also read most of Google’s Site Reliability Engineering but did not finish it before the end of 2016. I read more non-fiction than in 2015, but I would have liked more software books in the mix.

More stats

There were a couple of months where my reading took a definite dip. I don’t remember what I was doing during April or September, but apparently I wasn’t reading.

Chart of reading per month

Unsurprisingly, ebooks continued to be my preferred format, and I expect that trend to continue. At this point, the only reason I’ll even potentially look at this stat next year is that it forces me to go through and confirm each book had its format recorded correctly.

|           | 2014 | 2015 | 2016 |
|-----------+------+------+------|
| ebook     |   64 |   47 |   56 |
| hardcover |    1 |    1 |    0 |
| paperback |    4 |    3 |    3 |

My average rating was about the same as 2015.

| Year | Average Rating |
|------+----------------|
| 2011 |           3.84 |
| 2012 |           3.66 |
| 2013 |           3.67 |
| 2014 |           3.48 |
| 2015 |           3.86 |
| 2016 |           3.83 |

I read multiple books by quite a few authors. Below is a table summarizing my repeat authors.

| Author           | Average Rating | Number of Books | Number of Pages |
|------------------+----------------+-----------------+-----------------|
| Ann Leckie       |          3.667 |               3 |            1205 |
| James S.A. Corey |          3.667 |               3 |            1673 |
| Scott Meyer      |              4 |               3 |            1249 |
| Don Winslow      |              4 |               2 |            1182 |
| Junot Díaz       |              4 |               2 |             565 |
| Michael Crichton |            3.5 |               2 |             814 |
| Neil Gaiman      |              4 |               2 |             981 |

2017 goals

This year I’m planning on revisiting some of my favorite books. I’m not going to set a concrete number for this goal and will just have to trust myself to honestly judge whether I accomplish it.


  1. This review is currently blank on Goodreads because this was a book read for a book club I’m part of and we haven’t met to discuss the book yet. We typically don’t post reviews online of books we’ll be discussing. To any of my book club members that are reading this post, I apologize for spoiling the surprise of what star rating I’m planning on giving this book.

  2. I’m considering writing up more about this book and some changes I’ve made to help myself do more deep work.

Making code fast: Measure what you intend to measure

I’ve spent a significant portion of my career figuring out how to make software run faster. It is a problem I enjoy solving. One of the most important steps in an optimization task is to identify what you are trying to optimize and how you will measure it. Answer these questions wrong and you’ll waste your time solving the wrong problem.

Recently I joined a teammate on a task that involved identifying a bottleneck in a Clojure code base. We knew the code path we needed to optimize and turned to the Tufte library to take timing measurements. This was my first time using Tufte and, with my tiny amount of usage, I like what I see.

At some point in the process, we had code1 that looked similar to the translate function below.

(ns bench.core
  (:require [clojure.string :as string]
            [taoensso.tufte :as tufte]))

(defn raw->maps [lines]
  (map (fn [line]
         (zipmap [:a :b :c]
                 (map (fn [s] (Long/parseLong s))
                      (string/split line #"\|"))))
       lines))

(defn summarize [maps]
  (reduce (fn [r m]
            (-> r
                (update :a (fnil + 0) (:a m))
                (update :b (fnil + 0) (:b m))
                (update :c (fnil + 0) (:c m))))
          maps))

(defn translate [lines]
  (tufte/profile {}
                 (let [maps (tufte/p ::raw->maps (raw->maps lines))
                       summary (tufte/p ::summarize (summarize maps))]
                   summary)))

Here is some Tufte output from running some data through translate.

              pId      nCalls       Min        Max       MAD      Mean   Time% Time
:bench.core/summarize           1   346.0ms    346.0ms       0ns   346.0ms     100 346.0ms
:bench.core/raw->maps           1    2.46µs     2.46µs       0ns    2.46µs       0 2.46µs
       Clock Time                                                          100 346.05ms
   Accounted Time                                                          100 346.0ms

Notice anything surprising with the output?2

It surprised me that raw->maps took such a tiny amount of time compared to the summarize function. Then I realized that we had forgotten about Clojure’s lazy sequences. summarize is taking so much of the time because raw->maps is just creating a lazy sequence; all the work of realizing that sequence happens in summarize. By wrapping the call to raw->maps with a doall we were able to get the time measurements we intended.
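
Here is the fixed translate. The only change is the doall, which forces the lazy sequence to be realized inside the first tufte/p block so the work is attributed correctly:

(defn translate [lines]
  (tufte/profile {}
                 (let [maps (tufte/p ::raw->maps (doall (raw->maps lines)))
                       summary (tufte/p ::summarize (summarize maps))]
                   summary)))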

This example demonstrates an important lesson. When you are profiling code, make sure you are measuring what you think you are measuring. This can be challenging in languages, such as Clojure, that have a concept of laziness. Reflect on your measurement results and perform a gut check that the results make sense with what you intended to measure. If anything feels off, confirm that you’re measuring what you meant to measure.


  1. Example built using clojure 1.8.0 and tufte 1.1.1. Also, sorry for the terrible names of functions. I was drawing a blank when coming up with this example.

  2. Imagine this output having 10 more lines in it. Now imagine it having 20. It starts getting quite a bit more difficult to notice oddities as more and more lines get added to this output. Try not to overwhelm yourself by having too much output.