Last time I needed to speed up some code, I wrote a Clojure macro that recorded the aggregate time spent executing the code wrapped by the macro. Aggregate timings were useful since the same functions were called multiple times in the code path we were trying to optimize. Seeing total times made it easier to identify where we should spend our time.
Below is the namespace I temporarily introduced into our codebase.
(ns metrics)

(defn msec-str
  "Returns a human readable version of milliseconds based upon scale"
  [msecs]
  (let [s 1000
        m (* 60 s)
        h (* 60 m)]
    (condp >= msecs
      1 (format "%.5f msecs" (float msecs))
      s (format "%.1f msecs" (float msecs))
      m (format "%.1f seconds" (float (/ msecs s)))
      h (format "%02dm:%02ds" (int (/ msecs m)) (mod (int (/ msecs s)) 60))
      (format "%dh:%02dm" (int (/ msecs h)) (mod (int (/ msecs m)) 60)))))

(def aggregates (atom {}))

(defmacro record-aggregate
  "Records the total time spent executing body across invocations."
  [label & body]
  `(do
     (when-not (contains? @aggregates ~label)
       (swap! aggregates assoc ~label {:order (inc (count @aggregates))}))
     (let [start-time# (System/nanoTime)
           result# (do ~@body)
           result# (if (and (seq? result#)
                            (instance? clojure.lang.IPending result#)
                            (not (realized? result#)))
                     (doall result#)
                     result#)
           end-time# (System/nanoTime)]
       (swap! aggregates update-in [~label :msecs]
              (fnil + 0)
              (/ (double (- end-time# start-time#)) 1000000.0))
       result#)))

(defn log-times
  "Logs time recorded by record-aggregate and resets the aggregate times."
  []
  (doseq [[label data] (sort-by (comp :order second) @aggregates)
          :let [msecs (:msecs data)]]
    (println "Executing" label "took:" (msec-str msecs)))
  (reset! aggregates {}))
record-aggregate takes a label and code and times how long that code takes to run. If the executed code returns an unrealized lazy sequence, it also evaluates the sequence1.
Below is an example of using the above code. When we used it, we looked at the code path we needed to optimize and wrapped chunks of it in record-aggregate. At the end of the calculations, we inserted a call to log-times so timing data would show up in our logs.
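Here is a sketch of that kind of usage; fetch-rows and enrich-row are hypothetical stand-ins for the real functions in our code path:

(ns app.pipeline
  (:require [metrics :refer [record-aggregate log-times]]))

;; Hypothetical slow functions standing in for the real work.
(defn fetch-rows [id]
  (vec (range (* id 100))))

(defn enrich-row [row]
  (* row row))

(defn process-all [ids]
  (doseq [id ids]
    ;; The same labels are reused on every iteration, so the recorded
    ;; times accumulate across invocations.
    (let [rows (record-aggregate "fetching rows" (fetch-rows id))]
      (record-aggregate "enriching rows" (mapv enrich-row rows))))
  ;; Prints lines like: Executing fetching rows took: 12.3 msecs
  (log-times))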
Using this technique, we were able to identify slow parts of our process and were able to optimize those chunks of our code. There are potential flaws with measuring time like this, but they were not a problem in our situation2.
Nearly three years ago I wrote an overview of my Leiningen profiles.clj. That post is one of my most visited articles, so I thought I’d give an update on what I currently keep in ~/.lein/profiles.clj.
The biggest difference between my profiles.clj from early 2015 and now is that I’ve removed all of the CIDER related plugins. I still use CIDER, but CIDER no longer requires you to list its dependencies explicitly.
I’ve also removed Eastwood and Kibit from my toolchain. I love static analysis, but these tools fail too frequently with my projects. As a result, I rarely used them and I’ve removed them. Instead, I’ve started using joker for some basic static analysis and am really enjoying it. It is fast, and it has made refactoring in Emacs noticeably better.
I’m also taking advantage of some new features that lein-test-refresh provides. These settings enable the most reliable, fastest feedback possible while writing tests. My recommended testing setup article goes into more details.
lein-ancient and lein-pprint have stuck around. I rarely use lein-pprint but it comes in handy when debugging project.clj problems. lein-ancient is great for helping you keep your project’s dependencies up to date. I use a forked version that contains some changes I need to work with my company’s private repository.
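Put together, my profiles.clj has roughly the shape sketched below. The plugin versions are illustrative, and (as noted in the footnote) the secrets are omitted:

;; A sketch, not the literal file; versions are whatever was current.
{:user {:plugins [[com.jakemccrary/lein-test-refresh "0.21.1"]
                  [lein-ancient "0.6.14"]
                  [lein-pprint "1.1.2"]]
        ;; lein-test-refresh options for quick, focused feedback.
        :test-refresh {:quiet true
                       :changes-only true}}}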
Some of you might wonder why I don't just link to this file in version control somewhere. Well, it is kept encrypted in a git repository because it also contains some secrets that should not be public; I've removed those for this post.
From May 6th to June 2nd, the screen of my phone was cracked. I have an Android phone, and the crack ran through the software buttons at the bottom of the screen. As a result, I could not touch the back, home, or overview (app switching) buttons. For nearly a month I never saw my home screen and couldn't go back or switch apps by touching my phone. I was very reliant on arriving notifications giving me an opportunity to open apps.
It took me some time, but I realized I could use voice commands to replace some of the missing functionality. Using voice commands, I could open apps and no longer be at the whim of notifications.
Here is an example of my phone usage during this month. My thoughts are in [brackets]. Italics indicate actions. Talking is wrapped in “ ”.
[Alright, I want to open Instagram] “Ok Google, open Instagram.”
[Sweet, it worked] scrolls through feed
WhatsApp notification happens [Great, a notification, I can click it to open WhatsApp]
I read messages in WhatsApp.
[Time to go back to Instagram] “Ok Google, open Instagram”
Instagram opens [Great, time to scroll through more pictures]
As you can see, it is a bit more painful than clicking buttons to switch between different apps. Voice commands fail sometimes and, at least for me, generally take more effort than tapping the screen. That’s ok though; I was determined to embrace voice commands and experience what a future of only voice commands might feel like.
Below are some observations from using my voice to control my phone for a month.
It is awkward in public
My phone usage in public went way down. There was something about having to talk to your phone to open an app that made me not want to pull out my phone.
It is much more obvious you are using your phone when you use your voice to control it. It makes casual glances at your phone while hanging out with a group impossible. You can’t sneak a quick look at Instagram when you need to say “Ok Google, open Instagram” without completely letting everyone around you know you are no longer paying attention.
This also stopped me from using my phone in Ubers/Lyfts/cabs. I often talk to the driver or other passengers anyway, but this cemented that. I realize it is completely normal to ignore the other people in a car but I felt like a (small) asshole audibly calling out that I’m ignoring other people in the car.
You become more conscious of what apps you use
When you have to say “Okay Google, open Instagram” every time you want to open Instagram, you become way more aware of how often you use Instagram. Using your voice instead of tapping a button on your screen is a much bigger hurdle between having the urge to open something and actually opening it. It gives you more time to observe what you are doing.
You become more conscious of using your phone
Using your phone becomes a lot harder. This increased difficulty helped highlight when I was using my phone. My phone’s functionality dropped drastically and, as a result, I stopped reaching for it as much.
This reminded me of when I used a dumb (feature) phone for a couple of months a few years ago. Using a non-smartphone after using a smartphone for years was weird. It helped me rein in my usage1.
Voice control can be pretty convenient
Even after repairing my screen, I still find myself using some voice commands. While making my morning coffee, I often ask my phone for the weather forecast. This is more convenient than opening an app and it lets me continue to use both hands while making coffee.
Setting alarms, starting countdown timers, adding reminders, and checking the weather are all things I do through voice commands now.
I wish it worked all the time
I suppose this is an argument for getting a Google Home or Amazon Echo. I have to wake up my phone to use voice commands with it. This limits the usefulness of voice commands since I need to be within reach of my phone.
I wish it could do more
At some point, I got used to asking my phone to do things. Then I started giving it more complicated commands, and it would fail. I found myself giving it multi-stage commands such as “Ok Google, turn on Bluetooth and play my playlist Chill on Spotify.” That doesn’t work but it would be amazing if it did.
Recommendations
I recommend that you force yourself to use voice commands for some period of time. Pretend your home button is broken and you have to use voice control to move around your phone. You’ll become more aware of your phone usage and you’ll learn some useful voice commands that will make your technology usage nicer.
My non-smartphone experiment four years ago is what resulted in me no longer using Facebook or Twitter on my phone. It is also the reason I silenced most notifications, including email, on my phone.
Earlier this month I took another look at what was required for reading an article on this site. What else could I do to make this site load faster?
To do this, I loaded up WebPageTest and pointed it towards one of my posts. To my shock, it took 113 requests for a total of 721 KB to load a single post. This took WebPageTest 6.491 seconds. The document complete event triggered after 15 requests (103 KB, 1.6 seconds).
113 requests to load a static article was ridiculous. Most of those requests happened as a result of loading the Disqus javascript. I find comments valuable and want to continue including them on my site. Because of this, I couldn’t remove Disqus. Instead, I made loading Disqus optional.
After making the required changes, it only takes 11 requests for 61 KB of data to fully load the test post. The document complete event only required 8 requests for 51 KB of data. Optionally loading the Disqus javascript resulted in a massive reduction of data transferred.
How did I do it? The template that generates my articles now only inserts the Disqus javascript when a reader clicks a button. My final template is at the bottom of this post.
The template adds an insertDisqus function that inserts a <script> element when a reader clicks a button. This element contains the original JavaScript that loads Disqus. When the <script> element is inserted into the page, the Disqus javascript is loaded and the comments appear.
My exact template might not work for you, but I’d encourage you to think about optionally loading Disqus and other non-required JavaScript. Your readers will thank you.
{% if site.disqus_short_name and page.comments == true %}
<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<div id="disqus_target">
  <script>
    var insertDisqus = function() {
      var elem = document.createElement('script');
      elem.innerHTML = "var disqus_shortname = '{{ site.disqus_short_name }}'; var disqus_identifier = '{{ site.url }}{{ page.url }}'; var disqus_url = '{{ site.url }}{{ page.url }}'; (function () {var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);}());";
      var target = document.getElementById('disqus_target');
      target.parentNode.replaceChild(elem, target);
    }
  </script>
  <button class="comment-button" onclick="insertDisqus()"><span>ENABLE COMMENTS AND RECOMMENDED ARTICLES</span></button>
</div>
{% endif %}
I went to a coffee shop this last weekend with the intention of writing up a quick article on comm. I sat down, sipping my coffee, and wasn’t motivated. I didn’t feel like knocking out a short post, and I didn’t feel like editing a draft I’ve been sitting on for a while. I wanted to do some work though, so I decided to add a JSON Feed to this site.
JSON Feed is an alternative to Atom and RSS that uses JSON instead of XML. I figured I could add support for it in less than the time it would take to enjoy my coffee and maybe some readers would find it useful. I’d be shocked if anyone actually finds this useful, but it was a fun little exercise anyway.
An old version of Octopress (2.something), which uses an old version of Jekyll (2.5.3), generates this site. Despite this, I don’t think the template would need to change much if I moved to a new version. The template below is saved as source/feed.json in my git repository.
---
layout: null
---
{
  "version": "https://jsonfeed.org/version/1",
  "title": {{ site.title | jsonify }},
  "home_page_url": "{{ site.url }}",
  "feed_url": "{{ site.url }}/feed.json",
  "favicon": "{{ site.url }}/favicon.png",
  "author": {
    "url": "https://twitter.com/jakemcc",
    "name": "{{ site.author | strip_html }}"
  },
  "user_comment": "This feed allows you to read the posts from this site in any feed reader that supports the JSON Feed format. To add this feed to your reader, copy the following URL - {{ site.url }}/feed.json - and add it to your reader.",
  "items": [
    {% for post in site.posts limit: 20 %}
    {
      "id": "{{ site.url }}{{ post.id }}",
      "url": "{{ site.url }}{{ post.url }}",
      "date_published": "{{ post.date | date_to_xmlschema }}",
      "title": {% if site.titlecase %}{{ post.title | titlecase | jsonify }}{% else %}{{ post.title | jsonify }}{% endif %},
      {% if post.description %}"summary": {{ post.description | jsonify }},{% endif %}
      "content_html": {{ post.content | expand_urls: site.url | jsonify }},
      "author": {
        "name": "{{ site.author | strip_html }}"
      }
    }{% if forloop.last == false %},{% endif %}
    {% endfor %}
  ]
}
I approached this problem by reading the JSON Feed Version 1 spec and cribbing values from the template for my Atom feed. The trickiest part was filling in the "content_html" value. It took me a while to figure out that jsonify needed to be at the end of {{ post.content | expand_urls: site.url | jsonify }}. That filter chain translates the post's HTML content into its JSON representation. You'll notice that any template expression ending in jsonify also isn't wrapped in quotes. This is because jsonify adds the quotes for me.
The {% if forloop.last == false %},{% endif %} is also important. Without this, the generated JSON has an extra , after the final element in items. This isn’t valid JSON.
I caught that by using the command line tool json. If you ever edit JSON by hand or generate it from a template then you should add this tool to your toolbox. It will prevent you from creating invalid JSON.
How did I use it? I’d make a change in the feed.json template and generate an output file. Then I’d cat that file to json --validate. When there was an error, I’d see a message like below.
0 [last: 5s] 12:43:47 ~/src/jakemcc/blog (master *)
$ cat public/feed.json | json --validate
json: error: input is not JSON: Expected ',' instead of '{' at line 25, column 5:
        {
    ....^
1 [last: 0s] 12:43:49 ~/src/jakemcc/blog (master *)
$
It was pretty straightforward to add a JSON Feed. Was it a good use of my time? ¯\_(ツ)_/¯. In the process of adding the feed I learned more about Liquid templating and figured out how to embed liquid tags into a blog post. Even adding redundant features can be a useful exercise.
I recently found myself in a situation where I needed to confirm that a process took in a tab-separated file, did some processing, and then output a new file containing the original columns with some additional ones. The feature I was adding allowed the process to die and restart while processing the input file, picking up where it left off.
I needed to confirm the output had data for every line in the input. I reached for the command line tool comm.
With small files, it would be easy enough to check visually. In my testing, though, I was dealing with files that had thousands of lines. This is too many to check by hand. It is a perfect amount for comm.
comm reads two files as input and then outputs three columns. The first column contains lines found only in the first file, the second column contains lines only found in the second, and the last column contains lines in both. If it is easier for you to think about it as set operations, the first two columns are similar to performing two set differences and the third is similar to set intersection. Below is an example adapted from Wikipedia showing its behavior.
$ cat foo.txt
apple
banana
eggplant
$ cat bar.txt
apple
banana
banana
zucchini
$ comm foo.txt bar.txt
                apple
                banana
        banana
eggplant
        zucchini
So how is this useful? Well, you can also tell comm to suppress outputting specific columns. If we send the common columns from the input and output file to comm and suppress comm’s third column then anything printed to the screen is a problem. Anything printed to the screen was found in one of the files and not the other. We’ll select the common columns using cut and, since comm expects input to be sorted, then sort using sort. Let’s see what happens.
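A sketch of that pipeline; the file names and the key columns selected by cut are illustrative:

$ comm -3 <(cut -f 1,2 input.tsv | sort) <(cut -f 1,2 output.tsv | sort)

comm -3 suppresses the third column, so the only lines printed are those found in exactly one of the two files. No output means every input line made it to the output; any output points at a line the process dropped (or invented) across a restart.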
I need to know when my external IP address changes. Whenever it changes, I need to update an IP whitelist and need to re-login to a few sites. I sometimes don’t notice for a couple of days and, during that time, some automatic processes fail.
After the last time this happened, I whipped up a script that sends me a push notification when my IP address changes.
The script uses Pushover to send the push notification. Pushover is great. I have used it for years to get notifications from my headless computers. If you use the below script, replace ${PUSHOVER_TOKEN} and ${PUSHOVER_USER} with your own details.
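The shape of the script is sketched below. This is the idea rather than my exact script; the IP lookup service and the location of the state file are assumptions.

#!/bin/bash
# Compare the current external IP to the last one seen and send a
# Pushover notification when it changes.

LAST_IP_FILE="$HOME/.last-external-ip"

current_ip=$(curl --silent https://icanhazip.com)
last_ip=$(cat "$LAST_IP_FILE" 2>/dev/null)

if [ -n "$current_ip" ] && [ "$current_ip" != "$last_ip" ]; then
  curl --silent \
       --form-string "token=${PUSHOVER_TOKEN}" \
       --form-string "user=${PUSHOVER_USER}" \
       --form-string "message=External IP changed to ${current_ip}" \
       https://api.pushover.net/1/messages.json > /dev/null
  echo "$current_ip" > "$LAST_IP_FILE"
fi

Run on a schedule (from cron, for example), a script shaped like this turns a silent IP change into a push notification within minutes.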
In a previous post, we used Google's BigQuery and the public GitHub dataset to discover the most used Clojure testing library. The answer wasn't surprising. The built-in clojure.test was by far the most used.
Let's use the dataset to answer a less obvious question: what are the most used libraries in Clojure projects? We'll measure this by counting references to libraries in project.clj and build.boot files.
Before we can answer that question, we'll need to transform the data. First, we create the Clojure subset of the GitHub dataset. I did this by executing the following queries and saving the results to tables1.
-- Save the results of this query to the clojure.files table
SELECT *
FROM [bigquery-public-data:github_repos.files]
WHERE RIGHT(path, 4) = '.clj'
   OR RIGHT(path, 5) = '.cljc'
   OR RIGHT(path, 5) = '.cljs'
   OR RIGHT(path, 10) = 'build.boot'

-- Save the results to clojure.contents
SELECT *
FROM [bigquery-public-data:github_repos.contents]
WHERE id IN (SELECT id FROM clojure.files)
Next we extract the dependencies from build.boot and project.clj files. Fortunately for us, both of these files specify dependencies in the same format, so we're able to use the same regular expression on both types.
The query below identifies project.clj and build.boot files, splits each file into lines, and extracts referenced library names and versions using a regular expression. Additional filtering is done to get rid of some spurious results.
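Here is a sketch of that query in BigQuery's legacy SQL; the regular expression and the filtering are approximations of what I actually ran:

-- Split dependency files into lines and pull out [library "version"] pairs.
SELECT library, COUNT(*) AS times_referenced
FROM (
  SELECT REGEXP_EXTRACT(line, r'\[+([^\s\]]+)\s+"[^"]+"') AS library
  FROM (
    SELECT SPLIT(c.content, '\n') AS line
    FROM [clojure.contents] AS c
    JOIN [clojure.files] AS f ON c.id = f.id
    WHERE RIGHT(f.path, 11) = 'project.clj'
       OR RIGHT(f.path, 10) = 'build.boot'
  )
)
WHERE library IS NOT NULL
GROUP BY library
ORDER BY times_referenced DESC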
Clojure and ClojureScript are at the top, which isn't surprising. I'm surprised to see tools.nrepl in the next five results (rows 3-7). It is the only library near the top that I haven't used.
What testing library is used the most? We already answered this in my last article, but let's see if we get the same answer when we're counting how many times a library is pulled into a project.
Before doing this research I tried to predict what libraries I'd see in the top 10. I thought that clj-time and clj-http would be up there. I'm happy to see my guess was correct.
It was pretty pleasant using BigQuery to do this analysis. Queries took at most a few seconds to execute. This quick feedback let me play around in the web interface without feeling like I was waiting for computers to do work. It made the research into Clojure library usage painless and fun.
I've always assumed that the built-in clojure.test is the most widely used testing library in the Clojure community. Earlier this month I decided to test this assumption using Google's BigQuery GitHub dataset.
The BigQuery GitHub dataset contains over three terabytes of source code from more than 2.8 million open source GitHub repositories. BigQuery lets us quickly query this data using SQL.
Below is a table with the results (done in early March 2017) of my investigation. Surprising no one, clojure.test comes out as the winner, and it is a winner by a lot.
23,243 repositories were identified as containing Clojure (or ClojureScript) code. This means there were about 6,953 repositories that didn't use any testing library1. This puts the "no tests or an obscure other way of testing" category in a pretty solid second place.
You should take these numbers as ballpark figures and not exact answers. I know from using GitHub's search interface that there are three public projects using fudje2.
So, why don't all three of those projects show up? The dataset only includes projects where Google could identify the project as open source, and the GitHub licenses API is used to do that3. Two of those three projects were probably unable to be identified as something with an appropriate license.
Another small problem is that since expectations is an actual word, it shows up outside of ns declarations. I ended up using a fairly simple query to generate this data, and it only knows that expectations shows up somewhere in a file. I experimented with some more restrictive queries, but they didn't drastically change the result, and I wasn't sure they weren't wrong in other ways. If you subtract a number between 100 and 150, you'll probably have a more accurate expectations usage count.
Keep reading if you want to hear more about the steps I took to come up with the above numbers.
If you have other Clojure questions you think could be answered by querying this dataset, let me know in the comments or on twitter. I have some more ideas, so I wouldn't be surprised if at least one more article gets written.
The Details
The process was pretty straightforward. Most of my time was spent exploring the tables, figuring out what the columns represented, figuring out what queries worked well, and manually confirming some of the results. BigQuery is very fast. Very little of my time was spent waiting for results.
1. Set up the data
You get 1 TB of free BigQuery usage a month. You can blow through this in a single query. Google provides sample tables that contain less data, but I wanted to operate on the full set of Clojure(Script) files, so my first step was to execute some queries to create tables that only contained Clojure data.
First, I queried the github_repos.files table for all the Clojure(Script) files and saved the result to a clojure.files table.
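The query was along these lines; treat it as a sketch rather than the exact query I ran:

-- Saved to the clojure.files table.
SELECT *
FROM [bigquery-public-data:github_repos.files]
WHERE RIGHT(path, 4) = '.clj'
   OR RIGHT(path, 5) = '.cljc'
   OR RIGHT(path, 5) = '.cljs'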
The above query took only 9.2 seconds to run and processed 328 GB of data.
Using the clojure.files table, we can select the source for all the Clojure code from github_repos.contents. I saved this to a clojure.contents table.
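Something along these lines:

-- Saved to the clojure.contents table.
SELECT *
FROM [bigquery-public-data:github_repos.contents]
WHERE id IN (SELECT id FROM clojure.files)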
This query processed 1.84 TB of data in 21.5 seconds. So fast. In just under 30 seconds, I've blown through the free limit.
2. Identify what testing library (or libraries) a repo uses
We can guess that a file uses a testing library if it contains a certain string. The strings we'll search for are the namespaces we'd expect to see required or used in a ns declaration. The query below does this for each file and then rolls up the results by repository. It took 3 seconds to run and processed 611 MB of data.
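The query was shaped like the sketch below; the exact namespace strings and output columns are approximations of what I ran:

-- For each repo, flag which testing namespaces appear in any of its files.
SELECT f.repo_name AS repo,
       MAX(IF(c.content CONTAINS 'clojure.test', 1, 0)) AS uses_clojure_test,
       MAX(IF(c.content CONTAINS 'expectations', 1, 0)) AS uses_expectations,
       MAX(IF(c.content CONTAINS 'midje.sweet', 1, 0)) AS uses_midje,
       MAX(IF(c.content CONTAINS 'speclj.core', 1, 0)) AS uses_speclj
FROM [clojure.contents] AS c
JOIN [clojure.files] AS f ON c.id = f.id
GROUP BY repo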
Below is a screenshot of the first few rows in the result.
3. Export the data
At this point, we could continue doing the analysis using SQL and the BigQuery UI, but I opted to explore the data using Clojure and the repl. There were too many rows to directly download the query results as a csv file, so I ended up having to save the results as a table, export it to Google's cloud storage, and download it from there.
The code takes the csv file and does some transformations. You could do this in Excel or using any language of your choice. I'm not going to include the code here, as it isn't that interesting.
BigQuery thoughts
This was my first time using Google's BigQuery. This wasn't the most difficult analysis to do, but I was impressed by the speed and ease of use. The web UI, which is all I used for this, is neither really great nor extremely terrible. It mostly just worked, and I rarely had to look up documentation.
I don't really feel comfortable making a judgment call on whether the cost is expensive, but this article cost a bit less than seven dollars to write. That doesn't seem too outrageous to me.
Based on my limited usage of BigQuery, it is something I’d look into further if I needed its capabilities.
Probably higher, as projects can and do use more than one testing library.
The 2.2.0 release1 of expectations adds a clojure.test compatible syntax. The release adds the defexpect macro, which forces you to name your test but then generates code that is compatible with clojure.test.
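A minimal sketch of a test written with the new syntax, assuming defexpect and expect are both referred in from the expectations namespace:

(ns example.core-test
  (:require [expectations :refer [defexpect expect]]))

;; defexpect requires a test name and expands into code compatible
;; with clojure.test, so clojure.test tooling can find and run it.
(defexpect addition
  (expect 4 (+ 2 2)))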
Why would you want this? Because clojure.test is the built-in testing library for Clojure, an entire ecosystem has been built around it. Tool support for clojure.test is always going to be ahead of support for the original expectations. By using the new clojure.test compatible syntax, expectations can take advantage of all the tools built for clojure.test.
Using lein-test-refresh with expectations
If you move to the new clojure.test compatible syntax, you can start using lein-test-refresh to automatically rerun your tests when your code changes. lein-test-refresh is a fork of the original expectations autorunner, lein-autoexpect, but it has grown to have more features than its original inspiration. Now you can use it with expectations2.
Below is a sample project.clj that uses lein-test-refresh with the latest expectations.
(defproject expectations-project "0.1.0-SNAPSHOT"
  :description "Sample project using expectations"
  :dependencies [[org.clojure/clojure "1.8.0"]]
  :plugins [[com.jakemccrary/lein-test-refresh "0.18.1"]]
  :profiles {:dev {:dependencies [[expectations "2.2.0-beta1"]]}})