I use goodreads to keep track of my reading and have since early 2010. I find it very useful for capturing what I want to read and reminding me of how I felt about books I’ve read. I thought it would be fun to take a closer look at what I read in 2013. I’m doing this using Clojure with Incanter. I haven’t used Incanter since I wrote an earlier post about it, and this seemed like a good opportunity to revisit it.
First I need to get my data out of goodreads. I’ve worked with the Goodreads API before[1] but am not going to use it for this exercise. Instead I’m using the goodreads export functionality (on goodreads, follow the links My Books > import/export) to export a csv file. Having the csv file also lets me clean up some of the data, since some of the books’ page counts were missing[2].
Now that I have data, it is time to start playing with it. Run lein new goodreads-summary and edit the project.clj file to add a dependency on Incanter.
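A minimal project.clj along those lines might look like the following. Both version numbers are assumptions from around the time of writing, not necessarily the ones used for this post; the umbrella incanter artifact pulls in Incanter’s modules, including incanter-io and incanter-charts.

```clojure
;; project.clj: a minimal sketch. Version numbers are assumptions;
;; pick whichever Incanter release you want from Clojars.
(defproject goodreads-summary "0.1.0-SNAPSHOT"
  :description "Summarize a goodreads export with Incanter"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [incanter "1.5.4"]])
```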
Next I’m going to take the csv file and transform it into an Incanter dataset. This is easily done with incanter.io/read-dataset. It isn’t well documented, but passing :keyword-headers false to read-dataset keeps the headers from the csv from being converted to keywords. I’m doing this because some of the goodreads headers contain spaces, and dealing with spaces in keywords is a pain.
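A minimal sketch of the namespace and a read-csv function follows. The namespace name is lein’s default for goodreads-summary. Passing :header true is an assumption of this sketch: read-dataset treats the first row as data unless told otherwise. Later sketches require whatever else they need themselves.

```clojure
(ns goodreads-summary.core
  (:require [incanter.core :as incanter]
            [incanter.io :as io]))

(defn read-csv
  "Reads a goodreads csv export into an Incanter dataset.
  :header true treats the first row as column names, and
  :keyword-headers false keeps those names as strings such as \"Date Read\"."
  [path]
  (io/read-dataset path :header true :keyword-headers false))
```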
Calling read-csv with the path to the exported goodreads data results in a dataset. If you want to look at the data, (incanter/view (read-csv "goodreads_export.csv")) pops up a grid with all of it. I don’t care about most of the columns, so let’s define a function that selects out the few I care about.
Selecting columns is done with incanter.core/sel. Like most Incanter functions it has many overloads. One way to use it is to pass a dataset along with :cols and a vector of the columns you want to select.
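A minimal sketch of such a selection function, assuming string column names (as produced by :keyword-headers false). The column list is a hypothetical choice; "Number of Pages" and "Author" are assumed from the goodreads export header.

```clojure
(require '[incanter.core :as incanter])

;; Hypothetical choice of columns, assumed from the goodreads export header.
(def interesting-columns
  ["Title" "Author" "Number of Pages" "Date Read" "Exclusive Shelf"])

(defn select-columns
  "Keeps only the columns listed in interesting-columns."
  [data]
  (incanter/sel data :cols interesting-columns))
```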
Filtering a dataset is done using incanter.core/$where. Goodreads has three default shelves: to-read, currently-reading, and read. To select all of your read books, you filter the Exclusive Shelf column for read.
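The name finished comes from the text below; the body here is a minimal sketch of the equality filter, using a query map keyed by the string column name.

```clojure
(require '[incanter.core :as incanter])

(defn finished
  "Keeps only the books on the read exclusive shelf."
  [data]
  ;; A plain value in the query map means an equality comparison.
  (incanter/$where {"Exclusive Shelf" "read"} data))
```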
Filtering for books read in 2013 is a bit more complicated. First I convert the Date Read column from a string to an org.joda.time.DateTime. This is done in parse-date. Some of my data is missing a Date Read value, and I’m choosing to handle this by substituting a default value for the missing dates. The filtering in books-read-in-2013 is more involved than the filtering in finished: here I’m providing a predicate for $where to use instead of just doing an equality comparison.
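Here is a minimal sketch of the same idea. It swaps java.time in for Joda-Time, assumes goodreads exports dates as yyyy/MM/dd, and returns nil for missing dates instead of substituting a default. The predicate is handed to $where through the :$fn operator in the query map.

```clojure
(require '[incanter.core :as incanter])
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

;; Assumption: goodreads exports Date Read as yyyy/MM/dd.
(def goodreads-date-format (DateTimeFormatter/ofPattern "yyyy/MM/dd"))

(defn parse-date
  "Converts a Date Read string to a java.time.LocalDate.
  Missing values (nil or \"\") become nil."
  [s]
  (when (seq s)
    (LocalDate/parse s goodreads-date-format)))

(defn read-in-2013?
  "True when the date string falls in 2013. Comparing the year exactly
  means books finished in 2014 are excluded."
  [s]
  (let [d (parse-date s)]
    (boolean (and d (= 2013 (.getYear d))))))

(defn books-read-in-2013
  "Filters with a predicate instead of an equality comparison."
  [data]
  (incanter/$where {"Date Read" {:$fn read-in-2013?}} data))
```

Checking the year exactly is a design choice of this sketch; it sidesteps the 2014 caveat mentioned below.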
Now we have a dataset that contains only books read in 2013 (well, until I read a book in 2014 and the filter above also grabs books from 2014). Now to generate some analytics for each month. First, let’s add a Month column to our data. Originally I wrote a function that uses incanter.core/$map to generate the month values, makes a dataset with the new data, and then adds that to the original dataset.
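A minimal sketch of the $map approach follows. month-of is a hypothetical helper that assumes a yyyy/MM/dd date is present (true after filtering to 2013). Instead of building a second dataset, this sketch attaches the values with incanter.core/add-column.

```clojure
(require '[incanter.core :as incanter])
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

(defn month-of
  "Hypothetical helper: month number (1-12) of a yyyy/MM/dd date string."
  [s]
  (.getMonthValue (LocalDate/parse s (DateTimeFormatter/ofPattern "yyyy/MM/dd"))))

(defn add-month-read-column
  "Computes the months with $map, then attaches them as a Month column."
  [data]
  (let [months (incanter/$map month-of "Date Read" data)]
    (incanter/add-column "Month" months data)))
```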
When I wrote that function it seemed like there should be a better way. While writing this post I stumbled across an Incanter function that makes add-month-read-column almost trivial.
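Incanter’s add-derived-column builds a new column from existing ones in a single call, which fits that description; whether it is the function meant here is an assumption. The month-of helper is the same hypothetical one as in the $map sketch.

```clojure
(require '[incanter.core :as incanter])
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

(defn month-of
  "Hypothetical helper: month number (1-12) of a yyyy/MM/dd date string."
  [s]
  (.getMonthValue (LocalDate/parse s (DateTimeFormatter/ofPattern "yyyy/MM/dd"))))

(defn add-month-read-column
  "Derives the Month column from Date Read.
  Arguments: new column name, source columns, function, dataset."
  [data]
  (incanter/add-derived-column "Month" ["Date Read"] month-of data))
```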
Now that we have add-month-read-column, we can start aggregating some stats. Let’s write code for calculating the pages read per month.
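A minimal sketch using $rollup with the :sum keyword. The name pages-by-month and the "Number of Pages" column are assumptions; the input is expected to already have a Month column.

```clojure
(require '[incanter.core :as incanter])

(defn pages-by-month
  "Hypothetical name: total Number of Pages for each Month."
  [data]
  ;; $rollup arguments: summary function, column to summarize, group-by column, dataset.
  (incanter/$rollup :sum "Number of Pages" "Month" data))
```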
That was pretty easy. Let’s write a function to count the number of books read per month.
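A minimal sketch of book-count-by-month, counting the Title values in each Month group; the choice of Title as the counted column is an assumption.

```clojure
(require '[incanter.core :as incanter])

(defn book-count-by-month
  "Number of books (rows) for each Month."
  [data]
  (incanter/$rollup :count "Title" "Month" data))
```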
The pages-per-month function and book-count-by-month are very similar. Each uses incanter.core/$rollup to calculate per-month stats. The first argument to $rollup can be a function that takes a sequence of values, or one of the supported magical “function identifier keywords” (:max, :min, :sum, :count, and :mean).
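To illustrate the function form, here is a hypothetical longest-book-by-month that passes an anonymous function instead of a keyword. $rollup calls it once per month with that month’s page counts.

```clojure
(require '[incanter.core :as incanter])

(defn longest-book-by-month
  "Hypothetical example: the largest page count read in each Month."
  [data]
  ;; The function receives the sequence of Number of Pages values for one group.
  (incanter/$rollup #(apply max %) "Number of Pages" "Month" data))
```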
Next, let’s combine the data so we can print out a nice table. While we are at it, let’s add another column.
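A minimal sketch of a stats-by-month function that joins the two rollups on Month, renames the columns, and derives a rounded pages-per-book column. $join performs a right join, which is fine here because both rollups contain the same months; rounding with Math/round is an assumption that happens to match the table below.

```clojure
(require '[incanter.core :as incanter])

(defn stats-by-month
  "Joins per-month book and page counts, renames the columns,
  and adds a rounded Pages/Books column, ordered by Month."
  [data]
  (let [book-counts (incanter/$rollup :count "Title" "Month" data)
        page-counts (incanter/$rollup :sum "Number of Pages" "Month" data)]
    (->> (incanter/$join ["Month" "Month"] book-counts page-counts)
         (incanter/rename-cols {"Title" "Book Count"
                                "Number of Pages" "Page Count"})
         (incanter/add-derived-column "Pages/Books" ["Page Count" "Book Count"]
                                      (fn [pages books]
                                        (Math/round (double (/ pages books)))))
         (incanter/$order "Month" :asc))))
```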
stats-by-month returns a dataset which, when printed, looks like the following table. It joins the data, renames the columns, and adds a derived column.
| Month | Book Count | Page Count | Pages/Books |
|-------+------------+------------+-------------|
|     1 |          6 |       1279 |         213 |
|     2 |          2 |       1251 |         626 |
|     3 |          8 |       2449 |         306 |
|     4 |          5 |       1667 |         333 |
|     5 |          6 |       2447 |         408 |
|     6 |          5 |       1609 |         322 |
|     7 |          5 |       1445 |         289 |
|     8 |          5 |       2229 |         446 |
|     9 |          2 |        963 |         482 |
|    10 |          5 |       1202 |         240 |
|    11 |          5 |       2248 |         450 |
|    12 |          7 |       1716 |         245 |
Great. Now we have a little ASCII table. Let’s get graphical and make some bar charts.
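A minimal sketch of the chart step, assuming its input is a dataset shaped like the stats-by-month table above. Taking the stats as an argument is an assumption about view-page-count-chart’s signature; page-count-chart is a hypothetical helper split out so the chart can be built without opening a window.

```clojure
(require '[incanter.core :as incanter]
         '[incanter.charts :as charts])

(defn page-count-chart
  "Bar chart of Page Count per Month from a stats-by-month style dataset."
  [stats]
  ;; With :data, the categories and values can be given as column names.
  (charts/bar-chart "Month" "Page Count"
                    :data stats
                    :title "Pages read per month"
                    :x-label "Month"
                    :y-label "Pages"))

(defn view-page-count-chart
  "Pops up the page count chart in a window."
  [stats]
  (incanter/view (page-count-chart stats)))
```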
Running view-page-count-chart produces a pop-up with a bar chart of the pages read per month. The chart actually surprises me, as I fully expected to have higher page counts during the winter months than during the summer months. This chart and analysis are pretty useless, though, without knowing the difficulty of the pages read. For example, last February I … Knowing that, I don’t feel that having a low page count in that month is slacking at all.
2013 was a pretty big year of reading. I read more books this past year than in any other year I have data for. I also read some of the best books I’ve ever read. Not only that, but I actually created multiple[3] custom Kindle dictionaries to help improve my (and others’) reading experience.
Summary table[4]:
| :shelf   | :books | :pages |
|----------+--------+--------|
| non-tech |     51 |  17798 |
| tech     |     10 |   2707 |
| read     |     61 |  20505 |
Plans for 2014
I’m planning on reading a similar amount this year but will probably read a few more non-fiction books. The first step toward that is to start classifying my books as fiction or non-fiction. I’m also planning on rereading at least two books that I’ve read in the last few years. This is unusual for me because I don’t often reread books that quickly.
If you have any book recommendations feel free to leave them in the comments or contact me through twitter or email.
[1] A project on Heroku that takes your to-read list from goodreads and queries the Chicago Public Library to see if books are available. Someday I’ll give it some love and make it usable by others.
[2] I’ve also applied to be a goodreads librarian so I can actually fix their data as well.
[4] The tech shelf only includes programming books.