Serverless site analytics with Clojure nbb and AWS

Since I started this blog, I have been missing some simple analytics. I don't want a cookie banner and, if possible, I don't want to pay, so I took some time to build a homemade solution. It was also a good opportunity to try out @borkdude's new creation: nbb (a Clojure interpreter on Node.js), which is a good fit for writing AWS Lambda functions in Clojure with no compilation step.
See this blog post for more details on using nbb on AWS Lambda: https://blog.michielborkent.nl/aws-lambda-nbb.html

You can see (and freely use) the final code here: https://github.com/cyppan/simple-site-analytics. The AWS infrastructure is managed as code (well, YAML actually) through the Serverless Framework.

The features I need:

  • Track the views per day
  • Keep specific counters per "utm_source" (passed as a query param when sharing my blog URLs, e.g. "twitter" or "slack")
  • Show the top URLs (even if I have only three for now :) )

The AWS components used are:

  • A DynamoDB table, SiteStatistics, used to store the view counters per day and URL.
  • A Lambda which increments the view counters.
  • A Lambda which returns an HTML page showing some statistics about the last seven days.
  • Two API Gateway HTTP endpoints proxying to the lambdas (POST /track and GET /dashboard).

Part one: the tracker

Each time a user views a URL, a fetch call hits the /track endpoint with the canonical URL and the utm_source, if any.
The following JavaScript snippet can be added to the website pages:

<script type="text/javascript">
  fetch('https://xxxxxxx.execute-api.eu-west-3.amazonaws.com/track', {
    method: 'post', 
    mode: 'cors', 
    headers: {"Content-Type": "application/json"}, 
    body: JSON.stringify({
      url: document.querySelector("link[rel='canonical']").getAttribute("href"),
      utm_source: new URLSearchParams(window.location.search).get("utm_source")
    })
  });
</script>

The corresponding "track" lambda uses the Node.js library "@aws-sdk/client-dynamodb" to create a DynamoDB client and call the "increment-views" function:

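(defn increment-views
  "Adds 1 to the `views` counter of the (day, url) item and, when utm-source
  is present, to its `views_<utm-source>` counter as well."
  [day url utm-source]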
  (.send @dynamo-client
         (dynamo/UpdateItemCommand.
          (clj->js {:TableName "SiteStatistics"
                    :Key {:day {:S day}
                          :url {:S url}}
                    :UpdateExpression (str "ADD #views :increment"
                                           (when (seq utm-source)
                                             (str ", views_" utm-source " :increment")))
                    :ExpressionAttributeNames {"#views" "views"}
                    :ExpressionAttributeValues {":increment" {:N "1"}}
                    :ReturnValues "ALL_NEW"}))))
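
The handler around increment-views is not shown here. As a rough sketch, the JSON body posted by the snippet above could be turned into the three arguments like this. The names day-key and parse-track-body are hypothetical, the body is assumed to arrive as a plain JSON string, and using the UTC date as the day is an assumption too; the repository has the actual handler.

```clojure
;; Sketch only: day-key and parse-track-body are hypothetical helpers,
;; not taken from the original code.

(defn day-key
  "Formats a js/Date as a UTC \"YYYY-MM-DD\" string (assumed day format)."
  [date]
  (subs (.toISOString date) 0 10))

(defn parse-track-body
  "Turns the JSON body posted by the tracking snippet into the arguments
  of increment-views. Returns nil when the url is missing or not a string.
  Throws if body is not valid JSON."
  [body now]
  (let [{:keys [url utm_source]} (js->clj (js/JSON.parse body)
                                          :keywordize-keys true)]
    (when (and (string? url) (seq url))
      {:day (day-key now)
       :url url
       :utm-source utm_source})))

(parse-track-body "{\"url\":\"https://url.com/page\",\"utm_source\":\"slack\"}"
                  (js/Date. "2022-02-01T10:00:00Z"))
;; => {:day "2022-02-01", :url "https://url.com/page", :utm-source "slack"}
```

Passing the current time in as an argument keeps the function pure, which makes it easy to try at the REPL.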

The design of the SiteStatistics table is pretty simple: the composite key is (day, url), with day as the partition key and url as the sort key, and the columns are the counters (views, views_twitter, views_slack, ...). It suits the read pattern I need (fetch all the URL counters for the last N days).
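
Since every utm_source value ends up inside a column name, and that value comes straight from the query string, it may be worth normalizing it before building the update expression. The sketch below is one possible approach, not what the repository does: sanitize-utm-source and update-expression are hypothetical names, and the allowed character set and the 32-character limit are arbitrary choices.

```clojure
(require '[clojure.string :as str])

;; Sketch only: both functions are hypothetical, not part of the original code.

(defn sanitize-utm-source
  "Lower-cases utm-source and strips everything except a-z, 0-9 and _.
  Returns nil when the input is not a string, when nothing is left,
  or when the result is longer than 32 characters."
  [utm-source]
  (when (string? utm-source)
    (let [cleaned (-> utm-source
                      str/lower-case
                      (str/replace #"[^a-z0-9_]" ""))]
      (when (and (seq cleaned) (<= (count cleaned) 32))
        cleaned))))

(defn update-expression
  "Builds the same ADD expression as increment-views, but only appends the
  per-source counter when the sanitized source is usable."
  [utm-source]
  (let [source (sanitize-utm-source utm-source)]
    (str "ADD #views :increment"
         (when source
           (str ", views_" source " :increment")))))

(update-expression "Twitter")
;; => "ADD #views :increment, views_twitter :increment"
(update-expression "???")
;; => "ADD #views :increment"
```

This also keeps "Twitter" and "twitter" in the same counter instead of creating two columns.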

Part two: the dashboard

The other lambda, "dashboard", is meant to be opened in the browser and displays an HTML view of the statistics. There is a bit more code involved in order to generate the view. For the styling I use Bulma CSS, which is a real time saver.

The first thing the lambda does is fetch all the items from the SiteStatistics DynamoDB table for the last 7 days, with one query per day. The result is named stat-rows in the code, e.g. [{:views 4 :views_slack 2 :day "2022-02-01" :url "https://url.com/page"} ,,,]

(defn fetch-last-7-days-statistics
  "returns [{day url views}]"
  []
  (p/let [items (js/Promise.all
                 (for [day (last-7-days)]
                   (p/let [resp (.send @dynamo-client
                                       (dynamo/QueryCommand.
                                        (clj->js {:TableName "SiteStatistics"
                                                  :KeyConditionExpression "#day = :day"
                                                  :ExpressionAttributeNames {"#day" "day"}
                                                  :ExpressionAttributeValues {":day" {:S day}}})))
                           resp (js->clj resp :keywordize-keys true)
                           items (->> (:Items resp)
                                      (map -parse-dynamo-item))]
                     (or (seq items) [{:day day :views 0}]))))]
    (into [] cat items)))
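
The two helpers used above, last-7-days and -parse-dynamo-item, are not shown in this post. The following is only a guess at what they could look like, assuming days are UTC "YYYY-MM-DD" strings and that items only contain string (S) and number (N) attributes; the versions in the repository may differ.

```clojure
;; Sketch only: possible implementations of the helpers referenced above.

(defn last-7-days
  "Returns seven UTC days as \"YYYY-MM-DD\" strings, oldest first, ending
  with the day of `now` (defaults to the current time)."
  ([] (last-7-days (js/Date.)))
  ([now]
   (let [day-ms (* 24 60 60 1000)]
     (for [offset (range 6 -1 -1)]
       (-> (js/Date. (- (.getTime now) (* offset day-ms)))
           .toISOString
           (subs 0 10))))))

(defn -parse-dynamo-item
  "Unwraps a keywordized DynamoDB item: {:N \"4\"} becomes 4 and
  {:S \"x\"} becomes \"x\". Other attribute types are not handled."
  [item]
  (into {}
        (map (fn [[attr typed-value]]
               [attr (if-let [n (:N typed-value)]
                       (js/parseInt n 10)
                       (:S typed-value))]))
        item))

(-parse-dynamo-item {:day {:S "2022-02-01"}
                     :url {:S "https://url.com/page"}
                     :views {:N "4"}
                     :views_slack {:N "2"}})
;; => {:day "2022-02-01", :url "https://url.com/page", :views 4, :views_slack 2}
```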

The page contains three sections.

Section 1: the counter tiles

(defn counter-cards [stat-rows]
  (let [views (reduce + 0 (map :views stat-rows))
        views-slack (reduce + 0 (map :views_slack stat-rows))
        views-twitter (reduce + 0 (map :views_twitter stat-rows))]
    [:nav.level.is-mobile
     [:div.level-item.has-text-centered
      [:div
       [:p.heading "Total views"]
       [:p.title views]]]
     [:div.level-item.has-text-centered
      [:div
       [:p.heading "views from Slack"]
       [:p.title views-slack]]]
     [:div.level-item.has-text-centered
      [:div
       [:p.heading "views from Twitter"]
       [:p.title views-twitter]]]]))
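
The two sources are hard-coded in this component. If more utm_source values show up over time, a sketch like the following could sum every counter found in the rows instead. totals-by-counter is a hypothetical helper, and it assumes that every key other than :day and :url holds a number.

```clojure
;; Sketch only: totals-by-counter is not part of the original code.

(defn totals-by-counter
  "Sums every counter column across stat-rows, e.g.
  {:views 5 :views_slack 2 :views_twitter 1}. Always contains :views."
  [stat-rows]
  (->> stat-rows
       (map #(dissoc % :day :url))
       (apply merge-with +)
       (merge {:views 0})))

(totals-by-counter
 [{:day "2022-02-01" :url "https://url.com/page" :views 4 :views_slack 2}
  {:day "2022-02-02" :url "https://url.com/page" :views 1 :views_twitter 1}
  {:day "2022-02-03" :views 0}])
;; => {:views 5, :views_slack 2, :views_twitter 1}
```

One tile per entry of the resulting map could then be rendered with a simple for.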

Section 2: the views bar chart

For this one I generate a Vega-Lite specification and use vega-embed to render it:

(defn views-bar-chart [stat-rows]
  (let [data (->> stat-rows
                  (group-by :day)
                  (map (fn [[day rows]]
                         {:day day
                          :views (reduce + 0 (map :views rows))}))
                  (sort-by :day <))
        spec (clj->js {:$schema "https://vega.github.io/schema/vega-lite/v5.json"
                       :data {:values data}
                       :mark {:type "bar"}
                       :width "container"
                       :height 300
                       :encoding {:x {:field "day"
                                      :type "nominal"
                                      :axis {:labelAngle -45}}
                                  :y {:field "views"
                                      :type "quantitative"}}})
        id (str "div-" (.toString (crypto/randomBytes 16) "hex"))
        raw (str "<div id=\"" id "\" style=\"width:100%;height:300px\"></div>"
                 "<script type=\"text/javascript\">"
                 "vegaEmbed ('#" id "', JSON.parse('" (js/JSON.stringify spec) "'));"
                 "</script>")]
    [:div {:dangerouslySetInnerHTML {:__html raw}}]))
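
Wrapping the JSON in a single-quoted string works here because the spec only contains days and numbers. If free text such as URLs ever ended up in it, a quote, a backslash or a "</script>" sequence could break the page. A possible alternative, sketched below under that assumption, is to emit the JSON directly as a JavaScript expression and escape "<"; json-for-script and embed-script are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Sketch only: json-for-script and embed-script are hypothetical helpers.

(defn json-for-script
  "Serializes data to JSON that can be placed inside a <script> element:
  every < is written as \\u003c so the text cannot close the script tag."
  [data]
  (-> (js/JSON.stringify (clj->js data))
      (str/replace "<" "\\u003c")))

(defn embed-script
  "Returns the inline script that renders spec into the element with the
  given id, passing the spec as an object literal instead of a quoted string."
  [id spec]
  (str "<script type=\"text/javascript\">"
       "vegaEmbed('#" id "', " (json-for-script spec) ");"
       "</script>"))

(json-for-script {:title "</script>"})
;; => "{\"title\":\"\\u003c/script>\"}"
```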

Section 3: the top URLs table

(defn top-urls-table [stat-rows]
  (let [top-urls (->> stat-rows
                      (filter :url)
                      (group-by :url)
                      (map (fn [[url rows]]
                             {:url url
                              :views (reduce + 0 (map :views rows))
                              :views_slack (reduce + 0 (map :views_slack rows))
                              :views_twitter (reduce + 0 (map :views_twitter rows))}))
                      (sort-by :views >))]
    [:table.table.is-fullwidth.is-hoverable.is-striped
     [:thead>tr
      [:th "Rank"]
      [:th "URL"]
      [:th "Views"]
      [:th "Slack"]
      [:th "Twitter"]]
     [:tbody
      (for [[i {:keys [url views views_slack views_twitter]}] (map-indexed vector top-urls)]
        [:tr
         [:th {:style {:width "20px"}} (inc i)]
         [:td [:a {:href url} url]]
         [:td {:style {:width "20px"}} views]
         [:td {:style {:width "20px"}} views_slack]
         [:td {:style {:width "20px"}} views_twitter]])]]))

That's it! The whole page is generated like this:

(wrap-template
 [:<>
  [:div.box
   (counter-cards stat-rows)
   (views-bar-chart stat-rows)]
  [:div.box
   [:h1.title.is-3 "Top URLs"]
   (top-urls-table stat-rows)]])
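
How the rendered markup travels back to the browser is not shown here. Assuming the hiccup has already been rendered to an HTML string, a Lambda behind an API Gateway HTTP endpoint can return an object with statusCode, headers and body. The sketch below illustrates that shape; html-response is a hypothetical name and the actual handler may do this differently.

```clojure
;; Sketch only: html-response is not part of the original code.

(defn html-response
  "Wraps an HTML string in the response object returned by the handler."
  [html]
  (clj->js {:statusCode 200
            :headers {"Content-Type" "text/html; charset=utf-8"}
            :body html}))

(.-statusCode (html-response "<p>hi</p>"))
;; => 200
```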

The final file tree is just:

handlers
├── dashboard.cljs
└── track.cljs
package.json
index.mjs
serverless.yml
package-lock.json

You can see the whole code at https://github.com/cyppan/simple-site-analytics, with some more information in the README about costs, CORS, and how to develop locally.