Building a simple, fast, scalable API with Sinatra, memcached, Amazon SQS, delayed_job, Exceptional and Heroku
Exceptional's recent rapid growth meant our publish API implementation needed updating. Below are some notes on its re-architecture and the decisions we made along the way.
I've put together a simple skeleton API app that demonstrates the libraries described below.
Sinatra
The Exceptional publish API is very simple, consisting of a single HTTP POST operation. The full Rails stack is overkill for this, so we based our new API on Sinatra.
Throttling
Every scalable API requires throttling, so that overly 'aggressive' API clients (in our case, apps throwing lots of errors) do not hog the limited API resources. We used the excellent rack_throttle Rack middleware from @datagraphs to handle request throttling for our app. rack_throttle lets you set request limits on your interface, and makes it easy to define custom rules for identifying clients (e.g. by IP address, API key, path, cookie, header, etc.).
We hooked up rack_throttle with memcached to store the connection counts. On Heroku a shared memcached instance is available to all dynos in an application, so this was a super handy (and fast) way to maintain connection counts across all dynos. We need a fair few dynos to handle the ~4k API requests per minute that the Exceptional API receives.
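To make the idea concrete, here is a minimal sketch of per-client throttling against a shared counter store, written as Rack-style middleware using only the standard library. It is not rack_throttle's implementation. The class name SimpleThrottle, the X-Api-Key header, the limits, and the 429 response are all assumptions made for illustration, and a plain Hash stands in for memcached.

```ruby
# Illustrative fixed-window throttle. NOT rack_throttle's code; every name
# and limit here is hypothetical.
class SimpleThrottle
  # app    - downstream Rack-compatible app (anything that responds to #call)
  # store  - Hash standing in for the shared memcached instance
  # max    - requests allowed per client per window (assumed value)
  # window - window length in seconds (assumed value)
  # clock  - callable returning the current time in whole seconds
  def initialize(app, store: {}, max: 60, window: 60, clock: -> { Time.now.to_i })
    @app = app
    @store = store
    @max = max
    @window = window
    @clock = clock
  end

  def call(env)
    # One counter per client per time window.
    key = "#{client_id(env)}:#{@clock.call / @window}"
    # Not atomic: with real memcached you would use its atomic increment.
    count = @store[key] = @store.fetch(key, 0) + 1

    if count > @max
      [429, { 'Content-Type' => 'text/plain' }, ["Rate limit exceeded\n"]]
    else
      @app.call(env)
    end
  end

  private

  # Identify the client by a (hypothetical) API key header, else by IP address.
  def client_id(env)
    env['HTTP_X_API_KEY'] || env['REMOTE_ADDR']
  end
end
```

Because every dyno would read and write the same store, a client's count stays consistent no matter which dyno serves a given request, which is the property the shared memcached instance provides.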
Queueing
Exceptional uses Amazon SQS as its back-end queueing infrastructure. When requests are received by the API interface, they are queued for later processing on a cluster of processing servers (the status of which you can see here). This gives the processing cluster some elasticity during burst periods.
Delayed Job
The speed at which the API interface could respond to each request was of crucial importance to us. Since Amazon SQS is a third-party service that sits a network hop away, we used delayed_job to decouple the Exceptional API interface from publishing to Amazon SQS. Each API request is persisted immediately, and delayed_job workers then pick up the requests and enqueue them onto Amazon SQS. This gives us the fastest possible API response time, and also another level of infrastructural flexibility.
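The shape of that decoupling can be sketched with the standard library alone. This is not our production code and does not use the delayed_job or SQS APIs: the PublishPipeline class, its method names, and the 202 status are hypothetical, an in-memory Queue stands in for the persisted jobs, and an Array stands in for Amazon SQS.

```ruby
require 'json'

# Illustrative only: in-memory stand-ins for the job store and the remote queue.
class PublishPipeline
  attr_reader :jobs, :remote_queue

  def initialize
    @jobs = Queue.new      # stands in for the persisted delayed_job records
    @remote_queue = []     # stands in for Amazon SQS
  end

  # Request path: store the payload locally and return straight away,
  # without waiting on any third-party service.
  def accept(payload)
    @jobs << JSON.generate(payload)
    202 # hypothetical "accepted" status returned to the API client
  end

  # Worker path: drain the pending jobs and forward each one to the remote
  # queue. A real worker would make the network call to SQS here.
  def work_off
    forwarded = 0
    until @jobs.empty?
      @remote_queue << @jobs.pop
      forwarded += 1
    end
    forwarded
  end
end
```

The point of the split is that accept never touches the network, so its latency is independent of how quickly (or slowly) the remote queue responds; only the worker pays that cost.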
New Relic
Monitoring the runtime performance of the API is key, and New Relic RPM is simply fantastic for that. On Heroku with the New Relic add-on enabled, configuring New Relic was as simple as:
configure :production do
  require 'newrelic_rpm'
end
Exceptional
Obviously Exceptional is absolutely essential for monitoring any errors that occur on our API service. The docs describe the few steps required to enable Exceptional for your Sinatra app.
Heroku
Our API runs on the fantastic infrastructure provided by Heroku.
Database
We use a MySQL cluster hosted on Amazon RDS for all our databases.
Scalability & Performance
The API currently scales to burst traffic of ~7k requests per minute without any degradation in average response time (~30ms). We have further scalability testing, and likely further scalability improvements, still to do, but what we have will carry us through our next growth period.
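As a rough back-of-the-envelope reading of those figures, Little's law (average requests in flight = arrival rate x average response time) gives a sense of how much concurrency the numbers imply. The helper name below is hypothetical, and the result is only an average that assumes the ~30ms response time holds at the stated rate.

```ruby
# Little's law: average in-flight requests = arrival rate * average time per request.
def average_in_flight(requests_per_minute, avg_response_seconds)
  (requests_per_minute / 60.0) * avg_response_seconds
end

burst  = average_in_flight(7_000, 0.030) # roughly 3.5 requests in flight
normal = average_in_flight(4_000, 0.030) # roughly 2 requests in flight
```

Averages hide bursts within a minute, so this is a lower bound on the capacity needed rather than a sizing recommendation.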
Futures
We have not implemented it in our API (yet!), but David Dollar's Autoscale middleware looks very interesting!
EventMachine and Node.js are other likely avenues if our API traffic continues to grow at the current rate.