Finding Data, Index and Free sizes for all the tables in your MySQL Database
'SHOW TABLE STATUS' is just pure ugly, so I have another simple command for showing the data, index and free sizes for the tables in my MySQL DB
SELECT concat(table_schema,'.',table_name),
table_rows as 'rows',
concat(round(data_length/(1024*1024*1024),2),'Gb') Data_Size,
concat(round(data_free/(1024*1024*1024),2),'Gb') Free_Size,
concat(round(index_length/(1024*1024*1024),2),'Gb') Index_Size,
concat(round((data_length+index_length)/(1024*1024*1024),2),'Gb') Total_Size
FROM information_schema.TABLES
ORDER BY data_length+index_length;
Building a simple, fast, scaleable API with sinatra, memcached, Amazon SQS, delayed_job, Exceptional and Heroku
The recent fast growth in Exceptional has meant our publish API implementation needed to be updated. Below are some notes and decisions we made about its re-architecture.
I've put together a simple skeleton API app that demonstrates the use of the various libraries used below here
Sinatra
The exceptional publish API is very simple, consisting of a single HTTP POST operation. The full rails stack is overkill for this, so we based our new API on Sinatra.
Throttling
Every scaleable API requires throttling, so that overly 'aggressive' api clients (i.e apps throwing errors in our case) do not hog the limited API resources. We used @datagraphs excellent rack_throttle rack middleware to handle request throttling for our app. rack_throttle allows you to set schemes for request limit counts on your interface, and makes it easy to set rules even for custom ways to identify clients (i.e by IP Address, API Key, path, cookie, header, etc).
We hooked up rack_throttle with memcached to store the connection counts. On heroku a shared memcached instance is available to all dynos in an application, so this was a super handy (and fast) way to maintain connection counts across all dynos. We need a fair few dynos to handle the ~4k api requests per minute that the exceptional API handles.
Queueing
Exceptional uses Amazon SQS as its back end queueing infrastructure. This means that when requests are received by the API interface, they are queued for processing at a later stage on a cluster of processing servers (The status of which you can see here). This allows the processing server cluster an amount of elasticity during burst periods.
Delayed Job
The speed at which the API interface could return to the api request was of crucial importance to our API. Since Amazon SQS is a third party service with a network hop, we used delayed_job to decouple the Exceptional API interface from the publishing to Amazon SQS. Each API request is persisted immediately, and then delayed job workers pick up the API requests and enqeue them onto Amazon SQS. This gives us the fastest possible API response time, and also another level of infrastructural flexibility.
New Relic
Monitoring the runtime performance of the API is key, and New Relic RPM is simply fantastic for that. On Heroku with the New Relic Add-on enabled, configuring new relicwas as simple as
configure :production do
require 'newrelic_rpm'
end
Exceptional
Obviously Exceptional is absolutely essential for monitoring any errors that occur on our API service. The docsdescribe the few steps required to enable exceptional for your sinatra app
Heroku
Our API runs on, and the fantastic infrastructure provided by heroku.
Database
We use a MySQL cluster hosted on Amazon RDS for all our databases.
Scalability & Performance
The API is currently scaling to burst traffic of ~7k request per minute without any degradation in average response time (~30ms). We have further scaleability testing, and likely further scaleability improvements to do, but what we have will keep us through our next growth period.
Futures
We have not implemented it in our API (yet!), but David Dollars Autoscale middleware looks very interesting!
Using eventmachine or node.js are likely other avenues if our API traffic continues to scale at the current rate.
Sinatra & Heroku & NewRelic
The docs for instrumenting your Sinatra app with New Relic on Heroku are either wrong or a bit sparse. Its actually very simple.
- Add the newrelic_rpm gem to either your .gems or Gemfile
- require the rpm library when in production
configure :production do require 'newrelic_rpm' end - add the heroku new_relic addon
heroku addons:add newrelic:bronze
A full working example is here!
RDS Database Stats
If you use Amazon RDS the following might be useful for pulling some simple Min, Max stats (FreeStorageSpace, CPU Utilisation and DB Connection Count) about your RDS instance
git clone http://github.com/wal/RDS-DB-Stats.git
cd RDS-DB-Stats
ruby rds_db_stats.rb AWS_ACCESS_KEY AWS_SECRET_KEY RDS_ID TIME
RDS_ID - your rds instance identifier
TIME - the number of minutes you want to calculate the stats for
(i.e 60 = last hour)
Code is on Github
Darragh and Myself and Scotland Ruby Conf 2010
Git Submodules
Working with work-in-progress rails plugins can be a right pain. Especially if you need to upgrade them often.
I've started to install plugins in my rails apps as git submodules, this allows me to upgrade them easily, and also win the power of git for manipulating them. This is especially useful for making patches, and developing plugins
Adding submodules is easy. Instead of the usual
script/plugin install ... stuff, use
git submodule add GIT-URL-OF-PLUGIN ./vendor/plugins/PLUGIN-NAME
e.g
git submodule add git@github.com:contrast/exceptional.git ./vendor/plugins/exceptional
** Warning ** Installing a plugin this way will not fire its initialization. So you might need to run the init.rb for your plugin manually
Now you can cd into that plugin folder, and use all its git commands as if it was an isolated git repo
The Dreyfus model of skill acquisition
Inspired by a great talk I attended today by Ryan Sims I've been swoting up on the Dreyfus model which defines a model for new skill acquisition. Thinking about this model while acquiring new skills allows you to focus on what you need to do to get to the next level (eventually Expert), rather than just simply spending time with a new technology/subject which I all too often do. Its about 'learning smartly'.
innotop MySQL Monitoring tool
innotop is a fantastic tool to investigating what's going on live inside your MySQL server. Use it to spot locks, deadlocks, long running queries, buffer sizes, replication status
HAVING SQL clause
I've been using the HAVING clause to simplify my SQL queries. It allows you to specify a condition for an aggregate function used in the SELECT list (doing so in the WHERE clause is illegal).
So to find the politicians who are receiving a total of more than $50,000 of expenses
SELECT politician, SUM(payment)
FROM expences
GROUP BY politician
HAVING SUM(payment) > 50000;
Nice!