What! What?

Wal McConnell

Finding Data, Index and Free sizes for all the tables in your MySQL Database

'SHOW TABLE STATUS' is just pure ugly, so here's a simple query I use instead to show the data, index and free sizes for the tables in my MySQL DB:

SELECT concat(table_schema,'.',table_name) AS table_name,
	table_rows AS 'rows',
	concat(round(data_length/(1024*1024*1024),2),'Gb') AS Data_Size,
	concat(round(data_free/(1024*1024*1024),2),'Gb') AS Free_Size,
	concat(round(index_length/(1024*1024*1024),2),'Gb') AS Index_Size,
	concat(round((data_length+index_length)/(1024*1024*1024),2),'Gb') AS Total_Size
FROM information_schema.TABLES
ORDER BY data_length+index_length;
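If you only care about a single schema, add a WHERE table_schema = 'your_db_name' clause before the ORDER BY; making the ORDER BY descending (DESC) lists the biggest tables first.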

Building a simple, fast, scalable API with Sinatra, memcached, Amazon SQS, delayed_job, Exceptional and Heroku

The following is a blog post I posted over at the Exceptional blog

The recent rapid growth of Exceptional has meant our publish API implementation needed to be updated. Below are some notes on the decisions we made about its re-architecture.

I've put together a simple skeleton API app, here, that demonstrates the use of the various libraries discussed below.

Sinatra

The Exceptional publish API is very simple, consisting of a single HTTP POST operation. The full Rails stack is overkill for this, so we based our new API on Sinatra.
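A minimal sketch of the shape of such an endpoint (the path, parameter name and responses here are illustrative, not our actual API):

require 'sinatra'

# Illustrative only -- the real publish endpoint differs
post '/api/errors' do
  halt 403, 'API key required' unless params[:api_key]
  # persist the payload for processing later (see Delayed Job below)
  status 202 # accepted for processing
end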

Throttling

Every scalable API requires throttling, so that overly 'aggressive' API clients (i.e. apps throwing lots of errors, in our case) do not hog the limited API resources. We used @datagraph's excellent rack_throttle Rack middleware to handle request throttling for our app. rack_throttle allows you to set schemes for request limit counts on your interface, and makes it easy to set rules even for custom ways of identifying clients (i.e. by IP address, API key, path, cookie, header, etc).

We hooked up rack_throttle with memcached to store the connection counts. On Heroku a shared memcached instance is available to all dynos in an application, so this was a super handy (and fast) way to maintain connection counts across all dynos. We need a fair few dynos to handle the ~4k API requests per minute that the Exceptional API handles.
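Hooking the two up looks roughly like this (a sketch based on the rack_throttle README; the hourly limit and the API-key identification are illustrative assumptions):

# config.ru
require 'rack/throttle'
require 'memcached'

# Identify clients by API key rather than the default IP address
class ApiThrottle < Rack::Throttle::Hourly
  def client_identifier(request)
    request.params['api_key'] || request.ip
  end
end

# Counts live in the shared memcached instance, so every dyno
# sees the same counters
use ApiThrottle, :max => 3600, :cache => Memcached.new, :key_prefix => :throttle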

Queueing

Exceptional uses Amazon SQS as its back-end queueing infrastructure. This means that when requests are received by the API interface, they are queued for processing at a later stage on a cluster of processing servers (the status of which you can see here). This gives the processing server cluster a degree of elasticity during burst periods.

Delayed Job

The speed at which the API interface could respond to an API request was of crucial importance. Since Amazon SQS is a third-party service with a network hop, we used delayed_job to decouple the Exceptional API interface from the publishing to Amazon SQS. Each API request is persisted immediately, and then delayed_job workers pick up the API requests and enqueue them onto Amazon SQS. This gives us the fastest possible API response time, and also another level of infrastructural flexibility.
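In sketch form (illustrative, not our production code -- the job class, queue name and choice of the right_aws gem are assumptions):

require 'delayed_job'
require 'right_aws'

# The persisted job; delayed_job workers call #perform later
class SqsPublishJob < Struct.new(:payload)
  def perform
    sqs = RightAws::SqsGen2.new(ENV['AWS_ACCESS_KEY'], ENV['AWS_SECRET_KEY'])
    sqs.queue('errors').send_message(payload) # hypothetical queue name
  end
end

# In the API action: persist immediately, respond, publish later
Delayed::Job.enqueue(SqsPublishJob.new(request.body.read))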

New Relic

Monitoring the runtime performance of the API is key, and New Relic RPM is simply fantastic for that. On Heroku, with the New Relic add-on enabled, configuring New Relic was as simple as

configure :production do
  require 'newrelic_rpm'
end

Exceptional

Obviously Exceptional is absolutely essential for monitoring any errors that occur on our API service. The docs describe the few steps required to enable Exceptional for your Sinatra app.

Heroku

Our API runs on the fantastic infrastructure provided by Heroku.

Database

We use a MySQL cluster hosted on Amazon RDS for all our databases.

Scalability & Performance

The API is currently scaling to burst traffic of ~7k requests per minute without any degradation in average response time (~30ms). We have further scalability testing to do, and likely further scalability improvements to make, but what we have will see us through our next growth period.

Futures

We have not implemented it in our API (yet!), but David Dollar's Autoscale middleware looks very interesting!

Using EventMachine or Node.js are likely other avenues to explore if our API traffic continues to scale at the current rate.

Sinatra & Heroku & New Relic

The docs for instrumenting your Sinatra app with New Relic on Heroku are either wrong or a bit sparse. It's actually very simple.

  1. Add the newrelic_rpm gem to either your .gems file or Gemfile
  2. Require the RPM library in production
    configure :production do
    	require 'newrelic_rpm'
    end
  3. Add the Heroku New Relic add-on
    heroku addons:add newrelic:bronze

A full working example is here!

RDS Database Stats

If you use Amazon RDS, the following might be useful for pulling some simple min/max stats (FreeStorageSpace, CPU Utilisation and DB Connection Count) for your RDS instance:

git clone http://github.com/wal/RDS-DB-Stats.git
cd RDS-DB-Stats
ruby rds_db_stats.rb AWS_ACCESS_KEY AWS_SECRET_KEY RDS_ID TIME

RDS_ID - your RDS instance identifier
TIME - the number of minutes to calculate the stats over (i.e. 60 = last hour)

Code is on GitHub

Git Submodules

Working with work-in-progress Rails plugins can be a right pain, especially if you need to upgrade them often.

I've started to install plugins in my Rails apps as git submodules. This allows me to upgrade them easily, and also gain the power of git for manipulating them, which is especially useful for making patches and developing plugins.

Adding submodules is easy. Instead of the usual

script/plugin install ... 
stuff, use


git submodule add GIT-URL-OF-PLUGIN ./vendor/plugins/PLUGIN-NAME

e.g


git submodule add git@github.com:contrast/exceptional.git ./vendor/plugins/exceptional

** Warning ** Installing a plugin this way will not fire its initialization, so you might need to run the plugin's init.rb manually.

Now you can cd into that plugin folder and use all the git commands as if it were an isolated git repo.
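For example (using the exceptional plugin path from above):

# after a fresh clone of the app, pull in the submodules
git submodule init
git submodule update

# upgrade the plugin to the latest upstream code
cd vendor/plugins/exceptional
git pull origin master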

The Dreyfus model of skill acquisition

Inspired by a great talk I attended today by Ryan Sims, I've been swotting up on the Dreyfus model, which defines a model for new skill acquisition. Thinking about this model while acquiring new skills allows you to focus on what you need to do to get to the next level (eventually Expert), rather than just simply spending time with a new technology/subject, which I all too often do. It's about 'learning smartly'.

innotop MySQL Monitoring tool

innotop is a fantastic tool for investigating what's going on live inside your MySQL server. Use it to spot locks, deadlocks, long-running queries, buffer sizes, replication status and more.
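Getting started is just a matter of pointing it at your server (the connection details here are placeholders):

innotop -u USER -p PASSWORD -h YOUR_MYSQL_HOST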

HAVING SQL clause

I've been using the HAVING clause to simplify my SQL queries. It allows you to specify a condition on an aggregate function used in the SELECT list (doing so in the WHERE clause is illegal, since WHERE is evaluated before the grouping happens).

So, to find the politicians who have claimed a total of more than $50,000 in expenses:

	SELECT politician, SUM(payment)
	FROM expenses
	GROUP BY politician
	HAVING SUM(payment) > 50000;

Nice!