Grails: Multi tenant architecture and database indexes

I’ve spent a lot of time lately working on the multi-tenant-single-db plugin for Grails . The main purpose of this plugin is to simplify development of Grails applications based on a single database multi-tenant architecture. 

One of the challenges with this approach is the possible amount of data. Getting database indexes right becomes very important when you have hundreds of customers (or tenants) sharing the same database tables… In this blog post I’ll be demonstrating some of the principles behind the plugin and how they affect database indexes.

Example domain class

The @MultiTenant annotation will trigger a compile time AST transformation that in turn will add a tenantId field to the annotated domain class. GORM and Hibernate will see this field and include it when the database schema is generated.

package indextest

import grails.plugin.multitenant.core.annotation.MultiTenant;

@MultiTenant
class Message {

    String message
    String day
    
    static constraints = {
    }
    
}

After creating a MySQL database for the test application and updating DataSource.groovy we’re ready to run the application. Hibernate should now create a message table in the database you've configured.

mysql> describe message;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| version   | bigint(20)   | NO   |     | NULL    |                |
| day       | varchar(255) | NO   |     | NULL    |                |
| message   | varchar(255) | NO   |     | NULL    |                |
| tenant_id | int(11)      | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

No indexes apart from the primary key will be created.

mysql> show indexes from message;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| message |          0 | PRIMARY  |            1 | id          | A         |           0 |     NULL | NULL   |      | BTREE      |         |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)

Introducing the Grails console

The Grails console is a semi secret gem. It's a Swing GUI (extension of the Groovy console) allowing you to run code within the context of your application. This is a great way to experiment with the language, framework and all the plugins available in the Grails ecosystem.

Starting the console is incredibly easy. Fire up your favorite terminal and navigate to the root folder of one of your Grails projects. Then it’s just a matter of typing “grails console” and the application should start up, shortly followed by the console window.

Grails console

Consult the grails.org wiki page for more information .

Generate some test data

Executing the following script in the console will generate some test data. This script will generate enough data to demonstrate the concepts. You have to generate a lot more test data if you really want to feel the impact of index in terms of seconds.

import grails.plugin.multitenant.core.Tenant
import indextest.Message

public enum Day {
    SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY 
}

Day today = Day.SUNDAY
1.upto(20) { tenantId ->
    println "Creating messages for tenant $tenantId"
    Tenant.withTenantId(tenantId) {
        1.upto(100) { messageNr ->
            String msg = "Message for tenant $tenantId on $today"
            Message message = new Message(day: today.toString(), message: msg)
            message.save failOnError: true
            today++
        }
    }
}

Experimenting with the Grails console

Tips! Setting logSql = true in DataSource.groovy makes Hibernate log queries. The logged queries are unfortunately without parameters. There is obviously a Grails plugin out there allowing you to intercept SQL with the parameters in place. Have a look at the p6spy plugin if you want to do some more serious query profiling.

import grails.plugin.multitenant.core.Tenant 
import indextest.UserPassword 

Tenant.withTenantId(10) { 
    Message.list() 
} 

This should result in something similar to this:

Hibernate: 
    select
        this_.id as id0_0_,
        this_.version as version0_0_,
        this_.day as day0_0_,
        this_.message as message0_0_,
        this_.tenant_id as tenant5_0_0_ 
    from
        message this_ 
    where
        ? = this_.tenant_id <-- This line is important
        
Result: [indextest.Message : 901, indextest.Message : 902, indextest.Message : 903...]

Notice the highlighted line above, it demonstrates very nicely how the multi-tenant plugin works. All queries are re-written before they're executed to make sure that they only see and touch rows belonging to the active tenant. All the heavy lifting involved with this is done by Hibernate filters underneath the hood.

When working with a web application outside the Grails console the tenant will be resolved once for each http request, stored on a ThreadLocal variable and applied transparently to all queries made during processing of that http request. This way you don't have to use Tenant.withTenantId(..) everywhere inside your controllers and services.

Database indexes

Now that you have an idea of how the plugin works is it time to start thinking about the implications it has on database performance and indexes. This might not be such a problem in the beginning, but if you have a lot of tenants all producing data you'll very soon run into trouble as most SQL queries will trigger table scans..

Running EXPLAIN on the Hibernate generated SQL query (with parameters added) from the previous example gives us the following information about the execution plan.

mysql> explain select this_.id as id0_0_, this_.version as version0_0_, this_.day as day0_0_, this_.message as 
   message0_0_, this_.tenant_id as tenant5_0_0_ from message this_ where 10 = this_.tenant_id;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | this_ | ALL  | NULL          | NULL | NULL    | NULL | 2188 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

The important thing to notice here is join type (in this case: ALL ) . This is very, very bad news if you have a table with more than a couple of hundred rows. No indexes are being used by the query so the database has to do a table scan to find matching rows.

Tips! Consult the MySQL documentation for more information for a list of join types sorted from best to worst .

Lets fix that!

Creating an index on tenant_id should solve the problem, at least for this query. Note that creating the index might take some time if you have a lot of data and / or a slow disk. A little tip on the side: Configuring your database server to keep data and indexes on different hard drives often results in a nice performance boost, especially on traditional mechanical drives.

mysql> CREATE INDEX tenant_idx ON message (tenant_id) USING BTREE;
Query OK, 2000 rows affected (0.06 sec)
Records: 2000  Duplicates: 0  Warnings: 0

Running EXPLAIN again gives us the following information about the new execution plan:

mysql> explain select this_.id as id0_0_, this_.version as version0_0_, this_.day as day0_0_, 
  this_.message as message0_0_, this_.tenant_id as tenant5_0_0_ from message this_ where 10 = this_.tenant_id;
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key        | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------+
|  1 | SIMPLE      | this_ | ref  | tenant_idx    | tenant_idx | 4       | const |  100 |       |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------+
1 row in set (0.00 sec)

This is a lot better!

Find messages for a tenant on a given day (multiple where conditions)

The multi-tenant plugin will, as you’ve seen automatically add a tenant id condition. We often use dynamic finders like Message.findByDay(“MONDAY”) in Grails. It’s important to keep in mind that this will result in a query with two, and not one where conditions when executed within a tenant namespace.

import grails.plugin.multitenant.core.Tenant 
import indextest.Message

Tenant.withTenantId(10) { 
    Message.findAllByDay("MONDAY") 
} 

Hibernate: 
    select
        this_.id as id0_0_,
        this_.version as version0_0_,
        this_.day as day0_0_,
        this_.message as message0_0_,
        this_.tenant_id as tenant5_0_0_ 
    from
        message this_ 
    where
        ? = this_.tenant_id    <-- From multi tenant Hibernate filter
        and this_.day=?     <-- From the dynamic finder
        
Result: [indextest.Message : 905, indextest.Message : 912, indextest.Message : 919...]

How does it perform? Running EXPLAIN gives us this:

mysql> explain select this_.id as id0_0_, this_.version as version0_0_, this_.day as day0_0_, this_.message as message0_0_, 
   this_.tenant_id as tenant5_0_0_ from message this_ where 10 = this_.tenant_id and this_.day="MONDAY";
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key        | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------+
|  1 | SIMPLE      | this_ | ref  | tenant_idx    | tenant_idx | 4       | const |  100 | Using where |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

This is definitely a lot better than without an index on tenant_id , but MySQL still has to look at all the rows with tenant_id = 10 in order to find those with the day we asked for. We can surely do better than this!

What happens if we index day as well?

Lets see what's happen if we add an index to day and re-run EXPLAIN for the same query.

mysql> CREATE INDEX day_idx ON message (day) USING BTREE;
Query OK, 2000 rows affected (0.08 sec)
Records: 2000  Duplicates: 0  Warnings: 0

mysql> show indexes from message;
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| message |          0 | PRIMARY    |            1 | id          | A         |        1959 |     NULL | NULL   |      | BTREE      |         |
| message |          1 | tenant_idx |            1 | tenant_id   | A         |          38 |     NULL | NULL   |      | BTREE      |         |
| message |          1 | day_idx    |            1 | day         | A         |          14 |     NULL | NULL   |      | BTREE      |         |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)

mysql> explain select this_.id as id0_0_, this_.version as version0_0_, this_.day as day0_0_, this_.message as message0_0_, 
  this_.tenant_id as tenant5_0_0_ from message this_ where 10 = this_.tenant_id and this_.day="MONDAY";
+----+-------------+-------+-------------+--------------------+--------------------+---------+------+------+--------------------------------------------------+
| id | select_type | table | type        | possible_keys      | key                | key_len | ref  | rows | Extra                                            |
+----+-------------+-------+-------------+--------------------+--------------------+---------+------+------+--------------------------------------------------+
|  1 | SIMPLE      | this_ | index_merge | tenant_idx,day_idx | tenant_idx,day_idx | 4,257   | NULL |   14 | Using intersect(tenant_idx,day_idx); Using where |
+----+-------------+-------+-------------+--------------------+--------------------+---------+------+------+--------------------------------------------------+
1 row in set (0.00 sec)

This is better and comes with the advantage that we can do fast lookups on both tenant_id and day individually.

Multi column / composite indexes

Lets drop the two indexes and create a new multi column / composite index containing both tenant_id and day .

mysql> drop index day_idx on message;
Query OK, 2000 rows affected (0.05 sec)
Records: 2000  Duplicates: 0  Warnings: 0

mysql> drop index tenant_idx on message;
Query OK, 2000 rows affected (0.04 sec)
Records: 2000  Duplicates: 0  Warnings: 0

mysql> create index tenant_day_idx on message (tenant_id, day) using btree;
Query OK, 2000 rows affected (0.06 sec)
Records: 2000  Duplicates: 0  Warnings: 0

mysql> explain select this_.id as id0_0_, this_.version as version0_0_, this_.day as day0_0_, this_.message as message0_0_, 
   this_.tenant_id as tenant5_0_0_ from message this_ where 10 = this_.tenant_id and this_.day="MONDAY";
+----+-------------+-------+------+----------------+----------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys  | key            | key_len | ref         | rows | Extra       |
+----+-------------+-------+------+----------------+----------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | this_ | ref  | tenant_day_idx | tenant_day_idx | 261     | const,const |   14 | Using where |
+----+-------------+-------+------+----------------+----------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)

Very good! MySQL is able to locate all the rows we asked for directly by using tenant_day_idx ! This multi column index will still allow us to do fast lookups on tenant_id , but not day (unless combined with tenant_id) ! Read more about this in the next section, the short version is that the order matters when you create composite indexes.

More about multi column indexes

As mentioned in the previous section, the order of the columns in a composite index should not be left to chance. An index defined on (tenant_id, day) can be used to query for tenant_id alone and tenant_id combined with day , but not day alone.

A phone book on the traditional dead tree format is in my opinion the best analogy for explaining this. You might define the following index on a phone book: (surname, first_name). If you know the full name of the person you’re looking for then it shouldn’t take you long to determine his or her phone number, and even if the person is left out of the phone book you can determine that very quickly as you know where the number should have been. This index will also help you with looking up every person with a given surname. It will on the other hand be completely useless for looking up people by their first name.

Example:

mysql> explain select * from message where tenant_id = 10;
+----+-------------+---------+------+----------------+----------------+---------+-------+------+-------+
| id | select_type | table   | type | possible_keys  | key            | key_len | ref   | rows | Extra |
+----+-------------+---------+------+----------------+----------------+---------+-------+------+-------+
|  1 | SIMPLE      | message | ref  | tenant_day_idx | tenant_day_idx | 4       | const |   99 |       |
+----+-------------+---------+------+----------------+----------------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> explain select * from message where day = "MONDAY";
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | message | ALL  | NULL          | NULL | NULL    | NULL | 1936 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

From the examples we can see that lookups on tenant_id is able to leverage tenant_day_idx , while lookups on day alone triggers a table scan.

Consult the MySQL documentation for more information on the topic.

Defining indexes inside Grails domain classes

Defining composite indexes doesn't work that well today, at least not today. I've created a JIRA issue if you want to track the progress. Creating a single column index works pretty well though.

package indextest

import grails.plugin.multitenant.core.annotation.MultiTenant;

@MultiTenant
class Message {

    String message
    String day
    
    static constraints = {
    }
    
    static mapping = {
        tenantId index: 'tenant_idx'
    }
    
}

Remember..

Remember that indexes takes up disk space and introduces some overhead as they have to be kept up to date when you do add, remove and update rows. You’ll probably find that this overhead isn’t too bad in traditional web applications where the read-to-update data ratio often is something like 1000-to-1.

I usually start with the application when I decide which indexes to add. Look at the most common / expensive queries and add indexes to optimize those. There are plenty of good books, blogs and articles on this subject if you feel like reading more about it (you probably should if you’re maintaining a large multi-tenant database).
 

 

Thanks for reading!

User comments

Gravatar

Wolfgang Schell - 24. Feb 2011 13:05

Great write-up! Thanks for sharing! Getting indexes right in MySQL is something like a dark art. Fortunately there is a good book out there: "High Performance MySQL" (O'Reilly, 2nd Edition).

Gravatar

Lucas Teixeira - 24. Feb 2011 14:19

Great writing Kim, hope to see plugins released soon.

Gravatar

thangchung - 06. Sep 2011 18:52

Cool stuff. Very interested in indexes on multi tenant.


Fork me on GitHub