I was recently asked about whether column order matters in composite indexes, and if so, to what extent. I recall this being the case when working with SQL Server 2000 and 2005, so I became curious how MySQL handled scenarios where a query touched different columns in a composite index. The short answer is yes, column order does matter. If you’re interested in seeing how, and when, follow along.
Before running some tests I loaded test data, chose 1 table, and ran a baseline query with no indexes on the table (except for it’s primary key). The data I used is one table from one of MySQL’s sample databases, sakila. You can download the table’s dump here. It’s also worth noting that the table uses the InnoDB storage engine.
Next, I ran the following query to ensure no indexes were being used for my test query. It’s execution plan is below to confirm this, indicated by the key column being null:
select first_name, last_name, last_update from actor where last_name = 'Witherspoon' and first_name = 'Angela'
This is where the useful tests start. First, I added an index on last_name and first_name:
create index IX_actor_last_name_first_name on actor (last_name, first_name)
Next, I re-ran EXPLAIN on the baseline query above. This time, you can see that the new index on last_name and first_name is being used (again, by referencing the key column):
With use of the index established, I moved on to test dependence on column order. I looked at 3 scenarios, each illustrated with an execution plan.
The first involved modifying the where clause such that it only references last_name. In this case the index can still be used since last_name is in the left-most position of the index:
select first_name, last_name, last_update from actor where last_name = 'Witherspoon'
The second involved modifying the where clause such that it only referenced first_name. In this case, you can see the index is not used. It’s worth noting that in this case, SQL Server would probably use an index scan operation (vs. an index seek, which is the faster of the 2). It would still leverage the index, but would have to scan into the index to compare against the first_name data since it’s not in the left-most position of the index. Unfortunately, MySQL doesn’t act similar, it dismisses the index entirely.
select first_name, last_name, last_update from actor where first_name = 'Angela'
The third test was done out of curiosity, but demonstrates intelligence on the optimizer’s part prior to executing the query. I modified the where clause such that both columns were included, but the order of the columns cited in the query are opposite of the column order in the index. As you can see, the order in the where clause makes no difference since the optimizer is smart enough to know which order should be applied to make use of the index.
select first_name, last_name, last_update from actor where first_name like 'Angela' and last_name like 'Witherspoon'
So, all this reinforced 2 things:
- Index column order does matter
- Know what’s going on with your queries! Run EXPLAIN and study execution plans so that you know what your queries are doing vs. possibly making a costly guess. That relates in this case because indexes aren’t free, and as demonstrated above, there are cases where they’re added but the optimizer simply dismisses them. In such cases you only make things worse since DML operations actually become slower when operating on columns that are indexed.
Finally, note that all this only covers non-clustered indexes, but the same effect happens on tests I’ve ran against clustered indexes. Plus, column order in clustered indexes also has another impact on the data itself, which I’ll follow up on in another post that digs into clustered indexes in more detail.