MySQL – inner join vs. exists() may yield better performance

Here’s another example of why it’s always a good idea to refer to execution plans and test different query options.  Here’s a query I was running as part of some manual data work:

select ua.* from vn_user_achievements ua
where exists (
select 1 from _users_achievements_set_to_notified m
where ua.user_id = m.user_id
)

Given that this was running under manual data work context, vs. being part of our actual product, I wasn’t too worried about how slow it was performing.  However, before too long I was running this query against bigger data sets and, as a result, I was waiting for over 30 seconds for it to return; that got me curious.  A quick look at it’s execution plan (shown below) revealed 2 big negatives:

  1. it wasn’t using an index:
  2. it was executing a correlated subquery

As for the index – I knew there was one on vn_user_achievements.user_id, so I was curious why it wasn’t being used.  In addition, correlated subqueries are well known to be painfully slow due to the amount of additional lookups they perform.  So as a next step I decided to rewrite the query using an inner join …

select ua.* from vn_user_achievements ua
inner join _users_achievements_set_to_notified m on ua.user_id = m.user_id

… to see if MySQL would use a smarter execution plan.

It did!  The key and ref columns show that MySQL used the user_id portion of the primary key on vn_user_achievements, and the rewritten query is no longer resulting in a correlated subquery.  As a result, this version returned an identical result set in just 1.5 seconds, or about 20 times faster!

So, if you’re working on tuning a slow query that happens to use exists(), consider rewriting it to use an inner join, especially if the execution plan is dismissing an index you suspect should be used, or if it’s forcing a correlated subquery to be executed.

It’s also worth noting that you need to test any exists() vs. inner join comparisons for identical result sets.  In my case, this wasn’t a problem due to a 1:many relationship, but if the data was many:many, the inner join rewrite would have resulted in a different result set.

This entry was posted in Database, Performance. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *