Monday, May 31, 2010

Improving the performance of REGEXP queries in MySql

Say we’re trying to query a table of UK postcode and just want to return the postcodes in a particular area(the Birmingham area B for instance). A naive implementation of this may be something like this

SELECT * FROM Postcodes WHERE Postcode LIKE 'B%'

This works for some postcodes but doesn’t work for this particular example because it also returns any postcodes in the BA, BB, BT etc areas. So it looks like a regular expression is required, to ensure the area code is followed by a number, so we try the following

SELECT * FROM Postcodes WHERE Postcode REGEXP'^B[0-9]'

This works and returns what we expect but it is much slower than the LIKE query. So is there any way to speed it up? Actually, it’s pretty straightforward, just combine the LIKE and the REGEXP queries, like so

SELECT * FROM Postcodes WHERE Postcode LIKE 'B%' AND Postcode REGEXP'^B[0-9]'

This give MySql the chance to first filter the data based on the LIKE clause then only use the regular expression on the filtered data. It’s not quite as fast as the original LIKE implementation but it’s much quicker than REGEXP on its own and unlike the original LIKE implementation it actually works.

No comments: