InnoDB Primary Key versus Secondary Index: An Interesting Lesson from EXPLAIN

I ran into an interesting issue today, while examining some EXPLAIN outputs, and wanted to share the findings, as some of this is undocumented. Basically, you can start with a very simple InnoDB table – 2 INT columns, Primary Key (PK) on the 1st column, regular index on the 2nd:
CREATE TABLE `t1` ( `id1` int(10) unsigned NOT NULL, `id2` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id1`), KEY `id2` (`id2`) ) ENGINE=InnoDB;
The query is:
SELECT id1 FROM t1;
This is a straight-forward query with no WHERE clause. Given no WHERE clause, we know there will be a full table or index scan. Let’s look at EXPLAIN:
mysql> EXPLAIN SELECT id1 FROM t1G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 type: index possible_keys: NULL key: id2 key_len: 5 ref: NULL rows: 1 Extra: Using index
EXPLAIN returns NULL for “possible_keys”. This means there was no relevant index (though we see one was used at some point): “It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.” http://dev.mysql.com/doc/refman/5.6/en/explain-output.html So “possible_keys” returns NULL, but key returns `id2`. Per the above, that means `id2` is a covering index. This part is a bit strange, as it is not readily intuitive why this is the case. I mean, the query asks for “id1”, yet EXPLAIN says “id2” is the covering index. How is this possible and why isn’t just the Primary Key chosen anyway? This is possible because of the way InnoDB’s secondary indexes are stored. In InnoDB, secondary indexes are stored along with their corresponding primary key values, so by looking at the secondary index, you can see the primary key values. As for why this is chosen instead of the primary key, it took some digging, but I found the exact comments as to why in the source code (sql/sql_select.cc) in the find_shortest_key() function:
"As far as 1) clustered primary key entry data set is a set of all record fields (key fields and not key fields) and 2) secondary index entry data is a union of its key fields and primary key fields (at least InnoDB and its derivatives don't duplicate primary key fields there, even if the primary and the secondary keys have a common subset of key fields), then secondary index entry data is always a subset of primary key entry. Unfortunately, key_info[nr].key_length doesn't show the length of key/pointer pair but a sum of key field lengths only, thus we can't estimate index IO volume comparing only this key_length value of secondary keys and clustered PK. So, try secondary keys first, and choose PK only if there are no usable secondary covering keys or found best secondary key include all table fields (i.e. same as PK):"
So since secondary index entry data is always a subset of primary key entry, scanning the secondary index should generate slightly less IO than scanning the primary key. Fwiw, this also explains why key_len is less if you specify “USE INDEX (Primary)” to force it to use the PK over the secondary index:
mysql> EXPLAIN SELECT id1 FROM t1 USE INDEX (Primary)G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 1 Extra: Using index
Note this reports key_len of 4, whereas the former reported 5. Hope this helps.
Post a Comment
Log into your MariaDB ID account to post a comment.