Slow query with subselect
I have a query that takes a few minutes to run. It's actually part of a larger query, but this part seems to be the bottleneck. I have an internal selection that might be the culprit.
I'm looking for other indexes or other rearrangements to speed things up. I'm thinking maybe putting that subselect into a temp table, except it uses the data from the outer query in the where clause, otherwise that won't work.
Here is the query:
SELECT principalid, count(*) AS CRs_used FROM
(
SELECT crMan.principalid, crMan.repid, MIN(crMan.daterequest) as FirstContactDate
FROM contactrequest crMan
INNER JOIN principal p
ON crMan.principalid = p.userid
WHERE
initiatedby = 2
AND status <> 'C'
AND NOT EXISTS
(
SELECT *
FROM contactrequest crRep
WHERE crMan.principalid = crRep.principalid
AND crMan.repid = crRep.repid
AND initiatedby = 1
AND status <> 'C'
AND crRep.daterequest < crMan.daterequest
)
GROUP BY userid, crMan.principalid, crMan.repid) AS ContactRequestsThatCount GROUP BY principalid;
model:
CREATE TABLE `principal` (
`operid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
`userid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`targetcustomer` varchar(8000) NOT NULL DEFAULT '',
`targetcustomer_stemmed` varchar(10000) NOT NULL DEFAULT '',
`productline` varchar(8000) NOT NULL DEFAULT '',
`productline_stemmed` varchar(10000) NOT NULL DEFAULT '',
`salesopportunity` varchar(8000) NOT NULL DEFAULT '',
`salesopportunity_stemmed` varchar(10000) NOT NULL DEFAULT '',
`annualsales` decimal(11,0) DEFAULT NULL,
`marketingassistance` bit(1) DEFAULT NULL,
`trainingprovided` bit(1) DEFAULT NULL,
`exclusiveterritories` bit(1) DEFAULT NULL,
`repagency` bit(1) DEFAULT NULL,
`made_in_usa` bit(1) DEFAULT NULL,
`established_line` bit(1) DEFAULT NULL,
PRIMARY KEY (`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `contactrequest` (
`operid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
`contactrequestid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`repid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`principalid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`initiatedby` tinyint(3) unsigned NOT NULL DEFAULT 0,
`response` char(1) NOT NULL DEFAULT '',
`reasonid` tinyint(3) unsigned NOT NULL DEFAULT 0,
`status` char(1) NOT NULL DEFAULT '',
`daterequest` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`dateresponse` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`archivebypri` tinyint(1) NOT NULL DEFAULT 0,
`archivebyrep` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`contactrequestid`),
KEY `ix_contactrequest_repid_request` (`repid`,`daterequest`),
KEY `ix_contactrequest_principalid_request` (`principalid`,`daterequest`)
) ENGINE=InnoDB AUTO_INCREMENT=851354 DEFAULT CHARSET=latin1
Here is the EXPLAIN output:
edit:
The purpose of the query is as follows: The contactrequest table contains records of contacts between two members of our website, represented as principals and representatives. Both parties can initiate a request; initialby = 1 indicates that it has started; initialby = 2 indicates that the principal has started. There can be multiple such records between each pair of principle and representative.
The query counts the number of contacts from the principal to the representative, but there are no contacts from the representative to the principal with an earlier timestamp. Likewise, lines with status 'C' will also be ignored.
The index suggested in the answer below is already partially covered. State indexes and initialby indexes are not, because according to the SQL documentation, indexes with lower cardinality should not be used. initialby only has value in (1, 2) and state in ('C', 'N', ''). So the base is very low.
Edit 2:
After looking at the original query and answer, the question doesn't make any sense, I think the SQL has changed. Evidence for this is that something is included in the advice of - unless it was present in the original query. So I'm going to modify the original query back to what I think it should be.
The problem is that part of the last line is not visible in the rendered question, but when you try to edit it is actually the correct text there. I'll try to get it so you can see it in the formatted code.
Edit 3:
Revise the schema and add the following on the proposed index:
CREATE TABLE `contactrequest` (
`operid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
`contactrequestid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`repid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`principalid` mediumint(8) unsigned NOT NULL DEFAULT 0,
`initiatedby` tinyint(3) unsigned NOT NULL DEFAULT 0,
`response` char(1) NOT NULL DEFAULT '',
`reasonid` tinyint(3) unsigned NOT NULL DEFAULT 0,
`status` char(1) NOT NULL DEFAULT '',
`daterequest` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`dateresponse` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`archivebypri` tinyint(1) NOT NULL DEFAULT 0,
`archivebyrep` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`contactrequestid`),
KEY `ix_contactrequest_repid_request` (`repid`,`daterequest`),
KEY `ix_contactrequest_principalid_request` (`principalid`,`daterequest`),
KEY `ix_contactrequest_initiatedby` (`initiatedby`),
KEY `ix_contactrequest_status` (`status`),
KEY `ix_contactrequest_daterequest` (`daterequest`),
KEY `ix_contactrequest_dateresponse` (`dateresponse`),
KEY `ix_contactrequest_temp` (`repid`,`initiatedby`,`status`,`daterequest`)
) ENGINE=InnoDB AUTO_INCREMENT=858323 DEFAULT CHARSET=latin1
In addition to the suggested index, I have added several columns as indexes. It turns out that just using the suggested indexes doesn't improve query speed, but by adding each individual index, you can get even bigger improvements. I don't understand this because I think the index is redundant now.
Note: dateresponse was recently added for other purposes but this query is not supported
You can start by simplifying your query:
SELECT principalid, COUNT(DISTINCT userid, repid) AS CRs_used
FROM contactrequest crMan INNER JOIN
principal p
ON crMan.principalid = p.userid
WHERE initiatedby = 2 AND
status <> 'C' AND
NOT EXISTS (SELECT 1
FROM contactrequest crRep
WHERE crMan.principalid = crRep.principalid AND
crMan.repid = crRep.repid AND
initiatedby = 1 AND
status <> 'C' AND
crRep.daterequest < crMan.daterequest
)
GROUP BY principalid;
You want an index contactrequest(repid, initiatedby, status, daterequest)
.
With more information about the query and what it's supposed to do, you can probably do more.