Slow query with subselect

Jeffrey Simon

I have a query that takes a few minutes to run. It's actually part of a larger query, but this part seems to be the bottleneck. It contains a subselect that might be the culprit.

I'm looking for other indexes or other rearrangements to speed things up. I considered putting the subselect into a temp table, but it uses data from the outer query in its WHERE clause, so that won't work.

Here is the query:

SELECT principalid, count(*) AS CRs_used FROM
(
    SELECT crMan.principalid, crMan.repid, MIN(crMan.daterequest) as FirstContactDate
    FROM contactrequest crMan
    INNER JOIN principal p
        ON crMan.principalid = p.userid
    WHERE
        initiatedby = 2
        AND status <> 'C'
        AND NOT EXISTS
        (
             SELECT *
             FROM contactrequest crRep
             WHERE crMan.principalid = crRep.principalid
                 AND crMan.repid = crRep.repid
                 AND initiatedby = 1
                 AND status <> 'C'
                 AND crRep.daterequest < crMan.daterequest
         )
    GROUP BY userid, crMan.principalid, crMan.repid
) AS ContactRequestsThatCount
GROUP BY principalid;

Schema:

CREATE TABLE `principal` (
  `operid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `userid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `targetcustomer` varchar(8000) NOT NULL DEFAULT '',
  `targetcustomer_stemmed` varchar(10000) NOT NULL DEFAULT '',
  `productline` varchar(8000) NOT NULL DEFAULT '',
  `productline_stemmed` varchar(10000) NOT NULL DEFAULT '',
  `salesopportunity` varchar(8000) NOT NULL DEFAULT '',
  `salesopportunity_stemmed` varchar(10000) NOT NULL DEFAULT '',
  `annualsales` decimal(11,0) DEFAULT NULL,
  `marketingassistance` bit(1) DEFAULT NULL,
  `trainingprovided` bit(1) DEFAULT NULL,
  `exclusiveterritories` bit(1) DEFAULT NULL,
  `repagency` bit(1) DEFAULT NULL,
  `made_in_usa` bit(1) DEFAULT NULL,
  `established_line` bit(1) DEFAULT NULL,
  PRIMARY KEY (`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `contactrequest` (
  `operid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `contactrequestid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `repid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `principalid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `initiatedby` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `response` char(1) NOT NULL DEFAULT '',
  `reasonid` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `status` char(1) NOT NULL DEFAULT '',
  `daterequest` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `dateresponse` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `archivebypri` tinyint(1) NOT NULL DEFAULT 0,
  `archivebyrep` tinyint(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (`contactrequestid`),
  KEY `ix_contactrequest_repid_request` (`repid`,`daterequest`),
  KEY `ix_contactrequest_principalid_request` (`principalid`,`daterequest`)
) ENGINE=InnoDB AUTO_INCREMENT=851354 DEFAULT CHARSET=latin1

Here is the EXPLAIN output:

Edit:

The purpose of the query is as follows: The contactrequest table contains records of contacts between two members of our website, represented as principals and representatives. Either party can initiate a request; initiatedby = 1 indicates that the representative initiated it; initiatedby = 2 indicates that the principal initiated it. There can be multiple such records between each pair of principal and representative.

The query counts the contacts from a principal to a representative for which there is no contact from that representative to that principal with an earlier timestamp. Rows with status 'C' are ignored on both sides.
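
To make the intent concrete, here is a small illustration. It is only a sketch: it assumes empty scratch copies of the two tables above, and the ids and dates are invented.

```sql
-- Hypothetical sample rows (ids and dates are invented for illustration);
-- assumes empty scratch copies of the principal and contactrequest tables.
INSERT INTO principal (userid) VALUES (10);

INSERT INTO contactrequest (repid, principalid, initiatedby, status, daterequest) VALUES
    (1, 10, 2, 'N', '2020-01-01 12:00:00'),  -- principal contacted rep 1 first: counted
    (2, 10, 1, 'N', '2020-01-01 12:00:00'),  -- rep 2 contacted the principal first
    (2, 10, 2, 'N', '2020-01-05 12:00:00'),  -- later contact to rep 2: not counted
    (3, 10, 2, 'C', '2020-01-03 12:00:00');  -- status 'C': ignored

-- Running the query from the top of the question on these rows gives one row:
-- principalid = 10, CRs_used = 1
```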

The index suggested in the answer below is already partially covered by the existing indexes. The status and initiatedby columns are not indexed because, according to the SQL documentation, columns with low cardinality make poor index candidates. initiatedby only takes values in (1, 2) and status in ('C', 'N', ''), so the cardinality is very low.
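
The cardinality can be checked directly rather than assumed. The following is a rough sketch against the contactrequest table from the question; the column aliases are arbitrary.

```sql
-- Distinct values per column; a low count means the column alone is not selective.
SELECT COUNT(DISTINCT initiatedby) AS initiatedby_values,
       COUNT(DISTINCT status)      AS status_values
FROM contactrequest;

-- Row distribution per combination: a value that matches only a small share
-- of the rows can still be selective, even if the column has few distinct values.
SELECT initiatedby, status, COUNT(*) AS row_count
FROM contactrequest
GROUP BY initiatedby, status;

-- Lists the existing indexes together with the optimizer's cardinality estimates.
SHOW INDEX FROM contactrequest;
```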

Edit 2:

After looking at the original query and the answer, the question doesn't make sense as displayed; I think the SQL has changed. Evidence for this is that the answer includes something that would not be there unless it was present in the original query. So I'm going to change the query back to what I think it should be.

The problem is that part of the last line is not visible in the rendered question, although the correct text is there when you open the question for editing. I'll try to get it to show up in the formatted code.

Edit 3:

Here is the revised schema with the suggested index added:

CREATE TABLE `contactrequest` (
  `operid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `contactrequestid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `repid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `principalid` mediumint(8) unsigned NOT NULL DEFAULT 0,
  `initiatedby` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `response` char(1) NOT NULL DEFAULT '',
  `reasonid` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `status` char(1) NOT NULL DEFAULT '',
  `daterequest` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `dateresponse` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `archivebypri` tinyint(1) NOT NULL DEFAULT 0,
  `archivebyrep` tinyint(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (`contactrequestid`),
  KEY `ix_contactrequest_repid_request` (`repid`,`daterequest`),
  KEY `ix_contactrequest_principalid_request` (`principalid`,`daterequest`),
  KEY `ix_contactrequest_initiatedby` (`initiatedby`),
  KEY `ix_contactrequest_status` (`status`),
  KEY `ix_contactrequest_daterequest` (`daterequest`),
  KEY `ix_contactrequest_dateresponse` (`dateresponse`),
  KEY `ix_contactrequest_temp` (`repid`,`initiatedby`,`status`,`daterequest`)
) ENGINE=InnoDB AUTO_INCREMENT=858323 DEFAULT CHARSET=latin1

In addition to the suggested index, I have added indexes on several individual columns. It turns out that the suggested index alone doesn't improve query speed, but adding the individual single-column indexes gives a much bigger improvement. I don't understand this, because I would expect those indexes to be redundant now.

Note: the index on dateresponse was recently added for other purposes and is not related to this query.

Gordon Linoff

You can start by simplifying your query:

SELECT principalid, COUNT(DISTINCT userid, repid) AS CRs_used
FROM contactrequest crMan INNER JOIN
     principal p
     ON crMan.principalid = p.userid
WHERE initiatedby = 2 AND
      status <> 'C' AND
      NOT EXISTS (SELECT 1
                  FROM contactrequest crRep
                  WHERE crMan.principalid = crRep.principalid AND
                        crMan.repid = crRep.repid AND
                        initiatedby = 1 AND
                        status <> 'C' AND
                        crRep.daterequest < crMan.daterequest
                 )
GROUP BY principalid;

You want an index on contactrequest(repid, initiatedby, status, daterequest).
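
As a sketch, that index could be created and checked as follows. The index name is an arbitrary choice; skip the ALTER TABLE if an index with that name already exists. The EXPLAIN output depends on your data and MySQL/MariaDB version, so treat it as something to inspect, not a guaranteed plan.

```sql
-- Create the suggested composite index (the name is arbitrary).
ALTER TABLE contactrequest
    ADD INDEX ix_contactrequest_temp (repid, initiatedby, status, daterequest);

-- Inspect the plan: the row for crRep shows which index the optimizer
-- actually picks for the correlated subquery.
EXPLAIN
SELECT principalid, COUNT(DISTINCT userid, repid) AS CRs_used
FROM contactrequest crMan INNER JOIN
     principal p
     ON crMan.principalid = p.userid
WHERE initiatedby = 2 AND
      status <> 'C' AND
      NOT EXISTS (SELECT 1
                  FROM contactrequest crRep
                  WHERE crMan.principalid = crRep.principalid AND
                        crMan.repid = crRep.repid AND
                        initiatedby = 1 AND
                        status <> 'C' AND
                        crRep.daterequest < crMan.daterequest
                 )
GROUP BY principalid;
```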

With more information about the query and what it's supposed to do, you can probably do more.
