Home | Blog | Software | Reviews and Features | Forum | Help | Donate | About us
topbanner_forum
  *

avatar image

Welcome, Guest. Please login or register.
Did you miss your activation email?

Login with username, password and session length
  • September 04, 2015, 02:50:33 AM
  • Proudly celebrating 10 years online.
  • Donate now to become a lifetime supporting member of the site and get a non-expiring license key for all of our programs.
  • donate

Author Topic: [Solved] SQL Query is Kicking my Ass.  (Read 3693 times)

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
[Solved] SQL Query is Kicking my Ass.
« on: July 04, 2011, 09:18:41 AM »
Greetings
   So when the boss first asked me about doing this report it sounded simple enough. But after 4 days of hammering on it I've come to the conclusion that my initial assessment may have been (dead) wrong. I'm going to try to be both detailed and succinct ... So please bear with me if this gets a bit long.

   One of our internal billing systems contains historical information (Client, serial#, date, pages printed, etc.) about printers that are under contract. Circumstances being what they are (IT Happens), occasionally there are devices that don't get their counts collected. Now usually this gets caught up at a later date ... but some times certain devices slide long enough to start raising eyebrows as to their existence and condition. Which brings "us" to the report in question.

   I'm trying to create a SQL query (for a MySQL db) that will list all the devices for client X that:
1. Does not recently/currently have a billable count (Total Pages Printed = 0).
2. Shows the date of the last billable count (or the first zero count date in a contiguous historical block between then and now).


   Sound simple enough? Hopefully it is for some one here ... 'cause I'm getting my ass kicked by this thing. The relevant table structure and usage is as follows:

Columns:
Index - auto incremented (int).
SerialNo - of the device (string).
ModelName - of the device (string).
ClientID - Internal identified of customer (string).
TimeStamp - a.k.a. billing date / all devices for a given customer/date will have an identical time stamp (timestamp).
TotalPP - Total Pages Printed this will reflect the number of pages printed by that device for that billing cycle (int).

I've got close to 9 pages of failed queries and notes on what doesn't work. Nested selects tend to make the server race and die (neat trick with dual Xeons), so I've been concentrating on various self-join variations. But nothing is giving me a resultant data set that isn't either wrong or total garbage.


So I'm hoping to find a more informed opinion on how to approach this.

Thank you,
Slightly frazzled but still vaguely Stoic Joker
« Last Edit: July 06, 2011, 08:19:16 AM by Stoic Joker »

justice

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,896
    • View Profile
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #1 on: July 04, 2011, 09:30:57 AM »
(you could try <1 in case the default value for TotalPP columns empty instead of 0)

What's wrong with something like:
SELECT Max(TimeStamp) As LastBillableDate,Index,SerialNo,ModelName, ClientID FROM <TableName> WHERE TotalPP=0.

* Split the problem up into small problems you can get working, don't try it all at once.
* Microsoft Reporting Services (if you can get your hands on it) is a great tool for making reports and it can connect to MySQL databases too, this might make it easier to create reports.
« Last Edit: July 04, 2011, 09:32:31 AM by justice »

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #2 on: July 04, 2011, 09:53:05 AM »
(you could try <1 in case the default value for TotalPP columns empty instead of 0)

Default column value is 0, so no NULLs to trip over/around.

What's wrong with something like:
SELECT Max(TimeStamp) As LastBillableDate,Index,SerialNo,ModelName, ClientID FROM <TableName> WHERE TotalPP=0.

That's along the lines of what I have been playing with ... But no *joy* as of yet.

* Split the problem up into small problems you can get working, don't try it all at once.

Sound advice, but it hasn't panned out yet. :)

* Microsoft Reporting Services (if you can get your hands on it) is a great tool for making reports and it can connect to MySQL databases too, this might make it easier to create reports.

That may be handy for the future, but I'm not sure I want to introduce a new tools learning curve to the mix at this point.



Here's one of the Close-but-no-Cigar queries I've been playing with. Note it does include references to another table which is required for the final product. But isn't part of the current sticking point.
Code: Text
  1. SELECT Pc1.ClientID, Pc1.SerialNo, Pc1.ModelName, PiX.CurLocation, Pc1.TimeStamp
  2. FROM PageCounts Pc1
  3. INNER JOIN PageCounts Pc2 ON Pc1.SerialNo = Pc2.SerialNo
  4. INNER JOIN Printer_Inv PiX ON Pc1.SerialNo = PiX.SerialNo
  5. WHERE Pc1.ClientID = 'Client X' AND PiX.Active = 1
  6. GROUP BY Pc1.SerialNo, Pc1.TimeStamp
  7. HAVING Pc1.TimeStamp = MAX(Pc2.TimeStamp)
  8. ORDER BY Pc1.TimeStamp DESC
  9. LIMIT 0, 300
  10.  

justice

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,896
    • View Profile
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #3 on: July 04, 2011, 10:25:12 AM »
It's very hard to help you without seeing an example database, but I think the problem is between line 5-7.
If you look at the
documentation on group by / having you see Having is used as a where on groups.

I'm still not really sure what you want because the query you are posting is not the same as your description of the query in the starting post.
Your query is trying to search for a Client that is Active where the billing date from pc1 historyical information is identical to the last billing date from some other place (pc2), then grouping by printer and timestamp. It doesn't make sense.

Also to me it's not clear how Pc2 is related to pc1 :P in english? Without being clear how you want the query in english noone can translate that to SQL. hope that helps
« Last Edit: July 04, 2011, 10:28:31 AM by justice »

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #4 on: July 04, 2011, 11:55:46 AM »
It's very hard to help you without seeing an example database, but I think the problem is between line 5-7.
If you look at the
documentation on group by / having you see Having is used as a where on groups.

Having is something I just ran across the other day while Googling for an example of how best to get a list of max(dates) out of a table during/using a self join. So given that it has yet to create a result set that made complete sense it's quite likely I'm using it wrong...or expecting it to do more than it actually does (hell if I knew what I was doing I'd probably be done by now... (hehe)). Also that's just one of a multitude of things I tried none of which worked properly.

Not sure how well this will paste, but here's a small (cleaned) piece of the table's content.
Code: Text
  1. Index  SerialNo         ModelName       ClientID        CurPageCount  OldPageCount      TimeStamp               TotalPP
  2. 29      CNBM044207      3380 mfp        MIKLA01         2404            26              2005-10-27 09:04:17     2378
  3.  9      USLH039092      1100            CRITR01         39339           38834           2005-09-29 11:47:17     505
  4. 10      CNDIF23087      1160            CRITR01         7947            7632            2005-09-29 11:47:17     315
  5. 30      USBNL12704      4200            MIKLA01         199451          193626          2005-10-27 09:04:17     5825
  6. 12      CN09506649      4300            CRITR01         401922          383635          2005-09-29 11:47:17     18287
  7. 13      JPCG012675      4v              CRITR01         449756          449206          2005-09-29 11:47:17     550
  8. 14      USBB019753      8000 series     CRITR01         844310          839591          2005-09-29 11:47:17     4719
  9. 15      USBB020920      8000 series     CRITR01         760125          756990          2005-09-29 11:47:17     3135
  10. 16      U61036C5J482752 Brother MFC8840D CRITR01        3271            2314            2005-09-29 11:47:17     957
  11. 18      USBB031505      3100 fax        SNIFFERT_B      167316          167316          2005-10-02 11:47:23     0
  12. 19      USCC000104      4050            UFORA03         258810          255488          2005-10-04 08:50:26     3322
  13. 20      JPJB002010      Color 8550 series ECODE01       162675          160416          2005-10-18 16:27:32     2259
  14. 21      USLH039092      1100            CRITR01         39945           39339           2005-10-19 08:52:25     606
  15. 22      CNDIF23087      1160            CRITR01         9131            7947            2005-10-19 08:52:25     1184
  16. 24      CN09506649      4300            CRITR01         426537          401922          2005-10-19 08:52:25     24615
  17. 25      JPCG012675      4v              CRITR01         450190          449756          2005-10-19 08:52:25     434
  18. 26      USBB019753      8000 series     CRITR01         848516          844310          2005-10-19 08:52:25     4206
  19. 27      USBB020920      8000 series     CRITR01         768063          760125          2005-10-19 08:52:25     7938
  20. 28      U61036C5J482752 Brother MFC8840D CRITR01        4225            3271            2005-10-19 08:52:25     954
  21. 31      USSC016017      4000 series     OCLAD01         124823          124164          2005-10-27 13:25:31     659
  22.  


I'm still not really sure what you want because the query you are posting is not the same as your description of the query in the starting post.

The relevant part is. There are just some other (device specific) values being pulled in from another (PiX) table.

Your query is trying to search for a Client that is Active where the billing date from pc1 historyical information is identical to the last billing date from some other place (pc2), then grouping by printer and timestamp. It doesn't make sense.

I had a feeling that might confuse thing, which is why I didn't lead with it. Just ignore all the PiX.* stuff. It's only verifying that the device with SN X is "active" to thin out the data set.

Also to me it's not clear how Pc2 is related to pc1 :P in english? Without being clear how you want the query in english noone can translate that to SQL. hope that helps

Hm... (a bit sketchy on that myself), Best I can tell table is joined to itself on SN so that each SN can be searched for/aligned to its maximum date with a TotallPP of 0.

I'm at a point where the more I look at it the less sense it makes (forest for trees??) even after taking a break.

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 12,787
  • Tell me something you don't know...
    • View Profile
    • Renegade Minds
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #5 on: July 04, 2011, 04:37:26 PM »
Have you tried using temporary tables? Joins can be very expensive. (Sorry -- I'd love to jump in and help more, but need to run out the door.) Multiple queries can sometimes help.
Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker

Perry Mowbray

  • N.A.N.Y. Organizer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 1,809
    • View Profile
    • Read more about this member.
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #6 on: July 05, 2011, 01:06:31 AM »
I've probably not understood your problem, but what I generally do to get the last date for a sub-set of records is something like the following (written for a table we use to check this year's changed data... My apologies if I've not understood, this basically gets all the fields from the table where one field is equal to something and it's the last date:

Code: Text
  1. SELECT DISTINCT
  2.     T1.[CheckDate],
  3.     T1.[Assessment_Key],
  4.     T1.[TPKLAPAPPL],
  5.     T1.[Assessment],
  6.     T1.[Purpose],
  7.     T1.[RateType],
  8.     T1.[Address]
  9. FROM
  10.     [infoprod].[dbo].[LicenseOnSiteCheckTable] T1
  11.     INNER JOIN [infoprod].[dbo].[LicenseOnSiteCheckTable] T2
  12.         ON T1.[TPKLAPAPPL] = T2.[TPKLAPAPPL]
  13.             AND T2.RateType= 'OSM1'
  14.             AND T1.[CheckDate] = (
  15.                 SELECT
  16.                     Max(CheckDate)
  17.                 FROM
  18.                     [infoprod].[dbo].[LicenseOnSiteCheckTable]
  19.                 WHERE
  20.                     T1.[TPKLAPAPPL]= [TPKLAPAPPL])
  21. ORDER BY
  22.         [Assessment]
  23.  


Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #7 on: July 05, 2011, 06:43:59 AM »
Have you tried using temporary tables? Joins can be very expensive. (Sorry -- I'd love to jump in and help more, but need to run out the door.) Multiple queries can sometimes help.

My understanding was that joins were less expensive than nested selects, so I went that route to keep the server from racing itself to death (had to kill it a few times). This isn't for something that has to be done frequently by multiple users. It's a one-shot report that has great potential to be useful going forward on a ~monthly basis. So if it took a few minutes to run...That's Okay...If it comes up with the right answer.

OTOH, if it could be done quickly, no-one would complain... ;)

Haven't tried temporary tables yet, but I have been slowly resigning myself to the possibility of their need. In a nutshell, the downside of being completely self taught, is that when you're stuck, you are stuck... :) (e.g. I'm stuck'ed)


@Perry - At this point I'm game to try anything. I'll try your query later today and see what happens. Hopefully it'll give me a data set that's a bit less hellter skelter than what I have been getting.

patthecat

  • Member
  • Joined in 2006
  • **
  • Posts: 88
    • View Profile
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #8 on: July 05, 2011, 08:34:05 AM »
Take out the #7 Having clause and run the query. 
Is this the subset that you want the "max" timestamp to be chosen from?

I think it's where the issue lies when you try to bring PC2 timestamp into the picture.  If it is then at least you can try the other methods (sub queries, temp files, etc) you've tried to see if it get's the one "max" timestamp from your subset.

Oh and also, since you are dealing with timestamp, make sure the computer setting as far as timezones is consistent between the machines that produces the timestamp data into the database.  Had this issue once where the timestamp data was created on an Oracle Unix database which was in GMT settings, versus the Crystal report running on windows with the local timezone settings.  Once I made things consistent (forgot whether which I changed - windows timezone OR how Crystal Reports handled date time string) the report ran as expected.

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #9 on: July 05, 2011, 08:38:48 AM »
Um... Perry? I may have translated the query incorrectly. The below modified version held a pair of 3.0GHz Xeons at 25% usage for 10 minutes (with no output in sight) ... So I gave up (had to) and killed it. Table's only got 20,000 lines in it. So I'm guessing something is awry (or you've got some seriously fast hardware).

Code: Text
  1. SELECT DISTINCT
  2.         Pc1.ClientID,
  3.         Pc1.SerialNo,
  4.         Pc1.ModelName,
  5.         Pc1.TimeStamp
  6. FROM
  7.         PageCounts Pc1
  8.         INNER JOIN PageCounts Pc2
  9.                 ON Pc1.SerialNo = Pc2.SerialNo
  10.                         AND Pc2.ClientID = 'Client X'
  11.                         AND Pc1.TimeStamp = (
  12.                                 SELECT MAX(TimeStamp)
  13.                                 FROM PageCounts
  14.                                 WHERE Pc1.SerialNo = SerialNo
  15.                                 )
  16. ORDER BY Pc1.TimeStamp DESC
  17.  

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #10 on: July 05, 2011, 08:40:38 AM »
Take out the #7 Having clause and run the query. 
Is this the subset that you want the "max" timestamp to be chosen from?

I'll have to get back to you on that ... I gotta be on-Site here shortly.

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #11 on: July 05, 2011, 01:34:45 PM »
Take out the #7 Having clause and run the query.  
Is this the subset that you want the "max" timestamp to be chosen from?

Okay, I take this to mean: When looking at the output, how do I know it's wrong? ...Which is actually a hell of a good GD question, now that I think of it ... Or rather, now that you asked me to think of it. ;)

Now here's the weird part. Most of it ain't. Here's an updated version of the query as just used for above:
Code: Text
  1. SELECT Pc1.ClientID, Pc1.SerialNo, Pc1.ModelName, PiX.CurLocation, Pc1.TotalPP, Pc1.TimeStamp
  2. FROM PageCounts Pc1
  3. INNER JOIN PageCounts Pc2 ON Pc1.SerialNo = Pc2.SerialNo
  4. INNER JOIN Printer_Inv PiX ON Pc1.SerialNo = PiX.SerialNo
  5. WHERE Pc1.ClientID = 'Client X' AND Pc1.TotalPP = 0 AND PiX.Active = 1
  6. GROUP BY Pc1.SerialNo, Pc1.TimeStamp HAVING Pc1.TimeStamp = MAX(Pc2.TimeStamp)
  7. ORDER BY Pc1.TimeStamp DESC
  8. LIMIT 0, 300
  9.  

This produces a data set of the correct size...and with the correct items. The problem is the dates are "wrong". Sure they're what I asked for ... But that's not (exactly...) what I meant (to ask for) ... And computers are of course basically shit for interpretation. (...Oops!)

So here's how I got ^^there^^. I was (assuming) looking for a data set size somewhere in the 30's, so when I kept getting 80's it looked/felt/appeared "wrong" ... Which it would have indeed been (wrong) if I hadn't just created another bill last Friday which contained 80 something devices with absent page counts.  :wallbash:

So here's the - Newly updated with correct-er information - fun part. The date column (actually) needs to reflect the date of the last non-zero value. I went with max to get the latest info. because the table contains the complete billing history. So I didn't want to have any old zeros surfacing if they'd been "washed" by a valid, more current, non-zero count. However I do need this report query to show how (far back) long its been since device X has had a valid count retrieved from it ... So we can look into the why who is most guilty of (zero count) silence.


I think it's where the issue lies when you try to bring PC2 timestamp into the picture.  If it is then at least you can try the other methods (sub queries, temp files, etc) you've tried to see if it get's the one "max" timestamp from your subset.

No that part - Best I can tell - is Okay ... I was just on the wrong end of the problem. ;)

Thank you!

justice

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,896
    • View Profile
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #12 on: July 05, 2011, 06:35:28 PM »
Now I'm not an expert, but it seems because you need the last date from a bunch of non zero pagecounts, but you also need values from a bunch of zero pagecounts, that you have to do a subselect to get the max date and other information from the zero pagecounts data and then reuse that in your main query. :)

So if that means you're back in the "nested select" performance problem then perhaps it's worth restructuring the data if possible, or wait for someone with better knowledge to help you out :D Google for subselect alternatives

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #13 on: July 06, 2011, 06:48:18 AM »
Now I'm not an expert, but it seems because you need the last date from a bunch of non zero pagecounts, but you also need values from a bunch of zero pagecounts, that you have to do a subselect to get the max date and other information from the zero pagecounts data and then reuse that in your main query. :)

That's about where I got too. Unfortunately the passing values between queries thing just isn't cooperating. I could resort to bolting this thing to a web page and just doing the subquery in the result set's while loop processing ... But I really wanted to keep this as a single free standing SQL query script so it could be dumped to Excel quickly.

So if that means you're back in the "nested select" performance problem then perhaps it's worth restructuring the data if possible, or wait for someone with better knowledge to help you out :D Google for subselect alternatives

...That's why I'm here... :) ...Google just got me into the joins fiasco. Restructuring the data is not an option as it's tied into/used by more than one of our internal billing systems (can U say Ripple Effect..?  :D). There has got to be a simple answer for this...I just haven't stumbled across it yet.

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 5,883
    • View Profile
    • www.StoicJoker.com
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #14 on: July 06, 2011, 08:18:41 AM »
Occam's Razor anyone?

The answer was simple, after I got the question right.

Code: Text
  1. SELECT ClientID, PC.SerialNo, PC.ModelName, CurLocation, Max(TimeStamp) As LastValidCount
  2. FROM PageCounts PC
  3. INNER JOIN Printer_Inv PiX ON PC.SerialNo = PiX.SerialNo
  4. WHERE PiX.Active = 1 AND ClientID = 'Client X' AND TotalPP > 0
  5. GROUP BY PC.SerialNo
  6. ORDER BY LastValidCount DESC, CurLocation
  7. LIMIT 0, 300
  8.  

justice

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,896
    • View Profile
    • Donate to Member
Re: [Solved] SQL Query is Kicking my Ass.
« Reply #15 on: July 06, 2011, 08:42:03 AM »
 :o :huh: :-[ :Thmbsup:

cranioscopical

  • Friend of the Site
  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 4,303
    • View Profile
    • Read more about this member.
    • Donate to Member
Re: SQL Query is Kicking my Ass.
« Reply #16 on: July 06, 2011, 09:55:19 AM »
Occam's Razor anyone?

Aha! Just in the nick of timeā€¦

Well done!  :Thmbsup:   I haven't a clue what you were doing but it's good to see persistence rewarded.  :Thmbsup: