Welcome Guest.   Make a donation to an author on the site November 25, 2014, 06:27:30 PM  *

Please login or register.
Or did you miss your validation email?


Login with username and password (forgot your password?)
Why not become a lifetime supporting member of the site with a one-time donation of any amount? Your donation entitles you to a ton of additional benefits, including access to exclusive discounts and downloads, the ability to enter monthly free software drawings, and a single non-expiring license key for all of our programs.


You must sign up here before you can post and access some areas of the site. Registration is totally free and confidential.
 
Read the full one-year retrospective report on DonationCoder.com.
   
   Forum Home   Thread Marks Chat! Downloads Search Login Register  
Pages: [1]   Go Down
  Reply  |  New Topic  |  Print  
Author Topic: [Solved] SQL Query is Kicking my Ass.  (Read 3153 times)
Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« on: July 04, 2011, 09:18:41 AM »

Greetings
   So when the boss first asked me about doing this report it sounded simple enough. But after 4 days of hammering on it I've come to the conclusion that my initial assessment may have been (dead) wrong. I'm going to try to be both detailed and succinct ... So please bear with me if this gets a bit long.

   One of our internal billing systems contains historical information (Client, serial#, date, pages printed, etc.) about printers that are under contract. Circumstances being what they are (IT Happens), occasionally there are devices that don't get their counts collected. Now usually this gets caught up at a later date ... but some times certain devices slide long enough to start raising eyebrows as to their existence and condition. Which brings "us" to the report in question.

   I'm trying to create a SQL query (for a MySQL db) that will list all the devices for client X that:
1. Does not recently/currently have a billable count (Total Pages Printed = 0).
2. Shows the date of the last billable count (or the first zero count date in a contiguous historical block between then and now).


   Sound simple enough? Hopefully it is for some one here ... 'cause I'm getting my ass kicked by this thing. The relevant table structure and usage is as follows:

Columns:
Index - auto incremented (int).
SerialNo - of the device (string).
ModelName - of the device (string).
ClientID - Internal identified of customer (string).
TimeStamp - a.k.a. billing date / all devices for a given customer/date will have an identical time stamp (timestamp).
TotalPP - Total Pages Printed this will reflect the number of pages printed by that device for that billing cycle (int).

I've got close to 9 pages of failed queries and notes on what doesn't work. Nested selects tend to make the server race and die (neat trick with dual Xeons), so I've been concentrating on various self-join variations. But nothing is giving me a resultant data set that isn't either wrong or total garbage.


So I'm hoping to find a more informed opinion on how to approach this.

Thank you,
Slightly frazzled but still vaguely Stoic Joker
« Last Edit: July 06, 2011, 08:19:16 AM by Stoic Joker » Logged
justice
Supporting Member
**
Posts: 1,890



Solve issues simply.

View Profile WWW Give some DonationCredits to this forum member
« Reply #1 on: July 04, 2011, 09:30:57 AM »

(you could try <1 in case the default value for TotalPP columns empty instead of 0)

What's wrong with something like:
SELECT Max(TimeStamp) As LastBillableDate,Index,SerialNo,ModelName, ClientID FROM <TableName> WHERE TotalPP=0.

* Split the problem up into small problems you can get working, don't try it all at once.
* Microsoft Reporting Services (if you can get your hands on it) is a great tool for making reports and it can connect to MySQL databases too, this might make it easier to create reports.
« Last Edit: July 04, 2011, 09:32:31 AM by justice » Logged

Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #2 on: July 04, 2011, 09:53:05 AM »

(you could try <1 in case the default value for TotalPP columns empty instead of 0)

Default column value is 0, so no NULLs to trip over/around.

What's wrong with something like:
SELECT Max(TimeStamp) As LastBillableDate,Index,SerialNo,ModelName, ClientID FROM <TableName> WHERE TotalPP=0.

That's along the lines of what I have been playing with ... But no *joy* as of yet.

* Split the problem up into small problems you can get working, don't try it all at once.

Sound advice, but it hasn't panned out yet. smiley

* Microsoft Reporting Services (if you can get your hands on it) is a great tool for making reports and it can connect to MySQL databases too, this might make it easier to create reports.

That may be handy for the future, but I'm not sure I want to introduce a new tools learning curve to the mix at this point.



Here's one of the Close-but-no-Cigar queries I've been playing with. Note it does include references to another table which is required for the final product. But isn't part of the current sticking point.
Formatted for SQL with the GeSHI Syntax Highlighter [copy or print]
  1. SELECT Pc1.ClientID, Pc1.SerialNo, Pc1.ModelName, PiX.CurLocation, Pc1.TimeStamp
  2. FROM PageCounts Pc1
  3. INNER JOIN PageCounts Pc2 ON Pc1.SerialNo = Pc2.SerialNo
  4. INNER JOIN Printer_Inv PiX ON Pc1.SerialNo = PiX.SerialNo
  5. WHERE Pc1.ClientID = 'Client X' AND PiX.Active = 1
  6. GROUP BY Pc1.SerialNo, Pc1.TimeStamp
  7. HAVING Pc1.TimeStamp = MAX(Pc2.TimeStamp)
  8. ORDER BY Pc1.TimeStamp DESC
  9. LIMIT 0, 300
Logged
justice
Supporting Member
**
Posts: 1,890



Solve issues simply.

View Profile WWW Give some DonationCredits to this forum member
« Reply #3 on: July 04, 2011, 10:25:12 AM »

It's very hard to help you without seeing an example database, but I think the problem is between line 5-7.
If you look at the
documentation on group by / having you see Having is used as a where on groups.

I'm still not really sure what you want because the query you are posting is not the same as your description of the query in the starting post.
Your query is trying to search for a Client that is Active where the billing date from pc1 historyical information is identical to the last billing date from some other place (pc2), then grouping by printer and timestamp. It doesn't make sense.

Also to me it's not clear how Pc2 is related to pc1 tongue in english? Without being clear how you want the query in english noone can translate that to SQL. hope that helps
« Last Edit: July 04, 2011, 10:28:31 AM by justice » Logged

Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #4 on: July 04, 2011, 11:55:46 AM »

It's very hard to help you without seeing an example database, but I think the problem is between line 5-7.
If you look at the
documentation on group by / having you see Having is used as a where on groups.

Having is something I just ran across the other day while Googling for an example of how best to get a list of max(dates) out of a table during/using a self join. So given that it has yet to create a result set that made complete sense it's quite likely I'm using it wrong...or expecting it to do more than it actually does (hell if I knew what I was doing I'd probably be done by now... (hehe)). Also that's just one of a multitude of things I tried none of which worked properly.

Not sure how well this will paste, but here's a small (cleaned) piece of the table's content.
Formatted for SQL with the GeSHI Syntax Highlighter [copy or print]
  1. INDEX  SerialNo ModelName ClientID   CurPageCount  OldPageCount   TimeStamp   TotalPP
  2. 29 CNBM044207 3380 mfp MIKLA01 2404 26 2005-10-27 09:04:17 2378
  3. 9 USLH039092 1100 CRITR01 39339 38834 2005-09-29 11:47:17 505
  4. 10 CNDIF23087 1160 CRITR01 7947 7632 2005-09-29 11:47:17 315
  5. 30 USBNL12704 4200 MIKLA01 199451 193626 2005-10-27 09:04:17 5825
  6. 12 CN09506649 4300 CRITR01 401922 383635 2005-09-29 11:47:17 18287
  7. 13 JPCG012675 4v CRITR01 449756 449206 2005-09-29 11:47:17 550
  8. 14 USBB019753 8000 series CRITR01 844310 839591 2005-09-29 11:47:17 4719
  9. 15 USBB020920 8000 series CRITR01 760125 756990 2005-09-29 11:47:17 3135
  10. 16 U61036C5J482752 Brother MFC8840D CRITR01 3271 2314 2005-09-29 11:47:17 957
  11. 18 USBB031505 3100 fax SNIFFERT_B 167316 167316 2005-10-02 11:47:23 0
  12. 19 USCC000104 4050 UFORA03 258810 255488 2005-10-04 08:50:26 3322
  13. 20 JPJB002010 Color 8550 series ECODE01 162675 160416 2005-10-18 16:27:32 2259
  14. 21 USLH039092 1100 CRITR01 39945 39339 2005-10-19 08:52:25 606
  15. 22 CNDIF23087 1160 CRITR01 9131 7947 2005-10-19 08:52:25 1184
  16. 24 CN09506649 4300 CRITR01 426537 401922 2005-10-19 08:52:25 24615
  17. 25 JPCG012675 4v CRITR01 450190 449756 2005-10-19 08:52:25 434
  18. 26 USBB019753 8000 series CRITR01 848516 844310 2005-10-19 08:52:25 4206
  19. 27 USBB020920 8000 series CRITR01 768063 760125 2005-10-19 08:52:25 7938
  20. 28 U61036C5J482752 Brother MFC8840D CRITR01 4225 3271 2005-10-19 08:52:25 954
  21. 31 USSC016017 4000 series OCLAD01 124823 124164 2005-10-27 13:25:31 659


I'm still not really sure what you want because the query you are posting is not the same as your description of the query in the starting post.

The relevant part is. There are just some other (device specific) values being pulled in from another (PiX) table.

Your query is trying to search for a Client that is Active where the billing date from pc1 historyical information is identical to the last billing date from some other place (pc2), then grouping by printer and timestamp. It doesn't make sense.

I had a feeling that might confuse thing, which is why I didn't lead with it. Just ignore all the PiX.* stuff. It's only verifying that the device with SN X is "active" to thin out the data set.

Also to me it's not clear how Pc2 is related to pc1 tongue in english? Without being clear how you want the query in english noone can translate that to SQL. hope that helps

Hm... (a bit sketchy on that myself), Best I can tell table is joined to itself on SN so that each SN can be searched for/aligned to its maximum date with a TotallPP of 0.

I'm at a point where the more I look at it the less sense it makes (forest for trees??) even after taking a break.
Logged
Renegade
Charter Member
***
Posts: 11,822



Tell me something you don't know...

see users location on a map View Profile WWW Give some DonationCredits to this forum member
« Reply #5 on: July 04, 2011, 04:37:26 PM »

Have you tried using temporary tables? Joins can be very expensive. (Sorry -- I'd love to jump in and help more, but need to run out the door.) Multiple queries can sometimes help.
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
Perry Mowbray
N.A.N.Y. Organizer
Charter Member
***
Posts: 1,807



Thoughtful Scribbles

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #6 on: July 05, 2011, 01:06:31 AM »

I've probably not understood your problem, but what I generally do to get the last date for a sub-set of records is something like the following (written for a table we use to check this year's changed data... My apologies if I've not understood, this basically gets all the fields from the table where one field is equal to something and it's the last date:

Formatted for SQL with the GeSHI Syntax Highlighter [copy or print]
  1. SELECT DISTINCT
  2.    T1.[CheckDate],
  3.    T1.[Assessment_Key],
  4.    T1.[TPKLAPAPPL],
  5.    T1.[Assessment],
  6.    T1.[Purpose],
  7.    T1.[RateType],
  8.    T1.[Address]
  9. FROM
  10.    [infoprod].[dbo].[LicenseOnSiteCheckTable] T1
  11.    INNER JOIN [infoprod].[dbo].[LicenseOnSiteCheckTable] T2
  12.        ON T1.[TPKLAPAPPL] = T2.[TPKLAPAPPL]
  13.            AND T2.RateType= 'OSM1'
  14.            AND T1.[CheckDate] = (
  15.                SELECT
  16.                    Max(CheckDate)
  17.                FROM
  18.                    [infoprod].[dbo].[LicenseOnSiteCheckTable]
  19.                WHERE
  20.                    T1.[TPKLAPAPPL]= [TPKLAPAPPL])
  21. ORDER BY
  22. [Assessment]

Logged

Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #7 on: July 05, 2011, 06:43:59 AM »

Have you tried using temporary tables? Joins can be very expensive. (Sorry -- I'd love to jump in and help more, but need to run out the door.) Multiple queries can sometimes help.

My understanding was that joins were less expensive than nested selects, so I went that route to keep the server from racing itself to death (had to kill it a few times). This isn't for something that has to be done frequently by multiple users. It's a one-shot report that has great potential to be useful going forward on a ~monthly basis. So if it took a few minutes to run...That's Okay...If it comes up with the right answer.

OTOH, if it could be done quickly, no-one would complain... Wink

Haven't tried temporary tables yet, but I have been slowly resigning myself to the possibility of their need. In a nutshell, the downside of being completely self taught, is that when you're stuck, you are stuck... smiley (e.g. I'm stuck'ed)


@Perry - At this point I'm game to try anything. I'll try your query later today and see what happens. Hopefully it'll give me a data set that's a bit less hellter skelter than what I have been getting.
Logged
patthecat
Member
**
Posts: 88


View Profile Give some DonationCredits to this forum member
« Reply #8 on: July 05, 2011, 08:34:05 AM »

Take out the #7 Having clause and run the query. 
Is this the subset that you want the "max" timestamp to be chosen from?

I think it's where the issue lies when you try to bring PC2 timestamp into the picture.  If it is then at least you can try the other methods (sub queries, temp files, etc) you've tried to see if it get's the one "max" timestamp from your subset.

Oh and also, since you are dealing with timestamp, make sure the computer setting as far as timezones is consistent between the machines that produces the timestamp data into the database.  Had this issue once where the timestamp data was created on an Oracle Unix database which was in GMT settings, versus the Crystal report running on windows with the local timezone settings.  Once I made things consistent (forgot whether which I changed - windows timezone OR how Crystal Reports handled date time string) the report ran as expected.
Logged
Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #9 on: July 05, 2011, 08:38:48 AM »

Um... Perry? I may have translated the query incorrectly. The below modified version held a pair of 3.0GHz Xeons at 25% usage for 10 minutes (with no output in sight) ... So I gave up (had to) and killed it. Table's only got 20,000 lines in it. So I'm guessing something is awry (or you've got some seriously fast hardware).

Formatted for SQL with the GeSHI Syntax Highlighter [copy or print]
  1. SELECT DISTINCT
  2. Pc1.ClientID,
  3. Pc1.SerialNo,
  4. Pc1.ModelName,
  5. Pc1.TimeStamp
  6. FROM
  7. PageCounts Pc1
  8. INNER JOIN PageCounts Pc2
  9. ON Pc1.SerialNo = Pc2.SerialNo
  10. AND Pc2.ClientID = 'Client X'
  11. AND Pc1.TimeStamp = (
  12. SELECT MAX(TimeStamp)
  13. FROM PageCounts
  14. WHERE Pc1.SerialNo = SerialNo
  15. )
  16. ORDER BY Pc1.TimeStamp DESC
Logged
Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #10 on: July 05, 2011, 08:40:38 AM »

Take out the #7 Having clause and run the query. 
Is this the subset that you want the "max" timestamp to be chosen from?

I'll have to get back to you on that ... I gotta be on-Site here shortly.
Logged
Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #11 on: July 05, 2011, 01:34:45 PM »

Take out the #7 Having clause and run the query.  
Is this the subset that you want the "max" timestamp to be chosen from?

Okay, I take this to mean: When looking at the output, how do I know it's wrong? ...Which is actually a hell of a good GD question, now that I think of it ... Or rather, now that you asked me to think of it. Wink

Now here's the weird part. Most of it ain't. Here's an updated version of the query as just used for above:
Formatted for SQL with the GeSHI Syntax Highlighter [copy or print]
  1. SELECT Pc1.ClientID, Pc1.SerialNo, Pc1.ModelName, PiX.CurLocation, Pc1.TotalPP, Pc1.TimeStamp
  2. FROM PageCounts Pc1
  3. INNER JOIN PageCounts Pc2 ON Pc1.SerialNo = Pc2.SerialNo
  4. INNER JOIN Printer_Inv PiX ON Pc1.SerialNo = PiX.SerialNo
  5. WHERE Pc1.ClientID = 'Client X' AND Pc1.TotalPP = 0 AND PiX.Active = 1
  6. GROUP BY Pc1.SerialNo, Pc1.TimeStamp HAVING Pc1.TimeStamp = MAX(Pc2.TimeStamp)
  7. ORDER BY Pc1.TimeStamp DESC
  8. LIMIT 0, 300

This produces a data set of the correct size...and with the correct items. The problem is the dates are "wrong". Sure they're what I asked for ... But that's not (exactly...) what I meant (to ask for) ... And computers are of course basically shit for interpretation. (...Oops!)

So here's how I got ^^there^^. I was (assuming) looking for a data set size somewhere in the 30's, so when I kept getting 80's it looked/felt/appeared "wrong" ... Which it would have indeed been (wrong) if I hadn't just created another bill last Friday which contained 80 something devices with absent page counts.  wallbash

So here's the - Newly updated with correct-er information - fun part. The date column (actually) needs to reflect the date of the last non-zero value. I went with max to get the latest info. because the table contains the complete billing history. So I didn't want to have any old zeros surfacing if they'd been "washed" by a valid, more current, non-zero count. However I do need this report query to show how (far back) long its been since device X has had a valid count retrieved from it ... So we can look into the why who is most guilty of (zero count) silence.


I think it's where the issue lies when you try to bring PC2 timestamp into the picture.  If it is then at least you can try the other methods (sub queries, temp files, etc) you've tried to see if it get's the one "max" timestamp from your subset.

No that part - Best I can tell - is Okay ... I was just on the wrong end of the problem. Wink

Thank you!
Logged
justice
Supporting Member
**
Posts: 1,890



Solve issues simply.

View Profile WWW Give some DonationCredits to this forum member
« Reply #12 on: July 05, 2011, 06:35:28 PM »

Now I'm not an expert, but it seems because you need the last date from a bunch of non zero pagecounts, but you also need values from a bunch of zero pagecounts, that you have to do a subselect to get the max date and other information from the zero pagecounts data and then reuse that in your main query. smiley

So if that means you're back in the "nested select" performance problem then perhaps it's worth restructuring the data if possible, or wait for someone with better knowledge to help you out cheesy Google for subselect alternatives
Logged

Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #13 on: July 06, 2011, 06:48:18 AM »

Now I'm not an expert, but it seems because you need the last date from a bunch of non zero pagecounts, but you also need values from a bunch of zero pagecounts, that you have to do a subselect to get the max date and other information from the zero pagecounts data and then reuse that in your main query. smiley

That's about where I got too. Unfortunately the passing values between queries thing just isn't cooperating. I could resort to bolting this thing to a web page and just doing the subquery in the result set's while loop processing ... But I really wanted to keep this as a single free standing SQL query script so it could be dumped to Excel quickly.

So if that means you're back in the "nested select" performance problem then perhaps it's worth restructuring the data if possible, or wait for someone with better knowledge to help you out cheesy Google for subselect alternatives

...That's why I'm here... smiley ...Google just got me into the joins fiasco. Restructuring the data is not an option as it's tied into/used by more than one of our internal billing systems (can U say Ripple Effect..?  cheesy). There has got to be a simple answer for this...I just haven't stumbled across it yet.
Logged
Stoic Joker
Honorary Member
**
Posts: 5,402



View Profile WWW Give some DonationCredits to this forum member
« Reply #14 on: July 06, 2011, 08:18:41 AM »

Occam's Razor anyone?

The answer was simple, after I got the question right.

Formatted for SQL with the GeSHI Syntax Highlighter [copy or print]
  1. SELECT ClientID, PC.SerialNo, PC.ModelName, CurLocation, Max(TimeStamp) AS LastValidCount
  2. FROM PageCounts PC
  3. INNER JOIN Printer_Inv PiX ON PC.SerialNo = PiX.SerialNo
  4. WHERE PiX.Active = 1 AND ClientID = 'Client X' AND TotalPP > 0
  5. GROUP BY PC.SerialNo
  6. ORDER BY LastValidCount DESC, CurLocation
  7. LIMIT 0, 300
Logged
justice
Supporting Member
**
Posts: 1,890



Solve issues simply.

View Profile WWW Give some DonationCredits to this forum member
« Reply #15 on: July 06, 2011, 08:42:03 AM »

 ohmy huh embarassed Thmbsup
Logged

cranioscopical
Friend of the Site
Supporting Member
**
Posts: 4,196



see users location on a map View Profile Read user's biography. Give some DonationCredits to this forum member
« Reply #16 on: July 06, 2011, 09:55:19 AM »

Occam's Razor anyone?

Aha! Just in the nick of timeā€¦

Well done!  Thmbsup   I haven't a clue what you were doing but it's good to see persistence rewarded.  Thmbsup
Logged

Chris
Pages: [1]   Go Up
  Reply  |  New Topic  |  Print  
 
Jump to:  
   Forum Home   Thread Marks Chat! Downloads Search Login Register  

DonationCoder.com | About Us
DonationCoder.com Forum | Powered by SMF
[ Page time: 0.102s | Server load: 0.06 ]