Author Topic: The tricky issue of structuring a database log table for your web service  (Read 9931 times)

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,896
I've been confronted by this problem before and am facing it again while working on a new big project (YUMPS).

The question is how to best structure a logging database table.

It seems like it should be a trivial issue.  You want to keep a table of events that an administrator can browse to help identify problems.  So you obviously need fields for a date/time stamp, some text about the event, and maybe the user id responsible.

Now if you're only going to log unusual rare events, and you're never going to need to search it, then there's nothing to worry about.

But if you'd like to log a lot of events to keep records and track what users are doing, and you want to be able to search them easily, then it starts to get a little tricky, because this log table might get huge: hundreds of thousands or even millions of entries.  So do you really want to be storing large text blocks?  Some of these events you might like to preserve indefinitely, while others are just going to take up huge amounts of wasted space.

And do you need to reserve some fields in the table for the object ids relevant to the event?  For example, if there is an event like "User X did operation Y on blog post Z" and you ever want to be able to search for all operations on blog post Z, then you need an integer field for Z so you can filter for all events where that column = Z.   But it's not going to be a dedicated BlogPostId column, because blog posts are only one of the kinds of objects you work with.  So it gets messy.
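
Just to make it concrete, the kind of single generic log table I'm imagining might look something like this (a rough sketch in MySQL-style SQL; every table and column name here is made up for illustration, not anything final):

    CREATE TABLE event_log (
        id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        logged_at   DATETIME NOT NULL,        -- when the event happened
        user_id     INT NULL,                 -- user responsible, if any
        event_type  SMALLINT NOT NULL,        -- code for 'blog post edited', 'login failed', ...
        object_type SMALLINT NULL,            -- what kind of thing was acted on (blog post, comment, file, ...)
        object_id   INT NULL,                 -- id of that thing, whatever table it lives in
        message     VARCHAR(255) NOT NULL,    -- short human-readable summary
        details     TEXT NULL,                -- optional bulky text, kept out of the small columns
        KEY idx_time   (logged_at),
        KEY idx_user   (user_id, logged_at),
        KEY idx_object (object_type, object_id, logged_at)
    );

    -- "all operations ever performed on blog post 42", assuming object_type 3 means blog post
    SELECT * FROM event_log
     WHERE object_type = 3 AND object_id = 42
     ORDER BY logged_at;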

An alternative solution would be to use multiple log tables, one for each kind of event.  I guess this has some pros and cons.
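
In that version each kind of event would get its own table with properly typed columns, something like this (again just a sketch with made-up names):

    CREATE TABLE log_blogpost_events (
        id           BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        logged_at    DATETIME NOT NULL,
        user_id      INT NOT NULL,
        blog_post_id INT NOT NULL,            -- a real typed column instead of a generic object_id
        action       VARCHAR(32) NOT NULL,    -- 'create', 'edit', 'delete', ...
        KEY idx_post (blog_post_id, logged_at)
    );

    CREATE TABLE log_login_events (
        id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        logged_at   DATETIME NOT NULL,
        user_id     INT NOT NULL,
        success     TINYINT(1) NOT NULL,
        KEY idx_user (user_id, logged_at)
    );

The searches become simple, but "show me everything that happened today" now means querying every one of these tables and merging the results.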

Anyone know of any good discussion of this kind of thing?  Or any particularly elegant solutions? It's obviously something the big guys (Facebook, etc.) need to deal with well, but I've not seen any good discussions of the issue.

db90h

  • Coding Snacks Author
  • Charter Member
  • Joined in 2005
  • Posts: 481
  • Software Engineer
You've enumerated the feasible ways to do it, though there are variations on each. To mitigate the impact of a log growing excessively large, old or uncommon events can be moved to an ancillary 'archive' table. If a search of the primary table comes up empty, your code then has to look in the secondary 'archive' table as well... or perhaps a properly written SQL statement can do the same thing all-in-one just as efficiently. As with everything in databases, it's all about structure. I don't think there's a magic bullet, and optimization will depend heavily on the type and frequency of your queries.
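
A rough sketch of what I mean, assuming the archive table has the same columns as the main log table (all names here are just illustrative):

    -- periodically move old rows out of the hot table
    INSERT INTO event_log_archive
        SELECT * FROM event_log
         WHERE logged_at < NOW() - INTERVAL 90 DAY;
    DELETE FROM event_log
     WHERE logged_at < NOW() - INTERVAL 90 DAY;

    -- or do the "all-in-one" search across both tables in a single statement
    SELECT * FROM event_log         WHERE user_id = 123
    UNION ALL
    SELECT * FROM event_log_archive WHERE user_id = 123
    ORDER BY logged_at;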

Additional fields can be supplied through foreign key links while still keeping a single log table. They can depend on the event type, which is itself enumerated in a separate table referenced by a foreign key.
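
Something along these lines, with made-up names (the detail table only gets a row for events of the type that actually needs the extra fields):

    CREATE TABLE event_type (
        id   SMALLINT PRIMARY KEY,
        name VARCHAR(64) NOT NULL              -- 'blog_post_edited', 'login_failed', ...
    );

    CREATE TABLE event_log_blog_detail (
        event_id     BIGINT UNSIGNED PRIMARY KEY,
        blog_post_id INT NOT NULL,
        old_title    VARCHAR(255) NULL,
        FOREIGN KEY (event_id) REFERENCES event_log (id)
    );

The main log table's event_type column would point at event_type.id the same way.
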
« Last Edit: March 24, 2012, 07:22 AM by db90h »

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,896
To mitigate the impact of a log growing excessively large, old or uncommon events can be moved to an ancillary 'archive' table.

Yes, I am considering building such auto-archiving functionality into the log system of YUMPS.  It does seem like a reasonable way to trade off day-to-day efficiency against long-term storage.

The use of additional tables for additional specialized fields, though... that's part of what I'm still struggling with -- whether that is the right way to do it.

I am thinking right now that I will have 2 main KINDS of log tables though: one for system events, and one for user events.  The former is mostly things that a site coder would care about, warnings and errors, etc.  The latter is something that moderators of a site might care about, to track what users do.

The idea of automatic archiving is connected to the question of what to store.  Part of the struggle is clearly that there are cases where you'd like to log voluminous detailed information -- but keeping it around indefinitely is too costly in terms of disk, memory, and CPU usage.  So maybe the idea of building a system that allows voluminous logging but performs regular automatic maintenance on the log table is the way to go: archiving some old stuff, and deleting really old bulky content that is unlikely to be needed in the long run...  But that's a lot of work for a log table.
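
The maintenance pass itself doesn't have to be fancy -- something run nightly from cron, along these lines (the thresholds and names are placeholders, just a sketch):

    -- 1. move user events older than 30 days into the archive table
    INSERT INTO user_event_log_archive
        SELECT * FROM user_event_log
         WHERE logged_at < NOW() - INTERVAL 30 DAY;
    DELETE FROM user_event_log
     WHERE logged_at < NOW() - INTERVAL 30 DAY;

    -- 2. after a year, throw away the bulky detail text on archived rows
    --    but keep the one-line summary around indefinitely
    UPDATE user_event_log_archive
       SET details = NULL
     WHERE logged_at < NOW() - INTERVAL 1 YEAR;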

db90h

  • Coding Snacks Author
  • Charter Member
  • Joined in 2005
  • Posts: 481
  • Software Engineer
The root problem, I believe, is creating a log system generic enough to handle any type of data thrown at it, and in any quantity. I am not sure whether using additional tables with foreign keys is optimal either, but it works. You may end up having to create log-type-specific tables to cover the needs at hand, rather than trying to be fully generic.

If there are millions of log events involved, you'll probably end up having to create a specialized logging system for it, regardless of what additional measures are taken. It would have to be highly optimized, or you'd need a powerful server.

Someone who is a database expert can probably best answer this. That is a field unto itself, as you know. And while we can easily create, maintain, and manipulate databases... the real SQL database gurus out there can probably give you a lot more input than I can.

Steven Avery

  • Supporting Member
  • Joined in 2006
  • Posts: 1,038
Hi Folks,

My understanding is that the particular database makes a difference, as does how well the type of access fits the database.  A million records may be super-quick on SQL Server while bogging down on PostgreSQL (simply based on the type of access, how memory is handled, etc.).   It is possible that the person who told me this (they switched databases for this reason) had other factors involved.

Generally, if the data structure is quite similar, it would seem proper to keep the number of tables small (e.g. User and System). In all those cases where you want "all the activity of user A"... you really would prefer to extract from one table with keyed access, rather than jump here and there and do special programming and joins and this and that.
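
For example, with a compound key on (user, timestamp) the "all the activity of user A" case becomes one indexed query against the one table (the names here are just for illustration):

    CREATE INDEX idx_user_time ON user_event_log (user_id, logged_at);

    SELECT * FROM user_event_log
     WHERE user_id = 123
     ORDER BY logged_at DESC
     LIMIT 100;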

Your concern about the notes text would not seem to matter much, except in a fixed-format data file where you might end up with gigabytes of blanks -- which is a different, legacy architecture. If the notes are actually typed in by the user, they are an integral part of the data.

Sensible decisions on archiving of course will help you out.

And I agree about bouncing this off the real database experts.

Steven

db90h

  • Coding Snacks Author
  • Charter Member
  • Joined in 2005
  • Posts: 481
  • Software Engineer
Did you ever find your perfect solution? I do my own logging of all sorts of web server events (to an SQL DB), and if there's a better way, I'd adopt it.

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,896
I wrote a little bit about how I decided to handle it here: https://www.donation....msg283364#msg283364

db90h

  • Coding Snacks Author
  • Charter Member
  • Joined in 2005
  • Posts: 481
  • Software Engineer
Looks like a good solution for YUMPS. :Thmbsup: Maybe overkill for me, but very robust. I will likely stick with my simplified logging unless and until I need something more robust. I was just curious what you discovered about the proper way to do this. While I'm no stranger to SQL by any means, I am also no expert in the field. That's one class I slept through in college, lol. Seeing some of these SQL injection attacks reminds me just how complex SQL can be. If I ever grow to a size where I need a more robust solution, it might be YUMPS I call upon!