The tricky issue of structuring a database log table for your web service

ATTENTION: You are viewing a page formatted for mobile devices; to view the full web page, click HERE.

Other Software > Developer's Corner

(1/2) > >>

mouser:
I've been confronted by this problem before and am being faced with it again while working on a new big project (YUMPS).

The question is how to best structure a logging database table.

It seems like it should be a trivial issue. You want to keep a table of events that an administrator can browse to help identify problems. So you obviously need fields for date/time stamp, and some text about the event, maybe the userid# responsible.

Now if you're only going to log unusual rare events, and you're never going to need to search it, then there's nothing to worry about.

But if you would like to log a lot of events to help you keep records and track what users are doing, and want to be able to search it easily, then it starts to get a little tricky because this log table might start getting huge, hundreds of thousands or even millions of entries.. So do you really want to be storing large text blocks? Some of these events you might like to preserve indefinitely, while others are just going to be taking up huge amounts of wasted space..

And do you need to reserve some fields in the table corresponding to object ids relevant to the event. For example if there is an event like "User X did operation Y on Blog post Z" and you ever want to be able to search for all operations on blog post Z, then you need an integer field for Z so you can search and filter for all events on that column=Z. But its not going to be a specific column for BlogPostId because blog posts are only 1 of the kinds of objects you work with. So it gets messy.

An alternative solution would be to use multiple log tables, one for each kind of event.. I guess this has some pros and cons.

Anyone know about any good discussion of this kind of thing? Or any particularly elegant solutions? It's obviously something that the big guys (facebook, etc.) need to deal with well, but i've not seen any good discussions of the issue.

db90h:
You enumerated the feasible ways to do it, though nuances of them are available. To mitigate the impact of a log growing excessively large, old or uncommon events can be moved to an ancillary 'archive' table. If a preliminary search in the primary table fails, your code must then join up with the secondary 'archive' table... or perhaps a properly written SQL statement will be equally as efficient and do the same thing all-in-one. As with everything databases, it is all about structure. I don't think there's a magic bullet, and optimization will depend heavily on the type and frequency of query.

Additional fields can be supplied, through foreign key links, while still maintaining use of a single log table. They can be dependent on the event type, again enumerated in a different table with a foreign key into it.

mouser:
To mitigate the impact of a log growing excessively large, old or uncommon events can be moved to an ancillary 'archive' table.
--- End quote ---

Yes i am considering building in such auto archiving functionality into the log system of YUMPS.. it does seem like a reasonable way to trade off daily efficiency with long term storage.

The use of additional tables for additional specialized fields though.. thats part of what i'm still struggling with -- whether that is the right way to do it.

I am thinking right now that I will have 2 main KINDS of log tables though.. One for system events, and one for user events. The former being mostly things that a site coder would care about, warnings and errors, etc. The latter being something that moderators of a site might care about, to track what users do.

The idea of automatic archiving is connected to the questions of what to store.. As part of the stuggle is clearly that there are cases where you'd like to log voluminous detailed information -- but keeping it around indefinitely is too costly in terms of disk and memory and cpu usage. So maybe the idea of building a system that allows voluminous logging but performs regular automatic maintenance on the log file is the way to go.. Archiving some old stuff, and deleting really old big content that is unlikely to be needed in the long run... But that's a lot of work for a log table..

db90h:
The root problem I believe you describe is not being able to create a log system generic enough to handle any type of data thrown at it, and in any quantity. I am not sure if using additional tables with foreign indexes is most optimal or not either, but it works. You may end up having to create log type specific tables though to cover the needs at hand, rather than try to be generic.

If there are millions of log events involved, you'll probably end up having to create a specialized logging system for it, regardless of what additional measures are taken. It would have to be highly optimal, or you'd need a powerful server one.

Someone who is a database expert can probably best answer this. That is a field into itself, as you know. And while we can easily create, maintain, and manipulate databases... the real SQL database guru's out there can probably give you a lot more input than me.

Steven Avery:
Hi Folks,

My understanding is that the particular database makes a difference, as well as the type of access fitting the database. A million records may be super-quick on a Sequel server while bog down on a PostgreSQL (simply based on the type of access, how memory is handled, etc.) It is possible that the person who told me this (they did a switch for this reason) had other factors involved.

Generally, if the data structure is quite similar, as a general concept it would seem proper to keep the number of tables small. (e.g. User and System.) In all those cases where you want "all the activity of user A". ... you really would prefer to extract from one table with keyed access, rather than jump here and there and do special programming and joins and this and that.

Your concern about the notes would not seem to matter much, except in a fixed format data file where you might end up with gigabytes of blanks. Which is a different legacy architecture. If the notes are actually typed in by the user, they are an integral part of the data.

Sensible decisions on archiving of course will help you out.

And I agree about bouncing this off the real database experts.

Steven

Navigation

[0] Message Index

[#] Next page

Go to full version