Server - Apache problem running out of memory

mouser:
I'm not sure how accurate my description of this problem is going to be, but we seem to have a problem on the new server that has caused it to crash occasionally.  It seems to have to do with Apache processes running the server out of memory.

In truth, it's hard for me to even write about the server software that is the backbone of our entire culture without flying into a screaming rage about how horribly all of it is designed, how virtually impossible it is to figure out what goes wrong and why, and how impossible it is to configure this stuff to work well under load.  But hey, it's just the software that powers millions of web sites and servers and runs the entire internet, so why should it matter if it's complete crap.

But anyway.. the best solution we can think of is to set Apache MaxClients to something like 21 and hope this will keep the server from running out of memory.

Does anyone have any better ideas?  We upgraded the server to 8GB when we moved, but the use of VMware to host multiple servers seems to be really taking a toll on available memory.

We never had this problem on the old server.. I'm not sure why we are having it now.

I'm also not sure if this is related to the random but persistent problem we are having with the server (especially the member server) timing out on some downloads, or failing with partial files.

Any suggestions or ideas would be welcome.

mouser:
ps. we set maxclients to 21 now to see if that helps.. though it seems to have slowed down web access to the server substantially.
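
For the curious, the change itself is just one directive in httpd.conf; this is a sketch of the idea, not our exact config:

    # sketch only: cap the number of prefork worker processes
    <IfModule mpm_prefork_module>
        MaxClients 21
    </IfModule>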

f0dder:
Try another httpd? Like, one that uses async I/O instead of relying on multiple processes?

Gothi[c]:
First of all, while mouser states "we don't know" he really means "he doesn't know" :) I tried to explain, but oh well...

The problem is this:

Recently we have been implementing a lot of caching in order to speed things up. The database currently does quite a bit of heavy query caching, and on the Apache side, besides the usual stuff, we have APC for PHP opcode caching (which the forum software, SMF, has built-in support for). All of this, of course, comes at a cost in memory.
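
For context, the APC part is just the usual php.ini settings, something along these lines (values illustrative, not our exact config):

    ; php.ini: APC opcode cache (illustrative values)
    extension = apc.so
    apc.enabled = 1
    apc.shm_size = 64        ; shared memory segment for the cache, in MB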

We had MaxClients set quite high (200), which makes things run pretty fast with prefork, since there are always processes around to serve new clients. The problem is that the average Apache process size has grown to around 30MB, so the worst case scenario is 200*30MB, roughly 5.9GB, which is more memory than the server has.
In fact, after the aggressive MySQL caching, there is only 780 MB free for apache to play around with.
This is why I have decreased MaxClients to 20, which will prevent us from running out of memory, at the cost of some slowdown.
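
In httpd.conf terms the prefork settings end up looking roughly like this (a sketch that just follows the arithmetic above, not our exact file):

    # sketch: ~780MB free / ~30MB per process = ~26 processes tops, so cap below that
    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        ServerLimit          20
        MaxClients           20
        # recycle children so their size doesn't keep creeping up
        MaxRequestsPerChild 2000
    </IfModule>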

So really, all we have to do at this moment is probably just make the MySQL caching a bit less crazy, to make some more room for Apache.
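
In my.cnf that means trimming things in roughly this direction; exactly which knobs matter depends on what is actually eating the memory, so treat the names and numbers below as illustrative:

    # my.cnf sketch: dial the caching back to leave headroom for Apache
    [mysqld]
    query_cache_type        = 1
    query_cache_size        = 64M      # down from the more aggressive setting
    query_cache_limit       = 2M       # don't cache huge result sets
    innodb_buffer_pool_size = 1024M    # usually the big consumer; shrinking it frees RAM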

As f0dder sort of points out, the fact that Apache with prefork is process-based makes memory consumption hard to predict: the memory size of a process changes depending on what script it is running, and if you use keepalive it also grows with the number of requests a client makes over the same connection, which makes things even more unpredictable.
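
For completeness, these are the keepalive knobs in question; values here are illustrative, not a recommendation:

    # keepalive settings (illustrative)
    KeepAlive            On
    MaxKeepAliveRequests 100
    # keep the timeout short so idle children aren't tied up waiting for another request
    KeepAliveTimeout     3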

I wouldn't go as far as switching httpd, but we could switch MPM. There is an experimental event MPM for apache, but I'm not sure how I feel about using an experimental MPM on a production server.
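
For reference, the event MPM is threaded like worker, so its config would look more like the sketch below. Untested on our end, purely illustrative:

    # event MPM sketch (same directives as worker), untested here
    <IfModule mpm_event_module>
        StartServers         2
        MinSpareThreads     25
        MaxSpareThreads     75
        ThreadsPerChild     25
        # with a threaded MPM, MaxClients counts threads, not 30MB processes
        MaxClients         150
        MaxRequestsPerChild  0
    </IfModule>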

To make matters worse, packet loss at softlayer has been really hurting us.
This is their latest excuse:

--- Quote ---

SoftLayer Engineers are aware of the sporadic packetloss and/or connectivity to FCR01.SEA01. Currently engineers are working on resolving this issue; however, there is a chance a reboot to the router will be required. In the event this needs to happen, a notice will be posted here along with any other information gathered.

-- Update --
Service has been restored to customers behind FCR01.SEA01. During the process of working on the router, the issue manifested itself into 100% CPU resulting in upstream and downstream links going down along with routing protocols. Engineers were able to stabilize the router without a reload, and are currently monitoring it to determine if the fix is permanent.

-- UPDATE --
Engineers have determined this is not a permanent fix for the FCR01.SEA01 issue. During the course of troubleshooting this issue with Cisco, it has been determined that the best course of action is to upgrade the router to the latest IOS version. This will be happening at approximately 01:30 CDT.

--UPDATE--
Engineers are continuing to work with Cisco TAC on this issue. At this point, the router has been restored to service at approximately 03:20 CDT. Some customers may continue to experience intermittent packetloss behind FCR01.SEA01.

--- End quote ---

Working for a hosting provider myself, I understand that stuff happens, but this packet loss with them has been going on every day, ever since we moved to their Seattle data center :(
Also, one would think that SoftLayer is big enough to have a replacement router ready that they can just swap in (or to have a properly redundant network that can route around it, for that matter).

mouser:
SoftLayer's handling of the issue has been horrible, and at this point I wouldn't feel at all comfortable recommending them to anyone as a hosting solution.  In the past I've recommended SoftLayer on the basis that, while they may be more expensive, they are more stable and reliable.  But this experience has turned all of that around.
