I thought that this experience was blog-worthy due to the highly undocumented nature of this problem. I hope that it can shed some light on why your new MS2003 web migration isn't exactly holding it's own under a moderate traffic load.
Day starts like this.. You take initiative and migrate all of your rusty old NT webfarm to a happy new Win2003 environment. You're running very expensive and highly trafficked custom ASP code, with a remote SQL backend. Expecting huge performance gains from the cutting edge systems, you brief everyone on the IT staff about the new direction the neglected websystems are headed.
You migrate all of the systems over in one fell swoop, do a bit of testing and after deciding that everything checks out - you swap the DNS entries and now the applications are live on the new hardware.
4am the phone rings, it's Tokyo and they want to know why their business critical web applications are not serving pages.
Saving you from my ranting soliloquy, this is an issue that I recently ran into. All ran smooth as a whistle until there was a mild, I stress mild, load (<300 concurrent connections) on the servers. But why did cutting edge servers running the latest Microsoft webserving technology get out performed by old NT machines? The answer is in the way that 2003 server talks to SQL backends with connection pooling disabled.
If you read this far, you should follow me on Twitter!
Basically the web server was running out of available ports to communicate with the back end SQL server. This became evident by running a basic netstat on the test webserver during one of the load tests and monitoring the connection traffic. By default, w2k3 server reserves 4000 ports for communicating with SQL. The netstat showed me that all 4000 ports were quickly simultaneously opened to the SQL server. When the server runs over this allotment, it will start denying the connections. These opened ports are reserved for a default 4 minutes, so even if the connection is idle - it is still in a "TIME_WAIT" status, essentially unusable by another request. Every request after this limit gets denied until ports are freed. This is a very "obscure" feature of 2003 and classic ASP. There are essentially two ways to fix the problem, in this case, partly to do with the nature of the applications, I chose to increase the port allotment on the webserver to allow more simultaneous connections. Here is the quick and dirty fix:
In the afflicted webservers registry, add the key
Value Name: MaxUserPort
Data Type: REG_DWORD
Value: 65534
to
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
This essentially gives the client (in this case a webserver) 60000 more ports to play with. After applying the changes, the load tests show that this successfully resolved our issue.
You should under just about all circumstances be using connection pooling, however there are some circumstances where this is not feasable. In my own opinion, I am not confident that connection pooling works very well after some of the load tests I have done on 2003 server communicating with a SQL backend (in my case I was using a SQL cluster). This may or not have been fixed by the time this is read, if it was an issue at all.
After spending hours testing and researching this issue, we happened on this fix. Since then MS has published this KB article explaining most of what is going on here.
http://support.microsoft.com/kb/328476
Please note, that this may or may not be the best solution for your setup. This type of issue can also be evident if there are other underlying problems such as poorly closed code or quickly opening and closing connections in the code which can lead to high stress on your database servers.
I hope this is useful info, I know it would have been to me had I run across it.
If you read this far, you should follow me on Twitter!
2 comments:
How do you enable connection pooling with OLEDB provider from classic ASP to SQL server 2000, both on win2003 x64 machines?
I'm experiencing your same issue, tryingto fix opening ports, but I would be happier with connection pooling on for performance reasons.. do you think this is useless? And why it could be disabled on such an environment I described? How do I check it?
Thanks mate
Apkmopo.com
FileHippoCnet.com
Canondriverscenter.com
Post a Comment