Proto Balance Mail: Public Release - 15 February 2010
The cost of deploying an enterprise mail cluster just dropped to 1/10th
A new technique for SMTP load balancing gives economy to large-scale SMTP clusters.
After many months of finishing touches, Proto Balance Mail is released
with much excitement and expectation. Proto Balance Mail is
categorically the most effective and economical mail clustering
solution for enterprises hosting 1,000, 10,000, 100,000 or even
1,000,000 mailboxes. Previous headaches managing and maintaining
primitive "DNS load balancing" SMTP clusters are a thing of the past.
Cumbersome and expensive shared file-systems can be done away with.
Proto Balance Mail not only scales virtually without limit, but also
blocks spam more effectively than any other standalone mail system.
Botnet attacks, malware, and any other form of mail abuse are monitored
and automatically blocked using on-the-fly statistical analysis.
Proto Balance Mail has an XML interface (SOA = Service Oriented
Architecture) to seamlessly integrate with your infrastructure.
We are here to rescue you: Proto Balance Mail cures the problem of mail.
Application Note - 29 March 2009
TCP FIN-WAIT and TIME-WAIT Timeout - What Setting Should I use?
Summary: there is no magic bullet. Timeouts require forethought.
TCP has many configurable timeouts, whose appropriate values depend on
whether you are dealing with LAN-speed or Internet-speed packet
latencies. These timeouts can cause
ports to be marked in-use for periods much longer than the time the
connection is held open. A load balancer must make many connections
between the same machines: this means that ports can rapidly be used up
in high-load conditions. To solve this problem, it is tempting to
reduce timeout values (such as TIME-WAIT and FIN-WAIT) to very low settings.
Be aware, however, of the case where a web server serves a web page but a
client takes a long time to read and process that page. If the client
takes more than the FIN-WAIT timeout to process the page, the web server
could drop the connection while the client application has not yet read
the final bytes of data. There is a real possibility that the web server
could expire the connection by sending a RST (reset) packet thus
prematurely closing the connection.
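The port-exhaustion arithmetic behind this trade-off is worth making concrete. The sketch below is a back-of-the-envelope estimate only; the ephemeral port range and the 60-second TIME-WAIT are typical Linux defaults used here as illustrative assumptions, not figures from this article.

```python
# Estimate how quickly a load balancer can exhaust its ephemeral ports
# against a single backend address. Every closed connection parks its
# local port in TIME-WAIT, so ports recycle no faster than
# (usable ports) / (TIME-WAIT seconds).

def max_sustainable_rate(port_low, port_high, time_wait_seconds):
    """Connections per second before every ephemeral port to one
    backend sits in TIME-WAIT."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds

# Assumed defaults: net.ipv4.ip_local_port_range = 32768-61000,
# TIME-WAIT of 60 seconds.
rate = max_sustainable_rate(32768, 61000, 60)
print(int(rate))  # 470 connections/second to one backend
```

Halving the TIME-WAIT timeout doubles this ceiling, which is exactly why aggressive tuning is tempting under load.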
When serving non-critical content, the occasional lost data will not
appreciably impact a user's experience. Throughput may be the most
important factor, and TIME-WAIT and FIN-WAIT can be set to low values. However, for
business-critical applications a single lost connection may not be
tolerable. Here are some things to note:
- A "high" connection rate means a rate of more than 100 new connections
per second. If your rate is lower than this, you should keep your operating system
default settings for TCP timeouts.
- The FIN-WAIT timeout must be set to a value higher than the longest any client
might take to finish reading all the web server data.
- The TIME-WAIT timeout must be set to a value longer than any client might take to
complete a closing negotiation. This can be much lower than FIN-WAIT without
fear of losing a transaction.
- HTTP is not a completely reliable transport. Web Service (SOA) implementations must
retry failed calls on a geometric schedule, reusing a unique transaction ID so
the server can detect duplicate requests.
- HTTP/1.1 keeps the connection open. Fewer new connections are needed,
allowing you to set your timeout values high (120 seconds and up)
without fear of consuming all your ports.
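The geometric-retry-with-transaction-ID pattern from the notes above can be sketched as follows. This is a minimal illustration, not part of any real SOA toolkit: `send`, `payload`, and the parameter names are hypothetical placeholders for your own transport call.

```python
import time
import uuid

def call_with_geometric_retry(send, payload, attempts=5,
                              base_delay=0.5, factor=2.0):
    """Retry `send` on a geometric schedule, reusing one transaction ID.

    The same transaction ID accompanies every attempt, so the server
    can discard duplicates of a request that arrived but whose
    response was lost in transit.
    """
    txn_id = str(uuid.uuid4())        # one ID for the whole logical transaction
    delay = base_delay
    for attempt in range(attempts):
        try:
            return send(payload, txn_id)
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise                 # give up after the final attempt
            time.sleep(delay)
            delay *= factor           # geometric back-off: 0.5s, 1s, 2s, 4s, ...
```

The geometric schedule spaces retries further and further apart, so a transient outage is not hammered with requests while it recovers.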
Application Note - 5 March 2009
IP Connection Tracking Problem - Linux Kernel
Summary: disable the Linux connection tracking module on your installations.
Linux distributions ship with an IP connection tracking module, and
some distributions enable it by default. This module
includes an internal connection table that tracks a maximum of
16384-32768 concurrent connections (depending on available memory). The
default timeout after a connection is released is 120 seconds. Once the
connection table is exhausted, packets are silently dropped. Therefore
the maximum sustainable connection rate is only about 136 to 273
connections per second (table size divided by the 120-second timeout).
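The quoted rate limits follow directly from dividing the connection-table size by the 120-second entry timeout, as this quick check shows:

```python
# Sustainable connection rate through the tracking table: once the
# table is full, new connections can only be admitted as fast as old
# entries expire, i.e. table_size / entry_timeout per second.

def conntrack_sustainable_rate(table_size, entry_timeout=120.0):
    """New connections per second before the tracking table fills."""
    return table_size / entry_timeout

print(int(conntrack_sustainable_rate(16384)))   # 136 connections/second
print(int(conntrack_sustainable_rate(32768)))   # 273 connections/second
```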
If you are not doing NAT (network address translation) or NAT-MASQ
(IP masquerading), you can safely remove this module and reboot your
machine. You are advised to back up your system before doing this and
to seek expert consultation.
You can increase the table size by recompiling your Linux kernel.
Please contact us for instructions.
Feature Article - 2 February 2009
Why Not Use a Free Load Balancer?
Summary: free load balancers have a hidden bite.
There are many load balancer applications written and distributed
for free that appear as a simple solution to a simple problem.
However, achieving high performance and scalability is more
challenging than free-software authors care to admit.
The worst performing free load balancers create a separate process
for each new connection. No operating system functions well with
thousands of concurrent processes. The load balancer will
appear to work fine until there is a sudden spike in connection
rate - the flood of process spawning will completely deny service.
Other load balancers create a single thread per connection. Thousands of
threads, though friendlier to the OS, are still liable to deny service.
More subtle are load balancers that do blocking connects. The load
balancer appears to work perfectly well until there is a network failure
to one of the backend machines. Because the connect is blocking,
all traffic through the load balancer halts until the connect
times out. The timeout can only be reproduced with a hardware failure or
power outage, so it goes unnoticed until it happens in your production