Greylisting Proposal [as originally sent to the ISF in June 2003]

A paper was recently released on a new SPAM abatement methodology that relies strictly on the nature of the SMTP protocol itself to provide relief. For all of the details, please see the original paper:
In this note, I am going to briefly outline the nature of the methodology, how it works, what it would mean for TAMU, and what I have observed in my testing.

First, one of the main methods of SPAMming is that a SPAM site has a database and a small machine somewhere. This machine just reads addresses from the database, connects to the destination host for each address, spews the message while ignoring the SMTP result codes, and disconnects. The main point of that scenario is that the spammer's machine never actually performs the queuing and retrying that a "real" mail server should do; that is one of the ways they keep their costs down and put the burden of the e-mail solely on the recipient machine (i.e., TAMU hosts and servers).

For our part, that means they spew thousands of messages at our site (particularly at smtp-relay.tamu.edu), and all of these messages appear to be valid from our viewpoint: the sender addresses "look" good in that we can resolve the hostnames used on the right of the "@", and the recipients are in the tamu.edu domain (or one of several others that are handled on-campus), such that we should accept responsibility for delivering them.

Unfortunately, the databases used by the spammers have been accreting for over a decade and there are many completely bogus addresses in them. Many of those addresses are destined for hosts that were valid at one time but are no longer connected to the network, even though they are still in DNS (since neither the previous host owner [who should have], nor CIS networking [who would have to do a lot of owner contacting, etc. to do so] has cleaned up those ancient hostnames that were previously valid for receiving mail). With this deluge of bogus messages, our disk space is consumed queuing them until each message times out, at which point we bounce it back to the (usually bogus) sender.
Since the sender is bogus, the bounces "double-bounce", or time out if the sender was faked with yet another no-longer-deliverable hostname. Several problems are caused by this:
This brings us to the new methodology presented in the paper above. The main tenet of the proposal is to move the burden and expense of the queuing to the sending system temporarily, to make sure it is a real mail system, after which everything happens as it does now. [As you read this, note that local hosts will be pre-white-listed so that they never see a delay. Also, the newest version causes out-going mail to be pre-white-listed, so that a reply from the off-campus responder is accepted immediately.]

The way the remote site is forced to queue the message temporarily is as follows. When a new message is offered by a remote host, our host uses a database to note the sending IP and the sender and recipient e-mail addresses. Our side answers the attempt to send with a 4xx-level "TMPFAIL" answer that says "I should be able to accept this message, but I can't do it right now; please try again later", as defined in the SMTP protocol. The remote site will queue the message and try to re-send it later. At least an hour later (by default), when the same host using the same e-mail addresses tries to re-send the message, the message will be accepted just as it is currently, and that triplet of information is marked in the database so that future e-mail will pass through with no delay. Only the first time an IP/sender/recipient triplet is seen will there be a short delay; after that there will be none.

SPAM sites, on the other hand, do not do queuing, so when we say TMPFAIL, they don't care; they are just spewing the message and ignoring our answers. Since they don't queue, they won't re-send the message, and the intended recipient will never get their SPAM.

Obviously there are a few ways that the senders of SPAM can respond:
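To make the accept/defer decision described above concrete, here is a minimal Python sketch of the triplet check. It is not the implementation from the paper. The class name `Greylist`, the in-memory dictionary standing in for the database, the `10.0.0.0/8` placeholder for local hosts, the injectable `clock`, and the exact 451/250 reply texts are all assumptions made for illustration; only the IP/sender/recipient triplet, the 4xx temporary failure, the one-hour default delay, and the local white-list come from the description.

```python
import ipaddress
import time


class Greylist:
    """Illustrative in-memory greylist keyed on (IP, sender, recipient)."""

    def __init__(self, min_delay=3600, local_networks=("10.0.0.0/8",),
                 clock=time.time):
        self.min_delay = min_delay          # seconds before a retry is accepted
        # Placeholder networks; a real site would list its own address space.
        self.local_networks = [ipaddress.ip_network(n) for n in local_networks]
        self.clock = clock                  # injectable so the delay can be tested
        self.first_seen = {}                # triplet -> time of first attempt
        self.passed = set()                 # triplets that have proven they retry

    def is_local(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.local_networks)

    def check(self, ip, sender, recipient):
        """Return an (SMTP code, text) pair for one delivery attempt."""
        if self.is_local(ip):
            return 250, "OK (local host, pre-white-listed)"
        triplet = (ip, sender, recipient)
        if triplet in self.passed:
            return 250, "OK"                # known triplet: no delay
        now = self.clock()
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            self.passed.add(triplet)        # sender queued and retried
            return 250, "OK"
        # First attempt, or a retry that came too soon: ask the sender to queue.
        return 451, "Temporary failure, please try again later"
```

With a fake clock, a first attempt from an off-site host is deferred, a retry ten minutes later is still deferred, and a retry an hour or more after the first attempt is accepted and remembered. A sender that never retries simply never reaches the accepting branch.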
Regarding "real" mail, there is one obvious and one discovered issue:
I have been running this particular implementation on my personal mail server since this last weekend, and one of the local ISPs has been experimenting with it as well. So far our SPAM levels have dropped by more than an order of magnitude (i.e., I am down from over 90 SPAM messages each day in my personal mailbox to fewer than 5), all the while I am receiving all of my personal and mailing-list traffic as before. This is by far the best solution I have implemented, and I think it is possible to configure the TAMU SMTP systems to gain all of the benefits. I am just beside myself with how well it is working in practice. This would be in addition to the existing procedures already in place to block viruses and protocol errors; tagging of anything that does make it through would continue unaltered.