Identifying Spammers in Outbound Email Systems
A minor thesis submitted in partial fulfilment of the requirements for the degree of Master of Computer Science
Martin François Foster, School of Computer Science and Information Technology, RMIT, Melbourne, Victoria, Australia. January 24, 2011
Declaration
This thesis contains work that has not been submitted previously, in whole or in part, for any other academic award and is solely my original research, except where acknowledged.
This work has been carried out since March 2010, under the supervision of Dr. Ibrahim Khalil.
Martin François Foster, School of Computer Science and Information Technology, RMIT. January 24, 2011
Acknowledgement
I would like to thank my supervisor, Dr. Ibrahim Khalil, for his feedback throughout the process, and particularly for his guidance in choosing a reasonable amount of data and metrics to analyse.
I thank my wife Nicole and daughter Stefanie for travelling to Canada in the final weeks of my write-up process, and our families for hosting them while there. Without this time to focus, this minor thesis would probably never have been completed.
List of Figures
- Figure 1: simple email system
- Figure 2: service provider mail system components and SMTP flows
- Figure 3: mail system components and in/out flows
- Figure 4: an SMTP transaction
- Figure 5: an SMTP transaction seen from system logs
- Figure 6: centralized syslog host
- Figure 7: simple case, one server, one daemon
- Figure 8: single server, multiple applications
- Figure 9: multiple applications, multiple servers
- Figure 10: system under test, Pacnet Internet Outbound e-mail servers
- Figure 11: an event stream with postfix accept, message-id, sender and recipient events
- Figure 12: an Event Query Language (EQL) statement for selecting AMSR events
- Figure 13: reconstructed message
- Figure 14: delivery error rate for sources over 250 messages/week
- Figure 15: delivery error rate to total message volume
List of Tables
Abstract
Large-scale Service Provider email systems have always been targets for exploitation by spammers, because these systems are capable of delivering huge volumes of the spammers' nefarious payload.
Today, most email systems decide whether to accept email from another system based on the sending system's network reputation. Reputation gives Service Providers an incentive to minimize spam originating from their network: allowing too much spam to be sent via their facilities is likely to result in a poor reputation, meaning that other reputable service providers will reject their mail.
Compared to the volume of research on detecting spam in inbound mail flows, relatively little work has been done on identifying spam in outbound mail flows. The research that has been done on outbound flows is difficult to apply to modern mail systems, in that it either discounts their complexity, relies too heavily on simple metrics such as volume, or provides mechanisms suited only to offline analysis, by which time it is too late to act.
Service Provider email system complexity must be accounted for in order to build a complete picture of the paths and transformations that affect messages on their way out of a mail system, which typically involves crossing multiple servers and several different software packages. Old metrics such as message sending volume are likely to penalize the wrong party: with the advent of stringent anti-spam laws, legitimate mailing lists tend to deliver a high volume of email to recipients who want this content, whereas spammers have moved away from sending high volumes from one host over a short period. They now prefer to send low volumes from many hosts over a long period, to achieve higher overall delivery and to avoid or delay detection by service providers.
This thesis presents a novel mechanism to detect spammers in the outbound flows of Service Provider email systems. It accounts for the complexity of such systems, can be used for near-real-time analysis, and uses the foreign destination system's response as a measure of the probability that the sender is a spammer.
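As a rough illustration of using the destination system's response as a signal, the sketch below computes a per-sender delivery error rate from a list of delivery attempts. It is a minimal example under stated assumptions, not the mechanism developed in this thesis: the record format (a sender paired with the SMTP reply code returned by the destination), the function name `delivery_error_rates`, and the `min_messages` cut-off are all hypothetical. The only protocol fact it relies on is that SMTP reply codes in the 5xx range indicate permanent failure.

```python
from collections import defaultdict


def delivery_error_rates(deliveries, min_messages=1):
    """Return {sender: fraction of attempts rejected with a 5xx reply}.

    deliveries   -- iterable of (sender, smtp_reply_code) pairs; this record
                    shape is an assumption made for illustration.
    min_messages -- hypothetical cut-off: senders with fewer attempts than
                    this are left out, since a rate over very few messages
                    says little.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for sender, code in deliveries:
        totals[sender] += 1
        # 5xx replies are permanent failures in SMTP.
        if 500 <= code <= 599:
            errors[sender] += 1
    return {
        sender: errors[sender] / totals[sender]
        for sender in totals
        if totals[sender] >= min_messages
    }


if __name__ == "__main__":
    # Made-up sample data, not taken from the system under test.
    sample = [
        ("a@example.com", 250),
        ("a@example.com", 550),
        ("a@example.com", 550),
        ("a@example.com", 250),
        ("b@example.com", 250),
        ("b@example.com", 250),
    ]
    for sender, rate in sorted(delivery_error_rates(sample).items()):
        print(sender, rate)
```

A high rate here is only a score to rank senders for closer inspection; treating it as a calibrated probability would need the kind of analysis carried out in the body of the thesis.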