Overview
The SpamAssassin system is software for analyzing email messages, determining how likely they are to be spam, and reporting its conclusions. It is a rule-based system that compares different parts of email messages with a large set of rules. Each rule adds or removes points from a message's spam score. A message with a high enough score is reported to be spam.
Many spam-checking systems are available. SpamAssassin has become popular for several reasons:
It uses a large number of different kinds of rules and weights each one according to how well it performs. Rules that have been shown to discriminate more effectively between spam and non-spam email are given higher weights.
It is easy to tune the scores associated with each rule or to add new rules based on regular expressions.
SpamAssassin can adapt to each system's email environment, learning to recognize which senders are to be trusted and to identify new kinds of spam.
It can report spam to several different spam clearinghouses and can be configured to create spam traps—email addresses that are used only to forward spam to a clearinghouse.
SpamAssassin Scoring System
SpamAssassin's approach to filtering spam is more sophisticated than the simple keyword matching provided by most SMTP anti-virus software. SpamAssassin uses a scoring system: a message is tagged as spam only when its spam characteristics add up to a high enough total score. Combined with its other features, this results in very few false positives. In our experience, a properly managed SpamAssassin installation correctly identifies 90% to 95% of spam with a false-positive rate below 1%.
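The scoring idea can be sketched in a few lines of shell. This is an illustration only, not SpamAssassin's rule engine: the rule patterns, their scores, and the function names below are invented for the example, and only the 5.0 threshold mirrors the required_hits value used later in this article.

```shell
#!/bin/sh
# Hypothetical rules, one per line as "score|extended regex".
# These are made up for illustration; real SpamAssassin rules are far richer.
RULES='2.5|viagra
1.8|click here
3.0|100% free
-1.0|unsubscribe'

# Print the total score of the message read from stdin.
score_message() {
    msg=$(cat)
    printf '%s\n' "$RULES" | while IFS='|' read -r score pattern; do
        # Every matching rule contributes its (positive or negative) score.
        if printf '%s\n' "$msg" | grep -Eiq -e "$pattern"; then
            printf '%s\n' "$score"
        fi
    done | awk '{ total += $1 } END { printf "%.1f\n", total + 0 }'
}

# Exit 0 when score $1 reaches the threshold $2 (default 5.0).
is_spam() {
    awk -v s="$1" -v t="${2:-5.0}" 'BEGIN { exit !(s >= t) }'
}
```

A message is then classified with something like score=$(score_message < message.eml) followed by is_spam "$score". No single rule decides the outcome; only the sum does, which is what keeps one unlucky keyword from condemning a legitimate message.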
SpamAssassin doesn't block spam. Instead, it tags messages as probable spam by changing the Subject line and message headers. This is a sensible design: no automated system can recognize spam with 100% certainty, because deciding «what is spam» is a judgment call. All automated spam filters produce some false positives (wanted e-mail mistakenly tagged as spam) and false negatives (spam not identified as such).
SpamAssassin identifies probable spam e-mail, but leaves the choice of what to do with it up to you. You can instruct the users how to add rules to their e-mail software to delete identified messages or, better yet, move them to a folder for later review.
The method shown here only tags suspected spam. No automated deletion is performed, but the filter script can be changed without too much effort to sideline or delete suspected spam if that's what you want.
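As a rough sketch of what "sidelining" could look like, the following shell functions decide where a message should be filed based on the Subject tag. It assumes the tag is the [*****SPAM*****] string set with rewrite_header in the local.cf shown later in this article; the function names and folder names are placeholders, and the header parsing is deliberately naive (Unix line endings, unfolded Subject header).

```shell
#!/bin/sh
# Exit 0 if the message on stdin has the spam tag at the start of its
# Subject header. The tag is assumed to match rewrite_header in local.cf.
is_tagged() {
    # Look only at the header section, which ends at the first blank line.
    sed '/^$/q' | grep -iq '^Subject: \[\*\*\*\*\*SPAM\*\*\*\*\*\]'
}

# Print the folder a message should be filed in; the names are placeholders.
target_folder() {
    if is_tagged; then
        echo "spam-review"
    else
        echo "inbox"
    fi
}
```

Filing tagged mail in a review folder rather than deleting it keeps false positives recoverable.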
Installing SpamAssassin
SpamAssassin is built from several Perl modules. We recommend installing it manually as described below. The SpamAssassin documentation describes how to install and configure the software. We installed each module with make, make test and make install.
Download the following modules from http://www.cpan.org/. If you are looking for a single module, http://search.cpan.org/ can be very helpful.
Digest-1.15.tar.gz
Digest-HMAC-1.01.tar.gz
Digest-SHA1-2.11.tar.gz
Digest-SHA-5.44.tar.gz
Archive-Tar-1.30.tar.gz
Crypt-OpenSSL-RSA-0.24.tar.gz
DB_File-1.815.tar.gz
Error-0.17008.tar.gz
Geography-Countries-1.4.tar.gz
HTML-Parser-3.56.tar.gz
IO-Zlib-1.05.tar.gz
IP-Country-2.23.tar.gz
libnet-1.20.tar.gz
libwww-perl-5.805.tar.gz
Mail-DKIM-0.24.tar.gz
Mail-DomainKeys-1.0.tar.gz
Mail-SPF-Query-1.999.1.tar.gz
MailTools-1.76.tar.gz
MIME-Base64-3.07.tar.gz
Net-CIDR-Lite-0.20.tar.gz
Net-DNS-0.59.tar.gz
Net-Ident-1.20.tar.gz
Storable-2.16.tar.gz
Sys-Hostname-Long-1.4.tar.gz
Text-Diff-0.35.tar.gz
Time-HiRes-1.9707.tar.gz
DBI-1.54.tar.gz
DBD-mysql-4.004.tar.gz
Mail-SpamAssassin-3.1.8.tar.gz
Install the modules as follows:
tar xzvf <file-from-above>
cd <extracted module>
perl Makefile.PL
make
make test
make install
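With this many tarballs, the same steps can be wrapped in a small helper. The sketch below is an assumption-laden convenience, not part of the SpamAssassin distribution: the function names are made up, and it relies on the usual CPAN convention that Name-1.23.tar.gz unpacks into a directory called Name-1.23.

```shell
#!/bin/sh
# Directory a CPAN tarball unpacks into, assuming the usual convention
# that Name-1.23.tar.gz extracts to Name-1.23/.
module_dir() {
    basename "$1" .tar.gz
}

# Unpack, build, test and install one tarball; stop at the first failing step.
install_module() {
    tar xzf "$1" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$(module_dir "$1")" && perl Makefile.PL && make && make test && make install )
}
```

Run as root in the download directory, a loop such as for t in Digest-1.15.tar.gz Digest-HMAC-1.01.tar.gz; do install_module "$t" || break; done installs the modules in order and stops at the first failure. Install prerequisites before the modules that depend on them, with Mail-SpamAssassin last. Some Makefile.PL scripts may ask questions interactively, so watch the first run.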
Configure SpamAssassin
SpamAssassin is installed «out of the box» in /usr/share/spamassassin with a good set of spam identification rules. You can specify your own settings in file /etc/mail/spamassassin/local.cf.
We recommend making some changes to local.cf right away. In particular, whitelist the e-mail addresses of well-known legitimate senders so that the default SpamAssassin rules do not misidentify their mail as spam. Add a «whitelist_from» setting to /etc/mail/spamassassin/local.cf for each important client, mailing list and other known spam-free sender.
# How many hits before a message is considered spam.
required_hits 5.0
# Text to prepend to the Subject header of messages tagged as spam
rewrite_header Subject [*****SPAM*****]
# Encapsulate spam in an attachment
report_safe 1
# Enable the Bayes system
use_bayes 1
# Enable Bayes auto-learning
bayes_auto_learn 1
bayes_path /home/spamd/
bayes_file_mode 0666
# Enable or disable network checks
skip_rbl_checks 0
use_razor2 0
use_dcc 0
use_pyzor 0
# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
# ok_languages all
# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
# ok_locales all
# Whitelist important senders
whitelist_from *@xyz.xx
Check your local.cf configuration parameters with:
# spamassassin --lint
# spamassassin --lint -D
Configure Postfix
Some spam checks can be configured in Postfix, in SpamAssassin, or in both, but for performance reasons we recommend not running the same check in both. In particular, all RBL lookups should be deactivated in the Postfix configuration file main.cf:
strict_rfc821_envelopes = yes
disable_vrfy_command = yes
smtpd_helo_required = yes
smtpd_client_restrictions =
smtpd_helo_restrictions =
smtpd_sender_restrictions =
smtpd_recipient_restrictions =
    permit_mynetworks,
    permit_sasl_authenticated,
    reject_unauth_destination,
    reject_invalid_hostname,
    reject_unauth_pipelining,
    reject_non_fqdn_sender,
    reject_unknown_sender_domain,
    reject_non_fqdn_recipient,
    reject_unknown_recipient_domain,
    check_client_access hash:$config_directory/access_client,
    check_sender_access hash:$config_directory/access_sender,
    permit
# --------------------------------
# Deactivated, done in SpamAssassin
# --------------------------------
# reject_rhsbl_client blackhole.securitysage.com,
# reject_rhsbl_sender blackhole.securitysage.com,
# reject_rbl_client relays.ordb.org,
# reject_rbl_client blackholes.easynet.nl,
# reject_rbl_client cbl.abuseat.org,
# reject_rbl_client proxies.blackholes.wirehub.net,
# reject_rbl_client bl.spamcop.net,
# reject_rbl_client sbl.spamhaus.org,
# reject_rbl_client opm.blitzed.org,
# reject_rbl_client dnsbl.njabl.org,
# reject_rbl_client list.dsbl.org,
# reject_rbl_client multihop.dsbl.org,
# --------------------------------
# Deactivated, done in SpamAssassin
# --------------------------------
# Check Message Header and Body
# body_checks = regexp:$config_directory/body_checks
# header_checks = regexp:$config_directory/header_checks
Configure the SpamAssassin Daemon (spamd/spamc)
The spamc/spamd pair provides a daemonized version of the SpamAssassin executable. The goal is to improve throughput for automated mail checking. Here is a brief synopsis of how spamc and spamd work, and how to use them effectively.
The Server: spamd
spamd is the workhorse of the spamc/spamd pair: it loads an instance of the SpamAssassin filters, and then listens as a daemon for incoming requests to process messages. By default, spamd listens on port 783, but a different port can be specified on the command line.
When spamd receives a connection, it spawns a child to handle the request. The child expects to read an email message from the network socket, which should then be closed for writing on the other end (so spamd receives an EOF). spamd then uses SpamAssassin to rewrite the message and writes the processed message back to the socket before closing the connection. The child process then exits.
In theory, this child-forking should be quite efficient, since on most operating systems the fork does not actually copy any memory until the child attempts to write to a memory page, and then only the dirty pages are copied. This means the entire Perl engine, the SpamAssassin regular expressions and so on are loaded only once and then reused by all the children, saving a lot of overhead.
The Client: spamc
spamc is the client half of the pair. It should be used in place of «spamassassin» in scripts to process mail. It will read the mail from stdin, and spool it to its connection to spamd, then read the result back and print it to stdout. spamc has extremely low overhead in loading, so it should be much faster to load than the whole SpamAssassin program (and a perl VM).
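Because spamc reads a message on stdin and writes the result to stdout, it drops into a script like any other filter. A minimal sketch, assuming spamd is already running locally with default settings; the function name and the SPAMC variable are made up for illustration.

```shell
#!/bin/sh
# Client command; spamc by default. Overridable so the function can be
# exercised without a running spamd (for example SPAMC=cat).
SPAMC=${SPAMC:-spamc}

# Send the message in file $1 through the client and write the result to $2.
check_file() {
    "$SPAMC" < "$1" > "$2"
}
```

For example, check_file message.eml checked.eml leaves the processed message, including any added headers, in checked.eml.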
Running spamd as a non-root user
Many system administrators are uncomfortable running spamd as root. A bug in spamd could provide an attacker with root privileges; a local attacker could also spoof spamc and claim to be a different user (which can be mitigated with the --auth-ident option discussed later).
To provide additional security, spamd can be instructed to run as a non-root user. After binding its TCP port or Unix socket, spamd gives up root privileges and runs as the specified user. Ideally, you should create a new user (for example, «spamd») with its own group «spamd» and a private home directory (/home/spamd). If spamd is using a Unix domain socket, the socket will automatically have its owner set to the new user, so no changes to this path are necessary, but the directory in which the socket will be created must be writable by the user.
groupadd -g 501 spamd
useradd -u 501 -g 501 -s /sbin/nologin -d /home/spamd spamd
If you plan to use Bayesian classification (the BAYES rules) with spamd, you will need to modify /etc/mail/spamassassin/local.cf to use a shared database of tokens by setting «bayes_path» to a path that all users can read and write. You will also need to set «bayes_file_mode» to 0666 so that newly created files are shared, too.
# Enable Bayes auto-learning
bayes_auto_learn 1
bayes_path /home/spamd/
bayes_file_mode 0666
After creating your new user, start spamd like this, as root:
/usr/bin/spamd --daemonize --username spamd --pidfile /home/spamd/spamd.pid
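To confirm that the daemon actually came up, the pid file written by the command above can be checked. This is a small sketch with a made-up function name; it only tests that the recorded process exists, not that it is answering requests.

```shell
#!/bin/sh
# Exit 0 if the process recorded in pid file $1 is alive.
# kill -0 sends no signal; it only tests whether the process can be
# signalled, so run this as root or as the spamd user.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

For example, pid_alive /home/spamd/spamd.pid && echo "spamd is running".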
Integrating SpamAssassin with Postfix
Postfix is a mail transport agent written by security researcher Wietse Venema. Not surprisingly, Postfix is designed from the ground up to be a highly secure system. It consists of several components, each of which runs with the least privilege it needs and none of which trusts data from another without validating it. Despite the extensive security emphasis in the system's architecture, Postfix is capable of very good performance in normal conditions; because of architectural decisions, it is also fault tolerant and capable of good performance under adverse conditions such as resource starvation. It has become a popular replacement for sendmail because it provides a compatible command-line interface. This article does not explain how to install and set up Postfix; see the Postfix documentation for that.
This article explains how to integrate SpamAssassin into a Postfix-based mail server to perform spam-checking on a mail gateway.