Overview

The SpamAssassin system is software for analyzing email messages, determining how likely they are to be spam, and reporting its conclusions. It is a rule-based system that compares different parts of email messages with a large set of rules. Each rule adds or removes points from a message's spam score. A message with a high enough score is reported to be spam.

Many spam-checking systems are available. SpamAssassin has become popular for several reasons:

It uses a large number of different kinds of rules and weights them according to their effectiveness. Rules that have been demonstrated to be more effective at discriminating spam from non-spam email are given higher weightings.

It is easy to tune the scores associated with each rule or to add new rules based on regular expressions.

SpamAssassin can adapt to each system's email environment, learning to recognize which senders are to be trusted and to identify new kinds of spam.

It can report spam to several different spam clearinghouses and can be configured to create spam traps—email addresses that are used only to forward spam to a clearinghouse.
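
The tuning described above can be sketched as follows. This is a minimal example, not part of the original setup: the rule name LOCAL_DEMO_PHRASE, the phrase it matches, both score values and the file name local_demo.cf are assumptions made for illustration. The sketch writes the rules to a temporary directory; on a real system they would normally live next to local.cf in /etc/mail/spamassassin/.

```shell
#!/bin/sh
# Hypothetical example: write a small custom rule file and lint it.
RULE_DIR="${RULE_DIR:-$(mktemp -d)}"
RULE_FILE="$RULE_DIR/local_demo.cf"

cat > "$RULE_FILE" <<'EOF'
# A new rule based on a regular expression that is matched against the body.
body     LOCAL_DEMO_PHRASE  /\bcheap replica watches\b/i
describe LOCAL_DEMO_PHRASE  Body mentions cheap replica watches
score    LOCAL_DEMO_PHRASE  2.5

# Tune the score of an existing rule (the value is chosen arbitrarily).
score    HTML_MESSAGE       0.5
EOF

# Lint the file only if SpamAssassin is installed on this machine.
if command -v spamassassin >/dev/null 2>&1; then
    spamassassin --lint --siteconfigpath="$RULE_DIR" \
        && echo "rule file parsed cleanly" \
        || echo "lint reported problems in $RULE_FILE"
else
    echo "spamassassin not found; rules written to $RULE_FILE"
fi
```

Keeping local rules in a separate file makes it easy to remove them again if they turn out to cause false positives.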

SpamAssassin Scoring System

SpamAssassin's approach to filtering spam is more sophisticated than the simple keyword matching provided by most SMTP anti-virus software. SpamAssassin uses a scoring system: messages are tagged as spam only when they have enough spam characteristics in total. In combination with other features, this results in very few false positives. In our experience, a properly managed SpamAssassin installation correctly identifies 90% to 95% of spam with less than 1% false positives.
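
The idea behind the scoring system can be illustrated with a few lines of shell. This is only a sketch of the principle, not how SpamAssassin is implemented: the function name classify_by_score and the rule names and scores in the sample input are invented for the example.

```shell
#!/bin/sh
# Illustrative only: sum per-rule scores and compare the total with a threshold.
# Input: lines of "RULE_NAME score" on standard input.
# $1: threshold (defaults to 5.0, the value used in local.cf).
classify_by_score() {
    awk -v threshold="${1:-5.0}" '
        NF >= 2 { total += $2 }
        END {
            verdict = (total >= threshold) ? "spam" : "ham"
            printf "%s %.1f/%.1f\n", verdict, total, threshold
        }'
}

# Two rules add points, one (for example a whitelist hit) removes points.
printf '%s\n' 'DEMO_RULE_A 3.2' 'DEMO_RULE_B 2.4' 'DEMO_WHITELIST -1.0' \
    | classify_by_score 5.0
```

No single rule decides the outcome here: the message stays below the threshold because the negative score offsets part of the positive ones.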

SpamAssassin doesn't block spam. Instead, it tags messages as probable spam by changing the Subject line and message headers. This is a sensible design: no automated system can recognize spam with 100% certainty, because deciding «what is spam» is a judgment call. All automated spam filters will produce some false positives (wanted e-mail mistakenly tagged as spam) and false negatives (spam not identified as such).

SpamAssassin identifies probable spam e-mail, but leaves the choice of what to do with it up to you. You can instruct the users how to add rules to their e-mail software to delete identified messages or, better yet, move them to a folder for later review.

The method shown here only tags suspected spam. No automated deletion is performed, but the filter script can be changed without too much effort to sideline or delete suspected spam if that's what you want.
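
The client-side sorting mentioned above normally happens in the user's mail program. As a rough sketch of the same decision in shell, the functions below look for the X-Spam-Flag header that SpamAssassin adds to tagged messages and move the file to a review folder. The function names and the directory layout are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: decide where a stored message belongs, based on its spam tag.

is_tagged_spam() {
    # $1: message file. Only the header block is inspected, that is,
    # everything up to the first empty line.
    sed '/^$/q' "$1" | grep -qi '^X-Spam-Flag: *YES'
}

sort_message() {
    # $1: message file, $2: inbox directory, $3: review directory
    if is_tagged_spam "$1"; then
        mv "$1" "$3/"
    else
        mv "$1" "$2/"
    fi
}

# Usage (hypothetical paths):
#   sort_message /tmp/incoming.eml "$HOME/Mail/inbox" "$HOME/Mail/review"
```

Moving tagged mail to a review folder rather than deleting it keeps false positives recoverable.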

Installing SpamAssassin

SpamAssassin is built from several Perl modules. We recommend installing it manually as described below. The SpamAssassin documentation describes how to install and configure the software. We installed each module with make, make test and make install.

Download the following modules from http://www.cpan.org/. If you are looking for a single module, http://search.cpan.org/ can be very helpful.

Digest-1.15.tar.gz
Digest-HMAC-1.01.tar.gz
Digest-SHA1-2.11.tar.gz
Digest-SHA-5.44.tar.gz
Archive-Tar-1.30.tar.gz
Crypt-OpenSSL-RSA-0.24.tar.gz
DB_File-1.815.tar.gz
Error-0.17008.tar.gz
Geography-Countries-1.4.tar.gz
HTML-Parser-3.56.tar.gz
IO-Zlib-1.05.tar.gz
IP-Country-2.23.tar.gz
libnet-1.20.tar.gz
libwww-perl-5.805.tar.gz
Mail-DKIM-0.24.tar.gz
Mail-DomainKeys-1.0.tar.gz
Mail-SPF-Query-1.999.1.tar.gz
MailTools-1.76.tar.gz
MIME-Base64-3.07.tar.gz
Net-CIDR-Lite-0.20.tar.gz
Net-DNS-0.59.tar.gz
Net-Ident-1.20.tar.gz
Storable-2.16.tar.gz
Sys-Hostname-Long-1.4.tar.gz
Text-Diff-0.35.tar.gz
Time-HiRes-1.9707.tar.gz
DBI-1.54.tar.gz
DBD-mysql-4.004.tar.gz
Mail-SpamAssassin-3.1.8.tar.gz

Install the modules as follows:

tar xzvf <file-from-above>
cd <extracted module>
perl Makefile.PL
make
make test
make install
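
With this many modules, the steps above can be wrapped in a small loop. The following is a sketch under a few assumptions: the archives are in the current directory, each archive unpacks into a directory named after the file, and the DRY_RUN switch (a convention of this example, not of any tool) defaults to printing the commands instead of running them. Only the first two archives are listed; the remaining ones would follow in the order shown above, with Mail-SpamAssassin last.

```shell
#!/bin/sh
# Sketch: build and install module archives one after another.
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to really build and install

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

install_module() {
    archive=$1
    dir=$(basename "$archive" .tar.gz)   # assumed name of the unpacked directory
    run tar xzf "$archive" &&
    (
        # Enter the source directory only when really building.
        [ "$DRY_RUN" = 1 ] || cd "$dir" || exit 1
        run perl Makefile.PL && run make && run make test && run make install
    )
}

# Stop at the first failure so that a broken dependency is noticed early.
for archive in Digest-1.15.tar.gz Digest-HMAC-1.01.tar.gz; do
    install_module "$archive" || { echo "failed: $archive" >&2; break; }
done
```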

Configure SpamAssassin

SpamAssassin is installed «out of the box» in /usr/share/spamassassin with a good set of spam identification rules. You can specify your own settings in file /etc/mail/spamassassin/local.cf.

We recommend making some changes to local.cf right away. In particular, whitelist the e-mail addresses of well-known legitimate senders so that the SpamAssassin default rules never misidentify their mail as spam. Add «whitelist_from» settings to /etc/mail/spamassassin/local.cf for each important client, mailing list and other known spam-free sender.

# How many hits before a message is considered spam.
required_hits           5.0

# Text to prepend to the Subject header of messages tagged as spam
rewrite_header Subject [*****SPAM*****]

# Encapsulate spam in an attachment
report_safe             1

# Enable the Bayes system
use_bayes               1

# Enable Bayes auto-learning
bayes_auto_learn        1
bayes_path              /home/spamd/
bayes_file_mode         0666

# Enable or disable network checks
skip_rbl_checks         0
use_razor2              0
use_dcc                 0
use_pyzor               0

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
# ok_languages            all

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
# ok_locales              all

# Whitelist important senders
whitelist_from          *@xyz.xx

Check your local.cf configuration parameters with:

# spamassassin --lint
# spamassassin --lint -D

Configure Postfix

Some spam checks can be configured in Postfix, in SpamAssassin, or in both, but for performance reasons we recommend not running the same check in both places. In particular, all RBL lookups should be deactivated in the Postfix configuration file main.cf:

strict_rfc821_envelopes = yes
disable_vrfy_command = yes
smtpd_helo_required = yes
smtpd_client_restrictions =
smtpd_helo_restrictions =
smtpd_sender_restrictions =

smtpd_recipient_restrictions =
    permit_mynetworks,
    permit_sasl_authenticated,
    reject_unauth_destination,
    reject_invalid_hostname,
    reject_unauth_pipelining,
    reject_non_fqdn_sender,
    reject_unknown_sender_domain,
    reject_non_fqdn_recipient,
    reject_unknown_recipient_domain,
    check_client_access hash:$config_directory/access_client,
    check_sender_access hash:$config_directory/access_sender,
    permit

# --------------------------------
# Deactivated, done in SpamAssassin
# --------------------------------
#    reject_rhsbl_client blackhole.securitysage.com,
#    reject_rhsbl_sender blackhole.securitysage.com,
#    reject_rbl_client relays.ordb.org,
#    reject_rbl_client blackholes.easynet.nl,
#    reject_rbl_client cbl.abuseat.org,
#    reject_rbl_client proxies.blackholes.wirehub.net,
#    reject_rbl_client bl.spamcop.net,
#    reject_rbl_client sbl.spamhaus.org,
#    reject_rbl_client opm.blitzed.org,
#    reject_rbl_client dnsbl.njabl.org,
#    reject_rbl_client list.dsbl.org,
#    reject_rbl_client multihop.dsbl.org,

# --------------------------------
# Deactivated, done in SpamAssassin
# --------------------------------
# Check Message Header and Body
# body_checks = regexp:$config_directory/body_checks
# header_checks = regexp:$config_directory/header_checks

Configure the SpamAssassin Daemon (spamd/spamc)

The spamc/spamd pair provides a daemonized version of the SpamAssassin executable. The goal is to improve throughput for automated mail checking. Here is a brief synopsis of how spamc/spamd work, and how to use them effectively.

The Server: spamd

spamd is the workhorse of the spamc/spamd pair: it loads an instance of the SpamAssassin filters, and then listens as a daemon for incoming requests to process messages. By default, spamd listens on port 783, but this can be changed on the command line.

When spamd receives a connection, it spawns a child to handle the request. The child will expect to read an email message from the network socket, which should then be closed for writing on the other end (so spamd receives an EOF). spamd will then use SA to rewrite the message, and dump the processed message back to the socket before closing the connection. The child process then dies.

In theory, this child-forking should be quite efficient, since on most OSes the fork will not actually copy any memory until the child attempts to write to a memory page, and then only the dirty page(s) will be copied. This means the entire perl engine and the SA regular expressions, etc. will only be loaded once and then be reused by all the children, saving a lot of overhead.

The Client: spamc

spamc is the client half of the pair. It should be used in place of «spamassassin» in scripts to process mail. It will read the mail from stdin, and spool it to its connection to spamd, then read the result back and print it to stdout. spamc has extremely low overhead in loading, so it should be much faster to load than the whole SpamAssassin program (and a perl VM).
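
A quick way to try the pair is to send a test message through spamc. The sketch below builds a message containing the GTUBE test string, which SpamAssassin is designed to flag, and asks spamd for a verdict if spamc is installed. The addresses and the temporary file name are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: pass a GTUBE test message to spamd via spamc.
MSG="${TMPDIR:-/tmp}/gtube-test.$$"

cat > "$MSG" <<'EOF'
From: sender@example.com
To: recipient@example.com
Subject: GTUBE test

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
EOF

if command -v spamc >/dev/null 2>&1; then
    # -c prints "score/threshold" and exits with status 1 when the message
    # is spam; status 0 means not spam or that processing failed.
    if spamc -c < "$MSG"; then
        echo "not flagged (or spamd unreachable)"
    else
        echo "flagged as spam"
    fi
else
    echo "spamc not installed; test message left in $MSG"
fi
```

Because spamc reports status 0 when it cannot reach spamd, a «not flagged» result for this message usually means the daemon is not running.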

Running spamd as a non-root user

Many system administrators are uncomfortable running spamd as root. A bug in spamd could provide an attacker with root privileges; a local attacker could also spoof spamc and claim to be a different user (which can be ameliorated with the --auth-ident option discussed later).

To provide additional security, spamd can be instructed to run as a non-root user. After binding its TCP port or Unix socket, spamd gives up root privileges and runs as the specified user. Ideally, you should create a new user (e.g., «spamd») with its own group «spamd» and a private home directory (/home/spamd). If spamd is using a Unix domain socket, the socket will automatically have its owner set to the new user, so no changes to this path are necessary, but the directory in which the socket will be created must be writable by the user.

groupadd -g 501 spamd
useradd -u 501 -g 501 -s /sbin/nologin -d /home/spamd spamd

If you plan to use Bayesian classification (the BAYES rules) with spamd, you will need to modify /etc/mail/spamassassin/local.cf to use a shared database of tokens by setting the «bayes_path» setting to a path all users can read and write to. You will also need to set the «bayes_file_mode» setting to 0666 so that created files are shared, too.

# Enable Bayes auto-learning
bayes_auto_learn        1
bayes_path              /home/spamd/
bayes_file_mode         0666

After creating your new user, start spamd like this, as root:

/usr/bin/spamd --daemonize --username spamd --pidfile /home/spamd/spamd.pid
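
Whether the daemon actually came up can be checked from its pid file. The function below is a small sketch; its name is an assumption, and the default path is the pid file given on the command line above. Run it as root, because a process owned by another user cannot be probed by an ordinary account.

```shell
#!/bin/sh
# Sketch: test whether the process recorded in the spamd pid file is alive.
spamd_running() {
    pidfile=${1:-/home/spamd/spamd.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

if spamd_running; then
    echo "spamd is running"
else
    echo "spamd is not running"
fi
```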

Integrating SpamAssassin with Postfix

Postfix is a mail transport agent written by security researcher Wietse Venema. Not surprisingly, Postfix is designed from the ground up to be a highly secure system. It consists of several components, each of which runs with least privilege and none of which trusts data from the others without validating it. Despite the extensive security emphasis in the system's architecture, Postfix is capable of very good performance in normal conditions; because of architectural decisions, it is also fault tolerant and capable of good performance under adverse conditions such as resource starvation. It has become a popular replacement for sendmail because it provides a compatible command-line interface. This article does not explain how to install and set up Postfix; see the Postfix documentation for that.

This article explains how to integrate SpamAssassin into a Postfix-based mail server to perform spam-checking on a mail gateway.

Spam-Checking All Incoming Internet Mail

If you want to set up a spam-checking gateway for all recipients, local or not, you need a way to perform spam-checking as mail is received, before final delivery. Postfix provides a general-purpose filtering directive called content_filter.

The content_filter directive specifies a mail transport that Postfix will invoke after receiving a message. The mail transport hands the message to a filtering program. The filter checks the message and then either refuses it (which will cause Postfix to generate a bounce message), discards it, or reinjects the modified message into Postfix for further delivery. Messages that pass the filter are reinjected so that Postfix can operate on them almost as if they were new messages; this allows Postfix to behave properly if the content filter rewrites message headers.

Content filters can be programs that are invoked for each message. They read a message on standard input and reinject filtered messages via the sendmail program. SpamAssassin itself is not suitable for use as a content filter, because it doesn't know how to reinject a tagged message. However, SpamAssassin can be invoked by a content filter in several ways.

Create your own Content Filter

Postfix receives unfiltered mail from the network with the smtpd server, and delivers unfiltered mail to the SpamAssassin content filter with the Postfix pipe delivery agent. The content filter injects filtered mail back into Postfix with the Postfix sendmail command, so that Postfix can deliver it to the final destination.

This means that mail submitted via the Postfix sendmail command cannot be content filtered again.

The content filters are programs that accept messages on standard input, perform spam-checking, and either exit with an error status code or reinject the message to Postfix. To use a program as a content filter requires a series of steps:

1. To use external filtering with Postfix, first add a new Unix group named «filter» on the server.
Next, add a new user account named «filter» on the server and make it a member of group «filter». No other user should belong to group «filter».

groupadd -g 500 filter
useradd -u 500 -g 500 -d /home/filter -s /bin/false filter

The resulting /etc/passwd entry should look similar to this (the comment field is optional):

filter:x:500:500:Spam Filter Owner:/home/filter:/bin/false

2. Create a program that can accept an email message on standard input, perform filtering, and pass the modified message to sendmail's standard input. The filter should also return an appropriate status code, usually the exit code from sendmail, which Postfix will understand.

Here's an example of a filter script called spamchk that uses spamc to call spamd, the daemonized version of SpamAssassin.

The filter script receives the content on standard input. If the content filter program finds a problem, the mail is bounced by terminating with exit status 69 (EX_UNAVAILABLE). Postfix will return the message to the sender as undeliverable. If the content is OK, it is given as input to the Postfix sendmail command, and the exit status of the filter command is whatever exit status the Postfix sendmail command produces. Postfix will deliver the message as usual.

#!/bin/sh

# -----------------------------------------------------------------
# File:        spamchk
#
# Purpose:     SpamAssassin shell-based filter
#
# Location:    /usr/local/bin
#
# Usage:       Call this script from master.cf (Postfix)
#
# Certified:   GENTOO Linux, SpamAssassin 3.0, Postfix
# -----------------------------------------------------------------

# Variables
SENDMAIL="/usr/local/postfix/sendmail/sendmail -i"
EGREP=/bin/egrep

# Exit codes from <sysexits.h>
EX_UNAVAILABLE=69

# Number of *'s in X-Spam-level header needed to sideline message:
# (Eg. Score of 5.5 = "*****" )
SPAMLIMIT=10

# Clean up when done or when aborting.
trap "rm -f /var/tempfs/out.$$" 0 1 2 3 15

# Pipe message to spamc
cat | /usr/bin/spamc -u filter | sed 's/^\.$/../' > /var/tempfs/out.$$

# Are there $SPAMLIMIT or more stars in the X-Spam-Level header?
if $EGREP -q "^X-Spam-Level: \*{$SPAMLIMIT,}" < /var/tempfs/out.$$
then
    # Option 1: Move high scoring messages to sideline dir so
    # a human can look at them later:
    # mv /var/tempfs/out.$$ $SIDELINE_DIR/`date +%Y-%m-%d_%R`-$$

    # Option 2: Divert to an alternate e-mail address:
    $SENDMAIL xyz@xxxx.xx < /var/tempfs/out.$$

    # Option 3: Delete the message
    # rm -f /var/tempfs/out.$$
else
    $SENDMAIL "$@" < /var/tempfs/out.$$
fi

# Postfix returns the exit status of the Postfix sendmail command.
exit $?

Because this filter uses the spamc client, you must be running a spamd server. Save the filter somewhere publicly accessible (e.g., /usr/local/bin/spamchk) and set its permissions so that only root and members of group «filter» can run it:

-rwxr-x--- 1 root filter 2455 Nov 18 11:37 spamchk
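
Before wiring the script into Postfix, the X-Spam-Level test it relies on can be tried in isolation. The helper below is a sketch with an assumed function name; it uses the same regular expression as spamchk, with grep -E in place of egrep, and reads the message from standard input. The sample header lines are invented.

```shell
#!/bin/sh
# Sketch: does the message carry at least N stars in X-Spam-Level?
exceeds_spam_limit() {
    # $1: minimum number of stars; the message is read from standard input.
    grep -Eq "^X-Spam-Level: \*{$1,}"
}

# Eleven stars reach a limit of ten, five stars do not.
printf 'X-Spam-Level: %s\n' '***********' | exceeds_spam_limit 10 && echo "11 stars: sideline"
printf 'X-Spam-Level: %s\n' '*****'       | exceeds_spam_limit 10 || echo "5 stars: deliver"
```

Since the pattern means «this many stars or more», a limit of 10 corresponds to a score of at least 10.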

3. Define a new mail transport in master.cf that invokes the filter you created in step 2. The following example shows how you add a transport called spamchk, defined as a Unix service. By defining the transport as shown, you specify that the mail transport will use Postfix's pipe command to run /usr/local/bin/spamchk as user filter, and will pass the email address of the sender and the email addresses of recipients as command-line arguments to spamchk. The flag argument includes the R flag (add a Return-Path header) and the q flag (quote the sender and recipient addresses for use in the command line).

# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (100)
# ==========================================================================
spamchk   unix  -       n       n       -       10      pipe
  flags=Rq user=filter argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}

4. Direct Postfix to use the new mail transport as a content filter for the smtpd daemon.

# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (100)
# ==========================================================================
# smtp      inet  n       -       n       -       -       smtpd
smtp      inet  n       -       n       -       -       smtpd
        -o content_filter=spamchk:dummy

5. Run postfix reload to re-read the configuration files. Test the system by sending an email from the Internet and see whether SpamAssassin is called to check the message.

postfix/smtpd[2647]: connect from brasiltelecom.net.br
postfix/smtpd[2647]: 214A52F4421: client=brasiltelecom.net.br
postfix/cleanup[2650]: 214A52F4421: message-id=<2936a2zwc30$701285710$x>
postfix/qmgr[16611]: 214A52F4421: from=<loessash@naver.com>, size=1794,
spamd[16619]: connection from localhost [127.0.0.1] at port 32791
spamd[16619]: info: setuid to filter succeeded
spamd[16619]: processing message <2936a2zwc30$701285710$x> for filter:500.
spamd[16619]: identified spam (11.2/5.0) for filter:500 in 0.3 seconds,
spamd[16619]: result: Y 11 - HELO_DYNAMIC_HCC,HELO_DYNAMIC_IPADDR2,INFO_TLD,
MIME_BOUND_DD_DIGITS,X_MESSAGE_INFO scantime=0.3,size=1785,
mid=<2936a2zwc30$701285710$x>,autolearn=no
postfix/pipe[2651]: 214A52F4421: to=<xxxx@xxxx.xx>, relay=spamchk,
delay=5, status=sent(dummy)
postfix/qmgr[16611]: 214A52F4421: removed
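
To pick the spamd verdicts out of a busy mail log, a small filter such as the following can help. It is a sketch: the function name is an assumption, the log file location depends on the syslog configuration (/var/log/maillog is only a guess), and it matches the «identified spam» lines shown in the excerpt above.

```shell
#!/bin/sh
# Sketch: print "score threshold" for every message spamd identified as spam.
spamd_verdicts() {
    # $1: mail log file
    sed -n 's|.*spamd\[[0-9]*]: identified spam (\([0-9.-]*\)/\([0-9.]*\)).*|\1 \2|p' "$1"
}

LOG="${LOG:-/var/log/maillog}"   # assumed location, adjust as needed
if [ -r "$LOG" ]; then
    spamd_verdicts "$LOG"
else
    echo "cannot read $LOG"
fi
```

If nothing is printed for a message that should have been checked, the content_filter setting in master.cf is the first thing to verify.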