NYTProf Performance Profile   « line view »
For /usr/local/bin/sa-learn
  Run on Sun Nov 5 02:36:06 2017
Reported on Sun Nov 5 02:56:20 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm
Statements: Executed 171883 statements in 1.63s
Subroutines
Calls  P   F  Exclusive Time  Inclusive Time  Subroutine
2207   1   1  655ms           614s            Mail::SpamAssassin::Plugin::TxRep::check_reputation
321    2   1  277ms           620s            Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation (recurses: max depth 1, inclusive time 613s)
2207   1   1  189ms           925ms           Mail::SpamAssassin::Plugin::TxRep::get_sender
2207   7   1  93.3ms          615s            Mail::SpamAssassin::Plugin::TxRep::check_reputations
1608   2   1  85.8ms          301ms           Mail::SpamAssassin::Plugin::TxRep::add_score
2208   2   1  84.8ms          142ms           Mail::SpamAssassin::Plugin::TxRep::pack_addr
9174   12  1  80.5ms          80.5ms          Mail::SpamAssassin::Plugin::TxRep::count
3964   1   1  54.0ms          54.0ms          Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp (opcode)
9342   5   1  45.6ms          45.6ms          Mail::SpamAssassin::Plugin::TxRep::CORE:match (opcode)
1      1   1  34.2ms          42.3ms          Mail::SpamAssassin::Plugin::TxRep::BEGIN@205
642    1   1  32.4ms          40.9ms          Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key
3797   5   1  32.2ms          32.2ms          Mail::SpamAssassin::Plugin::TxRep::total
234    1   1  30.1ms          665s            Mail::SpamAssassin::Plugin::TxRep::learn_message
512    1   1  28.1ms          83.5ms          Mail::SpamAssassin::Plugin::TxRep::remove_score
2207   1   1  21.1ms          29.9ms          Mail::SpamAssassin::Plugin::TxRep::open_storages
2529   2   1  18.7ms          18.7ms          Mail::SpamAssassin::Plugin::TxRep::CORE:subst (opcode)
87     1   1  4.74ms          613s            Mail::SpamAssassin::Plugin::TxRep::forget_message
1      1   1  229µs           1.13ms          Mail::SpamAssassin::Plugin::TxRep::set_config
1      1   1  148µs           3.21s           Mail::SpamAssassin::Plugin::TxRep::modify_reputation
1      1   1  112µs           1.32ms          Mail::SpamAssassin::Plugin::TxRep::new
1      1   1  92µs            5.67s           Mail::SpamAssassin::Plugin::TxRep::finish
1      1   1  57µs            515µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@203
1      1   1  50µs            5.67s           Mail::SpamAssassin::Plugin::TxRep::learner_close
1      1   1  44µs            54µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@198
1      1   1  32µs            186µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@206
1      1   1  32µs            32µs            Mail::SpamAssassin::Plugin::TxRep::__ANON__[:491]
1      1   1  29µs            107µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@201
1      1   1  28µs            56µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@199
1      1   1  26µs            32µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@200
1      1   1  23µs            110µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@209
1      1   1  22µs            166µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@207
1      1   1  18µs            18µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@204
1      1   1  15µs            15µs            Mail::SpamAssassin::Plugin::TxRep::learner_new
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:302]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:346]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:371]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:394]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:417]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:442]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:523]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:556]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:638]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:754]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:788]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:827]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:853]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:884]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:936]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:989]
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::_fail_exit
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::_fn_envelope
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::_message
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::autolearn
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::blacklist_address
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::learner_expire_old_training
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::remove_address
0      0   0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::whitelist_address
Line  Statements  Time on line  Calls  Time in subs  Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18
19=head1 NAME
20
21Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records
22
23
24=head1 SYNOPSIS
25
26The TxRep (Reputation) plugin is designed as an improved replacement for the AWL
27(Auto-Whitelist) plugin. It adjusts the final message spam score by looking up and
28taking into consideration the reputation of the sender.
29
30To try TxRep out, you B<have to> disable the AWL plugin (if present), back up its
31database and add a line loading this module in init.pre (AWL may be enabled in v310.pre):
32
33 # loadplugin Mail::SpamAssassin::Plugin::AWL
34 loadplugin Mail::SpamAssassin::Plugin::TxRep
35
36When AWL is not disabled, TxRep will refuse to run.
37
38Use the supplied 60_txreputation.cf file or add these lines to a .cf file:
39
40 header TXREP eval:check_senders_reputation()
41 describe TXREP Score normalizing based on sender's reputation
42 tflags TXREP userconf noautolearn
43 priority TXREP 1000
44
45
46=head1 DESCRIPTION
47
48This plugin is intended to replace the former AWL - AutoWhiteList. Although the
49concept and the scope differ, the purpose remains the same - normalizing spam
50score results based on the sender's previous history. The name was intentionally changed
51from "whitelist" to "reputation" to avoid any confusion, since the resulting score can
52be adjusted in both directions.
53
54The TxRep plugin keeps track of the average SpamAssassin score for senders.
55Senders are tracked using multiple identifiers, or their combinations: the From:
56email address, the originating IP and/or an originating block of IPs, sender's domain
57name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce
58the variability in scoring from message to message, and modifies the final score by
59pushing the result towards the historical average. This improves the accuracy of
60filtering for most email.
61
62In comparison with the original AWL plugin, several conceptual changes were implemented
63in TxRep:
64
651. B<Scoring> - although AWL tracks the number of messages received from each
66respective sender, it does not take that count into account in any way when
67calculating the corrective score for a new message. So, for example, if a sender previously sent a single
68ham message with the score of -5, and then sends a second one with the score of +10,
69AWL will issue a corrective score bringing the score towards the -5. With the default
70C<auto_whitelist_factor> of 0.5, the resulting score would be only 2.5. And it would be
71exactly the same even if the sender had previously sent 1,000 messages with the average of
72-5. TxRep tries to take maximal advantage of the collected data, and adjusts the
73final score using not only the mean reputation score stored in the database, but also
74the number of messages already seen from the sender. You can see the exact
75formula in the section L</C<txrep_factor>>.
76
772. B<Learning> - AWL ignores any spam/ham learning. In fact it acts against it, which
78often leads to a frustrating situation, where a user repeatedly tags all messages of a
79given sender as spam (or ham), but for every new message from the sender, AWL will
80adjust the score of the message back to the historical average which does B<not> include
81the learned scores. This is changed in TxRep: every spam/ham learning will be
82recorded in the reputation database, and hence taken into consideration for future email
83from the respective sender. See the section L</"LEARNING SPAM / HAM"> for more details.
84
853. B<Auto-Learning> - in certain situations SpamAssassin may declare a message
86obvious spam or ham, and launch the auto-learning process, so that the message can be
87re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin
88will readjust the stored reputation by the value defined by L</C<txrep_learn_penalty>>
89or L</C<txrep_learn_bonus>>, respectively. Auto-learning score thresholds may be tuned, or
90auto-learning completely disabled, through the setting L</C<txrep_autolearn>>.
91
924. B<Relearning> - messages that were wrongly learned or auto-learned can be relearned.
93Old reputations are removed from the database, and new ones added in their place. The
94relearning works better when message tracking is enabled through the
95L</C<txrep_track_messages>> option. Without it, the relearned score is simply added to
96the reputation, without removing the old ones.
97
985. B<Aging> - with AWL, every historical record of a given sender has the same weight. This
99means that changes in a sender's behavior, or modified SA rules, may take a long time to show, or
100be virtually negated by the AWL normalization, especially for senders with a high count
101of past messages and a low recent frequency. It also turns out to be particularly
102counterproductive when the administrator detects new patterns in certain messages, and
103applies new rules to better tag such messages as spam or ham. AWL will practically
104eliminate the effect of the new rules, by adjusting the score back towards the (wrong)
105historical average. Only setting the C<auto_whitelist_factor> lower would help, but at
106the same time it would also reduce the overall impact of AWL, and cast doubt on its
107purpose. TxRep, besides the L</C<txrep_factor>> (replacement of the C<auto_whitelist_factor>),
108also introduces the L</C<txrep_dilution_factor>> to help cope with this issue by
109progressively reducing the impact of past records. More details can be found in the
110description of the factor below.
111
1126. B<Blacklisting and Whitelisting> - when whitelisting or blacklisting is requested
113through SpamAssassin's API, AWL adjusts the historical total score of the plain email
114address without IP (and deletes records bound to an IP), but since new records with an IP
115will be added during reception, the blacklisted entry would cease to act during
116scanning. TxRep always uses the record of the plain email address without IP together
117with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight
118factor for the EMAIL reputation is set to zero). AWL uses the score of 100 (resp. -100)
119for the blacklisting (resp. whitelisting) purposes. TxRep increases the value
120proportionally to the weight factor of the EMAIL reputation. It is explained in detail
121in the section L</BLACKLISTING / WHITELISTING>. TxRep can also blacklist or whitelist
122IP addresses, domain names, and dotless HELO names.
123
1247. B<Sender Identification> - AWL identifies a sender on the basis of the email address
125used, and the originating IP address (more precisely, the part of it defined by the mask setting).
126The main purpose of this measure is to avoid assigning false good scores to spammers who
127spoof known email addresses. The disadvantage appears with senders who send from frequently
128changing locations, or who connect through dynamic IP addresses that are not
129within the block defined by the mask setting. Their score is difficult or sometimes
130impossible to track. Another disadvantage is, for example, a spammer persistently
131sending spam from the same IP address, just under different email addresses. AWL will not
132find his previous scores, unless he reuses the same email address again. TxRep uses several
133identifiers, and creates separate database entries for each of them. It tracks not only
134the email/IP address combination like AWL, but also the standalone email address (regardless
135of the originating IP), the standalone IP (regardless of email address used), the domain
136name of the email address, the DKIM signature, and the HELO name of the connecting PC. The
137influence of each individual identifier may be tuned with the help of weight factors
138described in the section L</REPUTATION WEIGHTS>.
139
1408. B<Message Tracking> - TxRep (optionally) keeps track of already scanned and/or learned
141message IDs. This avoids strengthening the reputation score by simply
142rescanning or relearning the same message multiple times. At the same time it also allows
143the proper relearning of messages that were once wrongly learned, or relearning them after the
144learn penalty or bonus were changed. See the option L</C<txrep_track_messages>>.
145
1469. B<User and Global Storages> - usually it is recommended to use the per-user setup
147of SpamAssassin, because each user may have quite different requirements, and may receive
148quite different sorts of email. Especially when using the Bayesian and AWL plugins,
149the efficiency is much better when SpamAssassin is trained on spam and ham separately
150for each user. However, the disadvantage is that senders and emails already learned
151many times by different users will need to be relearned without any recognized history
152whenever they arrive at another user. TxRep uses the advantages of both systems. It can
153use dual storages: the global common storage, where all email processed by SpamAssassin
154is recorded, and a local storage separate for each user, with reputation data from his
155email only. See more details at the setting L</C<txrep_user2global_ratio>>.
156
15710. B<Outbound Whitelisting> - when a local user sends messages to an email address, we
158assume that he needs to see the eventual answer too, hence the recipient's address should
159be whitelisted. When SpamAssassin is also used for scanning outgoing email, local
160users send email through the SMTP server where SA is installed, and internal
161networks are defined, TxRep will improve the reputation of all 'To:' and 'CC' addresses
162from messages originating in the internal networks. Details can be found at the setting
163L</C<txrep_whitelist_out>>.
164
165The two plugins (AWL and TxRep) cannot coexist. It is necessary to disable AWL to allow
166TxRep to run. TxRep reuses the database handling of the original AWL module, and some
167of its parameters bound to the database handler modules. By default, TxRep creates its own
168database, but the original auto-whitelist can be reused as a starting point. The AWL
169database can be renamed to the name defined in TxRep settings, and TxRep will start
170using it. The original auto-whitelist database has to be backed up, to allow switching
171back to the original state.
172
173The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and
174spamassassin/AutoWhitelist.pm. Another two AWL files, spamassassin/DBBasedAddrList.pm
175and spamassassin/SQLBasedAddrList.pm are still needed.
176
177
178=head1 TEMPLATE TAGS
179
180This plugin module adds the following C<tags> that can be used as
181placeholders in certain options. See L<Mail::SpamAssassin::Conf>
182for more information on TEMPLATE TAGS.
183
184 _TXREP_XXX_Y_ TXREP modifier
185 _TXREP_XXX_Y_MEAN_ Mean score on which TXREP modification is based
186 _TXREP_XXX_Y_COUNT_ Number of messages on which TXREP modification is based
187 _TXREP_XXX_Y_PRESCORE_ Score before TXREP
188 _TXREP_XXX_Y_UNKNOW_ New sender (not found in the TXREP list)
189
190The XXX part of the tag takes the form of one of the following IDs, depending
191on the reputation checked: EMAIL, EMAIL_IP, IP, DOMAIN, or HELO. The _Y appendix
192ID is used only in the case of dual storage, and takes the form of either _U (for
193user storage reputations), or _G (for global storage reputations).
194
195=cut # ....................................................................
196package Mail::SpamAssassin::Plugin::TxRep;
197
198272µs264µs
# spent 54µs (44+10) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@198 which was called: # once (44µs+10µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 198
use strict;
# spent 54µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@198 # spent 10µs making 1 call to strict::import
199270µs284µs
# spent 56µs (28+28) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@199 which was called: # once (28µs+28µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 199
use warnings;
# spent 56µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@199 # spent 28µs making 1 call to warnings::import
200280µs238µs
# spent 32µs (26+6) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@200 which was called: # once (26µs+6µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 200
use bytes;
# spent 32µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@200 # spent 6µs making 1 call to bytes::import
201280µs2185µs
# spent 107µs (29+78) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@201 which was called: # once (29µs+78µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 201
use re 'taint';
# spent 107µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@201 # spent 78µs making 1 call to re::import
202
2033138µs3972µs
# spent 515µs (57+457) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 which was called: # once (57µs+457µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 203
use NetAddr::IP 4.000; # qw(:upper);
# spent 515µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 # spent 421µs making 1 call to NetAddr::IP::import # spent 37µs making 1 call to version::_VERSION
204271µs118µs
# spent 18µs within Mail::SpamAssassin::Plugin::TxRep::BEGIN@204 which was called: # once (18µs+0s) by Mail::SpamAssassin::PluginHandler::load_plugin at line 204
use Mail::SpamAssassin::Plugin;
# spent 18µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@204
2052401µs142.3ms
# spent 42.3ms (34.2+8.06) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 which was called: # once (34.2ms+8.06ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 205
use Mail::SpamAssassin::Plugin::Bayes;
# spent 42.3ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@205
206267µs2340µs
# spent 186µs (32+154) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@206 which was called: # once (32µs+154µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 206
use Mail::SpamAssassin::Util qw(untaint_var);
# spent 186µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@206 # spent 154µs making 1 call to Exporter::import
207270µs2310µs
# spent 166µs (22+144) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 which was called: # once (22µs+144µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 207
use Mail::SpamAssassin::Logger;
# spent 166µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 # spent 144µs making 1 call to Exporter::import
208
209212.6ms2197µs
# spent 110µs (23+87) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 which was called: # once (23µs+87µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 209
use vars qw(@ISA);
# spent 110µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 # spent 87µs making 1 call to vars::import
210120µs@ISA = qw(Mail::SpamAssassin::Plugin);
211
212
213###########################################################################
214
# spent 1.32ms (112µs+1.21) within Mail::SpamAssassin::Plugin::TxRep::new which was called: # once (112µs+1.21ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 1 of (eval 42)[Mail/SpamAssassin/PluginHandler.pm:129]
sub new { # constructor: register the eval rule
215###########################################################################
21613µs my ($class, $main) = @_;
217
21813µs $class = ref($class) || $class;
219113µs132µs my $self = $class->SUPER::new($main);
# spent 32µs making 1 call to Mail::SpamAssassin::Plugin::new
22012µs bless($self, $class);
221
222110µs $self->{main} = $main;
22315µs $self->{conf} = $main->{conf};
22415µs $self->{factor} = $main->{conf}->{txrep_factor};
22513µs $self->{ipv4_mask_len} = $main->{conf}->{txrep_ipv4_mask_len};
22613µs $self->{ipv6_mask_len} = $main->{conf}->{txrep_ipv6_mask_len};
227111µs134µs $self->register_eval_rule("check_senders_reputation");
# spent 34µs making 1 call to Mail::SpamAssassin::Plugin::register_eval_rule
22819µs11.13ms $self->set_config($main->{conf});
# spent 1.13ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::set_config
229
230 # only the default conf loaded here, do nothing here requiring
231 # the runtime settings
23218µs111µs dbg("TxRep: new object created");
# spent 11µs making 1 call to Mail::SpamAssassin::Logger::dbg
233110µs return $self;
234}
235
236
237###########################################################################
238
# spent 1.13ms (229µs+904µs) within Mail::SpamAssassin::Plugin::TxRep::set_config which was called: # once (229µs+904µs) by Mail::SpamAssassin::Plugin::TxRep::new at line 228
sub set_config {
239###########################################################################
24012µs my($self, $conf) = @_;
24112µs my @cmds;
242
243# -------------------------------------------------------------------------
244=head1 USER PREFERENCES
245
246The following options can be used in both site-wide (C<local.cf>) and
247user-specific (C<user_prefs>) configuration files to customize how
248SpamAssassin handles incoming email messages.
249
250=over 4
251
252=item B<use_txrep>
253
254 0 | 1 (default: 0)
255
256Whether to use the TxRep reputation system. TxRep tracks the long-term average
257score for each sender and then shifts the score of new messages toward that
258long-term average. This can increase or decrease the score for messages,
259depending on the long-term behavior of the particular correspondent.
260
261Note that certain tests are ignored when determining the final message score:
262
263 - rules with tflags set to 'noautolearn'
264
265=cut # ...................................................................
26618µs push (@cmds, {
267 setting => 'use_txrep',
268 default => 0,
269 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
270 });
271
272
273# -------------------------------------------------------------------------
274=item B<txrep_factor>
275
276 range [0..1] (default: 0.5)
277
278How much towards the long-term mean for the sender to regress a message.
279Basically, the algorithm is to track the long-term total score and the count
280of messages for the sender (C<total> and C<count>), and then once we have
281otherwise fully calculated the score for this message (C<score>), we calculate
282the final score for the message as:
283
284 finalscore = score + factor * ((total + score)/(count + 1) - score)
285
286So if C<factor> = 0.5, then we'll move halfway between the calculated
287score and the new mean value. If C<factor> = 0.3, then we'll move about 1/3
288of the way from the score toward the mean. C<factor> = 1 means use the
289long-term mean including also the new unadjusted score; C<factor> = 0 means
290just use the calculated score, thereby disabling the score averaging, though still
291recording the reputation to the database.
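
As a rough illustration of the description above, the following sketch regresses
a message score toward the sender's mean so that a factor of 0 leaves the score
unchanged and a factor of 1 yields the new mean. The helper name C<regress_score>
is hypothetical and is not part of TxRep.pm; the plugin's real code path may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): move a message score toward the
# sender's mean, where the mean already includes the new, unadjusted score.
sub regress_score {
    my ($score, $total, $count, $factor) = @_;
    my $new_mean = ($total + $score) / ($count + 1);
    return $score + $factor * ($new_mean - $score);
}

# One earlier message scored -5, new message scores +10, factor 0.5.
printf "%.2f\n", regress_score(10, -5, 1, 0.5);        # prints 6.25

# 1,000 earlier messages averaging -5, same new message and factor.
printf "%.2f\n", regress_score(10, -5000, 1000, 0.5);  # prints 2.51
```

The two calls differ only in the message count, which shows how a longer history
pulls the final score further toward the stored mean.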
292
293=cut # ...................................................................
294 push (@cmds, {
295 setting => 'txrep_factor',
296 default => 0.5,
297 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
298 code => sub {
299 my ($self, $key, $value, $line) = @_;
300 if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
301 $self->{txrep_factor} = $value;
302 }
303115µs });
304
305
306# -------------------------------------------------------------------------
307=item B<txrep_dilution_factor>
308
309 range [0.7..1.0] (default: 0.98)
310
311With every new email from a given sender, the historical reputation records are "diluted",
312or "watered down", by a certain fraction given by this factor. It means that the
313influence of old records will progressively diminish with every new message from
314a given sender. This is important to allow a more flexible handling of changes in
315sender's behavior, or new improvements or changes of local SA rules.
316
317Without any dilution expiry (dilution factor set to 1), the new message score is
318simply added to the total score of the given sender in the reputation database. When
319dilution is used (factor < 1), the impact of the historical reputation average is
320reduced by the factor before calculating the new average, which in turn is then
321used to adjust the new total score to be stored in the database.
322
323 newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)
324
325In other words, the older a message is, the less impact
326its original spam score has on the new average. For example if we set the factor
327to 0.9 (meaning dilution by 10%), the score of the new message will be recorded
328at 100%, the last score of the same sender at 90%, the second last at 81%
329(0.9 * 0.9 = 0.81), and for example the 10th last message at just 35%.
330
331On stable systems, we recommend keeping the factor close to 1 (but still lower
332than 1). On systems where SA rules tuning and spam learning is still in progress,
333lower factors will help the reputation adapt more quickly to any modifications. At
334the same time, it will also reduce the impact of the historical reputation
335though.
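
The update rule and the percentages quoted above can be checked with a short
sketch. The helper name C<diluted_total> is hypothetical and only transcribes the
C<newtotal> formula from this section; it is not taken from the plugin's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): transcribes the newtotal formula
# given in the text for one new message.
sub diluted_total {
    my ($old_total, $old_count, $new_score, $dilution) = @_;
    my $new_mean = ($new_score + $dilution * $old_total)
                 / ($dilution * $old_count + 1);
    return ($old_count + 1) * $new_mean;
}

# With dilution 1 the new score is simply added to the old total.
printf "%.2f\n", diluted_total(10, 2, 4, 1);   # prints 14.00

# Nominal weights from the text for dilution 0.9: last, second last, 10th last.
printf "%d: %.0f%%\n", $_, 100 * 0.9 ** $_ for (1, 2, 10);
```

The loop prints 90%, 81% and 35%, matching the figures in the paragraph above.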
336
337=cut # ...................................................................
338 push (@cmds, {
339 setting => 'txrep_dilution_factor',
340 default => 0.98,
341 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
342 code => sub {
343 my ($self, $key, $value, $line) = @_;
344 if ($value < 0.7 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
345 $self->{txrep_dilution_factor} = $value;
346 }
34719µs });
348
349
350# TODO, not implemented yet, hence no advertising until then
351# -------------------------------------------------------------------------
352#=item B<txrep_expiry_days>
353#
354# range [0..10000] (default: 365)
355#
356#The scores of messages can be removed from the total reputation, and the
357#message tracking entry removed from the database after given number of days.
358#It helps keeping the database growth under control, and it also reduces the
359#influence of old scores on the current reputation (both scoring methods, and
360#sender's behavior might have changed over time).
361#
362#=cut # ...................................................................
363 push (@cmds, {
364 setting => 'txrep_expiry_days',
365 default => 365,
366 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
367 code => sub {
368 my ($self, $key, $value, $line) = @_;
369 if ($value < 0 || $value > 10000) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
370 $self->{txrep_expiry_days} = $value;
371 }
37218µs });
373
374
375# -------------------------------------------------------------------------
376=item B<txrep_learn_penalty>
377
378 range [0..200] (default: 20)
379
380When SpamAssassin is trained on a SPAM message, the given penalty score will
381be added to the total reputation score of the sender, regardless of the real
382spam score. The more messages the sender already has in the TxRep database,
383the smaller the impact of the penalty will be.
384
385=cut # ...................................................................
386 push (@cmds, {
387 setting => 'txrep_learn_penalty',
388 default => 20,
389 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
390 code => sub {
391 my ($self, $key, $value, $line) = @_;
392 if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
393 $self->{txrep_learn_penalty} = $value;
394 }
39519µs });
396
397
398# -------------------------------------------------------------------------
399=item B<txrep_learn_bonus>
400
401 range [0..200] (default: 20)
402
403When SpamAssassin is trained on a HAM message, the given bonus score will be
404deducted from the total reputation score of the sender, regardless of the real
405spam score. The more messages the sender already has in the TxRep database,
406the smaller the impact of the bonus will be.
407
408=cut # ...................................................................
409 push (@cmds, {
410 setting => 'txrep_learn_bonus',
411 default => 20,
412 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
413 code => sub {
414 my ($self, $key, $value, $line) = @_;
415 if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
416 $self->{txrep_learn_bonus} = $value;
417 }
41818µs });
419
420
421# -------------------------------------------------------------------------
422=item B<txrep_autolearn>
423
424 range [0..5] (default: 0)
425
426When SpamAssassin declares a message clear spam or ham during the message
427scan, and launches the auto-learn process, the sender reputation scores of the given
428message will be adjusted by the value of the option L</C<txrep_learn_penalty>>
429or L</C<txrep_learn_bonus>>, respectively, in the same way as during manual learning.
430Setting this option to 0 disables the auto-learn reputation adjustment - only the
431score calculated before the auto-learn will be stored to the reputation database.
432
433=cut # ...................................................................
434 push (@cmds, {
435 setting => 'txrep_autolearn',
436 default => 0,
437 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
438 code => sub {
439 my ($self, $key, $value, $line) = @_;
440 if ($value < 0 || $value > 5) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
441 $self->{txrep_autolearn} = $value;
442 }
443110µs });
444
445
446# -------------------------------------------------------------------------
447=item B<txrep_track_messages>
448
449 0 | 1 (default: 1)
450
451Whether TxRep should keep track of already scanned and/or learned messages.
452When enabled, an additional record in the reputation database will be created
453to avoid false score adjustments due to repeated scanning of the same message,
454and to allow proper relearning of messages that were either previously wrongly
455learned, or need to be relearned after modifying the learn penalty or bonus.
456
457=cut # ...................................................................
45814µs push (@cmds, {
459 setting => 'txrep_track_messages',
460 default => 1,
461 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
462 });
463
464
465# -------------------------------------------------------------------------
466=item B<txrep_whitelist_out>
467
468 range [0..200] (default: 10)
469
470When the value of this setting is greater than zero, recipients of messages sent from
471within the internal networks will be whitelisted by improving their total reputation
472score by the number of points defined by this setting. Since the IP address and other
473sender identifiers are not known when sending the email, only the reputation of the
474standalone email address is whitelisted. The domain name is intentionally also left
475unaffected. Outbound whitelisting only works when SpamAssassin is set up to also scan
476outgoing email, when local users use the SMTP server for sending email, and when
477C<internal_networks> are defined in the SpamAssassin configuration. The reputation is
478improved with every message sent from internal networks, so the more messages are
479sent to a recipient, the better the reputation of that email address will be.
480
481
482=cut # ...................................................................
483 push (@cmds, {
484 setting => 'txrep_whitelist_out',
485 default => 10,
486 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
487
# spent 32µs within Mail::SpamAssassin::Plugin::TxRep::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm:491] which was called: # once (32µs+0s) by Mail::SpamAssassin::Conf::Parser::parse at line 438 of Mail/SpamAssassin/Conf/Parser.pm
code => sub {
48816µs my ($self, $key, $value, $line) = @_;
48914µs if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
490115µs $self->{txrep_whitelist_out} = $value;
491 }
49219µs });
493
494
495# -------------------------------------------------------------------------
496=item B<txrep_ipv4_mask_len>
497
498 range [0..32] (default: 16)
499
500The AWL database keeps only the specified number of most-significant bits
501of an IPv4 address in its fields, so that different individual IP addresses
502within a subnet belonging to the same owner are managed under a single
503database record. As we have no information available on the allocated
504address ranges of senders, this CIDR mask length is only an approximation.
505The default is 16 bits, corresponding to a former class B. Increase the
506number if a finer granularity is desired, e.g. to 24 (class C) or 32.
507A value 0 is allowed but is not particularly useful, as it would treat the
508whole internet as a single organization. The number need not be a multiple
509of 8, any split is allowed.
510
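As a rough illustration of what keeping only the most-significant bits means,
the sketch below masks a dotted-quad IPv4 address to a given prefix length.
The helper C<mask_ipv4> is hypothetical and is not the routine TxRep uses
internally; it only shows the effect of different mask lengths.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): keep only the $len
# most-significant bits of a dotted-quad IPv4 address.
sub mask_ipv4 {
    my ($ip, $len) = @_;
    # Convert the four octets into one 32-bit unsigned integer.
    my $addr = unpack('N', pack('C4', split /\./, $ip));
    # Build the network mask; a length of 0 keeps no bits at all.
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    return join '.', unpack('C4', pack('N', $addr & $mask));
}

print mask_ipv4('192.168.37.5', 16), "\n";   # prints 192.168.0.0
print mask_ipv4('192.168.37.5', 20), "\n";   # prints 192.168.32.0
```

With the default of 16 bits, both 192.168.37.5 and 192.168.200.9 would fall
into the same masked block in this sketch.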
511=cut # ...................................................................
512 push (@cmds, {
513 setting => 'txrep_ipv4_mask_len',
514 default => 16,
515 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
516 code => sub {
517 my ($self, $key, $value, $line) = @_;
518 if (!defined $value || $value eq '')
519 {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
520 elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32)
521 {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
522 $self->{txrep_ipv4_mask_len} = $value;
523 }
52419µs });
525
526
527# -------------------------------------------------------------------------
528=item B<txrep_ipv6_mask_len>
529
530 range [0..128] (default: 48)
531
532The AWL database keeps only the specified number of most-significant bits
533of an IPv6 address in its fields, so that different individual IP addresses
534within a subnet belonging to the same owner are managed under a single
535database record. As we have no information available on the allocated address
536ranges of senders, this CIDR mask length is only an approximation. The default
537is 48 bits, corresponding to an address range commonly allocated to individual
538(smaller) organizations. Increase the number for a finer granularity, e.g.
539to 64 or 96 or 128, or decrease for wider ranges, e.g. 32. A value 0 is
540allowed but is not particularly useful, as it would treat the whole internet
541as a single organization. The number need not be a multiple of 4, any split
542is allowed.
543
544=cut # ...................................................................
545 push (@cmds, {
546 setting => 'txrep_ipv6_mask_len',
547 default => 48,
548 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
549 code => sub {
550 my ($self, $key, $value, $line) = @_;
551 if (!defined $value || $value eq '')
552 {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
553 elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128)
554 {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
555 $self->{txrep_ipv6_mask_len} = $value;
556 }
557113µs });
558
559
560# -------------------------------------------------------------------------
561=item B<user_awl_sql_override_username>
562
563 string (default: undefined)
564
565Used by the SQLBasedAddrList storage implementation.
566
567If this option is set the SQLBasedAddrList module will override the set
568username with the value given. This can be useful for implementing global
569or group based TxRep databases.
570
571=cut # ...................................................................
57214µs push (@cmds, {
573 setting => 'user_awl_sql_override_username',
574 default => '',
575 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
576 });
577
578
579# -------------------------------------------------------------------------
580=item B<txrep_user2global_ratio>
581
582 range [0..10] (default: 0)
583
584When the option txrep_user2global_ratio is set to a value greater than zero, and
585if the server configuration allows it, two data storages will be used - user and
586global (server-wide) storages.
587
588User storage keeps only senders who send messages to the respective recipient,
589and will reflect also the corrected/learned scores, when some messages are marked
590by the user as spam or ham, or when the sender is whitelisted or blacklisted
591through the API of SpamAssassin.
592
593Global storage keeps the reputation data of all messages processed by SpamAssassin
594with their spam scores and spam/ham learning data from all users on the server.
595Hence, the module will return a reputation value even for senders not known to the
596current recipient, as long as they have already sent email to anyone else on the server.
597
598The value of the txrep_user2global_ratio parameter controls the impact of each
599of the two reputations. When equal to 1, both the global and the user score will
600have the same impact on the result. When set to 2, the reputation taken from
601the user storage will have twice the impact of the global value. The final value
602of the TXREP tag will be calculated as follows:
603
604 total = ( ratio * user + global ) / ( ratio + 1 )
605
606When no reputation is found in the user storage, and a global reputation is
607available, the global storage is used fully, without applying the ratio.
608
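The formula and the fallback rule can be sketched as follows. This is a
minimal illustration, not the plugin's own code: the helper name
C<combine_reputation> is hypothetical, and returning the user value when no
global reputation exists is an assumption that the text above does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): combine the user and global
# reputations for a ratio greater than zero.
sub combine_reputation {
    my ($ratio, $user, $global) = @_;
    return $user   unless defined $global;  # assumption: no global value available
    return $global unless defined $user;    # documented: global is used fully
    return ($ratio * $user + $global) / ($ratio + 1);
}

printf "%.2f\n", combine_reputation(1, 4, -2);   # equal impact: prints 1.00
printf "%.2f\n", combine_reputation(2, 4, -2);   # user counts twice: prints 2.00
```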
609When the ratio is set to zero, only the default storage will be used. Whether
610the default is the global or the local user storage depends on your setup,
611which in turn is controlled either by the parameter C<user_awl_sql_override_username>
612(in case of SQL storage), or the C<auto_whitelist_path> parameter (in case of
613Berkeley database).
614
615When this dual storage is enabled, and no global storage is defined by the
616above-mentioned parameters for the Berkeley or SQL databases, TxRep will attempt
617to use a generic storage: user 'GLOBAL' in case of SQL, and in the case of
618Berkeley database the path defined by '__local_state_dir__/tx-reputation',
619which typically resolves to /var/db/spamassassin/tx-reputation. When the default
620storages are not available, or are not writable, you have to set the global
621storage with the help of the C<user_awl_sql_override_username> or
622C<auto_whitelist_path> settings, respectively.
623
624Please note that some SpamAssassin installations always run under the same user
625ID. In such a case it is pointless to enable the dual storage, because it would
626at most lead to two identical global storages in different locations.
627
628This feature is disabled by default.
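
A minimal configuration sketch for enabling the dual storage is shown below.
The ratio value 2 is purely illustrative, and the fragment assumes the TxRep
plugin is already loaded elsewhere in your configuration.

```
# site configuration fragment (values are illustrative)
use_txrep               1
txrep_user2global_ratio 2
```
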
629=cut # ...................................................................
630 push (@cmds, {
631 setting => 'txrep_user2global_ratio',
632 default => 0,
633 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
634 code => sub {
635 my ($self, $key, $value, $line) = @_;
636 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
637 $self->{txrep_user2global_ratio} = $value;
638 }
63918µs });
640
641
642# -------------------------------------------------------------------------
643=item B<auto_whitelist_distinguish_signed>
644
645 (default: 1 - enabled)
646
647Used by the SQLBasedAddrList storage implementation.
648
649If this option is set the SQLBasedAddrList module will keep separate
650database entries for DKIM-validated e-mail addresses and for non-validated
651ones. A pre-requisite when setting this option is that a field awl.signedby
652exists in a SQL table, otherwise SQL operations will fail (which is why we
653need this option at all - for compatibility with pre-3.3.0 database schema).
654A plugin DKIM should also be enabled, as otherwise there is no benefit from
655turning on this option.
656
657=cut # ...................................................................
65814µs push (@cmds, {
659 setting => 'auto_whitelist_distinguish_signed',
660 default => 1,
661 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
662 });
663
664
665=item B<txrep_spf>
666
667 0 | 1 (default: 1)
668
669When enabled, TxRep will treat any IP address using a given email address as
670the same authorized identity, and will not associate any IP address with it.
671(The same happens with valid DKIM signatures. No option available for DKIM).
672
673Note: for domains that define the useless SPF +all (pass all), no IP would
674ever be associated with the email address, and all addresses (incl. the forged
675ones) would be treated as coming from the authorized source. However, such
676domains are hopefully rare, and ask for this kind of treatment anyway.
677
678=back
679
680=cut # ...................................................................
68114µs push (@cmds, {
682 setting => 'txrep_spf',
683 default => 1,
684 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
685 });
686
687
688# -------------------------------------------------------------------------
689=head2 REPUTATION WEIGHTS
690
691The overall reputation of the sender comprises several elements:
692
693=over 4
694
695=item 1) The reputation of the 'From' email address bound to the originating IP
696 address fraction (see the mask parameters for details)
697
698=item 2) The reputation of the 'From' email address alone (regardless of the IP
699 address currently being used)
700
701=item 3) The reputation of the domain name of the 'From' email address
702
703=item 4) The reputation of the originating IP address, regardless of sender's email address
704
705=item 5) The reputation of the HELO name of the originating computer (if available)
706
707=back
708
709Each of these partial reputations is weighted with the help of these parameters,
710and the overall reputation is calculated as the weighted sum of the individual
711reputations divided by the sum of all their weights:
712
713 sender_reputation = ( weight_email * rep_email +
714 weight_email_ip * rep_email_ip +
715 weight_domain * rep_domain +
716 weight_ip * rep_ip +
717 weight_helo * rep_helo ) / sum_of_weights
718
719You can disable the individual partial reputations by setting their respective
720weight to zero. This will also reduce the size of the database, since each
721partial reputation requires a separate entry in the database table. Disabling
722some of the partial reputations in this way may also help with the performance
723on busy servers, because the respective database lookups and processing will
724be skipped too.
725
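The weighted average described above can be sketched as follows. The helper
C<weighted_reputation> is hypothetical and only mirrors the prose; in
particular, leaving parts without a stored reputation out of both the sum and
the total weight is an assumption of this sketch. The weights are the defaults
documented below, while the reputation values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): weighted average of the
# partial reputations. A weight of 0 disables a part entirely.
sub weighted_reputation {
    my ($weights, $reps) = @_;
    my ($sum, $total_weight) = (0, 0);
    for my $part (sort keys %$weights) {
        my $w = $weights->{$part};
        # Assumption: parts without a stored reputation are left out.
        next unless $w && defined $reps->{$part};
        $sum          += $w * $reps->{$part};
        $total_weight += $w;
    }
    return $total_weight ? $sum / $total_weight : undef;
}

# Default weights from the settings below; reputations are invented.
my %weights = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);
my %reps    = (email => 2, email_ip => 5,  domain => 1, ip => -1);  # no HELO record

printf "%.3f\n", weighted_reputation(\%weights, \%reps);   # 54 / 19, prints 2.842
```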
726=over 4
727
728=item B<txrep_weight_email>
729
730 range [0..10] (default: 3)
731
732This weight factor controls the influence of the reputation of the standalone
733email address, regardless of the originating IP address. When adjusting the
734weight, you need to keep in mind that an email address can be easily spoofed,
735and hence spammers can use 'from' email addresses belonging to senders with
736a good reputation. From this point of view, the email address bound to the
737originating IP address is a more reliable indicator for the overall reputation.
738
739On the other hand, some reputable senders may be sending from a larger number
740of IP addresses, so looking up the reputation of the standalone email address
741without regard to the originating IP makes sense too.
742
743We recommend using a relatively low value for this partial reputation.
744
745=cut # ...................................................................
746 push (@cmds, {
747 setting => 'txrep_weight_email',
748 default => 3,
749 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
750 code => sub {
751 my ($self, $key, $value, $line) = @_;
752 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
753 $self->{txrep_weight_email} = $value;
754 }
75518µs });
756
757# -------------------------------------------------------------------------
758=item B<txrep_weight_email_ip>
759
760 range [0..10] (default: 10)
761
762This is the standard reputation used in the same way as it was by the original
763AWL plugin. Each sender's email address is bound to the originating IP, or
764its part as defined by the txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.
765
766For a user sending from multiple locations, diverse mail servers, or from a dynamic
767IP range outside the masked block, the email address will have a separate reputation
768value for each of the different (partial) IP addresses.
769
770When the option auto_whitelist_distinguish_signed is enabled, in contrast to
771the original AWL module, TxRep does not record the IP address when a DKIM
772signature is detected. The email address is then not bound to any IP address, but
773rather just to the DKIM signature, since the signature is considered to authenticate
774the sender more reliably than the IP address (which can also vary).
775
776This is by design the most relevant reputation, and its weight should be kept
777high.
778
779=cut # ...................................................................
780 push (@cmds, {
781 setting => 'txrep_weight_email_ip',
782 default => 10,
783 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
784 code => sub {
785 my ($self, $key, $value, $line) = @_;
786 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
787 $self->{txrep_weight_email_ip} = $value;
788 }
78918µs });
790
791# -------------------------------------------------------------------------
792=item B<txrep_weight_domain>
793
794 range [0..10] (default: 2)
795
796Some spammers may always use their real domain name in the email address,
797just with multiple or changing local parts. This reputation will record the
798spam scores of all messages sent from the respective domain, regardless of
799the local part (user name) used.
800
801As with the email_ip reputation, the domain reputation is also
802bound to the originating address (or a masked block, if mask parameters are used).
803This avoids giving false reputation based on spoofed email addresses.
804
805When a DKIM signature is detected, the signer is used instead
806of the domain name extracted from the email address. It is considered that
807the signing authority is responsible for sending email of any domain name,
808hence the same reputation applies here.
809
810The domain reputation gives a relevant picture of the owner of the
811domain in case of small servers, or corporations with strict policies, but
812will be less relevant for freemailers like Gmail, Hotmail, and similar,
813because both ham and spam may be sent by their users.
814
815The default value is set relatively low. Higher weight values may be useful,
816but we recommend caution and observing the scores before increasing it.
817
818=cut # ...................................................................
819 push (@cmds, {
820 setting => 'txrep_weight_domain',
821 default => 2,
822 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
823 code => sub {
824 my ($self, $key, $value, $line) = @_;
825 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
826 $self->{txrep_weight_domain} = $value;
827 }
82817µs });
829
830# -------------------------------------------------------------------------
831=item B<txrep_weight_ip>
832
833 range [0..10] (default: 4)
834
835Spammers can send through the same relay (incl. compromised hosts) under a
836multitude of email addresses. This is the exact case when the IP reputation
837can help. This reputation is a kind of a local RBL.
838
839The weight is set by default lower than for the email_IP reputation, because
840there may be cases when the same IP address hosts both spammers and acceptable
841senders (for example the marketing department of a company sends you spam, but
842you still need to get messages from their billing address).
843
844=cut # ...................................................................
845 push (@cmds, {
846 setting => 'txrep_weight_ip',
847 default => 4,
848 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
849 code => sub {
850 my ($self, $key, $value, $line) = @_;
851 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
852 $self->{txrep_weight_ip} = $value;
853 }
85418µs });
855
856# -------------------------------------------------------------------------
857=item B<txrep_weight_helo>
858
859 range [0..10] (default: 0.5)
860
861A large number of spam messages come from compromised hosts, often personal computers
862or set-top boxes. Their NetBIOS names are usually used as the HELO name when connecting
863to your mail server. Some of the names are pretty generic and hence may be shared by
864a large number of hosts, but often the names are quite unique and may be a good
865indicator for detecting a spammer, even when different email and IP
866addresses are used (spam can also come from portable devices).
867
868No IP address is bound to the HELO name when stored in the reputation database.
869This is intentional, despite the possibility that numerous devices may share
870some of the HELO names.
871
872This option is still considered experimental, hence the low weight value, but after
873some testing it could likely be increased at least slightly.
874
875=cut # ...................................................................
876 push (@cmds, {
877 setting => 'txrep_weight_helo',
878 default => 0.5,
879 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
880 code => sub {
881 my ($self, $key, $value, $line) = @_;
882 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
883 $self->{txrep_weight_helo} = $value;
884 }
88517µs });
886
887
888# -------------------------------------------------------------------------
889=back
890
891=head1 ADMINISTRATOR SETTINGS
892
893These settings differ from the ones above, in that they are considered 'more
894privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section.
895No matter what C<allow_user_rules> is set to, these can never be set from a
896user's C<user_prefs> file.
897
898=over 4
899
900=item B<txrep_factory module>
901
902 (default: Mail::SpamAssassin::DBBasedAddrList)
903
904Select alternative database factory module for the TxRep database.
905
906=cut # ...................................................................
90715µs push (@cmds, {
908 setting => 'txrep_factory',
909 is_admin => 1,
910 default => 'Mail::SpamAssassin::DBBasedAddrList',
911 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
912 });
913
914
915# -------------------------------------------------------------------------
916=item B<auto_whitelist_path /path/filename>
917
918 (default: ~/.spamassassin/tx-reputation)
919
920This is the TxRep directory and filename. By default, each user
921has their own reputation database in their C<~/.spamassassin> directory with
922mode 0700. For system-wide SpamAssassin use, you may want to share this
923across all users.
924
925=cut # ...................................................................
926 push (@cmds, {
927 setting => 'auto_whitelist_path',
928 is_admin => 1,
929 default => '__userstate__/tx-reputation',
930 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
931 code => sub {
932 my ($self, $key, $value, $line) = @_;
933 unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
934 if (-d $value) {return $Mail::SpamAssassin::Conf::INVALID_VALUE; }
935 $self->{txrep_path} = $value;
936 }
93719µs });
938
939
940# -------------------------------------------------------------------------
941=item B<auto_whitelist_db_modules Module ...>
942
943 (default: see below)
944
945What database modules should be used for the TxRep storage database
946file. The first named module that can be loaded from the Perl include path
947will be used. The format is:
948
949 PreferredModuleName SecondBest ThirdBest ...
950
951i.e. a space-separated list of Perl module names. The default is:
952
953 DB_File GDBM_File SDBM_File
954
955NDBM_File is not supported (see SpamAssassin bug 4353).
956
957=cut # ...................................................................
95815µs push (@cmds, {
959 setting => 'auto_whitelist_db_modules',
960 is_admin => 1,
961 default => 'DB_File GDBM_File SDBM_File',
962 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
963 });
964
965
966# -------------------------------------------------------------------------
967=item B<auto_whitelist_file_mode>
968
969 (default: 0700)
970
971The file mode bits used for the TxRep directory or file.
972
973Make sure you specify this using the 'x' mode bits set, as it may also be used
974to create directories. However, if a file is created, the resulting file will
975not have any execute bits set (the umask is set to 0111).
976
977=cut # ...................................................................
978 push (@cmds, {
979 setting => 'auto_whitelist_file_mode',
980 is_admin => 1,
981 default => '0700',
982 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
983 code => sub {
984 my ($self, $key, $value, $line) = @_;
985 if ($value !~ /^0?[0-7]{3}$/) {
986 return $Mail::SpamAssassin::Conf::INVALID_VALUE;
987 }
988 $self->{txrep_file_mode} = untaint_var($value);
989 }
99018µs });
991
992
993# -------------------------------------------------------------------------
994=item B<user_awl_dsn DBI:databasetype:databasename:hostname:port>
995
996Used by the SQLBasedAddrList storage implementation.
997
998This will set the DSN used to connect. Example:
999C<DBI:mysql:spamassassin:localhost>
1000
1001=cut # ...................................................................
100214µs push (@cmds, {
1003 setting => 'user_awl_dsn',
1004 is_admin => 1,
1005 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1006 });
1007
1008
1009# -------------------------------------------------------------------------
1010=item B<user_awl_sql_username username>
1011
1012Used by the SQLBasedAddrList storage implementation.
1013
1014The authorized username to connect to the above DSN.
1015
1016=cut # ...................................................................
101714µs push (@cmds, {
1018 setting => 'user_awl_sql_username',
1019 is_admin => 1,
1020 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1021 });
1022
1023
1024# -------------------------------------------------------------------------
1025=item B<user_awl_sql_password password>
1026
1027Used by the SQLBasedAddrList storage implementation.
1028
1029The password for the database username, for the above DSN.
1030
1031=cut # ...................................................................
103213µs push (@cmds, {
1033 setting => 'user_awl_sql_password',
1034 is_admin => 1,
1035 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1036 });
1037
1038
1039# -------------------------------------------------------------------------
1040=item B<user_awl_sql_table tablename>
1041
1042 (default: txrep)
1043
1044Used by the SQLBasedAddrList storage implementation.
1045
1046The name of the table in which the reputation is stored, for the above DSN.
1047
1048=back
1049
1050=cut # ...................................................................
1051113µs push (@cmds, {
1052 setting => 'user_awl_sql_table',
1053 is_admin => 1,
1054 default => 'txrep',
1055 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1056 });
1057
1058125µs1904µs $conf->{parser}->register_commands(\@cmds);
1059}
1060
1061
1062###########################################################################
1063sub _message {
1064###########################################################################
1065 my ($self, $value, $msg) = @_;
1066 print "SpamAssassin TxRep: $value\n" if ($msg);
1067 dbg("TxRep: $value");
1068}
1069
1070
1071###########################################################################
1072sub _fail_exit {
1073###########################################################################
1074 my ($self, $err) = @_;
1075 my $eval_stat = ($err ne '') ? $err : "errno=$!";
1076 chomp $eval_stat;
1077 warn("TxRep: open of TxRep file failed: $eval_stat\n");
1078 if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1079 return 0;
1080}
1081
1082
1083###########################################################################
1084sub _fn_envelope {
1085###########################################################################
1086 my ($self, $args, $value, $msg) = @_;
1087
1088 unless ($self->{main}->{conf}->{use_txrep}){ return 0;}
1089 unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg); return 0;}
1090
1091 my $factor = $self->{conf}->{txrep_weight_email} +
1092 $self->{conf}->{txrep_weight_email_ip} +
1093 $self->{conf}->{txrep_weight_domain} +
1094 $self->{conf}->{txrep_weight_ip} +
1095 $self->{conf}->{txrep_weight_helo};
1096 my $sign = $args->{signedby};
1097 my $id = $args->{address};
1098 if ($args->{address} =~ /,/) {
1099 $sign = $args->{address};
1100 $sign =~ s/^.*,//g;
1101 $id =~ s/,.*$//g;
1102 }
1103
1104 # simplified regex used for IP detection (possible FP at a domain is not critical)
1105 if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo})
1106 {$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';}
1107 elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip})
1108 {$factor /= $self->{conf}->{txrep_weight_ip};}
1109 elsif ($id =~ /@/ && $self->{conf}->{txrep_weight_email})
1110 {$factor /= $self->{conf}->{txrep_weight_email};}
1111 elsif ($id !~ /@/ && $self->{conf}->{txrep_weight_domain})
1112 {$factor /= $self->{conf}->{txrep_weight_domain};}
1113 else {$factor = 1;}
1114
1115 $self->open_storages();
1116 my $score = (!defined $value)? undef : $factor * $value;
1117 my $status = $self->modify_reputation($id, $score, $sign);
1118 dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || '');
1119 eval {
1120 $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id);
1121 if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1122 1;
1123 } or return $self->_fail_exit( $@ );
1124 return $status;
1125}
1126
1129# -------------------------------------------------------------------------
1130=head1 BLACKLISTING / WHITELISTING
1131
1132When asked by SpamAssassin to blacklist or whitelist a user, the TxRep
1133plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting)
1134to the given sender's email address. For a plain address without any IP
1135address, the value is multiplied by the ratio of the total reputation
1136weight to the EMAIL reputation weight to account for the reduced impact
1137of the standalone EMAIL reputation when calculating the overall reputation.
1138
1139 total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
1140 blacklisted_reputation = 100 * total_weight / weight_email
1141
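As a worked example of the two formulas above, the following sketch computes
the stored score for a standalone email address using the default weights
(3, 10, 2, 4 and 0.5). The helper C<listing_score> is hypothetical; it is a
simplified reading of the scaling and does not reproduce the full branch
logic of the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): scale a blacklist (+100) or
# whitelist (-100) value by total_weight / weight of the affected part.
sub listing_score {
    my ($value, $weights, $part) = @_;
    my $total = 0;
    $total += $_ for values %$weights;
    my $w = $weights->{$part};
    return $value unless $w;          # simplification: no scaling for a zero weight
    return $value * $total / $w;
}

my %weights = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

print listing_score( 100, \%weights, 'email'), "\n";   # 100 * 19.5 / 3, prints 650
print listing_score(-100, \%weights, 'email'), "\n";   # prints -650
```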
1142When a standalone email address is blacklisted/whitelisted, all records
1143of the email address bound to an IP address, DKIM signature, or an SPF pass
1144will be removed from the database, and only the standalone record is kept.
1145
1146Besides blacklisting/whitelisting of standalone email addresses, the same
1147method may also be used for blacklisting/whitelisting of IP addresses,
1148domain names, and HELO names (only dotless NetBIOS HELO names can be used).
1149
1150When whitelisting/blacklisting an email address or domain name, you can
1151bind them to a specified DKIM signature or SPF record by appending the
1152DKIM signing domain or the tag 'spf' after the ID in the following way:
1153
1154 spamassassin --add-addr-to-blacklist=spamming.biz,spf
1155 spamassassin --add-addr-to-whitelist=friend@good.org,good.org
1156
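The way such an argument separates into an ID and a signer can be sketched as
below. The helper C<split_id_and_signer> is hypothetical; it applies the same
two substitutions as C<_fn_envelope>, except that it returns an undefined
signer when no comma is present instead of falling back to a C<signedby>
argument.

```perl
use strict;
use warnings;

# Hypothetical helper: split "id,signer" into its two parts.
sub split_id_and_signer {
    my ($address) = @_;
    my ($id, $sign) = ($address, undef);
    if ($address =~ /,/) {
        ($sign = $address) =~ s/^.*,//;   # keep what follows the last comma
        $id =~ s/,.*$//;                  # keep what precedes the first comma
    }
    return ($id, $sign);
}

my ($id, $sign) = split_id_and_signer('friend@good.org,good.org');
print "$id signed by $sign\n";   # prints: friend@good.org signed by good.org
```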
1157When a message contains both a DKIM signature and an SPF pass, the DKIM
1158signature takes priority, so the record bound to the 'spf' tag won't
1159be checked. Only email addresses and domains can be bound to DKIM or SPF.
1160Records of IP addresses and HELO names are always without DKIM/SPF.
1161
1162In case of dual storage, the black/whitelisting is performed only in the
1163default storage.
1164
1165=cut
1166######################################################## plugin hooks #####
1167sub blacklist_address {my $self=shift; return $self->_fn_envelope(@_, 100, "blacklisting address");}
1168sub whitelist_address {my $self=shift; return $self->_fn_envelope(@_, -100, "whitelisting address");}
1169sub remove_address {my $self=shift; return $self->_fn_envelope(@_,undef, "removing address");}
1170###########################################################################
1171
1172
1173# -------------------------------------------------------------------------
1174=head1 REPUTATION LOGICS
1175
11761. As in AWL, the most significant sender identifier is the
1177 combination of the email address and the originating IP address, or
1178 the part of it defined by the IPv4 or IPv6 mask setting.
1179
11802. No IP checking for standalone EMAIL address reputation
1181
11823. No signature checking for IP reputation, and for HELO name reputation
1183
11844. The EMAIL_IP weight, and not the standalone EMAIL weight, is used when
1185 no IP address is available (EMAIL_IP is the main indicator, and has
1186 the highest weight)
1187
11885. No IP checking for signed emails (signature authenticates the email
1189 instead of the IP address)
1190
11916. No IP checking on SPF pass (we assume the domain owner is responsible
1192 for all IPs they authorize to send from, hence we use the same identity
1193 for all of them)
1194
11957. No signature used for standalone EMAIL reputation (would be redundant,
1196 since no IP is used for signed EMAIL_IP reputation, and we would store
1197 two identical hits)
1198
11998. When available, the DKIM signer is used instead of the domain name for
1200 the DOMAIN reputation
1201
12029. No IP and no signature used for HELO reputation (despite the possibility
1203 of multiple computers sharing the same HELO)
1204
120510. The full (unmasked) IP address is used (in the address field, instead of
1206 the IP field) for the standalone IP reputation
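
Rules 5, 6 and 8 can be sketched as follows. The helper mirrors the branch in
check_senders_reputation that drops the IP for signed or SPF-passing mail.
select_identity and its hash keys are hypothetical names chosen for illustration,
and the sketch assumes that txrep_spf is enabled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep): picks the IP, domain and signature
# tag that would be used for the EMAIL_IP and DOMAIN lookups.
sub select_identity {
    my (%msg) = @_;    # keys: origip, domain, dkim_domain, spf_pass
    my ($ip, $domain, $signedby) = ($msg{origip}, $msg{domain}, $msg{dkim_domain});
    if ($signedby) {
        $ip     = undef;        # rule 5: the signature replaces the IP
        $domain = $signedby;    # rule 8: the DKIM signer replaces the domain
    } elsif ($msg{spf_pass}) {
        $ip       = undef;      # rule 6: one identity for all authorized IPs
        $signedby = 'spf';
    }
    return ($ip, $domain, $signedby);
}

my ($ip, $domain, $signedby) = select_identity(
    origip => '192.0.2.10', domain => 'good.org', spf_pass => 1,
);
printf "ip=%s domain=%s signedby=%s\n",
    defined($ip) ? $ip : 'none', $domain, defined($signedby) ? $signedby : 'none';
```

Because DKIM is tested first, a message with both a signature and an SPF pass
never reaches the 'spf' branch.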
1207
1208=cut
1209###########################################################################
1210
# spent 620s (277ms+620) within Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 321 times, avg 1.93s/call: # 234 times (209ms+620s) by Mail::SpamAssassin::Plugin::TxRep::learn_message at line 1782, avg 2.65s/call # 87 times (67.6ms+-67.6ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message at line 1797, avg 0s/call
sub check_senders_reputation {
1211###########################################################################
1212321694µs my ($self, $pms) = @_;
1213
1214# just for the development debugging
1215# use Data::Printer;
1216# dbg("TxRep: DEBUG DUMP of pms: %s, %s", $pms, p($pms));
1217
12183211.17ms my $autolearn = defined $self->{autolearn};
12193211.07ms $self->{last_pms} = $self->{autolearn} = undef;
1220
1221321909µs return 0 unless ($self->{conf}->{use_txrep});
12223211.23ms if ($self->{conf}->{use_auto_whitelist}) {
1223 warn("TxRep: cannot run when Auto-Whitelist is enabled. Please disable it!\n");
1224 return 0;
1225 }
1226321594µs if ($autolearn && !$self->{conf}->{txrep_autolearn}) {
1227 dbg("TxRep: autolearning disabled, no more reputation adjusting, quitting");
1228 return 0;
1229 }
12303213.90ms321239ms my @from = $pms->all_from_addrs();
# spent 239ms making 321 calls to Mail::SpamAssassin::PerMsgStatus::all_from_addrs, avg 746µs/call
1231321938µs if (@from && $from[0] eq 'ignore@compiling.spamassassin.taint.org') {
1232 dbg("TxRep: no scan in lint mode, quitting");
1233 return 0;
1234 }
1235
1236321944µs my $delta = 0;
12373213.23ms3212.79ms my $timer = $self->{main}->time_method("total_txrep");
# spent 2.79ms making 321 calls to Mail::SpamAssassin::time_method, avg 9µs/call
12383211.04ms my $msgscore = (defined $self->{learning})? $self->{learning} : $pms->get_autolearn_points();
12393213.92ms3211.79s my $date = $pms->{msg}->receive_date() || $pms->{date_header_time};
# spent 1.79s making 321 calls to Mail::SpamAssassin::Message::receive_date, avg 5.59ms/call
1240 my $msg_id = $self->{msgid} ||
12413215.21ms321170ms Mail::SpamAssassin::Plugin::Bayes->get_msgid($pms->{msg}) ||
# spent 170ms making 321 calls to Mail::SpamAssassin::Plugin::Bayes::get_msgid, avg 528µs/call
1242 $pms->get('Message-Id') || $pms->get('Message-ID') || $pms->get('MESSAGE-ID') || $pms->get('MESSAGEID');
1243
12443214.00ms3218.15ms my $from = lc($pms->get('From:addr') || $pms->get('EnvelopeFrom:addr'));
# spent 8.15ms making 321 calls to Mail::SpamAssassin::PerMsgStatus::get, avg 25µs/call
124532114.8ms3211.57ms return 0 unless $from =~ /\S/;
# spent 1.57ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call
12463211.28ms my $domain = $from;
12473214.81ms3212.33ms $domain =~ s/^.+@//;
# spent 2.33ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call
1248
1249321962µs my ($origip, $helo);
12503211.64ms if (defined $pms->{relays_trusted} || defined $pms->{relays_untrusted}) {
12516422.53ms my $trusteds = @{$pms->{relays_trusted}};
12529634.92ms foreach my $rly ( @{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}} ) {
1253 # Get the last found HELO, regardless of private/public or trusted/untrusted
1254 # Avoiding a redundant duplicate entry if HELO is equal/similar to another identifier
12551426151ms792869.4ms if (defined $rly->{helo} && $rly->{helo} !~ /^\[?$rly->{ip}\]?$/ && $rly->{helo} !~ /$domain/i && $rly->{helo} !~ /$from/i ) {
# spent 54.0ms making 3964 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp, avg 14µs/call # spent 15.3ms making 3964 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 4µs/call
125611123.77ms $helo = $rly->{helo};
1257 }
1258 # use only trusted ID, but use the first untrusted IP (if available) (AWL bug 6908)
1259 # at low spam scores (<2) ignore trusted/untrusted
1260 # set IP to 127.0.0.1 for any internal IP, so that it can be distinguished from none (AWL bug 6357)
126114264.65ms if ((--$trusteds >= 0 || $msgscore<2) && !$msg_id && $rly->{id}) {$msg_id = $rly->{id};}
126217468.53ms if (($trusteds >= -1 || $msgscore<2) && !$rly->{ip_private} && $rly->{ip}) {$origip = $rly->{ip};}
1263174712.1ms if ( $trusteds >= 0 && !$origip && $rly->{ip_private} && $rly->{ip}) {$origip = '127.0.0.1';}
1264 }
1265 }
1266
12673212.05ms if ($self->{conf}->{txrep_track_messages}) {
12683211.30ms if ($msg_id) {
12693213.90ms321316s my $msg_rep = $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, undef);
# spent 316s making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 985ms/call
12703213.64ms3213.48ms if (defined $msg_rep && $self->count()) {
# spent 3.48ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 11µs/call
12711741.26ms if (defined $self->{learning} && !defined $self->{forgetting}) {
1272 # already learned, forget only if already learned (count>1), and relearn
1273 # when only scanned (count=1), go ahead with normal rep scan
127487824µs87852µs if ($self->count() > 1) {
# spent 852µs making 87 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call
127587231µs $self->{last_pms} = $pms; # cache the pmstatus
127687907µs87613s $self->forget_message($pms->{msg},$msg_id); # sub reentrance OK
# spent 613s making 87 calls to Mail::SpamAssassin::Plugin::TxRep::forget_message, avg 7.05s/call
1277 }
1278 } elsif ($self->{forgetting}) {
127987271µs $msgscore = $msg_rep; # forget the old stored score instead of the one got now
1280871.08ms871.23ms dbg("TxRep: forgetting stored score %0.3f of message %s", $msgscore || 'undef', $msg_id);
# spent 1.23ms making 87 calls to Mail::SpamAssassin::Logger::dbg, avg 14µs/call
1281 } else {
1282 # calculating the delta from the stored message reputation
1283 $delta = ($msgscore + $self->{conf}->{txrep_factor}*$msg_rep) / (1+$self->{conf}->{txrep_factor}) - $msgscore;
1284 if ($delta != 0) {
1285 $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
1286 }
1287 dbg("TxRep: message %s already scanned, using old data; post-TxRep score: %0.3f", $msg_id, $pms->{score} || 'undef');
1288 return 0;
1289 }
1290 } # no stored reputation found, go ahead with normal rep scan
1291 } else {dbg("TxRep: no message-id available, parsing forced");}
1292 } # else no message tracking, go ahead with normal rep scan
1293
1294 # whitelist recipients of messages sent from internal networks, after checking MSG_ID only
12953214.51ms if ( $self->{conf}->{txrep_whitelist_out} &&
1296321743µs defined $pms->{relays_internal} && @{$pms->{relays_internal}} &&
1297321726µs (!defined $pms->{relays_external} || !@{$pms->{relays_external}})
1298 ) {
1299114µs13.61ms foreach my $rcpt ($pms->all_to_addrs()) {
# spent 3.61ms making 1 call to Mail::SpamAssassin::PerMsgStatus::all_to_addrs
130019µs if ($rcpt) {
1301110µs17µs dbg("TxRep: internal sender, whitelisting recipient: $rcpt");
# spent 7µs making 1 call to Mail::SpamAssassin::Logger::dbg
1302114µs13.21s $self->modify_reputation($rcpt, -1*$self->{conf}->{txrep_whitelist_out}, undef);
1303 }
1304 }
1305 }
1306
13073214.28ms32112.5ms my $signedby = ($self->{conf}->{auto_whitelist_distinguish_signed})? $pms->get_tag('DKIMDOMAIN') : undef;
# spent 12.5ms making 321 calls to Mail::SpamAssassin::PerMsgStatus::get_tag, avg 39µs/call
1308 dbg("TxRep: active, %s pre-score: %s, autolearn score: %s, IP: %s, address: %s %s",
1309 $msg_id || '',
13103214.17ms3212.87ms $pms->{score} || '?',
# spent 2.87ms making 321 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call
1311 $msgscore || '?',
1312 $origip || '?',
1313 $from || '?',
1314 $signedby ? "signed by $signedby" : '(unsigned)'
1315 );
1316
1317321988µs my $ip = $origip;
13183211.07ms if ($signedby) {
1319 $ip = undef;
1320 $domain = $signedby;
1321 } elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) {
1322 $ip = undef;
1323 $signedby = 'spf';
1324 }
1325
1326321760µs my $totalweight = 0;
13273211000µs $self->{totalweight} = $totalweight;
1328
13293212.94ms321427ms $delta += $self->check_reputations($pms, 'EMAIL_IP', $from, $ip, $signedby, $msgscore);
# spent 427ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.33ms/call
13306424.16ms321332ms if ($domain) {$delta += $self->check_reputations($pms, 'DOMAIN', $domain, $ip, $signedby, $msgscore);}
# spent 332ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.04ms/call
13316023.82ms281248ms if ($helo) {$delta += $self->check_reputations($pms, 'HELO', $helo, undef, 'HELO', $msgscore);}
# spent 248ms making 281 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 884µs/call
13323211.31ms if ($origip) {
13336423.94ms321312ms if (!$signedby) {$delta += $self->check_reputations($pms, 'EMAIL', $from, undef, undef, $msgscore);}
# spent 312ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 972µs/call
13343212.67ms321303ms $delta += $self->check_reputations($pms, 'IP', $origip, undef, undef, $msgscore);
# spent 303ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 945µs/call
1335 }
1336
1337321873µs if (!defined $self->{learning}) {
1338 $delta = ($self->{totalweight})? $self->{conf}->{txrep_factor} * $delta / $self->{totalweight} : 0;
1339 if ($delta) {
1340 $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
1341 }
1342 $msgscore += $delta;
1343 if (defined $pms->{score}) {
1344 dbg("TxRep: post-TxRep score: %.3f", $pms->{score});
1345 }
1346 }
13473211.86ms if ($self->{conf}->{txrep_track_messages} && $msg_id) {
13483212.49ms321297s $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, $msgscore);
# spent 297s making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 924ms/call
1349 }
13503211.72ms if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1351
13523215.92ms return 0;
1353}
1354
1355
1356###########################################################################
1357
# spent 615s (93.3ms+614) within Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 2207 times, avg 278ms/call: # 321 times (14.4ms+316s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1269, avg 985ms/call # 321 times (12.4ms+297s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 924ms/call # 321 times (13.1ms+414ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1329, avg 1.33ms/call # 321 times (18.2ms+314ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1330, avg 1.04ms/call # 321 times (12.0ms+300ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1333, avg 972µs/call # 321 times (12.4ms+291ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1334, avg 945µs/call # 281 times (10.8ms+238ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1331, avg 884µs/call
sub check_reputations {
1358###########################################################################
135922074.28ms my $self = shift;
136022073.66ms my $delta;
1361
1362220720.2ms220729.9ms if ($self->open_storages()) {
# spent 29.9ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 14µs/call
1363220710.5ms if ($self->{conf}->{txrep_user2global_ratio} && $self->{user_storage} != $self->{global_storage}) {
1364 my $user = $self->check_reputation('user_storage', @_);
1365 my $global = $self->check_reputation('global_storage',@_);
1366
1367 $delta = (defined $user && $user==$user) ?
1368 ( $self->{conf}->{txrep_user2global_ratio} * $user + $global ) / ( 1 + $self->{conf}->{txrep_user2global_ratio} ) :
1369 $global;
1370 } else {
1371220719.6ms2207614s $delta = $self->check_reputation(undef,@_);
# spent 614s making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputation, avg 278ms/call
1372 }
1373 }
1374220733.4ms return $delta;
1375}
1376
1377
1378###########################################################################
1379
# spent 614s (655ms+614) within Mail::SpamAssassin::Plugin::TxRep::check_reputation which was called 2207 times, avg 278ms/call: # 2207 times (655ms+614s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1371, avg 278ms/call
sub check_reputation {
1380###########################################################################
1381220722.3ms my ($self, $storage, $pms, $key, $id, $ip, $signedby, $msgscore) = @_;
1382
138322074.18ms my $delta = 0;
13842207127ms my $weight = ($key eq 'MSG_ID')? 1 : eval('$pms->{main}->{conf}->{txrep_weight_'.lc($key).'}');
# spent 4.32ms executing statements in 321 string evals (merged) # spent 2.72ms executing statements in 321 string evals (merged) # spent 2.71ms executing statements in 321 string evals (merged) # spent 2.56ms executing statements in 321 string evals (merged) # spent 2.31ms executing statements in 281 string evals (merged)
1385
138622078.66ms if (defined $weight && $weight) {
138722073.68ms my $meanrep;
1388220724.5ms220720.1ms my $timer = $self->{main}->time_method('check_txrep_'.lc($key));
# spent 20.1ms making 2207 calls to Mail::SpamAssassin::time_method, avg 9µs/call
1389
139022074.48ms if (defined $storage) {
1391 $self->{checker} = $self->{$storage};
1392 }
1393220718.5ms2207925ms my $found = $self->get_sender($id, $ip, $signedby);
# spent 925ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::get_sender, avg 419µs/call
139422076.87ms my $tag_id = (defined $storage)? uc($key.'_'.substr($storage,0,1)) : uc($key);
1395220724.1ms220722.5ms if (defined $found && $self->count()) {
# spent 22.5ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call
1396172125.9ms344228.8ms $meanrep = $self->total() / $self->count();
# spent 15.7ms making 1721 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call # spent 13.1ms making 1721 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
1397 }
1398220712.6ms if ($self->{learning} && defined $msgscore) {
139918866.02ms if (defined $meanrep) {
1400 # $msgscore<=>0 gives the sign of $msgscore
1401154710.5ms $msgscore += ($msgscore<=>0) * abs($meanrep);
1402 }
1403 dbg("TxRep: reputation: %s, count: %d, learning: %s, $tag_id: %s",
1404 defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
1405 $self->count() || 0,
1406188655.6ms377234.7ms $self->{learning} || '',
# spent 18.8ms making 1886 calls to Mail::SpamAssassin::Logger::dbg, avg 10µs/call # spent 15.9ms making 1886 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
1407 $id || 'none'
1408 );
1409 } else {
1410321915µs $self->{totalweight} += $weight;
14113214.94ms4683.73ms if ($key eq 'MSG_ID' && $self->count() > 0) {
# spent 2.39ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call # spent 1.34ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call
14121742.31ms3482.55ms $delta = $self->total() / $self->count();
# spent 1.29ms making 174 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 7µs/call # spent 1.26ms making 174 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call
14131744.83ms17412.5ms $pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f",$delta));
# spent 12.5ms making 174 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 72µs/call
1414 } elsif (defined $self->total()) {
141514710.7ms2942.38ms $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
# spent 1.28ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call # spent 1.10ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call
1416
14171473.64ms14710.5ms $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f",$delta));
# spent 10.5ms making 147 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 71µs/call
1418147316µs if (defined $meanrep) {
1419 $pms->set_tag('TXREP_'.$tag_id.'_MEAN', sprintf("%2.1f", $meanrep));
1420 }
14211472.65ms2948.51ms $pms->set_tag('TXREP_'.$tag_id.'_COUNT', sprintf("%2.1f", $self->count()));
# spent 7.35ms making 147 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 50µs/call # spent 1.16ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
14221471.98ms1478.20ms $pms->set_tag('TXREP_'.$tag_id.'_PRESCORE', sprintf("%2.1f", $pms->{score}));
# spent 8.20ms making 147 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 56µs/call
1423 } else {
1424 $pms->set_tag('TXREP_'.$tag_id.'_UNKNOWN', 1);
1425 }
14263217.01ms6425.42ms dbg("TxRep: reputation: %s, count: %d, weight: %.1f, delta: %.3f, $tag_id: %s",
# spent 2.94ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call # spent 2.47ms making 321 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
1427 defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
1428 $self->count() || 0,
1429 $weight || 0,
1430 $delta || 0,
1431 $id || 'none'
1432 );
1433 }
1434220721.9ms220719.8ms $timer = $self->{main}->time_method('update_txrep_'.lc($key));
# spent 19.8ms making 2207 calls to Mail::SpamAssassin::time_method, avg 9µs/call
1435220722.7ms if (defined $msgscore) {
143618867.55ms if ($self->{forgetting}) { # forgetting a message score
14375123.74ms51283.5ms $self->remove_score($msgscore); # remove the given score and decrement the count
# spent 83.5ms making 512 calls to Mail::SpamAssassin::Plugin::TxRep::remove_score, avg 163µs/call
14385122.16ms if ($key eq 'MSG_ID') { # remove the message ID score completely
143987794µs87296s $self->{checker}->remove_entry($self->{entry});
# spent 296s making 87 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.41s/call
1440 }
1441 } else {
1442137410.4ms1374262ms $self->add_score($msgscore); # add the score and increment the count
# spent 262ms making 1374 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 191µs/call
144313747.82ms2342.36ms if ($self->{learning} && $key eq 'MSG_ID' && $self->count() eq 1) {
# spent 2.36ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call
14442341.61ms23438.8ms $self->add_score($msgscore); # increasing the count by 1 at a learned score (count=2)
# spent 38.8ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 166µs/call
1445 } # it can be distinguished from a scanned score (count=1)
1446 }
1447 } elsif (defined $found && $self->{forgetting} && $key eq 'MSG_ID') {
144887821µs87316s $self->{checker}->remove_entry($self->{entry}); #forgetting the message ID
# spent 316s making 87 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.63s/call
1449 }
1450 }
145122074.05ms if (defined $storage) {$self->{checker} = $self->{default_storage};}
1452
1453220738.4ms return ($weight || 0) * ($delta || 0);
1454}
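
# The scan-mode arithmetic above can be checked in isolation. This is a minimal
# sketch, not plugin code: identifier_delta and combined_delta are hypothetical
# names, and the weights and scores are made-up illustration values. It assumes
# every identifier already has a stored record.

```perl
use strict;
use warnings;

# Per-identifier delta, as in the non-learning branch of check_reputation:
# the shift of the message score towards the stored mean.
sub identifier_delta {
    my ($total, $count, $msgscore) = @_;
    return ($total + $msgscore) / (1 + $count) - $msgscore;
}

# Weighted combination, as at the end of check_senders_reputation:
# factor * sum(weight * delta) / sum(weight).
sub combined_delta {
    my ($factor, $msgscore, @records) = @_;    # each record: [weight, total, count]
    my ($sum, $totalweight) = (0, 0);
    for my $r (@records) {
        my ($weight, $total, $count) = @$r;
        $sum         += $weight * identifier_delta($total, $count, $msgscore);
        $totalweight += $weight;
    }
    return $totalweight ? $factor * $sum / $totalweight : 0;
}

# One identifier with weight 3, stored total 20 over 9 messages; new score 5.
printf "%.3f\n", combined_delta(0.5, 5, [3, 20, 9]);
```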
1455
- -
1458#--------------------------------------------------------------------------
1459# Database handler subroutines
1460#--------------------------------------------------------------------------
1461
1462###########################################################################
146318348143ms
# spent 80.5ms within Mail::SpamAssassin::Plugin::TxRep::count which was called 9174 times, avg 9µs/call: # 2207 times (22.5ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1395, avg 10µs/call # 1886 times (15.9ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1406, avg 8µs/call # 1721 times (13.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1396, avg 8µs/call # 1608 times (13.5ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1502, avg 8µs/call # 321 times (3.48ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1270, avg 11µs/call # 321 times (2.94ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1426, avg 9µs/call # 321 times (2.39ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1411, avg 7µs/call # 234 times (2.36ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1443, avg 10µs/call # 174 times (1.26ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1412, avg 7µs/call # 147 times (1.16ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1421, avg 8µs/call # 147 times (1.10ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1415, avg 7µs/call # 87 times (852µs+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1274, avg 10µs/call
sub count {my $self=shift; return (defined $self->{checker})? $self->{entry}->{count} : undef;}
1464759460.6ms
# spent 32.2ms within Mail::SpamAssassin::Plugin::TxRep::total which was called 3797 times, avg 8µs/call: # 1721 times (15.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1396, avg 9µs/call # 1608 times (12.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1502, avg 8µs/call # 174 times (1.29ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1412, avg 7µs/call # 147 times (1.34ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1411, avg 9µs/call # 147 times (1.28ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1415, avg 9µs/call
sub total {my $self=shift; return (defined $self->{checker})? $self->{entry}->{totscore} : undef;}
1465###########################################################################
1466
1467
1468###########################################################################
1469
# spent 925ms (189+736) within Mail::SpamAssassin::Plugin::TxRep::get_sender which was called 2207 times, avg 419µs/call: # 2207 times (189ms+736ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1393, avg 419µs/call
sub get_sender {
1470###########################################################################
1471220716.1ms my ($self, $addr, $origip, $signedby) = @_;
1472
147322074.97ms return unless (defined $self->{checker});
1474
1475220717.9ms2207142ms my $fulladdr = $self->pack_addr($addr, $origip);
# spent 142ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 64µs/call
1476220718.8ms2207574ms my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 574ms making 2207 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 260µs/call
1477220713.6ms $self->{entry} = $entry;
147822075.54ms $origip = $origip || 'none';
1479
1480220783.1ms441420.1ms if ($entry->{count}<0 || $entry->{count}=~/^(nan|)$/ || $entry->{totscore}=~/^(nan|)$/) {
# spent 20.1ms making 4414 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call
1481 warn "TxRep: resetting bad data for ($addr, $origip), count: $entry->{count}, totscore: $entry->{totscore}\n";
1482 $self->{entry}->{count} = $self->{entry}->{totscore} = 0;
1483 }
1484220740.1ms return $self->{entry}->{count};
1485}
1486
1487
1488###########################################################################
1489
# spent 301ms (85.8+215) within Mail::SpamAssassin::Plugin::TxRep::add_score which was called 1608 times, avg 187µs/call: # 1374 times (75.1ms+187ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1442, avg 191µs/call # 234 times (10.7ms+28.1ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1444, avg 166µs/call
sub add_score {
1490###########################################################################
149116086.10ms my ($self,$score) = @_;
1492
149316084.58ms return unless (defined $self->{checker}); # no factory defined; we can't check
1494
149516084.26ms if ($score != $score) {
1496 warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
1497 return; # don't try to add a NaN
1498 }
149916084.98ms $self->{entry}->{count} ||= 0;
1500
1501 # performing the dilution aging correction
1502160836.4ms321626.0ms if (defined $self->total() && defined $self->count() && defined $self->{txrep_dilution_factor}) {
# spent 13.5ms making 1608 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call # spent 12.6ms making 1608 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 8µs/call
1503 my $diluted_total =
1504 ($self->count() + 1) *
1505 ($self->{txrep_dilution_factor} * $self->total() + $score) /
1506 ($self->{txrep_dilution_factor} * $self->count() + 1);
1507 my $corrected_score = $diluted_total - $self->total();
1508 $self->{checker}->add_score($self->{entry}, $corrected_score);
1509 } else {
1510160812.9ms1608189ms $self->{checker}->add_score($self->{entry}, $score);
# spent 189ms making 1608 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 118µs/call
1511 }
1512}
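
# The dilution branch of add_score can be tried out on its own. A minimal
# sketch: diluted_increment is a hypothetical name, and $factor stands in for
# txrep_dilution_factor with made-up values. In the profiled run above, all
# 1608 calls took the plain (undiluted) branch instead.

```perl
use strict;
use warnings;

# Returns the amount that would be added to the stored total for a new score.
sub diluted_increment {
    my ($total, $count, $score, $factor) = @_;
    return if $score != $score;    # NaN is the only value not equal to itself
    my $diluted_total = ($count + 1) * ($factor * $total + $score)
                      / ($factor * $count + 1);
    return $diluted_total - $total;
}

printf "%.3f\n", diluted_increment(20, 9, 5, 1);       # factor 1: plain addition
printf "%.3f\n", diluted_increment(20, 9, 5, 0.98);    # factor < 1: the new score counts more
```

# With a factor of 1 the expression reduces to total + score, so the increment
# equals the score itself.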
1513
- -
1516###########################################################################
1517
# spent 83.5ms (28.1+55.4) within Mail::SpamAssassin::Plugin::TxRep::remove_score which was called 512 times, avg 163µs/call: # 512 times (28.1ms+55.4ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1437, avg 163µs/call
sub remove_score {
1518###########################################################################
15195122.57ms my ($self,$score) = @_;
1520
15215121.13ms return unless (defined $self->{checker}); # no factory defined; we can't check
1522
15235121.52ms if ($score != $score) { # don't try to add a NaN
1524 warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
1525 return;
1526 }
1527 # no reversal dilution aging correction (not easily possible),
1528 # just removing the original message score
15295122.34ms if ($self->{entry}->{count} > 2)
153058231µs {$self->{entry}->{count} -= 2;}
15314541.06ms else {$self->{entry}->{count} = 0;}
1532 # subtract 2, and add a score; hence decrementing by 1
15335128.29ms51255.4ms $self->{checker}->add_score($self->{entry}, -1*$score);
# spent 55.4ms making 512 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 108µs/call
1534}
1535
- -
1538###########################################################################
1539
# spent 3.21s (148µs+3.20) within Mail::SpamAssassin::Plugin::TxRep::modify_reputation which was called: # once (148µs+3.20s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1302
sub modify_reputation {
1540###########################################################################
154116µs my ($self, $addr, $score, $signedby) = @_;
1542
154312µs return unless (defined $self->{checker}); # no factory defined; we can't check
154419µs161µs my $fulladdr = $self->pack_addr($addr, undef);
# spent 61µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::pack_addr
154519µs1397µs my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
1546
1547 # remove any old entries (will remove per-ip entries as well)
1548 # always call this regardless, as the current entry may have 0
1549 # scores, but the per-ip one may have more
1550110µs13.20s $self->{checker}->remove_entry($entry);
1551
1552 # remove address only, no new score to add if score NaN or undef
155319µs if (defined $score && $score==$score) {
1554 # else add score. get a new entry first
1555123µs1242µs $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
1556110µs1108µs $self->{checker}->add_score($entry, $score);
# spent 108µs making 1 call to Mail::SpamAssassin::DBBasedAddrList::add_score
1557 }
1558121µs return 1;
1559}
1560
1561
1562# connecting the primary and the secondary storage; needed only on the first run
1563# (this can't be in the constructor, since the settings are not available there)
1564###########################################################################
1565
# spent 29.9ms (21.1+8.78) within Mail::SpamAssassin::Plugin::TxRep::open_storages which was called 2207 times, avg 14µs/call: # 2207 times (21.1ms+8.78ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1362, avg 14µs/call
sub open_storages {
1566###########################################################################
156722074.02ms my $self = shift;
1568
1569220724.7ms return 1 unless (!defined $self->{default_storage});
1570
157112µs my $factory;
1572111µs if ($self->{main}->{pers_addr_list_factory}) {
1573 $factory = $self->{main}->{pers_addr_list_factory};
1574 } else {
157514µs my $type = $self->{conf}->{txrep_factory};
1576115µs15µs if ($type =~ /^([_A-Za-z0-9:]+)$/) {
# spent 5µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::CORE:match
1577110µs130µs $type = untaint_var($type);
# spent 30µs making 1 call to Mail::SpamAssassin::Util::untaint_var
1578 eval 'require '.$type.';
1579 $factory = '.$type.'->new();
1580 1;'
15811163µs or do {
# spent 372µs executing statements in string eval
1582 my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
1583 warn "TxRep: $eval_stat\n";
1584 undef $factory;
1585 };
1586116µs110µs $self->{main}->set_persistent_address_list_factory($factory) if $factory;
1587 } else {warn "TxRep: illegal factory setting\n";}
1588 }
158914µs if (defined $factory) {
1590114µs14.18ms $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main});
# spent 4.18ms making 1 call to Mail::SpamAssassin::DBBasedAddrList::new_checker
1591
159214µs if ($self->{conf}->{txrep_user2global_ratio} && !defined $self->{global_storage}) {
1593 # hack to handle the BDB and SQL factory types of the storage object
1594 # TODO: add a method to the handler class instead
1595 my ($storage_type, $is_global);
1596
1597 if (ref($factory) =~ /SQLBasedAddrList/) {
1598 $is_global = defined $self->{conf}->{user_awl_sql_override_username};
1599 $storage_type = 'SQL';
1600 if ($is_global && $self->{conf}->{user_awl_sql_override_username} eq $self->{main}->{username}) {
1601 # skip double storage if current user same as the global override
1602 $self->{user_storage} = $self->{global_storage} = $self->{default_storage};
1603 }
1604 } elsif (ref($factory) =~ /DBBasedAddrList/) {
1605 $is_global = $self->{conf}->{auto_whitelist_path} !~ /__userstate__/;
1606 $storage_type = 'DB';
1607 }
1608 if (!defined $self->{global_storage}) {
1609 my $sql_override_orig = $self->{conf}->{user_awl_sql_override_username};
1610 my $awl_path_orig = $self->{conf}->{auto_whitelist_path};
1611 if ($is_global) {
1612 $self->{conf}->{user_awl_sql_override_username} = '';
1613 $self->{conf}->{auto_whitelist_path} = '__userstate__/tx-reputation';
1614 $self->{global_storage} = $self->{default_storage};
1615 $self->{user_storage} = $factory->new_checker($self->{main});
1616 } else {
1617 $self->{conf}->{user_awl_sql_override_username} = 'GLOBAL';
1618 $self->{conf}->{auto_whitelist_path} = '__local_state_dir__/tx-reputation';
1619 $self->{global_storage} = $factory->new_checker($self->{main});
1620 $self->{user_storage} = $self->{default_storage};
1621 }
1622 $self->{conf}->{user_awl_sql_override_username} = $sql_override_orig;
1623 $self->{conf}->{auto_whitelist_path} = $awl_path_orig;
1624
1625 # Another ugly hack to find out whether the user differs from
1626 # the global one. We need to add a method to the factory handlers
1627 if ($storage_type eq 'DB' &&
1628 $self->{user_storage}->{locked_file} eq $self->{global_storage}->{locked_file}) {
1629 if ($is_global)
1630 {$self->{global_storage}->finish();}
1631 else {$self->{user_storage}->finish();}
1632 $self->{user_storage} = $self->{global_storage} = $self->{default_storage};
1633 }
1634 }
1635 }
1636 } else {
1637 $self->{user_storage} = $self->{global_storage} = $self->{checker} = $self->{default_storage} = undef;
1638 warn("TxRep: could not open storages, quitting!\n");
1639 return 0;
1640 }
1641111µs return 1;
1642}
1643
1644
1645###########################################################################
1646
# spent 5.67s (92µs+5.67) within Mail::SpamAssassin::Plugin::TxRep::finish which was called: # once (92µs+5.67s) by Mail::SpamAssassin::Plugin::TxRep::learner_close at line 1825
sub finish {
1647###########################################################################
164812µs my $self = shift;
1649
165013µs return unless (defined $self->{checker}); # no factory defined; we can't check
1651
1652150µs if ($self->{conf}->{txrep_user2global_ratio} && defined $self->{user_storage} && ($self->{user_storage} != $self->{global_storage})) {
1653 $self->{user_storage}->finish();
1654 $self->{global_storage}->finish();
1655 $self->{user_storage} = undef;
1656 $self->{global_storage} = undef;
1657 } elsif (defined $self->{default_storage}) {
1658111µs15.67s $self->{default_storage}->finish();
# spent 5.67s making 1 call to Mail::SpamAssassin::DBBasedAddrList::finish
165918µs $self->{default_storage} = $self->{checker} = undef;
1660 }
1661114µs $self->{factory} = undef;
1662}
1663
1664
1665###########################################################################
1666
# spent 40.9ms (32.4+8.54) within Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key which was called 642 times, avg 64µs/call: # 642 times (32.4ms+8.54ms) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1721, avg 64µs/call
sub ip_to_awl_key {
1667###########################################################################
16686422.63ms my ($self, $origip) = @_;
1669
16706421.15ms my $result;
16716423.36ms local $1;
167264216.3ms6428.54ms if (!defined $origip) {
# spent 8.54ms making 642 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 13µs/call
1673 # could not find an IP address to use
1674 } elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) {
16756421.88ms my $mask_len = $self->{ipv4_mask_len};
16766421.78ms $mask_len = 16 if !defined $mask_len;
1677 # handle the default and easy cases manually
16786422.97ms if ($mask_len == 32) {$result = $origip;}
16796422.50ms elsif ($mask_len == 16) {$result = $1;}
1680 else {
1681 my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
1682 if (!defined $origip_obj) { # invalid IPv4 address
1683 dbg("TxRep: bad IPv4 address $origip");
1684 } else {
1685 $result = $origip_obj->network->addr;
1686 $result =~s/(\.0){1,3}\z//; # truncate zero tail
1687 }
1688 }
1689 } elsif ($origip =~ /:/ && # triage
1690 $origip =~
1691 /^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
1692 # looks like an IPv6 address
1693 my $mask_len = $self->{ipv6_mask_len};
1694 $mask_len = 48 if !defined $mask_len;
1695 my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
1696 if (!defined $origip_obj) { # invalid IPv6 address
1697 dbg("TxRep: bad IPv6 address $origip");
1698 } else {
1699 $result = $origip_obj->network->full6; # string in a canonical form
1700 $result =~ s/(:0000){1,7}\z/::/; # compress zero tail
1701 }
1702 } else {
1703 dbg("TxRep: bad IP address $origip");
1704 }
17056422.69ms if (defined $result && length($result) > 39) { # just in case, keep under
1706 $result = substr($result,0,39); # the awl.ip field size
1707 }
1708# if (defined $result) {dbg("TxRep: IP masking %s -> %s", $origip || '?', $result || '?');}
17096427.70ms return $result;
1710}
1711
1712
1713###########################################################################
1714
# spent 142ms (84.8+57.3) within Mail::SpamAssassin::Plugin::TxRep::pack_addr which was called 2208 times, avg 64µs/call: # 2207 times (84.8ms+57.3ms) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1475, avg 64µs/call # once (50µs+11µs) by Mail::SpamAssassin::Plugin::TxRep::modify_reputation at line 1544
sub pack_addr {
1715###########################################################################
171622089.12ms my ($self, $addr, $origip) = @_;
1717
171822086.96ms $addr = lc $addr;
1719220834.8ms220816.4ms $addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia
# spent 16.4ms making 2208 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call
1720
1721285011.4ms64240.9ms if ( defined $origip) {$origip = $self->ip_to_awl_key($origip);}
# spent 40.9ms making 642 calls to Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key, avg 64µs/call
1722377412.3ms if (!defined $origip) {$origip = 'none';}
1723220831.7ms return $addr . "|ip=" . $origip;
1724}
1725
- -
1728# -------------------------------------------------------------------------
1729=head1 LEARNING SPAM / HAM
1730
1731When SpamAssassin is told to learn (or relearn) a given message as spam or
1732ham, all reputations relevant to the message (email, email_ip, domain, ip, helo)
1733in both the global and the user storage are updated using the C<txrep_learn_penalty>
1734or the C<txrep_learn_bonus> value, respectively. The new reputation of a given sender
1735property (email, domain, ...) is the result of the corresponding one of the following
1736formulas:
1737
1738 new_reputation = old_reputation + learn_penalty
1739 new_reputation = old_reputation - learn_bonus
1740
1741The TxRep plugin currently does track each message individually, hence it
1742does not detect when you learn the message repeatedly. It will add/subtract
1743the penalty/bonus score each time the message is fed to the spam learner.
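
The two formulas can be sketched in a few lines of Perl. This is only an illustration, not code from the plugin: C<apply_learning> is a hypothetical helper, and the starting reputation and the penalty/bonus values are assumed for the example rather than taken from any configuration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep): applies one learning step to a
# stored reputation value, following the two formulas above.
sub apply_learning {
    my ($old_reputation, $is_spam, $learn_penalty, $learn_bonus) = @_;

    # Learning as spam adds the penalty; learning as ham subtracts the bonus.
    my $delta = $is_spam ? $learn_penalty : -1 * $learn_bonus;
    return $old_reputation + $delta;
}

# All numbers below are assumed values chosen for illustration only.
my $reputation = 3;

$reputation = apply_learning($reputation, 1, 20, 20);
print "after learning as spam: $reputation\n";

$reputation = apply_learning($reputation, 0, 20, 20);
print "after learning as ham:  $reputation\n";
```

In the plugin itself the sign is chosen in C<learn_message>, which stores either the penalty or the negated bonus in C<< $self->{learning} >> before the sender reputations are updated.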
1744
1745=cut
1746######################################################### plugin hook #####
1747
# spent 15µs within Mail::SpamAssassin::Plugin::TxRep::learner_new which was called: # once (15µs+0s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_new {
1748###########################################################################
174912µs my ($self) = @_;
1750
175116µs $self->{txKeepStoreTied} = 1;
1752110µs return $self;
1753}
1754
1755
1756######################################################### plugin hook #####
1757sub autolearn {
1758###########################################################################
1759 my ($self, $params) = @_;
1760
1761 $self->{last_pms} = $params->{permsgstatus};
1762 return $self->{autolearn} = 1;
1763}
1764
1765
1766######################################################### plugin hook #####
1767
# spent 665s (30.1ms+665) within Mail::SpamAssassin::Plugin::TxRep::learn_message which was called 234 times, avg 2.84s/call: # 234 times (30.1ms+665s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm, avg 2.84s/call
sub learn_message {
1768###########################################################################
1769234521µs my ($self, $params) = @_;
1770234692µs return 0 unless (defined $params->{isspam});
1771
17722341.61ms2341.54ms dbg("TxRep: learning a message");
# spent 1.54ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
17732343.39ms23468.8ms my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
# spent 68.8ms making 234 calls to Mail::SpamAssassin::PerMsgStatus::new, avg 294µs/call
17742341.59ms if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) {
17752342.31ms23444.9s $pms->extract_message_metadata();
# spent 44.9s making 234 calls to Mail::SpamAssassin::PerMsgStatus::extract_message_metadata, avg 192ms/call
1776 }
1777
17782341.41ms if ($params->{isspam})
17792341.38ms {$self->{learning} = $self->{conf}->{txrep_learn_penalty};}
1780 else {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};}
1781
17822342.93ms234620s my $ret = !$self->{learning} || $self->check_senders_reputation($pms);
# spent 620s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 2.65s/call
1783234678µs $self->{learning} = undef;
178423410.7ms792.90ms return $ret;
# spent 2.90ms making 79 calls to Mail::SpamAssassin::PerMsgStatus::DESTROY, avg 37µs/call
1785}
1786
1787
1788######################################################### plugin hook #####
1789
# spent 613s (4.74ms+613) within Mail::SpamAssassin::Plugin::TxRep::forget_message which was called 87 times, avg 7.05s/call: # 87 times (4.74ms+613s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1276, avg 7.05s/call
sub forget_message {
1790###########################################################################
179187325µs my ($self, $params) = @_;
179287290µs return 0 unless ($self->{conf}->{use_txrep});
179387304µs my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
1794
179587570µs87516µs dbg("TxRep: forgetting a message");
# spent 516µs making 87 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call
179687242µs $self->{forgetting} = 1;
179787829µs870s my $ret = $self->check_senders_reputation($pms);
# spent 613s making 87 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 7.05s/call, recursion: max depth 1, sum of overlapping time 613s
179887462µs $self->{forgetting} = undef;
179987846µs return $ret;
1800}
1801
1802
1803######################################################### plugin hook #####
1804sub learner_expire_old_training {
1805###########################################################################
1806 my ($self, $params) = @_;
1807 return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days});
1808
1809 dbg("TxRep: expiry not implemented yet");
1810# dbg("TxRep: expiry starting");
1811# my $timer = $self->{main}->time_method("expire_bayes");
1812# $self->{store}->expire_old_tokens($params);
1813# dbg("TxRep: expiry completed");
1814}
1815
1816
1817######################################################### plugin hook #####
1818
# spent 5.67s (50µs+5.67) within Mail::SpamAssassin::Plugin::TxRep::learner_close which was called: # once (50µs+5.67s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_close {
1819###########################################################################
182012µs my ($self, $params) = @_;
182113µs my $quiet = $params->{quiet};
182214µs return 0 unless ($self->{conf}->{use_txrep});
1823
182413µs $self->{txKeepStoreTied} = undef;
1825110µs15.67s $self->finish();
# spent 5.67s making 1 call to Mail::SpamAssassin::Plugin::TxRep::finish
1826130µs119µs dbg("TxRep: learner_close");
# spent 19µs making 1 call to Mail::SpamAssassin::Logger::dbg
1827}
1828
1829
1830# -------------------------------------------------------------------------
1831=head1 OPTIMIZING TXREP
1832
1833TxRep can be optimized for speed and simplicity, or for the precision in
1834assigning the reputation scores.
1835
1836First of all, TxRep can be quickly disabled and re-enabled through the option
1837L</C<use_txrep>>. This can be done globally, or individually in each respective
1838C<user_prefs>. Disabling TxRep will not destroy the database, so it can be
1839re-enabled at any later time.
1840
1841On many systems, SQL-based storage may perform faster than the default
1842Berkeley DB storage, so you should consider setting it up. See the section
1843L</SQL-BASED STORAGE> for instructions.
1844
1845Then there are multiple settings that can reduce the number of records stored
1846in the database, hence reducing the size of the storage, and also the processing
1847time:
1848
18491. Setting L</C<txrep_user2global_ratio>> to zero will disable the dual storage,
1850thus halving the disk space requirements and the processing time of this plugin.
1851
18522. You can disable all but one of the L</REPUTATION WEIGHTS>. The EMAIL_IP is
1853the most specific option, so it is the most likely choice in such a case, but you
1854could base the reputation system on any of the remaining scores. Each of the
1855enabled reputations adds a new entry to the database for each new identifier.
1856So while, for example, the number of recorded and scored domains may be big, the
1857number of stored IP addresses will probably be higher, and would require more
1858space in the storage.
1859
18603. Disabling L</C<txrep_track_messages>> avoids storing a separate entry
1861for every scanned message, hence also reducing the disk space requirements and
1862the processing time.
1863
18644. Disabling the option L</C<txrep_autolearn>> will save processing time
1865for messages that trigger the auto-learning process.
1866
18675. Disabling L</C<txrep_whitelist_out>> will reduce the processing time for
1868outbound connections.
1869
18706. Keeping the option L</C<auto_whitelist_distinguish_signed>> enabled may
1871slightly reduce the size of the database, because for signed messages the
1872originating IP address is ignored, hence no additional database entries are
1873needed for each separate IP address (or masked block of IP addresses).
1874
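
Taken together, the settings from the list above could be combined in C<local.cf> or in a C<user_prefs> file roughly as follows. This is a sketch tuned for speed rather than precision; the values are illustrative assumptions, and the reputation weights from item 2 are left out.

```
# Sketch only: trades scoring precision for speed and a smaller database
use_txrep                          1
txrep_user2global_ratio            0   # single storage instead of user + global
txrep_track_messages               0   # no separate entry per scanned message
txrep_autolearn                    0   # no TxRep work on auto-learned messages
txrep_whitelist_out                0   # no extra processing of outbound mail
auto_whitelist_distinguish_signed  1   # signed mail is stored without the IP
```

Whether each trade-off is acceptable depends on the site; re-enabling an option later does not require rebuilding the database.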
1875
1876Since TxRep reuses the storage architecture of the former AWL plugin, the
1877same instructions for initializing the SQL storage also apply to TxRep.
1878Although the old AWL table can be reused for TxRep, by default TxRep expects
1879the SQL table to be named "txrep".
1880
1881To install a new SQL table for TxRep, run the appropriate SQL file for your
1882system under the /sql directory.
1883
1884If you get a syntax error with an older version of MySQL, use TYPE=MyISAM
1885instead of ENGINE=MyISAM at the end of the command. You can also use other
1886types of ENGINE (depending on what is available on your system). For example,
1887the MEMORY engine stores the entire table in the server memory, achieving
1888performance similar to Redis. You would need to take care of replicating
1889the RAM table to disk through a cronjob, to avoid loss of data at reboot.
1890The InnoDB engine is used by default, offering high scalability (database
1891size and concurrency of accesses). In conjunction with a high value of
1892innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+) it can also
1893offer performance comparable to Redis.
1894
1895=cut
1896
1897112µs1;
 
# spent 45.6ms within Mail::SpamAssassin::Plugin::TxRep::CORE:match which was called 9342 times, avg 5µs/call: # 4414 times (20.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1480, avg 5µs/call # 3964 times (15.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 4µs/call # 642 times (8.54ms+0s) by Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key at line 1672, avg 13µs/call # 321 times (1.57ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1245, avg 5µs/call # once (5µs+0s) by Mail::SpamAssassin::Plugin::TxRep::open_storages at line 1576
sub Mail::SpamAssassin::Plugin::TxRep::CORE:match; # opcode
# spent 54.0ms within Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp which was called 3964 times, avg 14µs/call: # 3964 times (54.0ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 14µs/call
sub Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp; # opcode
# spent 18.7ms within Mail::SpamAssassin::Plugin::TxRep::CORE:subst which was called 2529 times, avg 7µs/call: # 2208 times (16.4ms+0s) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1719, avg 7µs/call # 321 times (2.33ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1247, avg 7µs/call
sub Mail::SpamAssassin::Plugin::TxRep::CORE:subst; # opcode