NYTProf Performance Profile   « line view »
For /usr/local/bin/sa-learn
  Run on Sun Nov 5 03:09:29 2017
Reported on Mon Nov 6 13:20:48 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm
Statements: Executed 266199 statements in 12.1s
Subroutines
Calls  P   F  Exclusive Time  Inclusive Time  Subroutine
 3216   1  1  1.29s           1759s           Mail::SpamAssassin::Plugin::TxRep::check_reputation
  468   2  1  471ms           96340s          Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation (recurses: max depth 1, inclusive time 49211s)
 3216   1  1  335ms           1.44s           Mail::SpamAssassin::Plugin::TxRep::get_sender
 3216   1  1  300ms           94571s          Mail::SpamAssassin::Plugin::TxRep::open_storages
 3216   7  1  197ms           96330s          Mail::SpamAssassin::Plugin::TxRep::check_reputations
 3218   2  1  147ms           252ms           Mail::SpamAssassin::Plugin::TxRep::pack_addr
12732  10  1  133ms           133ms           Mail::SpamAssassin::Plugin::TxRep::count
 6206   1  1  87.6ms          87.6ms          Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp (opcode)
 1390   2  1  82.6ms          262ms           Mail::SpamAssassin::Plugin::TxRep::add_score
14043   5  1  78.1ms          78.1ms          Mail::SpamAssassin::Plugin::TxRep::CORE:match (opcode)
 1374   1  1  59.1ms          209ms           Mail::SpamAssassin::Plugin::TxRep::remove_score
  936   1  1  55.2ms          73.1ms          Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key
 4896   3  1  51.0ms          51.0ms          Mail::SpamAssassin::Plugin::TxRep::total
 3686   2  1  34.7ms          34.7ms          Mail::SpamAssassin::Plugin::TxRep::CORE:subst (opcode)
  234   1  1  34.1ms          96387s          Mail::SpamAssassin::Plugin::TxRep::learn_message
    1   1  1  18.9ms          26.5ms          Mail::SpamAssassin::Plugin::TxRep::BEGIN@209
  234   1  1  12.5ms          49211s          Mail::SpamAssassin::Plugin::TxRep::forget_message
    2   1  1  251µs           6.36s           Mail::SpamAssassin::Plugin::TxRep::modify_reputation
    1   1  1  198µs           848µs           Mail::SpamAssassin::Plugin::TxRep::set_config
    1   1  1  102µs           1.02ms          Mail::SpamAssassin::Plugin::TxRep::new
    1   1  1  55µs            67µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@202
    1   1  1  55µs            515µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@207
    1   1  1  49µs            1.12ms          Mail::SpamAssassin::Plugin::TxRep::learner_close
    1   1  1  45µs            1.06ms          Mail::SpamAssassin::Plugin::TxRep::finish
    1   1  1  26µs            162µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@210
    1   1  1  23µs            152µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@211
    1   1  1  22µs            102µs           Mail::SpamAssassin::Plugin::TxRep::BEGIN@213
    1   1  1  22µs            22µs            Mail::SpamAssassin::Plugin::TxRep::__ANON__[:495]
    1   1  1  22µs            88µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@205
    1   1  1  21µs            62µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@203
    1   1  1  19µs            19µs            Mail::SpamAssassin::Plugin::TxRep::learner_new
    1   1  1  14µs            14µs            Mail::SpamAssassin::Plugin::TxRep::BEGIN@208
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:306]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:350]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:375]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:398]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:421]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:446]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:527]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:560]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:642]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:763]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:797]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:836]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:862]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:893]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:945]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::__ANON__[:998]
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::_fail_exit
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::_fn_envelope
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::_message
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::autolearn
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::blacklist_address
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::learner_expire_old_training
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::remove_address
    0   0  0  0s              0s              Mail::SpamAssassin::Plugin::TxRep::whitelist_address
Call graph for these subroutines as a Graphviz dot language file.
Line  Statements  Time on line  Calls  Time in subs  Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18
19=head1 NAME
20
21Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records
22
23
24=head1 SYNOPSIS
25
26The TxRep (Reputation) plugin is designed as an improved replacement for the AWL
27(Auto-Whitelist) plugin. It adjusts the final message spam score by looking up
28and taking into consideration the reputation of the sender.
29
30To try TxRep out, you B<have to> first disable the AWL plugin (if enabled), and
31back up its database. AWL is loaded in v310.pre and can be disabled by
32commenting out the loadplugin line:
33
34 # loadplugin Mail::SpamAssassin::Plugin::AWL
35
36When AWL is not disabled, TxRep will refuse to run.
37
38TxRep should be enabled by uncommenting the following line in v341.pre:
39
40 loadplugin Mail::SpamAssassin::Plugin::TxRep
41
42Use the supplied 60_txreputation.cf file or add these lines to a .cf file:
43
44 header TXREP eval:check_senders_reputation()
45 describe TXREP Score normalizing based on sender's reputation
46 tflags TXREP userconf noautolearn
47 priority TXREP 1000
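
Since C<use_txrep> defaults to 0 according to the option list further down, loading
the plugin alone presumably does not activate it. A minimal configuration sketch,
assuming a site-wide C<local.cf>; every value except C<use_txrep> is simply the
documented default, shown only to make the tunable options visible:

```
# local.cf (site-wide) or user_prefs (per user)
use_txrep              1      # default: 0
txrep_factor           0.5    # range [0..1], default: 0.5
txrep_dilution_factor  0.98   # range [0.7..1.0], default: 0.98
txrep_learn_penalty    20     # range [0..200], default: 20
txrep_learn_bonus      20     # range [0..200], default: 20
txrep_track_messages   1      # default: 1
```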
48
49
50=head1 DESCRIPTION
51
52This plugin is intended to replace the former AWL - AutoWhiteList. Although the
53concept and the scope differ, the purpose remains the same - normalizing spam
54score results based on the sender's previous history. The name was intentionally changed
55from "whitelist" to "reputation" to avoid any confusion, since the resulting score can
56be adjusted in both directions.
57
58The TxRep plugin keeps track of the average SpamAssassin score for senders.
59Senders are tracked using multiple identifiers, or their combinations: the From:
60email address, the originating IP and/or an originating block of IPs, sender's domain
61name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce
62the variability in scoring from message to message, and modifies the final score by
63pushing the result towards the historical average. This improves the accuracy of
64filtering for most email.
65
66In comparison with the original AWL plugin, several conceptual changes were implemented
67in TxRep:
68
691. B<Scoring> - although AWL tracks the number of messages received from each
70sender, it does not take that count into account in any way when calculating the
71corrective score for a new message. For example, if a sender previously sent a single
72ham message with the score of -5, and then sends a second one with the score of +10,
73AWL will issue a corrective score bringing the score towards the -5. With the default
74C<auto_whitelist_factor> of 0.5, the resulting score would be only 2.5. And it would be
75exactly the same even if the sender previously sent 1,000 messages with the average of
76-5. TxRep tries to take maximal advantage of the collected data, and adjusts the
77final score not only with the mean reputation score stored in the database, but also
78with respect to the number of messages already seen from the sender. You can see the exact
79formula in the section L</C<txrep_factor>>.
80
812. B<Learning> - AWL ignores any spam/ham learning. In fact it acts against it, which
82often leads to a frustrating situation, where a user repeatedly tags all messages of a
83given sender as spam (resp. ham), but at any new message from the sender, AWL will
84adjust the score of the message back to the historical average which does B<not> include
85the learned scores. TxRep changes this: every spam/ham learning is
86recorded in the reputation database, and hence taken into consideration for future email
87from the respective sender. See the section L</"LEARNING SPAM / HAM"> for more details.
88
893. B<Auto-Learning> - in certain situations SpamAssassin may declare a message an
90obvious spam resp. ham, and launch the auto-learning process, so that the message can be
91re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin
92will readjust the stored reputation by the value defined by L</C<txrep_learn_penalty>>
93resp. L</C<txrep_learn_bonus>>. Auto-learning score thresholds may be tuned, or the
94auto-learning completely disabled, through the setting L</C<txrep_autolearn>>.
95
964. B<Relearning> - messages that were wrongly learned or auto-learned, can be relearned.
97Old reputations are removed from the database, and new ones added instead of them. The
98relearning works better when message tracking is enabled through the
99L</C<txrep_track_messages>> option. Without it, the relearned score is simply added to
100the reputation, without removing the old ones.
101
1025. B<Aging> - with AWL, any historical record of a given sender has the same weight. It
103means that changes in a sender's behavior, or modified SA rules, may take a long time to
104show, or be virtually negated by the AWL normalization, especially for senders with a high
105count of past messages and a low recent frequency. It also turns out to be particularly
106counterproductive when the administrator detects new patterns in certain messages, and
107applies new rules to better tag such messages as spam or ham. AWL will practically
108eliminate the effect of the new rules, by adjusting the score back towards the (wrong)
109historical average. Only setting the C<auto_whitelist_factor> lower would help, but at
110the same time it would also reduce the overall impact of AWL, and cast doubt on its
111purpose. TxRep, besides the L</C<txrep_factor>> (replacement of the C<auto_whitelist_factor>),
112also introduces the L</C<txrep_dilution_factor>> to help cope with this issue by
113progressively reducing the impact of past records. More details can be found in the
114description of the factor below.
115
1166. B<Blacklisting and Whitelisting> - when whitelisting or blacklisting is requested
117through SpamAssassin's API, AWL adjusts the historical total score of the plain email
118address without IP (and deletes records bound to an IP), but since new records with IP
119will be added during reception, the blacklisted entry would cease to act during
120scanning. TxRep always uses the record of the plain email address without IP together
121with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight
122factor for the EMAIL reputation is set to zero). AWL uses the score of 100 (resp. -100)
123for the blacklisting (resp. whitelisting) purposes. TxRep increases the value
124proportionally to the weight factor of the EMAIL reputation. It is explained in detail
125in the section L</BLACKLISTING / WHITELISTING>. TxRep can also blacklist or whitelist
126IP addresses, domain names, and dotless HELO names.
127
1287. B<Sender Identification> - AWL identifies a sender on the basis of the email address
129used, and the originating IP address (more precisely, the part of it defined by the mask setting).
130The main purpose of this measure is to avoid assigning false good scores to spammers who
131spoof known email addresses. The disadvantage appears with senders who send from frequently
132changing locations, or who connect through dynamic IP addresses that are not
133within the block defined by the mask setting. Their score is difficult or sometimes
134impossible to track. Another disadvantage arises, for example, with a spammer persistently
135sending spam from the same IP address, just under different email addresses. AWL will not
136find his previous scores, unless he reuses the same email address again. TxRep uses several
137identifiers, and creates separate database entries for each of them. It tracks not only
138the email/IP address combination like AWL, but also the standalone email address (regardless
139of the originating IP), the standalone IP (regardless of email address used), the domain
140name of the email address, the DKIM signature, and the HELO name of the connecting PC. The
141influence of each individual identifier may be tuned with the help of weight factors
142described in the section L</REPUTATION WEIGHTS>.
143
1448. B<Message Tracking> - TxRep (optionally) keeps track of already scanned and/or learned
145message IDs. This avoids strengthening the reputation score by simply
146rescanning or relearning the same message multiple times. At the same time it also allows
147the proper relearning of once wrongly learned messages, or relearning them after the
148learn penalty or bonus were changed. See the option L</C<txrep_track_messages>>.
149
1509. B<User and Global Storages> - usually it is recommended to use the per-user setup
151of SpamAssassin, because each user may have quite different requirements, and may receive
152quite different sorts of email. Especially when using the Bayesian and AWL plugins,
153the efficiency is much better when SpamAssassin is trained on spam and ham separately
154for each user. However, the disadvantage is that senders and emails already learned
155many times by different users will need to be relearned without any recognized history
156whenever they arrive for another user. TxRep uses the advantages of both systems. It can
157use dual storages: the global common storage, where all email processed by SpamAssassin
158is recorded, and a local storage separate for each user, with reputation data from his
159email only. See more details at the setting L</C<txrep_user2global_ratio>>.
160
16110. B<Outbound Whitelisting> - when a local user sends messages to an email address, we
162assume that he needs to see the eventual answer too, hence the recipient's address should
163be whitelisted. When SpamAssassin is used for scanning outgoing email too, when local
164users use the SMTP server where SA is installed, for sending email, and when internal
165networks are defined, TxRep will improve the reputation of all 'To:' and 'CC' addresses
166from messages originating in the internal networks. Details can be found at the setting
167L</C<txrep_whitelist_out>>.
168
169The two plugins (AWL and TxRep) cannot coexist. It is necessary to disable AWL to allow
170TxRep to run. TxRep reuses the database handling of the original AWL module, and some
171of its parameters bound to the database handler modules. By default, TxRep creates its own
172database, but the original auto-whitelist can be reused as a starting point. The AWL
173database can be renamed to the name defined in TxRep settings, and TxRep will start
174using it. The original auto-whitelist database has to be backed up, to allow switching
175back to the original state.
176
177The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and
178spamassassin/AutoWhitelist.pm. Another two AWL files, spamassassin/DBBasedAddrList.pm
179and spamassassin/SQLBasedAddrList.pm are still needed.
180
181
182=head1 TEMPLATE TAGS
183
184This plugin module adds the following C<tags> that can be used as
185placeholders in certain options. See L<Mail::SpamAssassin::Conf>
186for more information on TEMPLATE TAGS.
187
188 _TXREP_XXX_Y_ TXREP modifier
189 _TXREP_XXX_Y_MEAN_ Mean score on which TXREP modification is based
190 _TXREP_XXX_Y_COUNT_ Number of messages on which TXREP modification is based
191 _TXREP_XXX_Y_PRESCORE_ Score before TXREP
192 _TXREP_XXX_Y_UNKNOW_ New sender (not found in the TXREP list)
193
194The XXX part of the tag takes the form of one of the following IDs, depending
195on the reputation checked: EMAIL, EMAIL_IP, IP, DOMAIN, or HELO. The _Y appendix
196ID is used only in the case of dual storage, and takes the form of either _U (for
197user storage reputations), or _G (for global storage reputations).
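
The naming scheme above can be sketched as a small helper that assembles a tag
name from its parts. This is only an illustration: C<txrep_tag> is a hypothetical
function, not part of TxRep.pm, and it assumes that the C<_Y> part is simply left
out when no dual storage is in use, which is how the paragraph above reads.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TxRep.pm: builds a template tag name.
#   $id      - EMAIL, EMAIL_IP, IP, DOMAIN or HELO
#   $storage - 'U' or 'G' with dual storage, undef otherwise (assumption)
#   $field   - MEAN, COUNT, PRESCORE, UNKNOW, or undef for the plain modifier
sub txrep_tag {
    my ($id, $storage, $field) = @_;
    my $name = "TXREP_$id";
    $name .= "_$storage" if defined $storage && length $storage;
    $name .= "_$field"   if defined $field   && length $field;
    return "_${name}_";
}

print txrep_tag('EMAIL_IP', 'G', 'MEAN'), "\n";   # _TXREP_EMAIL_IP_G_MEAN_
print txrep_tag('HELO', undef, 'COUNT'), "\n";    # _TXREP_HELO_COUNT_
```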
198
199=cut # ....................................................................
200package Mail::SpamAssassin::Plugin::TxRep;
201
# spent 67µs (55+12) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@202 which was called:
#    once (55µs+12µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 202
202  2  64µs  2  79µs  use strict;
# spent 67µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@202
# spent 12µs making 1 call to strict::import
# spent 62µs (21+41) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 which was called:
#    once (21µs+41µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 203
203  2  63µs  2  103µs  use warnings;
# spent 62µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@203
# spent 41µs making 1 call to warnings::import
204# use bytes;
# spent 88µs (22+67) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 which was called:
#    once (22µs+67µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 205
205  2  68µs  2  155µs  use re 'taint';
# spent 88µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@205
# spent 67µs making 1 call to re::import
206
# spent 515µs (55+460) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 which was called:
#    once (55µs+460µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 207
207  3  129µs  3  975µs  use NetAddr::IP 4.000; # qw(:upper);
# spent 515µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@207
# spent 430µs making 1 call to NetAddr::IP::import
# spent 30µs making 1 call to version::_VERSION
# spent 14µs within Mail::SpamAssassin::Plugin::TxRep::BEGIN@208 which was called:
#    once (14µs+0s) by Mail::SpamAssassin::PluginHandler::load_plugin at line 208
208  2  52µs  1  14µs  use Mail::SpamAssassin::Plugin;
# spent 14µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@208
# spent 26.5ms (18.9+7.59) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 which was called:
#    once (18.9ms+7.59ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 209
209  2  404µs  1  26.5ms  use Mail::SpamAssassin::Plugin::Bayes;
# spent 26.5ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@209
# spent 162µs (26+136) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@210 which was called:
#    once (26µs+136µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 210
210  2  70µs  2  298µs  use Mail::SpamAssassin::Util qw(untaint_var);
# spent 162µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@210
# spent 136µs making 1 call to Exporter::import
# spent 152µs (23+128) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@211 which was called:
#    once (23µs+128µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 211
211  2  65µs  2  280µs  use Mail::SpamAssassin::Logger;
# spent 152µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@211
# spent 128µs making 1 call to Exporter::import
212
# spent 102µs (22+80) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@213 which was called:
#    once (22µs+80µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 213
213  2  11.9ms  2  182µs  use vars qw(@ISA);
# spent 102µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@213
# spent 80µs making 1 call to vars::import
214  1  18µs  @ISA = qw(Mail::SpamAssassin::Plugin);
215
216
217###########################################################################
# spent 1.02ms (102µs+916µs) within Mail::SpamAssassin::Plugin::TxRep::new which was called:
#    once (102µs+916µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 1 of (eval 42)[Mail/SpamAssassin/PluginHandler.pm:129]
218 sub new { # constructor: register the eval rule
219###########################################################################
220  1  3µs  my ($class, $main) = @_;
221
222  1  2µs  $class = ref($class) || $class;
223  1  13µs  1  25µs  my $self = $class->SUPER::new($main);
# spent 25µs making 1 call to Mail::SpamAssassin::Plugin::new
224  1  2µs  bless($self, $class);
225
226  1  13µs  $self->{main} = $main;
227  1  3µs  $self->{conf} = $main->{conf};
228  1  3µs  $self->{factor} = $main->{conf}->{txrep_factor};
229  1  3µs  $self->{ipv4_mask_len} = $main->{conf}->{txrep_ipv4_mask_len};
230  1  2µs  $self->{ipv6_mask_len} = $main->{conf}->{txrep_ipv6_mask_len};
231  1  11µs  1  32µs  $self->register_eval_rule("check_senders_reputation");
# spent 32µs making 1 call to Mail::SpamAssassin::Plugin::register_eval_rule
232  1  8µs  1  848µs  $self->set_config($main->{conf});
# spent 848µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::set_config
233
234 # only the default conf loaded here, do nothing here requiring
235 # the runtime settings
236  1  8µs  1  10µs  dbg("TxRep: new object created");
# spent 10µs making 1 call to Mail::SpamAssassin::Logger::dbg
237  1  9µs  return $self;
238}
239
240
241###########################################################################
# spent 848µs (198+650) within Mail::SpamAssassin::Plugin::TxRep::set_config which was called:
#    once (198µs+650µs) by Mail::SpamAssassin::Plugin::TxRep::new at line 232
242 sub set_config {
243###########################################################################
244  1  2µs  my($self, $conf) = @_;
245  1  2µs  my @cmds;
246
247# -------------------------------------------------------------------------
248=head1 USER PREFERENCES
249
250The following options can be used in both site-wide (C<local.cf>) and
251user-specific (C<user_prefs>) configuration files to customize how
252SpamAssassin handles incoming email messages.
253
254=over 4
255
256=item B<use_txrep>
257
258 0 | 1 (default: 0)
259
260Whether to use the TxRep reputation system. TxRep tracks the long-term average
261score for each sender and then shifts the score of new messages toward that
262long-term average. This can increase or decrease the score for messages,
263depending on the long-term behavior of the particular correspondent.
264
265Note that certain tests are ignored when determining the final message score:
266
267 - rules with tflags set to 'noautolearn'
268
269=cut # ...................................................................
270  1  8µs  push (@cmds, {
271 setting => 'use_txrep',
272 default => 0,
273 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
274 });
275
276
277# -------------------------------------------------------------------------
278=item B<txrep_factor>
279
280 range [0..1] (default: 0.5)
281
282How much towards the long-term mean for the sender to regress a message.
283Basically, the algorithm is to track the long-term total score and the count
284of messages for the sender (C<total> and C<count>), and then once we have
285otherwise fully calculated the score for this message (C<score>), we calculate
286the final score for the message as:
287
288 finalscore = score + factor * ((total + score)/(count + 1) - score)
289
290So if C<factor> = 0.5, then we'll move to half way between the calculated
291score and the new mean value. If C<factor> = 0.3, then we'll move about 1/3
292of the way from the score toward the mean. C<factor> = 1 means use the
293long-term mean including also the new unadjusted score; C<factor> = 0 means
294just use the calculated score, thus disabling the score averaging, though still
295recording the reputation to the database.
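
The behaviour described in this paragraph (C<factor> = 1 yields the updated mean,
C<factor> = 0 leaves the score untouched) can be sketched as moving the score a
fraction C<factor> of the way toward the mean C<(total + score)/(count + 1)>.
C<final_score> below is a hypothetical helper written only for this illustration;
the real plugin combines several reputations with weights, so actual scores will
differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TxRep.pm: moves $score toward the
# updated mean (total + score)/(count + 1) by the fraction $factor.
sub final_score {
    my ($score, $total, $count, $factor) = @_;
    my $new_mean = ($total + $score) / ($count + 1);
    return $score + $factor * ($new_mean - $score);
}

# New message scores +10; the sender's history averages -5.
printf "%.2f\n", final_score(10, -5,    1,    0.5);   # one earlier message:    6.25
printf "%.2f\n", final_score(10, -5000, 1000, 0.5);   # 1,000 earlier messages: 2.51
```

Unlike the AWL example in the DESCRIPTION, where the result is 2.5 regardless of
history length, the correction here grows with the number of messages already seen.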
296
297=cut # ...................................................................
298 push (@cmds, {
299 setting => 'txrep_factor',
300 default => 0.5,
301 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
302 code => sub {
303 my ($self, $key, $value, $line) = @_;
304 if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
305 $self->{txrep_factor} = $value;
306 }
307  1  13µs  });
308
309
310# -------------------------------------------------------------------------
311=item B<txrep_dilution_factor>
312
313 range [0.7..1.0] (default: 0.98)
314
315With every new email from a given sender, the historical reputation records are "diluted",
316or "watered down" by a certain fraction given by this factor. It means that the
317influence of old records will progressively diminish with every new message from
318the given sender. This is important to allow a more flexible handling of changes in
319sender's behavior, or new improvements or changes of local SA rules.
320
321Without any dilution expiry (dilution factor set to 1), the new message score is
322simply added to the total score of the given sender in the reputation database. When
323dilution is used (factor < 1), the impact of the historical reputation average is
324reduced by the factor before calculating the new average, which in turn is then
325used to adjust the new total score to be stored in the database.
326
327 newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)
328
329In other words, it means that the older a message is, the less and less impact
330on the new average its original spam score has. For example if we set the factor
331to 0.9 (meaning dilution by 10%), the score of the new message will be recorded
332to its 100%, the last score of the same sender to 90%, the second last to 81%
333(0.9 * 0.9 = 0.81), and for example the 10th last message just to 35%.
334
335On stable systems, we recommend keeping the factor close to 1 (but still lower
336than 1). On systems where SA rules tuning and spam learning is still in progress,
337lower factors will help the reputation adapt more quickly to any modifications. At
338the same time, it will also reduce the impact of the historical reputation
339though.
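
As a rough check of the numbers above, the following sketch evaluates the documented
C<newtotal> formula and prints the percentages quoted for a factor of 0.9, which are
simply powers of the factor. C<diluted_total> is a hypothetical helper that mirrors
the formula as written in this section; it is not taken from the plugin code.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented formula:
#   newtotal = (oldcount + 1) * (newscore + dilution * oldtotal)
#              / (dilution * oldcount + 1)
sub diluted_total {
    my ($old_total, $old_count, $new_score, $dilution) = @_;
    return ($old_count + 1) * ($new_score + $dilution * $old_total)
         / ($dilution * $old_count + 1);
}

# With dilution = 1 the new score is simply added to the old total.
print diluted_total(-5, 1, 10, 1), "\n";   # 5

# Weight quoted in the text for a message $_ steps back, at dilution 0.9.
printf "%2d back: %.0f%%\n", $_, 100 * 0.9 ** $_ for (0, 1, 2, 10);
```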
340
341=cut # ...................................................................
342 push (@cmds, {
343 setting => 'txrep_dilution_factor',
344 default => 0.98,
345 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
346 code => sub {
347 my ($self, $key, $value, $line) = @_;
348 if ($value < 0.7 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
349 $self->{txrep_dilution_factor} = $value;
350 }
351  1  8µs  });
352
353
354# TODO, not implemented yet, hence no advertising until then
355# -------------------------------------------------------------------------
356#=item B<txrep_expiry_days>
357#
358# range [0..10000] (default: 365)
359#
360#The scores of messages can be removed from the total reputation, and the
361#message tracking entry removed from the database after given number of days.
362#It helps keeping the database growth under control, and it also reduces the
363#influence of old scores on the current reputation (both scoring methods, and
364#sender's behavior might have changed over time).
365#
366#=cut # ...................................................................
367 push (@cmds, {
368 setting => 'txrep_expiry_days',
369 default => 365,
370 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
371 code => sub {
372 my ($self, $key, $value, $line) = @_;
373 if ($value < 0 || $value > 10000) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
374 $self->{txrep_expiry_days} = $value;
375 }
376  1  7µs  });
377
378
379# -------------------------------------------------------------------------
380=item B<txrep_learn_penalty>
381
382 range [0..200] (default: 20)
383
384When SpamAssassin is trained on a SPAM message, the given penalty score will
385be added to the total reputation score of the sender, regardless of the real
386spam score. The more messages the sender already has in the TxRep database,
387the smaller the impact of the penalty will be.
388
389=cut # ...................................................................
390 push (@cmds, {
391 setting => 'txrep_learn_penalty',
392 default => 20,
393 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
394 code => sub {
395 my ($self, $key, $value, $line) = @_;
396 if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
397 $self->{txrep_learn_penalty} = $value;
398 }
399  1  8µs  });
400
401
402# -------------------------------------------------------------------------
403=item B<txrep_learn_bonus>
404
405 range [0..200] (default: 20)
406
407When SpamAssassin is trained on a HAM message, the given bonus score will be
408deducted from the total reputation score of the sender, regardless of the real
409spam score. The more messages the sender already has in the TxRep database,
410the smaller the impact of the bonus will be.
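
Why the learn penalty and bonus matter less for well-known senders can be seen with
a little arithmetic. The sketch below assumes that a learned message is counted as
one more entry, so that the mean becomes C<(total + adjustment)/(count + 1)>; the
text only states that the value is added to (or deducted from) the total, so the
count handling and the helper C<mean_after_learning> are assumptions made for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TxRep.pm: mean reputation after learning
# one message. Assumes the learned message increments the count by one.
sub mean_after_learning {
    my ($total, $count, $adjustment) = @_;
    return ($total + $adjustment) / ($count + 1);
}

# A sender averaging +2 so far, learned once as spam (+20) or as ham (-20).
for my $count (1, 10, 100) {
    my $total = 2 * $count;
    printf "count=%3d  after spam: %6.2f  after ham: %6.2f\n", $count,
        mean_after_learning($total, $count, +20),
        mean_after_learning($total, $count, -20);
}
```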
411
412=cut # ...................................................................
413 push (@cmds, {
414 setting => 'txrep_learn_bonus',
415 default => 20,
416 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
417 code => sub {
418 my ($self, $key, $value, $line) = @_;
419 if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
420 $self->{txrep_learn_bonus} = $value;
421 }
422  1  7µs  });
423
424
425# -------------------------------------------------------------------------
426=item B<txrep_autolearn>
427
428 range [0..5] (default: 0)
429
430When SpamAssassin declares a message a clear spam resp. ham during the message
431scan, and launches the auto-learn process, sender reputation scores of the given
432message will be adjusted by the value of the option L</C<txrep_learn_penalty>>,
433resp. the L</C<txrep_learn_bonus>> in the same way as during the manual learning.
434A value of 0 for this option disables the auto-learn reputation adjustment - only the
435score calculated before the auto-learn will be stored to the reputation database.
436
437=cut # ...................................................................
438 push (@cmds, {
439 setting => 'txrep_autolearn',
440 default => 0,
441 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
442 code => sub {
443 my ($self, $key, $value, $line) = @_;
444 if ($value < 0 || $value > 5) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
445 $self->{txrep_autolearn} = $value;
446 }
44718µs });
448
449
450# -------------------------------------------------------------------------
451=item B<txrep_track_messages>
452
453 0 | 1 (default: 1)
454
455Whether TxRep should keep track of already scanned and/or learned messages.
456When enabled, an additional record in the reputation database will be created
457to avoid false score adjustments due to repeated scanning of the same message,
458and to allow proper relearning of messages that were either previously wrongly
459learned, or need to be relearned after modifying the learn penalty or bonus.
460
461=cut # ...................................................................
46213µs push (@cmds, {
463 setting => 'txrep_track_messages',
464 default => 1,
465 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
466 });
467
468
469# -------------------------------------------------------------------------
470=item B<txrep_whitelist_out>
471
472 range [0..200] (default: 10)
473
474When the value of this setting is greater than zero, recipients of messages sent from
475within the internal networks will be whitelisted by improving their total reputation
476score by the number of points defined by this setting. Since the IP address and other
477sender identifiers are not known when sending the email, only the reputation of the
478standalone email address is whitelisted. The domain name is intentionally also left
479unaffected. Outbound whitelisting can only work when SpamAssassin is set up to also
480scan outgoing email, when local users use the SMTP server for sending email, and when
481C<internal_networks> are defined in the SpamAssassin configuration. The reputation is
482improved with every message sent from internal networks, so the more messages are
483sent to a recipient, the better the reputation of that email address will be.
484
485
486=cut # ...................................................................
487 push (@cmds, {
488 setting => 'txrep_whitelist_out',
489 default => 10,
490 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
491
# spent 22µs within Mail::SpamAssassin::Plugin::TxRep::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm:495] which was called: # once (22µs+0s) by Mail::SpamAssassin::Conf::Parser::parse at line 438 of Mail/SpamAssassin/Conf/Parser.pm
code => sub {
49216µs my ($self, $key, $value, $line) = @_;
49314µs if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
494116µs $self->{txrep_whitelist_out} = $value;
495 }
49618µs });
497
498
499# -------------------------------------------------------------------------
500=item B<txrep_ipv4_mask_len>
501
502 range [0..32] (default: 16)
503
504The AWL database keeps only the specified number of most-significant bits
505of an IPv4 address in its fields, so that different individual IP addresses
506within a subnet belonging to the same owner are managed under a single
507database record. As we have no information available on the allocated
508address ranges of senders, this CIDR mask length is only an approximation.
509The default is 16 bits, corresponding to a former class B. Increase the
510number if a finer granularity is desired, e.g. to 24 (class C) or 32.
511A value 0 is allowed but is not particularly useful, as it would treat the
512whole internet as a single organization. The number need not be a multiple
513of 8, any split is allowed.
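The effect of the mask length can be illustrated with a small standalone sketch. It is not the plugin's own key-building code: the helper name mask_ipv4 is hypothetical, and only the core Socket module is assumed.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: keep only the $len most-significant bits of an
# IPv4 address and zero the rest, as a CIDR mask of that length would.
sub mask_ipv4 {
    my ($ip, $len) = @_;
    my $packed = inet_aton($ip);
    return unless defined $packed && $len >= 0 && $len <= 32;
    # A shift by 32 is avoided explicitly for the mask length 0.
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    my $net  = unpack('N', $packed) & $mask;
    return inet_ntoa(pack('N', $net));
}

print mask_ipv4('192.0.2.77', 16), "\n";   # 192.0.0.0
print mask_ipv4('192.0.2.77', 24), "\n";   # 192.0.2.0
```

With the default length of 16, every address in 192.0.0.0/16 would share one record; with 24, only addresses in 192.0.2.0/24 would.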
514
515=cut # ...................................................................
516 push (@cmds, {
517 setting => 'txrep_ipv4_mask_len',
518 default => 16,
519 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
520 code => sub {
521 my ($self, $key, $value, $line) = @_;
522 if (!defined $value || $value eq '')
523 {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
524 elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32)
525 {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
526 $self->{txrep_ipv4_mask_len} = $value;
527 }
52817µs });
529
530
531# -------------------------------------------------------------------------
532=item B<txrep_ipv6_mask_len>
533
534 range [0..128] (default: 48)
535
536The AWL database keeps only the specified number of most-significant bits
537of an IPv6 address in its fields, so that different individual IP addresses
538within a subnet belonging to the same owner are managed under a single
539database record. As we have no information available on the allocated address
540ranges of senders, this CIDR mask length is only an approximation. The default
541is 48 bits, corresponding to an address range commonly allocated to individual
542(smaller) organizations. Increase the number for a finer granularity, e.g.
543to 64 or 96 or 128, or decrease for wider ranges, e.g. 32. A value 0 is
544allowed but is not particularly useful, as it would treat the whole internet
545as a single organization. The number need not be a multiple of 4, any split
546is allowed.
547
548=cut # ...................................................................
549 push (@cmds, {
550 setting => 'txrep_ipv6_mask_len',
551 default => 48,
552 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
553 code => sub {
554 my ($self, $key, $value, $line) = @_;
555 if (!defined $value || $value eq '')
556 {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
557 elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128)
558 {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
559 $self->{txrep_ipv6_mask_len} = $value;
560 }
56117µs });
562
563
564# -------------------------------------------------------------------------
565=item B<user_awl_sql_override_username>
566
567 string (default: undefined)
568
569Used by the SQLBasedAddrList storage implementation.
570
571If this option is set the SQLBasedAddrList module will override the set
572username with the value given. This can be useful for implementing global
573or group based TxRep databases.
574
575=cut # ...................................................................
57613µs push (@cmds, {
577 setting => 'user_awl_sql_override_username',
578 default => '',
579 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
580 });
581
582
583# -------------------------------------------------------------------------
584=item B<txrep_user2global_ratio>
585
586 range [0..10] (default: 0)
587
588When the option txrep_user2global_ratio is set to a value greater than zero, and
589if the server configuration allows it, two data storages will be used - user and
590global (server-wide) storages.
591
592User storage keeps only senders who send messages to the respective recipient,
593and will reflect also the corrected/learned scores, when some messages are marked
594by the user as spam or ham, or when the sender is whitelisted or blacklisted
595through the API of SpamAssassin.
596
597Global storage keeps the reputation data of all messages processed by SpamAssassin
598with their spam scores and spam/ham learning data from all users on the server.
599Hence, the module will return a reputation value even for senders not known to the
600current recipient, as long as they have already sent email to anyone else on the server.
601
602The value of the txrep_user2global_ratio parameter controls the impact of each
603of the two reputations. When equal to 1, both the global and the user score will
604have the same impact on the result. When set to 2, the reputation taken from
605the user storage will have twice the impact of the global value. The final value
606of the TXREP tag will be calculated as follows:
607
608 total = ( ratio * user + global ) / ( ratio + 1 )
609
610When no reputation is found in the user storage, and a global reputation is
611available, the global storage is used fully, without applying the ratio.
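The combination rule above can be sketched in a few lines of Perl. The function name combine_reputation is hypothetical, and the handling of a missing global value is an assumption made only to keep the sketch total; the text above specifies only the missing-user case.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented formula:
#   total = ( ratio * user + global ) / ( ratio + 1 )
sub combine_reputation {
    my ($ratio, $user, $global) = @_;
    # Documented: with no user reputation, the global one is used fully.
    return $global unless defined $user;
    # Assumption (not stated in the text): fall back to the user value
    # when no global reputation exists.
    return $user unless defined $global;
    return ($ratio * $user + $global) / ($ratio + 1);
}

print combine_reputation(1, 4, 2), "\n";       # 3
print combine_reputation(2, 3, -6), "\n";      # 0
print combine_reputation(2, undef, -6), "\n";  # -6
```

With a ratio of 2, a user reputation of 3 exactly cancels a global reputation of -6, which shows the doubled impact of the user storage.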
612
613When the ratio is set to zero, only the default storage will be used. Whether
614that is the global or the local user storage by default is in turn controlled
615either by the parameter user_awl_sql_override_username (in case of SQL
616storage), or by the C<auto_whitelist_path> parameter (in case of a Berkeley
617database).
618
619When this dual storage is enabled, and no global storage is defined by the
620above-mentioned parameters for the Berkeley or SQL databases, TxRep will attempt
621to use a generic storage: user 'GLOBAL' in case of SQL, and in the case of a
622Berkeley database the path defined by '__local_state_dir__/tx-reputation',
623which typically resolves to /var/db/spamassassin/tx-reputation. When the default
624storages are not available, or are not writable, you have to set the global
625storage with the help of the C<user_awl_sql_override_username> or
626C<auto_whitelist_path> settings, respectively.
627
628Please note that some SpamAssassin installations always run under the same user
629ID. In that case it is pointless to enable the dual storage, because it would
630at most lead to two identical global storages in different locations.
631
632This feature is disabled by default.
633=cut # ...................................................................
634 push (@cmds, {
635 setting => 'txrep_user2global_ratio',
636 default => 0,
637 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
638 code => sub {
639 my ($self, $key, $value, $line) = @_;
640 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
641 $self->{txrep_user2global_ratio} = $value;
642 }
64317µs });
644
645
646# -------------------------------------------------------------------------
647=item B<auto_whitelist_distinguish_signed>
648
649 (default: 1 - enabled)
650
651Used by the SQLBasedAddrList storage implementation.
652
653If this option is set the SQLBasedAddrList module will keep separate
654database entries for DKIM-validated e-mail addresses and for non-validated
655ones. Without this option, or for domains that do not use a DKIM signature,
656the reputation of legitimate email can get mixed with the reputation of
657forgeries. A pre-requisite when setting this option is that a field
658txrep.signedby exists in a SQL table, otherwise SQL operations will fail.
659A DKIM plugin must also be enabled in order for this option to take effect.
660This option is highly recommended. Unless you are using a pre-3.3.0 database
661schema and cannot upgrade, there is no reason to disable this option. If
662you are upgrading from AWL and using a pre-3.3.0 schema, the txrep.signedby
663column will not exist. It is recommended that you add this column, but if
664that is not possible you must set this option to 0 to avoid SQL errors.
665
666=cut # ...................................................................
66713µs push (@cmds, {
668 setting => 'auto_whitelist_distinguish_signed',
669 default => 1,
670 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
671 });
672
673
674=item B<txrep_spf>
675
676 0 | 1 (default: 1)
677
678When enabled, TxRep will treat any IP address using a given email address as
679the same authorized identity, and will not associate any IP address with it.
680(The same happens with valid DKIM signatures. No option available for DKIM).
681
682Note: for domains that define the useless SPF +all (pass all), no IP would
683ever be associated with the email address, and all addresses (incl. the forged
684ones) would be treated as coming from the authorized source. However, such
685domains are hopefully rare, and ask for this kind of treatment anyway.
686
687=back
688
689=cut # ...................................................................
69013µs push (@cmds, {
691 setting => 'txrep_spf',
692 default => 1,
693 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
694 });
695
696
697# -------------------------------------------------------------------------
698=head2 REPUTATION WEIGHTS
699
700The overall reputation of the sender comprises several elements:
701
702=over 4
703
704=item 1) The reputation of the 'From' email address bound to the originating IP
705 address fraction (see the mask parameters for details)
706
707=item 2) The reputation of the 'From' email address alone (regardless of the IP
708 address being currently used)
709
710=item 3) The reputation of the domain name of the 'From' email address
711
712=item 4) The reputation of the originating IP address, regardless of sender's email address
713
714=item 5) The reputation of the HELO name of the originating computer (if available)
715
716=back
717
718Each of these partial reputations is weighted with the help of these parameters,
719and the overall reputation is calculated as the weighted sum of the individual
720reputations divided by the sum of all their weights:
721
722 sender_reputation = ( weight_email * rep_email +
723 weight_email_ip * rep_email_ip +
724 weight_domain * rep_domain +
725 weight_ip * rep_ip +
726 weight_helo * rep_helo ) / sum_of_weights
727
728You can disable the individual partial reputations by setting their respective
729weight to zero. This will also reduce the size of the database, since each
730partial reputation requires a separate entry in the database table. Disabling
731some of the partial reputations in this way may also help with the performance
732on busy servers, because the respective database lookups and processing will
733be skipped too.
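As a rough illustration of the weighting described above, the following sketch computes a weighted mean using the documented default weights. The function name sender_reputation and the input hash layout are hypothetical. Skipping components with a zero weight follows the text; skipping components that have no stored value is an assumption of this sketch, not a statement about the plugin's internals.

```perl
use strict;
use warnings;

# Documented default weights of the five partial reputations.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Hypothetical helper: weighted sum of the partial reputations divided
# by the sum of the weights that actually contributed.
sub sender_reputation {
    my ($rep) = @_;                      # e.g. { email => 1.5, ip => -2 }
    my ($sum, $total) = (0, 0);
    for my $kind (keys %weight) {
        # Zero weight disables a component; undefined means no record (assumption).
        next unless $weight{$kind} && defined $rep->{$kind};
        $sum   += $weight{$kind} * $rep->{$kind};
        $total += $weight{$kind};
    }
    return $total ? $sum / $total : undef;
}

my %all_four = map { $_ => 4 } keys %weight;
print sender_reputation(\%all_four), "\n";                   # 4
print sender_reputation({ email => 10, ip => -4 }), "\n";    # 2
```

In the second call only the email (weight 3) and IP (weight 4) components are present, giving (3 * 10 + 4 * -4) / 7 = 2.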
734
735=over 4
736
737=item B<txrep_weight_email>
738
739 range [0..10] (default: 3)
740
741This weight factor controls the influence of the reputation of the standalone
742email address, regardless of the originating IP address. When adjusting the
743weight, keep in mind that an email address can be easily spoofed,
744and hence spammers can use 'from' email addresses belonging to senders with
745a good reputation. From this point of view, the email address bound to the
746originating IP address is a more reliable indicator for the overall reputation.
747
748On the other hand, some reputable senders may be sending from a larger number
749of IP addresses, so looking at the reputation of the standalone email address
750without regard to the originating IP makes sense too.
751
752We recommend using a relatively low value for this partial reputation.
753
754=cut # ...................................................................
755 push (@cmds, {
756 setting => 'txrep_weight_email',
757 default => 3,
758 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
759 code => sub {
760 my ($self, $key, $value, $line) = @_;
761 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
762 $self->{txrep_weight_email} = $value;
763 }
76418µs });
765
766# -------------------------------------------------------------------------
767=item B<txrep_weight_email_ip>
768
769 range [0..10] (default: 10)
770
771This is the standard reputation used in the same way as it was by the original
772AWL plugin. Each sender's email address is bound to the originating IP, or
773its part as defined by the txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.
774
775For a user sending from multiple locations, diverse mail servers, or from a dynamic
776IP range outside the masked block, the email address will have a separate reputation
777value for each of the different (partial) IP addresses.
778
779When the option auto_whitelist_distinguish_signed is enabled, in contrast to
780the original AWL module, TxRep does not record the IP address when a DKIM
781signature is detected. The email address is then not bound to any IP address, but
782rather just to the DKIM signature, since the signature is considered to authenticate
783the sender more reliably than the IP address (which can also vary).
784
785This is by design the most relevant reputation, and its weight should be kept
786high.
787
788=cut # ...................................................................
789 push (@cmds, {
790 setting => 'txrep_weight_email_ip',
791 default => 10,
792 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
793 code => sub {
794 my ($self, $key, $value, $line) = @_;
795 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
796 $self->{txrep_weight_email_ip} = $value;
797 }
79817µs });
799
800# -------------------------------------------------------------------------
801=item B<txrep_weight_domain>
802
803 range [0..10] (default: 2)
804
805Some spammers may always use their real domain name in the email address,
806just with multiple or changing local parts. This reputation will record the
807spam scores of all messages sent from the respective domain, regardless of
808the local part (user name) used.
809
810As with the email_ip reputation, the domain reputation is also
811bound to the originating address (or a masked block, if mask parameters are used).
812This avoids giving false reputation based on spoofed email addresses.
813
814In case of a DKIM signature detected, the signature signer is used instead
815of the domain name extracted from the email address. It is considered that
816the signing authority is responsible for sending email of any domain name,
817hence the same reputation applies here.
818
819The domain reputation will give a relevant picture of the owner of the
820domain in case of small servers, or corporations with strict policies, but
821will be less relevant for freemailers like Gmail, Hotmail, and similar,
822because both ham and spam may be sent by their users.
823
824The default value is set relatively low. Higher weight values may be useful,
825but we recommend caution and observing the scores before increasing it.
826
827=cut # ...................................................................
828 push (@cmds, {
829 setting => 'txrep_weight_domain',
830 default => 2,
831 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
832 code => sub {
833 my ($self, $key, $value, $line) = @_;
834 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
835 $self->{txrep_weight_domain} = $value;
836 }
83716µs });
838
839# -------------------------------------------------------------------------
840=item B<txrep_weight_ip>
841
842 range [0..10] (default: 4)
843
844Spammers can send through the same relay (incl. compromised hosts) under a
845multitude of email addresses. This is the exact case when the IP reputation
846can help. This reputation is a kind of a local RBL.
847
848The weight is set by default lower than for the email_IP reputation, because
849there may be cases when the same IP address hosts both spammers and acceptable
850senders (for example the marketing department of a company sends you spam, but
851you still need to get messages from their billing address).
852
853=cut # ...................................................................
854 push (@cmds, {
855 setting => 'txrep_weight_ip',
856 default => 4,
857 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
858 code => sub {
859 my ($self, $key, $value, $line) = @_;
860 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
861 $self->{txrep_weight_ip} = $value;
862 }
86317µs });
864
865# -------------------------------------------------------------------------
866=item B<txrep_weight_helo>
867
868 range [0..10] (default: 0.5)
869
870A large number of spam messages come from compromised hosts, often personal computers
871or set-top boxes. Their NetBIOS names are usually used as the HELO name when connecting
872to your mail server. Some of the names are pretty generic and hence may be shared by
873a large number of hosts, but often the names are quite unique and may be a good
874indicator for detecting a spammer, even when different email and IP
875addresses are used (spam can also come from portable devices).
876
877No IP address is bound to the HELO name when stored to the reputation database.
878This is intentional, despite the possibility that numerous devices may share
879some of the HELO names.
880
881This option is still considered experimental, hence the low weight value, but after
882some testing it could likely be increased at least slightly.
883
884=cut # ...................................................................
885 push (@cmds, {
886 setting => 'txrep_weight_helo',
887 default => 0.5,
888 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
889 code => sub {
890 my ($self, $key, $value, $line) = @_;
891 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
892 $self->{txrep_weight_helo} = $value;
893 }
89416µs });
895
896
897# -------------------------------------------------------------------------
898=back
899
900=head1 ADMINISTRATOR SETTINGS
901
902These settings differ from the ones above, in that they are considered 'more
903privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section.
904No matter what C<allow_user_rules> is set to, these can never be set from a
905user's C<user_prefs> file.
906
907=over 4
908
909=item B<txrep_factory module>
910
911 (default: Mail::SpamAssassin::DBBasedAddrList)
912
913Select alternative database factory module for the TxRep database.
914
915=cut # ...................................................................
91614µs push (@cmds, {
917 setting => 'txrep_factory',
918 is_admin => 1,
919 default => 'Mail::SpamAssassin::DBBasedAddrList',
920 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
921 });
922
923
924# -------------------------------------------------------------------------
925=item B<auto_whitelist_path /path/filename>
926
927 (default: ~/.spamassassin/tx-reputation)
928
929This is the TxRep directory and filename. By default, each user
930has their own reputation database in their C<~/.spamassassin> directory with
931mode 0700. For system-wide SpamAssassin use, you may want to share this
932across all users.
933
934=cut # ...................................................................
935 push (@cmds, {
936 setting => 'auto_whitelist_path',
937 is_admin => 1,
938 default => '__userstate__/tx-reputation',
939 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
940 code => sub {
941 my ($self, $key, $value, $line) = @_;
942 unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
943 if (-d $value) {return $Mail::SpamAssassin::Conf::INVALID_VALUE; }
944 $self->{txrep_path} = $value;
945 }
94618µs });
947
948
949# -------------------------------------------------------------------------
950=item B<auto_whitelist_db_modules Module ...>
951
952 (default: see below)
953
954What database modules should be used for the TxRep storage database
955file. The first named module that can be loaded from the Perl include path
956will be used. The format is:
957
958 PreferredModuleName SecondBest ThirdBest ...
959
960i.e. a space-separated list of Perl module names. The default is:
961
962 DB_File GDBM_File SDBM_File
963
964NDBM_File is not supported (see SpamAssassin bug 4353).
965
966=cut # ...................................................................
96717µs push (@cmds, {
968 setting => 'auto_whitelist_db_modules',
969 is_admin => 1,
970 default => 'DB_File GDBM_File SDBM_File',
971 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
972 });
973
974
975# -------------------------------------------------------------------------
976=item B<auto_whitelist_file_mode>
977
978 (default: 0700)
979
980The file mode bits used for the TxRep directory or file.
981
982Make sure you specify this using the 'x' mode bits set, as it may also be used
983to create directories. However, if a file is created, the resulting file will
984not have any execute bits set (the umask is set to 0111).
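The interaction between the configured mode and the masked execute bits can be checked with a short calculation. This is only an arithmetic sketch of the statement above; the helper name file_mode_bits is hypothetical and does not appear in the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: permission bits left on a plain file once the
# execute bits (octal 0111) are masked out of the configured mode.
sub file_mode_bits {
    my ($mode_string) = @_;          # e.g. '0700' as written in the config
    return oct($mode_string) & ~0111;
}

printf "%04o\n", file_mode_bits('0700');   # prints 0600
printf "%04o\n", file_mode_bits('0770');   # prints 0660
```

So the default of 0700 would yield directories with mode 0700 but database files with mode 0600.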
985
986=cut # ...................................................................
987 push (@cmds, {
988 setting => 'auto_whitelist_file_mode',
989 is_admin => 1,
990 default => '0700',
991 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
992 code => sub {
993 my ($self, $key, $value, $line) = @_;
994 if ($value !~ /^0?[0-7]{3}$/) {
995 return $Mail::SpamAssassin::Conf::INVALID_VALUE;
996 }
997 $self->{txrep_file_mode} = untaint_var($value);
998 }
99918µs });
1000
1001
1002# -------------------------------------------------------------------------
1003=item B<user_awl_dsn DBI:databasetype:databasename:hostname:port>
1004
1005Used by the SQLBasedAddrList storage implementation.
1006
1007This will set the DSN used to connect. Example:
1008C<DBI:mysql:spamassassin:localhost>
1009
1010=cut # ...................................................................
101114µs push (@cmds, {
1012 setting => 'user_awl_dsn',
1013 is_admin => 1,
1014 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1015 });
1016
1017
1018# -------------------------------------------------------------------------
1019=item B<user_awl_sql_username username>
1020
1021Used by the SQLBasedAddrList storage implementation.
1022
1023The authorized username to connect to the above DSN.
1024
1025=cut # ...................................................................
102613µs push (@cmds, {
1027 setting => 'user_awl_sql_username',
1028 is_admin => 1,
1029 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1030 });
1031
1032
1033# -------------------------------------------------------------------------
1034=item B<user_awl_sql_password password>
1035
1036Used by the SQLBasedAddrList storage implementation.
1037
1038The password for the database username, for the above DSN.
1039
1040=cut # ...................................................................
104114µs push (@cmds, {
1042 setting => 'user_awl_sql_password',
1043 is_admin => 1,
1044 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1045 });
1046
1047
1048# -------------------------------------------------------------------------
1049=item B<user_awl_sql_table tablename>
1050
1051 (default: txrep)
1052
1053Used by the SQLBasedAddrList storage implementation.
1054
1055The name of the table in which the reputation is stored, for the above DSN.
1056
1057=back
1058
1059=cut # ...................................................................
106014µs push (@cmds, {
1061 setting => 'user_awl_sql_table',
1062 is_admin => 1,
1063 default => 'txrep',
1064 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1065 });
1066
1067119µs1650µs $conf->{parser}->register_commands(\@cmds);
1068}
1069
1070
1071###########################################################################
1072sub _message {
1073###########################################################################
1074 my ($self, $value, $msg) = @_;
1075 print "SpamAssassin TxRep: $value\n" if ($msg);
1076 dbg("TxRep: $value");
1077}
1078
1079
1080###########################################################################
1081sub _fail_exit {
1082###########################################################################
1083 my ($self, $err) = @_;
1084 my $eval_stat = ($err ne '') ? $err : "errno=$!";
1085 chomp $eval_stat;
1086 warn("TxRep: open of TxRep file failed: $eval_stat\n");
1087 if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1088 return 0;
1089}
1090
1091
1092###########################################################################
1093sub _fn_envelope {
1094###########################################################################
1095 my ($self, $args, $value, $msg) = @_;
1096
1097 unless ($self->{main}->{conf}->{use_txrep}){ return 0;}
1098 unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg); return 0;}
1099
1100 my $factor = $self->{conf}->{txrep_weight_email} +
1101 $self->{conf}->{txrep_weight_email_ip} +
1102 $self->{conf}->{txrep_weight_domain} +
1103 $self->{conf}->{txrep_weight_ip} +
1104 $self->{conf}->{txrep_weight_helo};
1105 my $sign = $args->{signedby};
1106 my $id = $args->{address};
1107 if ($args->{address} =~ /,/) {
1108 $sign = $args->{address};
1109 $sign =~ s/^.*,//g;
1110 $id =~ s/,.*$//g;
1111 }
1112
1113 # simplified regex used for IP detection (possible FP at a domain is not critical)
1114 if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo})
1115 {$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';}
1116 elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip})
1117 {$factor /= $self->{conf}->{txrep_weight_ip};}
1118 elsif ($id =~ /@/ && $self->{conf}->{txrep_weight_email})
1119 {$factor /= $self->{conf}->{txrep_weight_email};}
1120 elsif ($id !~ /@/ && $self->{conf}->{txrep_weight_domain})
1121 {$factor /= $self->{conf}->{txrep_weight_domain};}
1122 else {$factor = 1;}
1123
1124 $self->open_storages();
1125 my $score = (!defined $value)? undef : $factor * $value;
1126 my $status = $self->modify_reputation($id, $score, $sign);
1127 dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || '');
1128 eval {
1129 $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id);
1130 if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1131 1;
1132 } or return $self->_fail_exit( $@ );
1133 return $status;
1134}
1135
- -
1138# -------------------------------------------------------------------------
1139=head1 BLACKLISTING / WHITELISTING
1140
1141When asked by SpamAssassin to blacklist or whitelist a user, the TxRep
1142plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting)
1143to the given sender's email address. For a plain address without any IP
1144address, the value is multiplied by the ratio of the total reputation
1145weight to the EMAIL reputation weight to account for the reduced impact
1146of the standalone EMAIL reputation when calculating the overall reputation.
1147
1148 total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
1149 blacklisted_reputation = 100 * total_weight / weight_email
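As a worked example of the two formulas above, the sketch below plugs in the default weights from the REPUTATION WEIGHTS section. The helper name listed_score is hypothetical, and the fallback to the unscaled base score for a zero weight is a simplification of the plugin's own branching.

```perl
use strict;
use warnings;

# Documented default weights of the five partial reputations.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Hypothetical helper: score stored for a blacklisted (+100) or
# whitelisted (-100) identifier of the given kind.
sub listed_score {
    my ($kind, $base) = @_;
    my $total = 0;
    $total += $_ for values %weight;     # 19.5 with the defaults
    # With a zero weight the base score is used unscaled (simplification).
    return $weight{$kind} ? $base * $total / $weight{$kind} : $base;
}

print listed_score('email', 100), "\n";   # 650
print listed_score('ip',   -100), "\n";   # -487.5
```

With the defaults, a blacklisted standalone email address is therefore stored with a score of 100 * 19.5 / 3 = 650.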
1150
1151When a standalone email address is blacklisted/whitelisted, all records
1152of the email address bound to an IP address, DKIM signature, or an SPF pass
1153will be removed from the database, and only the standalone record is kept.
1154
1155Besides blacklisting/whitelisting of standalone email addresses, the same
1156method may be used also for blacklisting/whitelisting of IP addresses,
1157domain names, and HELO names (only dotless NetBIOS HELO names can be used).
1158
1159When whitelisting/blacklisting an email address or domain name, you can
1160bind them to a specified DKIM signature or SPF record by appending the
1161DKIM signing domain or the tag 'spf' after the ID in the following way:
1162
1163 spamassassin --add-addr-to-blacklist=spamming.biz,spf
1164 spamassassin --add-addr-to-whitelist=friend@good.org,good.org
1165
1166When a message contains both a DKIM signature and an SPF pass, the DKIM
1167signature takes the priority, so the record bound to the 'spf' tag won't
1168be checked. Only email addresses and domains can be bound to DKIM or SPF.
1169Records of IP addresses and HELO names are always without DKIM/SPF.
1170
1171In case of dual storage, the black/whitelisting is performed only in the
1172default storage.
1173
1174=cut
1175######################################################## plugin hooks #####
1176sub blacklist_address {my $self=shift; return $self->_fn_envelope(@_, 100, "blacklisting address");}
1177sub whitelist_address {my $self=shift; return $self->_fn_envelope(@_, -100, "whitelisting address");}
1178sub remove_address {my $self=shift; return $self->_fn_envelope(@_,undef, "removing address");}
1179###########################################################################
1180
1181
1182# -------------------------------------------------------------------------
1183=head1 REPUTATION LOGICS
1184
11851. As in AWL, the most significant sender identifier is the
1186 combination of the email address and the originating IP address, or
1187 its part defined by the IPv4 or IPv6 mask setting.
1188
11892. No IP checking for standalone EMAIL address reputation
1190
11913. No signature checking for IP reputation, and for HELO name reputation
1192
11934. The EMAIL_IP weight, and not the standalone EMAIL weight is used when
1194 no IP address is available (EMAIL_IP is the main indicator, and has
1195 the highest weight)
1196
11975. No IP checking for signed emails (the signature authenticates the email
1198 instead of the IP address)
1199
12006. No IP checking on an SPF pass (we assume the domain owner is responsible
1201 for all IPs authorized to send for the domain, hence we use the same identity
1202 for all of them)
1203
12047. No signature used for standalone EMAIL reputation (would be redundant,
1205 since no IP is used for signed EMAIL_IP reputation, and we would store
1206 two identical hits)
1207
12088. When available, the DKIM signer is used instead of the domain name for
1209 the DOMAIN reputation
1210
12119. No IP and no signature used for HELO reputation (despite the possible
1212 existence of multiple computers with the same HELO)
1213
121410. The full (unmasked) IP address is used (in the address field, instead of the
1215 IP field) for the standalone IP reputation
1216
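Rules 5 and 6 above can be sketched as a small selection step. This is a minimal illustration that mirrors the branch at source lines 1335-1342 of this listing; the function name `select_identity` and its named arguments are hypothetical and do not exist in the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: decides which IP, domain and signature tag are
# used for the EMAIL_IP and DOMAIN lookups.
sub select_identity {
    my (%arg) = @_;    # origip, domain, dkim_domain, spf_pass, use_spf
    my ($ip, $domain, $signedby) = @arg{qw(origip domain dkim_domain)};

    if ($signedby) {
        # Rule 5: a DKIM signature replaces the IP; rule 8: the signer
        # replaces the sender domain.
        $ip     = undef;
        $domain = $signedby;
    } elsif ($arg{spf_pass} && $arg{use_spf}) {
        # Rule 6: an SPF pass replaces the IP; the record is tagged 'spf'.
        $ip       = undef;
        $signedby = 'spf';
    }
    return ($ip, $domain, $signedby);
}
```

Because the DKIM branch is tested first, a message with both a signature and an SPF pass never reaches the 'spf' tag, which matches the priority described in the blacklisting notes.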
1217=cut
1218###########################################################################
1219
# spent 96340s (471ms+96340) within Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 468 times, avg 206s/call: # 234 times (212ms+96340s) by Mail::SpamAssassin::Plugin::TxRep::learn_message at line 1846, avg 412s/call # 234 times (259ms+-259ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message at line 1861, avg 0s/call
sub check_senders_reputation {
1220###########################################################################
12214681.05ms my ($self, $pms) = @_;
1222
1223# just for the development debugging
1224# use Data::Printer;
1225# dbg("TxRep: DEBUG DUMP of pms: %s, %s", $pms, p($pms));
1226
12274681.77ms my $autolearn = defined $self->{autolearn};
12284681.75ms $self->{last_pms} = $self->{autolearn} = undef;
1229
1230 # Cases where we would not be able to use TxRep
12314681.31ms return 0 unless ($self->{conf}->{use_txrep});
12324681.53ms if ($self->{conf}->{use_auto_whitelist}) {
1233 warn("TxRep: cannot run when Auto-Whitelist is enabled. Please disable it!\n");
1234 return 0;
1235 }
1236468943µs if ($autolearn && !$self->{conf}->{txrep_autolearn}) {
1237 dbg("TxRep: autolearning disabled, no more reputation adjusting, quitting");
1238 return 0;
1239 }
12404686.08ms468261ms my @from = $pms->all_from_addrs();
# spent 261ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::all_from_addrs, avg 558µs/call
12414681.53ms if (@from && $from[0] eq 'ignore@compiling.spamassassin.taint.org') {
1242 dbg("TxRep: no scan in lint mode, quitting");
1243 return 0;
1244 }
1245
12464681.01ms my $delta = 0;
12474684.28ms4684.43ms my $timer = $self->{main}->time_method("total_txrep");
# spent 4.43ms making 468 calls to Mail::SpamAssassin::time_method, avg 9µs/call
12484681.65ms my $msgscore = (defined $self->{learning})? $self->{learning} : $pms->get_autolearn_points();
12494685.96ms4682.93s my $date = $pms->{msg}->receive_date() || $pms->{date_header_time};
# spent 2.93s making 468 calls to Mail::SpamAssassin::Message::receive_date, avg 6.27ms/call
1250 my $msg_id = $self->{msgid} ||
12514688.01ms468235ms Mail::SpamAssassin::Plugin::Bayes->get_msgid($pms->{msg}) ||
# spent 235ms making 468 calls to Mail::SpamAssassin::Plugin::Bayes::get_msgid, avg 503µs/call
1252 $pms->get('Message-Id') || $pms->get('Message-ID') || $pms->get('MESSAGE-ID') || $pms->get('MESSAGEID');
1253
12544685.79ms46812.8ms my $from = lc $pms->get('From:addr') || $pms->get('EnvelopeFrom:addr');
# spent 12.8ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::get, avg 27µs/call
12554686.72ms4682.49ms return 0 unless $from =~ /\S/;
# spent 2.49ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call
12564681.62ms my $domain = $from;
125746829.4ms4683.10ms $domain =~ s/^.+@//;
# spent 3.10ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call
1258
1259 # Find the last untrusted relay and populate helo and original IP
12604681.06ms my ($origip, $helo);
12614682.78ms if (defined $pms->{relays_trusted} || defined $pms->{relays_untrusted}) {
12629363.09ms my $trusteds = @{$pms->{relays_trusted}};
126314048.01ms foreach my $rly ( @{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}} ) {
1264 # Get the last found HELO, regardless of private/public or trusted/untrusted
1265 # Avoiding a redundant duplicate entry if HELO is equal/similar to another identifier
12662082263ms12412108ms if (defined $rly->{helo} &&
# spent 87.6ms making 6206 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp, avg 14µs/call # spent 20.6ms making 6206 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 3µs/call
1267 $rly->{helo} !~ /^\[?\Q$rly->{ip}\E\]?$/ &&
1268 $rly->{helo} !~ /^\Q$domain\E$/i &&
1269 $rly->{helo} !~ /^\Q$from\E$/i ) {
127020426.69ms $helo = $rly->{helo};
1271 }
1272 # use only trusted ID, but use the first untrusted IP (if available) (AWL bug 6908)
1273 # at low spam scores (<2) ignore trusted/untrusted
1274 # set IP to 127.0.0.1 for any internal IP, so that it can be distinguished from none (AWL bug 6357)
127520828.26ms if ((--$trusteds >= 0 || $msgscore<2) && !$msg_id && $rly->{id}) {$msg_id = $rly->{id};}
1276254812.0ms if (($trusteds >= -1 || $msgscore<2) && !$rly->{ip_private} && $rly->{ip}) {$origip = $rly->{ip};}
1277255018.0ms if ( $trusteds >= 0 && !$origip && $rly->{ip_private} && $rly->{ip}) {$origip = '127.0.0.1';}
1278 }
1279 }
1280
1281 # Look for previous scores of the same message, for instance when doing re-learning
12824683.65ms if ($self->{conf}->{txrep_track_messages}) {
12834681.91ms if ($msg_id) {
12844685.54ms46814662s my $msg_rep = $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, undef);
# spent 14662s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 31.3s/call
12854687.21ms4686.23ms if (defined $msg_rep && $self->count()) {
# spent 6.23ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 13µs/call
12864683.51ms if (defined $self->{learning} && !defined $self->{forgetting}) {
1287 # already learned, forget only if already learned (count>1), and relearn
1288 # when only scanned (count=1), go ahead with normal rep scan
12892342.29ms2342.46ms if ($self->count() > 1) {
# spent 2.46ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 11µs/call
1290234736µs $self->{last_pms} = $pms; # cache the pmstatus
12912342.55ms23449211s $self->forget_message($pms->{msg},$msg_id); # sub reentrance OK
# spent 49211s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::forget_message, avg 210s/call
1292 }
1293 } elsif ($self->{forgetting}) {
1294234731µs $msgscore = $msg_rep; # forget the old stored score instead of the one got now
12952342.91ms2343.30ms dbg("TxRep: forgetting stored score %0.3f of message %s", $msgscore || 'undef', $msg_id);
# spent 3.30ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 14µs/call
1296 } else {
1297 # calculating the delta from the stored message reputation
1298 $delta = ($msgscore + $self->{conf}->{txrep_factor}*$msg_rep) / (1+$self->{conf}->{txrep_factor}) - $msgscore;
1299 if ($delta != 0) {
1300 $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
1301 }
1302 dbg("TxRep: message %s already scanned, using old data; post-TxRep score: %0.3f", $msg_id, $pms->{score} || 'undef');
1303 return 0;
1304 }
1305 } # no stored reputation found, go ahead with normal rep scan
1306 } else {dbg("TxRep: no message-id available, parsing forced");}
1307 } # else no message tracking, go ahead with normal rep scan
1308
1309 # whitelist the recipients of messages sent from internal networks, only after the MSG_ID check
13104687.58ms if ( $self->{conf}->{txrep_whitelist_out} &&
13114681.07ms defined $pms->{relays_internal} && @{$pms->{relays_internal}} &&
13124681.04ms (!defined $pms->{relays_external} || !@{$pms->{relays_external}})
1313 ) {
1314234µs23.65ms foreach my $rcpt ($pms->all_to_addrs()) {
# spent 3.65ms making 2 calls to Mail::SpamAssassin::PerMsgStatus::all_to_addrs, avg 1.83ms/call
1315216µs if ($rcpt) {
1316231µs222µs dbg("TxRep: internal sender, whitelisting recipient: $rcpt");
# spent 22µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call
1317229µs26.36s $self->modify_reputation($rcpt, -1*$self->{conf}->{txrep_whitelist_out}, undef);
# spent 6.36s making 2 calls to Mail::SpamAssassin::Plugin::TxRep::modify_reputation, avg 3.18s/call
1318 }
1319 }
1320 }
1321
1322 # Get the signing domain
13234686.96ms46820.1ms my $signedby = ($self->{conf}->{auto_whitelist_distinguish_signed})? $pms->get_tag('DKIMDOMAIN') : undef;
# spent 20.1ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::get_tag, avg 43µs/call
1324
1325 # Summary of all information we've gathered so far
1326 dbg("TxRep: active, %s pre-score: %s, autolearn score: %s, IP: %s, address: %s %s",
1327 $msg_id || '',
13284687.12ms4684.96ms $pms->{score} || '?',
# spent 4.96ms making 468 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call
1329 $msgscore || '?',
1330 $origip || '?',
1331 $from || '?',
1332 $signedby ? "signed by $signedby" : '(unsigned)'
1333 );
1334
13354681.45ms my $ip = $origip;
13364681.59ms if ($signedby) {
1337 $ip = undef;
1338 $domain = $signedby;
1339 } elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) {
1340 $ip = undef;
1341 $signedby = 'spf';
1342 }
1343
13444681.06ms my $totalweight = 0;
13454681.57ms $self->{totalweight} = $totalweight;
1346
1347 # Get current reputation info
134846816.2ms46813623s $delta += $self->check_reputations($pms, 'EMAIL_IP', $from, $ip, $signedby, $msgscore);
# spent 13623s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.1s/call
1349
13504682.28ms if ($domain) {
13514684.91ms46813711s $delta += $self->check_reputations($pms, 'DOMAIN', $domain, $ip, $signedby, $msgscore);
# spent 13711s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.3s/call
1352 }
13534682.21ms if ($helo) {
13544084.74ms40812024s $delta += $self->check_reputations($pms, 'HELO', $helo, undef, 'HELO', $msgscore);
# spent 12024s making 408 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.5s/call
1355 }
13564682.16ms if ($origip) {
13574682.01ms if (!$signedby) {
13584684.98ms46813734s $delta += $self->check_reputations($pms, 'EMAIL', $from, undef, undef, $msgscore);
# spent 13734s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.3s/call
1359 }
13604685.12ms46813799s $delta += $self->check_reputations($pms, 'IP', $origip, undef, undef, $msgscore);
# spent 13799s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.5s/call
1361 }
1362
1363 # Learn against this message and store reputation
13644681.59ms if (!defined $self->{learning}) {
1365 $delta = ($self->{totalweight})? $self->{conf}->{txrep_factor} * $delta / $self->{totalweight} : 0;
1366 if ($delta) {
1367 $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
1368 }
1369 $msgscore += $delta;
1370 if (defined $pms->{score}) {
1371 dbg("TxRep: post-TxRep score: %.3f", $pms->{score});
1372 }
1373 }
1374 # Track message ID
13754683.31ms if ($self->{conf}->{txrep_track_messages} && $msg_id) {
13764684.63ms46814778s $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, $msgscore);
# spent 14778s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 31.6s/call
1377 }
1378 # Close any open resources
13794682.38ms if (!defined $self->{txKeepStoreTied}) {
1380 $self->finish();
1381 }
1382
138346811.4ms return 0;
1384}
1385
1386
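In scan mode (source lines 1364-1373), check_senders_reputation sums the weighted deltas returned for each identifier and normalizes them by the accumulated weight. A minimal sketch of that step follows; the function name `final_adjustment` and the `[weight, delta]` pair layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: $factor stands in for txrep_factor, and each
# element of @parts is an array ref [weight, delta] for one identifier.
sub final_adjustment {
    my ($factor, @parts) = @_;
    my ($sum, $total_weight) = (0, 0);
    for my $part (@parts) {
        my ($weight, $delta) = @$part;
        $sum          += $weight * $delta;   # what check_reputation returns
        $total_weight += $weight;            # what $self->{totalweight} holds
    }
    # No weighted identifiers means no score adjustment.
    return $total_weight ? $factor * $sum / $total_weight : 0;
}
```

For example, `final_adjustment(0.5, [10, 2], [3, -1])` evaluates 0.5 * (20 - 3) / 13.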
1387###########################################################################
1388
# spent 96330s (197ms+96330) within Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 3216 times, avg 30.0s/call: # 468 times (39.4ms+14778s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1376, avg 31.6s/call # 468 times (24.2ms+14662s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1284, avg 31.3s/call # 468 times (31.3ms+13799s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1360, avg 29.5s/call # 468 times (22.3ms+13734s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1358, avg 29.3s/call # 468 times (22.3ms+13711s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1351, avg 29.3s/call # 468 times (29.4ms+13623s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 29.1s/call # 408 times (27.8ms+12024s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1354, avg 29.5s/call
sub check_reputations {
1389###########################################################################
139032167.37ms my $self = shift;
139132165.61ms my $delta;
1392
1393321635.0ms321694571s if ($self->open_storages()) {
# spent 94571s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 29.4s/call
1394321620.2ms if ($self->{conf}->{txrep_user2global_ratio} && $self->{user_storage} != $self->{global_storage}) {
1395 my $user = $self->check_reputation('user_storage', @_);
1396 my $global = $self->check_reputation('global_storage',@_);
1397
1398 if (defined $user and $user == $user) {
1399 $delta = ( $self->{conf}->{txrep_user2global_ratio} * $user + $global ) / ( 1 + $self->{conf}->{txrep_user2global_ratio} );
1400 } else {
1401 $delta = $global;
1402 }
1403 } else {
1404321655.7ms32161759s $delta = $self->check_reputation(undef,@_);
# spent 1759s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputation, avg 547ms/call
1405 }
1406 }
1407321657.0ms return $delta;
1408}
1409
1410
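The dual-storage branch of check_reputations (source lines 1394-1402) blends the user and global values by txrep_user2global_ratio and falls back to the global value when the user value is undefined or NaN. A minimal standalone sketch, with a hypothetical function name:

```perl
use strict;
use warnings;

# Hypothetical helper: $ratio stands in for txrep_user2global_ratio.
sub blend_reputation {
    my ($ratio, $user, $global) = @_;
    # "$user == $user" is false only for NaN, so this rejects both
    # undefined and NaN user reputations.
    if (defined $user && $user == $user) {
        return ($ratio * $user + $global) / (1 + $ratio);
    }
    return $global;
}
```

With a ratio of 10, a user value of 2 and a global value of 13, the blend is (20 + 13) / 11.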
1411###########################################################################
1412
# spent 1759s (1.29+1757) within Mail::SpamAssassin::Plugin::TxRep::check_reputation which was called 3216 times, avg 547ms/call: # 3216 times (1.29s+1757s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1404, avg 547ms/call
sub check_reputation {
1413###########################################################################
1414321650.8ms my ($self, $storage, $pms, $key, $id, $ip, $signedby, $msgscore) = @_;
1415
141632169.57ms my $delta = 0;
14173216397ms my $weight = ($key eq 'MSG_ID')? 1 : eval('$pms->{main}->{conf}->{txrep_weight_'.lc($key).'}');
# spent 6.50ms executing statements in 468 string evals (merged) # spent 6.21ms executing statements in 468 string evals (merged) # spent 6.18ms executing statements in 468 string evals (merged) # spent 6.15ms executing statements in 468 string evals (merged) # spent 5.43ms executing statements in 408 string evals (merged)
1418
1419# {
1420# #Bug 7164, trying to find out reason for these: _WARN: Use of uninitialized value $msgscore in addition (+) at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
1421# no warnings;
1422#
1423# unless (defined $msgscore) {
1424# #Output some params and the calling function so we can identify more about this bug
1425# dbg("TxRep: MsgScore Undefined (bug 7164) - check_reputation Parameters: self: $self storage: $storage pms: $pms, key: $key, id: $id, ip: $ip, signedby: $signedby, msgscore: $msgscore");
1426# dbg("TxRep: MsgScore Undefined (bug 7164) - weight: $weight");
1427#
1428# my ($package, $filename, $line) = caller();
1429#
1430# chomp($package);
1431# chomp($filename);
1432# chomp($line);
1433#
1434# dbg("TxRep: MsgScore Undefined (bug 7164) - Caller Info: Package: $package - Filename: $filename - Line: $line");
1435#
1436# #Define $msgscore as a triage to hide warnings while we find the root cause
1437# #$msgscore = 0;
1438# }
1439# }
1440
1441
1442321621.9ms if (defined $weight && $weight) {
144332166.32ms my $meanrep;
1444321650.7ms321645.1ms my $timer = $self->{main}->time_method('check_txrep_'.lc($key));
# spent 45.1ms making 3216 calls to Mail::SpamAssassin::time_method, avg 14µs/call
1445
144632166.82ms if (defined $storage) {
1447 $self->{checker} = $self->{$storage};
1448 }
1449321636.0ms32161.44s my $found = $self->get_sender($id, $ip, $signedby);
# spent 1.44s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::get_sender, avg 446µs/call
1450321616.2ms my $tag_id = (defined $storage)? uc($key.'_'.substr($storage,0,1)) : uc($key);
1451321641.8ms321639.2ms if (defined $found && $self->count()) {
# spent 39.2ms making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 12µs/call
1452303852.5ms607661.9ms $meanrep = $self->total() / $self->count();
# spent 32.7ms making 3038 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 11µs/call # spent 29.3ms making 3038 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call
1453 }
1454321620.2ms if ($self->{learning} && defined $msgscore) {
1455274810.8ms if (defined $meanrep) {
1456 # $msgscore<=>0 gives the sign of $msgscore
1457257025.0ms $msgscore += ($msgscore<=>0) * abs($meanrep);
1458 }
1459 dbg("TxRep: reputation: %s, count: %d, learning: %s, $tag_id: %s",
1460 defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
1461 $self->count() || 0,
14622748108ms549649.3ms $self->{learning} || '',
# spent 26.7ms making 2748 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call # spent 22.6ms making 2748 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
1463 $id || 'none'
1464 );
1465 } else {
14664681.52ms $self->{totalweight} += $weight;
14674685.37ms4683.86ms if ($key eq 'MSG_ID' && $self->count() > 0) {
# spent 3.86ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
14684686.33ms9369.25ms $delta = $self->total() / $self->count();
# spent 5.59ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 12µs/call # spent 3.66ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
146946817.6ms46838.6ms $pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f", $delta));
# spent 38.6ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 83µs/call
1470 } elsif (defined $self->total()) {
1471 #Bug 7164 - $msgscore undefined
1472 if (defined $msgscore) {
1473 $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
1474 } else {
1475 $delta = ($self->total()) / (1 + $self->count());
1476 }
1477
1478 $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f", $delta));
1479 if (defined $meanrep) {
1480 $pms->set_tag('TXREP_'.$tag_id.'_MEAN', sprintf("%2.1f", $meanrep));
1481 }
1482 $pms->set_tag('TXREP_'.$tag_id.'_COUNT', sprintf("%2.1f", $self->count()));
1483 $pms->set_tag('TXREP_'.$tag_id.'_PRESCORE', sprintf("%2.1f", $pms->{score}));
1484 } else {
1485 $pms->set_tag('TXREP_'.$tag_id.'_UNKNOWN', 1);
1486 }
148746812.3ms9369.12ms dbg("TxRep: reputation: %s, count: %d, weight: %.1f, delta: %.3f, $tag_id: %s",
# spent 5.27ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 11µs/call # spent 3.84ms making 468 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
1488 defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
1489 $self->count() || 0,
1490 $weight || 0,
1491 $delta || 0,
1492 $id || 'none'
1493 );
1494 }
1495321634.0ms321627.5ms $timer = $self->{main}->time_method('update_txrep_'.lc($key));
# spent 27.5ms making 3216 calls to Mail::SpamAssassin::time_method, avg 9µs/call
1496321618.7ms if (defined $msgscore) {
1497274813.0ms if ($self->{forgetting}) { # forgetting a message score
1498137412.8ms1374209ms $self->remove_score($msgscore); # remove the given score and decrement the count
# spent 209ms making 1374 calls to Mail::SpamAssassin::Plugin::TxRep::remove_score, avg 152µs/call
149913744.61ms if ($key eq 'MSG_ID') { # remove the message ID score completely
15002342.32ms234876s $self->{checker}->remove_entry($self->{entry});
# spent 876s making 234 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.74s/call
1501 }
1502 } else {
1503137412.5ms1374259ms $self->add_score($msgscore); # add the score and increment the count
# spent 259ms making 1374 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 189µs/call
150413746.31ms2342.80ms if ($self->{learning} && $key eq 'MSG_ID' && $self->count() eq 1) {
# spent 2.80ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 12µs/call
150516116µs162.44ms $self->add_score($msgscore); # increasing the count by 1 at a learned score (count=2)
# spent 2.44ms making 16 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 152µs/call
1506 } # it can be distinguished from a scanned score (count=1)
1507 }
1508 } elsif (defined $found && $self->{forgetting} && $key eq 'MSG_ID') {
15092342.29ms234879s $self->{checker}->remove_entry($self->{entry}); #forgetting the message ID
# spent 879s making 234 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.76s/call
1510 }
1511 }
151232167.18ms if (defined $storage) {
1513 $self->{checker} = $self->{default_storage};
1514 }
1515
1516321665.4ms return ($weight || 0) * ($delta || 0);
1517}
1518
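The profile attributes 397ms over 3216 calls to source line 1417, which builds the weight lookup as a string and passes it to eval. The sketch below shows an equivalent plain hash lookup. It assumes the configuration object can be indexed as an ordinary hash, which the evaluated string itself already does; the function name `weight_for` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: $conf stands in for $pms->{main}->{conf}.
sub weight_for {
    my ($conf, $key) = @_;
    return 1 if $key eq 'MSG_ID';    # message IDs always weigh 1
    # Same key that the string eval constructs, looked up directly.
    return $conf->{ 'txrep_weight_' . lc($key) };
}
```

A direct lookup avoids compiling a new code string on every call, and it returns undef for an unknown key just as the eval does.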
- -
1521#--------------------------------------------------------------------------
1522# Database handler subroutines
1523#--------------------------------------------------------------------------
1524
1525###########################################################################
152625464241ms
# spent 133ms within Mail::SpamAssassin::Plugin::TxRep::count which was called 12732 times, avg 10µs/call: # 3216 times (39.2ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1451, avg 12µs/call # 3038 times (29.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1452, avg 10µs/call # 2748 times (26.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1462, avg 10µs/call # 1390 times (13.2ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1565, avg 9µs/call # 468 times (6.23ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1285, avg 13µs/call # 468 times (5.27ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1487, avg 11µs/call # 468 times (3.86ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1467, avg 8µs/call # 468 times (3.66ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1468, avg 8µs/call # 234 times (2.80ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1504, avg 12µs/call # 234 times (2.46ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1289, avg 11µs/call
sub count {my $self=shift; return (defined $self->{checker})? $self->{entry}->{count} : undef;}
1527979278.5ms
# spent 51.0ms within Mail::SpamAssassin::Plugin::TxRep::total which was called 4896 times, avg 10µs/call: # 3038 times (32.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1452, avg 11µs/call # 1390 times (12.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1565, avg 9µs/call # 468 times (5.59ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1468, avg 12µs/call
sub total {my $self=shift; return (defined $self->{checker})? $self->{entry}->{totscore} : undef;}
1528###########################################################################
1529
1530
1531###########################################################################
1532
# spent 1.44s (335ms+1.10) within Mail::SpamAssassin::Plugin::TxRep::get_sender which was called 3216 times, avg 446µs/call: # 3216 times (335ms+1.10s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1449, avg 446µs/call
sub get_sender {
1533###########################################################################
1534321620.4ms my ($self, $addr, $origip, $signedby) = @_;
1535
153632169.56ms return unless (defined $self->{checker});
1537
1538321634.4ms3216252ms my $fulladdr = $self->pack_addr($addr, $origip);
# spent 252ms making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 78µs/call
1539321636.4ms3216812ms my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 812ms making 3216 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 252µs/call
1540321626.3ms $self->{entry} = $entry;
154132169.61ms $origip = $origip || 'none';
1542
15433216149ms643237.1ms if ($entry->{count}<0 || $entry->{count}=~/^(nan|)$/ || $entry->{totscore}=~/^(nan|)$/) {
# spent 37.1ms making 6432 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 6µs/call
1544 warn "TxRep: resetting bad data for ($addr, $origip), count: $entry->{count}, totscore: $entry->{totscore}\n";
1545 $self->{entry}->{count} = $self->{entry}->{totscore} = 0;
1546 }
1547321655.6ms return $self->{entry}->{count};
1548}
1549
1550
1551###########################################################################
1552
# spent 262ms (82.6+179) within Mail::SpamAssassin::Plugin::TxRep::add_score which was called 1390 times, avg 188µs/call: # 1374 times (81.9ms+177ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1503, avg 189µs/call # 16 times (677µs+1.76ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1505, avg 152µs/call
sub add_score {
1553###########################################################################
155413905.63ms my ($self,$score) = @_;
1555
155613903.75ms return unless (defined $self->{checker}); # no factory defined; we can't check
1557
155813904.78ms if ($score != $score) {
1559 warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
1560 return; # don't try to add a NaN
1561 }
156213904.37ms $self->{entry}->{count} ||= 0;
1563
1564 # performing the dilution aging correction
1565139031.8ms278025.9ms if (defined $self->total() && defined $self->count() && defined $self->{txrep_dilution_factor}) {
# spent 13.2ms making 1390 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call # spent 12.7ms making 1390 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call
1566 my $diluted_total =
1567 ($self->count() + 1) *
1568 ($self->{txrep_dilution_factor} * $self->total() + $score) /
1569 ($self->{txrep_dilution_factor} * $self->count() + 1);
1570 my $corrected_score = $diluted_total - $self->total();
1571 $self->{checker}->add_score($self->{entry}, $corrected_score);
1572 } else {
1573139013.9ms1390153ms $self->{checker}->add_score($self->{entry}, $score);
# spent 153ms making 1390 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 110µs/call
1574 }
1575}
1576
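The dilution aging correction in add_score (source lines 1566-1570) can be isolated as a pure function. This is a sketch with a hypothetical name; note that in this profile run the plain branch at line 1573 was taken on all 1390 calls, so the correction itself was never exercised.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the corrected score that would be handed
# to the storage layer. $factor stands in for txrep_dilution_factor.
sub diluted_increment {
    my ($total, $count, $score, $factor) = @_;
    my $diluted_total =
        ($count + 1) *
        ($factor * $total + $score) /
        ($factor * $count + 1);
    return $diluted_total - $total;
}
```

With a factor of 1 the expression reduces to the plain score, so no aging is applied; factors below 1 give older scores less influence than the new one.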
- -
1579###########################################################################
1580
# spent 209ms (59.1+150) within Mail::SpamAssassin::Plugin::TxRep::remove_score which was called 1374 times, avg 152µs/call: # 1374 times (59.1ms+150ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1498, avg 152µs/call
sub remove_score {
1581###########################################################################
158213745.73ms my ($self,$score) = @_;
1583
158413743.80ms return unless (defined $self->{checker}); # no factory defined; we can't check
1585
158613744.91ms if ($score != $score) { # don't try to add a NaN
1587 warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
1588 return;
1589 }
1590 # no reversal dilution aging correction (not easily possible),
1591 # just removing the original message score
159213747.18ms if ($self->{entry}->{count} > 2)
15932901.21ms {$self->{entry}->{count} -= 2;}
159410843.45ms else {$self->{entry}->{count} = 0;}
1595 # subtract 2, and add a score; hence decrementing by 1
1596137425.1ms1374150ms $self->{checker}->add_score($self->{entry}, -1*$score);
# spent 150ms making 1374 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 109µs/call
1597}
1598
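The count arithmetic in remove_score is easier to follow with a stand-in for the storage layer. The sketch below assumes, based only on the comment at source line 1595, that the storage add_score increments the count by one and adds the score to the total; `mock_add_score` and `remove_score_sketch` are hypothetical names, not part of the plugin.

```perl
use strict;
use warnings;

# Assumed storage behavior: one more hit, score added to the total.
sub mock_add_score {
    my ($entry, $score) = @_;
    $entry->{count}++;
    $entry->{totscore} += $score;
    return $entry;
}

# Mirrors source lines 1586-1596 on a plain hash entry.
sub remove_score_sketch {
    my ($entry, $score) = @_;
    return if $score != $score;          # ignore NaN
    if ($entry->{count} > 2) { $entry->{count} -= 2 }
    else                     { $entry->{count} = 0 }
    # Adding the negated score bumps the count back up by one.
    return mock_add_score($entry, -1 * $score);
}
```

Under that assumption, an entry with a count above 2 loses exactly one hit, while an entry with a count of 1 or 2 ends at a count of 1.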
- -
1601###########################################################################
1602
# spent 6.36s (251µs+6.36) within Mail::SpamAssassin::Plugin::TxRep::modify_reputation which was called 2 times, avg 3.18s/call: # 2 times (251µs+6.36s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1317, avg 3.18s/call
sub modify_reputation {
1603###########################################################################
1604213µs my ($self, $addr, $score, $signedby) = @_;
1605
160626µs return unless (defined $self->{checker}); # no factory defined; we can't check
1607221µs284µs my $fulladdr = $self->pack_addr($addr, undef);
# spent 84µs making 2 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 42µs/call
1608220µs2428µs my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 428µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 214µs/call
1609
1610 # remove any old entries (will remove per-ip entries as well)
1611 # always call this regardless, as the current entry may have 0
1612 # scores, but the per-ip one may have more
1613220µs26.36s $self->{checker}->remove_entry($entry);
# spent 6.36s making 2 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.18s/call
1614
1615 # remove address only, no new score to add if score NaN or undef
1616217µs if (defined $score && $score==$score) {
1617 # else add score. get a new entry first
1618241µs2490µs $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 490µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 245µs/call
1619220µs2210µs $self->{checker}->add_score($entry, $score);
# spent 210µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 105µs/call
1620 }
1621237µs return 1;
1622}
1623
1624
1625# connecting the primary and the secondary storage; needed only on the first run
1626# (this can't be in the constructor, since the settings are not available there)
1627###########################################################################
1628
# spent 94571s (300ms+94571) within Mail::SpamAssassin::Plugin::TxRep::open_storages which was called 3216 times, avg 29.4s/call: # 3216 times (300ms+94571s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1393, avg 29.4s/call
sub open_storages {
1629###########################################################################
163032166.95ms my $self = shift;
1631
1632 # disabled per bug 7191
1633 #return 1 unless (!defined $self->{default_storage});
1634
163532166.57ms my $factory;
1636321619.2ms if ($self->{main}->{pers_addr_list_factory}) {
163732159.46ms $factory = $self->{main}->{pers_addr_list_factory};
1638 } else {
163914µs my $type = $self->{conf}->{txrep_factory};
1640116µs15µs if ($type =~ /^([_A-Za-z0-9:]+)$/) {
# spent 5µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::CORE:match
164119µs132µs $type = untaint_var($type);
# spent 32µs making 1 call to Mail::SpamAssassin::Util::untaint_var
1642 eval 'require '.$type.';
1643 $factory = '.$type.'->new();
1644 1;'
16451162µs or do {
# spent 432µs executing statements in string eval
1646 my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
1647 warn "TxRep: $eval_stat\n";
1648 undef $factory;
1649 };
1650112µs111µs $self->{main}->set_persistent_address_list_factory($factory) if $factory;
1651 } else {warn "TxRep: illegal factory setting\n";}
1652 }
1653321618.6ms if (defined $factory) {
165432169.06s321694562s $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main});
# spent 94562s making 3216 calls to Mail::SpamAssassin::DBBasedAddrList::new_checker, avg 29.4s/call
1655
1656321624.7ms32158.90s if ($self->{conf}->{txrep_user2global_ratio} && !defined $self->{global_storage}) {
# spent 8.90s making 3215 calls to DB_File::DESTROY, avg 2.77ms/call
1657 # hack to handle the BDB and SQL factory types of the storage object
1658 # TODO: add a method to the handler class instead
1659 my ($storage_type, $is_global);
1660
1661 if (ref($factory) =~ /SQLBasedAddrList/) {
1662 $is_global = defined $self->{conf}->{user_awl_sql_override_username};
1663 $storage_type = 'SQL';
1664 if ($is_global && $self->{conf}->{user_awl_sql_override_username} eq $self->{main}->{username}) {
1665 # skip double storage if current user same as the global override
1666 $self->{user_storage} = $self->{global_storage} = $self->{default_storage};
1667 }
1668 } elsif (ref($factory) =~ /DBBasedAddrList/) {
1669 $is_global = $self->{conf}->{auto_whitelist_path} !~ /__userstate__/;
1670 $storage_type = 'DB';
1671 }
1672 if (!defined $self->{global_storage}) {
1673 my $sql_override_orig = $self->{conf}->{user_awl_sql_override_username};
1674 my $awl_path_orig = $self->{conf}->{auto_whitelist_path};
1675 if ($is_global) {
1676 $self->{conf}->{user_awl_sql_override_username} = '';
1677 $self->{conf}->{auto_whitelist_path} = '__userstate__/tx-reputation';
1678 $self->{global_storage} = $self->{default_storage};
1679 $self->{user_storage} = $factory->new_checker($self->{main});
1680 } else {
1681 $self->{conf}->{user_awl_sql_override_username} = 'GLOBAL';
1682 $self->{conf}->{auto_whitelist_path} = '__local_state_dir__/tx-reputation';
1683 $self->{global_storage} = $factory->new_checker($self->{main});
1684 $self->{user_storage} = $self->{default_storage};
1685 }
1686 $self->{conf}->{user_awl_sql_override_username} = $sql_override_orig;
1687 $self->{conf}->{auto_whitelist_path} = $awl_path_orig;
1688
1689 # Another ugly hack to find out whether the user differs from
1690 # the global one. We need to add a method to the factory handlers
1691 if ($storage_type eq 'DB' &&
1692 $self->{user_storage}->{locked_file} eq $self->{global_storage}->{locked_file}) {
1693 if ($is_global)
1694 {$self->{global_storage}->finish();}
1695 else {$self->{user_storage}->finish();}
1696 $self->{user_storage} = $self->{global_storage} = $self->{default_storage};
1697 }
1698 }
1699 }
1700 } else {
1701 $self->{user_storage} = $self->{global_storage} = $self->{checker} = $self->{default_storage} = undef;
1702 warn("TxRep: could not open storages, quitting!\n");
1703 return 0;
1704 }
1705321657.2ms return 1;
1706}
1707
1708
1709###########################################################################
1710
# spent 1.06ms (45µs+1.02) within Mail::SpamAssassin::Plugin::TxRep::finish which was called: # once (45µs+1.02ms) by Mail::SpamAssassin::Plugin::TxRep::learner_close at line 1889
sub finish {
1711###########################################################################
171212µs my $self = shift;
1713
171413µs return unless (defined $self->{checker}); # no factory defined; we can't check
1715
1716110µs if ($self->{conf}->{txrep_user2global_ratio} && defined $self->{user_storage} && ($self->{user_storage} != $self->{global_storage})) {
1717 $self->{user_storage}->finish();
1718 $self->{global_storage}->finish();
1719 $self->{user_storage} = undef;
1720 $self->{global_storage} = undef;
1721 } elsif (defined $self->{default_storage}) {
1722111µs11.02ms $self->{default_storage}->finish();
# spent 1.02ms making 1 call to Mail::SpamAssassin::DBBasedAddrList::finish
172314µs $self->{default_storage} = $self->{checker} = undef;
1724 }
1725119µs $self->{factory} = undef;
1726}
1727
1728
1729###########################################################################
1730
# spent 73.1ms (55.2+17.9) within Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key which was called 936 times, avg 78µs/call: # 936 times (55.2ms+17.9ms) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1785, avg 78µs/call
sub ip_to_awl_key {
1731###########################################################################
17329364.00ms my ($self, $origip) = @_;
1733
17349361.76ms my $result;
17359366.34ms local $1;
173693632.5ms93617.9ms if (!defined $origip) {
# spent 17.9ms making 936 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 19µs/call
1737 # could not find an IP address to use
1738 } elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) {
17399363.03ms my $mask_len = $self->{ipv4_mask_len};
17409362.64ms $mask_len = 16 if !defined $mask_len;
1741 # handle the default and easy cases manually
17429364.95ms if ($mask_len == 32) {$result = $origip;}
17439364.04ms elsif ($mask_len == 16) {$result = $1;}
1744 else {
1745 my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
1746 if (!defined $origip_obj) { # invalid IPv4 address
1747 dbg("TxRep: bad IPv4 address $origip");
1748 } else {
1749 $result = $origip_obj->network->addr;
1750 $result =~s/(\.0){1,3}\z//; # truncate zero tail
1751 }
1752 }
1753 } elsif ($origip =~ /:/ && # triage
1754 $origip =~
1755 /^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
1756 # looks like an IPv6 address
1757 my $mask_len = $self->{ipv6_mask_len};
1758 $mask_len = 48 if !defined $mask_len;
1759 my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
1760 if (!defined $origip_obj) { # invalid IPv6 address
1761 dbg("TxRep: bad IPv6 address $origip");
1762 } else {
1763 $result = $origip_obj->network->full6; # string in a canonical form
1764 $result =~ s/(:0000){1,7}\z/::/; # compress zero tail
1765 }
1766 } else {
1767 dbg("TxRep: bad IP address $origip");
1768 }
17699364.82ms if (defined $result && length($result) > 39) { # just in case, keep under
1770 $result = substr($result,0,39); # the awl.ip field size
1771 }
1772# if (defined $result) {dbg("TxRep: IP masking %s -> %s", $origip || '?', $result || '?');}
177393611.2ms return $result;
1774}
1775
1776
1777###########################################################################
1778
# spent 252ms (147+105) within Mail::SpamAssassin::Plugin::TxRep::pack_addr which was called 3218 times, avg 78µs/call: # 3216 times (147ms+105ms) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1538, avg 78µs/call # 2 times (72µs+12µs) by Mail::SpamAssassin::Plugin::TxRep::modify_reputation at line 1607, avg 42µs/call
sub pack_addr {
1779###########################################################################
1780321815.4ms my ($self, $addr, $origip) = @_;
1781
1782321813.7ms $addr = lc $addr;
1783321872.0ms321831.6ms $addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia
# spent 31.6ms making 3218 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 10µs/call
1784
1785415418.5ms93673.1ms if ( defined $origip) {$origip = $self->ip_to_awl_key($origip);}
# spent 73.1ms making 936 calls to Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key, avg 78µs/call
1786550018.1ms if (!defined $origip) {$origip = 'none';}
1787321876.8ms return $addr . "|ip=" . $origip;
1788}
1789
- -
1792# -------------------------------------------------------------------------
1793=head1 LEARNING SPAM / HAM
1794
1795When SpamAssassin is told to learn (or relearn) a given message as spam or
1796ham, all reputations relevant to the message (email, email_ip, domain, ip, helo)
1797in both global and user storages will be updated using the C<txrep_learn_penalty>
1798or the C<txrep_learn_bonus> value, respectively. The new reputation of a given
1799sender property (email, domain,...) will be the result of the corresponding one
1800of the following formulas:
1801
1802 new_reputation = old_reputation + learn_penalty
1803 new_reputation = old_reputation - learn_bonus
1804
1805Unless L</C<txrep_track_messages>> is enabled, the TxRep plugin does not track
1806each message individually, hence it does not detect when you learn the same
1807message repeatedly. It will add/subtract the penalty/bonus score each time the message is fed to the spam learner.
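
The two formulas can be tried out with a small standalone sketch. The helper
C<apply_learning> and the numeric values below are hypothetical and only serve
the illustration; they are not part of the plugin, and the real values come
from C<txrep_learn_penalty> and C<txrep_learn_bonus> in the configuration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep): apply one learning step to a
# stored reputation, following the two formulas above.
sub apply_learning {
    my ($old_reputation, $is_spam, $learn_penalty, $learn_bonus) = @_;
    return $is_spam
        ? $old_reputation + $learn_penalty   # message learned as spam
        : $old_reputation - $learn_bonus;    # message learned as ham
}

# Illustrative values only (assumption, not read from any configuration).
my ($penalty, $bonus) = (20, 20);

my $reputation = 0;

# Without per-message tracking, feeding the same spam message three times
# shifts the reputation three times.
$reputation = apply_learning($reputation, 1, $penalty, $bonus) for 1 .. 3;
print "after learning as spam 3 times: $reputation\n";

# Learning the message once as ham takes back one bonus step.
$reputation = apply_learning($reputation, 0, $penalty, $bonus);
print "after learning as ham once: $reputation\n";
```

With these illustrative values, the reputation moves by one full penalty per
spam learning and by one full bonus per ham learning.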
1808
1809=cut
1810######################################################### plugin hook #####
1811
# spent 19µs within Mail::SpamAssassin::Plugin::TxRep::learner_new which was called: # once (19µs+0s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_new {
1812###########################################################################
181312µs my ($self) = @_;
1814
181519µs $self->{txKeepStoreTied} = 1;
1816116µs return $self;
1817}
1818
1819
1820######################################################### plugin hook #####
1821sub autolearn {
1822###########################################################################
1823 my ($self, $params) = @_;
1824
1825 $self->{last_pms} = $params->{permsgstatus};
1826 return $self->{autolearn} = 1;
1827}
1828
1829
1830######################################################### plugin hook #####
1831
# spent 96387s (34.1ms+96387) within Mail::SpamAssassin::Plugin::TxRep::learn_message which was called 234 times, avg 412s/call: # 234 times (34.1ms+96387s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm, avg 412s/call
sub learn_message {
1832###########################################################################
1833234555µs my ($self, $params) = @_;
1834234670µs return 0 unless (defined $params->{isspam});
1835
18362341.66ms2341.59ms dbg("TxRep: learning a message");
# spent 1.59ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
18372343.40ms23476.7ms my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
# spent 76.7ms making 234 calls to Mail::SpamAssassin::PerMsgStatus::new, avg 328µs/call
18382341.55ms if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) {
18392342.81ms23446.4s $pms->extract_message_metadata();
# spent 46.4s making 234 calls to Mail::SpamAssassin::PerMsgStatus::extract_message_metadata, avg 198ms/call
1840 }
1841
18422341.38ms if ($params->{isspam})
18432341.51ms {$self->{learning} = $self->{conf}->{txrep_learn_penalty};}
1844 else {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};}
1845
18462342.94ms23496340s my $ret = !$self->{learning} || $self->check_senders_reputation($pms);
# spent 96340s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 412s/call
1847234720µs $self->{learning} = undef;
184823413.8ms793.25ms return $ret;
# spent 3.25ms making 79 calls to Mail::SpamAssassin::PerMsgStatus::DESTROY, avg 41µs/call
1849}
1850
1851
1852######################################################### plugin hook #####
1853
# spent 49211s (12.5ms+49211) within Mail::SpamAssassin::Plugin::TxRep::forget_message which was called 234 times, avg 210s/call: # 234 times (12.5ms+49211s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1291, avg 210s/call
sub forget_message {
1854###########################################################################
1855234984µs my ($self, $params) = @_;
1856234876µs return 0 unless ($self->{conf}->{use_txrep});
1857234785µs my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
1858
18592341.57ms2341.48ms dbg("TxRep: forgetting a message");
# spent 1.48ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call
1860234639µs $self->{forgetting} = 1;
18612342.47ms2340s my $ret = $self->check_senders_reputation($pms);
# spent 49211s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 210s/call, recursion: max depth 1, sum of overlapping time 49211s
18622341.04ms $self->{forgetting} = undef;
18632342.61ms return $ret;
1864}
1865
1866
1867######################################################### plugin hook #####
1868sub learner_expire_old_training {
1869###########################################################################
1870 my ($self, $params) = @_;
1871 return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days});
1872
1873 dbg("TxRep: expiry not implemented yet");
1874# dbg("TxRep: expiry starting");
1875# my $timer = $self->{main}->time_method("expire_bayes");
1876# $self->{store}->expire_old_tokens($params);
1877# dbg("TxRep: expiry completed");
1878}
1879
1880
1881######################################################### plugin hook #####
1882
# spent 1.12ms (49µs+1.07) within Mail::SpamAssassin::Plugin::TxRep::learner_close which was called: # once (49µs+1.07ms) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_close {
1883###########################################################################
188412µs my ($self, $params) = @_;
188513µs my $quiet = $params->{quiet};
188614µs return 0 unless ($self->{conf}->{use_txrep});
1887
188813µs $self->{txKeepStoreTied} = undef;
1889110µs11.06ms $self->finish();
# spent 1.06ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::finish
1890118µs18µs dbg("TxRep: learner_close");
# spent 8µs making 1 call to Mail::SpamAssassin::Logger::dbg
1891}
1892
1893
1894# -------------------------------------------------------------------------
1895=head1 OPTIMIZING TXREP
1896
1897TxRep can be optimized for speed and simplicity, or for the precision in
1898assigning the reputation scores.
1899
1900First of all, TxRep can be quickly disabled and re-enabled through the option
1901L</C<use_txrep>>. It can be done globally, or individually in each respective
1902C<user_prefs>. Disabling TxRep will not destroy the database, so it can be
1903re-enabled at any later time.
1904
1905On many systems, SQL-based storage may perform faster than the default
1906Berkeley DB storage, so you should consider setting it up. See the section
1907L</SQL-BASED STORAGE> for instructions.
1908
1909Then there are multiple settings that can reduce the number of records stored
1910in the database, hence reducing the size of the storage, and also the processing
1911time:
1912
19131. Setting L</C<txrep_user2global_ratio>> to zero will disable the dual storage,
1914thus halving the disk space requirements and the processing time of this plugin.
1915
19162. You can disable all but one of the L</REPUTATION WEIGHTS>. The EMAIL_IP is
1917the most specific option, so it is the most likely choice in such a case, but you
1918could base the reputation system on any of the remaining scores. Each of the
1919enabled reputations adds a new entry to the database for each new identifier.
1920So while, for example, the number of recorded and scored domains may be big, the
1921number of stored IP addresses will probably be higher, and would require more
1922space in the storage.
1923
19243. Disabling L</C<txrep_track_messages>> avoids storing a separate entry
1925for every scanned message, hence also reducing the disk space requirements and
1926the processing time.
1927
19284. Disabling the option L</C<txrep_autolearn>> will save processing time
1929for messages that trigger the auto-learning process.
1930
19315. Disabling L</C<txrep_whitelist_out>> will reduce the processing time for
1932outbound connections.
1933
19346. Keeping the option L</C<auto_whitelist_distinguish_signed>> enabled may help
1935slightly reduce the size of the database, because for signed messages the
1936originating IP address is ignored, hence no additional database entries are
1937needed for each separate IP address (or masked block of IP addresses).
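
The option-based settings from the list above could be collected into a
configuration fragment such as the following. This is only a sketch under
stated assumptions: it assumes that a value of 0 disables each of these options
and that 1 enables a boolean one, it leaves the reputation weights from item 2
untouched, and whether it belongs in the site-wide configuration or in a
C<user_prefs> file depends on your installation.

```
# Assumed values: 0 = disabled, 1 = enabled (check each option's own section)
use_txrep                          1
txrep_user2global_ratio            0
txrep_track_messages               0
txrep_autolearn                    0
txrep_whitelist_out                0
auto_whitelist_distinguish_signed  1
```

Each of these settings trades some precision of the reputation scores for
speed or storage size, so it may be worth changing them one at a time.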
1938
1939
1940Since TxRep reuses the storage architecture of the former AWL plugin, the
1941same instructions for initializing the SQL storage also apply to TxRep.
1942Although the old AWL table can be reused for TxRep, by default TxRep expects
1943the SQL table to be named "txrep".
1944
1945To install a new SQL table for TxRep, run the appropriate SQL file for your
1946system under the /sql directory.
1947
1948If you get a syntax error with an older version of MySQL, use TYPE=MyISAM
1949instead of ENGINE=MyISAM at the end of the command. You can also use other
1950types of ENGINE (depending on what is available on your system). For example, the
1951MEMORY engine stores the entire table in the server memory, achieving
1952performance similar to Redis. You would need to take care of replicating
1953the RAM table to disk through a cronjob, to avoid loss of data at reboot.
1954The InnoDB engine is used by default, offering high scalability (database
1955size and concurrency of accesses). In conjunction with a high value of
1956innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+) it can also
1957offer performance comparable to Redis.
1958
1959=cut
1960
1961111µs1;
 
# spent 78.1ms within Mail::SpamAssassin::Plugin::TxRep::CORE:match which was called 14043 times, avg 6µs/call: # 6432 times (37.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1543, avg 6µs/call # 6206 times (20.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1266, avg 3µs/call # 936 times (17.9ms+0s) by Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key at line 1736, avg 19µs/call # 468 times (2.49ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 5µs/call # once (5µs+0s) by Mail::SpamAssassin::Plugin::TxRep::open_storages at line 1640
sub Mail::SpamAssassin::Plugin::TxRep::CORE:match; # opcode
# spent 87.6ms within Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp which was called 6206 times, avg 14µs/call: # 6206 times (87.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1266, avg 14µs/call
sub Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp; # opcode
# spent 34.7ms within Mail::SpamAssassin::Plugin::TxRep::CORE:subst which was called 3686 times, avg 9µs/call: # 3218 times (31.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1783, avg 10µs/call # 468 times (3.10ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1257, avg 7µs/call
sub Mail::SpamAssassin::Plugin::TxRep::CORE:subst; # opcode