NYTProf Performance Profile   « line view »
For /usr/local/bin/sa-learn
  Run on Tue Nov 7 05:38:10 2017
Reported on Tue Nov 7 06:16:03 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm
Statements: Executed 239578 statements in 2.24s
Subroutines (all in package Mail::SpamAssassin::Plugin::TxRep)
Calls    P   F   Exclusive Time   Inclusive Time   Subroutine
 3114    1   1   877ms            1466s            check_reputation
  453    2   1   393ms            1476s            check_senders_reputation (recurses: max depth 1, inclusive time 1469s)
 3114    1   1   256ms            1.26s            get_sender
 3114    7   1   141ms            1467s            check_reputations
 3116    2   1   119ms            210ms            pack_addr
12245   12   1   114ms            114ms            count
 1615    2   1   94.6ms           292ms            add_score
 6012    1   1   79.3ms           79.3ms           CORE:regcomp (opcode)
13600    5   1   67.5ms           67.5ms           CORE:match (opcode)
  906    1   1   53.0ms           67.4ms           ip_to_awl_key
 1281    1   1   42.4ms           187ms            remove_score
 4658    5   1   39.3ms           39.3ms           total
  235    1   1   30.9ms           1520s            learn_message
 3114    1   1   28.5ms           741ms            open_storages
 3569    2   1   26.7ms           26.7ms           CORE:subst (opcode)
    1    1   1   19.7ms           27.5ms           BEGIN@209
  218    1   1   12.3ms           1469s            forget_message
    2    1   1   255µs            6.22s            modify_reputation
    1    1   1   236µs            1.06ms           set_config
    1    1   1   96µs             1.23ms           new
    1    1   1   94µs             5.48s            finish
    1    1   1   59µs             69µs             BEGIN@202
    1    1   1   58µs             539µs            BEGIN@207
    1    1   1   49µs             5.48s            learner_close
    1    1   1   35µs             247µs            BEGIN@211
    1    1   1   29µs             110µs            BEGIN@213
    1    1   1   27µs             158µs            BEGIN@210
    1    1   1   27µs             27µs             __ANON__[:495]
    1    1   1   22µs             84µs             BEGIN@205
    1    1   1   20µs             50µs             BEGIN@203
    1    1   1   16µs             16µs             BEGIN@208
    1    1   1   16µs             16µs             learner_new
    0    0   0   0s               0s               __ANON__[:306]
    0    0   0   0s               0s               __ANON__[:350]
    0    0   0   0s               0s               __ANON__[:375]
    0    0   0   0s               0s               __ANON__[:398]
    0    0   0   0s               0s               __ANON__[:421]
    0    0   0   0s               0s               __ANON__[:446]
    0    0   0   0s               0s               __ANON__[:527]
    0    0   0   0s               0s               __ANON__[:560]
    0    0   0   0s               0s               __ANON__[:642]
    0    0   0   0s               0s               __ANON__[:763]
    0    0   0   0s               0s               __ANON__[:797]
    0    0   0   0s               0s               __ANON__[:836]
    0    0   0   0s               0s               __ANON__[:862]
    0    0   0   0s               0s               __ANON__[:893]
    0    0   0   0s               0s               __ANON__[:945]
    0    0   0   0s               0s               __ANON__[:998]
    0    0   0   0s               0s               _fail_exit
    0    0   0   0s               0s               _fn_envelope
    0    0   0   0s               0s               _message
    0    0   0   0s               0s               autolearn
    0    0   0   0s               0s               blacklist_address
    0    0   0   0s               0s               learner_expire_old_training
    0    0   0   0s               0s               remove_address
    0    0   0   0s               0s               whitelist_address
Line   Statements   Time on line   Calls   Time in subs   Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18
19=head1 NAME
20
21Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records
22
23
24=head1 SYNOPSIS
25
26The TxRep (Reputation) plugin is designed as an improved replacement of the AWL
27(Auto-Whitelist) plugin. It adjusts the final message spam score by looking up
28and taking into consideration the reputation of the sender.
29
30To try TxRep out, you B<have to> first disable the AWL plugin (if enabled), and
31back up its database. AWL is loaded in v310.pre and can be disabled by
32commenting out the loadplugin line:
33
34 # loadplugin Mail::SpamAssassin::Plugin::AWL
35
36When AWL is not disabled, TxRep will refuse to run.
37
38TxRep should be enabled by uncommenting the following line in v341.pre:
39
40 loadplugin Mail::SpamAssassin::Plugin::TxRep
41
42Use the supplied 60_txreputation.cf file or add these lines to a .cf file:
43
44 header TXREP eval:check_senders_reputation()
45 describe TXREP Score normalizing based on sender's reputation
46 tflags TXREP userconf noautolearn
47 priority TXREP 1000
48
49
50=head1 DESCRIPTION
51
52This plugin is intended to replace the former AWL - AutoWhiteList. Although the
53concept and the scope differ, the purpose remains the same - the normalizing of spam
54score results based on previous sender's history. The name was intentionally changed
55from "whitelist" to "reputation" to avoid any confusion, since the result score can
56be adjusted in both directions.
57
58The TxRep plugin keeps track of the average SpamAssassin score for senders.
59Senders are tracked using multiple identifiers, or their combinations: the From:
60email address, the originating IP and/or an originating block of IPs, sender's domain
61name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce
62the variability in scoring from message to message, and modifies the final score by
63pushing the result towards the historical average. This improves the accuracy of
64filtering for most email.
65
66In comparison with the original AWL plugin, several conceptual changes were implemented
67in TxRep:
68
691. B<Scoring> - although AWL tracks the number of messages received from each
70sender, it does not take that count into account in any way when calculating the
71corrective score for a new message. For example, if a sender previously sent a single
72ham message with a score of -5, and then sends a second one with a score of +10,
73AWL will issue a corrective score pulling the result towards -5. With the default
74C<auto_whitelist_factor> of 0.5, the resulting score would be only 2.5. And it would be
75exactly the same even if the sender had previously sent 1,000 messages with an average of
76-5. TxRep tries to take maximal advantage of the collected data, and adjusts the
77final score not only by the mean reputation score stored in the database, but also
78by the number of messages already seen from the sender. You can see the exact
79formula in the section L</C<txrep_factor>>.
80
812. B<Learning> - AWL ignores any spam/ham learning. In fact it acts against it, which
82often leads to a frustrating situation where a user repeatedly tags all messages from a
83given sender as spam (or ham), but for each new message from that sender, AWL
84adjusts the score back towards a historical average which does B<not> include
85the learned scores. TxRep changes this: every spam/ham learning is
86recorded in the reputation database, and hence taken into consideration for future email
87from the respective sender. See the section L</"LEARNING SPAM / HAM"> for more details.
88
893. B<Auto-Learning> - in certain situations SpamAssassin may declare a message an
90obvious spam resp. ham, and launch the auto-learning process, so that the message can be
91re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin
92will readjust the stored reputation by the value defined by L</C<txrep_learn_penalty>>
93resp. L</C<txrep_learn_bonus>>. Auto-learning score thresholds may be tuned, or the
94auto-learning completely disabled, through the setting L</C<txrep_autolearn>>.
95
964. B<Relearning> - messages that were wrongly learned or auto-learned, can be relearned.
97Old reputations are removed from the database, and new ones added instead of them. The
98relearning works better when message tracking is enabled through the
99L</C<txrep_track_messages>> option. Without it, the relearned score is simply added to
100the reputation, without removing the old ones.
101
1025. B<Aging> - with AWL, every historical record of a given sender has the same weight. It
103means that changes in a sender's behavior, or modified SA rules, may take a long time to show, or
104be virtually negated by the AWL normalization, especially for senders with a high count
105of past messages and a low recent frequency. It also turns out to be particularly
106counterproductive when the administrator detects new patterns in certain messages, and
107applies new rules to better tag such messages as spam or ham. AWL will practically
108eliminate the effect of the new rules, by adjusting the score back towards the (wrong)
109historical average. Only setting the C<auto_whitelist_factor> lower would help, but at
110the same time it would also reduce the overall impact of AWL, and cast doubt on its
111purpose. TxRep, besides the L</C<txrep_factor>> (replacement of the C<auto_whitelist_factor>),
112also introduces the L</C<txrep_dilution_factor>> to help cope with this issue by
113progressively reducing the impact of past records. More details can be found in the
114description of the factor below.
115
1166. B<Blacklisting and Whitelisting> - when a whitelisting or blacklisting was requested
117through SpamAssassin's API, AWL adjusts the historical total score of the plain email
118address without IP (and deletes records bound to an IP), but since during the reception
119new records with IP will be added, the blacklisted entry would cease acting during
120scanning. TxRep always uses the record of the plain email address without IP together
121with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight
122factor for the EMAIL reputation is set to zero). AWL uses the score of 100 (resp. -100)
123for the blacklisting (resp. whitelisting) purposes. TxRep increases the value
124proportionally to the weight factor of the EMAIL reputation. This is explained in detail
125in the section L</BLACKLISTING / WHITELISTING>. TxRep can blacklist or whitelist also
126IP addresses, domain names, and dotless HELO names.
127
1287. B<Sender Identification> - AWL identifies a sender on the basis of the email address
129used and the originating IP address (more precisely, the part of it defined by the mask setting).
130The main purpose of this measure is to avoid assigning false good scores to spammers who
131spoof known email addresses. The disadvantage shows with senders who send from frequently
132changing locations, or who connect through dynamic IP addresses that are not
133within the block defined by the mask setting. Their score is difficult or sometimes
134impossible to track. Another disadvantage shows, for example, with a spammer persistently
135sending spam from the same IP address, just under different email addresses. AWL will not
136find the previous scores unless the same email address is reused. TxRep uses several
137identifiers, and creates separate database entries for each of them. It tracks not only
138the email/IP address combination like AWL, but also the standalone email address (regardless
139of the originating IP), the standalone IP (regardless of the email address used), the domain
140name of the email address, the DKIM signature, and the HELO name of the connecting host. The
141influence of each individual identifier may be tuned with the help of the weight factors
142described in the section L</REPUTATION WEIGHTS>.
143
1448. B<Message Tracking> - TxRep (optionally) keeps track of already scanned and/or learned
145message IDs. This avoids strengthening the reputation score simply by
146rescanning or relearning the same message multiple times. At the same time it also allows
147the proper relearning of messages that were once wrongly learned, or relearning them after the
148learn penalty or bonus was changed. See the option L</C<txrep_track_messages>>.
149
1509. B<User and Global Storages> - it is usually recommended to use the per-user setup
151of SpamAssassin, because each user may have quite different requirements, and may receive
152quite different sorts of email. Especially when using the Bayesian and AWL plugins,
153the efficiency is much better when SpamAssassin is trained on spam and ham separately
154for each user. However, the disadvantage is that senders and emails already learned
155many times by different users need to be relearned without any recognized history
156whenever they reach another user. TxRep combines the advantages of both systems. It can
157use dual storages: the global common storage, where all email processed by SpamAssassin
158is recorded, and a local storage separate for each user, with reputation data from that user's
159email only. See more details at the setting L</C<txrep_user2global_ratio>>.
160
16110. B<Outbound Whitelisting> - when a local user sends messages to an email address, we
162assume that the user also needs to see any reply, hence the recipient's address should
163be whitelisted. When SpamAssassin also scans outgoing email, local
164users send their email through the SMTP server where SA is installed, and internal
165networks are defined, TxRep will improve the reputation of all 'To:' and 'CC' addresses
166from messages originating in the internal networks. Details can be found at the setting
167L</C<txrep_whitelist_out>>.
168
169The two plugins (AWL and TxRep) cannot coexist. AWL must be disabled to allow
170TxRep to run. TxRep reuses the database handling of the original AWL module, and some
171of its parameters bound to the database handler modules. By default, TxRep creates its own
172database, but the original auto-whitelist can be reused as a starting point. The AWL
173database can be renamed to the name defined in TxRep settings, and TxRep will start
174using it. The original auto-whitelist database has to be backed up, to allow switching
175back to the original state.
176
177The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and
178spamassassin/AutoWhitelist.pm. Another two AWL files, spamassassin/DBBasedAddrList.pm
179and spamassassin/SQLBasedAddrList.pm are still needed.
180
181
182=head1 TEMPLATE TAGS
183
184This plugin module adds the following C<tags> that can be used as
185placeholders in certain options. See L<Mail::SpamAssassin::Conf>
186for more information on TEMPLATE TAGS.
187
188 _TXREP_XXX_Y_ TXREP modifier
189 _TXREP_XXX_Y_MEAN_ Mean score on which TXREP modification is based
190 _TXREP_XXX_Y_COUNT_ Number of messages on which TXREP modification is based
191 _TXREP_XXX_Y_PRESCORE_ Score before TXREP
192 _TXREP_XXX_Y_UNKNOW_ New sender (not found in the TXREP list)
193
194The XXX part of the tag takes the form of one of the following IDs, depending
195on the reputation checked: EMAIL, EMAIL_IP, IP, DOMAIN, or HELO. The _Y appendix
196ID is used only in the case of dual storage, and takes the form of either _U (for
197user storage reputations), or _G (for global storage reputations).
198
199=cut # ....................................................................
200package Mail::SpamAssassin::Plugin::TxRep;
201
202272µs278µs
# spent 69µs (59+10) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@202 which was called: # once (59µs+10µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 202
use strict;
# spent 69µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@202 # spent 10µs making 1 call to strict::import
203274µs280µs
# spent 50µs (20+30) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 which was called: # once (20µs+30µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 203
use warnings;
# spent 50µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 # spent 30µs making 1 call to warnings::import
204# use bytes;
205272µs2146µs
# spent 84µs (22+62) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 which was called: # once (22µs+62µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 205
use re 'taint';
# spent 84µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 # spent 62µs making 1 call to re::import
206
2073172µs31.02ms
# spent 539µs (58+481) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 which was called: # once (58µs+481µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 207
use NetAddr::IP 4.000; # qw(:upper);
# spent 539µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 # spent 426µs making 1 call to NetAddr::IP::import # spent 55µs making 1 call to version::_VERSION
208256µs116µs
# spent 16µs within Mail::SpamAssassin::Plugin::TxRep::BEGIN@208 which was called: # once (16µs+0s) by Mail::SpamAssassin::PluginHandler::load_plugin at line 208
use Mail::SpamAssassin::Plugin;
# spent 16µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@208
2092379µs127.5ms
# spent 27.5ms (19.7+7.82) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 which was called: # once (19.7ms+7.82ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 209
use Mail::SpamAssassin::Plugin::Bayes;
# spent 27.5ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@209
210274µs2289µs
# spent 158µs (27+131) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@210 which was called: # once (27µs+131µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 210
use Mail::SpamAssassin::Util qw(untaint_var);
# spent 158µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@210 # spent 131µs making 1 call to Exporter::import
211280µs2459µs
# spent 247µs (35+212) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@211 which was called: # once (35µs+212µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 211
use Mail::SpamAssassin::Logger;
# spent 247µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@211 # spent 212µs making 1 call to Exporter::import
212
213212.6ms2190µs
# spent 110µs (29+80) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@213 which was called: # once (29µs+80µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 213
use vars qw(@ISA);
# spent 110µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@213 # spent 80µs making 1 call to vars::import
214116µs@ISA = qw(Mail::SpamAssassin::Plugin);
215
216
217###########################################################################
218
# spent 1.23ms (96µs+1.13) within Mail::SpamAssassin::Plugin::TxRep::new which was called: # once (96µs+1.13ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 1 of (eval 42)[Mail/SpamAssassin/PluginHandler.pm:129]
sub new { # constructor: register the eval rule
219###########################################################################
22013µs my ($class, $main) = @_;
221
22212µs $class = ref($class) || $class;
223113µs126µs my $self = $class->SUPER::new($main);
# spent 26µs making 1 call to Mail::SpamAssassin::Plugin::new
22412µs bless($self, $class);
225
22618µs $self->{main} = $main;
22713µs $self->{conf} = $main->{conf};
22813µs $self->{factor} = $main->{conf}->{txrep_factor};
22913µs $self->{ipv4_mask_len} = $main->{conf}->{txrep_ipv4_mask_len};
23013µs $self->{ipv6_mask_len} = $main->{conf}->{txrep_ipv6_mask_len};
231111µs132µs $self->register_eval_rule("check_senders_reputation");
# spent 32µs making 1 call to Mail::SpamAssassin::Plugin::register_eval_rule
23219µs11.06ms $self->set_config($main->{conf});
# spent 1.06ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::set_config
233
234 # only the default conf loaded here, do nothing here requiring
235 # the runtime settings
23618µs111µs dbg("TxRep: new object created");
# spent 11µs making 1 call to Mail::SpamAssassin::Logger::dbg
23719µs return $self;
238}
239
240
241###########################################################################
242
# spent 1.06ms (236µs+826µs) within Mail::SpamAssassin::Plugin::TxRep::set_config which was called: # once (236µs+826µs) by Mail::SpamAssassin::Plugin::TxRep::new at line 232
sub set_config {
243###########################################################################
24412µs my($self, $conf) = @_;
24516µs my @cmds;
246
247# -------------------------------------------------------------------------
248=head1 USER PREFERENCES
249
250The following options can be used in both site-wide (C<local.cf>) and
251user-specific (C<user_prefs>) configuration files to customize how
252SpamAssassin handles incoming email messages.
253
254=over 4
255
256=item B<use_txrep>
257
258 0 | 1 (default: 0)
259
260Whether to use TxRep reputation system. TxRep tracks the long-term average
261score for each sender and then shifts the score of new messages toward that
262long-term average. This can increase or decrease the score for messages,
263depending on the long-term behavior of the particular correspondent.
264
265Note that certain tests are ignored when determining the final message score:
266
267 - rules with tflags set to 'noautolearn'
268
269=cut # ...................................................................
27017µs push (@cmds, {
271 setting => 'use_txrep',
272 default => 0,
273 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
274 });
275
276
277# -------------------------------------------------------------------------
278=item B<txrep_factor>
279
280 range [0..1] (default: 0.5)
281
282How much towards the long-term mean for the sender to regress a message.
283Basically, the algorithm is to track the long-term total score and the count
284of messages for the sender (C<total> and C<count>), and then once we have
285otherwise fully calculated the score for this message (C<score>), we calculate
286the final score for the message as:
287
288 finalscore = score + factor * ((total + score)/(count + 1) - score)
289
290So if C<factor> = 0.5, then we'll move to half way between the calculated
291score and the new mean value. If C<factor> = 0.3, then we'll move about 1/3
292of the way from the score toward the mean. C<factor> = 1 means use the
293long-term mean including also the new unadjusted score; C<factor> = 0 means
294just use the calculated score, thus disabling the score averaging, though still
295recording the reputation to the database.
296
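As a rough illustration of the paragraph above, the following Perl sketch computes the adjusted score under the assumption that the correction is applied relative to the unadjusted score, that is, the score moves a fraction C<factor> of the way toward the new mean. The helper name C<txrep_final_score> and the sample numbers are hypothetical and are not taken from TxRep.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep.pm): move the message score a
# fraction $factor of the way toward the sender's mean, where the mean
# already includes the new, unadjusted score.
sub txrep_final_score {
    my ($score, $total, $count, $factor) = @_;
    my $new_mean = ($total + $score) / ($count + 1);
    return $score + $factor * ($new_mean - $score);
}

# A new message scoring +10 from a sender whose history averages -5.
# The longer the history, the closer the new mean stays to -5, so the
# correction grows with the message count.
for my $count (1, 10, 1000) {
    my $total = -5 * $count;
    printf "count=%4d  final=%.2f\n",
        $count, txrep_final_score(10, $total, $count, 0.5);
}
```

With C<factor> = 0 the helper returns the score unchanged, and with C<factor> = 1 it returns the new mean, which matches the two limiting cases described above.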
297=cut # ...................................................................
298 push (@cmds, {
299 setting => 'txrep_factor',
300 default => 0.5,
301 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
302 code => sub {
303 my ($self, $key, $value, $line) = @_;
304 if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
305 $self->{txrep_factor} = $value;
306 }
307112µs });
308
309
310# -------------------------------------------------------------------------
311=item B<txrep_dilution_factor>
312
313 range [0.7..1.0] (default: 0.98)
314
315With every new email from a given sender, the historical reputation records are "diluted",
316or "watered down", by a certain fraction given by this factor. It means that the
317influence of old records will progressively diminish with every new message from
318the given sender. This is important to allow more flexible handling of changes in
319the sender's behavior, or of improvements or changes to local SA rules.
320
321Without any dilution (dilution factor set to 1), the new message score is
322simply added to the total score of the given sender in the reputation database. When
323dilution is used (factor < 1), the impact of the historical reputation average is
324reduced by the factor before calculating the new average, which in turn is then
325used to adjust the new total score to be stored in the database.
326
327 newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)
328
329In other words, the older a message is, the less impact
330its original spam score has on the new average. For example, if we set the factor
331to 0.9 (meaning dilution by 10%), the score of the new message will be counted
332at 100%, the last score of the same sender at 90%, the second last at 81%
333(0.9 * 0.9 = 0.81), and for example the 10th last message at just 35%.
334
335On stable systems, we recommend keeping the factor close to 1 (but still lower
336than 1). On systems where SA rules tuning and spam learning are still in progress,
337lower factors will help the reputation adapt more quickly to any modifications. At
338the same time, however, they will also reduce the impact of the historical
339reputation.
340
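The dilution arithmetic can be tried out with a small Perl sketch. The helper names C<diluted_total> and C<record_weight> and the sample values are hypothetical, chosen only to illustrate the formula; they do not mirror the plugin's internal code.

```perl
use strict;
use warnings;

# Hypothetical helper: new stored total after one more message, following
# the newtotal formula given in the text.
sub diluted_total {
    my ($old_total, $old_count, $new_score, $dilution) = @_;
    return ($old_count + 1) * ($new_score + $dilution * $old_total)
         / ($dilution * $old_count + 1);
}

# Hypothetical helper: relative weight of a message that is $age messages
# old (0 = the newest message).
sub record_weight {
    my ($dilution, $age) = @_;
    return $dilution ** $age;
}

# Sender with 9 stored messages totalling -45, new message scoring +10.
printf "dilution 1.00: newtotal=%.2f\n", diluted_total(-45, 9, 10, 1.0);
printf "dilution 0.90: newtotal=%.2f\n", diluted_total(-45, 9, 10, 0.9);

# Weights of the newest, last, second last and 10th last message at 0.9.
printf "age %2d: %.0f%%\n", $_, 100 * record_weight(0.9, $_) for (0, 1, 2, 10);
```

With a dilution factor of 1 the result is simply the old total plus the new score (-35 in this example); any lower factor pulls the stored total toward the newest score.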
341=cut # ...................................................................
342 push (@cmds, {
343 setting => 'txrep_dilution_factor',
344 default => 0.98,
345 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
346 code => sub {
347 my ($self, $key, $value, $line) = @_;
348 if ($value < 0.7 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
349 $self->{txrep_dilution_factor} = $value;
350 }
35118µs });
352
353
354# TODO, not implemented yet, hence no advertising until then
355# -------------------------------------------------------------------------
356#=item B<txrep_expiry_days>
357#
358# range [0..10000] (default: 365)
359#
360#The scores of messages can be removed from the total reputation, and the
361#message tracking entry removed from the database after given number of days.
362#It helps keeping the database growth under control, and it also reduces the
363#influence of old scores on the current reputation (both scoring methods, and
364#sender's behavior might have changed over time).
365#
366#=cut # ...................................................................
367 push (@cmds, {
368 setting => 'txrep_expiry_days',
369 default => 365,
370 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
371 code => sub {
372 my ($self, $key, $value, $line) = @_;
373 if ($value < 0 || $value > 10000) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
374 $self->{txrep_expiry_days} = $value;
375 }
37617µs });
377
378
379# -------------------------------------------------------------------------
380=item B<txrep_learn_penalty>
381
382 range [0..200] (default: 20)
383
384When SpamAssassin is trained on a SPAM message, the given penalty score will
385be added to the total reputation score of the sender, regardless of the real
386spam score. The more messages the sender already has in the TxRep database,
387the smaller the impact of the penalty will be.
388
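To see why the penalty matters less for senders with a long history, consider the sketch below. It assumes that a learned message is counted as one additional message in the sender's record, which the text above does not state; the helper name C<mean_shift_after_penalty> and the sample values are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: change in a sender's mean reputation when a fixed
# penalty is added to the stored total. ASSUMPTION: the learned message is
# counted as one additional message; the documentation does not say so.
sub mean_shift_after_penalty {
    my ($total, $count, $penalty) = @_;
    my $old_mean = $count ? $total / $count : 0;
    my $new_mean = ($total + $penalty) / ($count + 1);
    return $new_mean - $old_mean;
}

# Default penalty of 20 applied to senders with a neutral (zero) history.
printf "count=%4d  shift=%+.2f\n", $_, mean_shift_after_penalty(0, $_, 20)
    for (1, 10, 100);
```

Under this assumption the shift in the mean shrinks roughly in proportion to the number of stored messages.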
389=cut # ...................................................................
390 push (@cmds, {
391 setting => 'txrep_learn_penalty',
392 default => 20,
393 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
394 code => sub {
395 my ($self, $key, $value, $line) = @_;
396 if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
397 $self->{txrep_learn_penalty} = $value;
398 }
39917µs });
400
401
402# -------------------------------------------------------------------------
403=item B<txrep_learn_bonus>
404
405 range [0..200] (default: 20)
406
407When SpamAssassin is trained on a HAM message, the given bonus score will be
408deducted from the total reputation score of the sender, regardless of the real
409spam score. The more messages the sender already has in the TxRep database,
410the smaller the impact of the bonus will be.
411
412=cut # ...................................................................
413 push (@cmds, {
414 setting => 'txrep_learn_bonus',
415 default => 20,
416 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
417 code => sub {
418 my ($self, $key, $value, $line) = @_;
419 if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
420 $self->{txrep_learn_bonus} = $value;
421 }
42218µs });
423
424
425# -------------------------------------------------------------------------
426=item B<txrep_autolearn>
427
428 range [0..5] (default: 0)
429
430When SpamAssassin declares a message a clear spam resp. ham during the message
431scan, and launches the auto-learn process, sender reputation scores of given
432message will be adjusted by the value of the option L</C<txrep_learn_penalty>>,
433resp. the L</C<txrep_learn_bonus>> in the same way as during the manual learning.
434Value 0 at this option disables the auto-learn reputation adjustment - only the
435score calculated before the auto-learn will be stored to the reputation database.
436
437=cut # ...................................................................
438 push (@cmds, {
439 setting => 'txrep_autolearn',
440 default => 0,
441 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
442 code => sub {
443 my ($self, $key, $value, $line) = @_;
444 if ($value < 0 || $value > 5) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
445 $self->{txrep_autolearn} = $value;
446 }
447110µs });
448
449
450# -------------------------------------------------------------------------
451=item B<txrep_track_messages>
452
453 0 | 1 (default: 1)
454
455Whether TxRep should keep track of already scanned and/or learned messages.
456When enabled, an additional record in the reputation database will be created
457to avoid false score adjustments due to repeated scanning of the same message,
458and to allow proper relearning of messages that were either previously wrongly
459learned, or need to be relearned after modifying the learn penalty or bonus.
460
461=cut # ...................................................................
46214µs push (@cmds, {
463 setting => 'txrep_track_messages',
464 default => 1,
465 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
466 });
467
468
469# -------------------------------------------------------------------------
470=item B<txrep_whitelist_out>
471
472 range [0..200] (default: 10)
473
474When the value of this setting is greater than zero, recipients of messages sent from
475within the internal networks will be whitelisted by improving their total reputation
476score by the number of points defined by this setting. Since the IP address and other
477sender identifiers are not known when sending the email, only the reputation of the
478standalone email address is whitelisted. The domain name is intentionally left
479unaffected as well. Outbound whitelisting can only work when SpamAssassin is set up to
480scan outgoing email too, when local users use the SMTP server for sending email, and when
481C<internal_networks> are defined in the SpamAssassin configuration. The reputation
482improves with every message sent from internal networks, so the more messages are
483sent to a recipient, the better the reputation of that email address will be.
484
485
486=cut # ...................................................................
487 push (@cmds, {
488 setting => 'txrep_whitelist_out',
489 default => 10,
490 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
491
# spent 27µs within Mail::SpamAssassin::Plugin::TxRep::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm:495] which was called: # once (27µs+0s) by Mail::SpamAssassin::Conf::Parser::parse at line 438 of Mail/SpamAssassin/Conf/Parser.pm
code => sub {
49216µs my ($self, $key, $value, $line) = @_;
49314µs if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
494115µs $self->{txrep_whitelist_out} = $value;
495 }
496111µs });
497
498
499# -------------------------------------------------------------------------
500=item B<txrep_ipv4_mask_len>
501
502 range [0..32] (default: 16)
503
504The AWL database keeps only the specified number of most-significant bits
505of an IPv4 address in its fields, so that different individual IP addresses
506within a subnet belonging to the same owner are managed under a single
507database record. As we have no information available on the allocated
508address ranges of senders, this CIDR mask length is only an approximation.
509The default is 16 bits, corresponding to a former class B. Increase the
510number if a finer granularity is desired, e.g. to 24 (class C) or 32.
511A value 0 is allowed but is not particularly useful, as it would treat the
512whole internet as a single organization. The number need not be a multiple
513of 8; any split is allowed.
514
515=cut # ...................................................................
516 push (@cmds, {
517 setting => 'txrep_ipv4_mask_len',
518 default => 16,
519 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
520 code => sub {
521 my ($self, $key, $value, $line) = @_;
522 if (!defined $value || $value eq '')
523 {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
524 elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32)
525 {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
526 $self->{txrep_ipv4_mask_len} = $value;
527 }
528110µs });
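
To make the effect of the mask length concrete, here is a minimal sketch of keeping only the most-significant bits of an IPv4 address. It is not the plugin's own key-building routine; the helper name C<mask_ipv4> and the sample address are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper (not TxRep's own routine): keep only the $len
# most-significant bits of a dotted-quad IPv4 address.
sub mask_ipv4 {
    my ($ip, $len) = @_;
    my $packed = inet_aton($ip);
    return unless defined $packed && $len >= 0 && $len <= 32;
    # Build the netmask; a length of 0 is handled separately to avoid
    # shifting by the full word width.
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    my $addr = unpack('N', $packed) & $mask;
    return inet_ntoa(pack('N', $addr));
}

print mask_ipv4('198.51.100.77', 16), "\n";   # 198.51.0.0
print mask_ipv4('198.51.100.77', 24), "\n";   # 198.51.100.0
print mask_ipv4('198.51.100.77', 20), "\n";   # 198.51.96.0
```

The last call shows a split that is not a multiple of 8: with 20 bits, only the upper four bits of the third octet survive.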
529
530
531# -------------------------------------------------------------------------
532=item B<txrep_ipv6_mask_len>
533
534 range [0..128] (default: 48)
535
536The AWL database keeps only the specified number of most-significant bits
537of an IPv6 address in its fields, so that different individual IP addresses
538within a subnet belonging to the same owner are managed under a single
539database record. As we have no information available on the allocated address
540ranges of senders, this CIDR mask length is only an approximation. The default
541is 48 bits, corresponding to an address range commonly allocated to individual
542(smaller) organizations. Increase the number for a finer granularity, e.g.
543to 64 or 96 or 128, or decrease for wider ranges, e.g. 32. A value 0 is
544allowed but is not particularly useful, as it would treat the whole internet
545as a single organization. The number need not be a multiple of 4; any split
546is allowed.
547
548=cut # ...................................................................
549 push (@cmds, {
550 setting => 'txrep_ipv6_mask_len',
551 default => 48,
552 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
553 code => sub {
554 my ($self, $key, $value, $line) = @_;
555 if (!defined $value || $value eq '')
556 {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
557 elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128)
558 {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
559 $self->{txrep_ipv6_mask_len} = $value;
560 }
56117µs });
562
563
564# -------------------------------------------------------------------------
565=item B<user_awl_sql_override_username>
566
567 string (default: undefined)
568
569Used by the SQLBasedAddrList storage implementation.
570
571If this option is set the SQLBasedAddrList module will override the set
572username with the value given. This can be useful for implementing global
573or group based TxRep databases.
574
575=cut # ...................................................................
57613µs push (@cmds, {
577 setting => 'user_awl_sql_override_username',
578 default => '',
579 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
580 });
581
582
583# -------------------------------------------------------------------------
584=item B<txrep_user2global_ratio>
585
586 range [0..10] (default: 0)
587
588When the option txrep_user2global_ratio is set to a value greater than zero, and
589if the server configuration allows it, two data storages will be used - user and
590global (server-wide) storages.
591
592User storage keeps only senders who send messages to the respective recipient,
593and will reflect also the corrected/learned scores, when some messages are marked
594by the user as spam or ham, or when the sender is whitelisted or blacklisted
595through the API of SpamAssassin.
596
597Global storage keeps the reputation data of all messages processed by SpamAssassin
598with their spam scores and spam/ham learning data from all users on the server.
599Hence, the module will return a reputation value even for senders not known to the
600current recipient, as long as they have already sent email to anyone else on the server.
601
602The value of the txrep_user2global_ratio parameter controls the impact of each
603of the two reputations. When equal to 1, both the global and the user score will
604have the same impact on the result. When set to 2, the reputation taken from
605the user storage will have twice the impact of the global value. The final value
606of the TXREP tag will be calculated as follows:
607
608 total = ( ratio * user + global ) / ( ratio + 1 )
609
610When no reputation is found in the user storage, and a global reputation is
611available, the global storage is used fully, without applying the ratio.
612
613When the ratio is set to zero, only the default storage will be used. Whether
614that is the global or the local user storage is controlled either by the
615parameter user_awl_sql_override_username
616(in case of SQL storage), or by the C<auto_whitelist_path> parameter (in case of
617Berkeley database).
618
619When this dual storage is enabled, and no global storage is defined by the
620above-mentioned parameters for the Berkeley or SQL databases, TxRep will attempt
621to use a generic storage: user 'GLOBAL' in case of SQL, and in the case of
622Berkeley database the path defined by '__local_state_dir__/tx-reputation',
623which typically resolves to /var/db/spamassassin/tx-reputation. When the default
624storages are not available, or are not writable, you have to set the global
625storage with the help of the C<user_awl_sql_override_username> or
626C<auto_whitelist_path> settings, respectively.
627
628Please note that some SpamAssassin installations always run under the same user
629ID. In such a case it is pointless to enable the dual storage, because it would
630at most lead to two identical global storages in different locations.
631
632This feature is disabled by default.
633=cut # ...................................................................
634 push (@cmds, {
635 setting => 'txrep_user2global_ratio',
636 default => 0,
637 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
638 code => sub {
639 my ($self, $key, $value, $line) = @_;
640 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
641 $self->{txrep_user2global_ratio} = $value;
642 }
64317µs });
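
The blending formula from the C<txrep_user2global_ratio> description can be sketched as follows. This is an illustrative helper only; the name C<blend_reputation> is hypothetical, and the fallback to the user value when no global value exists is an assumption that the text above does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: combine the user and global reputations as
#   total = ( ratio * user + global ) / ( ratio + 1 )
sub blend_reputation {
    my ($user, $global, $ratio) = @_;
    return $global if !defined $user;     # no user record: global is used fully
    return $user   if !defined $global;   # assumption: fall back to the user value
    return ($ratio * $user + $global) / ($ratio + 1);
}

print blend_reputation(4, 2, 1), "\n";       # 3  (equal impact)
print blend_reputation(3, -6, 2), "\n";      # 0  (user counts twice)
print blend_reputation(undef, 7, 2), "\n";   # 7  (global only)
```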
644
645
646# -------------------------------------------------------------------------
647=item B<auto_whitelist_distinguish_signed>
648
649 (default: 1 - enabled)
650
651Used by the SQLBasedAddrList storage implementation.
652
653If this option is set the SQLBasedAddrList module will keep separate
654database entries for DKIM-validated e-mail addresses and for non-validated
655ones. Without this option, or for domains that do not use a DKIM signature,
656the reputation of legitimate email can get mixed with the reputation of
657forgeries. A pre-requisite when setting this option is that a field
658txrep.signedby exists in a SQL table, otherwise SQL operations will fail.
659A DKIM plugin must also be enabled in order for this option to take effect.
660This option is highly recommended. Unless you are using a pre-3.3.0 database
661schema and cannot upgrade, there is no reason to disable this option. If
662you are upgrading from AWL and using a pre-3.3.0 schema, the txrep.signedby
663column will not exist. It is recommended that you add this column, but if
664that is not possible you must set this option to 0 to avoid SQL errors.
665
666=cut # ...................................................................
66713µs push (@cmds, {
668 setting => 'auto_whitelist_distinguish_signed',
669 default => 1,
670 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
671 });
672
673
674=item B<txrep_spf>
675
676 0 | 1 (default: 1)
677
678When enabled, TxRep will treat any IP address using a given email address as
679the same authorized identity, and will not associate any IP address with it.
680(The same happens with valid DKIM signatures. No option available for DKIM).
681
682Note: for domains that define the useless SPF +all (pass all), no IP would
683ever be associated with the email address, and all addresses (incl. the forged
684ones) would be treated as coming from the authorized source. However, such
685domains are hopefully rare, and ask for this kind of treatment anyway.
686
687=back
688
689=cut # ...................................................................
69013µs push (@cmds, {
691 setting => 'txrep_spf',
692 default => 1,
693 type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
694 });
695
696
697# -------------------------------------------------------------------------
698=head2 REPUTATION WEIGHTS
699
700The overall reputation of the sender comprises several elements:
701
702=over 4
703
704=item 1) The reputation of the 'From' email address bound to the originating IP
705 address fraction (see the mask parameters for details)
706
707=item 2) The reputation of the 'From' email address alone (regardless of the IP
708 address currently in use)
709
710=item 3) The reputation of the domain name of the 'From' email address
711
712=item 4) The reputation of the originating IP address, regardless of sender's email address
713
714=item 5) The reputation of the HELO name of the originating computer (if available)
715
716=back
717
718Each of these partial reputations is weighted with the help of these parameters,
719and the overall reputation is calculated as the weighted sum of the individual
720reputations divided by the sum of all their weights:
721
722 sender_reputation = weight_email * rep_email +
723 weight_email_ip * rep_email_ip +
724 weight_domain * rep_domain +
725 weight_ip * rep_ip +
726 weight_helo * rep_helo
727
728You can disable the individual partial reputations by setting their respective
729weight to zero. This will also reduce the size of the database, since each
730partial reputation requires a separate entry in the database table. Disabling
731some of the partial reputations in this way may also help with the performance
732on busy servers, because the respective database lookups and processing will
733be skipped too.
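
As a rough illustration of the weighting described above, the following sketch computes a weighted average over the partial reputations, using the default weights listed in this section. The helper name C<sender_reputation> is hypothetical, and leaving missing partial reputations out of the weight sum is an assumption made for this example, not a statement about the plugin's code.

```perl
use strict;
use warnings;

# Default weights as documented in this section.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Hypothetical helper: weighted sum of the partial reputations divided by
# the sum of the weights that were actually used.
sub sender_reputation {
    my ($weights, $reps) = @_;
    my ($sum, $total_weight) = (0, 0);
    for my $key (keys %$weights) {
        # Skip partial reputations that are disabled (weight 0) or unknown.
        next unless $weights->{$key} && defined $reps->{$key};
        $sum          += $weights->{$key} * $reps->{$key};
        $total_weight += $weights->{$key};
    }
    return $total_weight ? $sum / $total_weight : undef;
}

my %reps = (email => 5, email_ip => 5, domain => 5, ip => 5, helo => 5);
print sender_reputation(\%weight, \%reps), "\n";   # 5
```

Setting a weight to zero simply drops that term, which matches the note that the corresponding lookups can be skipped.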
734
735=over 4
736
737=item B<txrep_weight_email>
738
739 range [0..10] (default: 3)
740
741This weight factor controls the influence of the reputation of the standalone
742email address, regardless of the originating IP address. When adjusting the
743weight, you need to keep in mind that an email address can be easily spoofed,
744and hence spammers can use 'from' email addresses belonging to senders with
745good reputation. From this point of view, the email address bound to the
746originating IP address is a more reliable indicator for the overall reputation.
747
748On the other hand, some reputable senders may be sending from a large number
749of IP addresses, so looking up the reputation of the standalone email address
750without regard to the originating IP makes sense too.
751
752We recommend using a relatively low value for this partial reputation.
753
754=cut # ...................................................................
755 push (@cmds, {
756 setting => 'txrep_weight_email',
757 default => 3,
758 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
759 code => sub {
760 my ($self, $key, $value, $line) = @_;
761 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
762 $self->{txrep_weight_email} = $value;
763 }
76417µs });
765
766# -------------------------------------------------------------------------
767=item B<txrep_weight_email_ip>
768
769 range [0..10] (default: 10)
770
771This is the standard reputation used in the same way as it was by the original
772AWL plugin. Each sender's email address is bound to the originating IP, or
773its part as defined by the txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.
774
775For a user sending from multiple locations, diverse mail servers, or from a dynamic
776IP range outside the masked block, the email address will have a separate reputation
777value for each of the different (partial) IP addresses.
778
779When the option auto_whitelist_distinguish_signed is enabled, in contrast to
780the original AWL module, TxRep does not record the IP address when a DKIM
781signature is detected. The email address is then not bound to any IP address, but
782rather just to the DKIM signature, since the signature is considered to authenticate
783the sender more reliably than the IP address (which can also vary).
784
785This is by design the most relevant reputation, and its weight should be kept
786high.
787
788=cut # ...................................................................
789 push (@cmds, {
790 setting => 'txrep_weight_email_ip',
791 default => 10,
792 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
793 code => sub {
794 my ($self, $key, $value, $line) = @_;
795 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
796 $self->{txrep_weight_email_ip} = $value;
797 }
79819µs });
799
800# -------------------------------------------------------------------------
801=item B<txrep_weight_domain>
802
803 range [0..10] (default: 2)
804
805Some spammers may always use their real domain name in the email address,
806just with multiple or changing local parts. This reputation will record the
807spam scores of all messages sent from the respective domain, regardless of
808the local part (user name) used.
809
810As with the email_ip reputation, the domain reputation is also
811bound to the originating address (or a masked block, if mask parameters are used).
812This avoids giving false reputation based on spoofed email addresses.
813
814When a DKIM signature is detected, the signer is used instead
815of the domain name extracted from the email address. It is considered that
816the signing authority is responsible for sending email of any domain name,
817hence the same reputation applies here.
818
819The domain reputation will give a relevant picture of the owner of the
820domain in case of small servers, or corporations with strict policies, but
821will be less relevant for freemailers like Gmail, Hotmail, and similar,
822because both ham and spam may be sent by their users.
823
824The default value is set relatively low. Higher weight values may be useful,
825but we recommend caution and observing the scores before increasing it.
826
827=cut # ...................................................................
828 push (@cmds, {
829 setting => 'txrep_weight_domain',
830 default => 2,
831 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
832 code => sub {
833 my ($self, $key, $value, $line) = @_;
834 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
835 $self->{txrep_weight_domain} = $value;
836 }
837110µs });
838
839# -------------------------------------------------------------------------
840=item B<txrep_weight_ip>
841
842 range [0..10] (default: 4)
843
844Spammers can send through the same relay (incl. compromised hosts) under a
845multitude of email addresses. This is the exact case when the IP reputation
846can help. This reputation is a kind of a local RBL.
847
848The weight is set by default lower than for the email_IP reputation, because
849there may be cases when the same IP address hosts both spammers and acceptable
850senders (for example the marketing department of a company sends you spam, but
851you still need to get messages from their billing address).
852
853=cut # ...................................................................
854 push (@cmds, {
855 setting => 'txrep_weight_ip',
856 default => 4,
857 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
858 code => sub {
859 my ($self, $key, $value, $line) = @_;
860 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
861 $self->{txrep_weight_ip} = $value;
862 }
86317µs });
864
865# -------------------------------------------------------------------------
866=item B<txrep_weight_helo>
867
868 range [0..10] (default: 0.5)
869
870A large number of spam messages come from compromised hosts, often personal computers
871or set-top boxes. Their NetBIOS names are usually used as the HELO name when connecting
872to your mail server. Some of the names are pretty generic and hence may be shared by
873a large number of hosts, but often the names are quite unique and may be a good
874indicator for detecting a spammer, even when different email and IP
875addresses are used (spam can also come from portable devices).
876
877No IP address is bound to the HELO name when stored in the reputation database.
878This is intentional, despite the possibility that numerous devices may share
879some of the HELO names.
880
881This option is still considered experimental, hence the low weight value, but after
882some testing it could likely be increased at least slightly.
883
884=cut # ...................................................................
885 push (@cmds, {
886 setting => 'txrep_weight_helo',
887 default => 0.5,
888 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
889 code => sub {
890 my ($self, $key, $value, $line) = @_;
891 if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
892 $self->{txrep_weight_helo} = $value;
893 }
89416µs });
895
896
897# -------------------------------------------------------------------------
898=back
899
900=head1 ADMINISTRATOR SETTINGS
901
902These settings differ from the ones above, in that they are considered 'more
903privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section.
904No matter what C<allow_user_rules> is set to, these can never be set from a
905user's C<user_prefs> file.
906
907=over 4
908
909=item B<txrep_factory module>
910
911 (default: Mail::SpamAssassin::DBBasedAddrList)
912
913Select alternative database factory module for the TxRep database.
914
915=cut # ...................................................................
91614µs push (@cmds, {
917 setting => 'txrep_factory',
918 is_admin => 1,
919 default => 'Mail::SpamAssassin::DBBasedAddrList',
920 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
921 });
922
923
924# -------------------------------------------------------------------------
925=item B<auto_whitelist_path /path/filename>
926
927 (default: ~/.spamassassin/tx-reputation)
928
929This is the TxRep directory and filename. By default, each user
930has their own reputation database in their C<~/.spamassassin> directory with
931mode 0700. For system-wide SpamAssassin use, you may want to share this
932across all users.
933
934=cut # ...................................................................
935 push (@cmds, {
936 setting => 'auto_whitelist_path',
937 is_admin => 1,
938 default => '__userstate__/tx-reputation',
939 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
940 code => sub {
941 my ($self, $key, $value, $line) = @_;
942 unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
943 if (-d $value) {return $Mail::SpamAssassin::Conf::INVALID_VALUE; }
944 $self->{txrep_path} = $value;
945 }
94617µs });
947
948
949# -------------------------------------------------------------------------
950=item B<auto_whitelist_db_modules Module ...>
951
952 (default: see below)
953
954What database modules should be used for the TxRep storage database
955file. The first named module that can be loaded from the Perl include path
956will be used. The format is:
957
958 PreferredModuleName SecondBest ThirdBest ...
959
960i.e. a space-separated list of Perl module names. The default is:
961
962 DB_File GDBM_File SDBM_File
963
964NDBM_File is not supported (see SpamAssassin bug 4353).
965
966=cut # ...................................................................
96714µs push (@cmds, {
968 setting => 'auto_whitelist_db_modules',
969 is_admin => 1,
970 default => 'DB_File GDBM_File SDBM_File',
971 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
972 });
973
974
975# -------------------------------------------------------------------------
976=item B<auto_whitelist_file_mode>
977
978 (default: 0700)
979
980The file mode bits used for the TxRep directory or file.
981
982Make sure you specify this using the 'x' mode bits set, as it may also be used
983to create directories. However, if a file is created, the resulting file will
984not have any execute bits set (the umask is set to 0111).
985
986=cut # ...................................................................
987 push (@cmds, {
988 setting => 'auto_whitelist_file_mode',
989 is_admin => 1,
990 default => '0700',
991 type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
992 code => sub {
993 my ($self, $key, $value, $line) = @_;
994 if ($value !~ /^0?[0-7]{3}$/) {
995 return $Mail::SpamAssassin::Conf::INVALID_VALUE;
996 }
997 $self->{txrep_file_mode} = untaint_var($value);
998 }
999118µs });
1000
1001
1002# -------------------------------------------------------------------------
1003=item B<user_awl_dsn DBI:databasetype:databasename:hostname:port>
1004
1005Used by the SQLBasedAddrList storage implementation.
1006
1007This will set the DSN used to connect. Example:
1008C<DBI:mysql:spamassassin:localhost>
1009
1010=cut # ...................................................................
101113µs push (@cmds, {
1012 setting => 'user_awl_dsn',
1013 is_admin => 1,
1014 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1015 });
1016
1017
1018# -------------------------------------------------------------------------
1019=item B<user_awl_sql_username username>
1020
1021Used by the SQLBasedAddrList storage implementation.
1022
1023The authorized username to connect to the above DSN.
1024
1025=cut # ...................................................................
102613µs push (@cmds, {
1027 setting => 'user_awl_sql_username',
1028 is_admin => 1,
1029 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1030 });
1031
1032
1033# -------------------------------------------------------------------------
1034=item B<user_awl_sql_password password>
1035
1036Used by the SQLBasedAddrList storage implementation.
1037
1038The password for the database username, for the above DSN.
1039
1040=cut # ...................................................................
1041113µs push (@cmds, {
1042 setting => 'user_awl_sql_password',
1043 is_admin => 1,
1044 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1045 });
1046
1047
1048# -------------------------------------------------------------------------
1049=item B<user_awl_sql_table tablename>
1050
1051 (default: txrep)
1052
1053Used by the SQLBasedAddrList storage implementation.
1054
1055The name of the table in which the reputation is stored, for the above DSN.
1056
1057=back
1058
1059=cut # ...................................................................
1060110µs push (@cmds, {
1061 setting => 'user_awl_sql_table',
1062 is_admin => 1,
1063 default => 'txrep',
1064 type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
1065 });
1066
1067119µs1826µs $conf->{parser}->register_commands(\@cmds);
1068}
1069
1070
1071###########################################################################
1072sub _message {
1073###########################################################################
1074 my ($self, $value, $msg) = @_;
1075 print "SpamAssassin TxRep: $value\n" if ($msg);
1076 dbg("TxRep: $value");
1077}
1078
1079
1080###########################################################################
1081sub _fail_exit {
1082###########################################################################
1083 my ($self, $err) = @_;
1084 my $eval_stat = ($err ne '') ? $err : "errno=$!";
1085 chomp $eval_stat;
1086 warn("TxRep: open of TxRep file failed: $eval_stat\n");
1087 if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1088 return 0;
1089}
1090
1091
1092###########################################################################
1093sub _fn_envelope {
1094###########################################################################
1095 my ($self, $args, $value, $msg) = @_;
1096
1097 unless ($self->{main}->{conf}->{use_txrep}){ return 0;}
1098 unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg); return 0;}
1099
1100 my $factor = $self->{conf}->{txrep_weight_email} +
1101 $self->{conf}->{txrep_weight_email_ip} +
1102 $self->{conf}->{txrep_weight_domain} +
1103 $self->{conf}->{txrep_weight_ip} +
1104 $self->{conf}->{txrep_weight_helo};
1105 my $sign = $args->{signedby};
1106 my $id = $args->{address};
1107 if ($args->{address} =~ /,/) {
1108 $sign = $args->{address};
1109 $sign =~ s/^.*,//g;
1110 $id =~ s/,.*$//g;
1111 }
1112
1113 # simplified regex used for IP detection (possible FP at a domain is not critical)
1114 if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo})
1115 {$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';}
1116 elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip})
1117 {$factor /= $self->{conf}->{txrep_weight_ip};}
1118 elsif ($id =~ /@/ && $self->{conf}->{txrep_weight_email})
1119 {$factor /= $self->{conf}->{txrep_weight_email};}
1120 elsif ($id !~ /@/ && $self->{conf}->{txrep_weight_domain})
1121 {$factor /= $self->{conf}->{txrep_weight_domain};}
1122 else {$factor = 1;}
1123
1124 $self->open_storages();
1125 my $score = (!defined $value)? undef : $factor * $value;
1126 my $status = $self->modify_reputation($id, $score, $sign);
1127 dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || '');
1128 eval {
1129 $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id);
1130 if (!defined $self->{txKeepStoreTied}) {$self->finish();}
1131 1;
1132 } or return $self->_fail_exit( $@ );
1133 return $status;
1134}
1135
- -
1138# -------------------------------------------------------------------------
1139=head1 BLACKLISTING / WHITELISTING
1140
1141When asked by SpamAssassin to blacklist or whitelist a user, the TxRep
1142plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting)
1143to the given sender's email address. For a plain address without any IP
1144address, the value is multiplied by the ratio of the total reputation
1145weight to the EMAIL reputation weight to account for the reduced impact
1146of the standalone EMAIL reputation when calculating the overall reputation.
1147
1148 total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
1149 blacklisted_reputation = 100 * total_weight / weight_email
1150
1151When a standalone email address is blacklisted/whitelisted, all records
1152of the email address bound to an IP address, DKIM signature, or an SPF pass
1153will be removed from the database, and only the standalone record is kept.
1154
1155Besides blacklisting/whitelisting of standalone email addresses, the same
1156method may also be used for blacklisting/whitelisting of IP addresses,
1157domain names, and HELO names (only dotless NetBIOS HELO names can be used).
1158
1159When whitelisting/blacklisting an email address or domain name, you can
1160bind them to a specified DKIM signature or SPF record by appending the
1161DKIM signing domain or the tag 'spf' after the ID in the following way:
1162
1163 spamassassin --add-addr-to-blacklist=spamming.biz,spf
1164 spamassassin --add-addr-to-whitelist=friend@good.org,good.org
1165
1166When a message contains both a DKIM signature and an SPF pass, the DKIM
1167signature takes priority, so the record bound to the 'spf' tag won't
1168be checked. Only email addresses and domains can be bound to DKIM or SPF.
1169Records of IP addresses and HELO names are always without DKIM/SPF.
1170
1171In case of dual storage, the black/whitelisting is performed only in the
1172default storage.
1173
1174=cut
1175######################################################## plugin hooks #####
1176sub blacklist_address {my $self=shift; return $self->_fn_envelope(@_, 100, "blacklisting address");}
1177sub whitelist_address {my $self=shift; return $self->_fn_envelope(@_, -100, "whitelisting address");}
1178sub remove_address {my $self=shift; return $self->_fn_envelope(@_,undef, "removing address");}
1179###########################################################################
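
As a worked example of the scaling described in the BLACKLISTING / WHITELISTING section, the sketch below applies the formula to the documented default weights. The helper name C<listing_score> is hypothetical, and the handling of a zero weight is simplified to an unscaled score, which only approximates what C<_fn_envelope> does.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Default weights as documented in the REPUTATION WEIGHTS section.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Hypothetical helper: scale the base score (100 or -100) by the ratio of
# the total weight to the weight of the listed identifier kind.
sub listing_score {
    my ($base, $weights, $kind) = @_;
    my $total = sum(values %$weights);
    # Simplification: a disabled (zero) weight leaves the score unscaled.
    return $weights->{$kind} ? $base * $total / $weights->{$kind} : $base;
}

printf "blacklist email: %s\n", listing_score( 100, \%weight, 'email');   # 650
printf "whitelist email: %s\n", listing_score(-100, \%weight, 'email');   # -650
```

With the defaults the total weight is 19.5, so a blacklisted standalone email address is stored with 100 * 19.5 / 3 = 650.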
1180
1181
1182# -------------------------------------------------------------------------
1183=head1 REPUTATION LOGICS
1184
11851. The most significant sender identifier is, as in AWL, the
1186 combination of the email address and the originating IP address, or
1187 the part of it defined by the IPv4 or IPv6 mask setting.
1188
11892. No IP checking for standalone EMAIL address reputation
1190
11913. No signature checking for IP reputation, and for HELO name reputation
1192
11934. The EMAIL_IP weight, and not the standalone EMAIL weight is used when
1194 no IP address is available (EMAIL_IP is the main indicator, and has
1195 the highest weight)
1196
11975. No IP checking for signed emails (the signature authenticates the email
1198 instead of the IP address)
1199
12006. No IP checking on an SPF pass (we assume the domain owner is responsible
1201 for all IPs it authorizes to send from, hence we use the same identity
1202 for all of them)
1203
12047. No signature used for standalone EMAIL reputation (would be redundant,
1205 since no IP is used for signed EMAIL_IP reputation, and we would store
1206 two identical hits)
1207
12088. When available, the DKIM signer is used instead of the domain name for
1209 the DOMAIN reputation
1210
12119. No IP and no signature used for HELO reputation (even though multiple
1212 computers may share the same HELO)
1213
121410. The full (unmasked) IP address is used (in the address field, instead of
1215 the IP field) for the standalone IP reputation
1216
1217=cut
1218###########################################################################
1219
# spent 1476s (393ms+1476) within Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 453 times, avg 3.26s/call: # 235 times (203ms+1476s) by Mail::SpamAssassin::Plugin::TxRep::learn_message at line 1846, avg 6.28s/call # 218 times (191ms+-191ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message at line 1861, avg 0s/call
sub check_senders_reputation {
1220###########################################################################
1221453930µs my ($self, $pms) = @_;
1222
1223# just for the development debugging
1224# use Data::Printer;
1225# dbg("TxRep: DEBUG DUMP of pms: %s, %s", $pms, p($pms));
1226
12274531.52ms my $autolearn = defined $self->{autolearn};
12284531.45ms $self->{last_pms} = $self->{autolearn} = undef;
1229
1230 # Cases where we would not be able to use TxRep
12314531.49ms return 0 unless ($self->{conf}->{use_txrep});
12324531.07ms if ($self->{conf}->{use_auto_whitelist}) {
1233 warn("TxRep: cannot run when Auto-Whitelist is enabled. Please disable it!\n");
1234 return 0;
1235 }
1236453889µs if ($autolearn && !$self->{conf}->{txrep_autolearn}) {
1237 dbg("TxRep: autolearning disabled, no more reputation adjusting, quitting");
1238 return 0;
1239 }
12404535.76ms453303ms my @from = $pms->all_from_addrs();
# spent 303ms making 453 calls to Mail::SpamAssassin::PerMsgStatus::all_from_addrs, avg 670µs/call
12414531.60ms if (@from && $from[0] eq 'ignore@compiling.spamassassin.taint.org') {
1242 dbg("TxRep: no scan in lint mode, quitting");
1243 return 0;
1244 }
1245
1246453965µs my $delta = 0;
12474534.46ms4533.69ms my $timer = $self->{main}->time_method("total_txrep");
# spent 3.69ms making 453 calls to Mail::SpamAssassin::time_method, avg 8µs/call
12484531.54ms my $msgscore = (defined $self->{learning})? $self->{learning} : $pms->get_autolearn_points();
12494535.41ms4532.52s my $date = $pms->{msg}->receive_date() || $pms->{date_header_time};
# spent 2.52s making 453 calls to Mail::SpamAssassin::Message::receive_date, avg 5.56ms/call
1250 my $msg_id = $self->{msgid} ||
12514537.53ms453247ms Mail::SpamAssassin::Plugin::Bayes->get_msgid($pms->{msg}) ||
# spent 247ms making 453 calls to Mail::SpamAssassin::Plugin::Bayes::get_msgid, avg 546µs/call
1252 $pms->get('Message-Id') || $pms->get('Message-ID') || $pms->get('MESSAGE-ID') || $pms->get('MESSAGEID');
1253
12544535.62ms45311.0ms my $from = lc($pms->get('From:addr') || $pms->get('EnvelopeFrom:addr'));
# spent 11.0ms making 453 calls to Mail::SpamAssassin::PerMsgStatus::get, avg 24µs/call
12554536.18ms4532.16ms return 0 unless $from =~ /\S/;
# spent 2.16ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call
12564531.90ms my $domain = $from;
12574536.87ms4533.16ms $domain =~ s/^.+@//;
# spent 3.16ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call
1258
1259 # Find the last untrusted relay and populate helo and original IP
12604531.24ms my ($origip, $helo);
12614532.43ms if (defined $pms->{relays_trusted} || defined $pms->{relays_untrusted}) {
12629064.06ms my $trusteds = @{$pms->{relays_trusted}};
126313599.13ms foreach my $rly ( @{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}} ) {
1264 # Get the last found HELO, regardless of private/public or trusted/untrusted
1265 # Avoiding a redundant duplicate entry if HELO is equal/similar to another identifier
12662017229ms12024101ms if (defined $rly->{helo} &&
# spent 79.3ms making 6012 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp, avg 13µs/call # spent 21.3ms making 6012 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 4µs/call
1267 $rly->{helo} !~ /^\[?\Q$rly->{ip}\E\]?$/ &&
1268 $rly->{helo} !~ /^\Q$domain\E$/i &&
1269 $rly->{helo} !~ /^\Q$from\E$/i ) {
127019786.60ms $helo = $rly->{helo};
1271 }
1272 # use only trusted ID, but use the first untrusted IP (if available) (AWL bug 6908)
1273 # at low spam scores (<2) ignore trusted/untrusted
1274 # set IP to 127.0.0.1 for any internal IP, so that it can be distinguished from none (AWL bug 6357)
127520177.31ms if ((--$trusteds >= 0 || $msgscore<2) && !$msg_id && $rly->{id}) {$msg_id = $rly->{id};}
1276246811.3ms if (($trusteds >= -1 || $msgscore<2) && !$rly->{ip_private} && $rly->{ip}) {$origip = $rly->{ip};}
1277247017.1ms if ( $trusteds >= 0 && !$origip && $rly->{ip_private} && $rly->{ip}) {$origip = '127.0.0.1';}
1278 }
1279 }
1280
1281 # Look for previous scores of the same message, for instance when doing re-learning
12824533.37ms if ($self->{conf}->{txrep_track_messages}) {
12834531.77ms if ($msg_id) {
12844535.07ms453760s my $msg_rep = $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, undef);
# spent 760s making 453 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.68s/call
12854536.90ms4536.20ms if (defined $msg_rep && $self->count()) {
# spent 6.20ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 14µs/call
12864362.97ms if (defined $self->{learning} && !defined $self->{forgetting}) {
1287 # already learned, forget only if already learned (count>1), and relearn
1288 # when only scanned (count=1), go ahead with normal rep scan
12892182.09ms2181.73ms if ($self->count() > 1) {
# spent 1.73ms making 218 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
1290218569µs $self->{last_pms} = $pms; # cache the pmstatus
12912182.75ms2181469s $self->forget_message($pms->{msg},$msg_id); # sub reentrance OK
# spent 1469s making 218 calls to Mail::SpamAssassin::Plugin::TxRep::forget_message, avg 6.74s/call
1292 }
1293 } elsif ($self->{forgetting}) {
1294218698µs $msgscore = $msg_rep; # forget the old stored score instead of the one got now
12952182.68ms2183.04ms dbg("TxRep: forgetting stored score %0.3f of message %s", $msgscore || 'undef', $msg_id);
# spent 3.04ms making 218 calls to Mail::SpamAssassin::Logger::dbg, avg 14µs/call
1296 } else {
1297 # calculating the delta from the stored message reputation
1298 $delta = ($msgscore + $self->{conf}->{txrep_factor}*$msg_rep) / (1+$self->{conf}->{txrep_factor}) - $msgscore;
1299 if ($delta != 0) {
1300 $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
1301 }
1302 dbg("TxRep: message %s already scanned, using old data; post-TxRep score: %0.3f", $msg_id, $pms->{score} || 'undef');
1303 return 0;
1304 }
1305 } # no stored reputation found, go ahead with normal rep scan
1306 } else {dbg("TxRep: no message-id available, parsing forced");}
1307 } # else no message tracking, go ahead with normal rep scan
1308
1309 # whitelist recipients of messages sent from internal networks, only after the MSG_ID check
13104537.52ms if ( $self->{conf}->{txrep_whitelist_out} &&
13114531.05ms defined $pms->{relays_internal} && @{$pms->{relays_internal}} &&
1312453997µs (!defined $pms->{relays_external} || !@{$pms->{relays_external}})
1313 ) {
1314233µs23.84ms foreach my $rcpt ($pms->all_to_addrs()) {
# spent 3.84ms making 2 calls to Mail::SpamAssassin::PerMsgStatus::all_to_addrs, avg 1.92ms/call
1315216µs if ($rcpt) {
1316231µs223µs dbg("TxRep: internal sender, whitelisting recipient: $rcpt");
# spent 23µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call
1317229µs26.22s $self->modify_reputation($rcpt, -1*$self->{conf}->{txrep_whitelist_out}, undef);
# spent 6.22s making 2 calls to Mail::SpamAssassin::Plugin::TxRep::modify_reputation, avg 3.11s/call
1318 }
1319 }
1320 }
1321
1322 # Get the signing domain
13234536.83ms45319.4ms my $signedby = ($self->{conf}->{auto_whitelist_distinguish_signed})? $pms->get_tag('DKIMDOMAIN') : undef;
# spent 19.4ms making 453 calls to Mail::SpamAssassin::PerMsgStatus::get_tag, avg 43µs/call
1324
1325 # Summary of all information we've gathered so far
1326 dbg("TxRep: active, %s pre-score: %s, autolearn score: %s, IP: %s, address: %s %s",
1327 $msg_id || '',
13284536.68ms4534.65ms $pms->{score} || '?',
# spent 4.65ms making 453 calls to Mail::SpamAssassin::Logger::dbg, avg 10µs/call
1329 $msgscore || '?',
1330 $origip || '?',
1331 $from || '?',
1332 $signedby ? "signed by $signedby" : '(unsigned)'
1333 );
1334
13354531.42ms my $ip = $origip;
13364531.69ms if ($signedby) {
1337 $ip = undef;
1338 $domain = $signedby;
1339 } elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) {
1340 $ip = undef;
1341 $signedby = 'spf';
1342 }
1343
13444531.02ms my $totalweight = 0;
13454531.40ms $self->{totalweight} = $totalweight;
1346
1347 # Get current reputation info
13484534.38ms453518ms $delta += $self->check_reputations($pms, 'EMAIL_IP', $from, $ip, $signedby, $msgscore);
# spent 518ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.14ms/call
1349
13504532.01ms if ($domain) {
13514533.74ms453427ms $delta += $self->check_reputations($pms, 'DOMAIN', $domain, $ip, $signedby, $msgscore);
# spent 427ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 942µs/call
1352 }
13534531.97ms if ($helo) {
13543963.39ms396346ms $delta += $self->check_reputations($pms, 'HELO', $helo, undef, 'HELO', $msgscore);
# spent 346ms making 396 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 873µs/call
1355 }
13564531.85ms if ($origip) {
13574531.88ms if (!$signedby) {
13584533.65ms453432ms $delta += $self->check_reputations($pms, 'EMAIL', $from, undef, undef, $msgscore);
# spent 432ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 955µs/call
1359 }
13604533.75ms453418ms $delta += $self->check_reputations($pms, 'IP', $origip, undef, undef, $msgscore);
# spent 418ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 922µs/call
1361 }
1362
1363 # Learn against this message and store reputation
13644531.28ms if (!defined $self->{learning}) {
1365 $delta = ($self->{totalweight})? $self->{conf}->{txrep_factor} * $delta / $self->{totalweight} : 0;
1366 if ($delta) {
1367 $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
1368 }
1369 $msgscore += $delta;
1370 if (defined $pms->{score}) {
1371 dbg("TxRep: post-TxRep score: %.3f", $pms->{score});
1372 }
1373 }
1374 # Track message ID
13754532.56ms if ($self->{conf}->{txrep_track_messages} && $msg_id) {
13764533.82ms453705s $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, $msgscore);
# spent 705s making 453 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.56s/call
1377 }
1378 # Close any open resources
13794532.40ms if (!defined $self->{txKeepStoreTied}) {
1380 $self->finish();
1381 }
1382
13834539.54ms return 0;
1384}
1385
1386
1387###########################################################################
1388
# spent 1467s (141ms+1466) within Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 3114 times, avg 471ms/call: # 453 times (20.5ms+760s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1284, avg 1.68s/call # 453 times (16.8ms+705s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1376, avg 1.56s/call # 453 times (17.5ms+501ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 1.14ms/call # 453 times (37.0ms+395ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1358, avg 955µs/call # 453 times (18.2ms+408ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1351, avg 942µs/call # 453 times (16.4ms+401ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1360, avg 922µs/call # 396 times (14.4ms+331ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1354, avg 873µs/call
sub check_reputations {
1389###########################################################################
139031146.16ms my $self = shift;
139131145.24ms my $delta;
1392
1393311433.8ms3114741ms if ($self->open_storages()) {
# spent 741ms making 3114 calls to Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 238µs/call
1394311414.3ms if ($self->{conf}->{txrep_user2global_ratio} && $self->{user_storage} != $self->{global_storage}) {
1395 my $user = $self->check_reputation('user_storage', @_);
1396 my $global = $self->check_reputation('global_storage',@_);
1397
1398 if (defined $user and $user == $user) {
1399 $delta = ( $self->{conf}->{txrep_user2global_ratio} * $user + $global ) / ( 1 + $self->{conf}->{txrep_user2global_ratio} );
1400 } else {
1401 $delta = $global;
1402 }
1403 } else {
1404311427.1ms31141466s $delta = $self->check_reputation(undef,@_);
# spent 1466s making 3114 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputation, avg 471ms/call
1405 }
1406 }
1407311442.0ms return $delta;
1408}
1409
1410
1411###########################################################################
1412
# spent 1466s (877ms+1465) within Mail::SpamAssassin::Plugin::TxRep::check_reputation which was called 3114 times, avg 471ms/call: # 3114 times (877ms+1465s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1404, avg 471ms/call
sub check_reputation {
1413###########################################################################
1414311432.9ms my ($self, $storage, $pms, $key, $id, $ip, $signedby, $msgscore) = @_;
1415
141631145.89ms my $delta = 0;
14173114186ms my $weight = ($key eq 'MSG_ID')? 1 : eval('$pms->{main}->{conf}->{txrep_weight_'.lc($key).'}');
# spent 6.56ms executing statements in 453 string evals (merged) # spent 4.07ms executing statements in 453 string evals (merged) # spent 3.78ms executing statements in 453 string evals (merged) # spent 3.64ms executing statements in 453 string evals (merged) # spent 3.56ms executing statements in 396 string evals (merged)
1418
1419# {
1420# #Bug 7164, trying to find out reason for these: _WARN: Use of uninitialized value $msgscore in addition (+) at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
1421# no warnings;
1422#
1423# unless (defined $msgscore) {
1424# #Output some params and the calling function so we can identify more about this bug
1425# dbg("TxRep: MsgScore Undefined (bug 7164) - check_reputation Parameters: self: $self storage: $storage pms: $pms, key: $key, id: $id, ip: $ip, signedby: $signedby, msgscore: $msgscore");
1426# dbg("TxRep: MsgScore Undefined (bug 7164) - weight: $weight");
1427#
1428# my ($package, $filename, $line) = caller();
1429#
1430# chomp($package);
1431# chomp($filename);
1432# chomp($line);
1433#
1434# dbg("TxRep: MsgScore Undefined (bug 7164) - Caller Info: Package: $package - Filename: $filename - Line: $line");
1435#
1436# #Define $msgscore as a triage to hide warnings while we find the root cause
1437# #$msgscore = 0;
1438# }
1439# }
1440
1441
1442311411.7ms if (defined $weight && $weight) {
144331145.23ms my $meanrep;
1444311434.4ms311429.1ms my $timer = $self->{main}->time_method('check_txrep_'.lc($key));
# spent 29.1ms making 3114 calls to Mail::SpamAssassin::time_method, avg 9µs/call
1445
144631145.79ms if (defined $storage) {
1447 $self->{checker} = $self->{$storage};
1448 }
1449311426.7ms31141.26s my $found = $self->get_sender($id, $ip, $signedby);
# spent 1.26s making 3114 calls to Mail::SpamAssassin::Plugin::TxRep::get_sender, avg 404µs/call
145031149.50ms my $tag_id = (defined $storage)? uc($key.'_'.substr($storage,0,1)) : uc($key);
1451311436.3ms311436.0ms if (defined $found && $self->count()) {
# spent 36.0ms making 3114 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 12µs/call
1452257340.5ms514642.9ms $meanrep = $self->total() / $self->count();
# spent 23.1ms making 2573 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call # spent 19.8ms making 2573 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
1453 }
1454311415.9ms if ($self->{learning} && defined $msgscore) {
145526618.64ms if (defined $meanrep) {
1456 # $msgscore<=>0 gives the sign of $msgscore
1457213715.9ms $msgscore += ($msgscore<=>0) * abs($meanrep);
1458 }
1459 dbg("TxRep: reputation: %s, count: %d, learning: %s, $tag_id: %s",
1460 defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
1461 $self->count() || 0,
1462266175.5ms532246.0ms $self->{learning} || '',
# spent 23.1ms making 2661 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call # spent 22.9ms making 2661 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call
1463 $id || 'none'
1464 );
1465 } else {
14664531.06ms $self->{totalweight} += $weight;
14674535.30ms4703.82ms if ($key eq 'MSG_ID' && $self->count() > 0) {
# spent 3.66ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call # spent 158µs making 17 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call
14684365.64ms8726.87ms $delta = $self->total() / $self->count();
# spent 3.74ms making 436 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call # spent 3.13ms making 436 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call
146943612.3ms43643.2ms $pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f", $delta));
# spent 43.2ms making 436 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 99µs/call
1470 } elsif (defined $self->total()) {
1471 #Bug 7164 - $msgscore undefined
14721772µs if (defined $msgscore) {
1473 $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
1474 } else {
147517242µs34331µs $delta = ($self->total()) / (1 + $self->count());
# spent 166µs making 17 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 10µs/call # spent 164µs making 17 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call
1476 }
1477
147817471µs171.23ms $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f", $delta));
# spent 1.23ms making 17 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 73µs/call
14791734µs if (defined $meanrep) {
1480 $pms->set_tag('TXREP_'.$tag_id.'_MEAN', sprintf("%2.1f", $meanrep));
1481 }
148217309µs341.18ms $pms->set_tag('TXREP_'.$tag_id.'_COUNT', sprintf("%2.1f", $self->count()));
# spent 1.03ms making 17 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 61µs/call # spent 146µs making 17 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call
148317205µs171.07ms $pms->set_tag('TXREP_'.$tag_id.'_PRESCORE', sprintf("%2.1f", $pms->{score}));
# spent 1.07ms making 17 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 63µs/call
1484 } else {
1485 $pms->set_tag('TXREP_'.$tag_id.'_UNKNOWN', 1);
1486 }
148745311.3ms9067.72ms dbg("TxRep: reputation: %s, count: %d, weight: %.1f, delta: %.3f, $tag_id: %s",
# spent 3.97ms making 453 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call # spent 3.75ms making 453 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
1488 defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
1489 $self->count() || 0,
1490 $weight || 0,
1491 $delta || 0,
1492 $id || 'none'
1493 );
1494 }
1495311431.7ms311428.0ms $timer = $self->{main}->time_method('update_txrep_'.lc($key));
# spent 28.0ms making 3114 calls to Mail::SpamAssassin::time_method, avg 9µs/call
1496311417.1ms if (defined $msgscore) {
1497266110.8ms if ($self->{forgetting}) { # forgetting a message score
149812819.47ms1281187ms $self->remove_score($msgscore); # remove the given score and decrement the count
# spent 187ms making 1281 calls to Mail::SpamAssassin::Plugin::TxRep::remove_score, avg 146µs/call
149912815.07ms if ($key eq 'MSG_ID') { # remove the message ID score completely
15002182.03ms218704s $self->{checker}->remove_entry($self->{entry});
# spent 704s making 218 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.23s/call
1501 }
1502 } else {
1503138010.7ms1380255ms $self->add_score($msgscore); # add the score and increment the count
# spent 255ms making 1380 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 185µs/call
150413808.81ms2351.87ms if ($self->{learning} && $key eq 'MSG_ID' && $self->count() eq 1) {
# spent 1.87ms making 235 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
15052351.56ms23537.2ms $self->add_score($msgscore); # increasing the count by 1 at a learned score (count=2)
# spent 37.2ms making 235 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 158µs/call
1506 } # it can be distinguished from a scanned score (count=1)
1507 }
1508 } elsif (defined $found && $self->{forgetting} && $key eq 'MSG_ID') {
15092182.06ms218758s $self->{checker}->remove_entry($self->{entry}); #forgetting the message ID
# spent 758s making 218 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.48s/call
1510 }
1511 }
151231145.88ms if (defined $storage) {
1513 $self->{checker} = $self->{default_storage};
1514 }
1515
1516311456.4ms return ($weight || 0) * ($delta || 0);
1517}
1518
- -
1521#--------------------------------------------------------------------------
1522# Database handler subroutines
1523#--------------------------------------------------------------------------
1524
1525###########################################################################
152624490181ms
# spent 114ms within Mail::SpamAssassin::Plugin::TxRep::count which was called 12245 times, avg 9µs/call: # 3114 times (36.0ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1451, avg 12µs/call # 2661 times (22.9ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1462, avg 9µs/call # 2573 times (19.8ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1452, avg 8µs/call # 1615 times (14.8ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1565, avg 9µs/call # 453 times (6.20ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1285, avg 14µs/call # 453 times (3.97ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1487, avg 9µs/call # 453 times (3.66ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1467, avg 8µs/call # 436 times (3.13ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1468, avg 7µs/call # 235 times (1.87ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1504, avg 8µs/call # 218 times (1.73ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1289, avg 8µs/call # 17 times (164µs+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1475, avg 10µs/call # 17 times (146µs+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1482, avg 9µs/call
sub count {my $self=shift; return (defined $self->{checker})? $self->{entry}->{count} : undef;}
1527931671.9ms
# spent 39.3ms within Mail::SpamAssassin::Plugin::TxRep::total which was called 4658 times, avg 8µs/call: # 2573 times (23.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1452, avg 9µs/call # 1615 times (12.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1565, avg 7µs/call # 436 times (3.74ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1468, avg 9µs/call # 17 times (166µs+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1475, avg 10µs/call # 17 times (158µs+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1467, avg 9µs/call
sub total {my $self=shift; return (defined $self->{checker})? $self->{entry}->{totscore} : undef;}
1528###########################################################################
1529
1530
1531###########################################################################
1532
# spent 1.26s (256ms+1.00) within Mail::SpamAssassin::Plugin::TxRep::get_sender which was called 3114 times, avg 404µs/call: # 3114 times (256ms+1.00s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1449, avg 404µs/call
sub get_sender {
1533###########################################################################
1534311422.2ms my ($self, $addr, $origip, $signedby) = @_;
1535
153631147.25ms return unless (defined $self->{checker});
1537
1538311426.3ms3114210ms my $fulladdr = $self->pack_addr($addr, $origip);
# spent 210ms making 3114 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 67µs/call
1539311426.8ms3114763ms my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 763ms making 3114 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 245µs/call
1540311421.9ms $self->{entry} = $entry;
154131148.33ms $origip = $origip || 'none';
1542
15433114108ms622829.8ms if ($entry->{count}<0 || $entry->{count}=~/^(nan|)$/ || $entry->{totscore}=~/^(nan|)$/) {
# spent 29.8ms making 6228 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call
1544 warn "TxRep: resetting bad data for ($addr, $origip), count: $entry->{count}, totscore: $entry->{totscore}\n";
1545 $self->{entry}->{count} = $self->{entry}->{totscore} = 0;
1546 }
1547311436.7ms return $self->{entry}->{count};
1548}
1549
1550
1551###########################################################################
1552
# spent 292ms (94.6+197) within Mail::SpamAssassin::Plugin::TxRep::add_score which was called 1615 times, avg 181µs/call: # 1380 times (85.1ms+170ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1503, avg 185µs/call # 235 times (9.45ms+27.7ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1505, avg 158µs/call
sub add_score {
1553###########################################################################
155416155.63ms my ($self,$score) = @_;
1555
155616153.32ms return unless (defined $self->{checker}); # no factory defined; we can't check
1557
155816154.09ms if ($score != $score) {
1559 warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
1560 return; # don't try to add a NaN
1561 }
156216155.41ms $self->{entry}->{count} ||= 0;
1563
1564 # performing the dilution aging correction
1565161548.2ms323026.9ms if (defined $self->total() && defined $self->count() && defined $self->{txrep_dilution_factor}) {
# spent 14.8ms making 1615 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call # spent 12.1ms making 1615 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 7µs/call
1566 my $diluted_total =
1567 ($self->count() + 1) *
1568 ($self->{txrep_dilution_factor} * $self->total() + $score) /
1569 ($self->{txrep_dilution_factor} * $self->count() + 1);
1570 my $corrected_score = $diluted_total - $self->total();
1571 $self->{checker}->add_score($self->{entry}, $corrected_score);
1572 } else {
1573161513.7ms1615170ms $self->{checker}->add_score($self->{entry}, $score);
# spent 170ms making 1615 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 106µs/call
1574 }
1575}
1576
- -
1579###########################################################################
1580
# spent 187ms (42.4+144) within Mail::SpamAssassin::Plugin::TxRep::remove_score which was called 1281 times, avg 146µs/call: # 1281 times (42.4ms+144ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1498, avg 146µs/call
sub remove_score {
1581###########################################################################
158212815.13ms my ($self,$score) = @_;
1583
158412812.93ms return unless (defined $self->{checker}); # no factory defined; we can't check
1585
158612813.99ms if ($score != $score) { # don't try to add a NaN
1587 warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
1588 return;
1589 }
1590 # no reversal dilution aging correction (not easily possible),
1591 # just removing the original message score
159212816.40ms if ($self->{entry}->{count} > 2)
15932951.12ms {$self->{entry}->{count} -= 2;}
15949862.35ms else {$self->{entry}->{count} = 0;}
1595 # subtract 2, and add a score; hence decrementing by 1
1596128121.5ms1281144ms $self->{checker}->add_score($self->{entry}, -1*$score);
# spent 144ms making 1281 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 113µs/call
1597}
1598
- -
1601###########################################################################
1602
# spent 6.22s (255µs+6.22) within Mail::SpamAssassin::Plugin::TxRep::modify_reputation which was called 2 times, avg 3.11s/call: # 2 times (255µs+6.22s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1317, avg 3.11s/call
sub modify_reputation {
1603###########################################################################
1604214µs my ($self, $addr, $score, $signedby) = @_;
1605
160627µs return unless (defined $self->{checker}); # no factory defined; we can't check
1607221µs286µs my $fulladdr = $self->pack_addr($addr, undef);
# spent 86µs making 2 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 43µs/call
1608223µs2408µs my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 408µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 204µs/call
1609
1610 # remove any old entries (will remove per-ip entries as well)
1611 # always call this regardless, as the current entry may have 0
1612 # scores, but the per-ip one may have more
1613219µs26.22s $self->{checker}->remove_entry($entry);
# spent 6.22s making 2 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.11s/call
1614
1615 # remove address only, no new score to add if score NaN or undef
1616217µs if (defined $score && $score==$score) {
1617 # else add score. get a new entry first
1618245µs2397µs $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
# spent 397µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 198µs/call
1619220µs2210µs $self->{checker}->add_score($entry, $score);
# spent 210µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 105µs/call
1620 }
1621236µs return 1;
1622}
1623
1624
1625# connecting the primary and the secondary storage; needed only on the first run
1626# (this can't be in the constructor, since the settings are not available there)
1627###########################################################################
1628
# spent 741ms (28.5+712) within Mail::SpamAssassin::Plugin::TxRep::open_storages which was called 3114 times, avg 238µs/call: # 3114 times (28.5ms+712ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1393, avg 238µs/call
sub open_storages {
1629###########################################################################
163031145.93ms my $self = shift;
1631
1632 # disabled per bug 7191
1633311433.7ms return 1 unless (!defined $self->{default_storage});
1634
163512µs my $factory;
163616µs if ($self->{main}->{pers_addr_list_factory}) {
1637 $factory = $self->{main}->{pers_addr_list_factory};
1638 } else {
163914µs my $type = $self->{conf}->{txrep_factory};
1640116µs15µs if ($type =~ /^([_A-Za-z0-9:]+)$/) {
# spent 5µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::CORE:match
1641110µs132µs $type = untaint_var($type);
# spent 32µs making 1 call to Mail::SpamAssassin::Util::untaint_var
1642 eval 'require '.$type.';
1643 $factory = '.$type.'->new();
1644 1;'
16451165µs or do {
# spent 385µs executing statements in string eval
1646 my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
1647 warn "TxRep: $eval_stat\n";
1648 undef $factory;
1649 };
1650112µs110µs $self->{main}->set_persistent_address_list_factory($factory) if $factory;
1651 } else {warn "TxRep: illegal factory setting\n";}
1652 }
165314µs if (defined $factory) {
1654114µs1708ms $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main});
# spent 708ms making 1 call to Mail::SpamAssassin::DBBasedAddrList::new_checker
1655
165614µs if ($self->{conf}->{txrep_user2global_ratio} && !defined $self->{global_storage}) {
1657 # hack to handle the BDB and SQL factory types of the storage object
1658 # TODO: add a method to the handler class instead
1659 my ($storage_type, $is_global);
1660
1661 if (ref($factory) =~ /SQLBasedAddrList/) {
1662 $is_global = defined $self->{conf}->{user_awl_sql_override_username};
1663 $storage_type = 'SQL';
1664 if ($is_global && $self->{conf}->{user_awl_sql_override_username} eq $self->{main}->{username}) {
1665 # skip double storage if current user same as the global override
1666 $self->{user_storage} = $self->{global_storage} = $self->{default_storage};
1667 }
1668 } elsif (ref($factory) =~ /DBBasedAddrList/) {
1669 $is_global = $self->{conf}->{auto_whitelist_path} !~ /__userstate__/;
1670 $storage_type = 'DB';
1671 }
1672 if (!defined $self->{global_storage}) {
1673 my $sql_override_orig = $self->{conf}->{user_awl_sql_override_username};
1674 my $awl_path_orig = $self->{conf}->{auto_whitelist_path};
1675 if ($is_global) {
1676 $self->{conf}->{user_awl_sql_override_username} = '';
1677 $self->{conf}->{auto_whitelist_path} = '__userstate__/tx-reputation';
1678 $self->{global_storage} = $self->{default_storage};
1679 $self->{user_storage} = $factory->new_checker($self->{main});
1680 } else {
1681 $self->{conf}->{user_awl_sql_override_username} = 'GLOBAL';
1682 $self->{conf}->{auto_whitelist_path} = '__local_state_dir__/tx-reputation';
1683 $self->{global_storage} = $factory->new_checker($self->{main});
1684 $self->{user_storage} = $self->{default_storage};
1685 }
1686 $self->{conf}->{user_awl_sql_override_username} = $sql_override_orig;
1687 $self->{conf}->{auto_whitelist_path} = $awl_path_orig;
1688
1689 # Another ugly hack to find out whether the user differs from
1690 # the global one. We need to add a method to the factory handlers
1691 if ($storage_type eq 'DB' &&
1692 $self->{user_storage}->{locked_file} eq $self->{global_storage}->{locked_file}) {
1693 if ($is_global)
1694 {$self->{global_storage}->finish();}
1695 else {$self->{user_storage}->finish();}
1696 $self->{user_storage} = $self->{global_storage} = $self->{default_storage};
1697 }
1698 }
1699 }
1700 } else {
1701 $self->{user_storage} = $self->{global_storage} = $self->{checker} = $self->{default_storage} = undef;
1702 warn("TxRep: could not open storages, quitting!\n");
1703 return 0;
1704 }
1705111µs return 1;
1706}
1707
1708
1709###########################################################################
1710
# spent 5.48s (94µs+5.48) within Mail::SpamAssassin::Plugin::TxRep::finish which was called: # once (94µs+5.48s) by Mail::SpamAssassin::Plugin::TxRep::learner_close at line 1889
sub finish {
1711###########################################################################
171212µs my $self = shift;
1713
171414µs return unless (defined $self->{checker}); # no factory defined; we can't check
1715
1716152µs if ($self->{conf}->{txrep_user2global_ratio} && defined $self->{user_storage} && ($self->{user_storage} != $self->{global_storage})) {
1717 $self->{user_storage}->finish();
1718 $self->{global_storage}->finish();
1719 $self->{user_storage} = undef;
1720 $self->{global_storage} = undef;
1721 } elsif (defined $self->{default_storage}) {
1722110µs15.48s $self->{default_storage}->finish();
# spent 5.48s making 1 call to Mail::SpamAssassin::DBBasedAddrList::finish
172318µs $self->{default_storage} = $self->{checker} = undef;
1724 }
1725114µs $self->{factory} = undef;
1726}
1727
1728
1729###########################################################################
1730
# spent 67.4ms (53.0+14.3) within Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key which was called 906 times, avg 74µs/call: # 906 times (53.0ms+14.3ms) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1785, avg 74µs/call
sub ip_to_awl_key {
1731###########################################################################
17329063.67ms my ($self, $origip) = @_;
1733
17349061.62ms my $result;
17359065.86ms local $1;
173690630.4ms90614.3ms if (!defined $origip) {
# spent 14.3ms making 906 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 16µs/call
1737 # could not find an IP address to use
1738 } elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) {
17399062.67ms my $mask_len = $self->{ipv4_mask_len};
17409062.58ms $mask_len = 16 if !defined $mask_len;
1741 # handle the default and easy cases manually
17429064.12ms if ($mask_len == 32) {$result = $origip;}
17439063.61ms elsif ($mask_len == 16) {$result = $1;}
1744 else {
1745 my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
1746 if (!defined $origip_obj) { # invalid IPv4 address
1747 dbg("TxRep: bad IPv4 address $origip");
1748 } else {
1749 $result = $origip_obj->network->addr;
1750 $result =~s/(\.0){1,3}\z//; # truncate zero tail
1751 }
1752 }
1753 } elsif ($origip =~ /:/ && # triage
1754 $origip =~
1755 /^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
1756 # looks like an IPv6 address
1757 my $mask_len = $self->{ipv6_mask_len};
1758 $mask_len = 48 if !defined $mask_len;
1759 my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
1760 if (!defined $origip_obj) { # invalid IPv6 address
1761 dbg("TxRep: bad IPv6 address $origip");
1762 } else {
1763 $result = $origip_obj->network->full6; # string in a canonical form
1764 $result =~ s/(:0000){1,7}\z/::/; # compress zero tail
1765 }
1766 } else {
1767 dbg("TxRep: bad IP address $origip");
1768 }
17699065.50ms if (defined $result && length($result) > 39) { # just in case, keep under
1770 $result = substr($result,0,39); # the awl.ip field size
1771 }
1772# if (defined $result) {dbg("TxRep: IP masking %s -> %s", $origip || '?', $result || '?');}
17739069.31ms return $result;
1774}
1775
1776
1777###########################################################################
1778
# spent 210ms (119+91.0) within Mail::SpamAssassin::Plugin::TxRep::pack_addr which was called 3116 times, avg 67µs/call: # 3114 times (119ms+90.9ms) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1538, avg 67µs/call # 2 times (72µs+14µs) by Mail::SpamAssassin::Plugin::TxRep::modify_reputation at line 1607, avg 43µs/call
sub pack_addr {
1779###########################################################################
1780311613.7ms my ($self, $addr, $origip) = @_;
1781
1782311611.6ms $addr = lc $addr;
1783311647.9ms311623.6ms $addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia
# spent 23.6ms making 3116 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 8µs/call
1784
1785402215.2ms90667.4ms if ( defined $origip) {$origip = $self->ip_to_awl_key($origip);}
# spent 67.4ms making 906 calls to Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key, avg 74µs/call
1786532618.2ms if (!defined $origip) {$origip = 'none';}
1787311654.2ms return $addr . "|ip=" . $origip;
1788}
1789
- -
1792# -------------------------------------------------------------------------
1793=head1 LEARNING SPAM / HAM
1794
1795When SpamAssassin is told to learn (or relearn) a given message as spam or
1796ham, all reputations relevant to the message (email, email_ip, domain, ip, helo)
1797in both global and user storages will be updated using the C<txrep_learn_penalty>
1798or the C<txrep_learn_bonus> value, respectively. The new reputation of a given sender
1799property (email, domain, ...) will be the result of the corresponding one of the following
1800formulas:
1801
1802 new_reputation = old_reputation + learn_penalty
1803 new_reputation = old_reputation - learn_bonus
1804
1805The TxRep plugin currently does track each message individually, hence it
1806does not detect when you learn the message repeatedly. It will add/subtract
1807the penalty/bonus score each time the message is fed to the spam learner.
1808
1809=cut
1810######################################################### plugin hook #####
1811
# spent 16µs within Mail::SpamAssassin::Plugin::TxRep::learner_new which was called: # once (16µs+0s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_new {
1812###########################################################################
181312µs my ($self) = @_;
1814
181517µs $self->{txKeepStoreTied} = 1;
1816111µs return $self;
1817}
1818
1819
1820######################################################### plugin hook #####
1821sub autolearn {
1822###########################################################################
1823 my ($self, $params) = @_;
1824
1825 $self->{last_pms} = $params->{permsgstatus};
1826 return $self->{autolearn} = 1;
1827}
1828
1829
1830######################################################### plugin hook #####
1831
# spent 1520s (30.9ms+1520) within Mail::SpamAssassin::Plugin::TxRep::learn_message which was called 235 times, avg 6.47s/call: # 235 times (30.9ms+1520s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm, avg 6.47s/call
sub learn_message {
1832###########################################################################
1833235521µs my ($self, $params) = @_;
1834235694µs return 0 unless (defined $params->{isspam});
1835
18362351.62ms2351.51ms dbg("TxRep: learning a message");
# spent 1.51ms making 235 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call
18372353.48ms23568.1ms my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
# spent 68.1ms making 235 calls to Mail::SpamAssassin::PerMsgStatus::new, avg 290µs/call
18382351.63ms if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) {
18392352.45ms23543.4s $pms->extract_message_metadata();
# spent 43.4s making 235 calls to Mail::SpamAssassin::PerMsgStatus::extract_message_metadata, avg 185ms/call
1840 }
1841
18422351.38ms if ($params->{isspam})
18432351.61ms {$self->{learning} = $self->{conf}->{txrep_learn_penalty};}
1844 else {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};}
1845
18462352.89ms2351476s my $ret = !$self->{learning} || $self->check_senders_reputation($pms);
# spent 1476s making 235 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 6.28s/call
1847235725µs $self->{learning} = undef;
184823511.5ms803.14ms return $ret;
# spent 3.14ms making 80 calls to Mail::SpamAssassin::PerMsgStatus::DESTROY, avg 39µs/call
1849}
1850
1851
1852######################################################### plugin hook #####
1853
# spent 1469s (12.3ms+1469) within Mail::SpamAssassin::Plugin::TxRep::forget_message which was called 218 times, avg 6.74s/call: # 218 times (12.3ms+1469s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1291, avg 6.74s/call
sub forget_message {
1854###########################################################################
1855218846µs my ($self, $params) = @_;
1856218716µs return 0 unless ($self->{conf}->{use_txrep});
1857218709µs my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
1858
18592181.47ms2181.29ms dbg("TxRep: forgetting a message");
# spent 1.29ms making 218 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call
1860218590µs $self->{forgetting} = 1;
18612182.04ms2180s my $ret = $self->check_senders_reputation($pms);
# spent 1469s making 218 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 6.74s/call, recursion: max depth 1, sum of overlapping time 1469s
1862218945µs $self->{forgetting} = undef;
18632182.25ms return $ret;
1864}
1865
1866
1867######################################################### plugin hook #####
1868sub learner_expire_old_training {
1869###########################################################################
1870 my ($self, $params) = @_;
1871 return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days});
1872
1873 dbg("TxRep: expiry not implemented yet");
1874# dbg("TxRep: expiry starting");
1875# my $timer = $self->{main}->time_method("expire_bayes");
1876# $self->{store}->expire_old_tokens($params);
1877# dbg("TxRep: expiry completed");
1878}
1879
1880
1881######################################################### plugin hook #####
1882
# spent 5.48s (49µs+5.48) within Mail::SpamAssassin::Plugin::TxRep::learner_close which was called: # once (49µs+5.48s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_close {
1883###########################################################################
188412µs my ($self, $params) = @_;
188513µs my $quiet = $params->{quiet};
188614µs return 0 unless ($self->{conf}->{use_txrep});
1887
188813µs $self->{txKeepStoreTied} = undef;
188919µs15.48s $self->finish();
# spent 5.48s making 1 call to Mail::SpamAssassin::Plugin::TxRep::finish
1890128µs118µs dbg("TxRep: learner_close");
# spent 18µs making 1 call to Mail::SpamAssassin::Logger::dbg
1891}
1892
1893
1894# -------------------------------------------------------------------------
1895=head1 OPTIMIZING TXREP
1896
1897TxRep can be optimized for speed and simplicity, or for the precision in
1898assigning the reputation scores.
1899
1900First of all TxRep can be quickly disabled and re-enabled through the option
1901L</C<use_txrep>>. It can be done globally, or individually in each respective
1902C<user_prefs>. Disabling TxRep will not destroy the database, so it can be
1903re-enabled at any later time.
1904
1905On many systems, SQL-based storage may perform faster than the default
1906Berkeley DB storage, so you should consider setting it up. See the section
1907L</SQL-BASED STORAGE> for instructions.
1908
1909Then there are multiple settings that can reduce the number of records stored
1910in the database, hence reducing the size of the storage, and also the processing
1911time:
1912
19131. Setting L</C<txrep_user2global_ratio>> to zero will disable the dual storage,
1914thus halving the disk space requirements and the processing time of this plugin.
1915
19162. You can disable all but one of the L<REPUTATION WEIGHTS>. EMAIL_IP is
1917the most specific option, so it is the most likely choice in such a case, but you
1918could base the reputation system on any of the remaining scores. Each of the
1919enabled reputations adds a new entry to the database for each new identifier.
1920So while, for example, the number of recorded and scored domains may be large, the
1921number of stored IP addresses will probably be higher and would require more
1922space in the storage.
1923
19243. Disabling L</C<txrep_track_messages>> avoids storing a separate entry
1925for every scanned message, which also reduces the disk space requirements and
1926the processing time.
1927
19284. Disabling the option L</C<txrep_autolearn>> saves processing time
1929for messages that trigger the auto-learning process.
1930
19315. Disabling L</C<txrep_whitelist_out>> will reduce the processing time for
1932outbound connections.
1933
19346. Keeping the option L</C<auto_whitelist_distinguish_signed>> enabled may help
1935slightly reduce the size of the database, because for signed messages the
1936originating IP address is ignored, hence no additional database entries are
1937needed for each separate IP address (or masked block of IP addresses).
1938
1939
1940Since TxRep reuses the storage architecture of the former AWL plugin, the
1941instructions for initializing the AWL SQL storage also apply to TxRep.
1942Although the old AWL table can be reused for TxRep, by default TxRep expects
1943the SQL table to be named "txrep".
1944
1945To install a new SQL table for TxRep, run the appropriate SQL file for your
1946system under the /sql directory.
1947
1948If you get a syntax error with an older version of MySQL, use TYPE=MyISAM
1949instead of ENGINE=MyISAM at the end of the command. You can also use other
1950types of ENGINE (depending on what is available on your system). For example,
1951the MEMORY engine stores the entire table in the server memory, achieving
1952performance similar to Redis. You would need to take care of replicating
1953the RAM table to disk through a cronjob, to avoid loss of data at reboot.
1954The InnoDB engine is used by default, offering high scalability (database
1955size and concurrency of accesses). In conjunction with a high value of
1956innodb_buffer_pool or with the memcached plugin (MySQL v5.6+) it can also
1957offer performance comparable to Redis.
1958
1959=cut
1960
1961110µs1;
 
# spent 67.5ms within Mail::SpamAssassin::Plugin::TxRep::CORE:match which was called 13600 times, avg 5µs/call: # 6228 times (29.8ms+0s) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1543, avg 5µs/call # 6012 times (21.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1266, avg 4µs/call # 906 times (14.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key at line 1736, avg 16µs/call # 453 times (2.16ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 5µs/call # once (5µs+0s) by Mail::SpamAssassin::Plugin::TxRep::open_storages at line 1640
sub Mail::SpamAssassin::Plugin::TxRep::CORE:match; # opcode
# spent 79.3ms within Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp which was called 6012 times, avg 13µs/call: # 6012 times (79.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1266, avg 13µs/call
sub Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp; # opcode
# spent 26.7ms within Mail::SpamAssassin::Plugin::TxRep::CORE:subst which was called 3569 times, avg 7µs/call: # 3116 times (23.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1783, avg 8µs/call # 453 times (3.16ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1257, avg 7µs/call
sub Mail::SpamAssassin::Plugin::TxRep::CORE:subst; # opcode
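
The speed-oriented settings listed in the OPTIMIZING TXREP section of the POD above could be combined in a configuration file roughly as follows. This is only a sketch: the option names are the ones referenced in that section, the values are illustrative assumptions rather than recommendations, and item 2 (disabling individual reputation weights) is left out because the listing shown here does not spell out how it is done.

```
# Hypothetical local.cf / user_prefs fragment; values are assumptions.

# TxRep has to be enabled for any of the options below to matter.
use_txrep                          1

# Item 1: a zero ratio disables the dual (user + global) storage.
txrep_user2global_ratio            0

# Item 3: do not store a separate entry for every scanned message.
txrep_track_messages               0

# Item 4: skip TxRep processing when auto-learning is triggered.
txrep_autolearn                    0

# Item 5: no additional processing for outbound connections.
txrep_whitelist_out                0

# Item 6: for signed messages the originating IP is ignored,
# so fewer per-IP entries are created.
auto_whitelist_distinguish_signed  1
```

Whether these settings noticeably change the timings reported in this profile would have to be confirmed by re-running the profiler with them in place.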