Filename | /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm |
Statements | Executed 171883 statements in 1.63s |
Calls | P | F | Exclusive Time | Inclusive Time | Subroutine | Package |
---|---|---|---|---|---|---|
2207 | 1 | 1 | 655ms | 614s | check_reputation | Mail::SpamAssassin::Plugin::TxRep::
321 | 2 | 1 | 277ms | 620s | check_senders_reputation (recurses: max depth 1, inclusive time 613s) | Mail::SpamAssassin::Plugin::TxRep::
2207 | 1 | 1 | 189ms | 925ms | get_sender | Mail::SpamAssassin::Plugin::TxRep::
2207 | 7 | 1 | 93.3ms | 615s | check_reputations | Mail::SpamAssassin::Plugin::TxRep::
1608 | 2 | 1 | 85.8ms | 301ms | add_score | Mail::SpamAssassin::Plugin::TxRep::
2208 | 2 | 1 | 84.8ms | 142ms | pack_addr | Mail::SpamAssassin::Plugin::TxRep::
9174 | 12 | 1 | 80.5ms | 80.5ms | count | Mail::SpamAssassin::Plugin::TxRep::
3964 | 1 | 1 | 54.0ms | 54.0ms | CORE:regcomp (opcode) | Mail::SpamAssassin::Plugin::TxRep::
9342 | 5 | 1 | 45.6ms | 45.6ms | CORE:match (opcode) | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 34.2ms | 42.3ms | BEGIN@205 | Mail::SpamAssassin::Plugin::TxRep::
642 | 1 | 1 | 32.4ms | 40.9ms | ip_to_awl_key | Mail::SpamAssassin::Plugin::TxRep::
3797 | 5 | 1 | 32.2ms | 32.2ms | total | Mail::SpamAssassin::Plugin::TxRep::
234 | 1 | 1 | 30.1ms | 665s | learn_message | Mail::SpamAssassin::Plugin::TxRep::
512 | 1 | 1 | 28.1ms | 83.5ms | remove_score | Mail::SpamAssassin::Plugin::TxRep::
2207 | 1 | 1 | 21.1ms | 29.9ms | open_storages | Mail::SpamAssassin::Plugin::TxRep::
2529 | 2 | 1 | 18.7ms | 18.7ms | CORE:subst (opcode) | Mail::SpamAssassin::Plugin::TxRep::
87 | 1 | 1 | 4.74ms | 613s | forget_message | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 229µs | 1.13ms | set_config | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 148µs | 3.21s | modify_reputation | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 112µs | 1.32ms | new | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 92µs | 5.67s | finish | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 57µs | 515µs | BEGIN@203 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 50µs | 5.67s | learner_close | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 44µs | 54µs | BEGIN@198 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 32µs | 186µs | BEGIN@206 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 32µs | 32µs | __ANON__[:491] | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 29µs | 107µs | BEGIN@201 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 28µs | 56µs | BEGIN@199 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 26µs | 32µs | BEGIN@200 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 23µs | 110µs | BEGIN@209 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 22µs | 166µs | BEGIN@207 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 18µs | 18µs | BEGIN@204 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 15µs | 15µs | learner_new | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:302] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:346] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:371] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:394] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:417] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:442] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:523] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:556] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:638] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:754] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:788] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:827] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:853] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:884] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:936] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:989] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | _fail_exit | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | _fn_envelope | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | _message | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | autolearn | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | blacklist_address | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | learner_expire_old_training | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | remove_address | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | whitelist_address | Mail::SpamAssassin::Plugin::TxRep::
Line | Statements | Time on line | Calls | Time in subs | Code |
---|---|---|---|---|---|
1 | # <@LICENSE> | ||||
2 | # Licensed to the Apache Software Foundation (ASF) under one or more | ||||
3 | # contributor license agreements. See the NOTICE file distributed with | ||||
4 | # this work for additional information regarding copyright ownership. | ||||
5 | # The ASF licenses this file to you under the Apache License, Version 2.0 | ||||
6 | # (the "License"); you may not use this file except in compliance with | ||||
7 | # the License. You may obtain a copy of the License at: | ||||
8 | # | ||||
9 | # http://www.apache.org/licenses/LICENSE-2.0 | ||||
10 | # | ||||
11 | # Unless required by applicable law or agreed to in writing, software | ||||
12 | # distributed under the License is distributed on an "AS IS" BASIS, | ||||
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
14 | # See the License for the specific language governing permissions and | ||||
15 | # limitations under the License. | ||||
16 | # </@LICENSE> | ||||
17 | |||||
18 | |||||
19 | =head1 NAME | ||||
20 | |||||
21 | Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records | ||||
22 | |||||
23 | |||||
24 | =head1 SYNOPSIS | ||||
25 | |||||
26 | The TxRep (Reputation) plugin is designed as an improved replacement of the AWL | ||||
27 | (Auto-Whitelist) plugin. It adjusts the final message spam score by looking up and | ||||
28 | taking into consideration the reputation of the sender. | ||||
29 | |||||
30 | To try TxRep out, you B<have to> disable the AWL plugin (if present), back up its | ||||
31 | database and add a line loading this module in init.pre (AWL may be enabled in v310.pre): | ||||
32 | |||||
33 | # loadplugin Mail::SpamAssassin::Plugin::AWL | ||||
34 | loadplugin Mail::SpamAssassin::Plugin::TxRep | ||||
35 | |||||
36 | When AWL is not disabled, TxRep will refuse to run. | ||||
37 | |||||
38 | Use the supplied 60_txreputation.cf file or add these lines to a .cf file: | ||||
39 | |||||
40 | header TXREP eval:check_senders_reputation() | ||||
41 | describe TXREP Score normalizing based on sender's reputation | ||||
42 | tflags TXREP userconf noautolearn | ||||
43 | priority TXREP 1000 | ||||
44 | |||||
45 | |||||
46 | =head1 DESCRIPTION | ||||
47 | |||||
48 | This plugin is intended to replace the former AWL - AutoWhiteList. Although the | ||||
49 | concept and the scope differ, the purpose remains the same - the normalizing of spam | ||||
50 | score results based on the sender's previous history. The name was intentionally changed | ||||
51 | from "whitelist" to "reputation" to avoid any confusion, since the result score can | ||||
52 | be adjusted in both directions. | ||||
53 | |||||
54 | The TxRep plugin keeps track of the average SpamAssassin score for senders. | ||||
55 | Senders are tracked using multiple identifiers, or their combinations: the From: | ||||
56 | email address, the originating IP and/or an originating block of IPs, sender's domain | ||||
57 | name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce | ||||
58 | the variability in scoring from message to message, and modifies the final score by | ||||
59 | pushing the result towards the historical average. This improves the accuracy of | ||||
60 | filtering for most email. | ||||
61 | |||||
62 | In comparison with the original AWL plugin, several conceptual changes were implemented | ||||
63 | in TxRep: | ||||
64 | |||||
65 | 1. B<Scoring> - although AWL tracks the number of messages received from each | ||||
66 | respective sender, it does not take that number into account in any way when calculating | ||||
67 | the corrective score for a new message. So, for example, if a sender previously sent a single | ||||
68 | ham message with the score of -5, and then sends a second one with the score of +10, | ||||
69 | AWL will issue a corrective score bringing the score towards the -5. With the default | ||||
70 | C<auto_whitelist_factor> of 0.5, the resulting score would be only 2.5. And it would be | ||||
71 | exactly the same even if the sender previously sent 1,000 messages with the average of | ||||
72 | -5. TxRep tries to take the maximal advantage of the collected data, and adjusts the | ||||
73 | final score not only with the mean reputation score stored in the database, but also | ||||
74 | respecting the number of messages already seen from the sender. You can see the exact | ||||
75 | formula in the section L</C<txrep_factor>>. | ||||
76 | |||||
77 | 2. B<Learning> - AWL ignores any spam/ham learning. In fact it acts against it, which | ||||
78 | often leads to a frustrating situation, where a user repeatedly tags all messages of a | ||||
79 | given sender as spam (or ham), but for every new message from the sender, AWL will | ||||
80 | adjust the score of the message back to the historical average which does B<not> include | ||||
81 | the learned scores. TxRep changes this: every spam/ham learning is | ||||
82 | recorded in the reputation database, and hence taken into consideration for future email | ||||
83 | from the respective sender. See the section L</"LEARNING SPAM / HAM"> for more details. | ||||
84 | |||||
85 | 3. B<Auto-Learning> - in certain situations SpamAssassin may declare a message an | ||||
86 | obvious spam resp. ham, and launch the auto-learning process, so that the message can be | ||||
87 | re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin | ||||
88 | will readjust the stored reputation by the value defined by L</C<txrep_learn_penalty>> | ||||
89 | resp. L</C<txrep_learn_bonus>>. Auto-learning score thresholds may be tuned, or the | ||||
90 | auto-learning completely disabled, through the setting L</C<txrep_autolearn>>. | ||||
91 | |||||
92 | 4. B<Relearning> - messages that were wrongly learned or auto-learned, can be relearned. | ||||
93 | Old reputations are removed from the database, and new ones added instead of them. The | ||||
94 | relearning works better when message tracking is enabled through the | ||||
95 | L</C<txrep_track_messages>> option. Without it, the relearned score is simply added to | ||||
96 | the reputation, without removing the old ones. | ||||
97 | |||||
98 | 5. B<Aging> - with AWL, any historical record of a given sender has the same weight. It | ||||
99 | means that changes in the sender's behavior, or modified SA rules, may take a long time to take effect, or | ||||
100 | be virtually negated by the AWL normalization, especially for senders with a high count | ||||
101 | of past messages, and low recent frequency. It also turns out to be particularly | ||||
102 | counterproductive when the administrator detects new patterns in certain messages, and | ||||
103 | applies new rules to better tag such messages as spam or ham. AWL will practically | ||||
104 | eliminate the effect of the new rules, by adjusting the score back towards the (wrong) | ||||
105 | historical average. Only setting the C<auto_whitelist_factor> lower would help, but at | ||||
106 | the same time it would also reduce the overall impact of AWL, and cast doubt on its | ||||
107 | purpose. TxRep, besides the L</C<txrep_factor>> (replacement of the C<auto_whitelist_factor>), | ||||
108 | also introduces the L</C<txrep_dilution_factor>> to help cope with this issue by | ||||
109 | progressively reducing the impact of past records. More details can be found in the | ||||
110 | description of the factor below. | ||||
111 | |||||
112 | 6. B<Blacklisting and Whitelisting> - when whitelisting or blacklisting is requested | ||||
113 | through SpamAssassin's API, AWL adjusts the historical total score of the plain email | ||||
114 | address without IP (and deletes records bound to an IP), but since new records with | ||||
115 | an IP are added during reception, the blacklisted entry ceases to act during | ||||
116 | scanning. TxRep always uses the record of the plain email address without IP together | ||||
117 | with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight | ||||
118 | factor for the EMAIL reputation is set to zero). AWL uses the score of 100 (resp. -100) | ||||
119 | for the blacklisting (resp. whitelisting) purposes. TxRep increases the value | ||||
120 | proportionally to the weight factor of the EMAIL reputation. It is explained in detail | ||||
121 | in the section L</BLACKLISTING / WHITELISTING>. TxRep can blacklist or whitelist also | ||||
122 | IP addresses, domain names, and dotless HELO names. | ||||
123 | |||||
124 | 7. B<Sender Identification> - AWL identifies a sender on the basis of the email address | ||||
125 | used, and the originating IP address (more precisely, the part of it defined by the mask setting). | ||||
126 | The main purpose of this measure is to avoid assigning false good scores to spammers who | ||||
127 | spoof known email addresses. The disadvantage appears with senders who send from frequently | ||||
128 | changing locations, or who connect through dynamic IP addresses that are not | ||||
129 | within the block defined by the mask setting. Their score is difficult or sometimes | ||||
130 | impossible to track. Another disadvantage appears, for example, with a spammer persistently | ||||
131 | sending spam from the same IP address, just under different email addresses. AWL will not | ||||
132 | find his previous scores, unless he reuses the same email address again. TxRep uses several | ||||
133 | identifiers, and creates separate database entries for each of them. It tracks not only | ||||
134 | the email/IP address combination like AWL, but also the standalone email address (regardless | ||||
135 | of the originating IP), the standalone IP (regardless of email address used), the domain | ||||
136 | name of the email address, the DKIM signature, and the HELO name of the connecting PC. The | ||||
137 | influence of each individual identifier may be tuned with the help of weight factors | ||||
138 | described in the section L</REPUTATION WEIGHTS>. | ||||
139 | |||||
140 | 8. B<Message Tracking> - TxRep (optionally) keeps track of already scanned and/or learned | ||||
141 | message IDs. This avoids strengthening the reputation score by simply | ||||
142 | rescanning or relearning the same message multiple times. At the same time it also allows | ||||
143 | the proper relearning of once wrongly learned messages, or relearning them after the | ||||
144 | learn penalty or bonus were changed. See the option L</C<txrep_track_messages>>. | ||||
145 | |||||
146 | 9. B<User and Global Storages> - usually it is recommended to use the per-user setup | ||||
147 | of SpamAssassin, because each user may have quite different requirements, and may receive | ||||
148 | quite different sorts of email. Especially when using the Bayesian and AWL plugins, | ||||
149 | the efficiency is much better when SpamAssassin is trained on spam and ham separately | ||||
150 | for each user. However, the disadvantage is that senders and emails already learned | ||||
151 | many times by different users, will need to be relearned without any recognized history, | ||||
152 | any time they arrive at another user. TxRep uses the advantages of both systems. It can | ||||
153 | use dual storages: the global common storage, where all email processed by SpamAssassin | ||||
154 | is recorded, and a local storage separate for each user, with reputation data from his | ||||
155 | email only. See more details at the setting L</C<txrep_user2global_ratio>>. | ||||
156 | |||||
157 | 10. B<Outbound Whitelisting> - when a local user sends messages to an email address, we | ||||
158 | assume that he needs to see the eventual answer too, hence the recipient's address should | ||||
159 | be whitelisted. When SpamAssassin also scans outgoing email, local users send | ||||
160 | email through the SMTP server where SA is installed, and internal | ||||
161 | networks are defined, TxRep will improve the reputation of all 'To:' and 'CC' addresses | ||||
162 | from messages originating in the internal networks. Details can be found at the setting | ||||
163 | L</C<txrep_whitelist_out>>. | ||||
164 | |||||
165 | The two plugins (AWL and TxRep) cannot coexist. It is necessary to disable AWL to allow | ||||
166 | TxRep to run. TxRep reuses the database handling of the original AWL module, and some | ||||
167 | of its parameters bound to the database handler modules. By default, TxRep creates its own | ||||
168 | database, but the original auto-whitelist can be reused as a starting point. The AWL | ||||
169 | database can be renamed to the name defined in TxRep settings, and TxRep will start | ||||
170 | using it. The original auto-whitelist database has to be backed up, to allow switching | ||||
171 | back to the original state. | ||||
172 | |||||
173 | The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and | ||||
174 | spamassassin/AutoWhitelist.pm. Another two AWL files, spamassassin/DBBasedAddrList.pm | ||||
175 | and spamassassin/SQLBasedAddrList.pm are still needed. | ||||
176 | |||||
177 | |||||
178 | =head1 TEMPLATE TAGS | ||||
179 | |||||
180 | This plugin module adds the following C<tags> that can be used as | ||||
181 | placeholders in certain options. See L<Mail::SpamAssassin::Conf> | ||||
182 | for more information on TEMPLATE TAGS. | ||||
183 | |||||
184 | _TXREP_XXX_Y_ TXREP modifier | ||||
185 | _TXREP_XXX_Y_MEAN_ Mean score on which TXREP modification is based | ||||
186 | _TXREP_XXX_Y_COUNT_ Number of messages on which TXREP modification is based | ||||
187 | _TXREP_XXX_Y_PRESCORE_ Score before TXREP | ||||
188 | _TXREP_XXX_Y_UNKNOW_ New sender (not found in the TXREP list) | ||||
189 | |||||
190 | The XXX part of the tag takes the form of one of the following IDs, depending | ||||
191 | on the reputation checked: EMAIL, EMAIL_IP, IP, DOMAIN, or HELO. The _Y appendix | ||||
192 | ID is used only in the case of dual storage, and takes the form of either _U (for | ||||
193 | user storage reputations), or _G (for global storage reputations). | ||||
194 | |||||
195 | =cut # .................................................................... | ||||
196 | package Mail::SpamAssassin::Plugin::TxRep; | ||||
197 | |||||
198 | 2 | 72µs | 2 | 64µs | # spent 54µs (44+10) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@198 which was called:
# once (44µs+10µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 198 # spent 54µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@198
# spent 10µs making 1 call to strict::import |
199 | 2 | 70µs | 2 | 84µs | # spent 56µs (28+28) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@199 which was called:
# once (28µs+28µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 199 # spent 56µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@199
# spent 28µs making 1 call to warnings::import |
200 | 2 | 80µs | 2 | 38µs | # spent 32µs (26+6) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@200 which was called:
# once (26µs+6µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 200 # spent 32µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@200
# spent 6µs making 1 call to bytes::import |
201 | 2 | 80µs | 2 | 185µs | # spent 107µs (29+78) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@201 which was called:
# once (29µs+78µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 201 # spent 107µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@201
# spent 78µs making 1 call to re::import |
202 | |||||
203 | 3 | 138µs | 3 | 972µs | # spent 515µs (57+457) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 which was called:
# once (57µs+457µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 203 # spent 515µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@203
# spent 421µs making 1 call to NetAddr::IP::import
# spent 37µs making 1 call to version::_VERSION |
204 | 2 | 71µs | 1 | 18µs | # spent 18µs within Mail::SpamAssassin::Plugin::TxRep::BEGIN@204 which was called:
# once (18µs+0s) by Mail::SpamAssassin::PluginHandler::load_plugin at line 204 # spent 18µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@204 |
205 | 2 | 401µs | 1 | 42.3ms | # spent 42.3ms (34.2+8.06) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 which was called:
# once (34.2ms+8.06ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 205 # spent 42.3ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 |
206 | 2 | 67µs | 2 | 340µs | # spent 186µs (32+154) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@206 which was called:
# once (32µs+154µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 206 # spent 186µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@206
# spent 154µs making 1 call to Exporter::import |
207 | 2 | 70µs | 2 | 310µs | # spent 166µs (22+144) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 which was called:
# once (22µs+144µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 207 # spent 166µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@207
# spent 144µs making 1 call to Exporter::import |
208 | |||||
209 | 2 | 12.6ms | 2 | 197µs | # spent 110µs (23+87) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 which was called:
# once (23µs+87µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 209 # spent 110µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@209
# spent 87µs making 1 call to vars::import |
210 | 1 | 20µs | @ISA = qw(Mail::SpamAssassin::Plugin); | ||
211 | |||||
212 | |||||
213 | ########################################################################### | ||||
214 | sub new { # spent 1.32ms (112µs+1.21) within Mail::SpamAssassin::Plugin::TxRep::new which was called:
# once (112µs+1.21ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 1 of (eval 42)[Mail/SpamAssassin/PluginHandler.pm:129] | ||||
215 | ########################################################################### | ||||
216 | 1 | 3µs | my ($class, $main) = @_; | ||
217 | |||||
218 | 1 | 3µs | $class = ref($class) || $class; | ||
219 | 1 | 13µs | 1 | 32µs | my $self = $class->SUPER::new($main); # spent 32µs making 1 call to Mail::SpamAssassin::Plugin::new |
220 | 1 | 2µs | bless($self, $class); | ||
221 | |||||
222 | 1 | 10µs | $self->{main} = $main; | ||
223 | 1 | 5µs | $self->{conf} = $main->{conf}; | ||
224 | 1 | 5µs | $self->{factor} = $main->{conf}->{txrep_factor}; | ||
225 | 1 | 3µs | $self->{ipv4_mask_len} = $main->{conf}->{txrep_ipv4_mask_len}; | ||
226 | 1 | 3µs | $self->{ipv6_mask_len} = $main->{conf}->{txrep_ipv6_mask_len}; | ||
227 | 1 | 11µs | 1 | 34µs | $self->register_eval_rule("check_senders_reputation"); # spent 34µs making 1 call to Mail::SpamAssassin::Plugin::register_eval_rule |
228 | 1 | 9µs | 1 | 1.13ms | $self->set_config($main->{conf}); # spent 1.13ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::set_config |
229 | |||||
230 | # only the default conf loaded here, do nothing here requiring | ||||
231 | # the runtime settings | ||||
232 | 1 | 8µs | 1 | 11µs | dbg("TxRep: new object created"); # spent 11µs making 1 call to Mail::SpamAssassin::Logger::dbg |
233 | 1 | 10µs | return $self; | ||
234 | } | ||||
235 | |||||
236 | |||||
237 | ########################################################################### | ||||
238 | sub set_config { # spent 1.13ms (229µs+904µs) within Mail::SpamAssassin::Plugin::TxRep::set_config which was called:
# once (229µs+904µs) by Mail::SpamAssassin::Plugin::TxRep::new at line 228 | ||||
239 | ########################################################################### | ||||
240 | 1 | 2µs | my($self, $conf) = @_; | ||
241 | 1 | 2µs | my @cmds; | ||
242 | |||||
243 | # ------------------------------------------------------------------------- | ||||
244 | =head1 USER PREFERENCES | ||||
245 | |||||
246 | The following options can be used in both site-wide (C<local.cf>) and | ||||
247 | user-specific (C<user_prefs>) configuration files to customize how | ||||
248 | SpamAssassin handles incoming email messages. | ||||
249 | |||||
250 | =over 4 | ||||
251 | |||||
252 | =item B<use_txrep> | ||||
253 | |||||
254 | 0 | 1 (default: 0) | ||||
255 | |||||
256 | Whether to use the TxRep reputation system. TxRep tracks the long-term average | ||||
257 | score for each sender and then shifts the score of new messages toward that | ||||
258 | long-term average. This can increase or decrease the score for messages, | ||||
259 | depending on the long-term behavior of the particular correspondent. | ||||
260 | |||||
261 | Note that certain tests are ignored when determining the final message score: | ||||
262 | |||||
263 | - rules with tflags set to 'noautolearn' | ||||
264 | |||||
265 | =cut # ................................................................... | ||||
266 | 1 | 8µs | push (@cmds, { | ||
267 | setting => 'use_txrep', | ||||
268 | default => 0, | ||||
269 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
270 | }); | ||||
271 | |||||
272 | |||||
273 | # ------------------------------------------------------------------------- | ||||
274 | =item B<txrep_factor> | ||||
275 | |||||
276 | range [0..1] (default: 0.5) | ||||
277 | |||||
278 | How much towards the long-term mean for the sender to regress a message. | ||||
279 | Basically, the algorithm is to track the long-term total score and the count | ||||
280 | of messages for the sender (C<total> and C<count>), and then once we have | ||||
281 | otherwise fully calculated the score for this message (C<score>), we calculate | ||||
282 | the final score for the message as: | ||||
283 | |||||
284 | finalscore = score + factor * ((total + score)/(count + 1) - score) | ||||
285 | |||||
286 | So if C<factor> = 0.5, then we'll move to half way between the calculated | ||||
287 | score and the new mean value. If C<factor> = 0.3, then we'll move about 1/3 | ||||
288 | of the way from the score toward the mean. C<factor> = 1 means use the | ||||
289 | long-term mean including also the new unadjusted score; C<factor> = 0 mean | ||||
290 | just use the calculated score, disabling so the score averaging, though still | ||||
291 | recording the reputation to the database. | ||||
292 | |||||
293 | =cut # ................................................................... | ||||
294 | push (@cmds, { | ||||
295 | setting => 'txrep_factor', | ||||
296 | default => 0.5, | ||||
297 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
298 | code => sub { | ||||
299 | my ($self, $key, $value, $line) = @_; | ||||
300 | if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
301 | $self->{txrep_factor} = $value; | ||||
302 | } | ||||
303 | 1 | 15µs | }); | ||
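
The behaviour that the C<txrep_factor> description gives in prose (move a fraction C<factor> of the way from the calculated score toward the new mean) can be sketched in a few lines. This is only an illustration under that reading of the description, not the plugin's actual code; C<adjusted_score> and its arguments are hypothetical names introduced here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TxRep: moves $score toward the new mean
# (stored history plus the current message) by the fraction $factor.
sub adjusted_score {
    my ($score, $total, $count, $factor) = @_;
    my $new_mean = ($total + $score) / ($count + 1);
    return $score + $factor * ($new_mean - $score);
}

# One earlier message scored -5, the new message scores +10, factor 0.5.
printf "1 message of history:     %.2f\n", adjusted_score(10, -5, 1, 0.5);

# 1,000 earlier messages averaging -5 (total -5000), same new message.
printf "1000 messages of history: %.2f\n", adjusted_score(10, -5000, 1000, 0.5);
```

Because the count enters through the new mean, a long history pulls the result much further toward the stored reputation than a single earlier message does, which is the difference from AWL that the DESCRIPTION section points out.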
304 | |||||
305 | |||||
306 | # ------------------------------------------------------------------------- | ||||
307 | =item B<txrep_dilution_factor> | ||||
308 | |||||
309 | range [0.7..1.0] (default: 0.98) | ||||
310 | |||||
311 | With every new email from a given sender, the historical reputation records are "diluted", | ||||
312 | or "watered down" by a certain fraction given by this factor. This means that the | ||||
313 | influence of old records will progressively diminish with every new message from | ||||
314 | the given sender. This is important to allow a more flexible handling of changes in | ||||
315 | sender's behavior, or new improvements or changes of local SA rules. | ||||
316 | |||||
317 | Without any dilution expiry (dilution factor set to 1), the new message score is | ||||
318 | simply added to the total score of the given sender in the reputation database. When | ||||
319 | dilution is used (factor < 1), the impact of the historical reputation average is | ||||
320 | reduced by the factor before calculating the new average, which in turn is then | ||||
321 | used to adjust the new total score to be stored in the database. | ||||
322 | |||||
323 | newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1) | ||||
324 | |||||
325 | In other words, it means that the older a message is, the less and less impact | ||||
326 | on the new average its original spam score has. For example if we set the factor | ||||
327 | to 0.9 (meaning dilution by 10%), the score of the new message will be recorded | ||||
328 | to its 100%, the last score of the same sender to 90%, the second last to 81% | ||||
329 | (0.9 * 0.9 = 0.81), and for example the 10th last message just to 35%. | ||||
330 | |||||
331 | On stable systems, we recommend keeping the factor close to 1 (but still lower | ||||
332 | than 1). On systems where SA rule tuning and spam learning are still in progress, | ||||
333 | lower factors will help the reputation adapt more quickly to any modifications. At | ||||
334 | the same time, they will also reduce the impact of the historical reputation, | ||||
335 | though. | ||||
336 | |||||
337 | =cut # ................................................................... | ||||
338 | push (@cmds, { | ||||
339 | setting => 'txrep_dilution_factor', | ||||
340 | default => 0.98, | ||||
341 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
342 | code => sub { | ||||
343 | my ($self, $key, $value, $line) = @_; | ||||
344 | if ($value < 0.7 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
345 | $self->{txrep_dilution_factor} = $value; | ||||
346 | } | ||||
347 | 1 | 9µs | }); | ||
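
The C<newtotal> formula from the C<txrep_dilution_factor> description can be tried out with a small sketch. It implements the formula exactly as written in the POD and prints the per-message weights quoted there for a factor of 0.9; C<diluted_total> is a hypothetical helper name, and the sample numbers are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TxRep: applies the documented formula
#   newtotal = (oldcount + 1) * (newscore + dilution * oldtotal)
#              / (dilution * oldcount + 1)
sub diluted_total {
    my ($old_total, $old_count, $new_score, $dilution) = @_;
    return ($old_count + 1) * ($new_score + $dilution * $old_total)
         / ($dilution * $old_count + 1);
}

# Ten earlier messages with a total of -50, new message scoring +10.
# With dilution = 1 the new score is simply added to the old total.
printf "dilution 1.0: %.2f\n", diluted_total(-50, 10, 10, 1.0);

# With dilution = 0.9 the history counts for less, so the new score weighs more.
printf "dilution 0.9: %.2f\n", diluted_total(-50, 10, 10, 0.9);

# Weights quoted in the description for a factor of 0.9.
printf "%2d message(s) back: %.0f%%\n", $_, 100 * 0.9 ** $_ for (1, 2, 10);
```

Dividing the new total by C<oldcount + 1> gives the new mean, so a lower dilution factor shifts that mean toward the most recent score.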
348 | |||||
349 | |||||
350 | # TODO, not implemented yet, hence no advertising until then | ||||
351 | # ------------------------------------------------------------------------- | ||||
352 | #=item B<txrep_expiry_days> | ||||
353 | # | ||||
354 | # range [0..10000] (default: 365) | ||||
355 | # | ||||
356 | #The scores of messages can be removed from the total reputation, and the | ||||
357 | #message tracking entry removed from the database after given number of days. | ||||
358 | #It helps keeping the database growth under control, and it also reduces the | ||||
359 | #influence of old scores on the current reputation (both scoring methods, and | ||||
360 | #sender's behavior might have changed over time). | ||||
361 | # | ||||
362 | #=cut # ................................................................... | ||||
363 | push (@cmds, { | ||||
364 | setting => 'txrep_expiry_days', | ||||
365 | default => 365, | ||||
366 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
367 | code => sub { | ||||
368 | my ($self, $key, $value, $line) = @_; | ||||
369 | if ($value < 0 || $value > 10000) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
370 | $self->{txrep_expiry_days} = $value; | ||||
371 | } | ||||
372 | 1 | 8µs | }); | ||
373 | |||||
374 | |||||
375 | # ------------------------------------------------------------------------- | ||||
376 | =item B<txrep_learn_penalty> | ||||
377 | |||||
378 | range [0..200] (default: 20) | ||||
379 | |||||
380 | When SpamAssassin is trained on a SPAM message, the given penalty score will | ||||
381 | be added to the total reputation score of the sender, regardless of the real | ||||
382 | spam score. The more messages the sender already has in the TxRep database, | ||||
383 | the smaller the impact of the penalty will be. | ||||
384 | |||||
385 | =cut # ................................................................... | ||||
386 | push (@cmds, { | ||||
387 | setting => 'txrep_learn_penalty', | ||||
388 | default => 20, | ||||
389 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
390 | code => sub { | ||||
391 | my ($self, $key, $value, $line) = @_; | ||||
392 | if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
393 | $self->{txrep_learn_penalty} = $value; | ||||
394 | } | ||||
395 | 1 | 9µs | }); | ||
396 | |||||
397 | |||||
398 | # ------------------------------------------------------------------------- | ||||
399 | =item B<txrep_learn_bonus> | ||||
400 | |||||
401 | range [0..200] (default: 20) | ||||
402 | |||||
403 | When SpamAssassin is trained on a HAM message, the given bonus score will be | ||||
404 | deducted from the total reputation score of the sender, regardless of the real | ||||
405 | spam score. The more messages the sender already has in the TxRep database, | ||||
406 | the smaller the impact of the bonus. | ||||
407 | |||||
408 | =cut # ................................................................... | ||||
409 | push (@cmds, { | ||||
410 | setting => 'txrep_learn_bonus', | ||||
411 | default => 20, | ||||
412 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
413 | code => sub { | ||||
414 | my ($self, $key, $value, $line) = @_; | ||||
415 | if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
416 | $self->{txrep_learn_bonus} = $value; | ||||
417 | } | ||||
418 | 1 | 8µs | }); | ||
419 | |||||
420 | |||||
421 | # ------------------------------------------------------------------------- | ||||
422 | =item B<txrep_autolearn> | ||||
423 | |||||
424 | range [0..5] (default: 0) | ||||
425 | |||||
426 | When SpamAssassin declares a message clear spam or clear ham during the message | ||||
427 | scan and launches the auto-learn process, the sender reputation scores of the | ||||
428 | message will be adjusted by the value of the option L</C<txrep_learn_penalty>> | ||||
429 | or L</C<txrep_learn_bonus>>, respectively, in the same way as during manual learning. | ||||
430 | A value of 0 disables the auto-learn reputation adjustment; only the score | ||||
431 | calculated before the auto-learn will be stored in the reputation database. | ||||
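A configuration sketch for the three learning options described above. The values are illustrative assumptions. The text only states that 0 disables the auto-learn adjustment, so treating 1 as "enabled" is an assumption about how non-zero values behave.

```
# local.cf (illustrative values, not a recommendation)
# added to the sender's total reputation when a message is learned as spam
txrep_learn_penalty  20
# deducted from the sender's total reputation when a message is learned as ham
txrep_learn_bonus    20
# non-zero (assumed): apply the penalty or bonus during auto-learn as well
txrep_autolearn      1
```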
432 | |||||
433 | =cut # ................................................................... | ||||
434 | push (@cmds, { | ||||
435 | setting => 'txrep_autolearn', | ||||
436 | default => 0, | ||||
437 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
438 | code => sub { | ||||
439 | my ($self, $key, $value, $line) = @_; | ||||
440 | if ($value < 0 || $value > 5) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
441 | $self->{txrep_autolearn} = $value; | ||||
442 | } | ||||
443 | 1 | 10µs | }); | ||
444 | |||||
445 | |||||
446 | # ------------------------------------------------------------------------- | ||||
447 | =item B<txrep_track_messages> | ||||
448 | |||||
449 | 0 | 1 (default: 1) | ||||
450 | |||||
451 | Whether TxRep should keep track of already scanned and/or learned messages. | ||||
452 | When enabled, an additional record in the reputation database will be created | ||||
453 | to avoid false score adjustments due to repeated scanning of the same message, | ||||
454 | and to allow proper relearning of messages that were either previously wrongly | ||||
455 | learned, or need to be relearned after modifying the learn penalty or bonus. | ||||
456 | |||||
457 | =cut # ................................................................... | ||||
458 | 1 | 4µs | push (@cmds, { | ||
459 | setting => 'txrep_track_messages', | ||||
460 | default => 1, | ||||
461 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
462 | }); | ||||
463 | |||||
464 | |||||
465 | # ------------------------------------------------------------------------- | ||||
466 | =item B<txrep_whitelist_out> | ||||
467 | |||||
468 | range [0..200] (default: 10) | ||||
469 | |||||
470 | When the value of this setting is greater than zero, recipients of messages sent from | ||||
471 | within the internal networks will be whitelisted by improving their total reputation | ||||
472 | score by the number of points defined by this setting. Since the IP address and other | ||||
473 | sender identifiers are not known when sending the email, only the reputation of the | ||||
474 | standalone email address is whitelisted. The domain name is intentionally also left | ||||
475 | unaffected. Outbound whitelisting can only work when SpamAssassin is set up to scan | ||||
476 | outgoing email as well, when local users use the SMTP server for sending email, and when | ||||
477 | C<internal_networks> are defined in the SpamAssassin configuration. The reputation is | ||||
478 | improved with every message sent from internal networks, so the more messages are | ||||
479 | sent to a recipient, the better the reputation of their email address will be. | ||||
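A minimal configuration sketch for outbound whitelisting. It assumes a hypothetical internal network of 192.0.2.0/24 (a documentation address range) and that outgoing mail is actually passed through SpamAssassin.

```
# local.cf (illustrative values)
# hypothetical internal network; replace with your own ranges
internal_networks    192.0.2.0/24
# each message sent from the internal network improves the
# recipient address reputation by 10 points
txrep_whitelist_out  10
```

Setting `txrep_whitelist_out` to 0 turns the outbound whitelisting off, since the text requires a value greater than zero.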
480 | |||||
481 | |||||
482 | =cut # ................................................................... | ||||
483 | push (@cmds, { | ||||
484 | setting => 'txrep_whitelist_out', | ||||
485 | default => 10, | ||||
486 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
487 | # spent 32µs within Mail::SpamAssassin::Plugin::TxRep::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm:491] which was called:
# once (32µs+0s) by Mail::SpamAssassin::Conf::Parser::parse at line 438 of Mail/SpamAssassin/Conf/Parser.pm
code => sub { | ||||
488 | 1 | 6µs | my ($self, $key, $value, $line) = @_; | ||
489 | 1 | 4µs | if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||
490 | 1 | 15µs | $self->{txrep_whitelist_out} = $value; | ||
491 | } | ||||
492 | 1 | 9µs | }); | ||
493 | |||||
494 | |||||
495 | # ------------------------------------------------------------------------- | ||||
496 | =item B<txrep_ipv4_mask_len> | ||||
497 | |||||
498 | range [0..32] (default: 16) | ||||
499 | |||||
500 | The AWL database keeps only the specified number of most-significant bits | ||||
501 | of an IPv4 address in its fields, so that different individual IP addresses | ||||
502 | within a subnet belonging to the same owner are managed under a single | ||||
503 | database record. As we have no information available on the allocated | ||||
504 | address ranges of senders, this CIDR mask length is only an approximation. | ||||
505 | The default is 16 bits, corresponding to a former class B. Increase the | ||||
506 | number if a finer granularity is desired, e.g. to 24 (class C) or 32. | ||||
507 | A value 0 is allowed but is not particularly useful, as it would treat the | ||||
508 | whole internet as a single organization. The number need not be a multiple | ||||
509 | of 8; any split is allowed. | ||||
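To make the effect of the mask length concrete, here is a small configuration sketch. The addresses are hypothetical examples from documentation ranges.

```
# local.cf (illustrative value)
# 24 bits: 192.0.2.17 and 192.0.2.200 share one record (prefix 192.0.2.0/24),
# while 192.0.3.5 gets a separate one. With the default of 16 bits, all three
# addresses would fall under 192.0.0.0/16.
txrep_ipv4_mask_len  24
```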
510 | |||||
511 | =cut # ................................................................... | ||||
512 | push (@cmds, { | ||||
513 | setting => 'txrep_ipv4_mask_len', | ||||
514 | default => 16, | ||||
515 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
516 | code => sub { | ||||
517 | my ($self, $key, $value, $line) = @_; | ||||
518 | if (!defined $value || $value eq '') | ||||
519 | {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;} | ||||
520 | elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32) | ||||
521 | {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
522 | $self->{txrep_ipv4_mask_len} = $value; | ||||
523 | } | ||||
524 | 1 | 9µs | }); | ||
525 | |||||
526 | |||||
527 | # ------------------------------------------------------------------------- | ||||
528 | =item B<txrep_ipv6_mask_len> | ||||
529 | |||||
530 | range [0..128] (default: 48) | ||||
531 | |||||
532 | The AWL database keeps only the specified number of most-significant bits | ||||
533 | of an IPv6 address in its fields, so that different individual IP addresses | ||||
534 | within a subnet belonging to the same owner are managed under a single | ||||
535 | database record. As we have no information available on the allocated address | ||||
536 | ranges of senders, this CIDR mask length is only an approximation. The default | ||||
537 | is 48 bits, corresponding to an address range commonly allocated to individual | ||||
538 | (smaller) organizations. Increase the number for a finer granularity, e.g. | ||||
539 | to 64 or 96 or 128, or decrease for wider ranges, e.g. 32. A value 0 is | ||||
540 | allowed but is not particularly useful, as it would treat the whole internet | ||||
541 | as a single organization. The number need not be a multiple of 4; any split | ||||
542 | is allowed. | ||||
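The same idea for IPv6, as a sketch with hypothetical addresses from the documentation prefix 2001:db8::/32.

```
# local.cf (illustrative value)
# 64 bits: 2001:db8:1:2::10 and 2001:db8:1:2::20 share the prefix 2001:db8:1:2::/64,
# while 2001:db8:1:3::10 does not. With the default of 48 bits, all three
# addresses fall under 2001:db8:1::/48.
txrep_ipv6_mask_len  64
```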
543 | |||||
544 | =cut # ................................................................... | ||||
545 | push (@cmds, { | ||||
546 | setting => 'txrep_ipv6_mask_len', | ||||
547 | default => 48, | ||||
548 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
549 | code => sub { | ||||
550 | my ($self, $key, $value, $line) = @_; | ||||
551 | if (!defined $value || $value eq '') | ||||
552 | {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;} | ||||
553 | elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128) | ||||
554 | {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
555 | $self->{txrep_ipv6_mask_len} = $value; | ||||
556 | } | ||||
557 | 1 | 13µs | }); | ||
558 | |||||
559 | |||||
560 | # ------------------------------------------------------------------------- | ||||
561 | =item B<user_awl_sql_override_username> | ||||
562 | |||||
563 | string (default: undefined) | ||||
564 | |||||
565 | Used by the SQLBasedAddrList storage implementation. | ||||
566 | |||||
567 | If this option is set, the SQLBasedAddrList module will override the set | ||||
568 | username with the value given. This can be useful for implementing global | ||||
569 | or group-based TxRep databases. | ||||
570 | |||||
571 | =cut # ................................................................... | ||||
572 | 1 | 4µs | push (@cmds, { | ||
573 | setting => 'user_awl_sql_override_username', | ||||
574 | default => '', | ||||
575 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
576 | }); | ||||
577 | |||||
578 | |||||
579 | # ------------------------------------------------------------------------- | ||||
580 | =item B<txrep_user2global_ratio> | ||||
581 | |||||
582 | range [0..10] (default: 0) | ||||
583 | |||||
584 | When the option txrep_user2global_ratio is set to a value greater than zero, and | ||||
585 | if the server configuration allows it, two data storages will be used - user and | ||||
586 | global (server-wide) storages. | ||||
587 | |||||
588 | User storage keeps only senders who send messages to the respective recipient, | ||||
589 | and will also reflect the corrected/learned scores when some messages are marked | ||||
590 | by the user as spam or ham, or when the sender is whitelisted or blacklisted | ||||
591 | through the API of SpamAssassin. | ||||
592 | |||||
593 | Global storage keeps the reputation data of all messages processed by SpamAssassin | ||||
594 | with their spam scores and spam/ham learning data from all users on the server. | ||||
595 | Hence, the module will return a reputation value even for senders not known to the | ||||
596 | current recipient, as long as they have already sent email to anyone else on the server. | ||||
597 | |||||
598 | The value of the txrep_user2global_ratio parameter controls the impact of each | ||||
599 | of the two reputations. When equal to 1, both the global and the user score will | ||||
600 | have the same impact on the result. When set to 2, the reputation taken from | ||||
601 | the user storage will have twice the impact of the global value. The final value | ||||
602 | of the TXREP tag will be calculated as follows: | ||||
603 | |||||
604 | total = ( ratio * user + global ) / ( ratio + 1 ) | ||||
605 | |||||
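A worked example of the formula above, written as a configuration sketch. The two reputation values are hypothetical numbers chosen only to show the arithmetic.

```
# local.cf (illustrative value)
# the user reputation counts twice as much as the global one
txrep_user2global_ratio  2
# With a hypothetical user reputation of 3.0 and a global reputation of -1.5:
#   total = ( 2 * 3.0 + (-1.5) ) / ( 2 + 1 ) = 4.5 / 3 = 1.5
```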
606 | When no reputation is found in the user storage, and a global reputation is | ||||
607 | available, the global storage is used fully, without applying the ratio. | ||||
608 | |||||
609 | When the ratio is set to zero, only the default storage will be used. Whether | ||||
610 | that is the global or the local user storage then depends on your defaults, | ||||
611 | which in turn are controlled either by the parameter user_awl_sql_override_username | ||||
612 | (in case of SQL storage), or the C<auto_whitelist_path> parameter (in case of | ||||
613 | Berkeley database). | ||||
614 | |||||
615 | When this dual storage is enabled, and no global storage is defined by the | ||||
616 | above-mentioned parameters for the Berkeley or SQL databases, TxRep will attempt | ||||
617 | to use a generic storage: user 'GLOBAL' in case of SQL, and in the case of | ||||
618 | Berkeley database the path defined by '__local_state_dir__/tx-reputation', | ||||
619 | which typically renders into /var/db/spamassassin/tx-reputation. When the default | ||||
620 | storages are not available, or are not writable, you would have to set the global | ||||
621 | storage with the help of the C<user_awl_sql_override_username> or | ||||
622 | C<auto_whitelist_path> settings, respectively. | ||||
623 | |||||
624 | Please note that some SpamAssassin installations always run under the same user | ||||
625 | ID. In such a case it is pointless to enable the dual storage, because it would | ||||
626 | at most lead to two identical global storages in different locations. | ||||
627 | |||||
628 | This feature is disabled by default. | ||||
629 | =cut # ................................................................... | ||||
630 | push (@cmds, { | ||||
631 | setting => 'txrep_user2global_ratio', | ||||
632 | default => 0, | ||||
633 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING, | ||||
634 | code => sub { | ||||
635 | my ($self, $key, $value, $line) = @_; | ||||
636 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
637 | $self->{txrep_user2global_ratio} = $value; | ||||
638 | } | ||||
639 | 1 | 8µs | }); | ||
640 | |||||
641 | |||||
642 | # ------------------------------------------------------------------------- | ||||
643 | =item B<auto_whitelist_distinguish_signed> | ||||
644 | |||||
645 | (default: 1 - enabled) | ||||
646 | |||||
647 | Used by the SQLBasedAddrList storage implementation. | ||||
648 | |||||
649 | If this option is set, the SQLBasedAddrList module will keep separate | ||||
650 | database entries for DKIM-validated e-mail addresses and for non-validated | ||||
651 | ones. A prerequisite for setting this option is that a field awl.signedby | ||||
652 | exists in the SQL table, otherwise SQL operations will fail (which is why we | ||||
653 | need this option at all - for compatibility with pre-3.3.0 database schema). | ||||
654 | The DKIM plugin should also be enabled, as otherwise there is no benefit from | ||||
655 | turning on this option. | ||||
656 | |||||
657 | =cut # ................................................................... | ||||
658 | 1 | 4µs | push (@cmds, { | ||
659 | setting => 'auto_whitelist_distinguish_signed', | ||||
660 | default => 1, | ||||
661 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
662 | }); | ||||
663 | |||||
664 | |||||
665 | =item B<txrep_spf> | ||||
666 | |||||
667 | 0 | 1 (default: 1) | ||||
668 | |||||
669 | When enabled, TxRep will treat any IP address using a given email address as | ||||
670 | the same authorized identity, and will not associate any IP address with it. | ||||
671 | (The same happens with valid DKIM signatures. No option available for DKIM). | ||||
672 | |||||
673 | Note: for domains that define the useless SPF +all (pass all), no IP would | ||||
674 | ever be associated with the email address, and all addresses (incl. the forged | ||||
675 | ones) would be treated as coming from the authorized source. However, such | ||||
676 | domains are hopefully rare, and ask for this kind of treatment anyway. | ||||
677 | |||||
678 | =back | ||||
679 | |||||
680 | =cut # ................................................................... | ||||
681 | 1 | 4µs | push (@cmds, { | ||
682 | setting => 'txrep_spf', | ||||
683 | default => 1, | ||||
684 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
685 | }); | ||||
686 | |||||
687 | |||||
688 | # ------------------------------------------------------------------------- | ||||
689 | =head2 REPUTATION WEIGHTS | ||||
690 | |||||
691 | The overall reputation of the sender comprises several elements: | ||||
692 | |||||
693 | =over 4 | ||||
694 | |||||
695 | =item 1) The reputation of the 'From' email address bound to the originating IP | ||||
696 | address fraction (see the mask parameters for details) | ||||
697 | |||||
698 | =item 2) The reputation of the 'From' email address alone (regardless of the IP | ||||
699 | address being currently used) | ||||
700 | |||||
701 | =item 3) The reputation of the domain name of the 'From' email address | ||||
702 | |||||
703 | =item 4) The reputation of the originating IP address, regardless of sender's email address | ||||
704 | |||||
705 | =item 5) The reputation of the HELO name of the originating computer (if available) | ||||
706 | |||||
707 | =back | ||||
708 | |||||
709 | Each of these partial reputations is weighted with the help of these parameters, | ||||
710 | and the overall reputation is calculated as the weighted sum of the individual | ||||
711 | reputations divided by the sum of all their weights: | ||||
712 | |||||
713 | sender_reputation = ( weight_email * rep_email + | ||||
714 | weight_email_ip * rep_email_ip + | ||||
715 | weight_domain * rep_domain + | ||||
716 | weight_ip * rep_ip + | ||||
717 | weight_helo * rep_helo ) / sum_of_weights | ||||
718 | |||||
719 | You can disable the individual partial reputations by setting their respective | ||||
720 | weight to zero. This will also reduce the size of the database, since each | ||||
721 | partial reputation requires a separate entry in the database table. Disabling | ||||
722 | some of the partial reputations in this way may also help with the performance | ||||
723 | on busy servers, because the respective database lookups and processing will | ||||
724 | be skipped too. | ||||
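The weighting can be illustrated with a configuration sketch that keeps the documented defaults for four of the partial reputations and disables the HELO one. The partial reputation values in the comments are hypothetical and serve only to show the arithmetic of the weighted average described above.

```
# local.cf (defaults as documented, HELO reputation disabled)
txrep_weight_email     3
txrep_weight_email_ip  10
txrep_weight_domain    2
txrep_weight_ip        4
txrep_weight_helo      0
# Sum of weights: 3 + 10 + 2 + 4 + 0 = 19
# With hypothetical partial reputations
#   rep_email = 2.0, rep_email_ip = 4.0, rep_domain = 1.0, rep_ip = 0.0:
#   sender_reputation = ( 3*2.0 + 10*4.0 + 2*1.0 + 4*0.0 ) / 19
#                     = 48 / 19, which is about 2.53
```

Because the email_ip weight dominates, the result stays close to the email_ip reputation, which matches the stated intent of keeping that weight high.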
725 | |||||
726 | =over 4 | ||||
727 | |||||
728 | =item B<txrep_weight_email> | ||||
729 | |||||
730 | range [0..10] (default: 3) | ||||
731 | |||||
732 | This weight factor controls the influence of the reputation of the standalone | ||||
733 | email address, regardless of the originating IP address. When adjusting the | ||||
734 | weight, you need to keep in mind that an email address can be easily spoofed, | ||||
735 | and hence spammers can use 'from' email addresses belonging to senders with | ||||
736 | good reputation. From this point of view, the email address bound to the | ||||
737 | originating IP address is a more reliable indicator for the overall reputation. | ||||
738 | |||||
739 | On the other hand, some reputable senders may be sending from a large number | ||||
740 | of IP addresses, so looking at the reputation of the standalone email address | ||||
741 | without regard to the originating IP makes sense too. | ||||
742 | |||||
743 | We recommend using a relatively low value for this partial reputation. | ||||
744 | |||||
745 | =cut # ................................................................... | ||||
746 | push (@cmds, { | ||||
747 | setting => 'txrep_weight_email', | ||||
748 | default => 3, | ||||
749 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
750 | code => sub { | ||||
751 | my ($self, $key, $value, $line) = @_; | ||||
752 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
753 | $self->{txrep_weight_email} = $value; | ||||
754 | } | ||||
755 | 1 | 8µs | }); | ||
756 | |||||
757 | # ------------------------------------------------------------------------- | ||||
758 | =item B<txrep_weight_email_ip> | ||||
759 | |||||
760 | range [0..10] (default: 10) | ||||
761 | |||||
762 | This is the standard reputation used in the same way as it was by the original | ||||
763 | AWL plugin. Each sender's email address is bound to the originating IP, or | ||||
764 | its part as defined by the txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters. | ||||
765 | |||||
766 | For a user sending from multiple locations, diverse mail servers, or from a dynamic | ||||
767 | IP range extending beyond the masked block, their email address will have a separate | ||||
768 | reputation value for each of the different (partial) IP addresses. | ||||
769 | |||||
770 | When the option auto_whitelist_distinguish_signed is enabled, in contrast to | ||||
771 | the original AWL module, TxRep does not record the IP address when a DKIM | ||||
772 | signature is detected. The email address is then not bound to any IP address, but | ||||
773 | rather just to the DKIM signature, since it is considered that it authenticates | ||||
774 | the sender more reliably than the IP address (which can also vary). | ||||
775 | |||||
776 | This is by design the most relevant reputation, and its weight should be kept | ||||
777 | high. | ||||
778 | |||||
779 | =cut # ................................................................... | ||||
780 | push (@cmds, { | ||||
781 | setting => 'txrep_weight_email_ip', | ||||
782 | default => 10, | ||||
783 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
784 | code => sub { | ||||
785 | my ($self, $key, $value, $line) = @_; | ||||
786 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
787 | $self->{txrep_weight_email_ip} = $value; | ||||
788 | } | ||||
789 | 1 | 8µs | }); | ||
790 | |||||
791 | # ------------------------------------------------------------------------- | ||||
792 | =item B<txrep_weight_domain> | ||||
793 | |||||
794 | range [0..10] (default: 2) | ||||
795 | |||||
796 | Some spammers may always use their real domain name in the email address, | ||||
797 | just with multiple or changing local parts. This reputation will record the | ||||
798 | spam scores of all messages sent from the respective domain, regardless of | ||||
799 | the local part (user name) used. | ||||
800 | |||||
801 | As with the email_ip reputation, the domain reputation is also bound to the | ||||
802 | originating address (or a masked block, if mask parameters are used). | ||||
803 | This avoids giving a false reputation based on spoofed email addresses. | ||||
804 | |||||
805 | When a DKIM signature is detected, the signer of the signature is used instead | ||||
806 | of the domain name extracted from the email address. It is considered that | ||||
807 | the signing authority is responsible for sending email of any domain name, | ||||
808 | hence the same reputation applies here. | ||||
809 | |||||
810 | The domain reputation will give a relevant picture of the owner of the | ||||
811 | domain in the case of small servers, or corporations with strict policies, but | ||||
812 | will be less relevant for freemailers like Gmail, Hotmail, and similar, | ||||
813 | because both ham and spam may be sent by their users. | ||||
814 | |||||
815 | The default value is set relatively low. Higher weight values may be useful, | ||||
816 | but we recommend caution and observing the scores before increasing it. | ||||
817 | |||||
818 | =cut # ................................................................... | ||||
819 | push (@cmds, { | ||||
820 | setting => 'txrep_weight_domain', | ||||
821 | default => 2, | ||||
822 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
823 | code => sub { | ||||
824 | my ($self, $key, $value, $line) = @_; | ||||
825 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
826 | $self->{txrep_weight_domain} = $value; | ||||
827 | } | ||||
828 | 1 | 7µs | }); | ||
829 | |||||
830 | # ------------------------------------------------------------------------- | ||||
831 | =item B<txrep_weight_ip> | ||||
832 | |||||
833 | range [0..10] (default: 4) | ||||
834 | |||||
835 | Spammers can send through the same relay (incl. compromised hosts) under a | ||||
836 | multitude of email addresses. This is the exact case when the IP reputation | ||||
837 | can help. This reputation is a kind of a local RBL. | ||||
838 | |||||
839 | The weight is set by default lower than for the email_IP reputation, because | ||||
840 | there may be cases when the same IP address hosts both spammers and acceptable | ||||
841 | senders (for example the marketing department of a company sends you spam, but | ||||
842 | you still need to get messages from their billing address). | ||||
843 | |||||
844 | =cut # ................................................................... | ||||
845 | push (@cmds, { | ||||
846 | setting => 'txrep_weight_ip', | ||||
847 | default => 4, | ||||
848 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
849 | code => sub { | ||||
850 | my ($self, $key, $value, $line) = @_; | ||||
851 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
852 | $self->{txrep_weight_ip} = $value; | ||||
853 | } | ||||
854 | 1 | 8µs | }); | ||
855 | |||||
856 | # ------------------------------------------------------------------------- | ||||
857 | =item B<txrep_weight_helo> | ||||
858 | |||||
859 | range [0..10] (default: 0.5) | ||||
860 | |||||
861 | A large number of spam messages come from compromised hosts, often personal computers | ||||
862 | or set-top boxes. Their NetBIOS names are usually used as the HELO name when connecting | ||||
863 | to your mail server. Some of the names are pretty generic and hence may be shared by | ||||
864 | a large number of hosts, but often the names are quite unique and may be a good | ||||
865 | indicator for detecting a spammer, even though they use different email and IP | ||||
866 | addresses (spam can also come from portable devices). | ||||
867 | |||||
868 | No IP address is bound to the HELO name when stored in the reputation database. | ||||
869 | This is intentional, despite the possibility that numerous devices may share | ||||
870 | some of the HELO names. | ||||
871 | |||||
872 | This option is still considered experimental, hence the low weight value, but after | ||||
873 | some testing it could likely be increased at least slightly. | ||||
874 | |||||
875 | =cut # ................................................................... | ||||
876 | push (@cmds, { | ||||
877 | setting => 'txrep_weight_helo', | ||||
878 | default => 0.5, | ||||
879 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
880 | code => sub { | ||||
881 | my ($self, $key, $value, $line) = @_; | ||||
882 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
883 | $self->{txrep_weight_helo} = $value; | ||||
884 | } | ||||
885 | 1 | 7µs | }); | ||
886 | |||||
887 | |||||
888 | # ------------------------------------------------------------------------- | ||||
889 | =back | ||||
890 | |||||
891 | =head1 ADMINISTRATOR SETTINGS | ||||
892 | |||||
893 | These settings differ from the ones above, in that they are considered 'more | ||||
894 | privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section. | ||||
895 | No matter what C<allow_user_rules> is set to, these can never be set from a | ||||
896 | user's C<user_prefs> file. | ||||
897 | |||||
898 | =over 4 | ||||
899 | |||||
900 | =item B<txrep_factory module> | ||||
901 | |||||
902 | (default: Mail::SpamAssassin::DBBasedAddrList) | ||||
903 | |||||
904 | Select an alternative database factory module for the TxRep database. | ||||
905 | |||||
906 | =cut # ................................................................... | ||||
907 | 1 | 5µs | push (@cmds, { | ||
908 | setting => 'txrep_factory', | ||||
909 | is_admin => 1, | ||||
910 | default => 'Mail::SpamAssassin::DBBasedAddrList', | ||||
911 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
912 | }); | ||||
913 | |||||
914 | |||||
915 | # ------------------------------------------------------------------------- | ||||
916 | =item B<auto_whitelist_path /path/filename> | ||||
917 | |||||
918 | (default: ~/.spamassassin/tx-reputation) | ||||
919 | |||||
920 | This is the TxRep directory and filename. By default, each user | ||||
921 | has their own reputation database in their C<~/.spamassassin> directory with | ||||
922 | mode 0700. For system-wide SpamAssassin use, you may want to share this | ||||
923 | across all users. | ||||
924 | |||||
925 | =cut # ................................................................... | ||||
926 | push (@cmds, { | ||||
927 | setting => 'auto_whitelist_path', | ||||
928 | is_admin => 1, | ||||
929 | default => '__userstate__/tx-reputation', | ||||
930 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING, | ||||
931 | code => sub { | ||||
932 | my ($self, $key, $value, $line) = @_; | ||||
933 | unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;} | ||||
934 | if (-d $value) {return $Mail::SpamAssassin::Conf::INVALID_VALUE; } | ||||
935 | $self->{txrep_path} = $value; | ||||
936 | } | ||||
937 | 1 | 9µs | }); | ||
938 | |||||
939 | |||||
940 | # ------------------------------------------------------------------------- | ||||
941 | =item B<auto_whitelist_db_modules Module ...> | ||||
942 | |||||
943 | (default: see below) | ||||
944 | |||||
945 | Which database modules should be used for the TxRep storage database | ||||
946 | file. The first named module that can be loaded from the Perl include path | ||||
947 | will be used. The format is: | ||||
948 | |||||
949 | PreferredModuleName SecondBest ThirdBest ... | ||||
950 | |||||
951 | i.e. a space-separated list of Perl module names. The default is: | ||||
952 | |||||
953 | DB_File GDBM_File SDBM_File | ||||
954 | |||||
955 | NDBM_File is not supported (see SpamAssassin bug 4353). | ||||
956 | |||||
957 | =cut # ................................................................... | ||||
958 | 1 | 5µs | push (@cmds, { | ||
959 | setting => 'auto_whitelist_db_modules', | ||||
960 | is_admin => 1, | ||||
961 | default => 'DB_File GDBM_File SDBM_File', | ||||
962 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
963 | }); | ||||
964 | |||||
965 | |||||
966 | # ------------------------------------------------------------------------- | ||||
967 | =item B<auto_whitelist_file_mode> | ||||
968 | |||||
969 | (default: 0700) | ||||
970 | |||||
971 | The file mode bits used for the TxRep directory or file. | ||||
972 | |||||
973 | Make sure you specify this with the 'x' mode bits set, as it may also be used | ||||
974 | to create directories. However, if a file is created, the resulting file will | ||||
975 | not have any execute bits set (the umask is set to 0111). | ||||
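A configuration sketch for a database shared across users, combining the path and file mode settings. The path and the group-shared mode are hypothetical choices, not defaults.

```
# local.cf (hypothetical shared location and mode)
auto_whitelist_path       /var/spool/spamassassin/tx-reputation
# 'x' bits are included because the mode may also be used when creating
# directories; a created file will not carry execute bits
auto_whitelist_file_mode  0770
```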
976 | |||||
977 | =cut # ................................................................... | ||||
978 | push (@cmds, { | ||||
979 | setting => 'auto_whitelist_file_mode', | ||||
980 | is_admin => 1, | ||||
981 | default => '0700', | ||||
982 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
983 | code => sub { | ||||
984 | my ($self, $key, $value, $line) = @_; | ||||
985 | if ($value !~ /^0?[0-7]{3}$/) { | ||||
986 | return $Mail::SpamAssassin::Conf::INVALID_VALUE; | ||||
987 | } | ||||
988 | $self->{txrep_file_mode} = untaint_var($value); | ||||
989 | } | ||||
990 | 1 | 8µs | }); | ||
991 | |||||
992 | |||||
993 | # ------------------------------------------------------------------------- | ||||
994 | =item B<user_awl_dsn DBI:databasetype:databasename:hostname:port> | ||||
995 | |||||
996 | Used by the SQLBasedAddrList storage implementation. | ||||
997 | |||||
998 | This will set the DSN used to connect. Example: | ||||
999 | C<DBI:mysql:spamassassin:localhost> | ||||
1000 | |||||
1001 | =cut # ................................................................... | ||||
1002 | 1 | 4µs | push (@cmds, { | ||
1003 | setting => 'user_awl_dsn', | ||||
1004 | is_admin => 1, | ||||
1005 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1006 | }); | ||||
1007 | |||||
1008 | |||||
1009 | # ------------------------------------------------------------------------- | ||||
1010 | =item B<user_awl_sql_username username> | ||||
1011 | |||||
1012 | Used by the SQLBasedAddrList storage implementation. | ||||
1013 | |||||
1014 | The authorized username to connect to the above DSN. | ||||
1015 | |||||
1016 | =cut # ................................................................... | ||||
1017 | 1 | 4µs | push (@cmds, { | ||
1018 | setting => 'user_awl_sql_username', | ||||
1019 | is_admin => 1, | ||||
1020 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1021 | }); | ||||
1022 | |||||
1023 | |||||
1024 | # ------------------------------------------------------------------------- | ||||
1025 | =item B<user_awl_sql_password password> | ||||
1026 | |||||
1027 | Used by the SQLBasedAddrList storage implementation. | ||||
1028 | |||||
1029 | The password for the database username, for the above DSN. | ||||
1030 | |||||
1031 | =cut # ................................................................... | ||||
1032 | 1 | 3µs | push (@cmds, { | ||
1033 | setting => 'user_awl_sql_password', | ||||
1034 | is_admin => 1, | ||||
1035 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1036 | }); | ||||
1037 | |||||
1038 | |||||
1039 | # ------------------------------------------------------------------------- | ||||
1040 | =item B<user_awl_sql_table tablename> | ||||
1041 | |||||
1042 | (default: txrep) | ||||
1043 | |||||
1044 | Used by the SQLBasedAddrList storage implementation. | ||||
1045 | |||||
1046 | The name of the table in which the reputation is stored, for the above DSN. | ||||
1047 | |||||
1048 | =back | ||||
1049 | |||||
1050 | =cut # ................................................................... | ||||
1051 | 1 | 13µs | push (@cmds, { | ||
1052 | setting => 'user_awl_sql_table', | ||||
1053 | is_admin => 1, | ||||
1054 | default => 'txrep', | ||||
1055 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1056 | }); | ||||
1057 | |||||
1058 | 1 | 25µs | 1 | 904µs | $conf->{parser}->register_commands(\@cmds); # spent 904µs making 1 call to Mail::SpamAssassin::Conf::Parser::register_commands |
1059 | } | ||||
1060 | |||||
1061 | |||||
1062 | ########################################################################### | ||||
1063 | sub _message { | ||||
1064 | ########################################################################### | ||||
1065 | my ($self, $value, $msg) = @_; | ||||
1066 | print "SpamAssassin TxRep: $value\n" if ($msg); | ||||
1067 | dbg("TxRep: $value"); | ||||
1068 | } | ||||
1069 | |||||
1070 | |||||
1071 | ########################################################################### | ||||
1072 | sub _fail_exit { | ||||
1073 | ########################################################################### | ||||
1074 | my ($self, $err) = @_; | ||||
1075 | my $eval_stat = ($err ne '') ? $err : "errno=$!"; | ||||
1076 | chomp $eval_stat; | ||||
1077 | warn("TxRep: open of TxRep file failed: $eval_stat\n"); | ||||
1078 | if (!defined $self->{txKeepStoreTied}) {$self->finish();} | ||||
1079 | return 0; | ||||
1080 | } | ||||
1081 | |||||
1082 | |||||
1083 | ########################################################################### | ||||
1084 | sub _fn_envelope { | ||||
1085 | ########################################################################### | ||||
1086 | my ($self, $args, $value, $msg) = @_; | ||||
1087 | |||||
1088 | unless ($self->{main}->{conf}->{use_txrep}){ return 0;} | ||||
1089 | unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg); return 0;} | ||||
1090 | |||||
1091 | my $factor = $self->{conf}->{txrep_weight_email} + | ||||
1092 | $self->{conf}->{txrep_weight_email_ip} + | ||||
1093 | $self->{conf}->{txrep_weight_domain} + | ||||
1094 | $self->{conf}->{txrep_weight_ip} + | ||||
1095 | $self->{conf}->{txrep_weight_helo}; | ||||
1096 | my $sign = $args->{signedby}; | ||||
1097 | my $id = $args->{address}; | ||||
1098 | if ($args->{address} =~ /,/) { | ||||
1099 | $sign = $args->{address}; | ||||
1100 | $sign =~ s/^.*,//g; | ||||
1101 | $id =~ s/,.*$//g; | ||||
1102 | } | ||||
1103 | |||||
1104 | # simplified regex used for IP detection (possible FP at a domain is not critical) | ||||
1105 | if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo}) | ||||
1106 | {$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';} | ||||
1107 | elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip}) | ||||
1108 | {$factor /= $self->{conf}->{txrep_weight_ip};} | ||||
1109 | elsif ($id =~ /@/ && $self->{conf}->{txrep_weight_email}) | ||||
1110 | {$factor /= $self->{conf}->{txrep_weight_email};} | ||||
1111 | elsif ($id !~ /@/ && $self->{conf}->{txrep_weight_domain}) | ||||
1112 | {$factor /= $self->{conf}->{txrep_weight_domain};} | ||||
1113 | else {$factor = 1;} | ||||
1114 | |||||
1115 | $self->open_storages(); | ||||
1116 | my $score = (!defined $value)? undef : $factor * $value; | ||||
1117 | my $status = $self->modify_reputation($id, $score, $sign); | ||||
1118 | dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || ''); | ||||
1119 | eval { | ||||
1120 | $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id); | ||||
1121 | if (!defined $self->{txKeepStoreTied}) {$self->finish();} | ||||
1122 | 1; | ||||
1123 | } or return $self->_fail_exit( $@ ); | ||||
1124 | return $status; | ||||
1125 | } | ||||
1126 | |||||
- - | |||||
1129 | # ------------------------------------------------------------------------- | ||||
1130 | =head1 BLACKLISTING / WHITELISTING | ||||
1131 | |||||
1132 | When asked by SpamAssassin to blacklist or whitelist a user, the TxRep | ||||
1133 | plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting) | ||||
1134 | to the given sender's email address. For a plain address without any IP | ||||
1135 | address, the value is multiplied by the ratio of total reputation | ||||
1136 | weight to the EMAIL reputation weight to account for the reduced impact | ||||
1137 | of the standalone EMAIL reputation when calculating the overall reputation. | ||||
1138 | |||||
1139 | total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo | ||||
1140 | blacklisted_reputation = 100 * total_weight / weight_email | ||||
1141 | |||||
1142 | When a standalone email address is blacklisted/whitelisted, all records | ||||
1143 | of the email address bound to an IP address, DKIM signature, or an SPF pass | ||||
1144 | will be removed from the database, and only the standalone record is kept. | ||||
1145 | |||||
1146 | Besides blacklisting/whitelisting of standalone email addresses, the same | ||||
1147 | method may also be used for blacklisting/whitelisting of IP addresses, | ||||
1148 | domain names, and HELO names (only dotless NetBIOS HELO names can be used). | ||||
1149 | |||||
1150 | When whitelisting/blacklisting an email address or domain name, you can | ||||
1151 | bind them to a specified DKIM signature or SPF record by appending the | ||||
1152 | DKIM signing domain or the tag 'spf' after the ID in the following way: | ||||
1153 | |||||
1154 | spamassassin --add-addr-to-blacklist=spamming.biz,spf | ||||
1155 | spamassassin --add-addr-to-whitelist=friend@good.org,good.org | ||||
1156 | |||||
1157 | When a message contains both a DKIM signature and an SPF pass, the DKIM | ||||
1158 | signature takes priority, so the record bound to the 'spf' tag won't | ||||
1159 | be checked. Only email addresses and domains can be bound to DKIM or SPF. | ||||
1160 | Records of IP addresses and HELO names are always without DKIM/SPF. | ||||
1161 | |||||
1162 | In case of dual storage, the black/whitelisting is performed only in the | ||||
1163 | default storage. | ||||
1164 | |||||
1165 | =cut | ||||
1166 | ######################################################## plugin hooks ##### | ||||
1167 | sub blacklist_address {my $self=shift; return $self->_fn_envelope(@_, 100, "blacklisting address");} | ||||
1168 | sub whitelist_address {my $self=shift; return $self->_fn_envelope(@_, -100, "whitelisting address");} | ||||
1169 | sub remove_address {my $self=shift; return $self->_fn_envelope(@_,undef, "removing address");} | ||||
1170 | ########################################################################### | ||||
1171 | |||||
1172 | |||||
1173 | # ------------------------------------------------------------------------- | ||||
1174 | =head1 REPUTATION LOGICS | ||||
1175 | |||||
1176 | 1. As in AWL, the most significant sender identifier is the combination | ||||
1177 | of the email address and the originating IP address, or the part of it | ||||
1178 | defined by the IPv4 or IPv6 mask setting. | ||||
1179 | |||||
1180 | 2. No IP checking for standalone EMAIL address reputation | ||||
1181 | |||||
1182 | 3. No signature checking for IP reputation, and for HELO name reputation | ||||
1183 | |||||
1184 | 4. The EMAIL_IP weight, and not the standalone EMAIL weight is used when | ||||
1185 | no IP address is available (EMAIL_IP is the main indicator, and has | ||||
1186 | the highest weight) | ||||
1187 | |||||
1188 | 5. No IP checking for signed emails (signature authenticates the email | ||||
1189 | instead of the IP address) | ||||
1190 | |||||
1191 | 6. No IP checking on SPF pass (we assume the domain owner is responsible | ||||
1192 | for all IPs they authorize to send from, hence we use the same identity | ||||
1193 | for all of them) | ||||
1194 | |||||
1195 | 7. No signature used for standalone EMAIL reputation (would be redundant, | ||||
1196 | since no IP is used for signed EMAIL_IP reputation, and we would store | ||||
1197 | two identical hits) | ||||
1198 | |||||
1199 | 8. When available, the DKIM signer is used instead of the domain name for | ||||
1200 | the DOMAIN reputation | ||||
1201 | |||||
1202 | 9. No IP and no signature used for HELO reputation (even though multiple | ||||
1203 | computers may exist with the same HELO) | ||||
1204 | |||||
1205 | 10. The full (unmasked) IP address is used (in the address field instead of | ||||
1206 | the IP field) for the standalone IP reputation | ||||
1207 | |||||
1208 | =cut | ||||
1209 | ########################################################################### | ||||
1210 | # spent 620s (277ms+620) within Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 321 times, avg 1.93s/call:
# 234 times (209ms+620s) by Mail::SpamAssassin::Plugin::TxRep::learn_message at line 1782, avg 2.65s/call
# 87 times (67.6ms+-67.6ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message at line 1797, avg 0s/call | ||||
1211 | ########################################################################### | ||||
1212 | 321 | 694µs | my ($self, $pms) = @_; | ||
1213 | |||||
1214 | # just for the development debugging | ||||
1215 | # use Data::Printer; | ||||
1216 | # dbg("TxRep: DEBUG DUMP of pms: %s, %s", $pms, p($pms)); | ||||
1217 | |||||
1218 | 321 | 1.17ms | my $autolearn = defined $self->{autolearn}; | ||
1219 | 321 | 1.07ms | $self->{last_pms} = $self->{autolearn} = undef; | ||
1220 | |||||
1221 | 321 | 909µs | return 0 unless ($self->{conf}->{use_txrep}); | ||
1222 | 321 | 1.23ms | if ($self->{conf}->{use_auto_whitelist}) { | ||
1223 | warn("TxRep: cannot run when Auto-Whitelist is enabled. Please disable it!\n"); | ||||
1224 | return 0; | ||||
1225 | } | ||||
1226 | 321 | 594µs | if ($autolearn && !$self->{conf}->{txrep_autolearn}) { | ||
1227 | dbg("TxRep: autolearning disabled, no more reputation adjusting, quitting"); | ||||
1228 | return 0; | ||||
1229 | } | ||||
1230 | 321 | 3.90ms | 321 | 239ms | my @from = $pms->all_from_addrs(); # spent 239ms making 321 calls to Mail::SpamAssassin::PerMsgStatus::all_from_addrs, avg 746µs/call |
1231 | 321 | 938µs | if (@from && $from[0] eq 'ignore@compiling.spamassassin.taint.org') { | ||
1232 | dbg("TxRep: no scan in lint mode, quitting"); | ||||
1233 | return 0; | ||||
1234 | } | ||||
1235 | |||||
1236 | 321 | 944µs | my $delta = 0; | ||
1237 | 321 | 3.23ms | 321 | 2.79ms | my $timer = $self->{main}->time_method("total_txrep"); # spent 2.79ms making 321 calls to Mail::SpamAssassin::time_method, avg 9µs/call |
1238 | 321 | 1.04ms | my $msgscore = (defined $self->{learning})? $self->{learning} : $pms->get_autolearn_points(); | ||
1239 | 321 | 3.92ms | 321 | 1.79s | my $date = $pms->{msg}->receive_date() || $pms->{date_header_time}; # spent 1.79s making 321 calls to Mail::SpamAssassin::Message::receive_date, avg 5.59ms/call |
1240 | my $msg_id = $self->{msgid} || | ||||
1241 | 321 | 5.21ms | 321 | 170ms | Mail::SpamAssassin::Plugin::Bayes->get_msgid($pms->{msg}) || # spent 170ms making 321 calls to Mail::SpamAssassin::Plugin::Bayes::get_msgid, avg 528µs/call |
1242 | $pms->get('Message-Id') || $pms->get('Message-ID') || $pms->get('MESSAGE-ID') || $pms->get('MESSAGEID'); | ||||
1243 | |||||
1244 | 321 | 4.00ms | 321 | 8.15ms | my $from = lc($pms->get('From:addr') || $pms->get('EnvelopeFrom:addr')); # spent 8.15ms making 321 calls to Mail::SpamAssassin::PerMsgStatus::get, avg 25µs/call |
1245 | 321 | 14.8ms | 321 | 1.57ms | return 0 unless $from =~ /\S/; # spent 1.57ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call |
1246 | 321 | 1.28ms | my $domain = $from; | ||
1247 | 321 | 4.81ms | 321 | 2.33ms | $domain =~ s/^.+@//; # spent 2.33ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call |
1248 | |||||
1249 | 321 | 962µs | my ($origip, $helo); | ||
1250 | 321 | 1.64ms | if (defined $pms->{relays_trusted} || defined $pms->{relays_untrusted}) { | ||
1251 | 642 | 2.53ms | my $trusteds = @{$pms->{relays_trusted}}; | ||
1252 | 963 | 4.92ms | foreach my $rly ( @{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}} ) { | ||
1253 | # Get the last found HELO, regardless of private/public or trusted/untrusted | ||||
1254 | # Avoiding a redundant duplicate entry if HELO is equal/similar to another identifier | ||||
1255 | 1426 | 151ms | 7928 | 69.4ms | if (defined $rly->{helo} && $rly->{helo} !~ /^\[?$rly->{ip}\]?$/ && $rly->{helo} !~ /$domain/i && $rly->{helo} !~ /$from/i ) { # spent 54.0ms making 3964 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp, avg 14µs/call
# spent 15.3ms making 3964 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 4µs/call |
1256 | 1112 | 3.77ms | $helo = $rly->{helo}; | ||
1257 | } | ||||
1258 | # use only trusted ID, but use the first untrusted IP (if available) (AWL bug 6908) | ||||
1259 | # at low spam scores (<2) ignore trusted/untrusted | ||||
1260 | # set IP to 127.0.0.1 for any internal IP, so that it can be distinguished from none (AWL bug 6357) | ||||
1261 | 1426 | 4.65ms | if ((--$trusteds >= 0 || $msgscore<2) && !$msg_id && $rly->{id}) {$msg_id = $rly->{id};} | ||
1262 | 1746 | 8.53ms | if (($trusteds >= -1 || $msgscore<2) && !$rly->{ip_private} && $rly->{ip}) {$origip = $rly->{ip};} | ||
1263 | 1747 | 12.1ms | if ( $trusteds >= 0 && !$origip && $rly->{ip_private} && $rly->{ip}) {$origip = '127.0.0.1';} | ||
1264 | } | ||||
1265 | } | ||||
1266 | |||||
1267 | 321 | 2.05ms | if ($self->{conf}->{txrep_track_messages}) { | ||
1268 | 321 | 1.30ms | if ($msg_id) { | ||
1269 | 321 | 3.90ms | 321 | 316s | my $msg_rep = $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, undef); # spent 316s making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 985ms/call |
1270 | 321 | 3.64ms | 321 | 3.48ms | if (defined $msg_rep && $self->count()) { # spent 3.48ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 11µs/call |
1271 | 174 | 1.26ms | if (defined $self->{learning} && !defined $self->{forgetting}) { | ||
1272 | # already learned, forget only if already learned (count>1), and relearn | ||||
1273 | # when only scanned (count=1), go ahead with normal rep scan | ||||
1274 | 87 | 824µs | 87 | 852µs | if ($self->count() > 1) { # spent 852µs making 87 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call |
1275 | 87 | 231µs | $self->{last_pms} = $pms; # cache the pmstatus | ||
1276 | 87 | 907µs | 87 | 613s | $self->forget_message($pms->{msg},$msg_id); # sub reentrance OK # spent 613s making 87 calls to Mail::SpamAssassin::Plugin::TxRep::forget_message, avg 7.05s/call |
1277 | } | ||||
1278 | } elsif ($self->{forgetting}) { | ||||
1279 | 87 | 271µs | $msgscore = $msg_rep; # forget the old stored score instead of the one got now | ||
1280 | 87 | 1.08ms | 87 | 1.23ms | dbg("TxRep: forgetting stored score %0.3f of message %s", $msgscore || 'undef', $msg_id); # spent 1.23ms making 87 calls to Mail::SpamAssassin::Logger::dbg, avg 14µs/call |
1281 | } else { | ||||
1282 | # calculating the delta from the stored message reputation | ||||
1283 | $delta = ($msgscore + $self->{conf}->{txrep_factor}*$msg_rep) / (1+$self->{conf}->{txrep_factor}) - $msgscore; | ||||
1284 | if ($delta != 0) { | ||||
1285 | $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta)); | ||||
1286 | } | ||||
1287 | dbg("TxRep: message %s already scanned, using old data; post-TxRep score: %0.3f", $msg_id, $pms->{score} || 'undef'); | ||||
1288 | return 0; | ||||
1289 | } | ||||
1290 | } # no stored reputation found, go ahead with normal rep scan | ||||
1291 | } else {dbg("TxRep: no message-id available, parsing forced");} | ||||
1292 | } # else no message tracking, go ahead with normal rep scan | ||||
1293 | |||||
1294 | # whitelist the recipients of messages sent from internal networks, only after checking MSG_ID | ||||
1295 | 321 | 4.51ms | if ( $self->{conf}->{txrep_whitelist_out} && | ||
1296 | 321 | 743µs | defined $pms->{relays_internal} && @{$pms->{relays_internal}} && | ||
1297 | 321 | 726µs | (!defined $pms->{relays_external} || !@{$pms->{relays_external}}) | ||
1298 | ) { | ||||
1299 | 1 | 14µs | 1 | 3.61ms | foreach my $rcpt ($pms->all_to_addrs()) { # spent 3.61ms making 1 call to Mail::SpamAssassin::PerMsgStatus::all_to_addrs |
1300 | 1 | 9µs | if ($rcpt) { | ||
1301 | 1 | 10µs | 1 | 7µs | dbg("TxRep: internal sender, whitelisting recipient: $rcpt"); # spent 7µs making 1 call to Mail::SpamAssassin::Logger::dbg |
1302 | 1 | 14µs | 1 | 3.21s | $self->modify_reputation($rcpt, -1*$self->{conf}->{txrep_whitelist_out}, undef); # spent 3.21s making 1 call to Mail::SpamAssassin::Plugin::TxRep::modify_reputation |
1303 | } | ||||
1304 | } | ||||
1305 | } | ||||
1306 | |||||
1307 | 321 | 4.28ms | 321 | 12.5ms | my $signedby = ($self->{conf}->{auto_whitelist_distinguish_signed})? $pms->get_tag('DKIMDOMAIN') : undef; # spent 12.5ms making 321 calls to Mail::SpamAssassin::PerMsgStatus::get_tag, avg 39µs/call |
1308 | dbg("TxRep: active, %s pre-score: %s, autolearn score: %s, IP: %s, address: %s %s", | ||||
1309 | $msg_id || '', | ||||
1310 | 321 | 4.17ms | 321 | 2.87ms | $pms->{score} || '?', # spent 2.87ms making 321 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call |
1311 | $msgscore || '?', | ||||
1312 | $origip || '?', | ||||
1313 | $from || '?', | ||||
1314 | $signedby ? "signed by $signedby" : '(unsigned)' | ||||
1315 | ); | ||||
1316 | |||||
1317 | 321 | 988µs | my $ip = $origip; | ||
1318 | 321 | 1.07ms | if ($signedby) { | ||
1319 | $ip = undef; | ||||
1320 | $domain = $signedby; | ||||
1321 | } elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) { | ||||
1322 | $ip = undef; | ||||
1323 | $signedby = 'spf'; | ||||
1324 | } | ||||
1325 | |||||
1326 | 321 | 760µs | my $totalweight = 0; | ||
1327 | 321 | 1000µs | $self->{totalweight} = $totalweight; | ||
1328 | |||||
1329 | 321 | 2.94ms | 321 | 427ms | $delta += $self->check_reputations($pms, 'EMAIL_IP', $from, $ip, $signedby, $msgscore); # spent 427ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.33ms/call |
1330 | 642 | 4.16ms | 321 | 332ms | if ($domain) {$delta += $self->check_reputations($pms, 'DOMAIN', $domain, $ip, $signedby, $msgscore);} # spent 332ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 1.04ms/call |
1331 | 602 | 3.82ms | 281 | 248ms | if ($helo) {$delta += $self->check_reputations($pms, 'HELO', $helo, undef, 'HELO', $msgscore);} # spent 248ms making 281 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 884µs/call |
1332 | 321 | 1.31ms | if ($origip) { | ||
1333 | 642 | 3.94ms | 321 | 312ms | if (!$signedby) {$delta += $self->check_reputations($pms, 'EMAIL', $from, undef, undef, $msgscore);} # spent 312ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 972µs/call |
1334 | 321 | 2.67ms | 321 | 303ms | $delta += $self->check_reputations($pms, 'IP', $origip, undef, undef, $msgscore); # spent 303ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 945µs/call |
1335 | } | ||||
1336 | |||||
1337 | 321 | 873µs | if (!defined $self->{learning}) { | ||
1338 | $delta = ($self->{totalweight})? $self->{conf}->{txrep_factor} * $delta / $self->{totalweight} : 0; | ||||
1339 | if ($delta) { | ||||
1340 | $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta)); | ||||
1341 | } | ||||
1342 | $msgscore += $delta; | ||||
1343 | if (defined $pms->{score}) { | ||||
1344 | dbg("TxRep: post-TxRep score: %.3f", $pms->{score}); | ||||
1345 | } | ||||
1346 | } | ||||
1347 | 321 | 1.86ms | if ($self->{conf}->{txrep_track_messages} && $msg_id) { | ||
1348 | 321 | 2.49ms | 321 | 297s | $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, $msgscore); # spent 297s making 321 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 924ms/call |
1349 | } | ||||
1350 | 321 | 1.72ms | if (!defined $self->{txKeepStoreTied}) {$self->finish();} | ||
1351 | |||||
1352 | 321 | 5.92ms | return 0; | ||
1353 | } | ||||
1354 | |||||
1355 | |||||
1356 | ########################################################################### | ||||
1357 | # spent 615s (93.3ms+614) within Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 2207 times, avg 278ms/call:
# 321 times (14.4ms+316s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1269, avg 985ms/call
# 321 times (12.4ms+297s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 924ms/call
# 321 times (13.1ms+414ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1329, avg 1.33ms/call
# 321 times (18.2ms+314ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1330, avg 1.04ms/call
# 321 times (12.0ms+300ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1333, avg 972µs/call
# 321 times (12.4ms+291ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1334, avg 945µs/call
# 281 times (10.8ms+238ms) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1331, avg 884µs/call | ||||
1358 | ########################################################################### | ||||
1359 | 2207 | 4.28ms | my $self = shift; | ||
1360 | 2207 | 3.66ms | my $delta; | ||
1361 | |||||
1362 | 2207 | 20.2ms | 2207 | 29.9ms | if ($self->open_storages()) { # spent 29.9ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 14µs/call |
1363 | 2207 | 10.5ms | if ($self->{conf}->{txrep_user2global_ratio} && $self->{user_storage} != $self->{global_storage}) { | ||
1364 | my $user = $self->check_reputation('user_storage', @_); | ||||
1365 | my $global = $self->check_reputation('global_storage',@_); | ||||
1366 | |||||
1367 | $delta = (defined $user && $user==$user) ? | ||||
1368 | ( $self->{conf}->{txrep_user2global_ratio} * $user + $global ) / ( 1 + $self->{conf}->{txrep_user2global_ratio} ) : | ||||
1369 | $global; | ||||
1370 | } else { | ||||
1371 | 2207 | 19.6ms | 2207 | 614s | $delta = $self->check_reputation(undef,@_); # spent 614s making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputation, avg 278ms/call |
1372 | } | ||||
1373 | } | ||||
1374 | 2207 | 33.4ms | return $delta; | ||
1375 | } | ||||
1376 | |||||
1377 | |||||
1378 | ########################################################################### | ||||
1379 | # spent 614s (655ms+614) within Mail::SpamAssassin::Plugin::TxRep::check_reputation which was called 2207 times, avg 278ms/call:
# 2207 times (655ms+614s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1371, avg 278ms/call | ||||
1380 | ########################################################################### | ||||
1381 | 2207 | 22.3ms | my ($self, $storage, $pms, $key, $id, $ip, $signedby, $msgscore) = @_; | ||
1382 | |||||
1383 | 2207 | 4.18ms | my $delta = 0; | ||
1384 | 2207 | 127ms | my $weight = ($key eq 'MSG_ID')? 1 : eval('$pms->{main}->{conf}->{txrep_weight_'.lc($key).'}'); # spent 4.32ms executing statements in 321 string evals (merged)
# spent 2.72ms executing statements in 321 string evals (merged)
# spent 2.71ms executing statements in 321 string evals (merged)
# spent 2.56ms executing statements in 321 string evals (merged)
# spent 2.31ms executing statements in 281 string evals (merged) | ||
1385 | |||||
1386 | 2207 | 8.66ms | if (defined $weight && $weight) { | ||
1387 | 2207 | 3.68ms | my $meanrep; | ||
1388 | 2207 | 24.5ms | 2207 | 20.1ms | my $timer = $self->{main}->time_method('check_txrep_'.lc($key)); # spent 20.1ms making 2207 calls to Mail::SpamAssassin::time_method, avg 9µs/call |
1389 | |||||
1390 | 2207 | 4.48ms | if (defined $storage) { | ||
1391 | $self->{checker} = $self->{$storage}; | ||||
1392 | } | ||||
1393 | 2207 | 18.5ms | 2207 | 925ms | my $found = $self->get_sender($id, $ip, $signedby); # spent 925ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::get_sender, avg 419µs/call |
1394 | 2207 | 6.87ms | my $tag_id = (defined $storage)? uc($key.'_'.substr($storage,0,1)) : uc($key); | ||
1395 | 2207 | 24.1ms | 2207 | 22.5ms | if (defined $found && $self->count()) { # spent 22.5ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call |
1396 | 1721 | 25.9ms | 3442 | 28.8ms | $meanrep = $self->total() / $self->count(); # spent 15.7ms making 1721 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call
# spent 13.1ms making 1721 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call |
1397 | } | ||||
1398 | 2207 | 12.6ms | if ($self->{learning} && defined $msgscore) { | ||
1399 | 1886 | 6.02ms | if (defined $meanrep) { | ||
1400 | # $msgscore<=>0 gives the sign of $msgscore | ||||
1401 | 1547 | 10.5ms | $msgscore += ($msgscore<=>0) * abs($meanrep); | ||
1402 | } | ||||
1403 | dbg("TxRep: reputation: %s, count: %d, learning: %s, $tag_id: %s", | ||||
1404 | defined $meanrep? sprintf("%.3f",$meanrep) : 'none', | ||||
1405 | $self->count() || 0, | ||||
1406 | 1886 | 55.6ms | 3772 | 34.7ms | $self->{learning} || '', # spent 18.8ms making 1886 calls to Mail::SpamAssassin::Logger::dbg, avg 10µs/call
# spent 15.9ms making 1886 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call |
1407 | $id || 'none' | ||||
1408 | ); | ||||
1409 | } else { | ||||
1410 | 321 | 915µs | $self->{totalweight} += $weight; | ||
1411 | 321 | 4.94ms | 468 | 3.73ms | if ($key eq 'MSG_ID' && $self->count() > 0) { # spent 2.39ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call
# spent 1.34ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call |
1412 | 174 | 2.31ms | 348 | 2.55ms | $delta = $self->total() / $self->count(); # spent 1.29ms making 174 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 7µs/call
# spent 1.26ms making 174 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call |
1413 | 174 | 4.83ms | 174 | 12.5ms | $pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f",$delta)); # spent 12.5ms making 174 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 72µs/call |
1414 | } elsif (defined $self->total()) { | ||||
1415 | 147 | 10.7ms | 294 | 2.38ms | $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore; # spent 1.28ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call
# spent 1.10ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 7µs/call |
1416 | |||||
1417 | 147 | 3.64ms | 147 | 10.5ms | $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f",$delta)); # spent 10.5ms making 147 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 71µs/call |
1418 | 147 | 316µs | if (defined $meanrep) { | ||
1419 | $pms->set_tag('TXREP_'.$tag_id.'_MEAN', sprintf("%2.1f", $meanrep)); | ||||
1420 | } | ||||
1421 | 147 | 2.65ms | 294 | 8.51ms | $pms->set_tag('TXREP_'.$tag_id.'_COUNT', sprintf("%2.1f", $self->count())); # spent 7.35ms making 147 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 50µs/call
# spent 1.16ms making 147 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call |
1422 | 147 | 1.98ms | 147 | 8.20ms | $pms->set_tag('TXREP_'.$tag_id.'_PRESCORE', sprintf("%2.1f", $pms->{score})); # spent 8.20ms making 147 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 56µs/call |
1423 | } else { | ||||
1424 | $pms->set_tag('TXREP_'.$tag_id.'_UNKNOWN', 1); | ||||
1425 | } | ||||
1426 | 321 | 7.01ms | 642 | 5.42ms | dbg("TxRep: reputation: %s, count: %d, weight: %.1f, delta: %.3f, $tag_id: %s", # spent 2.94ms making 321 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call
# spent 2.47ms making 321 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call |
1427 | defined $meanrep? sprintf("%.3f",$meanrep) : 'none', | ||||
1428 | $self->count() || 0, | ||||
1429 | $weight || 0, | ||||
1430 | $delta || 0, | ||||
1431 | $id || 'none' | ||||
1432 | ); | ||||
1433 | } | ||||
1434 | 2207 | 21.9ms | 2207 | 19.8ms | $timer = $self->{main}->time_method('update_txrep_'.lc($key)); # spent 19.8ms making 2207 calls to Mail::SpamAssassin::time_method, avg 9µs/call |
1435 | 2207 | 22.7ms | if (defined $msgscore) { | ||
1436 | 1886 | 7.55ms | if ($self->{forgetting}) { # forgetting a message score | ||
1437 | 512 | 3.74ms | 512 | 83.5ms | $self->remove_score($msgscore); # remove the given score and decrement the count # spent 83.5ms making 512 calls to Mail::SpamAssassin::Plugin::TxRep::remove_score, avg 163µs/call |
1438 | 512 | 2.16ms | if ($key eq 'MSG_ID') { # remove the message ID score completely | ||
1439 | 87 | 794µs | 87 | 296s | $self->{checker}->remove_entry($self->{entry}); # spent 296s making 87 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.41s/call |
1440 | } | ||||
1441 | } else { | ||||
1442 | 1374 | 10.4ms | 1374 | 262ms | $self->add_score($msgscore); # add the score and increment the count # spent 262ms making 1374 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 191µs/call |
1443 | 1374 | 7.82ms | 234 | 2.36ms | if ($self->{learning} && $key eq 'MSG_ID' && $self->count() eq 1) { # spent 2.36ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call |
1444 | 234 | 1.61ms | 234 | 38.8ms | $self->add_score($msgscore); # increasing the count by 1 at a learned score (count=2) # spent 38.8ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 166µs/call |
1445 | } # it can be distinguished from a scanned score (count=1) | ||||
1446 | } | ||||
1447 | } elsif (defined $found && $self->{forgetting} && $key eq 'MSG_ID') { | ||||
1448 | 87 | 821µs | 87 | 316s | $self->{checker}->remove_entry($self->{entry}); #forgetting the message ID # spent 316s making 87 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.63s/call |
1449 | } | ||||
1450 | } | ||||
1451 | 2207 | 4.05ms | if (defined $storage) {$self->{checker} = $self->{default_storage};} | ||
1452 | |||||
1453 | 2207 | 38.4ms | return ($weight || 0) * ($delta || 0); | ||
1454 | } | ||||
1455 | |||||
- - | |||||
1458 | #-------------------------------------------------------------------------- | ||||
1459 | # Database handler subroutines | ||||
1460 | #-------------------------------------------------------------------------- | ||||
1461 | |||||
1462 | ########################################################################### | ||||
1463 | 18348 | 143ms | # spent 80.5ms within Mail::SpamAssassin::Plugin::TxRep::count which was called 9174 times, avg 9µs/call:
# 2207 times (22.5ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1395, avg 10µs/call
# 1886 times (15.9ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1406, avg 8µs/call
# 1721 times (13.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1396, avg 8µs/call
# 1608 times (13.5ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1502, avg 8µs/call
# 321 times (3.48ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1270, avg 11µs/call
# 321 times (2.94ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1426, avg 9µs/call
# 321 times (2.39ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1411, avg 7µs/call
# 234 times (2.36ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1443, avg 10µs/call
# 174 times (1.26ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1412, avg 7µs/call
# 147 times (1.16ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1421, avg 8µs/call
# 147 times (1.10ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1415, avg 7µs/call
# 87 times (852µs+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1274, avg 10µs/call | ||
1464 | 7594 | 60.6ms | # spent 32.2ms within Mail::SpamAssassin::Plugin::TxRep::total which was called 3797 times, avg 8µs/call:
# 1721 times (15.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1396, avg 9µs/call
# 1608 times (12.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1502, avg 8µs/call
# 174 times (1.29ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1412, avg 7µs/call
# 147 times (1.34ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1411, avg 9µs/call
# 147 times (1.28ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1415, avg 9µs/call | ||
1465 | ########################################################################### | ||||
1466 | |||||
1467 | |||||
1468 | ########################################################################### | ||||
1469 | # spent 925ms (189+736) within Mail::SpamAssassin::Plugin::TxRep::get_sender which was called 2207 times, avg 419µs/call:
# 2207 times (189ms+736ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1393, avg 419µs/call
sub get_sender { | ||||
1470 | ########################################################################### | ||||
1471 | 2207 | 16.1ms | my ($self, $addr, $origip, $signedby) = @_; | ||
1472 | |||||
1473 | 2207 | 4.97ms | return unless (defined $self->{checker}); | ||
1474 | |||||
1475 | 2207 | 17.9ms | 2207 | 142ms | my $fulladdr = $self->pack_addr($addr, $origip); # spent 142ms making 2207 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 64µs/call |
1476 | 2207 | 18.8ms | 2207 | 574ms | my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby); # spent 574ms making 2207 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 260µs/call |
1477 | 2207 | 13.6ms | $self->{entry} = $entry; | ||
1478 | 2207 | 5.54ms | $origip = $origip || 'none'; | ||
1479 | |||||
1480 | 2207 | 83.1ms | 4414 | 20.1ms | if ($entry->{count}<0 || $entry->{count}=~/^(nan|)$/ || $entry->{totscore}=~/^(nan|)$/) { # spent 20.1ms making 4414 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call |
1481 | warn "TxRep: resetting bad data for ($addr, $origip), count: $entry->{count}, totscore: $entry->{totscore}\n"; | ||||
1482 | $self->{entry}->{count} = $self->{entry}->{totscore} = 0; | ||||
1483 | } | ||||
1484 | 2207 | 40.1ms | return $self->{entry}->{count}; | ||
1485 | } | ||||
1486 | |||||
1487 | |||||
1488 | ########################################################################### | ||||
1489 | # spent 301ms (85.8+215) within Mail::SpamAssassin::Plugin::TxRep::add_score which was called 1608 times, avg 187µs/call:
# 1374 times (75.1ms+187ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1442, avg 191µs/call
# 234 times (10.7ms+28.1ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1444, avg 166µs/call
sub add_score { | ||||
1490 | ########################################################################### | ||||
1491 | 1608 | 6.10ms | my ($self,$score) = @_; | ||
1492 | |||||
1493 | 1608 | 4.58ms | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1494 | |||||
1495 | 1608 | 4.26ms | if ($score != $score) { | ||
1496 | warn "TxRep: attempt to add a $score to TxRep entry ignored\n"; | ||||
1497 | return; # don't try to add a NaN | ||||
1498 | } | ||||
1499 | 1608 | 4.98ms | $self->{entry}->{count} ||= 0; | ||
1500 | |||||
1501 | # performing the dilution aging correction | ||||
1502 | 1608 | 36.4ms | 3216 | 26.0ms | if (defined $self->total() && defined $self->count() && defined $self->{txrep_dilution_factor}) { # spent 13.5ms making 1608 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call
# spent 12.6ms making 1608 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 8µs/call |
1503 | my $diluted_total = | ||||
1504 | ($self->count() + 1) * | ||||
1505 | ($self->{txrep_dilution_factor} * $self->total() + $score) / | ||||
1506 | ($self->{txrep_dilution_factor} * $self->count() + 1); | ||||
1507 | my $corrected_score = $diluted_total - $self->total(); | ||||
1508 | $self->{checker}->add_score($self->{entry}, $corrected_score); | ||||
1509 | } else { | ||||
1510 | 1608 | 12.9ms | 1608 | 189ms | $self->{checker}->add_score($self->{entry}, $score); # spent 189ms making 1608 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 118µs/call |
1511 | } | ||||
1512 | } | ||||
1513 | |||||
- - | |||||
1516 | ########################################################################### | ||||
1517 | # spent 83.5ms (28.1+55.4) within Mail::SpamAssassin::Plugin::TxRep::remove_score which was called 512 times, avg 163µs/call:
# 512 times (28.1ms+55.4ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1437, avg 163µs/call
sub remove_score { | ||||
1518 | ########################################################################### | ||||
1519 | 512 | 2.57ms | my ($self,$score) = @_; | ||
1520 | |||||
1521 | 512 | 1.13ms | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1522 | |||||
1523 | 512 | 1.52ms | if ($score != $score) { # don't try to add a NaN | ||
1524 | warn "TxRep: attempt to add a $score to TxRep entry ignored\n"; | ||||
1525 | return; | ||||
1526 | } | ||||
1527 | # no reversal dilution aging correction (not easily possible), | ||||
1528 | # just removing the original message score | ||||
1529 | 512 | 2.34ms | if ($self->{entry}->{count} > 2) | ||
1530 | 58 | 231µs | {$self->{entry}->{count} -= 2;} | ||
1531 | 454 | 1.06ms | else {$self->{entry}->{count} = 0;} | ||
1532 | # subtract 2, and add a score; hence decrementing by 1 | ||||
1533 | 512 | 8.29ms | 512 | 55.4ms | $self->{checker}->add_score($self->{entry}, -1*$score); # spent 55.4ms making 512 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 108µs/call |
1534 | } | ||||
1535 | |||||
- - | |||||
1538 | ########################################################################### | ||||
1539 | # spent 3.21s (148µs+3.20) within Mail::SpamAssassin::Plugin::TxRep::modify_reputation which was called:
# once (148µs+3.20s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1302
sub modify_reputation { | ||||
1540 | ########################################################################### | ||||
1541 | 1 | 6µs | my ($self, $addr, $score, $signedby) = @_; | ||
1542 | |||||
1543 | 1 | 2µs | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1544 | 1 | 9µs | 1 | 61µs | my $fulladdr = $self->pack_addr($addr, undef); # spent 61µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::pack_addr |
1545 | 1 | 9µs | 1 | 397µs | my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby); # spent 397µs making 1 call to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry |
1546 | |||||
1547 | # remove any old entries (will remove per-ip entries as well) | ||||
1548 | # always call this regardless, as the current entry may have 0 | ||||
1549 | # scores, but the per-ip one may have more | ||||
1550 | 1 | 10µs | 1 | 3.20s | $self->{checker}->remove_entry($entry); # spent 3.20s making 1 call to Mail::SpamAssassin::DBBasedAddrList::remove_entry |
1551 | |||||
1552 | # remove address only, no new score to add if score NaN or undef | ||||
1553 | 1 | 9µs | if (defined $score && $score==$score) { | ||
1554 | # else add score. get a new entry first | ||||
1555 | 1 | 23µs | 1 | 242µs | $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby); # spent 242µs making 1 call to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry |
1556 | 1 | 10µs | 1 | 108µs | $self->{checker}->add_score($entry, $score); # spent 108µs making 1 call to Mail::SpamAssassin::DBBasedAddrList::add_score |
1557 | } | ||||
1558 | 1 | 21µs | return 1; | ||
1559 | } | ||||
1560 | |||||
1561 | |||||
1562 | # connecting the primary and the secondary storage; needed only on the first run | ||||
1563 | # (this can't be in the constructor, since the settings are not available there) | ||||
1564 | ########################################################################### | ||||
1565 | # spent 29.9ms (21.1+8.78) within Mail::SpamAssassin::Plugin::TxRep::open_storages which was called 2207 times, avg 14µs/call:
# 2207 times (21.1ms+8.78ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1362, avg 14µs/call
sub open_storages { | ||||
1566 | ########################################################################### | ||||
1567 | 2207 | 4.02ms | my $self = shift; | ||
1568 | |||||
1569 | 2207 | 24.7ms | return 1 unless (!defined $self->{default_storage}); | ||
1570 | |||||
1571 | 1 | 2µs | my $factory; | ||
1572 | 1 | 11µs | if ($self->{main}->{pers_addr_list_factory}) { | ||
1573 | $factory = $self->{main}->{pers_addr_list_factory}; | ||||
1574 | } else { | ||||
1575 | 1 | 4µs | my $type = $self->{conf}->{txrep_factory}; | ||
1576 | 1 | 15µs | 1 | 5µs | if ($type =~ /^([_A-Za-z0-9:]+)$/) { # spent 5µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::CORE:match |
1577 | 1 | 10µs | 1 | 30µs | $type = untaint_var($type); # spent 30µs making 1 call to Mail::SpamAssassin::Util::untaint_var |
1578 | eval 'require '.$type.'; | ||||
1579 | $factory = '.$type.'->new(); | ||||
1580 | 1;' | ||||
1581 | 1 | 163µs | or do { # spent 372µs executing statements in string eval | ||
1582 | my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1583 | warn "TxRep: $eval_stat\n"; | ||||
1584 | undef $factory; | ||||
1585 | }; | ||||
1586 | 1 | 16µs | 1 | 10µs | $self->{main}->set_persistent_address_list_factory($factory) if $factory; # spent 10µs making 1 call to Mail::SpamAssassin::set_persistent_address_list_factory |
1587 | } else {warn "TxRep: illegal factory setting\n";} | ||||
1588 | } | ||||
1589 | 1 | 4µs | if (defined $factory) { | ||
1590 | 1 | 14µs | 1 | 4.18ms | $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main}); # spent 4.18ms making 1 call to Mail::SpamAssassin::DBBasedAddrList::new_checker |
1591 | |||||
1592 | 1 | 4µs | if ($self->{conf}->{txrep_user2global_ratio} && !defined $self->{global_storage}) { | ||
1593 | # hack to handle the BDB and SQL factory types of the storage object | ||||
1594 | # TODO: add a method to the handler class instead | ||||
1595 | my ($storage_type, $is_global); | ||||
1596 | |||||
1597 | if (ref($factory) =~ /SQLBasedAddrList/) { | ||||
1598 | $is_global = defined $self->{conf}->{user_awl_sql_override_username}; | ||||
1599 | $storage_type = 'SQL'; | ||||
1600 | if ($is_global && $self->{conf}->{user_awl_sql_override_username} eq $self->{main}->{username}) { | ||||
1601 | # skip double storage if current user same as the global override | ||||
1602 | $self->{user_storage} = $self->{global_storage} = $self->{default_storage}; | ||||
1603 | } | ||||
1604 | } elsif (ref($factory) =~ /DBBasedAddrList/) { | ||||
1605 | $is_global = $self->{conf}->{auto_whitelist_path} !~ /__userstate__/; | ||||
1606 | $storage_type = 'DB'; | ||||
1607 | } | ||||
1608 | if (!defined $self->{global_storage}) { | ||||
1609 | my $sql_override_orig = $self->{conf}->{user_awl_sql_override_username}; | ||||
1610 | my $awl_path_orig = $self->{conf}->{auto_whitelist_path}; | ||||
1611 | if ($is_global) { | ||||
1612 | $self->{conf}->{user_awl_sql_override_username} = ''; | ||||
1613 | $self->{conf}->{auto_whitelist_path} = '__userstate__/tx-reputation'; | ||||
1614 | $self->{global_storage} = $self->{default_storage}; | ||||
1615 | $self->{user_storage} = $factory->new_checker($self->{main}); | ||||
1616 | } else { | ||||
1617 | $self->{conf}->{user_awl_sql_override_username} = 'GLOBAL'; | ||||
1618 | $self->{conf}->{auto_whitelist_path} = '__local_state_dir__/tx-reputation'; | ||||
1619 | $self->{global_storage} = $factory->new_checker($self->{main}); | ||||
1620 | $self->{user_storage} = $self->{default_storage}; | ||||
1621 | } | ||||
1622 | $self->{conf}->{user_awl_sql_override_username} = $sql_override_orig; | ||||
1623 | $self->{conf}->{auto_whitelist_path} = $awl_path_orig; | ||||
1624 | |||||
1625 | # Another ugly hack to find out whether the user differs from | ||||
1626 | # the global one. We need to add a method to the factory handlers | ||||
1627 | if ($storage_type eq 'DB' && | ||||
1628 | $self->{user_storage}->{locked_file} eq $self->{global_storage}->{locked_file}) { | ||||
1629 | if ($is_global) | ||||
1630 | {$self->{global_storage}->finish();} | ||||
1631 | else {$self->{user_storage}->finish();} | ||||
1632 | $self->{user_storage} = $self->{global_storage} = $self->{default_storage}; | ||||
1633 | } | ||||
1634 | } | ||||
1635 | } | ||||
1636 | } else { | ||||
1637 | $self->{user_storage} = $self->{global_storage} = $self->{checker} = $self->{default_storage} = undef; | ||||
1638 | warn("TxRep: could not open storages, quitting!\n"); | ||||
1639 | return 0; | ||||
1640 | } | ||||
1641 | 1 | 11µs | return 1; | ||
1642 | } | ||||
1643 | |||||
1644 | |||||
1645 | ########################################################################### | ||||
1646 | # spent 5.67s (92µs+5.67) within Mail::SpamAssassin::Plugin::TxRep::finish which was called:
# once (92µs+5.67s) by Mail::SpamAssassin::Plugin::TxRep::learner_close at line 1825
sub finish { | ||||
1647 | ########################################################################### | ||||
1648 | 1 | 2µs | my $self = shift; | ||
1649 | |||||
1650 | 1 | 3µs | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1651 | |||||
1652 | 1 | 50µs | if ($self->{conf}->{txrep_user2global_ratio} && defined $self->{user_storage} && ($self->{user_storage} != $self->{global_storage})) { | ||
1653 | $self->{user_storage}->finish(); | ||||
1654 | $self->{global_storage}->finish(); | ||||
1655 | $self->{user_storage} = undef; | ||||
1656 | $self->{global_storage} = undef; | ||||
1657 | } elsif (defined $self->{default_storage}) { | ||||
1658 | 1 | 11µs | 1 | 5.67s | $self->{default_storage}->finish(); # spent 5.67s making 1 call to Mail::SpamAssassin::DBBasedAddrList::finish |
1659 | 1 | 8µs | $self->{default_storage} = $self->{checker} = undef; | ||
1660 | } | ||||
1661 | 1 | 14µs | $self->{factory} = undef; | ||
1662 | } | ||||
1663 | |||||
1664 | |||||
1665 | ########################################################################### | ||||
1666 | # spent 40.9ms (32.4+8.54) within Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key which was called 642 times, avg 64µs/call:
# 642 times (32.4ms+8.54ms) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1721, avg 64µs/call
sub ip_to_awl_key { | ||||
1667 | ########################################################################### | ||||
1668 | 642 | 2.63ms | my ($self, $origip) = @_; | ||
1669 | |||||
1670 | 642 | 1.15ms | my $result; | ||
1671 | 642 | 3.36ms | local $1; | ||
1672 | 642 | 16.3ms | 642 | 8.54ms | if (!defined $origip) { # spent 8.54ms making 642 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 13µs/call |
1673 | # could not find an IP address to use | ||||
1674 | } elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) { | ||||
1675 | 642 | 1.88ms | my $mask_len = $self->{ipv4_mask_len}; | ||
1676 | 642 | 1.78ms | $mask_len = 16 if !defined $mask_len; | ||
1677 | # handle the default and easy cases manually | ||||
1678 | 642 | 2.97ms | if ($mask_len == 32) {$result = $origip;} | ||
1679 | 642 | 2.50ms | elsif ($mask_len == 16) {$result = $1;} | ||
1680 | else { | ||||
1681 | my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len); | ||||
1682 | if (!defined $origip_obj) { # invalid IPv4 address | ||||
1683 | dbg("TxRep: bad IPv4 address $origip"); | ||||
1684 | } else { | ||||
1685 | $result = $origip_obj->network->addr; | ||||
1686 | $result =~s/(\.0){1,3}\z//; # truncate zero tail | ||||
1687 | } | ||||
1688 | } | ||||
1689 | } elsif ($origip =~ /:/ && # triage | ||||
1690 | $origip =~ | ||||
1691 | /^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) { | ||||
1692 | # looks like an IPv6 address | ||||
1693 | my $mask_len = $self->{ipv6_mask_len}; | ||||
1694 | $mask_len = 48 if !defined $mask_len; | ||||
1695 | my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len); | ||||
1696 | if (!defined $origip_obj) { # invalid IPv6 address | ||||
1697 | dbg("TxRep: bad IPv6 address $origip"); | ||||
1698 | } else { | ||||
1699 | $result = $origip_obj->network->full6; # string in a canonical form | ||||
1700 | $result =~ s/(:0000){1,7}\z/::/; # compress zero tail | ||||
1701 | } | ||||
1702 | } else { | ||||
1703 | dbg("TxRep: bad IP address $origip"); | ||||
1704 | } | ||||
1705 | 642 | 2.69ms | if (defined $result && length($result) > 39) { # just in case, keep under | ||
1706 | $result = substr($result,0,39); # the awl.ip field size | ||||
1707 | } | ||||
1708 | # if (defined $result) {dbg("TxRep: IP masking %s -> %s", $origip || '?', $result || '?');} | ||||
1709 | 642 | 7.70ms | return $result; | ||
1710 | } | ||||
1711 | |||||
1712 | |||||
1713 | ########################################################################### | ||||
1714 | # spent 142ms (84.8+57.3) within Mail::SpamAssassin::Plugin::TxRep::pack_addr which was called 2208 times, avg 64µs/call:
# 2207 times (84.8ms+57.3ms) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1475, avg 64µs/call
# once (50µs+11µs) by Mail::SpamAssassin::Plugin::TxRep::modify_reputation at line 1544
sub pack_addr { | ||||
1715 | ########################################################################### | ||||
1716 | 2208 | 9.12ms | my ($self, $addr, $origip) = @_; | ||
1717 | |||||
1718 | 2208 | 6.96ms | $addr = lc $addr; | ||
1719 | 2208 | 34.8ms | 2208 | 16.4ms | $addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia # spent 16.4ms making 2208 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call |
1720 | |||||
1721 | 2850 | 11.4ms | 642 | 40.9ms | if ( defined $origip) {$origip = $self->ip_to_awl_key($origip);} # spent 40.9ms making 642 calls to Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key, avg 64µs/call |
1722 | 3774 | 12.3ms | if (!defined $origip) {$origip = 'none';} | ||
1723 | 2208 | 31.7ms | return $addr . "|ip=" . $origip; | ||
1724 | } | ||||
1725 | |||||
- - | |||||
1728 | # ------------------------------------------------------------------------- | ||||
1729 | =head1 LEARNING SPAM / HAM | ||||
1730 | |||||
1731 | When SpamAssassin is told to learn (or relearn) a given message as spam or | ||||
1732 | ham, all reputations relevant to the message (email, email_ip, domain, ip, helo) | ||||
1733 | in both global and user storages will be updated using the C<txrep_learn_penalty> | ||||
1734 | or the C<txrep_learn_bonus> value, respectively. The new reputation of a given sender | ||||
1735 | property (email, domain, ...) will be the result of one of the following | ||||
1736 | formulas: | ||||
1737 | |||||
1738 | new_reputation = old_reputation + learn_penalty | ||||
1739 | new_reputation = old_reputation - learn_bonus | ||||
1740 | |||||
1741 | The TxRep plugin currently does track each message individually, hence it | ||||
1742 | does not detect when you learn the message repeatedly. It will add/subtract | ||||
1743 | the penalty/bonus score each time the message is fed to the spam learner. | ||||
1744 | |||||
1745 | =cut | ||||
1746 | ######################################################### plugin hook ##### | ||||
1747 | # spent 15µs within Mail::SpamAssassin::Plugin::TxRep::learner_new which was called:
# once (15µs+0s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_new { | ||||
1748 | ########################################################################### | ||||
1749 | 1 | 2µs | my ($self) = @_; | ||
1750 | |||||
1751 | 1 | 6µs | $self->{txKeepStoreTied} = 1; | ||
1752 | 1 | 10µs | return $self; | ||
1753 | } | ||||
1754 | |||||
1755 | |||||
1756 | ######################################################### plugin hook ##### | ||||
1757 | sub autolearn { | ||||
1758 | ########################################################################### | ||||
1759 | my ($self, $params) = @_; | ||||
1760 | |||||
1761 | $self->{last_pms} = $params->{permsgstatus}; | ||||
1762 | return $self->{autolearn} = 1; | ||||
1763 | } | ||||
1764 | |||||
1765 | |||||
1766 | ######################################################### plugin hook ##### | ||||
1767 | # spent 665s (30.1ms+665) within Mail::SpamAssassin::Plugin::TxRep::learn_message which was called 234 times, avg 2.84s/call:
# 234 times (30.1ms+665s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm, avg 2.84s/call
sub learn_message { | ||||
1768 | ########################################################################### | ||||
1769 | 234 | 521µs | my ($self, $params) = @_; | ||
1770 | 234 | 692µs | return 0 unless (defined $params->{isspam}); | ||
1771 | |||||
1772 | 234 | 1.61ms | 234 | 1.54ms | dbg("TxRep: learning a message"); # spent 1.54ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
1773 | 234 | 3.39ms | 234 | 68.8ms | my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg}); # spent 68.8ms making 234 calls to Mail::SpamAssassin::PerMsgStatus::new, avg 294µs/call |
1774 | 234 | 1.59ms | if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) { | ||
1775 | 234 | 2.31ms | 234 | 44.9s | $pms->extract_message_metadata(); # spent 44.9s making 234 calls to Mail::SpamAssassin::PerMsgStatus::extract_message_metadata, avg 192ms/call |
1776 | } | ||||
1777 | |||||
1778 | 234 | 1.41ms | if ($params->{isspam}) | ||
1779 | 234 | 1.38ms | {$self->{learning} = $self->{conf}->{txrep_learn_penalty};} | ||
1780 | else {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};} | ||||
1781 | |||||
1782 | 234 | 2.93ms | 234 | 620s | my $ret = !$self->{learning} || $self->check_senders_reputation($pms); # spent 620s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 2.65s/call |
1783 | 234 | 678µs | $self->{learning} = undef; | ||
1784 | 234 | 10.7ms | 79 | 2.90ms | return $ret; # spent 2.90ms making 79 calls to Mail::SpamAssassin::PerMsgStatus::DESTROY, avg 37µs/call |
1785 | } | ||||
1786 | |||||
1787 | |||||
1788 | ######################################################### plugin hook ##### | ||||
1789 | # spent 613s (4.74ms+613) within Mail::SpamAssassin::Plugin::TxRep::forget_message which was called 87 times, avg 7.05s/call:
# 87 times (4.74ms+613s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1276, avg 7.05s/call
sub forget_message { | ||||
1790 | ########################################################################### | ||||
1791 | 87 | 325µs | my ($self, $params) = @_; | ||
1792 | 87 | 290µs | return 0 unless ($self->{conf}->{use_txrep}); | ||
1793 | 87 | 304µs | my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg}); | ||
1794 | |||||
1795 | 87 | 570µs | 87 | 516µs | dbg("TxRep: forgetting a message"); # spent 516µs making 87 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
1796 | 87 | 242µs | $self->{forgetting} = 1; | ||
1797 | 87 | 829µs | 87 | 0s | my $ret = $self->check_senders_reputation($pms); # spent 613s making 87 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 7.05s/call, recursion: max depth 1, sum of overlapping time 613s |
1798 | 87 | 462µs | $self->{forgetting} = undef; | ||
1799 | 87 | 846µs | return $ret; | ||
1800 | } | ||||
1801 | |||||
1802 | |||||
1803 | ######################################################### plugin hook ##### | ||||
1804 | sub learner_expire_old_training { | ||||
1805 | ########################################################################### | ||||
1806 | my ($self, $params) = @_; | ||||
1807 | return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days}); | ||||
1808 | |||||
1809 | dbg("TxRep: expiry not implemented yet"); | ||||
1810 | # dbg("TxRep: expiry starting"); | ||||
1811 | # my $timer = $self->{main}->time_method("expire_bayes"); | ||||
1812 | # $self->{store}->expire_old_tokens($params); | ||||
1813 | # dbg("TxRep: expiry completed"); | ||||
1814 | } | ||||
1815 | |||||
1816 | |||||
1817 | ######################################################### plugin hook ##### | ||||
1818 | # spent 5.67s (50µs+5.67) within Mail::SpamAssassin::Plugin::TxRep::learner_close which was called:
# once (50µs+5.67s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm
sub learner_close { | ||||
1819 | ########################################################################### | ||||
1820 | 1 | 2µs | my ($self, $params) = @_; | ||
1821 | 1 | 3µs | my $quiet = $params->{quiet}; | ||
1822 | 1 | 4µs | return 0 unless ($self->{conf}->{use_txrep}); | ||
1823 | |||||
1824 | 1 | 3µs | $self->{txKeepStoreTied} = undef; | ||
1825 | 1 | 10µs | 1 | 5.67s | $self->finish(); # spent 5.67s making 1 call to Mail::SpamAssassin::Plugin::TxRep::finish |
1826 | 1 | 30µs | 1 | 19µs | dbg("TxRep: learner_close"); # spent 19µs making 1 call to Mail::SpamAssassin::Logger::dbg |
1827 | } | ||||
1828 | |||||
1829 | |||||
1830 | # ------------------------------------------------------------------------- | ||||
1831 | =head1 OPTIMIZING TXREP | ||||
1832 | |||||
1833 | TxRep can be optimized for speed and simplicity, or for the precision in | ||||
1834 | assigning the reputation scores. | ||||
1835 | |||||
1836 | First of all, TxRep can be quickly disabled and re-enabled through the option | ||||
1837 | L</C<use_txrep>>. It can be done globally, or individually in each respective | ||||
1838 | C<user_prefs>. Disabling TxRep will not destroy the database, so it can be | ||||
1839 | re-enabled at any later time. | ||||
1840 | |||||
1841 | On many systems, SQL-based storage may perform faster than the default | ||||
1842 | Berkeley DB storage, so you should consider setting it up. See the section | ||||
1843 | L</SQL-BASED STORAGE> for instructions. | ||||
1844 | |||||
1845 | Then there are multiple settings that can reduce the number of records stored | ||||
1846 | in the database, hence reducing the size of the storage, and also the processing | ||||
1847 | time: | ||||
1848 | |||||
1849 | 1. Setting L</C<txrep_user2global_ratio>> to zero will disable the dual storage, | ||||
1850 | thus halving the disk space requirements and the processing time of this plugin. | ||||
1851 | |||||
1852 | 2. You can disable all but one of the L</REPUTATION WEIGHTS>. The EMAIL_IP is | ||||
1853 | the most specific option, so it is the most likely choice in such a case, but you | ||||
1854 | could base the reputation system on any of the remaining scores. Each of the | ||||
1855 | enabled reputations adds a new entry to the database for each new identifier. | ||||
1856 | So while, for example, the number of recorded and scored domains may be big, the | ||||
1857 | number of stored IP addresses will probably be higher, and would require more | ||||
1858 | space in the storage. | ||||
1859 | |||||
1860 | 3. Disabling L</C<txrep_track_messages>> avoids storing a separate entry | ||||
1861 | for every scanned message, hence also reducing the disk space requirements and | ||||
1862 | the processing time. | ||||
1863 | |||||
1864 | 4. Disabling the option L</C<txrep_autolearn>> will save processing time | ||||
1865 | on messages that trigger the auto-learning process. | ||||
1866 | |||||
1867 | 5. Disabling L</C<txrep_whitelist_out>> will reduce the processing time for | ||||
1868 | outbound connections. | ||||
1869 | |||||
1870 | 6. Keeping the option L</C<auto_whitelist_distinguish_signed>> enabled may | ||||
1871 | slightly reduce the size of the database, because for signed messages the | ||||
1872 | originating IP address is ignored, hence no additional database entries are | ||||
1873 | needed for each separate IP address (or masked block of IP addresses). | ||||
1874 | |||||
1875 | |||||
1876 | Since TxRep reuses the storage architecture of the former AWL plugin, the | ||||
1877 | same instructions for initializing the SQL storage also apply to TxRep. | ||||
1878 | Although the old AWL table can be reused for TxRep, by default TxRep expects | ||||
1879 | the SQL table to be named "txrep". | ||||
1880 | |||||
1881 | To install a new SQL table for TxRep, run the appropriate SQL file for your | ||||
1882 | system under the /sql directory. | ||||
1883 | |||||
1884 | If you get a syntax error on an older version of MySQL, use TYPE=MyISAM | ||||
1885 | instead of ENGINE=MyISAM at the end of the command. You can also use other | ||||
1886 | types of ENGINE (depending on what is available on your system). For example, | ||||
1887 | the MEMORY engine stores the entire table in the server memory, achieving | ||||
1888 | performance similar to Redis. You would need to take care of replicating | ||||
1889 | the RAM table to disk through a cron job, to avoid losing data on reboot. | ||||
1890 | The InnoDB engine is used by default, offering high scalability (database | ||||
1891 | size and concurrency of accesses). In conjunction with a high value of | ||||
1892 | innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+) it can also | ||||
1893 | offer performance comparable to Redis. | ||||
1894 | |||||
1895 | =cut | ||||
1896 | |||||
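The numbered tuning options in the OPTIMIZING TXREP section above can be collected into a single configuration fragment. The following is only a sketch: the option names are the ones named in that section, while the values and the choice of file (`local.cf` or a per-user `user_prefs`) are assumptions to adapt to your setup. Item 2 (disabling individual reputation weights) is left out because the weight option names are not listed in this part of the documentation.

```
# Speed-oriented TxRep settings (illustrative values, not recommendations)

# keep the plugin itself enabled
use_txrep 1

# item 1: zero disables the dual (user + global) storage
txrep_user2global_ratio 0

# item 3: do not store a separate entry for every scanned message
txrep_track_messages 0

# item 4: skip TxRep processing on auto-learned messages
txrep_autolearn 0

# item 5: skip the extra processing for outbound connections
txrep_whitelist_out 0

# item 6: keep enabled so signed messages do not create per-IP entries
auto_whitelist_distinguish_signed 1
```

Each setting trades some scoring precision for speed or storage size, so it may be worth changing them one at a time and profiling again after each change.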
1897 | 1 | 12µs | 1; | ||
# spent 45.6ms within Mail::SpamAssassin::Plugin::TxRep::CORE:match which was called 9342 times, avg 5µs/call:
# 4414 times (20.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1480, avg 5µs/call
# 3964 times (15.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 4µs/call
# 642 times (8.54ms+0s) by Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key at line 1672, avg 13µs/call
# 321 times (1.57ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1245, avg 5µs/call
# once (5µs+0s) by Mail::SpamAssassin::Plugin::TxRep::open_storages at line 1576 | |||||
# spent 54.0ms within Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp which was called 3964 times, avg 14µs/call:
# 3964 times (54.0ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 14µs/call | |||||
# spent 18.7ms within Mail::SpamAssassin::Plugin::TxRep::CORE:subst which was called 2529 times, avg 7µs/call:
# 2208 times (16.4ms+0s) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1719, avg 7µs/call
# 321 times (2.33ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1247, avg 7µs/call |
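The dilution aging correction in `add_score` (source lines 1503 to 1507 of the listing) may be easier to follow as a stand-alone function. The sketch below is an illustration only: `corrected_score` is a hypothetical helper that is not part of TxRep.pm, and the factor 0.98 and the sample totals are assumed values. It reproduces the arithmetic of the listing, in which the stored total is weighted by the dilution factor before the new score is averaged in.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper (not part of TxRep.pm). It returns the
# value that add_score would hand to the storage layer when the dilution
# branch is taken.
sub corrected_score {
    my ($total, $count, $score, $factor) = @_;

    # Same expression as source lines 1503-1506: the old total and the
    # old count are both scaled by the dilution factor.
    my $diluted_total =
        ($count + 1) *
        ($factor * $total + $score) /
        ($factor * $count + 1);

    # The storage layer adds a delta, so subtract the current total.
    return $diluted_total - $total;
}

# With a factor of 1 nothing is diluted: the delta is the raw score.
printf "factor 1.00: %.4f\n", corrected_score(8, 4, 5, 1);

# With a factor below 1 (0.98 is an assumed value) the older history
# carries less weight than the new score.
printf "factor 0.98: %.4f\n", corrected_score(8, 4, 5, 0.98);
```

In this profile run, the dilution branch shows no statement counts, while the plain `add_score` call at line 1510 ran 1608 times. The condition at line 1502 therefore appears to have been false for every call, which is worth keeping in mind when reading the timings.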