Filename | /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm |
Statements | Executed 266199 statements in 12.1s |
Calls | P | F | Exclusive Time | Inclusive Time | Subroutine |
---|---|---|---|---|---|
3216 | 1 | 1 | 1.29s | 1759s | check_reputation | Mail::SpamAssassin::Plugin::TxRep::
468 | 2 | 1 | 471ms | 96340s | check_senders_reputation (recurses: max depth 1, inclusive time 49211s) | Mail::SpamAssassin::Plugin::TxRep::
3216 | 1 | 1 | 335ms | 1.44s | get_sender | Mail::SpamAssassin::Plugin::TxRep::
3216 | 1 | 1 | 300ms | 94571s | open_storages | Mail::SpamAssassin::Plugin::TxRep::
3216 | 7 | 1 | 197ms | 96330s | check_reputations | Mail::SpamAssassin::Plugin::TxRep::
3218 | 2 | 1 | 147ms | 252ms | pack_addr | Mail::SpamAssassin::Plugin::TxRep::
12732 | 10 | 1 | 133ms | 133ms | count | Mail::SpamAssassin::Plugin::TxRep::
6206 | 1 | 1 | 87.6ms | 87.6ms | CORE:regcomp (opcode) | Mail::SpamAssassin::Plugin::TxRep::
1390 | 2 | 1 | 82.6ms | 262ms | add_score | Mail::SpamAssassin::Plugin::TxRep::
14043 | 5 | 1 | 78.1ms | 78.1ms | CORE:match (opcode) | Mail::SpamAssassin::Plugin::TxRep::
1374 | 1 | 1 | 59.1ms | 209ms | remove_score | Mail::SpamAssassin::Plugin::TxRep::
936 | 1 | 1 | 55.2ms | 73.1ms | ip_to_awl_key | Mail::SpamAssassin::Plugin::TxRep::
4896 | 3 | 1 | 51.0ms | 51.0ms | total | Mail::SpamAssassin::Plugin::TxRep::
3686 | 2 | 1 | 34.7ms | 34.7ms | CORE:subst (opcode) | Mail::SpamAssassin::Plugin::TxRep::
234 | 1 | 1 | 34.1ms | 96387s | learn_message | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 18.9ms | 26.5ms | BEGIN@209 | Mail::SpamAssassin::Plugin::TxRep::
234 | 1 | 1 | 12.5ms | 49211s | forget_message | Mail::SpamAssassin::Plugin::TxRep::
2 | 1 | 1 | 251µs | 6.36s | modify_reputation | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 198µs | 848µs | set_config | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 102µs | 1.02ms | new | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 55µs | 67µs | BEGIN@202 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 55µs | 515µs | BEGIN@207 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 49µs | 1.12ms | learner_close | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 45µs | 1.06ms | finish | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 26µs | 162µs | BEGIN@210 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 23µs | 152µs | BEGIN@211 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 22µs | 102µs | BEGIN@213 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 22µs | 22µs | __ANON__[:495] | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 22µs | 88µs | BEGIN@205 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 21µs | 62µs | BEGIN@203 | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 19µs | 19µs | learner_new | Mail::SpamAssassin::Plugin::TxRep::
1 | 1 | 1 | 14µs | 14µs | BEGIN@208 | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:306] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:350] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:375] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:398] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:421] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:446] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:527] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:560] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:642] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:763] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:797] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:836] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:862] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:893] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:945] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | __ANON__[:998] | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | _fail_exit | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | _fn_envelope | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | _message | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | autolearn | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | blacklist_address | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | learner_expire_old_training | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | remove_address | Mail::SpamAssassin::Plugin::TxRep::
0 | 0 | 0 | 0s | 0s | whitelist_address | Mail::SpamAssassin::Plugin::TxRep::
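
The POD in the source listing that follows documents two formulas, under txrep_factor and txrep_dilution_factor, that turn a sender's stored total and count into a score adjustment. The following is a minimal Perl sketch of that arithmetic, not the plugin's actual code. The helper names adjusted_score and diluted_total are hypothetical, and the adjustment is assumed to be measured relative to the message's own score, as the POD prose describes (factor 0 leaves the score unchanged, factor 1 yields the new mean).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TxRep: shift a message score toward the
# sender's mean, where the mean already includes the new message.
sub adjusted_score {
    my ($score, $total, $count, $factor) = @_;
    my $new_mean = ($total + $score) / ($count + 1);
    return $score + $factor * ($new_mean - $score);
}

# Hypothetical helper, not part of TxRep: new stored total after diluting
# the history by $dilution (1.0 means no dilution).
sub diluted_total {
    my ($old_total, $old_count, $new_score, $dilution) = @_;
    return ($old_count + 1) * ($new_score + $dilution * $old_total)
         / ($dilution * $old_count + 1);
}

# One earlier ham message at -5, new message at +10, factor 0.5.
printf "%.2f\n", adjusted_score(10, -5, 1, 0.5);        # prints 6.25
# 1000 earlier messages averaging -5: the history pulls much harder.
printf "%.2f\n", adjusted_score(10, -5000, 1000, 0.5);
# With dilution 1.0 the new score is simply added to the total.
printf "%.2f\n", diluted_total(-5, 1, 10, 1.0);         # prints 5.00
```

With a long history, the same +10 message lands close to 2.5 instead of 6.25, which is the sensitivity to message count that the POD contrasts with AWL.
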
Line | Statements | Time on line | Calls | Time in subs | Code |
---|---|---|---|---|---|
1 | # <@LICENSE> | ||||
2 | # Licensed to the Apache Software Foundation (ASF) under one or more | ||||
3 | # contributor license agreements. See the NOTICE file distributed with | ||||
4 | # this work for additional information regarding copyright ownership. | ||||
5 | # The ASF licenses this file to you under the Apache License, Version 2.0 | ||||
6 | # (the "License"); you may not use this file except in compliance with | ||||
7 | # the License. You may obtain a copy of the License at: | ||||
8 | # | ||||
9 | # http://www.apache.org/licenses/LICENSE-2.0 | ||||
10 | # | ||||
11 | # Unless required by applicable law or agreed to in writing, software | ||||
12 | # distributed under the License is distributed on an "AS IS" BASIS, | ||||
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
14 | # See the License for the specific language governing permissions and | ||||
15 | # limitations under the License. | ||||
16 | # </@LICENSE> | ||||
17 | |||||
18 | |||||
19 | =head1 NAME | ||||
20 | |||||
21 | Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records | ||||
22 | |||||
23 | |||||
24 | =head1 SYNOPSIS | ||||
25 | |||||
26 | The TxRep (Reputation) plugin is designed as an improved replacement of the AWL | ||||
27 | (Auto-Whitelist) plugin. It adjusts the final message spam score by looking up | ||||
28 | and taking into consideration the reputation of the sender. | ||||
29 | |||||
30 | To try TxRep out, you B<have to> first disable the AWL plugin (if enabled), and | ||||
31 | back up its database. AWL is loaded in v310.pre and can be disabled by | ||||
32 | commenting out the loadplugin line: | ||||
33 | |||||
34 | # loadplugin Mail::SpamAssassin::Plugin::AWL | ||||
35 | |||||
36 | When AWL is not disabled, TxRep will refuse to run. | ||||
37 | |||||
38 | TxRep should be enabled by uncommenting the following line in v341.pre: | ||||
39 | |||||
40 | loadplugin Mail::SpamAssassin::Plugin::TxRep | ||||
41 | |||||
42 | Use the supplied 60_txreputation.cf file or add these lines to a .cf file: | ||||
43 | |||||
44 | header TXREP eval:check_senders_reputation() | ||||
45 | describe TXREP Score normalizing based on sender's reputation | ||||
46 | tflags TXREP userconf noautolearn | ||||
47 | priority TXREP 1000 | ||||
48 | |||||
49 | |||||
50 | =head1 DESCRIPTION | ||||
51 | |||||
52 | This plugin is intended to replace the former AWL - AutoWhiteList. Although the | ||||
53 | concept and the scope differ, the purpose remains the same - the normalizing of spam | ||||
54 | score results based on the sender's previous history. The name was intentionally changed | ||||
55 | from "whitelist" to "reputation" to avoid any confusion, since the result score can | ||||
56 | be adjusted in both directions. | ||||
57 | |||||
58 | The TxRep plugin keeps track of the average SpamAssassin score for senders. | ||||
59 | Senders are tracked using multiple identifiers, or their combinations: the From: | ||||
60 | email address, the originating IP and/or an originating block of IPs, sender's domain | ||||
61 | name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce | ||||
62 | the variability in scoring from message to message, and modifies the final score by | ||||
63 | pushing the result towards the historical average. This improves the accuracy of | ||||
64 | filtering for most email. | ||||
65 | |||||
66 | In comparison with the original AWL plugin, several conceptual changes were implemented | ||||
67 | in TxRep: | ||||
68 | |||||
69 | 1. B<Scoring> - although AWL tracks the number of messages received from each | ||||
70 | sender, it does not take that count into account in any way when calculating the | ||||
71 | corrective score for a new message. For example, if a sender previously sent a single | ||||
72 | ham message with the score of -5, and then sends a second one with the score of +10, | ||||
73 | AWL will issue a corrective score bringing the score towards the -5. With the default | ||||
74 | C<auto_whitelist_factor> of 0.5, the resulting score would be only 2.5. And it would be | ||||
75 | exactly the same even if the sender previously sent 1,000 messages with the average of | ||||
76 | -5. TxRep tries to take the maximal advantage of the collected data, and adjusts the | ||||
77 | final score not only with the mean reputation score stored in the database, but also | ||||
78 | respecting the number of messages already seen from the sender. You can see the exact | ||||
79 | formula in the section L</C<txrep_factor>>. | ||||
80 | |||||
81 | 2. B<Learning> - AWL ignores any spam/ham learning. In fact it acts against it, which | ||||
82 | often leads to a frustrating situation, where a user repeatedly tags all messages of a | ||||
83 | given sender as spam (resp. ham), but at any new message from the sender, AWL will | ||||
84 | adjust the score of the message back to the historical average which does B<not> include | ||||
85 | the learned scores. TxRep changes this: every spam/ham learning is | ||||
86 | recorded in the reputation database, and hence taken into consideration for future email | ||||
87 | from the respective sender. See the section L</"LEARNING SPAM / HAM"> for more details. | ||||
88 | |||||
89 | 3. B<Auto-Learning> - in certain situations SpamAssassin may declare a message an | ||||
90 | obvious spam resp. ham, and launch the auto-learning process, so that the message can be | ||||
91 | re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin | ||||
92 | will readjust the stored reputation by the value defined by L</C<txrep_learn_penalty>> | ||||
93 | resp. L</C<txrep_learn_bonus>>. Auto-learning score thresholds may be tuned, or the | ||||
94 | auto-learning completely disabled, through the setting L</C<txrep_autolearn>>. | ||||
95 | |||||
96 | 4. B<Relearning> - messages that were wrongly learned or auto-learned, can be relearned. | ||||
97 | Old reputations are removed from the database, and new ones added instead of them. The | ||||
98 | relearning works better when message tracking is enabled through the | ||||
99 | L</C<txrep_track_messages>> option. Without it, the relearned score is simply added to | ||||
100 | the reputation, without removing the old ones. | ||||
101 | |||||
102 | 5. B<Aging> - with AWL, any historical record of a given sender has the same weight. It | ||||
103 | means that changes in a sender's behavior, or modified SA rules, may take a long time to take effect, or | ||||
104 | be virtually negated by the AWL normalization, especially for senders with a high count | ||||
105 | of past messages, and low recent frequency. It also turns out to be particularly | ||||
106 | counterproductive when the administrator detects new patterns in certain messages, and | ||||
107 | applies new rules to better tag such messages as spam or ham. AWL will practically | ||||
108 | eliminate the effect of the new rules, by adjusting the score back towards the (wrong) | ||||
109 | historical average. Only setting the C<auto_whitelist_factor> lower would help, but at | ||||
110 | the same time it would also reduce the overall impact of AWL, and cast doubt on its | ||||
111 | purpose. TxRep, besides the L</C<txrep_factor>> (replacement of the C<auto_whitelist_factor>), | ||||
112 | introduces also the L</C<txrep_dilution_factor>> to help coping with this issue by | ||||
113 | progressively reducing the impact of past records. More details can be found in the | ||||
114 | description of the factor below. | ||||
115 | |||||
116 | 6. B<Blacklisting and Whitelisting> - when a whitelisting or blacklisting was requested | ||||
117 | through SpamAssassin's API, AWL adjusts the historical total score of the plain email | ||||
118 | address without IP (and deletes records bound to an IP), but since during the reception | ||||
119 | new records with IP will be added, the blacklisted entry would cease acting during | ||||
120 | scanning. TxRep always uses the record of the plain email address without IP together | ||||
121 | with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight | ||||
122 | factor for the EMAIL reputation is set to zero). AWL uses the score of 100 (resp. -100) | ||||
123 | for the blacklisting (resp. whitelisting) purposes. TxRep increases the value | ||||
124 | proportionally to the weight factor of the EMAIL reputation. It is explained in detail | ||||
125 | in the section L</BLACKLISTING / WHITELISTING>. TxRep can blacklist or whitelist also | ||||
126 | IP addresses, domain names, and dotless HELO names. | ||||
127 | |||||
128 | 7. B<Sender Identification> - AWL identifies a sender on the basis of the email address | ||||
129 | used, and the originating IP address (more precisely, the part of it defined by the mask setting). | ||||
130 | The main purpose of this measure is to avoid assigning false good scores to spammers who | ||||
131 | spoof known email addresses. The disadvantage appears with senders who send from frequently | ||||
132 | changing locations, or who connect through dynamic IP addresses that are not | ||||
133 | within the block defined by the mask setting. Their score is difficult or sometimes | ||||
134 | impossible to track. Another disadvantage appears, for example, with a spammer persistently | ||||
135 | sending spam from the same IP address, just under different email addresses. AWL will not | ||||
136 | find his previous scores, unless he reuses the same email address again. TxRep uses several | ||||
137 | identifiers, and creates separate database entries for each of them. It tracks not only | ||||
138 | the email/IP address combination like AWL, but also the standalone email address (regardless | ||||
139 | of the originating IP), the standalone IP (regardless of email address used), the domain | ||||
140 | name of the email address, the DKIM signature, and the HELO name of the connecting PC. The | ||||
141 | influence of each individual identifier may be tuned with the help of weight factors | ||||
142 | described in the section L</REPUTATION WEIGHTS>. | ||||
143 | |||||
144 | 8. B<Message Tracking> - TxRep (optionally) keeps track of already scanned and/or learned | ||||
145 | message IDs. This avoids strengthening the reputation score by simply | ||||
146 | rescanning or relearning the same message multiple times. At the same time it also allows | ||||
147 | the proper relearning of once wrongly learned messages, or relearning them after the | ||||
148 | learn penalty or bonus were changed. See the option L</C<txrep_track_messages>>. | ||||
149 | |||||
150 | 9. B<User and Global Storages> - usually it is recommended to use the per-user setup | ||||
151 | of SpamAssassin, because each user may have quite different requirements, and may receive | ||||
152 | quite different sorts of email. Especially when using the Bayesian and AWL plugins, | ||||
153 | the efficiency is much better when SpamAssassin is trained on spam and ham separately | ||||
154 | for each user. However, the disadvantage is that senders and emails already learned | ||||
155 | many times by different users, will need to be relearned without any recognized history, | ||||
156 | anytime they arrive to another user. TxRep uses the advantages of both systems. It can | ||||
157 | use dual storages: the global common storage, where all email processed by SpamAssassin | ||||
158 | is recorded, and a local storage separate for each user, with reputation data from his | ||||
159 | email only. See more details at the setting L</C<txrep_user2global_ratio>>. | ||||
160 | |||||
161 | 10. B<Outbound Whitelisting> - when a local user sends messages to an email address, we | ||||
162 | assume that he needs to see the eventual answer too, hence the recipient's address should | ||||
163 | be whitelisted. When SpamAssassin is used for scanning outgoing email too, when local | ||||
164 | users use the SMTP server where SA is installed, for sending email, and when internal | ||||
165 | networks are defined, TxRep will improve the reputation of all 'To:' and 'CC' addresses | ||||
166 | from messages originating in the internal networks. Details can be found at the setting | ||||
167 | L</C<txrep_whitelist_out>>. | ||||
168 | |||||
169 | The two plugins (AWL and TxRep) cannot coexist. AWL must be disabled to allow | ||||
170 | TxRep to run. TxRep reuses the database handling of the original AWL module, and some | ||||
171 | of its parameters bound to the database handler modules. By default, TxRep creates its own | ||||
172 | database, but the original auto-whitelist can be reused as a starting point. The AWL | ||||
173 | database can be renamed to the name defined in TxRep settings, and TxRep will start | ||||
174 | using it. The original auto-whitelist database has to be backed up, to allow switching | ||||
175 | back to the original state. | ||||
176 | |||||
177 | The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and | ||||
178 | spamassassin/AutoWhitelist.pm. Two other AWL files, spamassassin/DBBasedAddrList.pm | ||||
179 | and spamassassin/SQLBasedAddrList.pm are still needed. | ||||
180 | |||||
181 | |||||
182 | =head1 TEMPLATE TAGS | ||||
183 | |||||
184 | This plugin module adds the following C<tags> that can be used as | ||||
185 | placeholders in certain options. See L<Mail::SpamAssassin::Conf> | ||||
186 | for more information on TEMPLATE TAGS. | ||||
187 | |||||
188 | _TXREP_XXX_Y_ TXREP modifier | ||||
189 | _TXREP_XXX_Y_MEAN_ Mean score on which TXREP modification is based | ||||
190 | _TXREP_XXX_Y_COUNT_ Number of messages on which TXREP modification is based | ||||
191 | _TXREP_XXX_Y_PRESCORE_ Score before TXREP | ||||
192 | _TXREP_XXX_Y_UNKNOW_ New sender (not found in the TXREP list) | ||||
193 | |||||
194 | The XXX part of the tag takes the form of one of the following IDs, depending | ||||
195 | on the reputation checked: EMAIL, EMAIL_IP, IP, DOMAIN, or HELO. The _Y appendix | ||||
196 | ID is used only in the case of dual storage, and takes the form of either _U (for | ||||
197 | user storage reputations), or _G (for global storage reputations). | ||||
198 | |||||
199 | =cut # .................................................................... | ||||
200 | package Mail::SpamAssassin::Plugin::TxRep; | ||||
201 | |||||
202 | 2 | 64µs | 2 | 79µs | # spent 67µs (55+12) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@202 which was called:
# once (55µs+12µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 202 # spent 67µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@202
# spent 12µs making 1 call to strict::import |
203 | 2 | 63µs | 2 | 103µs | # spent 62µs (21+41) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@203 which was called:
# once (21µs+41µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 203 # spent 62µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@203
# spent 41µs making 1 call to warnings::import |
204 | # use bytes; | ||||
205 | 2 | 68µs | 2 | 155µs | # spent 88µs (22+67) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@205 which was called:
# once (22µs+67µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 205 # spent 88µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@205
# spent 67µs making 1 call to re::import |
206 | |||||
207 | 3 | 129µs | 3 | 975µs | # spent 515µs (55+460) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@207 which was called:
# once (55µs+460µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 207 # spent 515µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@207
# spent 430µs making 1 call to NetAddr::IP::import
# spent 30µs making 1 call to version::_VERSION |
208 | 2 | 52µs | 1 | 14µs | # spent 14µs within Mail::SpamAssassin::Plugin::TxRep::BEGIN@208 which was called:
# once (14µs+0s) by Mail::SpamAssassin::PluginHandler::load_plugin at line 208 # spent 14µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@208 |
209 | 2 | 404µs | 1 | 26.5ms | # spent 26.5ms (18.9+7.59) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 which was called:
# once (18.9ms+7.59ms) by Mail::SpamAssassin::PluginHandler::load_plugin at line 209 # spent 26.5ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@209 |
210 | 2 | 70µs | 2 | 298µs | # spent 162µs (26+136) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@210 which was called:
# once (26µs+136µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 210 # spent 162µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@210
# spent 136µs making 1 call to Exporter::import |
211 | 2 | 65µs | 2 | 280µs | # spent 152µs (23+128) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@211 which was called:
# once (23µs+128µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 211 # spent 152µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@211
# spent 128µs making 1 call to Exporter::import |
212 | |||||
213 | 2 | 11.9ms | 2 | 182µs | # spent 102µs (22+80) within Mail::SpamAssassin::Plugin::TxRep::BEGIN@213 which was called:
# once (22µs+80µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 213 # spent 102µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::BEGIN@213
# spent 80µs making 1 call to vars::import |
214 | 1 | 18µs | @ISA = qw(Mail::SpamAssassin::Plugin); | ||
215 | |||||
216 | |||||
217 | ########################################################################### | ||||
218 | # spent 1.02ms (102µs+916µs) within Mail::SpamAssassin::Plugin::TxRep::new which was called:
# once (102µs+916µs) by Mail::SpamAssassin::PluginHandler::load_plugin at line 1 of (eval 42)[Mail/SpamAssassin/PluginHandler.pm:129] | ||||
219 | ########################################################################### | ||||
220 | 1 | 3µs | my ($class, $main) = @_; | ||
221 | |||||
222 | 1 | 2µs | $class = ref($class) || $class; | ||
223 | 1 | 13µs | 1 | 25µs | my $self = $class->SUPER::new($main); # spent 25µs making 1 call to Mail::SpamAssassin::Plugin::new |
224 | 1 | 2µs | bless($self, $class); | ||
225 | |||||
226 | 1 | 13µs | $self->{main} = $main; | ||
227 | 1 | 3µs | $self->{conf} = $main->{conf}; | ||
228 | 1 | 3µs | $self->{factor} = $main->{conf}->{txrep_factor}; | ||
229 | 1 | 3µs | $self->{ipv4_mask_len} = $main->{conf}->{txrep_ipv4_mask_len}; | ||
230 | 1 | 2µs | $self->{ipv6_mask_len} = $main->{conf}->{txrep_ipv6_mask_len}; | ||
231 | 1 | 11µs | 1 | 32µs | $self->register_eval_rule("check_senders_reputation"); # spent 32µs making 1 call to Mail::SpamAssassin::Plugin::register_eval_rule |
232 | 1 | 8µs | 1 | 848µs | $self->set_config($main->{conf}); # spent 848µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::set_config |
233 | |||||
234 | # only the default conf loaded here, do nothing here requiring | ||||
235 | # the runtime settings | ||||
236 | 1 | 8µs | 1 | 10µs | dbg("TxRep: new object created"); # spent 10µs making 1 call to Mail::SpamAssassin::Logger::dbg |
237 | 1 | 9µs | return $self; | ||
238 | } | ||||
239 | |||||
240 | |||||
241 | ########################################################################### | ||||
242 | # spent 848µs (198+650) within Mail::SpamAssassin::Plugin::TxRep::set_config which was called:
# once (198µs+650µs) by Mail::SpamAssassin::Plugin::TxRep::new at line 232 | ||||
243 | ########################################################################### | ||||
244 | 1 | 2µs | my($self, $conf) = @_; | ||
245 | 1 | 2µs | my @cmds; | ||
246 | |||||
247 | # ------------------------------------------------------------------------- | ||||
248 | =head1 USER PREFERENCES | ||||
249 | |||||
250 | The following options can be used in both site-wide (C<local.cf>) and | ||||
251 | user-specific (C<user_prefs>) configuration files to customize how | ||||
252 | SpamAssassin handles incoming email messages. | ||||
253 | |||||
254 | =over 4 | ||||
255 | |||||
256 | =item B<use_txrep> | ||||
257 | |||||
258 | 0 | 1 (default: 0) | ||||
259 | |||||
260 | Whether to use TxRep reputation system. TxRep tracks the long-term average | ||||
261 | score for each sender and then shifts the score of new messages toward that | ||||
262 | long-term average. This can increase or decrease the score for messages, | ||||
263 | depending on the long-term behavior of the particular correspondent. | ||||
264 | |||||
265 | Note that certain tests are ignored when determining the final message score: | ||||
266 | |||||
267 | - rules with tflags set to 'noautolearn' | ||||
268 | |||||
269 | =cut # ................................................................... | ||||
270 | 1 | 8µs | push (@cmds, { | ||
271 | setting => 'use_txrep', | ||||
272 | default => 0, | ||||
273 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
274 | }); | ||||
275 | |||||
276 | |||||
277 | # ------------------------------------------------------------------------- | ||||
278 | =item B<txrep_factor> | ||||
279 | |||||
280 | range [0..1] (default: 0.5) | ||||
281 | |||||
282 | How much towards the long-term mean for the sender to regress a message. | ||||
283 | Basically, the algorithm is to track the long-term total score and the count | ||||
284 | of messages for the sender (C<total> and C<count>), and then once we have | ||||
285 | otherwise fully calculated the score for this message (C<score>), we calculate | ||||
286 | the final score for the message as: | ||||
287 | |||||
288 | finalscore = score + factor * ((total + score)/(count + 1) - score) | ||||
289 | |||||
290 | So if C<factor> = 0.5, then we'll move halfway between the calculated | ||||
291 | score and the new mean value. If C<factor> = 0.3, then we'll move about 1/3 | ||||
292 | of the way from the score toward the mean. C<factor> = 1 means use the | ||||
293 | long-term mean including also the new unadjusted score; C<factor> = 0 means | ||||
294 | just use the calculated score, thus disabling the score averaging, though still | ||||
295 | recording the reputation to the database. | ||||
296 | |||||
297 | =cut # ................................................................... | ||||
298 | push (@cmds, { | ||||
299 | setting => 'txrep_factor', | ||||
300 | default => 0.5, | ||||
301 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
302 | code => sub { | ||||
303 | my ($self, $key, $value, $line) = @_; | ||||
304 | if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
305 | $self->{txrep_factor} = $value; | ||||
306 | } | ||||
307 | 1 | 13µs | }); | ||
308 | |||||
309 | |||||
310 | # ------------------------------------------------------------------------- | ||||
311 | =item B<txrep_dilution_factor> | ||||
312 | |||||
313 | range [0.7..1.0] (default: 0.98) | ||||
314 | |||||
315 | With every new email from a given sender, the historical reputation records are "diluted", | ||||
316 | or "watered down" by a certain fraction given by this factor. It means that the | ||||
317 | influence of old records will progressively diminish with every new message from | ||||
318 | given sender. This is important to allow a more flexible handling of changes in | ||||
319 | sender's behavior, or new improvements or changes of local SA rules. | ||||
320 | |||||
321 | Without any dilution expiry (dilution factor set to 1), the new message score is | ||||
322 | simply added to the total score of given sender in the reputation database. When | ||||
323 | dilution is used (factor < 1), the impact of the historical reputation average is | ||||
324 | reduced by the factor before calculating the new average, which in turn is then | ||||
325 | used to adjust the new total score to be stored in the database. | ||||
326 | |||||
327 | newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1) | ||||
328 | |||||
329 | In other words, it means that the older a message is, the less and less impact | ||||
330 | on the new average its original spam score has. For example if we set the factor | ||||
331 | to 0.9 (meaning dilution by 10%), the score of the new message will be recorded | ||||
332 | at 100%, the previous score of the same sender at 90%, the one before that at 81% | ||||
333 | (0.9 * 0.9 = 0.81), and for example the 10th last message at just 35%. | ||||
334 | |||||
335 | At stable systems, we recommend keeping the factor close to 1 (but still lower | ||||
336 | than 1). At systems where SA rules tuning and spam learning is still in progress, | ||||
337 | lower factors will help the reputation adapt more quickly to any modifications. At | ||||
338 | the same time, however, they will also reduce the impact of the historical | ||||
339 | reputation. | ||||
340 | |||||
341 | =cut # ................................................................... | ||||
342 | push (@cmds, { | ||||
343 | setting => 'txrep_dilution_factor', | ||||
344 | default => 0.98, | ||||
345 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
346 | code => sub { | ||||
347 | my ($self, $key, $value, $line) = @_; | ||||
348 | if ($value < 0.7 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
349 | $self->{txrep_dilution_factor} = $value; | ||||
350 | } | ||||
351 | 1 | 8µs | }); | ||
352 | |||||
353 | |||||
354 | # TODO, not implemented yet, hence no advertising until then | ||||
355 | # ------------------------------------------------------------------------- | ||||
356 | #=item B<txrep_expiry_days> | ||||
357 | # | ||||
358 | # range [0..10000] (default: 365) | ||||
359 | # | ||||
360 | #The scores of messages can be removed from the total reputation, and the | ||||
361 | #message tracking entry removed from the database after given number of days. | ||||
362 | #It helps keeping the database growth under control, and it also reduces the | ||||
363 | #influence of old scores on the current reputation (both scoring methods, and | ||||
364 | #sender's behavior might have changed over time). | ||||
365 | # | ||||
366 | #=cut # ................................................................... | ||||
367 | push (@cmds, { | ||||
368 | setting => 'txrep_expiry_days', | ||||
369 | default => 365, | ||||
370 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
371 | code => sub { | ||||
372 | my ($self, $key, $value, $line) = @_; | ||||
373 | if ($value < 0 || $value > 10000) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
374 | $self->{txrep_expiry_days} = $value; | ||||
375 | } | ||||
376 | 1 | 7µs | }); | ||
377 | |||||
378 | |||||
379 | # ------------------------------------------------------------------------- | ||||
380 | =item B<txrep_learn_penalty> | ||||
381 | |||||
382 | range [0..200] (default: 20) | ||||
383 | |||||
384 | When SpamAssassin is trained on a SPAM message, the given penalty score will | ||||
385 | be added to the total reputation score of the sender, regardless of the real | ||||
386 | spam score. The more messages the sender already has in the TxRep database, | ||||
387 | the smaller the impact of the penalty. | ||||
388 | |||||
389 | =cut # ................................................................... | ||||
390 | push (@cmds, { | ||||
391 | setting => 'txrep_learn_penalty', | ||||
392 | default => 20, | ||||
393 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
394 | code => sub { | ||||
395 | my ($self, $key, $value, $line) = @_; | ||||
396 | if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
397 | $self->{txrep_learn_penalty} = $value; | ||||
398 | } | ||||
399 | 1 | 8µs | }); | ||
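The diminishing impact described for C<txrep_learn_penalty> can be illustrated with a minimal sketch. It assumes, as a simplification, that a reputation is the mean of the stored scores (total divided by message count); the helper name mean_after_penalty is hypothetical and is not part of the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: mean reputation after one learned penalty is added.
# Assumption: reputation = total score / message count.
sub mean_after_penalty {
    my ($total, $count, $penalty) = @_;
    return ($total + $penalty) / ($count + 1);
}

# The same penalty of 20 applied to a sender with 1 stored message
# and to a sender with 99 stored messages (all previously scoring 0).
printf "%.2f\n", mean_after_penalty(0, 1,  20);   # 10.00
printf "%.2f\n", mean_after_penalty(0, 99, 20);   # 0.20
```

Under this assumption the same penalty shifts a new sender's mean by 10 points, but a well-known sender's mean by only 0.2 points.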
400 | |||||
401 | |||||
402 | # ------------------------------------------------------------------------- | ||||
403 | =item B<txrep_learn_bonus> | ||||
404 | |||||
405 | range [0..200] (default: 20) | ||||
406 | |||||
407 | When SpamAssassin is trained on a HAM message, the given bonus score will be | ||||
408 | deducted from the total reputation score of the sender, regardless of the real | ||||
409 | spam score. The more messages the sender already has in the TxRep database, | ||||
410 | the smaller the impact of the bonus will be. | ||||
411 | |||||
412 | =cut # ................................................................... | ||||
413 | push (@cmds, { | ||||
414 | setting => 'txrep_learn_bonus', | ||||
415 | default => 20, | ||||
416 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
417 | code => sub { | ||||
418 | my ($self, $key, $value, $line) = @_; | ||||
419 | if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
420 | $self->{txrep_learn_bonus} = $value; | ||||
421 | } | ||||
422 | 1 | 7µs | }); | ||
423 | |||||
424 | |||||
425 | # ------------------------------------------------------------------------- | ||||
426 | =item B<txrep_autolearn> | ||||
427 | |||||
428 | range [0..5] (default: 0) | ||||
429 | |||||
430 | When SpamAssassin declares a message clear spam or clear ham during the message | ||||
431 | scan, and launches the auto-learn process, the sender reputation scores of the given | ||||
432 | message will be adjusted by the value of the option L</C<txrep_learn_penalty>>, | ||||
433 | or L</C<txrep_learn_bonus>> respectively, in the same way as during manual learning. | ||||
434 | A value of 0 for this option disables the auto-learn reputation adjustment: only the | ||||
435 | score calculated before the auto-learn will be stored in the reputation database. | ||||
436 | |||||
437 | =cut # ................................................................... | ||||
438 | push (@cmds, { | ||||
439 | setting => 'txrep_autolearn', | ||||
440 | default => 0, | ||||
441 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
442 | code => sub { | ||||
443 | my ($self, $key, $value, $line) = @_; | ||||
444 | if ($value < 0 || $value > 5) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
445 | $self->{txrep_autolearn} = $value; | ||||
446 | } | ||||
447 | 1 | 8µs | }); | ||
448 | |||||
449 | |||||
450 | # ------------------------------------------------------------------------- | ||||
451 | =item B<txrep_track_messages> | ||||
452 | |||||
453 | 0 | 1 (default: 1) | ||||
454 | |||||
455 | Whether TxRep should keep track of already scanned and/or learned messages. | ||||
456 | When enabled, an additional record in the reputation database will be created | ||||
457 | to avoid false score adjustments due to repeated scanning of the same message, | ||||
458 | and to allow proper relearning of messages that were either previously wrongly | ||||
459 | learned, or need to be relearned after modifying the learn penalty or bonus. | ||||
460 | |||||
461 | =cut # ................................................................... | ||||
462 | 1 | 3µs | push (@cmds, { | ||
463 | setting => 'txrep_track_messages', | ||||
464 | default => 1, | ||||
465 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
466 | }); | ||||
467 | |||||
468 | |||||
469 | # ------------------------------------------------------------------------- | ||||
470 | =item B<txrep_whitelist_out> | ||||
471 | |||||
472 | range [0..200] (default: 10) | ||||
473 | |||||
474 | When the value of this setting is greater than zero, recipients of messages sent from | ||||
475 | within the internal networks will be whitelisted by improving their total reputation | ||||
476 | score by the number of points defined by this setting. Since the IP address and other | ||||
477 | sender identifiers are not known when sending the email, only the reputation of the | ||||
478 | standalone email address is whitelisted. The domain name is intentionally also left | ||||
479 | unaffected. Outbound whitelisting can only work when SpamAssassin is set up to scan | ||||
480 | outgoing email as well, when local users use the SMTP server for sending email, and when | ||||
481 | C<internal_networks> are defined in the SpamAssassin configuration. The reputation is | ||||
482 | improved with every message sent from internal networks, so the more messages are | ||||
483 | sent to a recipient, the better the reputation of their email address will be. | ||||
484 | |||||
485 | |||||
486 | =cut # ................................................................... | ||||
487 | push (@cmds, { | ||||
488 | setting => 'txrep_whitelist_out', | ||||
489 | default => 10, | ||||
490 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
491 | # spent 22µs within Mail::SpamAssassin::Plugin::TxRep::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm:495] which was called:
# once (22µs+0s) by Mail::SpamAssassin::Conf::Parser::parse at line 438 of Mail/SpamAssassin/Conf/Parser.pm
code => sub { | ||||
492 | 1 | 6µs | my ($self, $key, $value, $line) = @_; | ||
493 | 1 | 4µs | if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||
494 | 1 | 16µs | $self->{txrep_whitelist_out} = $value; | ||
495 | } | ||||
496 | 1 | 8µs | }); | ||
497 | |||||
498 | |||||
499 | # ------------------------------------------------------------------------- | ||||
500 | =item B<txrep_ipv4_mask_len> | ||||
501 | |||||
502 | range [0..32] (default: 16) | ||||
503 | |||||
504 | The AWL database keeps only the specified number of most-significant bits | ||||
505 | of an IPv4 address in its fields, so that different individual IP addresses | ||||
506 | within a subnet belonging to the same owner are managed under a single | ||||
507 | database record. As we have no information available on the allocated | ||||
508 | address ranges of senders, this CIDR mask length is only an approximation. | ||||
509 | The default is 16 bits, corresponding to a former class B. Increase the | ||||
510 | number if a finer granularity is desired, e.g. to 24 (class C) or 32. | ||||
511 | A value 0 is allowed but is not particularly useful, as it would treat the | ||||
512 | whole internet as a single organization. The number need not be a multiple | ||||
513 | of 8, any split is allowed. | ||||
514 | |||||
515 | =cut # ................................................................... | ||||
516 | push (@cmds, { | ||||
517 | setting => 'txrep_ipv4_mask_len', | ||||
518 | default => 16, | ||||
519 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
520 | code => sub { | ||||
521 | my ($self, $key, $value, $line) = @_; | ||||
522 | if (!defined $value || $value eq '') | ||||
523 | {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;} | ||||
524 | elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32) | ||||
525 | {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
526 | $self->{txrep_ipv4_mask_len} = $value; | ||||
527 | } | ||||
528 | 1 | 7µs | }); | ||
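The effect of C<txrep_ipv4_mask_len> can be sketched as follows. This only demonstrates keeping the most-significant bits of an IPv4 address; the helper name mask_ipv4 is hypothetical, and the way the plugin actually builds its database key may differ.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: keep only the $len most-significant bits
# of a dotted-quad IPv4 address and zero the rest.
sub mask_ipv4 {
    my ($ip, $len) = @_;
    my $packed = inet_aton($ip);
    return unless defined $packed;
    my $addr = unpack('N', $packed);
    # A shift by 32 bits is avoided explicitly for a mask length of 0.
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    return inet_ntoa(pack('N', $addr & $mask));
}

print mask_ipv4('198.51.100.77', 16), "\n";   # 198.51.0.0   (default)
print mask_ipv4('198.51.100.77', 24), "\n";   # 198.51.100.0
print mask_ipv4('198.51.100.77', 20), "\n";   # 198.51.96.0  (not a multiple of 8)
```

With the default of 16, every address in 198.51.0.0/16 would map to the same masked value, which is the grouping behaviour the option describes.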
529 | |||||
530 | |||||
531 | # ------------------------------------------------------------------------- | ||||
532 | =item B<txrep_ipv6_mask_len> | ||||
533 | |||||
534 | range [0..128] (default: 48) | ||||
535 | |||||
536 | The AWL database keeps only the specified number of most-significant bits | ||||
537 | of an IPv6 address in its fields, so that different individual IP addresses | ||||
538 | within a subnet belonging to the same owner are managed under a single | ||||
539 | database record. As we have no information available on the allocated address | ||||
540 | ranges of senders, this CIDR mask length is only an approximation. The default | ||||
541 | is 48 bits, corresponding to an address range commonly allocated to individual | ||||
542 | (smaller) organizations. Increase the number for a finer granularity, e.g. | ||||
543 | to 64 or 96 or 128, or decrease for wider ranges, e.g. 32. A value 0 is | ||||
544 | allowed but is not particularly useful, as it would treat the whole internet | ||||
545 | as a single organization. The number need not be a multiple of 4, any split | ||||
546 | is allowed. | ||||
547 | |||||
548 | =cut # ................................................................... | ||||
549 | push (@cmds, { | ||||
550 | setting => 'txrep_ipv6_mask_len', | ||||
551 | default => 48, | ||||
552 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
553 | code => sub { | ||||
554 | my ($self, $key, $value, $line) = @_; | ||||
555 | if (!defined $value || $value eq '') | ||||
556 | {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;} | ||||
557 | elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128) | ||||
558 | {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
559 | $self->{txrep_ipv6_mask_len} = $value; | ||||
560 | } | ||||
561 | 1 | 7µs | }); | ||
562 | |||||
563 | |||||
564 | # ------------------------------------------------------------------------- | ||||
565 | =item B<user_awl_sql_override_username> | ||||
566 | |||||
567 | string (default: undefined) | ||||
568 | |||||
569 | Used by the SQLBasedAddrList storage implementation. | ||||
570 | |||||
571 | If this option is set the SQLBasedAddrList module will override the set | ||||
572 | username with the value given. This can be useful for implementing global | ||||
573 | or group based TxRep databases. | ||||
574 | |||||
575 | =cut # ................................................................... | ||||
576 | 1 | 3µs | push (@cmds, { | ||
577 | setting => 'user_awl_sql_override_username', | ||||
578 | default => '', | ||||
579 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
580 | }); | ||||
581 | |||||
582 | |||||
583 | # ------------------------------------------------------------------------- | ||||
584 | =item B<txrep_user2global_ratio> | ||||
585 | |||||
586 | range [0..10] (default: 0) | ||||
587 | |||||
588 | When the option txrep_user2global_ratio is set to a value greater than zero, and | ||||
589 | if the server configuration allows it, two data storages will be used - user and | ||||
590 | global (server-wide) storages. | ||||
591 | |||||
592 | User storage keeps only senders who send messages to the respective recipient, | ||||
593 | and will reflect also the corrected/learned scores, when some messages are marked | ||||
594 | by the user as spam or ham, or when the sender is whitelisted or blacklisted | ||||
595 | through the API of SpamAssassin. | ||||
596 | |||||
597 | Global storage keeps the reputation data of all messages processed by SpamAssassin | ||||
598 | with their spam scores and spam/ham learning data from all users on the server. | ||||
599 | Hence, the module will return a reputation value even for senders not known to the | ||||
600 | current recipient, as long as they have already sent email to anyone else on the server. | ||||
601 | |||||
602 | The value of the txrep_user2global_ratio parameter controls the impact of each | ||||
603 | of the two reputations. When equal to 1, both the global and the user score will | ||||
604 | have the same impact on the result. When set to 2, the reputation taken from | ||||
605 | the user storage will have twice the impact of the global value. The final value | ||||
606 | of the TXREP tag will be calculated as follows: | ||||
607 | |||||
608 | total = ( ratio * user + global ) / ( ratio + 1 ) | ||||
609 | |||||
610 | When no reputation is found in the user storage, and a global reputation is | ||||
611 | available, the global storage is used fully, without applying the ratio. | ||||
612 | |||||
613 | When the ratio is set to zero, only the default storage will be used. It | ||||
614 | then depends on whether you use the global or the local user storage by default, | ||||
615 | which in turn is controlled either by the parameter C<user_awl_sql_override_username> | ||||
616 | (in case of SQL storage), or by the C<auto_whitelist_path> parameter (in case of | ||||
617 | Berkeley database). | ||||
618 | |||||
619 | When this dual storage is enabled, and no global storage is defined by the | ||||
620 | above mentioned parameters for the Berkeley or SQL databases, TxRep will attempt | ||||
621 | to use a generic storage - user 'GLOBAL' in case of SQL, and in the case of | ||||
622 | Berkeley database it uses the path defined by '__local_state_dir__/tx-reputation', | ||||
623 | which typically renders into /var/db/spamassassin/tx-reputation. When the default | ||||
624 | storages are not available, or are not writable, you have to set the global | ||||
625 | storage with the help of the C<user_awl_sql_override_username> or | ||||
626 | C<auto_whitelist_path> settings, respectively. | ||||
627 | |||||
628 | Please note that some SpamAssassin installations always run under the same user | ||||
629 | ID. In that case it is pointless to enable the dual storage, because it would | ||||
630 | at most lead to two identical global storages in different locations. | ||||
631 | |||||
632 | This feature is disabled by default. | ||||
633 | =cut # ................................................................... | ||||
634 | push (@cmds, { | ||||
635 | setting => 'txrep_user2global_ratio', | ||||
636 | default => 0, | ||||
637 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING, | ||||
638 | code => sub { | ||||
639 | my ($self, $key, $value, $line) = @_; | ||||
640 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
641 | $self->{txrep_user2global_ratio} = $value; | ||||
642 | } | ||||
643 | 1 | 7µs | }); | ||
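The blending rule documented for C<txrep_user2global_ratio> can be written out as a minimal sketch. The helper name blend_reputation is hypothetical; the fallback to the user value when no global value exists is an assumption, since the documentation only describes the opposite case. The sketch covers ratios greater than zero only, because a ratio of zero disables the dual storage.

```perl
use strict;
use warnings;

# Hypothetical helper implementing
#   total = ( ratio * user + global ) / ( ratio + 1 )
sub blend_reputation {
    my ($ratio, $user, $global) = @_;
    return $global unless defined $user;     # documented: global is used fully
    return $user   unless defined $global;   # assumption: user value used alone
    return ($ratio * $user + $global) / ($ratio + 1);
}

print blend_reputation(1, 4, 2), "\n";       # 3  (equal impact)
print blend_reputation(2, 4, 1), "\n";       # 3  (user counts twice)
print blend_reputation(2, undef, 1), "\n";   # 1  (no user record)
```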
644 | |||||
645 | |||||
646 | # ------------------------------------------------------------------------- | ||||
647 | =item B<auto_whitelist_distinguish_signed> | ||||
648 | |||||
649 | (default: 1 - enabled) | ||||
650 | |||||
651 | Used by the SQLBasedAddrList storage implementation. | ||||
652 | |||||
653 | If this option is set the SQLBasedAddrList module will keep separate | ||||
654 | database entries for DKIM-validated e-mail addresses and for non-validated | ||||
655 | ones. Without this option, or for domains that do not use a DKIM signature, | ||||
656 | the reputation of legitimate email can get mixed with the reputation of | ||||
657 | forgeries. A pre-requisite when setting this option is that a field | ||||
658 | txrep.signedby exists in a SQL table, otherwise SQL operations will fail. | ||||
659 | A DKIM plugin must also be enabled in order for this option to take effect. | ||||
660 | This option is highly recommended. Unless you are using a pre-3.3.0 database | ||||
661 | schema and cannot upgrade, there is no reason to disable this option. If | ||||
662 | you are upgrading from AWL and using a pre-3.3.0 schema, the txrep.signedby | ||||
663 | column will not exist. It is recommended that you add this column, but if | ||||
664 | that is not possible you must set this option to 0 to avoid SQL errors. | ||||
665 | |||||
666 | =cut # ................................................................... | ||||
667 | 1 | 3µs | push (@cmds, { | ||
668 | setting => 'auto_whitelist_distinguish_signed', | ||||
669 | default => 1, | ||||
670 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
671 | }); | ||||
672 | |||||
673 | |||||
674 | =item B<txrep_spf> | ||||
675 | |||||
676 | 0 | 1 (default: 1) | ||||
677 | |||||
678 | When enabled, TxRep will treat any IP address using a given email address as | ||||
679 | the same authorized identity, and will not associate any IP address with it. | ||||
680 | (The same happens with valid DKIM signatures. No option available for DKIM). | ||||
681 | |||||
682 | Note: for domains that define the useless SPF +all (pass all), no IP would | ||||
683 | ever be associated with the email address, and all addresses (incl. the forged | ||||
684 | ones) would be treated as coming from the authorized source. However, such | ||||
685 | domains are hopefully rare, and ask for this kind of treatment anyway. | ||||
686 | |||||
687 | =back | ||||
688 | |||||
689 | =cut # ................................................................... | ||||
690 | 1 | 3µs | push (@cmds, { | ||
691 | setting => 'txrep_spf', | ||||
692 | default => 1, | ||||
693 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL | ||||
694 | }); | ||||
695 | |||||
696 | |||||
697 | # ------------------------------------------------------------------------- | ||||
698 | =head2 REPUTATION WEIGHTS | ||||
699 | |||||
700 | The overall reputation of the sender comprises several elements: | ||||
701 | |||||
702 | =over 4 | ||||
703 | |||||
704 | =item 1) The reputation of the 'From' email address bound to the originating IP | ||||
705 | address fraction (see the mask parameters for details) | ||||
706 | |||||
707 | =item 2) The reputation of the 'From' email address alone (regardless of the IP | ||||
708 | address being currently used) | ||||
709 | |||||
710 | =item 3) The reputation of the domain name of the 'From' email address | ||||
711 | |||||
712 | =item 4) The reputation of the originating IP address, regardless of sender's email address | ||||
713 | |||||
714 | =item 5) The reputation of the HELO name of the originating computer (if available) | ||||
715 | |||||
716 | =back | ||||
717 | |||||
718 | Each of these partial reputations is weighted with the help of these parameters, | ||||
719 | and the overall reputation is calculated as the sum of the weighted individual | ||||
720 | reputations divided by the sum of all their weights: | ||||
721 | |||||
722 | sender_reputation = ( weight_email * rep_email + | ||||
723 | weight_email_ip * rep_email_ip + | ||||
724 | weight_domain * rep_domain + | ||||
725 | weight_ip * rep_ip + | ||||
726 | weight_helo * rep_helo ) / sum_of_weights | ||||
727 | |||||
728 | You can disable the individual partial reputations by setting their respective | ||||
729 | weight to zero. This will also reduce the size of the database, since each | ||||
730 | partial reputation requires a separate entry in the database table. Disabling | ||||
731 | some of the partial reputations in this way may also help with the performance | ||||
732 | on busy servers, because the respective database lookups and processing will | ||||
733 | be skipped too. | ||||
734 | |||||
735 | =over 4 | ||||
736 | |||||
737 | =item B<txrep_weight_email> | ||||
738 | |||||
739 | range [0..10] (default: 3) | ||||
740 | |||||
741 | This weight factor controls the influence of the reputation of the standalone | ||||
742 | email address, regardless of the originating IP address. When adjusting the | ||||
743 | weight, you need to keep in mind that an email address can be easily spoofed, | ||||
744 | and hence spammers can use 'from' email addresses belonging to senders with | ||||
745 | good reputation. From this point of view, the email address bound to the | ||||
746 | originating IP address is a more reliable indicator for the overall reputation. | ||||
747 | |||||
748 | On the other hand, some reputable senders may be sending from a large number | ||||
749 | of IP addresses, so looking up the reputation of the standalone email address | ||||
750 | without regard to the originating IP makes sense too. | ||||
751 | |||||
752 | We recommend using a relatively low value for this partial reputation. | ||||
753 | |||||
754 | =cut # ................................................................... | ||||
755 | push (@cmds, { | ||||
756 | setting => 'txrep_weight_email', | ||||
757 | default => 3, | ||||
758 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
759 | code => sub { | ||||
760 | my ($self, $key, $value, $line) = @_; | ||||
761 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
762 | $self->{txrep_weight_email} = $value; | ||||
763 | } | ||||
764 | 1 | 8µs | }); | ||
765 | |||||
766 | # ------------------------------------------------------------------------- | ||||
767 | =item B<txrep_weight_email_ip> | ||||
768 | |||||
769 | range [0..10] (default: 10) | ||||
770 | |||||
771 | This is the standard reputation used in the same way as it was by the original | ||||
772 | AWL plugin. Each sender's email address is bound to the originating IP, or | ||||
773 | its part as defined by the txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters. | ||||
774 | |||||
775 | For a user sending from multiple locations, diverse mail servers, or from a dynamic | ||||
776 | IP range outside the masked block, the email address will have a separate reputation | ||||
777 | value for each of the different (partial) IP addresses. | ||||
778 | |||||
779 | When the option auto_whitelist_distinguish_signed is enabled, in contrast to | ||||
780 | the original AWL module, TxRep does not record the IP address when a DKIM | ||||
781 | signature is detected. The email address is then not bound to any IP address, but | ||||
782 | rather just to the DKIM signature, since it is considered that it authenticates | ||||
783 | the sender more reliably than the IP address (which can also vary). | ||||
784 | |||||
785 | This is by design the most relevant reputation, and its weight should be kept | ||||
786 | high. | ||||
787 | |||||
788 | =cut # ................................................................... | ||||
789 | push (@cmds, { | ||||
790 | setting => 'txrep_weight_email_ip', | ||||
791 | default => 10, | ||||
792 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
793 | code => sub { | ||||
794 | my ($self, $key, $value, $line) = @_; | ||||
795 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
796 | $self->{txrep_weight_email_ip} = $value; | ||||
797 | } | ||||
798 | 1 | 7µs | }); | ||
799 | |||||
800 | # ------------------------------------------------------------------------- | ||||
801 | =item B<txrep_weight_domain> | ||||
802 | |||||
803 | range [0..10] (default: 2) | ||||
804 | |||||
805 | Some spammers may always use their real domain name in the email address, | ||||
806 | just with multiple or changing local parts. This reputation will record the | ||||
807 | spam scores of all messages sent from the respective domain, regardless of | ||||
808 | the local part (user name) used. | ||||
809 | |||||
810 | As with the email_ip reputation, the domain reputation is also | ||||
811 | bound to the originating address (or a masked block, if mask parameters are used). | ||||
812 | This avoids giving false reputation based on spoofed email addresses. | ||||
813 | |||||
814 | When a DKIM signature is detected, the signature signer is used instead | ||||
815 | of the domain name extracted from the email address. It is considered that | ||||
816 | the signing authority is responsible for sending email of any domain name, | ||||
817 | hence the same reputation applies here. | ||||
818 | |||||
819 | The domain reputation will give a relevant picture of the owner of the | ||||
820 | domain in case of small servers, or corporations with strict policies, but | ||||
821 | will be less relevant for freemailers like Gmail, Hotmail, and similar, | ||||
822 | because both ham and spam may be sent by their users. | ||||
823 | |||||
824 | The default value is set relatively low. Higher weight values may be useful, | ||||
825 | but we recommend caution and observing the scores before increasing it. | ||||
826 | |||||
827 | =cut # ................................................................... | ||||
828 | push (@cmds, { | ||||
829 | setting => 'txrep_weight_domain', | ||||
830 | default => 2, | ||||
831 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
832 | code => sub { | ||||
833 | my ($self, $key, $value, $line) = @_; | ||||
834 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
835 | $self->{txrep_weight_domain} = $value; | ||||
836 | } | ||||
837 | 1 | 6µs | }); | ||
838 | |||||
839 | # ------------------------------------------------------------------------- | ||||
840 | =item B<txrep_weight_ip> | ||||
841 | |||||
842 | range [0..10] (default: 4) | ||||
843 | |||||
844 | Spammers can send through the same relay (incl. compromised hosts) under a | ||||
845 | multitude of email addresses. This is the exact case when the IP reputation | ||||
846 | can help. This reputation is a kind of a local RBL. | ||||
847 | |||||
848 | The weight is set by default lower than for the email_IP reputation, because | ||||
849 | there may be cases when the same IP address hosts both spammers and acceptable | ||||
850 | senders (for example the marketing department of a company sends you spam, but | ||||
851 | you still need to get messages from their billing address). | ||||
852 | |||||
853 | =cut # ................................................................... | ||||
854 | push (@cmds, { | ||||
855 | setting => 'txrep_weight_ip', | ||||
856 | default => 4, | ||||
857 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
858 | code => sub { | ||||
859 | my ($self, $key, $value, $line) = @_; | ||||
860 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
861 | $self->{txrep_weight_ip} = $value; | ||||
862 | } | ||||
863 | 1 | 7µs | }); | ||
864 | |||||
865 | # ------------------------------------------------------------------------- | ||||
866 | =item B<txrep_weight_helo> | ||||
867 | |||||
868 | range [0..10] (default: 0.5) | ||||
869 | |||||
870 | A large number of spam messages come from compromised hosts, often personal computers | ||||
871 | or set-top boxes. Their NetBIOS names are usually used as the HELO name when connecting | ||||
872 | to your mail server. Some of the names are pretty generic and hence may be shared by | ||||
873 | a large number of hosts, but often the names are quite unique and may be a good | ||||
874 | indicator for detecting a spammer, even when different email and IP addresses | ||||
875 | are used (spam can also come from portable devices). | ||||
876 | |||||
877 | No IP address is bound to the HELO name when stored to the reputation database. | ||||
878 | This is intentional, despite the possibility that numerous devices may share | ||||
879 | some of the HELO names. | ||||
880 | |||||
881 | This option is still considered experimental, hence the low weight value, but after | ||||
882 | some testing it could likely be increased at least slightly. | ||||
883 | |||||
884 | =cut # ................................................................... | ||||
885 | push (@cmds, { | ||||
886 | setting => 'txrep_weight_helo', | ||||
887 | default => 0.5, | ||||
888 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
889 | code => sub { | ||||
890 | my ($self, $key, $value, $line) = @_; | ||||
891 | if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;} | ||||
892 | $self->{txrep_weight_helo} = $value; | ||||
893 | } | ||||
894 | 1 | 6µs | }); | ||
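The weighted combination described under REPUTATION WEIGHTS can be sketched with the default weights listed above. The helper name sender_reputation and the sample reputation values are hypothetical. Skipping partial reputations that have no record is an assumption of this sketch, not a statement about the plugin's code.

```perl
use strict;
use warnings;

# Default weights as documented for the txrep_weight_* settings.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Hypothetical helper: weighted mean over the partial reputations available.
sub sender_reputation {
    my ($rep, $weight) = @_;
    my ($sum, $total_weight) = (0, 0);
    for my $key (keys %$weight) {
        next unless $weight->{$key} > 0;    # a weight of 0 disables the element
        next unless defined $rep->{$key};   # assumption: skip missing records
        $sum          += $weight->{$key} * $rep->{$key};
        $total_weight += $weight->{$key};
    }
    return $total_weight ? $sum / $total_weight : undef;
}

# Sample partial reputations; no HELO record is available.
my %rep = (email => 2, email_ip => 5, domain => 1, ip => 4);
printf "%.2f\n", sender_reputation(\%rep, \%weight);   # 3.89
```

Here the result is 74 / 19: the email_ip reputation dominates because its weight of 10 is more than half of the total weight in play.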
895 | |||||
896 | |||||
897 | # ------------------------------------------------------------------------- | ||||
898 | =back | ||||
899 | |||||
900 | =head1 ADMINISTRATOR SETTINGS | ||||
901 | |||||
902 | These settings differ from the ones above, in that they are considered 'more | ||||
903 | privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section. | ||||
904 | No matter what C<allow_user_rules> is set to, these can never be set from a | ||||
905 | user's C<user_prefs> file. | ||||
906 | |||||
907 | =over 4 | ||||
908 | |||||
909 | =item B<txrep_factory module> | ||||
910 | |||||
911 | (default: Mail::SpamAssassin::DBBasedAddrList) | ||||
912 | |||||
913 | Select alternative database factory module for the TxRep database. | ||||
914 | |||||
915 | =cut # ................................................................... | ||||
916 | 1 | 4µs | push (@cmds, { | ||
917 | setting => 'txrep_factory', | ||||
918 | is_admin => 1, | ||||
919 | default => 'Mail::SpamAssassin::DBBasedAddrList', | ||||
920 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
921 | }); | ||||
922 | |||||
923 | |||||
924 | # ------------------------------------------------------------------------- | ||||
925 | =item B<auto_whitelist_path /path/filename> | ||||
926 | |||||
927 | (default: ~/.spamassassin/tx-reputation) | ||||
928 | |||||
929 | This is the TxRep directory and filename. By default, each user | ||||
930 | has their own reputation database in their C<~/.spamassassin> directory with | ||||
931 | mode 0700. For system-wide SpamAssassin use, you may want to share this | ||||
932 | across all users. | ||||
933 | |||||
934 | =cut # ................................................................... | ||||
935 | push (@cmds, { | ||||
936 | setting => 'auto_whitelist_path', | ||||
937 | is_admin => 1, | ||||
938 | default => '__userstate__/tx-reputation', | ||||
939 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING, | ||||
940 | code => sub { | ||||
941 | my ($self, $key, $value, $line) = @_; | ||||
942 | unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;} | ||||
943 | if (-d $value) {return $Mail::SpamAssassin::Conf::INVALID_VALUE; } | ||||
944 | $self->{txrep_path} = $value; | ||||
945 | } | ||||
946 | 1 | 8µs | }); | ||
947 | |||||
948 | |||||
949 | # ------------------------------------------------------------------------- | ||||
950 | =item B<auto_whitelist_db_modules Module ...> | ||||
951 | |||||
952 | (default: see below) | ||||
953 | |||||
954 | What database modules should be used for the TxRep storage database | ||||
955 | file. The first named module that can be loaded from the Perl include path | ||||
956 | will be used. The format is: | ||||
957 | |||||
958 | PreferredModuleName SecondBest ThirdBest ... | ||||
959 | |||||
960 | i.e. a space-separated list of Perl module names. The default is: | ||||
961 | |||||
962 | DB_File GDBM_File SDBM_File | ||||
963 | |||||
964 | NDBM_File is not supported (see SpamAssassin bug 4353). | ||||
965 | |||||
966 | =cut # ................................................................... | ||||
967 | 1 | 7µs | push (@cmds, { | ||
968 | setting => 'auto_whitelist_db_modules', | ||||
969 | is_admin => 1, | ||||
970 | default => 'DB_File GDBM_File SDBM_File', | ||||
971 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
972 | }); | ||||
973 | |||||
974 | |||||
975 | # ------------------------------------------------------------------------- | ||||
976 | =item B<auto_whitelist_file_mode> | ||||
977 | |||||
978 | (default: 0700) | ||||
979 | |||||
980 | The file mode bits used for the TxRep directory or file. | ||||
981 | |||||
982 | Make sure you specify this using the 'x' mode bits set, as it may also be used | ||||
983 | to create directories. However, if a file is created, the resulting file will | ||||
984 | not have any execute bits set (the umask is set to 0111). | ||||
985 | |||||
986 | =cut # ................................................................... | ||||
987 | push (@cmds, { | ||||
988 | setting => 'auto_whitelist_file_mode', | ||||
989 | is_admin => 1, | ||||
990 | default => '0700', | ||||
991 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC, | ||||
992 | code => sub { | ||||
993 | my ($self, $key, $value, $line) = @_; | ||||
994 | if ($value !~ /^0?[0-7]{3}$/) { | ||||
995 | return $Mail::SpamAssassin::Conf::INVALID_VALUE; | ||||
996 | } | ||||
997 | $self->{txrep_file_mode} = untaint_var($value); | ||||
998 | } | ||||
999 | 1 | 8µs | }); | ||
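The note about execute bits under C<auto_whitelist_file_mode> can be illustrated with a short sketch. It reuses the validation pattern shown above; the helper name parse_file_mode is hypothetical, and converting the stored string with oct() is an assumption about how the value is consumed later.

```perl
use strict;
use warnings;

# Hypothetical helper: validate a mode string and return its numeric value.
sub parse_file_mode {
    my ($value) = @_;
    return unless defined $value && $value =~ /^0?[0-7]{3}$/;
    return oct($value);   # "0700" and "700" both yield decimal 448
}

my $mode = parse_file_mode('0700');
printf "directory mode: %04o\n", $mode;           # 0700
printf "file mode:      %04o\n", $mode & ~0111;   # 0600
```

The string must be interpreted as octal: used directly as a number, "0700" would be read as decimal 700, which is a different set of permission bits.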
1000 | |||||
1001 | |||||
1002 | # ------------------------------------------------------------------------- | ||||
1003 | =item B<user_awl_dsn DBI:databasetype:databasename:hostname:port> | ||||
1004 | |||||
1005 | Used by the SQLBasedAddrList storage implementation. | ||||
1006 | |||||
1007 | This will set the DSN used to connect. Example: | ||||
1008 | C<DBI:mysql:spamassassin:localhost> | ||||
1009 | |||||
1010 | =cut # ................................................................... | ||||
1011 | 1 | 4µs | push (@cmds, { | ||
1012 | setting => 'user_awl_dsn', | ||||
1013 | is_admin => 1, | ||||
1014 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1015 | }); | ||||
1016 | |||||
1017 | |||||
1018 | # ------------------------------------------------------------------------- | ||||
1019 | =item B<user_awl_sql_username username> | ||||
1020 | |||||
1021 | Used by the SQLBasedAddrList storage implementation. | ||||
1022 | |||||
1023 | The authorized username to connect to the above DSN. | ||||
1024 | |||||
1025 | =cut # ................................................................... | ||||
1026 | 1 | 3µs | push (@cmds, { | ||
1027 | setting => 'user_awl_sql_username', | ||||
1028 | is_admin => 1, | ||||
1029 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1030 | }); | ||||
1031 | |||||
1032 | |||||
1033 | # ------------------------------------------------------------------------- | ||||
1034 | =item B<user_awl_sql_password password> | ||||
1035 | |||||
1036 | Used by the SQLBasedAddrList storage implementation. | ||||
1037 | |||||
1038 | The password for the database username, for the above DSN. | ||||
1039 | |||||
1040 | =cut # ................................................................... | ||||
1041 | 1 | 4µs | push (@cmds, { | ||
1042 | setting => 'user_awl_sql_password', | ||||
1043 | is_admin => 1, | ||||
1044 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1045 | }); | ||||
1046 | |||||
1047 | |||||
1048 | # ------------------------------------------------------------------------- | ||||
1049 | =item B<user_awl_sql_table tablename> | ||||
1050 | |||||
1051 | (default: txrep) | ||||
1052 | |||||
1053 | Used by the SQLBasedAddrList storage implementation. | ||||
1054 | |||||
1055 | The name of the table in which the reputation is stored, for the above DSN. | ||||
1056 | |||||
1057 | =back | ||||
1058 | |||||
1059 | =cut # ................................................................... | ||||
1060 | 1 | 4µs | push (@cmds, { | ||
1061 | setting => 'user_awl_sql_table', | ||||
1062 | is_admin => 1, | ||||
1063 | default => 'txrep', | ||||
1064 | type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING | ||||
1065 | }); | ||||
1066 | |||||
1067 | 1 | 19µs | 1 | 650µs | $conf->{parser}->register_commands(\@cmds); # spent 650µs making 1 call to Mail::SpamAssassin::Conf::Parser::register_commands |
1068 | } | ||||
1069 | |||||
1070 | |||||
1071 | ########################################################################### | ||||
1072 | sub _message { | ||||
1073 | ########################################################################### | ||||
1074 | my ($self, $value, $msg) = @_; | ||||
1075 | print "SpamAssassin TxRep: $value\n" if ($msg); | ||||
1076 | dbg("TxRep: $value"); | ||||
1077 | } | ||||
1078 | |||||
1079 | |||||
1080 | ########################################################################### | ||||
1081 | sub _fail_exit { | ||||
1082 | ########################################################################### | ||||
1083 | my ($self, $err) = @_; | ||||
1084 | my $eval_stat = ($err ne '') ? $err : "errno=$!"; | ||||
1085 | chomp $eval_stat; | ||||
1086 | warn("TxRep: open of TxRep file failed: $eval_stat\n"); | ||||
1087 | if (!defined $self->{txKeepStoreTied}) {$self->finish();} | ||||
1088 | return 0; | ||||
1089 | } | ||||
1090 | |||||
1091 | |||||
1092 | ########################################################################### | ||||
1093 | sub _fn_envelope { | ||||
1094 | ########################################################################### | ||||
1095 | my ($self, $args, $value, $msg) = @_; | ||||
1096 | |||||
1097 | unless ($self->{main}->{conf}->{use_txrep}){ return 0;} | ||||
1098 | unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg); return 0;} | ||||
1099 | |||||
1100 | my $factor = $self->{conf}->{txrep_weight_email} + | ||||
1101 | $self->{conf}->{txrep_weight_email_ip} + | ||||
1102 | $self->{conf}->{txrep_weight_domain} + | ||||
1103 | $self->{conf}->{txrep_weight_ip} + | ||||
1104 | $self->{conf}->{txrep_weight_helo}; | ||||
1105 | my $sign = $args->{signedby}; | ||||
1106 | my $id = $args->{address}; | ||||
1107 | if ($args->{address} =~ /,/) { | ||||
1108 | $sign = $args->{address}; | ||||
1109 | $sign =~ s/^.*,//g; | ||||
1110 | $id =~ s/,.*$//g; | ||||
1111 | } | ||||
1112 | |||||
1113 | # simplified regex used for IP detection (possible FP at a domain is not critical) | ||||
1114 | if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo}) | ||||
1115 | {$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';} | ||||
1116 | elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip}) | ||||
1117 | {$factor /= $self->{conf}->{txrep_weight_ip};} | ||||
1118 | elsif ($id =~ /@/ && $self->{conf}->{txrep_weight_email}) | ||||
1119 | {$factor /= $self->{conf}->{txrep_weight_email};} | ||||
1120 | elsif ($id !~ /@/ && $self->{conf}->{txrep_weight_domain}) | ||||
1121 | {$factor /= $self->{conf}->{txrep_weight_domain};} | ||||
1122 | else {$factor = 1;} | ||||
1123 | |||||
1124 | $self->open_storages(); | ||||
1125 | my $score = (!defined $value)? undef : $factor * $value; | ||||
1126 | my $status = $self->modify_reputation($id, $score, $sign); | ||||
1127 | dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || ''); | ||||
1128 | eval { | ||||
1129 | $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id); | ||||
1130 | if (!defined $self->{txKeepStoreTied}) {$self->finish();} | ||||
1131 | 1; | ||||
1132 | } or return $self->_fail_exit( $@ ); | ||||
1133 | return $status; | ||||
1134 | } | ||||
1135 | |||||
- - | |||||
1138 | # ------------------------------------------------------------------------- | ||||
1139 | =head1 BLACKLISTING / WHITELISTING | ||||
1140 | |||||
1141 | When asked by SpamAssassin to blacklist or whitelist a user, the TxRep | ||||
1142 | plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting) | ||||
1143 | to the given sender's email address. For a plain address without any IP | ||||
1144 | address, the value is multiplied by the ratio of the total reputation | ||||
1145 | weight to the EMAIL reputation weight, to compensate for the reduced impact | ||||
1146 | of the standalone EMAIL reputation when the overall reputation is calculated. | ||||
1147 | |||||
1148 | total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo | ||||
1149 | blacklisted_reputation = 100 * total_weight / weight_email | ||||
1150 | |||||
1151 | When a standalone email address is blacklisted/whitelisted, all records | ||||
1152 | of the email address bound to an IP address, DKIM signature, or an SPF pass | ||||
1153 | will be removed from the database, and only the standalone record is kept. | ||||
1154 | |||||
1155 | Besides blacklisting/whitelisting of standalone email addresses, the same | ||||
1156 | method may also be used for blacklisting/whitelisting of IP addresses, | ||||
1157 | domain names, and HELO names (only dotless NetBIOS HELO names can be used). | ||||
1158 | |||||
1159 | When whitelisting/blacklisting an email address or domain name, you can | ||||
1160 | bind them to a specified DKIM signature or SPF record by appending the | ||||
1161 | DKIM signing domain or the tag 'spf' after the ID in the following way: | ||||
1162 | |||||
1163 | spamassassin --add-addr-to-blacklist=spamming.biz,spf | ||||
1164 | spamassassin --add-addr-to-whitelist=friend@good.org,good.org | ||||
1165 | |||||
1166 | When a message contains both a DKIM signature and an SPF pass, the DKIM | ||||
1167 | signature takes priority, so the record bound to the 'spf' tag won't | ||||
1168 | be checked. Only email addresses and domains can be bound to DKIM or SPF. | ||||
1169 | Records of IP addresses and HELO names are always stored without DKIM/SPF. | ||||
1170 | |||||
1171 | In case of dual storage, the black/whitelisting is performed only in the | ||||
1172 | default storage. | ||||
1173 | |||||
1174 | =cut | ||||
1175 | ######################################################## plugin hooks ##### | ||||
1176 | sub blacklist_address {my $self=shift; return $self->_fn_envelope(@_, 100, "blacklisting address");} | ||||
1177 | sub whitelist_address {my $self=shift; return $self->_fn_envelope(@_, -100, "whitelisting address");} | ||||
1178 | sub remove_address {my $self=shift; return $self->_fn_envelope(@_,undef, "removing address");} | ||||
1179 | ########################################################################### | ||||
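The scaling described in the BLACKLISTING / WHITELISTING section can be sketched as a standalone function. This is an illustration under stated assumptions, not the plugin's code: envelope_factor is a hypothetical name, the weights are made-up values standing in for the txrep_weight_* settings, and the signer starts as undef here whereas _fn_envelope starts from $args->{signedby}.

```perl
use strict;
use warnings;

# Hypothetical weights; the real ones come from the txrep_weight_* settings.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Split "id,signer" and pick the scaling factor, following the same
# order of tests as _fn_envelope.
sub envelope_factor {
    my ($address) = @_;
    my ($id, $sign) = ($address, undef);
    if ($address =~ /,/) {
        ($sign = $address) =~ s/^.*,//;   # text after the last comma
        $id =~ s/,.*$//;                  # text before the first comma
    }

    my $total = 0;
    $total += $_ for values %weight;

    my $kind;
    if    ($id !~ /\./            && $weight{helo})   { $kind = 'helo'; $sign = 'helo'; }
    elsif ($id =~ /^[a-f\d\.:]+$/ && $weight{ip})     { $kind = 'ip'; }
    elsif ($id =~ /@/             && $weight{email})  { $kind = 'email'; }
    elsif ($id !~ /@/             && $weight{domain}) { $kind = 'domain'; }

    # Total weight divided by the weight of the matched identifier type.
    my $factor = defined $kind ? $total / $weight{$kind} : 1;
    return ($id, $sign, $kind, $factor);
}

for my $arg ('friend@good.org,good.org', 'spamming.biz,spf', '192.0.2.1', 'MYPC') {
    my ($id, $sign, $kind, $factor) = envelope_factor($arg);
    printf "%-26s id=%s sign=%s kind=%s stored score=%s\n",
        $arg, $id, $sign // '-', $kind // 'none', 100 * $factor;
}
```

With these assumed weights the total is 19.5, so a blacklisted standalone email address is stored with 100 * 19.5 / 3 = 650 rather than 100.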
1180 | |||||
1181 | |||||
1182 | # ------------------------------------------------------------------------- | ||||
1183 | =head1 REPUTATION LOGICS | ||||
1184 | |||||
1185 | 1. As in AWL, the most significant sender identifier is the combination | ||||
1186 | of the email address and the originating IP address, or the part of | ||||
1187 | it selected by the IPv4 or IPv6 mask setting. | ||||
1188 | |||||
1189 | 2. No IP checking for standalone EMAIL address reputation | ||||
1190 | |||||
1191 | 3. No signature checking for IP reputation, and for HELO name reputation | ||||
1192 | |||||
1193 | 4. The EMAIL_IP weight, and not the standalone EMAIL weight is used when | ||||
1194 | no IP address is available (EMAIL_IP is the main indicator, and has | ||||
1195 | the highest weight) | ||||
1196 | |||||
1197 | 5. No IP checking for signed emails (the signature authenticates the email | ||||
1198 | instead of the IP address) | ||||
1199 | |||||
1200 | 6. No IP checking on an SPF pass (we assume the domain owner is responsible | ||||
1201 | for all IPs authorized to send for the domain, hence we use the same | ||||
1202 | identity for all of them) | ||||
1203 | |||||
1204 | 7. No signature used for standalone EMAIL reputation (would be redundant, | ||||
1205 | since no IP is used for signed EMAIL_IP reputation, and we would store | ||||
1206 | two identical hits) | ||||
1207 | |||||
1208 | 8. When available, the DKIM signer is used instead of the domain name for | ||||
1209 | the DOMAIN reputation | ||||
1210 | |||||
1211 | 9. No IP and no signature used for HELO reputation (despite the possibility | ||||
1212 | that multiple computers share the same HELO name) | ||||
1213 | |||||
1214 | 10. The full (unmasked) IP address is used (in the address field, instead of | ||||
1215 | the IP field) for the standalone IP reputation | ||||
1216 | |||||
1217 | =cut | ||||
1218 | ########################################################################### | ||||
1219 | sub check_senders_reputation { # spent 96340s (471ms+96340) within Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 468 times, avg 206s/call:
# 234 times (212ms+96340s) by Mail::SpamAssassin::Plugin::TxRep::learn_message at line 1846, avg 412s/call
# 234 times (259ms+-259ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message at line 1861, avg 0s/call | ||||
1220 | ########################################################################### | ||||
1221 | 468 | 1.05ms | my ($self, $pms) = @_; | ||
1222 | |||||
1223 | # just for the development debugging | ||||
1224 | # use Data::Printer; | ||||
1225 | # dbg("TxRep: DEBUG DUMP of pms: %s, %s", $pms, p($pms)); | ||||
1226 | |||||
1227 | 468 | 1.77ms | my $autolearn = defined $self->{autolearn}; | ||
1228 | 468 | 1.75ms | $self->{last_pms} = $self->{autolearn} = undef; | ||
1229 | |||||
1230 | # Cases where we would not be able to use TxRep | ||||
1231 | 468 | 1.31ms | return 0 unless ($self->{conf}->{use_txrep}); | ||
1232 | 468 | 1.53ms | if ($self->{conf}->{use_auto_whitelist}) { | ||
1233 | warn("TxRep: cannot run when Auto-Whitelist is enabled. Please disable it!\n"); | ||||
1234 | return 0; | ||||
1235 | } | ||||
1236 | 468 | 943µs | if ($autolearn && !$self->{conf}->{txrep_autolearn}) { | ||
1237 | dbg("TxRep: autolearning disabled, no more reputation adjusting, quitting"); | ||||
1238 | return 0; | ||||
1239 | } | ||||
1240 | 468 | 6.08ms | 468 | 261ms | my @from = $pms->all_from_addrs(); # spent 261ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::all_from_addrs, avg 558µs/call |
1241 | 468 | 1.53ms | if (@from && $from[0] eq 'ignore@compiling.spamassassin.taint.org') { | ||
1242 | dbg("TxRep: no scan in lint mode, quitting"); | ||||
1243 | return 0; | ||||
1244 | } | ||||
1245 | |||||
1246 | 468 | 1.01ms | my $delta = 0; | ||
1247 | 468 | 4.28ms | 468 | 4.43ms | my $timer = $self->{main}->time_method("total_txrep"); # spent 4.43ms making 468 calls to Mail::SpamAssassin::time_method, avg 9µs/call |
1248 | 468 | 1.65ms | my $msgscore = (defined $self->{learning})? $self->{learning} : $pms->get_autolearn_points(); | ||
1249 | 468 | 5.96ms | 468 | 2.93s | my $date = $pms->{msg}->receive_date() || $pms->{date_header_time}; # spent 2.93s making 468 calls to Mail::SpamAssassin::Message::receive_date, avg 6.27ms/call |
1250 | my $msg_id = $self->{msgid} || | ||||
1251 | 468 | 8.01ms | 468 | 235ms | Mail::SpamAssassin::Plugin::Bayes->get_msgid($pms->{msg}) || # spent 235ms making 468 calls to Mail::SpamAssassin::Plugin::Bayes::get_msgid, avg 503µs/call |
1252 | $pms->get('Message-Id') || $pms->get('Message-ID') || $pms->get('MESSAGE-ID') || $pms->get('MESSAGEID'); | ||||
1253 | |||||
1254 | 468 | 5.79ms | 468 | 12.8ms | my $from = lc($pms->get('From:addr') || $pms->get('EnvelopeFrom:addr')); # spent 12.8ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::get, avg 27µs/call |
1255 | 468 | 6.72ms | 468 | 2.49ms | return 0 unless $from =~ /\S/; # spent 2.49ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 5µs/call |
1256 | 468 | 1.62ms | my $domain = $from; | ||
1257 | 468 | 29.4ms | 468 | 3.10ms | $domain =~ s/^.+@//; # spent 3.10ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 7µs/call |
1258 | |||||
1259 | # Find the last untrusted relay and populate helo and original IP | ||||
1260 | 468 | 1.06ms | my ($origip, $helo); | ||
1261 | 468 | 2.78ms | if (defined $pms->{relays_trusted} || defined $pms->{relays_untrusted}) { | ||
1262 | 936 | 3.09ms | my $trusteds = @{$pms->{relays_trusted}}; | ||
1263 | 1404 | 8.01ms | foreach my $rly ( @{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}} ) { | ||
1264 | # Get the last found HELO, regardless of private/public or trusted/untrusted | ||||
1265 | # Avoiding a redundant duplicate entry if HELO is equal/similar to another identifier | ||||
1266 | 2082 | 263ms | 12412 | 108ms | if (defined $rly->{helo} && # spent 87.6ms making 6206 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp, avg 14µs/call
# spent 20.6ms making 6206 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 3µs/call |
1267 | $rly->{helo} !~ /^\[?\Q$rly->{ip}\E\]?$/ && | ||||
1268 | $rly->{helo} !~ /^\Q$domain\E$/i && | ||||
1269 | $rly->{helo} !~ /^\Q$from\E$/i ) { | ||||
1270 | 2042 | 6.69ms | $helo = $rly->{helo}; | ||
1271 | } | ||||
1272 | # use only trusted ID, but use the first untrusted IP (if available) (AWL bug 6908) | ||||
1273 | # at low spam scores (<2) ignore trusted/untrusted | ||||
1274 | # set IP to 127.0.0.1 for any internal IP, so that it can be distinguished from none (AWL bug 6357) | ||||
1275 | 2082 | 8.26ms | if ((--$trusteds >= 0 || $msgscore<2) && !$msg_id && $rly->{id}) {$msg_id = $rly->{id};} | ||
1276 | 2548 | 12.0ms | if (($trusteds >= -1 || $msgscore<2) && !$rly->{ip_private} && $rly->{ip}) {$origip = $rly->{ip};} | ||
1277 | 2550 | 18.0ms | if ( $trusteds >= 0 && !$origip && $rly->{ip_private} && $rly->{ip}) {$origip = '127.0.0.1';} | ||
1278 | } | ||||
1279 | } | ||||
1280 | |||||
1281 | # Look for previous scores of the same message, for instance when doing re-learning | ||||
1282 | 468 | 3.65ms | if ($self->{conf}->{txrep_track_messages}) { | ||
1283 | 468 | 1.91ms | if ($msg_id) { | ||
1284 | 468 | 5.54ms | 468 | 14662s | my $msg_rep = $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, undef); # spent 14662s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 31.3s/call |
1285 | 468 | 7.21ms | 468 | 6.23ms | if (defined $msg_rep && $self->count()) { # spent 6.23ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 13µs/call |
1286 | 468 | 3.51ms | if (defined $self->{learning} && !defined $self->{forgetting}) { | ||
1287 | # already learned, forget only if already learned (count>1), and relearn | ||||
1288 | # when only scanned (count=1), go ahead with normal rep scan | ||||
1289 | 234 | 2.29ms | 234 | 2.46ms | if ($self->count() > 1) { # spent 2.46ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 11µs/call |
1290 | 234 | 736µs | $self->{last_pms} = $pms; # cache the pmstatus | ||
1291 | 234 | 2.55ms | 234 | 49211s | $self->forget_message($pms->{msg},$msg_id); # sub reentrance OK # spent 49211s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::forget_message, avg 210s/call |
1292 | } | ||||
1293 | } elsif ($self->{forgetting}) { | ||||
1294 | 234 | 731µs | $msgscore = $msg_rep; # forget the old stored score instead of the one got now | ||
1295 | 234 | 2.91ms | 234 | 3.30ms | dbg("TxRep: forgetting stored score %0.3f of message %s", $msgscore || 'undef', $msg_id); # spent 3.30ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 14µs/call |
1296 | } else { | ||||
1297 | # calculating the delta from the stored message reputation | ||||
1298 | $delta = ($msgscore + $self->{conf}->{txrep_factor}*$msg_rep) / (1+$self->{conf}->{txrep_factor}) - $msgscore; | ||||
1299 | if ($delta != 0) { | ||||
1300 | $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta)); | ||||
1301 | } | ||||
1302 | dbg("TxRep: message %s already scanned, using old data; post-TxRep score: %0.3f", $msg_id, $pms->{score} || 'undef'); | ||||
1303 | return 0; | ||||
1304 | } | ||||
1305 | } # no stored reputation found, go ahead with normal rep scan | ||||
1306 | } else {dbg("TxRep: no message-id available, parsing forced");} | ||||
1307 | } # else no message tracking, go ahead with normal rep scan | ||||
1308 | |||||
1309 | # whitelist the recipients of messages sent from internal networks (done once the MSG_ID check has passed) | ||||
1310 | 468 | 7.58ms | if ( $self->{conf}->{txrep_whitelist_out} && | ||
1311 | 468 | 1.07ms | defined $pms->{relays_internal} && @{$pms->{relays_internal}} && | ||
1312 | 468 | 1.04ms | (!defined $pms->{relays_external} || !@{$pms->{relays_external}}) | ||
1313 | ) { | ||||
1314 | 2 | 34µs | 2 | 3.65ms | foreach my $rcpt ($pms->all_to_addrs()) { # spent 3.65ms making 2 calls to Mail::SpamAssassin::PerMsgStatus::all_to_addrs, avg 1.83ms/call |
1315 | 2 | 16µs | if ($rcpt) { | ||
1316 | 2 | 31µs | 2 | 22µs | dbg("TxRep: internal sender, whitelisting recipient: $rcpt"); # spent 22µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call |
1317 | 2 | 29µs | 2 | 6.36s | $self->modify_reputation($rcpt, -1*$self->{conf}->{txrep_whitelist_out}, undef); # spent 6.36s making 2 calls to Mail::SpamAssassin::Plugin::TxRep::modify_reputation, avg 3.18s/call |
1318 | } | ||||
1319 | } | ||||
1320 | } | ||||
1321 | |||||
1322 | # Get the signing domain | ||||
1323 | 468 | 6.96ms | 468 | 20.1ms | my $signedby = ($self->{conf}->{auto_whitelist_distinguish_signed})? $pms->get_tag('DKIMDOMAIN') : undef; # spent 20.1ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::get_tag, avg 43µs/call |
1324 | |||||
1325 | # Summary of all information we've gathered so far | ||||
1326 | dbg("TxRep: active, %s pre-score: %s, autolearn score: %s, IP: %s, address: %s %s", | ||||
1327 | $msg_id || '', | ||||
1328 | 468 | 7.12ms | 468 | 4.96ms | $pms->{score} || '?', # spent 4.96ms making 468 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call |
1329 | $msgscore || '?', | ||||
1330 | $origip || '?', | ||||
1331 | $from || '?', | ||||
1332 | $signedby ? "signed by $signedby" : '(unsigned)' | ||||
1333 | ); | ||||
1334 | |||||
1335 | 468 | 1.45ms | my $ip = $origip; | ||
1336 | 468 | 1.59ms | if ($signedby) { | ||
1337 | $ip = undef; | ||||
1338 | $domain = $signedby; | ||||
1339 | } elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) { | ||||
1340 | $ip = undef; | ||||
1341 | $signedby = 'spf'; | ||||
1342 | } | ||||
1343 | |||||
1344 | 468 | 1.06ms | my $totalweight = 0; | ||
1345 | 468 | 1.57ms | $self->{totalweight} = $totalweight; | ||
1346 | |||||
1347 | # Get current reputation info | ||||
1348 | 468 | 16.2ms | 468 | 13623s | $delta += $self->check_reputations($pms, 'EMAIL_IP', $from, $ip, $signedby, $msgscore); # spent 13623s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.1s/call |
1349 | |||||
1350 | 468 | 2.28ms | if ($domain) { | ||
1351 | 468 | 4.91ms | 468 | 13711s | $delta += $self->check_reputations($pms, 'DOMAIN', $domain, $ip, $signedby, $msgscore); # spent 13711s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.3s/call |
1352 | } | ||||
1353 | 468 | 2.21ms | if ($helo) { | ||
1354 | 408 | 4.74ms | 408 | 12024s | $delta += $self->check_reputations($pms, 'HELO', $helo, undef, 'HELO', $msgscore); # spent 12024s making 408 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.5s/call |
1355 | } | ||||
1356 | 468 | 2.16ms | if ($origip) { | ||
1357 | 468 | 2.01ms | if (!$signedby) { | ||
1358 | 468 | 4.98ms | 468 | 13734s | $delta += $self->check_reputations($pms, 'EMAIL', $from, undef, undef, $msgscore); # spent 13734s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.3s/call |
1359 | } | ||||
1360 | 468 | 5.12ms | 468 | 13799s | $delta += $self->check_reputations($pms, 'IP', $origip, undef, undef, $msgscore); # spent 13799s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 29.5s/call |
1361 | } | ||||
1362 | |||||
1363 | # Learn against this message and store reputation | ||||
1364 | 468 | 1.59ms | if (!defined $self->{learning}) { | ||
1365 | $delta = ($self->{totalweight})? $self->{conf}->{txrep_factor} * $delta / $self->{totalweight} : 0; | ||||
1366 | if ($delta) { | ||||
1367 | $pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta)); | ||||
1368 | } | ||||
1369 | $msgscore += $delta; | ||||
1370 | if (defined $pms->{score}) { | ||||
1371 | dbg("TxRep: post-TxRep score: %.3f", $pms->{score}); | ||||
1372 | } | ||||
1373 | } | ||||
1374 | # Track message ID | ||||
1375 | 468 | 3.31ms | if ($self->{conf}->{txrep_track_messages} && $msg_id) { | ||
1376 | 468 | 4.63ms | 468 | 14778s | $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, $msgscore); # spent 14778s making 468 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputations, avg 31.6s/call |
1377 | } | ||||
1378 | # Close any open resources | ||||
1379 | 468 | 2.38ms | if (!defined $self->{txKeepStoreTied}) { | ||
1380 | $self->finish(); | ||||
1381 | } | ||||
1382 | |||||
1383 | 468 | 11.4ms | return 0; | ||
1384 | } | ||||
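The way check_senders_reputation folds the per-identifier results into one TXREP adjustment (when not learning) can be sketched as follows. The helper name combined_delta and all numbers are hypothetical; the formula itself is the one on source line 1365, where each check_reputations call contributes weight * delta and the weights accumulate in totalweight.

```perl
use strict;
use warnings;

# Each part is [weight, delta] for one identifier (EMAIL_IP, DOMAIN, ...).
# check_reputation returns weight * delta and adds weight to totalweight;
# this hypothetical helper does both steps in one place.
sub combined_delta {
    my ($txrep_factor, @parts) = @_;
    my ($sum, $totalweight) = (0, 0);
    for my $part (@parts) {
        my ($weight, $delta) = @$part;
        $sum         += $weight * $delta;
        $totalweight += $weight;
    }
    # Same guard as the source: no weights means no adjustment.
    return $totalweight ? $txrep_factor * $sum / $totalweight : 0;
}

# Hypothetical numbers: three identifiers and a txrep_factor of 0.5.
my $delta = combined_delta(0.5, [10, -1.5], [2, 0.5], [4, 2.0]);
printf "TXREP adjustment: %s\n", $delta;
```

The result is a weighted mean of the individual deltas, scaled by txrep_factor, so an identifier with a large weight (EMAIL_IP in the POD above) dominates the adjustment.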
1385 | |||||
1386 | |||||
1387 | ########################################################################### | ||||
1388 | sub check_reputations { # spent 96330s (197ms+96330) within Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 3216 times, avg 30.0s/call:
# 468 times (39.4ms+14778s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1376, avg 31.6s/call
# 468 times (24.2ms+14662s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1284, avg 31.3s/call
# 468 times (31.3ms+13799s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1360, avg 29.5s/call
# 468 times (22.3ms+13734s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1358, avg 29.3s/call
# 468 times (22.3ms+13711s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1351, avg 29.3s/call
# 468 times (29.4ms+13623s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 29.1s/call
# 408 times (27.8ms+12024s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1354, avg 29.5s/call | ||||
1389 | ########################################################################### | ||||
1390 | 3216 | 7.37ms | my $self = shift; | ||
1391 | 3216 | 5.61ms | my $delta; | ||
1392 | |||||
1393 | 3216 | 35.0ms | 3216 | 94571s | if ($self->open_storages()) { # spent 94571s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 29.4s/call |
1394 | 3216 | 20.2ms | if ($self->{conf}->{txrep_user2global_ratio} && $self->{user_storage} != $self->{global_storage}) { | ||
1395 | my $user = $self->check_reputation('user_storage', @_); | ||||
1396 | my $global = $self->check_reputation('global_storage',@_); | ||||
1397 | |||||
1398 | if (defined $user and $user == $user) { | ||||
1399 | $delta = ( $self->{conf}->{txrep_user2global_ratio} * $user + $global ) / ( 1 + $self->{conf}->{txrep_user2global_ratio} ); | ||||
1400 | } else { | ||||
1401 | $delta = $global; | ||||
1402 | } | ||||
1403 | } else { | ||||
1404 | 3216 | 55.7ms | 3216 | 1759s | $delta = $self->check_reputation(undef,@_); # spent 1759s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::check_reputation, avg 547ms/call |
1405 | } | ||||
1406 | } | ||||
1407 | 3216 | 57.0ms | return $delta; | ||
1408 | } | ||||
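The dual-storage branch of check_reputations, which did not execute in this profile, blends the user and global reputations by txrep_user2global_ratio. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up values:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the user/global blend in check_reputations.
sub blend_reputation {
    my ($ratio, $user, $global) = @_;
    # $user == $user is false only for NaN, so NaN falls back to global too.
    return $global unless defined $user && $user == $user;
    return ($ratio * $user + $global) / (1 + $ratio);
}

my $nan = 9**9**9 / 9**9**9;   # inf / inf yields NaN on IEEE 754 platforms

print blend_reputation(3, 2,     -2), "\n";   # user value counts three times as much
print blend_reputation(3, undef, -2), "\n";   # no user record: global only
print blend_reputation(3, $nan,  -2), "\n";   # NaN user value: global only
```

The `$user == $user` test in the source looks redundant but serves as a NaN guard, since NaN is the only numeric value that is not equal to itself.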
1409 | |||||
1410 | |||||
1411 | ########################################################################### | ||||
1412 | sub check_reputation { # spent 1759s (1.29+1757) within Mail::SpamAssassin::Plugin::TxRep::check_reputation which was called 3216 times, avg 547ms/call:
# 3216 times (1.29s+1757s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1404, avg 547ms/call | ||||
1413 | ########################################################################### | ||||
1414 | 3216 | 50.8ms | my ($self, $storage, $pms, $key, $id, $ip, $signedby, $msgscore) = @_; | ||
1415 | |||||
1416 | 3216 | 9.57ms | my $delta = 0; | ||
1417 | 3216 | 397ms | my $weight = ($key eq 'MSG_ID')? 1 : eval('$pms->{main}->{conf}->{txrep_weight_'.lc($key).'}'); # spent 6.50ms executing statements in 468 string evals (merged)
# spent 6.21ms executing statements in 468 string evals (merged)
# spent 6.18ms executing statements in 468 string evals (merged)
# spent 6.15ms executing statements in 468 string evals (merged)
# spent 5.43ms executing statements in 408 string evals (merged) | ||
1418 | |||||
1419 | # { | ||||
1420 | # #Bug 7164, trying to find out reason for these: _WARN: Use of uninitialized value $msgscore in addition (+) at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415. | ||||
1421 | # no warnings; | ||||
1422 | # | ||||
1423 | # unless (defined $msgscore) { | ||||
1424 | # #Output some params and the calling function so we can identify more about this bug | ||||
1425 | # dbg("TxRep: MsgScore Undefined (bug 7164) - check_reputation Parameters: self: $self storage: $storage pms: $pms, key: $key, id: $id, ip: $ip, signedby: $signedby, msgscore: $msgscore"); | ||||
1426 | # dbg("TxRep: MsgScore Undefined (bug 7164) - weight: $weight"); | ||||
1427 | # | ||||
1428 | # my ($package, $filename, $line) = caller(); | ||||
1429 | # | ||||
1430 | # chomp($package); | ||||
1431 | # chomp($filename); | ||||
1432 | # chomp($line); | ||||
1433 | # | ||||
1434 | # dbg("TxRep: MsgScore Undefined (bug 7164) - Caller Info: Package: $package - Filename: $filename - Line: $line"); | ||||
1435 | # | ||||
1436 | # #Define $msgscore as a triage to hide warnings while we find the root cause | ||||
1437 | # #$msgscore = 0; | ||||
1438 | # } | ||||
1439 | # } | ||||
1440 | |||||
1441 | |||||
1442 | 3216 | 21.9ms | if (defined $weight && $weight) { | ||
1443 | 3216 | 6.32ms | my $meanrep; | ||
1444 | 3216 | 50.7ms | 3216 | 45.1ms | my $timer = $self->{main}->time_method('check_txrep_'.lc($key)); # spent 45.1ms making 3216 calls to Mail::SpamAssassin::time_method, avg 14µs/call |
1445 | |||||
1446 | 3216 | 6.82ms | if (defined $storage) { | ||
1447 | $self->{checker} = $self->{$storage}; | ||||
1448 | } | ||||
1449 | 3216 | 36.0ms | 3216 | 1.44s | my $found = $self->get_sender($id, $ip, $signedby); # spent 1.44s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::get_sender, avg 446µs/call |
1450 | 3216 | 16.2ms | my $tag_id = (defined $storage)? uc($key.'_'.substr($storage,0,1)) : uc($key); | ||
1451 | 3216 | 41.8ms | 3216 | 39.2ms | if (defined $found && $self->count()) { # spent 39.2ms making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 12µs/call |
1452 | 3038 | 52.5ms | 6076 | 61.9ms | $meanrep = $self->total() / $self->count(); # spent 32.7ms making 3038 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 11µs/call
# spent 29.3ms making 3038 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call |
1453 | } | ||||
1454 | 3216 | 20.2ms | if ($self->{learning} && defined $msgscore) { | ||
1455 | 2748 | 10.8ms | if (defined $meanrep) { | ||
1456 | # $msgscore<=>0 gives the sign of $msgscore | ||||
1457 | 2570 | 25.0ms | $msgscore += ($msgscore<=>0) * abs($meanrep); | ||
1458 | } | ||||
1459 | dbg("TxRep: reputation: %s, count: %d, learning: %s, $tag_id: %s", | ||||
1460 | defined $meanrep? sprintf("%.3f",$meanrep) : 'none', | ||||
1461 | $self->count() || 0, | ||||
1462 | 2748 | 108ms | 5496 | 49.3ms | $self->{learning} || '', # spent 26.7ms making 2748 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 10µs/call
# spent 22.6ms making 2748 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call |
1463 | $id || 'none' | ||||
1464 | ); | ||||
1465 | } else { | ||||
1466 | 468 | 1.52ms | $self->{totalweight} += $weight; | ||
1467 | 468 | 5.37ms | 468 | 3.86ms | if ($key eq 'MSG_ID' && $self->count() > 0) { # spent 3.86ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call |
1468 | 468 | 6.33ms | 936 | 9.25ms | $delta = $self->total() / $self->count(); # spent 5.59ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 12µs/call
# spent 3.66ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 8µs/call |
1469 | 468 | 17.6ms | 468 | 38.6ms | $pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f", $delta)); # spent 38.6ms making 468 calls to Mail::SpamAssassin::PerMsgStatus::set_tag, avg 83µs/call |
1470 | } elsif (defined $self->total()) { | ||||
1471 | #Bug 7164 - $msgscore undefined | ||||
1472 | if (defined $msgscore) { | ||||
1473 | $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore; | ||||
1474 | } else { | ||||
1475 | $delta = ($self->total()) / (1 + $self->count()); | ||||
1476 | } | ||||
1477 | |||||
1478 | $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f", $delta)); | ||||
1479 | if (defined $meanrep) { | ||||
1480 | $pms->set_tag('TXREP_'.$tag_id.'_MEAN', sprintf("%2.1f", $meanrep)); | ||||
1481 | } | ||||
1482 | $pms->set_tag('TXREP_'.$tag_id.'_COUNT', sprintf("%2.1f", $self->count())); | ||||
1483 | $pms->set_tag('TXREP_'.$tag_id.'_PRESCORE', sprintf("%2.1f", $pms->{score})); | ||||
1484 | } else { | ||||
1485 | $pms->set_tag('TXREP_'.$tag_id.'_UNKNOWN', 1); | ||||
1486 | } | ||||
1487 | 468 | 12.3ms | 936 | 9.12ms | dbg("TxRep: reputation: %s, count: %d, weight: %.1f, delta: %.3f, $tag_id: %s", # spent 5.27ms making 468 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 11µs/call
# spent 3.84ms making 468 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call |
1488 | defined $meanrep? sprintf("%.3f",$meanrep) : 'none', | ||||
1489 | $self->count() || 0, | ||||
1490 | $weight || 0, | ||||
1491 | $delta || 0, | ||||
1492 | $id || 'none' | ||||
1493 | ); | ||||
1494 | } | ||||
1495 | 3216 | 34.0ms | 3216 | 27.5ms | $timer = $self->{main}->time_method('update_txrep_'.lc($key)); # spent 27.5ms making 3216 calls to Mail::SpamAssassin::time_method, avg 9µs/call |
1496 | 3216 | 18.7ms | if (defined $msgscore) { | ||
1497 | 2748 | 13.0ms | if ($self->{forgetting}) { # forgetting a message score | ||
1498 | 1374 | 12.8ms | 1374 | 209ms | $self->remove_score($msgscore); # remove the given score and decrement the count # spent 209ms making 1374 calls to Mail::SpamAssassin::Plugin::TxRep::remove_score, avg 152µs/call |
1499 | 1374 | 4.61ms | if ($key eq 'MSG_ID') { # remove the message ID score completely | ||
1500 | 234 | 2.32ms | 234 | 876s | $self->{checker}->remove_entry($self->{entry}); # spent 876s making 234 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.74s/call |
1501 | } | ||||
1502 | } else { | ||||
1503 | 1374 | 12.5ms | 1374 | 259ms | $self->add_score($msgscore); # add the score and increment the count # spent 259ms making 1374 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 189µs/call |
1504 | 1374 | 6.31ms | 234 | 2.80ms | if ($self->{learning} && $key eq 'MSG_ID' && $self->count() eq 1) { # spent 2.80ms making 234 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 12µs/call |
1505 | 16 | 116µs | 16 | 2.44ms | $self->add_score($msgscore); # increasing the count by 1 at a learned score (count=2) # spent 2.44ms making 16 calls to Mail::SpamAssassin::Plugin::TxRep::add_score, avg 152µs/call |
1506 | } # it can be distinguished from a scanned score (count=1) | ||||
1507 | } | ||||
1508 | } elsif (defined $found && $self->{forgetting} && $key eq 'MSG_ID') { | ||||
1509 | 234 | 2.29ms | 234 | 879s | $self->{checker}->remove_entry($self->{entry}); #forgetting the message ID # spent 879s making 234 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.76s/call |
1510 | } | ||||
1511 | } | ||||
1512 | 3216 | 7.18ms | if (defined $storage) { | ||
1513 | $self->{checker} = $self->{default_storage}; | ||||
1514 | } | ||||
1515 | |||||
1516 | 3216 | 65.4ms | return ($weight || 0) * ($delta || 0); | ||
1517 | } | ||||
1518 | |||||
- - | |||||
1521 | #-------------------------------------------------------------------------- | ||||
1522 | # Database handler subroutines | ||||
1523 | #-------------------------------------------------------------------------- | ||||
1524 | |||||
1525 | ########################################################################### | ||||
1526 | 25464 | 241ms | # spent 133ms within Mail::SpamAssassin::Plugin::TxRep::count which was called 12732 times, avg 10µs/call:
# 3216 times (39.2ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1451, avg 12µs/call
# 3038 times (29.3ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1452, avg 10µs/call
# 2748 times (26.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1462, avg 10µs/call
# 1390 times (13.2ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1565, avg 9µs/call
# 468 times (6.23ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1285, avg 13µs/call
# 468 times (5.27ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1487, avg 11µs/call
# 468 times (3.86ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1467, avg 8µs/call
# 468 times (3.66ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1468, avg 8µs/call
# 234 times (2.80ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1504, avg 12µs/call
# 234 times (2.46ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1289, avg 11µs/call | ||
1527 | 9792 | 78.5ms | # spent 51.0ms within Mail::SpamAssassin::Plugin::TxRep::total which was called 4896 times, avg 10µs/call:
# 3038 times (32.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1452, avg 11µs/call
# 1390 times (12.7ms+0s) by Mail::SpamAssassin::Plugin::TxRep::add_score at line 1565, avg 9µs/call
# 468 times (5.59ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1468, avg 12µs/call | ||
1528 | ########################################################################### | ||||
1529 | |||||
1530 | |||||
1531 | ########################################################################### | ||||
1532 | # spent 1.44s (335ms+1.10) within Mail::SpamAssassin::Plugin::TxRep::get_sender which was called 3216 times, avg 446µs/call:
# 3216 times (335ms+1.10s) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1449, avg 446µs/call | ||||
1533 | ########################################################################### | ||||
1534 | 3216 | 20.4ms | my ($self, $addr, $origip, $signedby) = @_; | ||
1535 | |||||
1536 | 3216 | 9.56ms | return unless (defined $self->{checker}); | ||
1537 | |||||
1538 | 3216 | 34.4ms | 3216 | 252ms | my $fulladdr = $self->pack_addr($addr, $origip); # spent 252ms making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 78µs/call |
1539 | 3216 | 36.4ms | 3216 | 812ms | my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby); # spent 812ms making 3216 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 252µs/call |
1540 | 3216 | 26.3ms | $self->{entry} = $entry; | ||
1541 | 3216 | 9.61ms | $origip = $origip || 'none'; | ||
1542 | |||||
1543 | 3216 | 149ms | 6432 | 37.1ms | if ($entry->{count}<0 || $entry->{count}=~/^(nan|)$/ || $entry->{totscore}=~/^(nan|)$/) { # spent 37.1ms making 6432 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 6µs/call |
1544 | warn "TxRep: resetting bad data for ($addr, $origip), count: $entry->{count}, totscore: $entry->{totscore}\n"; | ||||
1545 | $self->{entry}->{count} = $self->{entry}->{totscore} = 0; | ||||
1546 | } | ||||
1547 | 3216 | 55.6ms | return $self->{entry}->{count}; | ||
1548 | } | ||||
1549 | |||||
1550 | |||||
1551 | ########################################################################### | ||||
1552 | # spent 262ms (82.6+179) within Mail::SpamAssassin::Plugin::TxRep::add_score which was called 1390 times, avg 188µs/call:
# 1374 times (81.9ms+177ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1503, avg 189µs/call
# 16 times (677µs+1.76ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1505, avg 152µs/call | ||||
1553 | ########################################################################### | ||||
1554 | 1390 | 5.63ms | my ($self,$score) = @_; | ||
1555 | |||||
1556 | 1390 | 3.75ms | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1557 | |||||
1558 | 1390 | 4.78ms | if ($score != $score) { | ||
1559 | warn "TxRep: attempt to add a $score to TxRep entry ignored\n"; | ||||
1560 | return; # don't try to add a NaN | ||||
1561 | } | ||||
1562 | 1390 | 4.37ms | $self->{entry}->{count} ||= 0; | ||
1563 | |||||
1564 | # performing the dilution aging correction | ||||
1565 | 1390 | 31.8ms | 2780 | 25.9ms | if (defined $self->total() && defined $self->count() && defined $self->{txrep_dilution_factor}) { # spent 13.2ms making 1390 calls to Mail::SpamAssassin::Plugin::TxRep::count, avg 9µs/call
# spent 12.7ms making 1390 calls to Mail::SpamAssassin::Plugin::TxRep::total, avg 9µs/call |
1566 | my $diluted_total = | ||||
1567 | ($self->count() + 1) * | ||||
1568 | ($self->{txrep_dilution_factor} * $self->total() + $score) / | ||||
1569 | ($self->{txrep_dilution_factor} * $self->count() + 1); | ||||
1570 | my $corrected_score = $diluted_total - $self->total(); | ||||
1571 | $self->{checker}->add_score($self->{entry}, $corrected_score); | ||||
1572 | } else { | ||||
1573 | 1390 | 13.9ms | 1390 | 153ms | $self->{checker}->add_score($self->{entry}, $score); # spent 153ms making 1390 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 110µs/call |
1574 | } | ||||
1575 | } | ||||
1576 | |||||
- - | |||||
1579 | ########################################################################### | ||||
1580 | # spent 209ms (59.1+150) within Mail::SpamAssassin::Plugin::TxRep::remove_score which was called 1374 times, avg 152µs/call:
# 1374 times (59.1ms+150ms) by Mail::SpamAssassin::Plugin::TxRep::check_reputation at line 1498, avg 152µs/call | ||||
1581 | ########################################################################### | ||||
1582 | 1374 | 5.73ms | my ($self,$score) = @_; | ||
1583 | |||||
1584 | 1374 | 3.80ms | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1585 | |||||
1586 | 1374 | 4.91ms | if ($score != $score) { # don't try to add a NaN | ||
1587 | warn "TxRep: attempt to add a $score to TxRep entry ignored\n"; | ||||
1588 | return; | ||||
1589 | } | ||||
1590 | # no reversal dilution aging correction (not easily possible), | ||||
1591 | # just removing the original message score | ||||
1592 | 1374 | 7.18ms | if ($self->{entry}->{count} > 2) | ||
1593 | 290 | 1.21ms | {$self->{entry}->{count} -= 2;} | ||
1594 | 1084 | 3.45ms | else {$self->{entry}->{count} = 0;} | ||
1595 | # subtract 2, then add a score; hence decrementing by 1 | ||||
1596 | 1374 | 25.1ms | 1374 | 150ms | $self->{checker}->add_score($self->{entry}, -1*$score); # spent 150ms making 1374 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 109µs/call |
1597 | } | ||||
1598 | |||||
- - | |||||
1601 | ########################################################################### | ||||
1602 | # spent 6.36s (251µs+6.36) within Mail::SpamAssassin::Plugin::TxRep::modify_reputation which was called 2 times, avg 3.18s/call:
# 2 times (251µs+6.36s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1317, avg 3.18s/call | ||||
1603 | ########################################################################### | ||||
1604 | 2 | 13µs | my ($self, $addr, $score, $signedby) = @_; | ||
1605 | |||||
1606 | 2 | 6µs | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1607 | 2 | 21µs | 2 | 84µs | my $fulladdr = $self->pack_addr($addr, undef); # spent 84µs making 2 calls to Mail::SpamAssassin::Plugin::TxRep::pack_addr, avg 42µs/call |
1608 | 2 | 20µs | 2 | 428µs | my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby); # spent 428µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 214µs/call |
1609 | |||||
1610 | # remove any old entries (will remove per-ip entries as well) | ||||
1611 | # always call this regardless, as the current entry may have 0 | ||||
1612 | # scores, but the per-ip one may have more | ||||
1613 | 2 | 20µs | 2 | 6.36s | $self->{checker}->remove_entry($entry); # spent 6.36s making 2 calls to Mail::SpamAssassin::DBBasedAddrList::remove_entry, avg 3.18s/call |
1614 | |||||
1615 | # remove address only, no new score to add if score NaN or undef | ||||
1616 | 2 | 17µs | if (defined $score && $score==$score) { | ||
1617 | # else add score. get a new entry first | ||||
1618 | 2 | 41µs | 2 | 490µs | $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby); # spent 490µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::get_addr_entry, avg 245µs/call |
1619 | 2 | 20µs | 2 | 210µs | $self->{checker}->add_score($entry, $score); # spent 210µs making 2 calls to Mail::SpamAssassin::DBBasedAddrList::add_score, avg 105µs/call |
1620 | } | ||||
1621 | 2 | 37µs | return 1; | ||
1622 | } | ||||
1623 | |||||
1624 | |||||
1625 | # connecting the primary and the secondary storage; needed only on the first run | ||||
1626 | # (this can't be in the constructor, since the settings are not available there) | ||||
1627 | ########################################################################### | ||||
1628 | # spent 94571s (300ms+94571) within Mail::SpamAssassin::Plugin::TxRep::open_storages which was called 3216 times, avg 29.4s/call:
# 3216 times (300ms+94571s) by Mail::SpamAssassin::Plugin::TxRep::check_reputations at line 1393, avg 29.4s/call | ||||
1629 | ########################################################################### | ||||
1630 | 3216 | 6.95ms | my $self = shift; | ||
1631 | |||||
1632 | # disabled per bug 7191 | ||||
1633 | #return 1 unless (!defined $self->{default_storage}); | ||||
1634 | |||||
1635 | 3216 | 6.57ms | my $factory; | ||
1636 | 3216 | 19.2ms | if ($self->{main}->{pers_addr_list_factory}) { | ||
1637 | 3215 | 9.46ms | $factory = $self->{main}->{pers_addr_list_factory}; | ||
1638 | } else { | ||||
1639 | 1 | 4µs | my $type = $self->{conf}->{txrep_factory}; | ||
1640 | 1 | 16µs | 1 | 5µs | if ($type =~ /^([_A-Za-z0-9:]+)$/) { # spent 5µs making 1 call to Mail::SpamAssassin::Plugin::TxRep::CORE:match |
1641 | 1 | 9µs | 1 | 32µs | $type = untaint_var($type); # spent 32µs making 1 call to Mail::SpamAssassin::Util::untaint_var |
1642 | eval 'require '.$type.'; | ||||
1643 | $factory = '.$type.'->new(); | ||||
1644 | 1;' | ||||
1645 | 1 | 162µs | or do { # spent 432µs executing statements in string eval | ||
1646 | my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1647 | warn "TxRep: $eval_stat\n"; | ||||
1648 | undef $factory; | ||||
1649 | }; | ||||
1650 | 1 | 12µs | 1 | 11µs | $self->{main}->set_persistent_address_list_factory($factory) if $factory; # spent 11µs making 1 call to Mail::SpamAssassin::set_persistent_address_list_factory |
1651 | } else {warn "TxRep: illegal factory setting\n";} | ||||
1652 | } | ||||
1653 | 3216 | 18.6ms | if (defined $factory) { | ||
1654 | 3216 | 9.06s | 3216 | 94562s | $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main}); # spent 94562s making 3216 calls to Mail::SpamAssassin::DBBasedAddrList::new_checker, avg 29.4s/call |
1655 | |||||
1656 | 3216 | 24.7ms | 3215 | 8.90s | if ($self->{conf}->{txrep_user2global_ratio} && !defined $self->{global_storage}) { # spent 8.90s making 3215 calls to DB_File::DESTROY, avg 2.77ms/call |
1657 | # hack to handle the BDB and SQL factory types of the storage object | ||||
1658 | # TODO: add an a method to the handler class instead | ||||
1659 | my ($storage_type, $is_global); | ||||
1660 | |||||
1661 | if (ref($factory) =~ /SQLBasedAddrList/) { | ||||
1662 | $is_global = defined $self->{conf}->{user_awl_sql_override_username}; | ||||
1663 | $storage_type = 'SQL'; | ||||
1664 | if ($is_global && $self->{conf}->{user_awl_sql_override_username} eq $self->{main}->{username}) { | ||||
1665 | # skip double storage if current user same as the global override | ||||
1666 | $self->{user_storage} = $self->{global_storage} = $self->{default_storage}; | ||||
1667 | } | ||||
1668 | } elsif (ref($factory) =~ /DBBasedAddrList/) { | ||||
1669 | $is_global = $self->{conf}->{auto_whitelist_path} !~ /__userstate__/; | ||||
1670 | $storage_type = 'DB'; | ||||
1671 | } | ||||
1672 | if (!defined $self->{global_storage}) { | ||||
1673 | my $sql_override_orig = $self->{conf}->{user_awl_sql_override_username}; | ||||
1674 | my $awl_path_orig = $self->{conf}->{auto_whitelist_path}; | ||||
1675 | if ($is_global) { | ||||
1676 | $self->{conf}->{user_awl_sql_override_username} = ''; | ||||
1677 | $self->{conf}->{auto_whitelist_path} = '__userstate__/tx-reputation'; | ||||
1678 | $self->{global_storage} = $self->{default_storage}; | ||||
1679 | $self->{user_storage} = $factory->new_checker($self->{main}); | ||||
1680 | } else { | ||||
1681 | $self->{conf}->{user_awl_sql_override_username} = 'GLOBAL'; | ||||
1682 | $self->{conf}->{auto_whitelist_path} = '__local_state_dir__/tx-reputation'; | ||||
1683 | $self->{global_storage} = $factory->new_checker($self->{main}); | ||||
1684 | $self->{user_storage} = $self->{default_storage}; | ||||
1685 | } | ||||
1686 | $self->{conf}->{user_awl_sql_override_username} = $sql_override_orig; | ||||
1687 | $self->{conf}->{auto_whitelist_path} = $awl_path_orig; | ||||
1688 | |||||
1689 | # Another ugly hack to find out whether the user differs from | ||||
1690 | # the global one. We need to add a method to the factory handlers | ||||
1691 | if ($storage_type eq 'DB' && | ||||
1692 | $self->{user_storage}->{locked_file} eq $self->{global_storage}->{locked_file}) { | ||||
1693 | if ($is_global) | ||||
1694 | {$self->{global_storage}->finish();} | ||||
1695 | else {$self->{user_storage}->finish();} | ||||
1696 | $self->{user_storage} = $self->{global_storage} = $self->{default_storage}; | ||||
1697 | } | ||||
1698 | } | ||||
1699 | } | ||||
1700 | } else { | ||||
1701 | $self->{user_storage} = $self->{global_storage} = $self->{checker} = $self->{default_storage} = undef; | ||||
1702 | warn("TxRep: could not open storages, quitting!\n"); | ||||
1703 | return 0; | ||||
1704 | } | ||||
1705 | 3216 | 57.2ms | return 1; | ||
1706 | } | ||||
1707 | |||||
1708 | |||||
1709 | ########################################################################### | ||||
1710 | # spent 1.06ms (45µs+1.02) within Mail::SpamAssassin::Plugin::TxRep::finish which was called:
# once (45µs+1.02ms) by Mail::SpamAssassin::Plugin::TxRep::learner_close at line 1889 | ||||
1711 | ########################################################################### | ||||
1712 | 1 | 2µs | my $self = shift; | ||
1713 | |||||
1714 | 1 | 3µs | return unless (defined $self->{checker}); # no factory defined; we can't check | ||
1715 | |||||
1716 | 1 | 10µs | if ($self->{conf}->{txrep_user2global_ratio} && defined $self->{user_storage} && ($self->{user_storage} != $self->{global_storage})) { | ||
1717 | $self->{user_storage}->finish(); | ||||
1718 | $self->{global_storage}->finish(); | ||||
1719 | $self->{user_storage} = undef; | ||||
1720 | $self->{global_storage} = undef; | ||||
1721 | } elsif (defined $self->{default_storage}) { | ||||
1722 | 1 | 11µs | 1 | 1.02ms | $self->{default_storage}->finish(); # spent 1.02ms making 1 call to Mail::SpamAssassin::DBBasedAddrList::finish |
1723 | 1 | 4µs | $self->{default_storage} = $self->{checker} = undef; | ||
1724 | } | ||||
1725 | 1 | 19µs | $self->{factory} = undef; | ||
1726 | } | ||||
1727 | |||||
1728 | |||||
1729 | ########################################################################### | ||||
1730 | # spent 73.1ms (55.2+17.9) within Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key which was called 936 times, avg 78µs/call:
# 936 times (55.2ms+17.9ms) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1785, avg 78µs/call | ||||
1731 | ########################################################################### | ||||
1732 | 936 | 4.00ms | my ($self, $origip) = @_; | ||
1733 | |||||
1734 | 936 | 1.76ms | my $result; | ||
1735 | 936 | 6.34ms | local $1; | ||
1736 | 936 | 32.5ms | 936 | 17.9ms | if (!defined $origip) { # spent 17.9ms making 936 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:match, avg 19µs/call |
1737 | # could not find an IP address to use | ||||
1738 | } elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) { | ||||
1739 | 936 | 3.03ms | my $mask_len = $self->{ipv4_mask_len}; | ||
1740 | 936 | 2.64ms | $mask_len = 16 if !defined $mask_len; | ||
1741 | # handle the default and easy cases manually | ||||
1742 | 936 | 4.95ms | if ($mask_len == 32) {$result = $origip;} | ||
1743 | 936 | 4.04ms | elsif ($mask_len == 16) {$result = $1;} | ||
1744 | else { | ||||
1745 | my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len); | ||||
1746 | if (!defined $origip_obj) { # invalid IPv4 address | ||||
1747 | dbg("TxRep: bad IPv4 address $origip"); | ||||
1748 | } else { | ||||
1749 | $result = $origip_obj->network->addr; | ||||
1750 | $result =~s/(\.0){1,3}\z//; # truncate zero tail | ||||
1751 | } | ||||
1752 | } | ||||
1753 | } elsif ($origip =~ /:/ && # triage | ||||
1754 | $origip =~ | ||||
1755 | /^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) { | ||||
1756 | # looks like an IPv6 address | ||||
1757 | my $mask_len = $self->{ipv6_mask_len}; | ||||
1758 | $mask_len = 48 if !defined $mask_len; | ||||
1759 | my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len); | ||||
1760 | if (!defined $origip_obj) { # invalid IPv6 address | ||||
1761 | dbg("TxRep: bad IPv6 address $origip"); | ||||
1762 | } else { | ||||
1763 | $result = $origip_obj->network->full6; # string in a canonical form | ||||
1764 | $result =~ s/(:0000){1,7}\z/::/; # compress zero tail | ||||
1765 | } | ||||
1766 | } else { | ||||
1767 | dbg("TxRep: bad IP address $origip"); | ||||
1768 | } | ||||
1769 | 936 | 4.82ms | if (defined $result && length($result) > 39) { # just in case, keep under | ||
1770 | $result = substr($result,0,39); # the awl.ip field size | ||||
1771 | } | ||||
1772 | # if (defined $result) {dbg("TxRep: IP masking %s -> %s", $origip || '?', $result || '?');} | ||||
1773 | 936 | 11.2ms | return $result; | ||
1774 | } | ||||
1775 | |||||
1776 | |||||
1777 | ########################################################################### | ||||
1778 | # spent 252ms (147+105) within Mail::SpamAssassin::Plugin::TxRep::pack_addr which was called 3218 times, avg 78µs/call:
# 3216 times (147ms+105ms) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1538, avg 78µs/call
# 2 times (72µs+12µs) by Mail::SpamAssassin::Plugin::TxRep::modify_reputation at line 1607, avg 42µs/call | ||||
1779 | ########################################################################### | ||||
1780 | 3218 | 15.4ms | my ($self, $addr, $origip) = @_; | ||
1781 | |||||
1782 | 3218 | 13.7ms | $addr = lc $addr; | ||
1783 | 3218 | 72.0ms | 3218 | 31.6ms | $addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia # spent 31.6ms making 3218 calls to Mail::SpamAssassin::Plugin::TxRep::CORE:subst, avg 10µs/call |
1784 | |||||
1785 | 4154 | 18.5ms | 936 | 73.1ms | if ( defined $origip) {$origip = $self->ip_to_awl_key($origip);} # spent 73.1ms making 936 calls to Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key, avg 78µs/call |
1786 | 5500 | 18.1ms | if (!defined $origip) {$origip = 'none';} | ||
1787 | 3218 | 76.8ms | return $addr . "|ip=" . $origip; | ||
1788 | } | ||||
1789 | |||||
- - | |||||
1792 | # ------------------------------------------------------------------------- | ||||
1793 | =head1 LEARNING SPAM / HAM | ||||
1794 | |||||
1795 | When SpamAssassin is told to learn (or relearn) a given message as spam or | ||||
1796 | ham, all reputations relevant to the message (email, email_ip, domain, ip, helo) | ||||
1797 | in both the global and user storages will be updated using the C<txrep_learn_penalty> | ||||
1798 | or the C<txrep_learn_bonus> value, respectively. The new reputation of a given sender | ||||
1799 | property (email, domain, ...) will be the result of the corresponding formula | ||||
1800 | below: | ||||
1801 | |||||
1802 | new_reputation = old_reputation + learn_penalty | ||||
1803 | new_reputation = old_reputation - learn_bonus | ||||
1804 | |||||
1805 | The TxRep plugin currently does track each message individually, hence it | ||||
1806 | does not detect when you learn the message repeatedly. It will add/subtract | ||||
1807 | the penalty/bonus score each time the message is fed to the spam learner. | ||||
1808 | |||||
1809 | =cut | ||||
1810 | ######################################################### plugin hook ##### | ||||
1811 | # spent 19µs within Mail::SpamAssassin::Plugin::TxRep::learner_new which was called:
# once (19µs+0s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm | ||||
1812 | ########################################################################### | ||||
1813 | 1 | 2µs | my ($self) = @_; | ||
1814 | |||||
1815 | 1 | 9µs | $self->{txKeepStoreTied} = 1; | ||
1816 | 1 | 16µs | return $self; | ||
1817 | } | ||||
1818 | |||||
1819 | |||||
1820 | ######################################################### plugin hook ##### | ||||
1821 | sub autolearn { | ||||
1822 | ########################################################################### | ||||
1823 | my ($self, $params) = @_; | ||||
1824 | |||||
1825 | $self->{last_pms} = $params->{permsgstatus}; | ||||
1826 | return $self->{autolearn} = 1; | ||||
1827 | } | ||||
1828 | |||||
1829 | |||||
1830 | ######################################################### plugin hook ##### | ||||
1831 | # spent 96387s (34.1ms+96387) within Mail::SpamAssassin::Plugin::TxRep::learn_message which was called 234 times, avg 412s/call:
# 234 times (34.1ms+96387s) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm, avg 412s/call | ||||
1832 | ########################################################################### | ||||
1833 | 234 | 555µs | my ($self, $params) = @_; | ||
1834 | 234 | 670µs | return 0 unless (defined $params->{isspam}); | ||
1835 | |||||
1836 | 234 | 1.66ms | 234 | 1.59ms | dbg("TxRep: learning a message"); # spent 1.59ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
1837 | 234 | 3.40ms | 234 | 76.7ms | my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg}); # spent 76.7ms making 234 calls to Mail::SpamAssassin::PerMsgStatus::new, avg 328µs/call |
1838 | 234 | 1.55ms | if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) { | ||
1839 | 234 | 2.81ms | 234 | 46.4s | $pms->extract_message_metadata(); # spent 46.4s making 234 calls to Mail::SpamAssassin::PerMsgStatus::extract_message_metadata, avg 198ms/call |
1840 | } | ||||
1841 | |||||
1842 | 234 | 1.38ms | if ($params->{isspam}) | ||
1843 | 234 | 1.51ms | {$self->{learning} = $self->{conf}->{txrep_learn_penalty};} | ||
1844 | else {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};} | ||||
1845 | |||||
1846 | 234 | 2.94ms | 234 | 96340s | my $ret = !$self->{learning} || $self->check_senders_reputation($pms); # spent 96340s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 412s/call |
1847 | 234 | 720µs | $self->{learning} = undef; | ||
1848 | 234 | 13.8ms | 79 | 3.25ms | return $ret; # spent 3.25ms making 79 calls to Mail::SpamAssassin::PerMsgStatus::DESTROY, avg 41µs/call |
1849 | } | ||||
1850 | |||||
1851 | |||||
1852 | ######################################################### plugin hook ##### | ||||
1853 | # spent 49211s (12.5ms+49211) within Mail::SpamAssassin::Plugin::TxRep::forget_message which was called 234 times, avg 210s/call:
# 234 times (12.5ms+49211s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1291, avg 210s/call | ||||
1854 | ########################################################################### | ||||
1855 | 234 | 984µs | my ($self, $params) = @_; | ||
1856 | 234 | 876µs | return 0 unless ($self->{conf}->{use_txrep}); | ||
1857 | 234 | 785µs | my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg}); | ||
1858 | |||||
1859 | 234 | 1.57ms | 234 | 1.48ms | dbg("TxRep: forgetting a message"); # spent 1.48ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
1860 | 234 | 639µs | $self->{forgetting} = 1; | ||
1861 | 234 | 2.47ms | 234 | 0s | my $ret = $self->check_senders_reputation($pms); # spent 49211s making 234 calls to Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation, avg 210s/call, recursion: max depth 1, sum of overlapping time 49211s |
1862 | 234 | 1.04ms | $self->{forgetting} = undef; | ||
1863 | 234 | 2.61ms | return $ret; | ||
1864 | } | ||||
1865 | |||||
1866 | |||||
1867 | ######################################################### plugin hook ##### | ||||
1868 | sub learner_expire_old_training { | ||||
1869 | ########################################################################### | ||||
1870 | my ($self, $params) = @_; | ||||
1871 | return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days}); | ||||
1872 | |||||
1873 | dbg("TxRep: expiry not implemented yet"); | ||||
1874 | # dbg("TxRep: expiry starting"); | ||||
1875 | # my $timer = $self->{main}->time_method("expire_bayes"); | ||||
1876 | # $self->{store}->expire_old_tokens($params); | ||||
1877 | # dbg("TxRep: expiry completed"); | ||||
1878 | } | ||||
1879 | |||||
1880 | |||||
1881 | ######################################################### plugin hook ##### | ||||
1882 | # spent 1.12ms (49µs+1.07) within Mail::SpamAssassin::Plugin::TxRep::learner_close which was called:
# once (49µs+1.07ms) by Mail::SpamAssassin::PluginHandler::callback at line 204 of Mail/SpamAssassin/PluginHandler.pm | ||||
1883 | ########################################################################### | ||||
1884 | 1 | 2µs | my ($self, $params) = @_; | ||
1885 | 1 | 3µs | my $quiet = $params->{quiet}; | ||
1886 | 1 | 4µs | return 0 unless ($self->{conf}->{use_txrep}); | ||
1887 | |||||
1888 | 1 | 3µs | $self->{txKeepStoreTied} = undef; | ||
1889 | 1 | 10µs | 1 | 1.06ms | $self->finish(); # spent 1.06ms making 1 call to Mail::SpamAssassin::Plugin::TxRep::finish |
1890 | 1 | 18µs | 1 | 8µs | dbg("TxRep: learner_close"); # spent 8µs making 1 call to Mail::SpamAssassin::Logger::dbg |
1891 | } | ||||
1892 | |||||
1893 | |||||
1894 | # ------------------------------------------------------------------------- | ||||
1895 | =head1 OPTIMIZING TXREP | ||||
1896 | |||||
1897 | TxRep can be optimized for speed and simplicity, or for the precision in | ||||
1898 | assigning the reputation scores. | ||||
1899 | |||||
1900 | First of all, TxRep can be quickly disabled and re-enabled through the option | ||||
1901 | L</C<use_txrep>>. This can be done globally, or individually in each user's | ||||
1902 | C<user_prefs>. Disabling TxRep does not destroy the database, so it can be | ||||
1903 | re-enabled at any later time. | ||||
1904 | |||||
1905 | On many systems, SQL-based storage may perform faster than the default | ||||
1906 | Berkeley DB storage, so you should consider setting it up. See the section | ||||
1907 | L</SQL-BASED STORAGE> for instructions. | ||||
1908 | |||||
1909 | Then there are multiple settings that can reduce the number of records stored | ||||
1910 | in the database, hence reducing the size of the storage, and also the processing | ||||
1911 | time: | ||||
1912 | |||||
1913 | 1. Setting L</C<txrep_user2global_ratio>> to zero will disable the dual storage, | ||||
1914 | thus halving the disk space requirements and the processing time of this plugin. | ||||
1915 | |||||
1916 | 2. You can disable all but one of the L</REPUTATION WEIGHTS>. EMAIL_IP is | ||||
1917 | the most specific option, so it is the most likely choice in that case, but you | ||||
1918 | could base the reputation system on any of the remaining scores. Each of the | ||||
1919 | enabled reputations adds a new entry to the database for each new identifier. | ||||
1920 | So while, for example, the number of recorded and scored domains may be large, the | ||||
1921 | number of stored IP addresses will probably be higher and would require more | ||||
1922 | space in the storage. | ||||
1923 | |||||
1924 | 3. Disabling the L</C<txrep_track_messages>> avoids storing a separate entry | ||||
1925 | for every scanned message, hence also reducing the disk space requirements, and | ||||
1926 | the processing time. | ||||
1927 | |||||
1928 | 4. Disabling the option L</C<txrep_autolearn>> will save processing time | ||||
1929 | for messages that trigger the auto-learning process. | ||||
1930 | |||||
1931 | 5. Disabling L</C<txrep_whitelist_out>> will reduce the processing time for | ||||
1932 | outbound connections. | ||||
1933 | |||||
1934 | 6. Keeping the option L</C<auto_whitelist_distinguish_signed>> enabled may help | ||||
1935 | reduce the size of the database slightly, because for signed messages the | ||||
1936 | originating IP address is ignored, so no additional database entries are | ||||
1937 | needed for each separate IP address (or masked block of IP addresses). | ||||
1938 | |||||
1939 | |||||
1940 | Since TxRep reuses the storage architecture of the former AWL plugin, the | ||||
1941 | same instructions for initializing the SQL storage also apply to TxRep. | ||||
1942 | Although the old AWL table can be reused for TxRep, by default TxRep expects | ||||
1943 | the SQL table to be named "txrep". | ||||
1944 | |||||
1945 | To install a new SQL table for TxRep, run the appropriate SQL file for your | ||||
1946 | system under the /sql directory. | ||||
1947 | |||||
1948 | If you get a syntax error with an older version of MySQL, use TYPE=MyISAM | ||||
1949 | instead of ENGINE=MyISAM at the end of the command. You can also use other | ||||
1950 | types of ENGINE (depending on what is available on your system). For example, | ||||
1951 | the MEMORY engine stores the entire table in server memory, achieving | ||||
1952 | performance similar to Redis. You would need to replicate the RAM table | ||||
1953 | to disk through a cron job to avoid losing data at reboot. | ||||
1954 | The InnoDB engine is used by default, offering high scalability (database | ||||
1955 | size and concurrency of accesses). In conjunction with a high value of | ||||
1956 | innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+) it can also | ||||
1957 | offer performance comparable to Redis. | ||||
1958 | |||||
1959 | =cut | ||||
1960 | |||||
1961 | 1 | 11µs | 1; | ||
# spent 78.1ms within Mail::SpamAssassin::Plugin::TxRep::CORE:match which was called 14043 times, avg 6µs/call:
# 6432 times (37.1ms+0s) by Mail::SpamAssassin::Plugin::TxRep::get_sender at line 1543, avg 6µs/call
# 6206 times (20.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1266, avg 3µs/call
# 936 times (17.9ms+0s) by Mail::SpamAssassin::Plugin::TxRep::ip_to_awl_key at line 1736, avg 19µs/call
# 468 times (2.49ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1255, avg 5µs/call
# once (5µs+0s) by Mail::SpamAssassin::Plugin::TxRep::open_storages at line 1640 | |||||
# spent 87.6ms within Mail::SpamAssassin::Plugin::TxRep::CORE:regcomp which was called 6206 times, avg 14µs/call:
# 6206 times (87.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1266, avg 14µs/call | |||||
# spent 34.7ms within Mail::SpamAssassin::Plugin::TxRep::CORE:subst which was called 3686 times, avg 9µs/call:
# 3218 times (31.6ms+0s) by Mail::SpamAssassin::Plugin::TxRep::pack_addr at line 1783, avg 10µs/call
# 468 times (3.10ms+0s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1257, avg 7µs/call |