Filename | /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/BayesStore/DBM.pm |
Statements | Executed 435816 statements in 9.97s |
Calls | P | F | Exclusive Time |
Inclusive Time |
Subroutine |
---|---|---|---|---|---|
235 | 1 | 1 | 2.23s | 3.41s | multi_tok_count_change | Mail::SpamAssassin::BayesStore::DBM::
74419 | 3 | 1 | 1.01s | 1.01s | defer_update | Mail::SpamAssassin::BayesStore::DBM::
73949 | 1 | 1 | 183ms | 183ms | CORE:unpack (opcode) | Mail::SpamAssassin::BayesStore::DBM::
12862 | 1 | 1 | 130ms | 130ms | get_magic_re | Mail::SpamAssassin::BayesStore::DBM::
236 | 2 | 1 | 113ms | 6.32s | tie_db_readonly | Mail::SpamAssassin::BayesStore::DBM::
236 | 2 | 1 | 99.0ms | 177ms | get_storage_variables | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 29.7ms | 29.7ms | CORE:open (opcode) | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 26.7ms | 98.4ms | cleanup | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 24.8ms | 24.8ms | CORE:syswrite (opcode) | Mail::SpamAssassin::BayesStore::DBM::
241 | 2 | 1 | 12.3ms | 12.3ms | CORE:ftfile (opcode) | Mail::SpamAssassin::BayesStore::DBM::
460 | 1 | 1 | 11.0ms | 31.6ms | seen_get | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 7.30ms | 11.1ms | seen_put | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 6.50ms | 9.44ms | nspam_nham_change | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 6.33ms | 11.5ms | _get_journal_filename | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 5.42ms | 5.91ms | BEGIN@38 | Mail::SpamAssassin::BayesStore::DBM::
235 | 2 | 1 | 5.05ms | 7.36ms | _check_db_version | Mail::SpamAssassin::BayesStore::DBM::
1183 | 2 | 1 | 4.61ms | 4.61ms | CORE:match (opcode) | Mail::SpamAssassin::BayesStore::DBM::
476 | 3 | 1 | 3.39ms | 3.39ms | DBM_MODULE | Mail::SpamAssassin::BayesStore::DBM::
238 | 2 | 1 | 3.30ms | 3.30ms | HAS_DBM_MODULE | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 3.27ms | 3.27ms | CORE:close (opcode) | Mail::SpamAssassin::BayesStore::DBM::
238 | 2 | 1 | 1.63ms | 1.63ms | DB_EXTENSIONS | Mail::SpamAssassin::BayesStore::DBM::
474 | 4 | 1 | 1.62ms | 1.62ms | CORE:umask (opcode) | Mail::SpamAssassin::BayesStore::DBM::
2 | 1 | 1 | 1.00ms | 2.64s | tie_db_writable | Mail::SpamAssassin::BayesStore::DBM::
235 | 1 | 1 | 850µs | 850µs | CORE:tell (opcode) | Mail::SpamAssassin::BayesStore::DBM::
2 | 2 | 2 | 580µs | 393ms | untie_db | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 58µs | 72µs | BEGIN@20 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 54µs | 76µs | new | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 42µs | 253µs | BEGIN@41 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 38µs | 281µs | BEGIN@27 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 38µs | 201µs | BEGIN@1900 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 37µs | 257µs | BEGIN@1450 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 35µs | 208µs | BEGIN@1896 | Mail::SpamAssassin::BayesStore::DBM::
2 | 1 | 1 | 35µs | 35µs | CORE:ftdir (opcode) | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 35µs | 216µs | BEGIN@37 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 34µs | 181µs | BEGIN@29 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 34µs | 184µs | BEGIN@31 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 33µs | 65µs | _upgrade_db | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 31µs | 218µs | BEGIN@1899 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 30µs | 834µs | BEGIN@43 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 30µs | 243µs | BEGIN@1895 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 28µs | 35µs | BEGIN@22 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 28µs | 246µs | BEGIN@39 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 28µs | 2.48ms | BEGIN@25 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 27µs | 92µs | BEGIN@23 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 26µs | 68µs | BEGIN@21 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 23µs | 151µs | BEGIN@26 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 22µs | 218µs | BEGIN@1897 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 18µs | 18µs | BEGIN@36 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 16µs | 16µs | BEGIN@28 | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 12µs | 12µs | db_readable | Mail::SpamAssassin::BayesStore::DBM::
1 | 1 | 1 | 9µs | 9µs | CORE:qr (opcode) | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | _rename_file | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | _seen_delete_direct | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | _seen_put_direct | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | _sync_journal | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | _sync_journal_trapped | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | _unlink_file | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | backup_database | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | calculate_expire_delta | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | clear_database | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | db_writable | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | dump_db_toks | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | get_running_expire_tok | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | nspam_nham_get | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | perform_upgrade | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | remove_running_expire_tok | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | restore_database | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | sa_die | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | seen_delete | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | set_last_expire | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | set_running_expire_tok | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | sync | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | sync_due | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_count_change | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_get | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_get_all | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_pack | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_put | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_sync_counters | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_sync_nspam_nham | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_touch | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_touch_all | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_touch_token | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | tok_unpack | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | token_expiration | Mail::SpamAssassin::BayesStore::DBM::
0 | 0 | 0 | 0s | 0s | upgrade_old_dbm_files_trapped | Mail::SpamAssassin::BayesStore::DBM::
Line | State ments |
Time on line |
Calls | Time in subs |
Code |
---|---|---|---|---|---|
1 | # <@LICENSE> | ||||
2 | # Licensed to the Apache Software Foundation (ASF) under one or more | ||||
3 | # contributor license agreements. See the NOTICE file distributed with | ||||
4 | # this work for additional information regarding copyright ownership. | ||||
5 | # The ASF licenses this file to you under the Apache License, Version 2.0 | ||||
6 | # (the "License"); you may not use this file except in compliance with | ||||
7 | # the License. You may obtain a copy of the License at: | ||||
8 | # | ||||
9 | # http://www.apache.org/licenses/LICENSE-2.0 | ||||
10 | # | ||||
11 | # Unless required by applicable law or agreed to in writing, software | ||||
12 | # distributed under the License is distributed on an "AS IS" BASIS, | ||||
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
14 | # See the License for the specific language governing permissions and | ||||
15 | # limitations under the License. | ||||
16 | # </@LICENSE> | ||||
17 | |||||
18 | package Mail::SpamAssassin::BayesStore::DBM; | ||||
19 | |||||
20 | 2 | 71µs | 2 | 86µs | # spent 72µs (58+14) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@20 which was called:
# once (58µs+14µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 20 # spent 72µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@20
# spent 14µs making 1 call to strict::import |
21 | 2 | 81µs | 2 | 110µs | # spent 68µs (26+42) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@21 which was called:
# once (26µs+42µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 21 # spent 68µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@21
# spent 42µs making 1 call to warnings::import |
22 | 2 | 76µs | 2 | 42µs | # spent 35µs (28+7) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@22 which was called:
# once (28µs+7µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 22 # spent 35µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@22
# spent 7µs making 1 call to bytes::import |
23 | 2 | 66µs | 2 | 156µs | # spent 92µs (27+65) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@23 which was called:
# once (27µs+65µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 23 # spent 92µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@23
# spent 65µs making 1 call to re::import |
24 | |||||
25 | 2 | 85µs | 2 | 4.93ms | # spent 2.48ms (28µs+2.45) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@25 which was called:
# once (28µs+2.45ms) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 25 # spent 2.48ms making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@25
# spent 2.45ms making 1 call to Exporter::import |
26 | 2 | 78µs | 2 | 278µs | # spent 151µs (23+128) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@26 which was called:
# once (23µs+128µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 26 # spent 151µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@26
# spent 128µs making 1 call to Exporter::import |
27 | 2 | 78µs | 2 | 523µs | # spent 281µs (38+242) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@27 which was called:
# once (38µs+242µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 27 # spent 281µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@27
# spent 242µs making 1 call to Exporter::import |
28 | 2 | 71µs | 1 | 16µs | # spent 16µs within Mail::SpamAssassin::BayesStore::DBM::BEGIN@28 which was called:
# once (16µs+0s) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 28 # spent 16µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@28 |
29 | 2 | 153µs | 2 | 328µs | # spent 181µs (34+147) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@29 which was called:
# once (34µs+147µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 29 # spent 181µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@29
# spent 147µs making 1 call to Exporter::import |
30 | |||||
31 | # spent 184µs (34+150) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@31 which was called:
# once (34µs+150µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 34 | ||||
32 | 3 | 16µs | 1 | 150µs | eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 } # spent 150µs making 1 call to Exporter::import |
33 | 1 | 20µs | or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) } | ||
34 | 1 | 51µs | 1 | 184µs | } # spent 184µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@31 |
35 | |||||
36 | 2 | 76µs | 1 | 18µs | # spent 18µs within Mail::SpamAssassin::BayesStore::DBM::BEGIN@36 which was called:
# once (18µs+0s) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 36 # spent 18µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@36 |
37 | 2 | 88µs | 2 | 397µs | # spent 216µs (35+181) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@37 which was called:
# once (35µs+181µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 37 # spent 216µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@37
# spent 181µs making 1 call to Exporter::import |
38 | 2 | 345µs | 1 | 5.91ms | # spent 5.91ms (5.42+485µs) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@38 which was called:
# once (5.42ms+485µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 38 # spent 5.91ms making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@38 |
39 | 2 | 120µs | 2 | 465µs | # spent 246µs (28+219) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@39 which was called:
# once (28µs+219µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 39 # spent 246µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@39
# spent 218µs making 1 call to Exporter::import |
40 | |||||
41 | 2 | 147µs | 3 | 464µs | # spent 253µs (42+211) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@41 which was called:
# once (42µs+211µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 41 # spent 253µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@41
# spent 202µs making 1 call to constant::import
# spent 9µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::CORE:qr |
42 | |||||
43 | 1 | 3µs | # spent 834µs (30+803) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@43 which was called:
# once (30µs+803µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 50 | ||
44 | @ISA | ||||
45 | @DBNAMES | ||||
46 | $NSPAM_MAGIC_TOKEN $NHAM_MAGIC_TOKEN $LAST_EXPIRE_MAGIC_TOKEN $LAST_JOURNAL_SYNC_MAGIC_TOKEN | ||||
47 | $NTOKENS_MAGIC_TOKEN $OLDEST_TOKEN_AGE_MAGIC_TOKEN $LAST_EXPIRE_REDUCE_MAGIC_TOKEN | ||||
48 | $RUNNING_EXPIRE_MAGIC_TOKEN $DB_VERSION_MAGIC_TOKEN $LAST_ATIME_DELTA_MAGIC_TOKEN | ||||
49 | $NEWEST_TOKEN_AGE_MAGIC_TOKEN | ||||
50 | 1 | 13.4ms | 2 | 1.64ms | }; # spent 834µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@43
# spent 803µs making 1 call to vars::import |
51 | |||||
52 | 1 | 24µs | @ISA = qw( Mail::SpamAssassin::BayesStore ); | ||
53 | |||||
54 | # db layout (quoting Matt): | ||||
55 | # | ||||
56 | # > need five db files though to make it real fast: | ||||
57 | # [probs] 1. ngood and nbad (two entries, so could be a flat file rather | ||||
58 | # than a db file). (now 2 entries in db_toks) | ||||
59 | # [toks] 2. good token -> number seen | ||||
60 | # [toks] 3. bad token -> number seen (both are packed into 1 entry in 1 db) | ||||
61 | # [probs] 4. Consolidated good token -> probability | ||||
62 | # [probs] 5. Consolidated bad token -> probability | ||||
63 | # > As you add new mails, you update the entry in 2 or 3, then regenerate | ||||
64 | # > the entry for that token in 4 or 5. | ||||
65 | # > Then as you test a new mail, you just need to pull the probability | ||||
66 | # > direct from 4 and 5, and generate the overall probability. A simple and | ||||
67 | # > very fast operation. | ||||
68 | # | ||||
69 | # jm: we use probs as overall probability. <0.5 = ham, >0.5 = spam | ||||
70 | # | ||||
71 | # update: probs is no longer maintained as a db, to keep on-disk and in-core | ||||
72 | # usage down. | ||||
73 | # | ||||
74 | # also, added a new one to support forgetting, auto-learning, and | ||||
75 | # auto-forgetting for refiled mails: | ||||
76 | # [seen] 6. a list of Message-IDs of messages already learnt from. values | ||||
77 | # are 's' for learnt-as-spam, 'h' for learnt-as-ham. | ||||
78 | # | ||||
79 | # and another, called [scancount] to model the scan-count for expiry. | ||||
80 | # This is not a database. Instead it increases by one byte for each | ||||
81 | # message scanned (note: scanned, not learned). | ||||
82 | |||||
83 | 1 | 3µs | @DBNAMES = qw(toks seen); | ||
84 | |||||
85 | # These are the magic tokens we use to track stuff in the DB. | ||||
86 | # The format is '^M^A^G^I^C' followed by any string you want. | ||||
87 | # None of the control chars will be in a real token. | ||||
88 | 1 | 2µs | $DB_VERSION_MAGIC_TOKEN = "\015\001\007\011\003DBVERSION"; | ||
89 | 1 | 2µs | $LAST_ATIME_DELTA_MAGIC_TOKEN = "\015\001\007\011\003LASTATIMEDELTA"; | ||
90 | 1 | 2µs | $LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE"; | ||
91 | 1 | 2µs | $LAST_EXPIRE_REDUCE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIREREDUCE"; | ||
92 | 1 | 2µs | $LAST_JOURNAL_SYNC_MAGIC_TOKEN = "\015\001\007\011\003LASTJOURNALSYNC"; | ||
93 | 1 | 2µs | $NEWEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003NEWESTAGE"; | ||
94 | 1 | 2µs | $NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM"; | ||
95 | 1 | 2µs | $NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM"; | ||
96 | 1 | 2µs | $NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS"; | ||
97 | 1 | 2µs | $OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE"; | ||
98 | 1 | 2µs | $RUNNING_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003RUNNINGEXPIRE"; | ||
99 | |||||
100 | # spent 3.30ms within Mail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE which was called 238 times, avg 14µs/call:
# 236 times (3.27ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 141, avg 14µs/call
# 2 times (28µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 234, avg 14µs/call | ||||
101 | 238 | 549µs | my ($self) = @_; | ||
102 | 238 | 776µs | if (exists($self->{has_dbm_module})) { | ||
103 | 237 | 2.84ms | return $self->{has_dbm_module}; | ||
104 | } | ||||
105 | 2 | 20µs | $self->{has_dbm_module} = eval { require DB_File; }; | ||
106 | } | ||||
107 | |||||
108 | # spent 3.39ms within Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE which was called 476 times, avg 7µs/call:
# 468 times (3.35ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 182, avg 7µs/call
# 4 times (21µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 295, avg 5µs/call
# 4 times (20µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 301, avg 5µs/call | ||||
109 | 476 | 4.52ms | return "DB_File"; | ||
110 | } | ||||
111 | |||||
112 | # Possible file extensions used by the kinds of database files DB_File | ||||
113 | # might create. We need these so we can create a new file and rename | ||||
114 | # it into place. | ||||
115 | # spent 1.63ms within Mail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS which was called 238 times, avg 7µs/call:
# 236 times (1.62ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 161, avg 7µs/call
# 2 times (11µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 259, avg 6µs/call | ||||
116 | 238 | 2.15ms | return ('', '.db'); | ||
117 | } | ||||
118 | |||||
119 | ########################################################################### | ||||
120 | |||||
121 | # spent 76µs (54+22) within Mail::SpamAssassin::BayesStore::DBM::new which was called:
# once (54µs+22µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 3 of (eval 1133)[Mail/SpamAssassin/Plugin/Bayes.pm:1661] | ||||
122 | 1 | 2µs | my $class = shift; | ||
123 | 1 | 2µs | $class = ref($class) || $class; | ||
124 | |||||
125 | 1 | 14µs | 1 | 22µs | my $self = $class->SUPER::new(@_); # spent 22µs making 1 call to Mail::SpamAssassin::BayesStore::new |
126 | |||||
127 | 1 | 11µs | $self->{supported_db_version} = 3; | ||
128 | |||||
129 | 1 | 2µs | $self->{already_tied} = 0; | ||
130 | 1 | 2µs | $self->{is_locked} = 0; | ||
131 | 1 | 3µs | $self->{string_to_journal} = ''; | ||
132 | |||||
133 | 1 | 11µs | $self; | ||
134 | } | ||||
135 | |||||
136 | ########################################################################### | ||||
137 | |||||
138 | # spent 6.32s (113ms+6.20) within Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly which was called 236 times, avg 26.8ms/call:
# 235 times (113ms+6.20s) by Mail::SpamAssassin::Plugin::Bayes::learn_message at line 389 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 26.9ms/call
# once (130µs+386µs) by Mail::SpamAssassin::Plugin::Bayes::learner_is_scan_available at line 640 of Mail/SpamAssassin/Plugin/Bayes.pm | ||||
139 | 236 | 643µs | my ($self) = @_; | ||
140 | |||||
141 | 236 | 2.23ms | 236 | 3.27ms | if (!$self->HAS_DBM_MODULE) { # spent 3.27ms making 236 calls to Mail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE, avg 14µs/call |
142 | dbg("bayes: %s module not installed, cannot use bayes", $self->DBM_MODULE); | ||||
143 | return 0; | ||||
144 | } | ||||
145 | |||||
146 | # return if we've already tied to the db's, using the same mode | ||||
147 | # (locked/unlocked) as before. | ||||
148 | 236 | 1.37ms | return 1 if ($self->{already_tied} && $self->{is_locked} == 0); | ||
149 | |||||
150 | 236 | 904µs | my $main = $self->{bayes}->{main}; | ||
151 | 236 | 1.08ms | if (!defined($main->{conf}->{bayes_path})) { | ||
152 | dbg("bayes: bayes_path not defined"); | ||||
153 | return 0; | ||||
154 | } | ||||
155 | |||||
156 | 236 | 2.54ms | 236 | 11.7ms | $self->read_db_configs(); # spent 11.7ms making 236 calls to Mail::SpamAssassin::BayesStore::read_db_configs, avg 50µs/call |
157 | |||||
158 | 236 | 2.71ms | 236 | 5.53ms | my $path = $main->sed_path($main->{conf}->{bayes_path}); # spent 5.53ms making 236 calls to Mail::SpamAssassin::sed_path, avg 23µs/call |
159 | |||||
160 | 236 | 565µs | my $found = 0; | ||
161 | 236 | 2.37ms | 236 | 1.62ms | for my $ext ($self->DB_EXTENSIONS) { # spent 1.62ms making 236 calls to Mail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS, avg 7µs/call |
162 | 238 | 14.9ms | 238 | 12.3ms | if (-f $path.'_toks'.$ext) { # spent 12.3ms making 238 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile, avg 52µs/call |
163 | 234 | 547µs | $found = 1; | ||
164 | 234 | 786µs | last; | ||
165 | } | ||||
166 | } | ||||
167 | |||||
168 | 236 | 470µs | if (!$found) { | ||
169 | 2 | 18µs | 2 | 15µs | dbg("bayes: no dbs present, cannot tie DB R/O: %s", $path.'_toks'); # spent 15µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
170 | 2 | 22µs | return 0; | ||
171 | } | ||||
172 | |||||
173 | 234 | 1.55ms | foreach my $dbname (@DBNAMES) { | ||
174 | 468 | 2.70ms | my $name = $path.'_'.$dbname; | ||
175 | 468 | 1.36ms | my $db_var = 'db_'.$dbname; | ||
176 | 468 | 4.38ms | 468 | 4.28ms | dbg("bayes: tie-ing to DB file R/O $name"); # spent 4.28ms making 468 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call |
177 | |||||
178 | # Bug 6901, [rt.cpan.org #83060] | ||||
179 | # DB_File: Repeated tie to the same hash with no untie causes corruption | ||||
180 | 936 | 5.52s | 468 | 5.49s | untie %{$self->{$db_var}}; # has no effect if the variable is not tied # spent 5.49s making 468 calls to DB_File::DESTROY, avg 11.7ms/call |
181 | |||||
182 | 936 | 16.9ms | 936 | 99.6ms | if (!tie %{$self->{$db_var}}, $self->DBM_MODULE, $name, O_RDONLY, # spent 96.3ms making 468 calls to DB_File::TIEHASH, avg 206µs/call
# spent 3.35ms making 468 calls to Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE, avg 7µs/call |
183 | (oct($main->{conf}->{bayes_file_mode}) & 0666)) | ||||
184 | { | ||||
185 | # bug 2975: it's acceptable for the db_seen to not be present, | ||||
186 | # to allow it to be recycled. if that's the case, just create | ||||
187 | # a new, empty one. we don't need to lock it, since we won't | ||||
188 | # be writing to it; let the R/W api deal with that case. | ||||
189 | |||||
190 | if ($dbname eq 'seen') { | ||||
191 | # Bug 6901, [rt.cpan.org #83060] | ||||
192 | untie %{$self->{$db_var}}; # has no effect if the variable is not tied | ||||
193 | tie %{$self->{$db_var}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT, | ||||
194 | (oct($main->{conf}->{bayes_file_mode}) & 0666) | ||||
195 | or goto failed_to_tie; | ||||
196 | } | ||||
197 | else { | ||||
198 | goto failed_to_tie; | ||||
199 | } | ||||
200 | } | ||||
201 | } | ||||
202 | |||||
203 | 234 | 4.43ms | 234 | 176ms | $self->{db_version} = ($self->get_storage_variables())[6]; # spent 176ms making 234 calls to Mail::SpamAssassin::BayesStore::DBM::get_storage_variables, avg 752µs/call |
204 | 234 | 2.00ms | 234 | 1.88ms | dbg("bayes: found bayes db version %s", $self->{db_version}); # spent 1.88ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call |
205 | |||||
206 | # If the DB version is one we don't understand, abort! | ||||
207 | 234 | 2.39ms | 234 | 7.33ms | if ($self->_check_db_version() != 0) { # spent 7.33ms making 234 calls to Mail::SpamAssassin::BayesStore::DBM::_check_db_version, avg 31µs/call |
208 | 1 | 67µs | warn("bayes: bayes db version ".$self->{db_version}." is not able to be used, aborting!"); | ||
209 | 1 | 12µs | 1 | 392ms | $self->untie_db(); # spent 392ms making 1 call to Mail::SpamAssassin::BayesStore::DBM::untie_db |
210 | 1 | 16µs | return 0; | ||
211 | } | ||||
212 | |||||
213 | 233 | 750µs | $self->{already_tied} = 1; | ||
214 | 233 | 9.58ms | return 1; | ||
215 | |||||
216 | failed_to_tie: | ||||
217 | warn "bayes: cannot open bayes databases ${path}_* R/O: tie failed: $!\n"; | ||||
218 | foreach my $dbname (@DBNAMES) { | ||||
219 | my $db_var = 'db_'.$dbname; | ||||
220 | next unless exists $self->{$db_var}; | ||||
221 | dbg("bayes: untie-ing DB file $dbname"); | ||||
222 | untie %{$self->{$db_var}}; | ||||
223 | } | ||||
224 | |||||
225 | return 0; | ||||
226 | } | ||||
227 | |||||
228 | # tie() to the databases, read-write and locked. Any callers of | ||||
229 | # this should ensure they call untie_db() afterwards! | ||||
230 | # | ||||
231 | # spent 2.64s (1.00ms+2.64) within Mail::SpamAssassin::BayesStore::DBM::tie_db_writable which was called 2 times, avg 1.32s/call:
# 2 times (1.00ms+2.64s) by Mail::SpamAssassin::Plugin::Bayes::learn_message at line 389 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 1.32s/call | ||||
232 | 2 | 5µs | my ($self) = @_; | ||
233 | |||||
234 | 2 | 17µs | 2 | 28µs | if (!$self->HAS_DBM_MODULE) { # spent 28µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE, avg 14µs/call |
235 | dbg("bayes: %s module not installed, cannot use bayes", $self->DBM_MODULE); | ||||
236 | return 0; | ||||
237 | } | ||||
238 | |||||
239 | # Useful shortcut ... | ||||
240 | 2 | 9µs | my $main = $self->{bayes}->{main}; | ||
241 | |||||
242 | # if we've already tied the db's using the same mode | ||||
243 | # (locked/unlocked) as we want now, freshen the lock and return. | ||||
244 | 2 | 5µs | if ($self->{already_tied} && $self->{is_locked} == 1) { | ||
245 | $main->{locker}->refresh_lock($self->{locked_file}); | ||||
246 | return 1; | ||||
247 | } | ||||
248 | |||||
249 | 2 | 10µs | if (!defined($main->{conf}->{bayes_path})) { | ||
250 | dbg("bayes: bayes_path not defined"); | ||||
251 | return 0; | ||||
252 | } | ||||
253 | |||||
254 | 2 | 17µs | 2 | 77µs | $self->read_db_configs(); # spent 77µs making 2 calls to Mail::SpamAssassin::BayesStore::read_db_configs, avg 38µs/call |
255 | |||||
256 | 2 | 25µs | 2 | 44µs | my $path = $main->sed_path($main->{conf}->{bayes_path}); # spent 44µs making 2 calls to Mail::SpamAssassin::sed_path, avg 22µs/call |
257 | |||||
258 | 2 | 5µs | my $found = 0; | ||
259 | 2 | 20µs | 2 | 11µs | for my $ext ($self->DB_EXTENSIONS) { # spent 11µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS, avg 6µs/call |
260 | 3 | 72µs | 3 | 33µs | if (-f $path.'_toks'.$ext) { # spent 33µs making 3 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile, avg 11µs/call |
261 | 1 | 2µs | $found = 1; | ||
262 | 1 | 4µs | last; | ||
263 | } | ||||
264 | } | ||||
265 | |||||
266 | 2 | 24µs | 2 | 525µs | my $parentdir = dirname($path); # spent 525µs making 2 calls to File::Basename::dirname, avg 262µs/call |
267 | 2 | 58µs | 2 | 35µs | if (!-d $parentdir) { # spent 35µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:ftdir, avg 17µs/call |
268 | # run in an eval(); if mkpath has no perms, it calls die() | ||||
269 | eval { | ||||
270 | mkpath($parentdir, 0, (oct($main->{conf}->{bayes_file_mode}) & 0777)); | ||||
271 | }; | ||||
272 | } | ||||
273 | |||||
274 | 2 | 4µs | my $tout; | ||
275 | 2 | 12µs | if ($main->{learn_wait_for_lock}) { | ||
276 | 2 | 5µs | $tout = 300; # TODO: Dan to write better lock code | ||
277 | } else { | ||||
278 | $tout = 10; | ||||
279 | } | ||||
280 | 2 | 35µs | 2 | 2.63s | if ($main->{locker}->safe_lock($path, $tout, $main->{conf}->{bayes_file_mode})) # spent 2.63s making 2 calls to Mail::SpamAssassin::Locker::UnixNFSSafe::safe_lock, avg 1.32s/call |
281 | { | ||||
282 | 2 | 10µs | $self->{locked_file} = $path; | ||
283 | 2 | 7µs | $self->{is_locked} = 1; | ||
284 | } else { | ||||
285 | warn "bayes: cannot open bayes databases ${path}_* R/W: lock failed: $!\n"; | ||||
286 | return 0; | ||||
287 | } | ||||
288 | |||||
289 | 2 | 30µs | 2 | 7µs | my $umask = umask 0; # spent 7µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 3µs/call |
290 | 2 | 23µs | foreach my $dbname (@DBNAMES) { | ||
291 | 4 | 28µs | my $name = $path.'_'.$dbname; | ||
292 | 4 | 11µs | my $db_var = 'db_'.$dbname; | ||
293 | 4 | 42µs | 4 | 38µs | dbg("bayes: tie-ing to DB file R/W $name"); # spent 38µs making 4 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call |
294 | |||||
295 | 4 | 65µs | 8 | 3.27ms | ($self->DBM_MODULE eq 'DB_File') and # spent 3.25ms making 4 calls to Mail::SpamAssassin::Util::avoid_db_file_locking_bug, avg 811µs/call
# spent 21µs making 4 calls to Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE, avg 5µs/call |
296 | Mail::SpamAssassin::Util::avoid_db_file_locking_bug ($name); | ||||
297 | |||||
298 | # Bug 6901, [rt.cpan.org #83060] | ||||
299 | 8 | 46µs | untie %{$self->{$db_var}}; # has no effect if the variable is not tied | ||
300 | 4 | 9µs | tie %{$self->{$db_var}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT, | ||
301 | 4 | 125µs | 8 | 942µs | (oct($main->{conf}->{bayes_file_mode}) & 0666) # spent 922µs making 4 calls to DB_File::TIEHASH, avg 230µs/call
# spent 20µs making 4 calls to Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE, avg 5µs/call |
302 | or goto failed_to_tie; | ||||
303 | } | ||||
304 | 2 | 21µs | 2 | 5µs | umask $umask; # spent 5µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 3µs/call |
305 | |||||
306 | # set our cache to what version DB we're using | ||||
307 | 2 | 34µs | 2 | 1.10ms | $self->{db_version} = ($self->get_storage_variables())[6]; # spent 1.10ms making 2 calls to Mail::SpamAssassin::BayesStore::DBM::get_storage_variables, avg 550µs/call |
308 | # don't bother printing this unless found since it would be bogus anyway | ||||
309 | 2 | 12µs | 1 | 8µs | dbg("bayes: found bayes db version %s", $self->{db_version}) if $found; # spent 8µs making 1 call to Mail::SpamAssassin::Logger::dbg |
310 | |||||
311 | # figure out if we can read the current DB and if we need to do a | ||||
312 | # DB version update and do it if necessary if either has a problem, | ||||
313 | # fail immediately | ||||
314 | # | ||||
315 | 2 | 17µs | 1 | 65µs | if ($found && !$self->_upgrade_db()) { # spent 65µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::_upgrade_db |
316 | $self->untie_db(); | ||||
317 | return 0; | ||||
318 | } | ||||
319 | elsif (!$found) { # new DB, make sure we know that ... | ||||
320 | 1 | 65µs | 3 | 19µs | $self->{db_version} = $self->{db_toks}->{$DB_VERSION_MAGIC_TOKEN} = $self->DB_VERSION; # spent 9µs making 1 call to Mail::SpamAssassin::BayesStore::DB_VERSION
# spent 6µs making 1 call to DB_File::STORE
# spent 5µs making 1 call to DB_File::FETCH |
321 | 1 | 24µs | 1 | 4µs | $self->{db_toks}->{$NTOKENS_MAGIC_TOKEN} = 0; # no tokens in the db ... # spent 4µs making 1 call to DB_File::STORE |
322 | 1 | 8µs | 1 | 8µs | dbg("bayes: new db, set db version %s and 0 tokens", $self->{db_version}); # spent 8µs making 1 call to Mail::SpamAssassin::Logger::dbg |
323 | } | ||||
324 | |||||
325 | 2 | 7µs | $self->{already_tied} = 1; | ||
326 | 2 | 24µs | return 1; | ||
327 | |||||
328 | failed_to_tie: | ||||
329 | my $err = $!; | ||||
330 | umask $umask; | ||||
331 | |||||
332 | foreach my $dbname (@DBNAMES) { | ||||
333 | my $db_var = 'db_'.$dbname; | ||||
334 | next unless exists $self->{$db_var}; | ||||
335 | dbg("bayes: untie-ing DB file $dbname"); | ||||
336 | untie %{$self->{$db_var}}; | ||||
337 | } | ||||
338 | |||||
339 | if ($self->{is_locked}) { | ||||
340 | $self->{bayes}->{main}->{locker}->safe_unlock($self->{locked_file}); | ||||
341 | $self->{is_locked} = 0; | ||||
342 | } | ||||
343 | warn "bayes: cannot open bayes databases ${path}_* R/W: tie failed: $err\n"; | ||||
344 | return 0; | ||||
345 | } | ||||
346 | |||||
347 | # Do we understand how to deal with this DB version? | ||||
348 | # spent 7.36ms (5.05+2.31) within Mail::SpamAssassin::BayesStore::DBM::_check_db_version which was called 235 times, avg 31µs/call:
# 234 times (5.03ms+2.30ms) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 207, avg 31µs/call
# once (22µs+10µs) by Mail::SpamAssassin::BayesStore::DBM::_upgrade_db at line 359 | ||||
349 | 235 | 617µs | my ($self) = @_; | ||
350 | |||||
351 | # return -1 if older, 0 if current, 1 if newer | ||||
352 | 235 | 4.41ms | 235 | 2.31ms | return $self->{db_version} <=> $self->DB_VERSION; # spent 2.31ms making 235 calls to Mail::SpamAssassin::BayesStore::DB_VERSION, avg 10µs/call |
353 | } | ||||
354 | |||||
355 | # Check to see if we need to upgrade the DB, and do so if necessary | ||||
356 | # spent 65µs (33+32) within Mail::SpamAssassin::BayesStore::DBM::_upgrade_db which was called:
# once (33µs+32µs) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 315 | ||||
357 | 1 | 2µs | my ($self) = @_; | ||
358 | |||||
359 | 1 | 11µs | 1 | 32µs | my $verschk = $self->_check_db_version(); # spent 32µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::_check_db_version |
360 | 1 | 2µs | my $res = 0; # used later on for tie() checks | ||
361 | 1 | 2µs | my $umask; # used later for umask modifications | ||
362 | |||||
363 | # If the DB is the latest version, no problem. | ||||
364 | 1 | 12µs | return 1 if ($verschk == 0); | ||
365 | |||||
366 | # If the DB is a newer version that we know what to do with ... abort! | ||||
367 | if ($verschk == 1) { | ||||
368 | warn("bayes: bayes db version ".$self->{db_version}." is newer than we understand, aborting!"); | ||||
369 | return 0; | ||||
370 | } | ||||
371 | |||||
372 | # If the current DB version is lower than the new version, upgrade! | ||||
373 | # Do conversions in order so we can go 1 -> 3, make sure to update | ||||
374 | # $self->{db_version} along the way | ||||
375 | |||||
376 | dbg("bayes: detected bayes db format %s, upgrading", $self->{db_version}); | ||||
377 | |||||
378 | # since DB_File will not shrink a database (!!), we need to *create* | ||||
379 | # a new one instead. | ||||
380 | my $main = $self->{bayes}->{main}; | ||||
381 | my $path = $main->sed_path($main->{conf}->{bayes_path}); | ||||
382 | my $name = $path.'_toks'; | ||||
383 | |||||
384 | # older version's journal files are likely not in the same format as the new ones, so remove it. | ||||
385 | my $jpath = $self->_get_journal_filename(); | ||||
386 | if (-f $jpath) { | ||||
387 | dbg("bayes: old journal file found, removing"); | ||||
388 | warn "bayes: couldn't remove $jpath: $!" if (!unlink $jpath); | ||||
389 | } | ||||
390 | |||||
391 | if ($self->{db_version} < 2) { | ||||
392 | dbg("bayes: upgrading database format from v%s to v2", $self->{db_version}); | ||||
393 | $self->set_running_expire_tok(); | ||||
394 | |||||
395 | my ($DB_NSPAM_MAGIC_TOKEN, $DB_NHAM_MAGIC_TOKEN, $DB_NTOKENS_MAGIC_TOKEN); | ||||
396 | my ($DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN, $DB_LAST_EXPIRE_MAGIC_TOKEN); | ||||
397 | |||||
398 | # Magic tokens for version 0, defined as '**[A-Z]+' | ||||
399 | if ($self->{db_version} == 0) { | ||||
400 | $DB_NSPAM_MAGIC_TOKEN = '**NSPAM'; | ||||
401 | $DB_NHAM_MAGIC_TOKEN = '**NHAM'; | ||||
402 | $DB_NTOKENS_MAGIC_TOKEN = '**NTOKENS'; | ||||
403 | #$DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN = '**OLDESTAGE'; | ||||
404 | #$DB_LAST_EXPIRE_MAGIC_TOKEN = '**LASTEXPIRE'; | ||||
405 | #$DB_SCANCOUNT_BASE_MAGIC_TOKEN = '**SCANBASE'; | ||||
406 | #$DB_RUNNING_EXPIRE_MAGIC_TOKEN = '**RUNNINGEXPIRE'; | ||||
407 | } | ||||
408 | else { | ||||
409 | $DB_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM"; | ||||
410 | $DB_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM"; | ||||
411 | $DB_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS"; | ||||
412 | #$DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE"; | ||||
413 | #$DB_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE"; | ||||
414 | #$DB_SCANCOUNT_BASE_MAGIC_TOKEN = "\015\001\007\011\003SCANBASE"; | ||||
415 | #$DB_RUNNING_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003RUNNINGEXPIRE"; | ||||
416 | } | ||||
417 | |||||
418 | # remember when we started ... | ||||
419 | my $started = time; | ||||
420 | my $newatime = $started; | ||||
421 | |||||
422 | # use O_EXCL to avoid races (bonus paranoia, since we should be locked | ||||
423 | # anyway) | ||||
424 | my %new_toks; | ||||
425 | $umask = umask 0; | ||||
426 | |||||
427 | $res = tie %new_toks, $self->DBM_MODULE, "${name}.new", | ||||
428 | O_RDWR|O_CREAT|O_EXCL, | ||||
429 | (oct($main->{conf}->{bayes_file_mode}) & 0666); | ||||
430 | umask $umask; | ||||
431 | return 0 unless $res; | ||||
432 | undef $res; | ||||
433 | |||||
434 | # add the magic tokens to the new db. | ||||
435 | $new_toks{$NSPAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NSPAM_MAGIC_TOKEN}; | ||||
436 | $new_toks{$NHAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NHAM_MAGIC_TOKEN}; | ||||
437 | $new_toks{$NTOKENS_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NTOKENS_MAGIC_TOKEN}; | ||||
438 | $new_toks{$DB_VERSION_MAGIC_TOKEN} = 2; # we're now a DB version 2 file | ||||
439 | $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $newatime; | ||||
440 | $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = $newatime; | ||||
441 | $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $newatime; | ||||
442 | $new_toks{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = $newatime; | ||||
443 | $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = 0; | ||||
444 | $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = 0; | ||||
445 | |||||
446 | # deal with the data tokens | ||||
447 | my ($tok, $packed); | ||||
448 | my $count = 0; | ||||
449 | while (($tok, $packed) = each %{$self->{db_toks}}) { | ||||
450 | next if ($tok =~ /^(?:\*\*[A-Z]+$|\015\001\007\011\003)/); # skip magic tokens | ||||
451 | |||||
452 | my ($ts, $th, $atime) = $self->tok_unpack($packed); | ||||
453 | $new_toks{$tok} = $self->tok_pack($ts, $th, $newatime); | ||||
454 | |||||
455 | # Refresh the lock every so often... | ||||
456 | if (($count++ % 1000) == 0) { | ||||
457 | $self->set_running_expire_tok(); | ||||
458 | } | ||||
459 | } | ||||
460 | |||||
461 | |||||
462 | # now untie so we can do renames | ||||
463 | untie %{$self->{db_toks}}; | ||||
464 | untie %new_toks; | ||||
465 | |||||
466 | # This is the critical phase (moving files around), so don't allow | ||||
467 | # it to be interrupted. | ||||
468 | local $SIG{'INT'} = 'IGNORE'; | ||||
469 | local $SIG{'TERM'} = 'IGNORE'; | ||||
470 | local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows(); | ||||
471 | |||||
472 | # older versions used scancount, so kill the stupid little file ... | ||||
473 | my $msgc = $path.'_msgcount'; | ||||
474 | if (-f $msgc) { | ||||
475 | dbg("bayes: old msgcount file found, removing"); | ||||
476 | if (!unlink $msgc) { | ||||
477 | warn "bayes: couldn't remove $msgc: $!"; | ||||
478 | } | ||||
479 | } | ||||
480 | |||||
481 | # now rename in the new one. Try several extensions | ||||
482 | for my $ext ($self->DB_EXTENSIONS) { | ||||
483 | my $newf = $name.'.new'.$ext; | ||||
484 | my $oldf = $name.$ext; | ||||
485 | next unless (-f $newf); | ||||
486 | if (!rename ($newf, $oldf)) { | ||||
487 | warn "bayes: rename $newf to $oldf failed: $!\n"; | ||||
488 | return 0; | ||||
489 | } | ||||
490 | } | ||||
491 | |||||
492 | # re-tie to the new db in read-write mode ... | ||||
493 | $umask = umask 0; | ||||
494 | # Bug 6901, [rt.cpan.org #83060] | ||||
495 | untie %{$self->{db_toks}}; # has no effect if the variable is not tied | ||||
496 | $res = tie %{$self->{db_toks}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT, | ||||
497 | (oct($main->{conf}->{bayes_file_mode}) & 0666); | ||||
498 | umask $umask; | ||||
499 | return 0 unless $res; | ||||
500 | undef $res; | ||||
501 | |||||
502 | dbg("bayes: upgraded database format from v%s to v2 in %d seconds", | ||||
503 | $self->{db_version}, time - $started); | ||||
504 | $self->{db_version} = 2; # need this for other functions which check | ||||
505 | } | ||||
506 | |||||
507 | # Version 3 of the database converts all existing tokens to SHA1 hashes | ||||
508 | if ($self->{db_version} == 2) { | ||||
509 | dbg("bayes: upgrading database format from v%s to v3", $self->{db_version}); | ||||
510 | $self->set_running_expire_tok(); | ||||
511 | |||||
512 | my $DB_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM"; | ||||
513 | my $DB_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM"; | ||||
514 | my $DB_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS"; | ||||
515 | my $DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE"; | ||||
516 | my $DB_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE"; | ||||
517 | my $DB_NEWEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003NEWESTAGE"; | ||||
518 | my $DB_LAST_JOURNAL_SYNC_MAGIC_TOKEN = "\015\001\007\011\003LASTJOURNALSYNC"; | ||||
519 | my $DB_LAST_ATIME_DELTA_MAGIC_TOKEN = "\015\001\007\011\003LASTATIMEDELTA"; | ||||
520 | my $DB_LAST_EXPIRE_REDUCE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIREREDUCE"; | ||||
521 | |||||
522 | # remember when we started ... | ||||
523 | my $started = time; | ||||
524 | |||||
525 | # use O_EXCL to avoid races (bonus paranoia, since we should be locked | ||||
526 | # anyway) | ||||
527 | my %new_toks; | ||||
528 | $umask = umask 0; | ||||
529 | $res = tie %new_toks, $self->DBM_MODULE, "${name}.new", O_RDWR|O_CREAT|O_EXCL, | ||||
530 | (oct($main->{conf}->{bayes_file_mode}) & 0666); | ||||
531 | umask $umask; | ||||
532 | return 0 unless $res; | ||||
533 | undef $res; | ||||
534 | |||||
535 | # add the magic tokens to the new db. | ||||
536 | $new_toks{$NSPAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NSPAM_MAGIC_TOKEN}; | ||||
537 | $new_toks{$NHAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NHAM_MAGIC_TOKEN}; | ||||
538 | $new_toks{$NTOKENS_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NTOKENS_MAGIC_TOKEN}; | ||||
539 | $new_toks{$DB_VERSION_MAGIC_TOKEN} = 3; # we're now a DB version 3 file | ||||
540 | $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $self->{db_toks}->{$DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN}; | ||||
541 | $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = $self->{db_toks}->{$DB_LAST_EXPIRE_MAGIC_TOKEN}; | ||||
542 | $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NEWEST_TOKEN_AGE_MAGIC_TOKEN}; | ||||
543 | $new_toks{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = $self->{db_toks}->{$DB_LAST_JOURNAL_SYNC_MAGIC_TOKEN}; | ||||
544 | $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = $self->{db_toks}->{$DB_LAST_ATIME_DELTA_MAGIC_TOKEN}; | ||||
545 | $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} =$self->{db_toks}->{$DB_LAST_EXPIRE_REDUCE_MAGIC_TOKEN}; | ||||
546 | |||||
547 | # deal with the data tokens | ||||
548 | my $count = 0; | ||||
549 | while (my ($tok, $packed) = each %{$self->{db_toks}}) { | ||||
550 | next if ($tok =~ /^\015\001\007\011\003/); # skip magic tokens | ||||
551 | my $tok_hash = substr(sha1($tok), -5); | ||||
552 | $new_toks{$tok_hash} = $packed; | ||||
553 | |||||
554 | # Refresh the lock every so often... | ||||
555 | if (($count++ % 1000) == 0) { | ||||
556 | $self->set_running_expire_tok(); | ||||
557 | } | ||||
558 | } | ||||
559 | |||||
560 | # now untie so we can do renames | ||||
561 | untie %{$self->{db_toks}}; | ||||
562 | untie %new_toks; | ||||
563 | |||||
564 | # This is the critical phase (moving files around), so don't allow | ||||
565 | # it to be interrupted. | ||||
566 | local $SIG{'INT'} = 'IGNORE'; | ||||
567 | local $SIG{'TERM'} = 'IGNORE'; | ||||
568 | local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows(); | ||||
569 | |||||
570 | # now rename in the new one. Try several extensions | ||||
571 | for my $ext ($self->DB_EXTENSIONS) { | ||||
572 | my $newf = $name.'.new'.$ext; | ||||
573 | my $oldf = $name.$ext; | ||||
574 | next unless (-f $newf); | ||||
575 | if (!rename($newf, $oldf)) { | ||||
576 | warn "bayes: rename $newf to $oldf failed: $!\n"; | ||||
577 | return 0; | ||||
578 | } | ||||
579 | } | ||||
580 | |||||
581 | # re-tie to the new db in read-write mode ... | ||||
582 | $umask = umask 0; | ||||
583 | # Bug 6901, [rt.cpan.org #83060] | ||||
584 | untie %{$self->{db_toks}}; # has no effect if the variable is not tied | ||||
585 | $res = tie %{$self->{db_toks}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT, | ||||
586 | (oct ($main->{conf}->{bayes_file_mode}) & 0666); | ||||
587 | umask $umask; | ||||
588 | return 0 unless $res; | ||||
589 | undef $res; | ||||
590 | |||||
591 | dbg("bayes: upgraded database format from v%s to v3 in %d seconds", | ||||
592 | $self->{db_version}, time - $started); | ||||
593 | |||||
594 | $self->{db_version} = 3; # need this for other functions which check | ||||
595 | } | ||||
596 | |||||
597 | # if ($self->{db_version} == 3) { | ||||
598 | # ... | ||||
599 | # $self->{db_version} = 4; # need this for other functions which check | ||||
600 | # } | ||||
601 | # ... and so on. | ||||
602 | |||||
603 | return 1; | ||||
604 | } | ||||
605 | |||||
606 | ########################################################################### | ||||
607 | |||||
608 | # spent 393ms (580µs+392) within Mail::SpamAssassin::BayesStore::DBM::untie_db which was called 2 times, avg 196ms/call:
# once (410µs+391ms) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 209
# once (170µs+1.02ms) by Mail::SpamAssassin::Plugin::Bayes::learner_close at line 332 of Mail/SpamAssassin/Plugin/Bayes.pm | ||||
609 | 2 | 7µs | my $self = shift; | ||
610 | |||||
611 | 2 | 7µs | return if (!$self->{already_tied}); | ||
612 | |||||
613 | 2 | 14µs | 2 | 13µs | dbg("bayes: untie-ing"); # spent 13µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
614 | |||||
615 | 2 | 19µs | foreach my $dbname (@DBNAMES) { | ||
616 | 4 | 19µs | my $db_var = 'db_'.$dbname; | ||
617 | |||||
618 | 4 | 36µs | if (exists $self->{$db_var}) { | ||
619 | # dbg("bayes: untie-ing $db_var"); | ||||
620 | 8 | 390ms | 4 | 390ms | untie %{$self->{$db_var}}; # spent 390ms making 4 calls to DB_File::DESTROY, avg 97.5ms/call |
621 | 4 | 35µs | delete $self->{$db_var}; | ||
622 | } | ||||
623 | } | ||||
624 | |||||
625 | 2 | 11µs | if ($self->{is_locked}) { | ||
626 | 2 | 23µs | 2 | 24µs | dbg("bayes: files locked, now unlocking lock"); # spent 24µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 12µs/call |
627 | 2 | 37µs | 2 | 2.15ms | $self->{bayes}->{main}->{locker}->safe_unlock ($self->{locked_file}); # spent 2.15ms making 2 calls to Mail::SpamAssassin::Locker::UnixNFSSafe::safe_unlock, avg 1.07ms/call |
628 | 2 | 10µs | $self->{is_locked} = 0; | ||
629 | } | ||||
630 | |||||
631 | 2 | 7µs | $self->{already_tied} = 0; | ||
632 | 2 | 29µs | $self->{db_version} = undef; | ||
633 | } | ||||
634 | |||||
635 | ########################################################################### | ||||
636 | |||||
637 | sub calculate_expire_delta { | ||||
638 | my ($self, $newest_atime, $start, $max_expire_mult) = @_; | ||||
639 | |||||
640 | my %delta; # use a hash since an array is going to be very sparse | ||||
641 | |||||
642 | # do the first pass, figure out atime delta | ||||
643 | my ($tok, $packed); | ||||
644 | while (($tok, $packed) = each %{$self->{db_toks}}) { | ||||
645 | next if ($tok =~ MAGIC_RE); # skip magic tokens | ||||
646 | |||||
647 | my ($ts, $th, $atime) = $self->tok_unpack ($packed); | ||||
648 | |||||
649 | # Go through from $start * 1 to $start * 512, mark how many tokens | ||||
650 | # we would expire | ||||
651 | my $token_age = $newest_atime - $atime; | ||||
652 | for (my $i = 1; $i <= $max_expire_mult; $i<<=1) { | ||||
653 | if ($token_age >= $start * $i) { | ||||
654 | $delta{$i}++; | ||||
655 | } | ||||
656 | else { | ||||
657 | # If the token age is less than the expire delta, it'll be | ||||
658 | # less for all upcoming checks too, so abort early. | ||||
659 | last; | ||||
660 | } | ||||
661 | } | ||||
662 | } | ||||
663 | return %delta; | ||||
664 | } | ||||
665 | |||||
666 | ########################################################################### | ||||
667 | |||||
668 | sub token_expiration { | ||||
669 | my ($self, $opts, $newdelta, @vars) = @_; | ||||
670 | |||||
671 | my $deleted = 0; | ||||
672 | my $kept = 0; | ||||
673 | my $num_hapaxes = 0; | ||||
674 | my $num_lowfreq = 0; | ||||
675 | |||||
676 | # since DB_File will not shrink a database (!!), we need to *create* | ||||
677 | # a new one instead. | ||||
678 | my $main = $self->{bayes}->{main}; | ||||
679 | my $path = $main->sed_path($main->{conf}->{bayes_path}); | ||||
680 | |||||
681 | # use a temporary PID-based suffix just in case another one was | ||||
682 | # created previously by an interrupted expire | ||||
683 | my $tmpsuffix = "expire$$"; | ||||
684 | my $tmpdbname = $path.'_toks.'.$tmpsuffix; | ||||
685 | |||||
686 | # clean out any leftover db copies from previous runs | ||||
687 | for my $ext ($self->DB_EXTENSIONS) { unlink ($tmpdbname.$ext); } | ||||
688 | |||||
689 | # use O_EXCL to avoid races (bonus paranoia, since we should be locked | ||||
690 | # anyway) | ||||
691 | my %new_toks; | ||||
692 | my $umask = umask 0; | ||||
693 | tie %new_toks, $self->DBM_MODULE, $tmpdbname, O_RDWR|O_CREAT|O_EXCL, | ||||
694 | (oct ($main->{conf}->{bayes_file_mode}) & 0666); | ||||
695 | umask $umask; | ||||
696 | my $oldest; | ||||
697 | |||||
698 | my $showdots = $opts->{showdots}; | ||||
699 | if ($showdots) { print STDERR "\n"; } | ||||
700 | |||||
701 | # We've chosen a new atime delta if we've gotten here, so record it | ||||
702 | # for posterity. | ||||
703 | $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = $newdelta; | ||||
704 | |||||
705 | # Figure out how old is too old... | ||||
706 | my $too_old = $vars[10] - $newdelta; # tooold = newest - delta | ||||
707 | |||||
708 | # Go ahead and do the move to new db/expire run now ... | ||||
709 | my ($tok, $packed); | ||||
710 | while (($tok, $packed) = each %{$self->{db_toks}}) { | ||||
711 | next if ($tok =~ MAGIC_RE); # skip magic tokens | ||||
712 | |||||
713 | my ($ts, $th, $atime) = $self->tok_unpack ($packed); | ||||
714 | |||||
715 | if ($atime < $too_old) { | ||||
716 | $deleted++; | ||||
717 | } | ||||
718 | else { | ||||
719 | # if token atime > newest, reset to newest ... | ||||
720 | if ($atime > $vars[10]) { | ||||
721 | $atime = $vars[10]; | ||||
722 | } | ||||
723 | |||||
724 | $new_toks{$tok} = $self->tok_pack ($ts, $th, $atime); $kept++; | ||||
725 | if (!defined($oldest) || $atime < $oldest) { $oldest = $atime; } | ||||
726 | if ($ts + $th == 1) { | ||||
727 | $num_hapaxes++; | ||||
728 | } elsif ($ts < 8 && $th < 8) { | ||||
729 | $num_lowfreq++; | ||||
730 | } | ||||
731 | } | ||||
732 | |||||
733 | if ((($kept + $deleted) % 1000) == 0) { | ||||
734 | if ($showdots) { print STDERR "."; } | ||||
735 | $self->set_running_expire_tok(); | ||||
736 | } | ||||
737 | } | ||||
738 | |||||
739 | # and add the magic tokens. don't add the expire_running token. | ||||
740 | $new_toks{$DB_VERSION_MAGIC_TOKEN} = $self->DB_VERSION; | ||||
741 | |||||
742 | # We haven't changed messages of each type seen, so just copy over. | ||||
743 | $new_toks{$NSPAM_MAGIC_TOKEN} = $vars[1]; | ||||
744 | $new_toks{$NHAM_MAGIC_TOKEN} = $vars[2]; | ||||
745 | |||||
746 | # We magically haven't removed the newest token, so just copy that value over. | ||||
747 | $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $vars[10]; | ||||
748 | |||||
749 | # The rest of these have been modified, so replace as necessary. | ||||
750 | $new_toks{$NTOKENS_MAGIC_TOKEN} = $kept; | ||||
751 | $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = time(); | ||||
752 | $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $oldest; | ||||
753 | $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = $deleted; | ||||
754 | |||||
755 | # Sanity check: if we expired too many tokens, abort! | ||||
756 | if ($kept < 100000) { | ||||
757 | dbg("bayes: token expiration would expire too many tokens, aborting"); | ||||
758 | # set the magic tokens appropriately | ||||
759 | # make sure the next expire run does a first pass | ||||
760 | $self->{db_toks}->{$LAST_EXPIRE_MAGIC_TOKEN} = time(); | ||||
761 | $self->{db_toks}->{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = 0; | ||||
762 | $self->{db_toks}->{$LAST_ATIME_DELTA_MAGIC_TOKEN} = 0; | ||||
763 | |||||
764 | # remove the new DB | ||||
765 | untie %new_toks; | ||||
766 | for my $ext ($self->DB_EXTENSIONS) { unlink ($tmpdbname.$ext); } | ||||
767 | |||||
768 | # reset the results for the return | ||||
769 | $kept = $vars[3]; | ||||
770 | $deleted = 0; | ||||
771 | $num_hapaxes = 0; | ||||
772 | $num_lowfreq = 0; | ||||
773 | } | ||||
774 | else { | ||||
775 | # now untie so we can do renames | ||||
776 | untie %{$self->{db_toks}}; | ||||
777 | untie %new_toks; | ||||
778 | |||||
779 | # This is the critical phase (moving files around), so don't allow | ||||
780 | # it to be interrupted. Scope the signal changes. | ||||
781 | { | ||||
782 | local $SIG{'INT'} = 'IGNORE'; | ||||
783 | local $SIG{'TERM'} = 'IGNORE'; | ||||
784 | local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows(); | ||||
785 | |||||
786 | # now rename in the new one. Try several extensions | ||||
787 | for my $ext ($self->DB_EXTENSIONS) { | ||||
788 | my $newf = $tmpdbname.$ext; | ||||
789 | my $oldf = $path.'_toks'.$ext; | ||||
790 | next unless (-f $newf); | ||||
791 | if (!rename ($newf, $oldf)) { | ||||
792 | warn "bayes: rename $newf to $oldf failed: $!\n"; | ||||
793 | } | ||||
794 | } | ||||
795 | } | ||||
796 | } | ||||
797 | |||||
798 | # Call untie_db() so we unlock correctly. | ||||
799 | $self->untie_db(); | ||||
800 | |||||
801 | return ($kept, $deleted, $num_hapaxes, $num_lowfreq); | ||||
802 | } | ||||
803 | |||||
804 | ########################################################################### | ||||
805 | |||||
806 | # Is a sync due? | ||||
807 | sub sync_due { | ||||
808 | my ($self) = @_; | ||||
809 | |||||
810 | # don't bother doing old db versions | ||||
811 | return 0 if ($self->{db_version} < $self->DB_VERSION); | ||||
812 | |||||
813 | my $conf = $self->{bayes}->{main}->{conf}; | ||||
814 | return 0 if ($conf->{bayes_journal_max_size} == 0); | ||||
815 | |||||
816 | my @vars = $self->get_storage_variables(); | ||||
817 | dbg("bayes: DB journal sync: last sync: %s", $vars[7]); | ||||
818 | |||||
819 | ## Ok, should we do a sync? | ||||
820 | |||||
821 | # Not if the journal file doesn't exist, it's not a file, or it's 0 | ||||
822 | # bytes long. | ||||
823 | return 0 unless (stat($self->_get_journal_filename()) && -f _); | ||||
824 | |||||
825 | # Yes if the file size is larger than the specified maximum size. | ||||
826 | return 1 if (-s _ > $conf->{bayes_journal_max_size}); | ||||
827 | |||||
828 | # Yes there has been a sync before, and if it's been at least a day | ||||
829 | # since that sync. | ||||
830 | return 1 if (($vars[7] > 0) && (time - $vars[7] > 86400)); | ||||
831 | |||||
832 | # No, I guess not. | ||||
833 | return 0; | ||||
834 | } | ||||
835 | |||||
836 | ########################################################################### | ||||
837 | # db_seen reading APIs | ||||
838 | |||||
839 | # spent 31.6ms (11.0+20.7) within Mail::SpamAssassin::BayesStore::DBM::seen_get which was called 460 times, avg 69µs/call:
# 460 times (11.0ms+20.7ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 421 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 69µs/call | ||||
840 | 460 | 2.04ms | my ($self, $msgid) = @_; | ||
841 | 460 | 30.9ms | 460 | 20.7ms | $self->{db_seen}->{$msgid}; # spent 20.7ms making 460 calls to DB_File::FETCH, avg 45µs/call |
842 | } | ||||
843 | |||||
844 | # spent 11.1ms (7.30+3.83) within Mail::SpamAssassin::BayesStore::DBM::seen_put which was called 235 times, avg 47µs/call:
# 235 times (7.30ms+3.83ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 485 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 47µs/call | ||||
845 | 235 | 1.20ms | my ($self, $msgid, $seen) = @_; | ||
846 | |||||
847 | 235 | 2.64ms | if ($self->{bayes}->{main}->{learn_to_journal}) { | ||
848 | 235 | 2.46ms | 235 | 3.83ms | $self->defer_update ("m $seen $msgid"); # spent 3.83ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::defer_update, avg 16µs/call |
849 | } | ||||
850 | else { | ||||
851 | $self->_seen_put_direct($msgid, $seen); | ||||
852 | } | ||||
853 | } | ||||
854 | sub _seen_put_direct { | ||||
855 | my ($self, $msgid, $seen) = @_; | ||||
856 | $self->{db_seen}->{$msgid} = $seen; | ||||
857 | } | ||||
858 | |||||
859 | sub seen_delete { | ||||
860 | my ($self, $msgid) = @_; | ||||
861 | |||||
862 | if ($self->{bayes}->{main}->{learn_to_journal}) { | ||||
863 | $self->defer_update ("m f $msgid"); | ||||
864 | } | ||||
865 | else { | ||||
866 | $self->_seen_delete_direct($msgid); | ||||
867 | } | ||||
868 | } | ||||
869 | sub _seen_delete_direct { | ||||
870 | my ($self, $msgid) = @_; | ||||
871 | delete $self->{db_seen}->{$msgid}; | ||||
872 | } | ||||
873 | |||||
874 | ########################################################################### | ||||
875 | # db reading APIs | ||||
876 | |||||
877 | sub tok_get { | ||||
878 | my ($self, $tok) = @_; | ||||
879 | $self->tok_unpack ($self->{db_toks}->{$tok}); | ||||
880 | } | ||||
881 | |||||
882 | sub tok_get_all { | ||||
883 | my ($self, @tokens) = @_; | ||||
884 | |||||
885 | my @tokensdata; | ||||
886 | foreach my $token (@tokens) { | ||||
887 | my ($tok_spam, $tok_ham, $atime) = $self->tok_unpack($self->{db_toks}->{$token}); | ||||
888 | push(@tokensdata, [$token, $tok_spam, $tok_ham, $atime]); | ||||
889 | } | ||||
890 | return \@tokensdata; | ||||
891 | } | ||||
892 | |||||
893 | # return the magic tokens in a specific order: | ||||
894 | # 0: scan count base | ||||
895 | # 1: number of spam | ||||
896 | # 2: number of ham | ||||
897 | # 3: number of tokens in db | ||||
898 | # 4: last expire atime | ||||
899 | # 5: oldest token in db atime | ||||
900 | # 6: db version value | ||||
901 | # 7: last journal sync | ||||
902 | # 8: last atime delta | ||||
903 | # 9: last expire reduction count | ||||
904 | # 10: newest token in db atime | ||||
905 | # | ||||
906 | # spent 177ms (99.0+78.1) within Mail::SpamAssassin::BayesStore::DBM::get_storage_variables which was called 236 times, avg 750µs/call:
# 234 times (98.4ms+77.5ms) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 203, avg 752µs/call
# 2 times (539µs+562µs) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 307, avg 550µs/call | ||||
907 | 236 | 580µs | my ($self) = @_; | ||
908 | 236 | 538µs | my @values; | ||
909 | |||||
910 | 236 | 17.5ms | 236 | 12.1ms | my $db_ver = $self->{db_toks}->{$DB_VERSION_MAGIC_TOKEN}; # spent 12.1ms making 236 calls to DB_File::FETCH, avg 51µs/call |
911 | |||||
912 | 238 | 13.5ms | 234 | 1.68ms | if (!$db_ver || $db_ver =~ /\D/) { $db_ver = 0; } # spent 1.68ms making 234 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:match, avg 7µs/call |
913 | |||||
914 | 236 | 3.59ms | if ($db_ver >= 2) { | ||
915 | 234 | 663µs | my $DB2_LAST_ATIME_DELTA_MAGIC_TOKEN = "\015\001\007\011\003LASTATIMEDELTA"; | ||
916 | 234 | 651µs | my $DB2_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE"; | ||
917 | 234 | 606µs | my $DB2_LAST_EXPIRE_REDUCE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIREREDUCE"; | ||
918 | 234 | 579µs | my $DB2_LAST_JOURNAL_SYNC_MAGIC_TOKEN = "\015\001\007\011\003LASTJOURNALSYNC"; | ||
919 | 234 | 635µs | my $DB2_NEWEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003NEWESTAGE"; | ||
920 | 234 | 539µs | my $DB2_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM"; | ||
921 | 234 | 516µs | my $DB2_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM"; | ||
922 | 234 | 555µs | my $DB2_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS"; | ||
923 | 234 | 576µs | my $DB2_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE"; | ||
924 | 234 | 608µs | my $DB2_RUNNING_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003RUNNINGEXPIRE"; | ||
925 | |||||
926 | @values = ( | ||||
927 | 0, | ||||
928 | $self->{db_toks}->{$DB2_NSPAM_MAGIC_TOKEN}, | ||||
929 | $self->{db_toks}->{$DB2_NHAM_MAGIC_TOKEN}, | ||||
930 | $self->{db_toks}->{$DB2_NTOKENS_MAGIC_TOKEN}, | ||||
931 | $self->{db_toks}->{$DB2_LAST_EXPIRE_MAGIC_TOKEN}, | ||||
932 | $self->{db_toks}->{$DB2_OLDEST_TOKEN_AGE_MAGIC_TOKEN}, | ||||
933 | $db_ver, | ||||
934 | $self->{db_toks}->{$DB2_LAST_JOURNAL_SYNC_MAGIC_TOKEN}, | ||||
935 | $self->{db_toks}->{$DB2_LAST_ATIME_DELTA_MAGIC_TOKEN}, | ||||
936 | $self->{db_toks}->{$DB2_LAST_EXPIRE_REDUCE_MAGIC_TOKEN}, | ||||
937 | 234 | 101ms | 2106 | 61.2ms | $self->{db_toks}->{$DB2_NEWEST_TOKEN_AGE_MAGIC_TOKEN}, # spent 61.2ms making 2106 calls to DB_File::FETCH, avg 29µs/call |
938 | ); | ||||
939 | } | ||||
940 | elsif ($db_ver == 0) { | ||||
941 | 2 | 6µs | my $DB0_NSPAM_MAGIC_TOKEN = '**NSPAM'; | ||
942 | 2 | 5µs | my $DB0_NHAM_MAGIC_TOKEN = '**NHAM'; | ||
943 | 2 | 6µs | my $DB0_OLDEST_TOKEN_AGE_MAGIC_TOKEN = '**OLDESTAGE'; | ||
944 | 2 | 5µs | my $DB0_LAST_EXPIRE_MAGIC_TOKEN = '**LASTEXPIRE'; | ||
945 | 2 | 5µs | my $DB0_NTOKENS_MAGIC_TOKEN = '**NTOKENS'; | ||
946 | 2 | 5µs | my $DB0_SCANCOUNT_BASE_MAGIC_TOKEN = '**SCANBASE'; | ||
947 | |||||
948 | @values = ( | ||||
949 | $self->{db_toks}->{$DB0_SCANCOUNT_BASE_MAGIC_TOKEN}, | ||||
950 | $self->{db_toks}->{$DB0_NSPAM_MAGIC_TOKEN}, | ||||
951 | $self->{db_toks}->{$DB0_NHAM_MAGIC_TOKEN}, | ||||
952 | $self->{db_toks}->{$DB0_NTOKENS_MAGIC_TOKEN}, | ||||
953 | $self->{db_toks}->{$DB0_LAST_EXPIRE_MAGIC_TOKEN}, | ||||
954 | 2 | 243µs | 12 | 121µs | $self->{db_toks}->{$DB0_OLDEST_TOKEN_AGE_MAGIC_TOKEN}, # spent 121µs making 12 calls to DB_File::FETCH, avg 10µs/call |
955 | 0, | ||||
956 | 0, | ||||
957 | 0, | ||||
958 | 0, | ||||
959 | 0, | ||||
960 | ); | ||||
961 | } | ||||
962 | elsif ($db_ver == 1) { | ||||
963 | my $DB1_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM"; | ||||
964 | my $DB1_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM"; | ||||
965 | my $DB1_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE"; | ||||
966 | my $DB1_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE"; | ||||
967 | my $DB1_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS"; | ||||
968 | my $DB1_SCANCOUNT_BASE_MAGIC_TOKEN = "\015\001\007\011\003SCANBASE"; | ||||
969 | |||||
970 | @values = ( | ||||
971 | $self->{db_toks}->{$DB1_SCANCOUNT_BASE_MAGIC_TOKEN}, | ||||
972 | $self->{db_toks}->{$DB1_NSPAM_MAGIC_TOKEN}, | ||||
973 | $self->{db_toks}->{$DB1_NHAM_MAGIC_TOKEN}, | ||||
974 | $self->{db_toks}->{$DB1_NTOKENS_MAGIC_TOKEN}, | ||||
975 | $self->{db_toks}->{$DB1_LAST_EXPIRE_MAGIC_TOKEN}, | ||||
976 | $self->{db_toks}->{$DB1_OLDEST_TOKEN_AGE_MAGIC_TOKEN}, | ||||
977 | 1, | ||||
978 | 0, | ||||
979 | 0, | ||||
980 | 0, | ||||
981 | 0, | ||||
982 | ); | ||||
983 | } | ||||
984 | |||||
985 | 236 | 1.52ms | foreach (@values) { | ||
986 | 2596 | 24.8ms | 949 | 2.93ms | if (!$_ || $_ =~ /\D/) { # spent 2.93ms making 949 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:match, avg 3µs/call |
987 | 1647 | 3.04ms | $_ = 0; | ||
988 | } | ||||
989 | } | ||||
990 | |||||
991 | 236 | 4.36ms | return @values; | ||
992 | } | ||||
993 | |||||
994 | sub dump_db_toks { | ||||
995 | my ($self, $template, $regex, @vars) = @_; | ||||
996 | |||||
997 | while (my ($tok, $tokvalue) = each %{$self->{db_toks}}) { | ||||
998 | next if ($tok =~ MAGIC_RE); # skip magic tokens | ||||
999 | next if (defined $regex && ($tok !~ /$regex/o)); | ||||
1000 | |||||
1001 | # We have the value already, so just unpack it. | ||||
1002 | my ($ts, $th, $atime) = $self->tok_unpack ($tokvalue); | ||||
1003 | |||||
1004 | my $prob = $self->{bayes}->_compute_prob_for_token($tok, $vars[1], $vars[2], $ts, $th); | ||||
1005 | $prob ||= 0.5; | ||||
1006 | |||||
1007 | my $encoded_tok = unpack("H*",$tok); | ||||
1008 | printf $template,$prob,$ts,$th,$atime,$encoded_tok; | ||||
1009 | } | ||||
1010 | } | ||||
1011 | |||||
1012 | sub set_last_expire { | ||||
1013 | my ($self, $time) = @_; | ||||
1014 | $self->{db_toks}->{$LAST_EXPIRE_MAGIC_TOKEN} = time(); | ||||
1015 | } | ||||
1016 | |||||
1017 | ## Don't bother using get_magic_tokens here. This token should only | ||||
1018 | ## ever exist when we're running expire, so we don't want to convert it if | ||||
1019 | ## it's there and we're not expiring ... | ||||
1020 | sub get_running_expire_tok { | ||||
1021 | my ($self) = @_; | ||||
1022 | my $running = $self->{db_toks}->{$RUNNING_EXPIRE_MAGIC_TOKEN}; | ||||
1023 | if (!$running || $running =~ /\D/) { return; } | ||||
1024 | return $running; | ||||
1025 | } | ||||
1026 | |||||
1027 | sub set_running_expire_tok { | ||||
1028 | my ($self) = @_; | ||||
1029 | |||||
1030 | # update the lock and running expire magic token | ||||
1031 | $self->{bayes}->{main}->{locker}->refresh_lock ($self->{locked_file}); | ||||
1032 | $self->{db_toks}->{$RUNNING_EXPIRE_MAGIC_TOKEN} = time(); | ||||
1033 | } | ||||
1034 | |||||
1035 | sub remove_running_expire_tok { | ||||
1036 | my ($self) = @_; | ||||
1037 | delete $self->{db_toks}->{$RUNNING_EXPIRE_MAGIC_TOKEN}; | ||||
1038 | } | ||||
1039 | |||||
1040 | ########################################################################### | ||||
1041 | |||||
1042 | # db abstraction: allow deferred writes, since we will be frequently | ||||
1043 | # writing while checking. | ||||
1044 | |||||
1045 | sub tok_count_change { | ||||
1046 | my ($self, $ds, $dh, $tok, $atime) = @_; | ||||
1047 | |||||
1048 | $atime = 0 unless defined $atime; | ||||
1049 | |||||
1050 | if ($self->{bayes}->{main}->{learn_to_journal}) { | ||||
1051 | # we can't store the SHA1 binary value in the journal, so convert it | ||||
1052 | # to a printable value that can be converted back later | ||||
1053 | my $encoded_tok = unpack("H*",$tok); | ||||
1054 | $self->defer_update ("c $ds $dh $atime $encoded_tok"); | ||||
1055 | } else { | ||||
1056 | $self->tok_sync_counters ($ds, $dh, $atime, $tok); | ||||
1057 | } | ||||
1058 | } | ||||
1059 | |||||
1060 | # spent 3.41s (2.23+1.18) within Mail::SpamAssassin::BayesStore::DBM::multi_tok_count_change which was called 235 times, avg 14.5ms/call:
# 235 times (2.23s+1.18s) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 478 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 14.5ms/call | ||||
1061 | 235 | 1.18ms | my ($self, $ds, $dh, $tokens, $atime) = @_; | ||
1062 | |||||
1063 | 235 | 562µs | $atime = 0 unless defined $atime; | ||
1064 | |||||
1065 | 470 | 28.6ms | foreach my $tok (keys %{$tokens}) { | ||
1066 | 73949 | 467ms | if ($self->{bayes}->{main}->{learn_to_journal}) { | ||
1067 | # we can't store the SHA1 binary value in the journal, so convert it | ||||
1068 | # to a printable value that can be converted back later | ||||
1069 | 73949 | 846ms | 73949 | 183ms | my $encoded_tok = unpack("H*",$tok); # spent 183ms making 73949 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:unpack, avg 2µs/call |
1070 | 73949 | 767ms | 73949 | 1.00s | $self->defer_update ("c $ds $dh $atime $encoded_tok"); # spent 1.00s making 73949 calls to Mail::SpamAssassin::BayesStore::DBM::defer_update, avg 14µs/call |
1071 | } else { | ||||
1072 | $self->tok_sync_counters ($ds, $dh, $atime, $tok); | ||||
1073 | } | ||||
1074 | } | ||||
1075 | } | ||||
1076 | |||||
1077 | sub nspam_nham_get { | ||||
1078 | my ($self) = @_; | ||||
1079 | my @vars = $self->get_storage_variables(); | ||||
1080 | ($vars[1], $vars[2]); | ||||
1081 | } | ||||
1082 | |||||
1083 | # spent 9.44ms (6.50+2.93) within Mail::SpamAssassin::BayesStore::DBM::nspam_nham_change which was called 235 times, avg 40µs/call:
# 235 times (6.50ms+2.93ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 477 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 40µs/call | ||||
1084 | 235 | 711µs | my ($self, $ds, $dh) = @_; | ||
1085 | |||||
1086 | 235 | 2.91ms | if ($self->{bayes}->{main}->{learn_to_journal}) { | ||
1087 | 235 | 2.91ms | 235 | 2.93ms | $self->defer_update ("n $ds $dh"); # spent 2.93ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::defer_update, avg 12µs/call |
1088 | } else { | ||||
1089 | $self->tok_sync_nspam_nham ($ds, $dh); | ||||
1090 | } | ||||
1091 | } | ||||
1092 | |||||
1093 | sub tok_touch { | ||||
1094 | my ($self, $tok, $atime) = @_; | ||||
1095 | # we can't store the SHA1 binary value in the journal, so convert it | ||||
1096 | # to a printable value that can be converted back later | ||||
1097 | my $encoded_tok = unpack("H*", $tok); | ||||
1098 | $self->defer_update ("t $atime $encoded_tok"); | ||||
1099 | } | ||||
1100 | |||||
1101 | sub tok_touch_all { | ||||
1102 | my ($self, $tokens, $atime) = @_; | ||||
1103 | |||||
1104 | foreach my $token (@{$tokens}) { | ||||
1105 | # we can't store the SHA1 binary value in the journal, so convert it | ||||
1106 | # to a printable value that can be converted back later | ||||
1107 | my $encoded_tok = unpack("H*", $token); | ||||
1108 | $self->defer_update ("t $atime $encoded_tok"); | ||||
1109 | } | ||||
1110 | } | ||||
1111 | |||||
1112 | # spent 1.01s within Mail::SpamAssassin::BayesStore::DBM::defer_update which was called 74419 times, avg 14µs/call:
# 73949 times (1.00s+0s) by Mail::SpamAssassin::BayesStore::DBM::multi_tok_count_change at line 1070, avg 14µs/call
# 235 times (3.83ms+0s) by Mail::SpamAssassin::BayesStore::DBM::seen_put at line 848, avg 16µs/call
# 235 times (2.93ms+0s) by Mail::SpamAssassin::BayesStore::DBM::nspam_nham_change at line 1087, avg 12µs/call | ||||
1113 | 74419 | 218ms | my ($self, $str) = @_; | ||
1114 | 74419 | 1.10s | $self->{string_to_journal} .= "$str\n"; | ||
1115 | } | ||||
1116 | |||||
1117 | ########################################################################### | ||||
1118 | |||||
1119 | # spent 98.4ms (26.7+71.8) within Mail::SpamAssassin::BayesStore::DBM::cleanup which was called 235 times, avg 419µs/call:
# 235 times (26.7ms+71.8ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 486 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 419µs/call | ||||
1120 | 235 | 544µs | my ($self) = @_; | ||
1121 | |||||
1122 | 235 | 1.02ms | my $nbytes = length ($self->{string_to_journal}); | ||
1123 | 235 | 629µs | return if ($nbytes == 0); | ||
1124 | |||||
1125 | 235 | 2.24ms | 235 | 11.5ms | my $path = $self->_get_journal_filename(); # spent 11.5ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::_get_journal_filename, avg 49µs/call |
1126 | |||||
1127 | # use append mode, write atomically, then close, so simultaneous updates are | ||||
1128 | # not lost | ||||
1129 | 235 | 811µs | my $conf = $self->{bayes}->{main}->{conf}; | ||
1130 | |||||
1131 | # set the umask to the inverse of what we want ... | ||||
1132 | 235 | 3.81ms | 235 | 1.06ms | my $umask = umask(0777 - (oct ($conf->{bayes_file_mode}) & 0666)); # spent 1.06ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 5µs/call |
1133 | |||||
1134 | 235 | 32.4ms | 235 | 29.7ms | if (!open (OUT, ">>".$path)) { # spent 29.7ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:open, avg 126µs/call |
1135 | warn "bayes: cannot write to $path, bayes db update ignored: $!\n"; | ||||
1136 | umask $umask; # reset umask | ||||
1137 | return; | ||||
1138 | } | ||||
1139 | 235 | 2.09ms | 235 | 547µs | umask $umask; # reset umask # spent 547µs making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 2µs/call |
1140 | |||||
1141 | # do not use print() here, it will break up the buffer if it's >8192 bytes, | ||||
1142 | # which could result in two sets of tokens getting mixed up and their | ||||
1143 | # touches missed. | ||||
1144 | 235 | 624µs | my $write_failure = 0; | ||
1145 | 235 | 2.62ms | 235 | 850µs | my $original_point = tell OUT; # spent 850µs making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:tell, avg 4µs/call |
1146 | 235 | 562µs | $original_point >= 0 or die "Can't obtain file position: $!"; | ||
1147 | 235 | 460µs | my $len; | ||
1148 | 235 | 1.91ms | do { | ||
1149 | 235 | 27.7ms | 235 | 24.8ms | $len = syswrite (OUT, $self->{string_to_journal}, $nbytes); # spent 24.8ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:syswrite, avg 106µs/call |
1150 | |||||
1151 | # argh, write failure, give up | ||||
1152 | 235 | 781µs | if (!defined $len || $len < 0) { | ||
1153 | my $err = ''; | ||||
1154 | if (!defined $len) { | ||||
1155 | $len = 0; | ||||
1156 | $err = " ($!)"; | ||||
1157 | } | ||||
1158 | warn "bayes: write failed to Bayes journal $path ($len of $nbytes)!$err\n"; | ||||
1159 | last; | ||||
1160 | } | ||||
1161 | |||||
1162 | # This shouldn't happen, but could if the fs is full... | ||||
1163 | 235 | 664µs | if ($len != $nbytes) { | ||
1164 | warn "bayes: partial write to bayes journal $path ($len of $nbytes), recovering\n"; | ||||
1165 | |||||
1166 | # we want to be atomic, so revert the journal file back to where | ||||
1167 | # we know it's "good". if we can't truncate the journal, or we've | ||||
1168 | # tried 5 times to do the write, abort! | ||||
1169 | if (!truncate(OUT, $original_point) || ($write_failure++ > 4)) { | ||||
1170 | warn "bayes: cannot write to bayes journal $path, aborting!\n"; | ||||
1171 | last; | ||||
1172 | } | ||||
1173 | |||||
1174 | # if the fs is full, let's give the system a break | ||||
1175 | sleep 1; | ||||
1176 | } | ||||
1177 | } while ($len != $nbytes); | ||||
1178 | |||||
1179 | 235 | 5.32ms | 235 | 3.27ms | if (!close OUT) { # spent 3.27ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:close, avg 14µs/call |
1180 | warn "bayes: cannot write to $path, bayes db update ignored\n"; | ||||
1181 | } | ||||
1182 | |||||
1183 | 235 | 2.77ms | $self->{string_to_journal} = ''; | ||
1184 | } | ||||
1185 | |||||
1186 | # Return a qr'd RE to match a token with the correct format's magic token | ||||
1187 | # spent 130ms within Mail::SpamAssassin::BayesStore::DBM::get_magic_re which was called 12862 times, avg 10µs/call:
# 12862 times (130ms+0s) by Mail::SpamAssassin::Plugin::Bayes::_tokenize_line at line 1168 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 10µs/call | ||||
1188 | 12862 | 24.4ms | my ($self) = @_; | ||
1189 | |||||
1190 | 12862 | 36.6ms | if (!defined $self->{db_version} || $self->{db_version} >= 1) { | ||
1191 | 12862 | 129ms | return MAGIC_RE; | ||
1192 | } | ||||
1193 | |||||
1194 | # When in doubt, assume v0 | ||||
1195 | return qr/^\*\*[A-Z]+$/; | ||||
1196 | } | ||||
1197 | |||||
1198 | # provide a more generalized public interface into the journal sync | ||||
1199 | |||||
1200 | sub sync { | ||||
1201 | my ($self, $opts) = @_; | ||||
1202 | |||||
1203 | return $self->_sync_journal($opts); | ||||
1204 | } | ||||
1205 | |||||
1206 | ########################################################################### | ||||
1207 | # And this method reads the journal and applies the changes in one | ||||
1208 | # (locked) transaction. | ||||
1209 | |||||
1210 | sub _sync_journal { | ||||
1211 | my ($self, $opts) = @_; | ||||
1212 | my $ret = 0; | ||||
1213 | |||||
1214 | my $path = $self->_get_journal_filename(); | ||||
1215 | |||||
1216 | # if $path doesn't exist, or it's not a file, or is 0 bytes in length, return | ||||
1217 | if (!stat($path) || !-f _ || -z _) { | ||||
1218 | return 0; | ||||
1219 | } | ||||
1220 | |||||
1221 | my $eval_stat; | ||||
1222 | eval { | ||||
1223 | local $SIG{'__DIE__'}; # do not run user die() traps in here | ||||
1224 | if ($self->tie_db_writable()) { | ||||
1225 | $ret = $self->_sync_journal_trapped($opts, $path); | ||||
1226 | } | ||||
1227 | 1; | ||||
1228 | } or do { | ||||
1229 | $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1230 | }; | ||||
1231 | |||||
1232 | # ok, untie from write-mode if we can | ||||
1233 | if (!$self->{bayes}->{main}->{learn_caller_will_untie}) { | ||||
1234 | $self->untie_db(); | ||||
1235 | } | ||||
1236 | |||||
1237 | # handle any errors that may have occurred | ||||
1238 | if (defined $eval_stat) { | ||||
1239 | warn "bayes: $eval_stat\n"; | ||||
1240 | return 0; | ||||
1241 | } | ||||
1242 | |||||
1243 | $ret; | ||||
1244 | } | ||||
1245 | |||||
1246 | sub _sync_journal_trapped { | ||||
1247 | my ($self, $opts, $path) = @_; | ||||
1248 | |||||
1249 | # Flag that we're doing work | ||||
1250 | $self->set_running_expire_tok(); | ||||
1251 | |||||
1252 | my $started = time(); | ||||
1253 | my $count = 0; | ||||
1254 | my $total_count = 0; | ||||
1255 | my %tokens; | ||||
1256 | my $showdots = $opts->{showdots}; | ||||
1257 | my $retirepath = $path.".old"; | ||||
1258 | |||||
1259 | # if $path doesn't exist, or it's not a file, or is 0 bytes in length, | ||||
1260 | # return we have to check again since the file may have been removed | ||||
1261 | # by a recent bayes db upgrade ... | ||||
1262 | if (!stat($path) || !-f _ || -z _) { | ||||
1263 | return 0; | ||||
1264 | } | ||||
1265 | |||||
1266 | if (!-r $path) { # will we be able to read the file? | ||||
1267 | warn "bayes: bad permissions on journal, can't read: $path\n"; | ||||
1268 | return 0; | ||||
1269 | } | ||||
1270 | |||||
1271 | # This is the critical phase (moving files around), so don't allow | ||||
1272 | # it to be interrupted. | ||||
1273 | { | ||||
1274 | local $SIG{'INT'} = 'IGNORE'; | ||||
1275 | local $SIG{'TERM'} = 'IGNORE'; | ||||
1276 | local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows(); | ||||
1277 | |||||
1278 | # retire the journal, so we can update the db files from it in peace. | ||||
1279 | # TODO: use locking here | ||||
1280 | if (!rename ($path, $retirepath)) { | ||||
1281 | warn "bayes: failed rename $path to $retirepath\n"; | ||||
1282 | return 0; | ||||
1283 | } | ||||
1284 | |||||
1285 | # now read the retired journal | ||||
1286 | local *JOURNAL; | ||||
1287 | if (!open (JOURNAL, "<$retirepath")) { | ||||
1288 | warn "bayes: cannot open read $retirepath\n"; | ||||
1289 | return 0; | ||||
1290 | } | ||||
1291 | |||||
1292 | |||||
1293 | # Read the journal | ||||
1294 | for ($!=0; defined($_=<JOURNAL>); $!=0) { | ||||
1295 | $total_count++; | ||||
1296 | |||||
1297 | if (/^t (\d+) (.+)$/) { # Token timestamp update, cache resultant entries | ||||
1298 | my $tok = pack("H*",$2); | ||||
1299 | $tokens{$tok} = $1+0 if (!exists $tokens{$tok} || $1+0 > $tokens{$tok}); | ||||
1300 | } elsif (/^c (-?\d+) (-?\d+) (\d+) (.+)$/) { # Add/full token update | ||||
1301 | my $tok = pack("H*",$4); | ||||
1302 | $self->tok_sync_counters ($1+0, $2+0, $3+0, $tok); | ||||
1303 | $count++; | ||||
1304 | } elsif (/^n (-?\d+) (-?\d+)$/) { # update ham/spam count | ||||
1305 | $self->tok_sync_nspam_nham ($1+0, $2+0); | ||||
1306 | $count++; | ||||
1307 | } elsif (/^m ([hsf]) (.+)$/) { # update msgid seen database | ||||
1308 | if ($1 eq "f") { | ||||
1309 | $self->_seen_delete_direct($2); | ||||
1310 | } | ||||
1311 | else { | ||||
1312 | $self->_seen_put_direct($2,$1); | ||||
1313 | } | ||||
1314 | $count++; | ||||
1315 | } else { | ||||
1316 | warn "bayes: gibberish entry found in journal: $_"; | ||||
1317 | } | ||||
1318 | } | ||||
1319 | defined $_ || $!==0 or | ||||
1320 | $!==EBADF ? dbg("bayes: error reading journal file: $!") | ||||
1321 | : die "error reading journal file: $!"; | ||||
1322 | close(JOURNAL) or die "Can't close journal file: $!"; | ||||
1323 | |||||
1324 | # Now that we've determined what tokens we need to update and their | ||||
1325 | # final values, update the DB. Should be much smaller than the full | ||||
1326 | # journal entries. | ||||
1327 | while (my ($k,$v) = each %tokens) { | ||||
1328 | $self->tok_touch_token ($v, $k); | ||||
1329 | |||||
1330 | if ((++$count % 1000) == 0) { | ||||
1331 | if ($showdots) { print STDERR "."; } | ||||
1332 | $self->set_running_expire_tok(); | ||||
1333 | } | ||||
1334 | } | ||||
1335 | |||||
1336 | if ($showdots) { print STDERR "\n"; } | ||||
1337 | |||||
1338 | # we're all done, so unlink the old journal file | ||||
1339 | unlink ($retirepath) || warn "bayes: can't unlink $retirepath: $!\n"; | ||||
1340 | |||||
1341 | $self->{db_toks}->{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = $started; | ||||
1342 | |||||
1343 | my $done = time(); | ||||
1344 | my $msg = ("bayes: synced databases from journal in " . | ||||
1345 | ($done - $started) . | ||||
1346 | " seconds: $count unique entries ($total_count total entries)"); | ||||
1347 | |||||
1348 | if ($opts->{verbose}) { | ||||
1349 | print $msg,"\n"; | ||||
1350 | } else { | ||||
1351 | dbg($msg); | ||||
1352 | } | ||||
1353 | } | ||||
1354 | |||||
1355 | # else, that's the lot, we're synced. return | ||||
1356 | return 1; | ||||
1357 | } | ||||
1358 | |||||
1359 | sub tok_touch_token { | ||||
1360 | my ($self, $atime, $tok) = @_; | ||||
1361 | my ($ts, $th, $oldatime) = $self->tok_get ($tok); | ||||
1362 | |||||
1363 | # If the new atime is < the old atime, ignore the update | ||||
1364 | # We figure that we'll never want to lower a token atime, so abort if | ||||
1365 | # we try. (journal out of sync, etc.) | ||||
1366 | return if ($oldatime >= $atime); | ||||
1367 | |||||
1368 | $self->tok_put ($tok, $ts, $th, $atime); | ||||
1369 | } | ||||
1370 | |||||
1371 | sub tok_sync_counters { | ||||
1372 | my ($self, $ds, $dh, $atime, $tok) = @_; | ||||
1373 | my ($ts, $th, $oldatime) = $self->tok_get ($tok); | ||||
1374 | $ts += $ds; if ($ts < 0) { $ts = 0; } | ||||
1375 | $th += $dh; if ($th < 0) { $th = 0; } | ||||
1376 | |||||
1377 | # Don't roll the atime of tokens backwards ... | ||||
1378 | $atime = $oldatime if ($oldatime > $atime); | ||||
1379 | |||||
1380 | $self->tok_put ($tok, $ts, $th, $atime); | ||||
1381 | } | ||||
1382 | |||||
1383 | sub tok_put { | ||||
1384 | my ($self, $tok, $ts, $th, $atime) = @_; | ||||
1385 | $ts ||= 0; | ||||
1386 | $th ||= 0; | ||||
1387 | |||||
1388 | # Ignore magic tokens, the don't go in this way ... | ||||
1389 | return if ($tok =~ MAGIC_RE); | ||||
1390 | |||||
1391 | # use defined() rather than exists(); the latter is not supported | ||||
1392 | # by NDBM_File, believe it or not. Using defined() did not | ||||
1393 | # indicate any noticeable speed hit in my testing. (Mar 31 2003 jm) | ||||
1394 | my $exists_already = defined $self->{db_toks}->{$tok}; | ||||
1395 | |||||
1396 | if ($ts == 0 && $th == 0) { | ||||
1397 | return if (!$exists_already); # If the token doesn't exist, just return | ||||
1398 | $self->{db_toks}->{$NTOKENS_MAGIC_TOKEN}--; | ||||
1399 | delete $self->{db_toks}->{$tok}; | ||||
1400 | } else { | ||||
1401 | if (!$exists_already) { # If the token doesn't exist, raise the token count | ||||
1402 | $self->{db_toks}->{$NTOKENS_MAGIC_TOKEN}++; | ||||
1403 | } | ||||
1404 | |||||
1405 | $self->{db_toks}->{$tok} = $self->tok_pack ($ts, $th, $atime); | ||||
1406 | |||||
1407 | my $newmagic = $self->{db_toks}->{$NEWEST_TOKEN_AGE_MAGIC_TOKEN}; | ||||
1408 | if (!defined ($newmagic) || $atime > $newmagic) { | ||||
1409 | $self->{db_toks}->{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $atime; | ||||
1410 | } | ||||
1411 | |||||
1412 | # Make sure to check for either !defined or "" ... Apparently | ||||
1413 | # sometimes the DB module doesn't return the value correctly. :( | ||||
1414 | my $oldmagic = $self->{db_toks}->{$OLDEST_TOKEN_AGE_MAGIC_TOKEN}; | ||||
1415 | if (!defined ($oldmagic) || $oldmagic eq "" || $atime < $oldmagic) { | ||||
1416 | $self->{db_toks}->{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $atime; | ||||
1417 | } | ||||
1418 | } | ||||
1419 | } | ||||
1420 | |||||
1421 | sub tok_sync_nspam_nham { | ||||
1422 | my ($self, $ds, $dh) = @_; | ||||
1423 | my ($ns, $nh) = ($self->get_storage_variables())[1,2]; | ||||
1424 | if ($ds) { $ns += $ds; } if ($ns < 0) { $ns = 0; } | ||||
1425 | if ($dh) { $nh += $dh; } if ($nh < 0) { $nh = 0; } | ||||
1426 | $self->{db_toks}->{$NSPAM_MAGIC_TOKEN} = $ns; | ||||
1427 | $self->{db_toks}->{$NHAM_MAGIC_TOKEN} = $nh; | ||||
1428 | } | ||||
1429 | |||||
1430 | ########################################################################### | ||||
1431 | |||||
1432 | # spent 11.5ms (6.33+5.15) within Mail::SpamAssassin::BayesStore::DBM::_get_journal_filename which was called 235 times, avg 49µs/call:
# 235 times (6.33ms+5.15ms) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1125, avg 49µs/call | ||||
1433 | 235 | 563µs | my ($self) = @_; | ||
1434 | |||||
1435 | 235 | 820µs | my $main = $self->{bayes}->{main}; | ||
1436 | 235 | 4.83ms | 235 | 5.15ms | return $main->sed_path($main->{conf}->{bayes_path}."_journal"); # spent 5.15ms making 235 calls to Mail::SpamAssassin::sed_path, avg 22µs/call |
1437 | } | ||||
1438 | |||||
1439 | ########################################################################### | ||||
1440 | |||||
1441 | # this is called directly from sa-learn(1). | ||||
1442 | sub perform_upgrade { | ||||
1443 | my ($self, $opts) = @_; | ||||
1444 | my $ret = 0; | ||||
1445 | |||||
1446 | my $eval_stat; | ||||
1447 | eval { | ||||
1448 | local $SIG{'__DIE__'}; # do not run user die() traps in here | ||||
1449 | |||||
1450 | 2 | 4.55ms | 2 | 477µs | # spent 257µs (37+220) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1450 which was called:
# once (37µs+220µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1450 # spent 257µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1450
# spent 220µs making 1 call to Exporter::import |
1451 | |||||
1452 | # bayes directory | ||||
1453 | my $main = $self->{bayes}->{main}; | ||||
1454 | my $path = $main->sed_path($main->{conf}->{bayes_path}); | ||||
1455 | |||||
1456 | # prevent dirname() from tainting the result, it assumes $1 is not tainted | ||||
1457 | local($1,$2,$3); # Bug 6310; perl #67962 (fixed in perl 5.12/5.13) | ||||
1458 | my $dir = dirname($path); | ||||
1459 | |||||
1460 | # make temporary copy since old dbm and new dbm may have same name | ||||
1461 | opendir(DIR, $dir) or die "bayes: can't opendir $dir: $!"; | ||||
1462 | my @files = grep { /^bayes_(?:seen|toks)(?:\.\w+)?$/ } readdir(DIR); | ||||
1463 | closedir(DIR) or die "bayes: can't close directory $dir: $!"; | ||||
1464 | if (@files < 2 || !grep(/bayes_seen/,@files) || !grep(/bayes_toks/,@files)) | ||||
1465 | { | ||||
1466 | die "bayes: unable to find bayes_toks and bayes_seen, stopping\n"; | ||||
1467 | } | ||||
1468 | # untaint @files (already safe after grep) | ||||
1469 | untaint_var(\@files); | ||||
1470 | |||||
1471 | for (@files) { | ||||
1472 | my $src = "$dir/$_"; | ||||
1473 | my $dst = "$dir/old_$_"; | ||||
1474 | eval q{ | ||||
1475 | use File::Copy; | ||||
1476 | copy($src, $dst); | ||||
1477 | } || die "bayes: can't copy $src to $dst: $!\n"; | ||||
1478 | } | ||||
1479 | |||||
1480 | # delete previous to make way for import | ||||
1481 | for (@files) { unlink("$dir/$_"); } | ||||
1482 | |||||
1483 | # import | ||||
1484 | if ($self->tie_db_writable()) { | ||||
1485 | $ret += $self->upgrade_old_dbm_files_trapped("$dir/old_bayes_seen", | ||||
1486 | $self->{db_seen}); | ||||
1487 | $ret += $self->upgrade_old_dbm_files_trapped("$dir/old_bayes_toks", | ||||
1488 | $self->{db_toks}); | ||||
1489 | } | ||||
1490 | |||||
1491 | if ($ret == 2) { | ||||
1492 | print "import successful, original files saved with \"old\" prefix\n"; | ||||
1493 | } | ||||
1494 | else { | ||||
1495 | print "import failed, original files saved with \"old\" prefix\n"; | ||||
1496 | } | ||||
1497 | 1; | ||||
1498 | } or do { | ||||
1499 | $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1500 | }; | ||||
1501 | |||||
1502 | $self->untie_db(); | ||||
1503 | |||||
1504 | # if we died, untie the dbm files | ||||
1505 | if (defined $eval_stat) { | ||||
1506 | warn "bayes: perform_upgrade: $eval_stat\n"; | ||||
1507 | return 0; | ||||
1508 | } | ||||
1509 | $ret; | ||||
1510 | } | ||||
1511 | |||||
1512 | sub upgrade_old_dbm_files_trapped { | ||||
1513 | my ($self, $filename, $output) = @_; | ||||
1514 | |||||
1515 | my $count; | ||||
1516 | my %in; | ||||
1517 | |||||
1518 | print "upgrading to DB_File, please be patient: $filename\n"; | ||||
1519 | |||||
1520 | # try each type of file until we find one with > 0 entries | ||||
1521 | for my $dbm ('DB_File', 'GDBM_File', 'NDBM_File', 'SDBM_File') { | ||||
1522 | $count = 0; | ||||
1523 | # wrap in eval so it doesn't run in general use. This accesses db | ||||
1524 | # modules directly. | ||||
1525 | # Note: (bug 2390), the 'use' needs to be on the same line as the eval | ||||
1526 | # for RPM dependency checks to work properly. It's lame, but... | ||||
1527 | my $eval_stat; | ||||
1528 | eval 'use ' . $dbm . '; | ||||
1529 | tie %in, "' . $dbm . '", $filename, O_RDONLY, 0600; | ||||
1530 | %{ $output } = %in; | ||||
1531 | $count = scalar keys %{ $output }; | ||||
1532 | untie %in; | ||||
1533 | 1; | ||||
1534 | ' or do { | ||||
1535 | $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1536 | }; | ||||
1537 | if (defined $eval_stat) { | ||||
1538 | print "$dbm: $dbm module not installed(?), nothing copied: $eval_stat\n"; | ||||
1539 | dbg("bayes: error was: $eval_stat"); | ||||
1540 | } | ||||
1541 | elsif ($count == 0) { | ||||
1542 | print "$dbm: no database of that kind found, nothing copied\n"; | ||||
1543 | } | ||||
1544 | else { | ||||
1545 | print "$dbm: copied $count entries\n"; | ||||
1546 | return 1; | ||||
1547 | } | ||||
1548 | } | ||||
1549 | |||||
1550 | return 0; | ||||
1551 | } | ||||
1552 | |||||
1553 | sub clear_database { | ||||
1554 | my ($self) = @_; | ||||
1555 | |||||
1556 | return 0 unless ($self->tie_db_writable()); | ||||
1557 | |||||
1558 | dbg("bayes: untie-ing in preparation for removal."); | ||||
1559 | |||||
1560 | foreach my $dbname (@DBNAMES) { | ||||
1561 | my $db_var = 'db_'.$dbname; | ||||
1562 | |||||
1563 | if (exists $self->{$db_var}) { | ||||
1564 | # dbg("bayes: untie-ing $db_var"); | ||||
1565 | untie %{$self->{$db_var}}; | ||||
1566 | delete $self->{$db_var}; | ||||
1567 | } | ||||
1568 | } | ||||
1569 | |||||
1570 | my $path = $self->{bayes}->{main}->sed_path($self->{bayes}->{main}->{conf}->{bayes_path}); | ||||
1571 | |||||
1572 | foreach my $dbname (@DBNAMES, 'journal') { | ||||
1573 | foreach my $ext ($self->DB_EXTENSIONS) { | ||||
1574 | my $name = $path.'_'.$dbname.$ext; | ||||
1575 | my $ret = unlink $name; | ||||
1576 | dbg("bayes: clear_database: %s %s", | ||||
1577 | $ret ? 'removed' : 'tried to remove', $name); | ||||
1578 | } | ||||
1579 | } | ||||
1580 | |||||
1581 | # the journal file needs to be done separately since it has no extension | ||||
1582 | foreach my $dbname ('journal') { | ||||
1583 | my $name = $path.'_'.$dbname; | ||||
1584 | my $ret = unlink $name; | ||||
1585 | dbg("bayes: clear_database: %s %s", | ||||
1586 | $ret ? 'removed' : 'tried to remove', $name); | ||||
1587 | } | ||||
1588 | |||||
1589 | $self->untie_db(); | ||||
1590 | |||||
1591 | return 1; | ||||
1592 | } | ||||
1593 | |||||
1594 | sub backup_database { | ||||
1595 | my ($self) = @_; | ||||
1596 | |||||
1597 | # we tie writable because we want the upgrade code to kick in if needed | ||||
1598 | return 0 unless ($self->tie_db_writable()); | ||||
1599 | |||||
1600 | my @vars = $self->get_storage_variables(); | ||||
1601 | |||||
1602 | print "v\t$vars[6]\tdb_version # this must be the first line!!!\n"; | ||||
1603 | print "v\t$vars[1]\tnum_spam\n"; | ||||
1604 | print "v\t$vars[2]\tnum_nonspam\n"; | ||||
1605 | |||||
1606 | while (my ($tok, $packed) = each %{$self->{db_toks}}) { | ||||
1607 | next if ($tok =~ MAGIC_RE); # skip magic tokens | ||||
1608 | |||||
1609 | my ($ts, $th, $atime) = $self->tok_unpack($packed); | ||||
1610 | my $encoded_token = unpack("H*",$tok); | ||||
1611 | print "t\t$ts\t$th\t$atime\t$encoded_token\n"; | ||||
1612 | } | ||||
1613 | |||||
1614 | while (my ($msgid, $flag) = each %{$self->{db_seen}}) { | ||||
1615 | print "s\t$flag\t$msgid\n"; | ||||
1616 | } | ||||
1617 | |||||
1618 | $self->untie_db(); | ||||
1619 | |||||
1620 | return 1; | ||||
1621 | } | ||||
1622 | |||||
1623 | sub restore_database { | ||||
1624 | my ($self, $filename, $showdots) = @_; | ||||
1625 | |||||
1626 | local *DUMPFILE; | ||||
1627 | if (!open(DUMPFILE, '<', $filename)) { | ||||
1628 | dbg("bayes: unable to open backup file $filename: $!"); | ||||
1629 | return 0; | ||||
1630 | } | ||||
1631 | |||||
1632 | if (!$self->tie_db_writable()) { | ||||
1633 | dbg("bayes: failed to tie db writable"); | ||||
1634 | return 0; | ||||
1635 | } | ||||
1636 | |||||
1637 | my $main = $self->{bayes}->{main}; | ||||
1638 | my $path = $main->sed_path($main->{conf}->{bayes_path}); | ||||
1639 | |||||
1640 | # use a temporary PID-based suffix just in case another one was | ||||
1641 | # created previously by an interrupted expire | ||||
1642 | my $tmpsuffix = "convert$$"; | ||||
1643 | my $tmptoksdbname = $path.'_toks.'.$tmpsuffix; | ||||
1644 | my $tmpseendbname = $path.'_seen.'.$tmpsuffix; | ||||
1645 | my $toksdbname = $path.'_toks'; | ||||
1646 | my $seendbname = $path.'_seen'; | ||||
1647 | |||||
1648 | my %new_toks; | ||||
1649 | my %new_seen; | ||||
1650 | my $umask = umask 0; | ||||
1651 | unless (tie %new_toks, $self->DBM_MODULE, $tmptoksdbname, O_RDWR|O_CREAT|O_EXCL, | ||||
1652 | (oct ($main->{conf}->{bayes_file_mode}) & 0666)) { | ||||
1653 | dbg("bayes: failed to tie temp toks db: $!"); | ||||
1654 | $self->untie_db(); | ||||
1655 | umask $umask; | ||||
1656 | return 0; | ||||
1657 | } | ||||
1658 | unless (tie %new_seen, $self->DBM_MODULE, $tmpseendbname, O_RDWR|O_CREAT|O_EXCL, | ||||
1659 | (oct ($main->{conf}->{bayes_file_mode}) & 0666)) { | ||||
1660 | dbg("bayes: failed to tie temp seen db: $!"); | ||||
1661 | untie %new_toks; | ||||
1662 | $self->_unlink_file($tmptoksdbname); | ||||
1663 | $self->untie_db(); | ||||
1664 | umask $umask; | ||||
1665 | return 0; | ||||
1666 | } | ||||
1667 | umask $umask; | ||||
1668 | |||||
1669 | my $line_count = 0; | ||||
1670 | my $db_version; | ||||
1671 | my $token_count = 0; | ||||
1672 | my $num_spam; | ||||
1673 | my $num_ham; | ||||
1674 | my $error_p = 0; | ||||
1675 | my $newest_token_age = 0; | ||||
1676 | # Kinda wierd I know, but we need a nice big value and we know there will be | ||||
1677 | # no tokens > time() since we reset atime if > time(), so use that with a | ||||
1678 | # little buffer just in case. | ||||
1679 | my $oldest_token_age = time() + 100000; | ||||
1680 | |||||
1681 | my $line = <DUMPFILE>; | ||||
1682 | defined $line or die "Error reading dump file: $!"; | ||||
1683 | $line_count++; | ||||
1684 | |||||
1685 | # We require the database version line to be the first in the file so we can | ||||
1686 | # figure out how to properly deal with the file. If it is not the first | ||||
1687 | # line then fail | ||||
1688 | if ($line =~ m/^v\s+(\d+)\s+db_version/) { | ||||
1689 | $db_version = $1; | ||||
1690 | } | ||||
1691 | else { | ||||
1692 | dbg("bayes: database version must be the first line in the backup file, correct and re-run"); | ||||
1693 | untie %new_toks; | ||||
1694 | untie %new_seen; | ||||
1695 | $self->_unlink_file($tmptoksdbname); | ||||
1696 | $self->_unlink_file($tmpseendbname); | ||||
1697 | $self->untie_db(); | ||||
1698 | return 0; | ||||
1699 | } | ||||
1700 | |||||
1701 | unless ($db_version == 2 || $db_version == 3) { | ||||
1702 | warn("bayes: database version $db_version is unsupported, must be version 2 or 3"); | ||||
1703 | untie %new_toks; | ||||
1704 | untie %new_seen; | ||||
1705 | $self->_unlink_file($tmptoksdbname); | ||||
1706 | $self->_unlink_file($tmpseendbname); | ||||
1707 | $self->untie_db(); | ||||
1708 | return 0; | ||||
1709 | } | ||||
1710 | |||||
1711 | for ($!=0; defined($line=<DUMPFILE>); $!=0) { | ||||
1712 | chomp($line); | ||||
1713 | $line_count++; | ||||
1714 | |||||
1715 | if ($line_count % 1000 == 0) { | ||||
1716 | print STDERR "." if ($showdots); | ||||
1717 | } | ||||
1718 | |||||
1719 | if ($line =~ /^v\s+/) { # variable line | ||||
1720 | my @parsed_line = split(/\s+/, $line, 3); | ||||
1721 | my $value = $parsed_line[1] + 0; | ||||
1722 | if ($parsed_line[2] eq 'num_spam') { | ||||
1723 | $num_spam = $value; | ||||
1724 | } | ||||
1725 | elsif ($parsed_line[2] eq 'num_nonspam') { | ||||
1726 | $num_ham = $value; | ||||
1727 | } | ||||
1728 | else { | ||||
1729 | dbg("bayes: restore_database: skipping unknown line: $line"); | ||||
1730 | } | ||||
1731 | } | ||||
1732 | elsif ($line =~ /^t\s+/) { # token line | ||||
1733 | my @parsed_line = split(/\s+/, $line, 5); | ||||
1734 | my $spam_count = $parsed_line[1] + 0; | ||||
1735 | my $ham_count = $parsed_line[2] + 0; | ||||
1736 | my $atime = $parsed_line[3] + 0; | ||||
1737 | my $token = $parsed_line[4]; | ||||
1738 | |||||
1739 | my $token_warn_p = 0; | ||||
1740 | my @warnings; | ||||
1741 | |||||
1742 | if ($spam_count < 0) { | ||||
1743 | $spam_count = 0; | ||||
1744 | push(@warnings, 'spam count < 0, resetting'); | ||||
1745 | $token_warn_p = 1; | ||||
1746 | } | ||||
1747 | if ($ham_count < 0) { | ||||
1748 | $ham_count = 0; | ||||
1749 | push(@warnings, 'ham count < 0, resetting'); | ||||
1750 | $token_warn_p = 1; | ||||
1751 | } | ||||
1752 | |||||
1753 | if ($spam_count == 0 && $ham_count == 0) { | ||||
1754 | dbg("bayes: token has zero spam and ham count, skipping"); | ||||
1755 | next; | ||||
1756 | } | ||||
1757 | |||||
1758 | if ($atime > time()) { | ||||
1759 | $atime = time(); | ||||
1760 | push(@warnings, 'atime > current time, resetting'); | ||||
1761 | $token_warn_p = 1; | ||||
1762 | } | ||||
1763 | |||||
1764 | if ($token_warn_p) { | ||||
1765 | dbg("bayes: token (%s) has the following warnings:\n%s", | ||||
1766 | $token, join("\n",@warnings)); | ||||
1767 | } | ||||
1768 | |||||
1769 | # database versions < 3 did not encode their token values | ||||
1770 | if ($db_version < 3) { | ||||
1771 | $token = substr(sha1($token), -5); | ||||
1772 | } | ||||
1773 | else { | ||||
1774 | # turn unpacked binary token back into binary value | ||||
1775 | $token = pack("H*",$token); | ||||
1776 | } | ||||
1777 | |||||
1778 | $new_toks{$token} = $self->tok_pack($spam_count, $ham_count, $atime); | ||||
1779 | if ($atime < $oldest_token_age) { | ||||
1780 | $oldest_token_age = $atime; | ||||
1781 | } | ||||
1782 | if ($atime > $newest_token_age) { | ||||
1783 | $newest_token_age = $atime; | ||||
1784 | } | ||||
1785 | $token_count++; | ||||
1786 | } | ||||
1787 | elsif ($line =~ /^s\s+/) { # seen line | ||||
1788 | my @parsed_line = split(/\s+/, $line, 3); | ||||
1789 | my $flag = $parsed_line[1]; | ||||
1790 | my $msgid = $parsed_line[2]; | ||||
1791 | |||||
1792 | unless ($flag eq 'h' || $flag eq 's') { | ||||
1793 | dbg("bayes: unknown seen flag ($flag) for line: $line, skipping"); | ||||
1794 | next; | ||||
1795 | } | ||||
1796 | |||||
1797 | unless ($msgid) { | ||||
1798 | dbg("bayes: blank msgid for line: $line, skipping"); | ||||
1799 | next; | ||||
1800 | } | ||||
1801 | |||||
1802 | $new_seen{$msgid} = $flag; | ||||
1803 | } | ||||
1804 | else { | ||||
1805 | dbg("bayes: skipping unknown line: $line"); | ||||
1806 | next; | ||||
1807 | } | ||||
1808 | } | ||||
1809 | defined $line || $!==0 or die "Error reading dump file: $!"; | ||||
1810 | close(DUMPFILE) or die "Can't close dump file: $!"; | ||||
1811 | |||||
1812 | print STDERR "\n" if ($showdots); | ||||
1813 | |||||
1814 | unless (defined($num_spam)) { | ||||
1815 | dbg("bayes: unable to find num spam, please check file"); | ||||
1816 | $error_p = 1; | ||||
1817 | } | ||||
1818 | |||||
1819 | unless (defined($num_ham)) { | ||||
1820 | dbg("bayes: unable to find num ham, please check file"); | ||||
1821 | $error_p = 1; | ||||
1822 | } | ||||
1823 | |||||
1824 | if ($error_p) { | ||||
1825 | dbg("bayes: error(s) while attempting to load $filename, correct and re-run"); | ||||
1826 | |||||
1827 | untie %new_toks; | ||||
1828 | untie %new_seen; | ||||
1829 | $self->_unlink_file($tmptoksdbname); | ||||
1830 | $self->_unlink_file($tmpseendbname); | ||||
1831 | $self->untie_db(); | ||||
1832 | return 0; | ||||
1833 | } | ||||
1834 | |||||
1835 | # set the calculated magic tokens | ||||
1836 | $new_toks{$DB_VERSION_MAGIC_TOKEN} = $self->DB_VERSION(); | ||||
1837 | $new_toks{$NTOKENS_MAGIC_TOKEN} = $token_count; | ||||
1838 | $new_toks{$NSPAM_MAGIC_TOKEN} = $num_spam; | ||||
1839 | $new_toks{$NHAM_MAGIC_TOKEN} = $num_ham; | ||||
1840 | $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $newest_token_age; | ||||
1841 | $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $oldest_token_age; | ||||
1842 | |||||
1843 | # go ahead and zero out these, chances are good that they are bogus anyway. | ||||
1844 | $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = 0; | ||||
1845 | $new_toks{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = 0; | ||||
1846 | $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = 0; | ||||
1847 | $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = 0; | ||||
1848 | |||||
1849 | local $SIG{'INT'} = 'IGNORE'; | ||||
1850 | local $SIG{'TERM'} = 'IGNORE'; | ||||
1851 | local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows(); | ||||
1852 | |||||
1853 | untie %new_toks; | ||||
1854 | untie %new_seen; | ||||
1855 | $self->untie_db(); | ||||
1856 | |||||
1857 | # Here is where something can go horribly wrong and screw up the bayes | ||||
1858 | # database files. If we are able to copy one and not the other then it | ||||
1859 | # will leave the database in an inconsistent state. Since this is an | ||||
1860 | # edge case, and they're trying to replace the DB anyway we should be ok. | ||||
1861 | unless ($self->_rename_file($tmptoksdbname, $toksdbname)) { | ||||
1862 | dbg("bayes: error while renaming $tmptoksdbname to $toksdbname: $!"); | ||||
1863 | return 0; | ||||
1864 | } | ||||
1865 | unless ($self->_rename_file($tmpseendbname, $seendbname)) { | ||||
1866 | dbg("bayes: error while renaming $tmpseendbname to $seendbname: $!"); | ||||
1867 | dbg("bayes: database now in inconsistent state"); | ||||
1868 | return 0; | ||||
1869 | } | ||||
1870 | |||||
1871 | dbg("bayes: parsed $line_count lines"); | ||||
1872 | dbg("bayes: created database with $token_count tokens based on $num_spam spam messages and $num_ham ham messages"); | ||||
1873 | |||||
1874 | return 1; | ||||
1875 | } | ||||
1876 | |||||
1877 | ########################################################################### | ||||
1878 | |||||
1879 | # token marshalling format for db_toks. | ||||
1880 | |||||
1881 | # Since we may have many entries with few hits, especially thousands of hapaxes | ||||
1882 | # (1-occurrence entries), use a flexible entry format, instead of simply "2 | ||||
1883 | # packed ints", to keep the memory and disk space usage down. In my | ||||
1884 | # 18k-message test corpus, only 8.9% have >= 8 hits in either counter, so we | ||||
1885 | # can use a 1-byte representation for the other 91% of low-hitting entries | ||||
1886 | # and save masses of space. | ||||
1887 | |||||
1888 | # This looks like: XXSSSHHH (XX = format bits, SSS = 3 spam-count bits, HHH = 3 | ||||
1889 | # ham-count bits). If XX in the first byte is 11, it's packed as this 1-byte | ||||
1890 | # representation; otherwise, if XX in the first byte is 00, it's packed as | ||||
1891 | # "CLL", ie. 1 byte and 2 32-bit "longs" in perl pack format. | ||||
1892 | |||||
1893 | # Savings: roughly halves size of toks db, at the cost of a ~10% slowdown. | ||||
1894 | |||||
1895 | 2 | 90µs | 2 | 456µs | # spent 243µs (30+213) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1895 which was called:
# once (30µs+213µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1895 # spent 243µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1895
# spent 213µs making 1 call to constant::import |
1896 | 2 | 62µs | 2 | 380µs | # spent 208µs (35+172) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1896 which was called:
# once (35µs+172µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1896 # spent 208µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1896
# spent 172µs making 1 call to constant::import |
1897 | 2 | 88µs | 2 | 415µs | # spent 218µs (22+197) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1897 which was called:
# once (22µs+197µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1897 # spent 218µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1897
# spent 197µs making 1 call to constant::import |
1898 | |||||
1899 | 2 | 74µs | 2 | 406µs | # spent 218µs (31+188) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1899 which was called:
# once (31µs+188µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1899 # spent 218µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1899
# spent 188µs making 1 call to constant::import |
1900 | 2 | 982µs | 2 | 365µs | # spent 201µs (38+164) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1900 which was called:
# once (38µs+164µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1900 # spent 201µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1900
# spent 164µs making 1 call to constant::import |
1901 | |||||
1902 | sub tok_unpack { | ||||
1903 | my ($self, $value) = @_; | ||||
1904 | $value ||= 0; | ||||
1905 | |||||
1906 | my ($packed, $atime); | ||||
1907 | if ($self->{db_version} >= 1) { | ||||
1908 | ($packed, $atime) = unpack("CV", $value); | ||||
1909 | } | ||||
1910 | elsif ($self->{db_version} == 0) { | ||||
1911 | ($packed, $atime) = unpack("CS", $value); | ||||
1912 | } | ||||
1913 | |||||
1914 | if (($packed & FORMAT_FLAG) == ONE_BYTE_FORMAT) { | ||||
1915 | return (($packed & ONE_BYTE_SSS_BITS) >> 3, | ||||
1916 | $packed & ONE_BYTE_HHH_BITS, | ||||
1917 | $atime || 0); | ||||
1918 | } | ||||
1919 | elsif (($packed & FORMAT_FLAG) == TWO_LONGS_FORMAT) { | ||||
1920 | my ($packed, $ts, $th, $atime); | ||||
1921 | if ($self->{db_version} >= 1) { | ||||
1922 | ($packed, $ts, $th, $atime) = unpack("CVVV", $value); | ||||
1923 | } | ||||
1924 | elsif ($self->{db_version} == 0) { | ||||
1925 | ($packed, $ts, $th, $atime) = unpack("CLLS", $value); | ||||
1926 | } | ||||
1927 | return ($ts || 0, $th || 0, $atime || 0); | ||||
1928 | } | ||||
1929 | # other formats would go here... | ||||
1930 | else { | ||||
1931 | warn "bayes: unknown packing format for bayes db, please re-learn: $packed"; | ||||
1932 | return (0, 0, 0); | ||||
1933 | } | ||||
1934 | } | ||||
1935 | |||||
1936 | sub tok_pack { | ||||
1937 | my ($self, $ts, $th, $atime) = @_; | ||||
1938 | $ts ||= 0; $th ||= 0; $atime ||= 0; | ||||
1939 | if ($ts < 8 && $th < 8) { | ||||
1940 | return pack ("CV", ONE_BYTE_FORMAT | ($ts << 3) | $th, $atime); | ||||
1941 | } else { | ||||
1942 | return pack ("CVVV", TWO_LONGS_FORMAT, $ts, $th, $atime); | ||||
1943 | } | ||||
1944 | } | ||||
1945 | |||||
1946 | ########################################################################### | ||||
1947 | |||||
1948 | # spent 12µs within Mail::SpamAssassin::BayesStore::DBM::db_readable which was called:
# once (12µs+0s) by Mail::SpamAssassin::Plugin::Bayes::learner_close at line 330 of Mail/SpamAssassin/Plugin/Bayes.pm | ||||
1949 | 1 | 2µs | my ($self) = @_; | ||
1950 | 1 | 15µs | return $self->{already_tied}; | ||
1951 | } | ||||
1952 | |||||
1953 | sub db_writable { | ||||
1954 | my ($self) = @_; | ||||
1955 | return $self->{already_tied} && $self->{is_locked}; | ||||
1956 | } | ||||
1957 | |||||
1958 | ########################################################################### | ||||
1959 | |||||
1960 | sub _unlink_file { | ||||
1961 | my ($self, $filename) = @_; | ||||
1962 | |||||
1963 | unlink $filename; | ||||
1964 | } | ||||
1965 | |||||
1966 | sub _rename_file { | ||||
1967 | my ($self, $sourcefilename, $targetfilename) = @_; | ||||
1968 | |||||
1969 | return 0 unless (rename($sourcefilename, $targetfilename)); | ||||
1970 | |||||
1971 | return 1; | ||||
1972 | } | ||||
1973 | |||||
1974 | sub sa_die { Mail::SpamAssassin::sa_die(@_); } | ||||
1975 | |||||
1976 | 1 | 28µs | 1; | ||
# spent 3.27ms within Mail::SpamAssassin::BayesStore::DBM::CORE:close which was called 235 times, avg 14µs/call:
# 235 times (3.27ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1179, avg 14µs/call | |||||
# spent 35µs within Mail::SpamAssassin::BayesStore::DBM::CORE:ftdir which was called 2 times, avg 17µs/call:
# 2 times (35µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 267, avg 17µs/call | |||||
# spent 12.3ms within Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile which was called 241 times, avg 51µs/call:
# 238 times (12.3ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 162, avg 52µs/call
# 3 times (33µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 260, avg 11µs/call | |||||
# spent 4.61ms within Mail::SpamAssassin::BayesStore::DBM::CORE:match which was called 1183 times, avg 4µs/call:
# 949 times (2.93ms+0s) by Mail::SpamAssassin::BayesStore::DBM::get_storage_variables at line 986, avg 3µs/call
# 234 times (1.68ms+0s) by Mail::SpamAssassin::BayesStore::DBM::get_storage_variables at line 912, avg 7µs/call | |||||
# spent 29.7ms within Mail::SpamAssassin::BayesStore::DBM::CORE:open which was called 235 times, avg 126µs/call:
# 235 times (29.7ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1134, avg 126µs/call | |||||
# spent 9µs within Mail::SpamAssassin::BayesStore::DBM::CORE:qr which was called:
# once (9µs+0s) by Mail::SpamAssassin::BayesStore::DBM::BEGIN@41 at line 41 | |||||
# spent 24.8ms within Mail::SpamAssassin::BayesStore::DBM::CORE:syswrite which was called 235 times, avg 106µs/call:
# 235 times (24.8ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1149, avg 106µs/call | |||||
# spent 850µs within Mail::SpamAssassin::BayesStore::DBM::CORE:tell which was called 235 times, avg 4µs/call:
# 235 times (850µs+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1145, avg 4µs/call | |||||
# spent 1.62ms within Mail::SpamAssassin::BayesStore::DBM::CORE:umask which was called 474 times, avg 3µs/call:
# 235 times (1.06ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1132, avg 5µs/call
# 235 times (547µs+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1139, avg 2µs/call
# 2 times (7µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 289, avg 3µs/call
# 2 times (5µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 304, avg 3µs/call | |||||
# spent 183ms within Mail::SpamAssassin::BayesStore::DBM::CORE:unpack which was called 73949 times, avg 2µs/call:
# 73949 times (183ms+0s) by Mail::SpamAssassin::BayesStore::DBM::multi_tok_count_change at line 1069, avg 2µs/call |