← Index
NYTProf Performance Profile   « line view »
For /usr/local/bin/sa-learn
  Run on Tue Nov 7 05:38:10 2017
Reported on Tue Nov 7 06:16:06 2017

Filename/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/BayesStore/DBM.pm
StatementsExecuted 435816 statements in 9.97s
Subroutines
Calls P F Exclusive
Time
Inclusive
Time
Subroutine
235112.23s3.41sMail::SpamAssassin::BayesStore::DBM::::multi_tok_count_changeMail::SpamAssassin::BayesStore::DBM::multi_tok_count_change
74419311.01s1.01sMail::SpamAssassin::BayesStore::DBM::::defer_updateMail::SpamAssassin::BayesStore::DBM::defer_update
7394911183ms183msMail::SpamAssassin::BayesStore::DBM::::CORE:unpackMail::SpamAssassin::BayesStore::DBM::CORE:unpack (opcode)
1286211130ms130msMail::SpamAssassin::BayesStore::DBM::::get_magic_reMail::SpamAssassin::BayesStore::DBM::get_magic_re
23621113ms6.32sMail::SpamAssassin::BayesStore::DBM::::tie_db_readonlyMail::SpamAssassin::BayesStore::DBM::tie_db_readonly
2362199.0ms177msMail::SpamAssassin::BayesStore::DBM::::get_storage_variablesMail::SpamAssassin::BayesStore::DBM::get_storage_variables
2351129.7ms29.7msMail::SpamAssassin::BayesStore::DBM::::CORE:openMail::SpamAssassin::BayesStore::DBM::CORE:open (opcode)
2351126.7ms98.4msMail::SpamAssassin::BayesStore::DBM::::cleanupMail::SpamAssassin::BayesStore::DBM::cleanup
2351124.8ms24.8msMail::SpamAssassin::BayesStore::DBM::::CORE:syswriteMail::SpamAssassin::BayesStore::DBM::CORE:syswrite (opcode)
2412112.3ms12.3msMail::SpamAssassin::BayesStore::DBM::::CORE:ftfileMail::SpamAssassin::BayesStore::DBM::CORE:ftfile (opcode)
4601111.0ms31.6msMail::SpamAssassin::BayesStore::DBM::::seen_getMail::SpamAssassin::BayesStore::DBM::seen_get
235117.30ms11.1msMail::SpamAssassin::BayesStore::DBM::::seen_putMail::SpamAssassin::BayesStore::DBM::seen_put
235116.50ms9.44msMail::SpamAssassin::BayesStore::DBM::::nspam_nham_changeMail::SpamAssassin::BayesStore::DBM::nspam_nham_change
235116.33ms11.5msMail::SpamAssassin::BayesStore::DBM::::_get_journal_filenameMail::SpamAssassin::BayesStore::DBM::_get_journal_filename
1115.42ms5.91msMail::SpamAssassin::BayesStore::DBM::::BEGIN@38Mail::SpamAssassin::BayesStore::DBM::BEGIN@38
235215.05ms7.36msMail::SpamAssassin::BayesStore::DBM::::_check_db_versionMail::SpamAssassin::BayesStore::DBM::_check_db_version
1183214.61ms4.61msMail::SpamAssassin::BayesStore::DBM::::CORE:matchMail::SpamAssassin::BayesStore::DBM::CORE:match (opcode)
476313.39ms3.39msMail::SpamAssassin::BayesStore::DBM::::DBM_MODULEMail::SpamAssassin::BayesStore::DBM::DBM_MODULE
238213.30ms3.30msMail::SpamAssassin::BayesStore::DBM::::HAS_DBM_MODULEMail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE
235113.27ms3.27msMail::SpamAssassin::BayesStore::DBM::::CORE:closeMail::SpamAssassin::BayesStore::DBM::CORE:close (opcode)
238211.63ms1.63msMail::SpamAssassin::BayesStore::DBM::::DB_EXTENSIONSMail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS
474411.62ms1.62msMail::SpamAssassin::BayesStore::DBM::::CORE:umaskMail::SpamAssassin::BayesStore::DBM::CORE:umask (opcode)
2111.00ms2.64sMail::SpamAssassin::BayesStore::DBM::::tie_db_writableMail::SpamAssassin::BayesStore::DBM::tie_db_writable
23511850µs850µsMail::SpamAssassin::BayesStore::DBM::::CORE:tellMail::SpamAssassin::BayesStore::DBM::CORE:tell (opcode)
222580µs393msMail::SpamAssassin::BayesStore::DBM::::untie_dbMail::SpamAssassin::BayesStore::DBM::untie_db
11158µs72µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@20Mail::SpamAssassin::BayesStore::DBM::BEGIN@20
11154µs76µsMail::SpamAssassin::BayesStore::DBM::::newMail::SpamAssassin::BayesStore::DBM::new
11142µs253µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@41Mail::SpamAssassin::BayesStore::DBM::BEGIN@41
11138µs281µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@27Mail::SpamAssassin::BayesStore::DBM::BEGIN@27
11138µs201µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@1900Mail::SpamAssassin::BayesStore::DBM::BEGIN@1900
11137µs257µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@1450Mail::SpamAssassin::BayesStore::DBM::BEGIN@1450
11135µs208µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@1896Mail::SpamAssassin::BayesStore::DBM::BEGIN@1896
21135µs35µsMail::SpamAssassin::BayesStore::DBM::::CORE:ftdirMail::SpamAssassin::BayesStore::DBM::CORE:ftdir (opcode)
11135µs216µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@37Mail::SpamAssassin::BayesStore::DBM::BEGIN@37
11134µs181µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@29Mail::SpamAssassin::BayesStore::DBM::BEGIN@29
11134µs184µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@31Mail::SpamAssassin::BayesStore::DBM::BEGIN@31
11133µs65µsMail::SpamAssassin::BayesStore::DBM::::_upgrade_dbMail::SpamAssassin::BayesStore::DBM::_upgrade_db
11131µs218µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@1899Mail::SpamAssassin::BayesStore::DBM::BEGIN@1899
11130µs834µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@43Mail::SpamAssassin::BayesStore::DBM::BEGIN@43
11130µs243µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@1895Mail::SpamAssassin::BayesStore::DBM::BEGIN@1895
11128µs35µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@22Mail::SpamAssassin::BayesStore::DBM::BEGIN@22
11128µs246µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@39Mail::SpamAssassin::BayesStore::DBM::BEGIN@39
11128µs2.48msMail::SpamAssassin::BayesStore::DBM::::BEGIN@25Mail::SpamAssassin::BayesStore::DBM::BEGIN@25
11127µs92µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@23Mail::SpamAssassin::BayesStore::DBM::BEGIN@23
11126µs68µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@21Mail::SpamAssassin::BayesStore::DBM::BEGIN@21
11123µs151µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@26Mail::SpamAssassin::BayesStore::DBM::BEGIN@26
11122µs218µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@1897Mail::SpamAssassin::BayesStore::DBM::BEGIN@1897
11118µs18µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@36Mail::SpamAssassin::BayesStore::DBM::BEGIN@36
11116µs16µsMail::SpamAssassin::BayesStore::DBM::::BEGIN@28Mail::SpamAssassin::BayesStore::DBM::BEGIN@28
11112µs12µsMail::SpamAssassin::BayesStore::DBM::::db_readableMail::SpamAssassin::BayesStore::DBM::db_readable
1119µs9µsMail::SpamAssassin::BayesStore::DBM::::CORE:qrMail::SpamAssassin::BayesStore::DBM::CORE:qr (opcode)
0000s0sMail::SpamAssassin::BayesStore::DBM::::_rename_fileMail::SpamAssassin::BayesStore::DBM::_rename_file
0000s0sMail::SpamAssassin::BayesStore::DBM::::_seen_delete_directMail::SpamAssassin::BayesStore::DBM::_seen_delete_direct
0000s0sMail::SpamAssassin::BayesStore::DBM::::_seen_put_directMail::SpamAssassin::BayesStore::DBM::_seen_put_direct
0000s0sMail::SpamAssassin::BayesStore::DBM::::_sync_journalMail::SpamAssassin::BayesStore::DBM::_sync_journal
0000s0sMail::SpamAssassin::BayesStore::DBM::::_sync_journal_trappedMail::SpamAssassin::BayesStore::DBM::_sync_journal_trapped
0000s0sMail::SpamAssassin::BayesStore::DBM::::_unlink_fileMail::SpamAssassin::BayesStore::DBM::_unlink_file
0000s0sMail::SpamAssassin::BayesStore::DBM::::backup_databaseMail::SpamAssassin::BayesStore::DBM::backup_database
0000s0sMail::SpamAssassin::BayesStore::DBM::::calculate_expire_deltaMail::SpamAssassin::BayesStore::DBM::calculate_expire_delta
0000s0sMail::SpamAssassin::BayesStore::DBM::::clear_databaseMail::SpamAssassin::BayesStore::DBM::clear_database
0000s0sMail::SpamAssassin::BayesStore::DBM::::db_writableMail::SpamAssassin::BayesStore::DBM::db_writable
0000s0sMail::SpamAssassin::BayesStore::DBM::::dump_db_toksMail::SpamAssassin::BayesStore::DBM::dump_db_toks
0000s0sMail::SpamAssassin::BayesStore::DBM::::get_running_expire_tokMail::SpamAssassin::BayesStore::DBM::get_running_expire_tok
0000s0sMail::SpamAssassin::BayesStore::DBM::::nspam_nham_getMail::SpamAssassin::BayesStore::DBM::nspam_nham_get
0000s0sMail::SpamAssassin::BayesStore::DBM::::perform_upgradeMail::SpamAssassin::BayesStore::DBM::perform_upgrade
0000s0sMail::SpamAssassin::BayesStore::DBM::::remove_running_expire_tokMail::SpamAssassin::BayesStore::DBM::remove_running_expire_tok
0000s0sMail::SpamAssassin::BayesStore::DBM::::restore_databaseMail::SpamAssassin::BayesStore::DBM::restore_database
0000s0sMail::SpamAssassin::BayesStore::DBM::::sa_dieMail::SpamAssassin::BayesStore::DBM::sa_die
0000s0sMail::SpamAssassin::BayesStore::DBM::::seen_deleteMail::SpamAssassin::BayesStore::DBM::seen_delete
0000s0sMail::SpamAssassin::BayesStore::DBM::::set_last_expireMail::SpamAssassin::BayesStore::DBM::set_last_expire
0000s0sMail::SpamAssassin::BayesStore::DBM::::set_running_expire_tokMail::SpamAssassin::BayesStore::DBM::set_running_expire_tok
0000s0sMail::SpamAssassin::BayesStore::DBM::::syncMail::SpamAssassin::BayesStore::DBM::sync
0000s0sMail::SpamAssassin::BayesStore::DBM::::sync_dueMail::SpamAssassin::BayesStore::DBM::sync_due
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_count_changeMail::SpamAssassin::BayesStore::DBM::tok_count_change
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_getMail::SpamAssassin::BayesStore::DBM::tok_get
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_get_allMail::SpamAssassin::BayesStore::DBM::tok_get_all
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_packMail::SpamAssassin::BayesStore::DBM::tok_pack
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_putMail::SpamAssassin::BayesStore::DBM::tok_put
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_sync_countersMail::SpamAssassin::BayesStore::DBM::tok_sync_counters
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_sync_nspam_nhamMail::SpamAssassin::BayesStore::DBM::tok_sync_nspam_nham
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_touchMail::SpamAssassin::BayesStore::DBM::tok_touch
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_touch_allMail::SpamAssassin::BayesStore::DBM::tok_touch_all
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_touch_tokenMail::SpamAssassin::BayesStore::DBM::tok_touch_token
0000s0sMail::SpamAssassin::BayesStore::DBM::::tok_unpackMail::SpamAssassin::BayesStore::DBM::tok_unpack
0000s0sMail::SpamAssassin::BayesStore::DBM::::token_expirationMail::SpamAssassin::BayesStore::DBM::token_expiration
0000s0sMail::SpamAssassin::BayesStore::DBM::::upgrade_old_dbm_files_trappedMail::SpamAssassin::BayesStore::DBM::upgrade_old_dbm_files_trapped
Call graph for these subroutines as a Graphviz dot language file.
Line State
ments
Time
on line
Calls Time
in subs
Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18package Mail::SpamAssassin::BayesStore::DBM;
19
20271µs286µs
# spent 72µs (58+14) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@20 which was called: # once (58µs+14µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 20
use strict;
# spent 72µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@20 # spent 14µs making 1 call to strict::import
21281µs2110µs
# spent 68µs (26+42) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@21 which was called: # once (26µs+42µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 21
use warnings;
# spent 68µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@21 # spent 42µs making 1 call to warnings::import
22276µs242µs
# spent 35µs (28+7) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@22 which was called: # once (28µs+7µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 22
use bytes;
# spent 35µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@22 # spent 7µs making 1 call to bytes::import
23266µs2156µs
# spent 92µs (27+65) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@23 which was called: # once (27µs+65µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 23
use re 'taint';
# spent 92µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@23 # spent 65µs making 1 call to re::import
24
25285µs24.93ms
# spent 2.48ms (28µs+2.45) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@25 which was called: # once (28µs+2.45ms) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 25
use Fcntl;
# spent 2.48ms making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@25 # spent 2.45ms making 1 call to Exporter::import
26278µs2278µs
# spent 151µs (23+128) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@26 which was called: # once (23µs+128µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 26
use Errno qw(EBADF);
# spent 151µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@26 # spent 128µs making 1 call to Exporter::import
27278µs2523µs
# spent 281µs (38+242) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@27 which was called: # once (38µs+242µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 27
use File::Basename;
# spent 281µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@27 # spent 242µs making 1 call to Exporter::import
28271µs116µs
# spent 16µs within Mail::SpamAssassin::BayesStore::DBM::BEGIN@28 which was called: # once (16µs+0s) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 28
use File::Spec;
# spent 16µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@28
292153µs2328µs
# spent 181µs (34+147) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@29 which was called: # once (34µs+147µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 29
use File::Path;
# spent 181µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@29 # spent 147µs making 1 call to Exporter::import
30
31
# spent 184µs (34+150) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@31 which was called: # once (34µs+150µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 34
BEGIN {
32316µs1150µs eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
# spent 150µs making 1 call to Exporter::import
33120µs or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
34151µs1184µs}
# spent 184µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@31
35
36276µs118µs
# spent 18µs within Mail::SpamAssassin::BayesStore::DBM::BEGIN@36 which was called: # once (18µs+0s) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 36
use Mail::SpamAssassin;
# spent 18µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@36
37288µs2397µs
# spent 216µs (35+181) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@37 which was called: # once (35µs+181µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 37
use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows);
# spent 216µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@37 # spent 181µs making 1 call to Exporter::import
382345µs15.91ms
# spent 5.91ms (5.42+485µs) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@38 which was called: # once (5.42ms+485µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 38
use Mail::SpamAssassin::BayesStore;
# spent 5.91ms making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@38
392120µs2465µs
# spent 246µs (28+219) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@39 which was called: # once (28µs+219µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 39
use Mail::SpamAssassin::Logger;
# spent 246µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@39 # spent 218µs making 1 call to Exporter::import
40
412147µs3464µs
# spent 253µs (42+211) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@41 which was called: # once (42µs+211µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 41
use constant MAGIC_RE => qr/^\015\001\007\011\003/;
# spent 253µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@41 # spent 202µs making 1 call to constant::import # spent 9µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::CORE:qr
42
4313µs
# spent 834µs (30+803) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@43 which was called: # once (30µs+803µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 50
use vars qw{
44 @ISA
45 @DBNAMES
46 $NSPAM_MAGIC_TOKEN $NHAM_MAGIC_TOKEN $LAST_EXPIRE_MAGIC_TOKEN $LAST_JOURNAL_SYNC_MAGIC_TOKEN
47 $NTOKENS_MAGIC_TOKEN $OLDEST_TOKEN_AGE_MAGIC_TOKEN $LAST_EXPIRE_REDUCE_MAGIC_TOKEN
48 $RUNNING_EXPIRE_MAGIC_TOKEN $DB_VERSION_MAGIC_TOKEN $LAST_ATIME_DELTA_MAGIC_TOKEN
49 $NEWEST_TOKEN_AGE_MAGIC_TOKEN
50113.4ms21.64ms};
# spent 834µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@43 # spent 803µs making 1 call to vars::import
51
52124µs@ISA = qw( Mail::SpamAssassin::BayesStore );
53
54# db layout (quoting Matt):
55#
56# > need five db files though to make it real fast:
57# [probs] 1. ngood and nbad (two entries, so could be a flat file rather
58# than a db file). (now 2 entries in db_toks)
59# [toks] 2. good token -> number seen
60# [toks] 3. bad token -> number seen (both are packed into 1 entry in 1 db)
61# [probs] 4. Consolidated good token -> probability
62# [probs] 5. Consolidated bad token -> probability
63# > As you add new mails, you update the entry in 2 or 3, then regenerate
64# > the entry for that token in 4 or 5.
65# > Then as you test a new mail, you just need to pull the probability
66# > direct from 4 and 5, and generate the overall probability. A simple and
67# > very fast operation.
68#
69# jm: we use probs as overall probability. <0.5 = ham, >0.5 = spam
70#
71# update: probs is no longer maintained as a db, to keep on-disk and in-core
72# usage down.
73#
74# also, added a new one to support forgetting, auto-learning, and
75# auto-forgetting for refiled mails:
76# [seen] 6. a list of Message-IDs of messages already learnt from. values
77# are 's' for learnt-as-spam, 'h' for learnt-as-ham.
78#
79# and another, called [scancount] to model the scan-count for expiry.
80# This is not a database. Instead it increases by one byte for each
81# message scanned (note: scanned, not learned).
82
8313µs@DBNAMES = qw(toks seen);
84
85# These are the magic tokens we use to track stuff in the DB.
86# The format is '^M^A^G^I^C' followed by any string you want.
87# None of the control chars will be in a real token.
8812µs$DB_VERSION_MAGIC_TOKEN = "\015\001\007\011\003DBVERSION";
8912µs$LAST_ATIME_DELTA_MAGIC_TOKEN = "\015\001\007\011\003LASTATIMEDELTA";
9012µs$LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE";
9112µs$LAST_EXPIRE_REDUCE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIREREDUCE";
9212µs$LAST_JOURNAL_SYNC_MAGIC_TOKEN = "\015\001\007\011\003LASTJOURNALSYNC";
9312µs$NEWEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003NEWESTAGE";
9412µs$NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM";
9512µs$NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM";
9612µs$NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS";
9712µs$OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE";
9812µs$RUNNING_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003RUNNINGEXPIRE";
99
100
# spent 3.30ms within Mail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE which was called 238 times, avg 14µs/call: # 236 times (3.27ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 141, avg 14µs/call # 2 times (28µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 234, avg 14µs/call
sub HAS_DBM_MODULE {
101238549µs my ($self) = @_;
102238776µs if (exists($self->{has_dbm_module})) {
1032372.84ms return $self->{has_dbm_module};
104 }
105220µs $self->{has_dbm_module} = eval { require DB_File; };
106}
107
108
# spent 3.39ms within Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE which was called 476 times, avg 7µs/call: # 468 times (3.35ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 182, avg 7µs/call # 4 times (21µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 295, avg 5µs/call # 4 times (20µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 301, avg 5µs/call
sub DBM_MODULE {
1094764.52ms return "DB_File";
110}
111
112# Possible file extensions used by the kinds of database files DB_File
113# might create. We need these so we can create a new file and rename
114# it into place.
115
# spent 1.63ms within Mail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS which was called 238 times, avg 7µs/call: # 236 times (1.62ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 161, avg 7µs/call # 2 times (11µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 259, avg 6µs/call
sub DB_EXTENSIONS {
1162382.15ms return ('', '.db');
117}
118
119###########################################################################
120
121
# spent 76µs (54+22) within Mail::SpamAssassin::BayesStore::DBM::new which was called: # once (54µs+22µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 3 of (eval 1133)[Mail/SpamAssassin/Plugin/Bayes.pm:1661]
sub new {
12212µs my $class = shift;
12312µs $class = ref($class) || $class;
124
125114µs122µs my $self = $class->SUPER::new(@_);
# spent 22µs making 1 call to Mail::SpamAssassin::BayesStore::new
126
127111µs $self->{supported_db_version} = 3;
128
12912µs $self->{already_tied} = 0;
13012µs $self->{is_locked} = 0;
13113µs $self->{string_to_journal} = '';
132
133111µs $self;
134}
135
136###########################################################################
137
138
# spent 6.32s (113ms+6.20) within Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly which was called 236 times, avg 26.8ms/call: # 235 times (113ms+6.20s) by Mail::SpamAssassin::Plugin::Bayes::learn_message at line 389 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 26.9ms/call # once (130µs+386µs) by Mail::SpamAssassin::Plugin::Bayes::learner_is_scan_available at line 640 of Mail/SpamAssassin/Plugin/Bayes.pm
sub tie_db_readonly {
139236643µs my ($self) = @_;
140
1412362.23ms2363.27ms if (!$self->HAS_DBM_MODULE) {
# spent 3.27ms making 236 calls to Mail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE, avg 14µs/call
142 dbg("bayes: %s module not installed, cannot use bayes", $self->DBM_MODULE);
143 return 0;
144 }
145
146 # return if we've already tied to the db's, using the same mode
147 # (locked/unlocked) as before.
1482361.37ms return 1 if ($self->{already_tied} && $self->{is_locked} == 0);
149
150236904µs my $main = $self->{bayes}->{main};
1512361.08ms if (!defined($main->{conf}->{bayes_path})) {
152 dbg("bayes: bayes_path not defined");
153 return 0;
154 }
155
1562362.54ms23611.7ms $self->read_db_configs();
# spent 11.7ms making 236 calls to Mail::SpamAssassin::BayesStore::read_db_configs, avg 50µs/call
157
1582362.71ms2365.53ms my $path = $main->sed_path($main->{conf}->{bayes_path});
# spent 5.53ms making 236 calls to Mail::SpamAssassin::sed_path, avg 23µs/call
159
160236565µs my $found = 0;
1612362.37ms2361.62ms for my $ext ($self->DB_EXTENSIONS) {
# spent 1.62ms making 236 calls to Mail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS, avg 7µs/call
16223814.9ms23812.3ms if (-f $path.'_toks'.$ext) {
# spent 12.3ms making 238 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile, avg 52µs/call
163234547µs $found = 1;
164234786µs last;
165 }
166 }
167
168236470µs if (!$found) {
169218µs215µs dbg("bayes: no dbs present, cannot tie DB R/O: %s", $path.'_toks');
# spent 15µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
170222µs return 0;
171 }
172
1732341.55ms foreach my $dbname (@DBNAMES) {
1744682.70ms my $name = $path.'_'.$dbname;
1754681.36ms my $db_var = 'db_'.$dbname;
1764684.38ms4684.28ms dbg("bayes: tie-ing to DB file R/O $name");
# spent 4.28ms making 468 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call
177
178 # Bug 6901, [rt.cpan.org #83060]
179 # DB_File: Repeated tie to the same hash with no untie causes corruption
1809365.52s4685.49s untie %{$self->{$db_var}}; # has no effect if the variable is not tied
# spent 5.49s making 468 calls to DB_File::DESTROY, avg 11.7ms/call
181
18293616.9ms93699.6ms if (!tie %{$self->{$db_var}}, $self->DBM_MODULE, $name, O_RDONLY,
# spent 96.3ms making 468 calls to DB_File::TIEHASH, avg 206µs/call # spent 3.35ms making 468 calls to Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE, avg 7µs/call
183 (oct($main->{conf}->{bayes_file_mode}) & 0666))
184 {
185 # bug 2975: it's acceptable for the db_seen to not be present,
186 # to allow it to be recycled. if that's the case, just create
187 # a new, empty one. we don't need to lock it, since we won't
188 # be writing to it; let the R/W api deal with that case.
189
190 if ($dbname eq 'seen') {
191 # Bug 6901, [rt.cpan.org #83060]
192 untie %{$self->{$db_var}}; # has no effect if the variable is not tied
193 tie %{$self->{$db_var}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT,
194 (oct($main->{conf}->{bayes_file_mode}) & 0666)
195 or goto failed_to_tie;
196 }
197 else {
198 goto failed_to_tie;
199 }
200 }
201 }
202
2032344.43ms234176ms $self->{db_version} = ($self->get_storage_variables())[6];
# spent 176ms making 234 calls to Mail::SpamAssassin::BayesStore::DBM::get_storage_variables, avg 752µs/call
2042342.00ms2341.88ms dbg("bayes: found bayes db version %s", $self->{db_version});
# spent 1.88ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
205
206 # If the DB version is one we don't understand, abort!
2072342.39ms2347.33ms if ($self->_check_db_version() != 0) {
# spent 7.33ms making 234 calls to Mail::SpamAssassin::BayesStore::DBM::_check_db_version, avg 31µs/call
208167µs warn("bayes: bayes db version ".$self->{db_version}." is not able to be used, aborting!");
209112µs1392ms $self->untie_db();
# spent 392ms making 1 call to Mail::SpamAssassin::BayesStore::DBM::untie_db
210116µs return 0;
211 }
212
213233750µs $self->{already_tied} = 1;
2142339.58ms return 1;
215
216failed_to_tie:
217 warn "bayes: cannot open bayes databases ${path}_* R/O: tie failed: $!\n";
218 foreach my $dbname (@DBNAMES) {
219 my $db_var = 'db_'.$dbname;
220 next unless exists $self->{$db_var};
221 dbg("bayes: untie-ing DB file $dbname");
222 untie %{$self->{$db_var}};
223 }
224
225 return 0;
226}
227
228# tie() to the databases, read-write and locked. Any callers of
229# this should ensure they call untie_db() afterwards!
230#
231
# spent 2.64s (1.00ms+2.64) within Mail::SpamAssassin::BayesStore::DBM::tie_db_writable which was called 2 times, avg 1.32s/call: # 2 times (1.00ms+2.64s) by Mail::SpamAssassin::Plugin::Bayes::learn_message at line 389 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 1.32s/call
sub tie_db_writable {
23225µs my ($self) = @_;
233
234217µs228µs if (!$self->HAS_DBM_MODULE) {
# spent 28µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::HAS_DBM_MODULE, avg 14µs/call
235 dbg("bayes: %s module not installed, cannot use bayes", $self->DBM_MODULE);
236 return 0;
237 }
238
239 # Useful shortcut ...
24029µs my $main = $self->{bayes}->{main};
241
242 # if we've already tied the db's using the same mode
243 # (locked/unlocked) as we want now, freshen the lock and return.
24425µs if ($self->{already_tied} && $self->{is_locked} == 1) {
245 $main->{locker}->refresh_lock($self->{locked_file});
246 return 1;
247 }
248
249210µs if (!defined($main->{conf}->{bayes_path})) {
250 dbg("bayes: bayes_path not defined");
251 return 0;
252 }
253
254217µs277µs $self->read_db_configs();
# spent 77µs making 2 calls to Mail::SpamAssassin::BayesStore::read_db_configs, avg 38µs/call
255
256225µs244µs my $path = $main->sed_path($main->{conf}->{bayes_path});
# spent 44µs making 2 calls to Mail::SpamAssassin::sed_path, avg 22µs/call
257
25825µs my $found = 0;
259220µs211µs for my $ext ($self->DB_EXTENSIONS) {
# spent 11µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::DB_EXTENSIONS, avg 6µs/call
260372µs333µs if (-f $path.'_toks'.$ext) {
# spent 33µs making 3 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile, avg 11µs/call
26112µs $found = 1;
26214µs last;
263 }
264 }
265
266224µs2525µs my $parentdir = dirname($path);
# spent 525µs making 2 calls to File::Basename::dirname, avg 262µs/call
267258µs235µs if (!-d $parentdir) {
# spent 35µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:ftdir, avg 17µs/call
268 # run in an eval(); if mkpath has no perms, it calls die()
269 eval {
270 mkpath($parentdir, 0, (oct($main->{conf}->{bayes_file_mode}) & 0777));
271 };
272 }
273
27424µs my $tout;
275212µs if ($main->{learn_wait_for_lock}) {
27625µs $tout = 300; # TODO: Dan to write better lock code
277 } else {
278 $tout = 10;
279 }
280235µs22.63s if ($main->{locker}->safe_lock($path, $tout, $main->{conf}->{bayes_file_mode}))
# spent 2.63s making 2 calls to Mail::SpamAssassin::Locker::UnixNFSSafe::safe_lock, avg 1.32s/call
281 {
282210µs $self->{locked_file} = $path;
28327µs $self->{is_locked} = 1;
284 } else {
285 warn "bayes: cannot open bayes databases ${path}_* R/W: lock failed: $!\n";
286 return 0;
287 }
288
289230µs27µs my $umask = umask 0;
# spent 7µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 3µs/call
290223µs foreach my $dbname (@DBNAMES) {
291428µs my $name = $path.'_'.$dbname;
292411µs my $db_var = 'db_'.$dbname;
293442µs438µs dbg("bayes: tie-ing to DB file R/W $name");
# spent 38µs making 4 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call
294
295465µs83.27ms ($self->DBM_MODULE eq 'DB_File') and
# spent 3.25ms making 4 calls to Mail::SpamAssassin::Util::avoid_db_file_locking_bug, avg 811µs/call # spent 21µs making 4 calls to Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE, avg 5µs/call
296 Mail::SpamAssassin::Util::avoid_db_file_locking_bug ($name);
297
298 # Bug 6901, [rt.cpan.org #83060]
299846µs untie %{$self->{$db_var}}; # has no effect if the variable is not tied
30049µs tie %{$self->{$db_var}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT,
3014125µs8942µs (oct($main->{conf}->{bayes_file_mode}) & 0666)
# spent 922µs making 4 calls to DB_File::TIEHASH, avg 230µs/call # spent 20µs making 4 calls to Mail::SpamAssassin::BayesStore::DBM::DBM_MODULE, avg 5µs/call
302 or goto failed_to_tie;
303 }
304221µs25µs umask $umask;
# spent 5µs making 2 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 3µs/call
305
306 # set our cache to what version DB we're using
307234µs21.10ms $self->{db_version} = ($self->get_storage_variables())[6];
# spent 1.10ms making 2 calls to Mail::SpamAssassin::BayesStore::DBM::get_storage_variables, avg 550µs/call
308 # don't bother printing this unless found since it would be bogus anyway
309212µs18µs dbg("bayes: found bayes db version %s", $self->{db_version}) if $found;
# spent 8µs making 1 call to Mail::SpamAssassin::Logger::dbg
310
311 # figure out if we can read the current DB and if we need to do a
312 # DB version update and do it if necessary if either has a problem,
313 # fail immediately
314 #
315217µs165µs if ($found && !$self->_upgrade_db()) {
# spent 65µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::_upgrade_db
316 $self->untie_db();
317 return 0;
318 }
319 elsif (!$found) { # new DB, make sure we know that ...
320165µs319µs $self->{db_version} = $self->{db_toks}->{$DB_VERSION_MAGIC_TOKEN} = $self->DB_VERSION;
# spent 9µs making 1 call to Mail::SpamAssassin::BayesStore::DB_VERSION # spent 6µs making 1 call to DB_File::STORE # spent 5µs making 1 call to DB_File::FETCH
321124µs14µs $self->{db_toks}->{$NTOKENS_MAGIC_TOKEN} = 0; # no tokens in the db ...
# spent 4µs making 1 call to DB_File::STORE
32218µs18µs dbg("bayes: new db, set db version %s and 0 tokens", $self->{db_version});
# spent 8µs making 1 call to Mail::SpamAssassin::Logger::dbg
323 }
324
32527µs $self->{already_tied} = 1;
326224µs return 1;
327
328failed_to_tie:
329 my $err = $!;
330 umask $umask;
331
332 foreach my $dbname (@DBNAMES) {
333 my $db_var = 'db_'.$dbname;
334 next unless exists $self->{$db_var};
335 dbg("bayes: untie-ing DB file $dbname");
336 untie %{$self->{$db_var}};
337 }
338
339 if ($self->{is_locked}) {
340 $self->{bayes}->{main}->{locker}->safe_unlock($self->{locked_file});
341 $self->{is_locked} = 0;
342 }
343 warn "bayes: cannot open bayes databases ${path}_* R/W: tie failed: $err\n";
344 return 0;
345}
346
347# Do we understand how to deal with this DB version?
348
# spent 7.36ms (5.05+2.31) within Mail::SpamAssassin::BayesStore::DBM::_check_db_version which was called 235 times, avg 31µs/call: # 234 times (5.03ms+2.30ms) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 207, avg 31µs/call # once (22µs+10µs) by Mail::SpamAssassin::BayesStore::DBM::_upgrade_db at line 359
sub _check_db_version {
349235617µs my ($self) = @_;
350
351 # return -1 if older, 0 if current, 1 if newer
3522354.41ms2352.31ms return $self->{db_version} <=> $self->DB_VERSION;
# spent 2.31ms making 235 calls to Mail::SpamAssassin::BayesStore::DB_VERSION, avg 10µs/call
353}
354
355# Check to see if we need to upgrade the DB, and do so if necessary
356
# spent 65µs (33+32) within Mail::SpamAssassin::BayesStore::DBM::_upgrade_db which was called: # once (33µs+32µs) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 315
sub _upgrade_db {
35712µs my ($self) = @_;
358
359111µs132µs my $verschk = $self->_check_db_version();
36012µs my $res = 0; # used later on for tie() checks
36112µs my $umask; # used later for umask modifications
362
363 # If the DB is the latest version, no problem.
364112µs return 1 if ($verschk == 0);
365
366 # If the DB is a newer version that we know what to do with ... abort!
367 if ($verschk == 1) {
368 warn("bayes: bayes db version ".$self->{db_version}." is newer than we understand, aborting!");
369 return 0;
370 }
371
372 # If the current DB version is lower than the new version, upgrade!
373 # Do conversions in order so we can go 1 -> 3, make sure to update
374 # $self->{db_version} along the way
375
376 dbg("bayes: detected bayes db format %s, upgrading", $self->{db_version});
377
378 # since DB_File will not shrink a database (!!), we need to *create*
379 # a new one instead.
380 my $main = $self->{bayes}->{main};
381 my $path = $main->sed_path($main->{conf}->{bayes_path});
382 my $name = $path.'_toks';
383
384 # older version's journal files are likely not in the same format as the new ones, so remove it.
385 my $jpath = $self->_get_journal_filename();
386 if (-f $jpath) {
387 dbg("bayes: old journal file found, removing");
388 warn "bayes: couldn't remove $jpath: $!" if (!unlink $jpath);
389 }
390
391 if ($self->{db_version} < 2) {
392 dbg("bayes: upgrading database format from v%s to v2", $self->{db_version});
393 $self->set_running_expire_tok();
394
395 my ($DB_NSPAM_MAGIC_TOKEN, $DB_NHAM_MAGIC_TOKEN, $DB_NTOKENS_MAGIC_TOKEN);
396 my ($DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN, $DB_LAST_EXPIRE_MAGIC_TOKEN);
397
398 # Magic tokens for version 0, defined as '**[A-Z]+'
399 if ($self->{db_version} == 0) {
400 $DB_NSPAM_MAGIC_TOKEN = '**NSPAM';
401 $DB_NHAM_MAGIC_TOKEN = '**NHAM';
402 $DB_NTOKENS_MAGIC_TOKEN = '**NTOKENS';
403 #$DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN = '**OLDESTAGE';
404 #$DB_LAST_EXPIRE_MAGIC_TOKEN = '**LASTEXPIRE';
405 #$DB_SCANCOUNT_BASE_MAGIC_TOKEN = '**SCANBASE';
406 #$DB_RUNNING_EXPIRE_MAGIC_TOKEN = '**RUNNINGEXPIRE';
407 }
408 else {
409 $DB_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM";
410 $DB_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM";
411 $DB_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS";
412 #$DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE";
413 #$DB_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE";
414 #$DB_SCANCOUNT_BASE_MAGIC_TOKEN = "\015\001\007\011\003SCANBASE";
415 #$DB_RUNNING_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003RUNNINGEXPIRE";
416 }
417
418 # remember when we started ...
419 my $started = time;
420 my $newatime = $started;
421
422 # use O_EXCL to avoid races (bonus paranoia, since we should be locked
423 # anyway)
424 my %new_toks;
425 $umask = umask 0;
426
427 $res = tie %new_toks, $self->DBM_MODULE, "${name}.new",
428 O_RDWR|O_CREAT|O_EXCL,
429 (oct($main->{conf}->{bayes_file_mode}) & 0666);
430 umask $umask;
431 return 0 unless $res;
432 undef $res;
433
434 # add the magic tokens to the new db.
435 $new_toks{$NSPAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NSPAM_MAGIC_TOKEN};
436 $new_toks{$NHAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NHAM_MAGIC_TOKEN};
437 $new_toks{$NTOKENS_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NTOKENS_MAGIC_TOKEN};
438 $new_toks{$DB_VERSION_MAGIC_TOKEN} = 2; # we're now a DB version 2 file
439 $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $newatime;
440 $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = $newatime;
441 $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $newatime;
442 $new_toks{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = $newatime;
443 $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = 0;
444 $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = 0;
445
446 # deal with the data tokens
447 my ($tok, $packed);
448 my $count = 0;
449 while (($tok, $packed) = each %{$self->{db_toks}}) {
450 next if ($tok =~ /^(?:\*\*[A-Z]+$|\015\001\007\011\003)/); # skip magic tokens
451
452 my ($ts, $th, $atime) = $self->tok_unpack($packed);
453 $new_toks{$tok} = $self->tok_pack($ts, $th, $newatime);
454
455 # Refresh the lock every so often...
456 if (($count++ % 1000) == 0) {
457 $self->set_running_expire_tok();
458 }
459 }
460
461
462 # now untie so we can do renames
463 untie %{$self->{db_toks}};
464 untie %new_toks;
465
466 # This is the critical phase (moving files around), so don't allow
467 # it to be interrupted.
468 local $SIG{'INT'} = 'IGNORE';
469 local $SIG{'TERM'} = 'IGNORE';
470 local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows();
471
472 # older versions used scancount, so kill the stupid little file ...
473 my $msgc = $path.'_msgcount';
474 if (-f $msgc) {
475 dbg("bayes: old msgcount file found, removing");
476 if (!unlink $msgc) {
477 warn "bayes: couldn't remove $msgc: $!";
478 }
479 }
480
481 # now rename in the new one. Try several extensions
482 for my $ext ($self->DB_EXTENSIONS) {
483 my $newf = $name.'.new'.$ext;
484 my $oldf = $name.$ext;
485 next unless (-f $newf);
486 if (!rename ($newf, $oldf)) {
487 warn "bayes: rename $newf to $oldf failed: $!\n";
488 return 0;
489 }
490 }
491
492 # re-tie to the new db in read-write mode ...
493 $umask = umask 0;
494 # Bug 6901, [rt.cpan.org #83060]
495 untie %{$self->{db_toks}}; # has no effect if the variable is not tied
496 $res = tie %{$self->{db_toks}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT,
497 (oct($main->{conf}->{bayes_file_mode}) & 0666);
498 umask $umask;
499 return 0 unless $res;
500 undef $res;
501
502 dbg("bayes: upgraded database format from v%s to v2 in %d seconds",
503 $self->{db_version}, time - $started);
504 $self->{db_version} = 2; # need this for other functions which check
505 }
506
507 # Version 3 of the database converts all existing tokens to SHA1 hashes
508 if ($self->{db_version} == 2) {
509 dbg("bayes: upgrading database format from v%s to v3", $self->{db_version});
510 $self->set_running_expire_tok();
511
512 my $DB_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM";
513 my $DB_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM";
514 my $DB_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS";
515 my $DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE";
516 my $DB_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE";
517 my $DB_NEWEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003NEWESTAGE";
518 my $DB_LAST_JOURNAL_SYNC_MAGIC_TOKEN = "\015\001\007\011\003LASTJOURNALSYNC";
519 my $DB_LAST_ATIME_DELTA_MAGIC_TOKEN = "\015\001\007\011\003LASTATIMEDELTA";
520 my $DB_LAST_EXPIRE_REDUCE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIREREDUCE";
521
522 # remember when we started ...
523 my $started = time;
524
525 # use O_EXCL to avoid races (bonus paranoia, since we should be locked
526 # anyway)
527 my %new_toks;
528 $umask = umask 0;
529 $res = tie %new_toks, $self->DBM_MODULE, "${name}.new", O_RDWR|O_CREAT|O_EXCL,
530 (oct($main->{conf}->{bayes_file_mode}) & 0666);
531 umask $umask;
532 return 0 unless $res;
533 undef $res;
534
535 # add the magic tokens to the new db.
536 $new_toks{$NSPAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NSPAM_MAGIC_TOKEN};
537 $new_toks{$NHAM_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NHAM_MAGIC_TOKEN};
538 $new_toks{$NTOKENS_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NTOKENS_MAGIC_TOKEN};
539 $new_toks{$DB_VERSION_MAGIC_TOKEN} = 3; # we're now a DB version 3 file
540 $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $self->{db_toks}->{$DB_OLDEST_TOKEN_AGE_MAGIC_TOKEN};
541 $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = $self->{db_toks}->{$DB_LAST_EXPIRE_MAGIC_TOKEN};
542 $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $self->{db_toks}->{$DB_NEWEST_TOKEN_AGE_MAGIC_TOKEN};
543 $new_toks{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = $self->{db_toks}->{$DB_LAST_JOURNAL_SYNC_MAGIC_TOKEN};
544 $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = $self->{db_toks}->{$DB_LAST_ATIME_DELTA_MAGIC_TOKEN};
545 $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} =$self->{db_toks}->{$DB_LAST_EXPIRE_REDUCE_MAGIC_TOKEN};
546
547 # deal with the data tokens
548 my $count = 0;
549 while (my ($tok, $packed) = each %{$self->{db_toks}}) {
550 next if ($tok =~ /^\015\001\007\011\003/); # skip magic tokens
551 my $tok_hash = substr(sha1($tok), -5);
552 $new_toks{$tok_hash} = $packed;
553
554 # Refresh the lock every so often...
555 if (($count++ % 1000) == 0) {
556 $self->set_running_expire_tok();
557 }
558 }
559
560 # now untie so we can do renames
561 untie %{$self->{db_toks}};
562 untie %new_toks;
563
564 # This is the critical phase (moving files around), so don't allow
565 # it to be interrupted.
566 local $SIG{'INT'} = 'IGNORE';
567 local $SIG{'TERM'} = 'IGNORE';
568 local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows();
569
570 # now rename in the new one. Try several extensions
571 for my $ext ($self->DB_EXTENSIONS) {
572 my $newf = $name.'.new'.$ext;
573 my $oldf = $name.$ext;
574 next unless (-f $newf);
575 if (!rename($newf, $oldf)) {
576 warn "bayes: rename $newf to $oldf failed: $!\n";
577 return 0;
578 }
579 }
580
581 # re-tie to the new db in read-write mode ...
582 $umask = umask 0;
583 # Bug 6901, [rt.cpan.org #83060]
584 untie %{$self->{db_toks}}; # has no effect if the variable is not tied
585 $res = tie %{$self->{db_toks}}, $self->DBM_MODULE, $name, O_RDWR|O_CREAT,
586 (oct ($main->{conf}->{bayes_file_mode}) & 0666);
587 umask $umask;
588 return 0 unless $res;
589 undef $res;
590
591 dbg("bayes: upgraded database format from v%s to v3 in %d seconds",
592 $self->{db_version}, time - $started);
593
594 $self->{db_version} = 3; # need this for other functions which check
595 }
596
597 # if ($self->{db_version} == 3) {
598 # ...
599 # $self->{db_version} = 4; # need this for other functions which check
600 # }
601 # ... and so on.
602
603 return 1;
604}
605
606###########################################################################
607
608
# spent 393ms (580µs+392) within Mail::SpamAssassin::BayesStore::DBM::untie_db which was called 2 times, avg 196ms/call: # once (410µs+391ms) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 209 # once (170µs+1.02ms) by Mail::SpamAssassin::Plugin::Bayes::learner_close at line 332 of Mail/SpamAssassin/Plugin/Bayes.pm
sub untie_db {
60927µs my $self = shift;
610
61127µs return if (!$self->{already_tied});
612
613214µs213µs dbg("bayes: untie-ing");
# spent 13µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
614
615219µs foreach my $dbname (@DBNAMES) {
616419µs my $db_var = 'db_'.$dbname;
617
618436µs if (exists $self->{$db_var}) {
619 # dbg("bayes: untie-ing $db_var");
6208390ms4390ms untie %{$self->{$db_var}};
# spent 390ms making 4 calls to DB_File::DESTROY, avg 97.5ms/call
621435µs delete $self->{$db_var};
622 }
623 }
624
625211µs if ($self->{is_locked}) {
626223µs224µs dbg("bayes: files locked, now unlocking lock");
# spent 24µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 12µs/call
627237µs22.15ms $self->{bayes}->{main}->{locker}->safe_unlock ($self->{locked_file});
# spent 2.15ms making 2 calls to Mail::SpamAssassin::Locker::UnixNFSSafe::safe_unlock, avg 1.07ms/call
628210µs $self->{is_locked} = 0;
629 }
630
63127µs $self->{already_tied} = 0;
632229µs $self->{db_version} = undef;
633}
634
635###########################################################################
636
637sub calculate_expire_delta {
638 my ($self, $newest_atime, $start, $max_expire_mult) = @_;
639
640 my %delta; # use a hash since an array is going to be very sparse
641
642 # do the first pass, figure out atime delta
643 my ($tok, $packed);
644 while (($tok, $packed) = each %{$self->{db_toks}}) {
645 next if ($tok =~ MAGIC_RE); # skip magic tokens
646
647 my ($ts, $th, $atime) = $self->tok_unpack ($packed);
648
649 # Go through from $start * 1 to $start * 512, mark how many tokens
650 # we would expire
651 my $token_age = $newest_atime - $atime;
652 for (my $i = 1; $i <= $max_expire_mult; $i<<=1) {
653 if ($token_age >= $start * $i) {
654 $delta{$i}++;
655 }
656 else {
657 # If the token age is less than the expire delta, it'll be
658 # less for all upcoming checks too, so abort early.
659 last;
660 }
661 }
662 }
663 return %delta;
664}
665
666###########################################################################
667
668sub token_expiration {
669 my ($self, $opts, $newdelta, @vars) = @_;
670
671 my $deleted = 0;
672 my $kept = 0;
673 my $num_hapaxes = 0;
674 my $num_lowfreq = 0;
675
676 # since DB_File will not shrink a database (!!), we need to *create*
677 # a new one instead.
678 my $main = $self->{bayes}->{main};
679 my $path = $main->sed_path($main->{conf}->{bayes_path});
680
681 # use a temporary PID-based suffix just in case another one was
682 # created previously by an interrupted expire
683 my $tmpsuffix = "expire$$";
684 my $tmpdbname = $path.'_toks.'.$tmpsuffix;
685
686 # clean out any leftover db copies from previous runs
687 for my $ext ($self->DB_EXTENSIONS) { unlink ($tmpdbname.$ext); }
688
689 # use O_EXCL to avoid races (bonus paranoia, since we should be locked
690 # anyway)
691 my %new_toks;
692 my $umask = umask 0;
693 tie %new_toks, $self->DBM_MODULE, $tmpdbname, O_RDWR|O_CREAT|O_EXCL,
694 (oct ($main->{conf}->{bayes_file_mode}) & 0666);
695 umask $umask;
696 my $oldest;
697
698 my $showdots = $opts->{showdots};
699 if ($showdots) { print STDERR "\n"; }
700
701 # We've chosen a new atime delta if we've gotten here, so record it
702 # for posterity.
703 $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = $newdelta;
704
705 # Figure out how old is too old...
706 my $too_old = $vars[10] - $newdelta; # tooold = newest - delta
707
708 # Go ahead and do the move to new db/expire run now ...
709 my ($tok, $packed);
710 while (($tok, $packed) = each %{$self->{db_toks}}) {
711 next if ($tok =~ MAGIC_RE); # skip magic tokens
712
713 my ($ts, $th, $atime) = $self->tok_unpack ($packed);
714
715 if ($atime < $too_old) {
716 $deleted++;
717 }
718 else {
719 # if token atime > newest, reset to newest ...
720 if ($atime > $vars[10]) {
721 $atime = $vars[10];
722 }
723
724 $new_toks{$tok} = $self->tok_pack ($ts, $th, $atime); $kept++;
725 if (!defined($oldest) || $atime < $oldest) { $oldest = $atime; }
726 if ($ts + $th == 1) {
727 $num_hapaxes++;
728 } elsif ($ts < 8 && $th < 8) {
729 $num_lowfreq++;
730 }
731 }
732
733 if ((($kept + $deleted) % 1000) == 0) {
734 if ($showdots) { print STDERR "."; }
735 $self->set_running_expire_tok();
736 }
737 }
738
739 # and add the magic tokens. don't add the expire_running token.
740 $new_toks{$DB_VERSION_MAGIC_TOKEN} = $self->DB_VERSION;
741
742 # We haven't changed messages of each type seen, so just copy over.
743 $new_toks{$NSPAM_MAGIC_TOKEN} = $vars[1];
744 $new_toks{$NHAM_MAGIC_TOKEN} = $vars[2];
745
746 # We magically haven't removed the newest token, so just copy that value over.
747 $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $vars[10];
748
749 # The rest of these have been modified, so replace as necessary.
750 $new_toks{$NTOKENS_MAGIC_TOKEN} = $kept;
751 $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = time();
752 $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $oldest;
753 $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = $deleted;
754
755 # Sanity check: if we expired too many tokens, abort!
756 if ($kept < 100000) {
757 dbg("bayes: token expiration would expire too many tokens, aborting");
758 # set the magic tokens appropriately
759 # make sure the next expire run does a first pass
760 $self->{db_toks}->{$LAST_EXPIRE_MAGIC_TOKEN} = time();
761 $self->{db_toks}->{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = 0;
762 $self->{db_toks}->{$LAST_ATIME_DELTA_MAGIC_TOKEN} = 0;
763
764 # remove the new DB
765 untie %new_toks;
766 for my $ext ($self->DB_EXTENSIONS) { unlink ($tmpdbname.$ext); }
767
768 # reset the results for the return
769 $kept = $vars[3];
770 $deleted = 0;
771 $num_hapaxes = 0;
772 $num_lowfreq = 0;
773 }
774 else {
775 # now untie so we can do renames
776 untie %{$self->{db_toks}};
777 untie %new_toks;
778
779 # This is the critical phase (moving files around), so don't allow
780 # it to be interrupted. Scope the signal changes.
781 {
782 local $SIG{'INT'} = 'IGNORE';
783 local $SIG{'TERM'} = 'IGNORE';
784 local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows();
785
786 # now rename in the new one. Try several extensions
787 for my $ext ($self->DB_EXTENSIONS) {
788 my $newf = $tmpdbname.$ext;
789 my $oldf = $path.'_toks'.$ext;
790 next unless (-f $newf);
791 if (!rename ($newf, $oldf)) {
792 warn "bayes: rename $newf to $oldf failed: $!\n";
793 }
794 }
795 }
796 }
797
798 # Call untie_db() so we unlock correctly.
799 $self->untie_db();
800
801 return ($kept, $deleted, $num_hapaxes, $num_lowfreq);
802}
803
804###########################################################################
805
806# Is a sync due?
807sub sync_due {
808 my ($self) = @_;
809
810 # don't bother doing old db versions
811 return 0 if ($self->{db_version} < $self->DB_VERSION);
812
813 my $conf = $self->{bayes}->{main}->{conf};
814 return 0 if ($conf->{bayes_journal_max_size} == 0);
815
816 my @vars = $self->get_storage_variables();
817 dbg("bayes: DB journal sync: last sync: %s", $vars[7]);
818
819 ## Ok, should we do a sync?
820
821 # Not if the journal file doesn't exist, it's not a file, or it's 0
822 # bytes long.
823 return 0 unless (stat($self->_get_journal_filename()) && -f _);
824
825 # Yes if the file size is larger than the specified maximum size.
826 return 1 if (-s _ > $conf->{bayes_journal_max_size});
827
828 # Yes there has been a sync before, and if it's been at least a day
829 # since that sync.
830 return 1 if (($vars[7] > 0) && (time - $vars[7] > 86400));
831
832 # No, I guess not.
833 return 0;
834}
835
836###########################################################################
837# db_seen reading APIs
838
839
# spent 31.6ms (11.0+20.7) within Mail::SpamAssassin::BayesStore::DBM::seen_get which was called 460 times, avg 69µs/call: # 460 times (11.0ms+20.7ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 421 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 69µs/call
sub seen_get {
8404602.04ms my ($self, $msgid) = @_;
84146030.9ms46020.7ms $self->{db_seen}->{$msgid};
# spent 20.7ms making 460 calls to DB_File::FETCH, avg 45µs/call
842}
843
844
# spent 11.1ms (7.30+3.83) within Mail::SpamAssassin::BayesStore::DBM::seen_put which was called 235 times, avg 47µs/call: # 235 times (7.30ms+3.83ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 485 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 47µs/call
sub seen_put {
8452351.20ms my ($self, $msgid, $seen) = @_;
846
8472352.64ms if ($self->{bayes}->{main}->{learn_to_journal}) {
8482352.46ms2353.83ms $self->defer_update ("m $seen $msgid");
# spent 3.83ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::defer_update, avg 16µs/call
849 }
850 else {
851 $self->_seen_put_direct($msgid, $seen);
852 }
853}
854sub _seen_put_direct {
855 my ($self, $msgid, $seen) = @_;
856 $self->{db_seen}->{$msgid} = $seen;
857}
858
859sub seen_delete {
860 my ($self, $msgid) = @_;
861
862 if ($self->{bayes}->{main}->{learn_to_journal}) {
863 $self->defer_update ("m f $msgid");
864 }
865 else {
866 $self->_seen_delete_direct($msgid);
867 }
868}
869sub _seen_delete_direct {
870 my ($self, $msgid) = @_;
871 delete $self->{db_seen}->{$msgid};
872}
873
874###########################################################################
875# db reading APIs
876
877sub tok_get {
878 my ($self, $tok) = @_;
879 $self->tok_unpack ($self->{db_toks}->{$tok});
880}
881
882sub tok_get_all {
883 my ($self, @tokens) = @_;
884
885 my @tokensdata;
886 foreach my $token (@tokens) {
887 my ($tok_spam, $tok_ham, $atime) = $self->tok_unpack($self->{db_toks}->{$token});
888 push(@tokensdata, [$token, $tok_spam, $tok_ham, $atime]);
889 }
890 return \@tokensdata;
891}
892
893# return the magic tokens in a specific order:
894# 0: scan count base
895# 1: number of spam
896# 2: number of ham
897# 3: number of tokens in db
898# 4: last expire atime
899# 5: oldest token in db atime
900# 6: db version value
901# 7: last journal sync
902# 8: last atime delta
903# 9: last expire reduction count
904# 10: newest token in db atime
905#
906
# spent 177ms (99.0+78.1) within Mail::SpamAssassin::BayesStore::DBM::get_storage_variables which was called 236 times, avg 750µs/call: # 234 times (98.4ms+77.5ms) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 203, avg 752µs/call # 2 times (539µs+562µs) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 307, avg 550µs/call
sub get_storage_variables {
907236580µs my ($self) = @_;
908236538µs my @values;
909
91023617.5ms23612.1ms my $db_ver = $self->{db_toks}->{$DB_VERSION_MAGIC_TOKEN};
# spent 12.1ms making 236 calls to DB_File::FETCH, avg 51µs/call
911
91223813.5ms2341.68ms if (!$db_ver || $db_ver =~ /\D/) { $db_ver = 0; }
# spent 1.68ms making 234 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:match, avg 7µs/call
913
9142363.59ms if ($db_ver >= 2) {
915234663µs my $DB2_LAST_ATIME_DELTA_MAGIC_TOKEN = "\015\001\007\011\003LASTATIMEDELTA";
916234651µs my $DB2_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE";
917234606µs my $DB2_LAST_EXPIRE_REDUCE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIREREDUCE";
918234579µs my $DB2_LAST_JOURNAL_SYNC_MAGIC_TOKEN = "\015\001\007\011\003LASTJOURNALSYNC";
919234635µs my $DB2_NEWEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003NEWESTAGE";
920234539µs my $DB2_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM";
921234516µs my $DB2_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM";
922234555µs my $DB2_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS";
923234576µs my $DB2_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE";
924234608µs my $DB2_RUNNING_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003RUNNINGEXPIRE";
925
926 @values = (
927 0,
928 $self->{db_toks}->{$DB2_NSPAM_MAGIC_TOKEN},
929 $self->{db_toks}->{$DB2_NHAM_MAGIC_TOKEN},
930 $self->{db_toks}->{$DB2_NTOKENS_MAGIC_TOKEN},
931 $self->{db_toks}->{$DB2_LAST_EXPIRE_MAGIC_TOKEN},
932 $self->{db_toks}->{$DB2_OLDEST_TOKEN_AGE_MAGIC_TOKEN},
933 $db_ver,
934 $self->{db_toks}->{$DB2_LAST_JOURNAL_SYNC_MAGIC_TOKEN},
935 $self->{db_toks}->{$DB2_LAST_ATIME_DELTA_MAGIC_TOKEN},
936 $self->{db_toks}->{$DB2_LAST_EXPIRE_REDUCE_MAGIC_TOKEN},
937234101ms210661.2ms $self->{db_toks}->{$DB2_NEWEST_TOKEN_AGE_MAGIC_TOKEN},
# spent 61.2ms making 2106 calls to DB_File::FETCH, avg 29µs/call
938 );
939 }
940 elsif ($db_ver == 0) {
94126µs my $DB0_NSPAM_MAGIC_TOKEN = '**NSPAM';
94225µs my $DB0_NHAM_MAGIC_TOKEN = '**NHAM';
94326µs my $DB0_OLDEST_TOKEN_AGE_MAGIC_TOKEN = '**OLDESTAGE';
94425µs my $DB0_LAST_EXPIRE_MAGIC_TOKEN = '**LASTEXPIRE';
94525µs my $DB0_NTOKENS_MAGIC_TOKEN = '**NTOKENS';
94625µs my $DB0_SCANCOUNT_BASE_MAGIC_TOKEN = '**SCANBASE';
947
948 @values = (
949 $self->{db_toks}->{$DB0_SCANCOUNT_BASE_MAGIC_TOKEN},
950 $self->{db_toks}->{$DB0_NSPAM_MAGIC_TOKEN},
951 $self->{db_toks}->{$DB0_NHAM_MAGIC_TOKEN},
952 $self->{db_toks}->{$DB0_NTOKENS_MAGIC_TOKEN},
953 $self->{db_toks}->{$DB0_LAST_EXPIRE_MAGIC_TOKEN},
9542243µs12121µs $self->{db_toks}->{$DB0_OLDEST_TOKEN_AGE_MAGIC_TOKEN},
# spent 121µs making 12 calls to DB_File::FETCH, avg 10µs/call
955 0,
956 0,
957 0,
958 0,
959 0,
960 );
961 }
962 elsif ($db_ver == 1) {
963 my $DB1_NSPAM_MAGIC_TOKEN = "\015\001\007\011\003NSPAM";
964 my $DB1_NHAM_MAGIC_TOKEN = "\015\001\007\011\003NHAM";
965 my $DB1_OLDEST_TOKEN_AGE_MAGIC_TOKEN = "\015\001\007\011\003OLDESTAGE";
966 my $DB1_LAST_EXPIRE_MAGIC_TOKEN = "\015\001\007\011\003LASTEXPIRE";
967 my $DB1_NTOKENS_MAGIC_TOKEN = "\015\001\007\011\003NTOKENS";
968 my $DB1_SCANCOUNT_BASE_MAGIC_TOKEN = "\015\001\007\011\003SCANBASE";
969
970 @values = (
971 $self->{db_toks}->{$DB1_SCANCOUNT_BASE_MAGIC_TOKEN},
972 $self->{db_toks}->{$DB1_NSPAM_MAGIC_TOKEN},
973 $self->{db_toks}->{$DB1_NHAM_MAGIC_TOKEN},
974 $self->{db_toks}->{$DB1_NTOKENS_MAGIC_TOKEN},
975 $self->{db_toks}->{$DB1_LAST_EXPIRE_MAGIC_TOKEN},
976 $self->{db_toks}->{$DB1_OLDEST_TOKEN_AGE_MAGIC_TOKEN},
977 1,
978 0,
979 0,
980 0,
981 0,
982 );
983 }
984
9852361.52ms foreach (@values) {
986259624.8ms9492.93ms if (!$_ || $_ =~ /\D/) {
# spent 2.93ms making 949 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:match, avg 3µs/call
98716473.04ms $_ = 0;
988 }
989 }
990
9912364.36ms return @values;
992}
993
994sub dump_db_toks {
995 my ($self, $template, $regex, @vars) = @_;
996
997 while (my ($tok, $tokvalue) = each %{$self->{db_toks}}) {
998 next if ($tok =~ MAGIC_RE); # skip magic tokens
999 next if (defined $regex && ($tok !~ /$regex/o));
1000
1001 # We have the value already, so just unpack it.
1002 my ($ts, $th, $atime) = $self->tok_unpack ($tokvalue);
1003
1004 my $prob = $self->{bayes}->_compute_prob_for_token($tok, $vars[1], $vars[2], $ts, $th);
1005 $prob ||= 0.5;
1006
1007 my $encoded_tok = unpack("H*",$tok);
1008 printf $template,$prob,$ts,$th,$atime,$encoded_tok;
1009 }
1010}
1011
1012sub set_last_expire {
1013 my ($self, $time) = @_;
1014 $self->{db_toks}->{$LAST_EXPIRE_MAGIC_TOKEN} = time();
1015}
1016
1017## Don't bother using get_magic_tokens here. This token should only
1018## ever exist when we're running expire, so we don't want to convert it if
1019## it's there and we're not expiring ...
1020sub get_running_expire_tok {
1021 my ($self) = @_;
1022 my $running = $self->{db_toks}->{$RUNNING_EXPIRE_MAGIC_TOKEN};
1023 if (!$running || $running =~ /\D/) { return; }
1024 return $running;
1025}
1026
1027sub set_running_expire_tok {
1028 my ($self) = @_;
1029
1030 # update the lock and running expire magic token
1031 $self->{bayes}->{main}->{locker}->refresh_lock ($self->{locked_file});
1032 $self->{db_toks}->{$RUNNING_EXPIRE_MAGIC_TOKEN} = time();
1033}
1034
1035sub remove_running_expire_tok {
1036 my ($self) = @_;
1037 delete $self->{db_toks}->{$RUNNING_EXPIRE_MAGIC_TOKEN};
1038}
1039
1040###########################################################################
1041
1042# db abstraction: allow deferred writes, since we will be frequently
1043# writing while checking.
1044
1045sub tok_count_change {
1046 my ($self, $ds, $dh, $tok, $atime) = @_;
1047
1048 $atime = 0 unless defined $atime;
1049
1050 if ($self->{bayes}->{main}->{learn_to_journal}) {
1051 # we can't store the SHA1 binary value in the journal, so convert it
1052 # to a printable value that can be converted back later
1053 my $encoded_tok = unpack("H*",$tok);
1054 $self->defer_update ("c $ds $dh $atime $encoded_tok");
1055 } else {
1056 $self->tok_sync_counters ($ds, $dh, $atime, $tok);
1057 }
1058}
1059
1060
# spent 3.41s (2.23+1.18) within Mail::SpamAssassin::BayesStore::DBM::multi_tok_count_change which was called 235 times, avg 14.5ms/call: # 235 times (2.23s+1.18s) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 478 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 14.5ms/call
sub multi_tok_count_change {
10612351.18ms my ($self, $ds, $dh, $tokens, $atime) = @_;
1062
1063235562µs $atime = 0 unless defined $atime;
1064
106547028.6ms foreach my $tok (keys %{$tokens}) {
106673949467ms if ($self->{bayes}->{main}->{learn_to_journal}) {
1067 # we can't store the SHA1 binary value in the journal, so convert it
1068 # to a printable value that can be converted back later
106973949846ms73949183ms my $encoded_tok = unpack("H*",$tok);
# spent 183ms making 73949 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:unpack, avg 2µs/call
107073949767ms739491.00s $self->defer_update ("c $ds $dh $atime $encoded_tok");
# spent 1.00s making 73949 calls to Mail::SpamAssassin::BayesStore::DBM::defer_update, avg 14µs/call
1071 } else {
1072 $self->tok_sync_counters ($ds, $dh, $atime, $tok);
1073 }
1074 }
1075}
1076
1077sub nspam_nham_get {
1078 my ($self) = @_;
1079 my @vars = $self->get_storage_variables();
1080 ($vars[1], $vars[2]);
1081}
1082
1083
# spent 9.44ms (6.50+2.93) within Mail::SpamAssassin::BayesStore::DBM::nspam_nham_change which was called 235 times, avg 40µs/call: # 235 times (6.50ms+2.93ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 477 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 40µs/call
sub nspam_nham_change {
1084235711µs my ($self, $ds, $dh) = @_;
1085
10862352.91ms if ($self->{bayes}->{main}->{learn_to_journal}) {
10872352.91ms2352.93ms $self->defer_update ("n $ds $dh");
# spent 2.93ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::defer_update, avg 12µs/call
1088 } else {
1089 $self->tok_sync_nspam_nham ($ds, $dh);
1090 }
1091}
1092
1093sub tok_touch {
1094 my ($self, $tok, $atime) = @_;
1095 # we can't store the SHA1 binary value in the journal, so convert it
1096 # to a printable value that can be converted back later
1097 my $encoded_tok = unpack("H*", $tok);
1098 $self->defer_update ("t $atime $encoded_tok");
1099}
1100
1101sub tok_touch_all {
1102 my ($self, $tokens, $atime) = @_;
1103
1104 foreach my $token (@{$tokens}) {
1105 # we can't store the SHA1 binary value in the journal, so convert it
1106 # to a printable value that can be converted back later
1107 my $encoded_tok = unpack("H*", $token);
1108 $self->defer_update ("t $atime $encoded_tok");
1109 }
1110}
1111
1112
# spent 1.01s within Mail::SpamAssassin::BayesStore::DBM::defer_update which was called 74419 times, avg 14µs/call: # 73949 times (1.00s+0s) by Mail::SpamAssassin::BayesStore::DBM::multi_tok_count_change at line 1070, avg 14µs/call # 235 times (3.83ms+0s) by Mail::SpamAssassin::BayesStore::DBM::seen_put at line 848, avg 16µs/call # 235 times (2.93ms+0s) by Mail::SpamAssassin::BayesStore::DBM::nspam_nham_change at line 1087, avg 12µs/call
sub defer_update {
111374419218ms my ($self, $str) = @_;
1114744191.10s $self->{string_to_journal} .= "$str\n";
1115}
1116
1117###########################################################################
1118
1119
# spent 98.4ms (26.7+71.8) within Mail::SpamAssassin::BayesStore::DBM::cleanup which was called 235 times, avg 419µs/call: # 235 times (26.7ms+71.8ms) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 486 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 419µs/call
sub cleanup {
1120235544µs my ($self) = @_;
1121
11222351.02ms my $nbytes = length ($self->{string_to_journal});
1123235629µs return if ($nbytes == 0);
1124
11252352.24ms23511.5ms my $path = $self->_get_journal_filename();
# spent 11.5ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::_get_journal_filename, avg 49µs/call
1126
1127 # use append mode, write atomically, then close, so simultaneous updates are
1128 # not lost
1129235811µs my $conf = $self->{bayes}->{main}->{conf};
1130
1131 # set the umask to the inverse of what we want ...
11322353.81ms2351.06ms my $umask = umask(0777 - (oct ($conf->{bayes_file_mode}) & 0666));
# spent 1.06ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 5µs/call
1133
113423532.4ms23529.7ms if (!open (OUT, ">>".$path)) {
# spent 29.7ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:open, avg 126µs/call
1135 warn "bayes: cannot write to $path, bayes db update ignored: $!\n";
1136 umask $umask; # reset umask
1137 return;
1138 }
11392352.09ms235547µs umask $umask; # reset umask
# spent 547µs making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:umask, avg 2µs/call
1140
1141 # do not use print() here, it will break up the buffer if it's >8192 bytes,
1142 # which could result in two sets of tokens getting mixed up and their
1143 # touches missed.
1144235624µs my $write_failure = 0;
11452352.62ms235850µs my $original_point = tell OUT;
# spent 850µs making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:tell, avg 4µs/call
1146235562µs $original_point >= 0 or die "Can't obtain file position: $!";
1147235460µs my $len;
11482351.91ms do {
114923527.7ms23524.8ms $len = syswrite (OUT, $self->{string_to_journal}, $nbytes);
# spent 24.8ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:syswrite, avg 106µs/call
1150
1151 # argh, write failure, give up
1152235781µs if (!defined $len || $len < 0) {
1153 my $err = '';
1154 if (!defined $len) {
1155 $len = 0;
1156 $err = " ($!)";
1157 }
1158 warn "bayes: write failed to Bayes journal $path ($len of $nbytes)!$err\n";
1159 last;
1160 }
1161
1162 # This shouldn't happen, but could if the fs is full...
1163235664µs if ($len != $nbytes) {
1164 warn "bayes: partial write to bayes journal $path ($len of $nbytes), recovering\n";
1165
1166 # we want to be atomic, so revert the journal file back to where
1167 # we know it's "good". if we can't truncate the journal, or we've
1168 # tried 5 times to do the write, abort!
1169 if (!truncate(OUT, $original_point) || ($write_failure++ > 4)) {
1170 warn "bayes: cannot write to bayes journal $path, aborting!\n";
1171 last;
1172 }
1173
1174 # if the fs is full, let's give the system a break
1175 sleep 1;
1176 }
1177 } while ($len != $nbytes);
1178
11792355.32ms2353.27ms if (!close OUT) {
# spent 3.27ms making 235 calls to Mail::SpamAssassin::BayesStore::DBM::CORE:close, avg 14µs/call
1180 warn "bayes: cannot write to $path, bayes db update ignored\n";
1181 }
1182
11832352.77ms $self->{string_to_journal} = '';
1184}
1185
1186# Return a qr'd RE to match a token with the correct format's magic token
1187
# spent 130ms within Mail::SpamAssassin::BayesStore::DBM::get_magic_re which was called 12862 times, avg 10µs/call: # 12862 times (130ms+0s) by Mail::SpamAssassin::Plugin::Bayes::_tokenize_line at line 1168 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 10µs/call
sub get_magic_re {
11881286224.4ms my ($self) = @_;
1189
11901286236.6ms if (!defined $self->{db_version} || $self->{db_version} >= 1) {
119112862129ms return MAGIC_RE;
1192 }
1193
1194 # When in doubt, assume v0
1195 return qr/^\*\*[A-Z]+$/;
1196}
1197
1198# provide a more generalized public interface into the journal sync
1199
1200sub sync {
1201 my ($self, $opts) = @_;
1202
1203 return $self->_sync_journal($opts);
1204}
1205
1206###########################################################################
1207# And this method reads the journal and applies the changes in one
1208# (locked) transaction.
1209
1210sub _sync_journal {
1211 my ($self, $opts) = @_;
1212 my $ret = 0;
1213
1214 my $path = $self->_get_journal_filename();
1215
1216 # if $path doesn't exist, or it's not a file, or is 0 bytes in length, return
1217 if (!stat($path) || !-f _ || -z _) {
1218 return 0;
1219 }
1220
1221 my $eval_stat;
1222 eval {
1223 local $SIG{'__DIE__'}; # do not run user die() traps in here
1224 if ($self->tie_db_writable()) {
1225 $ret = $self->_sync_journal_trapped($opts, $path);
1226 }
1227 1;
1228 } or do {
1229 $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
1230 };
1231
1232 # ok, untie from write-mode if we can
1233 if (!$self->{bayes}->{main}->{learn_caller_will_untie}) {
1234 $self->untie_db();
1235 }
1236
1237 # handle any errors that may have occurred
1238 if (defined $eval_stat) {
1239 warn "bayes: $eval_stat\n";
1240 return 0;
1241 }
1242
1243 $ret;
1244}
1245
1246sub _sync_journal_trapped {
1247 my ($self, $opts, $path) = @_;
1248
1249 # Flag that we're doing work
1250 $self->set_running_expire_tok();
1251
1252 my $started = time();
1253 my $count = 0;
1254 my $total_count = 0;
1255 my %tokens;
1256 my $showdots = $opts->{showdots};
1257 my $retirepath = $path.".old";
1258
1259 # if $path doesn't exist, or it's not a file, or is 0 bytes in length,
1260 # return we have to check again since the file may have been removed
1261 # by a recent bayes db upgrade ...
1262 if (!stat($path) || !-f _ || -z _) {
1263 return 0;
1264 }
1265
1266 if (!-r $path) { # will we be able to read the file?
1267 warn "bayes: bad permissions on journal, can't read: $path\n";
1268 return 0;
1269 }
1270
1271 # This is the critical phase (moving files around), so don't allow
1272 # it to be interrupted.
1273 {
1274 local $SIG{'INT'} = 'IGNORE';
1275 local $SIG{'TERM'} = 'IGNORE';
1276 local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows();
1277
1278 # retire the journal, so we can update the db files from it in peace.
1279 # TODO: use locking here
1280 if (!rename ($path, $retirepath)) {
1281 warn "bayes: failed rename $path to $retirepath\n";
1282 return 0;
1283 }
1284
1285 # now read the retired journal
1286 local *JOURNAL;
1287 if (!open (JOURNAL, "<$retirepath")) {
1288 warn "bayes: cannot open read $retirepath\n";
1289 return 0;
1290 }
1291
1292
1293 # Read the journal
1294 for ($!=0; defined($_=<JOURNAL>); $!=0) {
1295 $total_count++;
1296
1297 if (/^t (\d+) (.+)$/) { # Token timestamp update, cache resultant entries
1298 my $tok = pack("H*",$2);
1299 $tokens{$tok} = $1+0 if (!exists $tokens{$tok} || $1+0 > $tokens{$tok});
1300 } elsif (/^c (-?\d+) (-?\d+) (\d+) (.+)$/) { # Add/full token update
1301 my $tok = pack("H*",$4);
1302 $self->tok_sync_counters ($1+0, $2+0, $3+0, $tok);
1303 $count++;
1304 } elsif (/^n (-?\d+) (-?\d+)$/) { # update ham/spam count
1305 $self->tok_sync_nspam_nham ($1+0, $2+0);
1306 $count++;
1307 } elsif (/^m ([hsf]) (.+)$/) { # update msgid seen database
1308 if ($1 eq "f") {
1309 $self->_seen_delete_direct($2);
1310 }
1311 else {
1312 $self->_seen_put_direct($2,$1);
1313 }
1314 $count++;
1315 } else {
1316 warn "bayes: gibberish entry found in journal: $_";
1317 }
1318 }
1319 defined $_ || $!==0 or
1320 $!==EBADF ? dbg("bayes: error reading journal file: $!")
1321 : die "error reading journal file: $!";
1322 close(JOURNAL) or die "Can't close journal file: $!";
1323
1324 # Now that we've determined what tokens we need to update and their
1325 # final values, update the DB. Should be much smaller than the full
1326 # journal entries.
1327 while (my ($k,$v) = each %tokens) {
1328 $self->tok_touch_token ($v, $k);
1329
1330 if ((++$count % 1000) == 0) {
1331 if ($showdots) { print STDERR "."; }
1332 $self->set_running_expire_tok();
1333 }
1334 }
1335
1336 if ($showdots) { print STDERR "\n"; }
1337
1338 # we're all done, so unlink the old journal file
1339 unlink ($retirepath) || warn "bayes: can't unlink $retirepath: $!\n";
1340
1341 $self->{db_toks}->{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = $started;
1342
1343 my $done = time();
1344 my $msg = ("bayes: synced databases from journal in " .
1345 ($done - $started) .
1346 " seconds: $count unique entries ($total_count total entries)");
1347
1348 if ($opts->{verbose}) {
1349 print $msg,"\n";
1350 } else {
1351 dbg($msg);
1352 }
1353 }
1354
1355 # else, that's the lot, we're synced. return
1356 return 1;
1357}
1358
1359sub tok_touch_token {
1360 my ($self, $atime, $tok) = @_;
1361 my ($ts, $th, $oldatime) = $self->tok_get ($tok);
1362
1363 # If the new atime is < the old atime, ignore the update
1364 # We figure that we'll never want to lower a token atime, so abort if
1365 # we try. (journal out of sync, etc.)
1366 return if ($oldatime >= $atime);
1367
1368 $self->tok_put ($tok, $ts, $th, $atime);
1369}
1370
1371sub tok_sync_counters {
1372 my ($self, $ds, $dh, $atime, $tok) = @_;
1373 my ($ts, $th, $oldatime) = $self->tok_get ($tok);
1374 $ts += $ds; if ($ts < 0) { $ts = 0; }
1375 $th += $dh; if ($th < 0) { $th = 0; }
1376
1377 # Don't roll the atime of tokens backwards ...
1378 $atime = $oldatime if ($oldatime > $atime);
1379
1380 $self->tok_put ($tok, $ts, $th, $atime);
1381}
1382
1383sub tok_put {
1384 my ($self, $tok, $ts, $th, $atime) = @_;
1385 $ts ||= 0;
1386 $th ||= 0;
1387
1388 # Ignore magic tokens, the don't go in this way ...
1389 return if ($tok =~ MAGIC_RE);
1390
1391 # use defined() rather than exists(); the latter is not supported
1392 # by NDBM_File, believe it or not. Using defined() did not
1393 # indicate any noticeable speed hit in my testing. (Mar 31 2003 jm)
1394 my $exists_already = defined $self->{db_toks}->{$tok};
1395
1396 if ($ts == 0 && $th == 0) {
1397 return if (!$exists_already); # If the token doesn't exist, just return
1398 $self->{db_toks}->{$NTOKENS_MAGIC_TOKEN}--;
1399 delete $self->{db_toks}->{$tok};
1400 } else {
1401 if (!$exists_already) { # If the token doesn't exist, raise the token count
1402 $self->{db_toks}->{$NTOKENS_MAGIC_TOKEN}++;
1403 }
1404
1405 $self->{db_toks}->{$tok} = $self->tok_pack ($ts, $th, $atime);
1406
1407 my $newmagic = $self->{db_toks}->{$NEWEST_TOKEN_AGE_MAGIC_TOKEN};
1408 if (!defined ($newmagic) || $atime > $newmagic) {
1409 $self->{db_toks}->{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $atime;
1410 }
1411
1412 # Make sure to check for either !defined or "" ... Apparently
1413 # sometimes the DB module doesn't return the value correctly. :(
1414 my $oldmagic = $self->{db_toks}->{$OLDEST_TOKEN_AGE_MAGIC_TOKEN};
1415 if (!defined ($oldmagic) || $oldmagic eq "" || $atime < $oldmagic) {
1416 $self->{db_toks}->{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $atime;
1417 }
1418 }
1419}
1420
1421sub tok_sync_nspam_nham {
1422 my ($self, $ds, $dh) = @_;
1423 my ($ns, $nh) = ($self->get_storage_variables())[1,2];
1424 if ($ds) { $ns += $ds; } if ($ns < 0) { $ns = 0; }
1425 if ($dh) { $nh += $dh; } if ($nh < 0) { $nh = 0; }
1426 $self->{db_toks}->{$NSPAM_MAGIC_TOKEN} = $ns;
1427 $self->{db_toks}->{$NHAM_MAGIC_TOKEN} = $nh;
1428}
1429
1430###########################################################################
1431
1432
# spent 11.5ms (6.33+5.15) within Mail::SpamAssassin::BayesStore::DBM::_get_journal_filename which was called 235 times, avg 49µs/call: # 235 times (6.33ms+5.15ms) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1125, avg 49µs/call
sub _get_journal_filename {
1433235563µs my ($self) = @_;
1434
1435235820µs my $main = $self->{bayes}->{main};
14362354.83ms2355.15ms return $main->sed_path($main->{conf}->{bayes_path}."_journal");
# spent 5.15ms making 235 calls to Mail::SpamAssassin::sed_path, avg 22µs/call
1437}
1438
1439###########################################################################
1440
1441# this is called directly from sa-learn(1).
1442sub perform_upgrade {
1443 my ($self, $opts) = @_;
1444 my $ret = 0;
1445
1446 my $eval_stat;
1447 eval {
1448 local $SIG{'__DIE__'}; # do not run user die() traps in here
1449
145024.55ms2477µs
# spent 257µs (37+220) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1450 which was called: # once (37µs+220µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1450
use File::Basename;
# spent 257µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1450 # spent 220µs making 1 call to Exporter::import
1451
1452 # bayes directory
1453 my $main = $self->{bayes}->{main};
1454 my $path = $main->sed_path($main->{conf}->{bayes_path});
1455
1456 # prevent dirname() from tainting the result, it assumes $1 is not tainted
1457 local($1,$2,$3); # Bug 6310; perl #67962 (fixed in perl 5.12/5.13)
1458 my $dir = dirname($path);
1459
1460 # make temporary copy since old dbm and new dbm may have same name
1461 opendir(DIR, $dir) or die "bayes: can't opendir $dir: $!";
1462 my @files = grep { /^bayes_(?:seen|toks)(?:\.\w+)?$/ } readdir(DIR);
1463 closedir(DIR) or die "bayes: can't close directory $dir: $!";
1464 if (@files < 2 || !grep(/bayes_seen/,@files) || !grep(/bayes_toks/,@files))
1465 {
1466 die "bayes: unable to find bayes_toks and bayes_seen, stopping\n";
1467 }
1468 # untaint @files (already safe after grep)
1469 untaint_var(\@files);
1470
1471 for (@files) {
1472 my $src = "$dir/$_";
1473 my $dst = "$dir/old_$_";
1474 eval q{
1475 use File::Copy;
1476 copy($src, $dst);
1477 } || die "bayes: can't copy $src to $dst: $!\n";
1478 }
1479
1480 # delete previous to make way for import
1481 for (@files) { unlink("$dir/$_"); }
1482
1483 # import
1484 if ($self->tie_db_writable()) {
1485 $ret += $self->upgrade_old_dbm_files_trapped("$dir/old_bayes_seen",
1486 $self->{db_seen});
1487 $ret += $self->upgrade_old_dbm_files_trapped("$dir/old_bayes_toks",
1488 $self->{db_toks});
1489 }
1490
1491 if ($ret == 2) {
1492 print "import successful, original files saved with \"old\" prefix\n";
1493 }
1494 else {
1495 print "import failed, original files saved with \"old\" prefix\n";
1496 }
1497 1;
1498 } or do {
1499 $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
1500 };
1501
1502 $self->untie_db();
1503
1504 # if we died, untie the dbm files
1505 if (defined $eval_stat) {
1506 warn "bayes: perform_upgrade: $eval_stat\n";
1507 return 0;
1508 }
1509 $ret;
1510}
1511
1512sub upgrade_old_dbm_files_trapped {
1513 my ($self, $filename, $output) = @_;
1514
1515 my $count;
1516 my %in;
1517
1518 print "upgrading to DB_File, please be patient: $filename\n";
1519
1520 # try each type of file until we find one with > 0 entries
1521 for my $dbm ('DB_File', 'GDBM_File', 'NDBM_File', 'SDBM_File') {
1522 $count = 0;
1523 # wrap in eval so it doesn't run in general use. This accesses db
1524 # modules directly.
1525 # Note: (bug 2390), the 'use' needs to be on the same line as the eval
1526 # for RPM dependency checks to work properly. It's lame, but...
1527 my $eval_stat;
1528 eval 'use ' . $dbm . ';
1529 tie %in, "' . $dbm . '", $filename, O_RDONLY, 0600;
1530 %{ $output } = %in;
1531 $count = scalar keys %{ $output };
1532 untie %in;
1533 1;
1534 ' or do {
1535 $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
1536 };
1537 if (defined $eval_stat) {
1538 print "$dbm: $dbm module not installed(?), nothing copied: $eval_stat\n";
1539 dbg("bayes: error was: $eval_stat");
1540 }
1541 elsif ($count == 0) {
1542 print "$dbm: no database of that kind found, nothing copied\n";
1543 }
1544 else {
1545 print "$dbm: copied $count entries\n";
1546 return 1;
1547 }
1548 }
1549
1550 return 0;
1551}
1552
1553sub clear_database {
1554 my ($self) = @_;
1555
1556 return 0 unless ($self->tie_db_writable());
1557
1558 dbg("bayes: untie-ing in preparation for removal.");
1559
1560 foreach my $dbname (@DBNAMES) {
1561 my $db_var = 'db_'.$dbname;
1562
1563 if (exists $self->{$db_var}) {
1564 # dbg("bayes: untie-ing $db_var");
1565 untie %{$self->{$db_var}};
1566 delete $self->{$db_var};
1567 }
1568 }
1569
1570 my $path = $self->{bayes}->{main}->sed_path($self->{bayes}->{main}->{conf}->{bayes_path});
1571
1572 foreach my $dbname (@DBNAMES, 'journal') {
1573 foreach my $ext ($self->DB_EXTENSIONS) {
1574 my $name = $path.'_'.$dbname.$ext;
1575 my $ret = unlink $name;
1576 dbg("bayes: clear_database: %s %s",
1577 $ret ? 'removed' : 'tried to remove', $name);
1578 }
1579 }
1580
1581 # the journal file needs to be done separately since it has no extension
1582 foreach my $dbname ('journal') {
1583 my $name = $path.'_'.$dbname;
1584 my $ret = unlink $name;
1585 dbg("bayes: clear_database: %s %s",
1586 $ret ? 'removed' : 'tried to remove', $name);
1587 }
1588
1589 $self->untie_db();
1590
1591 return 1;
1592}
1593
1594sub backup_database {
1595 my ($self) = @_;
1596
1597 # we tie writable because we want the upgrade code to kick in if needed
1598 return 0 unless ($self->tie_db_writable());
1599
1600 my @vars = $self->get_storage_variables();
1601
1602 print "v\t$vars[6]\tdb_version # this must be the first line!!!\n";
1603 print "v\t$vars[1]\tnum_spam\n";
1604 print "v\t$vars[2]\tnum_nonspam\n";
1605
1606 while (my ($tok, $packed) = each %{$self->{db_toks}}) {
1607 next if ($tok =~ MAGIC_RE); # skip magic tokens
1608
1609 my ($ts, $th, $atime) = $self->tok_unpack($packed);
1610 my $encoded_token = unpack("H*",$tok);
1611 print "t\t$ts\t$th\t$atime\t$encoded_token\n";
1612 }
1613
1614 while (my ($msgid, $flag) = each %{$self->{db_seen}}) {
1615 print "s\t$flag\t$msgid\n";
1616 }
1617
1618 $self->untie_db();
1619
1620 return 1;
1621}
1622
1623sub restore_database {
1624 my ($self, $filename, $showdots) = @_;
1625
1626 local *DUMPFILE;
1627 if (!open(DUMPFILE, '<', $filename)) {
1628 dbg("bayes: unable to open backup file $filename: $!");
1629 return 0;
1630 }
1631
1632 if (!$self->tie_db_writable()) {
1633 dbg("bayes: failed to tie db writable");
1634 return 0;
1635 }
1636
1637 my $main = $self->{bayes}->{main};
1638 my $path = $main->sed_path($main->{conf}->{bayes_path});
1639
1640 # use a temporary PID-based suffix just in case another one was
1641 # created previously by an interrupted expire
1642 my $tmpsuffix = "convert$$";
1643 my $tmptoksdbname = $path.'_toks.'.$tmpsuffix;
1644 my $tmpseendbname = $path.'_seen.'.$tmpsuffix;
1645 my $toksdbname = $path.'_toks';
1646 my $seendbname = $path.'_seen';
1647
1648 my %new_toks;
1649 my %new_seen;
1650 my $umask = umask 0;
1651 unless (tie %new_toks, $self->DBM_MODULE, $tmptoksdbname, O_RDWR|O_CREAT|O_EXCL,
1652 (oct ($main->{conf}->{bayes_file_mode}) & 0666)) {
1653 dbg("bayes: failed to tie temp toks db: $!");
1654 $self->untie_db();
1655 umask $umask;
1656 return 0;
1657 }
1658 unless (tie %new_seen, $self->DBM_MODULE, $tmpseendbname, O_RDWR|O_CREAT|O_EXCL,
1659 (oct ($main->{conf}->{bayes_file_mode}) & 0666)) {
1660 dbg("bayes: failed to tie temp seen db: $!");
1661 untie %new_toks;
1662 $self->_unlink_file($tmptoksdbname);
1663 $self->untie_db();
1664 umask $umask;
1665 return 0;
1666 }
1667 umask $umask;
1668
1669 my $line_count = 0;
1670 my $db_version;
1671 my $token_count = 0;
1672 my $num_spam;
1673 my $num_ham;
1674 my $error_p = 0;
1675 my $newest_token_age = 0;
1676 # Kinda wierd I know, but we need a nice big value and we know there will be
1677 # no tokens > time() since we reset atime if > time(), so use that with a
1678 # little buffer just in case.
1679 my $oldest_token_age = time() + 100000;
1680
1681 my $line = <DUMPFILE>;
1682 defined $line or die "Error reading dump file: $!";
1683 $line_count++;
1684
1685 # We require the database version line to be the first in the file so we can
1686 # figure out how to properly deal with the file. If it is not the first
1687 # line then fail
1688 if ($line =~ m/^v\s+(\d+)\s+db_version/) {
1689 $db_version = $1;
1690 }
1691 else {
1692 dbg("bayes: database version must be the first line in the backup file, correct and re-run");
1693 untie %new_toks;
1694 untie %new_seen;
1695 $self->_unlink_file($tmptoksdbname);
1696 $self->_unlink_file($tmpseendbname);
1697 $self->untie_db();
1698 return 0;
1699 }
1700
1701 unless ($db_version == 2 || $db_version == 3) {
1702 warn("bayes: database version $db_version is unsupported, must be version 2 or 3");
1703 untie %new_toks;
1704 untie %new_seen;
1705 $self->_unlink_file($tmptoksdbname);
1706 $self->_unlink_file($tmpseendbname);
1707 $self->untie_db();
1708 return 0;
1709 }
1710
1711 for ($!=0; defined($line=<DUMPFILE>); $!=0) {
1712 chomp($line);
1713 $line_count++;
1714
1715 if ($line_count % 1000 == 0) {
1716 print STDERR "." if ($showdots);
1717 }
1718
1719 if ($line =~ /^v\s+/) { # variable line
1720 my @parsed_line = split(/\s+/, $line, 3);
1721 my $value = $parsed_line[1] + 0;
1722 if ($parsed_line[2] eq 'num_spam') {
1723 $num_spam = $value;
1724 }
1725 elsif ($parsed_line[2] eq 'num_nonspam') {
1726 $num_ham = $value;
1727 }
1728 else {
1729 dbg("bayes: restore_database: skipping unknown line: $line");
1730 }
1731 }
1732 elsif ($line =~ /^t\s+/) { # token line
1733 my @parsed_line = split(/\s+/, $line, 5);
1734 my $spam_count = $parsed_line[1] + 0;
1735 my $ham_count = $parsed_line[2] + 0;
1736 my $atime = $parsed_line[3] + 0;
1737 my $token = $parsed_line[4];
1738
1739 my $token_warn_p = 0;
1740 my @warnings;
1741
1742 if ($spam_count < 0) {
1743 $spam_count = 0;
1744 push(@warnings, 'spam count < 0, resetting');
1745 $token_warn_p = 1;
1746 }
1747 if ($ham_count < 0) {
1748 $ham_count = 0;
1749 push(@warnings, 'ham count < 0, resetting');
1750 $token_warn_p = 1;
1751 }
1752
1753 if ($spam_count == 0 && $ham_count == 0) {
1754 dbg("bayes: token has zero spam and ham count, skipping");
1755 next;
1756 }
1757
1758 if ($atime > time()) {
1759 $atime = time();
1760 push(@warnings, 'atime > current time, resetting');
1761 $token_warn_p = 1;
1762 }
1763
1764 if ($token_warn_p) {
1765 dbg("bayes: token (%s) has the following warnings:\n%s",
1766 $token, join("\n",@warnings));
1767 }
1768
1769 # database versions < 3 did not encode their token values
1770 if ($db_version < 3) {
1771 $token = substr(sha1($token), -5);
1772 }
1773 else {
1774 # turn unpacked binary token back into binary value
1775 $token = pack("H*",$token);
1776 }
1777
1778 $new_toks{$token} = $self->tok_pack($spam_count, $ham_count, $atime);
1779 if ($atime < $oldest_token_age) {
1780 $oldest_token_age = $atime;
1781 }
1782 if ($atime > $newest_token_age) {
1783 $newest_token_age = $atime;
1784 }
1785 $token_count++;
1786 }
1787 elsif ($line =~ /^s\s+/) { # seen line
1788 my @parsed_line = split(/\s+/, $line, 3);
1789 my $flag = $parsed_line[1];
1790 my $msgid = $parsed_line[2];
1791
1792 unless ($flag eq 'h' || $flag eq 's') {
1793 dbg("bayes: unknown seen flag ($flag) for line: $line, skipping");
1794 next;
1795 }
1796
1797 unless ($msgid) {
1798 dbg("bayes: blank msgid for line: $line, skipping");
1799 next;
1800 }
1801
1802 $new_seen{$msgid} = $flag;
1803 }
1804 else {
1805 dbg("bayes: skipping unknown line: $line");
1806 next;
1807 }
1808 }
1809 defined $line || $!==0 or die "Error reading dump file: $!";
1810 close(DUMPFILE) or die "Can't close dump file: $!";
1811
1812 print STDERR "\n" if ($showdots);
1813
1814 unless (defined($num_spam)) {
1815 dbg("bayes: unable to find num spam, please check file");
1816 $error_p = 1;
1817 }
1818
1819 unless (defined($num_ham)) {
1820 dbg("bayes: unable to find num ham, please check file");
1821 $error_p = 1;
1822 }
1823
1824 if ($error_p) {
1825 dbg("bayes: error(s) while attempting to load $filename, correct and re-run");
1826
1827 untie %new_toks;
1828 untie %new_seen;
1829 $self->_unlink_file($tmptoksdbname);
1830 $self->_unlink_file($tmpseendbname);
1831 $self->untie_db();
1832 return 0;
1833 }
1834
1835 # set the calculated magic tokens
1836 $new_toks{$DB_VERSION_MAGIC_TOKEN} = $self->DB_VERSION();
1837 $new_toks{$NTOKENS_MAGIC_TOKEN} = $token_count;
1838 $new_toks{$NSPAM_MAGIC_TOKEN} = $num_spam;
1839 $new_toks{$NHAM_MAGIC_TOKEN} = $num_ham;
1840 $new_toks{$NEWEST_TOKEN_AGE_MAGIC_TOKEN} = $newest_token_age;
1841 $new_toks{$OLDEST_TOKEN_AGE_MAGIC_TOKEN} = $oldest_token_age;
1842
1843 # go ahead and zero out these, chances are good that they are bogus anyway.
1844 $new_toks{$LAST_EXPIRE_MAGIC_TOKEN} = 0;
1845 $new_toks{$LAST_JOURNAL_SYNC_MAGIC_TOKEN} = 0;
1846 $new_toks{$LAST_ATIME_DELTA_MAGIC_TOKEN} = 0;
1847 $new_toks{$LAST_EXPIRE_REDUCE_MAGIC_TOKEN} = 0;
1848
1849 local $SIG{'INT'} = 'IGNORE';
1850 local $SIG{'TERM'} = 'IGNORE';
1851 local $SIG{'HUP'} = 'IGNORE' if !am_running_on_windows();
1852
1853 untie %new_toks;
1854 untie %new_seen;
1855 $self->untie_db();
1856
1857 # Here is where something can go horribly wrong and screw up the bayes
1858 # database files. If we are able to copy one and not the other then it
1859 # will leave the database in an inconsistent state. Since this is an
1860 # edge case, and they're trying to replace the DB anyway we should be ok.
1861 unless ($self->_rename_file($tmptoksdbname, $toksdbname)) {
1862 dbg("bayes: error while renaming $tmptoksdbname to $toksdbname: $!");
1863 return 0;
1864 }
1865 unless ($self->_rename_file($tmpseendbname, $seendbname)) {
1866 dbg("bayes: error while renaming $tmpseendbname to $seendbname: $!");
1867 dbg("bayes: database now in inconsistent state");
1868 return 0;
1869 }
1870
1871 dbg("bayes: parsed $line_count lines");
1872 dbg("bayes: created database with $token_count tokens based on $num_spam spam messages and $num_ham ham messages");
1873
1874 return 1;
1875}
1876
1877###########################################################################
1878
1879# token marshalling format for db_toks.
1880
1881# Since we may have many entries with few hits, especially thousands of hapaxes
1882# (1-occurrence entries), use a flexible entry format, instead of simply "2
1883# packed ints", to keep the memory and disk space usage down. In my
1884# 18k-message test corpus, only 8.9% have >= 8 hits in either counter, so we
1885# can use a 1-byte representation for the other 91% of low-hitting entries
1886# and save masses of space.
1887
1888# This looks like: XXSSSHHH (XX = format bits, SSS = 3 spam-count bits, HHH = 3
1889# ham-count bits). If XX in the first byte is 11, it's packed as this 1-byte
1890# representation; otherwise, if XX in the first byte is 00, it's packed as
1891# "CLL", ie. 1 byte and 2 32-bit "longs" in perl pack format.
1892
1893# Savings: roughly halves size of toks db, at the cost of a ~10% slowdown.
1894
1895290µs2456µs
# spent 243µs (30+213) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1895 which was called: # once (30µs+213µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1895
use constant FORMAT_FLAG => 0xc0; # 11000000
# spent 243µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1895 # spent 213µs making 1 call to constant::import
1896262µs2380µs
# spent 208µs (35+172) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1896 which was called: # once (35µs+172µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1896
use constant ONE_BYTE_FORMAT => 0xc0; # 11000000
# spent 208µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1896 # spent 172µs making 1 call to constant::import
1897288µs2415µs
# spent 218µs (22+197) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1897 which was called: # once (22µs+197µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1897
use constant TWO_LONGS_FORMAT => 0x00; # 00000000
# spent 218µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1897 # spent 197µs making 1 call to constant::import
1898
1899274µs2406µs
# spent 218µs (31+188) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1899 which was called: # once (31µs+188µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1899
use constant ONE_BYTE_SSS_BITS => 0x38; # 00111000
# spent 218µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1899 # spent 188µs making 1 call to constant::import
19002982µs2365µs
# spent 201µs (38+164) within Mail::SpamAssassin::BayesStore::DBM::BEGIN@1900 which was called: # once (38µs+164µs) by Mail::SpamAssassin::Plugin::Bayes::learner_new at line 1900
use constant ONE_BYTE_HHH_BITS => 0x07; # 00000111
# spent 201µs making 1 call to Mail::SpamAssassin::BayesStore::DBM::BEGIN@1900 # spent 164µs making 1 call to constant::import
1901
1902sub tok_unpack {
1903 my ($self, $value) = @_;
1904 $value ||= 0;
1905
1906 my ($packed, $atime);
1907 if ($self->{db_version} >= 1) {
1908 ($packed, $atime) = unpack("CV", $value);
1909 }
1910 elsif ($self->{db_version} == 0) {
1911 ($packed, $atime) = unpack("CS", $value);
1912 }
1913
1914 if (($packed & FORMAT_FLAG) == ONE_BYTE_FORMAT) {
1915 return (($packed & ONE_BYTE_SSS_BITS) >> 3,
1916 $packed & ONE_BYTE_HHH_BITS,
1917 $atime || 0);
1918 }
1919 elsif (($packed & FORMAT_FLAG) == TWO_LONGS_FORMAT) {
1920 my ($packed, $ts, $th, $atime);
1921 if ($self->{db_version} >= 1) {
1922 ($packed, $ts, $th, $atime) = unpack("CVVV", $value);
1923 }
1924 elsif ($self->{db_version} == 0) {
1925 ($packed, $ts, $th, $atime) = unpack("CLLS", $value);
1926 }
1927 return ($ts || 0, $th || 0, $atime || 0);
1928 }
1929 # other formats would go here...
1930 else {
1931 warn "bayes: unknown packing format for bayes db, please re-learn: $packed";
1932 return (0, 0, 0);
1933 }
1934}
1935
1936sub tok_pack {
1937 my ($self, $ts, $th, $atime) = @_;
1938 $ts ||= 0; $th ||= 0; $atime ||= 0;
1939 if ($ts < 8 && $th < 8) {
1940 return pack ("CV", ONE_BYTE_FORMAT | ($ts << 3) | $th, $atime);
1941 } else {
1942 return pack ("CVVV", TWO_LONGS_FORMAT, $ts, $th, $atime);
1943 }
1944}
1945
1946###########################################################################
1947
1948
# spent 12µs within Mail::SpamAssassin::BayesStore::DBM::db_readable which was called: # once (12µs+0s) by Mail::SpamAssassin::Plugin::Bayes::learner_close at line 330 of Mail/SpamAssassin/Plugin/Bayes.pm
sub db_readable {
194912µs my ($self) = @_;
1950115µs return $self->{already_tied};
1951}
1952
1953sub db_writable {
1954 my ($self) = @_;
1955 return $self->{already_tied} && $self->{is_locked};
1956}
1957
1958###########################################################################
1959
1960sub _unlink_file {
1961 my ($self, $filename) = @_;
1962
1963 unlink $filename;
1964}
1965
1966sub _rename_file {
1967 my ($self, $sourcefilename, $targetfilename) = @_;
1968
1969 return 0 unless (rename($sourcefilename, $targetfilename));
1970
1971 return 1;
1972}
1973
1974sub sa_die { Mail::SpamAssassin::sa_die(@_); }
1975
1976128µs1;
 
# spent 3.27ms within Mail::SpamAssassin::BayesStore::DBM::CORE:close which was called 235 times, avg 14µs/call: # 235 times (3.27ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1179, avg 14µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:close; # opcode
# spent 35µs within Mail::SpamAssassin::BayesStore::DBM::CORE:ftdir which was called 2 times, avg 17µs/call: # 2 times (35µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 267, avg 17µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:ftdir; # opcode
# spent 12.3ms within Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile which was called 241 times, avg 51µs/call: # 238 times (12.3ms+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly at line 162, avg 52µs/call # 3 times (33µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 260, avg 11µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:ftfile; # opcode
# spent 4.61ms within Mail::SpamAssassin::BayesStore::DBM::CORE:match which was called 1183 times, avg 4µs/call: # 949 times (2.93ms+0s) by Mail::SpamAssassin::BayesStore::DBM::get_storage_variables at line 986, avg 3µs/call # 234 times (1.68ms+0s) by Mail::SpamAssassin::BayesStore::DBM::get_storage_variables at line 912, avg 7µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:match; # opcode
# spent 29.7ms within Mail::SpamAssassin::BayesStore::DBM::CORE:open which was called 235 times, avg 126µs/call: # 235 times (29.7ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1134, avg 126µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:open; # opcode
# spent 9µs within Mail::SpamAssassin::BayesStore::DBM::CORE:qr which was called: # once (9µs+0s) by Mail::SpamAssassin::BayesStore::DBM::BEGIN@41 at line 41
sub Mail::SpamAssassin::BayesStore::DBM::CORE:qr; # opcode
# spent 24.8ms within Mail::SpamAssassin::BayesStore::DBM::CORE:syswrite which was called 235 times, avg 106µs/call: # 235 times (24.8ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1149, avg 106µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:syswrite; # opcode
# spent 850µs within Mail::SpamAssassin::BayesStore::DBM::CORE:tell which was called 235 times, avg 4µs/call: # 235 times (850µs+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1145, avg 4µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:tell; # opcode
# spent 1.62ms within Mail::SpamAssassin::BayesStore::DBM::CORE:umask which was called 474 times, avg 3µs/call: # 235 times (1.06ms+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1132, avg 5µs/call # 235 times (547µs+0s) by Mail::SpamAssassin::BayesStore::DBM::cleanup at line 1139, avg 2µs/call # 2 times (7µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 289, avg 3µs/call # 2 times (5µs+0s) by Mail::SpamAssassin::BayesStore::DBM::tie_db_writable at line 304, avg 3µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:umask; # opcode
# spent 183ms within Mail::SpamAssassin::BayesStore::DBM::CORE:unpack which was called 73949 times, avg 2µs/call: # 73949 times (183ms+0s) by Mail::SpamAssassin::BayesStore::DBM::multi_tok_count_change at line 1069, avg 2µs/call
sub Mail::SpamAssassin::BayesStore::DBM::CORE:unpack; # opcode