NYTProf Performance Profile (line view)
For /usr/local/bin/sa-learn
  Run on Tue Nov 7 05:38:10 2017
Reported on Tue Nov 7 06:16:03 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm
Statements: Executed 30385 statements in 605ms
Subroutines
(P = number of distinct call sites, F = number of distinct calling files; exclusive time excludes time spent in called subroutines, inclusive time includes it.)

Calls  P  F  Excl. time  Incl. time  Subroutine
240    1  1  349ms       1575s       Mail::SpamAssassin::ArchiveIterator::_run_file
240    1  1  69.0ms      1575s       Mail::SpamAssassin::ArchiveIterator::_run_message
14285  3  1  38.3ms      38.3ms      Mail::SpamAssassin::ArchiveIterator::CORE:match (opcode)
692    2  1  19.8ms      19.8ms      Mail::SpamAssassin::ArchiveIterator::CORE:read (opcode)
240    1  1  19.6ms      22.0ms      Mail::SpamAssassin::ArchiveIterator::_index_pack
240    1  1  16.8ms      54.0ms      Mail::SpamAssassin::ArchiveIterator::_scan_file
240    1  1  14.3ms      14.3ms      Mail::SpamAssassin::ArchiveIterator::CORE:open (opcode)
722    4  1  12.4ms      12.4ms      Mail::SpamAssassin::ArchiveIterator::CORE:stat (opcode)
1      1  1  12.0ms      1575s       Mail::SpamAssassin::ArchiveIterator::_run
240    1  1  11.5ms      28.7ms      Mail::SpamAssassin::ArchiveIterator::_mail_open
2      1  1  11.1ms      72.9ms      Mail::SpamAssassin::ArchiveIterator::_scan_directory
240    1  1  5.75ms      27.8ms      Mail::SpamAssassin::ArchiveIterator::__ANON__[:305]
240    1  1  4.13ms      6.40ms      Mail::SpamAssassin::ArchiveIterator::_index_unpack
240    1  1  2.88ms      2.88ms      Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime
1      1  1  2.47ms      3.49ms      Mail::SpamAssassin::ArchiveIterator::BEGIN@31
240    1  1  2.43ms      2.43ms      Mail::SpamAssassin::ArchiveIterator::CORE:pack (opcode)
240    2  1  2.42ms      2.42ms      Mail::SpamAssassin::ArchiveIterator::CORE:close (opcode)
240    1  1  2.26ms      2.26ms      Mail::SpamAssassin::ArchiveIterator::CORE:unpack (opcode)
722    4  1  1.67ms      1.67ms      Mail::SpamAssassin::ArchiveIterator::CORE:ftfile (opcode)
240    1  1  1.49ms      1.49ms      Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress
240    1  1  1.01ms      1.01ms      Mail::SpamAssassin::ArchiveIterator::CORE:binmode (opcode)
245    2  1  864µs       864µs       Mail::SpamAssassin::ArchiveIterator::CORE:ftsize (opcode)
2      1  1  668µs       668µs       Mail::SpamAssassin::ArchiveIterator::CORE:readdir (opcode)
1      1  1  331µs       73.7ms      Mail::SpamAssassin::ArchiveIterator::_scan_targets
2      1  1  129µs       129µs       Mail::SpamAssassin::ArchiveIterator::CORE:glob (opcode)
2      1  1  125µs       155µs       Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts
2      1  1  99µs        247µs       Mail::SpamAssassin::ArchiveIterator::_fix_globs
1      1  1  97µs        1575s       Mail::SpamAssassin::ArchiveIterator::run
1      1  1  64µs        229µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@27
2      1  1  62µs        62µs        Mail::SpamAssassin::ArchiveIterator::CORE:open_dir (opcode)
1      1  1  51µs        51µs        Mail::SpamAssassin::ArchiveIterator::new
1      1  1  49µs        63µs        Mail::SpamAssassin::ArchiveIterator::BEGIN@22
1      1  1  39µs        120µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@28
4      2  1  38µs        38µs        Mail::SpamAssassin::ArchiveIterator::CORE:ftdir (opcode)
1      1  1  32µs        245µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@30
8      4  1  31µs        31µs        Mail::SpamAssassin::ArchiveIterator::CORE:subst (opcode)
1      1  1  26µs        224µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@34
1      1  1  25µs        65µs        Mail::SpamAssassin::ArchiveIterator::BEGIN@23
1      1  1  25µs        32µs        Mail::SpamAssassin::ArchiveIterator::BEGIN@24
1      1  1  25µs        753µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@29
1      1  1  24µs        277µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@36
1      1  1  22µs        90µs        Mail::SpamAssassin::ArchiveIterator::BEGIN@25
2      1  1  18µs        18µs        Mail::SpamAssassin::ArchiveIterator::CORE:closedir (opcode)
1      1  1  15µs        15µs        Mail::SpamAssassin::ArchiveIterator::set_functions
1      1  1  14µs        14µs        Mail::SpamAssassin::ArchiveIterator::_create_cache
0      0  0  0s          0s          Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_date
0      0  0  0s          0s          Mail::SpamAssassin::ArchiveIterator::_run_mailbox
0      0  0  0s          0s          Mail::SpamAssassin::ArchiveIterator::_run_mbx
0      0  0  0s          0s          Mail::SpamAssassin::ArchiveIterator::_scan_mailbox
0      0  0  0s          0s          Mail::SpamAssassin::ArchiveIterator::_scan_mbx
0      0  0  0s          0s          Mail::SpamAssassin::ArchiveIterator::_scanprob_says_scan
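The table shows that run, _run, _run_message and _run_file each take about 1575s inclusive but only milliseconds exclusive. So nearly all of the run time is spent in the wanted_sub callback (main::wanted in sa-learn, 235 calls), not in ArchiveIterator itself. The Perl sketch below works through that arithmetic using figures copied from this report. The hash keys are illustrative labels, not NYTProf names.

```perl
use strict;
use warnings;

# Figures copied from this report; the keys are illustrative labels.
my %prof = (
    run_file_calls     => 240,     # _run_file: 240 calls
    run_file_inclusive => 1575,    # 1575s inclusive
    run_file_exclusive => 0.349,   # 349ms exclusive
    wanted_calls       => 235,     # main::wanted: 235 calls
    large_skipped      => 5,       # "skipping large message": 5 info() calls
);

# Average inclusive time per _run_file call, in seconds.
my $avg_per_file = $prof{run_file_inclusive} / $prof{run_file_calls};

# Fraction of _run_file's inclusive time spent in its own statements.
my $own_share = $prof{run_file_exclusive} / $prof{run_file_inclusive};

# Each file either reached wanted_sub or was skipped as too large.
my $accounted = $prof{wanted_calls} + $prof{large_skipped};

printf "avg per _run_file call: %.2fs\n", $avg_per_file;                       # 6.56s
printf "exclusive share: %.3f%%\n", 100 * $own_share;                          # 0.022%
printf "files accounted for: %d of %d\n", $accounted, $prof{run_file_calls};   # 240 of 240
```

The five files that never reached main::wanted match the five "skipping large message" calls in _run_file. That is why main::wanted averages slightly more per call (6.70s) than _run_file does (6.56s).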
Line  Statements  Time on line  Calls  Time in subs  Code
1# iterate over mail archives, calling a function on each message.
2#
3# <@LICENSE>
4# Licensed to the Apache Software Foundation (ASF) under one or more
5# contributor license agreements. See the NOTICE file distributed with
6# this work for additional information regarding copyright ownership.
7# The ASF licenses this file to you under the Apache License, Version 2.0
8# (the "License"); you may not use this file except in compliance with
9# the License. You may obtain a copy of the License at:
10#
11# http://www.apache.org/licenses/LICENSE-2.0
12#
13# Unless required by applicable law or agreed to in writing, software
14# distributed under the License is distributed on an "AS IS" BASIS,
15# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16# See the License for the specific language governing permissions and
17# limitations under the License.
18# </@LICENSE>
19
20package Mail::SpamAssassin::ArchiveIterator;
21
22275µs278µs
# spent 63µs (49+14) within Mail::SpamAssassin::ArchiveIterator::BEGIN@22 which was called: # once (49µs+14µs) by main::BEGIN@66 at line 22
use strict;
# spent 63µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@22 # spent 14µs making 1 call to strict::import
23264µs2104µs
# spent 65µs (25+39) within Mail::SpamAssassin::ArchiveIterator::BEGIN@23 which was called: # once (25µs+39µs) by main::BEGIN@66 at line 23
use warnings;
# spent 65µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@23 # spent 39µs making 1 call to warnings::import
24266µs239µs
# spent 32µs (25+7) within Mail::SpamAssassin::ArchiveIterator::BEGIN@24 which was called: # once (25µs+7µs) by main::BEGIN@66 at line 24
use bytes;
# spent 32µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@24 # spent 7µs making 1 call to bytes::import
25284µs2157µs
# spent 90µs (22+67) within Mail::SpamAssassin::ArchiveIterator::BEGIN@25 which was called: # once (22µs+67µs) by main::BEGIN@66 at line 25
use re 'taint';
# spent 90µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@25 # spent 67µs making 1 call to re::import
26
27274µs2394µs
# spent 229µs (64+165) within Mail::SpamAssassin::ArchiveIterator::BEGIN@27 which was called: # once (64µs+165µs) by main::BEGIN@66 at line 27
use Errno qw(ENOENT EACCES EBADF);
# spent 229µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@27 # spent 165µs making 1 call to Exporter::import
28268µs2201µs
# spent 120µs (39+81) within Mail::SpamAssassin::ArchiveIterator::BEGIN@28 which was called: # once (39µs+81µs) by main::BEGIN@66 at line 28
use Mail::SpamAssassin::Util;
# spent 120µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@28 # spent 81µs making 1 call to Exporter::import
29284µs21.48ms
# spent 753µs (25+728) within Mail::SpamAssassin::ArchiveIterator::BEGIN@29 which was called: # once (25µs+728µs) by main::BEGIN@66 at line 29
use Mail::SpamAssassin::Constants qw(:sa);
# spent 753µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@29 # spent 728µs making 1 call to Exporter::import
30277µs2459µs
# spent 245µs (32+213) within Mail::SpamAssassin::ArchiveIterator::BEGIN@30 which was called: # once (32µs+213µs) by main::BEGIN@66 at line 30
use Mail::SpamAssassin::Logger;
# spent 245µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@30 # spent 213µs making 1 call to Exporter::import
312391µs13.49ms
# spent 3.49ms (2.47+1.02) within Mail::SpamAssassin::ArchiveIterator::BEGIN@31 which was called: # once (2.47ms+1.02ms) by main::BEGIN@66 at line 31
use Mail::SpamAssassin::AICache;
# spent 3.49ms making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@31
32
33# 256 KiB is a big email, unless stated otherwise
34287µs2422µs
# spent 224µs (26+198) within Mail::SpamAssassin::ArchiveIterator::BEGIN@34 which was called: # once (26µs+198µs) by main::BEGIN@66 at line 34
use constant BIG_BYTES => 256*1024;
# spent 224µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@34 # spent 198µs making 1 call to constant::import
35
3613µs
# spent 277µs (24+253) within Mail::SpamAssassin::ArchiveIterator::BEGIN@36 which was called: # once (24µs+253µs) by main::BEGIN@66 at line 41
use vars qw {
37 $MESSAGES
38 $AICache
39 %class_opts
40 @ISA
41110.9ms2530µs};
# spent 277µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@36 # spent 253µs making 1 call to vars::import
42
43112µs@ISA = qw();
44
45=head1 NAME
46
47Mail::SpamAssassin::ArchiveIterator - find and process messages one at a time
48
49=head1 SYNOPSIS
50
51 my $iter = Mail::SpamAssassin::ArchiveIterator->new(
52 {
53 'opt_max_size' => 256 * 1024, # 0 implies no limit
54 'opt_cache' => 1,
55 }
56 );
57
58 $iter->set_functions( \&wanted, sub { } );
59
60 eval { $iter->run(@ARGV); };
61
62 sub wanted {
63 my($class, $filename, $recv_date, $msg_array, $format) = @_;
64
65
66 ...
67 }
68
69=head1 DESCRIPTION
70
71The Mail::SpamAssassin::ArchiveIterator module will go through a set
72of mbox files, mbx files, and directories (with a single message per
73file) and generate a list of messages. It will then call the C<wanted_sub>
74and C<result_sub> functions appropriately per message.
75
76=head1 METHODS
77
78=over 4
79
80=cut
81
82
83###########################################################################
84
85=item $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] )
86
87Constructs a new C<Mail::SpamAssassin::ArchiveIterator> object. You may
88pass the following attribute-value pairs to the constructor. The pairs are
89optional unless otherwise noted.
90
91=over 4
92
93=item opt_max_size
94
95The I<opt_max_size> option sets a limit, in bytes, beyond which a message
96is considered large and is skipped by ArchiveIterator.
97
98A value of 0 means no size limit, so all messages are examined. An undefined
99value implies the default limit of 256 KiB.
100
101=item opt_all
102
103Setting this option to true implicitly sets I<opt_max_size> to 0, i.e.
104no limit on message size; all messages are processed by ArchiveIterator.
105It exists for compatibility with SpamAssassin versions older than 3.4.0,
106which lacked the I<opt_max_size> option.
107
108=item opt_scanprob
109
110Randomly select messages to scan, with a probability of N, where N ranges
111from 0.0 (no messages scanned) to 1.0 (all messages scanned). Default
112is 1.0.
113
114This setting can be specified separately for each target.
115
116=item opt_before
117
118Only use messages which were received before the given time_t value.
119Negative values are an offset from the current time, e.g. -86400 means
120the last 24 hours; the value may also be a string parsed by Time::ParseDate (e.g. '-6 months').
121
122This setting can be specified separately for each target.
123
124=item opt_after
125
126Same as opt_before, except the messages are only used if after the given
127time_t value.
128
129This setting can be specified separately for each target.
130
131=item opt_want_date
132
133Set to 1 (default) if you want the received date to be filled in
134in the C<wanted_sub> callback below. Set this to 0 to avoid this;
135it's a good idea to set this to 0 if you can, as it imposes a performance
136hit.
137
138=item opt_skip_empty_messages
139
140Set to 1 if you want to skip corrupt, 0-byte messages. The default is 0.
141
142=item opt_cache
143
144Set to 0 (default) if you don't want to use cached information to help speed
145up ArchiveIterator. Set to 1 to enable. This setting requires C<opt_cachedir>
146also be set.
147
148=item opt_cachedir
149
150Set to the path of a directory where you wish to store cached information for
151C<opt_cache>, if you don't want it stored alongside the input files (which
152is the default). The directory must be both readable and writable.
153
154=item wanted_sub
155
156Reference to a subroutine which will process message data. Usually
157set via set_functions(). The routine will be passed 5 values: class
158(scalar), filename (scalar), received date (scalar), message content
159(array reference, one message line per element), and the message format
160key ('f' for file, 'm' for mbox, 'b' for mbx).
161
162Note that if C<opt_want_date> is set to 0, the received date scalar will be
163undefined.
164
165=item result_sub
166
167Reference to a subroutine which will process the results of the wanted_sub
168for each message processed. Usually set via set_functions().
169The routine will be passed 3 values: class (scalar), result (scalar, returned
170from wanted_sub), and received date (scalar).
171
172Note that if C<opt_want_date> is set to 0, the received date scalar will be
173undefined.
174
175=item scan_progress_sub
176
177Reference to a subroutine which will be called intermittently during
178the 'scan' phase of the mass-check. No guarantees are made as to
179how frequently this may happen, mind you.
180
181=item opt_from_regex
182
183This setting allows for flexibility in specifying the mbox format From separator.
184
185It defaults to the regular expression:
186
187/^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/
188
189Some SpamAssassin programs such as sa-learn will use the configuration option
190'mbox_format_from_regex' to override the default regular expression.
191
192=back
193
194=cut
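The opt_after/opt_before behaviour documented above can be sketched as two small helpers. These helpers are hypothetical. They are not the module's _message_is_useful_by_date, whose body is not part of this excerpt.

The sketch makes these assumptions:
* Both bounds are exclusive.
* A message with an unknown date is dropped once any date filter is set.
* Time::ParseDate strings such as '-6 months' are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an opt_after/opt_before value into an absolute
# time_t; negative values are an offset from $now.
sub resolve_time {
    my ($value, $now) = @_;
    return undef unless defined $value;
    return $value < 0 ? $now + $value : $value;
}

# Hypothetical helper: keep a message only if its received date lies strictly
# after $after and strictly before $before (either bound may be undef).
sub useful_by_date {
    my ($date, $after, $before) = @_;
    return 1 if !defined $after && !defined $before;   # no date filter at all
    return 0 unless $date;                             # unknown date: assumed dropped
    return 0 if defined $after  && $date <= $after;
    return 0 if defined $before && $date >= $before;
    return 1;
}

my $now    = time();
my $after  = resolve_time(-86400, $now);               # "last 24 hours"
my $recent = useful_by_date($now - 3600,      $after, undef);
my $old    = useful_by_date($now - 2 * 86400, $after, undef);
print "1 hour old: ", ($recent ? 'keep' : 'skip'), "\n";   # keep
print "2 days old: ", ($old    ? 'keep' : 'skip'), "\n";   # skip
```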
195
196
# spent 51µs within Mail::SpamAssassin::ArchiveIterator::new which was called: # once (51µs+0s) by main::RUNTIME at line 467 of /usr/local/bin/sa-learn
sub new {
19713µs my $class = shift;
19813µs $class = ref($class) || $class;
199
20012µs my $self = shift;
20112µs if (!defined $self) { $self = { }; }
20213µs bless ($self, $class);
203
204 # If any of these options are set, we need to figure out the message's
205 # receive date at scan time. opt_after, opt_before, or opt_want_date
206 $self->{determine_receive_date} =
207 defined $self->{opt_after} || defined $self->{opt_before} ||
208110µs $self->{opt_want_date};
209
21015µs $self->{s} = [ ]; # spam, of course
21114µs $self->{h} = [ ]; # ham, as if you couldn't guess
212
21313µs $self->{access_problem} = 0;
214
21516µs if ($self->{opt_all}) {
216 $self->{opt_max_size} = 0;
217 } elsif (!defined $self->{opt_max_size}) {
21813µs $self->{opt_max_size} = BIG_BYTES;
219 }
220
221111µs $self;
222}
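new() above applies three size-limit rules:
* opt_all wins.
* An undefined opt_max_size falls back to BIG_BYTES.
* A value of 0 means unlimited.

Those rules can be isolated as shown below. resolve_max_size and too_big are hypothetical helpers for illustration only. The constant repeats BIG_BYTES from line 34.

```perl
use strict;
use warnings;

use constant BIG_BYTES => 256 * 1024;   # same default as ArchiveIterator

# Hypothetical helper mirroring the opt_all / opt_max_size logic in new().
sub resolve_max_size {
    my ($opts) = @_;
    return 0 if $opts->{opt_all};                           # opt_all: no limit
    return BIG_BYTES unless defined $opts->{opt_max_size};  # default 256 KiB
    return $opts->{opt_max_size};                           # 0 also means no limit
}

# Hypothetical helper: a message is skipped only if a limit is set and exceeded.
sub too_big {
    my ($size, $limit) = @_;
    return ($limit && $size > $limit) ? 1 : 0;
}

my @cases = ({}, { opt_all => 1 }, { opt_max_size => 0 }, { opt_max_size => 1000 });
for my $opts (@cases) {
    my $limit = resolve_max_size($opts);
    printf "limit %d: 300000-byte message %s\n",
        $limit, too_big(300_000, $limit) ? 'skipped' : 'examined';
}
```

As in new(), opt_all forces the limit to 0 even when opt_max_size is also given.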
223
224###########################################################################
225
226=item set_functions( \&wanted_sub, \&result_sub )
227
228Sets the subroutines used for message processing (wanted_sub) and result
229reporting (result_sub). For more information, see I<new()> above.
230
231=cut
232
233
# spent 15µs within Mail::SpamAssassin::ArchiveIterator::set_functions which was called: # once (15µs+0s) by main::RUNTIME at line 469 of /usr/local/bin/sa-learn
sub set_functions {
23413µs my ($self, $wanted, $result) = @_;
23514µs $self->{wanted_sub} = $wanted if defined $wanted;
236111µs $self->{result_sub} = $result if defined $result;
237}
238
239###########################################################################
240
241=item run ( @target_paths )
242
243Generates the list of messages to process, then runs each message through the
244configured wanted subroutine. Files which have a name ending in C<.gz> or
245C<.bz2> are decompressed by piping them through C<gunzip -cd> and
246C<bzip2 -cd>, respectively.
247
248Each element of the target_paths array is expected to be either a string in
249the format C<class:format:raw_location>, or a hash reference containing
250key-value option pairs and a 'target' key whose value is in that format.
251
252The key-value option pairs that can be used are: opt_scanprob, opt_after,
253opt_before. See the constructor method's documentation for more information
254on their effects.
255
256run() returns a false value if there was an error (a file could not be
257opened, etc.) and 1 if there were no errors.
258
259=over 4
260
261=item class
262
263Either 'h' for ham or 's' for spam. If the class is longer than 1 character,
264only its first character is used. If blank, it defaults to 'h'.
265
266=item format
267
268Specifies the format of the raw_location. C<dir> is a directory whose
269files are individual messages, C<file> a file with a single message,
270C<mbox> an mbox formatted file, or C<mbx> for an mbx formatted directory.
271
272C<detect> can also be used. This treats a directory as C<dir>, any other
273path containing the pattern C</\.mbox/i> as C<mbox>, and anything else
274(including a path that cannot be stat'ed) as C<file>.
275
276=item raw_location
277
278Path to file or directory. File globbing is allowed using the
279standard csh-style globbing (see C<perldoc -f glob>). C<~> at the
280front of the value will be replaced by the C<HOME> environment
281variable. Escaped whitespace is protected as well.
282
283B<NOTE:> C<~user> is not allowed.
284
285B<NOTE 2:> C<-> is not allowed as a raw location. To have
286ArchiveIterator deal with STDIN, generate a temp file.
287
288=back
289
290=cut
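The .gz/.bz2 handling described above comes down to a choice: open the file with a plain three-argument read, or open a list-form pipe. _mail_open does this further down. The sketch below assumes gunzip and bzip2 are on PATH when compressed files are opened. mail_open_args and open_mail are hypothetical names. Unlike the module, the sketch returns a lexical filehandle instead of using the INPUT bareword.

```perl
use strict;
use warnings;

# Hypothetical helper: choose how a path should be opened.
# Returns the open mode followed by the command/path list.
sub mail_open_args {
    my ($file) = @_;
    return ('-|', 'gunzip', '-cd', $file) if $file =~ /\.gz$/;
    return ('-|', 'bzip2',  '-cd', $file) if $file =~ /\.bz2$/;
    return ('<', $file);                       # plain file
}

# Hypothetical helper: open with the three-argument (or list) form so that
# leading and trailing whitespace in file names is preserved (cf. bug 5288).
sub open_mail {
    my ($file) = @_;
    my ($mode, @expr) = mail_open_args($file);
    open(my $fh, $mode, @expr) or return;
    binmode($fh);                              # read raw bytes
    return $fh;
}
```

The list-form pipe passes the file name to the decompressor as a separate argument, so no shell ever sees it.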
291
292
# spent 1575s (97µs+1575) within Mail::SpamAssassin::ArchiveIterator::run which was called: # once (97µs+1575s) by main::RUNTIME at line 478 of /usr/local/bin/sa-learn
sub run {
29317µs my ($self, @targets) = @_;
294
29513µs if (!defined $self->{wanted_sub}) {
296 warn "archive-iterator: set_functions never called";
297 return 0;
298 }
299
300 # scan the targets and get the number and list of messages
301 $self->_scan_targets(\@targets,
302
# spent 27.8ms (5.75+22.0) within Mail::SpamAssassin::ArchiveIterator::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm:305] which was called 240 times, avg 116µs/call: # 240 times (5.75ms+22.0ms) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 880, avg 116µs/call
sub {
3032401.21ms my($self, $date, $class, $format, $mail) = @_;
3044804.49ms24022.0ms push(@{$self->{$class}}, _index_pack($date, $class, $format, $mail));
# spent 22.0ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::_index_pack, avg 92µs/call
305 }
306128µs173.7ms );
307
30812µs my $messages;
309 # for ease of memory, we'll play with pointers
31013µs $messages = $self->{s};
31113µs undef $self->{s};
312314µs push(@{$messages}, @{$self->{h}});
31312µs undef $self->{h};
314
31526µs $MESSAGES = scalar(@{$messages});
316
317 # go ahead and run through all of the messages specified
318124µs11575s return $self->_run($messages);
# spent 1575s making 1 call to Mail::SpamAssassin::ArchiveIterator::_run
319}
320
321
# spent 1575s (12.0ms+1575) within Mail::SpamAssassin::ArchiveIterator::_run which was called: # once (12.0ms+1575s) by Mail::SpamAssassin::ArchiveIterator::run at line 318
sub _run {
32212µs my ($self, $messages) = @_;
323
3242423.01ms while (my $message = shift @{$messages}) {
3252403.80ms2401575s my($class, undef, $date, undef, $result) = $self->_run_message($message);
# spent 1575s making 240 calls to Mail::SpamAssassin::ArchiveIterator::_run_message, avg 6.56s/call
3264754.35ms2354.17ms &{$self->{result_sub}}($class, $result, $date) if $result;
# spent 4.17ms making 235 calls to main::result, avg 18µs/call
327 }
328116µs return ! $self->{access_problem};
329}
330
331############################################################################
332
333## run_message and related functions to process a single message
334
335
# spent 1575s (69.0ms+1575) within Mail::SpamAssassin::ArchiveIterator::_run_message which was called 240 times, avg 6.56s/call: # 240 times (69.0ms+1575s) by Mail::SpamAssassin::ArchiveIterator::_run at line 325, avg 6.56s/call
sub _run_message {
3362401.00ms my ($self, $msg) = @_;
337
3382403.26ms2406.40ms my ($date, $class, $format, $mail) = _index_unpack($msg);
# spent 6.40ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::_index_unpack, avg 27µs/call
339
340240708µs if ($format eq 'f') {
34124063.2ms2401575s return $self->_run_file($class, $format, $mail, $date);
# spent 1575s making 240 calls to Mail::SpamAssassin::ArchiveIterator::_run_file, avg 6.56s/call
342 }
343 elsif ($format eq 'm') {
344 return $self->_run_mailbox($class, $format, $mail, $date);
345 }
346 elsif ($format eq 'b') {
347 return $self->_run_mbx($class, $format, $mail, $date);
348 }
349}
350
351
# spent 1575s (349ms+1575) within Mail::SpamAssassin::ArchiveIterator::_run_file which was called 240 times, avg 6.56s/call: # 240 times (349ms+1575s) by Mail::SpamAssassin::ArchiveIterator::_run_message at line 341, avg 6.56s/call
sub _run_file {
3522402.14ms my ($self, $class, $format, $where, $date) = @_;
353
3542401.94ms24028.7ms if (!_mail_open($where)) {
# spent 28.7ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::_mail_open, avg 120µs/call
355 $self->{access_problem} = 1;
356 return;
357 }
358
3592403.42ms2401.34ms my $stat_errn = stat(INPUT) ? 0 : 0+$!;
# spent 1.34ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 6µs/call
3602403.12ms240825µs if ($stat_errn == ENOENT) {
# spent 825µs making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 3µs/call
361 dbg("archive-iterator: no such input ($where)");
362 return;
363 }
364 elsif ($stat_errn != 0) {
365 warn "archive-iterator: no access to input ($where): $!";
366 return;
367 }
368 elsif (!-f _ && !-c _ && !-p _) {
369 warn "archive-iterator: not a plain file (or char.spec. or pipe) ($where)";
370 return;
371 }
372
373240826µs my $opt_max_size = $self->{opt_max_size};
37424012.3ms4801.23ms if (!$opt_max_size) {
# spent 856µs making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftsize, avg 4µs/call # spent 376µs making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 2µs/call
375 # process any size
376 } elsif (!-f _) {
377 # must check size while reading
378 } elsif (-s _ > $opt_max_size) {
379 # skip too-big mails
380 # note that -s can only deal with files, it returns 0 on char.spec. STDIN
381586µs1065µs info("archive-iterator: skipping large message: ".
# spent 56µs making 5 calls to Mail::SpamAssassin::Logger::info, avg 11µs/call # spent 9µs making 5 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftsize, avg 2µs/call
382 "file size %d, limit %d bytes", -s _, $opt_max_size);
383581µs539µs close INPUT or die "error closing input file: $!";
# spent 39µs making 5 calls to Mail::SpamAssassin::ArchiveIterator::CORE:close, avg 8µs/call
384550µs return;
385 }
386
387235463µs my @msg;
388 my $header;
389235515µs my $len = 0;
390235592µs my $str = '';
391235434µs my($inbuf,$nread);
39223515.4ms23512.5ms while ( $nread=read(INPUT,$inbuf,16384) ) {
# spent 12.5ms making 235 calls to Mail::SpamAssassin::ArchiveIterator::CORE:read, avg 53µs/call
3934571.17ms $len += $nread;
394457998µs if ($opt_max_size && $len > $opt_max_size) {
395 info("archive-iterator: skipping large message: read %d, limit %d bytes",
396 $len, $opt_max_size);
397 close INPUT or die "error closing input file: $!";
398 return;
399 }
40045723.3ms4577.33ms $str .= $inbuf;
# spent 7.33ms making 457 calls to Mail::SpamAssassin::ArchiveIterator::CORE:read, avg 16µs/call
401 }
402235526µs defined $nread or die "error reading: $!";
403235756µs undef $inbuf;
40447069.3ms @msg = split(/^/m, $str, -1); undef $str;
4052351.59ms for my $j (0..$#msg) {
40614031250ms1356135.8ms if ($msg[$j] =~ /^\015?$/) { $header = $j; last }
# spent 35.8ms making 13561 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 3µs/call
407 }
4082354.25ms2352.38ms close INPUT or die "error closing input file: $!";
# spent 2.38ms making 235 calls to Mail::SpamAssassin::ArchiveIterator::CORE:close, avg 10µs/call
409
4102351.00ms if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
411 $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
412 }
413
4144708.67ms2351575s return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format));
# spent 1575s making 235 calls to main::wanted, avg 6.70s/call
415}
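The read loop in _run_file above works in four steps:
* It reads the input in 16 KiB chunks.
* It gives up once the running total exceeds opt_max_size.
* It splits the buffer into lines that keep their line endings.
* It takes the first empty (or CR-only) line as the end of the header.

The sketch below reproduces those steps as self-contained helpers. read_capped and find_header_end are hypothetical names. Returning undef for an oversized message stands in for the module's info() call and early return.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a handle in 16 KiB chunks, mirroring _run_file.
# Returns undef if a non-zero $max is exceeded, else an array ref of lines.
sub read_capped {
    my ($fh, $max) = @_;
    my ($buf, $str, $len) = ('', '', 0);
    my $nread;
    while ($nread = read($fh, $buf, 16384)) {
        $len += $nread;
        return undef if $max && $len > $max;   # too big: caller skips it
        $str .= $buf;
    }
    die "error reading: $!" unless defined $nread;
    # Split after each newline; every element keeps its line ending.
    return [ split(/^/m, $str, -1) ];
}

# Hypothetical helper: index of the first empty (or "\r"-only) line,
# i.e. the header/body separator, or undef if there is none.
sub find_header_end {
    my ($lines) = @_;
    for my $j (0 .. $#$lines) {
        return $j if $lines->[$j] =~ /^\015?$/;
    }
    return undef;
}
```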
416
417sub _run_mailbox {
418 my ($self, $class, $format, $where, $date) = @_;
419
420 my ($file, $offset);
421 { local($1,$2); # Bug 7140 (avoids perl bug [perl #123880])
422 ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
423 }
424 my @msg;
425 my $header;
426 if (!_mail_open($file)) {
427 $self->{access_problem} = 1;
428 return;
429 }
430
431 my $opt_max_size = $self->{opt_max_size};
432 dbg("archive-iterator: _run_mailbox %s, ofs %d, limit %d",
433 $file, $offset, $opt_max_size||0);
434
435 seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
436
437 my $size = 0;
438 for ($!=0; <INPUT>; $!=0) {
439 #Changed Regex to use option Per bug 6703
440 last if (substr($_,0,5) eq "From " && @msg && /$self->{opt_from_regex}/o);
441 $size += length($_);
442 push (@msg, $_);
443
444 # skip mails that are too big
445 if ($opt_max_size && $size > $opt_max_size) {
446 info("archive-iterator: skipping large message: ".
447 "%d lines, %d bytes, limit %d bytes",
448 scalar @msg, $size, $opt_max_size);
449 close INPUT or die "error closing input file: $!";
450 return;
451 }
452
453 if (!defined $header && /^\s*$/) {
454 $header = $#msg;
455 }
456 }
457 defined $_ || $!==0 or
458 $!==EBADF ? dbg("archive-iterator: error reading: $!")
459 : die "error reading: $!";
460 close INPUT or die "error closing input file: $!";
461
462 if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
463 $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
464 }
465
466 return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format));
467}
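_run_mailbox seeks to the message's byte offset and collects lines until it meets the next line that starts with "From " and matches opt_from_regex. The sketch below shows that loop using the default separator regex quoted in the constructor documentation. read_mbox_message is a hypothetical name, and the size limit and receive-date handling are left out.

```perl
use strict;
use warnings;

# Default mbox "From " separator from the ArchiveIterator documentation.
my $FROM_RE = qr/^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# Hypothetical helper: return the lines of the message starting at $offset.
sub read_mbox_message {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "cannot reposition file to $offset: $!";
    my @msg;
    while (my $line = <$fh>) {
        # Stop at the next separator, but never on the message's own first line.
        last if @msg && substr($line, 0, 5) eq 'From ' && $line =~ $FROM_RE;
        push @msg, $line;
    }
    return \@msg;
}

my $mbox = "From alice\@example.com Tue Nov  7 05:38:10 2017\n"
         . "Subject: one\n\nfirst body\n"
         . "From bob\@example.com Tue Nov  7 06:16:03 2017\n"
         . "Subject: two\n\nsecond body\n";
open(my $fh, '<', \$mbox) or die "cannot open in-memory mbox: $!";
my $first = read_mbox_message($fh, 0);
print scalar(@$first), " lines in first message\n";   # prints "4 lines in first message"
```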
468
469sub _run_mbx {
470 my ($self, $class, $format, $where, $date) = @_;
471
472 my ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
473 my @msg;
474 my $header;
475
476 if (!_mail_open($file)) {
477 $self->{access_problem} = 1;
478 return;
479 }
480
481 my $opt_max_size = $self->{opt_max_size};
482 dbg("archive-iterator: _run_mbx %s, ofs %d, limit %d",
483 $file, $offset, $opt_max_size||0);
484
485 seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
486
487 my $size = 0;
488 for ($!=0; <INPUT>; $!=0) {
489 last if ($_ =~ MBX_SEPARATOR);
490 $size += length($_);
491 push (@msg, $_);
492
493 # skip mails that are too big
494 if ($opt_max_size && $size > $opt_max_size) {
495 info("archive-iterator: skipping large message: ".
496 "%d lines, %d bytes, limit %d bytes",
497 scalar @msg, $size, $opt_max_size);
498 close INPUT or die "error closing input file: $!";
499 return;
500 }
501
502 if (!defined $header && /^\s*$/) {
503 $header = $#msg;
504 }
505 }
506 defined $_ || $!==0 or
507 $!==EBADF ? dbg("archive-iterator: error reading: $!")
508 : die "error reading: $!";
509 close INPUT or die "error closing input file: $!";
510
511 if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
512 $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
513 }
514
515 return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format));
516}
517
518############################################################################
519
520## FUNCTIONS BELOW THIS POINT ARE FOR FINDING THE MESSAGES TO RUN AGAINST
521
522############################################################################
523
524
# spent 73.7ms (331µs+73.3) within Mail::SpamAssassin::ArchiveIterator::_scan_targets which was called: # once (331µs+73.3ms) by Mail::SpamAssassin::ArchiveIterator::run at line 306
sub _scan_targets {
52512µs my ($self, $targets, $bkfunc) = @_;
526
52713µs %class_opts = ();
528
529114µs foreach my $target (@${targets}) {
53024µs if (!defined $target) {
531 warn "archive-iterator: invalid (undef) value in target list";
532 next;
533 }
534
53524µs my %opts;
53625µs if (ref $target eq 'HASH') {
537 # e.g. { target => $target, opt_foo => 1, opt_bar => 0.4 ... }
538 foreach my $k (keys %{$target}) {
539 if ($k =~ /^opt_/) {
540 $opts{$k} = $target->{$k};
541 }
542 }
543 $target = $target->{target};
544 }
545
546221µs my ($class, $format, $rawloc) = split(/:/, $target, 3);
547
548 # "class"
54927µs if (!defined $format) {
550 warn "archive-iterator: invalid (undef) format in target list, $target";
551 next;
552 }
553 # "class:format"
55424µs if (!defined $rawloc) {
555 warn "archive-iterator: invalid (undef) raw location in target list, $target";
556 next;
557 }
558
55926µs if ($rawloc eq '-') {
560 warn 'archive-iterator: raw location "-" is not supported';
561 next;
562 }
563
564 # use ham by default, things like "spamassassin" can't specify the type
565212µs $class = substr($class, 0, 1) || 'h';
566
567 # keep a copy of the most recent message-selection options for
568 # each class
569210µs $class_opts{$class} = \%opts;
570
571210µs foreach my $k (keys %opts) {
572 $self->{$k} = $opts{$k};
573 }
574224µs2155µs $self->_set_default_message_selection_opts();
# spent 155µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts, avg 77µs/call
575
576221µs2247µs my @locations = $self->_fix_globs($rawloc);
# spent 247µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::_fix_globs, avg 123µs/call
577
578221µs foreach my $location (@locations) {
57923µs my $method;
580
581 # for this location only; 'detect' means they can differ for each location
58226µs my $thisformat = $format;
583
584213µs if ($format eq 'detect') {
585 # detect the format
586264µs227µs my $stat_errn = stat($location) ? 0 : 0+$!;
# spent 27µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 14µs/call
587238µs25µs if ($stat_errn == ENOENT) {
# spent 5µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftdir, avg 3µs/call
588 $thisformat = 'file'; # actually, no file - to be detected later
589 }
590 elsif ($stat_errn != 0) {
591 warn "archive-iterator: no access to $location: $!";
592 $thisformat = 'file';
593 }
594 elsif (-d _) {
595 # it's a directory
59626µs $thisformat = 'dir';
597 }
598 elsif ($location =~ /\.mbox/i) {
599 # filename indicates mbox
600 $thisformat = 'mbox';
601 }
602 else {
603 $thisformat = 'file';
604 }
605 }
606
60729µs if ($thisformat eq 'dir') {
60826µs $method = \&_scan_directory;
609 }
610 elsif ($thisformat eq 'mbox') {
611 $method = \&_scan_mailbox;
612 }
613 elsif ($thisformat eq 'file') {
614 $method = \&_scan_file;
615 }
616 elsif ($thisformat eq 'mbx') {
617 $method = \&_scan_mbx;
618 }
619 else {
620 warn "archive-iterator: format $thisformat (from $format) unknown!";
621 next;
622 }
623
624 # call the appropriate method
625432µs272.9ms &{$method}($self, $class, $location, $bkfunc);
# spent 72.9ms making 2 calls to Mail::SpamAssassin::ArchiveIterator::_scan_directory, avg 36.5ms/call
626 }
627 }
628}
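The target handling above splits each target into at most three colon-separated fields and, for the `detect` format, chooses a scanner from a single `stat()`. The sketch below restates those two steps as standalone helpers. The names `parse_target` and `detect_format` are hypothetical and not part of ArchiveIterator. The split limit of 3 is what allows a location such as `C:/mail/x.mbox` to keep its own colon.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical helper: parse "class:format:location" as _scan_targets does.
# The limit of 3 leaves any further colons inside the location.
sub parse_target {
  my ($target) = @_;
  my ($class, $format, $rawloc) = split(/:/, $target, 3);
  return unless defined $format && defined $rawloc;
  return if $rawloc eq '-';              # stdin is not supported
  $class = substr($class, 0, 1) || 'h';  # keep first letter; default to ham
  return ($class, $format, $rawloc);
}

# Hypothetical helper mirroring the 'detect' branch: stat once, then reuse
# the cached stat buffer through the special "_" filehandle.
sub detect_format {
  my ($location) = @_;
  my $stat_errn = stat($location) ? 0 : 0+$!;
  if ($stat_errn != 0) {
    warn "no access to $location: $!" if $stat_errn != ENOENT;
    return 'file';                       # nonexistent: reported later by the file scanner
  }
  return 'dir'  if -d _;
  return 'mbox' if $location =~ /\.mbox/i;
  return 'file';
}
```

As in the original, the `.mbox` test only applies to paths that exist and are not directories.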
629
630
# spent 28.7ms (11.5+17.2) within Mail::SpamAssassin::ArchiveIterator::_mail_open which was called 240 times, avg 120µs/call: # 240 times (11.5ms+17.2ms) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 354, avg 120µs/call
sub _mail_open {
631240688µs my ($file) = @_;
632
633 # bug 5288: the "magic" version of open will strip leading and trailing
634 # whitespace from the expression. switch to the three-argument version
635 # of open which does not strip whitespace. see "perldoc -f open" and
636 # "perldoc perlipc" for more information.
637
638 # Assume that the file by default is just a plain file
639240978µs my @expr = ( $file );
640240675µs my $mode = '<';
641
642 # Handle different types of compressed files
6432404.77ms4801.91ms if ($file =~ /\.gz$/) {
# spent 1.91ms making 480 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 4µs/call
644 $mode = '-|';
645 unshift @expr, 'gunzip', '-cd';
646 }
647 elsif ($file =~ /\.bz2$/) {
648 $mode = '-|';
649 unshift @expr, 'bzip2', '-cd';
650 }
651
652 # Go ahead and try to open the file
65324016.8ms24014.3ms if (!open (INPUT, $mode, @expr)) {
# spent 14.3ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:open, avg 60µs/call
654 warn "archive-iterator: unable to open $file: $!\n";
655 return 0;
656 }
657
658 # bug 5249: mail could have 8-bit data, need this on some platforms
6592402.77ms2401.01ms binmode INPUT or die "cannot set input file to binmode: $!";
# spent 1.01ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:binmode, avg 4µs/call
660
6612402.88ms return 1;
662}
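`_mail_open` relies on three-argument (or list-form) `open` so that whitespace in file names survives, and it opens into the global `INPUT` bareword. Here is a minimal sketch of the same logic using a lexical filehandle instead. The name `mail_open_fh` is hypothetical, and the `.gz`/`.bz2` branches assume `gunzip` and `bzip2` are on `PATH`, as the original does.

```perl
use strict;
use warnings;

# Hypothetical variant of _mail_open that returns a lexical filehandle.
sub mail_open_fh {
  my ($file) = @_;
  my @expr = ($file);
  my $mode = '<';
  if ($file =~ /\.gz$/) {
    $mode = '-|';
    unshift @expr, 'gunzip', '-cd';     # list form: the command runs without a shell
  }
  elsif ($file =~ /\.bz2$/) {
    $mode = '-|';
    unshift @expr, 'bzip2', '-cd';
  }
  # Three-argument/list open keeps leading and trailing whitespace in $file,
  # unlike the two-argument "magic" open (bug 5288).
  open(my $fh, $mode, @expr) or do {
    warn "unable to open $file: $!\n";
    return;
  };
  binmode $fh or die "cannot set input file to binmode: $!";  # 8-bit mail, bug 5249
  return $fh;
}
```

With a lexical handle, the file is also closed automatically when the handle goes out of scope. The global `INPUT` handle, by contrast, is shared by every caller in the module.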
663
664
# spent 155µs (125+29) within Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts which was called 2 times, avg 77µs/call: # 2 times (125µs+29µs) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 574, avg 77µs/call
sub _set_default_message_selection_opts {
66524µs my ($self) = @_;
666
66727µs $self->{opt_scanprob} = 1.0 unless (defined $self->{opt_scanprob});
66825µs $self->{opt_want_date} = 1 unless (defined $self->{opt_want_date});
66926µs $self->{opt_cache} = 0 unless (defined $self->{opt_cache});
670 #Changed Regex to include boundaries for Communigate Pro versions (5.2.x and later). per Bug 6413
67126µs $self->{opt_from_regex} = '^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)' unless (defined $self->{opt_from_regex});
672
673 #STRIP LEADING AND TRAILING / FROM REGEX FOR OPTION
674228µs27µs $self->{opt_from_regex} =~ s/^\///;
# spent 7µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 3µs/call
675220µs25µs $self->{opt_from_regex} =~ s/\/$//;
# spent 5µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 3µs/call
676
677263µs217µs dbg("archive-iterator: _set_default_message_selection_opts After: Scanprob[$self->{opt_scanprob}], want_date[$self->{opt_want_date}], cache[$self->{opt_cache}], from_regex[$self->{opt_from_regex}]");
# spent 17µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call
678
679}
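The default `opt_from_regex` accepts both the classic mbox `From_` date (`Tue Nov  7 05:38:10 2017`) and the Communigate Pro form (`07-11-2017_05:38:10_`). Also, one leading and one trailing `/` are stripped from user-supplied values. The sketch below applies the same default pattern and slash stripping to sample lines. The helper names `normalize_regex` and `is_from_line` are hypothetical.

```perl
use strict;
use warnings;

# Default From_ separator regex, copied from _set_default_message_selection_opts.
my $from_regex = '^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)';

# Strip one leading and one trailing slash, so "/^From /" equals "^From ".
sub normalize_regex {
  my ($re) = @_;
  $re =~ s/^\///;
  $re =~ s/\/$//;
  return $re;
}

# Cheap prefix test first, then the full pattern, as the mbox scanner does.
sub is_from_line {
  my ($line, $re) = @_;
  return substr($line, 0, 5) eq 'From ' && $line =~ /$re/;
}
```

A body line such as `From the desk of the editor` starts with `From ` but fails the date part, so it is not treated as a message boundary.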
680
681############################################################################
682
683sub _message_is_useful_by_date {
684 my ($self, $date) = @_;
685
686 if (!$self->{opt_after} && !$self->{opt_before}) {
687 # Not using the feature
688 return 1;
689 }
690
691 return 0 unless $date; # undef or 0 date = unusable
692
693 if (!$self->{opt_before}) {
694 # Just care about after
695 return $date > $self->{opt_after};
696 }
697 else {
698 return (($date < $self->{opt_before}) && ($date > $self->{opt_after}));
699 }
700}
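`_message_is_useful_by_date` passes everything when neither `--after` nor `--before` is set. Otherwise it rejects unknown dates and uses strict comparisons on both bounds. Here is a standalone restatement for illustration. The name `useful_by_date` is hypothetical. The `||= 0` is an addition that avoids an "uninitialized" warning when only `--before` is given; the original compares against `undef` in that case, which Perl treats as 0.

```perl
use strict;
use warnings;

# Hypothetical standalone copy of the --after/--before window test.
sub useful_by_date {
  my ($opt, $date) = @_;
  my ($after, $before) = @{$opt}{qw(opt_after opt_before)};
  return 1 if !$after && !$before;   # feature not in use
  return 0 unless $date;             # undef or 0 date: unusable
  $after ||= 0;                      # assumption: treat a missing --after as 0
  return $date > $after if !$before;
  return $date < $before && $date > $after;   # both bounds exclusive
}
```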
701
702# additional check, based solely on a file's mod timestamp. we cannot
703# make assumptions about --before, since the file may have been "touch"ed
704# since the last message was appended; but we can assume that too-old
705# files cannot contain messages newer than their modtime.
706
# spent 2.88ms within Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime which was called 240 times, avg 12µs/call: # 240 times (2.88ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 834, avg 12µs/call
sub _message_is_useful_by_file_modtime {
707240775µs my ($self, $date) = @_;
708
709 # better safe than sorry, if date is undef; let other stuff catch errors
710240407µs return 1 unless $date;
711
712240449µs if ($self->{opt_after}) {
713 return ($date > $self->{opt_after});
714 }
715 else {
7162401.63ms return 1; # --after not in use
717 }
718}
719
720sub _scanprob_says_scan {
721 my ($self) = @_;
722 if (defined $self->{opt_scanprob} && $self->{opt_scanprob} < 1.0) {
723 if ( int( rand( 1 / $self->{opt_scanprob} ) ) != 0 ) {
724 return 0;
725 }
726 }
727 return 1;
728}
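`_scanprob_says_scan` samples messages with probability `opt_scanprob`. `int(rand(1/p))` is 0 exactly when `rand(1/p) < 1`, and for `0 < p <= 1` that happens with probability `p`. The sketch below is a standalone copy (the name `scanprob_says_scan` is hypothetical). It checks the sampling rate empirically.

```perl
use strict;
use warnings;

# Hypothetical standalone copy: returns 1 with probability $p, always 1 if
# $p is undefined or >= 1.
sub scanprob_says_scan {
  my ($p) = @_;
  return 1 unless defined $p && $p < 1.0;
  return int(rand(1 / $p)) == 0 ? 1 : 0;
}
```

One consequence of this formulation is that an `opt_scanprob` of exactly 0 would make `1 / $p` die with "Illegal division by zero" rather than skip every message.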
729
730############################################################################
731
732# 0 850852128 atime
733# 1 h class
734# 2 m format
735# 3 ./ham/goodmsgs.0 path
736
737# put the date in first, big-endian packed format
738# this format lets cmp easily sort by date, then class, format, and path.
739
# spent 22.0ms (19.6+2.43) within Mail::SpamAssassin::ArchiveIterator::_index_pack which was called 240 times, avg 92µs/call: # 240 times (19.6ms+2.43ms) by Mail::SpamAssassin::ArchiveIterator::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm:305] at line 304, avg 92µs/call
sub _index_pack {
74024022.5ms2402.43ms return pack("NAAA*", @_);
# spent 2.43ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:pack, avg 10µs/call
741}
742
743
# spent 6.40ms (4.13+2.26) within Mail::SpamAssassin::ArchiveIterator::_index_unpack which was called 240 times, avg 27µs/call: # 240 times (4.13ms+2.26ms) by Mail::SpamAssassin::ArchiveIterator::_run_message at line 338, avg 27µs/call
sub _index_unpack {
7442407.19ms2402.26ms return unpack("NAAA*", $_[0]);
# spent 2.26ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:unpack, avg 9µs/call
745}
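The comment above `_index_pack` says that a big-endian date first lets `cmp` sort records by date. The sketch below uses the same `"NAAA*"` template to round-trip the example record from that comment. It then sorts packed strings with plain `cmp`. The function names are local copies for illustration.

```perl
use strict;
use warnings;

# Same template as _index_pack/_index_unpack: 32-bit big-endian date ("N"),
# one-character class and format ("A"), then the rest as the path ("A*").
sub index_pack   { return pack("NAAA*", @_) }
sub index_unpack { return unpack("NAAA*", $_[0]) }
```

Because "N" stores the most significant byte first, byte-wise string order matches numeric order. A little-endian template such as "V" would sort a date of 256 before a date of 1. Note also that "A" fields lose trailing spaces on unpack, and "N" limits dates to unsigned 32-bit values.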
746
747############################################################################
748
749
# spent 72.9ms (11.1+61.8) within Mail::SpamAssassin::ArchiveIterator::_scan_directory which was called 2 times, avg 36.5ms/call: # 2 times (11.1ms+61.8ms) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 625, avg 36.5ms/call
sub _scan_directory {
750212µs my ($self, $class, $folder, $bkfunc) = @_;
751
75224µs my(@files,@subdirs);
753
7542117µs466µs if (-d "$folder/new" && -d "$folder/cur" && -d "$folder/tmp") {
# spent 33µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftdir, avg 17µs/call # spent 33µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 16µs/call
755 # Maildir format: bug 3003
756 for my $sub ("new", "cur") {
757 opendir (DIR, "$folder/$sub")
758 or die "Can't open '$folder/$sub' dir: $!\n";
759 # Don't learn from messages marked as deleted
760 # Or files starting with a leading dot
761 push @files, map { "$sub/$_" } grep { !/^\.|:2,.*T/ } readdir(DIR);
762 closedir(DIR) or die "error closing directory $folder: $!";
763 }
764 }
765 elsif (-f "$folder/cyrus.header") {
766 opendir(DIR, $folder)
767 or die "archive-iterator: can't open '$folder' dir: $!\n";
768
769 # cyrus metadata: http://unix.lsa.umich.edu/docs/imap/imap-lsa-srv_3.html
770 @files = grep { $_ ne '.' && $_ ne '..' &&
771 /^\S+$/ && !/^cyrus\.(?:index|header|cache|seen)/ }
772 readdir(DIR);
773 closedir(DIR) or die "error closing directory $folder: $!";
774 }
775 else {
776296µs262µs opendir(DIR, $folder)
# spent 62µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:open_dir, avg 31µs/call
777 or die "archive-iterator: can't open '$folder' dir: $!\n";
778
779 # ignore ,234 (deleted or refiled messages) and MH metadata dotfiles
7802465.49ms2461.24ms @files = grep { !/^[,.]/ } readdir(DIR);
# spent 668µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:readdir, avg 334µs/call # spent 571µs making 244 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 2µs/call
781238µs218µs closedir(DIR) or die "error closing directory $folder: $!";
# spent 18µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:closedir, avg 9µs/call
782 }
783
78421.01ms $_ = "$folder/$_" for @files;
785
786214µs if (!@files) {
787 # this is not a problem; no need to warn about it
788 # warn "archive-iterator: readdir found no mail in '$folder' directory\n";
78917µs return;
790 }
791
792111µs114µs $self->_create_cache('dir', $folder);
793
79417µs foreach my $file (@files) {
7952407.57ms2406.05ms my $stat_errn = stat($file) ? 0 : 0+$!;
# spent 6.05ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 25µs/call
7962402.62ms240433µs if ($stat_errn == ENOENT) {
# spent 433µs making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 2µs/call
797 # no longer there?
798 }
799 elsif ($stat_errn != 0) {
800 warn "archive-iterator: no access to $file: $!";
801 }
802 elsif (-f _ || -c _ || -p _) {
8032401.59ms24054.0ms $self->_scan_file($class, $file, $bkfunc);
# spent 54.0ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::_scan_file, avg 225µs/call
804 }
805 elsif (-d _) {
806 push(@subdirs, $file);
807 }
808 else {
809 warn "archive-iterator: $file is not a plain file or directory: $!";
810 }
811 }
8121101µs undef @files; # release storage
813
814 # recurse into directories
81515µs foreach my $dir (@subdirs) {
816 $self->_scan_directory($class, $dir, $bkfunc);
817 }
818
819115µs if (defined $AICache) {
820 $AICache = $AICache->finish();
821 }
822}
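`_scan_directory` filters directory entries differently depending on the layout. In Maildir it skips dotfiles and any name whose `:2,` info section contains the `T` (trashed) flag. In MH-style directories it skips names starting with `,` (deleted or refiled) or `.`. The sketch below applies the same two regexes to plain lists. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Maildir: skip dotfiles and messages flagged T (trashed) after ":2,".
sub maildir_keep { return grep { !/^\.|:2,.*T/ } @_ }

# MH/plain directory: skip ",123" deleted/refiled messages and dotfiles.
sub mh_keep { return grep { !/^[,.]/ } @_ }
```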
823
824
# spent 54.0ms (16.8+37.2) within Mail::SpamAssassin::ArchiveIterator::_scan_file which was called 240 times, avg 225µs/call: # 240 times (16.8ms+37.2ms) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 803, avg 225µs/call
sub _scan_file {
8252401.29ms my ($self, $class, $mail, $bkfunc) = @_;
826
8272401.41ms2401.49ms $self->_bump_scan_progress();
# spent 1.49ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress, avg 6µs/call
828
829 # only perform these stat() operations if we're not using a cache;
830 # it's faster to perform lookups in the cache, and more accurate
831240787µs if (!defined $AICache) {
8322409.17ms2405.00ms my @s = stat($mail);
# spent 5.00ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 21µs/call
833240431µs @s or warn "archive-iterator: no access to $mail: $!";
8342402.22ms2402.88ms return unless $self->_message_is_useful_by_file_modtime($s[9]);
# spent 2.88ms making 240 calls to Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime, avg 12µs/call
835 }
836
837240425µs my $date = AI_TIME_UNKNOWN;
838240800µs if ($self->{determine_receive_date}) {
839 unless (defined $AICache and $date = $AICache->check($mail)) {
840 # silently skip directories/non-files; some folders may
841 # contain extraneous dirs etc.
842 my $stat_errn = stat($mail) ? 0 : 0+$!;
843 if ($stat_errn != 0) {
844 warn "archive-iterator: no access to $mail: $!";
845 return;
846 }
847 elsif (!-f _) {
848 return;
849 }
850
851 my $header = '';
852 if (!_mail_open($mail)) {
853 $self->{access_problem} = 1;
854 return;
855 }
856 for ($!=0; <INPUT>; $!=0) {
857 last if /^\015?$/s;
858 $header .= $_;
859 }
860 defined $_ || $!==0 or
861 $!==EBADF ? dbg("archive-iterator: error reading: $!")
862 : die "error reading: $!";
863 close INPUT or die "error closing input file: $!";
864
865 return if ($self->{opt_skip_empty_messages} && $header eq '');
866
867 $date = Mail::SpamAssassin::Util::receive_date($header);
868 if (defined $AICache) {
869 $AICache->update($mail, $date);
870 }
871 }
872
873 return if !$self->_message_is_useful_by_date($date);
874 return if !$self->_scanprob_says_scan();
875 }
876 else {
877240417µs return if ($self->{opt_skip_empty_messages} && (-z $mail));
878 }
879
8804802.39ms24027.8ms &{$bkfunc}($self, $date, $class, 'f', $mail);
881
8822401.49ms return;
883}
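When `determine_receive_date` is set, `_scan_file` reads only the header block, up to the first empty line, before calling `receive_date`. The pattern `/^\015?$/s` also accepts a bare `"\r\n"` line, so CRLF messages end their header correctly. The sketch below shows that loop on a lexical filehandle. The name `read_header` is hypothetical.

```perl
use strict;
use warnings;

# Collect header lines up to the first empty ("\n" or "\r\n") line.
sub read_header {
  my ($fh) = @_;
  my $header = '';
  while (defined(my $line = <$fh>)) {
    last if $line =~ /^\015?$/s;
    $header .= $line;
  }
  return $header;
}
```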
884
885sub _scan_mailbox {
886 my ($self, $class, $folder, $bkfunc) = @_;
887 my @files;
888
889 my $stat_errn = stat($folder) ? 0 : 0+$!;
890 if ($stat_errn == ENOENT) {
891 # no longer there?
892 }
893 elsif ($stat_errn != 0) {
894 warn "archive-iterator: no access to $folder: $!";
895 }
896 elsif (-f _) {
897 push(@files, $folder);
898 }
899 elsif (-d _) {
900 # passed a directory of mboxes
901 $folder =~ s/\/\s*$//; #Remove trailing slash, if there
902 if (!opendir(DIR, $folder)) {
903 warn "archive-iterator: can't open '$folder' dir: $!\n";
904 $self->{access_problem} = 1;
905 return;
906 }
907 while ($_ = readdir(DIR)) {
908 next if $_ eq '.' || $_ eq '..' || !/^[^\.]\S*$/;
909 # hmmm, ignores folders with spaces in the name???
910 $stat_errn = stat("$folder/$_") ? 0 : 0+$!;
911 if ($stat_errn == ENOENT) {
912 # no longer there?
913 }
914 elsif ($stat_errn != 0) {
915 warn "archive-iterator: no access to $folder/$_: $!";
916 }
917 elsif (-f _) {
918 push(@files, "$folder/$_");
919 }
920 }
921 closedir(DIR) or die "error closing directory $folder: $!";
922 }
923 else {
924 warn "archive-iterator: $folder is not a plain file or directory: $!";
925 }
926
927 foreach my $file (@files) {
928 $self->_bump_scan_progress();
929 if ($file =~ /\.(?:gz|bz2)$/) {
930 warn "archive-iterator: compressed mbox folders are not supported at this time\n";
931 $self->{access_problem} = 1;
932 next;
933 }
934
935 my @s = stat($file);
936 @s or warn "archive-iterator: no access to $file: $!";
937 next unless $self->_message_is_useful_by_file_modtime($s[9]);
938
939 my $info = {};
940 my $count;
941
942 $self->_create_cache('mbox', $file);
943
944 if ($self->{opt_cache}) {
945 if ($count = $AICache->count()) {
946 $info = $AICache->check();
947 }
948 }
949
950 unless ($count) {
951 if (!_mail_open($file)) {
952 $self->{access_problem} = 1;
953 next;
954 }
955
956 my $start = 0; # start of a message
957 my $where = 0; # current byte offset
958 my $first = ''; # first line of message
959 my $header = ''; # header text
960 my $in_header = 0; # are we in a header?
961 while (!eof INPUT) {
962 my $offset = $start; # byte offset of this message
963 my $header = $first; # remember first line
964 for ($!=0; <INPUT>; $!=0) {
965 if ($in_header) {
966 if (/^\015?$/s) {
967 $in_header = 0;
968 }
969 else {
970 $header .= $_;
971 }
972 }
973 #Changed Regex to use option Per bug 6703
974 if (substr($_,0,5) eq "From " && /$self->{opt_from_regex}/o) {
975 $in_header = 1;
976 $first = $_;
977 $start = $where;
978 $where = tell INPUT;
979 $where >= 0 or die "cannot obtain file position: $!";
980 last;
981 }
982 $where = tell INPUT;
983 $where >= 0 or die "cannot obtain file position: $!";
984 }
985 defined $_ || $!==0 or
986 $!==EBADF ? dbg("archive-iterator: error reading: $!")
987 : die "error reading: $!";
988 if ($header ne '') {
989 # next if ($self->{opt_skip_empty_messages} && $header eq '');
990 $self->_bump_scan_progress();
991 $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header);
992 }
993 }
994 close INPUT or die "error closing input file: $!";
995 }
996
997 while(my($k,$v) = each %{$info}) {
998 if (defined $AICache && !$count) {
999 $AICache->update($k, $v);
1000 }
1001
1002 if ($self->{determine_receive_date}) {
1003 next if !$self->_message_is_useful_by_date($v);
1004 }
1005 next if !$self->_scanprob_says_scan();
1006
1007 &{$bkfunc}($self, $v, $class, 'm', "$file.$k");
1008 }
1009
1010 if (defined $AICache) {
1011 $AICache = $AICache->finish();
1012 }
1013 }
1014}
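`_scan_mailbox` indexes an mbox file by byte offset. It uses `tell` to track positions and records where each `From_` separator line begins. Here is a simplified sketch of just the offset bookkeeping, without the header parsing or caching. The name `mbox_offsets` is hypothetical, and the separator regex is the default from `_set_default_message_selection_opts`.

```perl
use strict;
use warnings;

my $FROM_RE = qr/^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# Return the byte offset at which each From_ separator line starts.
sub mbox_offsets {
  my ($fh) = @_;
  my @offsets;
  my $where = tell $fh;                  # position before reading the next line
  while (defined(my $line = <$fh>)) {
    push @offsets, $where
      if substr($line, 0, 5) eq 'From ' && $line =~ $FROM_RE;
    $where = tell $fh;
    $where >= 0 or die "cannot obtain file position: $!";
  }
  return @offsets;
}
```

The original interpolates `$self->{opt_from_regex}` with the `/o` modifier, which compiles the pattern only once per process. A later change to `opt_from_regex` in the same process would therefore not take effect.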
1015
1016sub _scan_mbx {
1017 my ($self, $class, $folder, $bkfunc) = @_;
1018 my (@files, $fp);
1019
1020 my $stat_errn = stat($folder) ? 0 : 0+$!;
1021 if ($stat_errn == ENOENT) {
1022 # no longer there?
1023 }
1024 elsif ($stat_errn != 0) {
1025 warn "archive-iterator: no access to $folder: $!";
1026 }
1027 elsif (-f _) {
1028 push(@files, $folder);
1029 }
1030 elsif (-d _) {
1031 # got passed a directory full of mbx folders.
1032 $folder =~ s/\/\s*$//; # remove trailing slash, if there is one
1033 if (!opendir(DIR, $folder)) {
1034 warn "archive-iterator: can't open '$folder' dir: $!\n";
1035 $self->{access_problem} = 1;
1036 return;
1037 }
1038 while ($_ = readdir(DIR)) {
1039 next if $_ eq '.' || $_ eq '..' || !/^[^\.]\S*$/;
1040 # hmmm, ignores folders with spaces in the name???
1041 $stat_errn = stat("$folder/$_") ? 0 : 0+$!;
1042 if ($stat_errn == ENOENT) {
1043 # no longer there?
1044 }
1045 elsif ($stat_errn != 0) {
1046 warn "archive-iterator: no access to $folder/$_: $!";
1047 }
1048 elsif (-f _) {
1049 push(@files, "$folder/$_");
1050 }
1051 }
1052 closedir(DIR) or die "error closing directory $folder: $!";
1053 }
1054 else {
1055 warn "archive-iterator: $folder is not a plain file or directory: $!";
1056 }
1057
1058 foreach my $file (@files) {
1059 $self->_bump_scan_progress();
1060
1061 if ($file =~ /\.(?:gz|bz2)$/) {
1062 warn "archive-iterator: compressed mbx folders are not supported at this time\n";
1063 $self->{access_problem} = 1;
1064 next;
1065 }
1066
1067 my @s = stat($file);
1068 @s or warn "archive-iterator: no access to $file: $!";
1069 next unless $self->_message_is_useful_by_file_modtime($s[9]);
1070
1071 my $info = {};
1072 my $count;
1073
1074 $self->_create_cache('mbx', $file);
1075
1076 if ($self->{opt_cache}) {
1077 if ($count = $AICache->count()) {
1078 $info = $AICache->check();
1079 }
1080 }
1081
1082 unless ($count) {
1083 if (!_mail_open($file)) {
1084 $self->{access_problem} = 1;
1085 next;
1086 }
1087
1088 # check the mailbox is in mbx format
1089 $! = 0; $fp = <INPUT>;
1090 defined $fp || $!==0 or
1091 $!==EBADF ? dbg("archive-iterator: error reading: $!")
1092 : die "error reading: $!";
1093 if (!defined $fp) {
1094 die "archive-iterator: error: mailbox not in mbx format - empty!\n";
1095 } elsif ($fp !~ /\*mbx\*/) {
1096 die "archive-iterator: error: mailbox not in mbx format!\n";
1097 }
1098
1099 # skip mbx headers to the first email...
1100 seek(INPUT,2048,0) or die "cannot reposition file to 2048: $!";
1101 my $sep = MBX_SEPARATOR;
1102
1103 for ($!=0; <INPUT>; $!=0) {
1104 if ($_ =~ /$sep/) {
1105 my $offset = tell INPUT;
1106 $offset >= 0 or die "cannot obtain file position: $!";
1107 my $size = $2;
1108
1109 # gather up the headers...
1110 my $header = '';
1111 for ($!=0; <INPUT>; $!=0) {
1112 last if (/^\015?$/s);
1113 $header .= $_;
1114 }
1115 defined $_ || $!==0 or
1116 $!==EBADF ? dbg("archive-iterator: error reading: $!")
1117 : die "error reading: $!";
1118 if (!($self->{opt_skip_empty_messages} && $header eq '')) {
1119 $self->_bump_scan_progress();
1120 $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header);
1121 }
1122
1123 # go onto the next message
1124 seek(INPUT, $offset + $size, 0)
1125 or die "cannot reposition file to $offset + $size: $!";
1126 }
1127 else {
1128 die "archive-iterator: error: failure to read message body!\n";
1129 }
1130 }
1131 defined $_ || $!==0 or
1132 $!==EBADF ? dbg("archive-iterator: error reading: $!")
1133 : die "error reading: $!";
1134 close INPUT or die "error closing input file: $!";
1135 }
1136
1137 while(my($k,$v) = each %{$info}) {
1138 if (defined $AICache && !$count) {
1139 $AICache->update($k, $v);
1140 }
1141
1142 if ($self->{determine_receive_date}) {
1143 next if !$self->_message_is_useful_by_date($v);
1144 }
1145 next if !$self->_scanprob_says_scan();
1146
1147 &{$bkfunc}($self, $v, $class, 'b', "$file.$k");
1148 }
1149
1150 if (defined $AICache) {
1151 $AICache = $AICache->finish();
1152 }
1153 }
1154}
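Before walking messages, `_scan_mbx` checks that the first line contains the `*mbx*` marker and seeks to byte 2048, past the mbx file header. The sketch below covers only that validation step; the per-message `MBX_SEPARATOR` pattern is defined elsewhere in the module and is not reproduced here. The name `open_mbx_at_first_message` is hypothetical.

```perl
use strict;
use warnings;

# Validate the mbx marker on the first line, then position the handle at
# byte 2048, where _scan_mbx starts looking for messages.
sub open_mbx_at_first_message {
  my ($fh) = @_;
  my $first = <$fh>;
  die "mailbox not in mbx format - empty!\n" unless defined $first;
  die "mailbox not in mbx format!\n" unless $first =~ /\*mbx\*/;
  seek($fh, 2048, 0) or die "cannot reposition file to 2048: $!";
  return tell $fh;
}
```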
1155
1156############################################################################
1157
1158
# spent 1.49ms within Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress which was called 240 times, avg 6µs/call: # 240 times (1.49ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 827, avg 6µs/call
sub _bump_scan_progress {
1159240400µs my ($self) = @_;
11602401.52ms if (exists $self->{scan_progress_sub}) {
1161 return unless ($self->{scan_progress_counter}++ % 50 == 0);
1162 $self->{scan_progress_sub}->();
1163 }
1164}
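`_bump_scan_progress` throttles the progress callback to every 50th call. Post-incrementing an undefined counter yields 0, so the callback fires on the 1st, 51st, 101st call, and so on. In this profile `_bump_scan_progress` ran 240 times; if a `scan_progress_sub` were installed, that would mean 5 callbacks. The sketch below is a standalone copy that checks the count.

```perl
use strict;
use warnings;

# Standalone copy of the throttle: call the progress sub once per 50 bumps.
sub bump_scan_progress {
  my ($self) = @_;
  if (exists $self->{scan_progress_sub}) {
    return unless ($self->{scan_progress_counter}++ % 50 == 0);
    $self->{scan_progress_sub}->();
  }
}
```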
1165
1166############################################################################
1167
1168{
116912µs my $home;
1170
1171
# spent 247µs (99+148) within Mail::SpamAssassin::ArchiveIterator::_fix_globs which was called 2 times, avg 123µs/call: # 2 times (99µs+148µs) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 576, avg 123µs/call
sub _fix_globs {
117228µs my ($self, $path) = @_;
1173
117426µs unless (defined $home) {
117515µs $home = $ENV{'HOME'};
1176
1177 # No $HOME set? Try to find it, portably.
117812µs unless ($home) {
1179 if (!Mail::SpamAssassin::Util::am_running_on_windows()) {
1180 $home = (Mail::SpamAssassin::Util::portable_getpwuid($<))[7];
1181 } else {
1182 my $vol = $ENV{'HOMEDRIVE'} || 'C:';
1183 my $dir = $ENV{'HOMEPATH'} || '\\';
1184 $home = File::Spec->catpath($vol, $dir, '');
1185 }
1186
1187 # Fall back to no replacement at all.
1188 $home ||= '~';
1189 }
1190 }
1191224µs211µs $path =~ s,^~/,${home}/,;
# spent 11µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 5µs/call
1192
1193 # protect/escape spaces: ./Mail/My Letters => ./Mail/My\ Letters
1194229µs29µs $path =~ s/(?<!\\)(\s)/\\$1/g;
# spent 9µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 4µs/call
1195
1196 # return csh-style globs: ./corpus/*.mbox => er, you know what it does ;)
11972176µs2129µs return glob($path);
# spent 129µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:glob, avg 64µs/call
1198 }
1199}
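`_fix_globs` rewrites a leading `~/` to the home directory and backslash-escapes unescaped whitespace. Perl's `glob` splits its argument on whitespace, and the escaping is intended to keep a path like `./Mail/My Letters` as one pattern. The sketch below isolates the string rewriting. The name `fix_glob_path` is hypothetical, and the home directory is passed in instead of being looked up.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the path rewriting in _fix_globs.
sub fix_glob_path {
  my ($path, $home) = @_;
  $path =~ s,^~/,${home}/,;        # only a leading "~/"; "~user" is left alone
  $path =~ s/(?<!\\)(\s)/\\$1/g;   # escape whitespace not already escaped
  return $path;
}
```

Usage would be along the lines of `my @locations = glob(fix_glob_path($rawloc, $ENV{HOME}));`.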
1200
120113µs
# spent 14µs within Mail::SpamAssassin::ArchiveIterator::_create_cache which was called: # once (14µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 792
sub _create_cache {
120215µs my ($self, $type, $path) = @_;
1203
1204113µs if ($self->{opt_cache}) {
1205 $AICache = Mail::SpamAssassin::AICache->new({
1206 'type' => $type,
1207 'prefix' => $self->{opt_cachedir},
1208 'path' => $path,
1209 });
1210 }
1211}
1212
1213############################################################################
1214
1215112µs1;
1216
1217__END__
 
# spent 1.01ms within Mail::SpamAssassin::ArchiveIterator::CORE:binmode which was called 240 times, avg 4µs/call: # 240 times (1.01ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 659, avg 4µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:binmode; # opcode
# spent 2.42ms within Mail::SpamAssassin::ArchiveIterator::CORE:close which was called 240 times, avg 10µs/call: # 235 times (2.38ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 408, avg 10µs/call # 5 times (39µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 383, avg 8µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:close; # opcode
# spent 18µs within Mail::SpamAssassin::ArchiveIterator::CORE:closedir which was called 2 times, avg 9µs/call: # 2 times (18µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 781, avg 9µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:closedir; # opcode
# spent 38µs within Mail::SpamAssassin::ArchiveIterator::CORE:ftdir which was called 4 times, avg 10µs/call: # 2 times (33µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 754, avg 17µs/call # 2 times (5µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 587, avg 3µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:ftdir; # opcode
# spent 1.67ms within Mail::SpamAssassin::ArchiveIterator::CORE:ftfile which was called 722 times, avg 2µs/call: # 240 times (825µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 360, avg 3µs/call # 240 times (433µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 796, avg 2µs/call # 240 times (376µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 374, avg 2µs/call # 2 times (33µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 754, avg 16µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:ftfile; # opcode
# spent 864µs within Mail::SpamAssassin::ArchiveIterator::CORE:ftsize which was called 245 times, avg 4µs/call: # 240 times (856µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 374, avg 4µs/call # 5 times (9µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 381, avg 2µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:ftsize; # opcode
# spent 129µs within Mail::SpamAssassin::ArchiveIterator::CORE:glob which was called 2 times, avg 64µs/call: # 2 times (129µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1197, avg 64µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:glob; # opcode
# spent 38.3ms within Mail::SpamAssassin::ArchiveIterator::CORE:match which was called 14285 times, avg 3µs/call: # 13561 times (35.8ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 406, avg 3µs/call # 480 times (1.91ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 643, avg 4µs/call # 244 times (571µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 780, avg 2µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:match; # opcode
# spent 14.3ms within Mail::SpamAssassin::ArchiveIterator::CORE:open which was called 240 times, avg 60µs/call: # 240 times (14.3ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 653, avg 60µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:open; # opcode
# spent 62µs within Mail::SpamAssassin::ArchiveIterator::CORE:open_dir which was called 2 times, avg 31µs/call: # 2 times (62µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 776, avg 31µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:open_dir; # opcode
# spent 2.43ms within Mail::SpamAssassin::ArchiveIterator::CORE:pack which was called 240 times, avg 10µs/call: # 240 times (2.43ms+0s) by Mail::SpamAssassin::ArchiveIterator::_index_pack at line 740, avg 10µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:pack; # opcode
# spent 19.8ms within Mail::SpamAssassin::ArchiveIterator::CORE:read which was called 692 times, avg 29µs/call: # 457 times (7.33ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 400, avg 16µs/call # 235 times (12.5ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 392, avg 53µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:read; # opcode
# spent 668µs within Mail::SpamAssassin::ArchiveIterator::CORE:readdir which was called 2 times, avg 334µs/call: # 2 times (668µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 780, avg 334µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:readdir; # opcode
# spent 12.4ms within Mail::SpamAssassin::ArchiveIterator::CORE:stat which was called 722 times, avg 17µs/call: # 240 times (6.05ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 795, avg 25µs/call # 240 times (5.00ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 832, avg 21µs/call # 240 times (1.34ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 359, avg 6µs/call # 2 times (27µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 586, avg 14µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:stat; # opcode
# spent 31µs within Mail::SpamAssassin::ArchiveIterator::CORE:subst which was called 8 times, avg 4µs/call: # 2 times (11µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1191, avg 5µs/call # 2 times (9µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1194, avg 4µs/call # 2 times (7µs+0s) by Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts at line 674, avg 3µs/call # 2 times (5µs+0s) by Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts at line 675, avg 3µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:subst; # opcode
# spent 2.26ms within Mail::SpamAssassin::ArchiveIterator::CORE:unpack which was called 240 times, avg 9µs/call: # 240 times (2.26ms+0s) by Mail::SpamAssassin::ArchiveIterator::_index_unpack at line 744, avg 9µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:unpack; # opcode