NYTProf Performance Profile (line view)
For /usr/local/bin/sa-learn
  Run on Sun Nov 5 03:09:29 2017
Reported on Mon Nov 6 13:20:48 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm
Statements: Executed 30244 statements in 617ms
Subroutines
Calls  P  F  Exclusive  Inclusive  Subroutine
             Time       Time
  239  1  1  322ms      96438s     Mail::SpamAssassin::ArchiveIterator::_run_file
  239  1  1  72.7ms     96438s     Mail::SpamAssassin::ArchiveIterator::_run_message
  239  1  1  42.4ms     64.3ms     Mail::SpamAssassin::ArchiveIterator::_scan_file
14208  3  1  39.2ms     39.2ms     Mail::SpamAssassin::ArchiveIterator::CORE:match (opcode)
  239  1  1  24.0ms     24.0ms     Mail::SpamAssassin::ArchiveIterator::CORE:open (opcode)
  690  2  1  19.4ms     19.4ms     Mail::SpamAssassin::ArchiveIterator::CORE:read (opcode)
  719  4  1  17.0ms     17.0ms     Mail::SpamAssassin::ArchiveIterator::CORE:stat (opcode)
    2  1  1  14.1ms     90.3ms     Mail::SpamAssassin::ArchiveIterator::_scan_directory
    1  1  1  12.2ms     96438s     Mail::SpamAssassin::ArchiveIterator::_run
  239  1  1  12.0ms     39.1ms     Mail::SpamAssassin::ArchiveIterator::_mail_open
  239  1  1  5.48ms     10.9ms     Mail::SpamAssassin::ArchiveIterator::__ANON__[:305]
  239  1  1  4.25ms     6.61ms     Mail::SpamAssassin::ArchiveIterator::_index_unpack
  239  1  1  3.49ms     3.49ms     Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime
  239  1  1  2.90ms     2.90ms     Mail::SpamAssassin::ArchiveIterator::CORE:pack (opcode)
  239  1  1  2.51ms     5.41ms     Mail::SpamAssassin::ArchiveIterator::_index_pack
  239  2  1  2.39ms     2.39ms     Mail::SpamAssassin::ArchiveIterator::CORE:close (opcode)
  239  1  1  2.36ms     2.36ms     Mail::SpamAssassin::ArchiveIterator::CORE:unpack (opcode)
    1  1  1  2.30ms     3.00ms     Mail::SpamAssassin::ArchiveIterator::BEGIN@31
  239  1  1  1.98ms     1.98ms     Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress
  719  4  1  1.71ms     1.71ms     Mail::SpamAssassin::ArchiveIterator::CORE:ftfile (opcode)
  239  1  1  1.04ms     1.04ms     Mail::SpamAssassin::ArchiveIterator::CORE:binmode (opcode)
  244  2  1  816µs      816µs      Mail::SpamAssassin::ArchiveIterator::CORE:ftsize (opcode)
    2  1  1  664µs      664µs      Mail::SpamAssassin::ArchiveIterator::CORE:readdir (opcode)
    1  1  1  338µs      91.2ms     Mail::SpamAssassin::ArchiveIterator::_scan_targets
    2  1  1  150µs      150µs      Mail::SpamAssassin::ArchiveIterator::CORE:glob (opcode)
    2  1  1  141µs      175µs      Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts
    1  1  1  102µs      96438s     Mail::SpamAssassin::ArchiveIterator::run
    2  1  1  101µs      266µs      Mail::SpamAssassin::ArchiveIterator::_fix_globs
    2  1  1  94µs       94µs       Mail::SpamAssassin::ArchiveIterator::CORE:open_dir (opcode)
    4  2  1  87µs       87µs       Mail::SpamAssassin::ArchiveIterator::CORE:ftdir (opcode)
    1  1  1  63µs       212µs      Mail::SpamAssassin::ArchiveIterator::BEGIN@27
    1  1  1  49µs       49µs       Mail::SpamAssassin::ArchiveIterator::new
    1  1  1  48µs       62µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@22
    8  4  1  31µs       31µs       Mail::SpamAssassin::ArchiveIterator::CORE:subst (opcode)
    1  1  1  27µs       221µs      Mail::SpamAssassin::ArchiveIterator::BEGIN@34
    1  1  1  26µs       592µs      Mail::SpamAssassin::ArchiveIterator::BEGIN@29
    1  1  1  26µs       160µs      Mail::SpamAssassin::ArchiveIterator::BEGIN@30
    1  1  1  24µs       62µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@23
    1  1  1  24µs       202µs      Mail::SpamAssassin::ArchiveIterator::BEGIN@36
    1  1  1  23µs       84µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@28
    1  1  1  23µs       30µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@24
    1  1  1  23µs       90µs       Mail::SpamAssassin::ArchiveIterator::BEGIN@25
    2  1  1  20µs       20µs       Mail::SpamAssassin::ArchiveIterator::CORE:closedir (opcode)
    1  1  1  15µs       15µs       Mail::SpamAssassin::ArchiveIterator::set_functions
    1  1  1  14µs       14µs       Mail::SpamAssassin::ArchiveIterator::_create_cache
    0  0  0  0s         0s         Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_date
    0  0  0  0s         0s         Mail::SpamAssassin::ArchiveIterator::_run_mailbox
    0  0  0  0s         0s         Mail::SpamAssassin::ArchiveIterator::_run_mbx
    0  0  0  0s         0s         Mail::SpamAssassin::ArchiveIterator::_scan_mailbox
    0  0  0  0s         0s         Mail::SpamAssassin::ArchiveIterator::_scan_mbx
    0  0  0  0s         0s         Mail::SpamAssassin::ArchiveIterator::_scanprob_says_scan
Line  Statements  Time on line  Calls  Time in subs  Code
1# iterate over mail archives, calling a function on each message.
2#
3# <@LICENSE>
4# Licensed to the Apache Software Foundation (ASF) under one or more
5# contributor license agreements. See the NOTICE file distributed with
6# this work for additional information regarding copyright ownership.
7# The ASF licenses this file to you under the Apache License, Version 2.0
8# (the "License"); you may not use this file except in compliance with
9# the License. You may obtain a copy of the License at:
10#
11# http://www.apache.org/licenses/LICENSE-2.0
12#
13# Unless required by applicable law or agreed to in writing, software
14# distributed under the License is distributed on an "AS IS" BASIS,
15# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16# See the License for the specific language governing permissions and
17# limitations under the License.
18# </@LICENSE>
19
20package Mail::SpamAssassin::ArchiveIterator;
21
22263µs276µs
# spent 62µs (48+14) within Mail::SpamAssassin::ArchiveIterator::BEGIN@22 which was called: # once (48µs+14µs) by main::BEGIN@66 at line 22
use strict;
# spent 62µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@22 # spent 14µs making 1 call to strict::import
23257µs2100µs
# spent 62µs (24+38) within Mail::SpamAssassin::ArchiveIterator::BEGIN@23 which was called: # once (24µs+38µs) by main::BEGIN@66 at line 23
use warnings;
# spent 62µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@23 # spent 38µs making 1 call to warnings::import
24266µs237µs
# spent 30µs (23+7) within Mail::SpamAssassin::ArchiveIterator::BEGIN@24 which was called: # once (23µs+7µs) by main::BEGIN@66 at line 24
use bytes;
# spent 30µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@24 # spent 7µs making 1 call to bytes::import
25270µs2156µs
# spent 90µs (23+67) within Mail::SpamAssassin::ArchiveIterator::BEGIN@25 which was called: # once (23µs+67µs) by main::BEGIN@66 at line 25
use re 'taint';
# spent 90µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@25 # spent 67µs making 1 call to re::import
26
27261µs2361µs
# spent 212µs (63+149) within Mail::SpamAssassin::ArchiveIterator::BEGIN@27 which was called: # once (63µs+149µs) by main::BEGIN@66 at line 27
use Errno qw(ENOENT EACCES EBADF);
# spent 212µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@27 # spent 149µs making 1 call to Exporter::import
28261µs2145µs
# spent 84µs (23+61) within Mail::SpamAssassin::ArchiveIterator::BEGIN@28 which was called: # once (23µs+61µs) by main::BEGIN@66 at line 28
use Mail::SpamAssassin::Util;
# spent 84µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@28 # spent 61µs making 1 call to Exporter::import
29266µs21.16ms
# spent 592µs (26+566) within Mail::SpamAssassin::ArchiveIterator::BEGIN@29 which was called: # once (26µs+566µs) by main::BEGIN@66 at line 29
use Mail::SpamAssassin::Constants qw(:sa);
# spent 592µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@29 # spent 566µs making 1 call to Exporter::import
30261µs2294µs
# spent 160µs (26+134) within Mail::SpamAssassin::ArchiveIterator::BEGIN@30 which was called: # once (26µs+134µs) by main::BEGIN@66 at line 30
use Mail::SpamAssassin::Logger;
# spent 160µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@30 # spent 134µs making 1 call to Exporter::import
312373µs13.00ms
# spent 3.00ms (2.30+705µs) within Mail::SpamAssassin::ArchiveIterator::BEGIN@31 which was called: # once (2.30ms+705µs) by main::BEGIN@66 at line 31
use Mail::SpamAssassin::AICache;
# spent 3.00ms making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@31
32
33# 256 KiB is a big email, unless stated otherwise
34279µs2415µs
# spent 221µs (27+194) within Mail::SpamAssassin::ArchiveIterator::BEGIN@34 which was called: # once (27µs+194µs) by main::BEGIN@66 at line 34
use constant BIG_BYTES => 256*1024;
# spent 221µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@34 # spent 194µs making 1 call to constant::import
35
3613µs
# spent 202µs (24+179) within Mail::SpamAssassin::ArchiveIterator::BEGIN@36 which was called: # once (24µs+179µs) by main::BEGIN@66 at line 41
use vars qw {
37 $MESSAGES
38 $AICache
39 %class_opts
40 @ISA
41110.3ms2382µs};
# spent 202µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@36 # spent 179µs making 1 call to vars::import
42
43114µs@ISA = qw();
44
45=head1 NAME
46
47Mail::SpamAssassin::ArchiveIterator - find and process messages one at a time
48
49=head1 SYNOPSIS
50
51 my $iter = new Mail::SpamAssassin::ArchiveIterator(
52 {
53 'opt_max_size' => 256 * 1024, # 0 implies no limit
54 'opt_cache' => 1,
55 }
56 );
57
58 $iter->set_functions( \&wanted, sub { } );
59
60 eval { $iter->run(@ARGV); };
61
62 sub wanted {
63 my($class, $filename, $recv_date, $msg_array) = @_;
64
65
66 ...
67 }
68
69=head1 DESCRIPTION
70
71The Mail::SpamAssassin::ArchiveIterator module will go through a set
72of mbox files, mbx files, and directories (with a single message per
73file) and generate a list of messages. It will then call the C<wanted_sub>
74and C<result_sub> functions appropriately per message.
75
76=head1 METHODS
77
78=over 4
79
80=cut
81
82
83###########################################################################
84
85=item $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] )
86
87Constructs a new C<Mail::SpamAssassin::ArchiveIterator> object. You may
88pass the following attribute-value pairs to the constructor. The pairs are
89optional unless otherwise noted.
90
91=over 4
92
93=item opt_max_size
94
95The value of I<opt_max_size> sets a limit (in bytes) beyond which a
96message is considered large and is skipped by ArchiveIterator.
97
98A value of 0 implies no size limit, so all messages are examined. An
99undefined value implies the default limit of 256 KiB.
100
101=item opt_all
102
103Setting this option to true implicitly sets I<opt_max_size> to 0, i.e.
104no limit on message size; all messages are processed by ArchiveIterator.
105For compatibility with SpamAssassin versions older than 3.4.0 which
106lacked option I<opt_max_size>.
107
108=item opt_scanprob
109
110Randomly select messages to scan, with a probability of N, where N ranges
111from 0.0 (no messages scanned) to 1.0 (all messages scanned). Default
112is 1.0.
113
114This setting can be specified separately for each target.
115
116=item opt_before
117
118Only use messages which are received before the given time_t value.
119Negative values are an offset from the current time, e.g. -86400 means
12024 hours ago; values are otherwise parsed by Time::ParseDate (e.g. '-6 months').
121
122This setting can be specified separately for each target.
123
124=item opt_after
125
126Same as opt_before, except that messages are only used if received after
127the given time_t value.
128
129This setting can be specified separately for each target.
130
131=item opt_want_date
132
133Set to 1 (default) if you want the received date to be filled in
134in the C<wanted_sub> callback below. Set this to 0 to avoid this;
135it's a good idea to set this to 0 if you can, as it imposes a performance
136hit.
137
138=item opt_skip_empty_messages
139
140Set to 1 if you want to skip corrupt, 0-byte messages. The default is 0.
141
142=item opt_cache
143
144Set to 0 (default) if you don't want to use cached information to help speed
145up ArchiveIterator. Set to 1 to enable. This setting requires C<opt_cachedir>
146also be set.
147
148=item opt_cachedir
149
150Set to the path of a directory where you wish to store cached information for
151C<opt_cache>, if you don't want to mix them with the input files (as is the
152default). The directory must be both readable and writable.
153
154=item wanted_sub
155
156Reference to a subroutine which will process message data. Usually
157set via set_functions(). The routine will be passed 5 values: class
158(scalar), filename (scalar), received date (scalar), message content
159(array reference, one message line per element), and the message format
160key ('f' for file, 'm' for mbox, 'b' for mbx).
161
162Note that if C<opt_want_date> is set to 0, the received date scalar will be
163undefined.
164
165=item result_sub
166
167Reference to a subroutine which will process the results of the wanted_sub
168for each message processed. Usually set via set_functions().
169The routine will be passed 3 values: class (scalar), result (scalar, returned
170from wanted_sub), and received date (scalar).
171
172Note that if C<opt_want_date> is set to 0, the received date scalar will be
173undefined.
174
175=item scan_progress_sub
176
177Reference to a subroutine which will be called intermittently during
178the 'scan' phase of the mass-check. No guarantees are made as to
179how frequently this may happen, mind you.
180
181=item opt_from_regex
182
183This setting allows for flexibility in specifying the mbox format From separator.
184
185It defaults to the regular expression:
186
187/^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/
188
189Some SpamAssassin programs such as sa-learn will use the configuration option
190'mbox_format_from_regex' to override the default regular expression.
191
192=back
193
194=cut
195
196
# spent 49µs within Mail::SpamAssassin::ArchiveIterator::new which was called: # once (49µs+0s) by main::RUNTIME at line 467 of /usr/local/bin/sa-learn
sub new {
19712µs my $class = shift;
19812µs $class = ref($class) || $class;
199
20012µs my $self = shift;
20112µs if (!defined $self) { $self = { }; }
20212µs bless ($self, $class);
203
204 # If any of these options are set, we need to figure out the message's
205 # receive date at scan time. opt_after, opt_before, or opt_want_date
206 $self->{determine_receive_date} =
207 defined $self->{opt_after} || defined $self->{opt_before} ||
208110µs $self->{opt_want_date};
209
21014µs $self->{s} = [ ]; # spam, of course
21113µs $self->{h} = [ ]; # ham, as if you couldn't guess
212
21314µs $self->{access_problem} = 0;
214
21516µs if ($self->{opt_all}) {
216 $self->{opt_max_size} = 0;
217 } elsif (!defined $self->{opt_max_size}) {
21814µs $self->{opt_max_size} = BIG_BYTES;
219 }
220
221111µs $self;
222}
223
224###########################################################################
225
226=item set_functions( \&wanted_sub, \&result_sub )
227
228Sets the subroutines used for message processing (wanted_sub), and result
229reporting. For more information, see I<new()> above.
230
231=cut
232
233
# spent 15µs within Mail::SpamAssassin::ArchiveIterator::set_functions which was called: # once (15µs+0s) by main::RUNTIME at line 469 of /usr/local/bin/sa-learn
sub set_functions {
23413µs my ($self, $wanted, $result) = @_;
23514µs $self->{wanted_sub} = $wanted if defined $wanted;
236111µs $self->{result_sub} = $result if defined $result;
237}
238
239###########################################################################
240
241=item run ( @target_paths )
242
243Generates the list of messages to process, then runs each message through the
244configured wanted subroutine. Files which have a name ending in C<.gz> or
245C<.bz2> will be uncompressed via a call to C<gunzip -cd> or C<bzip2 -cd>
246respectively.
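
As a rough sketch of how a decompressor can be chosen from the file name, the snippet below builds the arguments for a three-argument (list form) open. It follows the pattern used by _mail_open later in this file, but the helper name open_args is hypothetical and nothing is actually opened here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the open() mode followed by the
# expression list, without opening anything.
sub open_args {
    my ($file) = @_;
    my @expr = ($file);   # plain file by default
    my $mode = '<';

    if ($file =~ /\.gz$/) {
        $mode = '-|';                       # read from a child process
        unshift @expr, 'gunzip', '-cd';
    }
    elsif ($file =~ /\.bz2$/) {
        $mode = '-|';
        unshift @expr, 'bzip2', '-cd';
    }
    return ($mode, @expr);
}

for my $name ('msg.eml', 'corpus.gz', 'corpus.bz2') {
    my ($mode, @expr) = open_args($name);
    print "$name => open(FH, '$mode', ", join(', ', map { "'$_'" } @expr), ")\n";
}
```

Passing the command and its arguments as a list keeps the shell out of the picture, so file names with spaces or metacharacters are handed to the decompressor unchanged.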
247
248The target_paths array is expected to be either one element per path in the
249following format: C<class:format:raw_location>, or a hash reference containing
250key-value option pairs and a 'target' key with a value in that format.
251
252The key-value option pairs that can be used are: opt_scanprob, opt_after,
253opt_before. See the constructor method's documentation for more information
254on their effects.
255
256run() returns a false value if there was an error (can't open a file, etc.)
257and 1 if there were no errors.
258
259=over 4
260
261=item class
262
263Either 'h' for ham or 's' for spam. If the class is longer than 1 character,
264it will be truncated. If blank, 'h' is default.
265
266=item format
267
268Specifies the format of the raw_location. C<dir> is a directory whose
269files are individual messages, C<file> a file with a single message,
270C<mbox> an mbox formatted file, or C<mbx> for an mbx formatted directory.
271
272C<detect> can also be used. This assumes C<dir> for a directory,
273C<mbox> for any other path that contains the pattern C</\.mbox/i>, and
274C<file> otherwise.
275
276=item raw_location
277
278Path to file or directory. File globbing is allowed using the
279standard csh-style globbing (see C<perldoc -f glob>). C<~> at the
280front of the value will be replaced by the C<HOME> environment
281variable. Escaped whitespace is protected as well.
282
283B<NOTE:> C<~user> is not allowed.
284
285B<NOTE 2:> C<-> is not allowed as a raw location. To have
286ArchiveIterator deal with STDIN, generate a temp file.
287
288=back
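
The C<class:format:raw_location> target syntax described above can be illustrated with a small parser. This is a sketch modelled on the checks in _scan_targets, with the warnings replaced by an empty return; the function name parse_target is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a target string into (class, format, location).
# Returns an empty list for targets the iterator would reject.
sub parse_target {
    my ($target) = @_;

    # Limit of 3 keeps any further colons inside the location.
    my ($class, $format, $rawloc) = split(/:/, $target, 3);

    return unless defined $format && defined $rawloc;   # "class" or "class:format" only
    return if $rawloc eq '-';                           # STDIN is not supported

    # Only the first character of the class is used; blank means ham.
    $class = substr($class, 0, 1) || 'h';

    return ($class, $format, $rawloc);
}

for my $target ('spam:dir:/var/spool/spam', ':detect:~/Mail/*', 'h:file:-') {
    my @parts = parse_target($target);
    print "$target => ", (@parts ? join(' | ', @parts) : 'rejected'), "\n";
}
```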
289
290=cut
291
292
# spent 96438s (102µs+96438) within Mail::SpamAssassin::ArchiveIterator::run which was called: # once (102µs+96438s) by main::RUNTIME at line 478 of /usr/local/bin/sa-learn
sub run {
29317µs my ($self, @targets) = @_;
294
29513µs if (!defined $self->{wanted_sub}) {
296 warn "archive-iterator: set_functions never called";
297 return 0;
298 }
299
300 # scan the targets and get the number and list of messages
301 $self->_scan_targets(\@targets,
302
# spent 10.9ms (5.48+5.41) within Mail::SpamAssassin::ArchiveIterator::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm:305] which was called 239 times, avg 46µs/call: # 239 times (5.48ms+5.41ms) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 880, avg 46µs/call
sub {
3032391.22ms my($self, $date, $class, $format, $mail) = @_;
30447814.8ms2395.41ms push(@{$self->{$class}}, _index_pack($date, $class, $format, $mail));
# spent 5.41ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_index_pack, avg 23µs/call
305 }
306128µs191.2ms );
307
30812µs my $messages;
309 # for ease of memory, we'll play with pointers
31013µs $messages = $self->{s};
31113µs undef $self->{s};
312311µs push(@{$messages}, @{$self->{h}});
31312µs undef $self->{h};
314
31526µs $MESSAGES = scalar(@{$messages});
316
317 # go ahead and run through all of the messages specified
318125µs196438s return $self->_run($messages);
# spent 96438s making 1 call to Mail::SpamAssassin::ArchiveIterator::_run
319}
320
321
# spent 96438s (12.2ms+96438) within Mail::SpamAssassin::ArchiveIterator::_run which was called: # once (12.2ms+96438s) by Mail::SpamAssassin::ArchiveIterator::run at line 318
sub _run {
32212µs my ($self, $messages) = @_;
323
3242413.14ms while (my $message = shift @{$messages}) {
3252393.96ms23996438s my($class, undef, $date, undef, $result) = $self->_run_message($message);
# spent 96438s making 239 calls to Mail::SpamAssassin::ArchiveIterator::_run_message, avg 404s/call
3264734.15ms2344.54ms &{$self->{result_sub}}($class, $result, $date) if $result;
# spent 4.54ms making 234 calls to main::result, avg 19µs/call
327 }
328118µs return ! $self->{access_problem};
329}
330
331############################################################################
332
333## run_message and related functions to process a single message
334
335
# spent 96438s (72.7ms+96438) within Mail::SpamAssassin::ArchiveIterator::_run_message which was called 239 times, avg 404s/call: # 239 times (72.7ms+96438s) by Mail::SpamAssassin::ArchiveIterator::_run at line 325, avg 404s/call
sub _run_message {
3362391.03ms my ($self, $msg) = @_;
337
3382393.45ms2396.61ms my ($date, $class, $format, $mail) = _index_unpack($msg);
# spent 6.61ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_index_unpack, avg 28µs/call
339
340239727µs if ($format eq 'f') {
34123966.9ms23996438s return $self->_run_file($class, $format, $mail, $date);
# spent 96438s making 239 calls to Mail::SpamAssassin::ArchiveIterator::_run_file, avg 404s/call
342 }
343 elsif ($format eq 'm') {
344 return $self->_run_mailbox($class, $format, $mail, $date);
345 }
346 elsif ($format eq 'b') {
347 return $self->_run_mbx($class, $format, $mail, $date);
348 }
349}
350
351
# spent 96438s (322ms+96437) within Mail::SpamAssassin::ArchiveIterator::_run_file which was called 239 times, avg 404s/call: # 239 times (322ms+96437s) by Mail::SpamAssassin::ArchiveIterator::_run_message at line 341, avg 404s/call
sub _run_file {
3522392.16ms my ($self, $class, $format, $where, $date) = @_;
353
3542391.97ms23939.1ms if (!_mail_open($where)) {
# spent 39.1ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_mail_open, avg 164µs/call
355 $self->{access_problem} = 1;
356 return;
357 }
358
3592399.29ms2391.38ms my $stat_errn = stat(INPUT) ? 0 : 0+$!;
# spent 1.38ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 6µs/call
3602393.16ms239873µs if ($stat_errn == ENOENT) {
# spent 873µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 4µs/call
361 dbg("archive-iterator: no such input ($where)");
362 return;
363 }
364 elsif ($stat_errn != 0) {
365 warn "archive-iterator: no access to input ($where): $!";
366 return;
367 }
368 elsif (!-f _ && !-c _ && !-p _) {
369 warn "archive-iterator: not a plain file (or char.spec. or pipe) ($where)";
370 return;
371 }
372
373239971µs my $opt_max_size = $self->{opt_max_size};
3742394.29ms4781.19ms if (!$opt_max_size) {
# spent 808µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftsize, avg 3µs/call # spent 386µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 2µs/call
375 # process any size
376 } elsif (!-f _) {
377 # must check size while reading
378 } elsif (-s _ > $opt_max_size) {
379 # skip too-big mails
380 # note that -s can only deal with files, it returns 0 on char.spec. STDIN
381586µs1065µs info("archive-iterator: skipping large message: ".
# spent 57µs making 5 calls to Mail::SpamAssassin::Logger::info, avg 11µs/call # spent 8µs making 5 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftsize, avg 2µs/call
382 "file size %d, limit %d bytes", -s _, $opt_max_size);
383585µs538µs close INPUT or die "error closing input file: $!";
# spent 38µs making 5 calls to Mail::SpamAssassin::ArchiveIterator::CORE:close, avg 8µs/call
384546µs return;
385 }
386
387234457µs my @msg;
388 my $header;
389234508µs my $len = 0;
390234591µs my $str = '';
391234440µs my($inbuf,$nread);
39223420.5ms23412.0ms while ( $nread=read(INPUT,$inbuf,16384) ) {
# spent 12.0ms making 234 calls to Mail::SpamAssassin::ArchiveIterator::CORE:read, avg 51µs/call
3934561.22ms $len += $nread;
3944561.01ms if ($opt_max_size && $len > $opt_max_size) {
395 info("archive-iterator: skipping large message: read %d, limit %d bytes",
396 $len, $opt_max_size);
397 close INPUT or die "error closing input file: $!";
398 return;
399 }
40045622.2ms4567.41ms $str .= $inbuf;
# spent 7.41ms making 456 calls to Mail::SpamAssassin::ArchiveIterator::CORE:read, avg 16µs/call
401 }
402234526µs defined $nread or die "error reading: $!";
403234755µs undef $inbuf;
40446871.6ms @msg = split(/^/m, $str, -1); undef $str;
4052341.73ms for my $j (0..$#msg) {
40613955226ms1348736.6ms if ($msg[$j] =~ /^\015?$/) { $header = $j; last }
# spent 36.6ms making 13487 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 3µs/call
407 }
4082344.41ms2342.36ms close INPUT or die "error closing input file: $!";
# spent 2.36ms making 234 calls to Mail::SpamAssassin::ArchiveIterator::CORE:close, avg 10µs/call
409
4102341000µs if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
411 $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
412 }
413
4144689.38ms23496437s return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format));
# spent 96437s making 234 calls to main::wanted, avg 412s/call
415}
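
The read loop in _run_file above can be tried in isolation. The sketch below reads from any filehandle in 16 KiB chunks, gives up once a size limit is exceeded, splits the data into lines, and locates the blank line that ends the header. The function name slurp_limited and the in-memory test data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (\@lines, $header_index), or an empty
# list if the input exceeds $max_size bytes (0 means no limit).
sub slurp_limited {
    my ($fh, $max_size) = @_;
    my ($str, $len) = ('', 0);
    my ($inbuf, $nread);

    while ($nread = read($fh, $inbuf, 16384)) {
        $len += $nread;
        return if $max_size && $len > $max_size;   # too big: skip the message
        $str .= $inbuf;
    }
    defined $nread or die "error reading: $!";

    # Split at line starts so every element keeps its line terminator.
    my @msg = split(/^/m, $str, -1);

    # The first empty line (LF or CRLF only) separates header from body.
    my $header;
    for my $j (0 .. $#msg) {
        if ($msg[$j] =~ /^\015?$/) { $header = $j; last }
    }
    return (\@msg, $header);
}

my $data = "Subject: hi\015\012From: a\@example.com\015\012\015\012body\015\012";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";
my ($msg, $header) = slurp_limited($fh, 256 * 1024);
printf "%d lines, header ends at index %d\n", scalar @$msg, $header;
close $fh;
```

Reading in fixed-size chunks lets the size check fire before the whole message is in memory, which matters for inputs such as pipes where the size cannot be checked up front.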
416
417sub _run_mailbox {
418 my ($self, $class, $format, $where, $date) = @_;
419
420 my ($file, $offset);
421 { local($1,$2); # Bug 7140 (avoids perl bug [perl #123880])
422 ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
423 }
424 my @msg;
425 my $header;
426 if (!_mail_open($file)) {
427 $self->{access_problem} = 1;
428 return;
429 }
430
431 my $opt_max_size = $self->{opt_max_size};
432 dbg("archive-iterator: _run_mailbox %s, ofs %d, limit %d",
433 $file, $offset, $opt_max_size||0);
434
435 seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
436
437 my $size = 0;
438 for ($!=0; <INPUT>; $!=0) {
439 #Changed Regex to use option Per bug 6703
440 last if (substr($_,0,5) eq "From " && @msg && /$self->{opt_from_regex}/o);
441 $size += length($_);
442 push (@msg, $_);
443
444 # skip mails that are too big
445 if ($opt_max_size && $size > $opt_max_size) {
446 info("archive-iterator: skipping large message: ".
447 "%d lines, %d bytes, limit %d bytes",
448 scalar @msg, $size, $opt_max_size);
449 close INPUT or die "error closing input file: $!";
450 return;
451 }
452
453 if (!defined $header && /^\s*$/) {
454 $header = $#msg;
455 }
456 }
457 defined $_ || $!==0 or
458 $!==EBADF ? dbg("archive-iterator: error reading: $!")
459 : die "error reading: $!";
460 close INPUT or die "error closing input file: $!";
461
462 if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
463 $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
464 }
465
466 return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format));
467}
468
469sub _run_mbx {
470 my ($self, $class, $format, $where, $date) = @_;
471
472 my ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
473 my @msg;
474 my $header;
475
476 if (!_mail_open($file)) {
477 $self->{access_problem} = 1;
478 return;
479 }
480
481 my $opt_max_size = $self->{opt_max_size};
482 dbg("archive-iterator: _run_mbx %s, ofs %d, limit %d",
483 $file, $offset, $opt_max_size||0);
484
485 seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
486
487 my $size = 0;
488 for ($!=0; <INPUT>; $!=0) {
489 last if ($_ =~ MBX_SEPARATOR);
490 $size += length($_);
491 push (@msg, $_);
492
493 # skip mails that are too big
494 if ($opt_max_size && $size > $opt_max_size) {
495 info("archive-iterator: skipping large message: ".
496 "%d lines, %d bytes, limit %d bytes",
497 scalar @msg, $size, $opt_max_size);
498 close INPUT or die "error closing input file: $!";
499 return;
500 }
501
502 if (!defined $header && /^\s*$/) {
503 $header = $#msg;
504 }
505 }
506 defined $_ || $!==0 or
507 $!==EBADF ? dbg("archive-iterator: error reading: $!")
508 : die "error reading: $!";
509 close INPUT or die "error closing input file: $!";
510
511 if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
512 $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
513 }
514
515 return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format));
516}
517
518############################################################################
519
520## FUNCTIONS BELOW THIS POINT ARE FOR FINDING THE MESSAGES TO RUN AGAINST
521
522############################################################################
523
524
# spent 91.2ms (338µs+90.8) within Mail::SpamAssassin::ArchiveIterator::_scan_targets which was called: # once (338µs+90.8ms) by Mail::SpamAssassin::ArchiveIterator::run at line 306
sub _scan_targets {
52512µs my ($self, $targets, $bkfunc) = @_;
526
52713µs %class_opts = ();
528
529116µs foreach my $target (@${targets}) {
53025µs if (!defined $target) {
531 warn "archive-iterator: invalid (undef) value in target list";
532 next;
533 }
534
53524µs my %opts;
53627µs if (ref $target eq 'HASH') {
537 # e.g. { target => $target, opt_foo => 1, opt_bar => 0.4 ... }
538 foreach my $k (keys %{$target}) {
539 if ($k =~ /^opt_/) {
540 $opts{$k} = $target->{$k};
541 }
542 }
543 $target = $target->{target};
544 }
545
546225µs my ($class, $format, $rawloc) = split(/:/, $target, 3);
547
548 # "class"
54925µs if (!defined $format) {
550 warn "archive-iterator: invalid (undef) format in target list, $target";
551 next;
552 }
553 # "class:format"
55424µs if (!defined $rawloc) {
555 warn "archive-iterator: invalid (undef) raw location in target list, $target";
556 next;
557 }
558
55925µs if ($rawloc eq '-') {
560 warn 'archive-iterator: raw location "-" is not supported';
561 next;
562 }
563
564 # use ham by default, things like "spamassassin" can't specify the type
565214µs $class = substr($class, 0, 1) || 'h';
566
567 # keep a copy of the most recent message-selection options for
568 # each class
569211µs $class_opts{$class} = \%opts;
570
571212µs foreach my $k (keys %opts) {
572 $self->{$k} = $opts{$k};
573 }
574219µs2175µs $self->_set_default_message_selection_opts();
# spent 175µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts, avg 87µs/call
575
576223µs2266µs my @locations = $self->_fix_globs($rawloc);
# spent 266µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::_fix_globs, avg 133µs/call
577
578221µs foreach my $location (@locations) {
57924µs my $method;
580
581 # for this location only; 'detect' means they can differ for each location
58225µs my $thisformat = $format;
583
584210µs if ($format eq 'detect') {
585 # detect the format
586299µs263µs my $stat_errn = stat($location) ? 0 : 0+$!;
# spent 63µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 32µs/call
587242µs27µs if ($stat_errn == ENOENT) {
# spent 7µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftdir, avg 3µs/call
588 $thisformat = 'file'; # actually, no file - to be detected later
589 }
590 elsif ($stat_errn != 0) {
591 warn "archive-iterator: no access to $location: $!";
592 $thisformat = 'file';
593 }
594 elsif (-d _) {
595 # it's a directory
59627µs $thisformat = 'dir';
597 }
598 elsif ($location =~ /\.mbox/i) {
599 # filename indicates mbox
600 $thisformat = 'mbox';
601 }
602 else {
603 $thisformat = 'file';
604 }
605 }
606
607211µs if ($thisformat eq 'dir') {
60827µs $method = \&_scan_directory;
609 }
610 elsif ($thisformat eq 'mbox') {
611 $method = \&_scan_mailbox;
612 }
613 elsif ($thisformat eq 'file') {
614 $method = \&_scan_file;
615 }
616 elsif ($thisformat eq 'mbx') {
617 $method = \&_scan_mbx;
618 }
619 else {
620 warn "archive-iterator: format $thisformat (from $format) unknown!";
621 next;
622 }
623
624 # call the appropriate method
625433µs290.3ms &{$method}($self, $class, $location, $bkfunc);
# spent 90.3ms making 2 calls to Mail::SpamAssassin::ArchiveIterator::_scan_directory, avg 45.2ms/call
626 }
627 }
628}
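# The target parsing above can be sketched in isolation. This is a minimal
# illustration, not part of ArchiveIterator: parse_target is a hypothetical
# helper, and the sample target string is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "class:format:location" target the way the
# loop above does.
sub parse_target {
  my ($target) = @_;
  # a limit of 3 keeps any further colons inside the location
  my ($class, $format, $rawloc) = split(/:/, $target, 3);
  return unless defined $format && defined $rawloc;
  # only the first letter of the class is kept; an empty class means ham
  $class = substr($class, 0, 1) || 'h';
  return ($class, $format, $rawloc);
}

my @t = parse_target('spam:dir:/var/mail/c:drive');
print join('|', @t), "\n";
```

# Because of the split limit, a location that itself contains ":" survives
# intact as the third field.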
629
630
# spent 39.1ms (12.0+27.1) within Mail::SpamAssassin::ArchiveIterator::_mail_open which was called 239 times, avg 164µs/call: # 239 times (12.0ms+27.1ms) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 354, avg 164µs/call
sub _mail_open {
631239673µs my ($file) = @_;
632
633 # bug 5288: the "magic" version of open will strip leading and trailing
634 # whitespace from the expression. switch to the three-argument version
635 # of open which does not strip whitespace. see "perldoc -f open" and
636 # "perldoc perlipc" for more information.
637
638 # Assume that the file by default is just a plain file
6392391.07ms my @expr = ( $file );
640239681µs my $mode = '<';
641
642 # Handle different types of compressed files
6432394.88ms4782.00ms if ($file =~ /\.gz$/) {
# spent 2.00ms making 478 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 4µs/call
644 $mode = '-|';
645 unshift @expr, 'gunzip', '-cd';
646 }
647 elsif ($file =~ /\.bz2$/) {
648 $mode = '-|';
649 unshift @expr, 'bzip2', '-cd';
650 }
651
652 # Go ahead and try to open the file
65323926.6ms23924.0ms if (!open (INPUT, $mode, @expr)) {
# spent 24.0ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:open, avg 101µs/call
654 warn "archive-iterator: unable to open $file: $!\n";
655 return 0;
656 }
657
658 # bug 5249: mail could have 8-bit data, need this on some platforms
6592393.00ms2391.04ms binmode INPUT or die "cannot set input file to binmode: $!";
# spent 1.04ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:binmode, avg 4µs/call
660
6612392.80ms return 1;
662}
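# A minimal sketch of how _mail_open picks its open() arguments, assuming
# only the suffix checks shown above. open_spec is a hypothetical helper; it
# builds the mode and argument list but does not open anything.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the open() mode and argument list for a path,
# mirroring the suffix checks in _mail_open.
sub open_spec {
  my ($file) = @_;
  my @expr = ($file);          # plain file by default
  my $mode = '<';
  if ($file =~ /\.gz$/) {
    $mode = '-|';              # read from a decompressor's stdout
    unshift @expr, 'gunzip', '-cd';
  }
  elsif ($file =~ /\.bz2$/) {
    $mode = '-|';
    unshift @expr, 'bzip2', '-cd';
  }
  return ($mode, @expr);
}

my ($mode, @expr) = open_spec(' msg with spaces ');
print "$mode [@expr]\n";
```

# Passing the mode and the path as separate arguments is what keeps leading
# and trailing whitespace in the file name (bug 5288).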
663
664
# spent 175µs (141+34) within Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts which was called 2 times, avg 87µs/call: # 2 times (141µs+34µs) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 574, avg 87µs/call
sub _set_default_message_selection_opts {
66524µs my ($self) = @_;
666
66728µs $self->{opt_scanprob} = 1.0 unless (defined $self->{opt_scanprob});
66826µs $self->{opt_want_date} = 1 unless (defined $self->{opt_want_date});
66926µs $self->{opt_cache} = 0 unless (defined $self->{opt_cache});
670 # Changed regex to include boundaries for CommuniGate Pro versions (5.2.x and later), per bug 6413
67127µs $self->{opt_from_regex} = '^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)' unless (defined $self->{opt_from_regex});
672
673 # strip leading and trailing / from the opt_from_regex option
674233µs29µs $self->{opt_from_regex} =~ s/^\///;
# spent 9µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 5µs/call
675223µs26µs $self->{opt_from_regex} =~ s/\/$//;
# spent 6µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 3µs/call
676
677266µs219µs dbg("archive-iterator: _set_default_message_selection_opts After: Scanprob[$self->{opt_scanprob}], want_date[$self->{opt_want_date}], cache[$self->{opt_cache}], from_regex[$self->{opt_from_regex}]");
# spent 19µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call
678
679}
680
681############################################################################
682
683sub _message_is_useful_by_date {
684 my ($self, $date) = @_;
685
686 if (!$self->{opt_after} && !$self->{opt_before}) {
687 # Not using the feature
688 return 1;
689 }
690
691 return 0 unless $date; # undef or 0 date = unusable
692
693 if (!$self->{opt_before}) {
694 # Just care about after
695 return $date > $self->{opt_after};
696 }
697 else {
698 return (($date < $self->{opt_before}) && ($date > $self->{opt_after}));
699 }
700}
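# The date window logic of _message_is_useful_by_date, sketched as a
# standalone function. useful_by_date is a hypothetical name, the bounds are
# passed as arguments instead of read from $self, and a missing "after"
# bound is treated as 0 here so the sketch runs cleanly under warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: is $date strictly inside the (after, before) window?
sub useful_by_date {
  my ($date, $after, $before) = @_;
  return 1 if !$after && !$before;   # feature not in use
  return 0 unless $date;             # undef or 0 date is unusable
  $after ||= 0;                      # assumption: no lower bound means 0
  return ($date > $after ? 1 : 0) if !$before;
  return ($date < $before && $date > $after) ? 1 : 0;
}

print useful_by_date(150, 100, 200), "\n";
print useful_by_date(250, 100, 200), "\n";
```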
701
702# additional check, based solely on a file's mod timestamp. we cannot
703# make assumptions about --before, since the file may have been "touch"ed
704# since the last message was appended; but we can assume that too-old
705# files cannot contain messages newer than their modtime.
706
# spent 3.49ms within Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime which was called 239 times, avg 15µs/call: # 239 times (3.49ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 834, avg 15µs/call
sub _message_is_useful_by_file_modtime {
707239833µs my ($self, $date) = @_;
708
709 # better safe than sorry, if date is undef; let other stuff catch errors
710239406µs return 1 unless $date;
711
712239482µs if ($self->{opt_after}) {
713 return ($date > $self->{opt_after});
714 }
715 else {
7162393.26ms return 1; # --after not in use
717 }
718}
719
720sub _scanprob_says_scan {
721 my ($self) = @_;
722 if (defined $self->{opt_scanprob} && $self->{opt_scanprob} < 1.0) {
723 if ( int( rand( 1 / $self->{opt_scanprob} ) ) != 0 ) {
724 return 0;
725 }
726 }
727 return 1;
728}
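# Why int(rand(1 / p)) != 0 skips a message: rand(1/p) is uniform on
# [0, 1/p), so its integer part is 0 with probability p. A small simulation
# under that reading, with says_scan as a hypothetical stand-in that takes
# the probability directly:

```perl
use strict;
use warnings;

# Same test as _scanprob_says_scan, with the probability passed in.
sub says_scan {
  my ($p) = @_;
  return 1 unless defined $p && $p < 1.0;
  return int(rand(1 / $p)) == 0 ? 1 : 0;
}

srand(12345);                 # fixed seed so a run is repeatable
my $trials = 100_000;
my $kept = grep { says_scan(0.25) } 1 .. $trials;
printf "kept %.3f of messages\n", $kept / $trials;
```

# The printed fraction should land close to 0.25; the exact digits depend
# on the platform's random number generator.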
729
730############################################################################
731
732# 0 850852128 atime
733# 1 h class
734# 2 m format
735# 3 ./ham/goodmsgs.0 path
736
737# put the date in first, big-endian packed format
738# this format lets cmp easily sort by date, then class, format, and path.
739
# spent 5.41ms (2.51+2.90) within Mail::SpamAssassin::ArchiveIterator::_index_pack which was called 239 times, avg 23µs/call: # 239 times (2.51ms+2.90ms) by Mail::SpamAssassin::ArchiveIterator::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm:305] at line 304, avg 23µs/call
sub _index_pack {
7402396.07ms2392.90ms return pack("NAAA*", @_);
# spent 2.90ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:pack, avg 12µs/call
741}
742
743
# spent 6.61ms (4.25+2.36) within Mail::SpamAssassin::ArchiveIterator::_index_unpack which was called 239 times, avg 28µs/call: # 239 times (4.25ms+2.36ms) by Mail::SpamAssassin::ArchiveIterator::_run_message at line 338, avg 28µs/call
sub _index_unpack {
7442397.06ms2392.36ms return unpack("NAAA*", $_[0]);
# spent 2.36ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:unpack, avg 10µs/call
745}
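# A short sketch of why the "NAAA*" layout sorts by date under a plain
# string comparison: N stores the date as four big-endian bytes, so byte
# order and numeric order agree. The records below are invented.

```perl
use strict;
use warnings;

# Same template as _index_pack/_index_unpack: 32-bit big-endian date,
# one-character class, one-character format, then the path.
my @records = (
  pack("NAAA*", 100, 'h', 'f', './ham/b'),
  pack("NAAA*",   9, 's', 'f', './spam/a'),
  pack("NAAA*", 100, 'h', 'f', './ham/a'),
);

# plain cmp orders by date first, then class, format and path
my @sorted = sort { $a cmp $b } @records;
for my $rec (@sorted) {
  my ($date, $class, $format, $path) = unpack("NAAA*", $rec);
  print "$date $class $format $path\n";
}
```

# The record dated 9 sorts ahead of both records dated 100, which comparing
# the dates as decimal strings would get wrong ("100" lt "9").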
746
747############################################################################
748
749
# spent 90.3ms (14.1+76.2) within Mail::SpamAssassin::ArchiveIterator::_scan_directory which was called 2 times, avg 45.2ms/call: # 2 times (14.1ms+76.2ms) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 625, avg 45.2ms/call
sub _scan_directory {
750220µs my ($self, $class, $folder, $bkfunc) = @_;
751
75224µs my(@files,@subdirs);
753
7542183µs4123µs if (-d "$folder/new" && -d "$folder/cur" && -d "$folder/tmp") {
# spent 81µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftdir, avg 40µs/call # spent 42µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 21µs/call
755 # Maildir format: bug 3003
756 for my $sub ("new", "cur") {
757 opendir (DIR, "$folder/$sub")
758 or die "Can't open '$folder/$sub' dir: $!\n";
759 # Don't learn from messages marked as deleted
760 # Or files starting with a leading dot
761 push @files, map { "$sub/$_" } grep { !/^\.|:2,.*T/ } readdir(DIR);
762 closedir(DIR) or die "error closing directory $folder: $!";
763 }
764 }
765 elsif (-f "$folder/cyrus.header") {
766 opendir(DIR, $folder)
767 or die "archive-iterator: can't open '$folder' dir: $!\n";
768
769 # cyrus metadata: http://unix.lsa.umich.edu/docs/imap/imap-lsa-srv_3.html
770 @files = grep { $_ ne '.' && $_ ne '..' &&
771 /^\S+$/ && !/^cyrus\.(?:index|header|cache|seen)/ }
772 readdir(DIR);
773 closedir(DIR) or die "error closing directory $folder: $!";
774 }
775 else {
7762126µs294µs opendir(DIR, $folder)
# spent 94µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:open_dir, avg 47µs/call
777 or die "archive-iterator: can't open '$folder' dir: $!\n";
778
779 # ignore ,234 (deleted or refiled messages) and MH metadata dotfiles
7802455.46ms2451.22ms @files = grep { !/^[,.]/ } readdir(DIR);
# spent 664µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:readdir, avg 332µs/call # spent 559µs making 243 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 2µs/call
781242µs220µs closedir(DIR) or die "error closing directory $folder: $!";
# spent 20µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:closedir, avg 10µs/call
782 }
783
7842926µs $_ = "$folder/$_" for @files;
785
78624µs if (!@files) {
787 # this is not a problem; no need to warn about it
788 # warn "archive-iterator: readdir found no mail in '$folder' directory\n";
789111µs return;
790 }
791
792110µs114µs $self->_create_cache('dir', $folder);
793
79414µs foreach my $file (@files) {
79523912.1ms23910.1ms my $stat_errn = stat($file) ? 0 : 0+$!;
# spent 10.1ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 42µs/call
7962394.18ms239414µs if ($stat_errn == ENOENT) {
# spent 414µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 2µs/call
797 # no longer there?
798 }
799 elsif ($stat_errn != 0) {
800 warn "archive-iterator: no access to $file: $!";
801 }
802 elsif (-f _ || -c _ || -p _) {
8032391.63ms23964.3ms $self->_scan_file($class, $file, $bkfunc);
# spent 64.3ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_scan_file, avg 269µs/call
804 }
805 elsif (-d _) {
806 push(@subdirs, $file);
807 }
808 else {
809 warn "archive-iterator: $file is not a plain file or directory: $!";
810 }
811 }
8121150µs undef @files; # release storage
813
814 # recurse into directories
81515µs foreach my $dir (@subdirs) {
816 $self->_scan_directory($class, $dir, $bkfunc);
817 }
818
819120µs if (defined $AICache) {
820 $AICache = $AICache->finish();
821 }
822}
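# The Maildir branch's file filter, tried on a few made-up names. This is
# only an illustration of the grep expression at line 761; the file names
# are assumptions, not taken from the profiled run.

```perl
use strict;
use warnings;

# Skip dotfiles and names whose ":2," flag section contains T (trashed).
my @names = ('.', '..', '.hidden',
             '1509.M1.host:2,S',     # seen
             '1510.M2.host:2,ST',    # seen and trashed
             '1511.M3.host');        # no flags yet
my @keep = grep { !/^\.|:2,.*T/ } @names;
print "$_\n" for @keep;
```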
823
824
# spent 64.3ms (42.4+21.9) within Mail::SpamAssassin::ArchiveIterator::_scan_file which was called 239 times, avg 269µs/call: # 239 times (42.4ms+21.9ms) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 803, avg 269µs/call
sub _scan_file {
8252392.77ms my ($self, $class, $mail, $bkfunc) = @_;
826
8272391.38ms2391.98ms $self->_bump_scan_progress();
# spent 1.98ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress, avg 8µs/call
828
829 # only perform these stat() operations if we're not using a cache;
830 # it's faster to perform lookups in the cache, and more accurate
831239795µs if (!defined $AICache) {
83223920.8ms2395.55ms my @s = stat($mail);
# spent 5.55ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 23µs/call
833239426µs @s or warn "archive-iterator: no access to $mail: $!";
8342392.31ms2393.49ms return unless $self->_message_is_useful_by_file_modtime($s[9]);
# spent 3.49ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime, avg 15µs/call
835 }
836
837239456µs my $date = AI_TIME_UNKNOWN;
838239785µs if ($self->{determine_receive_date}) {
839 unless (defined $AICache and $date = $AICache->check($mail)) {
840 # silently skip directories/non-files; some folders may
841 # contain extraneous dirs etc.
842 my $stat_errn = stat($mail) ? 0 : 0+$!;
843 if ($stat_errn != 0) {
844 warn "archive-iterator: no access to $mail: $!";
845 return;
846 }
847 elsif (!-f _) {
848 return;
849 }
850
851 my $header = '';
852 if (!_mail_open($mail)) {
853 $self->{access_problem} = 1;
854 return;
855 }
856 for ($!=0; <INPUT>; $!=0) {
857 last if /^\015?$/s;
858 $header .= $_;
859 }
860 defined $_ || $!==0 or
861 $!==EBADF ? dbg("archive-iterator: error reading: $!")
862 : die "error reading: $!";
863 close INPUT or die "error closing input file: $!";
864
865 return if ($self->{opt_skip_empty_messages} && $header eq '');
866
867 $date = Mail::SpamAssassin::Util::receive_date($header);
868 if (defined $AICache) {
869 $AICache->update($mail, $date);
870 }
871 }
872
873 return if !$self->_message_is_useful_by_date($date);
874 return if !$self->_scanprob_says_scan();
875 }
876 else {
877239390µs return if ($self->{opt_skip_empty_messages} && (-z $mail));
878 }
879
8804782.43ms23910.9ms &{$bkfunc}($self, $date, $class, 'f', $mail);
881
8822392.49ms return;
883}
884
885sub _scan_mailbox {
886 my ($self, $class, $folder, $bkfunc) = @_;
887 my @files;
888
889 my $stat_errn = stat($folder) ? 0 : 0+$!;
890 if ($stat_errn == ENOENT) {
891 # no longer there?
892 }
893 elsif ($stat_errn != 0) {
894 warn "archive-iterator: no access to $folder: $!";
895 }
896 elsif (-f _) {
897 push(@files, $folder);
898 }
899 elsif (-d _) {
900 # passed a directory of mboxes
901 $folder =~ s/\/\s*$//; #Remove trailing slash, if there
902 if (!opendir(DIR, $folder)) {
903 warn "archive-iterator: can't open '$folder' dir: $!\n";
904 $self->{access_problem} = 1;
905 return;
906 }
907 while ($_ = readdir(DIR)) {
908 next if $_ eq '.' || $_ eq '..' || !/^[^\.]\S*$/;
909 # hmmm, ignores folders with spaces in the name???
910 $stat_errn = stat("$folder/$_") ? 0 : 0+$!;
911 if ($stat_errn == ENOENT) {
912 # no longer there?
913 }
914 elsif ($stat_errn != 0) {
915 warn "archive-iterator: no access to $folder/$_: $!";
916 }
917 elsif (-f _) {
918 push(@files, "$folder/$_");
919 }
920 }
921 closedir(DIR) or die "error closing directory $folder: $!";
922 }
923 else {
924 warn "archive-iterator: $folder is not a plain file or directory: $!";
925 }
926
927 foreach my $file (@files) {
928 $self->_bump_scan_progress();
929 if ($file =~ /\.(?:gz|bz2)$/) {
930 warn "archive-iterator: compressed mbox folders are not supported at this time\n";
931 $self->{access_problem} = 1;
932 next;
933 }
934
935 my @s = stat($file);
936 @s or warn "archive-iterator: no access to $file: $!";
937 next unless $self->_message_is_useful_by_file_modtime($s[9]);
938
939 my $info = {};
940 my $count;
941
942 $self->_create_cache('mbox', $file);
943
944 if ($self->{opt_cache}) {
945 if ($count = $AICache->count()) {
946 $info = $AICache->check();
947 }
948 }
949
950 unless ($count) {
951 if (!_mail_open($file)) {
952 $self->{access_problem} = 1;
953 next;
954 }
955
956 my $start = 0; # start of a message
957 my $where = 0; # current byte offset
958 my $first = ''; # first line of message
959 my $header = ''; # header text
960 my $in_header = 0; # are we in a header?
961 while (!eof INPUT) {
962 my $offset = $start; # byte offset of this message
963 my $header = $first; # remember first line
964 for ($!=0; <INPUT>; $!=0) {
965 if ($in_header) {
966 if (/^\015?$/s) {
967 $in_header = 0;
968 }
969 else {
970 $header .= $_;
971 }
972 }
973 # changed regex to use the opt_from_regex option, per bug 6703
974 if (substr($_,0,5) eq "From " && /$self->{opt_from_regex}/o) {
975 $in_header = 1;
976 $first = $_;
977 $start = $where;
978 $where = tell INPUT;
979 $where >= 0 or die "cannot obtain file position: $!";
980 last;
981 }
982 $where = tell INPUT;
983 $where >= 0 or die "cannot obtain file position: $!";
984 }
985 defined $_ || $!==0 or
986 $!==EBADF ? dbg("archive-iterator: error reading: $!")
987 : die "error reading: $!";
988 if ($header ne '') {
989 # next if ($self->{opt_skip_empty_messages} && $header eq '');
990 $self->_bump_scan_progress();
991 $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header);
992 }
993 }
994 close INPUT or die "error closing input file: $!";
995 }
996
997 while(my($k,$v) = each %{$info}) {
998 if (defined $AICache && !$count) {
999 $AICache->update($k, $v);
1000 }
1001
1002 if ($self->{determine_receive_date}) {
1003 next if !$self->_message_is_useful_by_date($v);
1004 }
1005 next if !$self->_scanprob_says_scan();
1006
1007 &{$bkfunc}($self, $v, $class, 'm', "$file.$k");
1008 }
1009
1010 if (defined $AICache) {
1011 $AICache = $AICache->finish();
1012 }
1013 }
1014}
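# How the mbox loop recognises a message separator, sketched on its own.
# The regex is the default opt_from_regex copied from line 671; is_from_line
# is a hypothetical helper and both sample lines are invented.

```perl
use strict;
use warnings;

# default opt_from_regex, copied from _set_default_message_selection_opts
my $from_re = '^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)';

# Hypothetical helper: same two-step test as the mbox loop, a cheap
# substr() prefix check before the regex.
sub is_from_line {
  my ($line) = @_;
  return (substr($line, 0, 5) eq 'From ' && $line =~ /$from_re/) ? 1 : 0;
}

print is_from_line("From user\@example.com Sat Jan  3 01:05:34 1996\n"), "\n";
print is_from_line("From the desk of Bob\n"), "\n";
```

# The first line has the sender and timestamp an mbox separator carries; the
# second merely starts with "From " and is treated as body text.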
1015
1016sub _scan_mbx {
1017 my ($self, $class, $folder, $bkfunc) = @_;
1018 my (@files, $fp);
1019
1020 my $stat_errn = stat($folder) ? 0 : 0+$!;
1021 if ($stat_errn == ENOENT) {
1022 # no longer there?
1023 }
1024 elsif ($stat_errn != 0) {
1025 warn "archive-iterator: no access to $folder: $!";
1026 }
1027 elsif (-f _) {
1028 push(@files, $folder);
1029 }
1030 elsif (-d _) {
1031 # got passed a directory full of mbx folders.
1032 $folder =~ s/\/\s*$//; # remove trailing slash, if there is one
1033 if (!opendir(DIR, $folder)) {
1034 warn "archive-iterator: can't open '$folder' dir: $!\n";
1035 $self->{access_problem} = 1;
1036 return;
1037 }
1038 while ($_ = readdir(DIR)) {
1039 next if $_ eq '.' || $_ eq '..' || !/^[^\.]\S*$/;
1040 # hmmm, ignores folders with spaces in the name???
1041 $stat_errn = stat("$folder/$_") ? 0 : 0+$!;
1042 if ($stat_errn == ENOENT) {
1043 # no longer there?
1044 }
1045 elsif ($stat_errn != 0) {
1046 warn "archive-iterator: no access to $folder/$_: $!";
1047 }
1048 elsif (-f _) {
1049 push(@files, "$folder/$_");
1050 }
1051 }
1052 closedir(DIR) or die "error closing directory $folder: $!";
1053 }
1054 else {
1055 warn "archive-iterator: $folder is not a plain file or directory: $!";
1056 }
1057
1058 foreach my $file (@files) {
1059 $self->_bump_scan_progress();
1060
1061 if ($file =~ /\.(?:gz|bz2)$/) {
1062 warn "archive-iterator: compressed mbx folders are not supported at this time\n";
1063 $self->{access_problem} = 1;
1064 next;
1065 }
1066
1067 my @s = stat($file);
1068 @s or warn "archive-iterator: no access to $file: $!";
1069 next unless $self->_message_is_useful_by_file_modtime($s[9]);
1070
1071 my $info = {};
1072 my $count;
1073
1074 $self->_create_cache('mbx', $file);
1075
1076 if ($self->{opt_cache}) {
1077 if ($count = $AICache->count()) {
1078 $info = $AICache->check();
1079 }
1080 }
1081
1082 unless ($count) {
1083 if (!_mail_open($file)) {
1084 $self->{access_problem} = 1;
1085 next;
1086 }
1087
1088 # check the mailbox is in mbx format
1089 $! = 0; $fp = <INPUT>;
1090 defined $fp || $!==0 or
1091 $!==EBADF ? dbg("archive-iterator: error reading: $!")
1092 : die "error reading: $!";
1093 if (!defined $fp) {
1094 die "archive-iterator: error: mailbox not in mbx format - empty!\n";
1095 } elsif ($fp !~ /\*mbx\*/) {
1096 die "archive-iterator: error: mailbox not in mbx format!\n";
1097 }
1098
1099 # skip mbx headers to the first email...
1100 seek(INPUT,2048,0) or die "cannot reposition file to 2048: $!";
1101 my $sep = MBX_SEPARATOR;
1102
1103 for ($!=0; <INPUT>; $!=0) {
1104 if ($_ =~ /$sep/) {
1105 my $offset = tell INPUT;
1106 $offset >= 0 or die "cannot obtain file position: $!";
1107 my $size = $2;
1108
1109 # gather up the headers...
1110 my $header = '';
1111 for ($!=0; <INPUT>; $!=0) {
1112 last if (/^\015?$/s);
1113 $header .= $_;
1114 }
1115 defined $_ || $!==0 or
1116 $!==EBADF ? dbg("archive-iterator: error reading: $!")
1117 : die "error reading: $!";
1118 if (!($self->{opt_skip_empty_messages} && $header eq '')) {
1119 $self->_bump_scan_progress();
1120 $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header);
1121 }
1122
1123 # go onto the next message
1124 seek(INPUT, $offset + $size, 0)
1125 or die "cannot reposition file to $offset + $size: $!";
1126 }
1127 else {
1128 die "archive-iterator: error: failure to read message body!\n";
1129 }
1130 }
1131 defined $_ || $!==0 or
1132 $!==EBADF ? dbg("archive-iterator: error reading: $!")
1133 : die "error reading: $!";
1134 close INPUT or die "error closing input file: $!";
1135 }
1136
1137 while(my($k,$v) = each %{$info}) {
1138 if (defined $AICache && !$count) {
1139 $AICache->update($k, $v);
1140 }
1141
1142 if ($self->{determine_receive_date}) {
1143 next if !$self->_message_is_useful_by_date($v);
1144 }
1145 next if !$self->_scanprob_says_scan();
1146
1147 &{$bkfunc}($self, $v, $class, 'b', "$file.$k");
1148 }
1149
1150 if (defined $AICache) {
1151 $AICache = $AICache->finish();
1152 }
1153 }
1154}
1155
1156############################################################################
1157
1158
# spent 1.98ms within Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress which was called 239 times, avg 8µs/call: # 239 times (1.98ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 827, avg 8µs/call
sub _bump_scan_progress {
1159239416µs my ($self) = @_;
11602392.99ms if (exists $self->{scan_progress_sub}) {
1161 return unless ($self->{scan_progress_counter}++ % 50 == 0);
1162 $self->{scan_progress_sub}->();
1163 }
1164}
1165
1166############################################################################
1167
1168{
116912µs my $home;
1170
1171
# spent 266µs (101+165) within Mail::SpamAssassin::ArchiveIterator::_fix_globs which was called 2 times, avg 133µs/call: # 2 times (101µs+165µs) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 576, avg 133µs/call
sub _fix_globs {
117229µs my ($self, $path) = @_;
1173
117426µs unless (defined $home) {
117516µs $home = $ENV{'HOME'};
1176
1177 # No $HOME set? Try to find it, portably.
117812µs unless ($home) {
1179 if (!Mail::SpamAssassin::Util::am_running_on_windows()) {
1180 $home = (Mail::SpamAssassin::Util::portable_getpwuid($<))[7];
1181 } else {
1182 my $vol = $ENV{'HOMEDRIVE'} || 'C:';
1183 my $dir = $ENV{'HOMEPATH'} || '\\';
1184 $home = File::Spec->catpath($vol, $dir, '');
1185 }
1186
1187 # Fall back to no replacement at all.
1188 $home ||= '~';
1189 }
1190 }
1191221µs26µs $path =~ s,^~/,${home}/,;
# spent 6µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 3µs/call
1192
1193 # protect/escape spaces: ./Mail/My Letters => ./Mail/My\ Letters
1194231µs210µs $path =~ s/(?<!\\)(\s)/\\$1/g;
# spent 10µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 5µs/call
1195
1196 # return csh-style globs: ./corpus/*.mbox => er, you know what it does ;)
11972196µs2150µs return glob($path);
# spent 150µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:glob, avg 75µs/call
1198 }
1199}
1200
120114µs
# spent 14µs within Mail::SpamAssassin::ArchiveIterator::_create_cache which was called: # once (14µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 792
sub _create_cache {
120215µs my ($self, $type, $path) = @_;
1203
1204112µs if ($self->{opt_cache}) {
1205 $AICache = Mail::SpamAssassin::AICache->new({
1206 'type' => $type,
1207 'prefix' => $self->{opt_cachedir},
1208 'path' => $path,
1209 });
1210 }
1211}
1212
1213############################################################################
1214
1215114µs1;
1216
1217__END__
 
# spent 1.04ms within Mail::SpamAssassin::ArchiveIterator::CORE:binmode which was called 239 times, avg 4µs/call: # 239 times (1.04ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 659, avg 4µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:binmode; # opcode
# spent 2.39ms within Mail::SpamAssassin::ArchiveIterator::CORE:close which was called 239 times, avg 10µs/call: # 234 times (2.36ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 408, avg 10µs/call # 5 times (38µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 383, avg 8µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:close; # opcode
# spent 20µs within Mail::SpamAssassin::ArchiveIterator::CORE:closedir which was called 2 times, avg 10µs/call: # 2 times (20µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 781, avg 10µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:closedir; # opcode
# spent 87µs within Mail::SpamAssassin::ArchiveIterator::CORE:ftdir which was called 4 times, avg 22µs/call: # 2 times (81µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 754, avg 40µs/call # 2 times (7µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 587, avg 3µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:ftdir; # opcode
# spent 1.71ms within Mail::SpamAssassin::ArchiveIterator::CORE:ftfile which was called 719 times, avg 2µs/call: # 239 times (873µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 360, avg 4µs/call # 239 times (414µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 796, avg 2µs/call # 239 times (386µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 374, avg 2µs/call # 2 times (42µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 754, avg 21µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:ftfile; # opcode
# spent 816µs within Mail::SpamAssassin::ArchiveIterator::CORE:ftsize which was called 244 times, avg 3µs/call: # 239 times (808µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 374, avg 3µs/call # 5 times (8µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 381, avg 2µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:ftsize; # opcode
# spent 150µs within Mail::SpamAssassin::ArchiveIterator::CORE:glob which was called 2 times, avg 75µs/call: # 2 times (150µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1197, avg 75µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:glob; # opcode
# spent 39.2ms within Mail::SpamAssassin::ArchiveIterator::CORE:match which was called 14208 times, avg 3µs/call: # 13487 times (36.6ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 406, avg 3µs/call # 478 times (2.00ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 643, avg 4µs/call # 243 times (559µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 780, avg 2µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:match; # opcode
# spent 24.0ms within Mail::SpamAssassin::ArchiveIterator::CORE:open which was called 239 times, avg 101µs/call: # 239 times (24.0ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 653, avg 101µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:open; # opcode
# spent 94µs within Mail::SpamAssassin::ArchiveIterator::CORE:open_dir which was called 2 times, avg 47µs/call: # 2 times (94µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 776, avg 47µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:open_dir; # opcode
# spent 2.90ms within Mail::SpamAssassin::ArchiveIterator::CORE:pack which was called 239 times, avg 12µs/call: # 239 times (2.90ms+0s) by Mail::SpamAssassin::ArchiveIterator::_index_pack at line 740, avg 12µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:pack; # opcode
# spent 19.4ms within Mail::SpamAssassin::ArchiveIterator::CORE:read which was called 690 times, avg 28µs/call: # 456 times (7.41ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 400, avg 16µs/call # 234 times (12.0ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 392, avg 51µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:read; # opcode
# spent 664µs within Mail::SpamAssassin::ArchiveIterator::CORE:readdir which was called 2 times, avg 332µs/call: # 2 times (664µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 780, avg 332µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:readdir; # opcode
# spent 17.0ms within Mail::SpamAssassin::ArchiveIterator::CORE:stat which was called 719 times, avg 24µs/call: # 239 times (10.1ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 795, avg 42µs/call # 239 times (5.55ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 832, avg 23µs/call # 239 times (1.38ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 359, avg 6µs/call # 2 times (63µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 586, avg 32µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:stat; # opcode
# spent 31µs within Mail::SpamAssassin::ArchiveIterator::CORE:subst which was called 8 times, avg 4µs/call: # 2 times (10µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1194, avg 5µs/call # 2 times (9µs+0s) by Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts at line 674, avg 5µs/call # 2 times (6µs+0s) by Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts at line 675, avg 3µs/call # 2 times (6µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1191, avg 3µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:subst; # opcode
# spent 2.36ms within Mail::SpamAssassin::ArchiveIterator::CORE:unpack which was called 239 times, avg 10µs/call: # 239 times (2.36ms+0s) by Mail::SpamAssassin::ArchiveIterator::_index_unpack at line 744, avg 10µs/call
sub Mail::SpamAssassin::ArchiveIterator::CORE:unpack; # opcode