Filename | /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm |
Statements | Executed 30244 statements in 617ms |
Calls | P | F | Exclusive Time | Inclusive Time | Subroutine |
---|---|---|---|---|---|
239 | 1 | 1 | 322ms | 96438s | _run_file | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 72.7ms | 96438s | _run_message | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 42.4ms | 64.3ms | _scan_file | Mail::SpamAssassin::ArchiveIterator::
14208 | 3 | 1 | 39.2ms | 39.2ms | CORE:match (opcode) | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 24.0ms | 24.0ms | CORE:open (opcode) | Mail::SpamAssassin::ArchiveIterator::
690 | 2 | 1 | 19.4ms | 19.4ms | CORE:read (opcode) | Mail::SpamAssassin::ArchiveIterator::
719 | 4 | 1 | 17.0ms | 17.0ms | CORE:stat (opcode) | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 14.1ms | 90.3ms | _scan_directory | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 12.2ms | 96438s | _run | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 12.0ms | 39.1ms | _mail_open | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 5.48ms | 10.9ms | __ANON__[:305] | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 4.25ms | 6.61ms | _index_unpack | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 3.49ms | 3.49ms | _message_is_useful_by_file_modtime | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 2.90ms | 2.90ms | CORE:pack (opcode) | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 2.51ms | 5.41ms | _index_pack | Mail::SpamAssassin::ArchiveIterator::
239 | 2 | 1 | 2.39ms | 2.39ms | CORE:close (opcode) | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 2.36ms | 2.36ms | CORE:unpack (opcode) | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 2.30ms | 3.00ms | BEGIN@31 | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 1.98ms | 1.98ms | _bump_scan_progress | Mail::SpamAssassin::ArchiveIterator::
719 | 4 | 1 | 1.71ms | 1.71ms | CORE:ftfile (opcode) | Mail::SpamAssassin::ArchiveIterator::
239 | 1 | 1 | 1.04ms | 1.04ms | CORE:binmode (opcode) | Mail::SpamAssassin::ArchiveIterator::
244 | 2 | 1 | 816µs | 816µs | CORE:ftsize (opcode) | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 664µs | 664µs | CORE:readdir (opcode) | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 338µs | 91.2ms | _scan_targets | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 150µs | 150µs | CORE:glob (opcode) | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 141µs | 175µs | _set_default_message_selection_opts | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 102µs | 96438s | run | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 101µs | 266µs | _fix_globs | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 94µs | 94µs | CORE:open_dir (opcode) | Mail::SpamAssassin::ArchiveIterator::
4 | 2 | 1 | 87µs | 87µs | CORE:ftdir (opcode) | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 63µs | 212µs | BEGIN@27 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 49µs | 49µs | new | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 48µs | 62µs | BEGIN@22 | Mail::SpamAssassin::ArchiveIterator::
8 | 4 | 1 | 31µs | 31µs | CORE:subst (opcode) | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 27µs | 221µs | BEGIN@34 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 26µs | 592µs | BEGIN@29 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 26µs | 160µs | BEGIN@30 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 24µs | 62µs | BEGIN@23 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 24µs | 202µs | BEGIN@36 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 23µs | 84µs | BEGIN@28 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 23µs | 30µs | BEGIN@24 | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 23µs | 90µs | BEGIN@25 | Mail::SpamAssassin::ArchiveIterator::
2 | 1 | 1 | 20µs | 20µs | CORE:closedir (opcode) | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 15µs | 15µs | set_functions | Mail::SpamAssassin::ArchiveIterator::
1 | 1 | 1 | 14µs | 14µs | _create_cache | Mail::SpamAssassin::ArchiveIterator::
0 | 0 | 0 | 0s | 0s | _message_is_useful_by_date | Mail::SpamAssassin::ArchiveIterator::
0 | 0 | 0 | 0s | 0s | _run_mailbox | Mail::SpamAssassin::ArchiveIterator::
0 | 0 | 0 | 0s | 0s | _run_mbx | Mail::SpamAssassin::ArchiveIterator::
0 | 0 | 0 | 0s | 0s | _scan_mailbox | Mail::SpamAssassin::ArchiveIterator::
0 | 0 | 0 | 0s | 0s | _scan_mbx | Mail::SpamAssassin::ArchiveIterator::
0 | 0 | 0 | 0s | 0s | _scanprob_says_scan | Mail::SpamAssassin::ArchiveIterator::
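Reading the two time columns: exclusive time is what a subroutine spent in its own statements, and inclusive time adds everything it called. The difference is therefore the time attributed to callees. The following is a minimal sketch of that arithmetic in Perl, using three rows copied from the table above. The helper names to_seconds and callee_seconds are hypothetical and not part of NYTProf or SpamAssassin, and the unit suffixes handled (s, ms, µs, ns) are an assumption based on the values shown in this report.

```perl
use strict;
use warnings;
use utf8;

# Multipliers for the duration suffixes seen in this report (assumed set).
my %UNIT = ('s' => 1, 'ms' => 1e-3, 'µs' => 1e-6, 'ns' => 1e-9);

# Convert a duration string such as "42.4ms", "816µs" or "96438s" to seconds.
sub to_seconds {
    my ($text) = @_;
    my ($value, $unit) = $text =~ /^([0-9.]+)(s|ms|µs|ns)$/
        or die "unrecognised duration: $text";
    return $value * $UNIT{$unit};
}

# Time attributed to callees = inclusive time - exclusive time.
sub callee_seconds {
    my ($exclusive, $inclusive) = @_;
    return to_seconds($inclusive) - to_seconds($exclusive);
}

# Rows copied from the summary table: subroutine, exclusive, inclusive.
my @rows = (
    [ '_scan_file',      '42.4ms', '64.3ms' ],
    [ '_mail_open',      '12.0ms', '39.1ms' ],
    [ '_scan_directory', '14.1ms', '90.3ms' ],
);

for my $row (@rows) {
    my ($name, $excl, $incl) = @$row;
    printf "%-16s %.1fms in callees\n",
        $name, callee_seconds($excl, $incl) * 1000;
}
```

Applied to the top rows, the same subtraction shows that nearly all of the 96438s inclusive time of run, _run, _run_message and _run_file lies in code they call rather than in ArchiveIterator itself.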
Line | Statements | Time on line | Calls | Time in subs | Code |
---|---|---|---|---|---|
1 | # iterate over mail archives, calling a function on each message. | ||||
2 | # | ||||
3 | # <@LICENSE> | ||||
4 | # Licensed to the Apache Software Foundation (ASF) under one or more | ||||
5 | # contributor license agreements. See the NOTICE file distributed with | ||||
6 | # this work for additional information regarding copyright ownership. | ||||
7 | # The ASF licenses this file to you under the Apache License, Version 2.0 | ||||
8 | # (the "License"); you may not use this file except in compliance with | ||||
9 | # the License. You may obtain a copy of the License at: | ||||
10 | # | ||||
11 | # http://www.apache.org/licenses/LICENSE-2.0 | ||||
12 | # | ||||
13 | # Unless required by applicable law or agreed to in writing, software | ||||
14 | # distributed under the License is distributed on an "AS IS" BASIS, | ||||
15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
16 | # See the License for the specific language governing permissions and | ||||
17 | # limitations under the License. | ||||
18 | # </@LICENSE> | ||||
19 | |||||
20 | package Mail::SpamAssassin::ArchiveIterator; | ||||
21 | |||||
22 | 2 | 63µs | 2 | 76µs | # spent 62µs (48+14) within Mail::SpamAssassin::ArchiveIterator::BEGIN@22 which was called:
# once (48µs+14µs) by main::BEGIN@66 at line 22 # spent 62µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@22
# spent 14µs making 1 call to strict::import |
23 | 2 | 57µs | 2 | 100µs | # spent 62µs (24+38) within Mail::SpamAssassin::ArchiveIterator::BEGIN@23 which was called:
# once (24µs+38µs) by main::BEGIN@66 at line 23 # spent 62µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@23
# spent 38µs making 1 call to warnings::import |
24 | 2 | 66µs | 2 | 37µs | # spent 30µs (23+7) within Mail::SpamAssassin::ArchiveIterator::BEGIN@24 which was called:
# once (23µs+7µs) by main::BEGIN@66 at line 24 # spent 30µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@24
# spent 7µs making 1 call to bytes::import |
25 | 2 | 70µs | 2 | 156µs | # spent 90µs (23+67) within Mail::SpamAssassin::ArchiveIterator::BEGIN@25 which was called:
# once (23µs+67µs) by main::BEGIN@66 at line 25 # spent 90µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@25
# spent 67µs making 1 call to re::import |
26 | |||||
27 | 2 | 61µs | 2 | 361µs | # spent 212µs (63+149) within Mail::SpamAssassin::ArchiveIterator::BEGIN@27 which was called:
# once (63µs+149µs) by main::BEGIN@66 at line 27 # spent 212µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@27
# spent 149µs making 1 call to Exporter::import |
28 | 2 | 61µs | 2 | 145µs | # spent 84µs (23+61) within Mail::SpamAssassin::ArchiveIterator::BEGIN@28 which was called:
# once (23µs+61µs) by main::BEGIN@66 at line 28 # spent 84µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@28
# spent 61µs making 1 call to Exporter::import |
29 | 2 | 66µs | 2 | 1.16ms | # spent 592µs (26+566) within Mail::SpamAssassin::ArchiveIterator::BEGIN@29 which was called:
# once (26µs+566µs) by main::BEGIN@66 at line 29 # spent 592µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@29
# spent 566µs making 1 call to Exporter::import |
30 | 2 | 61µs | 2 | 294µs | # spent 160µs (26+134) within Mail::SpamAssassin::ArchiveIterator::BEGIN@30 which was called:
# once (26µs+134µs) by main::BEGIN@66 at line 30 # spent 160µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@30
# spent 134µs making 1 call to Exporter::import |
31 | 2 | 373µs | 1 | 3.00ms | # spent 3.00ms (2.30+705µs) within Mail::SpamAssassin::ArchiveIterator::BEGIN@31 which was called:
# once (2.30ms+705µs) by main::BEGIN@66 at line 31 # spent 3.00ms making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@31 |
32 | |||||
33 | # 256 KiB is a big email, unless stated otherwise | ||||
34 | 2 | 79µs | 2 | 415µs | # spent 221µs (27+194) within Mail::SpamAssassin::ArchiveIterator::BEGIN@34 which was called:
# once (27µs+194µs) by main::BEGIN@66 at line 34 # spent 221µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@34
# spent 194µs making 1 call to constant::import |
35 | |||||
36 | 1 | 3µs | # spent 202µs (24+179) within Mail::SpamAssassin::ArchiveIterator::BEGIN@36 which was called:
# once (24µs+179µs) by main::BEGIN@66 at line 41 | ||
37 | $MESSAGES | ||||
38 | $AICache | ||||
39 | %class_opts | ||||
40 | @ISA | ||||
41 | 1 | 10.3ms | 2 | 382µs | }; # spent 202µs making 1 call to Mail::SpamAssassin::ArchiveIterator::BEGIN@36
# spent 179µs making 1 call to vars::import |
42 | |||||
43 | 1 | 14µs | @ISA = qw(); | ||
44 | |||||
45 | =head1 NAME | ||||
46 | |||||
47 | Mail::SpamAssassin::ArchiveIterator - find and process messages one at a time | ||||
48 | |||||
49 | =head1 SYNOPSIS | ||||
50 | |||||
51 | my $iter = new Mail::SpamAssassin::ArchiveIterator( | ||||
52 | { | ||||
53 | 'opt_max_size' => 256 * 1024, # 0 implies no limit | ||||
54 | 'opt_cache' => 1, | ||||
55 | } | ||||
56 | ); | ||||
57 | |||||
58 | $iter->set_functions( \&wanted, sub { } ); | ||||
59 | |||||
60 | eval { $iter->run(@ARGV); }; | ||||
61 | |||||
62 | sub wanted { | ||||
63 | my($class, $filename, $recv_date, $msg_array) = @_; | ||||
64 | |||||
65 | |||||
66 | ... | ||||
67 | } | ||||
68 | |||||
69 | =head1 DESCRIPTION | ||||
70 | |||||
71 | The Mail::SpamAssassin::ArchiveIterator module will go through a set | ||||
72 | of mbox files, mbx files, and directories (with a single message per | ||||
73 | file) and generate a list of messages. It will then call the C<wanted_sub> | ||||
74 | and C<result_sub> functions appropriately per message. | ||||
75 | |||||
76 | =head1 METHODS | ||||
77 | |||||
78 | =over 4 | ||||
79 | |||||
80 | =cut | ||||
81 | |||||
82 | |||||
83 | ########################################################################### | ||||
84 | |||||
85 | =item $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] ) | ||||
86 | |||||
87 | Constructs a new C<Mail::SpamAssassin::ArchiveIterator> object. You may | ||||
88 | pass the following attribute-value pairs to the constructor. The pairs are | ||||
89 | optional unless otherwise noted. | ||||
90 | |||||
91 | =over 4 | ||||
92 | |||||
93 | =item opt_max_size | ||||
94 | |||||
95 | A value of option I<opt_max_size> determines a limit (number of bytes) | ||||
96 | beyond which a message is considered large and is skipped by ArchiveIterator. | ||||
97 | |||||
98 | A value of 0 implies no size limit; all messages are examined. An undefined | ||||
99 | value implies a default limit of 256 KiB. | ||||
100 | |||||
101 | =item opt_all | ||||
102 | |||||
103 | Setting this option to true implicitly sets I<opt_max_size> to 0, i.e. | ||||
104 | no limit on message size; all messages are processed by ArchiveIterator. | ||||
105 | For compatibility with SpamAssassin versions older than 3.4.0 which | ||||
106 | lacked option I<opt_max_size>. | ||||
107 | |||||
108 | =item opt_scanprob | ||||
109 | |||||
110 | Randomly select messages to scan, with a probability of N, where N ranges | ||||
111 | from 0.0 (no messages scanned) to 1.0 (all messages scanned). Default | ||||
112 | is 1.0. | ||||
113 | |||||
114 | This setting can be specified separately for each target. | ||||
115 | |||||
116 | =item opt_before | ||||
117 | |||||
118 | Only use messages which are received before the given time_t value. | ||||
119 | Negative values are an offset from the current time, e.g. -86400 = | ||||
120 | last 24 hours; or as parsed by Time::ParseDate (e.g. '-6 months') | ||||
121 | |||||
122 | This setting can be specified separately for each target. | ||||
123 | |||||
124 | =item opt_after | ||||
125 | |||||
126 | Same as opt_before, except the messages are only used if after the given | ||||
127 | time_t value. | ||||
128 | |||||
129 | This setting can be specified separately for each target. | ||||
130 | |||||
131 | =item opt_want_date | ||||
132 | |||||
133 | Set to 1 (default) if you want the received date to be filled in | ||||
134 | in the C<wanted_sub> callback below. Set this to 0 to avoid this; | ||||
135 | it's a good idea to set this to 0 if you can, as it imposes a performance | ||||
136 | hit. | ||||
137 | |||||
138 | =item opt_skip_empty_messages | ||||
139 | |||||
140 | Set to 1 if you want to skip corrupt, 0-byte messages. The default is 0. | ||||
141 | |||||
142 | =item opt_cache | ||||
143 | |||||
144 | Set to 0 (default) if you don't want to use cached information to help speed | ||||
145 | up ArchiveIterator. Set to 1 to enable. This setting requires C<opt_cachedir> | ||||
146 | also be set. | ||||
147 | |||||
148 | =item opt_cachedir | ||||
149 | |||||
150 | Set to the path of a directory where you wish to store cached information for | ||||
151 | C<opt_cache>, if you don't want to mix them with the input files (as is the | ||||
152 | default). The directory must be both readable and writable. | ||||
153 | |||||
154 | =item wanted_sub | ||||
155 | |||||
156 | Reference to a subroutine which will process message data. Usually | ||||
157 | set via set_functions(). The routine will be passed 5 values: class | ||||
158 | (scalar), filename (scalar), received date (scalar), message content | ||||
159 | (array reference, one message line per element), and the message format | ||||
160 | key ('f' for file, 'm' for mbox, 'b' for mbx). | ||||
161 | |||||
162 | Note that if C<opt_want_date> is set to 0, the received date scalar will be | ||||
163 | undefined. | ||||
164 | |||||
165 | =item result_sub | ||||
166 | |||||
167 | Reference to a subroutine which will process the results of the wanted_sub | ||||
168 | for each message processed. Usually set via set_functions(). | ||||
169 | The routine will be passed 3 values: class (scalar), result (scalar, returned | ||||
170 | from wanted_sub), and received date (scalar). | ||||
171 | |||||
172 | Note that if C<opt_want_date> is set to 0, the received date scalar will be | ||||
173 | undefined. | ||||
174 | |||||
175 | =item scan_progress_sub | ||||
176 | |||||
177 | Reference to a subroutine which will be called intermittently during | ||||
178 | the 'scan' phase of the mass-check. No guarantees are made as to | ||||
179 | how frequently this may happen, mind you. | ||||
180 | |||||
181 | =item opt_from_regex | ||||
182 | |||||
183 | This setting allows for flexibility in specifying the mbox format From separator. | ||||
184 | |||||
185 | It defaults to the regular expression: | ||||
186 | |||||
187 | /^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/ | ||||
188 | |||||
189 | Some SpamAssassin programs such as sa-learn will use the configuration option | ||||
190 | 'mbox_format_from_regex' to override the default regular expression. | ||||
191 | |||||
192 | =back | ||||
193 | |||||
194 | =cut | ||||
195 | |||||
196 | sub new { # spent 49µs within Mail::SpamAssassin::ArchiveIterator::new which was called:
# once (49µs+0s) by main::RUNTIME at line 467 of /usr/local/bin/sa-learn | ||||
197 | 1 | 2µs | my $class = shift; | ||
198 | 1 | 2µs | $class = ref($class) || $class; | ||
199 | |||||
200 | 1 | 2µs | my $self = shift; | ||
201 | 1 | 2µs | if (!defined $self) { $self = { }; } | ||
202 | 1 | 2µs | bless ($self, $class); | ||
203 | |||||
204 | # If any of these options are set, we need to figure out the message's | ||||
205 | # receive date at scan time. opt_after, opt_before, or opt_want_date | ||||
206 | $self->{determine_receive_date} = | ||||
207 | defined $self->{opt_after} || defined $self->{opt_before} || | ||||
208 | 1 | 10µs | $self->{opt_want_date}; | ||
209 | |||||
210 | 1 | 4µs | $self->{s} = [ ]; # spam, of course | ||
211 | 1 | 3µs | $self->{h} = [ ]; # ham, as if you couldn't guess | ||
212 | |||||
213 | 1 | 4µs | $self->{access_problem} = 0; | ||
214 | |||||
215 | 1 | 6µs | if ($self->{opt_all}) { | ||
216 | $self->{opt_max_size} = 0; | ||||
217 | } elsif (!defined $self->{opt_max_size}) { | ||||
218 | 1 | 4µs | $self->{opt_max_size} = BIG_BYTES; | ||
219 | } | ||||
220 | |||||
221 | 1 | 11µs | $self; | ||
222 | } | ||||
223 | |||||
224 | ########################################################################### | ||||
225 | |||||
226 | =item set_functions( \&wanted_sub, \&result_sub ) | ||||
227 | |||||
228 | Sets the subroutines used for message processing (wanted_sub) and result | ||||
229 | reporting (result_sub). For more information, see I<new()> above. | ||||
230 | |||||
231 | =cut | ||||
232 | |||||
233 | sub set_functions { # spent 15µs within Mail::SpamAssassin::ArchiveIterator::set_functions which was called:
# once (15µs+0s) by main::RUNTIME at line 469 of /usr/local/bin/sa-learn | ||||
234 | 1 | 3µs | my ($self, $wanted, $result) = @_; | ||
235 | 1 | 4µs | $self->{wanted_sub} = $wanted if defined $wanted; | ||
236 | 1 | 11µs | $self->{result_sub} = $result if defined $result; | ||
237 | } | ||||
238 | |||||
239 | ########################################################################### | ||||
240 | |||||
241 | =item run ( @target_paths ) | ||||
242 | |||||
243 | Generates the list of messages to process, then runs each message through the | ||||
244 | configured wanted subroutine. Files which have a name ending in C<.gz> or | ||||
245 | C<.bz2> will be properly uncompressed via call to C<gzip -dc> and C<bzip2 -dc> | ||||
246 | respectively. | ||||
247 | |||||
248 | The target_paths array is expected to be either one element per path in the | ||||
249 | following format: C<class:format:raw_location>, or a hash reference containing | ||||
250 | key-value option pairs and a 'target' key with a value in that format. | ||||
251 | |||||
252 | The key-value option pairs that can be used are: opt_scanprob, opt_after, | ||||
253 | opt_before. See the constructor method's documentation for more information | ||||
254 | on their effects. | ||||
255 | |||||
256 | run() returns 0 if there was an error (can't open a file, etc.) and 1 if there | ||||
257 | were no errors. | ||||
258 | |||||
259 | =over 4 | ||||
260 | |||||
261 | =item class | ||||
262 | |||||
263 | Either 'h' for ham or 's' for spam. If the class is longer than 1 character, | ||||
264 | it will be truncated. If blank, 'h' is default. | ||||
265 | |||||
266 | =item format | ||||
267 | |||||
268 | Specifies the format of the raw_location. C<dir> is a directory whose | ||||
269 | files are individual messages, C<file> a file with a single message, | ||||
270 | C<mbox> an mbox formatted file, or C<mbx> for an mbx formatted directory. | ||||
271 | |||||
272 | C<detect> can also be used. This assumes C<mbox> for any file whose path | ||||
273 | contains the pattern C</\.mbox/i>, C<file> for anything else that is not a | ||||
274 | directory, or C<dir> otherwise. | ||||
275 | |||||
276 | =item raw_location | ||||
277 | |||||
278 | Path to file or directory. File globbing is allowed using the | ||||
279 | standard csh-style globbing (see C<perldoc -f glob>). C<~> at the | ||||
280 | front of the value will be replaced by the C<HOME> environment | ||||
281 | variable. Escaped whitespace is protected as well. | ||||
282 | |||||
283 | B<NOTE:> C<~user> is not allowed. | ||||
284 | |||||
285 | B<NOTE 2:> C<-> is not allowed as a raw location. To have | ||||
286 | ArchiveIterator deal with STDIN, generate a temp file. | ||||
287 | |||||
288 | =back | ||||
289 | |||||
290 | =cut | ||||
291 | |||||
292 | sub run { # spent 96438s (102µs+96438) within Mail::SpamAssassin::ArchiveIterator::run which was called:
# once (102µs+96438s) by main::RUNTIME at line 478 of /usr/local/bin/sa-learn | ||||
293 | 1 | 7µs | my ($self, @targets) = @_; | ||
294 | |||||
295 | 1 | 3µs | if (!defined $self->{wanted_sub}) { | ||
296 | warn "archive-iterator: set_functions never called"; | ||||
297 | return 0; | ||||
298 | } | ||||
299 | |||||
300 | # scan the targets and get the number and list of messages | ||||
301 | $self->_scan_targets(\@targets, | ||||
302 | sub { # spent 10.9ms (5.48+5.41) within Mail::SpamAssassin::ArchiveIterator::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm:305] which was called 239 times, avg 46µs/call:
# 239 times (5.48ms+5.41ms) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 880, avg 46µs/call | ||||
303 | 239 | 1.22ms | my($self, $date, $class, $format, $mail) = @_; | ||
304 | 478 | 14.8ms | 239 | 5.41ms | push(@{$self->{$class}}, _index_pack($date, $class, $format, $mail)); # spent 5.41ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_index_pack, avg 23µs/call |
305 | } | ||||
306 | 1 | 28µs | 1 | 91.2ms | ); # spent 91.2ms making 1 call to Mail::SpamAssassin::ArchiveIterator::_scan_targets |
307 | |||||
308 | 1 | 2µs | my $messages; | ||
309 | # for ease of memory, we'll play with pointers | ||||
310 | 1 | 3µs | $messages = $self->{s}; | ||
311 | 1 | 3µs | undef $self->{s}; | ||
312 | 3 | 11µs | push(@{$messages}, @{$self->{h}}); | ||
313 | 1 | 2µs | undef $self->{h}; | ||
314 | |||||
315 | 2 | 6µs | $MESSAGES = scalar(@{$messages}); | ||
316 | |||||
317 | # go ahead and run through all of the messages specified | ||||
318 | 1 | 25µs | 1 | 96438s | return $self->_run($messages); # spent 96438s making 1 call to Mail::SpamAssassin::ArchiveIterator::_run |
319 | } | ||||
320 | |||||
321 | sub _run { # spent 96438s (12.2ms+96438) within Mail::SpamAssassin::ArchiveIterator::_run which was called:
# once (12.2ms+96438s) by Mail::SpamAssassin::ArchiveIterator::run at line 318 | ||||
322 | 1 | 2µs | my ($self, $messages) = @_; | ||
323 | |||||
324 | 241 | 3.14ms | while (my $message = shift @{$messages}) { | ||
325 | 239 | 3.96ms | 239 | 96438s | my($class, undef, $date, undef, $result) = $self->_run_message($message); # spent 96438s making 239 calls to Mail::SpamAssassin::ArchiveIterator::_run_message, avg 404s/call |
326 | 473 | 4.15ms | 234 | 4.54ms | &{$self->{result_sub}}($class, $result, $date) if $result; # spent 4.54ms making 234 calls to main::result, avg 19µs/call |
327 | } | ||||
328 | 1 | 18µs | return ! $self->{access_problem}; | ||
329 | } | ||||
330 | |||||
331 | ############################################################################ | ||||
332 | |||||
333 | ## run_message and related functions to process a single message | ||||
334 | |||||
335 | sub _run_message { # spent 96438s (72.7ms+96438) within Mail::SpamAssassin::ArchiveIterator::_run_message which was called 239 times, avg 404s/call:
# 239 times (72.7ms+96438s) by Mail::SpamAssassin::ArchiveIterator::_run at line 325, avg 404s/call | ||||
336 | 239 | 1.03ms | my ($self, $msg) = @_; | ||
337 | |||||
338 | 239 | 3.45ms | 239 | 6.61ms | my ($date, $class, $format, $mail) = _index_unpack($msg); # spent 6.61ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_index_unpack, avg 28µs/call |
339 | |||||
340 | 239 | 727µs | if ($format eq 'f') { | ||
341 | 239 | 66.9ms | 239 | 96438s | return $self->_run_file($class, $format, $mail, $date); # spent 96438s making 239 calls to Mail::SpamAssassin::ArchiveIterator::_run_file, avg 404s/call |
342 | } | ||||
343 | elsif ($format eq 'm') { | ||||
344 | return $self->_run_mailbox($class, $format, $mail, $date); | ||||
345 | } | ||||
346 | elsif ($format eq 'b') { | ||||
347 | return $self->_run_mbx($class, $format, $mail, $date); | ||||
348 | } | ||||
349 | } | ||||
350 | |||||
351 | sub _run_file { # spent 96438s (322ms+96437) within Mail::SpamAssassin::ArchiveIterator::_run_file which was called 239 times, avg 404s/call:
# 239 times (322ms+96437s) by Mail::SpamAssassin::ArchiveIterator::_run_message at line 341, avg 404s/call | ||||
352 | 239 | 2.16ms | my ($self, $class, $format, $where, $date) = @_; | ||
353 | |||||
354 | 239 | 1.97ms | 239 | 39.1ms | if (!_mail_open($where)) { # spent 39.1ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_mail_open, avg 164µs/call |
355 | $self->{access_problem} = 1; | ||||
356 | return; | ||||
357 | } | ||||
358 | |||||
359 | 239 | 9.29ms | 239 | 1.38ms | my $stat_errn = stat(INPUT) ? 0 : 0+$!; # spent 1.38ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 6µs/call |
360 | 239 | 3.16ms | 239 | 873µs | if ($stat_errn == ENOENT) { # spent 873µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 4µs/call |
361 | dbg("archive-iterator: no such input ($where)"); | ||||
362 | return; | ||||
363 | } | ||||
364 | elsif ($stat_errn != 0) { | ||||
365 | warn "archive-iterator: no access to input ($where): $!"; | ||||
366 | return; | ||||
367 | } | ||||
368 | elsif (!-f _ && !-c _ && !-p _) { | ||||
369 | warn "archive-iterator: not a plain file (or char.spec. or pipe) ($where)"; | ||||
370 | return; | ||||
371 | } | ||||
372 | |||||
373 | 239 | 971µs | my $opt_max_size = $self->{opt_max_size}; | ||
374 | 239 | 4.29ms | 478 | 1.19ms | if (!$opt_max_size) { # spent 808µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftsize, avg 3µs/call
# spent 386µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 2µs/call |
375 | # process any size | ||||
376 | } elsif (!-f _) { | ||||
377 | # must check size while reading | ||||
378 | } elsif (-s _ > $opt_max_size) { | ||||
379 | # skip too-big mails | ||||
380 | # note that -s can only deal with files, it returns 0 on char.spec. STDIN | ||||
381 | 5 | 86µs | 10 | 65µs | info("archive-iterator: skipping large message: ". # spent 57µs making 5 calls to Mail::SpamAssassin::Logger::info, avg 11µs/call
# spent 8µs making 5 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftsize, avg 2µs/call |
382 | "file size %d, limit %d bytes", -s _, $opt_max_size); | ||||
383 | 5 | 85µs | 5 | 38µs | close INPUT or die "error closing input file: $!"; # spent 38µs making 5 calls to Mail::SpamAssassin::ArchiveIterator::CORE:close, avg 8µs/call |
384 | 5 | 46µs | return; | ||
385 | } | ||||
386 | |||||
387 | 234 | 457µs | my @msg; | ||
388 | my $header; | ||||
389 | 234 | 508µs | my $len = 0; | ||
390 | 234 | 591µs | my $str = ''; | ||
391 | 234 | 440µs | my($inbuf,$nread); | ||
392 | 234 | 20.5ms | 234 | 12.0ms | while ( $nread=read(INPUT,$inbuf,16384) ) { # spent 12.0ms making 234 calls to Mail::SpamAssassin::ArchiveIterator::CORE:read, avg 51µs/call |
393 | 456 | 1.22ms | $len += $nread; | ||
394 | 456 | 1.01ms | if ($opt_max_size && $len > $opt_max_size) { | ||
395 | info("archive-iterator: skipping large message: read %d, limit %d bytes", | ||||
396 | $len, $opt_max_size); | ||||
397 | close INPUT or die "error closing input file: $!"; | ||||
398 | return; | ||||
399 | } | ||||
400 | 456 | 22.2ms | 456 | 7.41ms | $str .= $inbuf; # spent 7.41ms making 456 calls to Mail::SpamAssassin::ArchiveIterator::CORE:read, avg 16µs/call |
401 | } | ||||
402 | 234 | 526µs | defined $nread or die "error reading: $!"; | ||
403 | 234 | 755µs | undef $inbuf; | ||
404 | 468 | 71.6ms | @msg = split(/^/m, $str, -1); undef $str; | ||
405 | 234 | 1.73ms | for my $j (0..$#msg) { | ||
406 | 13955 | 226ms | 13487 | 36.6ms | if ($msg[$j] =~ /^\015?$/) { $header = $j; last } # spent 36.6ms making 13487 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 3µs/call |
407 | } | ||||
408 | 234 | 4.41ms | 234 | 2.36ms | close INPUT or die "error closing input file: $!"; # spent 2.36ms making 234 calls to Mail::SpamAssassin::ArchiveIterator::CORE:close, avg 10µs/call |
409 | |||||
410 | 234 | 1000µs | if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) { | ||
411 | $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header))); | ||||
412 | } | ||||
413 | |||||
414 | 468 | 9.38ms | 234 | 96437s | return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format)); # spent 96437s making 234 calls to main::wanted, avg 412s/call |
415 | } | ||||
416 | |||||
417 | sub _run_mailbox { | ||||
418 | my ($self, $class, $format, $where, $date) = @_; | ||||
419 | |||||
420 | my ($file, $offset); | ||||
421 | { local($1,$2); # Bug 7140 (avoids perl bug [perl #123880]) | ||||
422 | ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/); | ||||
423 | } | ||||
424 | my @msg; | ||||
425 | my $header; | ||||
426 | if (!_mail_open($file)) { | ||||
427 | $self->{access_problem} = 1; | ||||
428 | return; | ||||
429 | } | ||||
430 | |||||
431 | my $opt_max_size = $self->{opt_max_size}; | ||||
432 | dbg("archive-iterator: _run_mailbox %s, ofs %d, limit %d", | ||||
433 | $file, $offset, $opt_max_size||0); | ||||
434 | |||||
435 | seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!"; | ||||
436 | |||||
437 | my $size = 0; | ||||
438 | for ($!=0; <INPUT>; $!=0) { | ||||
439 | # Changed regex to use the opt_from_regex option, per bug 6703 | ||||
440 | last if (substr($_,0,5) eq "From " && @msg && /$self->{opt_from_regex}/o); | ||||
441 | $size += length($_); | ||||
442 | push (@msg, $_); | ||||
443 | |||||
444 | # skip mails that are too big | ||||
445 | if ($opt_max_size && $size > $opt_max_size) { | ||||
446 | info("archive-iterator: skipping large message: ". | ||||
447 | "%d lines, %d bytes, limit %d bytes", | ||||
448 | scalar @msg, $size, $opt_max_size); | ||||
449 | close INPUT or die "error closing input file: $!"; | ||||
450 | return; | ||||
451 | } | ||||
452 | |||||
453 | if (!defined $header && /^\s*$/) { | ||||
454 | $header = $#msg; | ||||
455 | } | ||||
456 | } | ||||
457 | defined $_ || $!==0 or | ||||
458 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
459 | : die "error reading: $!"; | ||||
460 | close INPUT or die "error closing input file: $!"; | ||||
461 | |||||
462 | if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) { | ||||
463 | $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header))); | ||||
464 | } | ||||
465 | |||||
466 | return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format)); | ||||
467 | } | ||||
468 | |||||
469 | sub _run_mbx { | ||||
470 | my ($self, $class, $format, $where, $date) = @_; | ||||
471 | |||||
472 | my ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/); | ||||
473 | my @msg; | ||||
474 | my $header; | ||||
475 | |||||
476 | if (!_mail_open($file)) { | ||||
477 | $self->{access_problem} = 1; | ||||
478 | return; | ||||
479 | } | ||||
480 | |||||
481 | my $opt_max_size = $self->{opt_max_size}; | ||||
482 | dbg("archive-iterator: _run_mbx %s, ofs %d, limit %d", | ||||
483 | $file, $offset, $opt_max_size||0); | ||||
484 | |||||
485 | seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!"; | ||||
486 | |||||
487 | my $size = 0; | ||||
488 | for ($!=0; <INPUT>; $!=0) { | ||||
489 | last if ($_ =~ MBX_SEPARATOR); | ||||
490 | $size += length($_); | ||||
491 | push (@msg, $_); | ||||
492 | |||||
493 | # skip mails that are too big | ||||
494 | if ($opt_max_size && $size > $opt_max_size) { | ||||
495 | info("archive-iterator: skipping large message: ". | ||||
496 | "%d lines, %d bytes, limit %d bytes", | ||||
497 | scalar @msg, $size, $opt_max_size); | ||||
498 | close INPUT or die "error closing input file: $!"; | ||||
499 | return; | ||||
500 | } | ||||
501 | |||||
502 | if (!defined $header && /^\s*$/) { | ||||
503 | $header = $#msg; | ||||
504 | } | ||||
505 | } | ||||
506 | defined $_ || $!==0 or | ||||
507 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
508 | : die "error reading: $!"; | ||||
509 | close INPUT or die "error closing input file: $!"; | ||||
510 | |||||
511 | if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) { | ||||
512 | $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header))); | ||||
513 | } | ||||
514 | |||||
515 | return($class, $format, $date, $where, &{$self->{wanted_sub}}($class, $where, $date, \@msg, $format)); | ||||
516 | } | ||||
517 | |||||
518 | ############################################################################ | ||||
519 | |||||
520 | ## FUNCTIONS BELOW THIS POINT ARE FOR FINDING THE MESSAGES TO RUN AGAINST | ||||
521 | |||||
522 | ############################################################################ | ||||
523 | |||||
524 | # spent 91.2ms (338µs+90.8) within Mail::SpamAssassin::ArchiveIterator::_scan_targets which was called:
# once (338µs+90.8ms) by Mail::SpamAssassin::ArchiveIterator::run at line 306
sub _scan_targets { | ||||
525 | 1 | 2µs | my ($self, $targets, $bkfunc) = @_; | ||
526 | |||||
527 | 1 | 3µs | %class_opts = (); | ||
528 | |||||
529 | 1 | 16µs | foreach my $target (@${targets}) { | ||
530 | 2 | 5µs | if (!defined $target) { | ||
531 | warn "archive-iterator: invalid (undef) value in target list"; | ||||
532 | next; | ||||
533 | } | ||||
534 | |||||
535 | 2 | 4µs | my %opts; | ||
536 | 2 | 7µs | if (ref $target eq 'HASH') { | ||
537 | # e.g. { target => $target, opt_foo => 1, opt_bar => 0.4 ... } | ||||
538 | foreach my $k (keys %{$target}) { | ||||
539 | if ($k =~ /^opt_/) { | ||||
540 | $opts{$k} = $target->{$k}; | ||||
541 | } | ||||
542 | } | ||||
543 | $target = $target->{target}; | ||||
544 | } | ||||
545 | |||||
546 | 2 | 25µs | my ($class, $format, $rawloc) = split(/:/, $target, 3); | ||
547 | |||||
548 | # "class" | ||||
549 | 2 | 5µs | if (!defined $format) { | ||
550 | warn "archive-iterator: invalid (undef) format in target list, $target"; | ||||
551 | next; | ||||
552 | } | ||||
553 | # "class:format" | ||||
554 | 2 | 4µs | if (!defined $rawloc) { | ||
555 | warn "archive-iterator: invalid (undef) raw location in target list, $target"; | ||||
556 | next; | ||||
557 | } | ||||
558 | |||||
559 | 2 | 5µs | if ($rawloc eq '-') { | ||
560 | warn 'archive-iterator: raw location "-" is not supported'; | ||||
561 | next; | ||||
562 | } | ||||
563 | |||||
564 | # use ham by default, things like "spamassassin" can't specify the type | ||||
565 | 2 | 14µs | $class = substr($class, 0, 1) || 'h'; | ||
566 | |||||
567 | # keep a copy of the most recent message-selection options for | ||||
568 | # each class | ||||
569 | 2 | 11µs | $class_opts{$class} = \%opts; | ||
570 | |||||
571 | 2 | 12µs | foreach my $k (keys %opts) { | ||
572 | $self->{$k} = $opts{$k}; | ||||
573 | } | ||||
574 | 2 | 19µs | 2 | 175µs | $self->_set_default_message_selection_opts(); # spent 175µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts, avg 87µs/call |
575 | |||||
576 | 2 | 23µs | 2 | 266µs | my @locations = $self->_fix_globs($rawloc); # spent 266µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::_fix_globs, avg 133µs/call |
577 | |||||
578 | 2 | 21µs | foreach my $location (@locations) { | ||
579 | 2 | 4µs | my $method; | ||
580 | |||||
581 | # for this location only; 'detect' means they can differ for each location | ||||
582 | 2 | 5µs | my $thisformat = $format; | ||
583 | |||||
584 | 2 | 10µs | if ($format eq 'detect') { | ||
585 | # detect the format | ||||
586 | 2 | 99µs | 2 | 63µs | my $stat_errn = stat($location) ? 0 : 0+$!; # spent 63µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 32µs/call |
587 | 2 | 42µs | 2 | 7µs | if ($stat_errn == ENOENT) { # spent 7µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftdir, avg 3µs/call |
588 | $thisformat = 'file'; # actually, no file - to be detected later | ||||
589 | } | ||||
590 | elsif ($stat_errn != 0) { | ||||
591 | warn "archive-iterator: no access to $location: $!"; | ||||
592 | $thisformat = 'file'; | ||||
593 | } | ||||
594 | elsif (-d _) { | ||||
595 | # it's a directory | ||||
596 | 2 | 7µs | $thisformat = 'dir'; | ||
597 | } | ||||
598 | elsif ($location =~ /\.mbox/i) { | ||||
599 | # filename indicates mbox | ||||
600 | $thisformat = 'mbox'; | ||||
601 | } | ||||
602 | else { | ||||
603 | $thisformat = 'file'; | ||||
604 | } | ||||
605 | } | ||||
606 | |||||
607 | 2 | 11µs | if ($thisformat eq 'dir') { | ||
608 | 2 | 7µs | $method = \&_scan_directory; | ||
609 | } | ||||
610 | elsif ($thisformat eq 'mbox') { | ||||
611 | $method = \&_scan_mailbox; | ||||
612 | } | ||||
613 | elsif ($thisformat eq 'file') { | ||||
614 | $method = \&_scan_file; | ||||
615 | } | ||||
616 | elsif ($thisformat eq 'mbx') { | ||||
617 | $method = \&_scan_mbx; | ||||
618 | } | ||||
619 | else { | ||||
620 | warn "archive-iterator: format $thisformat (from $format) unknown!"; | ||||
621 | next; | ||||
622 | } | ||||
623 | |||||
624 | # call the appropriate method | ||||
625 | 4 | 33µs | 2 | 90.3ms | &{$method}($self, $class, $location, $bkfunc); # spent 90.3ms making 2 calls to Mail::SpamAssassin::ArchiveIterator::_scan_directory, avg 45.2ms/call |
626 | } | ||||
627 | } | ||||
628 | } | ||||
629 | |||||
630 | # spent 39.1ms (12.0+27.1) within Mail::SpamAssassin::ArchiveIterator::_mail_open which was called 239 times, avg 164µs/call:
# 239 times (12.0ms+27.1ms) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 354, avg 164µs/call
sub _mail_open { | ||||
631 | 239 | 673µs | my ($file) = @_; | ||
632 | |||||
633 | # bug 5288: the "magic" version of open will strip leading and trailing | ||||
634 | # whitespace from the expression. switch to the three-argument version | ||||
635 | # of open which does not strip whitespace. see "perldoc -f open" and | ||||
636 | # "perldoc perlipc" for more information. | ||||
637 | |||||
638 | # Assume that the file by default is just a plain file | ||||
639 | 239 | 1.07ms | my @expr = ( $file ); | ||
640 | 239 | 681µs | my $mode = '<'; | ||
641 | |||||
642 | # Handle different types of compressed files | ||||
643 | 239 | 4.88ms | 478 | 2.00ms | if ($file =~ /\.gz$/) { # spent 2.00ms making 478 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 4µs/call |
644 | $mode = '-|'; | ||||
645 | unshift @expr, 'gunzip', '-cd'; | ||||
646 | } | ||||
647 | elsif ($file =~ /\.bz2$/) { | ||||
648 | $mode = '-|'; | ||||
649 | unshift @expr, 'bzip2', '-cd'; | ||||
650 | } | ||||
651 | |||||
652 | # Go ahead and try to open the file | ||||
653 | 239 | 26.6ms | 239 | 24.0ms | if (!open (INPUT, $mode, @expr)) { # spent 24.0ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:open, avg 101µs/call |
654 | warn "archive-iterator: unable to open $file: $!\n"; | ||||
655 | return 0; | ||||
656 | } | ||||
657 | |||||
658 | # bug 5249: mail could have 8-bit data, need this on some platforms | ||||
659 | 239 | 3.00ms | 239 | 1.04ms | binmode INPUT or die "cannot set input file to binmode: $!"; # spent 1.04ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:binmode, avg 4µs/call |
660 | |||||
661 | 239 | 2.80ms | return 1; | ||
662 | } | ||||
663 | |||||
664 | # spent 175µs (141+34) within Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts which was called 2 times, avg 87µs/call:
# 2 times (141µs+34µs) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 574, avg 87µs/call
sub _set_default_message_selection_opts { | ||||
665 | 2 | 4µs | my ($self) = @_; | ||
666 | |||||
667 | 2 | 8µs | $self->{opt_scanprob} = 1.0 unless (defined $self->{opt_scanprob}); | ||
668 | 2 | 6µs | $self->{opt_want_date} = 1 unless (defined $self->{opt_want_date}); | ||
669 | 2 | 6µs | $self->{opt_cache} = 0 unless (defined $self->{opt_cache}); | ||
670 | # Changed regex to include boundaries for CommuniGate Pro versions (5.2.x and later), per bug 6413 | ||||
671 | 2 | 7µs | $self->{opt_from_regex} = '^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)' unless (defined $self->{opt_from_regex}); | ||
672 | |||||
673 | #STRIP LEADING AND TRAILING / FROM REGEX FOR OPTION | ||||
674 | 2 | 33µs | 2 | 9µs | $self->{opt_from_regex} =~ s/^\///; # spent 9µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 5µs/call |
675 | 2 | 23µs | 2 | 6µs | $self->{opt_from_regex} =~ s/\/$//; # spent 6µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 3µs/call |
676 | |||||
677 | 2 | 66µs | 2 | 19µs | dbg("archive-iterator: _set_default_message_selection_opts After: Scanprob[$self->{opt_scanprob}], want_date[$self->{opt_want_date}], cache[$self->{opt_cache}], from_regex[$self->{opt_from_regex}]"); # spent 19µs making 2 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call |
678 | |||||
679 | } | ||||
680 | |||||
681 | ############################################################################ | ||||
682 | |||||
683 | sub _message_is_useful_by_date { | ||||
684 | my ($self, $date) = @_; | ||||
685 | |||||
686 | if (!$self->{opt_after} && !$self->{opt_before}) { | ||||
687 | # Not using the feature | ||||
688 | return 1; | ||||
689 | } | ||||
690 | |||||
691 | return 0 unless $date; # undef or 0 date = unusable | ||||
692 | |||||
693 | if (!$self->{opt_before}) { | ||||
694 | # Just care about after | ||||
695 | return $date > $self->{opt_after}; | ||||
696 | } | ||||
697 | else { | ||||
698 | return (($date < $self->{opt_before}) && ($date > $self->{opt_after})); | ||||
699 | } | ||||
700 | } | ||||
701 | |||||
702 | # additional check, based solely on a file's mod timestamp. we cannot | ||||
703 | # make assumptions about --before, since the file may have been "touch"ed | ||||
704 | # since the last message was appended; but we can assume that too-old | ||||
705 | # files cannot contain messages newer than their modtime. | ||||
706 | # spent 3.49ms within Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime which was called 239 times, avg 15µs/call:
# 239 times (3.49ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 834, avg 15µs/call
sub _message_is_useful_by_file_modtime { | ||||
707 | 239 | 833µs | my ($self, $date) = @_; | ||
708 | |||||
709 | # better safe than sorry, if date is undef; let other stuff catch errors | ||||
710 | 239 | 406µs | return 1 unless $date; | ||
711 | |||||
712 | 239 | 482µs | if ($self->{opt_after}) { | ||
713 | return ($date > $self->{opt_after}); | ||||
714 | } | ||||
715 | else { | ||||
716 | 239 | 3.26ms | return 1; # --after not in use | ||
717 | } | ||||
718 | } | ||||
719 | |||||
720 | sub _scanprob_says_scan { | ||||
721 | my ($self) = @_; | ||||
722 | if (defined $self->{opt_scanprob} && $self->{opt_scanprob} < 1.0) { | ||||
723 | if ( int( rand( 1 / $self->{opt_scanprob} ) ) != 0 ) { | ||||
724 | return 0; | ||||
725 | } | ||||
726 | } | ||||
727 | return 1; | ||||
728 | } | ||||
729 | |||||
730 | ############################################################################ | ||||
731 | |||||
732 | # 0 850852128 atime | ||||
733 | # 1 h class | ||||
734 | # 2 m format | ||||
735 | # 3 ./ham/goodmsgs.0 path | ||||
736 | |||||
737 | # put the date in first, big-endian packed format | ||||
738 | # this format lets cmp easily sort by date, then class, format, and path. | ||||
739 | # spent 5.41ms (2.51+2.90) within Mail::SpamAssassin::ArchiveIterator::_index_pack which was called 239 times, avg 23µs/call:
# 239 times (2.51ms+2.90ms) by Mail::SpamAssassin::ArchiveIterator::__ANON__[/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/ArchiveIterator.pm:305] at line 304, avg 23µs/call
sub _index_pack { | ||||
740 | 239 | 6.07ms | 239 | 2.90ms | return pack("NAAA*", @_); # spent 2.90ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:pack, avg 12µs/call |
741 | } | ||||
742 | |||||
743 | # spent 6.61ms (4.25+2.36) within Mail::SpamAssassin::ArchiveIterator::_index_unpack which was called 239 times, avg 28µs/call:
# 239 times (4.25ms+2.36ms) by Mail::SpamAssassin::ArchiveIterator::_run_message at line 338, avg 28µs/call
sub _index_unpack { | ||||
744 | 239 | 7.06ms | 239 | 2.36ms | return unpack("NAAA*", $_[0]); # spent 2.36ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:unpack, avg 10µs/call |
745 | } | ||||
746 | |||||
747 | ############################################################################ | ||||
748 | |||||
749 | # spent 90.3ms (14.1+76.2) within Mail::SpamAssassin::ArchiveIterator::_scan_directory which was called 2 times, avg 45.2ms/call:
# 2 times (14.1ms+76.2ms) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 625, avg 45.2ms/call
sub _scan_directory { | ||||
750 | 2 | 20µs | my ($self, $class, $folder, $bkfunc) = @_; | ||
751 | |||||
752 | 2 | 4µs | my(@files,@subdirs); | ||
753 | |||||
754 | 2 | 183µs | 4 | 123µs | if (-d "$folder/new" && -d "$folder/cur" && -d "$folder/tmp") { # spent 81µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftdir, avg 40µs/call
# spent 42µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 21µs/call |
755 | # Maildir format: bug 3003 | ||||
756 | for my $sub ("new", "cur") { | ||||
757 | opendir (DIR, "$folder/$sub") | ||||
758 | or die "Can't open '$folder/$sub' dir: $!\n"; | ||||
759 | # Don't learn from messages marked as deleted | ||||
760 | # Or files starting with a leading dot | ||||
761 | push @files, map { "$sub/$_" } grep { !/^\.|:2,.*T/ } readdir(DIR); | ||||
762 | closedir(DIR) or die "error closing directory $folder: $!"; | ||||
763 | } | ||||
764 | } | ||||
765 | elsif (-f "$folder/cyrus.header") { | ||||
766 | opendir(DIR, $folder) | ||||
767 | or die "archive-iterator: can't open '$folder' dir: $!\n"; | ||||
768 | |||||
769 | # cyrus metadata: http://unix.lsa.umich.edu/docs/imap/imap-lsa-srv_3.html | ||||
770 | @files = grep { $_ ne '.' && $_ ne '..' && | ||||
771 | /^\S+$/ && !/^cyrus\.(?:index|header|cache|seen)/ } | ||||
772 | readdir(DIR); | ||||
773 | closedir(DIR) or die "error closing directory $folder: $!"; | ||||
774 | } | ||||
775 | else { | ||||
776 | 2 | 126µs | 2 | 94µs | opendir(DIR, $folder) # spent 94µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:open_dir, avg 47µs/call |
777 | or die "archive-iterator: can't open '$folder' dir: $!\n"; | ||||
778 | |||||
779 | # ignore ,234 (deleted or refiled messages) and MH metadata dotfiles | ||||
780 | 245 | 5.46ms | 245 | 1.22ms | @files = grep { !/^[,.]/ } readdir(DIR); # spent 664µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:readdir, avg 332µs/call
# spent 559µs making 243 calls to Mail::SpamAssassin::ArchiveIterator::CORE:match, avg 2µs/call |
781 | 2 | 42µs | 2 | 20µs | closedir(DIR) or die "error closing directory $folder: $!"; # spent 20µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:closedir, avg 10µs/call |
782 | } | ||||
783 | |||||
784 | 2 | 926µs | $_ = "$folder/$_" for @files; | ||
785 | |||||
786 | 2 | 4µs | if (!@files) { | ||
787 | # this is not a problem; no need to warn about it | ||||
788 | # warn "archive-iterator: readdir found no mail in '$folder' directory\n"; | ||||
789 | 1 | 11µs | return; | ||
790 | } | ||||
791 | |||||
792 | 1 | 10µs | 1 | 14µs | $self->_create_cache('dir', $folder); # spent 14µs making 1 call to Mail::SpamAssassin::ArchiveIterator::_create_cache |
793 | |||||
794 | 1 | 4µs | foreach my $file (@files) { | ||
795 | 239 | 12.1ms | 239 | 10.1ms | my $stat_errn = stat($file) ? 0 : 0+$!; # spent 10.1ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 42µs/call |
796 | 239 | 4.18ms | 239 | 414µs | if ($stat_errn == ENOENT) { # spent 414µs making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:ftfile, avg 2µs/call |
797 | # no longer there? | ||||
798 | } | ||||
799 | elsif ($stat_errn != 0) { | ||||
800 | warn "archive-iterator: no access to $file: $!"; | ||||
801 | } | ||||
802 | elsif (-f _ || -c _ || -p _) { | ||||
803 | 239 | 1.63ms | 239 | 64.3ms | $self->_scan_file($class, $file, $bkfunc); # spent 64.3ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_scan_file, avg 269µs/call |
804 | } | ||||
805 | elsif (-d _) { | ||||
806 | push(@subdirs, $file); | ||||
807 | } | ||||
808 | else { | ||||
809 | warn "archive-iterator: $file is not a plain file or directory: $!"; | ||||
810 | } | ||||
811 | } | ||||
812 | 1 | 150µs | undef @files; # release storage | ||
813 | |||||
814 | # recurse into directories | ||||
815 | 1 | 5µs | foreach my $dir (@subdirs) { | ||
816 | $self->_scan_directory($class, $dir, $bkfunc); | ||||
817 | } | ||||
818 | |||||
819 | 1 | 20µs | if (defined $AICache) { | ||
820 | $AICache = $AICache->finish(); | ||||
821 | } | ||||
822 | } | ||||
823 | |||||
824 | # spent 64.3ms (42.4+21.9) within Mail::SpamAssassin::ArchiveIterator::_scan_file which was called 239 times, avg 269µs/call:
# 239 times (42.4ms+21.9ms) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 803, avg 269µs/call
sub _scan_file { | ||||
825 | 239 | 2.77ms | my ($self, $class, $mail, $bkfunc) = @_; | ||
826 | |||||
827 | 239 | 1.38ms | 239 | 1.98ms | $self->_bump_scan_progress(); # spent 1.98ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress, avg 8µs/call |
828 | |||||
829 | # only perform these stat() operations if we're not using a cache; | ||||
830 | # it's faster to perform lookups in the cache, and more accurate | ||||
831 | 239 | 795µs | if (!defined $AICache) { | ||
832 | 239 | 20.8ms | 239 | 5.55ms | my @s = stat($mail); # spent 5.55ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::CORE:stat, avg 23µs/call |
833 | 239 | 426µs | @s or warn "archive-iterator: no access to $mail: $!"; | ||
834 | 239 | 2.31ms | 239 | 3.49ms | return unless $self->_message_is_useful_by_file_modtime($s[9]); # spent 3.49ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::_message_is_useful_by_file_modtime, avg 15µs/call |
835 | } | ||||
836 | |||||
837 | 239 | 456µs | my $date = AI_TIME_UNKNOWN; | ||
838 | 239 | 785µs | if ($self->{determine_receive_date}) { | ||
839 | unless (defined $AICache and $date = $AICache->check($mail)) { | ||||
840 | # silently skip directories/non-files; some folders may | ||||
841 | # contain extraneous dirs etc. | ||||
842 | my $stat_errn = stat($mail) ? 0 : 0+$!; | ||||
843 | if ($stat_errn != 0) { | ||||
844 | warn "archive-iterator: no access to $mail: $!"; | ||||
845 | return; | ||||
846 | } | ||||
847 | elsif (!-f _) { | ||||
848 | return; | ||||
849 | } | ||||
850 | |||||
851 | my $header = ''; | ||||
852 | if (!_mail_open($mail)) { | ||||
853 | $self->{access_problem} = 1; | ||||
854 | return; | ||||
855 | } | ||||
856 | for ($!=0; <INPUT>; $!=0) { | ||||
857 | last if /^\015?$/s; | ||||
858 | $header .= $_; | ||||
859 | } | ||||
860 | defined $_ || $!==0 or | ||||
861 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
862 | : die "error reading: $!"; | ||||
863 | close INPUT or die "error closing input file: $!"; | ||||
864 | |||||
865 | return if ($self->{opt_skip_empty_messages} && $header eq ''); | ||||
866 | |||||
867 | $date = Mail::SpamAssassin::Util::receive_date($header); | ||||
868 | if (defined $AICache) { | ||||
869 | $AICache->update($mail, $date); | ||||
870 | } | ||||
871 | } | ||||
872 | |||||
873 | return if !$self->_message_is_useful_by_date($date); | ||||
874 | return if !$self->_scanprob_says_scan(); | ||||
875 | } | ||||
876 | else { | ||||
877 | 239 | 390µs | return if ($self->{opt_skip_empty_messages} && (-z $mail)); | ||
878 | } | ||||
879 | |||||
880 | 478 | 2.43ms | 239 | 10.9ms | &{$bkfunc}($self, $date, $class, 'f', $mail); # spent 10.9ms making 239 calls to Mail::SpamAssassin::ArchiveIterator::__ANON__[Mail/SpamAssassin/ArchiveIterator.pm:305], avg 46µs/call |
881 | |||||
882 | 239 | 2.49ms | return; | ||
883 | } | ||||
884 | |||||
885 | sub _scan_mailbox { | ||||
886 | my ($self, $class, $folder, $bkfunc) = @_; | ||||
887 | my @files; | ||||
888 | |||||
889 | my $stat_errn = stat($folder) ? 0 : 0+$!; | ||||
890 | if ($stat_errn == ENOENT) { | ||||
891 | # no longer there? | ||||
892 | } | ||||
893 | elsif ($stat_errn != 0) { | ||||
894 | warn "archive-iterator: no access to $folder: $!"; | ||||
895 | } | ||||
896 | elsif (-f _) { | ||||
897 | push(@files, $folder); | ||||
898 | } | ||||
899 | elsif (-d _) { | ||||
900 | # passed a directory of mboxes | ||||
901 | $folder =~ s/\/\s*$//; # remove trailing slash, if there is one | ||||
902 | if (!opendir(DIR, $folder)) { | ||||
903 | warn "archive-iterator: can't open '$folder' dir: $!\n"; | ||||
904 | $self->{access_problem} = 1; | ||||
905 | return; | ||||
906 | } | ||||
907 | while ($_ = readdir(DIR)) { | ||||
908 | next if $_ eq '.' || $_ eq '..' || !/^[^\.]\S*$/; | ||||
909 | # hmmm, ignores folders with spaces in the name??? | ||||
910 | $stat_errn = stat("$folder/$_") ? 0 : 0+$!; | ||||
911 | if ($stat_errn == ENOENT) { | ||||
912 | # no longer there? | ||||
913 | } | ||||
914 | elsif ($stat_errn != 0) { | ||||
915 | warn "archive-iterator: no access to $folder/$_: $!"; | ||||
916 | } | ||||
917 | elsif (-f _) { | ||||
918 | push(@files, "$folder/$_"); | ||||
919 | } | ||||
920 | } | ||||
921 | closedir(DIR) or die "error closing directory $folder: $!"; | ||||
922 | } | ||||
923 | else { | ||||
924 | warn "archive-iterator: $folder is not a plain file or directory: $!"; | ||||
925 | } | ||||
926 | |||||
927 | foreach my $file (@files) { | ||||
928 | $self->_bump_scan_progress(); | ||||
929 | if ($file =~ /\.(?:gz|bz2)$/) { | ||||
930 | warn "archive-iterator: compressed mbox folders are not supported at this time\n"; | ||||
931 | $self->{access_problem} = 1; | ||||
932 | next; | ||||
933 | } | ||||
934 | |||||
935 | my @s = stat($file); | ||||
936 | @s or warn "archive-iterator: no access to $file: $!"; | ||||
937 | next unless $self->_message_is_useful_by_file_modtime($s[9]); | ||||
938 | |||||
939 | my $info = {}; | ||||
940 | my $count; | ||||
941 | |||||
942 | $self->_create_cache('mbox', $file); | ||||
943 | |||||
944 | if ($self->{opt_cache}) { | ||||
945 | if ($count = $AICache->count()) { | ||||
946 | $info = $AICache->check(); | ||||
947 | } | ||||
948 | } | ||||
949 | |||||
950 | unless ($count) { | ||||
951 | if (!_mail_open($file)) { | ||||
952 | $self->{access_problem} = 1; | ||||
953 | next; | ||||
954 | } | ||||
955 | |||||
956 | my $start = 0; # start of a message | ||||
957 | my $where = 0; # current byte offset | ||||
958 | my $first = ''; # first line of message | ||||
959 | my $header = ''; # header text | ||||
960 | my $in_header = 0; # are we in a header? | ||||
961 | while (!eof INPUT) { | ||||
962 | my $offset = $start; # byte offset of this message | ||||
963 | my $header = $first; # remember first line | ||||
964 | for ($!=0; <INPUT>; $!=0) { | ||||
965 | if ($in_header) { | ||||
966 | if (/^\015?$/s) { | ||||
967 | $in_header = 0; | ||||
968 | } | ||||
969 | else { | ||||
970 | $header .= $_; | ||||
971 | } | ||||
972 | } | ||||
973 | #Changed Regex to use option Per bug 6703 | ||||
974 | if (substr($_,0,5) eq "From " && /$self->{opt_from_regex}/o) { | ||||
975 | $in_header = 1; | ||||
976 | $first = $_; | ||||
977 | $start = $where; | ||||
978 | $where = tell INPUT; | ||||
979 | $where >= 0 or die "cannot obtain file position: $!"; | ||||
980 | last; | ||||
981 | } | ||||
982 | $where = tell INPUT; | ||||
983 | $where >= 0 or die "cannot obtain file position: $!"; | ||||
984 | } | ||||
985 | defined $_ || $!==0 or | ||||
986 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
987 | : die "error reading: $!"; | ||||
988 | if ($header ne '') { | ||||
989 | # next if ($self->{opt_skip_empty_messages} && $header eq ''); | ||||
990 | $self->_bump_scan_progress(); | ||||
991 | $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header); | ||||
992 | } | ||||
993 | } | ||||
994 | close INPUT or die "error closing input file: $!"; | ||||
995 | } | ||||
996 | |||||
997 | while(my($k,$v) = each %{$info}) { | ||||
998 | if (defined $AICache && !$count) { | ||||
999 | $AICache->update($k, $v); | ||||
1000 | } | ||||
1001 | |||||
1002 | if ($self->{determine_receive_date}) { | ||||
1003 | next if !$self->_message_is_useful_by_date($v); | ||||
1004 | } | ||||
1005 | next if !$self->_scanprob_says_scan(); | ||||
1006 | |||||
1007 | &{$bkfunc}($self, $v, $class, 'm', "$file.$k"); | ||||
1008 | } | ||||
1009 | |||||
1010 | if (defined $AICache) { | ||||
1011 | $AICache = $AICache->finish(); | ||||
1012 | } | ||||
1013 | } | ||||
1014 | } | ||||
1015 | |||||
1016 | sub _scan_mbx { | ||||
1017 | my ($self, $class, $folder, $bkfunc) = @_; | ||||
1018 | my (@files, $fp); | ||||
1019 | |||||
1020 | my $stat_errn = stat($folder) ? 0 : 0+$!; | ||||
1021 | if ($stat_errn == ENOENT) { | ||||
1022 | # no longer there? | ||||
1023 | } | ||||
1024 | elsif ($stat_errn != 0) { | ||||
1025 | warn "archive-iterator: no access to $folder: $!"; | ||||
1026 | } | ||||
1027 | elsif (-f _) { | ||||
1028 | push(@files, $folder); | ||||
1029 | } | ||||
1030 | elsif (-d _) { | ||||
1031 | # got passed a directory full of mbx folders. | ||||
1032 | $folder =~ s/\/\s*$//; # remove trailing slash, if there is one | ||||
1033 | if (!opendir(DIR, $folder)) { | ||||
1034 | warn "archive-iterator: can't open '$folder' dir: $!\n"; | ||||
1035 | $self->{access_problem} = 1; | ||||
1036 | return; | ||||
1037 | } | ||||
1038 | while ($_ = readdir(DIR)) { | ||||
1039 | next if $_ eq '.' || $_ eq '..' || !/^[^\.]\S*$/; | ||||
1040 | # hmmm, ignores folders with spaces in the name??? | ||||
1041 | $stat_errn = stat("$folder/$_") ? 0 : 0+$!; | ||||
1042 | if ($stat_errn == ENOENT) { | ||||
1043 | # no longer there? | ||||
1044 | } | ||||
1045 | elsif ($stat_errn != 0) { | ||||
1046 | warn "archive-iterator: no access to $folder/$_: $!"; | ||||
1047 | } | ||||
1048 | elsif (-f _) { | ||||
1049 | push(@files, "$folder/$_"); | ||||
1050 | } | ||||
1051 | } | ||||
1052 | closedir(DIR) or die "error closing directory $folder: $!"; | ||||
1053 | } | ||||
1054 | else { | ||||
1055 | warn "archive-iterator: $folder is not a plain file or directory: $!"; | ||||
1056 | } | ||||
1057 | |||||
1058 | foreach my $file (@files) { | ||||
1059 | $self->_bump_scan_progress(); | ||||
1060 | |||||
1061 | if ($file =~ /\.(?:gz|bz2)$/) { | ||||
1062 | warn "archive-iterator: compressed mbx folders are not supported at this time\n"; | ||||
1063 | $self->{access_problem} = 1; | ||||
1064 | next; | ||||
1065 | } | ||||
1066 | |||||
1067 | my @s = stat($file); | ||||
1068 | @s or warn "archive-iterator: no access to $file: $!"; | ||||
1069 | next unless $self->_message_is_useful_by_file_modtime($s[9]); | ||||
1070 | |||||
1071 | my $info = {}; | ||||
1072 | my $count; | ||||
1073 | |||||
1074 | $self->_create_cache('mbx', $file); | ||||
1075 | |||||
1076 | if ($self->{opt_cache}) { | ||||
1077 | if ($count = $AICache->count()) { | ||||
1078 | $info = $AICache->check(); | ||||
1079 | } | ||||
1080 | } | ||||
1081 | |||||
1082 | unless ($count) { | ||||
1083 | if (!_mail_open($file)) { | ||||
1084 | $self->{access_problem} = 1; | ||||
1085 | next; | ||||
1086 | } | ||||
1087 | |||||
1088 | # check the mailbox is in mbx format | ||||
1089 | $! = 0; $fp = <INPUT>; | ||||
1090 | defined $fp || $!==0 or | ||||
1091 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
1092 | : die "error reading: $!"; | ||||
1093 | if (!defined $fp) { | ||||
1094 | die "archive-iterator: error: mailbox not in mbx format - empty!\n"; | ||||
1095 | } elsif ($fp !~ /\*mbx\*/) { | ||||
1096 | die "archive-iterator: error: mailbox not in mbx format!\n"; | ||||
1097 | } | ||||
1098 | |||||
1099 | # skip mbx headers to the first email... | ||||
1100 | seek(INPUT,2048,0) or die "cannot reposition file to 2048: $!"; | ||||
1101 | my $sep = MBX_SEPARATOR; | ||||
1102 | |||||
1103 | for ($!=0; <INPUT>; $!=0) { | ||||
1104 | if ($_ =~ /$sep/) { | ||||
1105 | my $offset = tell INPUT; | ||||
1106 | $offset >= 0 or die "cannot obtain file position: $!"; | ||||
1107 | my $size = $2; | ||||
1108 | |||||
1109 | # gather up the headers... | ||||
1110 | my $header = ''; | ||||
1111 | for ($!=0; <INPUT>; $!=0) { | ||||
1112 | last if (/^\015?$/s); | ||||
1113 | $header .= $_; | ||||
1114 | } | ||||
1115 | defined $_ || $!==0 or | ||||
1116 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
1117 | : die "error reading: $!"; | ||||
1118 | if (!($self->{opt_skip_empty_messages} && $header eq '')) { | ||||
1119 | $self->_bump_scan_progress(); | ||||
1120 | $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header); | ||||
1121 | } | ||||
1122 | |||||
1123 | # go onto the next message | ||||
1124 | seek(INPUT, $offset + $size, 0) | ||||
1125 | or die "cannot reposition file to $offset + $size: $!"; | ||||
1126 | } | ||||
1127 | else { | ||||
1128 | die "archive-iterator: error: failure to read message body!\n"; | ||||
1129 | } | ||||
1130 | } | ||||
1131 | defined $_ || $!==0 or | ||||
1132 | $!==EBADF ? dbg("archive-iterator: error reading: $!") | ||||
1133 | : die "error reading: $!"; | ||||
1134 | close INPUT or die "error closing input file: $!"; | ||||
1135 | } | ||||
1136 | |||||
1137 | while(my($k,$v) = each %{$info}) { | ||||
1138 | if (defined $AICache && !$count) { | ||||
1139 | $AICache->update($k, $v); | ||||
1140 | } | ||||
1141 | |||||
1142 | if ($self->{determine_receive_date}) { | ||||
1143 | next if !$self->_message_is_useful_by_date($v); | ||||
1144 | } | ||||
1145 | next if !$self->_scanprob_says_scan(); | ||||
1146 | |||||
1147 | &{$bkfunc}($self, $v, $class, 'b', "$file.$k"); | ||||
1148 | } | ||||
1149 | |||||
1150 | if (defined $AICache) { | ||||
1151 | $AICache = $AICache->finish(); | ||||
1152 | } | ||||
1153 | } | ||||
1154 | } | ||||
1155 | |||||
1156 | ############################################################################ | ||||
1157 | |||||
1158 | # spent 1.98ms within Mail::SpamAssassin::ArchiveIterator::_bump_scan_progress which was called 239 times, avg 8µs/call:
# 239 times (1.98ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 827, avg 8µs/call
sub _bump_scan_progress { | ||||
1159 | 239 | 416µs | my ($self) = @_; | ||
1160 | 239 | 2.99ms | if (exists $self->{scan_progress_sub}) { | ||
1161 | return unless ($self->{scan_progress_counter}++ % 50 == 0); | ||||
1162 | $self->{scan_progress_sub}->(); | ||||
1163 | } | ||||
1164 | } | ||||
1165 | |||||
1166 | ############################################################################ | ||||
1167 | |||||
1168 | { | ||||
1169 | 1 | 2µs | my $home; | ||
1170 | |||||
1171 | # spent 266µs (101+165) within Mail::SpamAssassin::ArchiveIterator::_fix_globs which was called 2 times, avg 133µs/call:
# 2 times (101µs+165µs) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 576, avg 133µs/call
sub _fix_globs { | ||||
1172 | 2 | 9µs | my ($self, $path) = @_; | ||
1173 | |||||
1174 | 2 | 6µs | unless (defined $home) { | ||
1175 | 1 | 6µs | $home = $ENV{'HOME'}; | ||
1176 | |||||
1177 | # No $HOME set? Try to find it, portably. | ||||
1178 | 1 | 2µs | unless ($home) { | ||
1179 | if (!Mail::SpamAssassin::Util::am_running_on_windows()) { | ||||
1180 | $home = (Mail::SpamAssassin::Util::portable_getpwuid($<))[7]; | ||||
1181 | } else { | ||||
1182 | my $vol = $ENV{'HOMEDRIVE'} || 'C:'; | ||||
1183 | my $dir = $ENV{'HOMEPATH'} || '\\'; | ||||
1184 | $home = File::Spec->catpath($vol, $dir, ''); | ||||
1185 | } | ||||
1186 | |||||
1187 | # Fall back to no replacement at all. | ||||
1188 | $home ||= '~'; | ||||
1189 | } | ||||
1190 | } | ||||
1191 | 2 | 21µs | 2 | 6µs | $path =~ s,^~/,${home}/,; # spent 6µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 3µs/call |
1192 | |||||
1193 | # protect/escape spaces: ./Mail/My Letters => ./Mail/My\ Letters | ||||
1194 | 2 | 31µs | 2 | 10µs | $path =~ s/(?<!\\)(\s)/\\$1/g; # spent 10µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:subst, avg 5µs/call |
1195 | |||||
1196 | # return csh-style globs: ./corpus/*.mbox => er, you know what it does ;) | ||||
1197 | 2 | 196µs | 2 | 150µs | return glob($path); # spent 150µs making 2 calls to Mail::SpamAssassin::ArchiveIterator::CORE:glob, avg 75µs/call |
1198 | } | ||||
1199 | } | ||||
1200 | |||||
1201 | 1 | 4µs | # spent 14µs within Mail::SpamAssassin::ArchiveIterator::_create_cache which was called:
# once (14µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 792
sub _create_cache { | ||
1202 | 1 | 5µs | my ($self, $type, $path) = @_; | ||
1203 | |||||
1204 | 1 | 12µs | if ($self->{opt_cache}) { | ||
1205 | $AICache = Mail::SpamAssassin::AICache->new({ | ||||
1206 | 'type' => $type, | ||||
1207 | 'prefix' => $self->{opt_cachedir}, | ||||
1208 | 'path' => $path, | ||||
1209 | }); | ||||
1210 | } | ||||
1211 | } | ||||
1212 | |||||
1213 | ############################################################################ | ||||
1214 | |||||
1215 | 1 | 14µs | 1; | ||
1216 | |||||
1217 | __END__ | ||||
# spent 1.04ms within Mail::SpamAssassin::ArchiveIterator::CORE:binmode which was called 239 times, avg 4µs/call:
# 239 times (1.04ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 659, avg 4µs/call | |||||
# spent 2.39ms within Mail::SpamAssassin::ArchiveIterator::CORE:close which was called 239 times, avg 10µs/call:
# 234 times (2.36ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 408, avg 10µs/call
# 5 times (38µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 383, avg 8µs/call | |||||
# spent 20µs within Mail::SpamAssassin::ArchiveIterator::CORE:closedir which was called 2 times, avg 10µs/call:
# 2 times (20µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 781, avg 10µs/call | |||||
# spent 87µs within Mail::SpamAssassin::ArchiveIterator::CORE:ftdir which was called 4 times, avg 22µs/call:
# 2 times (81µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 754, avg 40µs/call
# 2 times (7µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 587, avg 3µs/call | |||||
# spent 1.71ms within Mail::SpamAssassin::ArchiveIterator::CORE:ftfile which was called 719 times, avg 2µs/call:
# 239 times (873µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 360, avg 4µs/call
# 239 times (414µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 796, avg 2µs/call
# 239 times (386µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 374, avg 2µs/call
# 2 times (42µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 754, avg 21µs/call | |||||
# spent 816µs within Mail::SpamAssassin::ArchiveIterator::CORE:ftsize which was called 244 times, avg 3µs/call:
# 239 times (808µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 374, avg 3µs/call
# 5 times (8µs+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 381, avg 2µs/call | |||||
# spent 150µs within Mail::SpamAssassin::ArchiveIterator::CORE:glob which was called 2 times, avg 75µs/call:
# 2 times (150µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1197, avg 75µs/call | |||||
# spent 39.2ms within Mail::SpamAssassin::ArchiveIterator::CORE:match which was called 14208 times, avg 3µs/call:
# 13487 times (36.6ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 406, avg 3µs/call
# 478 times (2.00ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 643, avg 4µs/call
# 243 times (559µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 780, avg 2µs/call | |||||
# spent 24.0ms within Mail::SpamAssassin::ArchiveIterator::CORE:open which was called 239 times, avg 101µs/call:
# 239 times (24.0ms+0s) by Mail::SpamAssassin::ArchiveIterator::_mail_open at line 653, avg 101µs/call | |||||
# spent 94µs within Mail::SpamAssassin::ArchiveIterator::CORE:open_dir which was called 2 times, avg 47µs/call:
# 2 times (94µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 776, avg 47µs/call | |||||
# spent 2.90ms within Mail::SpamAssassin::ArchiveIterator::CORE:pack which was called 239 times, avg 12µs/call:
# 239 times (2.90ms+0s) by Mail::SpamAssassin::ArchiveIterator::_index_pack at line 740, avg 12µs/call | |||||
# spent 19.4ms within Mail::SpamAssassin::ArchiveIterator::CORE:read which was called 690 times, avg 28µs/call:
# 456 times (7.41ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 400, avg 16µs/call
# 234 times (12.0ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 392, avg 51µs/call | |||||
# spent 664µs within Mail::SpamAssassin::ArchiveIterator::CORE:readdir which was called 2 times, avg 332µs/call:
# 2 times (664µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 780, avg 332µs/call | |||||
# spent 17.0ms within Mail::SpamAssassin::ArchiveIterator::CORE:stat which was called 719 times, avg 24µs/call:
# 239 times (10.1ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_directory at line 795, avg 42µs/call
# 239 times (5.55ms+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_file at line 832, avg 23µs/call
# 239 times (1.38ms+0s) by Mail::SpamAssassin::ArchiveIterator::_run_file at line 359, avg 6µs/call
# 2 times (63µs+0s) by Mail::SpamAssassin::ArchiveIterator::_scan_targets at line 586, avg 32µs/call | |||||
# spent 31µs within Mail::SpamAssassin::ArchiveIterator::CORE:subst which was called 8 times, avg 4µs/call:
# 2 times (10µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1194, avg 5µs/call
# 2 times (9µs+0s) by Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts at line 674, avg 5µs/call
# 2 times (6µs+0s) by Mail::SpamAssassin::ArchiveIterator::_set_default_message_selection_opts at line 675, avg 3µs/call
# 2 times (6µs+0s) by Mail::SpamAssassin::ArchiveIterator::_fix_globs at line 1191, avg 3µs/call | |||||
# spent 2.36ms within Mail::SpamAssassin::ArchiveIterator::CORE:unpack which was called 239 times, avg 10µs/call:
# 239 times (2.36ms+0s) by Mail::SpamAssassin::ArchiveIterator::_index_unpack at line 744, avg 10µs/call |