Filename | /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Message.pm |
Statements | Executed 896594 statements in 5.64s |
Calls | P | F | Exclusive Time |
Inclusive Time |
Subroutine |
---|---|---|---|---|---|
234 | 1 | 1 | 2.26s | 4.79s | new | Mail::SpamAssassin::Message::
197 | 1 | 1 | 2.08s | 2.45s | _parse_multipart | Mail::SpamAssassin::Message::
114442 | 20 | 1 | 553ms | 553ms | CORE:match (opcode) | Mail::SpamAssassin::Message::
1170 | 3 | 1 | 156ms | 27.2s | get_body_text_array_common | Mail::SpamAssassin::Message::
702 | 1 | 1 | 143ms | 143ms | split_into_array_of_short_lines | Mail::SpamAssassin::Message::
234 | 1 | 1 | 106ms | 128ms | finish | Mail::SpamAssassin::Message::
234 | 1 | 1 | 63.5ms | 2.72s | parse_body | Mail::SpamAssassin::Message::
8112 | 2 | 1 | 57.4ms | 57.4ms | CORE:subst (opcode) | Mail::SpamAssassin::Message::
425 | 1 | 1 | 51.6ms | 202ms | _parse_normal | Mail::SpamAssassin::Message::
1246 | 4 | 1 | 33.1ms | 33.1ms | CORE:regcomp (opcode) | Mail::SpamAssassin::Message::
702 | 2 | 2 | 32.7ms | 4.42s | receive_date | Mail::SpamAssassin::Message::
702 | 1 | 1 | 32.3ms | 26.9s | get_rendered_body_text_array | Mail::SpamAssassin::Message::
1876 | 2 | 1 | 26.2ms | 26.2ms | get_metadata | Mail::SpamAssassin::Message::
234 | 1 | 1 | 25.7ms | 26.6ms | get_all_metadata | Mail::SpamAssassin::Message::
936 | 4 | 1 | 16.7ms | 16.7ms | put_metadata | Mail::SpamAssassin::Message::
702 | 1 | 1 | 16.4ms | 2.89s | find_parts | Mail::SpamAssassin::Message::
468 | 2 | 2 | 8.74ms | 3.76s | extract_message_metadata | Mail::SpamAssassin::Message::
1 | 1 | 1 | 8.50ms | 51.8ms | BEGIN@55 | Mail::SpamAssassin::Message::
702 | 1 | 1 | 7.85ms | 7.85ms | get_pristine_body | Mail::SpamAssassin::Message::
234 | 1 | 1 | 6.59ms | 18.5ms | finish_metadata | Mail::SpamAssassin::Message::
702 | 1 | 1 | 5.02ms | 5.02ms | CORE:qr (opcode) | Mail::SpamAssassin::Message::
234 | 1 | 1 | 4.73ms | 252ms | get_visible_rendered_body_text_array | Mail::SpamAssassin::Message::
1 | 1 | 1 | 4.39ms | 7.45ms | BEGIN@49 | Mail::SpamAssassin::Message::
234 | 1 | 1 | 3.87ms | 133ms | get_invisible_rendered_body_text_array | Mail::SpamAssassin::Message::
234 | 1 | 1 | 3.58ms | 3.58ms | get_pristine_header | Mail::SpamAssassin::Message::
16 | 1 | 1 | 3.14ms | 3.14ms | CORE:unlink (opcode) | Mail::SpamAssassin::Message::
79 | 1 | 1 | 1.54ms | 1.54ms | DESTROY | Mail::SpamAssassin::Message::
1 | 1 | 1 | 1.29ms | 177ms | BEGIN@56 | Mail::SpamAssassin::Message::
234 | 1 | 1 | 994µs | 994µs | CORE:sort (opcode) | Mail::SpamAssassin::Message::
16 | 1 | 1 | 172µs | 172µs | CORE:close (opcode) | Mail::SpamAssassin::Message::
1 | 1 | 1 | 44µs | 58µs | BEGIN@45 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 27µs | 105µs | BEGIN@47 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 23µs | 66µs | BEGIN@46 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 23µs | 93µs | BEGIN@60 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 22µs | 592µs | BEGIN@57 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 21µs | 147µs | BEGIN@58 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 18µs | 18µs | BEGIN@54 | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | delete_metadata | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_body | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_decoded_body_text_array | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_mbox_separator | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_mimepart_digests | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_pristine | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | split_into_array_of_short_paragraphs | Mail::SpamAssassin::Message::
Line | State ments |
Time on line |
Calls | Time in subs |
Code |
---|---|---|---|---|---|
1 | # <@LICENSE> | ||||
2 | # Licensed to the Apache Software Foundation (ASF) under one or more | ||||
3 | # contributor license agreements. See the NOTICE file distributed with | ||||
4 | # this work for additional information regarding copyright ownership. | ||||
5 | # The ASF licenses this file to you under the Apache License, Version 2.0 | ||||
6 | # (the "License"); you may not use this file except in compliance with | ||||
7 | # the License. You may obtain a copy of the License at: | ||||
8 | # | ||||
9 | # http://www.apache.org/licenses/LICENSE-2.0 | ||||
10 | # | ||||
11 | # Unless required by applicable law or agreed to in writing, software | ||||
12 | # distributed under the License is distributed on an "AS IS" BASIS, | ||||
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
14 | # See the License for the specific language governing permissions and | ||||
15 | # limitations under the License. | ||||
16 | # </@LICENSE> | ||||
17 | |||||
18 | =head1 NAME | ||||
19 | |||||
20 | Mail::SpamAssassin::Message - decode, render, and hold an RFC-2822 message | ||||
21 | |||||
22 | =head1 DESCRIPTION | ||||
23 | |||||
24 | This module encapsulates an email message and allows access to the various MIME | ||||
25 | message parts and message metadata. | ||||
26 | |||||
27 | The message structure, after initiating a parse() cycle, looks like this: | ||||
28 | |||||
29 | Message object, also top-level node in Message::Node tree | ||||
30 | | | ||||
31 | +---> Message::Node for other parts in MIME structure | ||||
32 | | |---> [ more Message::Node parts ... ] | ||||
33 | | [ others ... ] | ||||
34 | | | ||||
35 | +---> Message::Metadata object to hold metadata | ||||
36 | |||||
37 | =head1 PUBLIC METHODS | ||||
38 | |||||
39 | =over 4 | ||||
40 | |||||
41 | =cut | ||||
42 | |||||
43 | package Mail::SpamAssassin::Message; | ||||
44 | |||||
45 | 2 | 61µs | 2 | 73µs | # spent 58µs (44+15) within Mail::SpamAssassin::Message::BEGIN@45 which was called:
# once (44µs+15µs) by Mail::SpamAssassin::BEGIN@75 at line 45 # spent 58µs making 1 call to Mail::SpamAssassin::Message::BEGIN@45
# spent 15µs making 1 call to strict::import |
46 | 2 | 61µs | 2 | 109µs | # spent 66µs (23+43) within Mail::SpamAssassin::Message::BEGIN@46 which was called:
# once (23µs+43µs) by Mail::SpamAssassin::BEGIN@75 at line 46 # spent 66µs making 1 call to Mail::SpamAssassin::Message::BEGIN@46
# spent 43µs making 1 call to warnings::import |
47 | 2 | 144µs | 2 | 182µs | # spent 105µs (27+77) within Mail::SpamAssassin::Message::BEGIN@47 which was called:
# once (27µs+77µs) by Mail::SpamAssassin::BEGIN@75 at line 47 # spent 105µs making 1 call to Mail::SpamAssassin::Message::BEGIN@47
# spent 78µs making 1 call to re::import |
48 | |||||
49 | # spent 7.45ms (4.39+3.06) within Mail::SpamAssassin::Message::BEGIN@49 which was called:
# once (4.39ms+3.06ms) by Mail::SpamAssassin::BEGIN@75 at line 52 | ||||
50 | 3 | 324µs | 1 | 429µs | eval { require Digest::SHA; import Digest::SHA qw(sha1 sha1_hex); 1 } # spent 429µs making 1 call to Exporter::import |
51 | 1 | 12µs | or do { require Digest::SHA1; import Digest::SHA1 qw(sha1 sha1_hex) } | ||
52 | 1 | 78µs | 1 | 7.45ms | } # spent 7.45ms making 1 call to Mail::SpamAssassin::Message::BEGIN@49 |
53 | |||||
54 | 2 | 63µs | 1 | 18µs | # spent 18µs within Mail::SpamAssassin::Message::BEGIN@54 which was called:
# once (18µs+0s) by Mail::SpamAssassin::BEGIN@75 at line 54 # spent 18µs making 1 call to Mail::SpamAssassin::Message::BEGIN@54 |
55 | 2 | 426µs | 1 | 51.8ms | # spent 51.8ms (8.50+43.3) within Mail::SpamAssassin::Message::BEGIN@55 which was called:
# once (8.50ms+43.3ms) by Mail::SpamAssassin::BEGIN@75 at line 55 # spent 51.8ms making 1 call to Mail::SpamAssassin::Message::BEGIN@55 |
56 | 2 | 470µs | 1 | 177ms | # spent 177ms (1.29+176) within Mail::SpamAssassin::Message::BEGIN@56 which was called:
# once (1.29ms+176ms) by Mail::SpamAssassin::BEGIN@75 at line 56 # spent 177ms making 1 call to Mail::SpamAssassin::Message::BEGIN@56 |
57 | 2 | 66µs | 2 | 1.16ms | # spent 592µs (22+570) within Mail::SpamAssassin::Message::BEGIN@57 which was called:
# once (22µs+570µs) by Mail::SpamAssassin::BEGIN@75 at line 57 # spent 592µs making 1 call to Mail::SpamAssassin::Message::BEGIN@57
# spent 570µs making 1 call to Exporter::import |
58 | 2 | 64µs | 2 | 273µs | # spent 147µs (21+126) within Mail::SpamAssassin::Message::BEGIN@58 which was called:
# once (21µs+126µs) by Mail::SpamAssassin::BEGIN@75 at line 58 # spent 147µs making 1 call to Mail::SpamAssassin::Message::BEGIN@58
# spent 126µs making 1 call to Exporter::import |
59 | |||||
60 | 2 | 9.13ms | 2 | 164µs | # spent 93µs (23+71) within Mail::SpamAssassin::Message::BEGIN@60 which was called:
# once (23µs+71µs) by Mail::SpamAssassin::BEGIN@75 at line 60 # spent 93µs making 1 call to Mail::SpamAssassin::Message::BEGIN@60
# spent 71µs making 1 call to vars::import |
61 | |||||
62 | 1 | 22µs | @ISA = qw(Mail::SpamAssassin::Message::Node); | ||
63 | |||||
64 | # --------------------------------------------------------------------------- | ||||
65 | |||||
66 | =item new() | ||||
67 | |||||
68 | Creates a Mail::SpamAssassin::Message object. Takes a hash reference | ||||
69 | as a parameter. The used hash key/value pairs are as follows: | ||||
70 | |||||
71 | C<message> is either undef (which will use STDIN), a scalar - a string | ||||
72 | containing an entire message, a reference to such string, an array reference | ||||
73 | of the message with one line per array element, or either a file glob | ||||
74 | or an IO::File object which holds the entire contents of the message. | ||||
75 | |||||
76 | Note: The message is expected to generally be in RFC 2822 format, optionally | ||||
77 | including an mbox message separator line (the "From " line) as the first line. | ||||
78 | |||||
79 | C<parse_now> specifies whether or not to create the MIME tree | ||||
80 | at object-creation time or later as necessary. | ||||
81 | |||||
82 | The I<parse_now> option, by default, is set to false (0). | ||||
83 | This allows SpamAssassin to not have to generate the tree of | ||||
84 | Mail::SpamAssassin::Message::Node objects and their related data if the | ||||
85 | tree is not going to be used. This is handy, for instance, when running | ||||
86 | C<spamassassin -d>, which only needs the pristine header and body which | ||||
87 | is always handled when the object is created. | ||||
88 | |||||
89 | C<subparse> specifies how many MIME recursion levels should be parsed. | ||||
90 | Defaults to 20. | ||||
91 | |||||
92 | =cut | ||||
93 | |||||
94 | # month mappings (ripped from Util.pm) | ||||
95 | 1 | 14µs | my %MONTH = (jan => 1, feb => 2, mar => 3, apr => 4, may => 5, jun => 6, | ||
96 | jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12); | ||||
97 | |||||
98 | # day of week mapping (starting from zero) | ||||
99 | 1 | 4µs | my @DAY_OF_WEEK = qw/Sun Mon Tue Wed Thu Fri Sat/ ; | ||
100 | |||||
101 | # spent 4.79s (2.26+2.53) within Mail::SpamAssassin::Message::new which was called 234 times, avg 20.5ms/call:
# 234 times (2.26s+2.53s) by Mail::SpamAssassin::parse at line 551 of Mail/SpamAssassin.pm, avg 20.5ms/call | ||||
102 | 234 | 674µs | my $class = shift; | ||
103 | 234 | 646µs | $class = ref($class) || $class; | ||
104 | |||||
105 | 234 | 563µs | my($opts) = @_; | ||
106 | 234 | 1.03ms | my $message = defined $opts->{'message'} ? $opts->{'message'} : \*STDIN; | ||
107 | 234 | 874µs | my $parsenow = $opts->{'parsenow'} || 0; | ||
108 | 234 | 644µs | my $normalize = $opts->{'normalize'} || 0; | ||
109 | |||||
110 | # Specifies whether or not to parse message/rfc822 parts into its own tree. | ||||
111 | # If the # > 0, it'll subparse, otherwise it won't. By default, do twenty | ||||
112 | # levels deep. | ||||
113 | 234 | 777µs | my $subparse = defined $opts->{'subparse'} ? $opts->{'subparse'} : 20; | ||
114 | |||||
115 | 234 | 3.34ms | 234 | 7.50ms | my $self = $class->SUPER::new({normalize=>$normalize}); # spent 7.50ms making 234 calls to Mail::SpamAssassin::Message::Node::new, avg 32µs/call |
116 | |||||
117 | 234 | 776µs | $self->{tmpfiles} = []; | ||
118 | 234 | 838µs | $self->{pristine_headers} = ''; | ||
119 | 234 | 994µs | $self->{pristine_body} = ''; | ||
120 | 234 | 1.04ms | $self->{mime_boundary_state} = {}; | ||
121 | 234 | 794µs | $self->{line_ending} = "\012"; | ||
122 | 234 | 711µs | $self->{master_deadline} = $opts->{'master_deadline'}; | ||
123 | 234 | 646µs | $self->{suppl_attrib} = $opts->{'suppl_attrib'}; | ||
124 | |||||
125 | 234 | 520µs | if ($self->{suppl_attrib}) { # caller-provided additional information | ||
126 | # pristine_body_length is currently used by an eval test check_body_length | ||||
127 | # Possible To-Do: Base the length on the @message array later down? | ||||
128 | if (defined $self->{suppl_attrib}{body_size}) { | ||||
129 | # Optional info provided by a caller; should reflect the original | ||||
130 | # message body size if provided, and as such it may differ from the | ||||
131 | # $self->{pristine_body} size, e.g. when the caller passed a truncated | ||||
132 | # message to SpamAssassin, or when counting line-endings differently. | ||||
133 | $self->{pristine_body_length} = $self->{suppl_attrib}{body_size}; | ||||
134 | } | ||||
135 | if (ref $self->{suppl_attrib}{mimepart_digests}) { | ||||
136 | # Optional info provided by a caller: an array of digest codes (e.g. SHA1) | ||||
137 | # of each MIME part. Should reflect the original message if provided. | ||||
138 | # As such it may differ from digests calculated by get_mimepart_digests(), | ||||
139 | # e.g. when the caller passed a truncated message to SpamAssassin. | ||||
140 | $self->{mimepart_digests} = $self->{suppl_attrib}{mimepart_digests}; | ||||
141 | } | ||||
142 | } | ||||
143 | |||||
144 | 234 | 582µs | bless($self,$class); | ||
145 | |||||
146 | # create the metadata holder class | ||||
147 | 234 | 3.13ms | 234 | 4.87ms | $self->{metadata} = Mail::SpamAssassin::Message::Metadata->new($self); # spent 4.87ms making 234 calls to Mail::SpamAssassin::Message::Metadata::new, avg 21µs/call |
148 | |||||
149 | # Ok, go ahead and do the message "parsing" | ||||
150 | |||||
151 | # protect it from abuse ... | ||||
152 | 234 | 460µs | local $_; | ||
153 | |||||
154 | # Figure out how the message was passed to us, and deal with it. | ||||
155 | 234 | 487µs | my @message; | ||
156 | 234 | 1.33ms | if (ref $message eq 'ARRAY') { | ||
157 | 468 | 52.9ms | @message = @{$message}; | ||
158 | } | ||||
159 | elsif (ref($message) eq 'GLOB' || ref($message) =~ /^IO::/) { | ||||
160 | if (defined fileno $message) { | ||||
161 | |||||
162 | # sysread+split avoids a Perl I/O bug (Bug 5985) | ||||
163 | # and is faster than (<$message>) by 10..25 % | ||||
164 | # (a drawback is a short-term double storage of a text in $raw_str) | ||||
165 | # | ||||
166 | my($nread,$raw_str); $raw_str = ''; | ||||
167 | while ( $nread=sysread($message, $raw_str, 16384, length $raw_str) ) { } | ||||
168 | defined $nread or die "error reading: $!"; | ||||
169 | @message = split(/^/m, $raw_str, -1); | ||||
170 | |||||
171 | if ($raw_str eq '') { | ||||
172 | dbg("message: empty message read"); | ||||
173 | } elsif (length($raw_str) > 128*1024) { | ||||
174 | # ditch rarely used large chunks of allocated memory, Bug 6514 | ||||
175 | # http://www.perlmonks.org/?node_id=803515 | ||||
176 | # about 97% of mail messages are below 128 kB, | ||||
177 | # about 98% of mail messages are below 256 kB (2010 statistics) | ||||
178 | # dbg("message: deallocating %.2f MB", length($raw_str)/1024/1024); | ||||
179 | undef $raw_str; | ||||
180 | } | ||||
181 | } | ||||
182 | } | ||||
183 | elsif (ref $message eq 'SCALAR') { | ||||
184 | @message = split(/^/m, $$message, -1); | ||||
185 | } | ||||
186 | elsif (ref $message) { | ||||
187 | dbg("message: Input is a reference of unknown type!"); | ||||
188 | } | ||||
189 | elsif (defined $message) { | ||||
190 | @message = split(/^/m, $message, -1); | ||||
191 | } | ||||
192 | |||||
193 | # Pull off mbox and mbx separators | ||||
194 | # also deal with null messages | ||||
195 | 234 | 4.92ms | 468 | 1.62ms | if (!@message) { # spent 1.62ms making 468 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
196 | # bug 4884: | ||||
197 | # if we get here, it means that the input was null, so fake the message | ||||
198 | # content as a single newline... | ||||
199 | @message = ("\n"); | ||||
200 | } elsif ($message[0] =~ /^From\s+(?!:)/) { | ||||
201 | # careful not to confuse with obsolete syntax which allowed WSP before ':' | ||||
202 | # mbox formated mailbox | ||||
203 | $self->{'mbox_sep'} = shift @message; | ||||
204 | } elsif ($message[0] =~ MBX_SEPARATOR) { | ||||
205 | $_ = shift @message; | ||||
206 | |||||
207 | # Munge the mbx message separator into mbox format as a sort of | ||||
208 | # de facto portability standard in SA's internals. We need to | ||||
209 | # to this so that Mail::SpamAssassin::Util::parse_rfc822_date | ||||
210 | # can parse the date string... | ||||
211 | if (/([\s\d]\d)-([a-zA-Z]{3})-(\d{4})\s(\d{2}):(\d{2}):(\d{2})/) { | ||||
212 | # $1 = day of month | ||||
213 | # $2 = month (text) | ||||
214 | # $3 = year | ||||
215 | # $4 = hour | ||||
216 | # $5 = min | ||||
217 | # $6 = sec | ||||
218 | my @arr = localtime(timelocal($6,$5,$4,$1,$MONTH{lc($2)}-1,$3)); | ||||
219 | my $address; | ||||
220 | foreach (@message) { | ||||
221 | if (/^From:[^<]*<([^>]+)>/) { | ||||
222 | $address = $1; | ||||
223 | last; | ||||
224 | } elsif (/^From:\s*([^<> ]+)/) { | ||||
225 | $address = $1; | ||||
226 | last; | ||||
227 | } | ||||
228 | } | ||||
229 | $self->{'mbox_sep'} = "From $address $DAY_OF_WEEK[$arr[6]] $2 $1 $4:$5:$6 $3\n"; | ||||
230 | } | ||||
231 | } | ||||
232 | |||||
233 | # bug 4363 | ||||
234 | # Check to see if we should do CRLF instead of just LF | ||||
235 | # For now, just check the first and last line and do whatever it does | ||||
236 | 234 | 4.06ms | 468 | 1.42ms | if (@message && ($message[0] =~ /\015\012/ || $message[-1] =~ /\015\012/)) { # spent 1.42ms making 468 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
237 | $self->{line_ending} = "\015\012"; | ||||
238 | dbg("message: line ending changed to CRLF"); | ||||
239 | } | ||||
240 | |||||
241 | # Is a CRLF -> LF line endings conversion necessary? | ||||
242 | 234 | 972µs | my $squash_crlf = $self->{line_ending} eq "\015\012"; | ||
243 | |||||
244 | # Go through all the header fields of the message | ||||
245 | 234 | 510µs | my $hdr_errors = 0; | ||
246 | 234 | 413µs | my $header; | ||
247 | 234 | 429µs | for (;;) { | ||
248 | # make sure not to lose the last header field when there is no body | ||||
249 | 13487 | 37.8ms | my $eof = !@message; | ||
250 | 13487 | 39.7ms | my $current = $eof ? "\n" : shift @message; | ||
251 | |||||
252 | 13487 | 250ms | 13487 | 63.2ms | if ( $current =~ /^[ \t]/ ) { # spent 63.2ms making 13487 calls to Mail::SpamAssassin::Message::CORE:match, avg 5µs/call |
253 | # This wasn't useful in terms of a rule, but we may want to treat it | ||||
254 | # specially at some point. Perhaps ignore it? | ||||
255 | #unless ($current =~ /\S/) { | ||||
256 | # $self->{'obsolete_folding_whitespace'} = 1; | ||||
257 | #} | ||||
258 | |||||
259 | 5843 | 10.00ms | $header = '' if !defined $header; # header starts with a continuation!? | ||
260 | 5843 | 16.0ms | $header .= $current; # append continuations, no matter what | ||
261 | 5843 | 36.7ms | $self->{'pristine_headers'} .= $current; | ||
262 | } | ||||
263 | else { # not a continuation | ||||
264 | # Ok, there's a header here, let's go ahead and add it in. | ||||
265 | 7644 | 26.0ms | if (defined $header) { # deal with a previous header field | ||
266 | 7410 | 83.2ms | my ($key, $value) = split (/:/s, $header, 2); | ||
267 | |||||
268 | # If it's not a valid header (aka: not in the form "foo:bar"), skip it. | ||||
269 | 7410 | 29.0ms | if (defined $value) { | ||
270 | # CRLF -> LF line-endings conversion if necessary | ||||
271 | 7410 | 11.5ms | $value =~ s/\015\012/\012/sg if $squash_crlf; | ||
272 | 7410 | 108ms | 7410 | 27.1ms | $key =~ s/[ \t]+\z//; # strip WSP before colon, obsolete rfc822 syn # spent 27.1ms making 7410 calls to Mail::SpamAssassin::Message::CORE:subst, avg 4µs/call |
273 | # limit the length of the pairs we store | ||||
274 | 7410 | 21.9ms | if (length($key) > MAX_HEADER_KEY_LENGTH) { | ||
275 | $key = substr($key, 0, MAX_HEADER_KEY_LENGTH); | ||||
276 | $self->{'truncated_header'} = 1; | ||||
277 | } | ||||
278 | 7410 | 14.3ms | if (length($value) > MAX_HEADER_VALUE_LENGTH) { | ||
279 | $value = substr($value, 0, MAX_HEADER_VALUE_LENGTH); | ||||
280 | $self->{'truncated_header'} = 1; | ||||
281 | } | ||||
282 | 7410 | 52.1ms | 7410 | 1.91s | $self->header($key, $value); # spent 1.91s making 7410 calls to Mail::SpamAssassin::Message::Node::header, avg 258µs/call |
283 | } | ||||
284 | } | ||||
285 | |||||
286 | 7644 | 196ms | 15054 | 59.8ms | if ($current =~ /^\r?$/) { # a regular end of a header section # spent 59.8ms making 15054 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
287 | 234 | 867µs | if ($eof) { | ||
288 | $self->{'missing_head_body_separator'} = 1; | ||||
289 | } else { | ||||
290 | 234 | 760µs | $self->{'pristine_headers'} .= $current; | ||
291 | } | ||||
292 | 234 | 722µs | last; | ||
293 | } | ||||
294 | elsif ($current =~ /^--/) { # mime boundary encountered, bail out | ||||
295 | $self->{'missing_head_body_separator'} = 1; | ||||
296 | unshift(@message, $current); | ||||
297 | last; | ||||
298 | } | ||||
299 | # should we assume entering a body on encountering invalid header field? | ||||
300 | else { | ||||
301 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
302 | 7410 | 132ms | 7410 | 62.5ms | if ($current !~ /^[\041-\071\073-\176]+[ \t]*:/) { # spent 62.5ms making 7410 calls to Mail::SpamAssassin::Message::CORE:match, avg 8µs/call |
303 | # A field name MUST be composed of printable US-ASCII characters | ||||
304 | # (i.e., characters that have values between 33 (041) and 126 (176), | ||||
305 | # inclusive), except colon (072). Obsolete header field syntax | ||||
306 | # allowed WSP before a colon. | ||||
307 | if (++$hdr_errors <= 3) { | ||||
308 | # just consume but ignore a few invalid header fields | ||||
309 | } else { # enough is enough... | ||||
310 | $self->{'missing_head_body_separator'} = 1; | ||||
311 | unshift(@message, $current); | ||||
312 | last; | ||||
313 | } | ||||
314 | } | ||||
315 | } | ||||
316 | |||||
317 | # start collecting a new header field | ||||
318 | 7410 | 16.0ms | $header = $current; | ||
319 | 7410 | 47.5ms | $self->{'pristine_headers'} .= $current; | ||
320 | } | ||||
321 | } | ||||
322 | 234 | 608µs | undef $header; | ||
323 | |||||
324 | # Store the pristine body for later -- store as a copy since @message | ||||
325 | # will get modified below | ||||
326 | 234 | 25.9ms | $self->{'pristine_body'} = join('', @message); | ||
327 | |||||
328 | 234 | 1.04ms | if (!defined $self->{pristine_body_length}) { | ||
329 | 234 | 1.26ms | $self->{'pristine_body_length'} = length $self->{'pristine_body'}; | ||
330 | } | ||||
331 | |||||
332 | # iterate over lines in reverse order | ||||
333 | # merge multiple blank lines into a single one | ||||
334 | 234 | 506µs | my $start; | ||
335 | 234 | 245ms | for (my $cnt=$#message; $cnt>=0; $cnt--) { | ||
336 | # CRLF -> LF line-endings conversion if necessary | ||||
337 | 71012 | 110ms | $message[$cnt] =~ s/\015\012\z/\012/ if $squash_crlf; | ||
338 | |||||
339 | # line is blank | ||||
340 | 71012 | 930ms | 71012 | 333ms | if ($message[$cnt] =~ /^\s*$/) { # spent 333ms making 71012 calls to Mail::SpamAssassin::Message::CORE:match, avg 5µs/call |
341 | # /^\s*$/ is about 5% faster then !/\S/, but still expensive here | ||||
342 | 4562 | 12.7ms | if (!defined $start) { | ||
343 | 3329 | 5.64ms | $start=$cnt; | ||
344 | } | ||||
345 | 4562 | 9.53ms | next unless $cnt == 0; | ||
346 | } | ||||
347 | |||||
348 | # line is not blank, or we've reached the beginning | ||||
349 | |||||
350 | # if we've got a series of blank lines, get rid of them | ||||
351 | 66480 | 121ms | if (defined $start) { | ||
352 | 3329 | 5.73ms | my $max_blank_lines = 20; | ||
353 | 3329 | 5.86ms | my $num = $start-$cnt; | ||
354 | 3329 | 5.56ms | if ($num > $max_blank_lines) { | ||
355 | 6 | 38µs | splice @message, $cnt+2, $num-$max_blank_lines; | ||
356 | } | ||||
357 | 3329 | 26.0ms | undef $start; | ||
358 | } | ||||
359 | } | ||||
360 | |||||
361 | # Figure out the boundary | ||||
362 | 234 | 455µs | my ($boundary); | ||
363 | 234 | 5.52ms | 468 | 48.4ms | ($self->{'type'}, $boundary) = Mail::SpamAssassin::Util::parse_content_type($self->header('content-type')); # spent 34.3ms making 234 calls to Mail::SpamAssassin::Util::parse_content_type, avg 147µs/call
# spent 14.1ms making 234 calls to Mail::SpamAssassin::Message::Node::header, avg 60µs/call |
364 | 234 | 2.77ms | 234 | 2.57ms | dbg("message: main message type: ".$self->{'type'}); # spent 2.57ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call |
365 | |||||
366 | # dbg("message: \$message[0]: \"" . $message[0] . "\""); | ||||
367 | |||||
368 | # bug 6845: if main message type is multipart and the message body does not begin with | ||||
369 | # either a blank line or the boundary (if defined), insert a blank line | ||||
370 | # to ensure proper parsing - do not consider MIME headers at the beginning of the body | ||||
371 | # to be part of the message headers. | ||||
372 | 234 | 20.6ms | 427 | 1.85ms | if ($self->{'type'} =~ /^multipart\//i && $#message > 0 && $message[0] =~ /\S/) # spent 1.85ms making 427 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
373 | { | ||||
374 | 170 | 13.1ms | 340 | 9.80ms | if (!defined $boundary || $message[0] !~ /^--\Q$boundary\E/) # spent 9.20ms making 170 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 54µs/call
# spent 596µs making 170 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
375 | { | ||||
376 | 39 | 260µs | 39 | 362µs | dbg("message: Inserting blank line at top of body to ensure correct multipart MIME parsing"); # spent 362µs making 39 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call |
377 | 39 | 193µs | unshift(@message, "\012"); | ||
378 | } | ||||
379 | } | ||||
380 | |||||
381 | # dbg("message: \$message[0]: \"" . $message[0] . "\""); | ||||
382 | # dbg("message: \$message[1]: \"" . $message[1] . "\""); | ||||
383 | |||||
384 | # parse queue, simple array of parts to parse: | ||||
385 | # 0: part object, already in the tree | ||||
386 | # 1: boundary used to focus body parsing | ||||
387 | # 2: message content | ||||
388 | # 3: how many MIME subparts to parse down | ||||
389 | # | ||||
390 | 234 | 1.92ms | $self->{'parse_queue'} = [ [ $self, $boundary, \@message, $subparse ] ]; | ||
391 | |||||
392 | # If the message does need to get parsed, save off a copy of the body | ||||
393 | # in a format we can easily parse later so we don't have to rip from | ||||
394 | # pristine_body ... If we do want to parse now, go ahead and do so ... | ||||
395 | # | ||||
396 | 234 | 461µs | if ($parsenow) { | ||
397 | $self->parse_body(); | ||||
398 | } | ||||
399 | |||||
400 | 234 | 4.92ms | $self; | ||
401 | } | ||||
402 | |||||
403 | # --------------------------------------------------------------------------- | ||||
404 | |||||
405 | =item find_parts() | ||||
406 | |||||
407 | Used to search the tree for specific MIME parts. See | ||||
408 | I<Mail::SpamAssassin::Message::Node> for more details. | ||||
409 | |||||
410 | =cut | ||||
411 | |||||
412 | # Used to find any MIME parts whose simple content-type matches a given regexp | ||||
413 | # Searches it's own and any children parts. Returns an array of MIME | ||||
414 | # objects which match. | ||||
415 | # | ||||
416 | # spent 2.89s (16.4ms+2.87) within Mail::SpamAssassin::Message::find_parts which was called 702 times, avg 4.12ms/call:
# 702 times (16.4ms+2.87s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1111, avg 4.12ms/call | ||||
417 | 702 | 1.55ms | my $self = shift; | ||
418 | |||||
419 | # ok, we need to do the parsing now... | ||||
420 | 702 | 3.56ms | 234 | 2.72s | $self->parse_body() if (exists $self->{'parse_queue'}); # spent 2.72s making 234 calls to Mail::SpamAssassin::Message::parse_body, avg 11.6ms/call |
421 | |||||
422 | # and pass through to the Message::Node version of the method | ||||
423 | 702 | 11.5ms | 702 | 153ms | return $self->SUPER::find_parts(@_); # spent 153ms making 702 calls to Mail::SpamAssassin::Message::Node::find_parts, avg 217µs/call |
424 | } | ||||
425 | |||||
426 | # --------------------------------------------------------------------------- | ||||
427 | |||||
428 | =item get_pristine_header() | ||||
429 | |||||
430 | Returns pristine headers of the message. If no specific header name | ||||
431 | is given as a parameter (case-insensitive), then all headers will be | ||||
432 | returned as a scalar, including the blank line at the end of the headers. | ||||
433 | |||||
434 | If called in an array context, an array will be returned with each | ||||
435 | specific header in a different element. In a scalar context, the last | ||||
436 | specific header is returned. | ||||
437 | |||||
438 | ie: If 'Subject' is specified as the header, and there are 2 Subject | ||||
439 | headers in a message, the last/bottom one in the message is returned in | ||||
440 | scalar context or both are returned in array context. | ||||
441 | |||||
442 | Btw, returning the last header field (not the first) happens to be consistent | ||||
443 | with DKIM signatures, which search for and cover multiple header fields | ||||
444 | bottom-up according to the 'h' tag. Let's keep it this way. | ||||
445 | |||||
446 | Note: the returned header will include the ending newline and any embedded | ||||
447 | whitespace folding. | ||||
448 | |||||
449 | =cut | ||||
450 | |||||
451 | # spent 3.58ms within Mail::SpamAssassin::Message::get_pristine_header which was called 234 times, avg 15µs/call:
# 234 times (3.58ms+0s) by Mail::SpamAssassin::PerMsgStatus::_get at line 1913 of Mail/SpamAssassin/PerMsgStatus.pm, avg 15µs/call | ||||
452 | 234 | 548µs | my ($self, $hdr) = @_; | ||
453 | |||||
454 | 234 | 10.3ms | return $self->{pristine_headers} if !defined $hdr || $hdr eq ''; | ||
455 | my(@ret) = | ||||
456 | $self->{pristine_headers} =~ /^\Q$hdr\E[ \t]*:[ \t]*(.*?\n(?![ \t]))/smgi; | ||||
457 | # taintedness is retained by "use re 'taint'" (fix in bug 5283 now redundant) | ||||
458 | if (!@ret) { | ||||
459 | return $self->get_header($hdr); | ||||
460 | } elsif (wantarray) { | ||||
461 | return @ret; | ||||
462 | } else { | ||||
463 | return $ret[-1]; | ||||
464 | } | ||||
465 | } | ||||
466 | |||||
467 | =item get_mbox_separator() | ||||
468 | |||||
469 | Returns the mbox separator found in the message, or undef if there | ||||
470 | wasn't one. | ||||
471 | |||||
472 | =cut | ||||
473 | |||||
474 | sub get_mbox_separator { | ||||
475 | return $_[0]->{mbox_sep}; | ||||
476 | } | ||||
477 | |||||
478 | =item get_body() | ||||
479 | |||||
480 | Returns an array of the pristine message body, one line per array element. | ||||
481 | |||||
482 | =cut | ||||
483 | |||||
484 | sub get_body { | ||||
485 | my ($self) = @_; | ||||
486 | my @ret = split(/^/m, $self->{pristine_body}); | ||||
487 | return \@ret; | ||||
488 | } | ||||
489 | |||||
490 | # --------------------------------------------------------------------------- | ||||
491 | |||||
492 | =item get_pristine() | ||||
493 | |||||
494 | Returns a scalar of the entire pristine message. | ||||
495 | |||||
496 | =cut | ||||
497 | |||||
498 | sub get_pristine { | ||||
499 | my ($self) = @_; | ||||
500 | return $self->{pristine_headers} . $self->{pristine_body}; | ||||
501 | } | ||||
502 | |||||
503 | =item get_pristine_body() | ||||
504 | |||||
505 | Returns a scalar of the pristine message body. | ||||
506 | |||||
507 | =cut | ||||
508 | |||||
509 | # spent 7.85ms within Mail::SpamAssassin::Message::get_pristine_body which was called 702 times, avg 11µs/call:
# 702 times (7.85ms+0s) by Mail::SpamAssassin::Plugin::Bayes::get_msgid at line 998 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 11µs/call | ||||
510 | 702 | 1.71ms | my ($self) = @_; | ||
511 | 702 | 7.49ms | return $self->{pristine_body}; | ||
512 | } | ||||
513 | |||||
514 | # --------------------------------------------------------------------------- | ||||
515 | |||||
516 | =item extract_message_metadata($permsgstatus) | ||||
517 | |||||
518 | =cut | ||||
519 | |||||
520 | # spent 3.76s (8.74ms+3.75) within Mail::SpamAssassin::Message::extract_message_metadata which was called 468 times, avg 8.02ms/call:
# 234 times (6.12ms+3.75s) by Mail::SpamAssassin::PerMsgStatus::extract_message_metadata at line 1703 of Mail/SpamAssassin/PerMsgStatus.pm, avg 16.0ms/call
# 234 times (2.62ms+0s) by Mail::SpamAssassin::Plugin::Bayes::get_body_from_msg at line 1025 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 11µs/call | ||||
521 | 468 | 1.03ms | my ($self, $permsgstatus) = @_; | ||
522 | |||||
523 | # do this only once per message, it can be expensive | ||||
524 | 468 | 3.01ms | return if $self->{already_extracted_metadata}; | ||
525 | 234 | 1.17ms | $self->{already_extracted_metadata} = 1; | ||
526 | |||||
527 | 234 | 3.79ms | 234 | 3.75s | $self->{metadata}->extract ($self, $permsgstatus); # spent 3.75s making 234 calls to Mail::SpamAssassin::Message::Metadata::extract, avg 16.0ms/call |
528 | } | ||||
529 | |||||
530 | # --------------------------------------------------------------------------- | ||||
531 | |||||
532 | =item $str = get_metadata($hdr) | ||||
533 | |||||
534 | =cut | ||||
535 | |||||
536 | # spent 26.2ms within Mail::SpamAssassin::Message::get_metadata which was called 1876 times, avg 14µs/call:
# 1642 times (23.5ms+0s) by Mail::SpamAssassin::PerMsgStatus::_get at line 1987 of Mail/SpamAssassin/PerMsgStatus.pm, avg 14µs/call
# 234 times (2.69ms+0s) by Mail::SpamAssassin::PerMsgStatus::extract_message_metadata at line 1741 of Mail/SpamAssassin/PerMsgStatus.pm, avg 11µs/call | ||||
537 | 1876 | 4.51ms | my ($self, $hdr) = @_; | ||
538 | 1876 | 4.08ms | if (!$self->{metadata}) { | ||
539 | warn "metadata: oops! get_metadata() called after finish_metadata()"; return; | ||||
540 | } | ||||
541 | # dbg("message: get_metadata - %s: %s", $hdr, defined $_ ? $_ : '<undef>') | ||||
542 | # for $self->{metadata}->{strings}->{lc $hdr}; | ||||
543 | |||||
544 | 1876 | 22.5ms | $self->{metadata}->{strings}->{lc $hdr}; | ||
545 | } | ||||
546 | |||||
547 | =item put_metadata($hdr, $text) | ||||
548 | |||||
549 | =cut | ||||
550 | |||||
551 | # spent 16.7ms within Mail::SpamAssassin::Message::put_metadata which was called 936 times, avg 18µs/call:
# 234 times (4.33ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 280 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 19µs/call
# 234 times (4.21ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 282 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 18µs/call
# 234 times (4.15ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 284 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 18µs/call
# 234 times (4.01ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 286 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 17µs/call | ||||
552 | 936 | 4.32ms | my ($self, $hdr, $text) = @_; | ||
553 | 936 | 2.04ms | if (!$self->{metadata}) { | ||
554 | warn "metadata: oops! put_metadata() called after finish_metadata()"; return; | ||||
555 | } | ||||
556 | # dbg("message: put_metadata - %s: %s", $hdr, $text); | ||||
557 | 936 | 13.0ms | $self->{metadata}->{strings}->{lc $hdr} = $text; | ||
558 | } | ||||
559 | |||||
560 | =item delete_metadata($hdr) | ||||
561 | |||||
562 | =cut | ||||
563 | |||||
564 | sub delete_metadata { | ||||
565 | my ($self, $hdr) = @_; | ||||
566 | if (!$self->{metadata}) { | ||||
567 | warn "metadata: oops! delete_metadata() called after finish_metadata()"; return; | ||||
568 | } | ||||
569 | delete $self->{metadata}->{strings}->{lc $hdr}; | ||||
570 | } | ||||
571 | |||||
572 | =item $str = get_all_metadata() | ||||
573 | |||||
574 | =cut | ||||
575 | |||||
576 | # spent 26.6ms (25.7+994µs) within Mail::SpamAssassin::Message::get_all_metadata which was called 234 times, avg 114µs/call:
# 234 times (25.7ms+994µs) by Mail::SpamAssassin::Plugin::Bayes::_tokenize_headers at line 1304 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 114µs/call | ||||
577 | 234 | 635µs | my ($self) = @_; | ||
578 | |||||
579 | 234 | 877µs | if (!$self->{metadata}) { | ||
580 | warn "metadata: oops! get_all_metadata() called after finish_metadata()"; return; | ||||
581 | } | ||||
582 | 234 | 452µs | my @ret; | ||
583 | 234 | 931µs | my $keys_ref = $self->{metadata}->{strings}; | ||
584 | 234 | 4.70ms | 234 | 994µs | foreach my $key (sort keys %$keys_ref) { # spent 994µs making 234 calls to Mail::SpamAssassin::Message::CORE:sort, avg 4µs/call |
585 | 936 | 5.22ms | my $val = $keys_ref->{$key}; | ||
586 | 936 | 1.65ms | $val = '' if !defined $val; | ||
587 | 936 | 9.16ms | push (@ret, "$key: $val\n"); | ||
588 | } | ||||
589 | 234 | 3.66ms | return (wantarray ? @ret : join('', @ret)); | ||
590 | } | ||||
591 | |||||
592 | # --------------------------------------------------------------------------- | ||||
593 | |||||
594 | =item finish_metadata() | ||||
595 | |||||
596 | Destroys the metadata for this message. Once a message has been | ||||
597 | scanned fully, the metadata is no longer required. Destroying | ||||
598 | this will free up some memory. | ||||
599 | |||||
600 | =cut | ||||
601 | |||||
602 | # spent 18.5ms (6.59+11.9) within Mail::SpamAssassin::Message::finish_metadata which was called 234 times, avg 79µs/call:
# 234 times (6.59ms+11.9ms) by Mail::SpamAssassin::Message::finish at line 620, avg 79µs/call | ||||
603 | 234 | 513µs | my ($self) = @_; | ||
604 | 234 | 2.60ms | if (defined ($self->{metadata})) { | ||
605 | 234 | 2.84ms | 234 | 11.9ms | $self->{metadata}->finish(); # spent 11.9ms making 234 calls to Mail::SpamAssassin::Message::Metadata::finish, avg 51µs/call |
606 | 234 | 818µs | delete $self->{metadata}; | ||
607 | } | ||||
608 | } | ||||
609 | |||||
610 | =item finish() | ||||
611 | |||||
612 | Clean up an object so that it can be destroyed. | ||||
613 | |||||
614 | =cut | ||||
615 | |||||
616 | # spent 128ms (106+21.8) within Mail::SpamAssassin::Message::finish which was called 234 times, avg 547µs/call:
# 234 times (106ms+21.8ms) by main::wanted at line 588 of /usr/local/bin/sa-learn, avg 547µs/call | ||||
617 | 234 | 547µs | my ($self) = @_; | ||
618 | |||||
619 | # Clean ourself up | ||||
620 | 234 | 2.06ms | 234 | 18.5ms | $self->finish_metadata(); # spent 18.5ms making 234 calls to Mail::SpamAssassin::Message::finish_metadata, avg 79µs/call |
621 | |||||
622 | # These will only be in the root Message node | ||||
623 | 234 | 1.17ms | delete $self->{'mime_boundary_state'}; | ||
624 | 234 | 556µs | delete $self->{'mbox_sep'}; | ||
625 | 234 | 569µs | delete $self->{'normalize'}; | ||
626 | 234 | 1.04ms | delete $self->{'pristine_body'}; | ||
627 | 234 | 944µs | delete $self->{'pristine_headers'}; | ||
628 | 234 | 740µs | delete $self->{'line_ending'}; | ||
629 | 234 | 728µs | delete $self->{'missing_head_body_separator'}; | ||
630 | |||||
631 | 234 | 1.12ms | my @toclean = ( $self ); | ||
632 | |||||
633 | # Go ahead and clean up all of the Message::Node parts | ||||
634 | 234 | 7.57ms | while (my $part = shift @toclean) { | ||
635 | # bug 5557: windows requires tmp file be closed before it can be rm'd | ||||
636 | 622 | 2.45ms | if (ref $part->{'raw'} eq 'GLOB') { | ||
637 | 16 | 358µs | 16 | 172µs | close($part->{'raw'}) or die "error closing input file: $!"; # spent 172µs making 16 calls to Mail::SpamAssassin::Message::CORE:close, avg 11µs/call |
638 | } | ||||
639 | |||||
640 | # bug 5858: avoid memory leak with deep MIME structure | ||||
641 | 622 | 1.51ms | if (defined ($part->{metadata})) { | ||
642 | $part->{metadata}->finish(); | ||||
643 | delete $part->{metadata}; | ||||
644 | } | ||||
645 | |||||
646 | 622 | 13.3ms | delete $part->{'headers'}; | ||
647 | 622 | 14.2ms | delete $part->{'raw_headers'}; | ||
648 | 622 | 6.08ms | delete $part->{'header_order'}; | ||
649 | 622 | 32.3ms | delete $part->{'raw'}; | ||
650 | 622 | 1.86ms | delete $part->{'decoded'}; | ||
651 | 622 | 1.62ms | delete $part->{'rendered'}; | ||
652 | 622 | 1.52ms | delete $part->{'visible_rendered'}; | ||
653 | 622 | 1.43ms | delete $part->{'invisible_rendered'}; | ||
654 | 622 | 1.73ms | delete $part->{'type'}; | ||
655 | 622 | 1.35ms | delete $part->{'rendered_type'}; | ||
656 | |||||
657 | # if there are children nodes, add them to the queue of nodes to clean up | ||||
658 | 622 | 1.66ms | if (exists $part->{'body_parts'}) { | ||
659 | 394 | 1.55ms | push(@toclean, @{$part->{'body_parts'}}); | ||
660 | 197 | 542µs | delete $part->{'body_parts'}; | ||
661 | } | ||||
662 | } | ||||
663 | |||||
664 | # delete temporary files | ||||
665 | 234 | 3.07ms | if ($self->{'tmpfiles'}) { | ||
666 | 468 | 2.12ms | for my $fn (@{$self->{'tmpfiles'}}) { | ||
667 | 16 | 3.36ms | 16 | 3.14ms | unlink($fn) or warn "cannot unlink $fn: $!"; # spent 3.14ms making 16 calls to Mail::SpamAssassin::Message::CORE:unlink, avg 196µs/call |
668 | } | ||||
669 | 234 | 674µs | delete $self->{'tmpfiles'}; | ||
670 | } | ||||
671 | } | ||||
672 | |||||
673 | # also use a DESTROY method, just to ensure (as much as possible) that | ||||
674 | # temporary files are deleted even if the finish() method is omitted | ||||
675 | # spent 1.54ms within Mail::SpamAssassin::Message::DESTROY which was called 79 times, avg 19µs/call:
# 79 times (1.54ms+0s) by main::wanted at line 589 of /usr/local/bin/sa-learn, avg 19µs/call | ||||
676 | 79 | 195µs | my $self = shift; | ||
677 | # best practices: prevent potential calls to eval and to system routines | ||||
678 | # in code of a DESTROY method from clobbering global variables $@ and $! | ||||
679 | 79 | 575µs | local($@,$!); # keep outer error handling unaffected by DESTROY | ||
680 | 79 | 1.06ms | if ($self->{'tmpfiles'}) { | ||
681 | for my $fn (@{$self->{'tmpfiles'}}) { | ||||
682 | unlink($fn) or dbg("message: cannot unlink $fn: $!"); | ||||
683 | } | ||||
684 | } | ||||
685 | } | ||||
686 | |||||
687 | # --------------------------------------------------------------------------- | ||||
688 | |||||
689 | =item receive_date() | ||||
690 | |||||
691 | Return a time_t value with the received date of the current message, | ||||
692 | or current time if received time couldn't be determined. | ||||
693 | |||||
694 | =cut | ||||
695 | |||||
696 | # spent 4.42s (32.7ms+4.39) within Mail::SpamAssassin::Message::receive_date which was called 702 times, avg 6.29ms/call:
# 468 times (13.9ms+2.92s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1249 of Mail/SpamAssassin/Plugin/TxRep.pm, avg 6.27ms/call
# 234 times (18.8ms+1.46s) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 465 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 6.34ms/call | ||||
697 | 702 | 1.76ms | my($self) = @_; | ||
698 | |||||
699 | 702 | 18.5ms | 1404 | 4.39s | return Mail::SpamAssassin::Util::receive_date(scalar $self->get_all_headers(0,1)); # spent 3.92s making 702 calls to Mail::SpamAssassin::Message::Node::get_all_headers, avg 5.59ms/call
# spent 462ms making 702 calls to Mail::SpamAssassin::Util::receive_date, avg 659µs/call |
700 | } | ||||
701 | |||||
702 | # --------------------------------------------------------------------------- | ||||
703 | |||||
704 | =back | ||||
705 | |||||
706 | =head1 PARSING METHODS, NON-PUBLIC | ||||
707 | |||||
708 | These methods take a RFC2822-esque formatted message and create a tree | ||||
709 | with all of the MIME body parts included. Those parts will be decoded | ||||
710 | as necessary, and text/html parts will be rendered into a standard text | ||||
711 | format, suitable for use in SpamAssassin. | ||||
712 | |||||
713 | =over 4 | ||||
714 | |||||
715 | =item parse_body() | ||||
716 | |||||
717 | parse_body() passes the body part that was passed in onto the | ||||
718 | correct part parser, either _parse_multipart() for multipart/* parts, | ||||
719 | or _parse_normal() for everything else. Multipart sections become the | ||||
720 | root of sub-trees, while everything else becomes a leaf in the tree. | ||||
721 | |||||
722 | For multipart messages, the first call to parse_body() doesn't create a | ||||
723 | new sub-tree and just uses the parent node to contain children. All other | ||||
724 | calls to parse_body() will cause a new sub-tree root to be created and | ||||
725 | children will exist underneath that root. (this is just so the tree | ||||
726 | doesn't have a root node which points at the actual root node ...) | ||||
727 | |||||
728 | =cut | ||||
729 | |||||
730 | # spent 2.72s (63.5ms+2.66) within Mail::SpamAssassin::Message::parse_body which was called 234 times, avg 11.6ms/call:
# 234 times (63.5ms+2.66s) by Mail::SpamAssassin::Message::find_parts at line 420, avg 11.6ms/call | ||||
731 | 234 | 544µs | my($self) = @_; | ||
732 | |||||
733 | # This shouldn't happen, but just in case, abort. | ||||
734 | 234 | 666µs | return unless (exists $self->{'parse_queue'}); | ||
735 | |||||
736 | 234 | 1.59ms | 234 | 1.50ms | dbg("message: ---- MIME PARSER START ----"); # spent 1.50ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
737 | |||||
738 | 1090 | 32.5ms | while (my $toparse = shift @{$self->{'parse_queue'}}) { | ||
739 | # multipart sections are required to have a boundary set ... If this | ||||
740 | # one doesn't, assume it's malformed and send it to be parsed as a | ||||
741 | # non-multipart section | ||||
742 | # | ||||
743 | 622 | 2.48ms | my ($msg, $boundary, $body, $subparse) = @$toparse; | ||
744 | |||||
745 | 622 | 10.5ms | 622 | 2.62ms | if ($msg->{'type'} =~ m{^multipart/}i && defined $boundary && $subparse > 0) { # spent 2.62ms making 622 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
746 | 197 | 1.89ms | 197 | 2.45s | $self->_parse_multipart($toparse); # spent 2.45s making 197 calls to Mail::SpamAssassin::Message::_parse_multipart, avg 12.4ms/call |
747 | } | ||||
748 | else { | ||||
749 | # If it's not multipart, go ahead and just deal with it. | ||||
750 | 425 | 3.32ms | 425 | 202ms | $self->_parse_normal($toparse); # spent 202ms making 425 calls to Mail::SpamAssassin::Message::_parse_normal, avg 474µs/call |
751 | |||||
752 | # bug 5041: process message/*, but exclude message/partial content types | ||||
753 | 425 | 4.79ms | 425 | 1.53ms | if ($msg->{'type'} =~ m{^message/(?!partial\z)}i && $subparse > 0) # spent 1.53ms making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
754 | { | ||||
755 | # Just decode the part, but we don't need the resulting string here. | ||||
756 | $msg->decode(0); | ||||
757 | |||||
758 | # bug 7125: decode and parse only message/rfc822 or message/global, | ||||
759 | # but do not treat other message/* content types (like the ones listed | ||||
760 | # here) as a message consisting of a header and a body, as they are not: | ||||
761 | # message/delivery-status, message/global-delivery-status, | ||||
762 | # message/feedback-report, message/global-headers, | ||||
763 | # message/global-disposition-notification, | ||||
764 | # message/disposition-notification, (and message/partial) | ||||
765 | |||||
766 | # bug 5051, bug 3748: check $msg->{decoded}: sometimes message/* parts | ||||
767 | # have no content, and we get stuck waiting for STDIN, which is bad. :( | ||||
768 | |||||
769 | if ($msg->{'type'} =~ m{^message/(?:rfc822|global)\z}i && | ||||
770 | defined $msg->{'decoded'} && $msg->{'decoded'} ne '') | ||||
771 | { | ||||
772 | # Ok, so this part is still semi-recursive, since M::SA::Message | ||||
773 | # calls M::SA::Message, but we don't subparse the new message, | ||||
774 | # and pull a sneaky "steal our child's queue" maneuver to deal | ||||
775 | # with it on our own time. Reference the decoded array directly | ||||
776 | # since it's faster. | ||||
777 | # | ||||
778 | my $msg_obj = Mail::SpamAssassin::Message->new({ | ||||
779 | message => $msg->{'decoded'}, | ||||
780 | parsenow => 0, | ||||
781 | normalize => $self->{normalize}, | ||||
782 | subparse => $subparse - 1, | ||||
783 | }); | ||||
784 | |||||
785 | # Add the new message to the current node | ||||
786 | $msg->add_body_part($msg_obj); | ||||
787 | |||||
788 | # now this is the sneaky bit ... steal the sub-message's parse_queue | ||||
789 | # and add it to ours. then we'll handle the sub-message in our | ||||
790 | # normal loop and get all the glory. muhaha. :) | ||||
791 | push(@{$self->{'parse_queue'}}, @{$msg_obj->{'parse_queue'}}); | ||||
792 | delete $msg_obj->{'parse_queue'}; | ||||
793 | |||||
794 | # Ok, we've subparsed, so go ahead and remove the raw and decoded | ||||
795 | # data because we won't need them anymore (the tree under this part | ||||
796 | # will have that data) | ||||
797 | if (ref $msg->{'raw'} eq 'GLOB') { | ||||
798 | # Make sure we close it if it's a temp file -- Bug 5166 | ||||
799 | close($msg->{'raw'}) | ||||
800 | or die "error closing input file: $!"; | ||||
801 | } | ||||
802 | |||||
803 | delete $msg->{'raw'}; | ||||
804 | |||||
805 | delete $msg->{'decoded'}; | ||||
806 | } | ||||
807 | } | ||||
808 | } | ||||
809 | } | ||||
810 | |||||
811 | 234 | 1.64ms | 234 | 1.43ms | dbg("message: ---- MIME PARSER END ----"); # spent 1.43ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
812 | |||||
813 | # we're done parsing, so remove the queue variable | ||||
814 | 234 | 2.48ms | delete $self->{'parse_queue'}; | ||
815 | } | ||||
816 | |||||
817 | =item _parse_multipart() | ||||
818 | |||||
819 | Generate a root node, and for each child part call parse_body() | ||||
820 | to generate the tree. | ||||
821 | |||||
822 | =cut | ||||
823 | |||||
824 | # spent 2.45s (2.08+365ms) within Mail::SpamAssassin::Message::_parse_multipart which was called 197 times, avg 12.4ms/call:
# 197 times (2.08s+365ms) by Mail::SpamAssassin::Message::parse_body at line 746, avg 12.4ms/call | ||||
825 | 197 | 469µs | my($self, $toparse) = @_; | ||
826 | |||||
827 | 394 | 1.52ms | my ($msg, $boundary, $body, $subparse) = @{$toparse}; | ||
828 | |||||
829 | # we're not supposed to be a leaf, so prep ourselves | ||||
830 | 197 | 810µs | $msg->{'body_parts'} = []; | ||
831 | |||||
832 | # the next set of objects will be one level deeper | ||||
833 | 197 | 448µs | $subparse--; | ||
834 | |||||
835 | 197 | 2.01ms | 197 | 1.42ms | dbg("message: parsing multipart, got boundary: ".(defined $boundary ? $boundary : '')); # spent 1.42ms making 197 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
836 | |||||
837 | # NOTE: The MIME boundary REs here are very specific to be mostly RFC 1521 | ||||
838 | # compliant, but also allow possible malformations to still work. Please | ||||
839 | # see Bugzilla bug 3749 for more information before making any changes! | ||||
840 | |||||
841 | # ignore preamble per RFC 1521, unless there's no boundary ... | ||||
842 | 197 | 908µs | if ( defined $boundary ) { | ||
843 | 197 | 356µs | my $line; | ||
844 | 394 | 1.22ms | my $tmp_line = @{$body}; | ||
845 | 197 | 1.52ms | for ($line=0; $line < $tmp_line; $line++) { | ||
846 | # dbg("message: multipart line $line: \"" . $body->[$line] . "\""); | ||||
847 | # specifically look for an opening boundary | ||||
848 | 336 | 14.1ms | 394 | 8.85ms | if (substr($body->[$line],0,2) eq '--' # triage # spent 7.97ms making 197 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 40µs/call
# spent 882µs making 197 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
849 | && $body->[$line] =~ /^--\Q$boundary\E\s*$/) { | ||||
850 | # Make note that we found the opening boundary | ||||
851 | 197 | 1.04ms | $self->{mime_boundary_state}->{$boundary} = 1; | ||
852 | |||||
853 | # if the line after the opening boundary isn't a header, flag it. | ||||
854 | # we need to make sure that there's actually another line though. | ||||
855 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
856 | 197 | 3.19ms | 197 | 1.14ms | if ($line+1 < $tmp_line && $body->[$line+1] !~ /^[\041-\071\073-\176]+:/) { # spent 1.14ms making 197 calls to Mail::SpamAssassin::Message::CORE:match, avg 6µs/call |
857 | $self->{'missing_mime_headers'} = 1; | ||||
858 | } | ||||
859 | |||||
860 | 197 | 742µs | last; | ||
861 | } | ||||
862 | } | ||||
863 | |||||
864 | # Found a boundary, ignore the preamble | ||||
865 | 197 | 796µs | if ( $line < $tmp_line ) { | ||
866 | 394 | 1.72ms | splice @{$body}, 0, $line+1; | ||
867 | } | ||||
868 | |||||
869 | # Else, there's no boundary, so leave the whole part... | ||||
870 | } | ||||
871 | |||||
872 | # prepare a new tree node | ||||
873 | 197 | 2.94ms | 197 | 5.36ms | my $part_msg = Mail::SpamAssassin::Message::Node->new({ normalize=>$self->{normalize} }); # spent 5.36ms making 197 calls to Mail::SpamAssassin::Message::Node::new, avg 27µs/call |
874 | 197 | 436µs | my $in_body = 0; | ||
875 | 197 | 379µs | my $header; | ||
876 | my $part_array; | ||||
877 | my $found_end_boundary; | ||||
878 | |||||
879 | 394 | 1.24ms | my $line_count = @{$body}; | ||
880 | 394 | 5.91ms | foreach ( @{$body} ) { | ||
881 | # if we're on the last body line, or we find any boundary marker, | ||||
882 | # deal with the mime part; | ||||
883 | # a triage before an unlikely-to-match regexp avoids a CPU hotspot | ||||
884 | 67383 | 353ms | 982 | 11.9ms | $found_end_boundary = defined $boundary && substr($_,0,2) eq '--' # spent 9.16ms making 491 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 19µs/call
# spent 2.72ms making 491 calls to Mail::SpamAssassin::Message::CORE:match, avg 6µs/call |
885 | && /^--\Q$boundary\E(?:--)?\s*$/; | ||||
886 | 67383 | 111ms | if ( --$line_count == 0 || $found_end_boundary ) { | ||
887 | 388 | 1.62ms | my $line = $_; # remember the last line | ||
888 | |||||
889 | # If at last line and no end boundary found, the line belongs to body | ||||
890 | # TODO: | ||||
891 | # Is $self->{mime_boundary_state}->{$boundary}-- needed here? | ||||
892 | # Could "missing end boundary" be a useful rule? Mark it somewhere? | ||||
893 | # If SA processed truncated message from amavis etc, this could also | ||||
894 | # be hit legimately.. | ||||
895 | 388 | 2.27ms | if (!$found_end_boundary) { | ||
896 | # TODO: This is duplicate code from few pages down below.. | ||||
897 | while (length ($_) > MAX_BODY_LINE_LENGTH) { | ||||
898 | push (@{$part_array}, substr($_, 0, MAX_BODY_LINE_LENGTH)."\n"); | ||||
899 | substr($_, 0, MAX_BODY_LINE_LENGTH) = ''; | ||||
900 | } | ||||
901 | push ( @{$part_array}, $_ ); | ||||
902 | } | ||||
903 | # per rfc 1521, the CRLF before the boundary is part of the boundary: | ||||
904 | # NOTE: The CRLF preceding the encapsulation line is conceptually | ||||
905 | # attached to the boundary so that it is possible to have a part | ||||
906 | # that does not end with a CRLF (line break). Body parts that must | ||||
907 | # be considered to end with line breaks, therefore, must have two | ||||
908 | # CRLFs preceding the encapsulation line, the first of which is part | ||||
909 | # of the preceding body part, and the second of which is part of the | ||||
910 | # encapsulation boundary. | ||||
911 | elsif ($part_array) { | ||||
912 | 388 | 1.77ms | chomp( $part_array->[-1] ); # trim the CRLF that's part of the boundary | ||
913 | 681 | 2.70ms | splice @{$part_array}, -1 if ( $part_array->[-1] eq '' ); # blank line for the boundary only ... | ||
914 | } | ||||
915 | else { | ||||
916 | # Invalid parts can have no body, so fake in a blank body | ||||
917 | # in that case. | ||||
918 | $part_array = []; | ||||
919 | } | ||||
920 | |||||
921 | 388 | 684µs | my($p_boundary); | ||
922 | 388 | 7.74ms | 776 | 113ms | ($part_msg->{'type'}, $p_boundary) = Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type')); # spent 90.3ms making 388 calls to Mail::SpamAssassin::Util::parse_content_type, avg 233µs/call
# spent 23.1ms making 388 calls to Mail::SpamAssassin::Message::Node::header, avg 60µs/call |
923 | 388 | 993µs | $p_boundary ||= $boundary; | ||
924 | 388 | 4.78ms | 388 | 4.66ms | dbg("message: found part of type ".$part_msg->{'type'}.", boundary: ".(defined $p_boundary ? $p_boundary : '')); # spent 4.66ms making 388 calls to Mail::SpamAssassin::Logger::dbg, avg 12µs/call |
925 | |||||
926 | # we've created a new node object, so add it to the queue along with the | ||||
927 | # text that belongs to that part, then add the new part to the current | ||||
928 | # node to create the tree. | ||||
929 | 776 | 5.03ms | push(@{$self->{'parse_queue'}}, [ $part_msg, $p_boundary, $part_array, $subparse ]); | ||
930 | 388 | 3.21ms | 388 | 15.7ms | $msg->add_body_part($part_msg); # spent 15.7ms making 388 calls to Mail::SpamAssassin::Message::Node::add_body_part, avg 41µs/call |
931 | |||||
932 | # rfc 1521 says /^--boundary--$/, some MUAs may just require /^--boundary--/ | ||||
933 | # but this causes problems with horizontal lines when the boundary is | ||||
934 | # made up of dashes as well, etc. | ||||
935 | 388 | 1.25ms | if (defined $boundary) { | ||
936 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
937 | 388 | 17.1ms | 967 | 9.46ms | if ($line =~ /^--\Q${boundary}\E--\s*$/) { # spent 6.80ms making 388 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 18µs/call
# spent 2.66ms making 579 calls to Mail::SpamAssassin::Message::CORE:match, avg 5µs/call |
938 | # Make a note that we've seen the end boundary | ||||
939 | 197 | 643µs | $self->{mime_boundary_state}->{$boundary}--; | ||
940 | 197 | 1.04ms | last; | ||
941 | } | ||||
942 | elsif ($line_count && $body->[-$line_count] !~ /^[\041-\071\073-\176]+:/) { | ||||
943 | # if we aren't on an end boundary and there are still lines left, it | ||||
944 | # means we hit a new start boundary. therefore, the next line ought | ||||
945 | # to be a mime header. if it's not, mark it. | ||||
946 | $self->{'missing_mime_headers'} = 1; | ||||
947 | } | ||||
948 | } | ||||
949 | |||||
950 | # make sure we start with a new clean node | ||||
951 | 191 | 431µs | $in_body = 0; | ||
952 | 191 | 2.16ms | 191 | 5.56ms | $part_msg = Mail::SpamAssassin::Message::Node->new({ normalize=>$self->{normalize} }); # spent 5.56ms making 191 calls to Mail::SpamAssassin::Message::Node::new, avg 29µs/call |
953 | 191 | 435µs | undef $part_array; | ||
954 | 191 | 467µs | undef $header; | ||
955 | |||||
956 | 191 | 742µs | next; | ||
957 | } | ||||
958 | |||||
959 | 66995 | 104ms | if (!$in_body) { | ||
960 | # s/\s+$//; # bug 5127: don't clean this up (yet) | ||||
961 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
962 | 1233 | 27.6ms | 1685 | 9.98ms | if (/^[\041-\071\073-\176]+[ \t]*:/) { # spent 9.98ms making 1685 calls to Mail::SpamAssassin::Message::CORE:match, avg 6µs/call |
963 | 781 | 2.88ms | if ($header) { | ||
964 | 393 | 5.18ms | my ( $key, $value ) = split ( /:\s*/, $header, 2 ); | ||
965 | 393 | 3.08ms | 393 | 96.4ms | $part_msg->header( $key, $value ); # spent 96.4ms making 393 calls to Mail::SpamAssassin::Message::Node::header, avg 245µs/call |
966 | } | ||||
967 | 781 | 1.99ms | $header = $_; | ||
968 | 781 | 1.73ms | next; | ||
969 | } | ||||
970 | elsif (/^[ \t]/ && $header) { | ||||
971 | # $_ =~ s/^\s*//; # bug 5127, again | ||||
972 | 64 | 260µs | $header .= $_; | ||
973 | 64 | 181µs | next; | ||
974 | } | ||||
975 | else { | ||||
976 | 388 | 1.45ms | if ($header) { | ||
977 | 388 | 4.77ms | my ( $key, $value ) = split ( /:\s*/, $header, 2 ); | ||
978 | 388 | 3.50ms | 388 | 78.7ms | $part_msg->header( $key, $value ); # spent 78.7ms making 388 calls to Mail::SpamAssassin::Message::Node::header, avg 203µs/call |
979 | } | ||||
980 | 388 | 1.29ms | $in_body = 1; | ||
981 | |||||
982 | # if there's a blank line separator, that's good. if there isn't, | ||||
983 | # it's a body line, so drop through. | ||||
984 | 388 | 4.93ms | 388 | 2.21ms | if (/^\r?$/) { # spent 2.21ms making 388 calls to Mail::SpamAssassin::Message::CORE:match, avg 6µs/call |
985 | 388 | 879µs | next; | ||
986 | } | ||||
987 | else { | ||||
988 | $self->{'missing_mime_head_body_separator'} = 1; | ||||
989 | } | ||||
990 | } | ||||
991 | } | ||||
992 | |||||
993 | # we run into a perl bug if the lines are astronomically long (probably | ||||
994 | # due to lots of regexp backtracking); so split any individual line | ||||
995 | # over MAX_BODY_LINE_LENGTH bytes in length. This can wreck HTML | ||||
996 | # totally -- but IMHO the only reason a luser would use | ||||
997 | # MAX_BODY_LINE_LENGTH-byte lines is to crash filters, anyway. | ||||
998 | 65762 | 394ms | while (length ($_) > MAX_BODY_LINE_LENGTH) { | ||
999 | push (@{$part_array}, substr($_, 0, MAX_BODY_LINE_LENGTH)."\n"); | ||||
1000 | substr($_, 0, MAX_BODY_LINE_LENGTH) = ''; | ||||
1001 | } | ||||
1002 | 131524 | 990ms | push ( @{$part_array}, $_ ); | ||
1003 | } | ||||
1004 | |||||
1005 | # Look for a message epilogue | ||||
1006 | # originally ignored whitespace: 0.185 0.2037 0.0654 0.757 0.00 0.00 TVD_TAB | ||||
1007 | # ham FPs were all "." on a line by itself. | ||||
1008 | # spams seem to only have NULL chars afterwards ? | ||||
1009 | 197 | 2.01ms | if ($line_count) { | ||
1010 | 75 | 727µs | for(; $line_count > 0; $line_count--) { | ||
1011 | 87 | 1.05ms | 87 | 360µs | if ($body->[-$line_count] =~ /[^\s.]/) { # spent 360µs making 87 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
1012 | $self->{mime_epilogue_exists} = 1; | ||||
1013 | last; | ||||
1014 | } | ||||
1015 | } | ||||
1016 | } | ||||
1017 | |||||
1018 | } | ||||
1019 | |||||
1020 | =item _parse_normal() | ||||
1021 | |||||
1022 | Generate a leaf node and add it to the parent. | ||||
1023 | |||||
1024 | =cut | ||||
1025 | |||||
1026 | # spent 202ms (51.6+150) within Mail::SpamAssassin::Message::_parse_normal which was called 425 times, avg 474µs/call:
# 425 times (51.6ms+150ms) by Mail::SpamAssassin::Message::parse_body at line 750, avg 474µs/call | ||||
1027 | 425 | 920µs | my($self, $toparse) = @_; | ||
1028 | |||||
1029 | 850 | 3.14ms | my ($msg, $boundary, $body) = @{$toparse}; | ||
1030 | |||||
1031 | 425 | 2.77ms | 425 | 2.65ms | dbg("message: parsing normal part"); # spent 2.65ms making 425 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
1032 | |||||
1033 | # 0: content-type, 1: boundary, 2: charset, 3: filename | ||||
1034 | 425 | 7.23ms | 850 | 95.8ms | my @ct = Mail::SpamAssassin::Util::parse_content_type($msg->header('content-type')); # spent 71.6ms making 425 calls to Mail::SpamAssassin::Util::parse_content_type, avg 168µs/call
# spent 24.1ms making 425 calls to Mail::SpamAssassin::Message::Node::header, avg 57µs/call |
1035 | |||||
1036 | # multipart sections are required to have a boundary set ... If this | ||||
1037 | # one doesn't, assume it's malformed and revert to text/plain | ||||
1038 | 425 | 4.91ms | 425 | 1.26ms | $msg->{'type'} = ($ct[0] !~ m@^multipart/@i || defined $boundary ) ? $ct[0] : 'text/plain'; # spent 1.26ms making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
1039 | 425 | 2.07ms | $msg->{'charset'} = $ct[2]; | ||
1040 | |||||
1041 | # attempt to figure out a name for this attachment if there is one ... | ||||
1042 | 425 | 3.30ms | 425 | 22.6ms | my $disp = $msg->header('content-disposition') || ''; # spent 22.6ms making 425 calls to Mail::SpamAssassin::Message::Node::header, avg 53µs/call |
1043 | 425 | 4.34ms | 425 | 884µs | if ($disp =~ /name="?([^\";]+)"?/i) { # spent 884µs making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 2µs/call |
1044 | 9 | 49µs | $msg->{'name'} = $1; | ||
1045 | } | ||||
1046 | elsif ($ct[3]) { | ||||
1047 | 6 | 27µs | $msg->{'name'} = $ct[3]; | ||
1048 | } | ||||
1049 | |||||
1050 | 425 | 2.37ms | $msg->{'boundary'} = $boundary; | ||
1051 | |||||
1052 | # If the part type is not one that we're likely to want to use, go | ||||
1053 | # ahead and write the part data out to a temp file -- why keep sucking | ||||
1054 | # up RAM with something we're not going to use? | ||||
1055 | # | ||||
1056 | 425 | 6.24ms | 425 | 3.23ms | if ($msg->{'type'} !~ m@^(?:text/(?:plain|html)$|message\b)@) { # spent 3.23ms making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 8µs/call |
1057 | 16 | 31µs | my($filepath, $fh); | ||
1058 | eval { | ||||
1059 | 32 | 198µs | 16 | 11.4ms | ($filepath, $fh) = Mail::SpamAssassin::Util::secure_tmpfile(); 1; # spent 11.4ms making 16 calls to Mail::SpamAssassin::Util::secure_tmpfile, avg 712µs/call |
1060 | 16 | 74µs | } or do { | ||
1061 | my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1062 | info("message: failed to create a temp file: %s", $eval_stat); | ||||
1063 | }; | ||||
1064 | 16 | 65µs | if ($fh) { | ||
1065 | # The temp file was created, add it to the list of pending deletions | ||||
1066 | # we cannot just delete immediately in the POSIX idiom, as this is | ||||
1067 | # unportable (to win32 at least) | ||||
1068 | 32 | 139µs | push @{$self->{tmpfiles}}, $filepath; | ||
1069 | 16 | 118µs | 16 | 96µs | dbg("message: storing a message part to file %s", $filepath); # spent 96µs making 16 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
1070 | 32 | 467µs | 16 | 8.29ms | $fh->print(@{$body}) or die "error writing to $filepath: $!"; # spent 8.29ms making 16 calls to IO::Handle::print, avg 518µs/call |
1071 | 16 | 1.12ms | 16 | 942µs | $fh->flush or die "error writing (flush) to $filepath: $!"; # spent 942µs making 16 calls to IO::Handle::flush, avg 59µs/call |
1072 | 16 | 76µs | $msg->{'raw'} = $fh; | ||
1073 | } | ||||
1074 | } | ||||
1075 | |||||
1076 | # if the part didn't get a temp file, go ahead and store the data in memory | ||||
1077 | 425 | 5.69ms | if (!defined $msg->{'raw'}) { | ||
1078 | 409 | 2.65ms | 409 | 2.92ms | dbg("message: storing a body to memory"); # spent 2.92ms making 409 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
1079 | 409 | 2.78ms | $msg->{'raw'} = $body; | ||
1080 | } | ||||
1081 | } | ||||
1082 | |||||
1083 | # --------------------------------------------------------------------------- | ||||
1084 | |||||
1085 | sub get_mimepart_digests { | ||||
1086 | my ($self) = @_; | ||||
1087 | |||||
1088 | if (!exists $self->{mimepart_digests}) { | ||||
1089 | # traverse all parts which are leaves, recursively | ||||
1090 | $self->{mimepart_digests} = | ||||
1091 | [ map(sha1_hex($_->decode) . ':' . lc($_->{type}||''), | ||||
1092 | $self->find_parts(qr/^/,1,1)) ]; | ||||
1093 | } | ||||
1094 | return $self->{mimepart_digests}; | ||||
1095 | } | ||||
1096 | |||||
1097 | # --------------------------------------------------------------------------- | ||||
1098 | |||||
1099 | # common code for get_rendered_body_text_array, | ||||
1100 | # get_visible_rendered_body_text_array, get_invisible_rendered_body_text_array | ||||
1101 | # | ||||
1102 | # spent 27.2s (156ms+27.1) within Mail::SpamAssassin::Message::get_body_text_array_common which was called 1170 times, avg 23.3ms/call:
# 702 times (68.6ms+26.8s) by Mail::SpamAssassin::Message::get_rendered_body_text_array at line 1159, avg 38.2ms/call
# 234 times (51.9ms+196ms) by Mail::SpamAssassin::Message::get_visible_rendered_body_text_array at line 1164, avg 1.06ms/call
# 234 times (35.5ms+93.2ms) by Mail::SpamAssassin::Message::get_invisible_rendered_body_text_array at line 1169, avg 550µs/call | ||||
1103 | 1170 | 3.17ms | my ($self, $method_name) = @_; | ||
1104 | |||||
1105 | 1170 | 3.47ms | my $key = 'text_' . $method_name; | ||
1106 | 1638 | 29.2ms | if (exists $self->{$key}) { return $self->{$key} } | ||
1107 | |||||
1108 | 702 | 3.52ms | $self->{$key} = []; | ||
1109 | |||||
1110 | # Find all parts which are leaves | ||||
1111 | 702 | 17.7ms | 1404 | 2.90s | my @parts = $self->find_parts(qr/./,1); # spent 2.89s making 702 calls to Mail::SpamAssassin::Message::find_parts, avg 4.12ms/call
# spent 5.02ms making 702 calls to Mail::SpamAssassin::Message::CORE:qr, avg 7µs/call |
1112 | 702 | 1.62ms | return $self->{$key} unless @parts; | ||
1113 | |||||
1114 | # the html metadata may have already been set, so let's not bother if it's | ||||
1115 | # already been done. | ||||
1116 | 702 | 2.97ms | my $html_needs_setting = !exists $self->{metadata}->{html}; | ||
1117 | |||||
1118 | 702 | 7.72ms | 468 | 61.1ms | my $text = $method_name eq 'invisible_rendered' ? '' # spent 61.1ms making 468 calls to Mail::SpamAssassin::Message::Node::get_header, avg 131µs/call |
1119 | : ($self->get_header('subject') || "\n"); | ||||
1120 | |||||
1121 | # Go through each part | ||||
1122 | 702 | 11.7ms | for (my $pt = 0 ; $pt <= $#parts ; $pt++ ) { | ||
1123 | 1275 | 2.96ms | my $p = $parts[$pt]; | ||
1124 | |||||
1125 | # put a blank line between parts ... | ||||
1126 | 1275 | 4.12ms | $text .= "\n" if $text ne ''; | ||
1127 | |||||
1128 | 1275 | 14.5ms | 1275 | 23.9s | my($type, $rnd) = $p->$method_name(); # decode this part # spent 23.9s making 425 calls to Mail::SpamAssassin::Message::Node::rendered, avg 56.2ms/call
# spent 25.7ms making 425 calls to Mail::SpamAssassin::Message::Node::invisible_rendered, avg 61µs/call
# spent 15.4ms making 425 calls to Mail::SpamAssassin::Message::Node::visible_rendered, avg 36µs/call |
1129 | 1275 | 7.63ms | if ( defined $rnd ) { | ||
1130 | # Only text/* types are rendered ... | ||||
1131 | 1227 | 6.24ms | $text .= $rnd; | ||
1132 | |||||
1133 | # TVD - if there are multiple parts, what should we do? | ||||
1134 | # right now, just use the last one. we may need to give some priority | ||||
1135 | # at some point, ie: use text/html rendered if it exists, or | ||||
1136 | # text/plain rendered as html otherwise. | ||||
1137 | 1227 | 3.05ms | if ($html_needs_setting && $type eq 'text/html') { | ||
1138 | 189 | 879µs | $self->{metadata}->{html} = $p->{html_results}; | ||
1139 | } | ||||
1140 | } | ||||
1141 | } | ||||
1142 | |||||
1143 | # whitespace handling (warning: small changes have large effects!) | ||||
1144 | 702 | 37.4ms | 702 | 30.4ms | $text =~ s/\n+\s*\n+/\f/gs; # double newlines => form feed # spent 30.4ms making 702 calls to Mail::SpamAssassin::Message::CORE:subst, avg 43µs/call |
1145 | # $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace (incl. VT, NBSP) => space | ||||
1146 | 702 | 11.7ms | $text =~ tr/ \t\n\r\x0b/ /s; # whitespace (incl. VT) => space | ||
1147 | 702 | 5.16ms | $text =~ tr/\f/\n/; # form feeds => newline | ||
1148 | |||||
1149 | 702 | 13.6ms | 702 | 143ms | my @textary = split_into_array_of_short_lines($text); # spent 143ms making 702 calls to Mail::SpamAssassin::Message::split_into_array_of_short_lines, avg 203µs/call |
1150 | 702 | 3.79ms | $self->{$key} = \@textary; | ||
1151 | |||||
1152 | 702 | 7.24ms | return $self->{$key}; | ||
1153 | } | ||||
1154 | |||||
1155 | # --------------------------------------------------------------------------- | ||||
1156 | |||||
1157 | # spent 26.9s (32.3ms+26.8) within Mail::SpamAssassin::Message::get_rendered_body_text_array which was called 702 times, avg 38.3ms/call:
# 702 times (32.3ms+26.8s) by Mail::SpamAssassin::PerMsgStatus::get_decoded_stripped_body_text_array at line 1785 of Mail/SpamAssassin/PerMsgStatus.pm, avg 38.3ms/call | ||||
1158 | 702 | 1.50ms | my ($self) = @_; | ||
1159 | 702 | 10.1ms | 702 | 26.8s | return $self->get_body_text_array_common('rendered'); # spent 26.8s making 702 calls to Mail::SpamAssassin::Message::get_body_text_array_common, avg 38.2ms/call |
1160 | } | ||||
1161 | |||||
1162 | # spent 252ms (4.73+247) within Mail::SpamAssassin::Message::get_visible_rendered_body_text_array which was called 234 times, avg 1.08ms/call:
# 234 times (4.73ms+247ms) by Mail::SpamAssassin::Plugin::Bayes::_get_msgdata_from_permsgstatus at line 1044 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 1.08ms/call | ||||
1163 | 234 | 603µs | my ($self) = @_; | ||
1164 | 234 | 3.99ms | 234 | 247ms | return $self->get_body_text_array_common('visible_rendered'); # spent 247ms making 234 calls to Mail::SpamAssassin::Message::get_body_text_array_common, avg 1.06ms/call |
1165 | } | ||||
1166 | |||||
1167 | # spent 133ms (3.87+129) within Mail::SpamAssassin::Message::get_invisible_rendered_body_text_array which was called 234 times, avg 567µs/call:
# 234 times (3.87ms+129ms) by Mail::SpamAssassin::Plugin::Bayes::_get_msgdata_from_permsgstatus at line 1046 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 567µs/call | ||||
1168 | 234 | 580µs | my ($self) = @_; | ||
1169 | 234 | 2.96ms | 234 | 129ms | return $self->get_body_text_array_common('invisible_rendered'); # spent 129ms making 234 calls to Mail::SpamAssassin::Message::get_body_text_array_common, avg 550µs/call |
1170 | } | ||||
1171 | |||||
1172 | # --------------------------------------------------------------------------- | ||||
1173 | |||||
1174 | sub get_decoded_body_text_array { | ||||
1175 | my ($self) = @_; | ||||
1176 | |||||
1177 | if (defined $self->{text_decoded}) { return $self->{text_decoded}; } | ||||
1178 | $self->{text_decoded} = [ ]; | ||||
1179 | |||||
1180 | # Find all parts which are leaves | ||||
1181 | my @parts = $self->find_parts(qr/^(?:text|message)\b/i,1); | ||||
1182 | return $self->{text_decoded} unless @parts; | ||||
1183 | |||||
1184 | # Go through each part | ||||
1185 | for(my $pt = 0 ; $pt <= $#parts ; $pt++ ) { | ||||
1186 | # bug 4843: skip text/calendar parts since they're usually an attachment | ||||
1187 | # and not displayed | ||||
1188 | next if ($parts[$pt]->{'type'} eq 'text/calendar'); | ||||
1189 | |||||
1190 | push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} ); | ||||
1191 | push(@{$self->{text_decoded}}, | ||||
1192 | split_into_array_of_short_paragraphs($parts[$pt]->decode())); | ||||
1193 | } | ||||
1194 | |||||
1195 | return $self->{text_decoded}; | ||||
1196 | } | ||||
1197 | |||||
1198 | # --------------------------------------------------------------------------- | ||||
1199 | |||||
1200 | # spent 143ms within Mail::SpamAssassin::Message::split_into_array_of_short_lines which was called 702 times, avg 203µs/call:
# 702 times (143ms+0s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1149, avg 203µs/call | ||||
1201 | 702 | 1.34ms | my @result; | ||
1202 | 702 | 17.5ms | foreach my $line (split (/^/m, $_[0])) { | ||
1203 | 8895 | 34.8ms | while (length ($line) > MAX_BODY_LINE_LENGTH) { | ||
1204 | # try splitting "nicely" so that we don't chop a url in half or | ||||
1205 | # something. if there's no space, then just split at max length. | ||||
1206 | 92 | 1.06ms | my $length = rindex($line, ' ', MAX_BODY_LINE_LENGTH) + 1; | ||
1207 | 92 | 178µs | $length ||= MAX_BODY_LINE_LENGTH; | ||
1208 | 92 | 1.11ms | push (@result, substr($line, 0, $length, '')); | ||
1209 | } | ||||
1210 | 8895 | 73.6ms | push (@result, $line); | ||
1211 | } | ||||
1212 | 702 | 23.5ms | @result; | ||
1213 | } | ||||
1214 | |||||
1215 | # --------------------------------------------------------------------------- | ||||
1216 | |||||
1217 | # split a text into array of paragraphs of sizes between | ||||
1218 | # $chunk_size and 2 * $chunk_size, returning the resulting array | ||||
1219 | |||||
1220 | sub split_into_array_of_short_paragraphs { | ||||
1221 | my @result; | ||||
1222 | my $chunk_size = 1024; | ||||
1223 | my $text_l = length($_[0]); | ||||
1224 | my($j,$ofs); | ||||
1225 | for ($ofs = 0; $text_l - $ofs > 2 * $chunk_size; $ofs = $j+1) { | ||||
1226 | $j = index($_[0], "\n", $ofs+$chunk_size); | ||||
1227 | if ($j < 0) { | ||||
1228 | $j = index($_[0], " ", $ofs+$chunk_size); | ||||
1229 | if ($j < 0) { $j = $ofs+$chunk_size } | ||||
1230 | } | ||||
1231 | push(@result, substr($_[0], $ofs, $j-$ofs+1)); | ||||
1232 | } | ||||
1233 | push(@result, substr($_[0], $ofs)) if $ofs < $text_l; | ||||
1234 | @result; | ||||
1235 | } | ||||
1236 | |||||
1237 | # --------------------------------------------------------------------------- | ||||
1238 | |||||
1239 | 1 | 19µs | 1; | ||
1240 | |||||
1241 | =back | ||||
1242 | |||||
1243 | =cut | ||||
# spent 172µs within Mail::SpamAssassin::Message::CORE:close which was called 16 times, avg 11µs/call:
# 16 times (172µs+0s) by Mail::SpamAssassin::Message::finish at line 637, avg 11µs/call | |||||
# spent 553ms within Mail::SpamAssassin::Message::CORE:match which was called 114442 times, avg 5µs/call:
# 71012 times (333ms+0s) by Mail::SpamAssassin::Message::new at line 340, avg 5µs/call
# 15054 times (59.8ms+0s) by Mail::SpamAssassin::Message::new at line 286, avg 4µs/call
# 13487 times (63.2ms+0s) by Mail::SpamAssassin::Message::new at line 252, avg 5µs/call
# 7410 times (62.5ms+0s) by Mail::SpamAssassin::Message::new at line 302, avg 8µs/call
# 1685 times (9.98ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 962, avg 6µs/call
# 622 times (2.62ms+0s) by Mail::SpamAssassin::Message::parse_body at line 745, avg 4µs/call
# 579 times (2.66ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 937, avg 5µs/call
# 491 times (2.72ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 884, avg 6µs/call
# 468 times (1.62ms+0s) by Mail::SpamAssassin::Message::new at line 195, avg 3µs/call
# 468 times (1.42ms+0s) by Mail::SpamAssassin::Message::new at line 236, avg 3µs/call
# 427 times (1.85ms+0s) by Mail::SpamAssassin::Message::new at line 372, avg 4µs/call
# 425 times (3.23ms+0s) by Mail::SpamAssassin::Message::_parse_normal at line 1056, avg 8µs/call
# 425 times (1.53ms+0s) by Mail::SpamAssassin::Message::parse_body at line 753, avg 4µs/call
# 425 times (1.26ms+0s) by Mail::SpamAssassin::Message::_parse_normal at line 1038, avg 3µs/call
# 425 times (884µs+0s) by Mail::SpamAssassin::Message::_parse_normal at line 1043, avg 2µs/call
# 388 times (2.21ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 984, avg 6µs/call
# 197 times (1.14ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 856, avg 6µs/call
# 197 times (882µs+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 848, avg 4µs/call
# 170 times (596µs+0s) by Mail::SpamAssassin::Message::new at line 374, avg 4µs/call
# 87 times (360µs+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 1011, avg 4µs/call | |||||
# spent 5.02ms within Mail::SpamAssassin::Message::CORE:qr which was called 702 times, avg 7µs/call:
# 702 times (5.02ms+0s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1111, avg 7µs/call | |||||
# spent 33.1ms within Mail::SpamAssassin::Message::CORE:regcomp which was called 1246 times, avg 27µs/call:
# 491 times (9.16ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 884, avg 19µs/call
# 388 times (6.80ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 937, avg 18µs/call
# 197 times (7.97ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 848, avg 40µs/call
# 170 times (9.20ms+0s) by Mail::SpamAssassin::Message::new at line 374, avg 54µs/call | |||||
# spent 994µs within Mail::SpamAssassin::Message::CORE:sort which was called 234 times, avg 4µs/call:
# 234 times (994µs+0s) by Mail::SpamAssassin::Message::get_all_metadata at line 584, avg 4µs/call | |||||
# spent 57.4ms within Mail::SpamAssassin::Message::CORE:subst which was called 8112 times, avg 7µs/call:
# 7410 times (27.1ms+0s) by Mail::SpamAssassin::Message::new at line 272, avg 4µs/call
# 702 times (30.4ms+0s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1144, avg 43µs/call | |||||
# spent 3.14ms within Mail::SpamAssassin::Message::CORE:unlink which was called 16 times, avg 196µs/call:
# 16 times (3.14ms+0s) by Mail::SpamAssassin::Message::finish at line 667, avg 196µs/call |