Filename | /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Message.pm |
Statements | Executed 896006 statements in 5.43s |
Calls | P | F | Exclusive Time |
Inclusive Time |
Subroutine |
---|---|---|---|---|---|
234 | 1 | 1 | 2.35s | 4.67s | new | Mail::SpamAssassin::Message::
197 | 1 | 1 | 1.97s | 2.30s | _parse_multipart | Mail::SpamAssassin::Message::
114442 | 20 | 1 | 412ms | 412ms | CORE:match (opcode) | Mail::SpamAssassin::Message::
1170 | 3 | 1 | 154ms | 26.2s | get_body_text_array_common | Mail::SpamAssassin::Message::
702 | 1 | 1 | 143ms | 143ms | split_into_array_of_short_lines | Mail::SpamAssassin::Message::
234 | 1 | 1 | 103ms | 124ms | finish | Mail::SpamAssassin::Message::
234 | 1 | 1 | 64.6ms | 2.58s | parse_body | Mail::SpamAssassin::Message::
425 | 1 | 1 | 62.1ms | 211ms | _parse_normal | Mail::SpamAssassin::Message::
8112 | 2 | 1 | 51.6ms | 51.6ms | CORE:subst (opcode) | Mail::SpamAssassin::Message::
1246 | 4 | 1 | 33.0ms | 33.0ms | CORE:regcomp (opcode) | Mail::SpamAssassin::Message::
555 | 2 | 2 | 29.4ms | 3.19s | receive_date | Mail::SpamAssassin::Message::
234 | 1 | 1 | 26.2ms | 27.2ms | get_all_metadata | Mail::SpamAssassin::Message::
1876 | 2 | 1 | 23.9ms | 23.9ms | get_metadata | Mail::SpamAssassin::Message::
936 | 4 | 1 | 17.4ms | 17.4ms | put_metadata | Mail::SpamAssassin::Message::
702 | 1 | 1 | 16.0ms | 2.74s | find_parts | Mail::SpamAssassin::Message::
702 | 1 | 1 | 11.1ms | 25.8s | get_rendered_body_text_array | Mail::SpamAssassin::Message::
1 | 1 | 1 | 8.83ms | 58.9ms | BEGIN@55 | Mail::SpamAssassin::Message::
468 | 2 | 2 | 8.55ms | 3.81s | extract_message_metadata | Mail::SpamAssassin::Message::
234 | 1 | 1 | 6.32ms | 17.5ms | finish_metadata | Mail::SpamAssassin::Message::
555 | 1 | 1 | 6.21ms | 6.21ms | get_pristine_body | Mail::SpamAssassin::Message::
702 | 1 | 1 | 4.66ms | 4.66ms | CORE:qr (opcode) | Mail::SpamAssassin::Message::
1 | 1 | 1 | 4.62ms | 9.28ms | BEGIN@49 | Mail::SpamAssassin::Message::
234 | 1 | 1 | 4.37ms | 106ms | get_invisible_rendered_body_text_array | Mail::SpamAssassin::Message::
234 | 1 | 1 | 4.19ms | 248ms | get_visible_rendered_body_text_array | Mail::SpamAssassin::Message::
234 | 1 | 1 | 3.62ms | 3.62ms | get_pristine_header | Mail::SpamAssassin::Message::
16 | 1 | 1 | 3.13ms | 3.13ms | CORE:unlink (opcode) | Mail::SpamAssassin::Message::
79 | 1 | 1 | 1.45ms | 1.45ms | DESTROY | Mail::SpamAssassin::Message::
1 | 1 | 1 | 1.35ms | 218ms | BEGIN@56 | Mail::SpamAssassin::Message::
234 | 1 | 1 | 994µs | 994µs | CORE:sort (opcode) | Mail::SpamAssassin::Message::
16 | 1 | 1 | 200µs | 200µs | CORE:close (opcode) | Mail::SpamAssassin::Message::
1 | 1 | 1 | 48µs | 62µs | BEGIN@45 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 32µs | 101µs | BEGIN@47 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 29µs | 746µs | BEGIN@57 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 25µs | 67µs | BEGIN@46 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 24µs | 24µs | BEGIN@54 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 23µs | 94µs | BEGIN@60 | Mail::SpamAssassin::Message::
1 | 1 | 1 | 21µs | 203µs | BEGIN@58 | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | delete_metadata | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_body | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_decoded_body_text_array | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_mbox_separator | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_mimepart_digests | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | get_pristine | Mail::SpamAssassin::Message::
0 | 0 | 0 | 0s | 0s | split_into_array_of_short_paragraphs | Mail::SpamAssassin::Message::
Line | State ments |
Time on line |
Calls | Time in subs |
Code |
---|---|---|---|---|---|
1 | # <@LICENSE> | ||||
2 | # Licensed to the Apache Software Foundation (ASF) under one or more | ||||
3 | # contributor license agreements. See the NOTICE file distributed with | ||||
4 | # this work for additional information regarding copyright ownership. | ||||
5 | # The ASF licenses this file to you under the Apache License, Version 2.0 | ||||
6 | # (the "License"); you may not use this file except in compliance with | ||||
7 | # the License. You may obtain a copy of the License at: | ||||
8 | # | ||||
9 | # http://www.apache.org/licenses/LICENSE-2.0 | ||||
10 | # | ||||
11 | # Unless required by applicable law or agreed to in writing, software | ||||
12 | # distributed under the License is distributed on an "AS IS" BASIS, | ||||
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
14 | # See the License for the specific language governing permissions and | ||||
15 | # limitations under the License. | ||||
16 | # </@LICENSE> | ||||
17 | |||||
18 | =head1 NAME | ||||
19 | |||||
20 | Mail::SpamAssassin::Message - decode, render, and hold an RFC-2822 message | ||||
21 | |||||
22 | =head1 DESCRIPTION | ||||
23 | |||||
24 | This module encapsulates an email message and allows access to the various MIME | ||||
25 | message parts and message metadata. | ||||
26 | |||||
27 | The message structure, after initiating a parse() cycle, looks like this: | ||||
28 | |||||
29 | Message object, also top-level node in Message::Node tree | ||||
30 | | | ||||
31 | +---> Message::Node for other parts in MIME structure | ||||
32 | | |---> [ more Message::Node parts ... ] | ||||
33 | | [ others ... ] | ||||
34 | | | ||||
35 | +---> Message::Metadata object to hold metadata | ||||
36 | |||||
37 | =head1 PUBLIC METHODS | ||||
38 | |||||
39 | =over 4 | ||||
40 | |||||
41 | =cut | ||||
42 | |||||
43 | package Mail::SpamAssassin::Message; | ||||
44 | |||||
45 | 2 | 69µs | 2 | 76µs | # spent 62µs (48+14) within Mail::SpamAssassin::Message::BEGIN@45 which was called:
# once (48µs+14µs) by Mail::SpamAssassin::BEGIN@75 at line 45 # spent 62µs making 1 call to Mail::SpamAssassin::Message::BEGIN@45
# spent 14µs making 1 call to strict::import |
46 | 2 | 61µs | 2 | 110µs | # spent 67µs (25+42) within Mail::SpamAssassin::Message::BEGIN@46 which was called:
# once (25µs+42µs) by Mail::SpamAssassin::BEGIN@75 at line 46 # spent 67µs making 1 call to Mail::SpamAssassin::Message::BEGIN@46
# spent 42µs making 1 call to warnings::import |
47 | 2 | 155µs | 2 | 170µs | # spent 101µs (32+69) within Mail::SpamAssassin::Message::BEGIN@47 which was called:
# once (32µs+69µs) by Mail::SpamAssassin::BEGIN@75 at line 47 # spent 101µs making 1 call to Mail::SpamAssassin::Message::BEGIN@47
# spent 69µs making 1 call to re::import |
48 | |||||
49 | # spent 9.28ms (4.62+4.66) within Mail::SpamAssassin::Message::BEGIN@49 which was called:
# once (4.62ms+4.66ms) by Mail::SpamAssassin::BEGIN@75 at line 52 | ||||
50 | 3 | 264µs | 1 | 736µs | eval { require Digest::SHA; import Digest::SHA qw(sha1 sha1_hex); 1 } # spent 736µs making 1 call to Exporter::import |
51 | 1 | 11µs | or do { require Digest::SHA1; import Digest::SHA1 qw(sha1 sha1_hex) } | ||
52 | 1 | 76µs | 1 | 9.28ms | } # spent 9.28ms making 1 call to Mail::SpamAssassin::Message::BEGIN@49 |
53 | |||||
54 | 2 | 74µs | 1 | 24µs | # spent 24µs within Mail::SpamAssassin::Message::BEGIN@54 which was called:
# once (24µs+0s) by Mail::SpamAssassin::BEGIN@75 at line 54 # spent 24µs making 1 call to Mail::SpamAssassin::Message::BEGIN@54 |
55 | 2 | 406µs | 1 | 58.9ms | # spent 58.9ms (8.83+50.1) within Mail::SpamAssassin::Message::BEGIN@55 which was called:
# once (8.83ms+50.1ms) by Mail::SpamAssassin::BEGIN@75 at line 55 # spent 58.9ms making 1 call to Mail::SpamAssassin::Message::BEGIN@55 |
56 | 2 | 414µs | 1 | 218ms | # spent 218ms (1.35+217) within Mail::SpamAssassin::Message::BEGIN@56 which was called:
# once (1.35ms+217ms) by Mail::SpamAssassin::BEGIN@75 at line 56 # spent 218ms making 1 call to Mail::SpamAssassin::Message::BEGIN@56 |
57 | 2 | 76µs | 2 | 1.46ms | # spent 746µs (29+717) within Mail::SpamAssassin::Message::BEGIN@57 which was called:
# once (29µs+717µs) by Mail::SpamAssassin::BEGIN@75 at line 57 # spent 746µs making 1 call to Mail::SpamAssassin::Message::BEGIN@57
# spent 717µs making 1 call to Exporter::import |
58 | 2 | 66µs | 2 | 385µs | # spent 203µs (21+182) within Mail::SpamAssassin::Message::BEGIN@58 which was called:
# once (21µs+182µs) by Mail::SpamAssassin::BEGIN@75 at line 58 # spent 203µs making 1 call to Mail::SpamAssassin::Message::BEGIN@58
# spent 182µs making 1 call to Exporter::import |
59 | |||||
60 | 2 | 9.84ms | 2 | 164µs | # spent 94µs (23+70) within Mail::SpamAssassin::Message::BEGIN@60 which was called:
# once (23µs+70µs) by Mail::SpamAssassin::BEGIN@75 at line 60 # spent 94µs making 1 call to Mail::SpamAssassin::Message::BEGIN@60
# spent 70µs making 1 call to vars::import |
61 | |||||
62 | 1 | 24µs | @ISA = qw(Mail::SpamAssassin::Message::Node); | ||
63 | |||||
64 | # --------------------------------------------------------------------------- | ||||
65 | |||||
66 | =item new() | ||||
67 | |||||
68 | Creates a Mail::SpamAssassin::Message object. Takes a hash reference | ||||
69 | as a parameter. The used hash key/value pairs are as follows: | ||||
70 | |||||
71 | C<message> is either undef (which will use STDIN), a scalar - a string | ||||
72 | containing an entire message, a reference to such string, an array reference | ||||
73 | of the message with one line per array element, or either a file glob | ||||
74 | or an IO::File object which holds the entire contents of the message. | ||||
75 | |||||
76 | Note: The message is expected to generally be in RFC 2822 format, optionally | ||||
77 | including an mbox message separator line (the "From " line) as the first line. | ||||
78 | |||||
79 | C<parse_now> specifies whether or not to create the MIME tree | ||||
80 | at object-creation time or later as necessary. | ||||
81 | |||||
82 | The I<parse_now> option, by default, is set to false (0). | ||||
83 | This allows SpamAssassin to not have to generate the tree of | ||||
84 | Mail::SpamAssassin::Message::Node objects and their related data if the | ||||
85 | tree is not going to be used. This is handy, for instance, when running | ||||
86 | C<spamassassin -d>, which only needs the pristine header and body which | ||||
87 | is always handled when the object is created. | ||||
88 | |||||
89 | C<subparse> specifies how many MIME recursion levels should be parsed. | ||||
90 | Defaults to 20. | ||||
91 | |||||
92 | =cut | ||||
93 | |||||
94 | # month mappings (ripped from Util.pm) | ||||
95 | 1 | 14µs | my %MONTH = (jan => 1, feb => 2, mar => 3, apr => 4, may => 5, jun => 6, | ||
96 | jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12); | ||||
97 | |||||
98 | # day of week mapping (starting from zero) | ||||
99 | 1 | 4µs | my @DAY_OF_WEEK = qw/Sun Mon Tue Wed Thu Fri Sat/ ; | ||
100 | |||||
101 | # spent 4.67s (2.35+2.32) within Mail::SpamAssassin::Message::new which was called 234 times, avg 20.0ms/call:
# 234 times (2.35s+2.32s) by Mail::SpamAssassin::parse at line 551 of Mail/SpamAssassin.pm, avg 20.0ms/call | ||||
102 | 234 | 658µs | my $class = shift; | ||
103 | 234 | 631µs | $class = ref($class) || $class; | ||
104 | |||||
105 | 234 | 543µs | my($opts) = @_; | ||
106 | 234 | 976µs | my $message = defined $opts->{'message'} ? $opts->{'message'} : \*STDIN; | ||
107 | 234 | 784µs | my $parsenow = $opts->{'parsenow'} || 0; | ||
108 | 234 | 627µs | my $normalize = $opts->{'normalize'} || 0; | ||
109 | |||||
110 | # Specifies whether or not to parse message/rfc822 parts into its own tree. | ||||
111 | # If the # > 0, it'll subparse, otherwise it won't. By default, do twenty | ||||
112 | # levels deep. | ||||
113 | 234 | 744µs | my $subparse = defined $opts->{'subparse'} ? $opts->{'subparse'} : 20; | ||
114 | |||||
115 | 234 | 3.10ms | 234 | 7.26ms | my $self = $class->SUPER::new({normalize=>$normalize}); # spent 7.26ms making 234 calls to Mail::SpamAssassin::Message::Node::new, avg 31µs/call |
116 | |||||
117 | 234 | 779µs | $self->{tmpfiles} = []; | ||
118 | 234 | 926µs | $self->{pristine_headers} = ''; | ||
119 | 234 | 783µs | $self->{pristine_body} = ''; | ||
120 | 234 | 1.04ms | $self->{mime_boundary_state} = {}; | ||
121 | 234 | 770µs | $self->{line_ending} = "\012"; | ||
122 | 234 | 706µs | $self->{master_deadline} = $opts->{'master_deadline'}; | ||
123 | 234 | 622µs | $self->{suppl_attrib} = $opts->{'suppl_attrib'}; | ||
124 | |||||
125 | 234 | 519µs | if ($self->{suppl_attrib}) { # caller-provided additional information | ||
126 | # pristine_body_length is currently used by an eval test check_body_length | ||||
127 | # Possible To-Do: Base the length on the @message array later down? | ||||
128 | if (defined $self->{suppl_attrib}{body_size}) { | ||||
129 | # Optional info provided by a caller; should reflect the original | ||||
130 | # message body size if provided, and as such it may differ from the | ||||
131 | # $self->{pristine_body} size, e.g. when the caller passed a truncated | ||||
132 | # message to SpamAssassin, or when counting line-endings differently. | ||||
133 | $self->{pristine_body_length} = $self->{suppl_attrib}{body_size}; | ||||
134 | } | ||||
135 | if (ref $self->{suppl_attrib}{mimepart_digests}) { | ||||
136 | # Optional info provided by a caller: an array of digest codes (e.g. SHA1) | ||||
137 | # of each MIME part. Should reflect the original message if provided. | ||||
138 | # As such it may differ from digests calculated by get_mimepart_digests(), | ||||
139 | # e.g. when the caller passed a truncated message to SpamAssassin. | ||||
140 | $self->{mimepart_digests} = $self->{suppl_attrib}{mimepart_digests}; | ||||
141 | } | ||||
142 | } | ||||
143 | |||||
144 | 234 | 577µs | bless($self,$class); | ||
145 | |||||
146 | # create the metadata holder class | ||||
147 | 234 | 2.73ms | 234 | 4.52ms | $self->{metadata} = Mail::SpamAssassin::Message::Metadata->new($self); # spent 4.52ms making 234 calls to Mail::SpamAssassin::Message::Metadata::new, avg 19µs/call |
148 | |||||
149 | # Ok, go ahead and do the message "parsing" | ||||
150 | |||||
151 | # protect it from abuse ... | ||||
152 | 234 | 457µs | local $_; | ||
153 | |||||
154 | # Figure out how the message was passed to us, and deal with it. | ||||
155 | 234 | 488µs | my @message; | ||
156 | 234 | 1.36ms | if (ref $message eq 'ARRAY') { | ||
157 | 468 | 53.3ms | @message = @{$message}; | ||
158 | } | ||||
159 | elsif (ref($message) eq 'GLOB' || ref($message) =~ /^IO::/) { | ||||
160 | if (defined fileno $message) { | ||||
161 | |||||
162 | # sysread+split avoids a Perl I/O bug (Bug 5985) | ||||
163 | # and is faster than (<$message>) by 10..25 % | ||||
164 | # (a drawback is a short-term double storage of a text in $raw_str) | ||||
165 | # | ||||
166 | my($nread,$raw_str); $raw_str = ''; | ||||
167 | while ( $nread=sysread($message, $raw_str, 16384, length $raw_str) ) { } | ||||
168 | defined $nread or die "error reading: $!"; | ||||
169 | @message = split(/^/m, $raw_str, -1); | ||||
170 | |||||
171 | if ($raw_str eq '') { | ||||
172 | dbg("message: empty message read"); | ||||
173 | } elsif (length($raw_str) > 128*1024) { | ||||
174 | # ditch rarely used large chunks of allocated memory, Bug 6514 | ||||
175 | # http://www.perlmonks.org/?node_id=803515 | ||||
176 | # about 97% of mail messages are below 128 kB, | ||||
177 | # about 98% of mail messages are below 256 kB (2010 statistics) | ||||
178 | # dbg("message: deallocating %.2f MB", length($raw_str)/1024/1024); | ||||
179 | undef $raw_str; | ||||
180 | } | ||||
181 | } | ||||
182 | } | ||||
183 | elsif (ref $message eq 'SCALAR') { | ||||
184 | @message = split(/^/m, $$message, -1); | ||||
185 | } | ||||
186 | elsif (ref $message) { | ||||
187 | dbg("message: Input is a reference of unknown type!"); | ||||
188 | } | ||||
189 | elsif (defined $message) { | ||||
190 | @message = split(/^/m, $message, -1); | ||||
191 | } | ||||
192 | |||||
193 | # Pull off mbox and mbx separators | ||||
194 | # also deal with null messages | ||||
195 | 234 | 4.68ms | 468 | 1.57ms | if (!@message) { # spent 1.57ms making 468 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
196 | # bug 4884: | ||||
197 | # if we get here, it means that the input was null, so fake the message | ||||
198 | # content as a single newline... | ||||
199 | @message = ("\n"); | ||||
200 | } elsif ($message[0] =~ /^From\s+(?!:)/) { | ||||
201 | # careful not to confuse with obsolete syntax which allowed WSP before ':' | ||||
202 | # mbox formated mailbox | ||||
203 | $self->{'mbox_sep'} = shift @message; | ||||
204 | } elsif ($message[0] =~ MBX_SEPARATOR) { | ||||
205 | $_ = shift @message; | ||||
206 | |||||
207 | # Munge the mbx message separator into mbox format as a sort of | ||||
208 | # de facto portability standard in SA's internals. We need to | ||||
209 | # to this so that Mail::SpamAssassin::Util::parse_rfc822_date | ||||
210 | # can parse the date string... | ||||
211 | if (/([\s\d]\d)-([a-zA-Z]{3})-(\d{4})\s(\d{2}):(\d{2}):(\d{2})/) { | ||||
212 | # $1 = day of month | ||||
213 | # $2 = month (text) | ||||
214 | # $3 = year | ||||
215 | # $4 = hour | ||||
216 | # $5 = min | ||||
217 | # $6 = sec | ||||
218 | my @arr = localtime(timelocal($6,$5,$4,$1,$MONTH{lc($2)}-1,$3)); | ||||
219 | my $address; | ||||
220 | foreach (@message) { | ||||
221 | if (/^From:[^<]*<([^>]+)>/) { | ||||
222 | $address = $1; | ||||
223 | last; | ||||
224 | } elsif (/^From:\s*([^<> ]+)/) { | ||||
225 | $address = $1; | ||||
226 | last; | ||||
227 | } | ||||
228 | } | ||||
229 | $self->{'mbox_sep'} = "From $address $DAY_OF_WEEK[$arr[6]] $2 $1 $4:$5:$6 $3\n"; | ||||
230 | } | ||||
231 | } | ||||
232 | |||||
233 | # bug 4363 | ||||
234 | # Check to see if we should do CRLF instead of just LF | ||||
235 | # For now, just check the first and last line and do whatever it does | ||||
236 | 234 | 3.92ms | 468 | 1.41ms | if (@message && ($message[0] =~ /\015\012/ || $message[-1] =~ /\015\012/)) { # spent 1.41ms making 468 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
237 | $self->{line_ending} = "\015\012"; | ||||
238 | dbg("message: line ending changed to CRLF"); | ||||
239 | } | ||||
240 | |||||
241 | # Is a CRLF -> LF line endings conversion necessary? | ||||
242 | 234 | 1.00ms | my $squash_crlf = $self->{line_ending} eq "\015\012"; | ||
243 | |||||
244 | # Go through all the header fields of the message | ||||
245 | 234 | 510µs | my $hdr_errors = 0; | ||
246 | 234 | 406µs | my $header; | ||
247 | 234 | 429µs | for (;;) { | ||
248 | # make sure not to lose the last header field when there is no body | ||||
249 | 13487 | 42.3ms | my $eof = !@message; | ||
250 | 13487 | 45.5ms | my $current = $eof ? "\n" : shift @message; | ||
251 | |||||
252 | 13487 | 252ms | 13487 | 51.5ms | if ( $current =~ /^[ \t]/ ) { # spent 51.5ms making 13487 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
253 | # This wasn't useful in terms of a rule, but we may want to treat it | ||||
254 | # specially at some point. Perhaps ignore it? | ||||
255 | #unless ($current =~ /\S/) { | ||||
256 | # $self->{'obsolete_folding_whitespace'} = 1; | ||||
257 | #} | ||||
258 | |||||
259 | 5843 | 10.0ms | $header = '' if !defined $header; # header starts with a continuation!? | ||
260 | 5843 | 16.4ms | $header .= $current; # append continuations, no matter what | ||
261 | 5843 | 37.9ms | $self->{'pristine_headers'} .= $current; | ||
262 | } | ||||
263 | else { # not a continuation | ||||
264 | # Ok, there's a header here, let's go ahead and add it in. | ||||
265 | 7644 | 26.9ms | if (defined $header) { # deal with a previous header field | ||
266 | 7410 | 72.1ms | my ($key, $value) = split (/:/s, $header, 2); | ||
267 | |||||
268 | # If it's not a valid header (aka: not in the form "foo:bar"), skip it. | ||||
269 | 7410 | 29.2ms | if (defined $value) { | ||
270 | # CRLF -> LF line-endings conversion if necessary | ||||
271 | 7410 | 11.6ms | $value =~ s/\015\012/\012/sg if $squash_crlf; | ||
272 | 7410 | 99.9ms | 7410 | 22.9ms | $key =~ s/[ \t]+\z//; # strip WSP before colon, obsolete rfc822 syn # spent 22.9ms making 7410 calls to Mail::SpamAssassin::Message::CORE:subst, avg 3µs/call |
273 | # limit the length of the pairs we store | ||||
274 | 7410 | 23.9ms | if (length($key) > MAX_HEADER_KEY_LENGTH) { | ||
275 | $key = substr($key, 0, MAX_HEADER_KEY_LENGTH); | ||||
276 | $self->{'truncated_header'} = 1; | ||||
277 | } | ||||
278 | 7410 | 14.4ms | if (length($value) > MAX_HEADER_VALUE_LENGTH) { | ||
279 | $value = substr($value, 0, MAX_HEADER_VALUE_LENGTH); | ||||
280 | $self->{'truncated_header'} = 1; | ||||
281 | } | ||||
282 | 7410 | 50.0ms | 7410 | 1.83s | $self->header($key, $value); # spent 1.83s making 7410 calls to Mail::SpamAssassin::Message::Node::header, avg 247µs/call |
283 | } | ||||
284 | } | ||||
285 | |||||
286 | 7644 | 227ms | 15054 | 43.4ms | if ($current =~ /^\r?$/) { # a regular end of a header section # spent 43.4ms making 15054 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
287 | 234 | 875µs | if ($eof) { | ||
288 | $self->{'missing_head_body_separator'} = 1; | ||||
289 | } else { | ||||
290 | 234 | 764µs | $self->{'pristine_headers'} .= $current; | ||
291 | } | ||||
292 | 234 | 718µs | last; | ||
293 | } | ||||
294 | elsif ($current =~ /^--/) { # mime boundary encountered, bail out | ||||
295 | $self->{'missing_head_body_separator'} = 1; | ||||
296 | unshift(@message, $current); | ||||
297 | last; | ||||
298 | } | ||||
299 | # should we assume entering a body on encountering invalid header field? | ||||
300 | else { | ||||
301 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
302 | 7410 | 107ms | 7410 | 45.9ms | if ($current !~ /^[\041-\071\073-\176]+[ \t]*:/) { # spent 45.9ms making 7410 calls to Mail::SpamAssassin::Message::CORE:match, avg 6µs/call |
303 | # A field name MUST be composed of printable US-ASCII characters | ||||
304 | # (i.e., characters that have values between 33 (041) and 126 (176), | ||||
305 | # inclusive), except colon (072). Obsolete header field syntax | ||||
306 | # allowed WSP before a colon. | ||||
307 | if (++$hdr_errors <= 3) { | ||||
308 | # just consume but ignore a few invalid header fields | ||||
309 | } else { # enough is enough... | ||||
310 | $self->{'missing_head_body_separator'} = 1; | ||||
311 | unshift(@message, $current); | ||||
312 | last; | ||||
313 | } | ||||
314 | } | ||||
315 | } | ||||
316 | |||||
317 | # start collecting a new header field | ||||
318 | 7410 | 16.6ms | $header = $current; | ||
319 | 7410 | 49.2ms | $self->{'pristine_headers'} .= $current; | ||
320 | } | ||||
321 | } | ||||
322 | 234 | 609µs | undef $header; | ||
323 | |||||
324 | # Store the pristine body for later -- store as a copy since @message | ||||
325 | # will get modified below | ||||
326 | 234 | 25.6ms | $self->{'pristine_body'} = join('', @message); | ||
327 | |||||
328 | 234 | 1.04ms | if (!defined $self->{pristine_body_length}) { | ||
329 | 234 | 1.25ms | $self->{'pristine_body_length'} = length $self->{'pristine_body'}; | ||
330 | } | ||||
331 | |||||
332 | # iterate over lines in reverse order | ||||
333 | # merge multiple blank lines into a single one | ||||
334 | 234 | 483µs | my $start; | ||
335 | 234 | 230ms | for (my $cnt=$#message; $cnt>=0; $cnt--) { | ||
336 | # CRLF -> LF line-endings conversion if necessary | ||||
337 | 71012 | 111ms | $message[$cnt] =~ s/\015\012\z/\012/ if $squash_crlf; | ||
338 | |||||
339 | # line is blank | ||||
340 | 71012 | 881ms | 71012 | 239ms | if ($message[$cnt] =~ /^\s*$/) { # spent 239ms making 71012 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
341 | # /^\s*$/ is about 5% faster then !/\S/, but still expensive here | ||||
342 | 4562 | 12.7ms | if (!defined $start) { | ||
343 | 3329 | 5.66ms | $start=$cnt; | ||
344 | } | ||||
345 | 4562 | 9.02ms | next unless $cnt == 0; | ||
346 | } | ||||
347 | |||||
348 | # line is not blank, or we've reached the beginning | ||||
349 | |||||
350 | # if we've got a series of blank lines, get rid of them | ||||
351 | 66480 | 118ms | if (defined $start) { | ||
352 | 3329 | 5.66ms | my $max_blank_lines = 20; | ||
353 | 3329 | 5.79ms | my $num = $start-$cnt; | ||
354 | 3329 | 5.48ms | if ($num > $max_blank_lines) { | ||
355 | 6 | 38µs | splice @message, $cnt+2, $num-$max_blank_lines; | ||
356 | } | ||||
357 | 3329 | 27.3ms | undef $start; | ||
358 | } | ||||
359 | } | ||||
360 | |||||
361 | # Figure out the boundary | ||||
362 | 234 | 493µs | my ($boundary); | ||
363 | 234 | 5.32ms | 468 | 56.5ms | ($self->{'type'}, $boundary) = Mail::SpamAssassin::Util::parse_content_type($self->header('content-type')); # spent 42.4ms making 234 calls to Mail::SpamAssassin::Util::parse_content_type, avg 181µs/call
# spent 14.1ms making 234 calls to Mail::SpamAssassin::Message::Node::header, avg 60µs/call |
364 | 234 | 2.70ms | 234 | 2.30ms | dbg("message: main message type: ".$self->{'type'}); # spent 2.30ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 10µs/call |
365 | |||||
366 | # dbg("message: \$message[0]: \"" . $message[0] . "\""); | ||||
367 | |||||
368 | # bug 6845: if main message type is multipart and the message body does not begin with | ||||
369 | # either a blank line or the boundary (if defined), insert a blank line | ||||
370 | # to ensure proper parsing - do not consider MIME headers at the beginning of the body | ||||
371 | # to be part of the message headers. | ||||
372 | 234 | 5.31ms | 427 | 1.82ms | if ($self->{'type'} =~ /^multipart\//i && $#message > 0 && $message[0] =~ /\S/) # spent 1.82ms making 427 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
373 | { | ||||
374 | 170 | 18.9ms | 340 | 9.63ms | if (!defined $boundary || $message[0] !~ /^--\Q$boundary\E/) # spent 9.05ms making 170 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 53µs/call
# spent 585µs making 170 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
375 | { | ||||
376 | 39 | 259µs | 39 | 295µs | dbg("message: Inserting blank line at top of body to ensure correct multipart MIME parsing"); # spent 295µs making 39 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call |
377 | 39 | 190µs | unshift(@message, "\012"); | ||
378 | } | ||||
379 | } | ||||
380 | |||||
381 | # dbg("message: \$message[0]: \"" . $message[0] . "\""); | ||||
382 | # dbg("message: \$message[1]: \"" . $message[1] . "\""); | ||||
383 | |||||
384 | # parse queue, simple array of parts to parse: | ||||
385 | # 0: part object, already in the tree | ||||
386 | # 1: boundary used to focus body parsing | ||||
387 | # 2: message content | ||||
388 | # 3: how many MIME subparts to parse down | ||||
389 | # | ||||
390 | 234 | 1.94ms | $self->{'parse_queue'} = [ [ $self, $boundary, \@message, $subparse ] ]; | ||
391 | |||||
392 | # If the message does need to get parsed, save off a copy of the body | ||||
393 | # in a format we can easily parse later so we don't have to rip from | ||||
394 | # pristine_body ... If we do want to parse now, go ahead and do so ... | ||||
395 | # | ||||
396 | 234 | 453µs | if ($parsenow) { | ||
397 | $self->parse_body(); | ||||
398 | } | ||||
399 | |||||
400 | 234 | 4.80ms | $self; | ||
401 | } | ||||
402 | |||||
403 | # --------------------------------------------------------------------------- | ||||
404 | |||||
405 | =item find_parts() | ||||
406 | |||||
407 | Used to search the tree for specific MIME parts. See | ||||
408 | I<Mail::SpamAssassin::Message::Node> for more details. | ||||
409 | |||||
410 | =cut | ||||
411 | |||||
412 | # Used to find any MIME parts whose simple content-type matches a given regexp | ||||
413 | # Searches it's own and any children parts. Returns an array of MIME | ||||
414 | # objects which match. | ||||
415 | # | ||||
416 | # spent 2.74s (16.0ms+2.72) within Mail::SpamAssassin::Message::find_parts which was called 702 times, avg 3.90ms/call:
# 702 times (16.0ms+2.72s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1111, avg 3.90ms/call | ||||
417 | 702 | 1.56ms | my $self = shift; | ||
418 | |||||
419 | # ok, we need to do the parsing now... | ||||
420 | 702 | 3.41ms | 234 | 2.58s | $self->parse_body() if (exists $self->{'parse_queue'}); # spent 2.58s making 234 calls to Mail::SpamAssassin::Message::parse_body, avg 11.0ms/call |
421 | |||||
422 | # and pass through to the Message::Node version of the method | ||||
423 | 702 | 10.5ms | 702 | 136ms | return $self->SUPER::find_parts(@_); # spent 136ms making 702 calls to Mail::SpamAssassin::Message::Node::find_parts, avg 194µs/call |
424 | } | ||||
425 | |||||
426 | # --------------------------------------------------------------------------- | ||||
427 | |||||
428 | =item get_pristine_header() | ||||
429 | |||||
430 | Returns pristine headers of the message. If no specific header name | ||||
431 | is given as a parameter (case-insensitive), then all headers will be | ||||
432 | returned as a scalar, including the blank line at the end of the headers. | ||||
433 | |||||
434 | If called in an array context, an array will be returned with each | ||||
435 | specific header in a different element. In a scalar context, the last | ||||
436 | specific header is returned. | ||||
437 | |||||
438 | ie: If 'Subject' is specified as the header, and there are 2 Subject | ||||
439 | headers in a message, the last/bottom one in the message is returned in | ||||
440 | scalar context or both are returned in array context. | ||||
441 | |||||
442 | Btw, returning the last header field (not the first) happens to be consistent | ||||
443 | with DKIM signatures, which search for and cover multiple header fields | ||||
444 | bottom-up according to the 'h' tag. Let's keep it this way. | ||||
445 | |||||
446 | Note: the returned header will include the ending newline and any embedded | ||||
447 | whitespace folding. | ||||
448 | |||||
449 | =cut | ||||
450 | |||||
451 | # spent 3.62ms within Mail::SpamAssassin::Message::get_pristine_header which was called 234 times, avg 15µs/call:
# 234 times (3.62ms+0s) by Mail::SpamAssassin::PerMsgStatus::_get at line 1913 of Mail/SpamAssassin/PerMsgStatus.pm, avg 15µs/call | ||||
452 | 234 | 650µs | my ($self, $hdr) = @_; | ||
453 | |||||
454 | 234 | 3.30ms | return $self->{pristine_headers} if !defined $hdr || $hdr eq ''; | ||
455 | my(@ret) = | ||||
456 | $self->{pristine_headers} =~ /^\Q$hdr\E[ \t]*:[ \t]*(.*?\n(?![ \t]))/smgi; | ||||
457 | # taintedness is retained by "use re 'taint'" (fix in bug 5283 now redundant) | ||||
458 | if (!@ret) { | ||||
459 | return $self->get_header($hdr); | ||||
460 | } elsif (wantarray) { | ||||
461 | return @ret; | ||||
462 | } else { | ||||
463 | return $ret[-1]; | ||||
464 | } | ||||
465 | } | ||||
466 | |||||
467 | =item get_mbox_separator() | ||||
468 | |||||
469 | Returns the mbox separator found in the message, or undef if there | ||||
470 | wasn't one. | ||||
471 | |||||
472 | =cut | ||||
473 | |||||
474 | sub get_mbox_separator { | ||||
475 | return $_[0]->{mbox_sep}; | ||||
476 | } | ||||
477 | |||||
478 | =item get_body() | ||||
479 | |||||
480 | Returns an array of the pristine message body, one line per array element. | ||||
481 | |||||
482 | =cut | ||||
483 | |||||
484 | sub get_body { | ||||
485 | my ($self) = @_; | ||||
486 | my @ret = split(/^/m, $self->{pristine_body}); | ||||
487 | return \@ret; | ||||
488 | } | ||||
489 | |||||
490 | # --------------------------------------------------------------------------- | ||||
491 | |||||
492 | =item get_pristine() | ||||
493 | |||||
494 | Returns a scalar of the entire pristine message. | ||||
495 | |||||
496 | =cut | ||||
497 | |||||
498 | sub get_pristine { | ||||
499 | my ($self) = @_; | ||||
500 | return $self->{pristine_headers} . $self->{pristine_body}; | ||||
501 | } | ||||
502 | |||||
503 | =item get_pristine_body() | ||||
504 | |||||
505 | Returns a scalar of the pristine message body. | ||||
506 | |||||
507 | =cut | ||||
508 | |||||
509 | # spent 6.21ms within Mail::SpamAssassin::Message::get_pristine_body which was called 555 times, avg 11µs/call:
# 555 times (6.21ms+0s) by Mail::SpamAssassin::Plugin::Bayes::get_msgid at line 998 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 11µs/call | ||||
510 | 555 | 1.33ms | my ($self) = @_; | ||
511 | 555 | 5.87ms | return $self->{pristine_body}; | ||
512 | } | ||||
513 | |||||
514 | # --------------------------------------------------------------------------- | ||||
515 | |||||
516 | =item extract_message_metadata($permsgstatus) | ||||
517 | |||||
518 | =cut | ||||
519 | |||||
520 | # spent 3.81s (8.55ms+3.81) within Mail::SpamAssassin::Message::extract_message_metadata which was called 468 times, avg 8.15ms/call:
# 234 times (6.15ms+3.81s) by Mail::SpamAssassin::PerMsgStatus::extract_message_metadata at line 1703 of Mail/SpamAssassin/PerMsgStatus.pm, avg 16.3ms/call
# 234 times (2.41ms+0s) by Mail::SpamAssassin::Plugin::Bayes::get_body_from_msg at line 1025 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 10µs/call | ||||
521 | 468 | 1.03ms | my ($self, $permsgstatus) = @_; | ||
522 | |||||
523 | # do this only once per message, it can be expensive | ||||
524 | 468 | 15.8ms | return if $self->{already_extracted_metadata}; | ||
525 | 234 | 1.26ms | $self->{already_extracted_metadata} = 1; | ||
526 | |||||
527 | 234 | 3.65ms | 234 | 3.81s | $self->{metadata}->extract ($self, $permsgstatus); # spent 3.81s making 234 calls to Mail::SpamAssassin::Message::Metadata::extract, avg 16.3ms/call |
528 | } | ||||
529 | |||||
530 | # --------------------------------------------------------------------------- | ||||
531 | |||||
532 | =item $str = get_metadata($hdr) | ||||
533 | |||||
534 | =cut | ||||
535 | |||||
536 | # spent 23.9ms within Mail::SpamAssassin::Message::get_metadata which was called 1876 times, avg 13µs/call:
# 1642 times (21.2ms+0s) by Mail::SpamAssassin::PerMsgStatus::_get at line 1987 of Mail/SpamAssassin/PerMsgStatus.pm, avg 13µs/call
# 234 times (2.71ms+0s) by Mail::SpamAssassin::PerMsgStatus::extract_message_metadata at line 1741 of Mail/SpamAssassin/PerMsgStatus.pm, avg 12µs/call | ||||
537 | 1876 | 4.61ms | my ($self, $hdr) = @_; | ||
538 | 1876 | 4.03ms | if (!$self->{metadata}) { | ||
539 | warn "metadata: oops! get_metadata() called after finish_metadata()"; return; | ||||
540 | } | ||||
541 | # dbg("message: get_metadata - %s: %s", $hdr, defined $_ ? $_ : '<undef>') | ||||
542 | # for $self->{metadata}->{strings}->{lc $hdr}; | ||||
543 | |||||
544 | 1876 | 30.2ms | $self->{metadata}->{strings}->{lc $hdr}; | ||
545 | } | ||||
546 | |||||
547 | =item put_metadata($hdr, $text) | ||||
548 | |||||
549 | =cut | ||||
550 | |||||
551 | # spent 17.4ms within Mail::SpamAssassin::Message::put_metadata which was called 936 times, avg 19µs/call:
# 234 times (4.76ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 284 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 20µs/call
# 234 times (4.34ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 280 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 19µs/call
# 234 times (4.18ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 282 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 18µs/call
# 234 times (4.12ms+0s) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 286 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 18µs/call | ||||
552 | 936 | 4.41ms | my ($self, $hdr, $text) = @_; | ||
553 | 936 | 2.00ms | if (!$self->{metadata}) { | ||
554 | warn "metadata: oops! put_metadata() called after finish_metadata()"; return; | ||||
555 | } | ||||
556 | # dbg("message: put_metadata - %s: %s", $hdr, $text); | ||||
557 | 936 | 13.2ms | $self->{metadata}->{strings}->{lc $hdr} = $text; | ||
558 | } | ||||
559 | |||||
560 | =item delete_metadata($hdr) | ||||
561 | |||||
562 | =cut | ||||
563 | |||||
564 | sub delete_metadata { | ||||
565 | my ($self, $hdr) = @_; | ||||
566 | if (!$self->{metadata}) { | ||||
567 | warn "metadata: oops! delete_metadata() called after finish_metadata()"; return; | ||||
568 | } | ||||
569 | delete $self->{metadata}->{strings}->{lc $hdr}; | ||||
570 | } | ||||
571 | |||||
572 | =item $str = get_all_metadata() | ||||
573 | |||||
574 | =cut | ||||
575 | |||||
576 | # spent 27.2ms (26.2+994µs) within Mail::SpamAssassin::Message::get_all_metadata which was called 234 times, avg 116µs/call:
# 234 times (26.2ms+994µs) by Mail::SpamAssassin::Plugin::Bayes::_tokenize_headers at line 1304 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 116µs/call | ||||
577 | 234 | 615µs | my ($self) = @_; | ||
578 | |||||
579 | 234 | 859µs | if (!$self->{metadata}) { | ||
580 | warn "metadata: oops! get_all_metadata() called after finish_metadata()"; return; | ||||
581 | } | ||||
582 | 234 | 465µs | my @ret; | ||
583 | 234 | 962µs | my $keys_ref = $self->{metadata}->{strings}; | ||
584 | 234 | 4.59ms | 234 | 994µs | foreach my $key (sort keys %$keys_ref) { # spent 994µs making 234 calls to Mail::SpamAssassin::Message::CORE:sort, avg 4µs/call |
585 | 936 | 5.59ms | my $val = $keys_ref->{$key}; | ||
586 | 936 | 1.61ms | $val = '' if !defined $val; | ||
587 | 936 | 9.63ms | push (@ret, "$key: $val\n"); | ||
588 | } | ||||
589 | 234 | 3.44ms | return (wantarray ? @ret : join('', @ret)); | ||
590 | } | ||||
591 | |||||
592 | # --------------------------------------------------------------------------- | ||||
593 | |||||
594 | =item finish_metadata() | ||||
595 | |||||
596 | Destroys the metadata for this message. Once a message has been | ||||
597 | scanned fully, the metadata is no longer required. Destroying | ||||
598 | this will free up some memory. | ||||
599 | |||||
600 | =cut | ||||
601 | |||||
602 | # spent 17.5ms (6.32+11.2) within Mail::SpamAssassin::Message::finish_metadata which was called 234 times, avg 75µs/call:
# 234 times (6.32ms+11.2ms) by Mail::SpamAssassin::Message::finish at line 620, avg 75µs/call | ||||
603 | 234 | 514µs | my ($self) = @_; | ||
604 | 234 | 2.49ms | if (defined ($self->{metadata})) { | ||
605 | 234 | 2.65ms | 234 | 11.2ms | $self->{metadata}->finish(); # spent 11.2ms making 234 calls to Mail::SpamAssassin::Message::Metadata::finish, avg 48µs/call |
606 | 234 | 798µs | delete $self->{metadata}; | ||
607 | } | ||||
608 | } | ||||
609 | |||||
610 | =item finish() | ||||
611 | |||||
612 | Clean up an object so that it can be destroyed. | ||||
613 | |||||
614 | =cut | ||||
615 | |||||
616 | # spent 124ms (103+20.8) within Mail::SpamAssassin::Message::finish which was called 234 times, avg 530µs/call:
# 234 times (103ms+20.8ms) by main::wanted at line 588 of /usr/local/bin/sa-learn, avg 530µs/call | ||||
617 | 234 | 552µs | my ($self) = @_; | ||
618 | |||||
619 | # Clean ourself up | ||||
620 | 234 | 1.89ms | 234 | 17.5ms | $self->finish_metadata(); # spent 17.5ms making 234 calls to Mail::SpamAssassin::Message::finish_metadata, avg 75µs/call |
621 | |||||
622 | # These will only be in the root Message node | ||||
623 | 234 | 1.72ms | delete $self->{'mime_boundary_state'}; | ||
624 | 234 | 549µs | delete $self->{'mbox_sep'}; | ||
625 | 234 | 519µs | delete $self->{'normalize'}; | ||
626 | 234 | 1.08ms | delete $self->{'pristine_body'}; | ||
627 | 234 | 996µs | delete $self->{'pristine_headers'}; | ||
628 | 234 | 717µs | delete $self->{'line_ending'}; | ||
629 | 234 | 588µs | delete $self->{'missing_head_body_separator'}; | ||
630 | |||||
631 | 234 | 1.10ms | my @toclean = ( $self ); | ||
632 | |||||
633 | # Go ahead and clean up all of the Message::Node parts | ||||
634 | 234 | 7.67ms | while (my $part = shift @toclean) { | ||
635 | # bug 5557: windows requires tmp file be closed before it can be rm'd | ||||
636 | 622 | 2.33ms | if (ref $part->{'raw'} eq 'GLOB') { | ||
637 | 16 | 396µs | 16 | 200µs | close($part->{'raw'}) or die "error closing input file: $!"; # spent 200µs making 16 calls to Mail::SpamAssassin::Message::CORE:close, avg 12µs/call |
638 | } | ||||
639 | |||||
640 | # bug 5858: avoid memory leak with deep MIME structure | ||||
641 | 622 | 1.38ms | if (defined ($part->{metadata})) { | ||
642 | $part->{metadata}->finish(); | ||||
643 | delete $part->{metadata}; | ||||
644 | } | ||||
645 | |||||
646 | 622 | 12.4ms | delete $part->{'headers'}; | ||
647 | 622 | 13.4ms | delete $part->{'raw_headers'}; | ||
648 | 622 | 5.72ms | delete $part->{'header_order'}; | ||
649 | 622 | 31.6ms | delete $part->{'raw'}; | ||
650 | 622 | 1.87ms | delete $part->{'decoded'}; | ||
651 | 622 | 1.51ms | delete $part->{'rendered'}; | ||
652 | 622 | 1.53ms | delete $part->{'visible_rendered'}; | ||
653 | 622 | 1.39ms | delete $part->{'invisible_rendered'}; | ||
654 | 622 | 1.74ms | delete $part->{'type'}; | ||
655 | 622 | 1.41ms | delete $part->{'rendered_type'}; | ||
656 | |||||
657 | # if there are children nodes, add them to the queue of nodes to clean up | ||||
658 | 622 | 1.65ms | if (exists $part->{'body_parts'}) { | ||
659 | 394 | 1.53ms | push(@toclean, @{$part->{'body_parts'}}); | ||
660 | 197 | 546µs | delete $part->{'body_parts'}; | ||
661 | } | ||||
662 | } | ||||
663 | |||||
664 | # delete temporary files | ||||
665 | 234 | 2.91ms | if ($self->{'tmpfiles'}) { | ||
666 | 468 | 2.03ms | for my $fn (@{$self->{'tmpfiles'}}) { | ||
667 | 16 | 3.34ms | 16 | 3.13ms | unlink($fn) or warn "cannot unlink $fn: $!"; # spent 3.13ms making 16 calls to Mail::SpamAssassin::Message::CORE:unlink, avg 196µs/call |
668 | } | ||||
669 | 234 | 639µs | delete $self->{'tmpfiles'}; | ||
670 | } | ||||
671 | } | ||||
672 | |||||
673 | # also use a DESTROY method, just to ensure (as much as possible) that | ||||
674 | # temporary files are deleted even if the finish() method is omitted | ||||
675 | # spent 1.45ms within Mail::SpamAssassin::Message::DESTROY which was called 79 times, avg 18µs/call:
# 79 times (1.45ms+0s) by main::wanted at line 589 of /usr/local/bin/sa-learn, avg 18µs/call | ||||
676 | 79 | 189µs | my $self = shift; | ||
677 | # best practices: prevent potential calls to eval and to system routines | ||||
678 | # in code of a DESTROY method from clobbering global variables $@ and $! | ||||
679 | 79 | 532µs | local($@,$!); # keep outer error handling unaffected by DESTROY | ||
680 | 79 | 913µs | if ($self->{'tmpfiles'}) { | ||
681 | for my $fn (@{$self->{'tmpfiles'}}) { | ||||
682 | unlink($fn) or dbg("message: cannot unlink $fn: $!"); | ||||
683 | } | ||||
684 | } | ||||
685 | } | ||||
686 | |||||
687 | # --------------------------------------------------------------------------- | ||||
688 | |||||
689 | =item receive_date() | ||||
690 | |||||
691 | Return a time_t value with the received date of the current message, | ||||
692 | or current time if received time couldn't be determined. | ||||
693 | |||||
694 | =cut | ||||
695 | |||||
696 | # spent 3.19s (29.4ms+3.16) within Mail::SpamAssassin::Message::receive_date which was called 555 times, avg 5.75ms/call:
# 321 times (8.76ms+1.78s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1239 of Mail/SpamAssassin/Plugin/TxRep.pm, avg 5.59ms/call
# 234 times (20.6ms+1.38s) by Mail::SpamAssassin::Plugin::Bayes::_learn_trapped at line 465 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 5.97ms/call | ||||
697 | 555 | 1.32ms | my($self) = @_; | ||
698 | |||||
699 | 555 | 13.3ms | 1110 | 3.16s | return Mail::SpamAssassin::Util::receive_date(scalar $self->get_all_headers(0,1)); # spent 2.80s making 555 calls to Mail::SpamAssassin::Message::Node::get_all_headers, avg 5.04ms/call
# spent 362ms making 555 calls to Mail::SpamAssassin::Util::receive_date, avg 653µs/call |
700 | } | ||||
701 | |||||
702 | # --------------------------------------------------------------------------- | ||||
703 | |||||
704 | =back | ||||
705 | |||||
706 | =head1 PARSING METHODS, NON-PUBLIC | ||||
707 | |||||
708 | These methods take a RFC2822-esque formatted message and create a tree | ||||
709 | with all of the MIME body parts included. Those parts will be decoded | ||||
710 | as necessary, and text/html parts will be rendered into a standard text | ||||
711 | format, suitable for use in SpamAssassin. | ||||
712 | |||||
713 | =over 4 | ||||
714 | |||||
715 | =item parse_body() | ||||
716 | |||||
717 | parse_body() passes the body part that was passed in onto the | ||||
718 | correct part parser, either _parse_multipart() for multipart/* parts, | ||||
719 | or _parse_normal() for everything else. Multipart sections become the | ||||
720 | root of sub-trees, while everything else becomes a leaf in the tree. | ||||
721 | |||||
722 | For multipart messages, the first call to parse_body() doesn't create a | ||||
723 | new sub-tree and just uses the parent node to contain children. All other | ||||
724 | calls to parse_body() will cause a new sub-tree root to be created and | ||||
725 | children will exist underneath that root. (this is just so the tree | ||||
726 | doesn't have a root node which points at the actual root node ...) | ||||
727 | |||||
728 | =cut | ||||
729 | |||||
730 | # spent 2.58s (64.6ms+2.52) within Mail::SpamAssassin::Message::parse_body which was called 234 times, avg 11.0ms/call:
# 234 times (64.6ms+2.52s) by Mail::SpamAssassin::Message::find_parts at line 420, avg 11.0ms/call | ||||
731 | 234 | 542µs | my($self) = @_; | ||
732 | |||||
733 | # This shouldn't happen, but just in case, abort. | ||||
734 | 234 | 695µs | return unless (exists $self->{'parse_queue'}); | ||
735 | |||||
736 | 234 | 1.60ms | 234 | 1.52ms | dbg("message: ---- MIME PARSER START ----"); # spent 1.52ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
737 | |||||
738 | 1090 | 32.8ms | while (my $toparse = shift @{$self->{'parse_queue'}}) { | ||
739 | # multipart sections are required to have a boundary set ... If this | ||||
740 | # one doesn't, assume it's malformed and send it to be parsed as a | ||||
741 | # non-multipart section | ||||
742 | # | ||||
743 | 622 | 2.58ms | my ($msg, $boundary, $body, $subparse) = @$toparse; | ||
744 | |||||
745 | 622 | 10.6ms | 622 | 2.34ms | if ($msg->{'type'} =~ m{^multipart/}i && defined $boundary && $subparse > 0) { # spent 2.34ms making 622 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
746 | 197 | 1.76ms | 197 | 2.30s | $self->_parse_multipart($toparse); # spent 2.30s making 197 calls to Mail::SpamAssassin::Message::_parse_multipart, avg 11.7ms/call |
747 | } | ||||
748 | else { | ||||
749 | # If it's not multipart, go ahead and just deal with it. | ||||
750 | 425 | 3.36ms | 425 | 211ms | $self->_parse_normal($toparse); # spent 211ms making 425 calls to Mail::SpamAssassin::Message::_parse_normal, avg 495µs/call |
751 | |||||
752 | # bug 5041: process message/*, but exclude message/partial content types | ||||
753 | 425 | 4.56ms | 425 | 1.30ms | if ($msg->{'type'} =~ m{^message/(?!partial\z)}i && $subparse > 0) # spent 1.30ms making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
754 | { | ||||
755 | # Just decode the part, but we don't need the resulting string here. | ||||
756 | $msg->decode(0); | ||||
757 | |||||
758 | # bug 7125: decode and parse only message/rfc822 or message/global, | ||||
759 | # but do not treat other message/* content types (like the ones listed | ||||
760 | # here) as a message consisting of a header and a body, as they are not: | ||||
761 | # message/delivery-status, message/global-delivery-status, | ||||
762 | # message/feedback-report, message/global-headers, | ||||
763 | # message/global-disposition-notification, | ||||
764 | # message/disposition-notification, (and message/partial) | ||||
765 | |||||
766 | # bug 5051, bug 3748: check $msg->{decoded}: sometimes message/* parts | ||||
767 | # have no content, and we get stuck waiting for STDIN, which is bad. :( | ||||
768 | |||||
769 | if ($msg->{'type'} =~ m{^message/(?:rfc822|global)\z}i && | ||||
770 | defined $msg->{'decoded'} && $msg->{'decoded'} ne '') | ||||
771 | { | ||||
772 | # Ok, so this part is still semi-recursive, since M::SA::Message | ||||
773 | # calls M::SA::Message, but we don't subparse the new message, | ||||
774 | # and pull a sneaky "steal our child's queue" maneuver to deal | ||||
775 | # with it on our own time. Reference the decoded array directly | ||||
776 | # since it's faster. | ||||
777 | # | ||||
778 | my $msg_obj = Mail::SpamAssassin::Message->new({ | ||||
779 | message => $msg->{'decoded'}, | ||||
780 | parsenow => 0, | ||||
781 | normalize => $self->{normalize}, | ||||
782 | subparse => $subparse - 1, | ||||
783 | }); | ||||
784 | |||||
785 | # Add the new message to the current node | ||||
786 | $msg->add_body_part($msg_obj); | ||||
787 | |||||
788 | # now this is the sneaky bit ... steal the sub-message's parse_queue | ||||
789 | # and add it to ours. then we'll handle the sub-message in our | ||||
790 | # normal loop and get all the glory. muhaha. :) | ||||
791 | push(@{$self->{'parse_queue'}}, @{$msg_obj->{'parse_queue'}}); | ||||
792 | delete $msg_obj->{'parse_queue'}; | ||||
793 | |||||
794 | # Ok, we've subparsed, so go ahead and remove the raw and decoded | ||||
795 | # data because we won't need them anymore (the tree under this part | ||||
796 | # will have that data) | ||||
797 | if (ref $msg->{'raw'} eq 'GLOB') { | ||||
798 | # Make sure we close it if it's a temp file -- Bug 5166 | ||||
799 | close($msg->{'raw'}) | ||||
800 | or die "error closing input file: $!"; | ||||
801 | } | ||||
802 | |||||
803 | delete $msg->{'raw'}; | ||||
804 | |||||
805 | delete $msg->{'decoded'}; | ||||
806 | } | ||||
807 | } | ||||
808 | } | ||||
809 | } | ||||
810 | |||||
811 | 234 | 1.60ms | 234 | 1.41ms | dbg("message: ---- MIME PARSER END ----"); # spent 1.41ms making 234 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
812 | |||||
813 | # we're done parsing, so remove the queue variable | ||||
814 | 234 | 2.44ms | delete $self->{'parse_queue'}; | ||
815 | } | ||||
816 | |||||
817 | =item _parse_multipart() | ||||
818 | |||||
819 | Generate a root node, and for each child part call parse_body() | ||||
820 | to generate the tree. | ||||
821 | |||||
822 | =cut | ||||
823 | |||||
824 | # spent 2.30s (1.97+338ms) within Mail::SpamAssassin::Message::_parse_multipart which was called 197 times, avg 11.7ms/call:
# 197 times (1.97s+338ms) by Mail::SpamAssassin::Message::parse_body at line 746, avg 11.7ms/call | ||||
825 | 197 | 438µs | my($self, $toparse) = @_; | ||
826 | |||||
827 | 394 | 1.55ms | my ($msg, $boundary, $body, $subparse) = @{$toparse}; | ||
828 | |||||
829 | # we're not supposed to be a leaf, so prep ourselves | ||||
830 | 197 | 885µs | $msg->{'body_parts'} = []; | ||
831 | |||||
832 | # the next set of objects will be one level deeper | ||||
833 | 197 | 431µs | $subparse--; | ||
834 | |||||
835 | 197 | 2.01ms | 197 | 1.43ms | dbg("message: parsing multipart, got boundary: ".(defined $boundary ? $boundary : '')); # spent 1.43ms making 197 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call |
836 | |||||
837 | # NOTE: The MIME boundary REs here are very specific to be mostly RFC 1521 | ||||
838 | # compliant, but also allow possible malformations to still work. Please | ||||
839 | # see Bugzilla bug 3749 for more information before making any changes! | ||||
840 | |||||
841 | # ignore preamble per RFC 1521, unless there's no boundary ... | ||||
842 | 197 | 890µs | if ( defined $boundary ) { | ||
843 | 197 | 362µs | my $line; | ||
844 | 394 | 1.23ms | my $tmp_line = @{$body}; | ||
845 | 197 | 1.51ms | for ($line=0; $line < $tmp_line; $line++) { | ||
846 | # dbg("message: multipart line $line: \"" . $body->[$line] . "\""); | ||||
847 | # specifically look for an opening boundary | ||||
848 | 336 | 14.0ms | 394 | 8.75ms | if (substr($body->[$line],0,2) eq '--' # triage # spent 7.90ms making 197 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 40µs/call
# spent 856µs making 197 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
849 | && $body->[$line] =~ /^--\Q$boundary\E\s*$/) { | ||||
850 | # Make note that we found the opening boundary | ||||
851 | 197 | 1.06ms | $self->{mime_boundary_state}->{$boundary} = 1; | ||
852 | |||||
853 | # if the line after the opening boundary isn't a header, flag it. | ||||
854 | # we need to make sure that there's actually another line though. | ||||
855 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
856 | 197 | 3.22ms | 197 | 1.11ms | if ($line+1 < $tmp_line && $body->[$line+1] !~ /^[\041-\071\073-\176]+:/) { # spent 1.11ms making 197 calls to Mail::SpamAssassin::Message::CORE:match, avg 6µs/call |
857 | $self->{'missing_mime_headers'} = 1; | ||||
858 | } | ||||
859 | |||||
860 | 197 | 778µs | last; | ||
861 | } | ||||
862 | } | ||||
863 | |||||
864 | # Found a boundary, ignore the preamble | ||||
865 | 197 | 801µs | if ( $line < $tmp_line ) { | ||
866 | 394 | 1.71ms | splice @{$body}, 0, $line+1; | ||
867 | } | ||||
868 | |||||
869 | # Else, there's no boundary, so leave the whole part... | ||||
870 | } | ||||
871 | |||||
872 | # prepare a new tree node | ||||
873 | 197 | 2.88ms | 197 | 5.34ms | my $part_msg = Mail::SpamAssassin::Message::Node->new({ normalize=>$self->{normalize} }); # spent 5.34ms making 197 calls to Mail::SpamAssassin::Message::Node::new, avg 27µs/call |
874 | 197 | 428µs | my $in_body = 0; | ||
875 | 197 | 382µs | my $header; | ||
876 | my $part_array; | ||||
877 | my $found_end_boundary; | ||||
878 | |||||
879 | 394 | 1.21ms | my $line_count = @{$body}; | ||
880 | 394 | 5.96ms | foreach ( @{$body} ) { | ||
881 | # if we're on the last body line, or we find any boundary marker, | ||||
882 | # deal with the mime part; | ||||
883 | # a triage before an unlikely-to-match regexp avoids a CPU hotspot | ||||
884 | 67383 | 339ms | 982 | 11.8ms | $found_end_boundary = defined $boundary && substr($_,0,2) eq '--' # spent 9.21ms making 491 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 19µs/call
# spent 2.60ms making 491 calls to Mail::SpamAssassin::Message::CORE:match, avg 5µs/call |
885 | && /^--\Q$boundary\E(?:--)?\s*$/; | ||||
886 | 67383 | 110ms | if ( --$line_count == 0 || $found_end_boundary ) { | ||
887 | 388 | 1.66ms | my $line = $_; # remember the last line | ||
888 | |||||
889 | # If at last line and no end boundary found, the line belongs to body | ||||
890 | # TODO: | ||||
891 | # Is $self->{mime_boundary_state}->{$boundary}-- needed here? | ||||
892 | # Could "missing end boundary" be a useful rule? Mark it somewhere? | ||||
893 | # If SA processed truncated message from amavis etc, this could also | ||||
894 | # be hit legimately.. | ||||
895 | 388 | 2.28ms | if (!$found_end_boundary) { | ||
896 | # TODO: This is duplicate code from few pages down below.. | ||||
897 | while (length ($_) > MAX_BODY_LINE_LENGTH) { | ||||
898 | push (@{$part_array}, substr($_, 0, MAX_BODY_LINE_LENGTH)."\n"); | ||||
899 | substr($_, 0, MAX_BODY_LINE_LENGTH) = ''; | ||||
900 | } | ||||
901 | push ( @{$part_array}, $_ ); | ||||
902 | } | ||||
903 | # per rfc 1521, the CRLF before the boundary is part of the boundary: | ||||
904 | # NOTE: The CRLF preceding the encapsulation line is conceptually | ||||
905 | # attached to the boundary so that it is possible to have a part | ||||
906 | # that does not end with a CRLF (line break). Body parts that must | ||||
907 | # be considered to end with line breaks, therefore, must have two | ||||
908 | # CRLFs preceding the encapsulation line, the first of which is part | ||||
909 | # of the preceding body part, and the second of which is part of the | ||||
910 | # encapsulation boundary. | ||||
911 | elsif ($part_array) { | ||||
912 | 388 | 1.77ms | chomp( $part_array->[-1] ); # trim the CRLF that's part of the boundary | ||
913 | 681 | 2.55ms | splice @{$part_array}, -1 if ( $part_array->[-1] eq '' ); # blank line for the boundary only ... | ||
914 | } | ||||
915 | else { | ||||
916 | # Invalid parts can have no body, so fake in a blank body | ||||
917 | # in that case. | ||||
918 | $part_array = []; | ||||
919 | } | ||||
920 | |||||
921 | 388 | 672µs | my($p_boundary); | ||
922 | 388 | 7.83ms | 776 | 87.2ms | ($part_msg->{'type'}, $p_boundary) = Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type')); # spent 64.5ms making 388 calls to Mail::SpamAssassin::Util::parse_content_type, avg 166µs/call
# spent 22.7ms making 388 calls to Mail::SpamAssassin::Message::Node::header, avg 58µs/call |
923 | 388 | 1.01ms | $p_boundary ||= $boundary; | ||
924 | 388 | 5.34ms | 388 | 4.00ms | dbg("message: found part of type ".$part_msg->{'type'}.", boundary: ".(defined $p_boundary ? $p_boundary : '')); # spent 4.00ms making 388 calls to Mail::SpamAssassin::Logger::dbg, avg 10µs/call |
925 | |||||
926 | # we've created a new node object, so add it to the queue along with the | ||||
927 | # text that belongs to that part, then add the new part to the current | ||||
928 | # node to create the tree. | ||||
929 | 776 | 5.13ms | push(@{$self->{'parse_queue'}}, [ $part_msg, $p_boundary, $part_array, $subparse ]); | ||
930 | 388 | 3.23ms | 388 | 15.1ms | $msg->add_body_part($part_msg); # spent 15.1ms making 388 calls to Mail::SpamAssassin::Message::Node::add_body_part, avg 39µs/call |
931 | |||||
932 | # rfc 1521 says /^--boundary--$/, some MUAs may just require /^--boundary--/ | ||||
933 | # but this causes problems with horizontal lines when the boundary is | ||||
934 | # made up of dashes as well, etc. | ||||
935 | 388 | 1.27ms | if (defined $boundary) { | ||
936 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
937 | 388 | 17.6ms | 967 | 9.39ms | if ($line =~ /^--\Q${boundary}\E--\s*$/) { # spent 6.87ms making 388 calls to Mail::SpamAssassin::Message::CORE:regcomp, avg 18µs/call
# spent 2.53ms making 579 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
938 | # Make a note that we've seen the end boundary | ||||
939 | 197 | 664µs | $self->{mime_boundary_state}->{$boundary}--; | ||
940 | 197 | 1.02ms | last; | ||
941 | } | ||||
942 | elsif ($line_count && $body->[-$line_count] !~ /^[\041-\071\073-\176]+:/) { | ||||
943 | # if we aren't on an end boundary and there are still lines left, it | ||||
944 | # means we hit a new start boundary. therefore, the next line ought | ||||
945 | # to be a mime header. if it's not, mark it. | ||||
946 | $self->{'missing_mime_headers'} = 1; | ||||
947 | } | ||||
948 | } | ||||
949 | |||||
950 | # make sure we start with a new clean node | ||||
951 | 191 | 415µs | $in_body = 0; | ||
952 | 191 | 2.23ms | 191 | 5.82ms | $part_msg = Mail::SpamAssassin::Message::Node->new({ normalize=>$self->{normalize} }); # spent 5.82ms making 191 calls to Mail::SpamAssassin::Message::Node::new, avg 30µs/call |
953 | 191 | 426µs | undef $part_array; | ||
954 | 191 | 462µs | undef $header; | ||
955 | |||||
956 | 191 | 737µs | next; | ||
957 | } | ||||
958 | |||||
959 | 66995 | 104ms | if (!$in_body) { | ||
960 | # s/\s+$//; # bug 5127: don't clean this up (yet) | ||||
961 | # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables... | ||||
962 | 1233 | 21.8ms | 1685 | 8.62ms | if (/^[\041-\071\073-\176]+[ \t]*:/) { # spent 8.62ms making 1685 calls to Mail::SpamAssassin::Message::CORE:match, avg 5µs/call |
963 | 781 | 2.95ms | if ($header) { | ||
964 | 393 | 5.05ms | my ( $key, $value ) = split ( /:\s*/, $header, 2 ); | ||
965 | 393 | 3.22ms | 393 | 100.0ms | $part_msg->header( $key, $value ); # spent 100.0ms making 393 calls to Mail::SpamAssassin::Message::Node::header, avg 254µs/call |
966 | } | ||||
967 | 781 | 2.02ms | $header = $_; | ||
968 | 781 | 1.65ms | next; | ||
969 | } | ||||
970 | elsif (/^[ \t]/ && $header) { | ||||
971 | # $_ =~ s/^\s*//; # bug 5127, again | ||||
972 | 64 | 292µs | $header .= $_; | ||
973 | 64 | 177µs | next; | ||
974 | } | ||||
975 | else { | ||||
976 | 388 | 1.70ms | if ($header) { | ||
977 | 388 | 4.73ms | my ( $key, $value ) = split ( /:\s*/, $header, 2 ); | ||
978 | 388 | 3.10ms | 388 | 76.9ms | $part_msg->header( $key, $value ); # spent 76.9ms making 388 calls to Mail::SpamAssassin::Message::Node::header, avg 198µs/call |
979 | } | ||||
980 | 388 | 1.75ms | $in_body = 1; | ||
981 | |||||
982 | # if there's a blank line separator, that's good. if there isn't, | ||||
983 | # it's a body line, so drop through. | ||||
984 | 388 | 4.89ms | 388 | 2.04ms | if (/^\r?$/) { # spent 2.04ms making 388 calls to Mail::SpamAssassin::Message::CORE:match, avg 5µs/call |
985 | 388 | 861µs | next; | ||
986 | } | ||||
987 | else { | ||||
988 | $self->{'missing_mime_head_body_separator'} = 1; | ||||
989 | } | ||||
990 | } | ||||
991 | } | ||||
992 | |||||
993 | # we run into a perl bug if the lines are astronomically long (probably | ||||
994 | # due to lots of regexp backtracking); so split any individual line | ||||
995 | # over MAX_BODY_LINE_LENGTH bytes in length. This can wreck HTML | ||||
996 | # totally -- but IMHO the only reason a luser would use | ||||
997 | # MAX_BODY_LINE_LENGTH-byte lines is to crash filters, anyway. | ||||
998 | 65762 | 319ms | while (length ($_) > MAX_BODY_LINE_LENGTH) { | ||
999 | push (@{$part_array}, substr($_, 0, MAX_BODY_LINE_LENGTH)."\n"); | ||||
1000 | substr($_, 0, MAX_BODY_LINE_LENGTH) = ''; | ||||
1001 | } | ||||
1002 | 131524 | 959ms | push ( @{$part_array}, $_ ); | ||
1003 | } | ||||
1004 | |||||
1005 | # Look for a message epilogue | ||||
1006 | # originally ignored whitespace: 0.185 0.2037 0.0654 0.757 0.00 0.00 TVD_TAB | ||||
1007 | # ham FPs were all "." on a line by itself. | ||||
1008 | # spams seem to only have NULL chars afterwards ? | ||||
1009 | 197 | 1.99ms | if ($line_count) { | ||
1010 | 75 | 752µs | for(; $line_count > 0; $line_count--) { | ||
1011 | 87 | 1.04ms | 87 | 332µs | if ($body->[-$line_count] =~ /[^\s.]/) { # spent 332µs making 87 calls to Mail::SpamAssassin::Message::CORE:match, avg 4µs/call |
1012 | $self->{mime_epilogue_exists} = 1; | ||||
1013 | last; | ||||
1014 | } | ||||
1015 | } | ||||
1016 | } | ||||
1017 | |||||
1018 | } | ||||
1019 | |||||
1020 | =item _parse_normal() | ||||
1021 | |||||
1022 | Generate a leaf node and add it to the parent. | ||||
1023 | |||||
1024 | =cut | ||||
1025 | |||||
1026 | # spent 211ms (62.1+148) within Mail::SpamAssassin::Message::_parse_normal which was called 425 times, avg 495µs/call:
# 425 times (62.1ms+148ms) by Mail::SpamAssassin::Message::parse_body at line 750, avg 495µs/call | ||||
1027 | 425 | 880µs | my($self, $toparse) = @_; | ||
1028 | |||||
1029 | 850 | 3.08ms | my ($msg, $boundary, $body) = @{$toparse}; | ||
1030 | |||||
1031 | 425 | 2.80ms | 425 | 2.68ms | dbg("message: parsing normal part"); # spent 2.68ms making 425 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
1032 | |||||
1033 | # 0: content-type, 1: boundary, 2: charset, 3: filename | ||||
1034 | 425 | 7.04ms | 850 | 94.2ms | my @ct = Mail::SpamAssassin::Util::parse_content_type($msg->header('content-type')); # spent 70.9ms making 425 calls to Mail::SpamAssassin::Util::parse_content_type, avg 167µs/call
# spent 23.3ms making 425 calls to Mail::SpamAssassin::Message::Node::header, avg 55µs/call |
1035 | |||||
1036 | # multipart sections are required to have a boundary set ... If this | ||||
1037 | # one doesn't, assume it's malformed and revert to text/plain | ||||
1038 | 425 | 4.96ms | 425 | 1.18ms | $msg->{'type'} = ($ct[0] !~ m@^multipart/@i || defined $boundary ) ? $ct[0] : 'text/plain'; # spent 1.18ms making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 3µs/call |
1039 | 425 | 2.15ms | $msg->{'charset'} = $ct[2]; | ||
1040 | |||||
1041 | # attempt to figure out a name for this attachment if there is one ... | ||||
1042 | 425 | 3.28ms | 425 | 21.7ms | my $disp = $msg->header('content-disposition') || ''; # spent 21.7ms making 425 calls to Mail::SpamAssassin::Message::Node::header, avg 51µs/call |
1043 | 425 | 4.15ms | 425 | 906µs | if ($disp =~ /name="?([^\";]+)"?/i) { # spent 906µs making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 2µs/call |
1044 | 9 | 54µs | $msg->{'name'} = $1; | ||
1045 | } | ||||
1046 | elsif ($ct[3]) { | ||||
1047 | 6 | 28µs | $msg->{'name'} = $ct[3]; | ||
1048 | } | ||||
1049 | |||||
1050 | 425 | 2.56ms | $msg->{'boundary'} = $boundary; | ||
1051 | |||||
1052 | # If the part type is not one that we're likely to want to use, go | ||||
1053 | # ahead and write the part data out to a temp file -- why keep sucking | ||||
1054 | # up RAM with something we're not going to use? | ||||
1055 | # | ||||
1056 | 425 | 15.1ms | 425 | 3.41ms | if ($msg->{'type'} !~ m@^(?:text/(?:plain|html)$|message\b)@) { # spent 3.41ms making 425 calls to Mail::SpamAssassin::Message::CORE:match, avg 8µs/call |
1057 | 16 | 32µs | my($filepath, $fh); | ||
1058 | eval { | ||||
1059 | 32 | 205µs | 16 | 12.3ms | ($filepath, $fh) = Mail::SpamAssassin::Util::secure_tmpfile(); 1; # spent 12.3ms making 16 calls to Mail::SpamAssassin::Util::secure_tmpfile, avg 768µs/call |
1060 | 16 | 77µs | } or do { | ||
1061 | my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat; | ||||
1062 | info("message: failed to create a temp file: %s", $eval_stat); | ||||
1063 | }; | ||||
1064 | 16 | 72µs | if ($fh) { | ||
1065 | # The temp file was created, add it to the list of pending deletions | ||||
1066 | # we cannot just delete immediately in the POSIX idiom, as this is | ||||
1067 | # unportable (to win32 at least) | ||||
1068 | 32 | 141µs | push @{$self->{tmpfiles}}, $filepath; | ||
1069 | 16 | 134µs | 16 | 125µs | dbg("message: storing a message part to file %s", $filepath); # spent 125µs making 16 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call |
1070 | 32 | 461µs | 16 | 8.13ms | $fh->print(@{$body}) or die "error writing to $filepath: $!"; # spent 8.13ms making 16 calls to IO::Handle::print, avg 508µs/call |
1071 | 16 | 1.30ms | 16 | 1.12ms | $fh->flush or die "error writing (flush) to $filepath: $!"; # spent 1.12ms making 16 calls to IO::Handle::flush, avg 70µs/call |
1072 | 16 | 116µs | $msg->{'raw'} = $fh; | ||
1073 | } | ||||
1074 | } | ||||
1075 | |||||
1076 | # if the part didn't get a temp file, go ahead and store the data in memory | ||||
1077 | 425 | 6.06ms | if (!defined $msg->{'raw'}) { | ||
1078 | 409 | 2.70ms | 409 | 2.61ms | dbg("message: storing a body to memory"); # spent 2.61ms making 409 calls to Mail::SpamAssassin::Logger::dbg, avg 6µs/call |
1079 | 409 | 2.59ms | $msg->{'raw'} = $body; | ||
1080 | } | ||||
1081 | } | ||||
1082 | |||||
1083 | # --------------------------------------------------------------------------- | ||||
1084 | |||||
1085 | sub get_mimepart_digests { | ||||
1086 | my ($self) = @_; | ||||
1087 | |||||
1088 | if (!exists $self->{mimepart_digests}) { | ||||
1089 | # traverse all parts which are leaves, recursively | ||||
1090 | $self->{mimepart_digests} = | ||||
1091 | [ map(sha1_hex($_->decode) . ':' . lc($_->{type}||''), | ||||
1092 | $self->find_parts(qr/^/,1,1)) ]; | ||||
1093 | } | ||||
1094 | return $self->{mimepart_digests}; | ||||
1095 | } | ||||
1096 | |||||
1097 | # --------------------------------------------------------------------------- | ||||
1098 | |||||
1099 | # common code for get_rendered_body_text_array, | ||||
1100 | # get_visible_rendered_body_text_array, get_invisible_rendered_body_text_array | ||||
1101 | # | ||||
1102 | # spent 26.2s (154ms+26.0) within Mail::SpamAssassin::Message::get_body_text_array_common which was called 1170 times, avg 22.4ms/call:
# 702 times (60.3ms+25.8s) by Mail::SpamAssassin::Message::get_rendered_body_text_array at line 1159, avg 36.8ms/call
# 234 times (60.3ms+183ms) by Mail::SpamAssassin::Message::get_visible_rendered_body_text_array at line 1164, avg 1.04ms/call
# 234 times (33.1ms+68.5ms) by Mail::SpamAssassin::Message::get_invisible_rendered_body_text_array at line 1169, avg 434µs/call | ||||
1103 | 1170 | 3.34ms | my ($self, $method_name) = @_; | ||
1104 | |||||
1105 | 1170 | 3.48ms | my $key = 'text_' . $method_name; | ||
1106 | 1638 | 7.83ms | if (exists $self->{$key}) { return $self->{$key} } | ||
1107 | |||||
1108 | 702 | 3.60ms | $self->{$key} = []; | ||
1109 | |||||
1110 | # Find all parts which are leaves | ||||
1111 | 702 | 16.4ms | 1404 | 2.74s | my @parts = $self->find_parts(qr/./,1); # spent 2.74s making 702 calls to Mail::SpamAssassin::Message::find_parts, avg 3.90ms/call
# spent 4.66ms making 702 calls to Mail::SpamAssassin::Message::CORE:qr, avg 7µs/call |
1112 | 702 | 1.55ms | return $self->{$key} unless @parts; | ||
1113 | |||||
1114 | # the html metadata may have already been set, so let's not bother if it's | ||||
1115 | # already been done. | ||||
1116 | 702 | 2.63ms | my $html_needs_setting = !exists $self->{metadata}->{html}; | ||
1117 | |||||
1118 | 702 | 7.28ms | 468 | 50.2ms | my $text = $method_name eq 'invisible_rendered' ? '' # spent 50.2ms making 468 calls to Mail::SpamAssassin::Message::Node::get_header, avg 107µs/call |
1119 | : ($self->get_header('subject') || "\n"); | ||||
1120 | |||||
1121 | # Go through each part | ||||
1122 | 702 | 12.2ms | for (my $pt = 0 ; $pt <= $#parts ; $pt++ ) { | ||
1123 | 1275 | 3.05ms | my $p = $parts[$pt]; | ||
1124 | |||||
1125 | # put a blank line between parts ... | ||||
1126 | 1275 | 3.73ms | $text .= "\n" if $text ne ''; | ||
1127 | |||||
1128 | 1275 | 14.4ms | 1275 | 23.1s | my($type, $rnd) = $p->$method_name(); # decode this part # spent 23.0s making 425 calls to Mail::SpamAssassin::Message::Node::rendered, avg 54.2ms/call
# spent 15.2ms making 425 calls to Mail::SpamAssassin::Message::Node::visible_rendered, avg 36µs/call
# spent 14.7ms making 425 calls to Mail::SpamAssassin::Message::Node::invisible_rendered, avg 35µs/call |
1129 | 1275 | 5.41ms | if ( defined $rnd ) { | ||
1130 | # Only text/* types are rendered ... | ||||
1131 | 1227 | 6.25ms | $text .= $rnd; | ||
1132 | |||||
1133 | # TVD - if there are multiple parts, what should we do? | ||||
1134 | # right now, just use the last one. we may need to give some priority | ||||
1135 | # at some point, ie: use text/html rendered if it exists, or | ||||
1136 | # text/plain rendered as html otherwise. | ||||
1137 | 1227 | 2.92ms | if ($html_needs_setting && $type eq 'text/html') { | ||
1138 | 189 | 841µs | $self->{metadata}->{html} = $p->{html_results}; | ||
1139 | } | ||||
1140 | } | ||||
1141 | } | ||||
1142 | |||||
1143 | # whitespace handling (warning: small changes have large effects!) | ||||
1144 | 702 | 35.7ms | 702 | 28.7ms | $text =~ s/\n+\s*\n+/\f/gs; # double newlines => form feed # spent 28.7ms making 702 calls to Mail::SpamAssassin::Message::CORE:subst, avg 41µs/call |
1145 | # $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace (incl. VT, NBSP) => space | ||||
1146 | 702 | 11.7ms | $text =~ tr/ \t\n\r\x0b/ /s; # whitespace (incl. VT) => space | ||
1147 | 702 | 5.21ms | $text =~ tr/\f/\n/; # form feeds => newline | ||
1148 | |||||
1149 | 702 | 13.3ms | 702 | 143ms | my @textary = split_into_array_of_short_lines($text); # spent 143ms making 702 calls to Mail::SpamAssassin::Message::split_into_array_of_short_lines, avg 203µs/call |
1150 | 702 | 3.93ms | $self->{$key} = \@textary; | ||
1151 | |||||
1152 | 702 | 7.61ms | return $self->{$key}; | ||
1153 | } | ||||
1154 | |||||
1155 | # --------------------------------------------------------------------------- | ||||
1156 | |||||
1157 | # spent 25.8s (11.1ms+25.8) within Mail::SpamAssassin::Message::get_rendered_body_text_array which was called 702 times, avg 36.8ms/call:
# 702 times (11.1ms+25.8s) by Mail::SpamAssassin::PerMsgStatus::get_decoded_stripped_body_text_array at line 1785 of Mail/SpamAssassin/PerMsgStatus.pm, avg 36.8ms/call | ||||
1158 | 702 | 1.46ms | my ($self) = @_; | ||
1159 | 702 | 21.2ms | 702 | 25.8s | return $self->get_body_text_array_common('rendered'); # spent 25.8s making 702 calls to Mail::SpamAssassin::Message::get_body_text_array_common, avg 36.8ms/call |
1160 | } | ||||
1161 | |||||
1162 | # spent 248ms (4.19+244) within Mail::SpamAssassin::Message::get_visible_rendered_body_text_array which was called 234 times, avg 1.06ms/call:
# 234 times (4.19ms+244ms) by Mail::SpamAssassin::Plugin::Bayes::_get_msgdata_from_permsgstatus at line 1044 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 1.06ms/call | ||||
1163 | 234 | 542µs | my ($self) = @_; | ||
1164 | 234 | 3.55ms | 234 | 244ms | return $self->get_body_text_array_common('visible_rendered'); # spent 244ms making 234 calls to Mail::SpamAssassin::Message::get_body_text_array_common, avg 1.04ms/call |
1165 | } | ||||
1166 | |||||
1167 | # spent 106ms (4.37+102) within Mail::SpamAssassin::Message::get_invisible_rendered_body_text_array which was called 234 times, avg 453µs/call:
# 234 times (4.37ms+102ms) by Mail::SpamAssassin::Plugin::Bayes::_get_msgdata_from_permsgstatus at line 1046 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 453µs/call | ||||
1168 | 234 | 541µs | my ($self) = @_; | ||
1169 | 234 | 2.89ms | 234 | 102ms | return $self->get_body_text_array_common('invisible_rendered'); # spent 102ms making 234 calls to Mail::SpamAssassin::Message::get_body_text_array_common, avg 434µs/call |
1170 | } | ||||
1171 | |||||
1172 | # --------------------------------------------------------------------------- | ||||
1173 | |||||
1174 | sub get_decoded_body_text_array { | ||||
1175 | my ($self) = @_; | ||||
1176 | |||||
1177 | if (defined $self->{text_decoded}) { return $self->{text_decoded}; } | ||||
1178 | $self->{text_decoded} = [ ]; | ||||
1179 | |||||
1180 | # Find all parts which are leaves | ||||
1181 | my @parts = $self->find_parts(qr/^(?:text|message)\b/i,1); | ||||
1182 | return $self->{text_decoded} unless @parts; | ||||
1183 | |||||
1184 | # Go through each part | ||||
1185 | for(my $pt = 0 ; $pt <= $#parts ; $pt++ ) { | ||||
1186 | # bug 4843: skip text/calendar parts since they're usually an attachment | ||||
1187 | # and not displayed | ||||
1188 | next if ($parts[$pt]->{'type'} eq 'text/calendar'); | ||||
1189 | |||||
1190 | push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} ); | ||||
1191 | push(@{$self->{text_decoded}}, | ||||
1192 | split_into_array_of_short_paragraphs($parts[$pt]->decode())); | ||||
1193 | } | ||||
1194 | |||||
1195 | return $self->{text_decoded}; | ||||
1196 | } | ||||
1197 | |||||
1198 | # --------------------------------------------------------------------------- | ||||
1199 | |||||
1200 | # spent 143ms within Mail::SpamAssassin::Message::split_into_array_of_short_lines which was called 702 times, avg 203µs/call:
# 702 times (143ms+0s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1149, avg 203µs/call | ||||
1201 | 702 | 1.30ms | my @result; | ||
1202 | 702 | 16.9ms | foreach my $line (split (/^/m, $_[0])) { | ||
1203 | 8895 | 33.1ms | while (length ($line) > MAX_BODY_LINE_LENGTH) { | ||
1204 | # try splitting "nicely" so that we don't chop a url in half or | ||||
1205 | # something. if there's no space, then just split at max length. | ||||
1206 | 92 | 1.01ms | my $length = rindex($line, ' ', MAX_BODY_LINE_LENGTH) + 1; | ||
1207 | 92 | 179µs | $length ||= MAX_BODY_LINE_LENGTH; | ||
1208 | 92 | 1.13ms | push (@result, substr($line, 0, $length, '')); | ||
1209 | } | ||||
1210 | 8895 | 76.8ms | push (@result, $line); | ||
1211 | } | ||||
1212 | 702 | 15.3ms | @result; | ||
1213 | } | ||||
1214 | |||||
1215 | # --------------------------------------------------------------------------- | ||||
1216 | |||||
1217 | # split a text into array of paragraphs of sizes between | ||||
1218 | # $chunk_size and 2 * $chunk_size, returning the resulting array | ||||
1219 | |||||
1220 | sub split_into_array_of_short_paragraphs { | ||||
1221 | my @result; | ||||
1222 | my $chunk_size = 1024; | ||||
1223 | my $text_l = length($_[0]); | ||||
1224 | my($j,$ofs); | ||||
1225 | for ($ofs = 0; $text_l - $ofs > 2 * $chunk_size; $ofs = $j+1) { | ||||
1226 | $j = index($_[0], "\n", $ofs+$chunk_size); | ||||
1227 | if ($j < 0) { | ||||
1228 | $j = index($_[0], " ", $ofs+$chunk_size); | ||||
1229 | if ($j < 0) { $j = $ofs+$chunk_size } | ||||
1230 | } | ||||
1231 | push(@result, substr($_[0], $ofs, $j-$ofs+1)); | ||||
1232 | } | ||||
1233 | push(@result, substr($_[0], $ofs)) if $ofs < $text_l; | ||||
1234 | @result; | ||||
1235 | } | ||||
1236 | |||||
1237 | # --------------------------------------------------------------------------- | ||||
1238 | |||||
1239 | 1 | 23µs | 1; | ||
1240 | |||||
1241 | =back | ||||
1242 | |||||
1243 | =cut | ||||
# spent 200µs within Mail::SpamAssassin::Message::CORE:close which was called 16 times, avg 12µs/call:
# 16 times (200µs+0s) by Mail::SpamAssassin::Message::finish at line 637, avg 12µs/call | |||||
# spent 412ms within Mail::SpamAssassin::Message::CORE:match which was called 114442 times, avg 4µs/call:
# 71012 times (239ms+0s) by Mail::SpamAssassin::Message::new at line 340, avg 3µs/call
# 15054 times (43.4ms+0s) by Mail::SpamAssassin::Message::new at line 286, avg 3µs/call
# 13487 times (51.5ms+0s) by Mail::SpamAssassin::Message::new at line 252, avg 4µs/call
# 7410 times (45.9ms+0s) by Mail::SpamAssassin::Message::new at line 302, avg 6µs/call
# 1685 times (8.62ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 962, avg 5µs/call
# 622 times (2.34ms+0s) by Mail::SpamAssassin::Message::parse_body at line 745, avg 4µs/call
# 579 times (2.53ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 937, avg 4µs/call
# 491 times (2.60ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 884, avg 5µs/call
# 468 times (1.57ms+0s) by Mail::SpamAssassin::Message::new at line 195, avg 3µs/call
# 468 times (1.41ms+0s) by Mail::SpamAssassin::Message::new at line 236, avg 3µs/call
# 427 times (1.82ms+0s) by Mail::SpamAssassin::Message::new at line 372, avg 4µs/call
# 425 times (3.41ms+0s) by Mail::SpamAssassin::Message::_parse_normal at line 1056, avg 8µs/call
# 425 times (1.30ms+0s) by Mail::SpamAssassin::Message::parse_body at line 753, avg 3µs/call
# 425 times (1.18ms+0s) by Mail::SpamAssassin::Message::_parse_normal at line 1038, avg 3µs/call
# 425 times (906µs+0s) by Mail::SpamAssassin::Message::_parse_normal at line 1043, avg 2µs/call
# 388 times (2.04ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 984, avg 5µs/call
# 197 times (1.11ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 856, avg 6µs/call
# 197 times (856µs+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 848, avg 4µs/call
# 170 times (585µs+0s) by Mail::SpamAssassin::Message::new at line 374, avg 3µs/call
# 87 times (332µs+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 1011, avg 4µs/call | |||||
# spent 4.66ms within Mail::SpamAssassin::Message::CORE:qr which was called 702 times, avg 7µs/call:
# 702 times (4.66ms+0s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1111, avg 7µs/call | |||||
# spent 33.0ms within Mail::SpamAssassin::Message::CORE:regcomp which was called 1246 times, avg 27µs/call:
# 491 times (9.21ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 884, avg 19µs/call
# 388 times (6.87ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 937, avg 18µs/call
# 197 times (7.90ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 848, avg 40µs/call
# 170 times (9.05ms+0s) by Mail::SpamAssassin::Message::new at line 374, avg 53µs/call | |||||
# spent 994µs within Mail::SpamAssassin::Message::CORE:sort which was called 234 times, avg 4µs/call:
# 234 times (994µs+0s) by Mail::SpamAssassin::Message::get_all_metadata at line 584, avg 4µs/call | |||||
# spent 51.6ms within Mail::SpamAssassin::Message::CORE:subst which was called 8112 times, avg 6µs/call:
# 7410 times (22.9ms+0s) by Mail::SpamAssassin::Message::new at line 272, avg 3µs/call
# 702 times (28.7ms+0s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1144, avg 41µs/call | |||||
# spent 3.13ms within Mail::SpamAssassin::Message::CORE:unlink which was called 16 times, avg 196µs/call:
# 16 times (3.13ms+0s) by Mail::SpamAssassin::Message::finish at line 667, avg 196µs/call |