NYTProf Performance Profile (line view)
For /usr/local/bin/sa-learn
  Run on Sun Nov 5 03:09:29 2017
Reported on Mon Nov 6 13:20:46 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Message/Node.pm
Statements: Executed 1260684 statements in 9.46s
Subroutines
Calls    P   F  Exclusive  Inclusive  Subroutine
                Time       Time
40380    9   2  2.96s      4.22s      Mail::SpamAssassin::Message::Node::header
936      2   2  1.68s      5.05s      Mail::SpamAssassin::Message::Node::get_all_headers
30308    8   6  1.66s      3.69s      Mail::SpamAssassin::Message::Node::get_header
936      4   1  865ms      1.17s      Mail::SpamAssassin::Message::Node::delete_header
133285  14   1  613ms      613ms      Mail::SpamAssassin::Message::Node::CORE:subst (opcode)
8191     1   1  593ms      747ms      Mail::SpamAssassin::Message::Node::_decode_header
936      1   1  209ms      209ms      Mail::SpamAssassin::Message::Node::CORE:sort (opcode)
57114    3   1  196ms      196ms      Mail::SpamAssassin::Message::Node::CORE:regcomp (opcode)
66604    9   1  194ms      194ms      Mail::SpamAssassin::Message::Node::CORE:match (opcode)
702      1   1  112ms      153ms      Mail::SpamAssassin::Message::Node::find_parts
409      1   1  74.5ms     769ms      Mail::SpamAssassin::Message::Node::decode
1275     3   2  73.6ms     23.9s      Mail::SpamAssassin::Message::Node::rendered
936      1   1  39.8ms     45.0ms     Mail::SpamAssassin::Message::Node::raw_header
1866     1   1  22.6ms     22.6ms     Mail::SpamAssassin::Message::Node::is_leaf
425      1   1  19.9ms     25.7ms     Mail::SpamAssassin::Message::Node::invisible_rendered
622      3   1  18.4ms     18.4ms     Mail::SpamAssassin::Message::Node::new
1        1   1  15.1ms     37.2ms     Mail::SpamAssassin::Message::Node::BEGIN@45
388      1   1  11.6ms     15.7ms     Mail::SpamAssassin::Message::Node::add_body_part
425      1   1  9.14ms     15.4ms     Mail::SpamAssassin::Message::Node::visible_rendered
118      1   1  6.02ms     17.8ms     Mail::SpamAssassin::Message::Node::__decode_header
217      1   1  3.12ms     4.33ms     Mail::SpamAssassin::Message::Node::_html_render
250      2   1  2.26ms     2.26ms     Mail::SpamAssassin::Message::Node::CORE:substcont (opcode)
118      1   1  2.26ms     2.26ms     Mail::SpamAssassin::Message::Node::_normalize
1        1   1  740µs      5.07ms     Mail::SpamAssassin::Message::Node::BEGIN@49
1        1   1  63µs       72µs       Mail::SpamAssassin::Message::Node::BEGIN@37
1        1   1  34µs       68µs       Mail::SpamAssassin::Message::Node::BEGIN@38
1        1   1  32µs       212µs      Mail::SpamAssassin::Message::Node::BEGIN@46
1        1   1  24µs       604µs      Mail::SpamAssassin::Message::Node::BEGIN@44
1        1   1  22µs       84µs       Mail::SpamAssassin::Message::Node::BEGIN@39
1        1   1  12µs       12µs       Mail::SpamAssassin::Message::Node::BEGIN@43
0        0   0  0s         0s         Mail::SpamAssassin::Message::Node::content_summary
0        0   0  0s         0s         Mail::SpamAssassin::Message::Node::finish
0        0   0  0s         0s         Mail::SpamAssassin::Message::Node::raw
0        0   0  0s         0s         Mail::SpamAssassin::Message::Node::set_rendered
Line  Statements  Time on line  Calls  Time in subs  Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18=head1 NAME
19
20Mail::SpamAssassin::Message::Node - decode, render, and make available MIME message parts
21
22=head1 SYNOPSIS
23
24=head1 DESCRIPTION
25
26This module will encapsulate an email message and allow access to
27the various MIME message parts.
28
29=head1 PUBLIC METHODS
30
31=over 4
32
33=cut
34
35package Mail::SpamAssassin::Message::Node;
36
37257µs281µs
# spent 72µs (63+9) within Mail::SpamAssassin::Message::Node::BEGIN@37 which was called: # once (63µs+9µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 37
use strict;
# spent 72µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@37 # spent 9µs making 1 call to strict::import
38261µs2102µs
# spent 68µs (34+34) within Mail::SpamAssassin::Message::Node::BEGIN@38 which was called: # once (34µs+34µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 38
use warnings;
# spent 68µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@38 # spent 34µs making 1 call to warnings::import
39277µs2147µs
# spent 84µs (22+63) within Mail::SpamAssassin::Message::Node::BEGIN@39 which was called: # once (22µs+63µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 39
use re 'taint';
# spent 84µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@39 # spent 63µs making 1 call to re::import
40
41128µsrequire 5.008001; # needs utf8::is_utf8()
42
43258µs112µs
# spent 12µs within Mail::SpamAssassin::Message::Node::BEGIN@43 which was called: # once (12µs+0s) by Mail::SpamAssassin::Message::BEGIN@55 at line 43
use Mail::SpamAssassin;
# spent 12µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@43
44261µs21.18ms
# spent 604µs (24+580) within Mail::SpamAssassin::Message::Node::BEGIN@44 which was called: # once (24µs+580µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 44
use Mail::SpamAssassin::Constants qw(:sa);
# spent 604µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@44 # spent 580µs making 1 call to Exporter::import
452352µs137.2ms
# spent 37.2ms (15.1+22.1) within Mail::SpamAssassin::Message::Node::BEGIN@45 which was called: # once (15.1ms+22.1ms) by Mail::SpamAssassin::Message::BEGIN@55 at line 45
use Mail::SpamAssassin::HTML;
# spent 37.2ms making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@45
462220µs2392µs
# spent 212µs (32+180) within Mail::SpamAssassin::Message::Node::BEGIN@46 which was called: # once (32µs+180µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 46
use Mail::SpamAssassin::Logger;
# spent 212µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@46 # spent 180µs making 1 call to Exporter::import
47
48our($enc_utf8, $enc_w1252, $have_encode_detector);
49
# spent 5.07ms (740µs+4.33) within Mail::SpamAssassin::Message::Node::BEGIN@49 which was called: # once (740µs+4.33ms) by Mail::SpamAssassin::Message::BEGIN@55 at line 55
BEGIN {
5014µs eval { require Encode }
51212µs1324µs and do { $enc_utf8 = Encode::find_encoding('UTF-8');
# spent 324µs making 1 call to Encode::find_encoding
5216µs1965µs $enc_w1252 = Encode::find_encoding('Windows-1252') };
# spent 965µs making 1 call to Encode::find_encoding
531448µs eval { require Encode::Detect::Detector }
54217µs and do { $have_encode_detector = 1 };
5517.63ms15.07ms};
# spent 5.07ms making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@49
56
57=item new()
58
59Generates an empty Node object and returns it. Typically only called
60by functions in Message.
61
62=cut
63
64
# spent 18.4ms within Mail::SpamAssassin::Message::Node::new which was called 622 times, avg 30µs/call: # 234 times (7.50ms+0s) by Mail::SpamAssassin::Message::new at line 115 of Mail/SpamAssassin/Message.pm, avg 32µs/call # 197 times (5.36ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 873 of Mail/SpamAssassin/Message.pm, avg 27µs/call # 191 times (5.56ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 952 of Mail/SpamAssassin/Message.pm, avg 29µs/call
sub new {
656221.54ms my $class = shift;
666221.53ms $class = ref($class) || $class;
67
686225.57ms my $self = {
69 headers => {},
70 raw_headers => {},
71 header_order => []
72 };
73
74 # deal with any parameters
756221.32ms my($opts) = @_;
766222.50ms $self->{normalize} = $opts->{'normalize'} || 0;
77
786221.86ms bless($self,$class);
7962211.5ms $self;
80}
81
82=item find_parts()
83
84Used to search the tree for specific MIME parts. An array of matching
85Node objects (pointers into the tree) is returned. The parameters that
86can be passed in are (in order, all scalars):
87
88Regexp - Used to match against each part's Content-Type header,
89specifically the type and not the rest of the header. ie: "Content-type:
90text/html; encoding=quoted-printable" has a type of "text/html". If no
91regexp is specified, find_parts() will return an empty array.
92
93Only_leaves - By default, find_parts() will return any part that matches
94the regexp, including multipart. If you only want to see leaves of the
95tree (ie: parts that aren't multipart), set this to true (1).
96
97Recursive - By default, when find_parts() finds a multipart which has
98parts underneath it, it will recurse through all sub-children. If set to 0,
99only look at the part and any direct children of the part.
100
101=cut
102
103# Used to find any MIME parts whose simple content-type matches a given regexp
104# Searches its own and any children parts. Returns an array of MIME
105# objects which match. Our callers may expect the default behavior which is a
106# depth-first array of parts.
107#
108
# spent 153ms (112+40.1) within Mail::SpamAssassin::Message::Node::find_parts which was called 702 times, avg 217µs/call: # 702 times (112ms+40.1ms) by Mail::SpamAssassin::Message::find_parts at line 423 of Mail/SpamAssassin/Message.pm, avg 217µs/call
sub find_parts {
1097021.88ms my ($self, $re, $onlyleaves, $recursive) = @_;
110
111 # Didn't pass an RE? Just abort.
1127022.16ms return () unless defined $re && $re ne '';
113
1147021.36ms $onlyleaves = 0 unless defined $onlyleaves;
115
1167021.38ms my $depth;
1177021.39ms if (defined $recursive && $recursive == 0) {
118 $depth = 1;
119 }
120
1217021.38ms my @ret;
1227022.74ms my @search = ( $self );
123
12470224.0ms while (my $part = shift @search) {
125 # If this object matches, mark it for return.
126186615.2ms186622.6ms my $amialeaf = $part->is_leaf();
# spent 22.6ms making 1866 calls to Mail::SpamAssassin::Message::Node::is_leaf, avg 12µs/call
127
128186648.5ms373217.4ms if ( $part->{'type'} =~ /$re/ && (!$onlyleaves || $amialeaf) ) {
# spent 10.3ms making 1866 calls to Mail::SpamAssassin::Message::Node::CORE:regcomp, avg 6µs/call # spent 7.15ms making 1866 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 4µs/call
12912752.90ms push(@ret, $part);
130 }
131
132186610.9ms if ( !$amialeaf && (!defined $depth || $depth > 0)) {
1335911.11ms $depth-- if defined $depth;
13411824.85ms unshift(@search, @{$part->{'body_parts'}});
135 }
136 }
137
1387027.14ms return @ret;
139}
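
The traversal in find_parts() can be tried in isolation. The following is a minimal sketch, assuming plain hashes as stand-ins for Node objects; `find_parts_sketch`, `types`, and the sample tree are hypothetical names introduced only for illustration. It shows the depth-first order of the results and the effect of the Only_leaves and Recursive parameters.

```perl
use strict;
use warnings;

# Hypothetical miniature MIME tree: plain hashes stand in for Node objects.
my $root = {
  type       => 'multipart/mixed',
  body_parts => [
    { type       => 'multipart/alternative',
      body_parts => [ { type => 'text/plain' }, { type => 'text/html' } ] },
    { type => 'image/png' },
  ],
};

# Same search loop as find_parts(), without the object plumbing.
sub find_parts_sketch {
  my ($node, $re, $onlyleaves, $recursive) = @_;
  return () unless defined $re && $re ne '';

  # $recursive == 0 limits the walk to the node and its direct children.
  my $depth = (defined $recursive && $recursive == 0) ? 1 : undef;

  my @ret;
  my @search = ($node);
  while (my $part = shift @search) {
    my $is_leaf = !exists $part->{body_parts};
    push @ret, $part
      if $part->{type} =~ /$re/ && (!$onlyleaves || $is_leaf);
    if (!$is_leaf && (!defined $depth || $depth > 0)) {
      $depth-- if defined $depth;
      # Children are placed at the front of the queue: depth-first order.
      unshift @search, @{ $part->{body_parts} };
    }
  }
  return @ret;
}

# Helper: join the types of the returned parts into one string.
sub types { return join ' ', map { $_->{type} } @_ }

my $all     = types(find_parts_sketch($root, qr/./));           # every part
my $leaves  = types(find_parts_sketch($root, qr/^text\b/, 1));  # leaves only
my $shallow = types(find_parts_sketch($root, qr/./, 0, 0));     # no recursion
print "$_\n" for $all, $leaves, $shallow;
```

With Recursive set to 0 the nested multipart/alternative part is still returned, but its children are not visited.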
140
141=item header()
142
143Stores and retrieves headers from a specific MIME part. The first
144parameter is the header name. If there is no other parameter, the header
145is retrieved. If there is a second parameter, the header is stored.
146
147Header names are case-insensitive and are stored in both raw and
148decoded form. Using header(), only the decoded form is retrievable.
149
150For retrieval, if header() is called in an array context, an array will
151be returned with each header entry in a different element. In a scalar
152context, the last specific header is returned.
153
154ie: If 'Subject' is specified as the header, and there are 2 Subject
155headers in a message, the last/bottom one in the message is returned in
156scalar context or both are returned in array context.
157
158=cut
159
160# Store or retrieve headers from a given MIME object
161#
162
# spent 4.22s (2.96+1.27) within Mail::SpamAssassin::Message::Node::header which was called 40380 times, avg 105µs/call: # 30308 times (1.84s+192ms) by Mail::SpamAssassin::Message::Node::get_header at line 863, avg 67µs/call # 7410 times (930ms+979ms) by Mail::SpamAssassin::Message::new at line 282 of Mail/SpamAssassin/Message.pm, avg 258µs/call # 425 times (20.7ms+3.42ms) by Mail::SpamAssassin::Message::_parse_normal at line 1034 of Mail/SpamAssassin/Message.pm, avg 57µs/call # 425 times (16.7ms+5.83ms) by Mail::SpamAssassin::Message::_parse_normal at line 1042 of Mail/SpamAssassin/Message.pm, avg 53µs/call # 409 times (19.3ms+3.08ms) by Mail::SpamAssassin::Message::Node::decode at line 335, avg 55µs/call # 393 times (53.1ms+43.3ms) by Mail::SpamAssassin::Message::_parse_multipart at line 965 of Mail/SpamAssassin/Message.pm, avg 245µs/call # 388 times (45.2ms+33.5ms) by Mail::SpamAssassin::Message::_parse_multipart at line 978 of Mail/SpamAssassin/Message.pm, avg 203µs/call # 388 times (20.3ms+2.88ms) by Mail::SpamAssassin::Message::_parse_multipart at line 922 of Mail/SpamAssassin/Message.pm, avg 60µs/call # 234 times (12.6ms+1.54ms) by Mail::SpamAssassin::Message::new at line 363 of Mail/SpamAssassin/Message.pm, avg 60µs/call
sub header {
1634038071.7ms my $self = shift;
1644038083.7ms my $rawkey = shift;
165
1664038068.0ms return unless defined $rawkey;
167
168 # we're going to do things case insensitively
16940380101ms my $key = lc($rawkey);
170
171 # Trim whitespace off of the header keys
17240380773ms40380201ms $key =~ s/^\s+//;
# spent 201ms making 40380 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 5µs/call
17340380579ms4038096.3ms $key =~ s/\s+$//;
# spent 96.3ms making 40380 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 2µs/call
174
1754038071.4ms if (@_) {
176819120.9ms my $raw_value = shift;
177819114.4ms return unless defined $raw_value;
178
1791638299.1ms push @{ $self->{'header_order'} }, $rawkey;
180819129.8ms if ( !exists $self->{'headers'}->{$key} ) {
181718354.5ms $self->{'headers'}->{$key} = [];
182718324.9ms $self->{'raw_headers'}->{$key} = [];
183 }
184
185819119.2ms my $dec_value = $raw_value;
1868191140ms819158.3ms $dec_value =~ s/\n[ \t]+/ /gs;
# spent 58.3ms making 8191 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 7µs/call
1878191167ms819193.9ms $dec_value =~ s/\s+$//s;
# spent 93.9ms making 8191 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 11µs/call
1888191125ms819168.6ms $dec_value =~ s/^\s+//s;
# spent 68.6ms making 8191 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 8µs/call
18916382106ms8191747ms push @{ $self->{'headers'}->{$key} }, $self->_decode_header($dec_value,$key);
# spent 747ms making 8191 calls to Mail::SpamAssassin::Message::Node::_decode_header, avg 91µs/call
190
1911638297.6ms push @{ $self->{'raw_headers'}->{$key} }, $raw_value;
192
1938191109ms return $self->{'headers'}->{$key}->[-1];
194 }
195
1963218957.6ms if (wantarray) {
19731355184ms return unless exists $self->{'headers'}->{$key};
19857940712ms return @{ $self->{'headers'}->{$key} };
199 }
200 else {
2018347.77ms return '' unless exists $self->{'headers'}->{$key};
20238313.7ms return $self->{'headers'}->{$key}->[-1];
203 }
204}
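
The store/retrieve and context behaviour described for header() can be illustrated with a reduced stand-in. This is only a sketch: the package `HeaderStoreSketch` is hypothetical, and it deliberately omits the value unfolding and the `_decode_header` call that the real method performs.

```perl
use strict;
use warnings;

package HeaderStoreSketch;   # hypothetical stand-in, not the real Node class

sub new { return bless { headers => {}, header_order => [] }, shift }

sub header {
  my ($self, $rawkey, @rest) = @_;
  return unless defined $rawkey;

  # Keys are case-insensitive and whitespace-trimmed, as in header().
  my $key = lc $rawkey;
  $key =~ s/^\s+//;
  $key =~ s/\s+$//;

  if (@rest) {                                  # second argument: store
    my $value = $rest[0];
    return unless defined $value;
    push @{ $self->{header_order} }, $rawkey;
    push @{ $self->{headers}{$key} }, $value;
    return $self->{headers}{$key}[-1];
  }

  if (wantarray) {                              # list context: every entry
    return unless exists $self->{headers}{$key};
    return @{ $self->{headers}{$key} };
  }
  return '' unless exists $self->{headers}{$key};
  return $self->{headers}{$key}[-1];            # scalar context: last entry
}

package main;

my $node = HeaderStoreSketch->new;
$node->header('Subject',   'first');
$node->header(' SUBJECT ', 'second');           # same key after lc and trim

my $last    = $node->header('subject');         # scalar context
my @all     = $node->header('subject');         # list context
my $missing = $node->header('x-missing');       # empty string, not undef
print "last=$last all=@all missing='$missing'\n";
```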
205
206=item raw_header()
207
208Retrieves the raw version of headers from a specific MIME part. The only
209parameter is the header name. Header names are case-insensitive.
210
211For retrieval, if raw_header() is called in an array context, an array
212will be returned with each header entry in a different element. In a
213scalar context, the last specific header is returned.
214
215ie: If 'Subject' is specified as the header, and there are 2 Subject
216headers in a message, the last/bottom one in the message is returned in
217scalar context or both are returned in array context.
218
219=cut
220
221# Retrieve raw headers from a given MIME object
222#
223
# spent 45.0ms (39.8+5.14) within Mail::SpamAssassin::Message::Node::raw_header which was called 936 times, avg 48µs/call: # 936 times (39.8ms+5.14ms) by Mail::SpamAssassin::PerMsgStatus::_get at line 1982 of Mail/SpamAssassin/PerMsgStatus.pm, avg 48µs/call
sub raw_header {
2249361.85ms my $self = shift;
2259362.49ms my $key = lc(shift);
226
227 # Trim whitespace off of the header keys
2289369.55ms9362.82ms $key =~ s/^\s+//;
# spent 2.82ms making 936 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
22993617.8ms9362.32ms $key =~ s/\s+$//;
# spent 2.32ms making 936 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 2µs/call
230
2319361.84ms if (wantarray) {
2329369.07ms return unless exists $self->{'raw_headers'}->{$key};
2334683.56ms return @{ $self->{'raw_headers'}->{$key} };
234 }
235 else {
236 return '' unless exists $self->{'raw_headers'}->{$key};
237 return $self->{'raw_headers'}->{$key}->[-1];
238 }
239}
240
241=item add_body_part()
242
243Adds a Node child object to the current node object.
244
245=cut
246
247# Add a MIME child part to ourselves
248
# spent 15.7ms (11.6+4.11) within Mail::SpamAssassin::Message::Node::add_body_part which was called 388 times, avg 41µs/call: # 388 times (11.6ms+4.11ms) by Mail::SpamAssassin::Message::_parse_multipart at line 930 of Mail/SpamAssassin/Message.pm, avg 41µs/call
sub add_body_part {
249388881µs my($self, $part) = @_;
250
2513884.19ms3884.11ms dbg("message: added part, type: ".$part->{'type'});
# spent 4.11ms making 388 calls to Mail::SpamAssassin::Logger::dbg, avg 11µs/call
2527765.73ms push @{ $self->{'body_parts'} }, $part;
253}
254
255=item is_leaf()
256
257Returns true if the tree node in question is a leaf of the tree (ie:
258has no children of its own). Note: This function may return odd results
259unless the message has been mime parsed via _do_parse()!
260
261=cut
262
263
# spent 22.6ms within Mail::SpamAssassin::Message::Node::is_leaf which was called 1866 times, avg 12µs/call: # 1866 times (22.6ms+0s) by Mail::SpamAssassin::Message::Node::find_parts at line 126, avg 12µs/call
sub is_leaf {
26418663.50ms my($self) = @_;
265186623.7ms return !exists $self->{'body_parts'};
266}
267
268=item raw()
269
270Return a reference to the raw array. Treat this as READ ONLY.
271
272=cut
273
274sub raw {
275 my $self = shift;
276
277 # Ok, if we're called we are expected to return an array.
278 # so if it's a file reference, read in the message into an array...
279 #
280 # NOTE: that "ref undef" works, so don't bother checking for a defined var
281 # first.
282 if (ref $self->{'raw'} eq 'GLOB') {
283 my $fd = $self->{'raw'};
284 seek($fd, 0, 0) or die "message: cannot rewind file: $!";
285
286 # dbg("message: (raw) reading mime part from a temporary file");
287 my($nread,$raw_str); $raw_str = '';
288 while ( $nread=sysread($fd, $raw_str, 16384, length $raw_str) ) { }
289 defined $nread or die "error reading: $!";
290 my @array = split(/^/m, $raw_str, -1);
291
292 dbg("message: empty message read") if $raw_str eq '';
293 return \@array;
294 }
295
296 return $self->{'raw'};
297}
298
299=item decode()
300
301If necessary, decode the part text as base64 or quoted-printable.
302The decoded text will be returned as a scalar string. An optional length
303parameter can be passed in which limits how much decoded data is returned.
304If the scalar isn't needed, call with "0" as a parameter.
305
306=cut
307
308
# spent 769ms (74.5+694) within Mail::SpamAssassin::Message::Node::decode which was called 409 times, avg 1.88ms/call: # 409 times (74.5ms+694ms) by Mail::SpamAssassin::Message::Node::rendered at line 604, avg 1.88ms/call
sub decode {
3094091.22ms my($self, $bytes) = @_;
310
3114091.81ms if ( !exists $self->{'decoded'} ) {
312 # Someone is looking for a decoded part where there is no raw data
313 # (multipart or subparsed message, etc.) Just return undef.
314409936µs return if !exists $self->{'raw'};
315
316409732µs my $raw;
317
318 # if the part is held in a temp file, read it into the scalar
3194092.73ms if (ref $self->{'raw'} eq 'GLOB') {
320 my $fd = $self->{'raw'};
321 seek($fd, 0, 0) or die "message: cannot rewind file: $!";
322
323 # dbg("message: (decode) reading mime part from a temporary file");
324 my($nread,$raw_str); $raw = '';
325 while ( $nread=sysread($fd, $raw, 16384, length $raw) ) { }
326 defined $nread or die "error reading: $!";
327
328 dbg("message: empty message read from a temp file") if $raw eq '';
329 }
330 else {
331 # create a new scalar from the raw array in memory
33281827.5ms $raw = join('', @{$self->{'raw'}});
333 }
334
3354093.85ms40922.4ms my $encoding = lc $self->header('content-transfer-encoding') || '';
# spent 22.4ms making 409 calls to Mail::SpamAssassin::Message::Node::header, avg 55µs/call
336
3374092.53ms if ( $encoding eq 'quoted-printable' ) {
3382021.36ms2021.44ms dbg("message: decoding quoted-printable");
# spent 1.44ms making 202 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
3392022.39ms202596ms $self->{'decoded'} = Mail::SpamAssassin::Util::qp_decode($raw);
# spent 596ms making 202 calls to Mail::SpamAssassin::Util::qp_decode, avg 2.95ms/call
3402024.93ms2023.14ms $self->{'decoded'} =~ s/\015\012/\012/gs;
# spent 3.14ms making 202 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 16µs/call
341 }
342 elsif ( $encoding eq 'base64' ) {
34355400µs55429µs dbg("message: decoding base64");
# spent 429µs making 55 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
344
345 # if it's not defined or is 0, do the whole thing, otherwise only decode
346 # a portion
34755224µs if ($bytes) {
348 return Mail::SpamAssassin::Util::base64_decode($raw, $bytes);
349 }
350 else {
351 # Generate the decoded output
35255600µs5562.6ms $self->{'decoded'} = Mail::SpamAssassin::Util::base64_decode($raw);
# spent 62.6ms making 55 calls to Mail::SpamAssassin::Util::base64_decode, avg 1.14ms/call
353 }
354
35555963µs55363µs if ( $self->{'type'} =~ m@^(?:text|message)\b/@i ) {
# spent 363µs making 55 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 7µs/call
356556.96ms556.54ms $self->{'decoded'} =~ s/\015\012/\012/gs;
# spent 6.54ms making 55 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 119µs/call
357 }
358 }
359 else {
360 # Encoding is one of 7bit, 8bit, binary or x-something
361152669µs if ( $encoding ) {
3621061.10ms106824µs dbg("message: decoding other encoding type ($encoding), ignoring");
# spent 824µs making 106 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
363 }
364 else {
36546335µs46434µs dbg("message: no encoding detected");
# spent 434µs making 46 calls to Mail::SpamAssassin::Logger::dbg, avg 9µs/call
366 }
367152672µs $self->{'decoded'} = $raw;
368 }
369 }
370
371409940µs if ( !defined $bytes || $bytes ) {
372409876µs if ( !defined $bytes ) {
373 # force a copy
3744098.90ms return '' . $self->{'decoded'};
375 }
376 else {
377 return substr($self->{'decoded'}, 0, $bytes);
378 }
379 }
380}
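
The dispatch on Content-Transfer-Encoding in decode() can be sketched on its own. The version below is an assumption-laden illustration: it swaps in the core modules MIME::QuotedPrint and MIME::Base64 in place of the Mail::SpamAssassin::Util helpers used above, the name `decode_sketch` is hypothetical, and caching in `$self->{'decoded'}` and the optional length limit are left out.

```perl
use strict;
use warnings;
use MIME::Base64      qw(decode_base64 encode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: decode $raw according to the transfer encoding.
sub decode_sketch {
  my ($raw, $encoding, $type) = @_;
  $encoding = defined $encoding ? lc $encoding : '';

  my $decoded;
  if ($encoding eq 'quoted-printable') {
    $decoded = decode_qp($raw);
    $decoded =~ s/\015\012/\012/g;              # CRLF -> LF
  }
  elsif ($encoding eq 'base64') {
    $decoded = decode_base64($raw);
    # Only textual parts get their line endings normalized.
    $decoded =~ s/\015\012/\012/g if $type =~ m{^(?:text|message)/}i;
  }
  else {
    $decoded = $raw;                            # 7bit, 8bit, binary, x-...
  }
  return $decoded;
}

my $b64   = encode_base64("hello\r\nworld");
my $text  = decode_sketch($b64, 'Base64', 'text/plain');
my $bytes = decode_sketch($b64, 'base64', 'application/octet-stream');
my $qp    = decode_sketch('a=3Db', 'quoted-printable', 'text/plain');
my $plain = decode_sketch('as is', undef, 'text/plain');
```

In the profile above, nearly all of decode()'s inclusive time is spent in qp_decode (596ms of 769ms), so the dispatch itself is not the expensive part.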
381
382# Look at a text scalar and determine whether it should be rendered
383# as text/html.
384#
385# This is not a public function.
386#
387
# spent 4.33ms (3.12+1.21) within Mail::SpamAssassin::Message::Node::_html_render which was called 217 times, avg 20µs/call: # 217 times (3.12ms+1.21ms) by Mail::SpamAssassin::Message::Node::rendered at line 609, avg 20µs/call
sub _html_render {
3882172.83ms2171.21ms if ($_[0] =~ m/^(.{0,18}?<(?:body|head|html|img|pre|table|title)(?:\s.{0,18}?)?>)/is)
# spent 1.21ms making 217 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 6µs/call
389 {
390 my $pad = $1;
391 my $count = 0;
392 $count += ($pad =~ tr/\n//d) * 2;
393 $count += ($pad =~ tr/\n//cd);
394 return ($count < 24);
395 }
3962171.82ms return 0;
397}
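
To see what the heuristic in _html_render accepts, the same regular expression and scoring can be run as a free-standing function. This is a sketch; the name `looks_like_html` is hypothetical, and the return value is normalized to 1 or 0 for easier printing. Each newline before the end of the first tag counts twice, every other character counts once, and the text is treated as HTML only when the total stays below 24.

```perl
use strict;
use warnings;

# Hypothetical free-standing copy of the _html_render heuristic.
sub looks_like_html {
  my ($text) = @_;
  if ($text =~ m/^(.{0,18}?<(?:body|head|html|img|pre|table|title)(?:\s.{0,18}?)?>)/is) {
    my $pad   = $1;
    my $count = 0;
    $count += ($pad =~ tr/\n//d) * 2;   # newlines: removed, counted double
    $count += ($pad =~ tr/\n//cd);      # everything that is left: counted once
    return $count < 24 ? 1 : 0;
  }
  return 0;
}

my $html_doc     = looks_like_html("<html><body>hi</body></html>");
my $plain_text   = looks_like_html("Hello, this is a plain text message.");
my $tag_too_late = looks_like_html(("x" x 40) . "<html>");
my $eight_blank  = looks_like_html(("\n" x 8) . "<html>");   # 16 + 6 = 22
my $nine_blank   = looks_like_html(("\n" x 9) . "<html>");   # 18 + 6 = 24
print "$html_doc $plain_text $tag_too_late $eight_blank $nine_blank\n";
```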
398
399# Decode character set of a given text to perl characters (Unicode),
400# then encode into UTF-8 octets if requested.
401#
402
# spent 2.26ms within Mail::SpamAssassin::Message::Node::_normalize which was called 118 times, avg 19µs/call: # 118 times (2.26ms+0s) by Mail::SpamAssassin::Message::Node::__decode_header at line 783, avg 19µs/call
sub _normalize {
403118227µs my $self = $_[0];
404# my $data = $_[1]; # avoid copying large strings
405118296µs my $charset_declared = $_[2];
406118218µs my $return_decoded = $_[3]; # true: Unicode characters, false: UTF-8 octets
407
4081181.79ms return $_[1] unless $self->{normalize} && $enc_utf8;
409
410 warn "message: _normalize() was given characters, expected bytes: $_[1]\n"
411 if utf8::is_utf8($_[1]);
412
413 # workaround for Encode::decode taint laundering bug [rt.cpan.org #84879]
414 my $data_taint = substr($_[1], 0, 0); # empty string, tainted like $data
415
416 if (!defined $charset_declared || $charset_declared eq '') {
417 $charset_declared = 'us-ascii';
418 }
419
420 # number of characters with code above 127
421 my $cnt_8bits = $_[1] =~ tr/\x00-\x7F//c;
422
423 if (!$cnt_8bits &&
424 $charset_declared =~
425 /^(?: (?:US-)?ASCII | ANSI[_ ]? X3\.4- (?:1986|1968) |
426 ISO646-US )\z/xsi)
427 { # declared as US-ASCII (a.k.a. ANSI X3.4-1986) and it really is
428 dbg("message: kept, charset is US-ASCII as declared");
429 return $_[1]; # is all-ASCII, no need for decoding
430 }
431
432 if (!$cnt_8bits &&
433 $charset_declared =~
434 /^(?: ISO[ -]?8859 (?: - \d{1,2} )? | Windows-\d{4} |
435 UTF-?8 | (KOI8|EUC)-[A-Z]{1,2} |
436 Big5 | GBK | GB[ -]?18030 (?:-20\d\d)? )\z/xsi)
437 { # declared as extended ASCII, but it is actually a plain 7-bit US-ASCII
438 dbg("message: kept, charset is US-ASCII, declared %s", $charset_declared);
439 return $_[1]; # is all-ASCII, no need for decoding
440 }
441
442 # Try first to strictly decode based on a declared character set.
443
444 my $rv;
445 if ($charset_declared =~ /^UTF-?8\z/i) {
446 # attempt decoding as strict UTF-8 (flags: FB_CROAK | LEAVE_SRC)
447 if (eval { $rv = $enc_utf8->decode($_[1], 1|8); defined $rv }) {
448 dbg("message: decoded as declared charset UTF-8");
449 return $_[1] if !$return_decoded;
450 $rv .= $data_taint; # carry taintedness over, avoid Encode bug
451 return $rv; # decoded
452 } else {
453 dbg("message: failed decoding as declared charset UTF-8");
454 };
455
456 } elsif ($cnt_8bits &&
457 eval { $rv = $enc_utf8->decode($_[1], 1|8); defined $rv }) {
458 dbg("message: decoded as charset UTF-8, declared %s", $charset_declared);
459 return $_[1] if !$return_decoded;
460 $rv .= $data_taint; # carry taintedness over, avoid Encode bug
461 return $rv; # decoded
462
463 } elsif ($charset_declared =~ /^(?:US-)?ASCII\z/i) {
464 # declared as US-ASCII but contains 8-bit characters, makes no sense
465 # to attempt decoding first as strict US-ASCII as we know it would fail
466
467 } else {
468 # try decoding as a declared character set
469
470 # -> http://en.wikipedia.org/wiki/Windows-1252
471 # Windows-1252 character encoding is a superset of ISO 8859-1, but differs
472 # from the IANA's ISO-8859-1 by using displayable characters rather than
473 # control characters in the 80 to 9F (hex) range. [...]
474 # It is very common to mislabel Windows-1252 text with the charset label
475 # ISO-8859-1. A common result was that all the quotes and apostrophes
476 # (produced by "smart quotes" in word-processing software) were replaced
477 # with question marks or boxes on non-Windows operating systems, making
478 # text difficult to read. Most modern web browsers and e-mail clients
479 # treat the MIME charset ISO-8859-1 as Windows-1252 to accommodate
480 # such mislabeling. This is now standard behavior in the draft HTML 5
481 # specification, which requires that documents advertised as ISO-8859-1
482 # actually be parsed with the Windows-1252 encoding.
483 #
484 my($chset, $decoder);
485 if ($charset_declared =~ /^(?: ISO-?8859-1 | Windows-1252 | CP1252 )\z/xi) {
486 $chset = 'Windows-1252'; $decoder = $enc_w1252;
487 } else {
488 $chset = $charset_declared; $decoder = Encode::find_encoding($chset);
489 if (!$decoder && $chset =~ /^GB[ -]?18030(?:-20\d\d)?\z/i) {
490 $decoder = Encode::find_encoding('GBK'); # a subset of GB18030
491 dbg("message: no decoder for a declared charset %s, using GBK",
492 $chset) if $decoder;
493 }
494 }
495 if (!$decoder) {
496 dbg("message: failed decoding, no decoder for a declared charset %s",
497 $chset);
498 } else {
499 eval { $rv = $decoder->decode($_[1], 1|8) }; # FB_CROAK | LEAVE_SRC
500 if (lc $chset eq lc $charset_declared) {
501 dbg("message: %s as declared charset %s",
502 defined $rv ? 'decoded' : 'failed decoding', $charset_declared);
503 } else {
504 dbg("message: %s as charset %s, declared %s",
505 defined $rv ? 'decoded' : 'failed decoding',
506 $chset, $charset_declared);
507 }
508 }
509 }
510
511 # If the above failed, check if it is US-ASCII, possibly extended by few
512 # NBSP or SHY characters from ISO-8859-* or Windows-1252, or containing
513 # some popular punctuation or special characters from Windows-1252 in
514 # the \x80-\x9F range (which is unassigned in ISO-8859-*).
515 # Note that Windows-1252 is a proper superset of ISO-8859-1.
516 #
517 if (!defined $rv && !$cnt_8bits) {
518 dbg("message: kept, guessed charset is US-ASCII, declared %s",
519 $charset_declared);
520 return $_[1]; # is all-ASCII, no need for decoding
521
522 } elsif (!defined $rv && $enc_w1252 &&
523 # ASCII NBSP (c) SHY ' " ... '".- TM
524 $_[1] !~ tr/\x00-\x7F\xA0\xA9\xAD\x82\x84\x85\x91-\x97\x99//c)
525 { # ASCII + NBSP + SHY + some punctuation characters
526 # NBSP (A0) and SHY (AD) are at the same position in ISO-8859-* too
527 # consider also: AE (r), 80 Euro
528 eval { $rv = $enc_w1252->decode($_[1], 1|8) }; # FB_CROAK | LEAVE_SRC
529 # the above can't fail, but keep code general just in case
530 dbg("message: %s as guessed charset %s, declared %s",
531 defined $rv ? 'decoded' : 'failed decoding',
532 'Windows-1252', $charset_declared);
533 }
534
535 # If we were unsuccessful so far, try some guesswork
536 # based on Encode::Detect::Detector .
537
538 if (defined $rv) {
539 # done, no need for guesswork
540 } elsif (!$have_encode_detector) {
541 dbg("message: Encode::Detect::Detector not available, declared %s failed",
542 $charset_declared);
543 } else {
544 my $charset_detected = Encode::Detect::Detector::detect($_[1]);
545 if ($charset_detected && lc $charset_detected ne lc $charset_declared) {
546 my $decoder = Encode::find_encoding($charset_detected);
547 if (!$decoder && $charset_detected =~ /^GB[ -]?18030(?:-20\d\d)?\z/i) {
548 $decoder = Encode::find_encoding('GBK'); # a subset of GB18030
549 dbg("message: no decoder for a detected charset %s, using GBK",
550 $charset_detected) if $decoder;
551 }
552 if (!$decoder) {
553 dbg("message: failed decoding, no decoder for a detected charset %s",
554 $charset_detected);
555 } else {
556 eval { $rv = $decoder->decode($_[1], 1|8) }; # FB_CROAK | LEAVE_SRC
557 dbg("message: %s as detected charset %s, declared %s",
558 defined $rv ? 'decoded' : 'failed decoding',
559 $charset_detected, $charset_declared);
560 }
561 }
562 }
563
564 if (!defined $rv) { # all decoding attempts failed so far, probably garbage
565 # go for Windows-1252 which can't fail
566 eval { $rv = $enc_w1252->decode($_[1]) };
567 dbg("message: %s as last-resort charset %s, declared %s",
568 defined $rv ? 'decoded' : 'failed decoding',
569 'Windows-1252', $charset_declared);
570 }
571
572 if (!defined $rv) { # just in case - all decoding attempts failed so far
573 return $_[1]; # garbage-in / garbage-out, return unchanged octets
574 }
575 # decoding octets to characters was successful
576 if (!$return_decoded) {
577 # utf8::encode() is much faster than $enc_utf8->encode on utf8-flagged arg
578 utf8::encode($rv); # encode Unicode characters to UTF-8 octets
579 }
580 $rv .= $data_taint; # carry taintedness over, avoid Encode bug
581 return $rv;
582 }
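The tail of the decoding routine above follows a "strict first, lenient last" pattern: each candidate charset is tried with FB_CROAK | LEAVE_SRC inside an eval, and Windows-1252 is the last resort. A minimal sketch of that pattern, assuming only the core Encode module; the helper name decode_with_fallback is hypothetical and not part of Node.pm, and the Encode::Detect::Detector step is left out:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Node.pm): decode octets to characters,
# trying the declared charset strictly before falling back to Windows-1252.
sub decode_with_fallback {
    my ($octets, $declared) = @_;
    my $rv;

    my $enc = defined $declared ? Encode::find_encoding($declared) : undef;
    if ($enc) {
        # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
        # $octets intact so that the fallback below can reuse it.
        eval {
            $rv = $enc->decode($octets,
                               Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
    }

    if (!defined $rv) {
        # Last resort: lenient Windows-1252 decoding.
        my $w1252 = Encode::find_encoding('Windows-1252');
        eval { $rv = $w1252->decode($octets) };
    }

    # Garbage in, garbage out: return the octets unchanged if all else failed.
    return defined $rv ? $rv : $octets;
}
```

For instance, decode_with_fallback("caf\xE9", 'UTF-8') cannot be decoded as strict UTF-8, so the sketch falls through to the Windows-1252 branch.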
583
584 =item rendered()
585
586 rendered() takes the given text/* type MIME part, and attempts to
587 render it into a text scalar. It will always render text/html, and will
588 use a heuristic to determine if other text/* parts should be considered
589 text/html. Two scalars are returned: the rendered type (either text/html
590 or whatever the original type was), and the rendered text.
591
592 =cut
593
# spent 23.9s (73.6ms+23.8) within Mail::SpamAssassin::Message::Node::rendered which was called 1275 times, avg 18.7ms/call:
# 425 times (61.8ms+23.8s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1128 of Mail/SpamAssassin/Message.pm, avg 56.2ms/call
# 425 times (6.12ms+88µs) by Mail::SpamAssassin::Message::Node::visible_rendered at line 697, avg 15µs/call
# 425 times (5.68ms+96µs) by Mail::SpamAssassin::Message::Node::invisible_rendered at line 709, avg 14µs/call
594 sub rendered {
595 1275 2.44ms my ($self) = @_;
596
597 1275 5.63ms if (!exists $self->{rendered}) {
598 # We only know how to render text/plain and text/html ...
599 # Note: for bug 4843, make sure to skip text/calendar parts
600 # we also want to skip things like text/x-vcard
601 # text/x-aol is ignored here, but looks like text/html ...
602 457 7.44ms 457 3.70ms return(undef,undef) unless ( $self->{'type'} =~ /^text\/(?:plain|html)$/i );
# spent 3.70ms making 457 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 8µs/call
603
604 409 4.06ms 409 769ms my $text = $self->decode; # QP and Base64 decoding, bytes
# spent 769ms making 409 calls to Mail::SpamAssassin::Message::Node::decode, avg 1.88ms/call
605 409 1.44ms my $text_len = length($text); # num of bytes in original charset encoding
606
607 # render text/html always, or any other text|text/plain part as text/html
608 # based on a heuristic which simulates a certain common mail client
609 409 11.1ms 840 7.53ms if ($text ne '' && ($self->{'type'} =~ m{^text/html$}i ||
# spent 4.33ms making 217 calls to Mail::SpamAssassin::Message::Node::_html_render, avg 20µs/call
# spent 3.21ms making 623 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 5µs/call
610 ($self->{'type'} =~ m{^text/plain$}i &&
611 _html_render(substr($text, 0, 23)))))
612 {
613 189 640µs $self->{rendered_type} = 'text/html';
614
615 # will input text to HTML::Parser be provided as Unicode characters?
616 189 438µs my $character_semantics = 0; # $text is in bytes
617 189 3.57ms 188 1.08ms if ($self->{normalize} && $enc_utf8) { # charset decoding requested
# spent 1.08ms making 188 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 6µs/call
618 # Provide input to HTML::Parser as Unicode characters
619 # which avoids a HTML::Parser bug in utf8_mode
620 # https://rt.cpan.org/Public/Bug/Display.html?id=99755
621 # Avoid unnecessary step of encoding-then-decoding by telling
622 # subroutine _normalize() to return Unicode text. See Bug 7133
623 #
624 $character_semantics = 1; # $text will be in characters
625 $text = $self->_normalize($text, $self->{charset}, 1); # bytes to chars
626 } elsif (!defined $self->{charset} ||
627 $self->{charset} =~ /^(?:US-ASCII|UTF-8)\z/i) {
628 # With some luck input can be interpreted as UTF-8, do not warn.
629 # It is still possible to hit the HTML::Parser utf8_mode bug however.
630 } else {
631 dbg("message: 'normalize_charset' is off, encoding will likely ".
632 46 409µs 46 352µs "be misinterpreted; declared charset: %s", $self->{charset});
# spent 352µs making 46 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
633 }
634 # the 0 requires decoded HTML results to be in bytes (not characters)
635 189 2.55ms 189 80.2ms my $html = Mail::SpamAssassin::HTML->new($character_semantics,0); # object
# spent 80.2ms making 189 calls to Mail::SpamAssassin::HTML::new, avg 425µs/call
636
637 189 1.79ms 189 22.6s $html->parse($text); # parse+render text
# spent 22.6s making 189 calls to Mail::SpamAssassin::HTML::parse, avg 119ms/call
638
639 # resulting HTML-decoded text is in bytes, likely encoded as UTF-8
640 189 2.20ms 189 10.7ms $self->{rendered} = $html->get_rendered_text();
# spent 10.7ms making 189 calls to Mail::SpamAssassin::HTML::get_rendered_text, avg 57µs/call
641 189 1.75ms 189 193ms $self->{visible_rendered} = $html->get_rendered_text(invisible => 0);
# spent 193ms making 189 calls to Mail::SpamAssassin::HTML::get_rendered_text, avg 1.02ms/call
642 189 1.65ms 189 179ms $self->{invisible_rendered} = $html->get_rendered_text(invisible => 1);
# spent 179ms making 189 calls to Mail::SpamAssassin::HTML::get_rendered_text, avg 947µs/call
643 189 1.81ms 189 1.56ms $self->{html_results} = $html->get_results();
# spent 1.56ms making 189 calls to Mail::SpamAssassin::HTML::get_results, avg 8µs/call
644
645 # end-of-document result values that require looking at the text
646 189 481µs my $r = $self->{html_results}; # temporary reference for brevity
647
648 # count the number of spaces in the rendered text (likely UTF-8 octets)
649 189 1.51ms my $space = $self->{rendered} =~ tr/ \t\n\r\x0b//;
650 # we may want to add the count of other Unicode whitespace characters
651
652 189 783µs $r->{html_length} = length $self->{rendered}; # bytes (likely UTF-8)
653 189 788µs $r->{non_space_len} = $r->{html_length} - $space;
654 189 10.9ms $r->{ratio} = ($text_len - $r->{html_length}) / $text_len if $text_len;
655 }
656
657 else { # plain text
658 220 544µs if ($self->{normalize} && $enc_utf8) {
659 # request transcoded result as UTF-8 octets!
660 $text = $self->_normalize($text, $self->{charset}, 0);
661 }
662 220 873µs $self->{rendered_type} = $self->{type};
663 220 1.19ms $self->{rendered} = $self->{'visible_rendered'} = $text;
664 220 925µs $self->{'invisible_rendered'} = '';
665 }
666 }
667
668 1227 24.2ms return ($self->{rendered_type}, $self->{rendered});
669 }
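The end-of-document values computed at lines 649 to 654 can be reproduced in isolation. The sketch below is a hypothetical standalone helper (html_stats is not a SpamAssassin function) that takes the byte length of the undecoded part and the rendered text, and derives the same three fields:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 649-654 of rendered():
#   $raw_len  - length of the part before HTML rendering
#   $rendered - text produced by the HTML renderer
sub html_stats {
    my ($raw_len, $rendered) = @_;

    # tr/// with an empty replacement list only counts matching characters.
    my $space = ($rendered =~ tr/ \t\n\r\x0b//);

    my %r = (html_length => length $rendered);
    $r{non_space_len} = $r{html_length} - $space;

    # Fraction of the original input that did not survive rendering;
    # left unset for empty input to avoid dividing by zero.
    $r{ratio} = ($raw_len - $r{html_length}) / $raw_len if $raw_len;

    return \%r;
}
```

With a 20-byte input that renders to "hello world", html_length is 11, non_space_len is 10, and ratio is 9/20.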
670
671 =item set_rendered($text, $type)
672
673 Set the rendered text and type for the given part. If type is not
674 specified, and text is a defined value, a default of 'text/plain' is used.
675 This can be used, for instance, to render non-text parts using plugins.
676
677 =cut
678
679 sub set_rendered {
680 my ($self, $text, $type) = @_;
681
682 $type = 'text/plain' if (!defined $type && defined $text);
683
684 $self->{'rendered_type'} = $type;
685 $self->{'rendered'} = $self->{'visible_rendered'} = $text;
686 $self->{'invisible_rendered'} = defined $text ? '' : undef;
687 }
688
689 =item visible_rendered()
690
691 Render and return the visible text in this part.
692
693 =cut
694
# spent 15.4ms (9.14+6.21) within Mail::SpamAssassin::Message::Node::visible_rendered which was called 425 times, avg 36µs/call:
# 425 times (9.14ms+6.21ms) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1128 of Mail/SpamAssassin/Message.pm, avg 36µs/call
695 sub visible_rendered {
696 425 889µs my ($self) = @_;
697 425 3.45ms 425 6.21ms $self->rendered(); # ignore return, we want just this:
# spent 6.21ms making 425 calls to Mail::SpamAssassin::Message::Node::rendered, avg 15µs/call
698 425 4.74ms return ($self->{rendered_type}, $self->{visible_rendered});
699 }
700
701 =item invisible_rendered()
702
703 Render and return the invisible text in this part.
704
705 =cut
706
# spent 25.7ms (19.9+5.77) within Mail::SpamAssassin::Message::Node::invisible_rendered which was called 425 times, avg 61µs/call:
# 425 times (19.9ms+5.77ms) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1128 of Mail/SpamAssassin/Message.pm, avg 61µs/call
707 sub invisible_rendered {
708 425 857µs my ($self) = @_;
709 425 2.64ms 425 5.77ms $self->rendered(); # ignore return, we want just this:
# spent 5.77ms making 425 calls to Mail::SpamAssassin::Message::Node::rendered, avg 14µs/call
710 425 4.57ms return ($self->{rendered_type}, $self->{invisible_rendered});
711 }
712
713 =item content_summary()
714
715 Returns an array of scalars describing the mime parts of the message.
716 Note: This function requires that the message be parsed first!
717
718 =cut
719
720 # return an array with scalars describing mime parts
721 sub content_summary {
722 my($self) = @_;
723
724 my @ret = ( [ $self->{'type'} ] );
725 my @search;
726
727 if (exists $self->{'body_parts'}) {
728 my $count = @{$self->{'body_parts'}};
729 for(my $i=0; $i<$count; $i++) {
730 push(@search, [ $i+1, $self->{'body_parts'}->[$i] ]);
731 }
732 }
733
734 while(my $part = shift @search) {
735 my($index, $part) = @{$part};
736 push(@{$ret[$index]}, $part->{'type'});
737 if (exists $part->{'body_parts'}) {
738 unshift(@search, map { [ $index, $_ ] } @{$part->{'body_parts'}});
739 }
740 }
741
742 return map { join(",", @{$_}) } @ret;
743 }
744
745 =item delete_header()
746
747 Delete the specified header (decoded and raw) from the Node information.
748
749 =cut
750
# spent 1.17s (865ms+308ms) within Mail::SpamAssassin::Message::Node::delete_header which was called 936 times, avg 1.25ms/call:
# 234 times (241ms+86.9ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 273 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.40ms/call
# 234 times (232ms+73.6ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 276 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.31ms/call
# 234 times (204ms+72.6ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 274 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.18ms/call
# 234 times (187ms+74.6ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 275 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.12ms/call
751 sub delete_header {
752 936 2.26ms my($self, $hdr) = @_;
753
754 1872 547ms 51216 135ms foreach ( grep(/^${hdr}$/i, keys %{$self->{'headers'}}) ) {
# spent 90.1ms making 25608 calls to Mail::SpamAssassin::Message::Node::CORE:regcomp, avg 4µs/call
# spent 45.0ms making 25608 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 2µs/call
755 delete $self->{'headers'}->{$_};
756 delete $self->{'raw_headers'}->{$_};
757 }
758
759 1872 618ms 59280 172ms my @neworder = grep(!/^${hdr}$/i, @{$self->{'header_order'}});
# spent 95.5ms making 29640 calls to Mail::SpamAssassin::Message::Node::CORE:regcomp, avg 3µs/call
# spent 77.0ms making 29640 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 3µs/call
760 936 12.1ms $self->{'header_order'} = \@neworder;
761 }
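The profile attributes most of delete_header's sub time to the regcomp and match opcodes at lines 754 and 759, which run once per header name. One possible way to avoid them, sketched here under the assumption that $hdr is always a literal header name (the original interpolates it as a pattern, so this is not a drop-in replacement), is a case-insensitive string comparison. The function name delete_header_literal and the plain-hash node are hypothetical, and whether this is actually faster would need re-profiling:

```perl
use strict;
use warnings;

# Hypothetical variant of delete_header(); assumes $hdr is a literal header
# name, so a case-insensitive string comparison can stand in for /^${hdr}$/i.
sub delete_header_literal {
    my ($node, $hdr) = @_;
    my $wanted = lc $hdr;

    # keys() builds its list up front, so deleting inside the loop is safe.
    for my $name (grep { lc($_) eq $wanted } keys %{ $node->{headers} }) {
        delete $node->{headers}{$name};
        delete $node->{raw_headers}{$name};
    }

    # Keep the order-of-appearance list in sync with the hashes.
    $node->{header_order} =
        [ grep { lc($_) ne $wanted } @{ $node->{header_order} } ];
    return;
}
```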
762
763 # decode a header appropriately. don't bother adding it to the pod documents.
# spent 17.8ms (6.02+11.8) within Mail::SpamAssassin::Message::Node::__decode_header which was called 118 times, avg 151µs/call:
# 118 times (6.02ms+11.8ms) by Mail::SpamAssassin::Message::Node::_decode_header at line 819, avg 151µs/call
764 sub __decode_header {
765 118 1.25ms my ( $self, $encoding, $cte, $data ) = @_;
766
767 118 543µs if ( $cte eq 'B' ) {
768 # base 64 encoded
769 15 178µs 15 1.32ms $data = Mail::SpamAssassin::Util::base64_decode($data);
# spent 1.32ms making 15 calls to Mail::SpamAssassin::Util::base64_decode, avg 88µs/call
770 }
771 elsif ( $cte eq 'Q' ) {
772 # quoted printable
773
774 # the RFC states that in the encoded text, "_" is equal to "=20"
775 103 1.55ms 103 882µs $data =~ s/_/=20/g;
# spent 882µs making 103 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 9µs/call
776
777 103 738µs 103 7.30ms $data = Mail::SpamAssassin::Util::qp_decode($data);
# spent 7.30ms making 103 calls to Mail::SpamAssassin::Util::qp_decode, avg 71µs/call
778 }
779 else {
780 # not possible since the input has already been limited to 'B' and 'Q'
781 die "message: unknown encoding type '$cte' in RFC2047 header";
782 }
783 118 2.39ms 118 2.26ms return $self->_normalize($data, $encoding, 0); # transcode to UTF-8 octets
# spent 2.26ms making 118 calls to Mail::SpamAssassin::Message::Node::_normalize, avg 19µs/call
784 }
785
786 # Decode base64 and quoted-printable in headers according to RFC2047.
787 #
# spent 747ms (593+154) within Mail::SpamAssassin::Message::Node::_decode_header which was called 8191 times, avg 91µs/call:
# 8191 times (593ms+154ms) by Mail::SpamAssassin::Message::Node::header at line 189, avg 91µs/call
788 sub _decode_header {
789 8191 88.6ms my($self, $header_field_body, $header_field_name) = @_;
790
791 8191 19.3ms return '' unless defined $header_field_body && $header_field_body ne '';
792
793 # deal with folding and cream the newlines and such
794 7950 109ms 7950 25.1ms $header_field_body =~ s/\n[ \t]+/\n /g;
# spent 25.1ms making 7950 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
795 7950 94.3ms 7950 17.7ms $header_field_body =~ s/\015?\012//gs;
# spent 17.7ms making 7950 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 2µs/call
796
797 7950 153ms 7950 55.2ms if ($header_field_name =~
# spent 55.2ms making 7950 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 7µs/call
798 /^ (?: (?: Received | (?:Resent-)? (?: Message-ID | Date ) |
799 MIME-Version | References | In-Reply-To ) \z
800 | (?: List- | Content- ) ) /xsi ) {
801 # Bug 6945: some header fields must not be processed for MIME encoding
802
803 } else {
804 4903 35.0ms local($1,$2,$3);
805
806 # Multiple encoded sections must ignore the interim whitespace.
807 # To avoid possible FPs with (\s+(?==\?))?, look for the whole RE
808 # separated by whitespace.
809 4903 66.8ms 5010 14.8ms 1 while $header_field_body =~
# spent 14.1ms making 4917 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
# spent 737µs making 93 calls to Mail::SpamAssassin::Message::Node::CORE:substcont, avg 8µs/call
810 s{ ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) \s+
811 ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) }
812 {$1$2}xsg;
813
814 # transcode properly encoded RFC 2047 substrings into UTF-8 octets,
815 # leave everything else unchanged as it is supposed to be UTF-8 (RFC 6532)
816 # or plain US-ASCII
817 4903 87.9ms 5060 23.3ms $header_field_body =~
# spent 21.8ms making 4903 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 4µs/call
# spent 1.52ms making 157 calls to Mail::SpamAssassin::Message::Node::CORE:substcont, avg 10µs/call
818 s{ (?: = \? ([A-Za-z0-9_-]+) \? ([bqBQ]) \? ([^?]*) \? = ) }
819 118 1.87ms 118 17.8ms { $self->__decode_header($1, uc($2), $3) }xsge;
# spent 17.8ms making 118 calls to Mail::SpamAssassin::Message::Node::__decode_header, avg 151µs/call
820 }
821
822 # dbg("message: _decode_header %s: %s", $header_field_name, $header_field_body);
823 7950 112ms return $header_field_body;
824 }
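The two substitutions in _decode_header (join adjacent encoded-words, then decode each one) can be tried outside SpamAssassin. The following is a minimal sketch, not the module's implementation: it swaps in the core modules MIME::Base64, MIME::QuotedPrint and Encode for Mail::SpamAssassin::Util and _normalize, skips the Bug 6945 exclusion list of header fields, and uses the hypothetical names decode_encoded_words and _decode_word.

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 ();
use MIME::QuotedPrint ();

# Hypothetical standalone decoder for RFC 2047 encoded-words in a header body.
sub decode_encoded_words {
    my ($body) = @_;
    my $word = qr{ = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = }x;

    # Whitespace between two adjacent encoded-words is not significant.
    1 while $body =~ s{ ($word) \s+ ($word) }{$1$2}xg;

    # Replace each encoded-word with its UTF-8 octets.
    $body =~ s{ = \? ([A-Za-z0-9_-]+) \? ([bqBQ]) \? ([^?]*) \? = }
              { _decode_word($1, uc($2), $3) }xge;
    return $body;
}

sub _decode_word {
    my ($charset, $cte, $data) = @_;
    if ($cte eq 'B') {
        $data = MIME::Base64::decode_base64($data);
    }
    else {
        $data =~ s/_/=20/g;    # "_" stands for a space in the Q encoding
        $data = MIME::QuotedPrint::decode_qp($data);
    }
    # Unknown charset: hand back the transfer-decoded octets unchanged.
    my $enc = Encode::find_encoding($charset) or return $data;
    return Encode::encode('UTF-8', $enc->decode($data));
}
```

For example, decode_encoded_words('=?UTF-8?Q?a_b?= =?UTF-8?B?Yw==?=') joins the two words before decoding, so no space is left between "a b" and "c".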
825
826 =item get_header()
827
828 Retrieve a specific header. Will have a newline at the end and will be
829 unfolded. The first parameter is the header name (case-insensitive),
830 and the second parameter (optional) is whether or not to return the
831 raw header.
832
833 If get_header() is called in an array context, an array will be returned
834 with each header entry in a different element. In a scalar context,
835 the last specific header is returned.
836
837 For example, if 'Subject' is specified as the header, and there are 2 Subject
838 headers in a message, the last/bottom one in the message is returned in
839 scalar context or both are returned in array context.
840
841 Btw, returning the last header field (not the first) happens to be consistent
842 with DKIM signatures, which search for and cover multiple header fields
843 bottom-up according to the 'h' tag. Let's keep it this way.
844
845 =cut
846
# spent 3.69s (1.66+2.03) within Mail::SpamAssassin::Message::Node::get_header which was called 30308 times, avg 122µs/call:
# 25608 times (1.43s+1.74s) by Mail::SpamAssassin::Message::Node::get_all_headers at line 914, avg 124µs/call
# 1190 times (43.9ms+71.2ms) by Mail::SpamAssassin::PerMsgStatus::_get at line 1982 of Mail/SpamAssassin/PerMsgStatus.pm, avg 97µs/call
# 1170 times (47.3ms+72.0ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 128 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 102µs/call
# 702 times (48.0ms+41.0ms) by Mail::SpamAssassin::Plugin::Bayes::get_msgid at line 976 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 127µs/call
# 702 times (35.1ms+39.8ms) by Mail::SpamAssassin::Plugin::Bayes::get_msgid at line 989 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 107µs/call
# 468 times (24.9ms+36.3ms) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1118 of Mail/SpamAssassin/Message.pm, avg 131µs/call
# 234 times (17.2ms+23.7ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 121 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 175µs/call
# 234 times (7.49ms+11.5ms) by main::wanted at line 568 of /usr/local/bin/sa-learn, avg 81µs/call
847 sub get_header {
848 30308 72.4ms my ($self, $hdr, $raw) = @_;
849 30308 54.0ms $raw ||= 0;
850
851 # And now pick up all the entries into a list
852 # This is assumed to include a newline at the end ...
853 # This is also assumed to have removed continuation bits ...
854
855 # Deal with the possibility that header() or raw_header() returns undef
856 30308 52.1ms my @hdrs;
857 30308 112ms if ( $raw ) {
858 if (@hdrs = $self->raw_header($hdr)) {
859 s/\015?\012\s+/ /gs for @hdrs;
860 }
861 }
862 else {
863 30308 447ms 30308 2.03s if (@hdrs = $self->header($hdr)) {
# spent 2.03s making 30308 calls to Mail::SpamAssassin::Message::Node::header, avg 67µs/call
864 27939 284ms $_ .= "\n" for @hdrs;
865 }
866 }
867
868 30308 54.3ms if (wantarray) {
869 27032 482ms return @hdrs;
870 }
871 else {
872 3276 55.2ms return @hdrs ? $hdrs[-1] : undef;
873 }
874 }
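The context-dependent return at lines 868 to 873 is what gives get_header() its "all values in list context, last value in scalar context" behaviour. A small sketch of just that tail, using a hypothetical function last_or_all that receives the header values directly instead of looking them up:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tail of get_header(): list context gets
# every value, scalar context gets the last (bottom-most) one or undef.
sub last_or_all {
    my @hdrs = @_;
    return @hdrs if wantarray;
    return @hdrs ? $hdrs[-1] : undef;
}

my @all  = last_or_all("first\n", "second\n");   # both values
my $last = last_or_all("first\n", "second\n");   # "second\n"
my $none = last_or_all();                        # undef
```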
875
876 =item get_all_headers()
877
878 Retrieve all headers. Each header will have a newline at the end and
879 will be unfolded. The first parameter (optional) is whether or not to
880 return the raw headers, and the second parameter (optional) is whether
881 or not to include the mbox separator.
882
883 If get_all_headers() is called in an array context, an array will be
884 returned with each header entry in a different element. In a scalar
885 context, the headers are returned in a single scalar.
886
887 =back
888
889 =cut
890
891 # build it and it will not bomb
# spent 5.05s (1.68+3.38) within Mail::SpamAssassin::Message::Node::get_all_headers which was called 936 times, avg 5.40ms/call:
# 702 times (1.31s+2.62s) by Mail::SpamAssassin::Message::receive_date at line 699 of Mail/SpamAssassin/Message.pm, avg 5.59ms/call
# 234 times (371ms+761ms) by Mail::SpamAssassin::Plugin::Bayes::_tokenize_headers at line 1293 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 4.84ms/call
892 sub get_all_headers {
893 936 2.26ms my ($self, $raw, $include_mbox) = @_;
894 936 2.54ms $raw ||= 0;
895 936 2.67ms $include_mbox ||= 0;
896
897 936 1.78ms my @lines;
898
899 # precalculate destination positions based on order of appearance
900 936 2.21ms my $i = 0;
901 936 2.05ms my %locations;
902 1872 11.1ms for my $k (@{$self->{header_order}}) {
903 59280 398ms push(@{$locations{lc($k)}}, $i++);
904 }
905
906 # process headers in order of first appearance
907 936 2.10ms my $header;
908 936 2.48ms my $size = 0;
909 96681 244ms 936 209ms HEADER: for my $name (sort { $locations{$a}->[0] <=> $locations{$b}->[0] }
# spent 209ms making 936 calls to Mail::SpamAssassin::Message::Node::CORE:sort, avg 223µs/call
910 keys %locations)
911 {
912 # get all same-name headers and poke into correct position
913 25608 48.8ms my $positions = $locations{$name};
914 25608 302ms 25608 3.17s for my $contents ($self->get_header($name, $raw)) {
# spent 3.17s making 25608 calls to Mail::SpamAssassin::Message::Node::get_header, avg 124µs/call
915 59280 156ms my $position = shift @{$positions};
916 29640 94.5ms $size += length($name) + length($contents) + 2;
917 29640 51.1ms if ($size > MAX_HEADER_LENGTH) {
918 $self->{'truncated_header'} = 1;
919 last HEADER;
920 }
921 29640 355ms $lines[$position] = $self->{header_order}->[$position].":".$contents;
922 }
923 }
924
925 # skip undefined lines if we truncated
926 936 3.04ms @lines = grep { defined $_ } @lines if $self->{'truncated_header'};
927
928 936 2.57ms splice @lines, 0, 0, $self->{mbox_sep} if ( $include_mbox && exists $self->{mbox_sep} );
929
930 936 44.4ms return wantarray ? @lines : join ('', @lines);
931 }
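get_all_headers() rebuilds the original header order by first recording, for each lower-cased name, the positions at which it appeared, and then writing each value into its recorded slot. A minimal sketch of that bookkeeping, assuming the header values are supplied as a plain hash instead of through get_header(); the name ordered_header_lines is hypothetical and the MAX_HEADER_LENGTH truncation is omitted:

```perl
use strict;
use warnings;

# Hypothetical sketch of the position bookkeeping in get_all_headers().
#   $order  - array ref of header names in order of appearance
#   $values - hash ref: lower-cased name => array ref of values, top to bottom
sub ordered_header_lines {
    my ($order, $values) = @_;

    # Record every position at which each (lower-cased) name appeared.
    my $i = 0;
    my %locations;
    push @{ $locations{ lc $_ } }, $i++ for @$order;

    my @lines;
    for my $name (sort { $locations{$a}[0] <=> $locations{$b}[0] }
                  keys %locations)
    {
        my @positions = @{ $locations{$name} };
        for my $contents (@{ $values->{$name} || [] }) {
            # Each value goes into the slot where that header appeared.
            my $pos = shift @positions;
            $lines[$pos] = $order->[$pos] . ":" . $contents;
        }
    }
    return @lines;
}
```

Given the order Received, Subject, Received, the two Received values end up in slots 0 and 2 with Subject between them, which matches the interleaving of the original message.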
932
933 # legacy public API; now a no-op.
934 sub finish { }
935
936 # ---------------------------------------------------------------------------
937
938 1 13µs 1;
939 __END__
 
# spent 194ms within Mail::SpamAssassin::Message::Node::CORE:match which was called 66604 times, avg 3µs/call:
# 29640 times (77.0ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 759, avg 3µs/call
# 25608 times (45.0ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 754, avg 2µs/call
# 7950 times (55.2ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 797, avg 7µs/call
# 1866 times (7.15ms+0s) by Mail::SpamAssassin::Message::Node::find_parts at line 128, avg 4µs/call
# 623 times (3.21ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 609, avg 5µs/call
# 457 times (3.70ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 602, avg 8µs/call
# 217 times (1.21ms+0s) by Mail::SpamAssassin::Message::Node::_html_render at line 388, avg 6µs/call
# 188 times (1.08ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 617, avg 6µs/call
# 55 times (363µs+0s) by Mail::SpamAssassin::Message::Node::decode at line 355, avg 7µs/call
sub Mail::SpamAssassin::Message::Node::CORE:match; # opcode
# spent 196ms within Mail::SpamAssassin::Message::Node::CORE:regcomp which was called 57114 times, avg 3µs/call:
# 29640 times (95.5ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 759, avg 3µs/call
# 25608 times (90.1ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 754, avg 4µs/call
# 1866 times (10.3ms+0s) by Mail::SpamAssassin::Message::Node::find_parts at line 128, avg 6µs/call
sub Mail::SpamAssassin::Message::Node::CORE:regcomp; # opcode
# spent 209ms within Mail::SpamAssassin::Message::Node::CORE:sort which was called 936 times, avg 223µs/call:
# 936 times (209ms+0s) by Mail::SpamAssassin::Message::Node::get_all_headers at line 909, avg 223µs/call
sub Mail::SpamAssassin::Message::Node::CORE:sort; # opcode
# spent 613ms within Mail::SpamAssassin::Message::Node::CORE:subst which was called 133285 times, avg 5µs/call:
# 40380 times (201ms+0s) by Mail::SpamAssassin::Message::Node::header at line 172, avg 5µs/call
# 40380 times (96.3ms+0s) by Mail::SpamAssassin::Message::Node::header at line 173, avg 2µs/call
# 8191 times (93.9ms+0s) by Mail::SpamAssassin::Message::Node::header at line 187, avg 11µs/call
# 8191 times (68.6ms+0s) by Mail::SpamAssassin::Message::Node::header at line 188, avg 8µs/call
# 8191 times (58.3ms+0s) by Mail::SpamAssassin::Message::Node::header at line 186, avg 7µs/call
# 7950 times (25.1ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 794, avg 3µs/call
# 7950 times (17.7ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 795, avg 2µs/call
# 4917 times (14.1ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 809, avg 3µs/call
# 4903 times (21.8ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 817, avg 4µs/call
# 936 times (2.82ms+0s) by Mail::SpamAssassin::Message::Node::raw_header at line 228, avg 3µs/call
# 936 times (2.32ms+0s) by Mail::SpamAssassin::Message::Node::raw_header at line 229, avg 2µs/call
# 202 times (3.14ms+0s) by Mail::SpamAssassin::Message::Node::decode at line 340, avg 16µs/call
# 103 times (882µs+0s) by Mail::SpamAssassin::Message::Node::__decode_header at line 775, avg 9µs/call
# 55 times (6.54ms+0s) by Mail::SpamAssassin::Message::Node::decode at line 356, avg 119µs/call
sub Mail::SpamAssassin::Message::Node::CORE:subst; # opcode
# spent 2.26ms within Mail::SpamAssassin::Message::Node::CORE:substcont which was called 250 times, avg 9µs/call:
# 157 times (1.52ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 817, avg 10µs/call
# 93 times (737µs+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 809, avg 8µs/call
sub Mail::SpamAssassin::Message::Node::CORE:substcont; # opcode