NYTProf Performance Profile
For /usr/local/bin/sa-learn
Run on Tue Nov 7 05:38:10 2017
Reported on Tue Nov 7 06:16:02 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Message/Node.pm
Statements: Executed 1248929 statements in 8.69s
Subroutines
  Calls    P    F  Exclusive Time  Inclusive Time  Subroutine
  40032    9    2           2.73s           3.86s  Mail::SpamAssassin::Message::Node::header
    923    2    2           1.56s           4.42s  Mail::SpamAssassin::Message::Node::get_all_headers
  29917    8    6           1.29s           3.11s  Mail::SpamAssassin::Message::Node::get_header
    940    4    1           902ms           1.20s  Mail::SpamAssassin::Message::Node::delete_header
   8225    1    1           574ms           707ms  Mail::SpamAssassin::Message::Node::_decode_header
 132804   14    1           504ms           504ms  Mail::SpamAssassin::Message::Node::CORE:subst (opcode)
    923    1    1           204ms           204ms  Mail::SpamAssassin::Message::Node::CORE:sort (opcode)
  57347    3    1           193ms           193ms  Mail::SpamAssassin::Message::Node::CORE:regcomp (opcode)
  66877    9    1           181ms           181ms  Mail::SpamAssassin::Message::Node::CORE:match (opcode)
    705    1    1           111ms           148ms  Mail::SpamAssassin::Message::Node::find_parts
   1281    3    2          90.2ms           21.7s  Mail::SpamAssassin::Message::Node::rendered
    411    1    1          82.5ms           765ms  Mail::SpamAssassin::Message::Node::decode
    940    1    1          42.5ms          47.5ms  Mail::SpamAssassin::Message::Node::raw_header
   1875    1    1          19.3ms          19.3ms  Mail::SpamAssassin::Message::Node::is_leaf
    625    3    1          18.0ms          18.0ms  Mail::SpamAssassin::Message::Node::new
      1    1    1          16.2ms          33.7ms  Mail::SpamAssassin::Message::Node::BEGIN@45
    390    1    1          11.5ms          14.8ms  Mail::SpamAssassin::Message::Node::add_body_part
    427    1    1          10.9ms          14.6ms  Mail::SpamAssassin::Message::Node::invisible_rendered
    427    1    1          9.80ms          15.9ms  Mail::SpamAssassin::Message::Node::visible_rendered
    118    1    1          6.69ms          16.7ms  Mail::SpamAssassin::Message::Node::__decode_header
    218    1    1          3.20ms          4.31ms  Mail::SpamAssassin::Message::Node::_html_render
    118    1    1          1.98ms          1.98ms  Mail::SpamAssassin::Message::Node::_normalize
    250    2    1          1.97ms          1.97ms  Mail::SpamAssassin::Message::Node::CORE:substcont (opcode)
      1    1    1           659µs          5.39ms  Mail::SpamAssassin::Message::Node::BEGIN@49
      1    1    1            68µs            77µs  Mail::SpamAssassin::Message::Node::BEGIN@37
      1    1    1            40µs            75µs  Mail::SpamAssassin::Message::Node::BEGIN@38
      1    1    1            26µs           761µs  Mail::SpamAssassin::Message::Node::BEGIN@44
      1    1    1            25µs            81µs  Mail::SpamAssassin::Message::Node::BEGIN@39
      1    1    1            25µs           214µs  Mail::SpamAssassin::Message::Node::BEGIN@46
      1    1    1            14µs            14µs  Mail::SpamAssassin::Message::Node::BEGIN@43
      0    0    0              0s              0s  Mail::SpamAssassin::Message::Node::content_summary
      0    0    0              0s              0s  Mail::SpamAssassin::Message::Node::finish
      0    0    0              0s              0s  Mail::SpamAssassin::Message::Node::raw
      0    0    0              0s              0s  Mail::SpamAssassin::Message::Node::set_rendered
                Time            Time
Line  Stmts  on line  Calls  in subs  Code
   1                                  # <@LICENSE>
   2                                  # Licensed to the Apache Software Foundation (ASF) under one or more
   3                                  # contributor license agreements. See the NOTICE file distributed with
   4                                  # this work for additional information regarding copyright ownership.
   5                                  # The ASF licenses this file to you under the Apache License, Version 2.0
   6                                  # (the "License"); you may not use this file except in compliance with
   7                                  # the License. You may obtain a copy of the License at:
   8                                  #
   9                                  # http://www.apache.org/licenses/LICENSE-2.0
  10                                  #
  11                                  # Unless required by applicable law or agreed to in writing, software
  12                                  # distributed under the License is distributed on an "AS IS" BASIS,
  13                                  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  14                                  # See the License for the specific language governing permissions and
  15                                  # limitations under the License.
  16                                  # </@LICENSE>
  17
  18                                  =head1 NAME
  19
  20                                  Mail::SpamAssassin::Message::Node - decode, render, and make available MIME message parts
  21
  22                                  =head1 SYNOPSIS
  23
  24                                  =head1 DESCRIPTION
  25
  26                                  This module will encapsulate an email message and allow access to
  27                                  the various MIME message parts.
  28
  29                                  =head1 PUBLIC METHODS
  30
  31                                  =over 4
  32
  33                                  =cut
  34
  35                                  package Mail::SpamAssassin::Message::Node;
  36
                                      # spent 77µs (68+9) within Mail::SpamAssassin::Message::Node::BEGIN@37 which was called:
                                      #   once (68µs+9µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 37
  37      2     59µs      2     86µs  use strict;
                                      # spent 77µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@37
                                      # spent 9µs making 1 call to strict::import
                                      # spent 75µs (40+34) within Mail::SpamAssassin::Message::Node::BEGIN@38 which was called:
                                      #   once (40µs+34µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 38
  38      2     77µs      2    109µs  use warnings;
                                      # spent 75µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@38
                                      # spent 34µs making 1 call to warnings::import
                                      # spent 81µs (25+56) within Mail::SpamAssassin::Message::Node::BEGIN@39 which was called:
                                      #   once (25µs+56µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 39
  39      2     80µs      2    137µs  use re 'taint';
                                      # spent 81µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@39
                                      # spent 56µs making 1 call to re::import
  40
  41      1     23µs                  require 5.008001; # needs utf8::is_utf8()
  42
                                      # spent 14µs within Mail::SpamAssassin::Message::Node::BEGIN@43 which was called:
                                      #   once (14µs+0s) by Mail::SpamAssassin::Message::BEGIN@55 at line 43
  43      2     76µs      1     14µs  use Mail::SpamAssassin;
                                      # spent 14µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@43
                                      # spent 761µs (26+735) within Mail::SpamAssassin::Message::Node::BEGIN@44 which was called:
                                      #   once (26µs+735µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 44
  44      2     72µs      2   1.50ms  use Mail::SpamAssassin::Constants qw(:sa);
                                      # spent 761µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@44
                                      # spent 735µs making 1 call to Exporter::import
                                      # spent 33.7ms (16.2+17.5) within Mail::SpamAssassin::Message::Node::BEGIN@45 which was called:
                                      #   once (16.2ms+17.5ms) by Mail::SpamAssassin::Message::BEGIN@55 at line 45
  45      2    339µs      1   33.7ms  use Mail::SpamAssassin::HTML;
                                      # spent 33.7ms making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@45
                                      # spent 214µs (25+189) within Mail::SpamAssassin::Message::Node::BEGIN@46 which was called:
                                      #   once (25µs+189µs) by Mail::SpamAssassin::Message::BEGIN@55 at line 46
  46      2    229µs      2    403µs  use Mail::SpamAssassin::Logger;
                                      # spent 214µs making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@46
                                      # spent 189µs making 1 call to Exporter::import
  47
  48                                  our($enc_utf8, $enc_w1252, $have_encode_detector);
                                      # spent 5.39ms (659µs+4.73) within Mail::SpamAssassin::Message::Node::BEGIN@49 which was called:
                                      #   once (659µs+4.73ms) by Mail::SpamAssassin::Message::BEGIN@55 at line 55
  49                                  BEGIN {
  50      1      5µs                    eval { require Encode }
  51      2     14µs      1    325µs      and do { $enc_utf8 = Encode::find_encoding('UTF-8');
                                      # spent 325µs making 1 call to Encode::find_encoding
  52      1      6µs      1   1.38ms               $enc_w1252 = Encode::find_encoding('Windows-1252') };
                                      # spent 1.38ms making 1 call to Encode::find_encoding
  53      1    346µs                    eval { require Encode::Detect::Detector }
  54      2     23µs                      and do { $have_encode_detector = 1 };
  55      1   7.93ms      1   5.39ms  };
                                      # spent 5.39ms making 1 call to Mail::SpamAssassin::Message::Node::BEGIN@49
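The BEGIN block above treats Encode and Encode::Detect::Detector as optional: a failed `require` inside `eval` simply leaves the package variables unset, and later code checks them before use. A minimal standalone sketch of the same pattern follows; `Some::Missing::Module` is a hypothetical module name chosen only to exercise the failure branch.

```perl
use strict;
use warnings;

our ($enc_utf8, $have_fake);

BEGIN {
  # Encode ships with core Perl, so this normally succeeds and caches
  # an encoding object for later decode() calls.
  eval { require Encode }
    and do { $enc_utf8 = Encode::find_encoding('UTF-8') };

  # Hypothetical, presumably absent module: require dies inside eval,
  # eval returns undef, and the flag is simply left unset.
  eval { require Some::Missing::Module }
    and do { $have_fake = 1 };
}

printf "UTF-8 encoder: %s\n", $enc_utf8 ? $enc_utf8->name : 'unavailable';
printf "optional module: %s\n", $have_fake ? 'loaded' : 'not loaded';
```

Because the checks run at compile time, the rest of the file can branch on these variables without paying for a second `require`.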
  56
  57                                  =item new()
  58
  59                                  Generates an empty Node object and returns it. Typically only called
  60                                  by functions in Message.
  61
  62                                  =cut
  63
                                      # spent 18.0ms within Mail::SpamAssassin::Message::Node::new which was called 625 times, avg 29µs/call:
                                      #   235 times (7.16ms+0s) by Mail::SpamAssassin::Message::new at line 115 of Mail/SpamAssassin/Message.pm, avg 30µs/call
                                      #   198 times (5.24ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 873 of Mail/SpamAssassin/Message.pm, avg 26µs/call
                                      #   192 times (5.59ms+0s) by Mail::SpamAssassin::Message::_parse_multipart at line 952 of Mail/SpamAssassin/Message.pm, avg 29µs/call
  64                                  sub new {
  65    625   1.52ms                    my $class = shift;
  66    625   1.47ms                    $class = ref($class) || $class;
  67
  68    625   5.55ms                    my $self = {
  69                                      headers => {},
  70                                      raw_headers => {},
  71                                      header_order => []
  72                                    };
  73
  74                                    # deal with any parameters
  75    625   1.29ms                    my($opts) = @_;
  76    625   2.51ms                    $self->{normalize} = $opts->{'normalize'} || 0;
  77
  78    625   1.76ms                    bless($self,$class);
  79    625   21.7ms                    $self;
  80                                  }
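new() accepts an optional hashref of parameters, and the `ref($class) || $class` idiom lets it be called on an existing object as well as on the class name. The sketch below copies that constructor shape into a hypothetical `My::Node` package to show both call styles; it is an illustration, not the real module.

```perl
use strict;
use warnings;

# Hypothetical stand-in mirroring the constructor above.
package My::Node;

sub new {
  my $class = shift;
  $class = ref($class) || $class;   # works for My::Node->new and $obj->new

  my $self = {
    headers      => {},
    raw_headers  => {},
    header_order => [],
  };

  my ($opts) = @_;                  # optional hashref of parameters
  $self->{normalize} = $opts->{normalize} || 0;

  return bless($self, $class);
}

package main;

my $root  = My::Node->new({ normalize => 1 });
my $child = $root->new;             # class name taken from the instance
print ref($child), " normalize=$child->{normalize}\n";
```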
  81
  82                                  =item find_parts()
  83
  84                                  Used to search the tree for specific MIME parts. An array of matching
  85                                  Node objects (pointers into the tree) is returned. The parameters that
  86                                  can be passed in are (in order, all scalars):
  87
  88                                  Regexp - Used to match against each part's Content-Type header,
  89                                  specifically the type and not the rest of the header. For example, "Content-type:
  90                                  text/html; encoding=quoted-printable" has a type of "text/html". If no
  91                                  regexp is specified, find_parts() will return an empty array.
  92
  93                                  Only_leaves - By default, find_parts() will return any part that matches
  94                                  the regexp, including multipart. If you only want to see leaves of the
  95                                  tree (i.e. parts that aren't multipart), set this to true (1).
  96
  97                                  Recursive - By default, when find_parts() finds a multipart which has
  98                                  parts underneath it, it will recurse through all sub-children. If set to 0,
  99                                  only look at the part and any direct children of the part.
 100
 101                                  =cut
 102
 103                                  # Used to find any MIME parts whose simple content-type matches a given regexp
 104                                  # Searches itself and any child parts. Returns an array of MIME
 105                                  # objects which match. Our callers may expect the default behavior which is a
 106                                  # depth-first array of parts.
 107                                  #
                                      # spent 148ms (111+36.4) within Mail::SpamAssassin::Message::Node::find_parts which was called 705 times, avg 209µs/call:
                                      #   705 times (111ms+36.4ms) by Mail::SpamAssassin::Message::find_parts at line 423 of Mail/SpamAssassin/Message.pm, avg 209µs/call
 108                                  sub find_parts {
 109    705   1.80ms                    my ($self, $re, $onlyleaves, $recursive) = @_;
 110
 111                                    # Didn't pass an RE? Just abort.
 112    705   2.12ms                    return () unless defined $re && $re ne '';
 113
 114    705   1.30ms                    $onlyleaves = 0 unless defined $onlyleaves;
 115
 116    705   1.28ms                    my $depth;
 117    705   1.34ms                    if (defined $recursive && $recursive == 0) {
 118                                      $depth = 1;
 119                                    }
 120
 121    705   1.40ms                    my @ret;
 122    705   2.46ms                    my @search = ( $self );
 123
 124    705   15.3ms                    while (my $part = shift @search) {
 125                                      # If this object matches, mark it for return.
 126   1875   15.0ms   1875   19.3ms      my $amialeaf = $part->is_leaf();
                                      # spent 19.3ms making 1875 calls to Mail::SpamAssassin::Message::Node::is_leaf, avg 10µs/call
 127
 128   1875   53.4ms   3750   17.1ms      if ( $part->{'type'} =~ /$re/ && (!$onlyleaves || $amialeaf) ) {
                                      # spent 9.40ms making 1875 calls to Mail::SpamAssassin::Message::Node::CORE:regcomp, avg 5µs/call
                                      # spent 7.66ms making 1875 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 4µs/call
 129   1281   2.88ms                        push(@ret, $part);
 130                                      }
 131
 132   1875   7.29ms                      if ( !$amialeaf && (!defined $depth || $depth > 0)) {
 133    594   1.09ms                        $depth-- if defined $depth;
 134   1188   4.67ms                        unshift(@search, @{$part->{'body_parts'}});
 135                                      }
 136                                    }
 137
 138    705   12.6ms                    return @ret;
 139                                  }
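find_parts() walks the tree iteratively: `unshift` puts a multipart's children at the front of the work list, so parts come back in depth-first document order, and `$depth` stops expansion after the first multipart when `$recursive` is 0. Below is a standalone sketch of that traversal over a hypothetical tree of plain hashes carrying only the `type` and `body_parts` fields the real method reads; `find_parts_sketch` is an illustrative name, not part of the module.

```perl
use strict;
use warnings;

# Standalone sketch of the find_parts() walk over plain-hash nodes.
sub find_parts_sketch {
  my ($root, $re, $onlyleaves, $recursive) = @_;
  return () unless defined $re && $re ne '';
  $onlyleaves = 0 unless defined $onlyleaves;

  # recursive == 0: expand only the first multipart reached (the root)
  my $depth = (defined $recursive && $recursive == 0) ? 1 : undef;

  my @ret;
  my @search = ($root);
  while (my $part = shift @search) {
    my $is_leaf = !exists $part->{body_parts};
    push @ret, $part if $part->{type} =~ /$re/ && (!$onlyleaves || $is_leaf);

    if (!$is_leaf && (!defined $depth || $depth > 0)) {
      $depth-- if defined $depth;
      # unshift (not push) keeps the walk depth-first, in document order
      unshift @search, @{ $part->{body_parts} };
    }
  }
  return @ret;
}

my $tree = {
  type       => 'multipart/mixed',
  body_parts => [
    { type => 'multipart/alternative',
      body_parts => [ { type => 'text/plain' }, { type => 'text/html' } ] },
    { type => 'application/pdf' },
  ],
};

my @text    = find_parts_sketch($tree, qr{^text/}, 1);   # leaves only, full depth
my @shallow = find_parts_sketch($tree, qr{.}, 0, 0);     # any type, one level down
print join(', ', map { $_->{type} } @text), "\n";
print join(', ', map { $_->{type} } @shallow), "\n";
```

The first call finds the two text leaves nested inside multipart/alternative; the second lists the root and its direct children but does not descend into multipart/alternative.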
 140
 141                                  =item header()
 142
 143                                  Stores and retrieves headers from a specific MIME part. The first
 144                                  parameter is the header name. If there is no other parameter, the header
 145                                  is retrieved. If there is a second parameter, the header is stored.
 146
 147                                  Header names are case-insensitive and are stored in both raw and
 148                                  decoded form. Using header(), only the decoded form is retrievable.
 149
 150                                  For retrieval, if header() is called in list context, a list will
 151                                  be returned with each header entry in a different element. In scalar
 152                                  context, the last occurrence of that header is returned.
 153
 154                                  For example, if 'Subject' is specified as the header and there are 2 Subject
 155                                  headers in a message, the last/bottom one in the message is returned in
 156                                  scalar context or both are returned in list context.
 157
 158                                  =cut
 159
 160                                  # Store or retrieve headers from a given MIME object
 161                                  #
                                      # spent 3.86s (2.73+1.13) within Mail::SpamAssassin::Message::Node::header which was called 40032 times, avg 96µs/call:
                                      #   29917 times (1.66s+163ms) by Mail::SpamAssassin::Message::Node::get_header at line 863, avg 61µs/call
                                      #   7440 times (876ms+891ms) by Mail::SpamAssassin::Message::new at line 282 of Mail/SpamAssassin/Message.pm, avg 237µs/call
                                      #   427 times (19.7ms+3.41ms) by Mail::SpamAssassin::Message::_parse_normal at line 1034 of Mail/SpamAssassin/Message.pm, avg 54µs/call
                                      #   427 times (17.5ms+4.55ms) by Mail::SpamAssassin::Message::_parse_normal at line 1042 of Mail/SpamAssassin/Message.pm, avg 52µs/call
                                      #   411 times (19.2ms+2.81ms) by Mail::SpamAssassin::Message::Node::decode at line 335, avg 54µs/call
                                      #   395 times (46.3ms+31.8ms) by Mail::SpamAssassin::Message::_parse_multipart at line 965 of Mail/SpamAssassin/Message.pm, avg 198µs/call
                                      #   390 times (45.7ms+30.2ms) by Mail::SpamAssassin::Message::_parse_multipart at line 978 of Mail/SpamAssassin/Message.pm, avg 195µs/call
                                      #   390 times (28.0ms+2.31ms) by Mail::SpamAssassin::Message::_parse_multipart at line 922 of Mail/SpamAssassin/Message.pm, avg 78µs/call
                                      #   235 times (19.6ms+1.38ms) by Mail::SpamAssassin::Message::new at line 363 of Mail/SpamAssassin/Message.pm, avg 89µs/call
 162                                  sub header {
 163  40032   70.2ms                    my $self = shift;
 164  40032   83.6ms                    my $rawkey = shift;
 165
 166  40032   66.7ms                    return unless defined $rawkey;
 167
 168                                    # we're going to do things case insensitively
 169  40032   98.4ms                    my $key = lc($rawkey);
 170
 171                                    # Trim whitespace off of the header keys
 172  40032    587ms  40032    137ms    $key =~ s/^\s+//;
                                      # spent 137ms making 40032 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
 173  40032    572ms  40032   93.5ms    $key =~ s/\s+$//;
                                      # spent 93.5ms making 40032 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 2µs/call
 174
 175  40032   70.7ms                    if (@_) {
 176   8225   20.8ms                      my $raw_value = shift;
 177   8225   14.3ms                      return unless defined $raw_value;
 178
 179  16450   69.4ms                      push @{ $self->{'header_order'} }, $rawkey;
 180   8225   29.9ms                      if ( !exists $self->{'headers'}->{$key} ) {
 181   7213   60.7ms                        $self->{'headers'}->{$key} = [];
 182   7213   23.4ms                        $self->{'raw_headers'}->{$key} = [];
 183                                      }
 184
 185   8225   19.5ms                      my $dec_value = $raw_value;
 186   8225    122ms   8225   53.0ms      $dec_value =~ s/\n[ \t]+/ /gs;
                                      # spent 53.0ms making 8225 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 6µs/call
 187   8225    177ms   8225   79.8ms      $dec_value =~ s/\s+$//s;
                                      # spent 79.8ms making 8225 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 10µs/call
 188   8225    119ms   8225   59.7ms      $dec_value =~ s/^\s+//s;
                                      # spent 59.7ms making 8225 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 7µs/call
 189  16450    102ms   8225    707ms      push @{ $self->{'headers'}->{$key} }, $self->_decode_header($dec_value,$key);
                                      # spent 707ms making 8225 calls to Mail::SpamAssassin::Message::Node::_decode_header, avg 86µs/call
 190
 191  16450   69.5ms                      push @{ $self->{'raw_headers'}->{$key} }, $raw_value;
 192
 193   8225    112ms                      return $self->{'headers'}->{$key}->[-1];
 194                                    }
 195
 196  31807   56.9ms                    if (wantarray) {
 197  30969    162ms                      return unless exists $self->{'headers'}->{$key};
 198  57148    592ms                      return @{ $self->{'headers'}->{$key} };
 199                                    }
 200                                    else {
 201    838   7.15ms                      return '' unless exists $self->{'headers'}->{$key};
 202    385   5.14ms                      return $self->{'headers'}->{$key}->[-1];
 203                                    }
 204                                  }
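The storing branch of header() lower-cases and trims the key, unfolds continuation lines, and keeps every occurrence in order. That ordering is what lets list context return all values and scalar context return the last one. A minimal sketch of that bookkeeping with plain hashes follows; `store_header` and `fetch_header` are hypothetical helpers, and the RFC 2047 step that the real code delegates to `_decode_header()` is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helpers mirroring header()'s bookkeeping (no _decode_header step).
my (%headers, %raw_headers);

sub store_header {
  my ($name, $raw_value) = @_;
  my $key = lc $name;
  $key =~ s/^\s+//;          # trim the key as header() does
  $key =~ s/\s+$//;

  my $dec = $raw_value;
  $dec =~ s/\n[ \t]+/ /gs;   # unfold continuation lines
  $dec =~ s/\s+$//s;         # trim trailing whitespace
  $dec =~ s/^\s+//s;         # trim leading whitespace

  push @{ $headers{$key} },     $dec;
  push @{ $raw_headers{$key} }, $raw_value;   # raw form kept untouched
  return $headers{$key}[-1];
}

sub fetch_header {
  my $key = lc shift;
  if (wantarray) {
    return unless exists $headers{$key};
    return @{ $headers{$key} };               # list context: every occurrence
  }
  return '' unless exists $headers{$key};
  return $headers{$key}[-1];                  # scalar context: the last one
}

store_header('Subject',  " first\n");
store_header('SUBJECT ', " second\n\tpart\n");

my @all  = fetch_header('subject');
my $last = fetch_header('Subject');
print scalar(@all), " values; last is '$last'\n";
```

Note the asymmetry for a missing header: list context gets an empty list, scalar context gets an empty string rather than `undef`.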
 205
 206                                  =item raw_header()
 207
 208                                  Retrieves the raw version of headers from a specific MIME part. The only
 209                                  parameter is the header name. Header names are case-insensitive.
 210
 211                                  For retrieval, if raw_header() is called in list context, a list
 212                                  will be returned with each header entry in a different element. In
 213                                  scalar context, the last occurrence of that header is returned.
 214
 215                                  For example, if 'Subject' is specified as the header and there are 2 Subject
 216                                  headers in a message, the last/bottom one in the message is returned in
 217                                  scalar context or both are returned in list context.
 218
 219                                  =cut
 220
 221                                  # Retrieve raw headers from a given MIME object
 222                                  #
                                      # spent 47.5ms (42.5+5.08) within Mail::SpamAssassin::Message::Node::raw_header which was called 940 times, avg 51µs/call:
                                      #   940 times (42.5ms+5.08ms) by Mail::SpamAssassin::PerMsgStatus::_get at line 1982 of Mail/SpamAssassin/PerMsgStatus.pm, avg 51µs/call
 223                                  sub raw_header {
 224    940   1.86ms                    my $self = shift;
 225    940   2.49ms                    my $key = lc(shift);
 226
 227                                    # Trim whitespace off of the header keys
 228    940   9.67ms    940   2.81ms    $key =~ s/^\s+//;
                                      # spent 2.81ms making 940 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
 229    940   19.8ms    940   2.27ms    $key =~ s/\s+$//;
                                      # spent 2.27ms making 940 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 2µs/call
 230
 231    940   1.79ms                    if (wantarray) {
 232    940   9.63ms                      return unless exists $self->{'raw_headers'}->{$key};
 233    470   3.65ms                      return @{ $self->{'raw_headers'}->{$key} };
 234                                    }
 235                                    else {
 236                                      return '' unless exists $self->{'raw_headers'}->{$key};
 237                                      return $self->{'raw_headers'}->{$key}->[-1];
 238                                    }
 239                                  }
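raw_header() hands back the header text exactly as stored, while header() returns the unfolded, decoded value. The sketch below illustrates that difference for an RFC 2047 encoded word. It uses core Encode's `MIME-Header` decoder purely as a stand-in; SpamAssassin's own `_decode_header()` (not shown in this section) may treat whitespace and unknown charsets differently.

```perl
use strict;
use warnings;
use Encode ();

# A header value as raw_header() would return it: still RFC 2047 encoded.
my $raw = 'Re: =?UTF-8?Q?caf=C3=A9_menu?=';

# Stand-in for the decoded form header() returns; in Q encoding "_" is a space.
my $decoded = Encode::decode('MIME-Header', $raw);

binmode STDOUT, ':encoding(UTF-8)';
print "raw:     $raw\n";
print "decoded: $decoded\n";
```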
 240
 241                                  =item add_body_part()
 242
 243                                  Adds a Node child object to the current node object.
 244
 245                                  =cut
 246
 247                                  # Add a MIME child part to ourselves
                                      # spent 14.8ms (11.5+3.29) within Mail::SpamAssassin::Message::Node::add_body_part which was called 390 times, avg 38µs/call:
                                      #   390 times (11.5ms+3.29ms) by Mail::SpamAssassin::Message::_parse_multipart at line 930 of Mail/SpamAssassin/Message.pm, avg 38µs/call
 248                                  sub add_body_part {
 249    390    851µs                    my($self, $part) = @_;
 250
 251    390   4.12ms    390   3.29ms    dbg("message: added part, type: ".$part->{'type'});
                                      # spent 3.29ms making 390 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
 252    780   5.73ms                    push @{ $self->{'body_parts'} }, $part;
 253                                  }
 254
 255                                  =item is_leaf()
 256
 257                                  Returns true if the tree node in question is a leaf of the tree (i.e.
 258                                  has no children of its own). Note: This function may return odd results
 259                                  unless the message has been MIME-parsed via _do_parse()!
 260
 261                                  =cut
 262
                                      # spent 19.3ms within Mail::SpamAssassin::Message::Node::is_leaf which was called 1875 times, avg 10µs/call:
                                      #   1875 times (19.3ms+0s) by Mail::SpamAssassin::Message::Node::find_parts at line 126, avg 10µs/call
 263                                  sub is_leaf {
 264   1875   3.48ms                    my($self) = @_;
 265   1875   27.6ms                    return !exists $self->{'body_parts'};
 266                                  }
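A node counts as a leaf only while it has no `body_parts` key; the first add_body_part() call autovivifies that array and turns the node into a branch. A minimal sketch using hypothetical plain-hash nodes, with plain subs standing in for the two methods:

```perl
use strict;
use warnings;

# Plain subs standing in for the methods; nodes are plain hashes.
sub add_body_part { my ($self, $part) = @_; push @{ $self->{body_parts} }, $part; }
sub is_leaf       { my ($self) = @_; return !exists $self->{body_parts}; }

my $multipart = { type => 'multipart/mixed' };
my $text      = { type => 'text/plain' };

print 'before: ', (is_leaf($multipart) ? 'leaf' : 'branch'), "\n";
add_body_part($multipart, $text);   # creates $multipart->{body_parts}
print 'after:  ', (is_leaf($multipart) ? 'leaf' : 'branch'), "\n";
```

Because is_leaf() tests `exists` rather than the array's length, a node whose `body_parts` key is present but empty is still treated as a branch.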
 267
 268                                  =item raw()
 269
 270                                  Return a reference to the raw array. Treat this as READ ONLY.
 271
 272                                  =cut
 273
 274                                  sub raw {
 275                                    my $self = shift;
 276
 277                                    # OK, if we're called, we are expected to return an array,
 278                                    # so if it's a file reference, read the message into an array...
 279                                    #
 280                                    # NOTE: "ref undef" works, so don't bother checking for a defined var
 281                                    # first.
 282                                    if (ref $self->{'raw'} eq 'GLOB') {
 283                                      my $fd = $self->{'raw'};
 284                                      seek($fd, 0, 0) or die "message: cannot rewind file: $!";
 285
 286                                      # dbg("message: (raw) reading mime part from a temporary file");
 287                                      my($nread,$raw_str); $raw_str = '';
 288                                      while ( $nread=sysread($fd, $raw_str, 16384, length $raw_str) ) { }
 289                                      defined $nread or die "error reading: $!";
 290                                      my @array = split(/^/m, $raw_str, -1);
 291
 292                                      dbg("message: empty message read") if $raw_str eq '';
 293                                      return \@array;
 294                                    }
 295
 296                                    return $self->{'raw'};
 297                                  }
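When a part has been spooled to a temporary file, raw() rewinds the handle, appends 16 KiB `sysread` chunks to the end of a buffer (the fourth argument is the write offset), and splits the result into lines that keep their newlines. A standalone sketch of that read loop follows, with an anonymous temporary file standing in for the spooled MIME part; `slurp_lines` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper following raw()'s temp-file branch.
sub slurp_lines {
  my ($fd) = @_;
  seek($fd, 0, 0) or die "cannot rewind file: $!";
  my ($nread, $buf) = (undef, '');
  # sysread's 4th argument is the offset: each chunk lands at the end of $buf
  while ($nread = sysread($fd, $buf, 16384, length $buf)) { }
  defined $nread or die "error reading: $!";   # 0 means EOF, undef means error
  # split at the start of each line, keeping every newline
  return [ split(/^/m, $buf, -1) ];
}

my $data = "Subject: hi\n\nbody line 1\nbody line 2\n";
open(my $fh, '+>', undef) or die "cannot create temp file: $!";   # anonymous temp file
defined syswrite($fh, $data) or die "write failed: $!";
my $lines = slurp_lines($fh);
printf "%d lines, first: %s", scalar @$lines, $lines->[0];
```

Since `^` under `/m` does not match after a final trailing newline, the split produces no empty element at the end, so joining the array reproduces the original text exactly.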
 298
 299                                  =item decode()
 300
 301                                  If necessary, decode the part text as base64 or quoted-printable.
 302                                  The decoded text will be returned as a scalar string. An optional length
 303                                  parameter can be passed in which limits how much decoded data is returned.
 304                                  If the scalar isn't needed, call with "0" as a parameter.
 305
 306                                  =cut
 307
                                      # spent 765ms (82.5+683) within Mail::SpamAssassin::Message::Node::decode which was called 411 times, avg 1.86ms/call:
                                      #   411 times (82.5ms+683ms) by Mail::SpamAssassin::Message::Node::rendered at line 604, avg 1.86ms/call
 308                                  sub decode {
 309    411   1.16ms                    my($self, $bytes) = @_;
 310
 311    411   1.77ms                    if ( !exists $self->{'decoded'} ) {
 312                                      # Someone is looking for a decoded part where there is no raw data
 313                                      # (multipart or subparsed message, etc.) Just return undef.
 314    411    930µs                      return if !exists $self->{'raw'};
 315
 316    411    743µs                      my $raw;
 317
 318                                      # if the part is held in a temp file, read it into the scalar
 319    411   3.17ms                      if (ref $self->{'raw'} eq 'GLOB') {
 320                                        my $fd = $self->{'raw'};
 321                                        seek($fd, 0, 0) or die "message: cannot rewind file: $!";
 322
 323                                        # dbg("message: (decode) reading mime part from a temporary file");
 324                                        my($nread,$raw_str); $raw = '';
 325                                        while ( $nread=sysread($fd, $raw, 16384, length $raw) ) { }
 326                                        defined $nread or die "error reading: $!";
 327
 328                                        dbg("message: empty message read from a temp file") if $raw eq '';
 329                                      }
 330                                      else {
 331                                        # create a new scalar from the raw array in memory
 332    822   26.8ms                        $raw = join('', @{$self->{'raw'}});
 333                                      }
 334
 335    411   3.88ms    411   22.0ms      my $encoding = lc $self->header('content-transfer-encoding') || '';
                                      # spent 22.0ms making 411 calls to Mail::SpamAssassin::Message::Node::header, avg 54µs/call
 336
 337    411   2.49ms                      if ( $encoding eq 'quoted-printable' ) {
 338    203   1.36ms    203   1.34ms        dbg("message: decoding quoted-printable");
                                      # spent 1.34ms making 203 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
 339    203   2.24ms    203    574ms        $self->{'decoded'} = Mail::SpamAssassin::Util::qp_decode($raw);
                                      # spent 574ms making 203 calls to Mail::SpamAssassin::Util::qp_decode, avg 2.83ms/call
 340    203   12.2ms    203   2.99ms        $self->{'decoded'} =~ s/\015\012/\012/gs;
                                      # spent 2.99ms making 203 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 15µs/call
 341                                      }
 342                                      elsif ( $encoding eq 'base64' ) {
 343     55    399µs     55    402µs        dbg("message: decoding base64");
                                      # spent 402µs making 55 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
 344
 345                                        # if it's not defined or is 0, do the whole thing, otherwise only decode
 346                                        # a portion
 347     55    204µs                        if ($bytes) {
 348                                          return Mail::SpamAssassin::Util::base64_decode($raw, $bytes);
 349                                        }
 350                                        else {
 351                                          # Generate the decoded output
 352     55    643µs     55   74.2ms          $self->{'decoded'} = Mail::SpamAssassin::Util::base64_decode($raw);
                                      # spent 74.2ms making 55 calls to Mail::SpamAssassin::Util::base64_decode, avg 1.35ms/call
 353                                        }
 354
 355     55    979µs     55    360µs        if ( $self->{'type'} =~ m@^(?:text|message)\b/@i ) {
                                      # spent 360µs making 55 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 7µs/call
 356     55   6.84ms     55   6.28ms          $self->{'decoded'} =~ s/\015\012/\012/gs;
                                      # spent 6.28ms making 55 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 114µs/call
 357                                        }
 358                                      }
 359                                      else {
 360                                        # Encoding is one of 7bit, 8bit, binary or x-something
 361    153    867µs                        if ( $encoding ) {
 362    107   1.06ms    107    804µs          dbg("message: decoding other encoding type ($encoding), ignoring");
                                      # spent 804µs making 107 calls to Mail::SpamAssassin::Logger::dbg, avg 8µs/call
 363                                        }
 364                                        else {
 365     46    319µs     46    302µs          dbg("message: no encoding detected");
                                      # spent 302µs making 46 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
 366                                        }
 367    153    740µs                        $self->{'decoded'} = $raw;
 368                                      }
 369                                    }
 370
 371    411    927µs                    if ( !defined $bytes || $bytes ) {
 372    411    878µs                      if ( !defined $bytes ) {
 373                                        # force a copy
 374    411   9.02ms                        return '' . $self->{'decoded'};
 375                                      }
 376                                      else {
 377                                        return substr($self->{'decoded'}, 0, $bytes);
 378                                      }
 379                                    }
 380                                  }
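decode() dispatches on the lower-cased Content-Transfer-Encoding. Quoted-printable and base64 are decoded and CRLF line endings are normalized to LF (for base64, only when the part is `text/*` or `message/*`); anything else is passed through unchanged. The sketch below follows that dispatch but substitutes the core `MIME::QuotedPrint` and `MIME::Base64` modules for SpamAssassin's own `Util::qp_decode` and `Util::base64_decode`, and omits the caching and the `$bytes` length limit; `decode_body` is a hypothetical helper.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper following decode()'s dispatch with core MIME::* modules.
sub decode_body {
  my ($raw, $encoding, $type) = @_;
  $encoding = lc($encoding // '');
  my $decoded;
  if ($encoding eq 'quoted-printable') {
    $decoded = decode_qp($raw);
    $decoded =~ s/\015\012/\012/gs;                 # CRLF -> LF
  }
  elsif ($encoding eq 'base64') {
    $decoded = decode_base64($raw);
    # only text-like parts get their line endings normalized
    $decoded =~ s/\015\012/\012/gs if $type =~ m@^(?:text|message)\b/@i;
  }
  else {
    $decoded = $raw;                                 # 7bit, 8bit, binary, x-*
  }
  return $decoded;
}

my $qp  = decode_body("caf=C3=A9 au lait\r\n", 'Quoted-Printable', 'text/plain');
my $txt = decode_body("aGVsbG8NCndvcmxk", 'base64', 'text/plain');
my $bin = decode_body("aGVsbG8NCndvcmxk", 'base64', 'application/octet-stream');
```

The base64 string encodes `hello\r\nworld`. The text part comes back with a bare LF, and the binary part keeps its CRLF byte for byte.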
 381
 382                                  # Look at a text scalar and determine whether it should be rendered
 383                                  # as text/html.
 384                                  #
 385                                  # This is not a public function.
 386                                  #
                                      # spent 4.31ms (3.20+1.11) within Mail::SpamAssassin::Message::Node::_html_render which was called 218 times, avg 20µs/call:
                                      #   218 times (3.20ms+1.11ms) by Mail::SpamAssassin::Message::Node::rendered at line 609, avg 20µs/call
 387                                  sub _html_render {
 388    218   2.84ms    218   1.11ms    if ($_[0] =~ m/^(.{0,18}?<(?:body|head|html|img|pre|table|title)(?:\s.{0,18}?)?>)/is)
                                      # spent 1.11ms making 218 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 5µs/call
 389                                    {
 390                                      my $pad = $1;
 391                                      my $count = 0;
 392                                      $count += ($pad =~ tr/\n//d) * 2;
 393                                      $count += ($pad =~ tr/\n//cd);
 394                                      return ($count < 24);
 395                                    }
 396    218   1.89ms                    return 0;
 397                                  }
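_html_render() treats a text part as HTML only when one of a few telltale tags appears near the very start. The captured prefix, tag included, is scored with each newline costing 2 and every other character 1, and the total must stay below 24. The same check follows, copied into a standalone sub (`looks_like_html` is an illustrative name) and run on a few inputs.

```perl
use strict;
use warnings;

# The _html_render() test copied into a standalone sub.
sub looks_like_html {
  if ($_[0] =~ m/^(.{0,18}?<(?:body|head|html|img|pre|table|title)(?:\s.{0,18}?)?>)/is) {
    my $pad = $1;                          # everything up to and including the tag
    my $count = 0;
    $count += ($pad =~ tr/\n//d) * 2;      # newlines count double (and are removed)
    $count += ($pad =~ tr/\n//cd);         # every remaining character counts once
    return ($count < 24);
  }
  return 0;
}

for my $sample ("<html><body>hi</body></html>",
                ("\n" x 8) . "<html>",
                ("\n" x 10) . "<html>",
                "Dear customer, see the <table> below") {
  (my $shown = $sample) =~ s/\n/\\n/g;
  printf "%-40s %s\n", $shown, looks_like_html($sample) ? 'html' : 'not html';
}
```

For example, eight leading newlines plus `<html>` score 8 × 2 + 6 = 22 and pass, while ten score 26 and fail. The last sample never matches because the tag starts more than 18 characters in.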
398
399# Decode character set of a given text to perl characters (Unicode),
400# then encode into UTF-8 octets if requested.
401#
402
# spent 1.98ms within Mail::SpamAssassin::Message::Node::_normalize which was called 118 times, avg 17µs/call: # 118 times (1.98ms+0s) by Mail::SpamAssassin::Message::Node::__decode_header at line 783, avg 17µs/call
sub _normalize {
403118242µs my $self = $_[0];
404# my $data = $_[1]; # avoid copying large strings
405118293µs my $charset_declared = $_[2];
406118218µs my $return_decoded = $_[3]; # true: Unicode characters, false: UTF-8 octets
407
4081181.81ms return $_[1] unless $self->{normalize} && $enc_utf8;
409
410 warn "message: _normalize() was given characters, expected bytes: $_[1]\n"
411 if utf8::is_utf8($_[1]);
412
413 # workaround for Encode::decode taint laundering bug [rt.cpan.org #84879]
414 my $data_taint = substr($_[1], 0, 0); # empty string, tainted like $data
415
416 if (!defined $charset_declared || $charset_declared eq '') {
417 $charset_declared = 'us-ascii';
418 }
419
420 # number of characters with code above 127
421 my $cnt_8bits = $_[1] =~ tr/\x00-\x7F//c;
422
423 if (!$cnt_8bits &&
424 $charset_declared =~
425 /^(?: (?:US-)?ASCII | ANSI[_ ]? X3\.4- (?:1986|1968) |
426 ISO646-US )\z/xsi)
427 { # declared as US-ASCII (a.k.a. ANSI X3.4-1986) and it really is
428 dbg("message: kept, charset is US-ASCII as declared");
429 return $_[1]; # is all-ASCII, no need for decoding
430 }
431
432 if (!$cnt_8bits &&
433 $charset_declared =~
434 /^(?: ISO[ -]?8859 (?: - \d{1,2} )? | Windows-\d{4} |
435 UTF-?8 | (KOI8|EUC)-[A-Z]{1,2} |
436 Big5 | GBK | GB[ -]?18030 (?:-20\d\d)? )\z/xsi)
437 { # declared as extended ASCII, but it is actually a plain 7-bit US-ASCII
438 dbg("message: kept, charset is US-ASCII, declared %s", $charset_declared);
439 return $_[1]; # is all-ASCII, no need for decoding
440 }
441
442 # Try first to strictly decode based on a declared character set.
443
444 my $rv;
445 if ($charset_declared =~ /^UTF-?8\z/i) {
446 # attempt decoding as strict UTF-8 (flags: FB_CROAK | LEAVE_SRC)
447 if (eval { $rv = $enc_utf8->decode($_[1], 1|8); defined $rv }) {
448 dbg("message: decoded as declared charset UTF-8");
449 return $_[1] if !$return_decoded;
450 $rv .= $data_taint; # carry taintedness over, avoid Encode bug
451 return $rv; # decoded
452 } else {
453 dbg("message: failed decoding as declared charset UTF-8");
454 };
455
456 } elsif ($cnt_8bits &&
457 eval { $rv = $enc_utf8->decode($_[1], 1|8); defined $rv }) {
458 dbg("message: decoded as charset UTF-8, declared %s", $charset_declared);
459 return $_[1] if !$return_decoded;
460 $rv .= $data_taint; # carry taintedness over, avoid Encode bug
461 return $rv; # decoded
462
463 } elsif ($charset_declared =~ /^(?:US-)?ASCII\z/i) {
464 # declared as US-ASCII but contains 8-bit characters, makes no sense
465 # to attempt decoding first as strict US-ASCII as we know it would fail
466
467 } else {
468 # try decoding as a declared character set
469
470 # -> http://en.wikipedia.org/wiki/Windows-1252
471 # Windows-1252 character encoding is a superset of ISO 8859-1, but differs
472 # from the IANA's ISO-8859-1 by using displayable characters rather than
473 # control characters in the 80 to 9F (hex) range. [...]
474 # It is very common to mislabel Windows-1252 text with the charset label
475 # ISO-8859-1. A common result was that all the quotes and apostrophes
476 # (produced by "smart quotes" in word-processing software) were replaced
477 # with question marks or boxes on non-Windows operating systems, making
478 # text difficult to read. Most modern web browsers and e-mail clients
479 # treat the MIME charset ISO-8859-1 as Windows-1252 to accommodate
480 # such mislabeling. This is now standard behavior in the draft HTML 5
481 # specification, which requires that documents advertised as ISO-8859-1
482 # actually be parsed with the Windows-1252 encoding.
483 #
484 my($chset, $decoder);
485 if ($charset_declared =~ /^(?: ISO-?8859-1 | Windows-1252 | CP1252 )\z/xi) {
486 $chset = 'Windows-1252'; $decoder = $enc_w1252;
487 } else {
488 $chset = $charset_declared; $decoder = Encode::find_encoding($chset);
489 if (!$decoder && $chset =~ /^GB[ -]?18030(?:-20\d\d)?\z/i) {
490 $decoder = Encode::find_encoding('GBK'); # a subset of GB18030
491 dbg("message: no decoder for a declared charset %s, using GBK",
492 $chset) if $decoder;
493 }
494 }
495 if (!$decoder) {
496 dbg("message: failed decoding, no decoder for a declared charset %s",
497 $chset);
498 } else {
499 eval { $rv = $decoder->decode($_[1], 1|8) }; # FB_CROAK | LEAVE_SRC
500 if (lc $chset eq lc $charset_declared) {
501 dbg("message: %s as declared charset %s",
502 defined $rv ? 'decoded' : 'failed decoding', $charset_declared);
503 } else {
504 dbg("message: %s as charset %s, declared %s",
505 defined $rv ? 'decoded' : 'failed decoding',
506 $chset, $charset_declared);
507 }
508 }
509 }
510
511 # If the above failed, check if it is US-ASCII, possibly extended by few
512 # NBSP or SHY characters from ISO-8859-* or Windows-1252, or containing
513 # some popular punctuation or special characters from Windows-1252 in
514 # the \x80-\x9F range (which is unassigned in ISO-8859-*).
515 # Note that Windows-1252 is a proper superset of ISO-8859-1.
516 #
517 if (!defined $rv && !$cnt_8bits) {
518 dbg("message: kept, guessed charset is US-ASCII, declared %s",
519 $charset_declared);
520 return $_[1]; # is all-ASCII, no need for decoding
521
522 } elsif (!defined $rv && $enc_w1252 &&
523 # ASCII NBSP (c) SHY ' " ... '".- TM
524 $_[1] !~ tr/\x00-\x7F\xA0\xA9\xAD\x82\x84\x85\x91-\x97\x99//c)
525 { # ASCII + NBSP + SHY + some punctuation characters
526 # NBSP (A0) and SHY (AD) are at the same position in ISO-8859-* too
527 # consider also: AE (r), 80 Euro
528 eval { $rv = $enc_w1252->decode($_[1], 1|8) }; # FB_CROAK | LEAVE_SRC
529 # the above can't fail, but keep code general just in case
530 dbg("message: %s as guessed charset %s, declared %s",
531 defined $rv ? 'decoded' : 'failed decoding',
532 'Windows-1252', $charset_declared);
533 }
534
535 # If we were unsuccessful so far, try some guesswork
536 # based on Encode::Detect::Detector .
537
538 if (defined $rv) {
539 # done, no need for guesswork
540 } elsif (!$have_encode_detector) {
541 dbg("message: Encode::Detect::Detector not available, declared %s failed",
542 $charset_declared);
543 } else {
544 my $charset_detected = Encode::Detect::Detector::detect($_[1]);
545 if ($charset_detected && lc $charset_detected ne lc $charset_declared) {
546 my $decoder = Encode::find_encoding($charset_detected);
547 if (!$decoder && $charset_detected =~ /^GB[ -]?18030(?:-20\d\d)?\z/i) {
548 $decoder = Encode::find_encoding('GBK'); # a subset of GB18030
549 dbg("message: no decoder for a detected charset %s, using GBK",
550 $charset_detected) if $decoder;
551 }
552 if (!$decoder) {
553 dbg("message: failed decoding, no decoder for a detected charset %s",
554 $charset_detected);
555 } else {
556 eval { $rv = $decoder->decode($_[1], 1|8) }; # FB_CROAK | LEAVE_SRC
557 dbg("message: %s as detected charset %s, declared %s",
558 defined $rv ? 'decoded' : 'failed decoding',
559 $charset_detected, $charset_declared);
560 }
561 }
562 }
563
564 if (!defined $rv) { # all decoding attempts failed so far, probably garbage
565 # go for Windows-1252 which can't fail
566 eval { $rv = $enc_w1252->decode($_[1]) };
567 dbg("message: %s as last-resort charset %s, declared %s",
568 defined $rv ? 'decoded' : 'failed decoding',
569 'Windows-1252', $charset_declared);
570 }
571
572 if (!defined $rv) { # just in case - all decoding attempts failed so far
573 return $_[1]; # garbage-in / garbage-out, return unchanged octets
574 }
575 # decoding octets to characters was successful
576 if (!$return_decoded) {
577 # utf8::encode() is much faster than $enc_utf8->encode on utf8-flagged arg
578 utf8::encode($rv); # encode Unicode characters to UTF-8 octets
579 }
580 $rv .= $data_taint; # carry taintedness over, avoid Encode bug
581 return $rv;
582 }
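The tail of _normalize() above falls back step by step once the declared charset has failed. It tries a Windows-1252 guess, then Encode::Detect::Detector, and finally Windows-1252 as a last resort. The following is a minimal standalone sketch of that fallback idea using only the core Encode module. The name to_utf8_sketch is hypothetical. The sketch also leaves out the Encode::Detect::Detector step and the byte-range pre-check, so treat it as an illustration rather than a replacement.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Node.pm): decode $octets strictly with
# the declared charset, fall back to Windows-1252, return UTF-8 octets.
sub to_utf8_sketch {
    my ($octets, $declared) = @_;
    my $chars;
    my $enc = defined $declared ? Encode::find_encoding($declared) : undef;
    if ($enc) {
        # FB_CROAK | LEAVE_SRC: die on malformed input, leave $octets intact
        $chars = eval {
            $enc->decode($octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
    }
    if (!defined $chars) {
        # last resort: no CHECK argument, so unmappable bytes are
        # substituted instead of raising an error
        $chars = Encode::find_encoding('Windows-1252')->decode($octets);
    }
    utf8::encode($chars);    # characters -> UTF-8 octets, in place
    return $chars;
}

# "caf\xE9" is not valid UTF-8, so this takes the Windows-1252 fallback
my $out = to_utf8_sketch("caf\xE9", 'UTF-8');
```

As in the original, the result is returned as UTF-8 octets rather than characters.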
583
584 =item rendered()
585
586 rendered() takes the given text/* type MIME part and attempts to render it
587 into a text scalar. Only text/plain and text/html parts are rendered: text/html
588 is always rendered as HTML, and a heuristic decides whether a text/plain part
589 should be treated as text/html. Two scalars are returned: the rendered type
590 (either text/html or the original type) and the rendered text. Any other type yields (undef, undef); a successful result is cached in the part.
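The dispatch described above can be sketched as a small standalone function. The name render_route_sketch and the $looks_like_html callback are hypothetical. The callback stands in for Node.pm's private _html_render() heuristic, whose rules are not shown in this report. The sketch only decides which path applies and does not render anything.

```perl
use strict;
use warnings;

# Returns the rendered type that rendered() would report for a part,
# or undef when the part is not rendered at all.
sub render_route_sketch {
    my ($type, $text, $looks_like_html) = @_;
    # only text/plain and text/html are rendered
    return unless $type =~ /^text\/(?:plain|html)$/i;
    # text/html always goes through the HTML renderer; text/plain does so
    # only if the heuristic flags its first 23 characters as HTML
    if ($text ne '' && ($type =~ m{^text/html$}i
                        || ($type =~ m{^text/plain$}i
                            && $looks_like_html->(substr($text, 0, 23))))) {
        return 'text/html';
    }
    return $type;    # plain text keeps its original type
}

my $guess = sub { $_[0] =~ /^\s*<(?:html|!doctype)/i };   # toy heuristic
print render_route_sketch('text/plain', '<html><body>hi', $guess), "\n";   # prints text/html
```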
591
592 =cut
593
594
# spent 21.7s (90.2ms+21.6) within Mail::SpamAssassin::Message::Node::rendered which was called 1281 times, avg 17.0ms/call:
# 427 times (80.5ms+21.6s) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1128 of Mail/SpamAssassin/Message.pm, avg 50.9ms/call
# 427 times (6.04ms+86µs) by Mail::SpamAssassin::Message::Node::visible_rendered at line 697, avg 14µs/call
# 427 times (3.60ms+97µs) by Mail::SpamAssassin::Message::Node::invisible_rendered at line 709, avg 9µs/call
sub rendered {
595 1281 2.42ms my ($self) = @_;
596
597 1281 5.75ms if (!exists $self->{rendered}) {
598 # We only know how to render text/plain and text/html ...
599 # Note: for bug 4843, make sure to skip text/calendar parts
600 # we also want to skip things like text/x-vcard
601 # text/x-aol is ignored here, but looks like text/html ...
602 459 7.58ms 459 3.77ms return(undef,undef) unless ( $self->{'type'} =~ /^text\/(?:plain|html)$/i );
# spent 3.77ms making 459 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 8µs/call
603
604 411 3.88ms 411 765ms my $text = $self->decode; # QP and Base64 decoding, bytes
# spent 765ms making 411 calls to Mail::SpamAssassin::Message::Node::decode, avg 1.86ms/call
605 411 1.40ms my $text_len = length($text); # num of bytes in original charset encoding
606
607 # render text/html always, or any other text|text/plain part as text/html
608 # based on a heuristic which simulates a certain common mail client
609 411 11.0ms 844 7.65ms if ($text ne '' && ($self->{'type'} =~ m{^text/html$}i ||
# spent 4.31ms making 218 calls to Mail::SpamAssassin::Message::Node::_html_render, avg 20µs/call
# spent 3.33ms making 626 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 5µs/call
610 ($self->{'type'} =~ m{^text/plain$}i &&
611 _html_render(substr($text, 0, 23)))))
612 {
613 190 628µs $self->{rendered_type} = 'text/html';
614
615 # will input text to HTML::Parser be provided as Unicode characters?
616 190 443µs my $character_semantics = 0; # $text is in bytes
617 190 3.62ms 189 1.05ms if ($self->{normalize} && $enc_utf8) { # charset decoding requested
# spent 1.05ms making 189 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 6µs/call
618 # Provide input to HTML::Parser as Unicode characters
619 # which avoids a HTML::Parser bug in utf8_mode
620 # https://rt.cpan.org/Public/Bug/Display.html?id=99755
621 # Avoid unnecessary step of encoding-then-decoding by telling
622 # subroutine _normalize() to return Unicode text. See Bug 7133
623 #
624 $character_semantics = 1; # $text will be in characters
625 $text = $self->_normalize($text, $self->{charset}, 1); # bytes to chars
626 } elsif (!defined $self->{charset} ||
627 $self->{charset} =~ /^(?:US-ASCII|UTF-8)\z/i) {
628 # With some luck input can be interpreted as UTF-8, do not warn.
629 # It is still possible to hit the HTML::Parser utf8_mode bug, however.
630 } else {
631 dbg("message: 'normalize_charset' is off, encoding will likely ".
632 46 391µs 46 338µs "be misinterpreted; declared charset: %s", $self->{charset});
# spent 338µs making 46 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call
633 }
634 # the 0 requires decoded HTML results to be in bytes (not characters)
635 190 2.24ms 190 79.2ms my $html = Mail::SpamAssassin::HTML->new($character_semantics,0); # object
# spent 79.2ms making 190 calls to Mail::SpamAssassin::HTML::new, avg 417µs/call
636
637 190 1.69ms 190 20.4s $html->parse($text); # parse+render text
# spent 20.4s making 190 calls to Mail::SpamAssassin::HTML::parse, avg 107ms/call
638
639 # resulting HTML-decoded text is in bytes, likely encoded as UTF-8
640 190 2.06ms 190 9.80ms $self->{rendered} = $html->get_rendered_text();
# spent 9.80ms making 190 calls to Mail::SpamAssassin::HTML::get_rendered_text, avg 52µs/call
641 190 1.69ms 190 193ms $self->{visible_rendered} = $html->get_rendered_text(invisible => 0);
# spent 193ms making 190 calls to Mail::SpamAssassin::HTML::get_rendered_text, avg 1.01ms/call
642 190 1.59ms 190 171ms $self->{invisible_rendered} = $html->get_rendered_text(invisible => 1);
# spent 171ms making 190 calls to Mail::SpamAssassin::HTML::get_rendered_text, avg 902µs/call
643 190 1.85ms 190 1.58ms $self->{html_results} = $html->get_results();
# spent 1.58ms making 190 calls to Mail::SpamAssassin::HTML::get_results, avg 8µs/call
644
645 # end-of-document result values that require looking at the text
646 190 487µs my $r = $self->{html_results}; # temporary reference for brevity
647
648 # count the number of spaces in the rendered text (likely UTF-8 octets)
649 190 1.48ms my $space = $self->{rendered} =~ tr/ \t\n\r\x0b//;
650 # we may want to add the count of other Unicode whitespace characters
651
652 190 794µs $r->{html_length} = length $self->{rendered}; # bytes (likely UTF-8)
653 190 790µs $r->{non_space_len} = $r->{html_length} - $space;
654 190 10.6ms $r->{ratio} = ($text_len - $r->{html_length}) / $text_len if $text_len;
655 }
656
657 else { # plain text
658 221 553µs if ($self->{normalize} && $enc_utf8) {
659 # request transcoded result as UTF-8 octets!
660 $text = $self->_normalize($text, $self->{charset}, 0);
661 }
662 221 859µs $self->{rendered_type} = $self->{type};
663 221 1.17ms $self->{rendered} = $self->{'visible_rendered'} = $text;
664 221 794µs $self->{'invisible_rendered'} = '';
665 }
666 }
667
668 1233 13.5ms return ($self->{rendered_type}, $self->{rendered});
669 }
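The end-of-document values stored at lines 652-654 above measure how much HTML rendering shrank the part. Here is a standalone sketch of the same arithmetic. The name html_stats_sketch is hypothetical. Like the original, it counts only ASCII whitespace (space, tab, LF, CR, VT) and works on byte lengths.

```perl
use strict;
use warnings;

sub html_stats_sketch {
    my ($text_len, $rendered) = @_;    # original byte length, rendered text
    my %r;
    # tr/// with an empty replacement list counts matches without modifying
    my $space = ($rendered =~ tr/ \t\n\r\x0b//);
    $r{html_length}   = length $rendered;
    $r{non_space_len} = $r{html_length} - $space;
    # fraction of the original bytes removed by rendering (markup, etc.)
    $r{ratio} = ($text_len - $r{html_length}) / $text_len if $text_len;
    return \%r;
}

my $r = html_stats_sketch(100, "Hello world\n");
printf "%d %d %.2f\n", @{$r}{qw(html_length non_space_len ratio)};   # prints "12 10 0.88"
```

The `if $text_len` guard mirrors the original and avoids dividing by zero for an empty part.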
670
671 =item set_rendered($text, $type)
672
673 Set the rendered text and type for the given part. If type is not
674 specified, and text is a defined value, a default of 'text/plain' is used.
675 This can be used, for instance, to render non-text parts using plugins.
676
677 =cut
678
679 sub set_rendered {
680 my ($self, $text, $type) = @_;
681
682 $type = 'text/plain' if (!defined $type && defined $text);
683
684 $self->{'rendered_type'} = $type;
685 $self->{'rendered'} = $self->{'visible_rendered'} = $text;
686 $self->{'invisible_rendered'} = defined $text ? '' : undef;
687 }
688
689 =item visible_rendered()
690
691 Render this part if needed and return two scalars: the rendered type and the visible text.
692
693 =cut
694
695
# spent 15.9ms (9.80+6.13) within Mail::SpamAssassin::Message::Node::visible_rendered which was called 427 times, avg 37µs/call:
# 427 times (9.80ms+6.13ms) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1128 of Mail/SpamAssassin/Message.pm, avg 37µs/call
sub visible_rendered {
696 427 833µs my ($self) = @_;
697 427 3.16ms 427 6.13ms $self->rendered(); # ignore return, we want just this:
# spent 6.13ms making 427 calls to Mail::SpamAssassin::Message::Node::rendered, avg 14µs/call
698 427 5.21ms return ($self->{rendered_type}, $self->{visible_rendered});
699 }
700
701 =item invisible_rendered()
702
703 Render this part if needed and return two scalars: the rendered type and the invisible text.
704
705 =cut
706
707
# spent 14.6ms (10.9+3.70) within Mail::SpamAssassin::Message::Node::invisible_rendered which was called 427 times, avg 34µs/call:
# 427 times (10.9ms+3.70ms) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1128 of Mail/SpamAssassin/Message.pm, avg 34µs/call
sub invisible_rendered {
708 427 822µs my ($self) = @_;
709 427 2.64ms 427 3.70ms $self->rendered(); # ignore return, we want just this:
# spent 3.70ms making 427 calls to Mail::SpamAssassin::Message::Node::rendered, avg 9µs/call
710 427 4.85ms return ($self->{rendered_type}, $self->{invisible_rendered});
711 }
712
713 =item content_summary()
714
715 Returns a list of strings describing the MIME parts of the message. The first string is the type of this part; each following string is the type of one direct child part followed by the types of all of its descendants (depth-first), joined with commas.
716 Note: This function requires that the message be parsed first!
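The traversal behind content_summary() can be sketched standalone on a hypothetical tree of plain hashrefs. Each node has a 'type' and an optional 'body_parts', mirroring the keys the method reads. The name summarize_parts_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Node.pm). Each node is a hashref with a
# 'type' and an optional 'body_parts' arrayref of child nodes.
sub summarize_parts_sketch {
    my ($root) = @_;
    my @rows = ([ $root->{type} ]);    # row 0: the part itself
    my @queue;
    my $kids = $root->{body_parts} || [];
    push @queue, [ $_ + 1, $kids->[$_] ] for 0 .. $#$kids;   # one row per child
    while (my $item = shift @queue) {
        my ($row, $node) = @$item;
        push @{ $rows[$row] }, $node->{type};
        # descendants go to the front of the queue (depth-first) and are
        # appended to the row of their top-level ancestor
        unshift @queue, map { [ $row, $_ ] } @{ $node->{body_parts} }
            if $node->{body_parts};
    }
    return map { join(',', @$_) } @rows;
}

my $msg = {
    type       => 'multipart/mixed',
    body_parts => [
        { type       => 'multipart/alternative',
          body_parts => [ { type => 'text/plain' }, { type => 'text/html' } ] },
        { type => 'application/pdf' },
    ],
};
print "$_\n" for summarize_parts_sketch($msg);
# multipart/mixed
# multipart/alternative,text/plain,text/html
# application/pdf
```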
717
718 =cut
719
720 # return an array with scalars describing mime parts
721 sub content_summary {
722 my($self) = @_;
723
724 my @ret = ( [ $self->{'type'} ] );
725 my @search;
726
727 if (exists $self->{'body_parts'}) {
728 my $count = @{$self->{'body_parts'}};
729 for(my $i=0; $i<$count; $i++) {
730 push(@search, [ $i+1, $self->{'body_parts'}->[$i] ]);
731 }
732 }
733
734 while(my $part = shift @search) {
735 my($index, $part) = @{$part};
736 push(@{$ret[$index]}, $part->{'type'});
737 if (exists $part->{'body_parts'}) {
738 unshift(@search, map { [ $index, $_ ] } @{$part->{'body_parts'}});
739 }
740 }
741
742 return map { join(",", @{$_}) } @ret;
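743 }
744
745 =item delete_header()
746
747 Delete all instances of the specified header (both decoded and raw) from the Node information. The header name is matched case-insensitively; note that it is interpolated into a regular expression without escaping.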
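Below is a minimal sketch of the same bookkeeping on a hypothetical plain hashref node with 'headers', 'raw_headers' and 'header_order' keys. delete_header() interpolates the name into a regex, which is why this profile shows a CORE:regcomp call per key. The sketch instead compares lowercased names with eq. That substitution is a design choice of the sketch, not what Node.pm does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Node.pm): remove every instance of a
# header from a plain hashref node, comparing names case-insensitively.
sub delete_header_sketch {
    my ($node, $name) = @_;
    my $want = lc $name;
    for my $key (grep { lc($_) eq $want } keys %{ $node->{headers} }) {
        delete $node->{headers}{$key};
        delete $node->{raw_headers}{$key};
    }
    $node->{header_order} =
        [ grep { lc($_) ne $want } @{ $node->{header_order} } ];
    return;
}

my $node = {
    headers      => { Received => [ 'a', 'b' ], Subject => [ 'hi' ] },
    raw_headers  => { Received => [ 'a', 'b' ], Subject => [ 'hi' ] },
    header_order => [ 'Received', 'Subject', 'Received' ],
};
delete_header_sketch($node, 'received');
print join(',', @{ $node->{header_order} }), "\n";   # prints "Subject"
```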
748
749 =cut
750
751
# spent 1.20s (902ms+300ms) within Mail::SpamAssassin::Message::Node::delete_header which was called 940 times, avg 1.28ms/call:
# 235 times (256ms+88.0ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 273 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.46ms/call
# 235 times (230ms+70.8ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 274 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.28ms/call
# 235 times (216ms+69.8ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 276 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.22ms/call
# 235 times (201ms+71.4ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 275 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 1.16ms/call
sub delete_header {
752 940 2.54ms my($self, $hdr) = @_;
753
754 1880 512ms 51424 135ms foreach ( grep(/^${hdr}$/i, keys %{$self->{'headers'}}) ) {
# spent 89.9ms making 25712 calls to Mail::SpamAssassin::Message::Node::CORE:regcomp, avg 3µs/call
# spent 45.0ms making 25712 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 2µs/call
755 delete $self->{'headers'}->{$_};
756 delete $self->{'raw_headers'}->{$_};
757 }
758
759 1880 681ms 59520 165ms my @neworder = grep(!/^${hdr}$/i, @{$self->{'header_order'}});
# spent 94.2ms making 29760 calls to Mail::SpamAssassin::Message::Node::CORE:regcomp, avg 3µs/call
# spent 70.9ms making 29760 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 2µs/call
760 940 12.3ms $self->{'header_order'} = \@neworder;
761 }
762
763 # decode a header appropriately. don't bother adding it to the pod documents.
764
# spent 16.7ms (6.69+10.0) within Mail::SpamAssassin::Message::Node::__decode_header which was called 118 times, avg 142µs/call:
# 118 times (6.69ms+10.0ms) by Mail::SpamAssassin::Message::Node::_decode_header at line 819, avg 142µs/call
sub __decode_header {
765 118 1.33ms my ( $self, $encoding, $cte, $data ) = @_;
766
767 118 564µs if ( $cte eq 'B' ) {
768 # base 64 encoded
769 15 152µs 15 1.20ms $data = Mail::SpamAssassin::Util::base64_decode($data);
# spent 1.20ms making 15 calls to Mail::SpamAssassin::Util::base64_decode, avg 80µs/call
770 }
771 elsif ( $cte eq 'Q' ) {
772 # quoted printable
773
774 # the RFC states that in the encoded text, "_" is equal to "=20"
775 103 1.43ms 103 475µs $data =~ s/_/=20/g;
# spent 475µs making 103 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 5µs/call
776
777 103 753µs 103 6.39ms $data = Mail::SpamAssassin::Util::qp_decode($data);
# spent 6.39ms making 103 calls to Mail::SpamAssassin::Util::qp_decode, avg 62µs/call
778 }
779 else {
780 # not possible since the input has already been limited to 'B' and 'Q'
781 die "message: unknown encoding type '$cte' in RFC2047 header";
782 }
783 118 1.99ms 118 1.98ms return $self->_normalize($data, $encoding, 0); # transcode to UTF-8 octets
# spent 1.98ms making 118 calls to Mail::SpamAssassin::Message::Node::_normalize, avg 17µs/call
784 }
785
786 # Decode base64 and quoted-printable in headers according to RFC2047.
787 #
788
# spent 707ms (574+133) within Mail::SpamAssassin::Message::Node::_decode_header which was called 8225 times, avg 86µs/call:
# 8225 times (574ms+133ms) by Mail::SpamAssassin::Message::Node::header at line 189, avg 86µs/call
sub _decode_header {
789 8225 68.5ms my($self, $header_field_body, $header_field_name) = @_;
790
791 8225 19.7ms return '' unless defined $header_field_body && $header_field_body ne '';
792
793 # deal with folding and cream the newlines and such
794 7983 98.3ms 7983 20.8ms $header_field_body =~ s/\n[ \t]+/\n /g;
# spent 20.8ms making 7983 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
795 7983 110ms 7983 17.3ms $header_field_body =~ s/\015?\012//gs;
# spent 17.3ms making 7983 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 2µs/call
796
797 7983 155ms 7983 47.5ms if ($header_field_name =~
# spent 47.5ms making 7983 calls to Mail::SpamAssassin::Message::Node::CORE:match, avg 6µs/call
798 /^ (?: (?: Received | (?:Resent-)? (?: Message-ID | Date ) |
799 MIME-Version | References | In-Reply-To ) \z
800 | (?: List- | Content- ) ) /xsi ) {
801 # Bug 6945: some header fields must not be processed for MIME encoding
802
803 } else {
804 4922 30.5ms local($1,$2,$3);
805
806 # Multiple encoded sections must ignore the interim whitespace.
807 # To avoid possible FPs with (\s+(?==\?))?, look for the whole RE
808 # separated by whitespace.
809 4922 72.0ms 5029 13.6ms 1 while $header_field_body =~
# spent 12.9ms making 4936 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
# spent 687µs making 93 calls to Mail::SpamAssassin::Message::Node::CORE:substcont, avg 7µs/call
810 s{ ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) \s+ ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) }
811 {$1$2}xsg;
812
813
814 # transcode properly encoded RFC 2047 substrings into UTF-8 octets,
815 # leave everything else unchanged as it is supposed to be UTF-8 (RFC 6532)
816 # or plain US-ASCII
817 4922 75.2ms 5079 16.9ms $header_field_body =~
# spent 15.7ms making 4922 calls to Mail::SpamAssassin::Message::Node::CORE:subst, avg 3µs/call
# spent 1.28ms making 157 calls to Mail::SpamAssassin::Message::Node::CORE:substcont, avg 8µs/call
818 s{ (?: = \? ([A-Za-z0-9_-]+) \? ([bqBQ]) \? ([^?]*) \? = ) }
819 118 1.70ms 118 16.7ms { $self->__decode_header($1, uc($2), $3) }xsge;
# spent 16.7ms making 118 calls to Mail::SpamAssassin::Message::Node::__decode_header, avg 142µs/call
820 }
821
822# dbg("message: _decode_header %s: %s", $header_field_name, $header_field_body);
823 7983 96.4ms return $header_field_body;
824 }
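_decode_header() above works in two steps. First it joins adjacent RFC 2047 encoded-words that are separated only by whitespace. Then it decodes each =?charset?B|Q?text?= word to UTF-8 octets. Here is a standalone sketch of those two steps using the core MIME::Base64 and Encode modules. The helper names are hypothetical. The sketch omits the unfolding step and the list of header fields that Node.pm leaves undecoded (Bug 6945).

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode one encoded-word payload to UTF-8 octets.
sub decode_word_sketch {
    my ($charset, $cte, $text) = @_;
    my $octets;
    if (uc $cte eq 'B') {
        $octets = decode_base64($text);
    } else {
        # Q encoding: "_" stands for a space, =XX is a hex-encoded octet
        ($octets = $text) =~ tr/_/ /;
        $octets =~ s/=([0-9A-Fa-f]{2})/chr(hex $1)/ge;
    }
    # unknown charsets make Encode::decode die; keep the raw octets then
    my $chars = eval { Encode::decode($charset, $octets) };
    return defined $chars ? Encode::encode('UTF-8', $chars) : $octets;
}

# Hypothetical helper: decode all encoded-words in an unfolded header body.
sub decode_header_sketch {
    my ($body) = @_;
    my $word = qr/=\?[A-Za-z0-9_-]+\?[bqBQ]\?[^?]*\?=/;
    # whitespace between two adjacent encoded-words is not part of the text
    1 while $body =~ s/($word)\s+($word)/$1$2/g;
    $body =~ s/=\?([A-Za-z0-9_-]+)\?([bqBQ])\?([^?]*)\?=/decode_word_sketch($1, $2, $3)/ge;
    return $body;
}

my $subject = '=?ISO-8859-1?Q?Caf=E9?= =?UTF-8?B?IOKCrA==?=';
my $decoded = decode_header_sketch($subject);
```

In this example the space between the two words of the result comes from the base64 payload itself. The whitespace between the encoded-words is dropped.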
825
826 =item get_header()
827
828 Retrieve a specific header. The value will have a newline at the end and will be
829 unfolded. The first parameter is the header name (case-insensitive),
830 and the second parameter (optional) is whether or not to return the
831 raw header.
832
833 If get_header() is called in list context, a list is returned
834 with each header entry in a separate element. In scalar context,
835 the last matching header is returned, or undef if there is none.
836
837 For example, if 'Subject' is specified as the header and there are two Subject
838 headers in a message, the last (bottom) one in the message is returned in
839 scalar context, and both are returned in list context.
840
841 Note that returning the last header field (not the first) happens to be consistent
842 with DKIM signatures, which search for and cover multiple header fields
843 bottom-up according to the 'h' tag. Let's keep it this way.
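The scalar-versus-list behaviour described above follows the usual wantarray idiom. Below is a minimal sketch over a hypothetical in-memory store. The %headers hash and the name get_header_sketch are illustrative only; the real method fetches values through header() or raw_header().

```perl
use strict;
use warnings;

# Hypothetical store: lowercased header name => values in message order.
my %headers = ( subject => [ 'first subject', 'second subject' ] );

sub get_header_sketch {
    my ($name) = @_;
    my @hdrs = map { "$_\n" } @{ $headers{lc $name} || [] };
    return @hdrs if wantarray;          # list context: every instance
    return @hdrs ? $hdrs[-1] : undef;   # scalar context: the last one
}

my @all  = get_header_sketch('Subject');   # both values, newline-terminated
my $last = get_header_sketch('SUBJECT');   # "second subject\n"
```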
844
845 =cut
846
847
# spent 3.11s (1.29+1.82) within Mail::SpamAssassin::Message::Node::get_header which was called 29917 times, avg 104µs/call:
# 25231 times (1.10s+1.55s) by Mail::SpamAssassin::Message::Node::get_all_headers at line 914, avg 105µs/call
# 1195 times (38.0ms+81.9ms) by Mail::SpamAssassin::PerMsgStatus::_get at line 1982 of Mail/SpamAssassin/PerMsgStatus.pm, avg 100µs/call
# 1175 times (39.3ms+51.7ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 128 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 77µs/call
# 688 times (28.9ms+47.7ms) by Mail::SpamAssassin::Plugin::Bayes::get_msgid at line 989 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 111µs/call
# 688 times (32.6ms+37.5ms) by Mail::SpamAssassin::Plugin::Bayes::get_msgid at line 976 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 102µs/call
# 470 times (24.0ms+27.3ms) by Mail::SpamAssassin::Message::get_body_text_array_common at line 1118 of Mail/SpamAssassin/Message.pm, avg 109µs/call
# 235 times (17.8ms+15.7ms) by Mail::SpamAssassin::Message::Metadata::parse_received_headers at line 121 of Mail/SpamAssassin/Message/Metadata/Received.pm, avg 143µs/call
# 235 times (7.37ms+11.7ms) by main::wanted at line 568 of /usr/local/bin/sa-learn, avg 81µs/call
sub get_header {
848 29917 72.8ms my ($self, $hdr, $raw) = @_;
849 29917 52.3ms $raw ||= 0;
850
851 # And now pick up all the entries into a list
852 # This is assumed to include a newline at the end ...
853 # This is also assumed to have removed continuation bits ...
854
855 # Deal with the possibility that header() or raw_header() returns undef
856 29917 51.1ms my @hdrs;
857 29917 114ms if ( $raw ) {
858 if (@hdrs = $self->raw_header($hdr)) {
859 s/\015?\012\s+/ /gs for @hdrs;
860 }
861 }
862 else {
863 29917 285ms 29917 1.82s if (@hdrs = $self->header($hdr)) {
# spent 1.82s making 29917 calls to Mail::SpamAssassin::Message::Node::header, avg 61µs/call
864 27538 224ms $_ .= "\n" for @hdrs;
865 }
866 }
867
868 29917 55.6ms if (wantarray) {
869 26661 489ms return @hdrs;
870 }
871 else {
872 3256 68.0ms return @hdrs ? $hdrs[-1] : undef;
873 }
874 }
875
876 =item get_all_headers()
877
878 Retrieve all headers. Each header will have a newline at the end and
879 will be unfolded. The first parameter (optional) is whether or not to
880 return the raw headers, and the second parameter (optional) is whether
881 or not to include the mbox separator.
882
883 If get_all_headers() is called in list context, a list is returned
884 with each header entry in a separate element. In scalar context,
885 the headers are concatenated into a single scalar. If the combined header size exceeds MAX_HEADER_LENGTH, the remaining headers are omitted and the node's 'truncated_header' flag is set.
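The ordering logic in get_all_headers() has three parts. It records, for each lowercased name, every position that name occupies in header_order. It then visits names in order of first appearance. Finally, it stops once a size budget is exceeded. Here is a standalone sketch. The name all_headers_sketch, its arguments, and the explicit $max_len budget (standing in for MAX_HEADER_LENGTH) are hypothetical.

```perl
use strict;
use warnings;

# $order:  header names as they appear in the message
# $values: lowercased name => list of values (as get_header() would return)
sub all_headers_sketch {
    my ($order, $values, $max_len) = @_;
    my (%locations, @lines);
    my $i = 0;
    push @{ $locations{lc $_} }, $i++ for @$order;    # every slot per name
    my $size = 0;
    HEADER: for my $name (sort { $locations{$a}[0] <=> $locations{$b}[0] }
                          keys %locations) {
        my @positions = @{ $locations{$name} };
        for my $contents (@{ $values->{$name} || [] }) {
            my $pos = shift @positions;
            $size += length($name) + length($contents) + 2;
            last HEADER if $size > $max_len;            # truncate here
            $lines[$pos] = "$order->[$pos]:$contents";  # keep original case
        }
    }
    return grep { defined } @lines;    # drop slots left empty by truncation
}

my @order  = ('Received', 'Subject', 'Received', 'To');
my %values = (received => [" a\n", " b\n"], subject => [" hi\n"], to => [" x\n"]);
print all_headers_sketch(\@order, \%values, 1000);
```

Same-name headers land back in their original slots, so the output keeps message order even though names are processed one at a time.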
886
887 =back
888
889 =cut
890
891 # build it and it will not bomb
892
# spent 4.42s (1.56+2.85) within Mail::SpamAssassin::Message::Node::get_all_headers which was called 923 times, avg 4.78ms/call:
# 688 times (1.20s+2.21s) by Mail::SpamAssassin::Message::receive_date at line 699 of Mail/SpamAssassin/Message.pm, avg 4.96ms/call
# 235 times (367ms+640ms) by Mail::SpamAssassin::Plugin::Bayes::_tokenize_headers at line 1293 of Mail/SpamAssassin/Plugin/Bayes.pm, avg 4.28ms/call
sub get_all_headers {
893 923 2.04ms my ($self, $raw, $include_mbox) = @_;
894 923 2.43ms $raw ||= 0;
895 923 2.32ms $include_mbox ||= 0;
896
897 923 1.71ms my @lines;
898
899 # precalculate destination positions based on order of appearance
900 923 2.05ms my $i = 0;
901 923 1.93ms my %locations;
902 1846 10.4ms for my $k (@{$self->{header_order}}) {
903 58422 379ms push(@{$locations{lc($k)}}, $i++);
904 }
905
906 # process headers in order of first appearance
907 923 2.27ms my $header;
908 923 2.63ms my $size = 0;
909 94933 227ms 923 204ms HEADER: for my $name (sort { $locations{$a}->[0] <=> $locations{$b}->[0] }
# spent 204ms making 923 calls to Mail::SpamAssassin::Message::Node::CORE:sort, avg 222µs/call
910 keys %locations)
911 {
912 # get all same-name headers and poke into correct position
913 25231 47.7ms my $positions = $locations{$name};
914 25231 288ms 25231 2.65s for my $contents ($self->get_header($name, $raw)) {
# spent 2.65s making 25231 calls to Mail::SpamAssassin::Message::Node::get_header, avg 105µs/call
915 58422 153ms my $position = shift @{$positions};
916 29211 89.3ms $size += length($name) + length($contents) + 2;
917 29211 49.7ms if ($size > MAX_HEADER_LENGTH) {
918 $self->{'truncated_header'} = 1;
919 last HEADER;
920 }
921 29211 265ms $lines[$position] = $self->{header_order}->[$position].":".$contents;
922 }
923 }
924
925 # skip undefined lines if we truncated
926 923 3.05ms @lines = grep { defined $_ } @lines if $self->{'truncated_header'};
927
928 923 2.65ms splice @lines, 0, 0, $self->{mbox_sep} if ( $include_mbox && exists $self->{mbox_sep} );
929
930 923 50.5ms return wantarray ? @lines : join ('', @lines);
931 }
932
933 # legacy public API; now a no-op.
934 sub finish { }
935
936 # ---------------------------------------------------------------------------
937
938 1 11µs 1;
939 __END__
 
# spent 181ms within Mail::SpamAssassin::Message::Node::CORE:match which was called 66877 times, avg 3µs/call:
# 29760 times (70.9ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 759, avg 2µs/call
# 25712 times (45.0ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 754, avg 2µs/call
# 7983 times (47.5ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 797, avg 6µs/call
# 1875 times (7.66ms+0s) by Mail::SpamAssassin::Message::Node::find_parts at line 128, avg 4µs/call
# 626 times (3.33ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 609, avg 5µs/call
# 459 times (3.77ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 602, avg 8µs/call
# 218 times (1.11ms+0s) by Mail::SpamAssassin::Message::Node::_html_render at line 388, avg 5µs/call
# 189 times (1.05ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 617, avg 6µs/call
# 55 times (360µs+0s) by Mail::SpamAssassin::Message::Node::decode at line 355, avg 7µs/call
sub Mail::SpamAssassin::Message::Node::CORE:match; # opcode
# spent 193ms within Mail::SpamAssassin::Message::Node::CORE:regcomp which was called 57347 times, avg 3µs/call:
# 29760 times (94.2ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 759, avg 3µs/call
# 25712 times (89.9ms+0s) by Mail::SpamAssassin::Message::Node::delete_header at line 754, avg 3µs/call
# 1875 times (9.40ms+0s) by Mail::SpamAssassin::Message::Node::find_parts at line 128, avg 5µs/call
sub Mail::SpamAssassin::Message::Node::CORE:regcomp; # opcode
# spent 204ms within Mail::SpamAssassin::Message::Node::CORE:sort which was called 923 times, avg 222µs/call:
# 923 times (204ms+0s) by Mail::SpamAssassin::Message::Node::get_all_headers at line 909, avg 222µs/call
sub Mail::SpamAssassin::Message::Node::CORE:sort; # opcode
# spent 504ms within Mail::SpamAssassin::Message::Node::CORE:subst which was called 132804 times, avg 4µs/call:
# 40032 times (137ms+0s) by Mail::SpamAssassin::Message::Node::header at line 172, avg 3µs/call
# 40032 times (93.5ms+0s) by Mail::SpamAssassin::Message::Node::header at line 173, avg 2µs/call
# 8225 times (79.8ms+0s) by Mail::SpamAssassin::Message::Node::header at line 187, avg 10µs/call
# 8225 times (59.7ms+0s) by Mail::SpamAssassin::Message::Node::header at line 188, avg 7µs/call
# 8225 times (53.0ms+0s) by Mail::SpamAssassin::Message::Node::header at line 186, avg 6µs/call
# 7983 times (20.8ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 794, avg 3µs/call
# 7983 times (17.3ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 795, avg 2µs/call
# 4936 times (12.9ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 809, avg 3µs/call
# 4922 times (15.7ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 817, avg 3µs/call
# 940 times (2.81ms+0s) by Mail::SpamAssassin::Message::Node::raw_header at line 228, avg 3µs/call
# 940 times (2.27ms+0s) by Mail::SpamAssassin::Message::Node::raw_header at line 229, avg 2µs/call
# 203 times (2.99ms+0s) by Mail::SpamAssassin::Message::Node::decode at line 340, avg 15µs/call
# 103 times (475µs+0s) by Mail::SpamAssassin::Message::Node::__decode_header at line 775, avg 5µs/call
# 55 times (6.28ms+0s) by Mail::SpamAssassin::Message::Node::decode at line 356, avg 114µs/call
sub Mail::SpamAssassin::Message::Node::CORE:subst; # opcode
# spent 1.97ms within Mail::SpamAssassin::Message::Node::CORE:substcont which was called 250 times, avg 8µs/call:
# 157 times (1.28ms+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 817, avg 8µs/call
# 93 times (687µs+0s) by Mail::SpamAssassin::Message::Node::_decode_header at line 809, avg 7µs/call
sub Mail::SpamAssassin::Message::Node::CORE:substcont; # opcode