NYTProf Performance Profile (line view)
For /usr/local/bin/sa-learn
  Run on Sun Nov 5 03:09:29 2017
Reported on Mon Nov 6 13:20:47 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/HTML.pm
Statements: Executed 3803434 statements in 23.0s
Subroutines
 Calls   P  F  Exclusive Time  Inclusive Time  Subroutine
 46468   5  1           4.54s           4.91s  Mail::SpamAssassin::HTML::display_text
 46892   1  1           3.39s           14.1s  Mail::SpamAssassin::HTML::html_tag
 29709   2  1           2.28s           6.56s  Mail::SpamAssassin::HTML::html_text
 32908   1  1           2.26s           5.65s  Mail::SpamAssassin::HTML::text_style
 16266   1  1           1.93s           1.93s  Mail::SpamAssassin::HTML::close_tag
231886  25  1           1.34s           1.34s  Mail::SpamAssassin::HTML::CORE:match (opcode)
  7824   1  1           993ms           1.01s  Mail::SpamAssassin::HTML::close_table_tag
 24570   1  1           661ms           734ms  Mail::SpamAssassin::HTML::html_tests
 16860   1  1           616ms           2.53s  Mail::SpamAssassin::HTML::html_whitespace
101699  12  1           576ms           576ms  Mail::SpamAssassin::HTML::CORE:subst (opcode)
  2703   2  1           400ms           490ms  Mail::SpamAssassin::HTML::_remove_dot_segments
  4787   1  1           396ms           444ms  Mail::SpamAssassin::HTML::html_font_invisible
   567   3  1           383ms           383ms  Mail::SpamAssassin::HTML::get_rendered_text
 12673   1  1           348ms           1.73s  Mail::SpamAssassin::HTML::html_uri
  2705   1  1           315ms           1.05s  Mail::SpamAssassin::HTML::target_uri
  2325   3  1           204ms           266ms  Mail::SpamAssassin::HTML::name_to_rgb
  5410   2  1           191ms           246ms  Mail::SpamAssassin::HTML::_parse_uri
  4261   2  1           144ms           192ms  Mail::SpamAssassin::HTML::canon_uri
  2705   3  1           126ms           1.31s  Mail::SpamAssassin::HTML::push_uri
  3096  20  1          88.0ms          88.0ms  Mail::SpamAssassin::HTML::put_results
   189   1  1          64.3ms           146ms  Mail::SpamAssassin::HTML::html_end
   189   1  1          34.9ms           22.6s  Mail::SpamAssassin::HTML::parse
  1404   1  1          28.3ms          28.3ms  Mail::SpamAssassin::HTML::CORE:substcont (opcode)
     1   1  1          18.9ms          20.7ms  Mail::SpamAssassin::HTML::BEGIN@30
   189   1  1          15.7ms          80.2ms  Mail::SpamAssassin::HTML::new
   189   1  1          7.40ms          12.4ms  Mail::SpamAssassin::HTML::html_start
   411   1  1          7.24ms          7.24ms  Mail::SpamAssassin::HTML::html_comment
    67   1  1          2.64ms          3.04ms  Mail::SpamAssassin::HTML::html_declaration
   189   1  1          1.56ms          1.56ms  Mail::SpamAssassin::HTML::get_results
     1   1  1            46µs           289µs  Mail::SpamAssassin::HTML::BEGIN@1080
     1   1  1            38µs            46µs  Mail::SpamAssassin::HTML::BEGIN@23
     1   1  1            35µs           204µs  Mail::SpamAssassin::HTML::BEGIN@31
     1   1  1            30µs           605µs  Mail::SpamAssassin::HTML::BEGIN@32
     1   1  1            23µs           113µs  Mail::SpamAssassin::HTML::BEGIN@33
     1   1  1            20µs            42µs  Mail::SpamAssassin::HTML::BEGIN@24
     1   1  1            18µs            64µs  Mail::SpamAssassin::HTML::BEGIN@25
     0   0  0              0s              0s  Mail::SpamAssassin::HTML::_merge_uri
     0   0  0              0s              0s  Mail::SpamAssassin::HTML::dec2hex
     0   0  0              0s              0s  Mail::SpamAssassin::HTML::name_to_rgb_old
Line  Statements  Time on line  Calls  Time in subs  Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18# HTML decoding TODOs
19# - add URIs to list for faster URI testing
20
21package Mail::SpamAssassin::HTML;
22
23260µs253µs
# spent 46µs (38+7) within Mail::SpamAssassin::HTML::BEGIN@23 which was called: # once (38µs+7µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 23
use strict;
# spent 46µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@23 # spent 7µs making 1 call to strict::import
24252µs265µs
# spent 42µs (20+23) within Mail::SpamAssassin::HTML::BEGIN@24 which was called: # once (20µs+23µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 24
use warnings;
# spent 42µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@24 # spent 23µs making 1 call to warnings::import
25273µs2110µs
# spent 64µs (18+46) within Mail::SpamAssassin::HTML::BEGIN@25 which was called: # once (18µs+46µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 25
use re 'taint';
# spent 64µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@25 # spent 46µs making 1 call to re::import
26
27132µsrequire 5.008; # need basic Unicode support for HTML::Parser::utf8_mode
28# require 5.008008; # Bug 3787; [perl #37950]: Malformed UTF-8 character ...
29
303436µs220.8ms
# spent 20.7ms (18.9+1.87) within Mail::SpamAssassin::HTML::BEGIN@30 which was called: # once (18.9ms+1.87ms) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 30
use HTML::Parser 3.43 ();
# spent 20.7ms making 1 call to Mail::SpamAssassin::HTML::BEGIN@30 # spent 25µs making 1 call to version::_VERSION
31278µs2373µs
# spent 204µs (35+169) within Mail::SpamAssassin::HTML::BEGIN@31 which was called: # once (35µs+169µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 31
use Mail::SpamAssassin::Logger;
# spent 204µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@31 # spent 169µs making 1 call to Exporter::import
32271µs21.18ms
# spent 605µs (30+576) within Mail::SpamAssassin::HTML::BEGIN@32 which was called: # once (30µs+576µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 32
use Mail::SpamAssassin::Constants qw(:sa);
# spent 605µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@32 # spent 576µs making 1 call to Exporter::import
33211.7ms2203µs
# spent 113µs (23+90) within Mail::SpamAssassin::HTML::BEGIN@33 which was called: # once (23µs+90µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 33
use Mail::SpamAssassin::Util qw(untaint_var);
# spent 113µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@33 # spent 90µs making 1 call to Exporter::import
34
35125µsour @ISA = qw(HTML::Parser);
36
37# elements defined by the HTML 4.01 and XHTML 1.0 DTDs (do not change them!)
38# does not include XML
3995410µsmy %elements = map {; $_ => 1 }
40 # strict
41 qw( a abbr acronym address area b base bdo big blockquote body br button caption cite code col colgroup dd del dfn div dl dt em fieldset form h1 h2 h3 h4 h5 h6 head hr html i img input ins kbd label legend li link map meta noscript object ol optgroup option p param pre q samp script select small span strong style sub sup table tbody td textarea tfoot th thead title tr tt ul var ),
42 # loose
43 qw( applet basefont center dir font frame frameset iframe isindex menu noframes s strike u ),
44 # non-standard tags
45 qw( nobr x-sigsep x-tab ),
46;
47
48# elements that we want to render, but not count as valid
49624µsmy %tricks = map {; $_ => 1 }
50 # non-standard and non-valid tags
51 qw( bgsound embed listing plaintext xmp ),
52 # other non-standard tags handled in popfile
53 # blink ilayer multicol noembed nolayer spacer wbr
54;
55
56# elements that change text style
571450µsmy %elements_text_style = map {; $_ => 1 }
58 qw( body font table tr th td big small basefont marquee span p div ),
59;
60
61# elements that insert whitespace
622384µsmy %elements_whitespace = map {; $_ => 1 }
63 qw( br div li th td dt dd p hr blockquote pre embed listing plaintext xmp title
64 h1 h2 h3 h4 h5 h6 ),
65;
66
67# elements that push URIs
681657µsmy %elements_uri = map {; $_ => 1 }
69 qw( body table tr td a area link img frame iframe embed script form base bgsound ),
70;
71
72# style attribute not accepted
73#my %elements_no_style = map {; $_ => 1 }
74# qw( base basefont head html meta param script style title ),
75#;
76
77# permitted element attributes
7812µsmy %ok_attributes;
79114µs$ok_attributes{basefont}{$_} = 1 for qw( color face size );
80118µs$ok_attributes{body}{$_} = 1 for qw( text bgcolor link alink vlink background );
81110µs$ok_attributes{font}{$_} = 1 for qw( color face size );
8218µs$ok_attributes{marquee}{$_} = 1 for qw( bgcolor background );
8316µs$ok_attributes{table}{$_} = 1 for qw( bgcolor );
8415µs$ok_attributes{td}{$_} = 1 for qw( bgcolor );
85110µs$ok_attributes{th}{$_} = 1 for qw( bgcolor );
8616µs$ok_attributes{tr}{$_} = 1 for qw( bgcolor );
8716µs$ok_attributes{span}{$_} = 1 for qw( style );
8815µs$ok_attributes{p}{$_} = 1 for qw( style );
8915µs$ok_attributes{div}{$_} = 1 for qw( style );
90
91
# spent 80.2ms (15.7+64.6) within Mail::SpamAssassin::HTML::new which was called 189 times, avg 425µs/call: # 189 times (15.7ms+64.6ms) by Mail::SpamAssassin::Message::Node::rendered at line 635 of Mail/SpamAssassin/Message/Node.pm, avg 425µs/call
sub new {
92189548µs my ($class, $character_semantics_input, $character_semantics_output) = @_;
931896.02ms18964.6ms my $self = $class->SUPER::new(
# spent 64.6ms making 189 calls to HTML::Parser::new, avg 342µs/call
94 api_version => 3,
95 handlers => [
96 start_document => ["html_start", "self"],
97 start => ["html_tag", "self,tagname,attr,'+1'"],
98 end_document => ["html_end", "self"],
99 end => ["html_tag", "self,tagname,attr,'-1'"],
100 text => ["html_text", "self,dtext"],
101 comment => ["html_comment", "self,text"],
102 declaration => ["html_declaration", "self,text"],
103 ],
104 marked_sections => 1);
105189666µs $self->{SA_character_semantics_input} = $character_semantics_input;
106 $self->{SA_encode_results} =
107189654µs $character_semantics_input && !$character_semantics_output;
1081891.58ms $self;
109}
110
111
# spent 12.4ms (7.40+4.95) within Mail::SpamAssassin::HTML::html_start which was called 189 times, avg 65µs/call: # 189 times (7.40ms+4.95ms) by HTML::Parser::parse at line 260, avg 65µs/call
sub html_start {
112189459µs my ($self) = @_;
113
114 # trigger HTML_MESSAGE
1151891.64ms1894.95ms $self->put_results(html => 1);
# spent 4.95ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 26µs/call
116
117 # initial display attributes
118189640µs $self->{basefont} = 3;
119 my %default = (tag => "default",
120 fgcolor => "#000000",
121 bgcolor => "#ffffff",
1221891.77ms size => $self->{basefont});
1233782.92ms push @{ $self->{text_style} }, \%default;
124}
125
126
# spent 146ms (64.3+81.7) within Mail::SpamAssassin::HTML::html_end which was called 189 times, avg 773µs/call: # 189 times (64.3ms+81.7ms) by HTML::Parser::eof at line 261, avg 773µs/call
sub html_end {
127189442µs my ($self) = @_;
128
1291891.31ms delete $self->{text_style};
130
131189368µs my @uri;
132
133 # add the canonicalized version of each uri to the detail list
134189881µs if (defined $self->{uri}) {
1352361.97ms @uri = keys %{$self->{uri}};
136 }
137
138 # these keep backward compatibility, albeit a little wasteful
1391891.64ms1894.71ms $self->put_results(uri => \@uri);
# spent 4.71ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 25µs/call
1401891.64ms1895.43ms $self->put_results(anchor => $self->{anchor});
# spent 5.43ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 29µs/call
141
1421891.45ms1895.96ms $self->put_results(uri_detail => $self->{uri});
# spent 5.96ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 32µs/call
1431891.80ms1895.82ms $self->put_results(uri_truncated => $self->{uri_truncated});
# spent 5.82ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 31µs/call
144
145 # final results scalars
1461891.37ms1894.78ms $self->put_results(image_area => $self->{image_area});
# spent 4.78ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 25µs/call
1471891.40ms1894.93ms $self->put_results(length => $self->{length});
# spent 4.93ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 26µs/call
1481891.35ms1895.11ms $self->put_results(min_size => $self->{min_size});
# spent 5.11ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
1491891.41ms1895.16ms $self->put_results(max_size => $self->{max_size});
# spent 5.16ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
150189968µs if (exists $self->{tags}) {
151 $self->put_results(closed_extra_ratio =>
1521871.80ms1875.48ms ($self->{closed_extra} / $self->{tags}));
# spent 5.48ms making 187 calls to Mail::SpamAssassin::HTML::put_results, avg 29µs/call
153 }
154
155 # final result arrays
1561891.69ms1895.68ms $self->put_results(comment => $self->{comment});
# spent 5.68ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 30µs/call
1571891.67ms1895.74ms $self->put_results(script => $self->{script});
# spent 5.74ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 30µs/call
1581891.55ms1895.58ms $self->put_results(title => $self->{title});
# spent 5.58ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 29µs/call
159
160 # final result hashes
1611891.32ms1895.64ms $self->put_results(inside => $self->{inside});
# spent 5.64ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 30µs/call
162
163 # end-of-document result values that don't require looking at the text
164189519µs if (exists $self->{backhair}) {
16522137µs11382µs $self->put_results(backhair_count => scalar keys %{ $self->{backhair} });
# spent 382µs making 11 calls to Mail::SpamAssassin::HTML::put_results, avg 35µs/call
166 }
167189915µs if (exists $self->{elements} && exists $self->{tags}) {
168 $self->put_results(bad_tag_ratio =>
1691871.78ms1875.40ms ($self->{tags} - $self->{elements}) / $self->{tags});
# spent 5.40ms making 187 calls to Mail::SpamAssassin::HTML::put_results, avg 29µs/call
170 }
171189983µs if (exists $self->{elements_seen} && exists $self->{tags_seen}) {
172 $self->put_results(non_element_ratio =>
173 ($self->{tags_seen} - $self->{elements_seen}) /
1741871.68ms1875.42ms $self->{tags_seen});
# spent 5.42ms making 187 calls to Mail::SpamAssassin::HTML::put_results, avg 29µs/call
175 }
1761891.80ms if (exists $self->{tags} && exists $self->{obfuscation}) {
177 $self->put_results(obfuscation_ratio =>
17819146µs19491µs $self->{obfuscation} / $self->{tags});
# spent 491µs making 19 calls to Mail::SpamAssassin::HTML::put_results, avg 26µs/call
179 }
180}
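The ratio results stored at the end of html_end are plain quotients of per-document counters. The following is a minimal sketch of that arithmetic, assuming invented counter values; the `ratios` helper is hypothetical and is not part of Mail::SpamAssassin::HTML. It tests the denominators for truthiness, so a missing or zero counter simply skips the ratio.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the arithmetic in html_end.
# Counter names follow the listing; the values passed in are invented.
sub ratios {
    my (%c) = @_;
    my %r;
    if ($c{tags}) {
        # stray end tags relative to all tags
        $r{closed_extra_ratio} = ($c{closed_extra} // 0) / $c{tags};
        # tags that are not known elements, relative to all tags
        $r{bad_tag_ratio} = ($c{tags} - ($c{elements} // 0)) / $c{tags};
    }
    if ($c{tags_seen}) {
        # distinct unknown tag names relative to distinct tag names
        $r{non_element_ratio} =
            ($c{tags_seen} - ($c{elements_seen} // 0)) / $c{tags_seen};
    }
    return \%r;
}

my $r = ratios(tags => 200, elements => 190, closed_extra => 4,
               tags_seen => 20, elements_seen => 18);
printf "%s=%.3f\n", $_, $r->{$_} for sort keys %$r;
```

With these made-up inputs the three ratios come out as 0.05, 0.02 and 0.1 respectively.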
181
182
# spent 88.0ms within Mail::SpamAssassin::HTML::put_results which was called 3096 times, avg 28µs/call:
#   189 times (5.96ms+0s) by Mail::SpamAssassin::HTML::html_end at line 142, avg 32µs/call
#   189 times (5.82ms+0s) by Mail::SpamAssassin::HTML::html_end at line 143, avg 31µs/call
#   189 times (5.74ms+0s) by Mail::SpamAssassin::HTML::html_end at line 157, avg 30µs/call
#   189 times (5.68ms+0s) by Mail::SpamAssassin::HTML::html_end at line 156, avg 30µs/call
#   189 times (5.64ms+0s) by Mail::SpamAssassin::HTML::html_end at line 161, avg 30µs/call
#   189 times (5.58ms+0s) by Mail::SpamAssassin::HTML::html_end at line 158, avg 29µs/call
#   189 times (5.43ms+0s) by Mail::SpamAssassin::HTML::html_end at line 140, avg 29µs/call
#   189 times (5.16ms+0s) by Mail::SpamAssassin::HTML::html_end at line 149, avg 27µs/call
#   189 times (5.11ms+0s) by Mail::SpamAssassin::HTML::html_end at line 148, avg 27µs/call
#   189 times (4.95ms+0s) by Mail::SpamAssassin::HTML::html_start at line 115, avg 26µs/call
#   189 times (4.93ms+0s) by Mail::SpamAssassin::HTML::html_end at line 147, avg 26µs/call
#   189 times (4.78ms+0s) by Mail::SpamAssassin::HTML::html_end at line 146, avg 25µs/call
#   189 times (4.71ms+0s) by Mail::SpamAssassin::HTML::html_end at line 139, avg 25µs/call
#   187 times (5.48ms+0s) by Mail::SpamAssassin::HTML::html_end at line 152, avg 29µs/call
#   187 times (5.42ms+0s) by Mail::SpamAssassin::HTML::html_end at line 174, avg 29µs/call
#   187 times (5.40ms+0s) by Mail::SpamAssassin::HTML::html_end at line 169, avg 29µs/call
#   47 times (1.33ms+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 555, avg 28µs/call
#   19 times (491µs+0s) by Mail::SpamAssassin::HTML::html_end at line 178, avg 26µs/call
#   11 times (382µs+0s) by Mail::SpamAssassin::HTML::html_end at line 165, avg 35µs/call
#   once (32µs+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 586
sub put_results {
18330965.37ms my $self = shift;
184309620.7ms my %results = @_;
185
186309681.0ms while (my ($k, $v) = each %results) {
187309612.2ms $self->{results}{$k} = $v;
188 }
189}
190
191
# spent 1.56ms within Mail::SpamAssassin::HTML::get_results which was called 189 times, avg 8µs/call: # 189 times (1.56ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 643 of Mail/SpamAssassin/Message/Node.pm, avg 8µs/call
sub get_results {
192189442µs my ($self) = @_;
193
1941891.42ms return $self->{results};
195}
196
197
# spent 383ms within Mail::SpamAssassin::HTML::get_rendered_text which was called 567 times, avg 675µs/call:
#   189 times (193ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 641 of Mail/SpamAssassin/Message/Node.pm, avg 1.02ms/call
#   189 times (179ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 642 of Mail/SpamAssassin/Message/Node.pm, avg 947µs/call
#   189 times (10.7ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 640 of Mail/SpamAssassin/Message/Node.pm, avg 57µs/call
sub get_rendered_text {
1985671.03ms my $self = shift;
1995672.68ms my %options = @_;
200
20175611.0ms return join('', @{ $self->{text} }) unless %options;
202
203378725µs my $mask;
2043784.30ms while (my ($k, $v) = each %options) {
2053781.20ms next if !defined $self->{"text_$k"};
2063781.42ms if (!defined $mask) {
2073783.62ms $mask |= $v ? $self->{"text_$k"} : ~ $self->{"text_$k"};
208 }
209 else {
210 $mask &= $v ? $self->{"text_$k"} : ~ $self->{"text_$k"};
211 }
212 }
213
214378960µs my $text = '';
215378708µs my $i = 0;
21693692352ms for (@{ $self->{text} }) { $text .= $_ if vec($mask, $i++, 1); }
2173785.07ms return $text;
218}
219
220
# spent 22.6s (34.9ms+22.5) within Mail::SpamAssassin::HTML::parse which was called 189 times, avg 119ms/call: # 189 times (34.9ms+22.5s) by Mail::SpamAssassin::Message::Node::rendered at line 637 of Mail/SpamAssassin/Message/Node.pm, avg 119ms/call
sub parse {
2211891.04ms my ($self, $text) = @_;
222
223189695µs $self->{image_area} = 0;
224189578µs $self->{title_index} = -1;
225189601µs $self->{max_size} = 3; # start at default size
226189548µs $self->{min_size} = 3; # start at default size
227189762µs $self->{closed_html} = 0;
228189608µs $self->{closed_body} = 0;
229189550µs $self->{closed_extra} = 0;
230189606µs $self->{text} = []; # rendered text
2311893.35ms1896.75ms $self->{length} += untaint_var(length($text));
# spent 6.75ms making 189 calls to Mail::SpamAssassin::Util::untaint_var, avg 36µs/call
232
233 # NOTE: We *only* need to fix the rendering when we verify that it
234 # differs from what people see in their MUA. Testing is best done with
235 # the most common MUAs and browsers, if you catch my drift.
236
237 # NOTE: HTML::Parser can cope with: <?xml pis>, <? with space>, so we
238 # don't need to fix them here.
239
240 # # (outdated claim) HTML::Parser converts &nbsp; into a question mark ("?")
241 # # for some reason, so convert them to spaces. Confirmed in 3.31, at least.
242 # ... Actually it doesn't, it is correctly converted into Unicode NBSP,
243 # nevertheless it does not hurt to treat it as a space.
24418916.8ms18915.0ms $text =~ s/&nbsp;/ /g;
# spent 15.0ms making 189 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 79µs/call
245
246 # bug 4695: we want "<br/>" to be treated the same as "<br>", and
247 # the HTML::Parser API won't do it for us
24818948.5ms159337.6ms $text =~ s/<(\w+)\s*\/>/<$1>/gi;
# spent 28.3ms making 1404 calls to Mail::SpamAssassin::HTML::CORE:substcont, avg 20µs/call # spent 9.24ms making 189 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 49µs/call
249
2501893.55ms1891.03ms if (!$self->UNIVERSAL::can('utf8_mode')) {
# spent 1.03ms making 189 calls to UNIVERSAL::can, avg 5µs/call
251 # utf8_mode is cleared by default, only warn if it would need to be set
252 warn "message: cannot set utf8_mode, module HTML::Parser is too old\n"
253 if !$self->{SA_character_semantics_input};
254 } else {
2551892.48ms189682µs $self->SUPER::utf8_mode($self->{SA_character_semantics_input} ? 0 : 1);
# spent 682µs making 189 calls to HTML::Parser::utf8_mode, avg 4µs/call
2561893.10ms3781.95ms dbg("message: HTML::Parser utf8_mode %s",
# spent 1.41ms making 189 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call # spent 536µs making 189 calls to HTML::Parser::utf8_mode, avg 3µs/call
257 $self->SUPER::utf8_mode ? "on (assumed UTF-8 octets)"
258 : "off (default, assumed Unicode characters)");
259 }
2601891.10s7731143.0s $self->SUPER::parse($text);
# spent 22.3s making 189 calls to HTML::Parser::parse, avg 118ms/call
# spent 14.1s making 46892 calls to Mail::SpamAssassin::HTML::html_tag, avg 301µs/call
# spent 6.52s making 29563 calls to Mail::SpamAssassin::HTML::html_text, avg 221µs/call
# spent 12.4ms making 189 calls to Mail::SpamAssassin::HTML::html_start, avg 65µs/call
# spent 7.24ms making 411 calls to Mail::SpamAssassin::HTML::html_comment, avg 18µs/call
# spent 3.04ms making 67 calls to Mail::SpamAssassin::HTML::html_declaration, avg 45µs/call
2611895.63ms524369ms $self->SUPER::eof;
# spent 187ms making 189 calls to HTML::Parser::eof, avg 991µs/call
# spent 146ms making 189 calls to Mail::SpamAssassin::HTML::html_end, avg 773µs/call
# spent 35.8ms making 146 calls to Mail::SpamAssassin::HTML::html_text, avg 245µs/call
262
2631892.02ms return $self->{text};
264}
265
266
# spent 14.1s (3.39+10.7) within Mail::SpamAssassin::HTML::html_tag which was called 46892 times, avg 301µs/call: # 46892 times (3.39s+10.7s) by HTML::Parser::parse at line 260, avg 301µs/call
sub html_tag {
26746892102ms my ($self, $tag, $attr, $num) = @_;
2684689290.4ms utf8::encode($tag) if $self->{SA_encode_results};
269
27046892562ms4689297.9ms my $maybe_namespace = ($tag =~ m@^(?:o|st\d):[\w-]+/?$@);
# spent 97.9ms making 46892 calls to Mail::SpamAssassin::HTML::CORE:match, avg 2µs/call
271
27246892177ms if (exists $elements{$tag} || $maybe_namespace) {
2734687782.9ms $self->{elements}++;
2744687795.7ms $self->{elements_seen}++ if !exists $self->{inside}{$tag};
275 }
2764689279.9ms $self->{tags}++;
2774689286.1ms $self->{tags_seen}++ if !exists $self->{inside}{$tag};
27846892126ms $self->{inside}{$tag} += $num;
2794689290.6ms if ($self->{inside}{$tag} < 0) {
2803482µs $self->{inside}{$tag} = 0;
2813472µs $self->{closed_extra}++;
282 }
283
2844689289.5ms return if $maybe_namespace;
285
286 # ignore non-elements
28745476598ms if (exists $elements{$tag} || exists $tricks{$tag}) {
28845461320ms329085.65s $self->text_style($tag, $attr, $num) if exists $elements_text_style{$tag};
# spent 5.65s making 32908 calls to Mail::SpamAssassin::HTML::text_style, avg 172µs/call
289
290 # bug 5009: things like <p> and </p> both need dealing with
29145461169ms168602.53s $self->html_whitespace($tag) if exists $elements_whitespace{$tag};
# spent 2.53s making 16860 calls to Mail::SpamAssassin::HTML::html_whitespace, avg 150µs/call
292
293 # start tags
29445461194ms if ($num == 1) {
29524570110ms126731.73s $self->html_uri($tag, $attr) if exists $elements_uri{$tag};
# spent 1.73s making 12673 calls to Mail::SpamAssassin::HTML::html_uri, avg 136µs/call
29624570190ms24570734ms $self->html_tests($tag, $attr, $num);
# spent 734ms making 24570 calls to Mail::SpamAssassin::HTML::html_tests, avg 30µs/call
297 }
298 # end tags
299 else {
3002089136.7ms $self->{closed_html} = 1 if $tag eq "html";
3012089136.1ms $self->{closed_body} = 1 if $tag eq "body";
302 }
303 }
304}
305
306
# spent 2.53s (616ms+1.91) within Mail::SpamAssassin::HTML::html_whitespace which was called 16860 times, avg 150µs/call: # 16860 times (616ms+1.91s) by Mail::SpamAssassin::HTML::html_tag at line 291, avg 150µs/call
sub html_whitespace {
3071686037.3ms my ($self, $tag) = @_;
308
309 # ordered by frequency of tag groups, note: whitespace is always "visible"
31016860459ms1690778.4ms if ($tag eq "br" || $tag eq "div") {
# spent 78.4ms making 16907 calls to Mail::SpamAssassin::HTML::CORE:match, avg 5µs/call
311394925.6ms3949403ms $self->display_text("\n", whitespace => 1);
# spent 403ms making 3949 calls to Mail::SpamAssassin::HTML::display_text, avg 102µs/call
312 }
313 elsif ($tag =~ /^(?:li|t[hd]|d[td]|embed|h\d)$/) {
314891567.4ms89151.02s $self->display_text(" ", whitespace => 1);
# spent 1.02s making 8915 calls to Mail::SpamAssassin::HTML::display_text, avg 114µs/call
315 }
316 elsif ($tag =~ /^(?:p|hr|blockquote|pre|listing|plaintext|xmp|title)$/) {
317399625.7ms3996413ms $self->display_text("\n\n", whitespace => 1);
# spent 413ms making 3996 calls to Mail::SpamAssassin::HTML::display_text, avg 103µs/call
318 }
319}
320
321# puts the uri onto the internal array
322# note: uri may be blank (<a href=""></a> obfuscation, etc.)
323
# spent 1.31s (126ms+1.18) within Mail::SpamAssassin::HTML::push_uri which was called 2705 times, avg 484µs/call:
#   1560 times (76.6ms+702ms) by Mail::SpamAssassin::HTML::html_uri at line 362, avg 499µs/call
#   1132 times (48.9ms+464ms) by Mail::SpamAssassin::HTML::html_uri at line 367, avg 453µs/call
#   13 times (496µs+17.9ms) by Mail::SpamAssassin::HTML::html_uri at line 357, avg 1.41ms/call
sub push_uri {
32427055.87ms my ($self, $type, $uri) = @_;
325
326270519.1ms2705133ms $uri = $self->canon_uri($uri);
# spent 133ms making 2705 calls to Mail::SpamAssassin::HTML::canon_uri, avg 49µs/call
32727055.50ms utf8::encode($uri) if $self->{SA_encode_results};
328
329270525.6ms27051.05s my $target = target_uri($self->{base_href} || "", $uri);
# spent 1.05s making 2705 calls to Mail::SpamAssassin::HTML::target_uri, avg 389µs/call
330
331 # skip things like <iframe src="" ...>
332270530.3ms $self->{uri}->{$uri}->{types}->{$type} = 1 if $uri ne '';
333}
334
335
# spent 192ms (144+47.3) within Mail::SpamAssassin::HTML::canon_uri which was called 4261 times, avg 45µs/call: # 2705 times (101ms+31.1ms) by Mail::SpamAssassin::HTML::push_uri at line 326, avg 49µs/call # 1556 times (43.0ms+16.1ms) by Mail::SpamAssassin::HTML::html_tests at line 654, avg 38µs/call
sub canon_uri {
33642618.27ms my ($self, $uri) = @_;
337
338 # URIs don't have leading/trailing whitespace ...
339426172.3ms426124.9ms $uri =~ s/^\s+//;
# spent 24.9ms making 4261 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 6µs/call
340426168.0ms426122.3ms $uri =~ s/\s+$//;
# spent 22.3ms making 4261 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 5µs/call
341
342 # Make sure all the URIs are nice and short
34342618.38ms if (length $uri > MAX_URI_LENGTH) {
344 $self->{'uri_truncated'} = 1;
345 $uri = substr $uri, 0, MAX_URI_LENGTH;
346 }
347
348426175.4ms return $uri;
349}
350
351
# spent 1.73s (348ms+1.38) within Mail::SpamAssassin::HTML::html_uri which was called 12673 times, avg 136µs/call: # 12673 times (348ms+1.38s) by Mail::SpamAssassin::HTML::html_tag at line 295, avg 136µs/call
sub html_uri {
3521267332.8ms my ($self, $tag, $attr) = @_;
353
354 # ordered by frequency of tag groups
35512673358ms1654270.8ms if ($tag =~ /^(?:body|table|tr|td)$/) {
# spent 70.8ms making 16542 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
356993820.5ms if (defined $attr->{background}) {
35713113µs1318.3ms $self->push_uri($tag, $attr->{background});
# spent 18.3ms making 13 calls to Mail::SpamAssassin::HTML::push_uri, avg 1.41ms/call
358 }
359 }
360 elsif ($tag =~ /^(?:a|area|link)$/) {
36116016.10ms if (defined $attr->{href}) {
362156015.4ms1560779ms $self->push_uri($tag, $attr->{href});
# spent 779ms making 1560 calls to Mail::SpamAssassin::HTML::push_uri, avg 499µs/call
363 }
364 }
365 elsif ($tag =~ /^(?:img|frame|iframe|embed|script|bgsound)$/) {
36611344.39ms if (defined $attr->{src}) {
367113210.7ms1132513ms $self->push_uri($tag, $attr->{src});
# spent 513ms making 1132 calls to Mail::SpamAssassin::HTML::push_uri, avg 453µs/call
368 }
369 }
370 elsif ($tag eq "form") {
371 if (defined $attr->{action}) {
372 $self->push_uri($tag, $attr->{action});
373 }
374 }
375 elsif ($tag eq "base") {
376 if (my $uri = $attr->{href}) {
377 $uri = $self->canon_uri($uri);
378
379 # use <BASE HREF="URI"> to turn relative links into absolute links
380
381 # even if it is a base URI, handle like a normal URI as well
382 $self->push_uri($tag, $uri);
383
384 # a base URI will be ignored by browsers unless it is an absolute
385 # URI of a standard protocol
386 if ($uri =~ m@^(?:https?|ftp):/{0,2}@i) {
387 # remove trailing filename, if any; base URIs can have the
388 # form of "http://foo.com/index.html"
389 $uri =~ s@^([a-z]+:/{0,2}[^/]+/.*?)[^/\.]+\.[^/\.]{2,4}$@$1@i;
390
391 # Make sure it ends in a slash
392 $uri .= "/" unless $uri =~ m@/$@;
393 utf8::encode($uri) if $self->{SA_encode_results};
394 $self->{base_href} = $uri;
395 }
396 }
397 }
398}
399
400# this might not be quite right, may need to pay attention to table nesting
401
# spent 1.01s (993ms+12.8ms) within Mail::SpamAssassin::HTML::close_table_tag which was called 7824 times, avg 129µs/call: # 7824 times (993ms+12.8ms) by Mail::SpamAssassin::HTML::text_style at line 455, avg 129µs/call
sub close_table_tag {
402782415.3ms my ($self, $tag) = @_;
403
404 # don't close if never opened
405176215837ms return unless grep { $_->{tag} eq $tag } @{ $self->{text_style} };
406
407728111.4ms my $top;
4081459443.2ms while (@{ $self->{text_style} } && ($top = $self->{text_style}[-1]->{tag})) {
409731345.7ms321312.8ms if (($tag eq "td" && ($top eq "font" || $top eq "td")) ||
# spent 12.8ms making 3213 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
410 ($tag eq "tr" && $top =~ /^(?:font|td|tr)$/))
411 {
41264211µs pop @{ $self->{text_style} };
413 }
414 else {
415728169.7ms last;
416 }
417 }
418}
419
420
# spent 1.93s within Mail::SpamAssassin::HTML::close_tag which was called 16266 times, avg 118µs/call: # 16266 times (1.93s+0s) by Mail::SpamAssassin::HTML::text_style at line 539, avg 118µs/call
sub close_tag {
4211626631.4ms my ($self, $tag) = @_;
422
423 # don't close if never opened
4242986651.41s return if !grep { $_->{tag} eq $tag } @{ $self->{text_style} };
425
426 # close everything up to and including tag
42748780308ms while (my %current = %{ pop @{ $self->{text_style} } }) {
42816279293ms last if $current{tag} eq $tag;
429 }
430}
431
432
# spent 5.65s (2.26+3.39) within Mail::SpamAssassin::HTML::text_style which was called 32908 times, avg 172µs/call: # 32908 times (2.26s+3.39s) by Mail::SpamAssassin::HTML::html_tag at line 288, avg 172µs/call
sub text_style {
4333290870.6ms my ($self, $tag, $attr, $num) = @_;
434
435 # treat <th> as <td>
4363290858.7ms $tag = "td" if $tag eq "th";
437
438 # open
43932908453ms if ($num == 1) {
440 # HTML browsers generally only use first <body> for colors,
441 # so only push if we haven't seen a body tag yet
4421648527.4ms if ($tag eq "body") {
443 # TODO: skip if we've already seen body
444 }
445
446 # change basefont (only change size)
4471648526.5ms if ($tag eq "basefont" &&
448 exists $attr->{size} && $attr->{size} =~ /^\s*(\d+)/)
449 {
450 $self->{basefont} = $1;
451 return;
452 }
453
454 # close elements with optional end tags
4551648565.2ms78241.01s $self->close_table_tag($tag) if ($tag eq "td" || $tag eq "tr");
# spent 1.01s making 7824 calls to Mail::SpamAssassin::HTML::close_table_tag, avg 129µs/call
456
457 # copy current text state
45832970333ms my %new = %{ $self->{text_style}[-1] };
459
460 # change tag name!
4611648532.3ms $new{tag} = $tag;
462
463 # big and small tags
4641648527.7ms if ($tag eq "big") {
4651332µs $new{size} += 1;
4662686µs push @{ $self->{text_style} }, \%new;
4671398µs return;
468 }
4691647226.7ms if ($tag eq "small") {
4701637µs $new{size} -= 1;
47132108µs push @{ $self->{text_style} }, \%new;
47216141µs return;
473 }
474
475 # tag attributes
47616456127ms for my $name (keys %$attr) {
4772628656.4ms next unless exists $ok_attributes{$tag}{$name};
478544823.3ms if ($name eq "text" || $name eq "color") {
479 # two different names for text color
48085882µs8520.1ms $new{fgcolor} = name_to_rgb($attr->{$name});
# spent 20.1ms making 85 calls to Mail::SpamAssassin::HTML::name_to_rgb, avg 237µs/call
481 }
482 elsif ($name eq "size") {
4831172.63ms198800µs if ($attr->{size} =~ /^\s*([+-]\d+)/) {
# spent 800µs making 198 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
484 # relative font size
48536132µs $new{size} = $self->{basefont} + $1;
486 }
487 elsif ($attr->{size} =~ /^\s*(\d+)/) {
488 # absolute font size
48977322µs $new{size} = $1;
490 }
491 }
492 elsif ($name eq 'style') {
49344269.71ms $new{style} = $attr->{style};
494442627.3ms my @parts = split(/;/, $new{style});
495442626.2ms foreach (@parts) {
49615262595ms28303182ms if (/^\s*(background-)?color:\s*(.+)\s*$/i) {
# spent 182ms making 28303 calls to Mail::SpamAssassin::HTML::CORE:match, avg 6µs/call
49722215.89ms my $whcolor = $1 ? 'bgcolor' : 'fgcolor';
49822216.42ms my $value = lc $2;
499
500222134.5ms22216.05ms if ($value =~ /rgb/) {
# spent 6.05ms making 2221 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
5016111.87ms $value =~ tr/0-9,//cd;
5026112.56ms my @rgb = split(/,/, $value);
503 $new{$whcolor} = sprintf("#%02x%02x%02x",
504244416.8ms map { !$_ ? 0 : $_ > 255 ? 255 : $_ }
505 @rgb[0..2]);
506 }
507 else {
508161015.0ms1610170ms $new{$whcolor} = name_to_rgb($value);
# spent 170ms making 1610 calls to Mail::SpamAssassin::HTML::name_to_rgb, avg 105µs/call
509 }
510 }
511 elsif (/^\s*([a-z_-]+)\s*:\s*(\S.*?)\s*$/i) {
512 # "display: none", "visibility: hidden", etc.
51312646113ms $new{'style_'.$1} = $2;
514 }
515 }
516 }
517 elsif ($name eq "bgcolor") {
518 # overwrite with hex value, $new{bgcolor} is set below
5196307.32ms63075.9ms $attr->{bgcolor} = name_to_rgb($attr->{bgcolor});
# spent 75.9ms making 630 calls to Mail::SpamAssassin::HTML::name_to_rgb, avg 120µs/call
520 }
521 else {
522 # attribute is probably okay
523190483µs $new{$name} = $attr->{$name};
524 }
525
526544834.7ms if ($new{size} > $self->{max_size}) {
52726µs $self->{max_size} = $new{size};
528 }
529 elsif ($new{size} < $self->{min_size}) {
5303394µs $self->{min_size} = $new{size};
531 }
532 }
53332912144ms push @{ $self->{text_style} }, \%new;
534 }
535 # explicitly close a tag
536 else {
5371642355.3ms if ($tag ne "body") {
538 # don't close body since browsers seem to render text after </body>
53916266101ms162661.93s $self->close_tag($tag);
# spent 1.93s making 16266 calls to Mail::SpamAssassin::HTML::close_tag, avg 118µs/call
540 }
541 }
542}
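
The `rgb(...)` branch of the style parser (lines 500 to 505) can be tried in isolation. The sketch below wraps those three statements in a hypothetical helper, `css_rgb_to_hex`; the sample inputs are made up. It also shows a consequence of stripping everything except digits and commas: percentage values are read as plain numbers in this branch.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the rgb() branch of the style parser.
sub css_rgb_to_hex {
    my ($value) = @_;
    $value = lc $value;
    $value =~ tr/0-9,//cd;              # keep only digits and commas
    my @rgb = split /,/, $value;
    # empty or zero components become 0, anything above 255 is clamped
    return sprintf("#%02x%02x%02x",
        map { !$_ ? 0 : $_ > 255 ? 255 : $_ } @rgb[0..2]);
}

my $clamped = css_rgb_to_hex('rgb(300, 0, 12)');
my $percent = css_rgb_to_hex('rgb(100%, 0%, 0%)');
print "$clamped $percent\n";
```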
543
544
# spent 444ms (396+47.8) within Mail::SpamAssassin::HTML::html_font_invisible which was called 4787 times, avg 93µs/call: # 4787 times (396ms+47.8ms) by Mail::SpamAssassin::HTML::html_text at line 746, avg 93µs/call
sub html_font_invisible {
545478710.3ms my ($self, $text) = @_;
546
547478712.3ms my $fg = $self->{text_style}[-1]->{fgcolor};
548478710.3ms my $bg = $self->{text_style}[-1]->{bgcolor};
54947879.92ms my $size = $self->{text_style}[-1]->{size};
55047879.80ms my $display = $self->{text_style}[-1]->{style_display};
55147879.51ms my $visibility = $self->{text_style}[-1]->{style_visibility};
552
553 # invisibility
5544787103ms474029.8ms if (substr($fg,-6) eq substr($bg,-6)) {
# spent 29.8ms making 4740 calls to Mail::SpamAssassin::HTML::CORE:match, avg 6µs/call
55547332µs471.33ms $self->put_results(font_low_contrast => 1);
# spent 1.33ms making 47 calls to Mail::SpamAssassin::HTML::put_results, avg 28µs/call
55647485µs return 1;
557 # near-invisibility
558 } elsif ($fg =~ /^\#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/) {
559473918.9ms my ($r1, $g1, $b1) = (hex($1), hex($2), hex($3));
560
561473997.3ms473916.6ms if ($bg =~ /^\#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/) {
# spent 16.6ms making 4739 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
562473914.7ms my ($r2, $g2, $b2) = (hex($1), hex($2), hex($3));
563
56447399.86ms my $r = ($r1 - $r2);
56547398.18ms my $g = ($g1 - $g2);
56647398.04ms my $b = ($b1 - $b2);
567
568 # geometric distance weighted by brightness
569 # maximum distance is 191.151823601032
570473919.9ms my $distance = ((0.2126*$r)**2 + (0.7152*$g)**2 + (0.0722*$b)**2)**0.5;
571
572 # the text is very difficult to read if the distance is under 12,
573 # a limit of 14 to 16 might be okay if the usage significantly
574 # increases (near-invisible text is at about 0.95% of spam and
575 # 1.25% of HTML spam right now), but please test any changes first
576473931.9ms if ($distance < 12) {
577 $self->put_results(font_low_contrast => 1);
578 return 1;
579 }
580 }
581 }
582
583
584 # invalid color
58547409.65ms if ($fg eq 'invalid' or $bg eq 'invalid') {
58617µs132µs $self->put_results(font_invalid_color => 1);
# spent 32µs making 1 call to Mail::SpamAssassin::HTML::put_results
58717µs return 1;
588 }
589
590 # size too small
59147398.83ms if ($size <= 1) {
59240452µs return 1;
593 }
594
595 # <span style="display: none">
59646998.16ms if ($display && lc $display eq 'none') {
597438µs return 1;
598 }
599
60046957.87ms if ($visibility && lc $visibility eq 'hidden') {
601887µs return 1;
602 }
603
604468773.3ms return 0;
605}
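
To make the low-contrast threshold concrete, the distance formula from line 570 can be evaluated on its own. This is a minimal sketch: `color_distance` is a hypothetical helper, and it assumes lowercase `rrggbb` input with an optional leading `#`, as produced by name_to_rgb.

```perl
use strict;
use warnings;

# Hypothetical helper: brightness-weighted distance between two hex colors,
# or undef if either argument is not of the form [#]rrggbb.
sub color_distance {
    my ($fg, $bg) = @_;
    my $hex = qr/^\#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/;

    my ($r1, $g1, $b1) = $fg =~ $hex or return;
    my ($r2, $g2, $b2) = $bg =~ $hex or return;

    my $dr = hex($r1) - hex($r2);
    my $dg = hex($g1) - hex($g2);
    my $db = hex($b1) - hex($b2);

    # same weights as line 570
    return sqrt((0.2126 * $dr)**2 + (0.7152 * $dg)**2 + (0.0722 * $db)**2);
}

my $max  = color_distance('#000000', '#ffffff');   # largest possible distance
my $near = color_distance('#000000', '#0a0a0a');   # very dark gray on black
my $bad  = color_distance('invalid', '#ffffff');   # not a hex color

printf "max=%.4f near=%.3f low_contrast=%s\n",
    $max, $near, ($near < 12 ? 'yes' : 'no');
```

Black on white reproduces the maximum of about 191.15 quoted in the comment on line 569, while a ten-step gray on black falls under the limit of 12.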
606
607
# spent 734ms (661+73.3) within Mail::SpamAssassin::HTML::html_tests which was called 24570 times, avg 30µs/call: # 24570 times (661ms+73.3ms) by Mail::SpamAssassin::HTML::html_tag at line 296, avg 30µs/call
sub html_tests {
6082457054.2ms my ($self, $tag, $attr, $num) = @_;
609
6102457043.9ms if ($tag eq "font" && exists $attr->{face}) {
6111753.38ms1751.97ms if ($attr->{face} !~ /^[a-z ][a-z -]*[a-z](?:,\s*[a-z][a-z -]*[a-z])*$/i) {
# spent 1.97ms making 175 calls to Mail::SpamAssassin::HTML::CORE:match, avg 11µs/call
612 $self->put_results(font_face_bad => 1);
613 }
614 }
6152457045.9ms if ($tag eq "img" && exists $self->{inside}{a} && $self->{inside}{a} > 0) {
6165041.37ms my $uri = $self->{anchor_last};
6175041.00ms utf8::encode($uri) if $self->{SA_encode_results};
6185042.02ms $self->{uri}->{$uri}->{anchor_text}->[-1] .= "<img>\n";
6195043.22ms $self->{anchor}->[-1] .= "<img>\n";
620 }
621
6222457042.8ms if ($tag eq "img" && exists $attr->{width} && exists $attr->{height}) {
6237761.65ms my $width = 0;
6247761.55ms my $height = 0;
6257761.58ms my $area = 0;
626
627 # assume 800x600 screen for percentage values
62877613.7ms7767.29ms if ($attr->{width} =~ /^(\d+)(\%)?$/) {
# spent 7.29ms making 776 calls to Mail::SpamAssassin::HTML::CORE:match, avg 9µs/call
6297762.03ms $width = $1;
6307761.90ms $width *= 8 if (defined $2 && $2 eq "%");
631 }
6327769.88ms7763.15ms if ($attr->{height} =~ /^(\d+)(\%)?$/) {
# spent 3.15ms making 776 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
6337751.89ms $height = $1;
6347751.71ms $height *= 6 if (defined $2 && $2 eq "%");
635 }
636 # guess size
6377762.04ms $width = 200 if $width <= 0;
6387761.44ms $height = 200 if $height <= 0;
6397766.63ms if ($width > 0 && $height > 0) {
6407761.42ms $area = $width * $height;
6417762.21ms $self->{image_area} += $area;
642 }
643 }
6442457039.8ms if ($tag eq "form" && exists $attr->{action}) {
645 $self->put_results(form_action_mailto => 1) if $attr->{action} =~ /mailto:/i
646 }
6472457045.3ms if ($tag eq "object" || $tag eq "embed") {
648 $self->put_results(embeds => 1);
649 }
650
651 # special text delimiters - <a> and <title>
6522457044.7ms if ($tag eq "a") {
653 my $uri = $self->{anchor_last} =
654159713.8ms155659.1ms (exists $attr->{href} ? $self->canon_uri($attr->{href}) : "");
# spent 59.1ms making 1556 calls to Mail::SpamAssassin::HTML::canon_uri, avg 38µs/call
65515973.00ms utf8::encode($uri) if $self->{SA_encode_results};
656319418.7ms push(@{$self->{uri}->{$uri}->{anchor_text}}, '');
657319418.3ms push(@{$self->{anchor}}, '');
658 }
6592457040.8ms if ($tag eq "title") {
66067163µs $self->{title_index}++;
66167334µs $self->{title}->[$self->{title_index}] = "";
662 }
663
66424570332ms2721.79ms if ($tag eq "meta" &&
# spent 1.79ms making 272 calls to Mail::SpamAssassin::HTML::CORE:match, avg 7µs/call
665 exists $attr->{'http-equiv'} &&
666 exists $attr->{content} &&
667 $attr->{'http-equiv'} =~ /Content-Type/i &&
668 $attr->{content} =~ /\bcharset\s*=\s*["']?([^"']+)/i)
669 {
670123750µs $self->{charsets} .= exists $self->{charsets} ? " $1" : $1;
671 }
672}
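
The image-area estimate on lines 622 to 642 treats percentages as fractions of an assumed 800x600 screen and falls back to 200 for anything it cannot parse. A small sketch of that arithmetic follows; `estimated_image_area` is a hypothetical helper that returns the area instead of adding it to `$self->{image_area}`.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the width/height heuristics of html_tests.
sub estimated_image_area {
    my ($w_attr, $h_attr) = @_;
    my ($width, $height) = (0, 0);

    # assume 800x600 screen for percentage values
    if ($w_attr =~ /^(\d+)(\%)?$/) {
        $width = $1;
        $width *= 8 if defined $2 && $2 eq '%';
    }
    if ($h_attr =~ /^(\d+)(\%)?$/) {
        $height = $1;
        $height *= 6 if defined $2 && $2 eq '%';
    }

    # guess size when the attribute was missing, zero, or unparseable
    $width  = 200 if $width  <= 0;
    $height = 200 if $height <= 0;

    return $width * $height;
}

my $half_width = estimated_image_area('50%', '100');   # 400 x 100
my $fallback   = estimated_image_area('auto', '10');   # 200 x 10
print "$half_width $fallback\n";
```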
673
674
# spent 4.91s (4.54+373ms) within Mail::SpamAssassin::HTML::display_text which was called 46468 times, avg 106µs/call: # 29508 times (2.76s+304ms) by Mail::SpamAssassin::HTML::html_text at line 773, avg 104µs/call # 8915 times (980ms+36.9ms) by Mail::SpamAssassin::HTML::html_whitespace at line 314, avg 114µs/call # 3996 times (404ms+9.16ms) by Mail::SpamAssassin::HTML::html_whitespace at line 317, avg 103µs/call # 3949 times (394ms+9.14ms) by Mail::SpamAssassin::HTML::html_whitespace at line 311, avg 102µs/call # 100 times (8.15ms+13.9ms) by Mail::SpamAssassin::HTML::html_text at line 770, avg 221µs/call
sub display_text {
6754646883.8ms my $self = shift;
6764646890.4ms my $text = shift;
67746468156ms my %display = @_;
678
679 # Unless it's specified to be invisible, then it's not invisible. ;)
68046468159ms if (!exists $display{invisible}) {
6814636894.9ms $display{invisible} = 0;
682 }
683
68446468199ms if ($display{whitespace}) {
685 # trim trailing whitespace from previous element if it was not whitespace
686 # and it was not invisible
68733720194ms if (@{ $self->{text} } &&
688 (!defined $self->{text_whitespace} ||
6891667331.9ms !vec($self->{text_whitespace}, $#{$self->{text}}, 1)) &&
690 (!defined $self->{text_invisible} ||
6911426426.7ms !vec($self->{text_invisible}, $#{$self->{text}}, 1)))
692 {
69314202308ms1420255.2ms $self->{text}->[-1] =~ s/ $//;
# spent 55.2ms making 14202 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 4µs/call
694 }
695 }
696 else {
697 # NBSP: UTF-8: C2 A0, ISO-8859-*: A0
69829608714ms29608261ms $text =~ s/[ \t\n\r\f\x0b]+|\xc2\xa0/ /gs;
# spent 261ms making 29608 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 9µs/call
699 # trim leading whitespace if previous element was whitespace
700 # and current element is not invisible
70159216346ms if (@{ $self->{text} } && !$display{invisible} &&
702 defined $self->{text_whitespace} &&
7032876152.5ms vec($self->{text_whitespace}, $#{$self->{text}}, 1))
704 {
70514215275ms1421556.8ms $text =~ s/^ //;
# spent 56.8ms making 14215 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 4µs/call
706 }
707 }
70892936479ms push @{ $self->{text} }, $text;
709464681.38s while (my ($k, $v) = each %display) {
71063328127ms my $textvar = "text_".$k;
71163704125ms if (!exists $self->{$textvar}) { $self->{$textvar} = ''; }
712126656588ms vec($self->{$textvar}, $#{$self->{text}}, 1) = $v;
713 }
714}
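
display_text keeps its per-fragment flags in bit strings addressed with `vec`, one bit per entry of `$self->{text}`. The sketch below shows the same storage scheme with invented names (`@text`, `%bits`, `record`). Unlike display_text, which only writes the flags it was passed, this version writes every flag on each call to keep the example short.

```perl
use strict;
use warnings;

my @text;
my %bits = (whitespace => '', invisible => '');

# Hypothetical recorder: store a fragment and one bit per flag at its index.
sub record {
    my ($fragment, %display) = @_;
    push @text, $fragment;
    for my $k (keys %bits) {
        vec($bits{$k}, $#text, 1) = $display{$k} ? 1 : 0;
    }
    return;
}

record('Hello');
record(' ', whitespace => 1);
record('hidden', invisible => 1);

my @ws  = map { vec($bits{whitespace}, $_, 1) } 0 .. $#text;
my @inv = map { vec($bits{invisible},  $_, 1) } 0 .. $#text;

print "whitespace: @ws\n";
print "invisible:  @inv\n";
print "bytes used per flag: ", length($bits{whitespace}), "\n";
```

Three fragments fit in a single byte per flag, and reading a bit past the end of the string returns 0, which is why the callers can test `vec(...)` without first checking the string's length.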
715
716
# spent 6.56s (2.28+4.28) within Mail::SpamAssassin::HTML::html_text which was called 29709 times, avg 221µs/call: # 29563 times (2.26s+4.26s) by HTML::Parser::parse at line 260, avg 221µs/call # 146 times (18.4ms+17.4ms) by HTML::Parser::eof at line 261, avg 245µs/call
sub html_text {
7172970962.5ms my ($self, $text) = @_;
7182970957.7ms utf8::encode($text) if $self->{SA_encode_results};
719
720 # text that is not part of body
7212970954.9ms if (exists $self->{inside}{script} && $self->{inside}{script} > 0)
722 {
72328µs push @{ $self->{script} }, $text;
72417µs return;
725 }
7262970859.3ms if (exists $self->{inside}{style} && $self->{inside}{style} > 0) {
727100785µs return;
728 }
729
730 # text that is part of body and also stored separately
7312960863.6ms if (exists $self->{inside}{a} && $self->{inside}{a} > 0) {
732 # this doesn't worry about nested anchors
73316313.92ms my $uri = $self->{anchor_last};
73416312.92ms utf8::encode($uri) if $self->{SA_encode_results};
73516316.32ms $self->{uri}->{$uri}->{anchor_text}->[-1] .= $text;
736163111.9ms $self->{anchor}->[-1] .= $text;
737 }
7382960856.2ms if (exists $self->{inside}{title} && $self->{inside}{title} > 0) {
73927100µs $self->{title}->[$self->{title_index}] .= $text;
740 }
741
7422960851.2ms my $invisible_for_bayes = 0;
743
744 # NBSP: UTF-8: C2 A0, ISO-8859-*: A0
74529608759ms29608405ms if ($text !~ /^(?:[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s) {
# spent 405ms making 29608 calls to Mail::SpamAssassin::HTML::CORE:match, avg 14µs/call
746478734.8ms4787444ms $invisible_for_bayes = $self->html_font_invisible($text);
# spent 444ms making 4787 calls to Mail::SpamAssassin::HTML::html_font_invisible, avg 93µs/call
747 }
748
74929608116ms if (exists $self->{text}->[-1]) {
750 # ideas discarded since they would be easy to evade:
751 # 1. using \w or [A-Za-z] instead of \S or non-punctuation
752 # 2. exempting certain tags
753 # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables...
75429444441ms33070148ms if ($text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s &&
# spent 148ms making 33070 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
755 $self->{text}->[-1] =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s)
756 {
7573897µs $self->{obfuscation}++;
758 }
75929444511ms29444199ms if ($self->{text}->[-1] =~
# spent 199ms making 29444 calls to Mail::SpamAssassin::HTML::CORE:match, avg 7µs/call
760 /\b([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\z/s)
761 {
7623911.27ms my $start = length($1);
7633915.00ms3911.29ms if ($text =~ /^([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\b/s) {
# spent 1.29ms making 391 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
76413109µs $self->{backhair}->{$start . "_" . length($1)}++;
765 }
766 }
767 }
768
76929608424ms if ($invisible_for_bayes) {
770100802µs10022.1ms $self->display_text($text, invisible => 1);
# spent 22.1ms making 100 calls to Mail::SpamAssassin::HTML::display_text, avg 221µs/call
771 }
772 else {
77329508191ms295083.06s $self->display_text($text);
# spent 3.06s making 29508 calls to Mail::SpamAssassin::HTML::display_text, avg 104µs/call
774 }
775}
776
777# note: $text includes <!-- and -->
778
# spent 7.24ms within Mail::SpamAssassin::HTML::html_comment which was called 411 times, avg 18µs/call: # 411 times (7.24ms+0s) by HTML::Parser::parse at line 260, avg 18µs/call
sub html_comment {
779411869µs my ($self, $text) = @_;
780411847µs utf8::encode($text) if $self->{SA_encode_results};
781
7828226.39ms push @{ $self->{comment} }, $text;
783}
784
785
# spent 3.04ms (2.64+396µs) within Mail::SpamAssassin::HTML::html_declaration which was called 67 times, avg 45µs/call: # 67 times (2.64ms+396µs) by HTML::Parser::parse at line 260, avg 45µs/call
sub html_declaration {
78667328µs my ($self, $text) = @_;
78767191µs utf8::encode($text) if $self->{SA_encode_results};
788
789671.77ms67396µs if ($text =~ /^<!doctype/i) {
# spent 396µs making 67 calls to Mail::SpamAssassin::HTML::CORE:match, avg 6µs/call
79067178µs my $tag = "!doctype";
79167329µs $self->{elements}++;
79267179µs $self->{tags}++;
79367320µs $self->{inside}{$tag} = 0;
794 }
795}
796
797###########################################################################
798
7991127µsmy %html_color = (
800 # HTML 4 defined 16 colors
801 aqua => 0x00ffff,
802 black => 0x000000,
803 blue => 0x0000ff,
804 fuchsia => 0xff00ff,
805 gray => 0x808080,
806 green => 0x008000,
807 lime => 0x00ff00,
808 maroon => 0x800000,
809 navy => 0x000080,
810 olive => 0x808000,
811 purple => 0x800080,
812 red => 0xff0000,
813 silver => 0xc0c0c0,
814 teal => 0x008080,
815 white => 0xffffff,
816 yellow => 0xffff00,
817 # colors specified in CSS3 color module
818 aliceblue => 0xf0f8ff,
819 antiquewhite => 0xfaebd7,
820 aqua => 0x00ffff,
821 aquamarine => 0x7fffd4,
822 azure => 0xf0ffff,
823 beige => 0xf5f5dc,
824 bisque => 0xffe4c4,
825 black => 0x000000,
826 blanchedalmond => 0xffebcd,
827 blue => 0x0000ff,
828 blueviolet => 0x8a2be2,
829 brown => 0xa52a2a,
830 burlywood => 0xdeb887,
831 cadetblue => 0x5f9ea0,
832 chartreuse => 0x7fff00,
833 chocolate => 0xd2691e,
834 coral => 0xff7f50,
835 cornflowerblue => 0x6495ed,
836 cornsilk => 0xfff8dc,
837 crimson => 0xdc143c,
838 cyan => 0x00ffff,
839 darkblue => 0x00008b,
840 darkcyan => 0x008b8b,
841 darkgoldenrod => 0xb8860b,
842 darkgray => 0xa9a9a9,
843 darkgreen => 0x006400,
844 darkgrey => 0xa9a9a9,
845 darkkhaki => 0xbdb76b,
846 darkmagenta => 0x8b008b,
847 darkolivegreen => 0x556b2f,
848 darkorange => 0xff8c00,
849 darkorchid => 0x9932cc,
850 darkred => 0x8b0000,
851 darksalmon => 0xe9967a,
852 darkseagreen => 0x8fbc8f,
853 darkslateblue => 0x483d8b,
854 darkslategray => 0x2f4f4f,
855 darkslategrey => 0x2f4f4f,
856 darkturquoise => 0x00ced1,
857 darkviolet => 0x9400d3,
858 deeppink => 0xff1493,
859 deepskyblue => 0x00bfff,
860 dimgray => 0x696969,
861 dimgrey => 0x696969,
862 dodgerblue => 0x1e90ff,
863 firebrick => 0xb22222,
864 floralwhite => 0xfffaf0,
865 forestgreen => 0x228b22,
866 fuchsia => 0xff00ff,
867 gainsboro => 0xdcdcdc,
868 ghostwhite => 0xf8f8ff,
869 gold => 0xffd700,
870 goldenrod => 0xdaa520,
871 gray => 0x808080,
872 green => 0x008000,
873 greenyellow => 0xadff2f,
874 grey => 0x808080,
875 honeydew => 0xf0fff0,
876 hotpink => 0xff69b4,
877 indianred => 0xcd5c5c,
878 indigo => 0x4b0082,
879 ivory => 0xfffff0,
880 khaki => 0xf0e68c,
881 lavender => 0xe6e6fa,
882 lavenderblush => 0xfff0f5,
883 lawngreen => 0x7cfc00,
884 lemonchiffon => 0xfffacd,
885 lightblue => 0xadd8e6,
886 lightcoral => 0xf08080,
887 lightcyan => 0xe0ffff,
888 lightgoldenrodyellow => 0xfafad2,
889 lightgray => 0xd3d3d3,
890 lightgreen => 0x90ee90,
891 lightgrey => 0xd3d3d3,
892 lightpink => 0xffb6c1,
893 lightsalmon => 0xffa07a,
894 lightseagreen => 0x20b2aa,
895 lightskyblue => 0x87cefa,
896 lightslategray => 0x778899,
897 lightslategrey => 0x778899,
898 lightsteelblue => 0xb0c4de,
899 lightyellow => 0xffffe0,
900 lime => 0x00ff00,
901 limegreen => 0x32cd32,
902 linen => 0xfaf0e6,
903 magenta => 0xff00ff,
904 maroon => 0x800000,
905 mediumaquamarine => 0x66cdaa,
906 mediumblue => 0x0000cd,
907 mediumorchid => 0xba55d3,
908 mediumpurple => 0x9370db,
909 mediumseagreen => 0x3cb371,
910 mediumslateblue => 0x7b68ee,
911 mediumspringgreen => 0x00fa9a,
912 mediumturquoise => 0x48d1cc,
913 mediumvioletred => 0xc71585,
914 midnightblue => 0x191970,
915 mintcream => 0xf5fffa,
916 mistyrose => 0xffe4e1,
917 moccasin => 0xffe4b5,
918 navajowhite => 0xffdead,
919 navy => 0x000080,
920 oldlace => 0xfdf5e6,
921 olive => 0x808000,
922 olivedrab => 0x6b8e23,
923 orange => 0xffa500,
924 orangered => 0xff4500,
925 orchid => 0xda70d6,
926 palegoldenrod => 0xeee8aa,
927 palegreen => 0x98fb98,
928 paleturquoise => 0xafeeee,
929 palevioletred => 0xdb7093,
930 papayawhip => 0xffefd5,
931 peachpuff => 0xffdab9,
932 peru => 0xcd853f,
933 pink => 0xffc0cb,
934 plum => 0xdda0dd,
935 powderblue => 0xb0e0e6,
936 purple => 0x800080,
937 red => 0xff0000,
938 rosybrown => 0xbc8f8f,
939 royalblue => 0x4169e1,
940 saddlebrown => 0x8b4513,
941 salmon => 0xfa8072,
942 sandybrown => 0xf4a460,
943 seagreen => 0x2e8b57,
944 seashell => 0xfff5ee,
945 sienna => 0xa0522d,
946 silver => 0xc0c0c0,
947 skyblue => 0x87ceeb,
948 slateblue => 0x6a5acd,
949 slategray => 0x708090,
950 slategrey => 0x708090,
951 snow => 0xfffafa,
952 springgreen => 0x00ff7f,
953 steelblue => 0x4682b4,
954 tan => 0xd2b48c,
955 teal => 0x008080,
956 thistle => 0xd8bfd8,
957 tomato => 0xff6347,
958 turquoise => 0x40e0d0,
959 violet => 0xee82ee,
960 wheat => 0xf5deb3,
961 white => 0xffffff,
962 whitesmoke => 0xf5f5f5,
963 yellow => 0xffff00,
964 yellowgreen => 0x9acd32,
965);
966
967sub name_to_rgb_old {
968 my $color = lc $_[0];
969
970 # note: Mozilla strips leading and trailing whitespace at this point,
971 # but IE does not
972
973 # named colors
974 my $hex = $html_color{$color};
975 if (defined $hex) {
976 return sprintf("#%06x", $hex);
977 }
978
979 # Flex Hex: John Graham-Cumming, http://www.jgc.org/pdf/lisa2004.pdf
980 # strip optional # character
981 $color =~ s/^#//;
982 # pad right-hand-side to a multiple of three
983 $color .= "0" x (3 - (length($color) % 3)) if (length($color) % 3);
984 # split into triplets
985 my $length = length($color) / 3;
986 my @colors = ($color =~ /(.{$length})(.{$length})(.{$length})/);
987 # truncate each color to a DWORD, take MSB, left pad nibbles
988 foreach (@colors) { s/.*(.{8})$/$1/; s/(..).*/$1/; s/^(.)$/0$1/ };
989 # the color
990 $color = join("", @colors);
991 # replace non-hex characters with 0
992 $color =~ tr/0-9a-f/0/c;
993
994 return "#" . $color;
995}
996
997
# spent 266ms (204+61.4) within Mail::SpamAssassin::HTML::name_to_rgb which was called 2325 times, avg 114µs/call: # 1610 times (132ms+37.6ms) by Mail::SpamAssassin::HTML::text_style at line 508, avg 105µs/call # 630 times (54.4ms+21.5ms) by Mail::SpamAssassin::HTML::text_style at line 519, avg 120µs/call # 85 times (17.9ms+2.27ms) by Mail::SpamAssassin::HTML::text_style at line 480, avg 237µs/call
sub name_to_rgb {
99823256.52ms my $color = lc $_[0];
99923254.73ms my $before = $color;
1000
1001 # strip leading and ending whitespace
1002232530.8ms232512.8ms $color =~ s/^\s*//;
# spent 12.8ms making 2325 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 6µs/call
1003232540.1ms232513.6ms $color =~ s/\s*$//;
# spent 13.6ms making 2325 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 6µs/call
1004
1005 # named colors
100623257.73ms my $hex = $html_color{$color};
100723254.37ms if (defined $hex) {
10083024.92ms return sprintf("#%06x", $hex);
1009 }
1010
1011 # IF NOT A NAME, IT SHOULD BE A HEX COLOR, HEX SHORTHAND or rgb values
1012202324.2ms20236.92ms if ($color =~ m/^[#a-f0-9]*$|rgb\([\d%, ]*\)/i) {
# spent 6.92ms making 2023 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
1013
1014 # convert rgb() values to hex so the hex parsing below handles every form
1015
1016 #RGB PERCENTS TO HEX
1017201715.5ms20173.52ms if ($color =~ m/rgb\((\d+)%,\s*(\d+)%,\s*(\d+)%\s*\)/i) {
# spent 3.52ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:match, avg 2µs/call
1018 $color = "#".dec2hex(int($1/100*255)).dec2hex(int($2/100*255)).dec2hex(int($3/100*255));
1019 }
1020
1021 #RGB DEC TO HEX
1022201715.7ms20173.27ms if ($color =~ m/rgb\((\d+),\s*(\d+),\s*(\d+)\s*\)/i) {
# spent 3.27ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:match, avg 2µs/call
1023 $color = "#".dec2hex($1).dec2hex($2).dec2hex($3);
1024 }
1025
1026 #PARSE THE HEX
1027201722.9ms20176.01ms if ($color =~ m/^#/) {
# spent 6.01ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
1028 # strip to hex only
1029201738.2ms201715.1ms $color =~ s/[^a-f0-9]//ig;
# spent 15.1ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 7µs/call
1030
1031 # strip to 6 if greater than 6
103220174.72ms if (length($color) > 6) {
1033 $color=substr($color,0,6);
1034 }
1035
1036 # truncate to 3 digits if length is 4 or 5
103720174.99ms if (length($color) > 3 && length($color) < 6) {
1038 $color=substr($color,0,3);
1039 }
1040
1041 # pad right-hand-side to a multiple of three
104220174.97ms $color .= "0" x (3 - (length($color) % 3)) if (length($color) % 3);
1043
1044 #DUPLICATE SHORTHAND HEX
1045201713.3ms if (length($color) == 3) {
104666804µs66295µs $color =~ m/(.)(.)(.)/;
# spent 295µs making 66 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
104766687µs $color = "$1$1$2$2$3$3";
1048 }
1049
1050 } else {
1051 return "invalid";
1052 }
1053
1054 } else {
1055 #INVALID
1056
1057 #??RETURN BLACK SINCE WE DO NOT KNOW HOW THE MUA / BROWSER WILL PARSE
1058 #$color = "000000";
1059
1060653µs return "invalid";
1061 }
1062
1063 #print "DEBUG: before/after name_to_rgb new version: $before/$color\n";
1064
1065201744.5ms return "#" . $color;
1066}
1067
1068sub dec2hex {
1069 my ($dec) = @_;
1070 my ($pre) = '';
1071
1072 if ($dec < 16) {
1073 $pre = '0';
1074 }
1075
1076 return sprintf("$pre%lx", $dec);
1077}
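
For non-negative integers, dec2hex produces the same string as a zero-padded `%02x` format. The sketch below copies dec2hex and compares it with a hypothetical one-line variant, `dec2hex_padded`. It also shows that neither form clamps values above 255, so an out-of-range `rgb()` component yields three hex digits.

```perl
use strict;
use warnings;

# Copy of dec2hex from the listing above.
sub dec2hex {
    my ($dec) = @_;
    my ($pre) = '';

    if ($dec < 16) {
        $pre = '0';
    }

    return sprintf("$pre%lx", $dec);
}

# Hypothetical shorter form using a zero-padded width.
sub dec2hex_padded {
    my ($dec) = @_;
    return sprintf("%02x", $dec);
}

my $mismatches = grep { dec2hex($_) ne dec2hex_padded($_) } 0 .. 255;
my $too_big    = dec2hex(300);    # not clamped to ff

print "mismatches: $mismatches, dec2hex(300) = $too_big\n";
```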
1078
1079
108021.80ms2533µs
# spent 289µs (46+244) within Mail::SpamAssassin::HTML::BEGIN@1080 which was called: # once (46µs+244µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 1080
use constant URI_STRICT => 0;
# spent 289µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@1080 # spent 244µs making 1 call to constant::import
1081
1082# resolving relative URIs as defined in RFC 2396 (steps from section 5.2)
1083# using draft http://www.gbiv.com/protocols/uri/rev-2002/rfc2396bis.html
1084
# spent 246ms (191+54.9) within Mail::SpamAssassin::HTML::_parse_uri which was called 5410 times, avg 45µs/call: # 2705 times (111ms+39.1ms) by Mail::SpamAssassin::HTML::target_uri at line 1131, avg 56µs/call # 2705 times (79.7ms+15.8ms) by Mail::SpamAssassin::HTML::target_uri at line 1132, avg 35µs/call
sub _parse_uri {
1085541010.1ms my ($u) = @_;
108654109.23ms my %u;
10875410156ms541054.9ms ($u{scheme}, $u{authority}, $u{path}, $u{query}, $u{fragment}) =
# spent 54.9ms making 5410 calls to Mail::SpamAssassin::HTML::CORE:match, avg 10µs/call
1088 $u =~ m|^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
10895410109ms return %u;
1090}
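
The single regular expression in _parse_uri splits a URI reference into its five components. As a quick illustration, the sketch below copies the sub and applies it to one absolute and one relative reference; both sample URIs are invented.

```perl
use strict;
use warnings;

# Copy of _parse_uri from the listing above.
sub _parse_uri {
    my ($u) = @_;
    my %u;
    ($u{scheme}, $u{authority}, $u{path}, $u{query}, $u{fragment}) =
        $u =~ m|^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
    return %u;
}

my %abs = _parse_uri('http://www.example.com/a/b.html?x=1#top');
my %rel = _parse_uri('../c.gif');

for my $part (qw(scheme authority path query fragment)) {
    printf "%-9s abs=%-16s rel=%s\n", $part,
        $abs{$part} // '(undef)', $rel{$part} // '(undef)';
}
```

Components that are absent come back as undef rather than the empty string, which is what lets target_uri distinguish "no query" from "empty query" with `defined`.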
1091
1092
# spent 490ms (400+89.9) within Mail::SpamAssassin::HTML::_remove_dot_segments which was called 2703 times, avg 181µs/call: # 2702 times (400ms+89.9ms) by Mail::SpamAssassin::HTML::target_uri at line 1145, avg 181µs/call # once (44µs+11µs) by Mail::SpamAssassin::HTML::target_uri at line 1151
sub _remove_dot_segments {
109327035.81ms my ($input) = @_;
109427035.32ms my $output = "";
1095
1096270354.3ms27036.42ms $input =~ s@^(?:\.\.?/)@/@;
# spent 6.42ms making 2703 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 2µs/call
1097
109827039.65ms while ($input) {
10998468369ms2540483.4ms if ($input =~ s@^/\.(?:$|/)@/@) {
# spent 83.4ms making 25404 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 3µs/call
1100 }
1101 elsif ($input =~ s@^/\.\.(?:$|/)@/@) {
1102 $output =~ s@/?[^/]*$@@;
1103 }
1104 elsif ($input =~ s@(/?[^/]*)@@) {
1105846829.4ms $output .= $1;
1106 }
1107 }
1108270342.7ms return $output;
1109}
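
The loop in _remove_dot_segments consumes one path segment per iteration, which is why the profile shows roughly nine substitution calls for each invocation. A worked example, using a copy of the sub and made-up paths, is sketched below. One behavior worth knowing follows from the `while ($input)` test: a path consisting of the single character `0` is false in Perl, so it comes back as the empty string.

```perl
use strict;
use warnings;

# Copy of _remove_dot_segments from the listing above.
sub _remove_dot_segments {
    my ($input) = @_;
    my $output = "";

    $input =~ s@^(?:\.\.?/)@/@;

    while ($input) {
        if ($input =~ s@^/\.(?:$|/)@/@) {
        }
        elsif ($input =~ s@^/\.\.(?:$|/)@/@) {
            $output =~ s@/?[^/]*$@@;
        }
        elsif ($input =~ s@(/?[^/]*)@@) {
            $output .= $1;
        }
    }
    return $output;
}

my $cleaned = _remove_dot_segments('/a/b/../c/./d.html');
my $leading = _remove_dot_segments('../x');
my $zero    = _remove_dot_segments('0');

print "cleaned=$cleaned leading=$leading zero='$zero'\n";
```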
1110
1111sub _merge_uri {
1112 my ($base_authority, $base_path, $r_path) = @_;
1113
1114 if (defined $base_authority && !$base_path) {
1115 return "/" . $r_path;
1116 }
1117 else {
1118 if ($base_path =~ m|/|) {
1119 $base_path =~ s|(?<=/)[^/]*$||;
1120 }
1121 else {
1122 $base_path = "";
1123 }
1124 return $base_path . $r_path;
1125 }
1126}
1127
1128
# spent 1.05s (315ms+736ms) within Mail::SpamAssassin::HTML::target_uri which was called 2705 times, avg 389µs/call: # 2705 times (315ms+736ms) by Mail::SpamAssassin::HTML::push_uri at line 329, avg 389µs/call
sub target_uri {
112927055.69ms my ($base, $r) = @_;
1130
1131270533.3ms2705150ms my %r = _parse_uri($r); # parsed relative URI
# spent 150ms making 2705 calls to Mail::SpamAssassin::HTML::_parse_uri, avg 56µs/call
1132270527.9ms270595.5ms my %base = _parse_uri($base); # parsed base URI
# spent 95.5ms making 2705 calls to Mail::SpamAssassin::HTML::_parse_uri, avg 35µs/call
113327054.41ms my %t; # generated temporary URI
1134
113527056.73ms if ((not URI_STRICT) and
1136 (defined $r{scheme} && defined $base{scheme}) and
1137 ($r{scheme} eq $base{scheme}))
1138 {
1139 undef $r{scheme};
1140 }
1141
114227059.57ms if (defined $r{scheme}) {
114327028.01ms $t{scheme} = $r{scheme};
114427027.44ms $t{authority} = $r{authority};
1145270221.9ms2702490ms $t{path} = _remove_dot_segments($r{path});
# spent 490ms making 2702 calls to Mail::SpamAssassin::HTML::_remove_dot_segments, avg 181µs/call
1146270218.7ms $t{query} = $r{query};
1147 }
1148 else {
1149313µs if (defined $r{authority}) {
115013µs $t{authority} = $r{authority};
115117µs156µs $t{path} = _remove_dot_segments($r{path});
# spent 56µs making 1 call to Mail::SpamAssassin::HTML::_remove_dot_segments
115213µs $t{query} = $r{query};
1153 }
1154 else {
1155217µs if ($r{path} eq "") {
115627µs $t{path} = $base{path};
115728µs if (defined $r{query}) {
1158 $t{query} = $r{query};
1159 }
1160 else {
116125µs $t{query} = $base{query};
1162 }
1163 }
1164 else {
1165 if ($r{path} =~ m|^/|) {
1166 $t{path} = _remove_dot_segments($r{path});
1167 }
1168 else {
1169 $t{path} = _merge_uri($base{authority}, $base{path}, $r{path});
1170 $t{path} = _remove_dot_segments($t{path});
1171 }
1172 $t{query} = $r{query};
1173 }
117426µs $t{authority} = $base{authority};
1175 }
117638µs $t{scheme} = $base{scheme};
1177 }
117827057.16ms $t{fragment} = $r{fragment};
1179
1180 # recompose URI
118127055.06ms my $result = "";
118227059.24ms if ($t{scheme}) {
118327028.38ms $result .= $t{scheme} . ":";
1184 }
1185 elsif (defined $t{authority}) {
1186 # this block is not part of the RFC
1187 # TODO: figure out what MUAs actually do with unschemed URIs
1188 # maybe look at URI::Heuristic
1189120µs26µs if ($t{authority} =~ /^www\d*\./i) {
# spent 6µs making 2 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
1190 # some spammers are using unschemed URIs to escape filters
1191 $result .= "http:";
1192 }
1193 elsif ($t{authority} =~ /^ftp\d*\./i) {
1194 $result .= "ftp:";
1195 }
1196 }
119727059.39ms if ($t{authority}) {
119826547.28ms $result .= "//" . $t{authority};
1199 }
120027058.72ms $result .= $t{path};
120127055.81ms if (defined $t{query}) {
12023281.53ms $result .= "?" . $t{query};
1203 }
120427055.11ms if (defined $t{fragment}) {
1205841µs $result .= "#" . $t{fragment};
1206 }
1207270552.3ms return $result;
1208}
1209
12101119µs1;
1211__END__
 
# spent 1.34s within Mail::SpamAssassin::HTML::CORE:match which was called 231886 times, avg 6µs/call: # 46892 times (97.9ms+0s) by Mail::SpamAssassin::HTML::html_tag at line 270, avg 2µs/call # 33070 times (148ms+0s) by Mail::SpamAssassin::HTML::html_text at line 754, avg 4µs/call # 29608 times (405ms+0s) by Mail::SpamAssassin::HTML::html_text at line 745, avg 14µs/call # 29444 times (199ms+0s) by Mail::SpamAssassin::HTML::html_text at line 759, avg 7µs/call # 28303 times (182ms+0s) by Mail::SpamAssassin::HTML::text_style at line 496, avg 6µs/call # 16907 times (78.4ms+0s) by Mail::SpamAssassin::HTML::html_whitespace at line 310, avg 5µs/call # 16542 times (70.8ms+0s) by Mail::SpamAssassin::HTML::html_uri at line 355, avg 4µs/call # 5410 times (54.9ms+0s) by Mail::SpamAssassin::HTML::_parse_uri at line 1087, avg 10µs/call # 4740 times (29.8ms+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 554, avg 6µs/call # 4739 times (16.6ms+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 561, avg 4µs/call # 3213 times (12.8ms+0s) by Mail::SpamAssassin::HTML::close_table_tag at line 409, avg 4µs/call # 2221 times (6.05ms+0s) by Mail::SpamAssassin::HTML::text_style at line 500, avg 3µs/call # 2023 times (6.92ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1012, avg 3µs/call # 2017 times (6.01ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1027, avg 3µs/call # 2017 times (3.52ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1017, avg 2µs/call # 2017 times (3.27ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1022, avg 2µs/call # 776 times (7.29ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 628, avg 9µs/call # 776 times (3.15ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 632, avg 4µs/call # 391 times (1.29ms+0s) by Mail::SpamAssassin::HTML::html_text at line 763, avg 3µs/call # 272 times (1.79ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 664, avg 7µs/call # 198 times (800µs+0s) by Mail::SpamAssassin::HTML::text_style at line 483, avg 4µs/call # 175 times (1.97ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 611, avg 11µs/call # 67 times (396µs+0s) by Mail::SpamAssassin::HTML::html_declaration at line 789, avg 6µs/call # 66 times (295µs+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1046, avg 4µs/call # 2 times (6µs+0s) by Mail::SpamAssassin::HTML::target_uri at line 1189, avg 3µs/call
sub Mail::SpamAssassin::HTML::CORE:match; # opcode
# spent 576ms within Mail::SpamAssassin::HTML::CORE:subst which was called 101699 times, avg 6µs/call: # 29608 times (261ms+0s) by Mail::SpamAssassin::HTML::display_text at line 698, avg 9µs/call # 25404 times (83.4ms+0s) by Mail::SpamAssassin::HTML::_remove_dot_segments at line 1099, avg 3µs/call # 14215 times (56.8ms+0s) by Mail::SpamAssassin::HTML::display_text at line 705, avg 4µs/call # 14202 times (55.2ms+0s) by Mail::SpamAssassin::HTML::display_text at line 693, avg 4µs/call # 4261 times (24.9ms+0s) by Mail::SpamAssassin::HTML::canon_uri at line 339, avg 6µs/call # 4261 times (22.3ms+0s) by Mail::SpamAssassin::HTML::canon_uri at line 340, avg 5µs/call # 2703 times (6.42ms+0s) by Mail::SpamAssassin::HTML::_remove_dot_segments at line 1096, avg 2µs/call # 2325 times (13.6ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1003, avg 6µs/call # 2325 times (12.8ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1002, avg 6µs/call # 2017 times (15.1ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1029, avg 7µs/call # 189 times (15.0ms+0s) by Mail::SpamAssassin::HTML::parse at line 244, avg 79µs/call # 189 times (9.24ms+0s) by Mail::SpamAssassin::HTML::parse at line 248, avg 49µs/call
sub Mail::SpamAssassin::HTML::CORE:subst; # opcode
# spent 28.3ms within Mail::SpamAssassin::HTML::CORE:substcont which was called 1404 times, avg 20µs/call: # 1404 times (28.3ms+0s) by Mail::SpamAssassin::HTML::parse at line 248, avg 20µs/call
sub Mail::SpamAssassin::HTML::CORE:substcont; # opcode