NYTProf Performance Profile
For /usr/local/bin/sa-learn
  Run on Sun Nov 5 02:36:06 2017
Reported on Sun Nov 5 02:56:19 2017

Filename: /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/HTML.pm
Statements: Executed 3803434 statements in 22.1s
Subroutines
 Calls   P  F  Exclusive  Inclusive  Subroutine
                    Time       Time
 46468   5  1      4.14s      4.50s  Mail::SpamAssassin::HTML::display_text
 46892   1  1      3.58s      13.6s  Mail::SpamAssassin::HTML::html_tag
 29709   2  1      2.25s      6.09s  Mail::SpamAssassin::HTML::html_text
 32908   1  1      2.11s      5.35s  Mail::SpamAssassin::HTML::text_style
 16266   1  1      1.77s      1.77s  Mail::SpamAssassin::HTML::close_tag
231886  25  1      1.16s      1.16s  Mail::SpamAssassin::HTML::CORE:match (opcode)
  7824   1  1      963ms      975ms  Mail::SpamAssassin::HTML::close_table_tag
 24570   1  1      618ms      686ms  Mail::SpamAssassin::HTML::html_tests
 16860   1  1      569ms      2.31s  Mail::SpamAssassin::HTML::html_whitespace
101699  12  1      562ms      562ms  Mail::SpamAssassin::HTML::CORE:subst (opcode)
   567   3  1      377ms      377ms  Mail::SpamAssassin::HTML::get_rendered_text
  2703   2  1      368ms      459ms  Mail::SpamAssassin::HTML::_remove_dot_segments
 12673   1  1      367ms      1.59s  Mail::SpamAssassin::HTML::html_uri
  4787   1  1      356ms      403ms  Mail::SpamAssassin::HTML::html_font_invisible
  2705   1  1      276ms      926ms  Mail::SpamAssassin::HTML::target_uri
  2325   3  1      233ms      294ms  Mail::SpamAssassin::HTML::name_to_rgb
  5410   2  1      145ms      191ms  Mail::SpamAssassin::HTML::_parse_uri
  4261   2  1      131ms      169ms  Mail::SpamAssassin::HTML::canon_uri
  2705   3  1      122ms      1.16s  Mail::SpamAssassin::HTML::push_uri
  3096  20  1     78.3ms     78.3ms  Mail::SpamAssassin::HTML::put_results
   189   1  1     73.9ms      146ms  Mail::SpamAssassin::HTML::html_end
   189   1  1     46.4ms      21.7s  Mail::SpamAssassin::HTML::parse
  1404   1  1     28.5ms     28.5ms  Mail::SpamAssassin::HTML::CORE:substcont (opcode)
     1   1  1     23.2ms     25.5ms  Mail::SpamAssassin::HTML::BEGIN@30
   189   1  1     8.73ms     65.6ms  Mail::SpamAssassin::HTML::new
   189   1  1     7.49ms     12.3ms  Mail::SpamAssassin::HTML::html_start
   411   1  1     5.83ms     5.83ms  Mail::SpamAssassin::HTML::html_comment
    67   1  1     2.53ms     2.88ms  Mail::SpamAssassin::HTML::html_declaration
   189   1  1     1.53ms     1.53ms  Mail::SpamAssassin::HTML::get_results
     1   1  1       47µs       60µs  Mail::SpamAssassin::HTML::BEGIN@23
     1   1  1       38µs      257µs  Mail::SpamAssassin::HTML::BEGIN@1080
     1   1  1       32µs      225µs  Mail::SpamAssassin::HTML::BEGIN@31
     1   1  1       31µs       63µs  Mail::SpamAssassin::HTML::BEGIN@24
     1   1  1       27µs      170µs  Mail::SpamAssassin::HTML::BEGIN@33
     1   1  1       22µs      732µs  Mail::SpamAssassin::HTML::BEGIN@32
     1   1  1       20µs       96µs  Mail::SpamAssassin::HTML::BEGIN@25
     0   0  0         0s         0s  Mail::SpamAssassin::HTML::_merge_uri
     0   0  0         0s         0s  Mail::SpamAssassin::HTML::dec2hex
     0   0  0         0s         0s  Mail::SpamAssassin::HTML::name_to_rgb_old
Line | Statements | Time on line | Calls | Time in subs | Code
1# <@LICENSE>
2# Licensed to the Apache Software Foundation (ASF) under one or more
3# contributor license agreements. See the NOTICE file distributed with
4# this work for additional information regarding copyright ownership.
5# The ASF licenses this file to you under the Apache License, Version 2.0
6# (the "License"); you may not use this file except in compliance with
7# the License. You may obtain a copy of the License at:
8#
9# http://www.apache.org/licenses/LICENSE-2.0
10#
11# Unless required by applicable law or agreed to in writing, software
12# distributed under the License is distributed on an "AS IS" BASIS,
13# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14# See the License for the specific language governing permissions and
15# limitations under the License.
16# </@LICENSE>
17
18# HTML decoding TODOs
19# - add URIs to list for faster URI testing
20
21package Mail::SpamAssassin::HTML;
22
23272µs273µs
# spent 60µs (47+13) within Mail::SpamAssassin::HTML::BEGIN@23 which was called: # once (47µs+13µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 23
use strict;
# spent 60µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@23 # spent 13µs making 1 call to strict::import
24272µs294µs
# spent 63µs (31+32) within Mail::SpamAssassin::HTML::BEGIN@24 which was called: # once (31µs+32µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 24
use warnings;
# spent 63µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@24 # spent 32µs making 1 call to warnings::import
25293µs2171µs
# spent 96µs (20+75) within Mail::SpamAssassin::HTML::BEGIN@25 which was called: # once (20µs+75µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 25
use re 'taint';
# spent 96µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@25 # spent 75µs making 1 call to re::import
26
27130µsrequire 5.008; # need basic Unicode support for HTML::Parser::utf8_mode
28# require 5.008008; # Bug 3787; [perl #37950]: Malformed UTF-8 character ...
29
303371µs225.5ms
# spent 25.5ms (23.2+2.28) within Mail::SpamAssassin::HTML::BEGIN@30 which was called: # once (23.2ms+2.28ms) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 30
use HTML::Parser 3.43 ();
# spent 25.5ms making 1 call to Mail::SpamAssassin::HTML::BEGIN@30 # spent 18µs making 1 call to version::_VERSION
31281µs2418µs
# spent 225µs (32+193) within Mail::SpamAssassin::HTML::BEGIN@31 which was called: # once (32µs+193µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 31
use Mail::SpamAssassin::Logger;
# spent 225µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@31 # spent 193µs making 1 call to Exporter::import
32265µs21.44ms
# spent 732µs (22+710) within Mail::SpamAssassin::HTML::BEGIN@32 which was called: # once (22µs+710µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 32
use Mail::SpamAssassin::Constants qw(:sa);
# spent 732µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@32 # spent 710µs making 1 call to Exporter::import
33212.4ms2314µs
# spent 170µs (27+144) within Mail::SpamAssassin::HTML::BEGIN@33 which was called: # once (27µs+144µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 33
use Mail::SpamAssassin::Util qw(untaint_var);
# spent 170µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@33 # spent 144µs making 1 call to Exporter::import
34
35122µsour @ISA = qw(HTML::Parser);
36
37# elements defined by the HTML 4.01 and XHTML 1.0 DTDs (do not change them!)
38# does not include XML
3995618µsmy %elements = map {; $_ => 1 }
40 # strict
41 qw( a abbr acronym address area b base bdo big blockquote body br button caption cite code col colgroup dd del dfn div dl dt em fieldset form h1 h2 h3 h4 h5 h6 head hr html i img input ins kbd label legend li link map meta noscript object ol optgroup option p param pre q samp script select small span strong style sub sup table tbody td textarea tfoot th thead title tr tt ul var ),
42 # loose
43 qw( applet basefont center dir font frame frameset iframe isindex menu noframes s strike u ),
44 # non-standard tags
45 qw( nobr x-sigsep x-tab ),
46;
47
48# elements that we want to render, but not count as valid
49631µsmy %tricks = map {; $_ => 1 }
50 # non-standard and non-valid tags
51 qw( bgsound embed listing plaintext xmp ),
52 # other non-standard tags handled in popfile
53 # blink ilayer multicol noembed nolayer spacer wbr
54;
55
56# elements that change text style
571480µsmy %elements_text_style = map {; $_ => 1 }
58 qw( body font table tr th td big small basefont marquee span p div ),
59;
60
61# elements that insert whitespace
6223123µsmy %elements_whitespace = map {; $_ => 1 }
63 qw( br div li th td dt dd p hr blockquote pre embed listing plaintext xmp title
64 h1 h2 h3 h4 h5 h6 ),
65;
66
67# elements that push URIs
6816103µsmy %elements_uri = map {; $_ => 1 }
69 qw( body table tr td a area link img frame iframe embed script form base bgsound ),
70;
71
72# style attribute not accepted
73#my %elements_no_style = map {; $_ => 1 }
74# qw( base basefont head html meta param script style title ),
75#;
76
77# permitted element attributes
7812µsmy %ok_attributes;
79112µs$ok_attributes{basefont}{$_} = 1 for qw( color face size );
80117µs$ok_attributes{body}{$_} = 1 for qw( text bgcolor link alink vlink background );
81110µs$ok_attributes{font}{$_} = 1 for qw( color face size );
8217µs$ok_attributes{marquee}{$_} = 1 for qw( bgcolor background );
8315µs$ok_attributes{table}{$_} = 1 for qw( bgcolor );
84110µs$ok_attributes{td}{$_} = 1 for qw( bgcolor );
85114µs$ok_attributes{th}{$_} = 1 for qw( bgcolor );
86111µs$ok_attributes{tr}{$_} = 1 for qw( bgcolor );
87113µs$ok_attributes{span}{$_} = 1 for qw( style );
8815µs$ok_attributes{p}{$_} = 1 for qw( style );
8916µs$ok_attributes{div}{$_} = 1 for qw( style );
90
91
# spent 65.6ms (8.73+56.9) within Mail::SpamAssassin::HTML::new which was called 189 times, avg 347µs/call: # 189 times (8.73ms+56.9ms) by Mail::SpamAssassin::Message::Node::rendered at line 635 of Mail/SpamAssassin/Message/Node.pm, avg 347µs/call
sub new {
92189547µs my ($class, $character_semantics_input, $character_semantics_output) = @_;
931895.84ms18956.9ms my $self = $class->SUPER::new(
# spent 56.9ms making 189 calls to HTML::Parser::new, avg 301µs/call
94 api_version => 3,
95 handlers => [
96 start_document => ["html_start", "self"],
97 start => ["html_tag", "self,tagname,attr,'+1'"],
98 end_document => ["html_end", "self"],
99 end => ["html_tag", "self,tagname,attr,'-1'"],
100 text => ["html_text", "self,dtext"],
101 comment => ["html_comment", "self,text"],
102 declaration => ["html_declaration", "self,text"],
103 ],
104 marked_sections => 1);
105189604µs $self->{SA_character_semantics_input} = $character_semantics_input;
106 $self->{SA_encode_results} =
107189619µs $character_semantics_input && !$character_semantics_output;
1081891.39ms $self;
109}
110
111
# spent 12.3ms (7.49+4.82) within Mail::SpamAssassin::HTML::html_start which was called 189 times, avg 65µs/call: # 189 times (7.49ms+4.82ms) by HTML::Parser::parse at line 260, avg 65µs/call
sub html_start {
112189452µs my ($self) = @_;
113
114 # trigger HTML_MESSAGE
1151891.65ms1894.82ms $self->put_results(html => 1);
# spent 4.82ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 25µs/call
116
117 # initial display attributes
118189654µs $self->{basefont} = 3;
119 my %default = (tag => "default",
120 fgcolor => "#000000",
121 bgcolor => "#ffffff",
1221891.77ms size => $self->{basefont});
1233782.84ms push @{ $self->{text_style} }, \%default;
124}
125
126
# spent 146ms (73.9+72.3) within Mail::SpamAssassin::HTML::html_end which was called 189 times, avg 774µs/call: # 189 times (73.9ms+72.3ms) by HTML::Parser::eof at line 261, avg 774µs/call
sub html_end {
127189446µs my ($self) = @_;
128
1291891.29ms delete $self->{text_style};
130
131189372µs my @uri;
132
133 # add the canonicalized version of each uri to the detail list
134189833µs if (defined $self->{uri}) {
1352361.98ms @uri = keys %{$self->{uri}};
136 }
137
138 # these keep backward compatibility, albeit a little wasteful
1391891.64ms1894.59ms $self->put_results(uri => \@uri);
# spent 4.59ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 24µs/call
1401891.61ms1894.78ms $self->put_results(anchor => $self->{anchor});
# spent 4.78ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 25µs/call
141
1421891.46ms1895.05ms $self->put_results(uri_detail => $self->{uri});
# spent 5.05ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
1431891.78ms1895.10ms $self->put_results(uri_truncated => $self->{uri_truncated});
# spent 5.10ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
144
145 # final results scalars
1461891.38ms1894.16ms $self->put_results(image_area => $self->{image_area});
# spent 4.16ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 22µs/call
1471891.39ms1894.34ms $self->put_results(length => $self->{length});
# spent 4.34ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 23µs/call
1481891.35ms1894.50ms $self->put_results(min_size => $self->{min_size});
# spent 4.50ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 24µs/call
1491891.40ms1893.94ms $self->put_results(max_size => $self->{max_size});
# spent 3.94ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 21µs/call
1501891.51ms if (exists $self->{tags}) {
151 $self->put_results(closed_extra_ratio =>
1521871.75ms1874.86ms ($self->{closed_extra} / $self->{tags}));
# spent 4.86ms making 187 calls to Mail::SpamAssassin::HTML::put_results, avg 26µs/call
153 }
154
155 # final result arrays
1561891.70ms1894.92ms $self->put_results(comment => $self->{comment});
# spent 4.92ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 26µs/call
1571891.70ms1895.12ms $self->put_results(script => $self->{script});
# spent 5.12ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
1581891.54ms1894.91ms $self->put_results(title => $self->{title});
# spent 4.91ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 26µs/call
159
160 # final result hashes
1611891.32ms1895.02ms $self->put_results(inside => $self->{inside});
# spent 5.02ms making 189 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
162
163 # end-of-document result values that don't require looking at the text
164189507µs if (exists $self->{backhair}) {
16522136µs11311µs $self->put_results(backhair_count => scalar keys %{ $self->{backhair} });
# spent 311µs making 11 calls to Mail::SpamAssassin::HTML::put_results, avg 28µs/call
166 }
167189909µs if (exists $self->{elements} && exists $self->{tags}) {
168 $self->put_results(bad_tag_ratio =>
1691871.76ms1875.06ms ($self->{tags} - $self->{elements}) / $self->{tags});
# spent 5.06ms making 187 calls to Mail::SpamAssassin::HTML::put_results, avg 27µs/call
170 }
1711891.01ms if (exists $self->{elements_seen} && exists $self->{tags_seen}) {
172 $self->put_results(non_element_ratio =>
173 ($self->{tags_seen} - $self->{elements_seen}) /
1741871.69ms1875.22ms $self->{tags_seen});
# spent 5.22ms making 187 calls to Mail::SpamAssassin::HTML::put_results, avg 28µs/call
175 }
1761891.77ms if (exists $self->{tags} && exists $self->{obfuscation}) {
177 $self->put_results(obfuscation_ratio =>
17819144µs19470µs $self->{obfuscation} / $self->{tags});
# spent 470µs making 19 calls to Mail::SpamAssassin::HTML::put_results, avg 25µs/call
179 }
180}
181
182
# spent 78.3ms within Mail::SpamAssassin::HTML::put_results which was called 3096 times, avg 25µs/call: # 189 times (5.12ms+0s) by Mail::SpamAssassin::HTML::html_end at line 157, avg 27µs/call # 189 times (5.10ms+0s) by Mail::SpamAssassin::HTML::html_end at line 143, avg 27µs/call # 189 times (5.05ms+0s) by Mail::SpamAssassin::HTML::html_end at line 142, avg 27µs/call # 189 times (5.02ms+0s) by Mail::SpamAssassin::HTML::html_end at line 161, avg 27µs/call # 189 times (4.92ms+0s) by Mail::SpamAssassin::HTML::html_end at line 156, avg 26µs/call # 189 times (4.91ms+0s) by Mail::SpamAssassin::HTML::html_end at line 158, avg 26µs/call # 189 times (4.82ms+0s) by Mail::SpamAssassin::HTML::html_start at line 115, avg 25µs/call # 189 times (4.78ms+0s) by Mail::SpamAssassin::HTML::html_end at line 140, avg 25µs/call # 189 times (4.59ms+0s) by Mail::SpamAssassin::HTML::html_end at line 139, avg 24µs/call # 189 times (4.50ms+0s) by Mail::SpamAssassin::HTML::html_end at line 148, avg 24µs/call # 189 times (4.34ms+0s) by Mail::SpamAssassin::HTML::html_end at line 147, avg 23µs/call # 189 times (4.16ms+0s) by Mail::SpamAssassin::HTML::html_end at line 146, avg 22µs/call # 189 times (3.94ms+0s) by Mail::SpamAssassin::HTML::html_end at line 149, avg 21µs/call # 187 times (5.22ms+0s) by Mail::SpamAssassin::HTML::html_end at line 174, avg 28µs/call # 187 times (5.06ms+0s) by Mail::SpamAssassin::HTML::html_end at line 169, avg 27µs/call # 187 times (4.86ms+0s) by Mail::SpamAssassin::HTML::html_end at line 152, avg 26µs/call # 47 times (1.16ms+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 555, avg 25µs/call # 19 times (470µs+0s) by Mail::SpamAssassin::HTML::html_end at line 178, avg 25µs/call # 11 times (311µs+0s) by Mail::SpamAssassin::HTML::html_end at line 165, avg 28µs/call # once (28µs+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 586
sub put_results {
18330965.33ms my $self = shift;
184309620.7ms my %results = @_;
185
186309680.9ms while (my ($k, $v) = each %results) {
187309612.0ms $self->{results}{$k} = $v;
188 }
189}
190
191
# spent 1.53ms within Mail::SpamAssassin::HTML::get_results which was called 189 times, avg 8µs/call: # 189 times (1.53ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 643 of Mail/SpamAssassin/Message/Node.pm, avg 8µs/call
sub get_results {
192189425µs my ($self) = @_;
193
1941891.40ms return $self->{results};
195}
196
197
# spent 377ms within Mail::SpamAssassin::HTML::get_rendered_text which was called 567 times, avg 665µs/call: # 189 times (203ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 641 of Mail/SpamAssassin/Message/Node.pm, avg 1.07ms/call # 189 times (163ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 642 of Mail/SpamAssassin/Message/Node.pm, avg 865µs/call # 189 times (10.5ms+0s) by Mail::SpamAssassin::Message::Node::rendered at line 640 of Mail/SpamAssassin/Message/Node.pm, avg 56µs/call
sub get_rendered_text {
1985671.05ms my $self = shift;
1995672.74ms my %options = @_;
200
20175610.7ms return join('', @{ $self->{text} }) unless %options;
202
203378674µs my $mask;
2043784.14ms while (my ($k, $v) = each %options) {
2053781.30ms next if !defined $self->{"text_$k"};
2063781.37ms if (!defined $mask) {
2073783.65ms $mask |= $v ? $self->{"text_$k"} : ~ $self->{"text_$k"};
208 }
209 else {
210 $mask &= $v ? $self->{"text_$k"} : ~ $self->{"text_$k"};
211 }
212 }
213
214378905µs my $text = '';
215378692µs my $i = 0;
21693692347ms for (@{ $self->{text} }) { $text .= $_ if vec($mask, $i++, 1); }
2173785.00ms return $text;
218}
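
The mask handling in get_rendered_text (lines 203 to 216) can be tried in isolation. The following is a minimal sketch and not SpamAssassin code: @text, %flags and select_text are hypothetical names, and the way the per-attribute bit vectors get filled in is an assumption, since that code is not part of this listing.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $self->{text} and the $self->{"text_$k"} vectors.
my @text  = ("Hello", " ", "hidden", "world");
my %flags = (invisible => '', whitespace => '');
vec($flags{invisible},  2, 1) = 1;   # fragment 2 is marked invisible
vec($flags{whitespace}, 1, 1) = 1;   # fragment 1 is marked whitespace

# Combine the requested attributes into one bit mask, then keep only the
# fragments whose bit is set, mirroring the loop at line 216.
sub select_text {
  my (%options) = @_;
  my $mask;
  for my $k (keys %options) {
    next if !defined $flags{$k};
    my $bits = $options{$k} ? $flags{$k} : ~ $flags{$k};
    $mask = defined $mask ? ($mask & $bits) : $bits;
  }
  return join('', @text) if !defined $mask;
  my ($out, $i) = ('', 0);
  for my $piece (@text) {
    $out .= $piece if vec($mask, $i, 1);
    $i++;
  }
  return $out;
}

print select_text(invisible => 0), "\n";                  # visible fragments only
print select_text(invisible => 0, whitespace => 0), "\n"; # visible, non-whitespace
```

The string complement only covers the bytes already present in a vector, so this toy keeps all fragment indexes within a single byte.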
219
220
# spent 21.7s (46.4ms+21.6) within Mail::SpamAssassin::HTML::parse which was called 189 times, avg 115ms/call: # 189 times (46.4ms+21.6s) by Mail::SpamAssassin::Message::Node::rendered at line 637 of Mail/SpamAssassin/Message/Node.pm, avg 115ms/call
sub parse {
221189988µs my ($self, $text) = @_;
222
223189664µs $self->{image_area} = 0;
224189523µs $self->{title_index} = -1;
225189540µs $self->{max_size} = 3; # start at default size
226189588µs $self->{min_size} = 3; # start at default size
227189708µs $self->{closed_html} = 0;
228189472µs $self->{closed_body} = 0;
229189533µs $self->{closed_extra} = 0;
230189600µs $self->{text} = []; # rendered text
2311893.42ms1896.71ms $self->{length} += untaint_var(length($text));
# spent 6.71ms making 189 calls to Mail::SpamAssassin::Util::untaint_var, avg 36µs/call
232
233 # NOTE: We *only* need to fix the rendering when we verify that it
234 # differs from what people see in their MUA. Testing is best done with
235 # the most common MUAs and browsers, if you catch my drift.
236
237 # NOTE: HTML::Parser can cope with: <?xml pis>, <? with space>, so we
238 # don't need to fix them here.
239
240 # # (outdated claim) HTML::Parser converts &nbsp; into a question mark ("?")
241 # # for some reason, so convert them to spaces. Confirmed in 3.31, at least.
242 # ... Actually it doesn't, it is correctly converted into Unicode NBSP,
243 # nevertheless it does not hurt to treat it as a space.
24418926.7ms18914.8ms $text =~ s/&nbsp;/ /g;
# spent 14.8ms making 189 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 78µs/call
245
246 # bug 4695: we want "<br/>" to be treated the same as "<br>", and
247 # the HTML::Parser API won't do it for us
24818948.5ms159337.5ms $text =~ s/<(\w+)\s*\/>/<$1>/gi;
# spent 28.5ms making 1404 calls to Mail::SpamAssassin::HTML::CORE:substcont, avg 20µs/call # spent 9.08ms making 189 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 48µs/call
249
2501894.00ms189915µs if (!$self->UNIVERSAL::can('utf8_mode')) {
# spent 915µs making 189 calls to UNIVERSAL::can, avg 5µs/call
251 # utf8_mode is cleared by default, only warn if it would need to be set
252 warn "message: cannot set utf8_mode, module HTML::Parser is too old\n"
253 if !$self->{SA_character_semantics_input};
254 } else {
2551892.41ms189664µs $self->SUPER::utf8_mode($self->{SA_character_semantics_input} ? 0 : 1);
# spent 664µs making 189 calls to HTML::Parser::utf8_mode, avg 4µs/call
2561893.55ms3781.91ms dbg("message: HTML::Parser utf8_mode %s",
# spent 1.39ms making 189 calls to Mail::SpamAssassin::Logger::dbg, avg 7µs/call # spent 518µs making 189 calls to HTML::Parser::utf8_mode, avg 3µs/call
257 $self->SUPER::utf8_mode ? "on (assumed UTF-8 octets)"
258 : "off (default, assumed Unicode characters)");
259 }
2601891.13s7731141.1s $self->SUPER::parse($text);
# spent 21.4s making 189 calls to HTML::Parser::parse, avg 113ms/call # spent 13.6s making 46892 calls to Mail::SpamAssassin::HTML::html_tag, avg 290µs/call # spent 6.07s making 29563 calls to Mail::SpamAssassin::HTML::html_text, avg 205µs/call # spent 12.3ms making 189 calls to Mail::SpamAssassin::HTML::html_start, avg 65µs/call # spent 5.83ms making 411 calls to Mail::SpamAssassin::HTML::html_comment, avg 14µs/call # spent 2.88ms making 67 calls to Mail::SpamAssassin::HTML::html_declaration, avg 43µs/call
2611896.17ms524347ms $self->SUPER::eof;
# spent 176ms making 189 calls to HTML::Parser::eof, avg 933µs/call # spent 146ms making 189 calls to Mail::SpamAssassin::HTML::html_end, avg 774µs/call # spent 24.5ms making 146 calls to Mail::SpamAssassin::HTML::html_text, avg 168µs/call
262
2631891.71ms return $self->{text};
264}
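
The two substitutions at lines 244 and 248 can be exercised on their own. This is a minimal sketch under stated assumptions: preprocess is a hypothetical helper name, and the sample markup is made up for illustration.

```perl
use strict;
use warnings;

# Apply the same two rewrites that parse() performs before handing the
# text to HTML::Parser.
sub preprocess {
  my ($text) = @_;
  $text =~ s/&nbsp;/ /g;              # treat &nbsp; as a plain space
  $text =~ s/<(\w+)\s*\/>/<$1>/gi;    # "<br/>" and "<hr />" become "<br>" and "<hr>"
  return $text;
}

print preprocess('a<br/>b<hr />c&nbsp;d<img src="x"/>'), "\n";
```

Note that the second pattern only matches a bare tag name followed by optional whitespace, so a self-closing tag that carries attributes, such as the img in the sample, is left unchanged.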
265
266
# spent 13.6s (3.58+10.0) within Mail::SpamAssassin::HTML::html_tag which was called 46892 times, avg 290µs/call: # 46892 times (3.58s+10.0s) by HTML::Parser::parse at line 260, avg 290µs/call
sub html_tag {
26746892102ms my ($self, $tag, $attr, $num) = @_;
2684689290.0ms utf8::encode($tag) if $self->{SA_encode_results};
269
27046892595ms4689296.6ms my $maybe_namespace = ($tag =~ m@^(?:o|st\d):[\w-]+/?$@);
# spent 96.6ms making 46892 calls to Mail::SpamAssassin::HTML::CORE:match, avg 2µs/call
271
27246892180ms if (exists $elements{$tag} || $maybe_namespace) {
2734687782.7ms $self->{elements}++;
2744687794.9ms $self->{elements_seen}++ if !exists $self->{inside}{$tag};
275 }
2764689280.2ms $self->{tags}++;
2774689285.7ms $self->{tags_seen}++ if !exists $self->{inside}{$tag};
27846892129ms $self->{inside}{$tag} += $num;
2794689291.2ms if ($self->{inside}{$tag} < 0) {
2803482µs $self->{inside}{$tag} = 0;
2813466µs $self->{closed_extra}++;
282 }
283
2844689295.8ms return if $maybe_namespace;
285
286 # ignore non-elements
28745476623ms if (exists $elements{$tag} || exists $tricks{$tag}) {
28845461318ms329085.35s $self->text_style($tag, $attr, $num) if exists $elements_text_style{$tag};
# spent 5.35s making 32908 calls to Mail::SpamAssassin::HTML::text_style, avg 162µs/call
289
290 # bug 5009: things like <p> and </p> both need dealing with
29145461169ms168602.31s $self->html_whitespace($tag) if exists $elements_whitespace{$tag};
# spent 2.31s making 16860 calls to Mail::SpamAssassin::HTML::html_whitespace, avg 137µs/call
292
293 # start tags
29445461176ms if ($num == 1) {
29524570111ms126731.59s $self->html_uri($tag, $attr) if exists $elements_uri{$tag};
# spent 1.59s making 12673 calls to Mail::SpamAssassin::HTML::html_uri, avg 125µs/call
29624570170ms24570686ms $self->html_tests($tag, $attr, $num);
# spent 686ms making 24570 calls to Mail::SpamAssassin::HTML::html_tests, avg 28µs/call
297 }
298 # end tags
299 else {
3002089137.3ms $self->{closed_html} = 1 if $tag eq "html";
3012089136.5ms $self->{closed_body} = 1 if $tag eq "body";
302 }
303 }
304}
305
306
# spent 2.31s (569ms+1.74) within Mail::SpamAssassin::HTML::html_whitespace which was called 16860 times, avg 137µs/call: # 16860 times (569ms+1.74s) by Mail::SpamAssassin::HTML::html_tag at line 291, avg 137µs/call
sub html_whitespace {
3071686037.4ms my ($self, $tag) = @_;
308
309 # ordered by frequency of tag groups, note: whitespace is always "visible"
31016860550ms1690766.7ms if ($tag eq "br" || $tag eq "div") {
# spent 66.7ms making 16907 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
311394925.3ms3949373ms $self->display_text("\n", whitespace => 1);
# spent 373ms making 3949 calls to Mail::SpamAssassin::HTML::display_text, avg 94µs/call
312 }
313 elsif ($tag =~ /^(?:li|t[hd]|d[td]|embed|h\d)$/) {
314891555.5ms8915903ms $self->display_text(" ", whitespace => 1);
# spent 903ms making 8915 calls to Mail::SpamAssassin::HTML::display_text, avg 101µs/call
315 }
316 elsif ($tag =~ /^(?:p|hr|blockquote|pre|listing|plaintext|xmp|title)$/) {
317399625.4ms3996402ms $self->display_text("\n\n", whitespace => 1);
# spent 402ms making 3996 calls to Mail::SpamAssassin::HTML::display_text, avg 101µs/call
318 }
319}
320
321# puts the uri onto the internal array
322# note: uri may be blank (<a href=""></a> obfuscation, etc.)
323
# spent 1.16s (122ms+1.04) within Mail::SpamAssassin::HTML::push_uri which was called 2705 times, avg 430µs/call: # 1560 times (70.6ms+675ms) by Mail::SpamAssassin::HTML::html_uri at line 362, avg 478µs/call # 1132 times (51.0ms+361ms) by Mail::SpamAssassin::HTML::html_uri at line 367, avg 364µs/call # 13 times (532µs+4.40ms) by Mail::SpamAssassin::HTML::html_uri at line 357, avg 379µs/call
sub push_uri {
32427055.81ms my ($self, $type, $uri) = @_;
325
326270519.2ms2705114ms $uri = $self->canon_uri($uri);
# spent 114ms making 2705 calls to Mail::SpamAssassin::HTML::canon_uri, avg 42µs/call
32727055.21ms utf8::encode($uri) if $self->{SA_encode_results};
328
329270524.5ms2705926ms my $target = target_uri($self->{base_href} || "", $uri);
# spent 926ms making 2705 calls to Mail::SpamAssassin::HTML::target_uri, avg 342µs/call
330
331 # skip things like <iframe src="" ...>
332270534.3ms $self->{uri}->{$uri}->{types}->{$type} = 1 if $uri ne '';
333}
334
335
# spent 169ms (131+37.9) within Mail::SpamAssassin::HTML::canon_uri which was called 4261 times, avg 40µs/call: # 2705 times (85.6ms+28.7ms) by Mail::SpamAssassin::HTML::push_uri at line 326, avg 42µs/call # 1556 times (45.5ms+9.26ms) by Mail::SpamAssassin::HTML::html_tests at line 654, avg 35µs/call
sub canon_uri {
33642618.19ms my ($self, $uri) = @_;
337
338 # URIs don't have leading/trailing whitespace ...
339426153.4ms426116.1ms $uri =~ s/^\s+//;
# spent 16.1ms making 4261 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 4µs/call
340426169.9ms426121.9ms $uri =~ s/\s+$//;
# spent 21.9ms making 4261 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 5µs/call
341
342 # Make sure all the URIs are nice and short
34342618.23ms if (length $uri > MAX_URI_LENGTH) {
344 $self->{'uri_truncated'} = 1;
345 $uri = substr $uri, 0, MAX_URI_LENGTH;
346 }
347
348426147.5ms return $uri;
349}
350
351
# spent 1.59s (367ms+1.22) within Mail::SpamAssassin::HTML::html_uri which was called 12673 times, avg 125µs/call: # 12673 times (367ms+1.22s) by Mail::SpamAssassin::HTML::html_tag at line 295, avg 125µs/call
sub html_uri {
3521267333.5ms my ($self, $tag, $attr) = @_;
353
354 # ordered by frequency of tag groups
35512673409ms1654256.7ms if ($tag =~ /^(?:body|table|tr|td)$/) {
# spent 56.7ms making 16542 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
356993820.0ms if (defined $attr->{background}) {
35713116µs134.93ms $self->push_uri($tag, $attr->{background});
# spent 4.93ms making 13 calls to Mail::SpamAssassin::HTML::push_uri, avg 379µs/call
358 }
359 }
360 elsif ($tag =~ /^(?:a|area|link)$/) {
36116016.74ms if (defined $attr->{href}) {
362156010.7ms1560745ms $self->push_uri($tag, $attr->{href});
# spent 745ms making 1560 calls to Mail::SpamAssassin::HTML::push_uri, avg 478µs/call
363 }
364 }
365 elsif ($tag =~ /^(?:img|frame|iframe|embed|script|bgsound)$/) {
36611344.73ms if (defined $attr->{src}) {
36711329.42ms1132412ms $self->push_uri($tag, $attr->{src});
# spent 412ms making 1132 calls to Mail::SpamAssassin::HTML::push_uri, avg 364µs/call
368 }
369 }
370 elsif ($tag eq "form") {
371 if (defined $attr->{action}) {
372 $self->push_uri($tag, $attr->{action});
373 }
374 }
375 elsif ($tag eq "base") {
376 if (my $uri = $attr->{href}) {
377 $uri = $self->canon_uri($uri);
378
379 # use <BASE HREF="URI"> to turn relative links into absolute links
380
381 # even if it is a base URI, handle like a normal URI as well
382 $self->push_uri($tag, $uri);
383
384 # a base URI will be ignored by browsers unless it is an absolute
385 # URI of a standard protocol
386 if ($uri =~ m@^(?:https?|ftp):/{0,2}@i) {
387 # remove trailing filename, if any; base URIs can have the
388 # form of "http://foo.com/index.html"
389 $uri =~ s@^([a-z]+:/{0,2}[^/]+/.*?)[^/\.]+\.[^/\.]{2,4}$@$1@i;
390
391 # Make sure it ends in a slash
392 $uri .= "/" unless $uri =~ m@/$@;
393 utf8::encode($uri) if $self->{SA_encode_results};
394 $self->{base_href} = $uri;
395 }
396 }
397 }
398}
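
The base URI handling at lines 386 to 394 is easier to follow with concrete inputs. The sketch below copies the two patterns from the listing into a hypothetical helper, normalize_base, which is not part of the module. The example URIs are illustrative only.

```perl
use strict;
use warnings;

# Returns the normalized base URI, or undef when the URI is not an
# absolute URI of a standard protocol (the case the listing ignores).
sub normalize_base {
  my ($uri) = @_;
  return undef unless $uri =~ m@^(?:https?|ftp):/{0,2}@i;
  # remove a trailing filename such as "index.html", if any
  $uri =~ s@^([a-z]+:/{0,2}[^/]+/.*?)[^/\.]+\.[^/\.]{2,4}$@$1@i;
  # make sure the result ends in a slash
  $uri .= "/" unless $uri =~ m@/$@;
  return $uri;
}

for my $u ('http://foo.com/index.html', 'http://foo.com',
           'http://foo.com/dir/page.php', 'mailto:x@example.com') {
  my $base = normalize_base($u);
  printf "%-30s => %s\n", $u, defined $base ? $base : '(ignored)';
}
```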
399
400# this might not be quite right, may need to pay attention to table nesting
401
# spent 975ms (963+11.5) within Mail::SpamAssassin::HTML::close_table_tag which was called 7824 times, avg 125µs/call: # 7824 times (963ms+11.5ms) by Mail::SpamAssassin::HTML::text_style at line 455, avg 125µs/call
sub close_table_tag {
402782415.2ms my ($self, $tag) = @_;
403
404 # don't close if never opened
405176215792ms return unless grep { $_->{tag} eq $tag } @{ $self->{text_style} };
406
407728111.8ms my $top;
4081459445.7ms while (@{ $self->{text_style} } && ($top = $self->{text_style}[-1]->{tag})) {
409731388.9ms321311.5ms if (($tag eq "td" && ($top eq "font" || $top eq "td")) ||
# spent 11.5ms making 3213 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
410 ($tag eq "tr" && $top =~ /^(?:font|td|tr)$/))
411 {
41264200µs pop @{ $self->{text_style} };
413 }
414 else {
415728174.1ms last;
416 }
417 }
418}
419
420
# spent 1.77s within Mail::SpamAssassin::HTML::close_tag which was called 16266 times, avg 109µs/call: # 16266 times (1.77s+0s) by Mail::SpamAssassin::HTML::text_style at line 539, avg 109µs/call
sub close_tag {
4211626631.3ms my ($self, $tag) = @_;
422
423 # don't close if never opened
4242986651.28s return if !grep { $_->{tag} eq $tag } @{ $self->{text_style} };
425
426 # close everything up to and including tag
42748780305ms while (my %current = %{ pop @{ $self->{text_style} } }) {
42816279266ms last if $current{tag} eq $tag;
429 }
430}
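
The "don't close if never opened" checks at lines 405 and 424 carry most of the exclusive time of close_table_tag and close_tag in this run (792ms and 1.28s on those lines). grep visits every element of the stack even after a match has been found, whereas any from List::Util (version 1.33 or later) stops at the first match. The following is only a sketch of that difference, not a tested patch: @text_style, opened_grep and opened_any are hypothetical names, and whether the change would help here depends on stack depth, which this report does not show.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical stand-in for $self->{text_style}: a stack of hashrefs,
# each recording the tag that opened it.
my @text_style = map { +{ tag => $_ } } qw(default body table tr td font);

# As in the listing: count all matching entries, then test the count.
sub opened_grep {
  my ($tag) = @_;
  return scalar grep { $_->{tag} eq $tag } @text_style;
}

# Alternative: return as soon as one matching entry is seen.
sub opened_any {
  my ($tag) = @_;
  return any { $_->{tag} eq $tag } @text_style;
}

print "td opened\n"       if opened_any('td');
print "span not opened\n" if !opened_any('span');
```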
431
432
# spent 5.35s (2.11+3.23) within Mail::SpamAssassin::HTML::text_style which was called 32908 times, avg 162µs/call: # 32908 times (2.11s+3.23s) by Mail::SpamAssassin::HTML::html_tag at line 288, avg 162µs/call
sub text_style {
4333290870.2ms my ($self, $tag, $attr, $num) = @_;
434
435 # treat <th> as <td>
4363290858.1ms $tag = "td" if $tag eq "th";
437
438 # open
43932908476ms if ($num == 1) {
440 # HTML browsers generally only use first <body> for colors,
441 # so only push if we haven't seen a body tag yet
4421648527.3ms if ($tag eq "body") {
443 # TODO: skip if we've already seen body
444 }
445
446 # change basefont (only change size)
4471648526.6ms if ($tag eq "basefont" &&
448 exists $attr->{size} && $attr->{size} =~ /^\s*(\d+)/)
449 {
450 $self->{basefont} = $1;
451 return;
452 }
453
454 # close elements with optional end tags
4551648564.8ms7824975ms $self->close_table_tag($tag) if ($tag eq "td" || $tag eq "tr");
# spent 975ms making 7824 calls to Mail::SpamAssassin::HTML::close_table_tag, avg 125µs/call
456
457 # copy current text state
45832970314ms my %new = %{ $self->{text_style}[-1] };
459
460 # change tag name!
4611648532.5ms $new{tag} = $tag;
462
463 # big and small tags
4641648527.4ms if ($tag eq "big") {
4651335µs $new{size} += 1;
4662693µs push @{ $self->{text_style} }, \%new;
4671388µs return;
468 }
4691647226.7ms if ($tag eq "small") {
4701635µs $new{size} -= 1;
47132101µs push @{ $self->{text_style} }, \%new;
47216136µs return;
473 }
474
475 # tag attributes
47616456125ms for my $name (keys %$attr) {
4772628655.2ms next unless exists $ok_attributes{$tag}{$name};
478544823.0ms if ($name eq "text" || $name eq "color") {
479 # two different names for text color
48085722µs858.43ms $new{fgcolor} = name_to_rgb($attr->{$name});
# spent 8.43ms making 85 calls to Mail::SpamAssassin::HTML::name_to_rgb, avg 99µs/call
481 }
482 elsif ($name eq "size") {
4831172.42ms198724µs if ($attr->{size} =~ /^\s*([+-]\d+)/) {
# spent 724µs making 198 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
484 # relative font size
48536136µs $new{size} = $self->{basefont} + $1;
486 }
487 elsif ($attr->{size} =~ /^\s*(\d+)/) {
488 # absolute font size
48977316µs $new{size} = $1;
490 }
491 }
492 elsif ($name eq 'style') {
49344269.87ms $new{style} = $attr->{style};
494442624.0ms my @parts = split(/;/, $new{style});
495442626.9ms foreach (@parts) {
49615262509ms28303184ms if (/^\s*(background-)?color:\s*(.+)\s*$/i) {
# spent 184ms making 28303 calls to Mail::SpamAssassin::HTML::CORE:match, avg 7µs/call
49722215.95ms my $whcolor = $1 ? 'bgcolor' : 'fgcolor';
49822216.44ms my $value = lc $2;
499
500222133.3ms22216.29ms if ($value =~ /rgb/) {
# spent 6.29ms making 2221 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
5016111.91ms $value =~ tr/0-9,//cd;
5026112.43ms my @rgb = split(/,/, $value);
503 $new{$whcolor} = sprintf("#%02x%02x%02x",
504244417.3ms map { !$_ ? 0 : $_ > 255 ? 255 : $_ }
505 @rgb[0..2]);
506 }
507 else {
508161012.7ms1610209ms $new{$whcolor} = name_to_rgb($value);
# spent 209ms making 1610 calls to Mail::SpamAssassin::HTML::name_to_rgb, avg 130µs/call
509 }
510 }
511 elsif (/^\s*([a-z_-]+)\s*:\s*(\S.*?)\s*$/i) {
512 # "display: none", "visibility: hidden", etc.
51312646110ms $new{'style_'.$1} = $2;
514 }
515 }
516 }
517 elsif ($name eq "bgcolor") {
518 # overwrite with hex value, $new{bgcolor} is set below
5196305.58ms63076.6ms $attr->{bgcolor} = name_to_rgb($attr->{bgcolor});
# spent 76.6ms making 630 calls to Mail::SpamAssassin::HTML::name_to_rgb, avg 122µs/call
520 }
521 else {
522 # attribute is probably okay
523190530µs $new{$name} = $attr->{$name};
524 }
525
526544825.8ms if ($new{size} > $self->{max_size}) {
52726µs $self->{max_size} = $new{size};
528 }
529 elsif ($new{size} < $self->{min_size}) {
53033103µs $self->{min_size} = $new{size};
531 }
532 }
53332912126ms push @{ $self->{text_style} }, \%new;
534 }
535 # explicitly close a tag
536 else {
5371642356.0ms if ($tag ne "body") {
538 # don't close body since browsers seem to render text after </body>
53916266102ms162661.77s $self->close_tag($tag);
# spent 1.77s making 16266 calls to Mail::SpamAssassin::HTML::close_tag, avg 109µs/call
540 }
541 }
542}
543
544
# spent 403ms (356+47.2) within Mail::SpamAssassin::HTML::html_font_invisible which was called 4787 times, avg 84µs/call: # 4787 times (356ms+47.2ms) by Mail::SpamAssassin::HTML::html_text at line 746, avg 84µs/call
sub html_font_invisible {
545478710.4ms my ($self, $text) = @_;
546
547478712.4ms my $fg = $self->{text_style}[-1]->{fgcolor};
548478710.4ms my $bg = $self->{text_style}[-1]->{bgcolor};
549478710.2ms my $size = $self->{text_style}[-1]->{size};
55047879.93ms my $display = $self->{text_style}[-1]->{style_display};
55147879.67ms my $visibility = $self->{text_style}[-1]->{style_visibility};
552
553 # invisibility
554478795.8ms474027.7ms if (substr($fg,-6) eq substr($bg,-6)) {
# spent 27.7ms making 4740 calls to Mail::SpamAssassin::HTML::CORE:match, avg 6µs/call
55547315µs471.16ms $self->put_results(font_low_contrast => 1);
# spent 1.16ms making 47 calls to Mail::SpamAssassin::HTML::put_results, avg 25µs/call
55647451µs return 1;
557 # near-invisibility
558 } elsif ($fg =~ /^\#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/) {
559473919.0ms my ($r1, $g1, $b1) = (hex($1), hex($2), hex($3));
560
561473981.5ms473918.3ms if ($bg =~ /^\#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/) {
# spent 18.3ms making 4739 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
562473914.8ms my ($r2, $g2, $b2) = (hex($1), hex($2), hex($3));
563
56447399.48ms my $r = ($r1 - $r2);
56547398.37ms my $g = ($g1 - $g2);
56647398.14ms my $b = ($b1 - $b2);
567
568 # geometric distance weighted by brightness
569 # maximum distance is 191.151823601032
570473920.2ms my $distance = ((0.2126*$r)**2 + (0.7152*$g)**2 + (0.0722*$b)**2)**0.5;
571
572 # the text is very difficult to read if the distance is under 12,
573 # a limit of 14 to 16 might be okay if the usage significantly
574 # increases (near-invisible text is at about 0.95% of spam and
575 # 1.25% of HTML spam right now), but please test any changes first
576473921.6ms if ($distance < 12) {
577 $self->put_results(font_low_contrast => 1);
578 return 1;
579 }
580 }
581 }
582
583
584 # invalid color
58547409.83ms if ($fg eq 'invalid' or $bg eq 'invalid') {
58617µs128µs $self->put_results(font_invalid_color => 1);
# spent 28µs making 1 call to Mail::SpamAssassin::HTML::put_results
58717µs return 1;
588 }
589
590 # size too small
59147398.24ms if ($size <= 1) {
59240407µs return 1;
593 }
594
595 # <span style="display: none">
59646998.04ms if ($display && lc $display eq 'none') {
597432µs return 1;
598 }
599
60046957.80ms if ($visibility && lc $visibility eq 'hidden') {
601881µs return 1;
602 }
603
604468788.4ms return 0;
605}
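
As a worked illustration of the near-invisibility test at lines 558 to 579, the sketch below computes the same brightness-weighted distance for two color pairs. The helper name `color_distance` is hypothetical; the weights, the hex pattern and the threshold of 12 are taken from the listing above.

```perl
use strict;
use warnings;

my $HEX = qr/^\#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/;

# Hypothetical helper: brightness-weighted distance between two hex colors,
# using the same weights as line 570. Returns undef if either argument is
# not a six-digit lowercase hex color.
sub color_distance {
    my ($fg, $bg) = @_;
    my @fg = $fg =~ $HEX;
    my @bg = $bg =~ $HEX;
    return undef unless @fg == 3 && @bg == 3;

    my ($dr, $dg, $db) = map { hex($fg[$_]) - hex($bg[$_]) } 0 .. 2;
    return sqrt((0.2126 * $dr)**2 + (0.7152 * $dg)**2 + (0.0722 * $db)**2);
}

# White on black: the maximum distance quoted in the comment on line 569.
printf "%.2f\n", color_distance('#ffffff', '#000000');   # prints 191.15

# White on very light gray: below the threshold of 12, so low contrast.
printf "%.2f\n", color_distance('#ffffff', '#f8f8f8');   # prints 5.25
```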
606
607
# spent 686ms (618+67.7) within Mail::SpamAssassin::HTML::html_tests which was called 24570 times, avg 28µs/call: # 24570 times (618ms+67.7ms) by Mail::SpamAssassin::HTML::html_tag at line 296, avg 28µs/call
sub html_tests {
6082457054.4ms my ($self, $tag, $attr, $num) = @_;
609
6102457043.8ms if ($tag eq "font" && exists $attr->{face}) {
6111753.20ms1751.71ms if ($attr->{face} !~ /^[a-z ][a-z -]*[a-z](?:,\s*[a-z][a-z -]*[a-z])*$/i) {
# spent 1.71ms making 175 calls to Mail::SpamAssassin::HTML::CORE:match, avg 10µs/call
612 $self->put_results(font_face_bad => 1);
613 }
614 }
6152457050.6ms if ($tag eq "img" && exists $self->{inside}{a} && $self->{inside}{a} > 0) {
6165041.29ms my $uri = $self->{anchor_last};
617504991µs utf8::encode($uri) if $self->{SA_encode_results};
6185041.97ms $self->{uri}->{$uri}->{anchor_text}->[-1] .= "<img>\n";
6195041.75ms $self->{anchor}->[-1] .= "<img>\n";
620 }
621
6222457042.2ms if ($tag eq "img" && exists $attr->{width} && exists $attr->{height}) {
6237761.74ms my $width = 0;
6247761.50ms my $height = 0;
6257761.64ms my $area = 0;
626
627 # assume 800x600 screen for percentage values
62877611.3ms7764.48ms if ($attr->{width} =~ /^(\d+)(\%)?$/) {
# spent 4.48ms making 776 calls to Mail::SpamAssassin::HTML::CORE:match, avg 6µs/call
6297762.09ms $width = $1;
6307761.89ms $width *= 8 if (defined $2 && $2 eq "%");
631 }
63277611.6ms7765.00ms if ($attr->{height} =~ /^(\d+)(\%)?$/) {
# spent 5.00ms making 776 calls to Mail::SpamAssassin::HTML::CORE:match, avg 6µs/call
6337751.96ms $height = $1;
6347751.74ms $height *= 6 if (defined $2 && $2 eq "%");
635 }
636 # guess size
6377762.16ms $width = 200 if $width <= 0;
6387761.50ms $height = 200 if $height <= 0;
6397765.37ms if ($width > 0 && $height > 0) {
6407761.45ms $area = $width * $height;
6417762.22ms $self->{image_area} += $area;
642 }
643 }
6442457039.6ms if ($tag eq "form" && exists $attr->{action}) {
645 $self->put_results(form_action_mailto => 1) if $attr->{action} =~ /mailto:/i
646 }
6472457045.3ms if ($tag eq "object" || $tag eq "embed") {
648 $self->put_results(embeds => 1);
649 }
650
651 # special text delimiters - <a> and <title>
6522457045.2ms if ($tag eq "a") {
653 my $uri = $self->{anchor_last} =
654159714.1ms155654.7ms (exists $attr->{href} ? $self->canon_uri($attr->{href}) : "");
# spent 54.7ms making 1556 calls to Mail::SpamAssassin::HTML::canon_uri, avg 35µs/call
65515972.91ms utf8::encode($uri) if $self->{SA_encode_results};
656319418.1ms push(@{$self->{uri}->{$uri}->{anchor_text}}, '');
657319412.1ms push(@{$self->{anchor}}, '');
658 }
6592457040.9ms if ($tag eq "title") {
66067166µs $self->{title_index}++;
66167323µs $self->{title}->[$self->{title_index}] = "";
662 }
663
66424570338ms2721.77ms if ($tag eq "meta" &&
# spent 1.77ms making 272 calls to Mail::SpamAssassin::HTML::CORE:match, avg 7µs/call
665 exists $attr->{'http-equiv'} &&
666 exists $attr->{content} &&
667 $attr->{'http-equiv'} =~ /Content-Type/i &&
668 $attr->{content} =~ /\bcharset\s*=\s*["']?([^"']+)/i)
669 {
670123835µs $self->{charsets} .= exists $self->{charsets} ? " $1" : $1;
671 }
672}
673
674
# spent 4.50s (4.14+368ms) within Mail::SpamAssassin::HTML::display_text which was called 46468 times, avg 97µs/call: # 29508 times (2.51s+297ms) by Mail::SpamAssassin::HTML::html_text at line 773, avg 95µs/call # 8915 times (868ms+34.6ms) by Mail::SpamAssassin::HTML::html_whitespace at line 314, avg 101µs/call # 3996 times (388ms+13.3ms) by Mail::SpamAssassin::HTML::html_whitespace at line 317, avg 101µs/call # 3949 times (364ms+8.93ms) by Mail::SpamAssassin::HTML::html_whitespace at line 311, avg 94µs/call # 100 times (7.27ms+14.2ms) by Mail::SpamAssassin::HTML::html_text at line 770, avg 215µs/call
sub display_text {
6754646883.6ms my $self = shift;
6764646898.8ms my $text = shift;
67746468156ms my %display = @_;
678
679 # Unless it's specified to be invisible, then it's not invisible. ;)
68046468174ms if (!exists $display{invisible}) {
6814636897.5ms $display{invisible} = 0;
682 }
683
68446468193ms if ($display{whitespace}) {
685 # trim trailing whitespace from previous element if it was not whitespace
686 # and it was not invisible
68733720196ms if (@{ $self->{text} } &&
688 (!defined $self->{text_whitespace} ||
6891667331.6ms !vec($self->{text_whitespace}, $#{$self->{text}}, 1)) &&
690 (!defined $self->{text_invisible} ||
6911426426.6ms !vec($self->{text_invisible}, $#{$self->{text}}, 1)))
692 {
69314202208ms1420256.8ms $self->{text}->[-1] =~ s/ $//;
# spent 56.8ms making 14202 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 4µs/call
694 }
695 }
696 else {
697 # NBSP: UTF-8: C2 A0, ISO-8859-*: A0
69829608619ms29608251ms $text =~ s/[ \t\n\r\f\x0b]+|\xc2\xa0/ /gs;
# spent 251ms making 29608 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 8µs/call
699 # trim leading whitespace if previous element was whitespace
700 # and current element is not invisible
70159216305ms if (@{ $self->{text} } && !$display{invisible} &&
702 defined $self->{text_whitespace} &&
7032876154.9ms vec($self->{text_whitespace}, $#{$self->{text}}, 1))
704 {
70514215216ms1421560.7ms $text =~ s/^ //;
# spent 60.7ms making 14215 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 4µs/call
706 }
707 }
70892936450ms push @{ $self->{text} }, $text;
709464681.21s while (my ($k, $v) = each %display) {
71063328127ms my $textvar = "text_".$k;
71163704126ms if (!exists $self->{$textvar}) { $self->{$textvar} = ''; }
712126656514ms vec($self->{$textvar}, $#{$self->{text}}, 1) = $v;
713 }
714}
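
display_text records per-element flags (text_whitespace, text_invisible) as bit vectors indexed by the element's position in the text array, using Perl's built-in vec. The sketch below shows that bookkeeping in isolation. The names `add_text`, `@text` and `%flags` are hypothetical stand-ins for the object fields, and the whitespace trimming done by the real routine is left out.

```perl
use strict;
use warnings;

my @text;    # stands in for @{ $self->{text} }
my %flags;   # stands in for $self->{text_whitespace}, $self->{text_invisible}

# Hypothetical helper: append one text element and store each display flag
# as a single bit at the element's index.
sub add_text {
    my ($text, %display) = @_;
    push @text, $text;
    for my $k (keys %display) {
        $flags{$k} = '' unless exists $flags{$k};
        vec($flags{$k}, $#text, 1) = $display{$k} ? 1 : 0;
    }
}

add_text('Hello',  invisible  => 0);
add_text(' ',      whitespace => 1, invisible => 0);
add_text('secret', invisible  => 1);

# Reading a bit beyond the end of the vector yields 0, so elements that
# never set a flag read as "not set".
for my $i (0 .. $#text) {
    printf "%d: whitespace=%d invisible=%d\n",
        $i, vec($flags{whitespace}, $i, 1), vec($flags{invisible}, $i, 1);
}
```

One bit per element keeps the flag storage small, at the cost of a vec call for every flag on every element; in the profile above line 712 accounts for 514ms.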
715
716
# spent 6.09s (2.25+3.84) within Mail::SpamAssassin::HTML::html_text which was called 29709 times, avg 205µs/call: # 29563 times (2.24s+3.83s) by HTML::Parser::parse at line 260, avg 205µs/call # 146 times (8.22ms+16.3ms) by HTML::Parser::eof at line 261, avg 168µs/call
sub html_text {
7172970962.9ms my ($self, $text) = @_;
7182970957.3ms utf8::encode($text) if $self->{SA_encode_results};
719
720 # text that is not part of body
7212970955.1ms if (exists $self->{inside}{script} && $self->{inside}{script} > 0)
722 {
72327µs push @{ $self->{script} }, $text;
72416µs return;
725 }
7262970857.9ms if (exists $self->{inside}{style} && $self->{inside}{style} > 0) {
727100730µs return;
728 }
729
730 # text that is part of body and also stored separately
7312960862.9ms if (exists $self->{inside}{a} && $self->{inside}{a} > 0) {
732 # this doesn't worry about nested anchors
73316313.76ms my $uri = $self->{anchor_last};
73416312.92ms utf8::encode($uri) if $self->{SA_encode_results};
73516316.61ms $self->{uri}->{$uri}->{anchor_text}->[-1] .= $text;
73616318.91ms $self->{anchor}->[-1] .= $text;
737 }
7382960857.3ms if (exists $self->{inside}{title} && $self->{inside}{title} > 0) {
73927103µs $self->{title}->[$self->{title_index}] .= $text;
740 }
741
7422960851.0ms my $invisible_for_bayes = 0;
743
744 # NBSP: UTF-8: C2 A0, ISO-8859-*: A0
74529608710ms29608324ms if ($text !~ /^(?:[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s) {
# spent 324ms making 29608 calls to Mail::SpamAssassin::HTML::CORE:match, avg 11µs/call
746478732.8ms4787403ms $invisible_for_bayes = $self->html_font_invisible($text);
# spent 403ms making 4787 calls to Mail::SpamAssassin::HTML::html_font_invisible, avg 84µs/call
747 }
748
74929608108ms if (exists $self->{text}->[-1]) {
750 # ideas discarded since they would be easy to evade:
751 # 1. using \w or [A-Za-z] instead of \S or non-punctuation
752 # 2. exempting certain tags
753 # no re "strict"; # since perl 5.21.8: Ranges of ASCII printables...
75429444390ms3307095.3ms if ($text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s &&
# spent 95.3ms making 33070 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
755 $self->{text}->[-1] =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s)
756 {
7573890µs $self->{obfuscation}++;
758 }
75929444499ms29444193ms if ($self->{text}->[-1] =~
# spent 193ms making 29444 calls to Mail::SpamAssassin::HTML::CORE:match, avg 7µs/call
760 /\b([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\z/s)
761 {
7623911.27ms my $start = length($1);
7633914.52ms3911.43ms if ($text =~ /^([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\b/s) {
# spent 1.43ms making 391 calls to Mail::SpamAssassin::HTML::CORE:match, avg 4µs/call
76413110µs $self->{backhair}->{$start . "_" . length($1)}++;
765 }
766 }
767 }
768
76929608433ms if ($invisible_for_bayes) {
770100722µs10021.5ms $self->display_text($text, invisible => 1);
# spent 21.5ms making 100 calls to Mail::SpamAssassin::HTML::display_text, avg 215µs/call
771 }
772 else {
77329508189ms295082.81s $self->display_text($text);
# spent 2.81s making 29508 calls to Mail::SpamAssassin::HTML::display_text, avg 95µs/call
774 }
775}
776
777# note: $text includes <!-- and -->
778
# spent 5.83ms within Mail::SpamAssassin::HTML::html_comment which was called 411 times, avg 14µs/call: # 411 times (5.83ms+0s) by HTML::Parser::parse at line 260, avg 14µs/call
sub html_comment {
779411897µs my ($self, $text) = @_;
780411842µs utf8::encode($text) if $self->{SA_encode_results};
781
7828225.07ms push @{ $self->{comment} }, $text;
783}
784
785
# spent 2.88ms (2.53+344µs) within Mail::SpamAssassin::HTML::html_declaration which was called 67 times, avg 43µs/call: # 67 times (2.53ms+344µs) by HTML::Parser::parse at line 260, avg 43µs/call
sub html_declaration {
78667302µs my ($self, $text) = @_;
78767187µs utf8::encode($text) if $self->{SA_encode_results};
788
789671.60ms67344µs if ($text =~ /^<!doctype/i) {
# spent 344µs making 67 calls to Mail::SpamAssassin::HTML::CORE:match, avg 5µs/call
79067178µs my $tag = "!doctype";
79167325µs $self->{elements}++;
79267171µs $self->{tags}++;
79367302µs $self->{inside}{$tag} = 0;
794 }
795}
796
797###########################################################################
798
7991116µsmy %html_color = (
800 # HTML 4 defined 16 colors
801 aqua => 0x00ffff,
802 black => 0x000000,
803 blue => 0x0000ff,
804 fuchsia => 0xff00ff,
805 gray => 0x808080,
806 green => 0x008000,
807 lime => 0x00ff00,
808 maroon => 0x800000,
809 navy => 0x000080,
810 olive => 0x808000,
811 purple => 0x800080,
812 red => 0xff0000,
813 silver => 0xc0c0c0,
814 teal => 0x008080,
815 white => 0xffffff,
816 yellow => 0xffff00,
817 # colors specified in CSS3 color module
818 aliceblue => 0xf0f8ff,
819 antiquewhite => 0xfaebd7,
820 aqua => 0x00ffff,
821 aquamarine => 0x7fffd4,
822 azure => 0xf0ffff,
823 beige => 0xf5f5dc,
824 bisque => 0xffe4c4,
825 black => 0x000000,
826 blanchedalmond => 0xffebcd,
827 blue => 0x0000ff,
828 blueviolet => 0x8a2be2,
829 brown => 0xa52a2a,
830 burlywood => 0xdeb887,
831 cadetblue => 0x5f9ea0,
832 chartreuse => 0x7fff00,
833 chocolate => 0xd2691e,
834 coral => 0xff7f50,
835 cornflowerblue => 0x6495ed,
836 cornsilk => 0xfff8dc,
837 crimson => 0xdc143c,
838 cyan => 0x00ffff,
839 darkblue => 0x00008b,
840 darkcyan => 0x008b8b,
841 darkgoldenrod => 0xb8860b,
842 darkgray => 0xa9a9a9,
843 darkgreen => 0x006400,
844 darkgrey => 0xa9a9a9,
845 darkkhaki => 0xbdb76b,
846 darkmagenta => 0x8b008b,
847 darkolivegreen => 0x556b2f,
848 darkorange => 0xff8c00,
849 darkorchid => 0x9932cc,
850 darkred => 0x8b0000,
851 darksalmon => 0xe9967a,
852 darkseagreen => 0x8fbc8f,
853 darkslateblue => 0x483d8b,
854 darkslategray => 0x2f4f4f,
855 darkslategrey => 0x2f4f4f,
856 darkturquoise => 0x00ced1,
857 darkviolet => 0x9400d3,
858 deeppink => 0xff1493,
859 deepskyblue => 0x00bfff,
860 dimgray => 0x696969,
861 dimgrey => 0x696969,
862 dodgerblue => 0x1e90ff,
863 firebrick => 0xb22222,
864 floralwhite => 0xfffaf0,
865 forestgreen => 0x228b22,
866 fuchsia => 0xff00ff,
867 gainsboro => 0xdcdcdc,
868 ghostwhite => 0xf8f8ff,
869 gold => 0xffd700,
870 goldenrod => 0xdaa520,
871 gray => 0x808080,
872 green => 0x008000,
873 greenyellow => 0xadff2f,
874 grey => 0x808080,
875 honeydew => 0xf0fff0,
876 hotpink => 0xff69b4,
877 indianred => 0xcd5c5c,
878 indigo => 0x4b0082,
879 ivory => 0xfffff0,
880 khaki => 0xf0e68c,
881 lavender => 0xe6e6fa,
882 lavenderblush => 0xfff0f5,
883 lawngreen => 0x7cfc00,
884 lemonchiffon => 0xfffacd,
885 lightblue => 0xadd8e6,
886 lightcoral => 0xf08080,
887 lightcyan => 0xe0ffff,
888 lightgoldenrodyellow => 0xfafad2,
889 lightgray => 0xd3d3d3,
890 lightgreen => 0x90ee90,
891 lightgrey => 0xd3d3d3,
892 lightpink => 0xffb6c1,
893 lightsalmon => 0xffa07a,
894 lightseagreen => 0x20b2aa,
895 lightskyblue => 0x87cefa,
896 lightslategray => 0x778899,
897 lightslategrey => 0x778899,
898 lightsteelblue => 0xb0c4de,
899 lightyellow => 0xffffe0,
900 lime => 0x00ff00,
901 limegreen => 0x32cd32,
902 linen => 0xfaf0e6,
903 magenta => 0xff00ff,
904 maroon => 0x800000,
905 mediumaquamarine => 0x66cdaa,
906 mediumblue => 0x0000cd,
907 mediumorchid => 0xba55d3,
908 mediumpurple => 0x9370db,
909 mediumseagreen => 0x3cb371,
910 mediumslateblue => 0x7b68ee,
911 mediumspringgreen => 0x00fa9a,
912 mediumturquoise => 0x48d1cc,
913 mediumvioletred => 0xc71585,
914 midnightblue => 0x191970,
915 mintcream => 0xf5fffa,
916 mistyrose => 0xffe4e1,
917 moccasin => 0xffe4b5,
918 navajowhite => 0xffdead,
919 navy => 0x000080,
920 oldlace => 0xfdf5e6,
921 olive => 0x808000,
922 olivedrab => 0x6b8e23,
923 orange => 0xffa500,
924 orangered => 0xff4500,
925 orchid => 0xda70d6,
926 palegoldenrod => 0xeee8aa,
927 palegreen => 0x98fb98,
928 paleturquoise => 0xafeeee,
929 palevioletred => 0xdb7093,
930 papayawhip => 0xffefd5,
931 peachpuff => 0xffdab9,
932 peru => 0xcd853f,
933 pink => 0xffc0cb,
934 plum => 0xdda0dd,
935 powderblue => 0xb0e0e6,
936 purple => 0x800080,
937 red => 0xff0000,
938 rosybrown => 0xbc8f8f,
939 royalblue => 0x4169e1,
940 saddlebrown => 0x8b4513,
941 salmon => 0xfa8072,
942 sandybrown => 0xf4a460,
943 seagreen => 0x2e8b57,
944 seashell => 0xfff5ee,
945 sienna => 0xa0522d,
946 silver => 0xc0c0c0,
947 skyblue => 0x87ceeb,
948 slateblue => 0x6a5acd,
949 slategray => 0x708090,
950 slategrey => 0x708090,
951 snow => 0xfffafa,
952 springgreen => 0x00ff7f,
953 steelblue => 0x4682b4,
954 tan => 0xd2b48c,
955 teal => 0x008080,
956 thistle => 0xd8bfd8,
957 tomato => 0xff6347,
958 turquoise => 0x40e0d0,
959 violet => 0xee82ee,
960 wheat => 0xf5deb3,
961 white => 0xffffff,
962 whitesmoke => 0xf5f5f5,
963 yellow => 0xffff00,
964 yellowgreen => 0x9acd32,
965);
966
967sub name_to_rgb_old {
968 my $color = lc $_[0];
969
970 # note: Mozilla strips leading and trailing whitespace at this point,
971 # but IE does not
972
973 # named colors
974 my $hex = $html_color{$color};
975 if (defined $hex) {
976 return sprintf("#%06x", $hex);
977 }
978
979 # Flex Hex: John Graham-Cumming, http://www.jgc.org/pdf/lisa2004.pdf
980 # strip optional # character
981 $color =~ s/^#//;
982 # pad right-hand-side to a multiple of three
983 $color .= "0" x (3 - (length($color) % 3)) if (length($color) % 3);
984 # split into triplets
985 my $length = length($color) / 3;
986 my @colors = ($color =~ /(.{$length})(.{$length})(.{$length})/);
987 # truncate each color to a DWORD, take MSB, left pad nibbles
988 foreach (@colors) { s/.*(.{8})$/$1/; s/(..).*/$1/; s/^(.)$/0$1/ };
989 # the color
990 $color = join("", @colors);
991 # replace non-hex characters with 0
992 $color =~ tr/0-9a-f/0/c;
993
994 return "#" . $color;
995}
996
997
# spent 294ms (233+61.0) within Mail::SpamAssassin::HTML::name_to_rgb which was called 2325 times, avg 126µs/call: # 1610 times (171ms+37.7ms) by Mail::SpamAssassin::HTML::text_style at line 508, avg 130µs/call # 630 times (55.6ms+21.0ms) by Mail::SpamAssassin::HTML::text_style at line 519, avg 122µs/call # 85 times (6.13ms+2.30ms) by Mail::SpamAssassin::HTML::text_style at line 480, avg 99µs/call
sub name_to_rgb {
99823256.46ms my $color = lc $_[0];
99923254.55ms my $before = $color;
1000
1001 # strip leading and ending whitespace
1002232532.4ms232512.7ms $color =~ s/^\s*//;
# spent 12.7ms making 2325 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 5µs/call
1003232557.3ms232513.8ms $color =~ s/\s*$//;
# spent 13.8ms making 2325 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 6µs/call
1004
1005 # named colors
100623257.21ms my $hex = $html_color{$color};
100723254.54ms if (defined $hex) {
10083024.33ms return sprintf("#%06x", $hex);
1009 }
1010
1011 # IF NOT A NAME, IT SHOULD BE A HEX COLOR, HEX SHORTHAND or rgb values
1012202324.5ms20236.69ms if ($color =~ m/^[#a-f0-9]*$|rgb\([\d%, ]*\)/i) {
# spent 6.69ms making 2023 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
1013
1014 #Convert the RGB values to hex values so the hex parsing below can handle them
1015
1016 #RGB PERCENTS TO HEX
1017201726.9ms20173.61ms if ($color =~ m/rgb\((\d+)%,\s*(\d+)%,\s*(\d+)%\s*\)/i) {
# spent 3.61ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:match, avg 2µs/call
1018 $color = "#".dec2hex(int($1/100*255)).dec2hex(int($2/100*255)).dec2hex(int($3/100*255));
1019 }
1020
1021 #RGB DEC TO HEX
1022201715.4ms20173.37ms if ($color =~ m/rgb\((\d+),\s*(\d+),\s*(\d+)\s*\)/i) {
# spent 3.37ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:match, avg 2µs/call
1023 $color = "#".dec2hex($1).dec2hex($2).dec2hex($3);
1024 }
1025
1026 #PARSE THE HEX
1027201733.5ms20176.08ms if ($color =~ m/^#/) {
# spent 6.08ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
1028 # strip to hex only
1029201731.1ms201714.4ms $color =~ s/[^a-f0-9]//ig;
# spent 14.4ms making 2017 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 7µs/call
1030
1031 # strip to 6 if greater than 6
103220174.68ms if (length($color) > 6) {
1033 $color=substr($color,0,6);
1034 }
1035
1036 # strip to 3 if length < 6)
103720175.06ms if (length($color) > 3 && length($color) < 6) {
1038 $color=substr($color,0,3);
1039 }
1040
1041 # pad right-hand-side to a multiple of three
104220175.26ms $color .= "0" x (3 - (length($color) % 3)) if (length($color) % 3);
1043
1044 #DUPLICATE SHORTHAND HEX
104520179.17ms if (length($color) == 3) {
104666823µs66340µs $color =~ m/(.)(.)(.)/;
# spent 340µs making 66 calls to Mail::SpamAssassin::HTML::CORE:match, avg 5µs/call
1047661.25ms $color = "$1$1$2$2$3$3";
1048 }
1049
1050 } else {
1051 return "invalid";
1052 }
1053
1054 } else {
1055 #INVALID
1056
1057 #??RETURN BLACK SINCE WE DO NOT KNOW HOW THE MUA / BROWSER WILL PARSE
1058 #$color = "000000";
1059
1060648µs return "invalid";
1061 }
1062
1063 #print "DEBUG: before/after name_to_rgb new version: $before/$color\n";
1064
1065201724.7ms return "#" . $color;
1066}
1067
1068sub dec2hex {
1069 my ($dec) = @_;
1070 my ($pre) = '';
1071
1072 if ($dec < 16) {
1073 $pre = '0';
1074 }
1075
1076 return sprintf("$pre%lx", $dec);
1077}
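
dec2hex pads single-digit hex values with a leading zero by hand. As a small sketch, assuming non-negative integer input, the same padding can be expressed with sprintf's zero-padded width. The name `dec2hex_sketch` is hypothetical, and the surrounding conversion mirrors the "RGB DEC TO HEX" step at line 1022.

```perl
use strict;
use warnings;

# Hypothetical equivalent of dec2hex: "%02x" pads to at least two hex
# digits, so values below 16 get a leading zero.
sub dec2hex_sketch {
    my ($dec) = @_;
    return sprintf("%02x", $dec);
}

my $color = "rgb(255, 0, 128)";
if ($color =~ m/rgb\((\d+),\s*(\d+),\s*(\d+)\s*\)/i) {
    $color = "#" . join("", map { dec2hex_sketch($_) } $1, $2, $3);
}
print "$color\n";   # prints "#ff0080"
```

Like the original, this does not clamp components above 255; those produce more than two hex digits per component.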
1078
1079
108021.91ms2476µs
# spent 257µs (38+219) within Mail::SpamAssassin::HTML::BEGIN@1080 which was called: # once (38µs+219µs) by Mail::SpamAssassin::Message::Node::BEGIN@45 at line 1080
use constant URI_STRICT => 0;
# spent 257µs making 1 call to Mail::SpamAssassin::HTML::BEGIN@1080 # spent 219µs making 1 call to constant::import
1081
1082# resolving relative URIs as defined in RFC 2396 (steps from section 5.2)
1083# using draft http://www.gbiv.com/protocols/uri/rev-2002/rfc2396bis.html
1084
# spent 191ms (145+46.9) within Mail::SpamAssassin::HTML::_parse_uri which was called 5410 times, avg 35µs/call: # 2705 times (75.4ms+32.4ms) by Mail::SpamAssassin::HTML::target_uri at line 1131, avg 40µs/call # 2705 times (69.3ms+14.5ms) by Mail::SpamAssassin::HTML::target_uri at line 1132, avg 31µs/call
sub _parse_uri {
108554109.98ms my ($u) = @_;
108654108.93ms my %u;
10875410110ms541046.9ms ($u{scheme}, $u{authority}, $u{path}, $u{query}, $u{fragment}) =
# spent 46.9ms making 5410 calls to Mail::SpamAssassin::HTML::CORE:match, avg 9µs/call
1088 $u =~ m|^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
1089541091.9ms return %u;
1090}
1091
1092
# spent 459ms (368+91.1) within Mail::SpamAssassin::HTML::_remove_dot_segments which was called 2703 times, avg 170µs/call: # 2702 times (368ms+91.1ms) by Mail::SpamAssassin::HTML::target_uri at line 1145, avg 170µs/call # once (35µs+19µs) by Mail::SpamAssassin::HTML::target_uri at line 1151
sub _remove_dot_segments {
109327035.88ms my ($input) = @_;
109427035.25ms my $output = "";
1095
1096270334.3ms27036.43ms $input =~ s@^(?:\.\.?/)@/@;
# spent 6.43ms making 2703 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 2µs/call
1097
109827039.91ms while ($input) {
10998468358ms2540484.7ms if ($input =~ s@^/\.(?:$|/)@/@) {
# spent 84.7ms making 25404 calls to Mail::SpamAssassin::HTML::CORE:subst, avg 3µs/call
1100 }
1101 elsif ($input =~ s@^/\.\.(?:$|/)@/@) {
1102 $output =~ s@/?[^/]*$@@;
1103 }
1104 elsif ($input =~ s@(/?[^/]*)@@) {
1105846829.9ms $output .= $1;
1106 }
1107 }
1108270321.6ms return $output;
1109}
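
The loop in _remove_dot_segments consumes the input one path segment at a time: a "/./" prefix is dropped, a "/../" prefix also removes the last segment already written to the output, and anything else is moved to the output unchanged. The sketch below copies that logic under the hypothetical name `remove_dot_segments_sketch`, using brace delimiters for readability, and runs it on two example paths.

```perl
use strict;
use warnings;

# Copy of the logic at lines 1093 to 1108, with s{...}{...} delimiters.
sub remove_dot_segments_sketch {
    my ($input) = @_;
    my $output = "";

    # A leading "./" or "../" is reduced to "/".
    $input =~ s{^(?:\.\.?/)}{/};

    while ($input) {
        if ($input =~ s{^/\.(?:$|/)}{/}) {
            # "/./" or trailing "/.": skip the segment
        }
        elsif ($input =~ s{^/\.\.(?:$|/)}{/}) {
            # "/../" or trailing "/..": drop the last output segment
            $output =~ s{/?[^/]*$}{};
        }
        elsif ($input =~ s{(/?[^/]*)}{}) {
            # ordinary segment: move it to the output
            $output .= $1;
        }
    }
    return $output;
}

print remove_dot_segments_sketch('/a/b/../c/./d'), "\n";      # prints "/a/c/d"
print remove_dot_segments_sketch('/a/b/c/./../../g'), "\n";   # prints "/a/g"
```

Each pass through the loop runs up to three substitutions against the front of the string, which is consistent with the 25404 CORE:subst calls the profile attributes to line 1099.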
1110
1111sub _merge_uri {
1112 my ($base_authority, $base_path, $r_path) = @_;
1113
1114 if (defined $base_authority && !$base_path) {
1115 return "/" . $r_path;
1116 }
1117 else {
1118 if ($base_path =~ m|/|) {
1119 $base_path =~ s|(?<=/)[^/]*$||;
1120 }
1121 else {
1122 $base_path = "";
1123 }
1124 return $base_path . $r_path;
1125 }
1126}
1127
1128
# spent 926ms (276+650) within Mail::SpamAssassin::HTML::target_uri which was called 2705 times, avg 342µs/call: # 2705 times (276ms+650ms) by Mail::SpamAssassin::HTML::push_uri at line 329, avg 342µs/call
sub target_uri {
112927055.68ms my ($base, $r) = @_;
1130
1131270530.6ms2705108ms my %r = _parse_uri($r); # parsed relative URI
# spent 108ms making 2705 calls to Mail::SpamAssassin::HTML::_parse_uri, avg 40µs/call
1132270527.1ms270583.7ms my %base = _parse_uri($base); # parsed base URI
# spent 83.7ms making 2705 calls to Mail::SpamAssassin::HTML::_parse_uri, avg 31µs/call
113327054.40ms my %t; # generated temporary URI
1134
113527056.68ms if ((not URI_STRICT) and
1136 (defined $r{scheme} && defined $base{scheme}) and
1137 ($r{scheme} eq $base{scheme}))
1138 {
1139 undef $r{scheme};
1140 }
1141
114227059.61ms if (defined $r{scheme}) {
114327027.98ms $t{scheme} = $r{scheme};
114427027.44ms $t{authority} = $r{authority};
1145270222.1ms2702459ms $t{path} = _remove_dot_segments($r{path});
# spent 459ms making 2702 calls to Mail::SpamAssassin::HTML::_remove_dot_segments, avg 170µs/call
1146270214.3ms $t{query} = $r{query};
1147 }
1148 else {
1149312µs if (defined $r{authority}) {
115013µs $t{authority} = $r{authority};
115118µs154µs $t{path} = _remove_dot_segments($r{path});
# spent 54µs making 1 call to Mail::SpamAssassin::HTML::_remove_dot_segments
115213µs $t{query} = $r{query};
1153 }
1154 else {
115529µs if ($r{path} eq "") {
115627µs $t{path} = $base{path};
115728µs if (defined $r{query}) {
1158 $t{query} = $r{query};
1159 }
1160 else {
116126µs $t{query} = $base{query};
1162 }
1163 }
1164 else {
1165 if ($r{path} =~ m|^/|) {
1166 $t{path} = _remove_dot_segments($r{path});
1167 }
1168 else {
1169 $t{path} = _merge_uri($base{authority}, $base{path}, $r{path});
1170 $t{path} = _remove_dot_segments($t{path});
1171 }
1172 $t{query} = $r{query};
1173 }
117427µs $t{authority} = $base{authority};
1175 }
117638µs $t{scheme} = $base{scheme};
1177 }
117827057.56ms $t{fragment} = $r{fragment};
1179
1180 # recompose URI
118127055.18ms my $result = "";
118227059.34ms if ($t{scheme}) {
118327028.46ms $result .= $t{scheme} . ":";
1184 }
1185 elsif (defined $t{authority}) {
1186 # this block is not part of the RFC
1187 # TODO: figure out what MUAs actually do with unschemed URIs
1188 # maybe look at URI::Heuristic
1189118µs26µs if ($t{authority} =~ /^www\d*\./i) {
# spent 6µs making 2 calls to Mail::SpamAssassin::HTML::CORE:match, avg 3µs/call
1190 # some spammers are using unschemed URIs to escape filters
1191 $result .= "http:";
1192 }
1193 elsif ($t{authority} =~ /^ftp\d*\./i) {
1194 $result .= "ftp:";
1195 }
1196 }
119727059.47ms if ($t{authority}) {
119826547.34ms $result .= "//" . $t{authority};
1199 }
120027059.05ms $result .= $t{path};
120127055.65ms if (defined $t{query}) {
12023281.62ms $result .= "?" . $t{query};
1203 }
120427055.05ms if (defined $t{fragment}) {
1205832µs $result .= "#" . $t{fragment};
1206 }
1207270564.2ms return $result;
1208}
1209
12101157µs1;
1211__END__
 
# spent 1.16s within Mail::SpamAssassin::HTML::CORE:match which was called 231886 times, avg 5µs/call: # 46892 times (96.6ms+0s) by Mail::SpamAssassin::HTML::html_tag at line 270, avg 2µs/call # 33070 times (95.3ms+0s) by Mail::SpamAssassin::HTML::html_text at line 754, avg 3µs/call # 29608 times (324ms+0s) by Mail::SpamAssassin::HTML::html_text at line 745, avg 11µs/call # 29444 times (193ms+0s) by Mail::SpamAssassin::HTML::html_text at line 759, avg 7µs/call # 28303 times (184ms+0s) by Mail::SpamAssassin::HTML::text_style at line 496, avg 7µs/call # 16907 times (66.7ms+0s) by Mail::SpamAssassin::HTML::html_whitespace at line 310, avg 4µs/call # 16542 times (56.7ms+0s) by Mail::SpamAssassin::HTML::html_uri at line 355, avg 3µs/call # 5410 times (46.9ms+0s) by Mail::SpamAssassin::HTML::_parse_uri at line 1087, avg 9µs/call # 4740 times (27.7ms+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 554, avg 6µs/call # 4739 times (18.3ms+0s) by Mail::SpamAssassin::HTML::html_font_invisible at line 561, avg 4µs/call # 3213 times (11.5ms+0s) by Mail::SpamAssassin::HTML::close_table_tag at line 409, avg 4µs/call # 2221 times (6.29ms+0s) by Mail::SpamAssassin::HTML::text_style at line 500, avg 3µs/call # 2023 times (6.69ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1012, avg 3µs/call # 2017 times (6.08ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1027, avg 3µs/call # 2017 times (3.61ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1017, avg 2µs/call # 2017 times (3.37ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1022, avg 2µs/call # 776 times (5.00ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 632, avg 6µs/call # 776 times (4.48ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 628, avg 6µs/call # 391 times (1.43ms+0s) by Mail::SpamAssassin::HTML::html_text at line 763, avg 4µs/call # 272 times (1.77ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 664, avg 7µs/call # 198 times (724µs+0s) by Mail::SpamAssassin::HTML::text_style at line 483, avg 4µs/call # 175 times (1.71ms+0s) by Mail::SpamAssassin::HTML::html_tests at line 611, avg 10µs/call # 67 times (344µs+0s) by Mail::SpamAssassin::HTML::html_declaration at line 789, avg 5µs/call # 66 times (340µs+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1046, avg 5µs/call # 2 times (6µs+0s) by Mail::SpamAssassin::HTML::target_uri at line 1189, avg 3µs/call
sub Mail::SpamAssassin::HTML::CORE:match; # opcode
# spent 562ms within Mail::SpamAssassin::HTML::CORE:subst which was called 101699 times, avg 6µs/call: # 29608 times (251ms+0s) by Mail::SpamAssassin::HTML::display_text at line 698, avg 8µs/call # 25404 times (84.7ms+0s) by Mail::SpamAssassin::HTML::_remove_dot_segments at line 1099, avg 3µs/call # 14215 times (60.7ms+0s) by Mail::SpamAssassin::HTML::display_text at line 705, avg 4µs/call # 14202 times (56.8ms+0s) by Mail::SpamAssassin::HTML::display_text at line 693, avg 4µs/call # 4261 times (21.9ms+0s) by Mail::SpamAssassin::HTML::canon_uri at line 340, avg 5µs/call # 4261 times (16.1ms+0s) by Mail::SpamAssassin::HTML::canon_uri at line 339, avg 4µs/call # 2703 times (6.43ms+0s) by Mail::SpamAssassin::HTML::_remove_dot_segments at line 1096, avg 2µs/call # 2325 times (13.8ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1003, avg 6µs/call # 2325 times (12.7ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1002, avg 5µs/call # 2017 times (14.4ms+0s) by Mail::SpamAssassin::HTML::name_to_rgb at line 1029, avg 7µs/call # 189 times (14.8ms+0s) by Mail::SpamAssassin::HTML::parse at line 244, avg 78µs/call # 189 times (9.08ms+0s) by Mail::SpamAssassin::HTML::parse at line 248, avg 48µs/call
sub Mail::SpamAssassin::HTML::CORE:subst; # opcode
# spent 28.5ms within Mail::SpamAssassin::HTML::CORE:substcont which was called 1404 times, avg 20µs/call: # 1404 times (28.5ms+0s) by Mail::SpamAssassin::HTML::parse at line 248, avg 20µs/call
sub Mail::SpamAssassin::HTML::CORE:substcont; # opcode