Gessel On…

…this and that.

Thursday, August 7, 2008

Fixing ImageMagick resize in Postie

I noticed that postie was doing a terrible job at resizing images.

It turns out that the default GD library isn’t very good at resizing: it does a simple subsample, and the result is quite jagged (see the GD version of this image in this post).

The full size view of our camp and Carolyn.

I think the version above looks a lot better. It should have been as easy as turning on the “use ImageMagick” option in the postie config, but it wasn’t that simple: two files were not where postie expected them to be. The easy one is “convert”, which postie expects to find at /usr/bin/convert but which under BSD is actually at /usr/local/bin/convert. This isn’t a big deal, as there’s a config option to point postie in the right direction. A bit harder is ImageMagick’s “identify”, which postie expects to find at /usr/bin/identify but for which there is no config entry.

The fix for BSD is to edit postie-functions.php around line 1768, changing /usr/bin/identify to /usr/local/bin/identify, either before the first run or followed by resetting postie to its defaults. If you’ve already installed postie and don’t want to reset the defaults, you may need to edit the postie config in the database directly (I did), using, for example, phpMyAdmin, and set the value of IMAGEMAGICK_IDENTIFY to /usr/local/bin/identify.
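If you’d rather not hunt for the line by hand, here is a minimal Python sketch that looks for identify in the usual FreeBSD and Linux locations and rewrites the hard-coded /usr/bin/identify in the text of postie-functions.php. The search directories and both function names are my own assumptions, not anything postie provides, and it’s worth backing up the file first.

```python
import os


def find_binary(name, dirs=("/usr/local/bin", "/usr/bin")):
    """Return the first executable called `name` in `dirs`, or None (assumed search order)."""
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def patch_identify_path(source, new_path="/usr/local/bin/identify"):
    """Replace postie's hard-coded identify path in PHP source text."""
    # "/usr/local/bin/identify" does not contain "/usr/bin/identify",
    # so running this twice leaves the file unchanged.
    return source.replace("/usr/bin/identify", new_path)
```

To use it, read postie-functions.php, pass its text through patch_identify_path() with the path find_binary("identify") reports (after checking it isn’t None), and write the result back.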

And thus one gets nice, pretty postie thumbnails.

posted at 02:16:44 more on... FreeBSD, photo, technology  

Sunday, April 6, 2008

PHP, Pear, pspell and a core dump


I’ve been getting core dumps from httpd since doing an update that included PHP. This had happened to me before, and I thought I’d try the same solution again, but it didn’t work. Pear was due for an update, and portupgrade -ra would get to it and error out. Trying to force it manually was a dead end:
install ok: channel://pear.php.net/Console_Getopt-1.2.2
install ok: channel://pear.php.net/Structures_Graph-1.0.2
*** Signal 11

I couldn’t find any help on pear.php.net beyond the suggestion that it was a PHP problem. That seemed more likely when I found that
# php -v
yielded
segmentation fault (core dumped)

Many fingers point to Zend, and a few to recode.so, but one pointed to pspell.so.

I deleted that line from my .../etc/php/extensions.ini and voila:

claudel# php -v
PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Apr 5 2008 16:51:20)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
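Commenting the line out instead of deleting it makes it easy to restore later. A minimal Python sketch, assuming extensions.ini holds one extension=name.so directive per line (which is how mine looked) and that “;” starts a comment, as it does in PHP ini files; the function name is hypothetical.

```python
def disable_extension(ini_text, module="pspell.so"):
    """Comment out any active `extension=<module>` line in PHP ini text."""
    out = []
    for line in ini_text.splitlines(keepends=True):
        stripped = line.strip()
        # Only touch active (uncommented) directives that load this module.
        if not stripped.startswith(";") and stripped.replace(" ", "") == f"extension={module}":
            line = ";" + line
        out.append(line)
    return "".join(out)
```

After writing the edited text back, php -v is a quick check that the CLI no longer segfaults.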

I recompiled the whole PHP dependency tree with -O2, and it still works fine; I could even update Pear right up to 1.7.1.

posted at 01:56:34 more on... FreeBSD, technology  

Monday, October 22, 2007

fixing GeoIP for awstats

http://forum.maxmind.com/viewtopic.php?t=27 helped, but the real key was hardcoding the database location in geoip.pm, changing line 63 from

if (! $datafile) { $datafile="GeoIP.dat"; }

to

if (! $datafile) { $datafile="/path/to/GeoIP.dat"; }
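My guess as to why hardcoding matters: a bare “GeoIP.dat” is a relative path, so it gets looked up in whatever directory the awstats process happens to be running in, not where the module lives. The short Python sketch below only illustrates that general behavior of relative paths; the temporary directories stand in for hypothetical data and working directories.

```python
import os
import tempfile

# Hypothetical data directory holding GeoIP.dat, plus some other working directory.
data_dir = tempfile.mkdtemp()
open(os.path.join(data_dir, "GeoIP.dat"), "wb").close()
other_dir = tempfile.mkdtemp()


def datafile_found(datafile, cwd):
    """Check whether `datafile` is reachable when running from `cwd`."""
    old = os.getcwd()
    os.chdir(cwd)
    try:
        return os.path.exists(datafile)
    finally:
        os.chdir(old)


relative = "GeoIP.dat"
absolute = os.path.join(data_dir, "GeoIP.dat")
print(datafile_found(relative, data_dir))   # True: cwd happens to be the data dir
print(datafile_found(relative, other_dir))  # False: the relative lookup misses
print(datafile_found(absolute, other_dir))  # True: an absolute path works anywhere
```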

posted at 10:45:14 more on... FreeBSD, technology  

Thursday, October 18, 2007

Eliminating Spam with Procmail and SpamAssassin

For years I’ve fought spam with all sorts of techniques, including some limited server-side tricks such as setting my Postfix rules to very strict adherence and using RBLs, but I ultimately settled on whitelist filtering in my trusty Eudora client, POPping all that spam over whatever international airport dialup I happened to be on and cursing it even as it disappeared into the UBC folder for bulk deletion.

And I dreamed of the day when I would switch to IMAP and set up all those cool anti-spam server-side techniques I’d been reading about, primarily SpamAssassin. The problem with spam filtering is that it often catches your friends.

Then I found a great procmail filter that whitelisted on the server side and sent confirmation requests to unlisted addresses. So I installed Procmail on my server, then SpamAssassin, and rewrote the filter below to do just what I wanted:

My .procmailrc

(more…)
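The decision such a filter makes can be sketched independently of procmail syntax. Below is a minimal Python illustration, not my .procmailrc: it assumes SpamAssassin has already run and added its standard X-Spam-Flag: YES header to spam, and the whitelist contents, the return labels, and the order of the checks (whitelist first, so friends never get caught) are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist; a real setup would keep this in a file on the server.
WHITELIST = {"friend@example.com", "mom@example.org"}


def route(raw_message, whitelist=WHITELIST):
    """Decide what to do with a message: 'inbox', 'spam', or 'challenge'."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in whitelist:
        return "inbox"      # known correspondents bypass all filtering
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return "spam"       # SpamAssassin already flagged it
    return "challenge"      # unknown sender: send a confirmation request
```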

posted at 20:49:37 more on... FreeBSD, technology  

Sunday, October 14, 2007

Courier IMAP 4.2.0 breaks SSL-Authd on 993

Updating to Courier IMAP 4.2.0_1 broke authentication with SSL on 993 for roundcube (and perhaps others) but not for Thunderbird. The following worked for me:

Change:

TLS_PROTOCOL=SSL3

to:

TLS_PROTOCOL=SSL23

in /usr/local/etc/courier-imap/imapd-ssl

(from this fine site)

posted at 13:12:47 more on... FreeBSD, technology  

Monday, October 8, 2007

I hate thunderbird

So once, long ago, I moved to IMAP on my server. I wanted to make the move with the trusty Eudora client I’ve been using since about 1993. Sure, I flirted with other mail systems, but they screwed me, and what I care about most is:

  1. Never Lose Data (early versions of Thunderbird were not so good about this for me)
  2. Search my several gigabyte database of mail fast enough to be useful.

Outlook is absolutely intolerable on this last point. Search in all Microsoft products, indexed or not, is so painfully slow one might as well go on vacation. It is incomprehensible to me how it can suck so bad. I remember in 1990 using OnLocation and searching my entire computer (all 20MB of it) in a fraction of a second. Sure, it was less data, but it was also doing it on a 33MHz 68030.

Eudora lets me find my mail. Eudora lets me get my job done. Unfortunately Eudora can’t search an IMAP mailbox unless it is connected. WTF?

OK, time for Thunderbird. But Thunderbird is so not ready for prime time. First, there are massive delays opening any of my larger mailboxes, even just to show the titles (14,000 messages in a mailbox is NOT too many; who uses this? Kids?). Second, it gets confused easily when communicating with the IMAP server, which tends to lock it up indefinitely. Still, it does cache locally, and the built-in search, while interminably slow, is faster than Microsoft Search (but it doesn’t search across accounts! Hello!). I’m hoping Google Desktop Search will help; initial results are promising. And Penelope could be very cool, especially if they add indexed search.

One little change I had to make for Thunderbird was given at this fine site:

perl -p -i -e 's/^MAXDAEMONS=40/MAXDAEMONS=80/g' /usr/local/etc/courier-imap/imapd

perl -p -i -e 's/^MAXPERIP=4/MAXPERIP=40/g'  /usr/local/etc/courier-imap/imapd
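For reference, the two perl one-liners are anchored substitutions applied line by line. Doing the same edit on the whole file in Python needs re.MULTILINE so that ^ matches at the start of every line, not just the start of the file. In this sketch (function name my own) I’ve also anchored the end of the line, so running it twice doesn’t turn MAXPERIP=40 into MAXPERIP=400, which re-running the original perl would do.

```python
import re


def bump_imapd_limits(conf_text):
    """Apply the same edits as the perl one-liners to imapd config text."""
    # re.MULTILINE makes ^ and $ anchor at each line, like perl -p working per line.
    conf_text = re.sub(r"^MAXDAEMONS=40$", "MAXDAEMONS=80", conf_text, flags=re.MULTILINE)
    conf_text = re.sub(r"^MAXPERIP=4$", "MAXPERIP=40", conf_text, flags=re.MULTILINE)
    return conf_text
```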
posted at 15:45:05 more on... FreeBSD, reviews, technology  

Saturday, October 6, 2007

updating bothers

I recently did a portupgrade -ra on my server after a period of complacency. It was prompted by having to clean out my MySQL logs after they ate up 30GB of disk space and caused some table corruption.

Anyway, the key details were that
apache+mod_ssl-1.3.37+2.8.28 > needs updating (port has 1.3.39+2.8.30)
php5-5.2.3 > needs updating (port has 5.2.4)

(among about 50 others)

Some foreshadowing... once I updated and rebooted, all I got in /var/log/messages was
kernel: pid 1127 (httpd), uid 0: exited on signal 11 (core dumped)
and in /var/log/httpd-error.log only
[info] mod_unique_id: using ip addr 66.93.181.130
every time I ran “apachectl start” (even after setting the Apache log level to “debug”).

No go.
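When the logs are this terse, it can at least help to tally which processes are dying and on which signal. A small Python sketch that pulls those fields out of kernel lines shaped like the /var/log/messages line above; the regex is fitted to that single example, so treat the format as an assumption.

```python
import re
from collections import Counter

# Matches e.g. "kernel: pid 1127 (httpd), uid 0: exited on signal 11 (core dumped)"
CRASH_RE = re.compile(r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)")


def count_crashes(lines):
    """Count (process name, signal number) pairs in syslog lines."""
    counts = Counter()
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            counts[(m.group(2), int(m.group(4)))] += 1
    return counts
```

Passing it an open /var/log/messages file object and printing count_crashes(f).most_common() gives a quick summary of which daemons are crashing.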

Much email searching ensued, but Torfinn Ingolfsen on the freebsd-stable mailing list suggested looking at PHP. It turned out that disabling php.so in httpd.conf got Apache sort of working, but that was no help, so I thought, eh, why not migrate to Apache 2.2.6?

That helped a lot. At first the default config wouldn’t run with SSL (it crashed), but the log offered a hint:

[Tue Oct 02 11:30:26 2007] [info] mod_unique_id: using ip addr 66.93.181.130
[Tue Oct 02 11:30:27 2007] [info] Init: Seeding PRNG with 136 bytes of entropy
[Tue Oct 02 11:30:27 2007] [info] Loading certificate & private key of SSL-aware server
[Tue Oct 02 11:30:27 2007] [error] Server should be SSL-aware but has no certificate configured [Hint: SSLCertificateFile]

so I disabled SSL for the moment. But I was also getting periodic segfaults from Apache, with no details (even fewer than with 1.3.39). Disabling PHP made them go away, but at least Apache 2 was self-restarting, so aside from log pollution, no problem…

It occurred to me that the -O2 compiler optimization in my make.conf might be part of the problem, so I changed just that to -O1, recompiled PHP with portupgrade -rf, and the segfaults stopped. 5.2.3 had no trouble with -O2, but 5.2.4 doesn’t seem stable with -O2 optimization.

The SSL problem was that /usr/local/etc/apache22/httpd.conf had a bit at the end about the following being present to support starting SSL… blah blah, and the 3rd line from the bottom was “SSLEngine on”. Since I was already using extra/httpd-ssl.conf, the engine was being turned on twice. I commented that line out and everything seems fine now.
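A duplicate directive like this is easy to miss when it’s split across included files. Here is a rough Python check that lists every uncommented SSLEngine on line in the config texts you give it. It doesn’t follow Include directives or know which VirtualHost a line belongs to, so it’s only a first pass; the function name is my own.

```python
import re

# Apache directive names and the "on" argument are case-insensitive.
SSL_ON = re.compile(r"^\s*SSLEngine\s+on\b", re.IGNORECASE)


def active_sslengine_lines(files):
    """Given {filename: text}, return (filename, line number) per uncommented 'SSLEngine on'."""
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SSL_ON.match(line):  # lines starting with '#' never match
                hits.append((name, lineno))
    return hits
```

Feeding it the contents of httpd.conf and extra/httpd-ssl.conf and getting more than one hit for the same server is a sign worth chasing.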

posted at 11:00:15 more on... FreeBSD, technology  

Wednesday, September 12, 2007

Search Engine Enhancement

Getting timely search engine coverage of a site means people can find things soon after you change or post them.

Most search engines find linked pages every few days or so by following external links or manual URL submissions, but they won’t find unlinked pages or pages reachable only through broken links, and ranking and crawl efficiency are likely to be worse than for a site that is indexed for easy searching using a sitemap.

There are three basic steps to having a site optimally indexed:

  • Generating a Sitemap
  • Creating an appropriate robots.txt file
  • Informing search engines of the site’s existence

Sitemaps
It seems the world has settled on sitemaps for making search engines’ lives easier. There is no indication that a sitemap actually improves rank or crawl rate, but it seems likely that it does, or that it will soon. The format was created by Google and is supported by at least Google, Yahoo, Ask, and IBM. The reference is at sitemaps.org.

Google has created a Python script to generate a sitemap through a number of methods: walking the HTML path, walking the directory structure, parsing Apache-standard access logs, parsing external files, or direct entry. It seems to me that walking the server-side directory structure is the easiest and most accurate method. The script itself is on SourceForge. The directions are good, but if you’re only using the directory structure, the config.xml file can be edited down to something like:

<?xml version="1.0" encoding="UTF-8"?>
<site
  base_url="http://www.your-site.com/"
  store_into="/www/data-dist/your_site_directory/sitemap.xml.gz"
  verbose="1"
  >

 <url href="http://www.your-site.com/" />
 <directory
    path="/www/data-dist/your_site_directory"
    url="http://www.your-site.com/"
    default_file="index.html"
 />

</site>

Note that this will index every file on the site, which can make for a large sitemap. If you use your site for media files or file transfer, you might not want to index every part of it. In that case, you can use filters to block indexing of parts of the site or of certain file types. If you only want to index web pages, you might insert the following inside the <site> element:

 <filter  action="pass"  type="wildcard"  pattern="*.htm"           />
 <filter  action="pass"  type="wildcard"  pattern="*.html"          />
 <filter  action="pass"  type="wildcard"  pattern="*.php"           />
 <filter  action="drop"  type="wildcard"  pattern="*"               />
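Filter order matters. As I understand sitemap_gen.py, each URL is checked against the filters in the order given, the first one whose wildcard matches decides whether it passes or is dropped, and a URL that matches no filter is kept. The Python sketch below mimics that assumed behavior with fnmatch so you can try patterns before a full run; it is an illustration, not the script’s own code.

```python
from fnmatch import fnmatchcase

# The filters from the config above: (action, wildcard pattern), checked in order.
FILTERS = [
    ("pass", "*.htm"),
    ("pass", "*.html"),
    ("pass", "*.php"),
    ("drop", "*"),
]


def keep_url(url, filters=FILTERS):
    """Return True if the URL survives the filters (assumed first-match-wins)."""
    for action, pattern in filters:
        if fnmatchcase(url, pattern):
            return action == "pass"
    return True  # no filter matched: the URL is kept
```

Under these assumptions the bare http://www.your-site.com/ URL falls through to the final drop rule, so it’s worth checking the verbose output to see whether your front page made it in.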

Running the script with

python sitemap_gen.py --config=config.xml

will generate the sitemap.xml.gz file and put it in the right place. If the uncompressed file size is over 10MB, you’ll need to pare down the files listed. This can happen if the filters are more inclusive than what I’ve given, particularly if you have large photo or media directories or something like that and index all the media and thumbnail files.
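To see whether you’re near the limit without unpacking anything by hand, here is a small Python sketch that reads the gzipped sitemap and reports its uncompressed size and a rough URL count. The 10MB figure comes from above; the 50,000-URL cap is the other per-file limit in the sitemaps.org protocol. Counting “<url>” tags is a crude heuristic, and the function name is my own.

```python
import gzip

MAX_BYTES = 10 * 1024 * 1024  # 10MB uncompressed (10,485,760 bytes)
MAX_URLS = 50_000             # per-file URL limit in the sitemaps.org protocol


def check_sitemap(path):
    """Return (uncompressed size, URL count, within_limits) for a .xml.gz sitemap."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    urls = data.count(b"<url>")  # rough: counts opening <url> tags
    return len(data), urls, len(data) <= MAX_BYTES and urls <= MAX_URLS
```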

The sitemap will tend to get out of date. If you want to update it regularly, there are a few options: use a WordPress sitemap generator (if that’s what you’re using and indexing), which does the right thing and generates a sitemap from the data available to WordPress rather than from the file system (a good thing), and/or add a cron job to regenerate the sitemap regularly, for example

3  3  *  *  *  root python /path_to/sitemap_gen.py --config=/path_to/config.xml

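will regenerate the sitemap daily at 3:03 a.m. (this is /etc/crontab syntax, which includes the user field; drop “root” if you put it in a user crontab).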

robots.txt

The robots.txt file can be used to exclude certain search engines, for example MSN if you don’t like Microsoft for some reason and are willing to sacrifice traffic to make a point. It also points search engines to your sitemap file. There’s kind of a cool tool here that generates a robots.txt file for you, but a simple one might look like:

# MSNBot: an agent I don't like for some reason
User-agent: MSNBot
# The path it isn't allowed to traverse (everything)
Disallow: /
# Everything else may index the whole site except /cgi-bin/
User-agent: *
Disallow: /cgi-bin/
# Where my sitemap is
Sitemap: http://www.my_site.com/sitemap.xml.gz
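Python’s standard library happens to include a robots.txt parser, which makes a handy sanity check. This sketch feeds it a version of the example above written with “#” comments (the only comment syntax robots.txt has) and without an empty Disallow: line under User-agent: *. That omission matters for parsers like this one, which apply rules in order: an empty Disallow: means “allow everything” and would match before /cgi-bin/ is ever reached.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Agent I don't like for some reason
User-agent: MSNBot
Disallow: /
# For everything else
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.my_site.com/sitemap.xml.gz
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("MSNBot", "/index.html"))        # False
print(rp.can_fetch("Googlebot", "/index.html"))     # True
print(rp.can_fetch("Googlebot", "/cgi-bin/x.cgi"))  # False
print(rp.site_maps())  # ['http://www.my_site.com/sitemap.xml.gz']
```

site_maps() needs Python 3.8 or later; the can_fetch() checks work on older versions too.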

Telling the world

Search engines are supposed to do the work (that’s their job), and they should eventually find your robots.txt file, read the sitemap, and parse your site without any further assistance. But to expedite the process and possibly improve search results, there are submission tools at Yahoo, Ask, and particularly Google, which generally also allow you to add meta information.
Ask
Ask.com allows you to submit your sitemap via URL (and that seems to be all they do):
http://submissions.ask.com/ping?sitemap=http://www.your_site.com/sitemap.xml.gz
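The sitemaps.org protocol recommends URL-encoding the sitemap address in a ping like this. A tiny Python sketch that builds the request URL for the Ask endpoint above; the function name is my own, and it only constructs the string rather than sending anything.

```python
from urllib.parse import urlencode

ASK_PING = "http://submissions.ask.com/ping"


def ping_url(sitemap_url, endpoint=ASK_PING):
    """Build a sitemap ping URL with the sitemap address URL-encoded."""
    return endpoint + "?" + urlencode({"sitemap": sitemap_url})


print(ping_url("http://www.your_site.com/sitemap.xml.gz"))
# http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.your_site.com%2Fsitemap.xml.gz
```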


Yahoo
Yahoo has some site submission tools and supports site authentication, which means putting a random string in a file they can find to prove you have write-access to the server. Their tools are at
https://siteexplorer.search.yahoo.com/mysites


with submissions at
https://siteexplorer.search.yahoo.com/submit.php


You can submit sites and feeds. I usually use file authentication, which means creating a file named with one random string (y_key_random_string.html) and containing another random string as its only contents. They authenticate within 24 hours.
It isn’t clear whether submitting a feed also adds the site, though it looks like it does. If you don’t have a feed, you may not need to authenticate the site for submission.
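Creating the verification file is trivial, but for completeness here’s a Python sketch. Yahoo supplies both the file name and the string that goes inside it; the function name, document root, and values below are placeholders.

```python
from pathlib import Path


def write_verification_file(docroot, filename, token):
    """Write the verification token as the only contents of `filename` in the web root."""
    path = Path(docroot) / filename
    path.write_text(token)
    return path


# Placeholders: use the file name and string Yahoo gives you, and your own docroot.
# write_verification_file("/www/data-dist/your_site_directory",
#                         "y_key_random_string.html", "another_random_string")
```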
Google
Google has a lot of webmaster tools at
https://www.google.com/webmasters/tools/siteoverview?hl=en


The verification process is similar, but you don’t have to put anything inside the verification file, so

touch googlerandomstring.html

is all you need to get the verification file up. You then submit the sitemap URL directly.
Google also offers blog tools at
http://blogsearch.google.com/ping


where you can manually add the blog’s feed to Google’s blog search tool.

posted at 13:25:13 more on... FreeBSD, technology  
