Audio File Analysis With Sox
Sox is a cool program, a “Swiss Army knife of sound processing,” and a useful tool for checking audio files that belongs in anyone’s audio processing workflow. I thought it might be useful for detecting improperly encoded audio files, or files that have decayed due to bit rot or cosmic rays or other acoustic calamities, and it is.
Sox has two statistical output command line options, “stat” and “stats,” which output different but useful data. What makes sox useful for this, and what some metadata checking programs (like the very useful MP3Diags-unstable) don’t do, is that it actually decodes the file and computes stats from the audio data itself. This takes some time, about 0.7 seconds for a typical (5 minute) audio file. That may seem fast, and it is certainly much faster than real time, but if you want to process 22,000 files, it will take 4–5 hours.
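Both reports are produced by decoding the whole file: `sox file.mp3 -n stat` (the legacy report, written to stderr) and `sox file.mp3 -n stats`. Since the `stats` output is labeled rows, pulling fields out of a captured report is straightforward; a minimal sketch with made-up values (a real run prints the same field names, plus per-channel columns for stereo files):

```shell
# A captured `sox file.mp3 -n stats` report (values here are invented
# for illustration; field names match real sox output).
stats='DC offset   0.000015
Min level  -0.999969
Max level   0.999969
Pk lev dB      -0.00
Flat factor    0.00
Pk count        2
Length s      301.23'

# Pull individual fields by name; the value is the last
# whitespace-separated column on the matching line.
dc_offset=$(printf '%s\n' "$stats" | awk '/DC offset/ {print $NF}')
pk_count=$(printf '%s\n' "$stats" | awk '/Pk count/ {print $NF}')
echo "DC offset: $dc_offset, Pk count: $pk_count"
```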
Some of the specific values that are calculated seem to mean something obvious, like “Flat factor” is related to the maximum number of identical samples in a row – which would make the waveform “flat.” But the computation isn’t linear and there is a maximum value (>30 is a bad sign, usually).
So I wrote a little program to parse out the results and generate a csv file of all of the results in tabular form for analysis in LibreOffice Calc. I focused on a few variables I thought might be indicative of problems, rather than all of them:
- DC offset—which you’d hope was always close to zero.
- Min-Max level difference—min and max should be close to symmetric and usually are, but not always.
- RMS pk dB—which is normally set for -3 or -6 dB, but shouldn’t peak at nearly silent, -35 dB.
- Flat factor—which is most often 0, but frequently not.
- Pk count—the number of samples at peak, which is most often 2.
- Length s—the length of the file in seconds, which might indicate a play problem.
After processing 22,000 files, I gathered some statistics on what is “normal” (ish, for this set of files), which may be of some use in interpreting sox results. The source code for my little bash script is at the bottom of the post.
DC Bias
DC bias really should be very close to zero, and most files are fairly close, but some in the sample had a bias greater than 0.1, which even so has no perceptible audio impact.
Min Level – Max Level
Min level is most often normalized to -1 and max level most often normalized to +1, which would yield a difference of 2, or a difference of absolute values of 0 (as measured), and this is the most common result (31.13%). A few files, 0.05% or so, have a difference greater than 0.34, which is likely to be a problem and is worth a listen.
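The symmetry test is just arithmetic on the two reported levels: compare the absolute values of min and max. A sketch with a hypothetical asymmetric pair:

```shell
# Hypothetical levels from a sox stats report:
min_level=-0.999969
max_level=0.962341

# Difference of the absolute values; anything much over ~0.34 was
# rare in this data set and worth a listen.
delta=$(awk -v mn="$min_level" -v mx="$max_level" 'BEGIN {
  a = (mn < 0 ? -mn : mn); b = (mx < 0 ? -mx : mx)
  d = b - a; if (d < 0) d = -d
  printf "%.6f", d }')
echo "$delta"
```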
RMS pk dB
Peak dB is a pretty important parameter for an audio engineer to optimize, and common settings are -6 dB and -3 dB for various types of music; however, if a set of files is normalized as a group, individual files can be quite a bit lower or, sometimes, a bit higher. Some types of music, psychobilly for example, might be set even a little over -3 dB. A file much above -3 dB might have sound quality problems or might be corrupted to be just noise; 0.05% of files have a peak dB over -2.2 dB. A file with peak amplitudes much below -30 dB may be silent and certainly will be molto pianissimo; 0.05% of files have a peak dB below -31.2 dB.
A very quiet sample, with a Pk dB of -31.58, would likely suffer audible quantization noise, since the entire program uses only a small fraction of the available headroom.
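For a sense of scale, dB converts to a linear amplitude fraction as amp = 10^(dB/20), so a quick check of what a -31.58 dB peak means in linear terms:

```shell
# Peak amplitude as a fraction of full scale: amp = 10^(dB/20).
amp=$(awk 'BEGIN { printf "%.4f", 10^(-31.58/20) }')
echo "$amp"
```

That works out to about 2.6% of full-scale amplitude, so most of the quantization range goes unused.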
Flat factor
Flat factor is a complicated measure, but is roughly (though not exactly) the maximum number of consecutive identical samples. @AkselA offered a useful one-liner (sox -n -p synth 10 square 1 norm -3 | sox - -n stats) to verify that it is not, exactly, just a run of identical values; just what it actually is isn’t that well documented. Whatever it is exactly, 0 is the right answer and 68% of files get it right. Only 0.05% of files have a flat factor greater than 27.
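For intuition, the naive "longest run of identical samples" count (which, again, is related to but not the same as sox's actual flat factor metric) can be sketched over a toy sample list:

```shell
# Longest run of identical values in a toy sample sequence;
# sox's flat factor is related to, but not exactly, this count.
run=$(printf '%s\n' 0.5 0.5 0.5 0.1 0.1 0.7 | awk '
  { if (NR > 1 && $1 == prev) run++; else run = 1
    if (run > max) max = run
    prev = $1 }
  END { print max }')
echo "longest run: $run"
```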
Pk count
Peak count is a good way to measure clipping. 0.16% of files have a pk count over 1,000, but the most common value, at 65.5% of files, is 2, meaning most files are normalized to peak at 100%… exactly twice (log scale chart, the peak is at 2).
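Counting samples at full scale is the same idea; a toy version over a short list of normalized sample values:

```shell
# Count samples whose absolute value has hit full scale (1.0);
# more than a handful suggests clipping.
clipped=$(printf '%s\n' 1.0 -1.0 0.5 1.0 0.999 | awk '
  function abs(v) { return v < 0 ? -v : v }
  abs($1) >= 1 { n++ }
  END { print n + 0 }')
echo "samples at peak: $clipped"
```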
As an example, a file with levels set to -2.31 and a flat factor of only 14.31 but with a Pk count of 306,000 looks like this in Audacity with “Show Clipping” on, and yet sounds kinda like you’d think it is supposed to. Go figure.
Statistics
What’s life without statistics? Sample population: 22,096 files; 205 minutes run time, or 0.56 seconds per file.
| Stats | DC bias | min amp | max amp | min-max | avg pk dB | flat factor | pk count | length s |
|---|---|---|---|---|---|---|---|---|
| Mode | 0.000015 | -1 | 1 | 0 | -10.05 | 0.00 | 2 | 160 |
| Count at Mode | 473 | 7,604 | 7,630 | 6,879 | 39 | 14,940 | 14,472 | 14 |
| % at mode | 2.14% | 34.41% | 34.53% | 31.13% | 0.18% | 67.61% | 65.50% | 0.06% |
| Average | 0.00105 | -0.80 | 0.80 | 0.03 | -10.70 | 2.03 | 288.51 | 226.61 |
| Min | 0 | -1 | 0.0480 | 0 | -34.61 | 0 | 1 | 4.44 |
| Max | 0.12523 | -0.0478 | 1 | 0.497 | -1.25 | 129.15 | 306,000 | 7,176 |
| Threshold | 0.1 | -0.085 | 0.085 | 0.25 | -2.2 | 27 | 1,000 | 1,200 |
| Count @ Thld | 3 | 11 | 10 | 68 | 12 | 12 | 35 | 45 |
| % @ Thld | 0.01% | 0.05% | 0.05% | 0.31% | 0.05% | 0.05% | 0.16% | 0.20% |
Bash Script
#!/bin/bash
###############################################################
# This program uses sox to analyze an audio file for some
# common indicators that the actual file data may have issues
# such as corruption or have been badly prepared or modified.
# It takes a file path as an input and outputs to stdout the
# results of tests if that file exceeds the threshold values set
# below or, if the last conditional is commented out, all files.
# A typical invocation might be something like:
# find . -depth -type f -name "*.mp3" -exec soxverify.sh {} > stats.csv \;
# The code does not handle mono or multi-track (>2 channel) files
# and will throw an error. If sox can't read the file it will throw
# an error to the csv file. Flagged files probably warrant a sound check.
##############################################
### Set reasonable threshold values ##########

# DC offset should be close to zero, but is almost never exactly.
# The program uses the absolute value of DC offset (which can be
# negative or positive) as a test and is normalized to 1.0.
# If the value is high, total fidelity might be improved by
# using Audacity to remove the bias and recompressing.
# Files that exceed dc_offset_threshold will be output with
# error code "O".
dc_offset_threshold=0.1

# Most files have fairly symmetric min_level and max_level
# values. If the min and max aren't symmetric, there may
# be something wrong, so we compute and test. 99.95% of files have
# a delta below 0.34; files with a min_max_delta above
# min_max_delta_threshold will be flagged EC "D".
min_max_delta_threshold=0.34

# Average peak dB is a standard target for normalization, and
# replay gain is commonly used to adjust files or albums that weren't
# normalized to hit that value. 99.95% of files have an
# RMS_pk_dB < -2.2; higher than that is weird, check the sound.
# Exceeding this threshold generates EC "H".
RMS_pk_dB_threshold=-2.2

# Extremely quiet files might also be indicative of a problem,
# though some are simply molto pianissimo. 99.95% of files have
# a minimum RMS_pk_dB > -31.2. Files with an RMS pk dB <
# RMS_min_dB_threshold will be flagged with EC "Q".
RMS_min_dB_threshold=-31.2

# Flat_factor is a non-linear measure of sequential samples at the
# same level. 68% of files have a flat factor of 0, but this could
# be intentional for a track with moments of absolute silence.
# 99.95% of files have a flat factor < 27. Exceeding this threshold
# generates EC "F".
flat_factor_threshold=27

# pk_count is the number of samples at maximum volume and any value > 2
# is a strong indicator of clipping. 65% of files are mixed so that 2
# samples peak at max. However, a lot of "loud" music is engineered to
# clip: 8% of files have >100 "clipped" samples and in the data set
# 0.16% have >1,000. Exceeding this threshold generates EC "C".
pk_count_threshold=1000

# Zero length (in seconds) or extremely long files may be, depending on
# one's data set, indicative of some error. A file that plays back
# in less time than length_s_threshold will generate EC "S"; a
# file playing back longer than length_l_threshold: EC "L".
length_s_threshold=4
length_l_threshold=1200

# Check if a file path is provided as an argument
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <audio_file_path>"
    exit 1
fi

audio_file="$1"

# Check if the file exists
if [ ! -f "$audio_file" ]; then
    echo "Error: File not found - $audio_file"
    exit 1
fi

# Run sox with the stats option, remove newlines, and capture the output
sox_stats=$(sox "$audio_file" --replay-gain off -n stats 2>&1 | tr '\n' ' ')

# Clean up the output
sox_stats=$( sed 's/[ ]\+/ /g' <<< "$sox_stats" )
sox_stats=$( sed 's/^ //g' <<< "$sox_stats" )

# Check if the output contains "Overall" as a substring
if [[ ! "$sox_stats" =~ Overall ]]; then
    echo "Error: Unexpected output from sox: $1"
    echo "$sox_stats"
    echo ""
    exit 1
fi

# Extract and set variables
dc_offset=$(echo "$sox_stats" | cut -d ' ' -f 6)
min_level=$(echo "$sox_stats" | cut -d ' ' -f 11)
max_level=$(echo "$sox_stats" | cut -d ' ' -f 16)
RMS_pk_dB=$(echo "$sox_stats" | cut -d ' ' -f 34)
flat_factor=$(echo "$sox_stats" | cut -d ' ' -f 50)
pk_count=$(echo "$sox_stats" | cut -d ' ' -f 55)
length_s=$(echo "$sox_stats" | cut -d ' ' -f 67)

# Convert DC offset to absolute value
dc_offset=$(echo "$dc_offset" | tr -d '-')

# Convert min and max level to absolute values
abs_min_lev=$(echo "$min_level" | tr -d '-')
abs_max_lev=$(echo "$max_level" | tr -d '-')

# Compute delta and convert to absolute value
min_max_delta_int=$(echo "$abs_max_lev - $abs_min_lev" | bc -l)
min_max_delta=$(echo "$min_max_delta_int" | tr -d '-')

# Parse pk_count (sox abbreviates large counts with k/M suffixes)
pk_count=$( sed 's/k/000/' <<< "$pk_count" )
pk_count=$( sed 's/M/000000/' <<< "$pk_count" )

# Compare values against thresholds
threshold_failed=false
err_code="ERR: "

# Offset bad check
if (( $(echo "$dc_offset > $dc_offset_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="O"
fi

# Large delta check
if (( $(echo "$min_max_delta >= $min_max_delta_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="D"
fi

# Mix set too high check
if (( $(echo "$RMS_pk_dB > $RMS_pk_dB_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="H"
fi

# Very quiet file check
if (( $(echo "$RMS_pk_dB < $RMS_min_dB_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="Q"
fi

# Flat factor check
if (( $(echo "$flat_factor > $flat_factor_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="F"
fi

# Clipping check - peak is max and many samples are at peak
if (( $(echo "$max_level >= 1" | bc -l) )); then
    if (( $(echo "$pk_count > $pk_count_threshold" | bc -l) )); then
        threshold_failed=true
        err_code+="C"
    fi
fi

# Short file check
if (( $(echo "$length_s < $length_s_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="S"
fi

# Long file check
if (( $(echo "$length_s > $length_l_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="L"
fi

# For data collection purposes, comment out the conditional and the values
# for all found files will be output.
if [ "$threshold_failed" = true ]; then
    echo -e "$1" "\t" "$err_code" "\t" "$dc_offset" "\t" "$min_level" "\t" "$max_level" "\t" "$min_max_delta" "\t" "$RMS_pk_dB" "\t" "$flat_factor" "\t" "$pk_count" "\t" "$length_s"
fi
Smol bash script for finding oversize media files
Sometimes you want to know if you have media files that are taking up more than their fair share of space. Perhaps you compressed the file some time ago in an old, inefficient format, or you just need to archive the oversize stuff; this can help you find ’em. It’s different from simple file size detection in that it uses mediainfo to determine the media file length (and a variety of other useful data bits) and wc -c to get the size (so the data rate includes any file overhead), and from that computes the total effective data rate. All math is done with bc, which is usually installed. Files are found recursively (descending into sub-directories) from the starting point (passed as the first argument) using find.
Basic usage would be:
./find-high-rate-media.sh /search/path/tostart/ [min bpp] [min data rate] [min size] > oversize.csv 2>&1
The script will then report media with a rate higher than the minimum and a size larger than the minimum as a tab-delimited list of filenames, calculated rate, and calculated size. Redirecting the output to a file makes it easy to sort and otherwise manipulate in LibreOffice Calc as a tab-delimited file. The values are interpreted as minimums for suppression of output, so any file that exceeds all three minimum triggers will be output to the screen (or the .csv file if so redirected).
The script takes four command line variables:
- The starting directory [defaults to the directory the script is executed in]
- The minimum bits per pixel (including audio, sorry) for exclusion (i.e. more bpp and the filename will be output) [defaults to 0.25 bpp]
- The minimum data rate in kbps [defaults to 1 kbps, so files would by default only be excluded by bits-per-pixel rate]
- The minimum file size in megabytes [defaults to 1 MB, so files would by default only be excluded by bits-per-pixel rate]
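The bits-per-pixel figure the script reports is just total file bits divided by total pixels delivered (width × height × frame rate × duration). A worked example with made-up numbers:

```shell
# A hypothetical 150 MB, 1920x1080, 25 fps, 10-minute file:
bpp=$(awk -v bytes=150000000 -v w=1920 -v h=1080 -v fps=25 -v secs=600 \
  'BEGIN { printf "%.4f", bytes * 8 / (w * h * fps * secs) }')
echo "$bpp bits per pixel"
```

At roughly 0.039 bpp this file sits well under the default 0.25 bpp threshold and would be suppressed from the report.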
Save the file as a name you like (such as find-high-rate-media.sh) and # chmod +x find-high-rate-media.sh
and run it to find your oversized media.
!/usr/bin/bash ############################# USE ####################################################### # This creates a tab-delimeted CSV file of recursive directories of media files enumerating # key compression parameters. Note bits per pixel includes audio, somewhat necessarily given # the simplicity of the analysis. This can throw off the calculation. # find_media.sh /starting/path/ [min bits per pixel] [min data rate] [min file size mb] # /find-high-rate-media.sh /Media 0.2 400 0 > /recomp.csv 2>&1 # The "find" command will traverse the file system from the starting path down. # if output isn't directed to a CSV file, it will be written to screen. If directed to CSV # this will generate a tab delimted csv file with key information about all found media files # the extensions supported can be extended if it isn't complete, but verify that the # format is parsable by the tools called for extracting media information - mostly mediainfo # Typical bits per pixel range from 0.015 for a HVEC highly compressed file at the edge of obvious # degradation to quite a bit higher. Raw would be 24 or even 30 bits per pixel for 10bit raw. # Uncompressed YUV video is about 12 bpp. # this can be useful for finding under and/or overcompressed video files # the program will suppress output if the files bits per pixel is below the supplied threshold # to reverse this invert the rate test to " if (( $(bc <<<"$rate < $maxr") )); then..." # if a min data rate is supplied, output will be suppressed for files with a lower data rate # if a min file size is supplied, output will be suppressed for files smaller than this size ######################################################################################## # No argument given? 
if [ -z "$1" ]; then printf "\nUsage:\n starting by default in the current directory and searchign recusrively \n" dir="$(pwd)" else dir="$1" echo -e "starting in " $dir "" fi if [ -z "$2" ]; then printf "\nUsage:\n returning files with bits per pixel greater than default max of .25 bpp \n" maxr=0.25 else maxr=$2 echo -e "returning files with bits per pixel greater than " $maxr " bpp" fi if [ -z "$3" ]; then printf "\nUsage:\n returning files with data rate greater than default max of 1 kbps \n" maxdr=1 else maxdr=$3 echo -e "returning files with data rate greater than " $maxdr " kbps" fi if [ -z "$4" ]; then printf "\nUsage:\n no min file size provided returning files larger than 1MB \n" maxs=1 else maxs=$4 echo -e "returning files with file size greater than " $maxs " MB \n\n" fi msec="1000" kilo="1024" reint='^[0-9]+$' refp='^[0-9]+([.][0-9]+)?$' echo -e "file path \t rate bpp \t rate kbps \t V CODEC \t A CODEC \t Frame Size \t FPS \t Runtime \t size MB" find "$dir" -type f \( -iname \*.avi -o -iname \*.mkv -o -iname \*.mp4 -o -iname \*.wmv -iname \*.m4v \) -print0 | while read -rd $'\0' file do if [[ -f "$file" ]]; then bps="0.1" size="$(wc -c "$file" | awk '{print $1}')" duration="$(mediainfo --Inform="Video;%Duration%" "$file")" if ! [[ $duration =~ $refp ]] ; then duration=$msec fi seconds=$(bc -l <<<"${duration}/${msec}") sizek=$(bc -l <<<"scale=1; ${size}/${kilo}") sizem=$(bc -l <<<"scale=1; ${sizek}/${kilo}") rate=$(bc -l <<<"scale=1; ${sizek}/${seconds}") codec="$(mediainfo --Inform="Video;%Format%" "$file")" audio="$(mediainfo --Inform="Audio;%Format%" "$file")" framerate="$(mediainfo --Inform="General;%FrameRate%" "$file")" if ! [[ $framerate =~ $refp ]] ; then framerate=100 fi rtime="$(mediainfo --Inform="General;%Duration/String3%" "$file")" width="$(mediainfo --Inform="Video;%Width%" "$file")" if ! [[ $width =~ $reint ]] ; then width=1 fi height="$(mediainfo --Inform="Video;%Height%" "$file")" if ! 
[[ $height =~ $reint ]] ; then height=1 fi pixels=$(bc -l <<<"scale=1; ${width}*${height}*${seconds}*${framerate}") bps=$(bc -l <<<"scale=4; ${size}*8/${pixels}") if (( $(bc -l <<<"$bps > $maxr") )); then if (( $(bc -l <<<"$sizem > $maxs") )); then if (( $(bc -l <<<"$rate > $maxdr") )); then echo -e "$file" "\t" $bps "\t" $rate "\t" $codec "\t" $audio "\t" $width"x"$height "\t" $framerate "\t" $rtime "\t" $sizem fi fi fi fi done
Results might look like:
Another common task is renaming video files with some key stats on the contents so they’re easier to find and compare. Linux has limited integration with media information (Dolphin is somewhat capable, but Thunar not so much). This little script also leans on the mediainfo command line to append the following to the file name of media files recursively found below a starting directory path:
- WidthxHeight in pixels (e.g. 1920×1080)
- Runtime in HH-MM-SS.msec (e.g. 02-38-15.111) (colons aren’t a good thing in filenames; yes, it is confusingly like a date)
- CODEC name (e.g. AVC)
- Datarate (e.g. 1323kbps)
For example
kittyplay.mp4 -> kittyplay_1280x682_02-38-15.111_AVC_154.3kbps.mp4
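The splice itself is plain bash parameter expansion: strip the extension, rebuild the name with the stats, and re-attach the extension. A sketch with the stat fields hard-coded (in the real script they come from mediainfo):

```shell
file="kittyplay.mp4"
fname="${file%.*}"    # name without extension: kittyplay
ext="${file##*.}"     # extension alone: mp4
# Stat fields below are stand-ins for what mediainfo reports:
newname="${fname}_1280x682_02-38-15.111_AVC_154.3kbps.${ext}"
echo "$newname"
```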
The code is also available here.
#!/usr/bin/bash
PATH="/home/gessel/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
############################# USE #######################################################
# find_media.sh /starting/path/ (quote path names with spaces)
########################################################################################

# No argument given?
if [ -z "$1" ]; then
    printf "\nUsage:\n pass a starting point like \"/Downloads/Media files/\" \n"
    exit 1
fi

msec="1000"
kilo="1024"
s="_"
x="x"
kbps="kbps"
dot="."

find "$1" -type f \( -iname \*.avi -o -iname \*.mkv -o -iname \*.mp4 -o -iname \*.wmv \) -print0 |
while read -rd $'\0' file
do
    if [[ -f "$file" ]]; then
        size="$(wc -c "$file" | awk '{print $1}')"
        duration="$(mediainfo --Inform="Video;%Duration%" "$file")"
        seconds=$(bc -l <<<"${duration}/${msec}")
        sizek=$(bc -l <<<"scale=1; ${size}/${kilo}")
        sizem=$(bc -l <<<"scale=1; ${sizek}/${kilo}")
        rate=$(bc -l <<<"scale=1; ${sizek}/${seconds}")
        codec="$(mediainfo --Inform="Video;%Format%" "$file")"
        framerate="$(mediainfo --Inform="General;%FrameRate%" "$file")"
        rtime="$(mediainfo --Inform="General;%Duration/String3%" "$file")"
        runtime="${rtime//:/-}"
        width="$(mediainfo --Inform="Video;%Width%" "$file")"
        height="$(mediainfo --Inform="Video;%Height%" "$file")"
        fname="${file%.*}"
        ext="${file##*.}"
        mv "$file" "$fname$s$width$x$height$s$runtime$s$codec$s$rate$kbps$dot$ext"
    fi
done
If you don’t have mediainfo installed,
sudo apt update
sudo apt install mediainfo
Audio Compression for Speech
Speech is generally a special class of audio file where compression quality is rated more on intelligibility than on fidelity; though the two are related, the former may be optimized at the expense of the latter to achieve very low data rates. A few codecs have emerged as particularly adept at this specific class: Speex, Opus, and the latest, Google’s Lyra, a deep-learning-enhanced codec.
Lyra is focused on Android and requires a bunch of Java cruft to build and needs debugging. It didn’t seem worth the effort, but I appreciate the Deep Learning based compression, it is clearly the most efficient compression possible.
I couldn’t find a quick whatcha-need-to-know kind of summary of the codecs, so maybe this is useful:
Opus
On Ubuntu (and most Linux distros) you can install the Opus codec and supporting tools with a simple
# sudo apt install opus-tools
If you have ffmpeg installed, it provides a framework for dealing with IO and driving libopus from the command line, like:
# ffmpeg -i infile.mp3 -codec:a libopus -b:a 8k -cutoff 8000 outfile.opus
Aside from infile.(format) and outfile.opus, there are two command line options that make sense to mess with to get good results: the bit rate -b:a and the -cutoff frequency, which must be 4000 (narrowband), 6000 (mediumband), 8000 (wideband), 12000 (super wideband), or 20000 (fullband). The two parameters work together, and for speech, limiting the bandwidth saves bits for the speech itself.
There are various research papers on the significance of frequency components in speech intelligibility that range from about 4kHz to about 8kHz (and “sometimes higher”). I’d argue useful cutoffs are 6000 and 8000 for most applications. The fewer frequency components fed into the encoder, the more bps remain to encode the residual. There will be an optimum value which will maximize the subjective measure of intelligibility times the objective metric of average bit rate that has to be determined empirically for recording quality, speaker’s voice, and transmission requirements.
In my tests, with the sample voice I had to work with, an 8 kHz bandwidth made little perceptible difference to the quality of speech. 6 kbps VBR (-b:a 6k) compromised intelligibility; 8k did not, and 24k was not perceptibly compromised from the source.
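As a sanity check on output sizes, a target bit rate times duration predicts the file size closely (VBR will wander a bit); e.g. 8 kbps over a 5-minute clip:

```shell
# kbps * seconds / 8 bits-per-byte = approximate size in kilobytes.
kb=$(awk -v kbps=8 -v secs=300 'BEGIN { printf "%.0f", kbps * secs / 8 }')
echo "about ${kb} KB"
```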
One last option to consider might be -application, which yields subtle differences in encoding results. The choices are voip, which optimizes for speech; audio (the default), which optimizes for fidelity; and lowdelay, which minimizes latency for interactive applications.
# ffmpeg -i infile.mp3 -codec:a libopus -b:a 8k -application voip -cutoff 8000 outfile.opus
VLC player can play .opus files.
Speex
AFAIK, Speex isn’t callable by ffmpeg yet, but the speex package installs a tool, speexenc, that does the job.
# sudo apt install speex
Speexenc only eats raw and .wav files, the latter somewhat more easily managed. To convert an arbitrary input to wav, ffmpeg is your friend:
# ffmpeg -i infile.mp3 -f wav -bitexact -acodec pcm_s16le -ar 8000 -ac 1 wavfile.wav
Note the -ar 8000 option. This sets the sample rate to 8000 Hz. Speexenc will yield unexpected output data rates unless sample rates are 8000, 16000, or 32000, and these should correlate to the speexenc bandwidth option used in the compression step (speexenc doesn’t transcode to match): -n “narrowband,” -w “wideband,” and -u “ultra-wideband.”
# speexenc -n --quality 3 --vbr --comp 10 wavfile.wav outfile.spx
This sets the bandwidth to “narrow” (matching the 8k input sample rate), the quality to 3 (see table for data rates), enables VBR (not enabled by default with speex, but it is with Opus), and the “complexity” to 10 (speex defaults to 3 for faster encode, Opus defaults to 10), thus giving a pretty head-to-head comparison with the default Opus settings.
VLC can also play speex .spx files. yay VLC.
Results
The result is an 8 kbps stream which is, to my ear, more intelligible than Opus at 8 kbps – not 😮 better, but 😐 better. This is atypical; I expected Opus to be obviously better, and it wasn’t for this sample. I didn’t carefully evaluate the -application voip option, which would likely tip the scales. Clearly YMMV, so experiment.
Tagging MP3 Files with Puddletag on Linux Mint
A “fun” part of organizing an MP3 collection is harmonizing the tags so the data works consistently with whatever management schema you prefer. My preference is management by the file system—genre/artist/year/album/tracks works for me—but consistent meta-information is required and is often disharmonious. Finding metaharmony is a chore I find less taxing with a well-structured tag editor, and to my mind the ur-meta-tag manager is MP3TAG.
The problem is that it only works with that dead-end, spyware-riddled, failing legacyware called “Windows.” Fortunately, in Linux-land we have puddletag, a very solid clone of MP3TAG. The issue is that the version in the repositories is (as of this writing) 1.20 and I couldn’t find a PPA for the latest, 2.0.1. But compiling from source is super easy and works in both Linux Mint 19 and Ubuntu 20.04, and version 2.20 on 22.04, which contains my mods for latinization of foreign scripts (yay open source!):
- Install pre-reqs to build (don’t worry, if they’re installed, they won’t be double installed)
- get the tarball of the source code
- expand it (into a reasonable directory, like ~/projects)
- switch into that directory
- run the python executable “puddletag” directly to verify it is working
- install it
- tell the desktop manager it’s there – and it should be in your window manager along with the rest of your applications.
The latest version as of this post was 2.0.1 from https://github.com/puddletag/puddletag
sudo apt install python3-pyqt5 python3-pyqt5.qtsvg python3-pyparsing python3-mutagen python3-acoustid libchromaprint-dev libchromaprint-tools libchromaprint1
wget https://github.com/puddletag/puddletag/releases/download/2.0.1/puddletag-2.0.1.tar.gz
tar -xvf puddletag-2.0.1.tar.gz
cd puddletag-2.0.1/
cd puddletag
./puddletag
sudo python3 setup.py install
sudo desktop-file-install puddletag.desktop
A nice feature is that the configuration directory is portable and takes your complete customization with you – it is an extremely customizable program, so you can generally configure it to fit your mental model. Just copy the entire puddletag directory located at ~/.config/puddletag.
The Daily, from the NYT, weirdly slow
From my distant location overseas, listening to the news via podcasts is a great way to keep up: something I’m quite grateful to have access to on demand and via the internets. Until the end of Net Neutrality means that only “Verizon Insights” and “Life at AT&T” are still accessible, I enjoy a range of news sources on a daily basis using a podcatcher called Beyond Pod. One of the essential features it has is the ability to speed up the tempo of podcasts, some of which are a bit slow as recorded. But one…. one is like dripping molasses on a winter day: The Daily from the NYT by Michael Barbaro. I’m pretty sure silences are inserted in editing to draw out the drama to infuriating lengths and the tempo of the audio is selectively slowed to about half normal speed. Nobody can actually talk that slowly. I mean listen and try – like actually try to draw out a word that might take 1 second to pronounce to two full seconds. It is a pretty good news summary and has some useful information, but there’s no way I’d suffer through it without setting the tempo to 2x.
Every time I accidentally stream the podcast, rather than downloading and playing, the tempo control is disabled, and while I scramble to skip to the next podcast before I start questioning reality, I often wonder for a moment just how bad the pauses are. Here’s my analysis:
I consider the BBC Global News to be a very professional, truly “broadcast quality” podcast. The announcers are clear, comprehensible, and speak at a pace that is appropriate for a news broadcast. I still speed it up because daily life is like that now, but if I listen at normal pace, it isn’t even slightly annoying.
The Economist Radio is fairly typical of a print publication’s efforts at presenting print journalists as audio stars. It doesn’t always work out perfectly, and the pacing varies a lot by who is speaking and the rather eclectic line-up. In general it is annoyingly slow, but not interminably so. It comes across as a bit amateur by broadcast standards, but well done and very informative.
But then there’s The Daily from the NYT. This podcast was the reason I took the time to figure out how to speed up playback. There was no other choice: either unsubscribe or speed it up to something not aneurysm inducing. Looking at the waveforms, I suspect they might actually be inserting silences of around 500msec between words, perhaps for dramatic effect (there’s way too much dramatic effect in a lot of the stories, which speeding up only hastens rather than fully alleviating—never have you heard so many interviewees break into uncomfortable tears as they’re overwhelmed by whatever the day’s tragedy is, an artifice only slightly less annoying than broadcasting the sound of someone eating. OMG, that’s real. Rule 34.)
Family Feud Basra Style
Gunfire is pretty common here, perhaps even more common than in Oakland though usually for the same reasons: celebrating holidays, sports victories, weddings, that sort of stuff. It is kind of fun to listen to and watch tracers and stuff, but usually the villa is also celebrating in an obvious way; when you hear gunfire you also hear cheers, at least at night.
This evening the house was quiet, but the gunfire sure wasn’t. The guys tell me it was a tribal feud in the neighborhood, quite close from the sound of it. This is a low-fi recording from my phone.
Speaker Build
In December of 2002 (really, 2002, 12 years ago (OMG, >20 years ago)), I decided that the crappy former Sony self-amplified speakers with blown amplifiers that I had wired into my stereo as surround speakers really didn’t sound very good as they were, by then, 7 years old and the holes in the plastic housing where the adjustment knobs once protruded were covered by aging gaffers tape.
At least it was stylish black tape.
I saw on ebay a set of “Boston Acoustics” woofers and tweeters back in the time when ebay prices could be surprisingly good. Boston Acoustics was a well-respected company at the time making fairly decent speakers. 36 woofers and 24 tweeters for $131 including shipping. About 100 lbs of drivers. And thus began the execution of a fun little project.
Design Phase: 2003-2011
I didn’t have enough data to design speaker enclosures around them, but about a year later (in 2003), I found this site, which had a process for calculating standard speaker properties with instruments I have (frequency generator, oscilloscope, etc.). I used the weighted diaphragm method.
WOOFER: PN 304-1150001-00 22 JUL 2000
80MM CONE DIA = 8CM
FS = 58HZ
RE = 3.04 OHMS
QMS = 1.629
QES = 0.26
QTS = 0.224
CMS = 0.001222
VAS = 4.322 (LITERS) 264 CUBIC INCHES
EBP = 177.8
NOMINAL COIL RESISTANCE @ 385HZ (MID LINEAR BAND) 3.19 OHMS
NOMINAL COIL INDUCTANCE (@ 1KHZ) 0.448 MHENRY

TWEETER: PN 304-050001-00 16 OCT 2000
35MM CONE DIA
FS = 269HZ
RE = 3.29 OHMS
QMS = 5.66
QES = 1.838
QTS = 1.387
CMS = 0.0006
VAS = 0.0778 (LITERS)
EBP = 86.7
NOMINAL COIL RESISTANCE @ 930HZ (MID LINEAR BAND) 3.471 OHMS
NOMINAL COIL INDUCTANCE (@ 1KHZ) 0.153 MHENRY
Awesome. I could specify a cross over and begin designing a cabinet. A few years went by…
In January of 2009 I found a good crossover at AllElectronics. It was a half decent match and since it was designed for 8 ohm woofers, I could put two of my 4 ohm drivers in series and get to about the right impedance for better power handling (less risk of clipping at higher volumes and lower distortion as the driver travel is cut in half, split between the two).
https://web.archive.org/web/20120904083243/http://www.allelectronics.com:80/make-a-store/item/XVR-21/2-WAY-CROSSOVER-INFINITY/1.html
CROSS OVER FREQUENCY 3800HZ
CROSSOVER LOW-PASS: 18DB, 8 OHM
HIGH-PASS: 18DB, 4 OHM
Eventually I got around to calculating the enclosure parameters; I’m not sure exactly when, but sometime between 2009 and 2011. I found a site with a nice script for calculating a vented enclosure with dual woofers, just like I wanted, and got the following parameters:
- Target volume: 1.78 L = 108 in³
- Driver volume (80 mm): 26.25 in³ = 0.43 L
- Crossover volume: 2.93 in³ = 0.05 L
- Sum (two drivers + crossover): 0.91 L
- 1″ PVC port tube: OD = 2.68 cm, ID = 2.1 cm (area 3.46 cm²)
- Port length: 10.48 cm = 4.126″
- Width: 12.613 = 4.829″
- Height: 20.408 = 7.82″
- Depth: 7.795 = 3″
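For reference, the vent length falls out of the Helmholtz resonator formula. A sketch under stated assumptions: the post doesn't give the tuning frequency, so the 69 Hz in the usage note is my guess that happens to land near the quoted length, and the 0.85·d end correction is a common rule of thumb, not necessarily what the calculator site used:

```python
import math

C = 345.0  # speed of sound, m/s (assumed)

def port_length(fb, box_volume, port_area, port_diameter, k=0.85):
    """Physical length of a vent tube for a target tuning frequency fb.

    From the Helmholtz resonance fb = (C / 2 pi) * sqrt(A / (V * L_eff)),
    where L_eff is the physical length plus an end correction of roughly
    k * diameter. Units: Hz, m^3, m^2, m.
    """
    l_eff = C ** 2 * port_area / (4 * math.pi ** 2 * fb ** 2 * box_volume)
    return l_eff - k * port_diameter
```

With the 1.78 L volume and the 2.1 cm ID tube above, a tuning of about 69 Hz gives roughly 10.5 cm, close to the 10.48 cm the calculator produced.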
In 2011 I got around to designing the enclosure in CAD:
There was no way to fit the crossover inside the enclosure, as the drivers have massive, magnetically shielded magnet assemblies, so the crossovers were mounted on the outside. The drivers were designed for inside mounting (as opposed to flange mounting), so I opted to radius the openings to provide some horn loading.
I also, over the course of the project, bought some necessary tools to be prepared for eventually doing the work: a nice Hitachi plunge router and a set of cheap router bits to form the radii and hole saws of the right size for the drivers and PVC port tubes.
Build Phase (2014)
This fall, Oct 9 2014, everything was ready and the time was right. The drivers had aged just the appropriate 14 years since manufacture and were in the peak of their flavor.
I started by cutting down some PVC tubes to make the speaker ports and converting some PVC caps into the tweeter enclosures. My first experiment with recycled shelf wood for the tweeter mounting plate failed: the walls got a bit thin, and it was clear that decent plywood would make life easier. I used the shelf wood for the rest of the speaker; it was salvaged from my building, which was built in the 1930s, so the wood is over 80 years old. The plywood came with the building as well, but was from the woodworker who owned it before me.
I got to use my router after so many years of contemplation to shape the faceplates, fabricated from some fairly nice A-grade plywood I had lying around.
Once I got the boxes glued up, I installed the wiring and soldered the drivers in. The wood parts were glued together with waterproof glue, while the tweeters and plastic parts were installed with two-component clear epoxy. The low frequency drivers had screw mounting holes, so I used those in case I have to replace them, you know, from cranking the tunage.
I lightly sanded the wood to preserve the salvage wood character (actually, I have no power sander, and after 12 years I wasn’t going to sand my way to clean wood by hand), then treated it with some polyurethane I found left behind by the woodworker who owned the building before I did. So that was at least 18 years old. At least.
I supported the speakers over the edge of the table to align the drivers in the holes from below.
The finished assembly looked more or less like I predicted:
Testing
The speakers sound quite nice, but I was curious about the frequency response. To test them I used the pink noise generator in Audacity to generate 5.1 (six-channel) pink noise files, which I copied over to the HTPC to play back through my amp. This introduces the amp’s frequency response, which is unlikely to be particularly good, and room characteristics, which are certainly not anechoic.
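I used Audacity's built-in generator, but pink noise (equal energy per octave, a roughly 1/f power spectrum) is easy to approximate yourself. A minimal Voss-McCartney sketch in Python, my own illustration and not Audacity's algorithm:

```python
import random

def pink_noise(n, num_rows=16, seed=0):
    """Voss-McCartney pink-noise approximation (~ -3 dB/octave)."""
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(num_rows)]
    out = []
    for i in range(n):
        # Row k is refreshed every 2**k samples, so the slower rows add
        # progressively more low-frequency energy to the sum.
        for k in range(num_rows):
            if i % (1 << k) == 0:
                rows[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(rows) / num_rows)
    return out
```

Each row is resampled half as often as the one before it, which is what stacks low-frequency energy into the characteristic downward slope.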
Then I recorded the results per speaker on a 24/96 Tascam DR-2d recorder, which also introduces some frequency response issues, imported the recordings (and the original pink noise file) back into Audacity, plotted the spectrum with 65536 poles, and exported the text files into Excel for analysis.
Audacity’s pink noise looks like this:
It’s pretty good – a bit off plan below 10 Hz and the random noise gets a bit wider as the frequency increases, but it is pretty much what it should be.
First, I tested one of my vintage ADS L980 studio monitors. I bought my L980s in high school in about 1984 and have used them ever since. In college I blew a few drivers (you know, cranking tunage) but they were all replaced with OEM drivers at the Tweeter store (New England memories). They haven’t been used very hard since, but the testing process uncovered damage to one of my tweeters, which I fixed before proceeding.
The ADS L980 has very solid low-frequency response from a nicely manufactured 12″ woofer and good high end from their fancy woven tweeter. As a 3-way speaker, it inevitably has some complexities in its frequency response.
I also tested my Klipsch KSC-C1 Center Channel speaker (purchased in 2002 on ebay for $44.10) to see what that looked like:
It isn’t too bad, but it is clearly weaker in the low frequencies despite moderate-sized dual woofers, and it has a bit of a spike in the high frequencies that may be designed in for TV, or may just be an artifact of the horn-loaded tweeter. It is a two-way design and so has a fairly smooth mid-range response, which is good for the voice program a center speaker mostly carries.
And how about those new ones?
Well… not great: a little more variability than one would hope, and (of course) weak below about 100 Hz. I’m a little surprised the tweeters aren’t stronger above about 15 kHz, though while that might have stood out to me in 1984, it doesn’t now. Overall the response is quite good for relatively inexpensive drivers; the low-frequency response, in particular, is far better than I expected given the small drivers. The high frequency is a bit spiky, but quite acceptable sounding.
And they sound far, far better than the poor hacked apart Sony speakers they replaced.
Raw Data
The drawings I fabricated from and the raw data from my tests are in the files linked below: