Musical Cleanup On Aisle 5!

In my “spare time” I’ve been re-ripping all my CDs to FLAC-encoded files.  Since I was using my server as archival storage for my CDs, I decided I ought to go ahead and use a true archival format.  Although the sound quality of 320Kbps MP3 files is pretty high, they still don’t faithfully reproduce the original sound.  Worse, portable players like the iPod don’t really take advantage of the quality of a 320Kbps file.  In fact, I’ve found that using high bit-rate MP3s with the iPod causes significant degradation in battery life.  I think this is due to the extra data fetches that are required (i.e. a 320Kbps file requires twice as much data to be read as a 160Kbps file of the same length).  This isn’t as bad on the Nano as it was with the Mini (which makes sense because the Mini used a micro hard drive, which sucks more juice than flash memory), but it’s still noticeable.

For those not familiar with audio codecs and compression, it should be noted that there are two types: lossy and lossless.  Lossy compression algorithms throw away parts of the sound data so as to reduce the amount of data to be transferred or stored.  The trick with a lossy codec is to find the point where the data that is lost isn’t noticeable to the human ear.  This is something of an art, though, as some people are more sensitive than others.  I find that I can readily identify any MP3 stream under about 160Kbps (i.e. I get a feeling that it’s not quite right).  Lossless codecs, such as FLAC or ALE/ALAC, work much like ZIP to compress the data without modifying the data stream itself.
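
To make the "lossless" idea concrete, here is a minimal sketch using Python's zlib module, which implements DEFLATE, the algorithm behind ZIP. It is only a stand-in: FLAC uses its own audio-specific scheme, and the sample data below is a made-up byte pattern, not real audio. The point is just that the round trip returns exactly the bytes that went in.

```python
import zlib


def lossless_round_trip(data):
    """Compress with DEFLATE, then decompress; return (compressed, restored)."""
    compressed = zlib.compress(data, level=9)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical stand-in for audio samples: a repetitive byte pattern.
samples = bytes(range(256)) * 64

compressed, restored = lossless_round_trip(samples)
print("original bytes:  ", len(samples))
print("compressed bytes:", len(compressed))
print("identical after round trip:", restored == samples)
```

A lossy codec could not pass that last check, since it deliberately discards data on the way in.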

Some people (mostly anal-retentive audiophiles) just dump their CDs to WAV files and have done with it.  That certainly guarantees nothing is lost, since the output is pretty much just a copy of what was on the CD.  But it’s pretty wasteful of space, as a single CD may well be 650MB of data.  Something like FLAC can retain all the original data but reduce the size by half, which makes a big difference in the amount of disk space used.  Better yet, FLAC is natively supported by my Squeezebox music players.  I’d always felt kind of strange feeding music via SPDIF to my receivers from the Squeezeboxes when that music was encoded with a lossy codec (even if it was at 320Kbps).
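
The space arithmetic can be sketched in a few lines. CD audio is 44,100 samples per second, 16 bits per sample, two channels; everything else here is an assumption for illustration: the 60-minute album is hypothetical, the 0.5 FLAC ratio is just the rough "half" figure from above (real ratios vary by recording), and file headers are ignored.

```python
# CD audio: 44,100 samples/s * 2 bytes per sample * 2 channels.
CD_BYTES_PER_SECOND = 44_100 * 2 * 2


def wav_bytes(minutes):
    """Approximate size of uncompressed CD audio (header ignored)."""
    return int(minutes * 60 * CD_BYTES_PER_SECOND)


def flac_bytes(minutes, ratio=0.5):
    """Estimate FLAC size; ratio=0.5 is a rough assumption, not a measurement."""
    return int(wav_bytes(minutes) * ratio)


def mp3_bytes(minutes, kbps):
    """Size of a constant-bit-rate MP3; kbps means 1000 bits per second."""
    return int(minutes * 60 * kbps * 1000 / 8)


minutes = 60  # hypothetical album length
sizes = [
    ("WAV", wav_bytes(minutes)),
    ("FLAC (est.)", flac_bytes(minutes)),
    ("MP3 320Kbps", mp3_bytes(minutes, 320)),
    ("MP3 160Kbps", mp3_bytes(minutes, 160)),
]
for label, size in sizes:
    print(f"{label:>12}: {size / 1_000_000:7.1f} MB")
```

The same numbers show why a 320Kbps file needs twice the reads of a 160Kbps one: for the same playing time it is simply twice as many bytes.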

My strategy now will be to rip everything to FLAC and keep a synchronized directory of 160Kbps MP3s for use with portable audio devices.  This lets me feed the full, original audio signal to my receivers via the Squeezeboxes and put the smaller MP3s on the iPod without wasting space or battery life.
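
One way to keep such a mirror in step is sketched below. This is an illustration, not the setup described in this post: the function names are made up, it assumes the `flac` and `lame` command-line tools are installed, and it does not copy tags across. `plan_sync` only looks at the file system, so it can be run safely to see what would be converted.

```python
import subprocess
from pathlib import Path


def plan_sync(flac_root, mp3_root):
    """Return (source, target) pairs whose MP3 is missing or older than the FLAC."""
    jobs = []
    for src in sorted(flac_root.rglob("*.flac")):
        # Mirror the Artist/Album/track layout under the MP3 root.
        dst = (mp3_root / src.relative_to(flac_root)).with_suffix(".mp3")
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            jobs.append((src, dst))
    return jobs


def transcode(src, dst, kbps=160):
    """Decode one FLAC file to stdout and pipe it into lame (tags are not copied)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    decoder = subprocess.Popen(
        ["flac", "--decode", "--stdout", "--silent", str(src)],
        stdout=subprocess.PIPE,
    )
    try:
        # "-" tells lame to read its input from stdin.
        subprocess.run(
            ["lame", "-b", str(kbps), "-", str(dst)],
            stdin=decoder.stdout,
            check=True,
        )
    finally:
        decoder.stdout.close()
        decoder.wait()
```

Usage would be something like looping over `plan_sync(Path("/data/music/flac"), Path("/data/music/mp3"))` and calling `transcode(src, dst)` for each pair; the second path is a hypothetical location for the MP3 copies.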

So while I’ve been re-ripping I’ve also been investigating how the music is tagged.  Over the years I’d relied on CDDB or FreeDB in my ripper program to get the track data.  I’ve found that this data is often just wrong enough to give me headaches.  I store the music under subdirectories by artist, then album name.  Things like different punctuations and spellings for the same artist would really mess up the tags and make finding the music more difficult.

Consider the following example.  In the old collection, there were three entries for Alison Krauss & Union Station:
turnera@minilith:/data/music/flac> find ../artists/ -type d -iname "*alison*"
../artists/alison_krauss__union_station
../artists/alison_krauss_and_union_station
../artists/alison_krauss
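
Variants like these can be flagged automatically. The sketch below is one possible approach, not something used for this collection: it reduces each directory name to a comparison key by lowercasing, splitting on punctuation and underscores, and dropping a small set of noise words (the set is an assumption), then reports names that share a key.

```python
import re
from collections import defaultdict

# Words ignored when comparing names (an assumption; adjust to taste).
NOISE_WORDS = {"and", "the"}


def match_key(dirname):
    """Reduce a directory name to a key that ignores case, punctuation and noise words."""
    tokens = re.split(r"[\W_]+", dirname.lower())
    return " ".join(t for t in tokens if t and t not in NOISE_WORDS)


def find_variants(dirnames):
    """Group names by key and keep only the groups with more than one spelling."""
    groups = defaultdict(list)
    for name in dirnames:
        groups[match_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


old_collection = [
    "alison_krauss__union_station",
    "alison_krauss_and_union_station",
    "alison_krauss",
]
for key, names in find_variants(old_collection).items():
    print(key, "->", names)
```

Note that `alison_krauss` on its own is deliberately left alone: a script can't tell whether that is a solo album or a mis-tagged band album, so that call still needs a human.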

I’ve taken pains to make sure that doesn’t happen again in the new collection:
turnera@minilith:/data/music/flac> find Alison_Krauss__Union_Station/ -type d
Alison_Krauss__Union_Station/
Alison_Krauss__Union_Station/Lonely_Runs_Both_Ways
Alison_Krauss__Union_Station/Forget_About_It
Alison_Krauss__Union_Station/New_Favorite
Alison_Krauss__Union_Station/So_Long_So_Wrong
Alison_Krauss__Union_Station/Live_Disc_2
Alison_Krauss__Union_Station/Live_Disc_1
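
The names in that listing are consistent with a simple rule: drop punctuation, keep letters and digits (accented ones included), and turn each space into an underscore. The sketch below implements that rule; it is a guess that happens to reproduce these names, and the rule the ripper actually applies may differ. `dir_name` is a made-up helper name.

```python
import re


def dir_name(label):
    """Turn a tag value into a directory name: strip punctuation, spaces become underscores."""
    # \w covers Unicode letters and digits in Python 3, so accents survive.
    kept = re.sub(r"[^\w ]", "", label)
    return kept.replace(" ", "_")


for label in [
    "Alison Krauss & Union Station",
    "Lonely Runs Both Ways",
    "Kill Bill Vol. 2",
]:
    print(label, "->", dir_name(label))
```

The doubled underscore in `Alison_Krauss__Union_Station` falls out naturally: the ampersand disappears and the spaces on either side of it each become an underscore.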

And it’s not just spelling and punctuation.  Sometimes it’s just plain inconsistent naming conventions between the people who entered the data originally:
turnera@minilith:/data/music/flac> find ../artists/ -type d -iname "*kill*"
../artists/soundtrack/kill_bill_volume_1
../artists/various/kill_bill_volume_2_ost
turnera@minilith:/data/music/flac> find . -type d -iname "*kill*"
./Various_Artists/Kill_Bill_Vol_2
./Various_Artists/Kill_Bill_Vol_1

I especially liked the fact that out of five k.d. lang CDs I found that people came up with four different (and wrong) ways of entering her name.  Just for the record, it’s “k.d. lang” (little k period little d period space lang), or at least that’s how she writes it.

And finally, there was the issue of foreign characters.  When I started ripping, the programs I was using didn’t properly support non-ASCII (multi-byte) characters.  The latest versions of the programs all use UTF-8 and can handle any characters (it makes entering filenames at the command line a bit messy, though, since my keyboard doesn’t have all those characters; copy/paste and command-line completion have been lifesavers).  So what previously showed up as “stephane_pompougnac/h?tel_costes_quatre” is now more accurately shown as “Stéphane_Pompougnac/Hôtel_Costes_Volume_4_Quatre” (if those show up as garbage characters, make sure your browser is set to UTF-8 character encoding).
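
The “h?tel” effect is easy to reproduce. The sketch below shows one plausible mechanism, forcing a name into ASCII with replacement; what the old programs actually did internally is an assumption here. It also shows that the same name survives a UTF-8 round trip, at the cost of one extra byte for the accented letter.

```python
def to_ascii_lossy(name):
    """Force a name into ASCII; each unsupported character becomes '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")


def utf8_round_trip(name):
    """Encode to UTF-8 bytes and back; the text comes back unchanged."""
    return name.encode("utf-8").decode("utf-8")


name = "H\u00f4tel_Costes"  # "Hôtel_Costes", written with an escape to be unambiguous

print(to_ascii_lossy(name))                         # H?tel_Costes
print(utf8_round_trip(name) == name)                # True
print(len(name), "characters,", len(name.encode("utf-8")), "bytes in UTF-8")
```

Once the character has been replaced with “?” there is no way to get it back from the filename alone, which is why the old entries had to be re-tagged rather than converted.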

As the commands above show, I’m using Linux for ripping, storing, and serving the music.  Well before the Sony rootkit fiasco I had a healthy distrust of the record companies, so I’ve been using Linux for years for all my ripping.  I’m using GRIP with cdparanoia, and they aren’t susceptible to auto-run or any other similar nonsense.

3 Comments

  1. Mike says:

    I use Exact Audio Copy for the rip. LAME 3.96 with dbPowerAmp for the compression (setting: Alt-Preset-Extreme). Tag and Rename for the metadata chores.

  2. I heard lots of good things about EAC on the Slim Devices forums, although they said it could be difficult to set up.  But so far I’ve had good results with cdparanoia.  It has managed to extract audio from some CDs that I thought were ruined.  One of my nieces, when she was about 2, grabbed one of my CDs with some kind of goop on her fingers which I didn’t discover for a day or two, by which time it was stuck on for good.  The CD wouldn’t play in a normal player, but cdparanoia managed to extract the data (albeit very slowly).

    Grip will go to FreeDB to get the meta information, although being volunteer-submitted it sometimes requires scrubbing (as I noted).  For any after-ripping cleanup I’ve been using Mp3Tag, which also supports FLAC.  The only thing that I’m considering changing is perhaps merging multi-disc albums into one directory using the Disc ID tag.  I’d have to do that manually, though, as Grip doesn’t appear to support it (or if it does I haven’t found the right configuration option).

    For additional album information (especially to get the right accented/foreign characters when they aren’t in the FreeDB data) I’ve been using MusicBrainz.org.

  3. William says:

    I use FLAC too! I like having a “backup” of all my cds bit-by-bit.

    I use MusicBrainz to tag my files though. Picard is quite a cool program. Give it a go. :)