==============
== morph.sh ==
==============
Just do something with wood for once.

Cleaning up Napster WMA file metadata

en music linux mp3

Since I started using Navidrome to listen to my music collection, I’ve realized once again that I have way too many WMA files. Usually these don’t cause a big problem, since pretty much all the software I use handles them easily (Audacious, WebMinidisc, my Sony MP3 player), but Navidrome can’t, at least not out of the box. So it really is time to clean these up once and for all.

The thing is: I got most of these files from Napster. Not the original Napster, but the service that took over its name afterwards - an early music subscription service that I used in the late 2000s. In contrast to modern streaming services, it let you asynchronously download an unlimited amount of music as WMA files, but you could only play them as long as your subscription was active. This restriction was swiftly lifted by a program called FreeUse, which freed the files of their DRM but left an ugly [NoDRM] tag at the beginning of each file name.

Converting these files to MP3 is of course no challenge; there are plenty of programs that can do it, and I opted for good old ffmpeg. But I wanted to be extra careful with files that I once bought, because I don’t know whether they can be traced back to me in some way. If I want to pass any of them on to my friends, I’d rather strip them of any metadata they could contain. I’ll simply assume that the files don’t contain any audio watermarks; I somehow don’t think that was done back then, and even if it was, it shouldn’t be a huge issue since the company that ran that iteration of Napster has long disappeared. So for now I’ll focus on ID3 tags (or whatever they are called in WMA files).

Directly converting the WMA files to MP3 simply copies all of the metadata to the new files. Thankfully ffmpeg can edit that metadata on the fly, so I don’t have to touch the files a second time. The original files contain a number of weird-looking data fields, which seem to belong to the “Windows Media Format SDK”. All of them are still documented; the page for WM/SubscriptionContentID, for example, immediately sounded to me like a metadata leak waiting to happen.
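To see which of these fields a particular file carries, ffprobe can dump the container tags. This is only a rough sketch: dump_tags and filter_ms_tags are names I made up for illustration, and the filter assumes the keys show up under the same names that are used in the ffmpeg command further down.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the container-level tags of one file.
# ffprobe's default writer emits one "TAG:key=value" line per tag.
dump_tags() {
  ffprobe -v error -show_entries format_tags \
    -of default=noprint_wrappers=1 "$1"
}

# Hypothetical helper: read "TAG:key=value" lines on stdin and keep only
# the Microsoft-specific keys (assumed prefixes, not an official list).
# '|| true' keeps 'set -e' scripts alive when nothing matches.
filter_ms_tags() {
  grep -E '^TAG:(WM/|WMFSDK|IsVBR=|DeviceConformanceTemplate=|MediaFoundationVersion=)' || true
}

# Usage (needs ffprobe and a real file):
#   dump_tags "song.wma" | filter_ms_tags
```

Running the same pipeline on a converted MP3 should print nothing if the clean-up worked.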

Anyway, here’s the full ffmpeg command that converts a WMA file to MP3 V0 (variable bitrate of the highest quality) while removing all of Microsoft’s tags.

  ffmpeg -nostdin -loglevel error \
    -i "$wma" \
    -vn \
    -map_metadata 0 \
    -metadata "DeviceConformanceTemplate"="" \
    -metadata "IsVBR"="" \
    -metadata "MediaFoundationVersion"="" \
    -metadata "WM/EncodingTime"="" \
    -metadata "WM/MCDI"="" \
    -metadata "WM/NapsterHeader"="" \
    -metadata "WM/Provider"="" \
    -metadata "WM/ProviderRating"="" \
    -metadata "WM/ProviderStyle"="" \
    -metadata "WM/SubscriptionContentID"="" \
    -metadata "WM/Track"="" \
    -metadata "WM/UniqueFileIdentifier"="" \
    -metadata "WM/WMADRCAverageReference"="" \
    -metadata "WM/WMADRCPeakReference"="" \
    -metadata "WM/Year"="" \
    -metadata "WMFSDKVersion"="" \
    -metadata "WMFSDKNeeded"="" \
    -c:a libmp3lame -q:a 0 \
    "$mp3"

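The long run of -metadata options could also be generated from a list of tag names, which makes it easier to add or remove a field later. A minimal sketch, assuming bash; ms_tags, strip_args and build_strip_args are hypothetical names, and the tag list is simply copied from the command above.

```shell
#!/usr/bin/env bash

# Tag names copied from the ffmpeg command above.
ms_tags=(
  "DeviceConformanceTemplate" "IsVBR" "MediaFoundationVersion"
  "WM/EncodingTime" "WM/MCDI" "WM/NapsterHeader" "WM/Provider"
  "WM/ProviderRating" "WM/ProviderStyle" "WM/SubscriptionContentID"
  "WM/Track" "WM/UniqueFileIdentifier" "WM/WMADRCAverageReference"
  "WM/WMADRCPeakReference" "WM/Year" "WMFSDKVersion" "WMFSDKNeeded"
)

# Hypothetical helper: fill the global array strip_args with one
# '-metadata KEY=' pair per tag name (an empty value deletes the tag).
build_strip_args() {
  strip_args=()
  local tag
  for tag in "${ms_tags[@]}"; do
    strip_args+=( -metadata "${tag}=" )
  done
}

build_strip_args

# Usage (needs ffmpeg, with $wma and $mp3 set as in the command above):
#   ffmpeg -nostdin -loglevel error -i "$wma" -vn -map_metadata 0 \
#     "${strip_args[@]}" -c:a libmp3lame -q:a 0 "$mp3"
```

Expanding the array as "${strip_args[@]}" keeps every option and value a separate word, so tag names containing a slash need no extra quoting.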
For convenience, I wrapped the whole thing in a shell script, which can be run directly within a directory that contains WMA files. It does the following:

  • check which WMA files are present in the directory
  • create an “archive directory” where the WMA files go after being converted in-place - just in case…
    • by default the archive directory is ../WMA/ plus the name of the current directory. You can supply a path as an argument to the script, and it will use that as the archive directory instead; I use it to keep a top-level WMA directory outside of the rest of my music collection so that Navidrome doesn’t index the files.
  • convert the WMA file to MP3
  • remove all metadata except the “normal” tags such as artist, album, title…
  • remove ‘[NoDRM]-’ from the filename if present
  • move the WMA file to the archive directory.
#!/usr/bin/env bash
 
# convert wma files in this directory to mp3
# move successfully converted wmas somewhere else
# use $1 for target archive directory
# remove wma specific id3 tags
# strip napster stuff out of filename

set -euo pipefail

log() {
  echo "[INFO] $1"
}

err() {
  echo "[ERROR] $1" >&2
}

shopt -s nullglob

cwd_name="$(basename "$(pwd)")"
archive_dir="../WMA/$cwd_name"

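# "${1:-}" avoids an "unbound variable" error under 'set -u'
# when the script is called without an argument
if [[ -z "${1:-}" ]]; then
  log "Using default archive directory: ${archive_dir}"
else
  archive_dir="$1"
fi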

wma_files=( *.wma )

if [[ ${#wma_files[@]} -eq 0 ]]; then
  log "No WMA files found in current directory. Nothing to do."
  exit 0
fi

log "Found ${#wma_files[@]} WMA file(s). Starting conversion."

log "Creating archive directory '$archive_dir'"
mkdir -p "$archive_dir"

for wma in "${wma_files[@]}"; do
  base="${wma%.wma}"
  mp3="${base}.mp3"
  
  # remove [NoDRM] from filename
  mp3=$(echo "$mp3" | sed -e 's/\[NoDRM\]-//g')

  log "Converting '$wma' to '$mp3'"

  # check if the file already exists so ffmpeg doesn't complain
  if [ -f "$mp3" ]; then
    err "File ${mp3} already exists! Exiting."
    exit 1
  fi
  
  ffmpeg -nostdin -loglevel error \
    -i "$wma" \
    -vn \
    -map_metadata 0 \
    -metadata "DeviceConformanceTemplate"="" \
    -metadata "IsVBR"="" \
    -metadata "MediaFoundationVersion"="" \
    -metadata "WM/EncodingTime"="" \
    -metadata "WM/MCDI"="" \
    -metadata "WM/NapsterHeader"="" \
    -metadata "WM/Provider"="" \
    -metadata "WM/ProviderRating"="" \
    -metadata "WM/ProviderStyle"="" \
    -metadata "WM/SubscriptionContentID"="" \
    -metadata "WM/Track"="" \
    -metadata "WM/UniqueFileIdentifier"="" \
    -metadata "WM/WMADRCAverageReference"="" \
    -metadata "WM/WMADRCPeakReference"="" \
    -metadata "WM/Year"="" \
    -metadata "WMFSDKVersion"="" \
    -metadata "WMFSDKNeeded"="" \
    -c:a libmp3lame -q:a 0 \
    "$mp3"

  log "Finished processing '$mp3'"
  log "Moving ./${wma} to ${archive_dir}/${wma}"
  mv "$wma" "$archive_dir/"
done

log "All files processed successfully."
log "Done."
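
The script pipes the file name through sed to drop the '[NoDRM]-' marker. Bash parameter expansion can do the same without starting another process. Here is a small sketch of that alternative; mp3_name_for is a made-up function name and not part of the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the MP3 file name from a WMA file name,
# dropping the .wma extension and every '[NoDRM]-' marker.
mp3_name_for() {
  local name="${1%.wma}"
  local marker='[NoDRM]-'
  # Quoting "$marker" makes bash match it literally, not as a glob pattern.
  name=${name//"$marker"/}
  printf '%s\n' "${name}.mp3"
}

mp3_name_for '[NoDRM]-01 Some Song.wma'   # prints: 01 Some Song.mp3
```

Keeping the name logic in one function also makes it easy to try it on a few file names before running the real conversion.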

I hope this will be useful when I try to remember how I did this in a few years - maybe it will even be useful for someone else, too. :)