Friday, May 27, 2011

Personal Data in Amazon MP3s

In December I discussed the buyer ID data that Amazon is placing in MP3 files, noted that the standard tools do not seem to notice these IDs, and expressed a desire to write a script to display them. See http://martesmartes.blogspot.com/2010/12/need-to-write-my-own.html and http://martesmartes.blogspot.com/2010/12/personal-information-in-amazon-mp3.html.

First, the script:
// Time-stamp: <2011-05-27 22:23:37 jdm>

// JFlex script to look for UID tags in an MP3 received from Amazon. If
// such a tag is encountered, it is displayed. Otherwise, there is no
// output.

// Compiling (assuming JFlex is installed)
//
// jflex findUID.lex
// javac Yylex.java

// Running:
//
// java Yylex <MP3 file name>

// Bugs:
//
// A left angle bracket, <, within the UID will cause the tag to not be
// displayed.
// Even though the MP3s that I have seen with UID tags have the tags
// near the beginning of the file and only one UID tag per file, this
// searches the entire (possibly long) file and will display multiple
// UIDs if found. Though this is probably not a bug, it does cause a
// perceptible delay.

%%

%standalone

%unicode
%int

openAngle  = "<"
uid        = UID
stuff      = [^<]+
tagEnd     = "</UID"
closeAngle = ">"
tag        = {openAngle}{uid}{stuff}{tagEnd}{closeAngle}

%%

{tag} { System.out.println(yytext()); return 0; }

.     { return 0; }

\n     { return 0; }

\r     { return 0; }

As mentioned in the comments, this is a JFlex script. JFlex's lineage dates back to the standard Unix lexical analyzer-building tool, lex, which was superseded by flex. JLex has been well-known in the Java community for a while, but work on it seems to have ceased. JFlex, however, appears to be an active project (and an Ubuntu package). Of course, it works on Windows, too. See http://jflex.de/.
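For readers who do not have JFlex installed, roughly the same search can be sketched in plain Java with java.util.regex. This is only an illustration, not the script above: the class name UidFinder and the method findUidTags are made up for this example, and it assumes the file is small enough to read into memory in one go. It uses the same pattern shape as the tag macro, so it shares the same limitation with a left angle bracket inside the UID.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the JFlex script.
class UidFinder {
    // Same shape as the JFlex "tag" macro:
    // "<UID", one or more characters that are not "<", then "</UID>".
    private static final Pattern UID_TAG = Pattern.compile("<UID[^<]+</UID>");

    // Returns every UID tag found in the text, in order of appearance.
    static List<String> findUidTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = UID_TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UidFinder <MP3 file name>");
            return;
        }
        // Assumption: the whole file fits in memory.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps every byte to exactly one char, so the binary
        // audio data cannot cause a decoding error.
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String tag : findUidTags(text)) {
            System.out.println(tag);
        }
    }
}
```

Like the JFlex version, this prints each tag it finds and prints nothing when there is none.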

It turns out that Amazon informs the consumer when an MP3 will contain identifying information. I did not notice this before Michael D. pointed it out to me in January. The Amazon notice is in the product details and says "Record Company Required Metadata: Music file contains unique purchase identifier." Then they have a "Learn More" link. This is what Amazon has to say:

Record Company Required Metadata

The record company that supplies this song or album requires all companies that sell its downloadable music to include identifiers with the downloads. Embedded in the metadata of each purchased MP3 from this record company are a random number Amazon assigns to your order, the Amazon store name, the purchase date and time, codes that identify the album and song (the UPC and ISRC), Amazon's digital signature, and an identifier that can be used to determine whether the audio has been modified. In addition, Amazon inserts the first part of the email address associated with your Amazon.com account, so that you know these files are unique to you. Songs that include these identifiers are marked on their product detail page on Amazon.com. These identifiers do not affect the playback experience in any way.

The idea seems to be that the record companies are requiring Amazon to put the information in, and Amazon is being honest about what's in there, though most consumers likely never see this information and never notice the link to it.

A few comments are in order.
  • My script displays the UID tag and contents, but does not modify or remove it. I have no intention of providing such a script.
  • People share MP3s at their own risk. As someone who has made good money developing software, I understand the need of the people who make music to earn a living. I even understand, though I am less sympathetic toward, the RIAA's outrageous damage claims in suits. Any individual's decision to share, or not, is between him, his conscience, and the RIAA.
  • The UID is the user's Amazon user ID. On the MP3s I have that contain a UID, my script displays this: <UID version="1">martensjd</UID>. That's me.
  • Amazon says there is other identifying information embedded in the MP3. Read the statement above. So stripping out the UID tag alone will not be sufficient to hide the original buyer.
  • I would rather not have this in my media files, but I don't object strongly enough to go through the files stripping it all out.

2 comments:

Sam said...

Strange. I don't see the xml in the mp3 files I bought. Did Amazon change how they add metadata in the files?

Jeff Martens said...

It's not in all of their MP3s, and I don't buy MP3s from Amazon any more (terrible Linux support for their downloader), so I can't tell if they've changed the way they do it.