In recent weeks, there was some fuss about a new agreement between the digital book distribution platform eBoekhuis and its connected vendors. This agreement obliges vendors to hand over previously private customer information to the anti-piracy group BREIN, should a purchased e-book at some point turn up on the internet (e.g. BitTorrent, Usenet, file-sharing sites). In order to trace a book back to the customer, a transaction code is watermarked into it. When I noticed that one of the eBoekhuis-connected vendors (Bol.com) had started selling watermarked e-books, I bought one to see what this watermark would look like.
The digital book is served as an EPUB file, which is basically a ZIP archive containing pages, stylesheets and images. The first thing I noticed is that the book has a warning page that contains two codes of 56 and 13 characters respectively:
I suspect the first string is the transaction code that uniquely identifies a purchase at the vendor. The second code appears to represent the date and time of purchase: its first eight hexadecimal digits read as a big-endian 32-bit Unix timestamp. Translating 520e70ac, the leading eight digits of 520e70ac6bcd8, to a human-readable date gives: Friday, 16 August 2013 20:34:20 +0200. The remaining five digits (6bcd8) are not needed for that conversion. This date-time code (together with the EAN) is used as a key to the eBoekhuis platform to download the book.
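The timestamp conversion can be checked with a few lines of Python. This is only a sketch: the function name `decode_time_code` is made up for illustration, and it assumes that only the first eight hex digits of the 13-character code carry the Unix time in seconds and that the local offset is +0200.

```python
from datetime import datetime, timedelta, timezone

def decode_time_code(code: str, utc_offset_hours: int = 2) -> datetime:
    """Read the first 8 hex digits of `code` as Unix seconds (assumption)."""
    seconds = int(code[:8], 16)  # 0x520e70ac == 1376678060
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz)

stamp = decode_time_code("520e70ac6bcd8")
print(stamp.isoformat())  # 2013-08-16T20:34:20+02:00
```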
All other pages in the book have the same image at the bottom, which is nearly impossible to notice with the naked eye. The image is embedded as a Base64-encoded string (see the image depicted above). Decoding it (using the openssl base64 command) results in an image that is 1 pixel high and a few hundred pixels wide. On screen it looks like a thin white line. If you inspect the image with an eyedropper tool, you will see that it consists of bars of variable widths in two different colors: white (#FFFFFF) and a slightly darker white (#FEFEFE). When you replace either of the colors with black and stretch the image vertically to about 100 pixels, you will see a barcode similar to the one depicted below.
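For those who prefer a script over openssl, the extraction step could be sketched in Python as follows. It assumes the image is embedded in the page markup as a `data:image/...;base64,` URI, which is a guess about the markup rather than something the book guarantees; the function name `embedded_images` is hypothetical.

```python
import base64
import re
import zipfile

# Assumption: inline images appear as data URIs inside the (X)HTML pages.
DATA_URI = re.compile(rb"data:image/[a-zA-Z0-9.+-]+;base64,([A-Za-z0-9+/=\s]+)")

def embedded_images(epub_path):
    """Yield (page name, decoded image bytes) for each inline Base64 image."""
    with zipfile.ZipFile(epub_path) as archive:  # an EPUB is a ZIP archive
        for name in archive.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            for match in DATA_URI.finditer(archive.read(name)):
                yield name, base64.b64decode(match.group(1))
```

Calling `list(embedded_images("book.epub"))` (with `book.epub` standing in for your own file) returns the raw image bytes per page, which can then be written to disk and inspected.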
Replacing a single color in an image is easily accomplished with ImageMagick. The following command paints black every pixel that has the same color as the pixel at position 0,0 (the top-left corner; ImageMagick coordinates are zero-based and the image is only one pixel high).
convert original.png -fill black -draw 'color 0,0 replace' output.png
So, what data is encoded in this barcode? Let’s first translate the bars and spaces to binary form, where each bar represents a binary one and each space represents a binary zero. The resulting string of ones and zeroes has a length of 392. Remember that the transaction code mentioned earlier has a length of 56 characters. Assuming the barcode doesn’t use start or stop codes, each character is encoded using seven bits (dividing 392 by 56 gives 7). The standard ASCII encoding table also uses seven bits to represent each character. When mapping the 7-bit blocks to characters, you can see almost immediately that the result is a string equal to the transaction code. So, the barcode you find on each page in the book represents the transaction code encoded in 7-bit ASCII.
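The decoding described above can be sketched in Python. Several things here are assumptions for illustration: that the #FEFEFE pixels are the bars, that each bit is one pixel wide (`module_width=1`), that bits are stored most significant first, and the function names themselves. The sample code `EXAMPLE-0001` is made up and only serves as a round-trip check.

```python
def row_to_bits(pixels, bar_color, module_width=1):
    """Map one pixel row to a bit string: bar -> '1', space -> '0'."""
    modules = pixels[::module_width]  # take one pixel per bit-wide module
    return "".join("1" if p == bar_color else "0" for p in modules)

def bits_to_text(bits):
    """Split a bit string into 7-bit groups and map each to an ASCII character."""
    if len(bits) % 7:
        raise ValueError("bit string length is not a multiple of 7")
    return "".join(chr(int(bits[i:i + 7], 2)) for i in range(0, len(bits), 7))

BAR, SPACE = 0xFEFEFE, 0xFFFFFF  # assumption: the darker color is the bar

# Build a fake pixel row from a made-up code, then decode it again.
sample = "EXAMPLE-0001"
bits = "".join(format(ord(c), "07b") for c in sample)
row = [BAR if b == "1" else SPACE for b in bits]
print(bits_to_text(row_to_bits(row, BAR)))  # EXAMPLE-0001
```

If the decoded text comes out as garbage, swapping which color counts as the bar or adjusting `module_width` are the first things to try.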
Do note that information might be encoded in other ways as well, for example by using random variations in the text. Images (e.g. cover, photos) themselves can be watermarked as well.
For those asking: the transaction code, timestamp and other identifiers seen in this post are fake, for obvious reasons.
Update (September 7, 2013): Added a paragraph explaining the barcode symbology that is used.