notes.husk.org/likes images.

57614820302

xerox-scannersphotocopiers-randomly-alter-numbers

Xerox scanners/photocopiers randomly alter numbers in scanned documents

The error does not occur if PDFs are scanned with OCR, or if TIFFs are scanned (the latter seems plausible, as the pure image data should be saved into the TIFF). Additionally, there seems to be a correlation between font size and the scan DPI used. I was able to reliably reproduce the error for 200 DPI PDF scans without OCR, of sheets with Arial 7pt and 8pt numbers. Overall it looks like some sort of compression algorithm using patches more than once (I think I could even identify some pixel-identical eights).

Edit: It seems that the above thought was not that wrong at all. Several emails I received suggest that the Xerox machines use JBIG2 for compression. This algorithm creates a dictionary of image patches it finds “similar”. Those patches then get reused in place of the original image data, as long as the error they introduce is not “too high”. Makes sense.
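The patch-reuse idea described above can be illustrated with a toy sketch. This is not JBIG2 itself and not what any Xerox firmware is known to do: the 5x7 glyphs, the `hamming` distance, and the `max_error` threshold are all assumptions chosen only to show how a "similar enough" dictionary lookup can silently turn a 6 into an 8.

```python
from typing import List, Tuple

Patch = Tuple[str, ...]  # rows of '#' (ink) and '.' (paper)

# Hypothetical 5x7 glyphs; real scanned digits would be noisier.
EIGHT: Patch = (".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###.")
SIX: Patch = (".###.", "#....", "#....", ".###.", "#...#", "#...#", ".###.")


def hamming(a: Patch, b: Patch) -> int:
    """Count the pixels at which two equally sized patches differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(patches: List[Patch], max_error: int) -> Tuple[List[Patch], List[int]]:
    """Store each patch as an index into a dictionary of patches.

    A patch reuses the first dictionary entry within max_error pixels;
    otherwise it is added to the dictionary as a new entry.
    """
    dictionary: List[Patch] = []
    indices: List[int] = []
    for patch in patches:
        for i, entry in enumerate(dictionary):
            if hamming(patch, entry) <= max_error:
                indices.append(i)  # lossy step: the original pixels are dropped
                break
        else:
            dictionary.append(patch)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary: List[Patch], indices: List[int]) -> List[Patch]:
    """Rebuild the page from dictionary entries only."""
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    page = [EIGHT, SIX, EIGHT]
    dictionary, indices = encode(page, max_error=2)
    restored = decode(dictionary, indices)
    print("six survived:", restored[1] == SIX)
```

With `max_error=0` the encoder keeps both glyphs and the page round-trips exactly. With a looser threshold the six is stored as a reference to the eight, and nothing in the decoded output marks the substitution.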

This would also explain why the error occurs when scanning letters or numbers at low resolution (still readable, though). In this case, the letter size is close to the patch size of JBIG2, and whole “similar” letters or even letter blocks get replaced by each other.
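One way to picture the resolution dependence is to count how many pixels separate two digits at different scales. The sketch below is only an illustration under stated assumptions: the glyphs are made up, `upscale` stands in for scanning at a higher DPI, and `MAX_ERROR` is a hypothetical fixed pixel budget, not a documented JBIG2 or Xerox parameter.

```python
from typing import Tuple

Patch = Tuple[str, ...]  # rows of '#' (ink) and '.' (paper)

# Hypothetical 5x7 glyphs standing in for small, low-resolution digits.
EIGHT: Patch = (".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###.")
SIX: Patch = (".###.", "#....", "#....", ".###.", "#...#", "#...#", ".###.")

MAX_ERROR = 4  # assumed absolute mismatch budget, in pixels


def upscale(patch: Patch, factor: int) -> Patch:
    """Nearest-neighbour enlargement, a stand-in for a higher scan resolution."""
    rows = []
    for row in patch:
        wide = "".join(ch * factor for ch in row)
        rows.extend([wide] * factor)
    return tuple(rows)


def differing_pixels(a: Patch, b: Patch) -> int:
    """Count the pixels at which two equally sized patches differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def would_substitute(factor: int) -> bool:
    """True if the two digits fall within the assumed mismatch budget."""
    diff = differing_pixels(upscale(EIGHT, factor), upscale(SIX, factor))
    return diff <= MAX_ERROR


if __name__ == "__main__":
    for factor in (1, 2, 3):
        diff = differing_pixels(upscale(EIGHT, factor), upscale(SIX, factor))
        print(factor, diff, would_substitute(factor))
```

Under these assumptions the two digits differ by only a couple of pixels at the smallest size, so they count as "similar", while the same shapes at a larger scale differ by many more pixels and are kept apart.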


Date posted: 2013/08/07 15:08:02
Date liked: 2013/08/07 15:08:22
72 Tumblr notes
Liked from: The New Aesthetic
Post tagged:

No tags

Automatically generated tags:
before and after 8
numbers 5
spreadsheet 3
digits 2
numerical data 2
data comparison 1
data modification 1
difference detection 1
highlighting 1
text editing 1