Searchable PDF vs TIFF
#1
I've been busy scanning hundreds (probably thousands) of pages, manuals and documents with my ScanSnap, which I LOVE.

Then someone mentioned that PDF was a large format taking up all kinds of room and that I should be using TIFF.

Now, I'm almost done: a 15+ pound bag of motorhome, tax, tool, computer, and other paperwork is scanned into the computer, and it comes to just over 600 MB, which is fine for me.

My question is this: if I had used TIFF instead of PDF, would it be text searchable like the PDFs?

I am so excited, I've actually been able to find out how to reset every little clock, weather station, etc. that I own with just a simple search.

Kate
Reply
#2
I would ignore what that person told you. PDF can be a large format, but it can also be a very small one; it all depends on the compression used.

Much the same applies to TIFF. It's an image format and supports different compression methods. But not every application has good support for multi-page TIFFs, and I doubt they would be text searchable unless you converted them into PDFs and applied OCR software (which is what the ScanSnap is doing).
Reply
#3
I would say PDF is preferable to TIFF for your task. And TIFF is far from a small file format anyway.


Nathan
Reply
#4
Minute too slow. Gareth beat me to it.


Nathan
Reply
#5
TIFF might actually work better than PDF on a B&W document, especially if zipped. No, TIFF wouldn't be searchable. That said, the PDFs you're making are probably only searchable if you are using OCR.
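If you want a quick sanity check on whether a given PDF already carries an OCR text layer, here is a rough Python sketch. It rests on an assumption, not a guarantee: an image-only scan usually contains no font resources, while a text layer needs at least one, so the code simply looks for the `/Font` marker in the raw file bytes. The function name `looks_searchable` is made up for this example, and PDFs that store their objects in compressed object streams can hide the marker, so treat a negative result as "unknown" and not as "no text".

```python
def looks_searchable(pdf_bytes: bytes) -> bool:
    """Rough guess at whether a PDF has a text layer.

    Heuristic only (an assumption, not part of any PDF tool's API):
    OCR text needs a font resource, so we look for the /Font name in
    the raw bytes. Compressed object streams can hide it, so False
    means "not found", not "definitely image-only".
    """
    return b"/Font" in pdf_bytes


# Tiny synthetic byte strings standing in for real files.
image_only = b"%PDF-1.4\n1 0 obj << /Type /XObject /Subtype /Image >> endobj"
with_text = b"%PDF-1.4\n1 0 obj << /Type /Font /Subtype /Type1 >> endobj"

print(looks_searchable(image_only))
print(looks_searchable(with_text))
```

For a real file you would read the bytes first, for example `looks_searchable(open("manual.pdf", "rb").read())`, where `manual.pdf` is just a placeholder name. Searching the document in a PDF viewer is still the more reliable test.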
Reply
#6
PDFs can subsume TIFF data and also use the same compression algorithms, so the size issue is likely a wash.

PDFs can be searchable, multipage, and indexable, so that's an advantage. Those features, however, depend on OCR, although some can be achieved via metadata that is saved into the PDF document itself.

There's also a set of standards defined that makes PDF archivable, although this may be an expensive proposition for personal use.

http://en.wikipedia.org/wiki/PDF/A
Reply
#7
Seacrest wrote:
PDFs can subsume TIFF data and also use the same compression algorithms, so the size issue is likely a wash.

It depends on the software used and the circumstances. For example, I just used a one page text doc as a test. It's grayscale. I saved it as a PDF @ 300ppi JPEG and it comes out 2.1MB. I saved it as a TIFF @ 300ppi and it comes out as 7.6MB. Now take that TIFF and zip it and it's 459KB.

Zip the PDF and it's 1.8MB.

Save the PDF as ZIP and it comes out as 545KB. So there's a 15-20% difference between saving a PDF with zip compression and zipping a TIFF of the same file, res...

With a "15+ pound bag" of paper, 15-20% is not really a wash.
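For anyone who wants to check the arithmetic, here is a small Python sketch using the two figures from the test above (545KB for the PDF saved with ZIP compression, 459KB for the zipped TIFF). The function names are made up for this example. The last line scales the result to a 600 MB archive purely as an illustration; it assumes the same ratio would hold across a whole archive, which a one-page test cannot establish.

```python
def percent_larger(bigger: float, smaller: float) -> float:
    """How much larger the bigger file is, relative to the smaller one."""
    return (bigger - smaller) / smaller * 100


def percent_saved(bigger: float, smaller: float) -> float:
    """How much space the smaller file saves, relative to the bigger one."""
    return (bigger - smaller) / bigger * 100


pdf_zip_kb = 545      # PDF saved with ZIP compression (figure from the test)
tiff_zipped_kb = 459  # TIFF compressed with zip afterwards (figure from the test)

larger = percent_larger(pdf_zip_kb, tiff_zipped_kb)
saved = percent_saved(pdf_zip_kb, tiff_zipped_kb)
print(f"PDF is {larger:.1f}% larger than the zipped TIFF")
print(f"Zipped TIFF saves {saved:.1f}% compared with the PDF")

# Illustration only: assumes the one-page ratio applies to a whole 600 MB archive.
archive_mb = 600
print(f"Hypothetical saving on {archive_mb} MB: {archive_mb * saved / 100:.0f} MB")
```

Both ways of expressing the gap land inside the 15-20% range; which number you get depends on which file you treat as the baseline.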

Of course, I'd just run OCR and make it a PDF so it is searchable and copy-and-pasteable.
Reply
#8
M A V I C wrote:
Seacrest wrote:
PDFs can subsume TIFF data and also use the same compression algorithms, so the size issue is likely a wash.

It depends on the software used and the circumstances. For example, I just used a one page text doc as a test. It's grayscale. I saved it as a PDF @ 300ppi JPEG and it comes out 2.1MB. I saved it as a TIFF @ 300ppi and it comes out as 7.6MB. Now take that TIFF and zip it and it's 459KB.

Zip the PDF and it's 1.8MB.

Save the PDF as ZIP and it comes out as 545KB. So there's a 15-20% difference between saving a PDF with zip compression and zipping a TIFF of the same file, res...

Yeah, that difference must come from the PDF data, not the TIFF data.
PDF will use ZIP or LZW compression on the bitmap data from the TIFF file, but there could be a lot of PDF-specific data in what you're saving.
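Two effects from the numbers in this thread can be reproduced in a rough Python sketch: lossless ZIP-style (Deflate) compression shrinks uncompressed scan data a lot, and compressing data that is already compressed gains very little. Everything here is an assumption for illustration: the "page" is synthetic noise generated in code, not a real scan, and `synthetic_gray_page` is a made-up helper. Python's `zlib` module implements Deflate, the same family of compression used by typical ZIP files and by PDF's ZIP (Flate) option.

```python
import random
import zlib


def synthetic_gray_page(width: int = 512, height: int = 512, seed: int = 0) -> bytes:
    """Fake 8-bit grayscale scan: noisy near-white paper, with a darker
    'text' row every 40 rows. Purely illustrative, not real scan data."""
    rng = random.Random(seed)
    rows = []
    for y in range(height):
        if y % 40 == 0:
            # Dark, varied pixel values standing in for a line of text.
            rows.append(bytes(rng.randrange(0, 61) for _ in range(width)))
        else:
            # Slightly noisy white paper background.
            rows.append(bytes(rng.randrange(250, 256) for _ in range(width)))
    return b"".join(rows)


raw = synthetic_gray_page()
once = zlib.compress(raw, 9)    # like zipping an uncompressed TIFF
twice = zlib.compress(once, 9)  # like zipping a file that is already compressed

print(f"raw pixels:       {len(raw)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The first pass removes most of the redundancy, so the second pass has almost nothing left to work with. Any extra bytes a container format adds around the pixel data (page structure, metadata, fonts for a text layer) come on top of that and are not modeled here.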
Reply
#9
There is a setting in the software to recognize text when you scan if you are scanning to a folder (not to print or email).

This makes it fully searchable with Spotlight, and it can also be OCR'd later.
Reply
#10
"There is a setting in the software to recognize text when you scan if you are scanning to a folder (not to print or email).

This makes it fully searchable with Spotlight, and it can also be OCR'd later."

I have a ScanSnap and can't find this setting. Could you tell me where it is?

Thanks

Kate
Reply

