Need advice for generating large PDF output from HTML - Printable Version
+- MacResource (https://forums.macresource.com)
+-- Forum: My Category (https://forums.macresource.com/forumdisplay.php?fid=1)
+--- Forum: Tips and Deals (https://forums.macresource.com/forumdisplay.php?fid=3)
+--- Thread: Need advice for generating large PDF output from HTML (/showthread.php?tid=48110)

Need advice for generating large PDF output from HTML - volcs0 - 01-26-2008

I've written a Python script that parses a large set of mass spec data. I've formatted the output as HTML, but it will be hundreds of pages, so opening these HTML files in a browser is impractical.

I've spent the morning getting html2ps and ps2pdf working (and a PHP script called HTML_ToPDF that I found). It works fine for modestly sized HTML files, but for anything large the program takes way too long. In fact, I am testing it on a moderately sized file now (a 13 MB HTML file), and ps2pdf has been running for 20 minutes. Although this is keeping my legs quite warm, this solution is also impractical. The final HTML file will be 5 times this size.

So, I'm trying to think of other options. The ideal output would be PDF, since that would render fast and be browser independent. I tried a Python-based HTML2PDF conversion tool, but it did not work well.

So, other ideas? Should I try to use LaTeX and produce the postscript directly? I like the HTML idea, because it is easy for me to format and color my table output.

Any other thoughts? Thanks.

Re: Need advice for generating large PDF output from HTML - TheTominator - 01-26-2008

[quote volcs0]So, other ideas? Should I try to use LaTeX and produce the postscript directly?[/quote]

I would. You can use pdflatex to produce PDF directly. You can use a LaTeX class/package to add color. Alter your Python script to produce the relatively simple repetitive code for LaTeX markup.

Other high-speed options include 'troff' or other tools in the troff family.

Re: Need advice for generating large PDF output from HTML - TheTominator - 01-26-2008

Another option is to load the data into FileMaker Pro, format a layout, and then print to PDF. It wouldn't be faster for rendering, might indeed be slow when printing to PDF, and wouldn't give you the smallest PDF document. But if you are familiar with FileMaker, it might be faster than coding, depending on your skills.
I would do that only if you would find the data filtering and analysis tools of FileMaker helpful in the process.

Re: Need advice for generating large PDF output from HTML - volcs0 - 01-26-2008

[quote TheTominator][quote volcs0]So, other ideas? Should I try to use LaTeX and produce the postscript directly?[/quote] I would. You can use pdflatex to produce PDF directly. You can use a LaTeX class/package to add color. Alter your Python script to produce the relatively simple repetitive code for LaTeX markup.[/quote]

Yes - was "afraid" of this. I didn't think ahead. I have such beautiful HTML output... But you're right. LaTeX is the best long-term option... Thanks - I'll get to work on it.... haven't done any LaTeX in a while...

Re: Need advice for generating large PDF output from HTML - mattkime - 01-26-2008

don't convert - output in the format that you want.
http://www.devshed.com/c/a/PHP/PDF-Generation-With-PHP/

Re: Need advice for generating large PDF output from HTML - TheTominator - 01-26-2008

I presently have a LaTeX document which has a chapter consisting of 10+ pages of tabular information. To avoid problems with page breaking, I formatted the column entries in a nontraditional way. I created a poor man's table with fixed-width column entries and a fixed separation between columns. Since you are producing the LaTeX code from a script, it should be no trouble to implement a scheme like this should you decide to.

Each line in the "table" goes like this.

\noindent{\makebox[\mycolwidth][l]{text in col 1}\hspace{\colseparation}\makebox[\mycolwidth][l]{text in col 2}\hspace{\colseparation} ...}

with repeated use of \makebox[\mycolwidth][l]{...} to print the column's contents and \hspace{\colseparation} to force a specific space between the columns. At the end of each line I have

\par\vspace{0.25\baselineskip}

There is one column entry that can span multiple lines.
For that one I use

\parbox[t]{\myparboxwidth}{Text in a multi-line paragraph.}

instead of \makebox.

At the beginning define

\newlength{\mycolwidth}
\settowidth{\mycolwidth}{The longest column contents you actually have}
\newlength{\colseparation}
\setlength{\colseparation}{0.5em} % or whatever horizontal separation you want
\newlength{\myparboxwidth}
\setlength{\myparboxwidth}{1.55in}

I use the same width columns for most of my table. If you have different widths, define \mycolwidthA, \mycolwidthB, and so on as needed (LaTeX command names can contain only letters, so a numbered name such as \mycolwidth1 will not work).

There is no need to define a floating environment. The formatting is insensitive to extra blank lines between these single, long lines for each row.

The \begin{tabbing} \end{tabbing} environment may work just fine and be simpler though. It is supposed to work well over page breaks.

Re: Need advice for generating large PDF output from HTML - TheTominator - 01-26-2008

As mattkime recommends, you can try to locate a PDF library for Python or switch to using another language. I was going to suggest that, but I've never tried it myself.

Personally I don't like the idea of having to typeset the material manually directly in PDF. You will be dealing with multi-page content. It is my understanding that the libraries are designed to let you create your PDF one page at a time. For better or worse, you would have to fit your content on the page and reflow it as you tweak your format.

Re: Need advice for generating large PDF output from HTML - volcs0 - 01-26-2008

[quote TheTominator]I presently have a LaTeX document which has a chapter consisting of 10+ pages of tabular information...[/quote]

Thank you for this detailed explanation. The output is very repetitive - the same table information displayed 10,000 times or so. If I can't get some sort of continuous display (which HTML was good for), then I'll want to make sure the tables do not break over the page.

I'll check out the PHP-PDF link.
The main purpose of the program is heavy-duty computation of this mass spec data... so, re-tooling it for PHP would be onerous, though not impossible. But there must be some Python hooks I can use.

Thanks for thinking about this with me. I already made my first table in LaTeX - just in a text editor, so that I can see the repetitive aspects of it and how it applies to my data. Now I will try to get Python to do the same thing...

Re: Need advice for generating large PDF output from HTML - TheTominator - 01-26-2008

This link may be helpful on using the tabbing environment in LaTeX.
http://noodle.med.yale.edu/latex/latex2e-html/ltx-58.html

Re: Need advice for generating large PDF output from HTML - volcs0 - 01-26-2008

Also amazing to me is that when I google "python latex pdf output", THIS THREAD is already on the first page of hits. Freaky.
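
For anyone landing here from a search: below is a minimal sketch of how a Python script might emit the fixed-width \makebox rows TheTominator describes. It is an illustration under assumptions, not code from anyone in the thread. The function names (escape_latex, make_row, build_document) and the sample values are hypothetical; only the LaTeX macros \mycolwidth and \colseparation come from the posts above. The multi-line \parbox column is left out to keep it short.

```python
# Hypothetical helpers; only \mycolwidth and \colseparation come from the thread.

LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape the characters that LaTeX treats specially in running text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))


def make_row(cells):
    """Return one 'poor man's table' row as a single LaTeX line."""
    boxes = [r"\makebox[\mycolwidth][l]{%s}" % escape_latex(c) for c in cells]
    return (r"\noindent{" + r"\hspace{\colseparation}".join(boxes) + "}"
            + r"\par\vspace{0.25\baselineskip}")


def build_document(rows, widest_entry):
    """Return complete LaTeX source for the given rows of cell values."""
    lines = [
        r"\documentclass{article}",
        r"\newlength{\mycolwidth}",
        r"\newlength{\colseparation}",
        r"\begin{document}",
        # Measure the column width once the document font is in effect.
        r"\settowidth{\mycolwidth}{%s}" % escape_latex(widest_entry),
        r"\setlength{\colseparation}{0.5em}",
    ]
    lines.extend(make_row(row) for row in rows)
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample values, only to show the shape of the output.
    sample = [("m/z", "intensity"), (445.12, 1.0e6)]
    # Character count is only a rough stand-in for typeset width.
    widest = max((str(c) for row in sample for c in row), key=len)
    print(build_document(sample, widest), end="")
```

Redirect the output to a file (say report.tex, a name chosen here for illustration) and run pdflatex on it. Because every row is its own short paragraph, LaTeX can break pages between rows without any table or float environment.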