PostScript and PDF tools under GNU/Linux

Introduction

PostScript and its cousins are page description formats, which handle scaling of fonts and vector graphics with ease, and are far superior to pixelated formats (GIF, TIFF etc.) for most quality printing and publishing jobs. PostScript (and its relatives) are output options for both word processing and graphics applications.

There are three closely-related formats, often distinguished by characteristic file extensions:

.ps
PostScript
.eps
Encapsulated PostScript
.pdf
Portable Document Format

The three formats are distinct, so it is best to understand and respect the differences, and to use the right format for the job. Nevertheless, sometimes transforming between formats is forced on you by circumstances; and anyway, the close family relationship makes transformations feasible. This is the topic of the remainder of this page.

Principal utilities

Central to all PostScript and PDF manipulations is the program GhostScript. This interprets files in any of these formats, and it can then output the document or image after applying various operations: rotation, scaling, cropping, and rasterization. GhostScript is rather user-hostile. Fortunately there are various less hostile wrapper programs, which shield users from GhostScript. The most important ones are:

gv
displays ps, eps or pdf files
display
displays, manipulates and converts image files

Gnome and KDE have their own variants of gv.

Other utilities, specific to .ps and .eps, are:

psbook
rearrange pages in PostScript file into signatures
psselect
selects pages from a PostScript file
pstops
shuffles pages in a PostScript file (general rearrangement, scaling and rotation)
epsffit
fit encapsulated PostScript file (EPSF) into constrained size
ps2epsi
embeds a thumbnail image within a PS or EPS file
psnup
multiple pages per sheet
psresize
rescales and centres pages to fit a different paper size
psmerge
filter to merge several PostScript files into one
ps2ps
optimizer for PS files
eps2eps
optimizer for EPS files
ps2ascii
extract text from PS file
ps2pdf
translator from PS to PDF
fig2ps
translator from FIG, via LaTeX, to PS/EPS

Utilities specific to .pdf are:

pdf2ps
translator from PDF to PS
pdfimages
extracts images from PDF documents
pdftotext
extracts text from PDF documents
pdftops
converts PDF to PS
pdftopbm
converts each page to a PBM image
pdftoppm
converts each page to a PPM image
pdfcrop
reduces/increases margins

These utilities can solve most problems, and they can be combined creatively to achieve even more elaborate effects. Below are some simple examples.

To...                                                          Use this command:
Convert from Letter to A4                                      psresize -Pletter -pA4 infile.ps outfile.ps
Print 4 pages on one A4 sheet                                  psnup -pA4 -4 infile.ps | lp -
Create a new PS file with selected pages from an input file    psselect -p1,3,6,20-25 infile.ps outfile.ps
Merge several files into a new file                            psmerge -ooutfile.ps file1.ps file2.ps
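
As an illustration of combining the utilities, here is a minimal sketch of the usual booklet recipe: psbook reorders the pages into a signature and psnup then places two of them on each sheet. The function name make_booklet is made up for this example, and the paper size is left at the psnup default; add a -p option as in the table above if you need something else.

```shell
#!/bin/sh
# make_booklet INPUT.ps OUTPUT.ps
# Reorder the pages into one signature, then put two pages on each sheet.
# (make_booklet is an illustrative name, not part of psutils.)
make_booklet() {
    psbook "$1" | psnup -2 > "$2"
}
```

After running 'make_booklet infile.ps booklet.ps', print booklet.ps double-sided and fold the stack in half.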

It is best to check the 'man pages' for the detailed instructions for each of these programs, e.g. man psbook.

It is not guaranteed that your Linux distribution provides all the above utilities; and it may contain utilities not listed above.
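
A quick way to find out which utilities are missing is to ask the shell. The following is only a sketch: the function name missing_tools is made up for this example, and the list of names to check is whatever you pass to it.

```shell
#!/bin/sh
# missing_tools NAME...
# Print the name of every listed program that is not found on the PATH.
# (missing_tools is an illustrative name.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: missing_tools gs gv psbook psselect psnup psresize pdftops pdfcrop
```

No output means that everything you asked about is installed.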

Other utilities

To convert a single-page PS file to EPS, use ps2eps from the Prosper package. If you don't have Prosper installed then I can give you a copy of this utility. Usage is explained by

    ps2eps -h

To convert a multipage PS file to multiple single-page PS files, use split-psfile — also part of Prosper. This program is a simple front end to psselect. Usage is

    split-psfile <input.ps> <prefix>

and will result in a set of files <prefix>nnn.ps, with nnn equal to 001, 002, 003...
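
If you do not have split-psfile, a similar result can probably be had directly from psselect. The sketch below is not the Prosper script, just my guess at an equivalent: the function names page_count and split_pages are hypothetical, and it assumes that the input file carries a numeric DSC '%%Pages:' comment.

```shell
#!/bin/sh
# page_count FILE
# Print the page count from the first numeric "%%Pages:" comment.
# A "%%Pages: (atend)" line is skipped because it has no number.
page_count() {
    sed -n 's/^%%Pages: *\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# split_pages INPUT.ps PREFIX
# Write PREFIX001.ps, PREFIX002.ps, ... with one page in each file.
split_pages() {
    input=$1
    prefix=$2
    n=$(page_count "$input")
    i=1
    while [ "$i" -le "${n:-0}" ]; do
        psselect -p"$i" "$input" "$(printf '%s%03d.ps' "$prefix" "$i")"
        i=$((i + 1))
    done
}
```

For example, 'split_pages input.ps page' on a three-page file should leave page001.ps, page002.ps and page003.ps.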

Either smart or arbitrary cropping of PDF pages is possible by applying a patch to the Perl script pdfcrop, mentioned above. See pdfcrop2. This helps with A4-Letter conversions, as well as with extracting a specific rectangle.

For an all-in-one utility for PDF manipulations, try pdftk. (However this may not be installed by default.)

To annotate PostScript figures try flpsed. This allows you, for example, to superimpose labels on plots, or to fill in forms. It also allows batch operation.

PostScript and PDF graphics can be converted to other vector formats with pstoedit.

Another listing of handy tools can be found at http://www.ubuntugeek.com/list-of-pdf-editing-tools-for-ubuntu.html. It briefly describes flpsed, pdftk, pdfedit, krita, pdfmod, inkscape, OpenOffice, PDF-Shuffler, pdfcrop.

Converting eps files to pixel format

This job can be done interactively with the aid of the graphics utilities xv, xfig or gimp. But for the most precise control of the output format, or when many transformations are required, GhostScript can't be avoided.

To use gs to transform an eps page to another format, first run

   gs -h

to see what output formats are available. Most refer to the native format of various printers, or to different windowing systems. However pixel-based file formats include:

TIFF
  • tiffg3: Fax, Group 3, bilevel
  • tiffg4: Fax, Group 4, bilevel
  • tiff32d: Fax, Group 3 2-D
  • tiffcrle: Fax, Group 3 fax with no EOLs
  • tiff12nc: 12-bit RGB, no compression
  • tiff24nc: 24-bit RGB, no compression
  • tifflzw: LZW compression
  • tiffpack: PackBits
PNG
  • png16: 4-bit color
  • png16m: 24-bit color
  • png48
  • png256: 8-bit color
  • pngalpha
  • pnggray
  • pngmono
JPEG
  • jpeg
  • jpegcmyk
  • jpeggray

The distinctions are not documented, but seem to relate mainly to colour depth and compression. Trial and error might be necessary.

Then run GhostScript with something like

    # A bash array lets each option carry its own comment
    gs_args=(
       -q                      # No version message
       -sDEVICE=tiffg3         # Chosen output 'device'
       -sOutputFile=out.tiff   # Output file
       -dNOPAUSE               # No pause or prompt after each page
       -dBATCH                 # Exit when done, rather than wait for input
       -r72x72                 # Default output resolution in pixels/inch
       -g940x880               # Output geometry in pixels
       image.eps               # Input file: in this case 940x880 points
    )
    gs "${gs_args[@]}"
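
When many files need the same treatment, the call can be wrapped in a loop. The following is a sketch rather than a recipe: the function name eps_to_png is made up, the png16m device and 72 pixels/inch are arbitrary choices, and instead of an explicit '-g' it relies on GhostScript's '-dEPSCrop' switch, which crops the output to the bounding box declared in each EPS file.

```shell
#!/bin/sh
# eps_to_png FILE.eps...
# Rasterize each EPS file to a PNG of the same base name, at 72 pixels/inch.
# (eps_to_png is an illustrative name; device and resolution are assumptions.)
eps_to_png() {
    for eps in "$@"; do
        gs -q -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=png16m -r72 \
           -sOutputFile="${eps%.eps}.png" "$eps" || return 1
    done
}
```

Typical usage would be 'eps_to_png *.eps'. Because every filename is quoted, names containing spaces should survive.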

Scaling is important to understand, since the input file uses absolute lengths, and these lengths need to be converted to pixels. PostScript internally measures lengths in 'points', and LCD monitors happen to have pixels that are about 1x1 point in area. Consequently we often aim for a conversion factor of 1 pixel/point in both the horizontal and vertical directions. This is the reason for '-r72x72' in the above example. (It helps to know that a 'point' is a measure used in typography, and is currently 1/72 inch, or 0.35 mm.)

But specifying the horizontal and vertical scaling is not enough, since the input image has somewhat vague extent. It has a nominal size (given by the '%%BoundingBox' comment in the file header, in units of 'points'); but the image or text may extend beyond that boundary, or the relevant part of the image may be confined to one corner. Consequently we also need to select a rectangular area. That is the reason for '-g940x880' in the above example.
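
The nominal size can be read straight out of the file, since the header is plain text. Here is a small sketch, assuming the usual comment form '%%BoundingBox: llx lly urx ury'; the function name eps_bbox is made up for this example.

```shell
#!/bin/sh
# eps_bbox FILE.eps
# Print "width height" in points, taken from the first %%BoundingBox
# comment that carries numbers (a "(atend)" placeholder is skipped).
# (eps_bbox is an illustrative name.)
eps_bbox() {
    awk '/^%%BoundingBox:/ && $2 ~ /^-?[0-9]/ {
             print $4 - $2, $5 - $3
             exit
         }' "$1"
}
```

For a file whose header says '%%BoundingBox: 0 0 940 880' this prints '940 880'.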

The upshot of having an input PS image that is 940x880 points in extent, and then specifying both '-r72x72' and '-g940x880', is that a pixelated image is generated that is 940x880 pixels in size.
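
The arithmetic generalizes to other resolutions: pixels = points x resolution / 72. The sketch below works out a matching pair of '-r' and '-g' options; the function names are made up, and rounding up to a whole pixel is my own choice.

```shell
#!/bin/sh
# points_to_pixels POINTS DPI
# Convert a length in points (1/72 inch) to pixels, rounded up.
points_to_pixels() {
    printf '%s\n' $(( ($1 * $2 + 71) / 72 ))
}

# gs_geometry WIDTH_PT HEIGHT_PT DPI
# Print the -r and -g options for an image of the given size in points.
# (Both function names are illustrative.)
gs_geometry() {
    w=$(points_to_pixels "$1" "$3")
    h=$(points_to_pixels "$2" "$3")
    printf '%s\n' "-r${3}x${3} -g${w}x${h}"
}
```

So 'gs_geometry 940 880 72' reproduces the options used above, and 'gs_geometry 940 880 144' gives '-r144x144 -g1880x1760' for an image with twice the pixel density.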

The on-line docs for GhostScript provide much more information about color, media, orientation, fonts, batching, and piping.

Finding files using thumbnails

Sometimes the biggest problem with image files is simply to find the one you want. If you don't know the name of the figure then visual identification may be the answer. The following script helps by generating an array of thumbnails, with the aid of ImageMagick. (Note: the script can't cope with filenames containing spaces. Any suggestions?)

#!/bin/bash
  
# Put this file in some directory in your path (e.g. '~/bin') or in
# the root directory of the branch that you want to explore.  Call
# it something like 'thumbs'.  Then run it (this is helped by 
# 'chmod +x thumbs').  Assuming output to 'x:' the thumbnails will be 
# shown in an X11 window: step through the pages by pressing Esc
# or 'q'.  It may take 5 or 10 seconds to generate each page.

# Many customizations are possible: feel free to experiment.
# Prerequisites are ImageMagick and bash.  And linux, of course!

# Define which images you wish to turn into thumbnails
declare -a LIS
#LIS=(`find * -name '*.tiff' -o -name '*.jpeg' -o -iname '*.jpg'`)
LIS=(`find * -name '*.eps'`)
if [ ${#LIS[@]} -eq 0 ]; then echo "No files found"; exit; fi
if [ ${#LIS[@]} -eq 1 ]; then display $LIS; exit; fi

# Layout of thumbnails on each output page
let pageH=842  # A4, points
let pageW=595  # A4, points
let omargin=15 # outer margin; around the set of NxM images
let imargin=5  # image margin; around each individual image
let nRows=5
let nCols=4
pixH=$(( (pageH - 2*omargin) / nRows - 2*imargin ))
pixW=$(( (pageW - 2*omargin) / nCols - 2*imargin ))
imageGeom=${pixW}'x'${pixH}'+'${imargin}'+'${imargin}

# Compute the number of A4 pages that will be generated
let nPages=${#LIS[@]}-${#LIS[@]}%$((nRows*nCols))
let nPages=nPages/$((nRows*nCols))
if [ $(( ${#LIS[@]}%(nRows*nCols) )) -gt 0 ]; then let nPages+=1; fi

trap exit 2 15   # enable Ctrl-C; however cursor should be in terminal window
for ((pageN=1;pageN<=nPages;pageN++)); do
  # Compute index of start of subsequence
  let off=nRows*nCols*$((pageN-1))
  
  # Build list of filenames to include in the current page
  names=""
  for ((i=0;i<nRows*nCols;i++)); do
      names=`printf '%s %s ' $names ${LIS[(($off+$i))]}`
  done
  if [ $nPages -gt 1 ]; then echo Page $pageN "/" $nPages; fi
  
  # For more, see file:///usr/share/doc/imagemagick/www/montage.html
  # The final item defines the output destination.  Some possibilities are: 
  #    x:                    a simple window
  #    out_$pageN.ps         a sequence of numbered files
  #    miff:- | display -    ImageMagick's display utility
  #    ps:- | gv -           GhostView
  montage  -set label "%i\n%wx%h" -pointsize 10 -tile ${nCols}x${nRows} -frame 3 -geometry ${imageGeom} -shadow -page A4 -gravity Center $names x:

done
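
One possible answer to the question about filenames containing spaces is to let find and xargs pass the names around NUL-terminated, so that the shell never splits them. The following is a stripped-down sketch, not a drop-in replacement: it assumes GNU find and xargs (for -print0, -0 and -r), the function name thumbs_pages is made up, the page counter is lost, and the hard-wired 4x5 tiling and 131x152+5+5 geometry are simply the values the script above computes for A4.

```shell
#!/bin/sh
# thumbs_pages [DIR]
# Show every *.eps under DIR (default: current directory) as pages of
# 4x5 thumbnails.  xargs hands montage at most 20 names per page, each
# as a single argument, so spaces in names are harmless.
# (thumbs_pages is an illustrative name.)
thumbs_pages() {
    find "${1:-.}" -name '*.eps' -print0 |
        xargs -0 -r -n 20 sh -c \
            'montage -tile 4x5 -geometry 131x152+5+5 "$@" x:' sh
}
```

The trailing 'sh' fills in $0 of the inner shell, so that "$@" holds exactly the filenames supplied by xargs.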

Conclusion

The above utilities allow most things to be done, though perhaps in several steps.

Thanks

Thanks to the countless programmers who created all these great tools. And thanks to those who have informed me of additional utilities: Kate Wilson, Klaus Holler.

 

Last changed 2011-07-05 Chris Rennie