Archive for March, 2009

Web-friendly PDFs

March 8th, 2009

If your website serves PDFs, you’ll want to deliver them in a web-friendly way.  While HTML web pages tend to be relatively small, a PDF can be much larger depending on its content.  Several factors contribute to this, such as embedded fonts, a large number of pages or high-resolution graphics.  Therefore, as a thoughtful provider of PDF content, you’ll want to spend a little time up front preparing your PDF rather than make your end users spend a lot of time waiting to view it - if they decide to stick around and wait at all.

For the most part, small to medium size web pages render in the browser quickly.  This is partly because the browser displays certain items while it continues to work on other parts of the page, so the page gives the illusion of loading quickly.  You may notice that sites with lengthy tables tend to take longer, as the browser has to determine the optimal size for each column based on all the data in the table.  Even though it may take some time (a few seconds to a minute or more) to render an HTML page, once items start to appear in the window you can begin to focus on those areas while the rest populates.

A standard PDF works differently when you view it in your browser from a website.  In this case, the entire PDF must be downloaded to your computer before the PDF viewer will show the first bit of content.  The main reason for this is the way a PDF is structured.  A PDF is not required to be physically set up in the same order its pages are laid out.  That is, the data required for page 1 of the PDF isn’t necessarily near the top of the file.  In fact, PDFs have what is called a cross-reference table that specifies where all the objects, or logical pieces, are located.  The cross-reference table is located (not always in its entirety) at the end of the PDF.  Thus, the entire PDF must be sent down the wire before the viewer can determine which pieces are needed and where they are within the PDF.
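To make the "table at the end" idea concrete, here is a minimal Python sketch.  It assumes a classic (non-stream) cross-reference table and relies on the startxref keyword near the end of the file, which holds the byte offset of that table.  The function name and the tiny hand-built sample are made up for illustration; a real PDF is far larger.

```python
import re

def find_xref_offset(pdf_bytes: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    # A reader looks at the END of the file first; 1024 bytes is an
    # assumption that is plenty for this sketch.
    tail = pdf_bytes[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref marker found near the end of the file")
    return int(matches[-1])

# Hand-built stand-in for a PDF: one object, then the xref table and trailer.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)  # the table starts right after the object data
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_offset).encode() + b"\n%%EOF\n"
)

offset = find_xref_offset(sample)
print(offset, sample[offset:offset + 4])
```

Notice that nothing useful can be located until the last few bytes have arrived, which is why a standard PDF has to download completely before anything displays.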

Optimization/Linearization

First up is optimization, which you might also hear referred to as linearization or Fast Web View.  The point behind this is to physically restructure the PDF in such a way that the PDF viewer can begin to render the PDF without waiting for the entire PDF to download.  Different software will have different ways of setting this feature - there is a checkbox in Adobe Acrobat when you save the PDF to specify this and most FyTek software has an option that can be set.  Check the documentation if you are using other software to see if the option is available.

Optimizing a PDF is a method by which the first page of the PDF and bookmarks are moved to the top of the file along with part of the cross-reference table.  Some special settings are also made in the PDF file to tell the PDF reader that this is an optimized PDF.  When a web browser instructs the PDF plug-in to open and display the PDF, the browser will begin to show the first page (almost) right away.  Once the first page is displayed along with any bookmarks, the browser will continue downloading the rest of the PDF in the background.  There may be a slight delay as you click to different bookmarks, as that page might not yet be downloaded.  The total time to download the entire PDF will not be affected - but end users can begin to read the PDF much sooner rather than stare at the spinning wheel or globe in their favorite browser.
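If you want a quick way to tell whether a PDF carries those special settings, the sketch below looks for the /Linearized key, which the PDF specification places in a dictionary within the first 1024 bytes of a linearized file.  This is a rough check only: the function name is made up, and it does not verify that the rest of the file is laid out correctly.

```python
import re

def looks_linearized(pdf_bytes: bytes) -> bool:
    """Rough check: is a /Linearized entry present near the top of the file?"""
    head = pdf_bytes[:1024]
    return re.search(rb"/Linearized\s+\d", head) is not None

# Hand-built header fragments for illustration, not complete PDFs.
optimized_head = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Linearized 1 /L 1234 /O 3 /E 500 /N 2 /T 1000 /H [ 100 50 ] >>\n"
    b"endobj\n"
)
plain_head = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(optimized_head), looks_linearized(plain_head))
```

For a file on disk you would pass in something like open(path, "rb").read(1024), since only the start of the file matters for this check.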

Optimized PDF Loading in Firefox

I should point out that optimization does not physically reduce the size of your PDF.  In fact, with the extra information needed to specify it is an optimized PDF, you will likely notice a small increase in file size, typically only a few hundred to a few thousand bytes over the size of the PDF before optimization.  Also, optimization will have no effect on PDFs outside of the web.  Opening an optimized PDF from your hard drive will be no faster than opening the same un-optimized version.  That is because the web server combined with the PDF Reader browser plug-in is what allows optimization to work, and neither is in play when you open a locally stored PDF.

Other options

There are a few other options to consider in addition to optimization.  One is to use lower resolution images if possible.  Next, if your software supports it, use font subsets.  This option will place only the needed glyphs for fonts in your PDF rather than the entire character set.  Check the documentation for the software you are using to see if this option is available.  Better yet, do you need a custom font?  Try using a built-in font instead so you don’t need to include any extra font information.  PDF readers contain built-in fonts for Times Roman, Helvetica (Arial) and Courier.
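To get a feel for how much image resolution matters, here is a small back-of-the-envelope calculation in Python.  It assumes uncompressed RGB data at 3 bytes per pixel and a made-up 4 x 3 inch image; real PDFs compress their images, so actual savings will differ, but the pixel count still drops with the square of the resolution.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Raw size of an image placed at the given physical size and resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

# Hypothetical 4 x 3 inch photo at print resolution vs. screen-friendly resolution.
full = uncompressed_image_bytes(4, 3, 300)  # 1200 x 900 pixels
half = uncompressed_image_bytes(4, 3, 150)  # 600 x 450 pixels

print(full, half, full // half)
```

Halving the resolution cuts the raw pixel data to a quarter, which is why downsampling images is usually the first thing to try on a graphics-heavy PDF.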

Another consideration is to break up the PDF into smaller PDFs.  Perhaps use the bookmark structure of your current PDF to split it into more manageable sizes.  Even if you don’t have access to the document the PDF was created from, there are software programs available (including PDF Meld) that perform this function automatically.  The trade-off is that users may need to download multiple smaller PDFs rather than a single large PDF containing everything they need.
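The planning step for a bookmark-based split can be sketched in a few lines of Python.  This is not how PDF Meld or any particular product works; it is just an illustration that assumes you already have a list of top-level bookmarks with their 1-based starting pages, and that each bookmark starts on a different page.  The bookmark titles and page counts are invented.

```python
def bookmark_page_ranges(bookmarks, total_pages):
    """Turn (title, first_page) bookmarks into (title, first_page, last_page)."""
    ordered = sorted(bookmarks, key=lambda b: b[1])
    ranges = []
    for i, (title, first) in enumerate(ordered):
        if i + 1 < len(ordered):
            # This section ends on the page before the next bookmark starts.
            last = ordered[i + 1][1] - 1
        else:
            # The final section runs to the end of the document.
            last = total_pages
        ranges.append((title, first, last))
    return ranges

# Hypothetical 12-page manual with three top-level bookmarks.
plan = bookmark_page_ranges([("Intro", 1), ("Setup", 4), ("Reference", 10)], 12)
print(plan)
```

Each tuple in the plan describes one smaller PDF to produce.  Pages that come before the first bookmark are not covered by this sketch and would need a section of their own.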

Summary

Use PDF optimization when you have large PDFs (probably a couple megabytes or more), not to reduce their size but to make them display faster in a web browser.  Optimization is only beneficial when you are making large PDFs available for viewing or download on a website.  It will not help with other delivery methods such as email.  Try using lower resolution images or font subsets to make the PDF physically smaller if you need to email it or just want a smaller PDF.  Lastly, you might want to split up a large PDF based on some logical grouping such as bookmarks to create multiple smaller PDFs.

admin FyTek Software, PDF

PDF Encryption

March 1st, 2009

The PDF file structure allows for various types of encryption. The common forms are 40-bit encryption, which is compatible with Adobe Reader 4.x and earlier, and 128-bit encryption, which requires Adobe Reader 5.x or higher. The more bits, the more difficult the key is to crack using a brute force approach, so 128-bit is considered more secure. No encryption scheme is completely safe but some are safer than others. The encryption used in PDFs won’t stop a dedicated hacker but should be fine for most other situations. Having said that, unless you have a need to support older versions of Adobe Reader you should use 128-bit encryption.
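A little arithmetic shows how big the gap between the two key sizes is. The Python sketch below assumes an attacker who tries raw keys one by one at a made-up rate of one billion guesses per second; it says nothing about weak passwords, which are usually the easier target.

```python
keys_40 = 2 ** 40    # number of possible 40-bit keys
keys_128 = 2 ** 128  # number of possible 128-bit keys

guesses_per_second = 1_000_000_000  # assumption for illustration only

minutes_40 = keys_40 / guesses_per_second / 60
years_128 = keys_128 / guesses_per_second / (60 * 60 * 24 * 365)

print(f"40-bit: about {minutes_40:.0f} minutes to try every key")
print(f"128-bit: about {years_128:.1e} years to try every key")
print(f"128-bit has 2**{(keys_128 // keys_40).bit_length() - 1} times as many keys")
```

At that assumed rate the entire 40-bit key space falls in well under an hour, while the 128-bit key space is out of reach for a brute force search.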

A PDF is internally divided into logical groupings, or objects, and each object is assigned a unique number. There are many types of objects; for example, an object could be an embedded font file, the content for a given page, or an image. When you apply encryption, each object within a PDF has its data encrypted (the contents of the object, not the object number or other non-data parts of the object). That is, those sections that don’t relate to potentially sensitive information, such as metadata or pointers, are intentionally left unencrypted. This is so other software, such as a file content search routine, can find a description or title for the PDF. You can encrypt this metadata if desired, but unless you really want it encrypted it’s best to leave it as plain text.
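One reason the object number stays readable is that the standard RC4-based security handler in the PDF specification mixes it into the key for each object. The Python sketch below shows that derivation as I understand it from the specification; the function name and the all-zero file key are placeholders, and the RC4 step that actually encrypts the data is left out.

```python
import hashlib

def object_key(file_key: bytes, obj_num: int, gen_num: int) -> bytes:
    """Derive the per-object key from the file key and the object's identity."""
    material = (
        file_key
        + (obj_num & 0xFFFFFF).to_bytes(3, "little")  # low 3 bytes of object number
        + (gen_num & 0xFFFF).to_bytes(2, "little")    # low 2 bytes of generation number
    )
    digest = hashlib.md5(material).digest()
    # The key is the file key length plus 5 bytes, capped at 16 bytes.
    return digest[: min(len(file_key) + 5, 16)]

# Placeholder file keys: 5 bytes for 40-bit, 16 bytes for 128-bit encryption.
key_40 = object_key(bytes(5), 7, 0)
key_128_a = object_key(bytes(16), 7, 0)
key_128_b = object_key(bytes(16), 8, 0)

print(len(key_40), len(key_128_a), key_128_a != key_128_b)
```

Because every object gets its own key, a reader must be able to see the object number in plain text before it can decrypt that object’s contents.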

The typical use for encryption, other than making it difficult to extract data directly, is to place restrictions on what can be done with a PDF file. For example, you may want to prevent the end user from printing or copying text from the PDF. Perhaps you are a graphics shop and your images are considered an asset. You may provide full resolution images for viewing within the PDF but only allow low-res printing. Here’s a list of restrictions you can place on a PDF depending on the level of encryption used:

40-bit encryption

  • do not allow user to print
  • do not allow user to make changes
  • do not allow user to copy text/graphics
  • do not allow user to add/update annotations

128-bit encryption

  • do not allow user to print (even low quality)
  • do not allow user to make changes
  • do not allow user to copy text/graphics
  • do not allow user to add/update annotations
  • do not allow user to fill in interactive fields
  • do not allow user to extract information
  • do not allow assembly (insert, rotate, delete pages or create bookmarks)
  • do not allow user to print at digital quality
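Inside the PDF these restrictions are stored as a single number, the /P entry of the encryption dictionary, where each permission is one bit. The Python sketch below uses the bit positions from the PDF specification’s standard security handler; the class and function names are my own, and the 40-bit scheme only looks at the first four flags.

```python
from enum import IntFlag

class Permission(IntFlag):
    PRINT = 1 << 2        # bit 3: print
    MODIFY = 1 << 3       # bit 4: make changes
    COPY = 1 << 4         # bit 5: copy text/graphics
    ANNOTATE = 1 << 5     # bit 6: add/update annotations
    FILL_FORMS = 1 << 8   # bit 9: fill in interactive fields
    EXTRACT = 1 << 9      # bit 10: extract information
    ASSEMBLE = 1 << 10    # bit 11: insert, rotate, delete pages
    PRINT_HIGH = 1 << 11  # bit 12: print at full quality

# Bits 7-8 and 13-32 are reserved and must be set to 1.
RESERVED_ONES = 0xFFFFF0C0

def p_value(allowed: Permission) -> int:
    """Build the signed 32-bit /P value for a set of ALLOWED actions."""
    unsigned = RESERVED_ONES | int(allowed)
    return unsigned - (1 << 32)  # the top bit is always set, so /P is negative

def allows(p: int, flag: Permission) -> bool:
    """Check whether a /P value grants the given permission."""
    return bool(p & int(flag))

nothing_allowed = p_value(Permission(0))
print_only = p_value(Permission.PRINT | Permission.PRINT_HIGH)
print(nothing_allowed, print_only, allows(print_only, Permission.COPY))
```

Note the flags are permissions granted, so a "do not allow" item in the lists above corresponds to leaving that bit at 0.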

Types of Passwords

You have the option of applying an owner password only or an owner and user password. A user password, or opening password, is used when you want Adobe Reader to prompt for a password before showing the PDF. In this case, you can enter either the owner or user password depending on which one you know or care to use. If you don’t know either one then you will not be able to open or view the PDF. Note that both the owner and user password will grant the same restricted access as defined by the author of the document. So what good is the owner password? It can be used to remove the encryption from the document using the full (paid) version of Adobe Acrobat or free software such as PDF Un-Secure from FyTek.

Having just an owner password on a PDF is similar to having both an owner and user password except there is no prompt to enter a password when opening the PDF. In this case there is no need to supply users with a user password, and whatever restrictions you place on the PDF will be in effect when the PDF is opened. Well, that’s the intent anyway. While we certainly don’t endorse the practice, it is possible to find software that will remove encryption from a PDF when only an owner password is present. The reason for this is that the user password is used internally to decrypt the document. If the user password is not applied then by default it’s blank and therefore known. Keep this in mind if you have sensitive data to encrypt.
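To see why a blank user password is "known", it helps to look at how the file key is built. The Python sketch below follows the key derivation described in the PDF specification for the standard 40-bit and 128-bit (RC4) security handler, as best I understand it. The function names are my own, and the /O entry, /P value and file ID shown are placeholders rather than values from a real document.

```python
import hashlib

# Fixed 32-byte padding string published in the PDF specification.
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)

def pad_password(password: bytes) -> bytes:
    """Pad or truncate a password to exactly 32 bytes."""
    return (password + PAD)[:32]

def file_key(user_password, o_entry, p_value, file_id, key_len=16, revision=3):
    """Derive the file encryption key (key_len=5, revision=2 for 40-bit)."""
    md5 = hashlib.md5()
    md5.update(pad_password(user_password))
    md5.update(o_entry)                                    # /O from the PDF
    md5.update((p_value & 0xFFFFFFFF).to_bytes(4, "little"))  # /P from the PDF
    md5.update(file_id)                                    # first /ID element
    digest = md5.digest()
    if revision >= 3:
        for _ in range(50):  # extra hashing rounds used by 128-bit encryption
            digest = hashlib.md5(digest[:key_len]).digest()
    return digest[:key_len]

# Placeholder inputs; in a real PDF all three are stored unencrypted.
key = file_key(b"", o_entry=bytes(32), p_value=-3904, file_id=bytes(16))
print(pad_password(b"") == PAD, len(key))
```

Every input other than the user password can be read straight from the file, and an empty password pads out to the published constant. With only an owner password set, nothing secret goes into the key, so the restrictions depend entirely on the viewing software choosing to honor them.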

Summary

PDF encryption is used to limit how an end user can interact with your PDF. In addition, it provides an extra layer of security. This is not to say the security cannot be cracked but it should be sufficient for most users. Always keep the owner password available for future reference in the event you want to modify restrictions on an existing PDF. You can use Adobe Acrobat or other compatible software that is capable of removing PDF security settings. The other option is to recreate the PDF from its original source, such as a Word Document, and apply new restrictions.

admin PDF