PDF 2.0, ISO 32000-2 (2017, 2020)
PDF 2.0 is an international standard (ISO 32000-2:2017), published as a successor format to PDF 1.7 (ISO 32000-1:2008). A dated revision of the same format specification was published in December 2020. The PDF family of formats is designed for representing electronic documents, and is intended to enable users to exchange and view documents independent of the environments in which they were created or in which they are viewed or printed. A PDF file typically represents a formatted, page-oriented document. Such documents may be heavily structured or simple. They may contain text, images, graphics, and rich media content, such as video, audio, and interactive 3D models. There is support for annotations, metadata, hypertext links, and bookmarks.
In both published versions of ISO 32000-2, the Introduction states, "PDF, together with software for creating, viewing, printing and processing PDF files in a variety of ways, fulfills a set of requirements for electronic documents including:
- preservation of document fidelity independent of the device, platform, and software,
- merging of content from diverse sources — Web sites, word processing and spreadsheet programs, scanned documents, photos, and graphics — into one self-contained document while maintaining the integrity of all original source documents,
- an extensible metadata model at the document and object level,
- collaborative editing of documents from multiple locations or platforms,
- digital signatures to certify authenticity,
- security and permissions to allow the creator to retain control of the document and associated rights,
- accessibility of content to those with disabilities,
- extraction and reuse of content for use with other file formats and applications, and
- electronic forms to gather and/or represent data within business systems."
Features added in PDF 2.0 include:
- Wider support for UTF-8 encoding in embedded metadata and textual annotations. In ISO 32000-1, the text string type allowed character strings to be encoded in PDFDocEncoding or the UTF-16BE (big-endian) Unicode character encoding scheme. In ISO 32000-2, UTF-8 encoding is also permitted in this context. PDFDocEncoding can encode the entire ISO Latin 1 character set and is documented in Annex D, “Character Sets and Encodings”. Note: In PDF documents, display of textual content is usually based on a different encoding mechanism, via fonts. For an explanation of this mechanism, see a useful response to the question “Cannot copy non-Latin characters from PDF document” on Stack Exchange.
- Support for 3D and RichMedia “annotations.” Support for 3D annotations conforming to U3D or PRC standards. Support for 3D annotations in U3D was added in PDF 1.7, ExtensionLevel 3 and Acrobat 9 by Adobe in June 2008. Use of RichMedia annotations is recommended in place of the previous separate Movie and Sound annotations (which are deprecated in PDF 2.0).
- Geospatial features. 2D and 3D geospatial data can be added, relating page contents to geographic regions using one of several common geospatial models of the earth. For more detail, see PDF, Geospatial Encoding. Geospatial features were introduced in PDF 1.7, ExtensionLevel 3 and Acrobat 9 by Adobe in June 2008.
- Improved support for digital signatures, based on the PAdES standard (ETSI EN 319 142-1 PAdES digital signatures; Part 1: Building blocks and PAdES baseline signatures). New structures are defined to provide support for long-term validation of signatures. See Establish long-term signature validation in Adobe Acrobat User Guide.
- Improvements for representing document structure in a “Tagged PDF.” New tags for content include Aside for sidebars, callouts, etc. New subtypes for the artifacts of type Pagination include PageNum, LineNum, and Bates, in order to support indexing and direct referencing of pages as required for formal documents in specialized domains. Bates Numbering is a method of indexing legal documents for easy identification and retrieval. As well as introducing new tag types and subtypes, PDF 2.0 introduced the concept of namespaces for customized tagging and to distinguish between structure tags defined in PDF 1.7 and those defined in PDF 2.0. See Notes below for more on Tagged PDF and the changes introduced in PDF 2.0.
- Features introduced for support of the printing and graphic arts industries. One such feature is support for marking individual graphic objects for black point compensation, rather than assuming that the choice to apply black point compensation at print time is made for the entire file or not at all. Improvements for handling halftones and spot colors are introduced. Another addition is support for output intents for individual pages. An output intent describes characteristics of the final destination device to be used to reproduce the color in the PDF. Starting with PDF 1.4, output intents could be specified, but only for the entire document. Output intents can be embedded or referred to as external resources. See the 2017 white paper The Impact of PDF 2.0 on Print Production by Martin Bailey of Global Graphics Software, which specifically addresses issues relevant for printing.
- Features first introduced as part of “subset standards” for PDF: Document Parts and Associated Files. Document parts were introduced in PDF/VT ISO 16612-2:2010, Graphic technology – Variable data exchange – Part 2: Using PDF/X-4 and PDF/X-5 (PDF/VT-1 and PDF/VT-2). A primary use of document parts is to facilitate workflows that process large documents section by section. The associated files feature was introduced in PDF/A-3 (ISO 19005-3: 2012). These file attachments can be associated with the whole document, a page, or some other part of the document. The relationship between the associated file and the corresponding part of the document can be specified. The PDF constructs for associated files and attachment relationships are included in PDF 2.0. See PDF 2.0 Application Note 002: Associated Files from the PDF Association.
- Custom security handlers are supported in PDF 2.0 by permitting an encrypted file that uses an encryption mechanism not part of the PDF 2.0 standard to be embedded in an unencrypted wrapper document. The wrapper provides guidance associated with the security handler needed to decrypt the embedded encrypted PDF document (encrypted payload). This mechanism requires use of the Collection dictionary and Associated Files to identify the encrypted payload in a way that allows PDF processors that already have the necessary security handler to immediately present the encrypted payload. PDF processors without the custom security handler will present the unencrypted wrapper document with instructions to the user. For more on security handling in PDF files, see Security from the User Guide for the PDFTron Software Development Kit.
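The text string rule described in the UTF-8 item above can be illustrated with a short Python sketch. It assumes that the encoding of a text string is signalled by a leading byte order mark (0xFE 0xFF for UTF-16BE, 0xEF 0xBB 0xBF for UTF-8) and that anything else is PDFDocEncoding. The function name decode_pdf_text_string is hypothetical, and the Latin-1 fallback is only an approximation: PDFDocEncoding differs from Latin-1 for some byte values, so a real processor would use the mapping table in Annex D.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the raw bytes of a PDF text string based on its leading marker."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # 0xFE 0xFF: UTF-16BE, permitted in both ISO 32000-1 and ISO 32000-2.
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if raw.startswith(codecs.BOM_UTF8):
        # 0xEF 0xBB 0xBF: UTF-8, permitted only from PDF 2.0 onward.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # Assumption: Latin-1 stands in for PDFDocEncoding here. The two agree on
    # printable ASCII but not on every byte value (for example, parts of the
    # 0x80 to 0x9F range), so this branch is illustrative only.
    return raw.decode("latin-1")
```

A reader written only for PDF 1.7 would not recognize the UTF-8 marker and would treat those bytes as PDFDocEncoding, which is one reason a writer might restrict UTF-8 text strings to files that declare themselves as PDF 2.0.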
Features from earlier PDF versions dropped or deprecated in PDF 2.0:
The definition of “deprecated” in the specification says that features marked as deprecated in this part of ISO 32000 should not be written into a PDF 2.0 document, and should be ignored by a reader. However, a note associated with the definition states that some “variations on these restrictions on continued use of a deprecated feature are explicitly stated in this document.” A second note states, “Implementers are cautioned that some features that are deprecated in this part of ISO 32000 could have tighter constraints placed on them, or even be removed completely, in a later version of ISO 32000.”
- XFA forms, introduced in PDF 1.5. XFA (XML Forms Architecture) is a family of formats for XML-based forms that is proprietary to Adobe. The PDF 2.0 specification appears to limit the use of XFA. Comments welcome.
- PDF 2.0 does not support the use of Flash/Shockwave, a format proprietary to Adobe, introduced as a RichMedia annotation type in PDF 1.7 extension level 3 and Acrobat 9.
- Use of PostScript XObjects, used for fragments of code expressed in the PostScript page description language, was discouraged in ISO 32000-1. In the PDF 2.0 specification, this type of XObject has been completely removed.
- For PDF 2.0, entries in the document information dictionary other than the CreationDate and ModDate entries are deprecated. Since PDF 1.1, this structure in the trailer of a PDF file has held optional metadata entries such as Title, Author, Subject, and Keywords. This structure played an important role even after the introduction of the more general XMP metadata framework in PDF 1.4. Experience with the PDF/A standards has highlighted the challenges associated with avoiding conflicting metadata entries in the two structures.
- Open Prepress Interface (OPI), a collection of PostScript conventions that supports page design with low-resolution images as placeholders for high-resolution images to be substituted before printing. The OPI specification was originally developed in the 1980s by Aldus for use with PostScript in the PageMaker application. Its use is deprecated in PDF 2.0.
- Weak encryption schemes and algorithms are deprecated. Only AES-256 encryption used in a secure encryption scheme is encouraged in PDF 2.0. For digital signatures, some older methods are deprecated in favor of the modern PAdES (PDF Advanced Electronic Signatures) standard.
- Movie and Sound annotations are deprecated, with the recommendation to use the more powerful RichMedia annotation introduced in Acrobat 9.
- Several PDF syntax features without any use in practice have been deprecated.
See PDF 2.0 (ISO 32000-2): Deprecated Features from PDFlib and The Latest in PDF 2.0 Test from QualityLogic for more details on dropped and deprecated features.
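As an illustration of the document information dictionary item above, the following Python sketch flags entries that PDF 2.0 deprecates and reports values that disagree with XMP metadata. It assumes that both structures have already been extracted into plain dictionaries keyed by the same names; the function names and that input shape are hypothetical and do not correspond to any particular PDF library.

```python
from typing import List, Mapping

# Per the text above, these are the only entries not deprecated in PDF 2.0.
RETAINED_INFO_KEYS = frozenset({"CreationDate", "ModDate"})


def deprecated_info_entries(info: Mapping[str, str]) -> List[str]:
    """Return, sorted, the info dictionary keys that PDF 2.0 deprecates."""
    return sorted(key for key in info if key not in RETAINED_INFO_KEYS)


def conflicting_entries(info: Mapping[str, str], xmp: Mapping[str, str]) -> List[str]:
    """Return, sorted, the keys present in both structures with different values."""
    shared = info.keys() & xmp.keys()
    return sorted(key for key in shared if info[key] != xmp[key])
```

For example, given an info dictionary with Title, Author, and CreationDate entries, deprecated_info_entries reports Author and Title, and conflicting_entries reports Title if the XMP packet carries a different title. A real comparison would also need to map each info dictionary key to its corresponding XMP property and normalize date formats, which this sketch leaves out.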
Changes in the 2020 dated revision of the PDF 2.0 specification:
The 2020 dated revision of PDF 2.0 has only one area with a change that affects parsing and rendering of PDF documents, related to support for new Unicode character collections, a feature of Unicode that is of particular interest in CJK scripts. For example, in Japan, new characters are introduced for the new era associated with a new emperor; see New Japanese Era (September 6, 2018) from the Unicode blog. In addition to clarifications and corrections throughout, two new annexes were introduced. Annex E is a new normative annex titled “Extending PDF.” Annex Q is a new normative annex titled “Method for determining transparency on a page.” Annex M, an informative annex, is a replacement with a new title, “Differences between the standard structure namespaces.”
…
Read the complete version with developer and reference links in PDF 2.0, ISO 32000-2 (2017, 2020)