A PDF file is made up of a number of objects. In general, the file structure has four parts:
- The header states the PDF specification that this file follows
- The body contains all the objects that make up the document
- The cross-reference table lists the locations of the indirect objects in the file
- The trailer specifies the location of the cross-reference table and other special objects
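As a rough illustration of how these four parts fit together, the Python sketch below locates them in a PDF held in memory. A reader typically starts at the end of the file, reads the offset that follows `startxref`, and jumps to the cross-reference table from there. The `locate_parts` helper and the miniature sample file are hypothetical and exist only for this illustration; the sample is not a displayable document.

```python
def locate_parts(data: bytes) -> dict:
    """Return byte positions of the main parts of a simple, single-revision PDF."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing PDF header")
    header_end = data.index(b"\n")
    version = data[5:header_end].decode("ascii").strip()

    # Work backwards from the end of the file to find the startxref keyword.
    startxref_pos = data.rindex(b"startxref")
    xref_offset = int(data[startxref_pos + len(b"startxref"):].split()[0])
    trailer_pos = data.rindex(b"trailer")

    return {
        "version": version,
        "body": (header_end + 1, xref_offset),  # objects live between header and xref
        "xref": xref_offset,
        "trailer": trailer_pos,
    }


# Hypothetical miniature file: a header, one object, an xref table and a trailer.
body = b"%PDF-1.0\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)
obj1_offset = body.index(b"1 0 obj")
xref = (
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"
    + b"%010d 00000 n \n" % obj1_offset
)
trailer = b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_offset
sample = body + xref + trailer

parts = locate_parts(sample)
print(parts)
```

The sketch assumes a file with a single cross-reference table; files that have been updated incrementally contain several, chained together through the trailer.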
Below is a simple example PDF I created with Notepad. It displays a “Hello, World!” message centered at the top of the page. I will show the code and explain each section one by one.
The header section contains the version of the PDF specification that my file conforms to; in my example I use version 1.0. Next is the first object, which is the catalog object. If you think of the document as a tree data structure, the catalog is the root, and all other objects build onto this node.
Our second object is the “Pages” object, which contains references to the individual pages. Be careful not to confuse the “Pages” object with the “Page” object. As you can see above, the type of this object is “Pages”, and we introduce a new entry called “Count”. Count is the number of pages that this object points to; in this simple example we have only one page. On line 15 we specify the required entry “Kids”, which points to the individual page object. The next object is our “Page” object, and it is object number 3.
Similar to the “Pages” object, the “Page” object also has to declare its type, on line 21. On line 22, instead of “Kids”, you must list the parent of the object, which in this case is object 2 (Pages). On line 23 I list the resources this page uses; the only one needed here is a font. Here I only declare a name for the font and give a reference to a font object that fully declares the font type and base font; the font size is set later, in the content stream. In my example, the “Font” object is number 5. On line 25 I use the entry “MediaBox”, which is a required entry for a page object and defines the boundaries of the page. The last entry I use for the Page object is “Contents”, which specifies a reference to an object that contains the text we wish to display. In my example this is object 4, which is a stream object.
First, on line 31, we must include the length: the number of bytes from just after the “stream” keyword to just before “endstream”. If we count the bytes, we get 45. Next are the keywords that delimit the stream: line 32 starts the stream and line 37 ends it. Lines 33 and 36 open and close the text object in the same way: “BT” stands for begin text, and “ET” stands for end text. Line 34 selects our font, which we declared as “F1”, and sets the font size to 24. Something to note is how operators and their operands are written. The operands come first and are pushed on the stack; the operator follows and pops them off. This is what happens on line 34: the font name “F1” and the size 24 are pushed on the stack, then the operator “Tf” is executed and pops the two operands off the stack. On line 35, 250 and 700 are the horizontal and vertical distances from the bottom-left corner of the page, which is the origin of the default coordinate system. This coordinate is where the text “Hello, World!” will be displayed. Additionally, if the stream data were not plain text (for example, if it were compressed), we could add the optional “Filter” entry and its decode parameters to the stream dictionary.
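To make the length calculation and the operands-before-operator order concrete, here is a small Python sketch. The stream text is a reconstruction from the description above, assuming LF line endings and assuming the position and text operators (`Td` and `Tj`) share one line; a file saved with different line endings or spacing would give a different length. The `run` helper is a hypothetical toy evaluator, not how a real PDF reader is implemented.

```python
import re

# Assumed content of the stream, between the "stream" and "endstream" keywords.
content = (
    b"BT\n"
    b"/F1 24 Tf\n"
    b"250 700 Td (Hello, World!) Tj\n"
    b"ET"
)
print(len(content))  # value for the /Length entry; 45 for exactly these bytes

# Operators used in this stream and how many operands each one consumes.
OPERANDS = {"BT": 0, "ET": 0, "Tf": 2, "Td": 2, "Tj": 1}


def run(stream: bytes) -> list:
    """Toy postfix evaluation: operands pile up until an operator consumes them."""
    # A token is either a (literal string) or a run of non-space characters.
    tokens = re.findall(r"\([^()]*\)|\S+", stream.decode("ascii"))
    stack, calls = [], []
    for tok in tokens:
        if tok in OPERANDS:
            n = OPERANDS[tok]
            args = stack[len(stack) - n:]   # the operator takes the top n operands
            del stack[len(stack) - n:]
            calls.append((tok, args))
        else:
            stack.append(tok)               # operand: push and wait for an operator
    return calls


for op, args in run(content):
    print(op, args)
```

The second call printed is `Tf ['/F1', '24']`, which mirrors the description of line 34: both operands are on the stack before the operator runs.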
Object number 5 is the font object that was referenced earlier. Here we fully declare the font. We must first declare the type, as with the catalog and page objects; the type for this object is “Font”. Line 43 is the “Subtype” entry, which is required in the font dictionary. There are seven subtypes to choose from, and the possible values can be found in the Portable Document Format Specification. For our example I use “Type1”. The entry “BaseFont” on line 44 simply gives the name of the font we use, which is Helvetica.
The last sections needed to close a PDF document are the cross-reference table (xref) and the trailer.
Line 49 specifies the number of entries in the cross-reference table. The count is 6: our five objects plus the reserved entry for object 0. The same number appears inside the trailer as the “Size” entry. On line 54 we also have a reference to the catalog object (object 1), which is the root node. We end with the “startxref” keyword, which introduces the byte offset of the cross-reference table, and the end-of-file marker on line 58. This completes the PDF, which can now be opened in a PDF reader. Only the bare essentials are included in my example; normally the cross-reference table would also list the byte offset of each object in the document.
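The example leaves the object offsets out of the cross-reference table. As a sketch of how they could be filled in, the Python below assembles a file with the same five objects and records the byte offset of each one as it is written. The object bodies are reconstructed from the description above, the MediaBox values (612 by 792 points) are an assumption, and `build_pdf` is a hypothetical helper name.

```python
def build_pdf() -> bytes:
    """Assemble the five-object example and fill in real xref offsets."""
    stream = b"BT\n/F1 24 Tf\n250 700 Td (Hello, World!) Tj\nET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Count 1 /Kids [3 0 R] >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 5 0 R >> >> "
        b"/MediaBox [0 0 612 792] /Contents 4 0 R >>",  # page size is assumed
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(stream), stream),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.0\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))           # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    xref_offset = len(out)
    size = len(objects) + 1                # +1 for the reserved object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"        # object 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off   # each entry is exactly 20 bytes long
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf = build_pdf()
print(pdf.decode("ascii"))
```

If the result is saved to disk, it should be written in binary mode, since any newline translation would shift the bytes and invalidate the recorded offsets.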
“PDF Reference and Adobe Extensions to the PDF Specifications”, available at http://www.adobe.com/devnet/pdf_reference.html