public class PreflightParser extends NonSequentialPDFParser

| Modifier and Type | Field and Description |
|---|---|
| protected PreflightContext | ctx |
| static Charset | encoding: Defines a one-byte encoding for bytes that have no specific encoding in the UTF-8 charset. |
| protected DataSource | originalDocument |
| protected PreflightDocument | preflightDocument |
| protected ValidationResult | validationResult |

Fields inherited from class NonSequentialPDFParser: DEFAULT_TRAIL_BYTECOUNT, EOF_MARKER, OBJ_MARKER, securityHandler, STARTXREF_MARKER, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX

Fields inherited from class PDFParser: isFDFDocment, xrefTrailerResolver

Fields inherited from class BaseParser: DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE

| Constructor and Description |
|---|
| PreflightParser(DataSource input) |
| PreflightParser(File file) |
| PreflightParser(File file, RandomAccess rafi) |
| PreflightParser(String filename) |

| Modifier and Type | Method and Description |
|---|---|
| protected void | addValidationError(ValidationResult.ValidationError error): Adds the error to the ValidationResult. |
| protected void | addValidationErrors(List<ValidationResult.ValidationError> errors) |
| protected void | checkEndstreamKeyWord(): Checks that 'endstream' is preceded by an EOL. |
| protected void | checkPdfHeader(): Checks that the PDF header matches the rules of the PDF/A specification. |
| protected void | checkStreamKeyWord(): Checks that 'stream' is followed by a CR LF sequence or a single LF. |
| protected void | createContext(): Creates a validation context. |
| protected void | createPdfADocument(Format format, PreflightConfiguration config) |
| protected static ValidationResult | createUnknownErrorResult(): Creates a ValidationResult containing a ValidationError(UNKNOWN_ERROR). |
| PDDocument | getPDDocument(): Gets the PD document that was parsed. |
| PreflightDocument | getPreflightDocument() |
| protected void | initialParse(): The initial parse first parses only the trailer, the xrefstart and all xref tables, so that it has a pointer (offset) to each of the PDF's objects. |
| protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches for the last occurrence of pattern within the buffer. |
| protected boolean | nextIsEOL() |
| protected boolean | nextIsSpace() |
| void | parse(): Parses the stream and populates the COSDocument object. |
| void | parse(Format format): Parses the given file and checks whether it is a conforming file according to the given format. |
| void | parse(Format format, PreflightConfiguration config): Parses the given file and checks whether it is a conforming file according to the given format. |
| protected COSArray | parseCOSArray(): Parses a PDF array object. |
| protected COSName | parseCOSName(): Parses a PDF name from the stream. |
| protected COSStream | parseCOSStream(COSDictionary dic, RandomAccess file): Wraps NonSequentialPDFParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary, org.apache.pdfbox.io.RandomAccess) to check the rules on the 'stream' and 'endstream' keywords. |
| protected COSString | parseCOSString(): Checks that a hexadecimal string contains an even number of hexadecimal characters. |
| protected COSString | parseCOSString(boolean isDictionary): Deprecated. Not needed anymore; use parseCOSString() instead. PDFBOX-1437 |
| protected COSBase | parseDirObject(): Calls BaseParser.parseDirObject() and checks the limit ranges for Float and Integer values and for the number of dictionary entries. |
| protected COSBase | parseObjectDynamically(int objNr, int objGenNr, boolean requireExistingNotCompressedObj): Parses the next object from the stream and adds it to the local state. |
| protected boolean | parseXrefTable(long startByteOffset): Same as PDFParser.parseXrefTable(long), with additional checks: an EOL is mandatory after the 'xref' keyword, cross-reference subsection headers use a single white space as separator, and so on. |
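
Several rows in the method summary describe byte-level PDF/A syntax rules: the EOL rules around the 'stream' and 'endstream' keywords, even-length hexadecimal strings, single-space xref subsection headers, and a backwards pattern search. The following standalone Java sketch illustrates those rules on plain byte arrays and strings. It is not PDFBox code: the class `PreflightSyntaxSketch` and all of its methods are hypothetical. The 'stream' rule (CR LF or a single LF) and the hex-string rule (white space ignored, only 0-9/A-F/a-f allowed) follow our reading of PDF/A-1 (ISO 19005-1), and `lastIndexOf` here is a naive scan that only mirrors the documented contract (`endOff` exclusive, -1 when not found).

```java
/**
 * Hypothetical, standalone sketches of the PDF/A syntax rules named in the
 * method summary. This is NOT PDFBox code; the class and method names are
 * illustrative only and operate on plain byte arrays and strings.
 */
final class PreflightSyntaxSketch {
    private PreflightSyntaxSketch() {}

    /** 'stream' rule: the keyword must be followed by CR LF or a single LF. */
    static boolean streamKeywordOk(byte[] b, int keywordOffset) {
        int i = keywordOffset + "stream".length();
        if (i < b.length && b[i] == '\n') {
            return true; // single LF
        }
        return i + 1 < b.length && b[i] == '\r' && b[i + 1] == '\n'; // CR LF
    }

    /** 'endstream' rule: the keyword must be preceded by an EOL (CR, LF or CR LF). */
    static boolean endstreamKeywordOk(byte[] b, int keywordOffset) {
        int i = keywordOffset - 1;
        // CR LF ends in LF and a lone CR or LF is one byte, so one look-back suffices.
        return i >= 0 && i < b.length && (b[i] == '\n' || b[i] == '\r');
    }

    /** PDF white-space characters: NUL, TAB, LF, FF, CR and SPACE. */
    private static boolean isPdfWhitespace(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    /**
     * Hex-string rule: the text between the angle brackets holds only hex
     * digits (white space ignored) and an even number of them.
     */
    static boolean hexStringOk(String body) {
        int digits = 0;
        for (int k = 0; k < body.length(); k++) {
            char c = body.charAt(k);
            if (isPdfWhitespace(c)) {
                continue;
            }
            boolean hex = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
            if (!hex) {
                return false;
            }
            digits++;
        }
        return digits % 2 == 0;
    }

    private static final java.util.regex.Pattern XREF_SUBSECTION =
            java.util.regex.Pattern.compile("\\d+ \\d+");

    /** xref rule: a subsection header is two integers separated by exactly one space. */
    static boolean xrefSubsectionHeaderOk(String headerLine) {
        return XREF_SUBSECTION.matcher(headerLine).matches();
    }

    /**
     * Naive backwards search mirroring the documented lastIndexOf contract:
     * endOff is exclusive; returns the start offset of the last match or -1.
     * Assumes 0 <= endOff <= buf.length and an ASCII pattern.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        for (int start = endOff - pattern.length; start >= 0; start--) {
            int k = 0;
            while (k < pattern.length && buf[start + k] == (byte) pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return start;
            }
        }
        return -1;
    }
}
```

For example, `lastIndexOf("obj".toCharArray(), buf, 12)` on the bytes of `"1 0 obj 2 0 obj"` skips the match at offset 12 (it would end past `endOff`) and returns 4. A production parser would typically search a bounded tail buffer rather than scanning arbitrary input this way.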

Methods inherited from class NonSequentialPDFParser: decrypt, decryptDictionary, decryptString, deleteTempFile, getPage, getPageNumber, getPdfFile, getSecurityHandler, getStartxrefOffset, isLenient, parseObjectDynamically, readPattern, releasePdfSourceInputStream, setEOFLookupRange, setLenient, setPdfSource

Methods inherited from class PDFParser: clearResources, getDocument, getFDFDocument, isContinueOnError, parseHeader, parseStartXref, parseTrailer, parseXrefStream, parseXrefStream, readVersionInTrailer, setTempDirectory

Methods inherited from class BaseParser: isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces

public static final Charset encoding
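
The `encoding` field is described only as a one-byte encoding; this page does not name the charset. To illustrate why a one-byte charset is convenient when raw PDF bytes must be handled as text, the sketch below checks whether every byte value 0 to 255 survives a decode/encode round trip. ISO-8859-1 is used purely as an assumed example of such a charset, and `OneByteEncodingDemo` is a hypothetical class, not part of PDFBox.

```java
/**
 * Hypothetical demo: a one-byte charset maps every byte 0..255 to exactly one
 * char and back, which UTF-8 does not. ISO-8859-1 is an assumed example of
 * such a charset; this is not PreflightParser code.
 */
final class OneByteEncodingDemo {
    private OneByteEncodingDemo() {}

    /** Returns true if every single byte value survives a decode/encode round trip. */
    static boolean roundTripsAllBytes(java.nio.charset.Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // Malformed input (e.g. a lone 0x80 in UTF-8) is replaced with U+FFFD here.
        String decoded = new String(all, cs);
        return java.util.Arrays.equals(all, decoded.getBytes(cs));
    }
}
```

With ISO-8859-1 every byte maps to exactly one char, so the round trip succeeds; with UTF-8, the bytes 0x80 to 0xFF on their own are malformed, get replaced by U+FFFD during decoding, and re-encode to different bytes, so the round trip fails.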
protected DataSource originalDocument
protected ValidationResult validationResult
protected PreflightDocument preflightDocument
protected PreflightContext ctx

public PreflightParser(File file, RandomAccess rafi) throws IOException
Throws: IOException

public PreflightParser(File file) throws IOException
Throws: IOException

public PreflightParser(String filename) throws IOException
Throws: IOException

public PreflightParser(DataSource input) throws IOException
Throws: IOException

protected static ValidationResult createUnknownErrorResult()

protected void addValidationError(ValidationResult.ValidationError error)
Parameters: error

protected void addValidationErrors(List<ValidationResult.ValidationError> errors)

public void parse() throws IOException
Description copied from class: NonSequentialPDFParser
Overrides: parse in class NonSequentialPDFParser
Throws: IOException - If there is an error reading from the stream or corrupt data is found.

public void parse(Format format) throws IOException
Parameters: format - format that the document should follow (default Format.PDF_A1B)
Throws: IOException

public void parse(Format format, PreflightConfiguration config) throws IOException
Parameters: format - format that the document should follow (default Format.PDF_A1B); config - configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
Throws: IOException

protected void createPdfADocument(Format format, PreflightConfiguration config) throws IOException
Throws: IOException

protected void createContext()

public PDDocument getPDDocument() throws IOException
Description copied from class: NonSequentialPDFParser
Overrides: getPDDocument in class NonSequentialPDFParser
Throws: IOException - If there is an error getting the document.

public PreflightDocument getPreflightDocument() throws IOException
Throws: IOException

protected void initialParse() throws IOException
Description copied from class: NonSequentialPDFParser
Overrides: initialParse in class NonSequentialPDFParser
Throws: IOException - If something went wrong.

protected void checkPdfHeader()
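
`checkPdfHeader()` is documented only as checking the header against the PDF/A rules. A minimal standalone sketch of such a check follows, assuming the PDF/A-1 header requirements as we read them: the '%' of the file header sits at byte offset 0, and the header line is immediately followed by a comment made of '%' plus at least four bytes whose values exceed 127. The class `PdfHeaderSketch` is hypothetical, not PDFBox's implementation; it accepts any single version digit after "%PDF-1." and does not compare that version against the target format.

```java
/**
 * Hypothetical, standalone sketch of a PDF/A-style header check.
 * It is NOT PreflightParser.checkPdfHeader(); it only illustrates the rule.
 */
final class PdfHeaderSketch {
    private PdfHeaderSketch() {}

    private static final byte[] PREFIX =
            "%PDF-1.".getBytes(java.nio.charset.StandardCharsets.US_ASCII);

    /** Returns the offset just past an EOL (CR LF, CR or LF) at i, or -1 if none. */
    private static int skipEol(byte[] b, int i) {
        if (i < b.length && b[i] == '\r') {
            return (i + 1 < b.length && b[i + 1] == '\n') ? i + 2 : i + 1;
        }
        return (i < b.length && b[i] == '\n') ? i + 1 : -1;
    }

    static boolean isValidHeader(byte[] b) {
        // "%PDF-1." must start at byte offset 0 and be followed by one version digit.
        if (b.length < PREFIX.length + 1) {
            return false;
        }
        for (int i = 0; i < PREFIX.length; i++) {
            if (b[i] != PREFIX[i]) {
                return false;
            }
        }
        byte digit = b[PREFIX.length];
        if (digit < '0' || digit > '9') {
            return false;
        }
        // The header line ends with an EOL and the next line must start with '%'
        // followed by at least four more bytes.
        int pos = skipEol(b, PREFIX.length + 1);
        if (pos < 0 || pos + 5 > b.length || b[pos] != '%') {
            return false;
        }
        // Those four bytes must all have values greater than 127.
        for (int k = pos + 1; k <= pos + 4; k++) {
            if ((b[k] & 0xFF) <= 127) {
                return false;
            }
        }
        return true;
    }
}
```

The binary comment exists so that file-transfer tools treat the document as binary rather than text; checking exactly the first four bytes after '%' is a simplifying choice in this sketch.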

protected boolean parseXrefTable(long startByteOffset) throws IOException
Overrides: parseXrefTable in class PDFParser
Parameters: startByteOffset - the offset to start at
Throws: IOException - If an IO error occurs.

protected COSStream parseCOSStream(COSDictionary dic, RandomAccess file) throws IOException
Wraps NonSequentialPDFParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary, org.apache.pdfbox.io.RandomAccess) to check the rules on the 'stream' and 'endstream' keywords (see checkStreamKeyWord() and checkEndstreamKeyWord()).
Overrides: parseCOSStream in class NonSequentialPDFParser
Parameters: dic - dictionary that goes with this stream; file - file to write the stream to when reading.
Throws: IOException - if an error occurred reading the stream, such as problems reading the length attribute, the stream not ending with 'endstream' after the data is read, or the stream being too short.

protected void checkStreamKeyWord() throws IOException
Throws: IOException

protected void checkEndstreamKeyWord() throws IOException
Throws: IOException

protected boolean nextIsEOL() throws IOException
Throws: IOException

protected boolean nextIsSpace() throws IOException
Throws: IOException

protected COSArray parseCOSArray() throws IOException
Description copied from class: BaseParser
Overrides: parseCOSArray in class BaseParser
Throws: IOException - If there is an error parsing the stream.

protected COSName parseCOSName() throws IOException
Description copied from class: BaseParser
Overrides: parseCOSName in class BaseParser
Throws: IOException - If there is an error reading from the stream.

@Deprecated protected COSString parseCOSString(boolean isDictionary) throws IOException
Deprecated. Not needed anymore; use parseCOSString() instead. PDFBOX-1437
See also: BaseParser.parseCOSString()
Overrides: parseCOSString in class BaseParser
Parameters: isDictionary - indicates whether the stream is a dictionary
Throws: IOException - If there is an error reading from the stream.

protected COSString parseCOSString() throws IOException
See also: BaseParser.parseCOSString()
Overrides: parseCOSString in class BaseParser
Throws: IOException - If there is an error reading from the stream.

protected COSBase parseDirObject() throws IOException
Calls BaseParser.parseDirObject() and checks the limit ranges for Float and Integer values and for the number of dictionary entries.
Overrides: parseDirObject in class BaseParser
Throws: IOException - If there is an error during parsing.

protected COSBase parseObjectDynamically(int objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
Description copied from class: NonSequentialPDFParser
Adapted from PDFParser and reduced to parsing an indirect object.
Overrides: parseObjectDynamically in class NonSequentialPDFParser
Parameters: objNr - object number of the object to be parsed; objGenNr - generation number of the object to be parsed; requireExistingNotCompressedObj - if true, the object to be parsed must be defined in the xref (comment: null objects may be missing from the xref) and it must not be a compressed object within an object stream (this is used to avoid getting stuck in a loop in a malicious PDF)
Throws: IOException - If an IO error occurs.

protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: NonSequentialPDFParser
Overrides: lastIndexOf in class NonSequentialPDFParser
Parameters: pattern - pattern to search for; buf - buffer to search the pattern in; endOff - offset (exclusive) at which the lookup starts
Returns: the start offset of the last occurrence of pattern within the buffer, or -1 if pattern could not be found

Copyright © 2002–2019 The Apache Software Foundation. All rights reserved.