Extract Text

This REST API provides the functionality to extract text from the document. There are several ways to extract text from a document:

Extract only text;
Extract formatted text by setting pages extraction mode option;
Extract text from specific pages by setting the pages range.

For protected documents, it is also required to provide a password. The table below contains the full list of properties that can be specified when extracting text from a document.

Name	Description	Comment
FileInfo.FilePath	The path of the document, located in the storage.	Required.
FileInfo.StorageName	Storage name	It could be omitted for default storage.
FileInfo.Password	The password to open file	It should be specified only for password-protected documents.
ContainerItemInfo.RelativePath	The relative path of the container.	Should be specified only for container files like ZIP archives, emails or PDF portfolios.
ContainerItemInfo.Password	Password for processing password-protected container items.	It should be specified only for password-protected container items.
FormattedTextOptions.Mode	The formatted text extraction mode.	Possible values are: “PlainText”, “Html”, “Markdown”.
StartPageNumber	Extraction start page.	The zero-based index. Extracts all pages if not specified.
CountPagesToExtract	The number of pages to extract.	Required if StartPageNumber is specified.

Resource URI

HTTP POST ~/text

Swagger UI lets you call this REST API directly from the browser.

Extract Text

Resource URI

Use Cases