Extract Text

This REST API extracts text from a document. There are several ways to do this:

  • Extract only text;
  • Extract formatted text by setting the formatted text extraction mode option;
  • Extract text from specific pages by setting the page range.

For password-protected documents, a password must also be provided. The table below contains the full list of properties that can be specified when extracting text from a document.

| Name | Description | Comment |
|---|---|---|
| FileInfo.FilePath | The path of the document, located in the storage. | Required. |
| FileInfo.StorageName | Storage name. | Can be omitted for the default storage. |
| FileInfo.Password | The password to open the file. | Should be specified only for password-protected documents. |
| ContainerItemInfo.RelativePath | The relative path of the container. | Should be specified only for container files like ZIP archives, emails or PDF portfolios. |
| ContainerItemInfo.Password | Password for processing password-protected container items. | Should be specified only for password-protected container items. |
| FormattedTextOptions.Mode | The formatted text extraction mode. | Possible values are: “PlainText”, “Html”, “Markdown”. |
| StartPageNumber | Extraction start page. | The zero-based index. Extracts all pages if not specified. |
| CountPagesToExtract | The number of pages to extract. | Required if StartPageNumber is specified. |
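As an illustration of how these properties combine, the following Python sketch assembles a request body. It assumes the dotted names in the table map to nested JSON objects (for example, `FileInfo.FilePath` becomes `{"FileInfo": {"FilePath": ...}}`); that nesting is inferred from the names and is not confirmed here. The helper `build_text_options` and the sample path are hypothetical.

```python
import json
from typing import Optional


def build_text_options(
    file_path: str,
    storage_name: Optional[str] = None,
    password: Optional[str] = None,
    container_path: Optional[str] = None,
    container_password: Optional[str] = None,
    mode: Optional[str] = None,
    start_page: Optional[int] = None,
    page_count: Optional[int] = None,
) -> dict:
    """Hypothetical helper: build the options object from the table's properties.

    Assumption: dotted property names correspond to nested JSON objects.
    """
    # The table marks CountPagesToExtract as required once StartPageNumber is set.
    if start_page is not None and page_count is None:
        raise ValueError("CountPagesToExtract is required if StartPageNumber is specified")

    # FileInfo.FilePath is the only property marked as required.
    file_info = {"FilePath": file_path}
    if storage_name is not None:
        file_info["StorageName"] = storage_name
    if password is not None:
        file_info["Password"] = password
    options: dict = {"FileInfo": file_info}

    # Container properties apply only to ZIP archives, emails or PDF portfolios.
    container = {}
    if container_path is not None:
        container["RelativePath"] = container_path
    if container_password is not None:
        container["Password"] = container_password
    if container:
        options["ContainerItemInfo"] = container

    # Mode is one of the values listed in the table.
    if mode is not None:
        if mode not in ("PlainText", "Html", "Markdown"):
            raise ValueError(f"Unsupported mode: {mode}")
        options["FormattedTextOptions"] = {"Mode": mode}

    if start_page is not None:
        options["StartPageNumber"] = start_page
        options["CountPagesToExtract"] = page_count
    return options


if __name__ == "__main__":
    # Extract the first two pages of a (hypothetical) document as Markdown.
    body = build_text_options("docs/sample.pdf", mode="Markdown", start_page=0, page_count=2)
    print(json.dumps(body, indent=2))
```

Omitting every optional argument yields a body with only `FileInfo.FilePath`, which corresponds to the "extract only text" case.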

Resource URI

HTTP POST ~/text

Swagger UI lets you call this REST API directly from the browser. 
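Outside the browser, the same call can be prepared from code. The sketch below builds, but does not send, a POST request to the `text` resource using only the Python standard library. Several details are assumptions rather than facts from this page: `~` is taken to stand for the service base URL, `BASE_URL` is a placeholder, authorization is assumed to use a bearer token, and the JSON body is assumed to nest the dotted property names.

```python
import json
import urllib.request

# Placeholder; substitute the real base URL that "~" stands for.
BASE_URL = "https://api.example.com/parser"


def make_extract_text_request(base_url: str, token: str, options: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare a POST request to <base_url>/text."""
    payload = json.dumps(options).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/text",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumption: the service accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Assumed body shape: only the required FileInfo.FilePath property is set.
    request = make_extract_text_request(
        BASE_URL, "YOUR_TOKEN", {"FileInfo": {"FilePath": "docs/sample.pdf"}}
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers and body before any network call is made.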

Use Cases