Basic Usage
For a complete set of extras catering to every document type, use:
pip install "unstructured[all-docs]"
unstructured also requires the following system dependencies: libmagic, poppler, libreoffice, pandoc, and tesseract. Installation instructions for these dependencies vary by operating system. We recommend running unstructured from the officially supported Docker image, which already has these dependencies installed.

Installation for Specific Document Types
If you’re processing document types beyond the basics, you can install the necessary extras:
pip install "unstructured[docx,pptx]"

Available document types:
"csv", "doc", "docx", "epub", "image", "md", "msg", "odt", "org", "pdf", "ppt", "pptx", "rtf", "rst", "tsv", "xlsx"

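As a rough illustration of how the extras list can be used, the sketch below derives the extras a set of files needs from their extensions and builds the matching pip command. The helper names `extras_for` and `pip_command` are hypothetical and not part of unstructured. The sketch assumes an extra applies when its name equals the file extension, so it will not map image files such as .png to the "image" extra, and it assumes the bare package name is the right fallback when no extra matches.

```python
from pathlib import Path

# Extras named in the "Available document types" list.
DOCUMENT_EXTRAS = {
    "csv", "doc", "docx", "epub", "image", "md", "msg", "odt",
    "org", "pdf", "ppt", "pptx", "rtf", "rst", "tsv", "xlsx",
}


def extras_for(paths):
    """Return the sorted extras whose names match the files' extensions.

    Assumption: extension == extra name. Files with other extensions
    (for example .txt or .png) are simply ignored here.
    """
    found = set()
    for p in paths:
        ext = Path(p).suffix.lower().lstrip(".")
        if ext in DOCUMENT_EXTRAS:
            found.add(ext)
    return sorted(found)


def pip_command(extras):
    """Build a pip command for the given extras (hypothetical helper)."""
    if not extras:
        # Assumed fallback: the base package without extras.
        return "pip install unstructured"
    return 'pip install "unstructured[' + ",".join(extras) + ']"'


print(pip_command(extras_for(["report.docx", "slides.PPTX", "notes.txt"])))
# prints: pip install "unstructured[docx,pptx]"
```

Sorting the extras keeps the generated command stable across runs, which is convenient if the command ends up in a setup script or a Dockerfile.
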
Installation for Specific Data Connectors
To use any of the data connectors, you must install the specific dependency:
pip install "unstructured-ingest[s3]"

Available data connectors:
"airtable", "azure", "azure-ai-search", "biomed", "box", "confluence", "couchbase", "delta-table", "discord", "dropbox", "elasticsearch", "gcs", "github", "gitlab", "google-drive", "jira", "mongodb", "notion", "opensearch", "onedrive", "outlook", "reddit", "s3", "sharepoint", "salesforce", "slack", "wikipedia"