The unstructured open source library is designed as a starting point for quick prototyping and has limits. For production scenarios, see the Unstructured API instead.

The unstructured library offers an open-source toolkit designed to simplify the ingestion and pre-processing of diverse data formats, including images and text-based documents such as PDFs, HTML files, Word documents, and more. With a focus on optimizing data workflows for Large Language Models (LLMs), unstructured provides modular functions and connectors that work seamlessly together. This cohesive system ensures efficient transformation of unstructured data into structured formats, while also offering adaptability to various platforms and use cases.
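
As a rough illustration of what turning a document into structured elements can look like, the sketch below wraps the library's partition function and tallies the resulting element categories. It assumes the unstructured package is installed with the extras needed for the file type in question. The helper names partition_file and count_categories and the path example.pdf are hypothetical and serve only as placeholders.

```python
from collections import Counter


def partition_file(path):
    """Split a document into a list of elements using unstructured.

    The import is done lazily so this module can still be loaded on
    machines where unstructured is not installed.
    """
    from unstructured.partition.auto import partition

    return partition(filename=path)


def count_categories(elements):
    """Count elements per category (for example Title or NarrativeText)."""
    return Counter(element.category for element in elements)


if __name__ == "__main__":
    # "example.pdf" is a placeholder path, not a file shipped with the library.
    elements = partition_file("example.pdf")
    for category, count in count_categories(elements).items():
        print(category, count)
```

Keeping the tally separate from the partitioning call means the counting logic can be exercised on any objects that expose a category attribute.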
To opt out of analytics collection, set the environment variable SCARF_NO_ANALYTICS=true before running any commands that call Unstructured hosted APIs.
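
When the library is driven from a Python script rather than a shell, one way to apply this setting is to put the variable into the process environment before anything else runs. The sketch below is a minimal example under that assumption. The helper name opt_out_of_analytics is hypothetical, and the assumption that the variable is read from the process environment at run time is inferred from the instruction above rather than confirmed here.

```python
import os


def opt_out_of_analytics(env=None):
    """Set SCARF_NO_ANALYTICS=true in the given environment mapping.

    Defaults to the current process environment. Child processes started
    afterwards inherit the value.
    """
    if env is None:
        env = os.environ
    env["SCARF_NO_ANALYTICS"] = "true"
    return env


# Call this before importing or invoking anything that talks to
# Unstructured hosted APIs (assumption: the value is read at run time).
opt_out_of_analytics()
```

In a shell session, exporting the same variable before launching the command has the same effect.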