AWS: https://<workspace-id>.cloud.databricks.com
Azure: https://adb-<workspace-id>.<random-number>.azuredatabricks.net
GCP: https://<workspace-id>.<random-number>.gcp.databricks.com
Do not add a trailing slash (/) to the workspace URL.
USE CATALOG on the volume’s parent catalog in Unity Catalog.
USE SCHEMA on the volume’s parent schema (formerly known as a database) in Unity Catalog.
READ VOLUME and WRITE VOLUME on the volume.
DATABRICKS_HOST
- The Databricks host URL, represented by --host (CLI) or host (Python). Do not add a trailing slash (/) to the host URL.
DATABRICKS_CATALOG
- The Databricks catalog name for the volume, represented by --catalog (CLI) or catalog (Python).
DATABRICKS_SCHEMA
- The Databricks schema name for the volume, represented by --schema (CLI) or schema (Python). If not specified, the schema named default is used.
DATABRICKS_VOLUME
- The Databricks volume name, represented by --volume (CLI) or volume (Python).
DATABRICKS_VOLUME_PATH
- An optional path to access within the volume, specified by --volume-path (CLI) or volume_path (Python).
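As a rough illustration of how the connection settings above fit together, the following sketch reads them from environment variables, strips a trailing slash from the host URL, and falls back to the default schema. The helper names volume_settings and full_volume_path are hypothetical and are not part of any Unstructured or Databricks library. The /Volumes/<catalog>/<schema>/<volume> path form is the one Unity Catalog uses for volumes and is shown here only for orientation.

```python
import os


def volume_settings(env=None):
    """Collect the volume connection settings (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # The host URL must not end with a trailing slash.
        "host": env["DATABRICKS_HOST"].rstrip("/"),
        "catalog": env["DATABRICKS_CATALOG"],
        # If no schema is specified, the schema named "default" is used.
        "schema": env.get("DATABRICKS_SCHEMA") or "default",
        "volume": env["DATABRICKS_VOLUME"],
        # The path within the volume is optional.
        "volume_path": env.get("DATABRICKS_VOLUME_PATH", "").strip("/") or None,
    }


def full_volume_path(settings):
    """Build the Unity Catalog path for the volume (assumed /Volumes layout)."""
    parts = ["/Volumes", settings["catalog"], settings["schema"], settings["volume"]]
    if settings["volume_path"]:
        parts.append(settings["volume_path"])
    return "/".join(parts)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_env = {
        "DATABRICKS_HOST": "https://example.cloud.databricks.com/",
        "DATABRICKS_CATALOG": "main",
        "DATABRICKS_VOLUME": "docs",
        "DATABRICKS_VOLUME_PATH": "incoming/pdfs",
    }
    settings = volume_settings(example_env)
    print(settings["host"])
    print(full_volume_path(settings))
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to try without exporting any variables.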
DATABRICKS_TOKEN
- The personal access token, represented by --token (CLI) or token (Python).
DATABRICKS_USERNAME
- The user’s name, represented by --username (CLI) or username (Python).
DATABRICKS_PASSWORD
- The user’s password, represented by --password (CLI) or password (Python).
DATABRICKS_CLIENT_ID
- The client ID value for the corresponding service principal, represented by --client-id (CLI) or client_id (Python).
DATABRICKS_CLIENT_SECRET
- The OAuth secret value for the corresponding service principal, represented by --client-secret (CLI) or client_secret (Python).
ARM_CLIENT_ID
- The client ID value for the corresponding managed identity, represented by --azure-client-id (CLI) or azure_client_id (Python).
DATABRICKS_AZURE_RESOURCE_ID
- The Azure resource ID of the Azure Databricks workspace, represented by --azure-workspace-resource-id (CLI) or azure_workspace_resource_id (Python).
ARM_TENANT_ID
- The tenant ID value for the corresponding service principal, represented by --azure-tenant-id (CLI) or azure_tenant_id (Python).
ARM_CLIENT_ID
- The client ID value for the corresponding service principal, represented by --azure-client-id (CLI) or azure_client_id (Python).
ARM_CLIENT_SECRET
- The client secret value for the corresponding service principal, represented by --azure-client-secret (CLI) or azure_client_secret (Python).
DATABRICKS_AZURE_RESOURCE_ID
- The Azure resource ID of the Azure Databricks workspace, represented by --azure-workspace-resource-id (CLI) or azure_workspace_resource_id (Python).
DATABRICKS_TOKEN
- The Entra ID token for the corresponding Entra ID user, represented by --token (CLI) or token (Python).
GOOGLE_CREDENTIALS
- The local path to the corresponding Google Cloud service account’s credentials file, represented by --google-credentials (CLI) or google_credentials (Python).
GOOGLE_SERVICE_ACCOUNT
- The Google Cloud service account’s email address, represented by --google-service-account (CLI) or google_service_account (Python).
DATABRICKS_PROFILE
- The name of the Databricks configuration profile, represented by --profile (CLI) or profile (Python).
Use the --partition-by-api option (CLI) or partition_by_api (Python) parameter to specify where files are processed:
For local file processing, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).
Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
To send files to the Unstructured API for processing, specify --partition-by-api (CLI) or partition_by_api=True (Python).
Processing through the API also requires an Unstructured API key and API URL, which you supply by adding the following:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
https://api.unstructuredapp.io/general/v0/general is the API URL for the Unstructured Partition Endpoint.
If you do not have an API key, get one now.
If the Unstructured API is self-hosted, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
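The choice between local and API processing described above can be sketched as a small helper that assembles the relevant Python parameters. This is only an illustration: partitioner_kwargs is a hypothetical function, not part of the Unstructured Ingest library, and the fallback to the Partition Endpoint URL when UNSTRUCTURED_API_URL is unset is an assumption made for this sketch. The dictionary keys mirror the parameter names given in the text (partition_by_api, api_key, partition_endpoint).

```python
import os

# API URL for the Unstructured Partition Endpoint, as given in the text.
PARTITION_ENDPOINT_URL = "https://api.unstructuredapp.io/general/v0/general"


def partitioner_kwargs(use_api, env=None):
    """Assemble partitioning parameters (hypothetical helper)."""
    env = os.environ if env is None else env
    if not use_api:
        # Local processing needs neither an API key nor an API URL.
        return {"partition_by_api": False}
    api_key = env.get("UNSTRUCTURED_API_KEY")
    if not api_key:
        raise ValueError("UNSTRUCTURED_API_KEY must be set when partition_by_api=True")
    return {
        "partition_by_api": True,
        "api_key": api_key,
        # Assumption: fall back to the Partition Endpoint URL if none is set.
        "partition_endpoint": env.get("UNSTRUCTURED_API_URL", PARTITION_ENDPOINT_URL),
    }


if __name__ == "__main__":
    print(partitioner_kwargs(False, {}))
    print(partitioner_kwargs(True, {"UNSTRUCTURED_API_KEY": "placeholder-key"}))
```

Failing early on a missing key keeps the error close to its cause instead of surfacing later as a rejected request.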