forked from Unstructured-IO/unstructured
feature CORE-3985: add Clarifai destination connector (Unstructured-IO#2633)
Thanks to @mogith-pn from Clarifai, we have a new destination connector! This PR adds Clarifai as an ingest destination connector, including access via the CLI and programmatically, documentation and examples, and an integration test script.
1 parent
fcc8b73
commit e6321d4
Showing 20 changed files with 793 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
Clarifai | ||
=========== | ||
|
||
Batch process all your records using ``unstructured-ingest`` to store structured outputs locally on your filesystem and upload them to Clarifai apps. | ||
|
||
First, install the Clarifai dependencies: | ||
|
||
.. code:: shell | ||
pip install "unstructured[clarifai]" | ||
Create a Clarifai app with a base workflow. For more information, see `create clarifai app <https://docs.clarifai.com/clarifai-basics/applications/create-an-application/>`_. | ||
|
||
Run Locally | ||
----------- | ||
Any supported upstream connector can be used; for convenience, the sample command here uses the local connector. | ||
|
||
.. tabs:: | ||
|
||
.. tab:: Shell | ||
|
||
.. literalinclude:: ./code/bash/clarifai.sh | ||
:language: bash | ||
|
||
.. tab:: Python | ||
|
||
.. literalinclude:: ./code/python/clarifai.py | ||
:language: python | ||
|
||
For a full list of the options the CLI accepts, run ``unstructured-ingest <upstream connector> clarifai --help``. | ||
|
||
NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_. | ||
|
||
|
15 changes: 15 additions & 0 deletions
docs/source/ingest/destination_connectors/code/bash/clarifai.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
#!/usr/bin/env bash | ||
|
||
unstructured-ingest \ | ||
local \ | ||
--input-path example-docs/book-war-and-peace-1225p.txt \ | ||
--output-dir local-output-to-clarifai \ | ||
--strategy fast \ | ||
--chunk-elements \ | ||
--num-processes 2 \ | ||
--verbose \ | ||
clarifai \ | ||
--app-id "<your clarifai app name>" \ | ||
--user-id "<your clarifai user id>" \ | ||
--api-key "<your clarifai PAT key>" \ | ||
--batch-size 100 |
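
The command above chains two subcommands: the upstream `local` connector with its options, then the `clarifai` destination with its options. As a minimal sketch of that ordering, the same invocation can be assembled as an argument list in Python. The helper name `build_clarifai_command` and its parameters are hypothetical and not part of `unstructured`; the flags are only those shown in the script above.

```python
import shlex


def build_clarifai_command(input_path, output_dir, app_id, user_id, api_key, batch_size=100):
    """Return the argv list for the ingest command shown in clarifai.sh.

    Hypothetical helper for illustration only; it uses just the flags
    that appear in the sample script.
    """
    return [
        "unstructured-ingest",
        # Upstream (source) connector and its options come first.
        "local",
        "--input-path", input_path,
        "--output-dir", output_dir,
        "--strategy", "fast",
        "--chunk-elements",
        "--num-processes", "2",
        "--verbose",
        # Destination connector and its options come last.
        "clarifai",
        "--app-id", app_id,
        "--user-id", user_id,
        "--api-key", api_key,
        "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    argv = build_clarifai_command(
        input_path="example-docs/book-war-and-peace-1225p.txt",
        output_dir="local-output-to-clarifai",
        app_id="<your clarifai app name>",
        user_id="<your clarifai user id>",
        api_key="<your clarifai PAT key>",
    )
    # Print a shell-quoted version for inspection; nothing is executed here.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would run the command without going through a shell, which avoids quoting problems in IDs or keys.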
48 changes: 48 additions & 0 deletions
docs/source/ingest/destination_connectors/code/python/clarifai.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
from unstructured.ingest.connector.clarifai import ( | ||
ClarifaiAccessConfig, | ||
ClarifaiWriteConfig, | ||
SimpleClarifaiConfig, | ||
) | ||
from unstructured.ingest.connector.local import SimpleLocalConfig | ||
from unstructured.ingest.interfaces import ( | ||
ChunkingConfig, | ||
PartitionConfig, | ||
ProcessorConfig, | ||
ReadConfig, | ||
) | ||
from unstructured.ingest.runner import LocalRunner | ||
from unstructured.ingest.runner.writers.base_writer import Writer | ||
from unstructured.ingest.runner.writers.clarifai import ( | ||
ClarifaiWriter, | ||
) | ||
|
||
|
||
def get_writer() -> Writer: | ||
    return ClarifaiWriter( | ||
        connector_config=SimpleClarifaiConfig( | ||
            access_config=ClarifaiAccessConfig(api_key="CLARIFAI_PAT"), | ||
            app_id="CLARIFAI_APP", | ||
            user_id="CLARIFAI_USER_ID", | ||
        ), | ||
        write_config=ClarifaiWriteConfig(), | ||
    ) | ||
|
||
|
||
if __name__ == "__main__": | ||
    writer = get_writer() | ||
    runner = LocalRunner( | ||
        processor_config=ProcessorConfig( | ||
            verbose=True, | ||
            output_dir="local-output-to-clarifai-app", | ||
            num_processes=2, | ||
        ), | ||
        connector_config=SimpleLocalConfig( | ||
            input_path="example-docs/book-war-and-peace-1225p.txt", | ||
        ), | ||
        read_config=ReadConfig(), | ||
        partition_config=PartitionConfig(), | ||
        chunking_config=ChunkingConfig(chunk_elements=True), | ||
        writer=writer, | ||
        writer_kwargs={}, | ||
    ) | ||
    runner.run() |
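
The example above passes the strings `"CLARIFAI_PAT"`, `"CLARIFAI_APP"`, and `"CLARIFAI_USER_ID"` as placeholders. One way to avoid hard-coding real credentials is to read them from environment variables. The sketch below assumes environment variables with those same names; that naming, and the helper `load_clarifai_settings`, are assumptions for illustration and not something the connector defines.

```python
import os

# Maps the keyword names used in the example above to the environment
# variables assumed to hold their values (the variable names are an assumption).
REQUIRED_VARS = {
    "api_key": "CLARIFAI_PAT",
    "app_id": "CLARIFAI_APP",
    "user_id": "CLARIFAI_USER_ID",
}


def load_clarifai_settings(env=None):
    """Return a dict with api_key, app_id, and user_id read from `env`.

    Hypothetical helper: raises RuntimeError listing every variable that
    is missing or empty, so misconfiguration fails before any upload starts.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {field: env[name] for field, name in REQUIRED_VARS.items()}


if __name__ == "__main__":
    # Demonstration with a stand-in mapping instead of the real environment.
    demo_env = {
        "CLARIFAI_PAT": "pat-value",
        "CLARIFAI_APP": "app-value",
        "CLARIFAI_USER_ID": "user-value",
    }
    settings = load_clarifai_settings(demo_env)
    # In get_writer(), these values would replace the placeholder strings:
    #   ClarifaiAccessConfig(api_key=settings["api_key"])
    #   SimpleClarifaiConfig(..., app_id=settings["app_id"], user_id=settings["user_id"])
    print(sorted(settings))
```

Calling `load_clarifai_settings()` with no argument reads from `os.environ`, so the values can be exported in the shell before running the script.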
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Uploads the structured output of the files within the given path to a clarifai app. | ||
|
||
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd) | ||
cd "$SCRIPT_DIR"/../../.. || exit 1 | ||
|
||
PYTHONPATH=. ./unstructured/ingest/main.py \ | ||
local \ | ||
--input-path example-docs/book-war-and-peace-1225p.txt \ | ||
--output-dir local-output-to-clarifai \ | ||
--strategy fast \ | ||
--chunk-elements \ | ||
--num-processes 2 \ | ||
--verbose \ | ||
clarifai \ | ||
--app-id "<your clarifai app name>" \ | ||
--user-id "<your clarifai user id>" \ | ||
--api-key "<your clarifai PAT key>" \ | ||
--batch-size 100 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
-c ../constraints.in | ||
-c ../base.txt | ||
clarifai |