Releases: argilla-io/argilla
v1.28.0
🔆 Release highlights
Improved suggestions
suggestions_first.mp4
Multiple scores support for MultiLabelQuestion and RankingQuestion
`MultiLabelQuestion` and `RankingQuestion` now take one score per suggested label / value, making the scores easier to interpret. Learn more about suggestions and their scores here.
Warning
If you upgrade to this version, all previous scores in suggestions for `MultiLabelQuestion`, `RankingQuestion` and `SpanQuestion` will turn to NULL, as they are not valid in the new schema. Please make sure you upload the scores again if you want to use them.
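For a `MultiLabelQuestion`, the new schema expects one score per suggested label, aligned by position with the `value` list. A minimal sketch of such a suggestion payload (the question name, labels and score values here are illustrative, not from the release notes):

```python
# a suggestion payload with one score per suggested label (illustrative values)
suggestion = {
    "question_name": "my_labels",   # hypothetical question name
    "value": ["label1", "label2"],  # suggested labels
    "score": [0.9, 0.6],            # one score per label, same order as `value`
    "agent": "my_model",
}

# each label's score can be recovered by position
scores_by_label = dict(zip(suggestion["value"], suggestion["score"]))
```

You would pass a dict like this in a record's `suggestions` list when creating or updating records.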
See scores next to their label / value
Scores are now shown next to their label / value in all questions. This makes them more visible and easier to interpret.
Suggestions first - 🌟 Community request: #4647
Now you can order labels in `MultiLabelQuestion` so that suggestions are always shown first. This will help you make sure that the most relevant labels are always at hand. Plus, if you’ve added scores to your labels, these will be ordered in descending order. To enable this, go to the Dataset Settings page > Questions and enable “Suggestions first” for the desired question.
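The resulting order can be sketched as follows (an illustrative sketch of the described behavior, not Argilla's actual implementation): suggested labels come first, sorted by descending score, followed by the remaining labels.

```python
def order_labels(labels, suggested_scores):
    """Place suggested labels first (highest score first), then the rest.

    suggested_scores: dict mapping a suggested label to its score.
    """
    suggested = sorted(
        (label for label in labels if label in suggested_scores),
        key=lambda label: suggested_scores[label],
        reverse=True,
    )
    rest = [label for label in labels if label not in suggested_scores]
    return suggested + rest

order_labels(["sports", "news", "tech", "misc"], {"tech": 0.9, "news": 0.4})
# -> ["tech", "news", "sports", "misc"]
```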
SpanQuestion improvements
new_spans_selection.mp4
Pre-selection highlight
We’ve improved the way selections are shown. You can now see a highlight that represents what the final selection will look like while you’re dragging your mouse. This will help you select faster and see the difference between token and character selection.
Note
Remember that character-level spans are activated by holding `Shift` while making the selection.
New label selector
We’ve improved the way the label selector works in the `SpanQuestion` when overlapping spans are enabled, so it’s easier to add or correct labels. Simply click on the desired span to activate the selector and click on the label(s) that you want to add or remove.
Persistent storage warning
We’ve added a warning for Argilla instances deployed on Hugging Face Spaces to alert of data loss when the persistent storage is not enabled.
To learn more about this warning and how to disable it, go to our docs.
Changelog 1.28.0
Added
- Added suggestion multi score attribute. (#4730)
- Added order by suggestion first. (#4731)
- Added multi selection entity dropdown for span annotation overlap. (#4735)
- Added pre selection highlight for span annotation. (#4726)
- Added banner when persistent storage is not enabled. (#4744)
- Added support in the Python SDK for the new multi-label questions' `labels_order` attribute. (#4757)
Changed
- Changed how the Hugging Face Space and user are shown at sign-in. (#4748)
Fixed
- Fixed Korean characters being reversed. (#4753)
- Fixed requirements for version of wrapt library conflicting with Python 3.11 (#4693)
Full Changelog: v1.27.0...v1.28.0
v1.27.0
🔆 Release highlights
Overlapping spans
We are finally releasing a long-awaited feature: overlapping spans. This allows you to draw more than one span over the same token(s) or character(s).
overlapping_spans.mp4
To try them out, set up a `SpanQuestion` with the argument `allow_overlap=True` like this:
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="spans",
            labels=["label1", "label2", "label3"],
            field="text",
            allow_overlap=True
        )
    ]
)
Learn more about configuring this and other question types here.
Global progress bars
We’ve included a new column in our home page that offers the global progress of your datasets, so that you can see at a glance what datasets are closer to completion.
These bars show progress by grouping records based on the status of their responses:
- Submitted: Records where all responses have the `submitted` status.
- Discarded: Records where all responses have the `discarded` status.
- Conflicting: Records with at least one `submitted` and one `discarded` response.
- Left: All other records, with no `submitted` or `discarded` responses. These may be in `pending` or `draft`.
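The grouping above can be sketched in Python (a simplified illustration of the rules as described, not the server's actual code):

```python
def progress_group(response_statuses):
    """Classify a record by the statuses of its responses."""
    statuses = set(response_statuses)
    if "submitted" in statuses and "discarded" in statuses:
        return "Conflicting"  # at least one submitted and one discarded
    if statuses == {"submitted"}:
        return "Submitted"    # all responses submitted
    if statuses == {"discarded"}:
        return "Discarded"    # all responses discarded
    return "Left"             # no submitted/discarded, e.g. pending or draft
```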
Suggestions got a new look
We’ve improved the way suggestions are shown in the UI to make their purpose clearer: now you can identify each suggestion with a sparkle icon ✨ .
The behavior is still the same:
- Suggested values will appear as pre-filled responses, marked with the sparkle icon.
- Make changes to any incorrect suggestions, then save as a draft or submit.
- The icon will stay to mark the suggestions so you can compare the final response with the suggested one.
Increased label limits
We’ve increased the limit of labels you can use in Label, Multilabel and Span questions to 500. If you need to go beyond that number, you can set up a custom limit using the following environment variables:
- `ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS` to set the limit in label and multi-label questions.
- `ARGILLA_SPAN_OPTIONS_MAX_ITEMS` to set the limit in span questions.
Warning
The UI has been optimized to support up to 1000 labels. If you go beyond this limit, the UI may not be as responsive.
Learn more about this and other environment variables here.
Argilla auf Deutsch!
Thanks to our contributor @paulbauriegel you can now use Argilla fully in German! If that is the main language of your browser, there is nothing you need to do, the UI will automatically detect that and switch to German.
Would you like to translate Argilla to your own language? Reach out to us and we'll help you!
Changelog 1.27.0
Added
- Added allow overlap spans in the `FeedbackDataset`. (#4668)
- Added `allow_overlapping` parameter for span questions. (#4697)
- Added overall progress bar on `Datasets` table. (#4696)
- Added German language translation. (#4688)
Changed
- New UI design for suggestions (#4682)
Fixed
- Improved performance for more than 250 labels. (#4702)
New Contributors
- @stevengans made their first contribution in #4646
- @tim-win made their first contribution in #4672
- @strickvl made their first contribution in #4675
- @paulbauriegel made their first contribution in #4688
- @davanstrien made their first contribution in #4687
Full Changelog: v1.26.1...v1.27.0
v1.26.1
v1.26.0
🔆 Release highlights
Spans question
We've added a new type of question to Feedback Datasets: the `SpanQuestion`. This type of question allows you to highlight portions of text in a specific field and apply a label. It is especially useful for token classification (like NER or POS tagging) and information extraction tasks.
spans_demo.mp4
With this type of question you can:
✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.
⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.
🖱️ Draw a span by dragging your mouse over the parts of the text you want to select or if it's a single token, just double-click on it.
🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.
🔎 Annotate at character level when you need more fine-grained spans. Hold the `Shift` key while drawing the span and the resulting span will start and end at the exact boundaries of your selection.
✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.
🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.
🧼 Remove labels by hovering over the label name in the span and clicking the remove icon on its left-hand side.
Here's an example of what your dataset would look like from the SDK:
import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema
# connect to your Argilla instance
rg.init(...)
# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text",  # the field where you want to do the span annotation
            required=True
        )
    ]
)
# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions=[
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0,  # position of the first character of the span
                    end=10,  # position of the character right after the end of the span
                    label="ORG",
                    score=1.0
                )
            ],
            "agent": "my_model",
        }
    ]
)
# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)
To learn more about this and all the other questions available in Feedback Datasets, check out our documentation.
Changelog 1.26.0
Added
- If you expand the labels of a single or multi label question, the state is maintained during the entire annotation process. (#4630)
- Added support for span questions in the Python SDK. (#4617)
- Added support for span values in suggestions and responses. (#4623)
- Added `span` questions for `FeedbackDataset`. (#4622)
- Added `ARGILLA_CACHE_DIR` environment variable to configure the client cache directory. (#4509)
Fixed
- Fixed contextualized workspaces. (#4665)
- Fixed prepare for training when passing `RankingValueSchema` instances to suggestions. (#4628)
- Fixed parsing ranking values in suggestions from HF datasets. (#4629)
- Fixed reading description from API response payload. (#4632)
- Fixed pulling (n*chunk_size)+1 records when using `ds.pull` or iterating over the dataset. (#4662)
- Fixed client's resolution of enum values when calling the Search and Metrics APIs, to support Python >=3.11 enum handling. (#4672)
New Contributors
- @davidefiocco made their first contribution in #4639
Full Changelog: v1.25.0...v1.26.0
v1.25.0
🔆 Release highlights
Reorder labels
`admin` and `owner` users can now change the order in which labels appear in the question form. To do this, go to the `Questions` tab inside Dataset Settings and move the labels until they are in the desired order.
reorder_labels.mp4
Aligned SDK status filter
The `missing` status has been removed from the SDK filters. To filter records that don't have responses, you will now need to use the `pending` status like so:
filtered_dataset = dataset.filter_by(response_status="pending")
Learn more about how to use this filter in our docs
Pandas 2.0 support
We’ve removed the limitation to use `pandas<2.0.0`, so you can now use Argilla with pandas v1 or v2 safely.
Changelog 1.25.0
Note
For changes in the argilla-server module, visit the argilla-server release notes
Added
- Reorder labels in dataset settings page for single/multi label questions. (#4598)
- Added pandas v2 support using the Python SDK. (#4600)
Removed
- Removed `missing` response for status filter. Use `pending` instead. (#4533)
Fixed
- Fixed FloatMetadataProperty: value is not a valid float (#4570)
- Fixed redirect to `user-settings` instead of 404 `user_settings`. (#4609)
Full Changelog: v1.24.0...v1.25.0
v1.24.0
Note
This release does not contain any new features, but it includes a major change in the argilla server.
The package is using the `argilla-server` dependency defined here.
Full Changelog: v1.23.1...v1.24.0
v1.23.1
Fixed
- Fixed Responsive view for Feedback Datasets. (#4579)
New Contributors
- @CpHaddock made their first contribution in #4484
- @julien-c made their first contribution in #4582
Full Changelog: v1.23.0...v1.23.1
v1.23.0
🔆 Release highlights
Hugging Face OAuth
You can now set up OAuth in your Argilla Hugging Face Spaces. This is a simple way to have your team members or collaborators in crowdsourced projects sign in and log in to your Space using their Hugging Face accounts.
To learn how to set up Hugging Face OAuth for your Argilla Space, go to our docs.
Bulk actions for filter results
We’ve added an improvement for our bulk view so you can perform actions on all results from a filter (or a combination of them!).
To use this, go to the bulk view and apply some filter(s) of your choice. If there are more results than the records shown on the current page, clicking the checkbox gives you the option to select all of the results. Then, you can give responses, discard, save a draft, and even submit all of the records at once!
Embed PDFs in a TextField
We’ve added the `pdf_to_html` function in our utilities so you can easily embed a PDF reader within a `TextField` using markdown.
This function accepts a file path, a URL or the file's byte data and returns the corresponding HTML to render the PDF within the Argilla user interface.
Learn more about how to use this feature here.
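Conceptually, the function base64-encodes the PDF into a data URL and wraps it in HTML. A rough sketch of the idea (this is a hypothetical illustration, not Argilla's actual implementation; check the `pdf_to_html` docs for the real signature and import path):

```python
import base64

def pdf_to_html_sketch(pdf_bytes: bytes) -> str:
    """Return an HTML snippet embedding the PDF bytes as a data URL."""
    data_url = "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode()
    return f'<embed src="{data_url}" type="application/pdf" width="100%" height="600px">'
```

The resulting HTML string can then be placed in a `TextField` with markdown enabled, and the UI renders the embedded PDF viewer.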
Changelog 1.23.0
Added
- Added bulk annotation by filter criteria. (#4516)
- Automatically fetch new datasets on focus tab. (#4514)
- API v1 responses returning `Record` schema now always include `dataset_id` as attribute. (#4482)
- API v1 responses returning `Response` schema now always include `record_id` as attribute. (#4482)
- API v1 responses returning `Question` schema now always include `dataset_id` attribute. (#4487)
- API v1 responses returning `Field` schema now always include `dataset_id` attribute. (#4488)
- API v1 responses returning `MetadataProperty` schema now always include `dataset_id` attribute. (#4489)
- API v1 responses returning `VectorSettings` schema now always include `dataset_id` attribute. (#4490)
- Added `pdf_to_html` function to `.html_utils` module that converts PDFs to data URLs to be able to render them in the Argilla UI. (#4481)
- Added `ARGILLA_AUTH_SECRET_KEY` environment variable. (#4539)
- Added `ARGILLA_AUTH_ALGORITHM` environment variable. (#4539)
- Added `ARGILLA_AUTH_TOKEN_EXPIRATION` environment variable. (#4539)
- Added `ARGILLA_AUTH_OAUTH_CFG` environment variable. (#4546)
- Added OAuth2 support for HuggingFace Hub. (#4546)
Deprecated
- Deprecated `ARGILLA_LOCAL_AUTH_*` environment variables. Will be removed in release v1.25.0. (#4539)
Changed
- Changed regex pattern for `username` attribute in `UserCreate`. Now uppercase letters are allowed. (#4544)
Removed
- Removed sending `Authorization` header from Python SDK requests. (#4535)
Fixed
- Fixed keyboard shortcut for label questions. (#4530)
Full Changelog: v1.22.0...v1.23.0
v1.22.0
🔆 Release Highlights
Bulk actions in Feedback Task datasets
Our signature bulk actions are now available for Feedback datasets!
Bulk.in.Feedback.mp4
Switch between Focus and Bulk depending on your needs:
- In the Focus view, you can navigate and respond to records individually. This is ideal for closely examining and giving responses to each record.
- The Bulk view allows you to see multiple records on the same page. You can select all or some of them and perform actions in bulk, such as applying a label, saving responses, submitting, or discarding. You can use this feature along with filters and similarity search to process a list of records in bulk.
For now, this is only available in the Pending queue, but rest assured, bulk actions will be improved and extended to other queues in upcoming releases.
Read more about our Focus and Bulk views here.
Sorting rating values
We now support sorting records in the Argilla UI based on the values of Rating questions (both suggestions and responses):
Learn about this and other filters in our docs.
Out-of-the-box embedding support
It’s now easier than ever to add vector embeddings to your records with the new Sentence Transformers integration.
Just choose a model from the Hugging Face Hub and use our `SentenceTransformersExtractor` to add vectors to your dataset:
import argilla as rg
from argilla.client.feedback.integrations.sentencetransformers import SentenceTransformersExtractor

# Connect to Argilla
rg.init(
    api_url="http://localhost:6900",
    api_key="owner.apikey",
    workspace="my_workspace"
)

# Initialize the SentenceTransformersExtractor
ste = SentenceTransformersExtractor(
    model="TaylorAI/bge-micro-v2",  # Use a model from https://huggingface.co/models?library=sentence-transformers
    show_progress=False,
)

# Load a dataset from your Argilla instance
ds_remote = rg.FeedbackDataset.from_argilla("my_dataset")

# Update the dataset
ste.update_dataset(
    dataset=ds_remote,
    fields=["context"],   # Only update the context field
    update_records=True,  # Update the records in the dataset
    overwrite=False,      # Don't overwrite existing vectors
)
Learn more about this functionality in this tutorial.
Changelog 1.22.0
Added
- Added Bulk annotation support. (#4333)
- Restore filters from feedback dataset settings. (#4461)
- Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
- Added pydantic v2 support using the python SDK. (#4459)
- Added `vector_settings` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. (#4454)
- Added integration for `sentence-transformers` using `SentenceTransformersExtractor` to configure `vector_settings` in `FeedbackDataset` and `FeedbackRecord`. (#4454)
Changed
- Module `argilla.cli.server` definitions have been moved to `argilla.server.cli` module. (#4472)
- [breaking] Changed `vector_settings_by_name` for generic `property_by_name` usage, which will return `None` instead of raising an error. (#4454)
- The constant definition `ES_INDEX_REGEX_PATTERN` in module `argilla._constants` is now private. (#4472)
- `nan` values in metadata properties will raise a 422 error when creating/updating records. (#4300)
- `None` values are now allowed in metadata properties. (#4300)
Fixed
- Paginating to a new record, automatically scrolls down to selected form area. (#4333)
Deprecated
- The `missing` response status for filtering records is deprecated and will be removed in release v1.24.0. Use `pending` instead. (#4433)
Removed
- The deprecated `python -m argilla database` command has been removed. (#4472)
New Contributors
- @Piyush-Kumar-Ghosh made their first contribution in #4463
Full Changelog: v1.21.0...v1.22.0
v1.21.0
🔆 Release highlights
Draft queue
We’ve added a new queue in the Feedback Task UI so that you can save your drafts and have them all together in a separate view. This allows you to save your responses and come back to them before submission.
Note that responses are no longer autosaved; to save your changes you will need to click “Save as draft” or use the shortcut `⌘` + `S` (macOS) or `Ctrl` + `S` (other).
Improved shortcuts
We’ve been working to improve the keyboard shortcuts within the Feedback Task UI to make them more productive and user-friendly.
You can now select labels in Label and Multi-label questions using the numerical keys on your keyboard. To know which number corresponds to each label, you can show or hide helpers by pressing `⌘` (macOS) or `Ctrl` (other) for 2 seconds. You will then see the numbers next to the corresponding labels.
We’ve also simplified shortcuts for navigation and actions, so that they use as few keys as possible.
Check all available shortcuts here.
New `metrics` module
We've added a new module to analyze the annotations, both in terms of agreement between the annotators and in terms of data and model drift monitoring.
Agreement metrics
Easily measure the inter-annotator agreement to explore the quality of the annotation guidelines and consistency between annotators:
import argilla as rg
from argilla.client.feedback.metrics import AgreementMetric
feedback_dataset = rg.FeedbackDataset.from_argilla("...", workspace="...")
metric = AgreementMetric(dataset=feedback_dataset, question_name="question_name")
agreement_metrics = metric.compute("alpha")
#>>> agreement_metrics
#[AgreementMetricResult(metric_name='alpha', count=1000, result=0.467889)]
Read more here.
Model metrics
You can use `ModelMetric` to monitor model performance and watch for data and model drift:
import argilla as rg
from argilla.client.feedback.metrics import ModelMetric
feedback_dataset = rg.FeedbackDataset.from_argilla("...", workspace="...")
metric = ModelMetric(dataset=feedback_dataset, question_name="question_name")
annotator_metrics = metric.compute("accuracy")
#>>> annotator_metrics
#{'00000000-0000-0000-0000-000000000001': [ModelMetricResult(metric_name='accuracy', count=3, result=0.5)], '00000000-0000-0000-0000-000000000002': [ModelMetricResult(metric_name='accuracy', count=3, result=0.25)], '00000000-0000-0000-0000-000000000003': [ModelMetricResult(metric_name='accuracy', count=3, result=0.5)]}
Read more here.
List aggregation support for TermsMetadataProperty
You can now pass a list of terms within a record’s metadata that will be aggregated and filterable as part of a `TermsMetadataProperty`.
Here is an example:
import argilla as rg
dataset = rg.FeedbackDataset(
    fields = ...,
    questions = ...,
    metadata_properties = [rg.TermsMetadataProperty(name="annotators")]
)

record = rg.FeedbackRecord(
    fields = ...,
    metadata = {"annotators": ["user_1", "user_2"]}
)
Reindex from CLI
Reindex all entities in your Argilla instance (datasets, records, responses, etc.) with a simple CLI command.
argilla server reindex
This is useful when you are working with an existing feedback dataset and want to update the search engine info.
Changelog 1.21.0
Added
- Added new draft queue for annotation view (#4334)
- Added annotation metrics module for the `FeedbackDataset` (`argilla.client.feedback.metrics`). (#4175)
- Added strategy to handle and translate errors from the server for `401` HTTP status code. (#4362)
- Added integration for `textdescriptives` using `TextDescriptivesExtractor` to configure `metadata_properties` in `FeedbackDataset` and `FeedbackRecord`. (#4400). Contributed by @m-newhauser
- Added `POST /api/v1/me/responses/bulk` endpoint to create responses in bulk for current user. (#4380)
- Added list support for term metadata properties. (Closes #4359)
- Added new CLI task to reindex datasets and records into the search engine. (#4404)
- Added `httpx_extra_kwargs` argument to `rg.init` and `Argilla` to allow passing extra arguments to `httpx.Client` used by `Argilla`. (#4440)
Changed
- More productive and simpler shortcuts system (#4215)
- Move `ArgillaSingleton`, `init` and `active_client` to a new module `singleton`. (#4347)
- Updated `argilla.load` functions to also work with `FeedbackDataset`s. (#4347)
- [breaking] Updated `argilla.delete` functions to also work with `FeedbackDataset`s. It now raises an error if the dataset does not exist. (#4347)
- Updated `argilla.list_datasets` functions to also work with `FeedbackDataset`s. (#4347)
Fixed
- Fixed error in `TextClassificationSettings.from_dict` method in which the `label_schema` created was a list of `dict` instead of a list of `str`. (#4347)
- Fixed total records on pagination component. (#4424)
Removed
- Removed `draft` auto save for annotation view. (#4334)