[Cosmos][VectorIndex]Adding changes for vectorIndex and vectorEmbeddingPolicy (#40116)

* Adding changes for vectorIndex and vectorEmbeddingPolicy

* Adding some necessary comments

* Adding test case

* updating enum values

* Updating test case

* Updating test case

* Updating test case

* updating changelog

* Updating test case

* Resolving comments

* Resolving comments

* Fixing test case

* Resolving comments

* Resolving Comments

* Fixing build issues

* Resolving comments

* Resolving Comments

* Increment versions for core releases (#40003)

Increment package versions for core releases

* Ensure ServiceBus session idle timeout fall back to retry-options::try-timeout (#39994)

* Added Alpha3 Java Media Streaming Events (#40002)

* Added Alpha3 Java Media Streaming Events

* updating readme to add the media streaming events to remove model

---------

Co-authored-by: Vinothini Dharmaraj <v-vdharmaraj@microsoft.com>

* Update version of github-event-processor to 1.0.0-dev.20240502.2 (#40012)

Co-authored-by: James Suplizio <jasupliz@microsoft.com>

* Prepare May 2024 Identity Release (#40006)

* Prepare Identity Broker May 2024 Release (#40014)

* Increment package versions for identity releases (#40015)

* [JobRouter] SDK Review updates (#40011)

* SDK Review updates

* Update auto-generated models

* Add customization

* Fix customization

* Update package

* Update tests

* Linting

* FixFaultInjectionRuleFailedToApplyPerPartitionInGatewayMode (#40005)

* fix fault injection rule failed to apply per partition in gateway mode

---------

Co-authored-by: annie-mac <xinlian@microsoft.com>

* azure-cosmos-test_1.0.0.beta.7Release (#40021)

* release azure-cosmos-test 1.0.0.beta.7
---------

Co-authored-by: annie-mac <xinlian@microsoft.com>

* Fixed existsById API in ReactiveCosmosTemplate (#40022)

* Fixed existsById API in ReactiveCosmosTemplate

* Added changelog

* Skip Recorded test and delete Event record until test proxy to work with Event recordings (#40029)

Co-authored-by: Min Woo Lee 🧊 <77083090+minwoolee-ms@users.noreply.github.com>

* Fix invalid CODEOWNERS (#40032)

* ServiceBus: fix session tracing (#39962)

* remove additional matrix

* Fix session processing and disposition instrumentation

* return matrix config

* review suggestions

* [Automation] Generate SDK based on TypeSpec 0.15.15 (#40048)

* [CODEOWNERS] Updates for org changes (#40049)

* [CODEOWNERS] Updates for org changes

The focus of these changes is to remove an individual who no longer is responsible for the products which their GH account is associated to.

* Move from using the docker image to java2docfx for docs validation (#39744)

* Move from using the docker image to java2docfx for docs validation

* Temporarily turn on docs processing for template libraries for testing

* Actually install the rex validation tool

* Fix the if not Test-Path statement

* Update java2docfx version and add a couple of diagnostics output lines

* Add missing close paren

* Ensure that Sort-Object always returns an array even if there's only one item

* add another piece of diagnostics output

* trying one more thing

* remove some diag, add other

* Remove the additional diagnostics, add permanent output message

* Invoke java -jar on java2docfx to show the help command to ensure the install is okay

* fiddling with the java -jar command

* Set the working directory to the java2docfx directory before executing the mvn dependency download

* Actually create the directory before trying to set location...oops

* Update rex validation to verify MAVEN_HOME is set

* Updates for Java PR 39875 which had changes from this PR that were more immediate

* Update java2docfx version

* remove check for MAVEN_HOME which was only for testing

* Update the version of java2docfx to test a fix

* Update version of java2docfx to 1.0.4

* revert template's ci.yml changes that were only necessary to test java2docfx

* owners (#39686)

* Use ClientLogger in testing output (#40010)

Use ClientLogger in testing output

* Fix null pointer exception and context usage (#40053)

* Rename AML to AzureMachineLearning (#40056)

* Fixed the Key Vault `test-resources.json` file to properly configure a deployment script for certificate creation. (#40037)

* Close response body in bearer policy (#40052)

* Running Prepare-Release for azure-messaging-servicebus 7.17.0 (#40058)

* mgmt, TypeSpec code generation pipeline (#39963)

* typespec generation pipeline

echo command

PR_TITLE

* generation typespec

Update generation.yml for Azure Pipelines

Update generation.yml for Azure Pipelines

Update generation.yml for Azure Pipelines

* remove typespec pipeline file

* fix pr title

* address comments

* Add codeowner linter owners (#39997)

* Update to ESRP task version that supports federated auth (#40059)

* Increment package versions for cosmos releases (#40031)

* Update azure-sdk-build-tools Repository Resource Refs in Yaml files (#39627)

* Add reduced embeddings sample to azure-search-documents (#40069)

* Add reduced embeddings sample

* Fix cspell

* Fix link

* Search May Preview Regen Updates (#40057)

* Search May Preview Regeneration
- Still need to add varargs convenience

* Removing override statements from `setFields` for `VectorizableImageUrlQuery` and `VectorizableImageBinaryQuery`

* Removing override statements from `setFields` for `VectorizableImageUrlQuery` and `VectorizableImageBinaryQuery`

* adding varargs

* Additional adjustments to FieldBuilder and Search Index Customizations

* Updating cspell.json

* Adjust `SearchScoreThreshold` customization
Re-enable code generation in CI

* Updates:

- Updated Cspell
- Rename `maxStoragePerIndex` property to `maxStoragePerIndexInBytes` in SearchServiceLimits
- Set `hybridSearch` property to be type `HybridSearch` in SearchRequest
- Add `hybridSearch` to SearchOptions and `SearchAsyncClient.createSearchRequest()`

* Adding Support and testing byte[] and List<byte> within field builder

* Fix linting

---------

Co-authored-by: alzimmermsft <48699787+alzimmermsft@users.noreply.github.com>

* Preparing Search May 2024 Beta Release (#40071)

* Preparing Search May 2024 Beta Release

* Preparing Search May 2024 Beta Release

* eng, update autorest.java, improve error output in sdk automation (#40073)

* improve error output

* autorest.java 4.1.29

* Merge to main after spring cloud azure 4.18.0 released (#40075)

* Prepare for Spring Cloud Azure 4.18.0 release (#40063)

* update version client

* update version/changelog/readme

* update changelog

* Increment versions for spring releases (#40074)

* Increment package versions for spring releases

* Update version_client.txt

* Update pom.xml

---------

Co-authored-by: Muyao Feng <92105726+Netyyyy@users.noreply.github.com>

---------

Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>

* Miscellaneous Core performance improvements (#39552)

Miscellaneous Core performance improvements

* Increment package versions for search releases (#40072)

* Update io.fabric8:kubernetes-client (#40086)

5.12.3 -> 6.12.1

* Increment package versions for servicebus releases (#40094)

* Emit stable auto-instrumented otel metrics (#39960)

* Update otel metrics logic

* add runtime metrics

* adding a few metrics I forgot

* small correction

* Update

* Fix

* Update

* Delete pre-stable metrics

---------

Co-authored-by: Harsimar Kaur (from Dev Box) <harskaur@microsoft.com>

* [Key Vault] Added support for `/prerestore` and `/prebackup` endpoints in Backup clients (#39878)

* Updated `autorest.md` files in all swagger folders.

* Re-generated implementation code.

* Updated ServiceVersion expandable enums.

* Added public APIs for the new /prebackup and /prerestore endpoints.

* Added tests.

* Refactored Backup client tests.

* Updated tests.

* Updated test recordings.

* Updated documentation and samples.

* Addressed PR feedback.

* Prepare to release beta.22 (#40097)

* Fix template name (#40099)

* Fix template name

* Also install the rex validation tool

* Update partner release to use WIF (#40101)

* core mgmt, `SubResource` implements `JsonSerializable` to support azure-json (#40076)

* test

* implementation

* fix lint

* spotless:apply

* Update spring-reference and sync changelog (#40105)

* update spring-reference.yml

* update CHANGELOG.md

* Support per-call response timeout in all HttpClient implementations (#40017)

Support per-call response timeout in all HttpClient implementations

* Change how JavaType is resolved to support JsonSerializable better (#40112)

* Resolving comments

* Resolving comments

---------

Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>
Co-authored-by: Anu Thomas Chandy <anuamd@hotmail.com>
Co-authored-by: v-durgeshs <146056835+v-durgeshs@users.noreply.github.com>
Co-authored-by: Vinothini Dharmaraj <v-vdharmaraj@microsoft.com>
Co-authored-by: James Suplizio <jasupliz@microsoft.com>
Co-authored-by: Bill Wert <billwert@microsoft.com>
Co-authored-by: williamzhao87 <williamzhao87@users.noreply.github.com>
Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
Co-authored-by: annie-mac <xinlian@microsoft.com>
Co-authored-by: Kushagra Thapar <kuthapar@microsoft.com>
Co-authored-by: minwoolee-msft <77083090+minwoolee-msft@users.noreply.github.com>
Co-authored-by: Min Woo Lee 🧊 <77083090+minwoolee-ms@users.noreply.github.com>
Co-authored-by: Alan Zimmer <48699787+alzimmermsft@users.noreply.github.com>
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
Co-authored-by: Jesse Squire <jsquire@microsoft.com>
Co-authored-by: Harsimar Kaur <skaur21@gmail.com>
Co-authored-by: vcolin7 <vicolina@microsoft.com>
Co-authored-by: Xiaofei Cao <92354331+XiaofeiCao@users.noreply.github.com>
Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com>
Co-authored-by: Patrick Hallisey <pahallis@microsoft.com>
Co-authored-by: Jair Myree <jairmyree@microsoft.com>
Co-authored-by: Weidong Xu <weidxu@microsoft.com>
Co-authored-by: Muyao Feng <92105726+Netyyyy@users.noreply.github.com>
Co-authored-by: Helen <56097766+heyams@users.noreply.github.com>
Co-authored-by: Harsimar Kaur (from Dev Box) <harskaur@microsoft.com>
1 parent 52917f9 commit 47cf3d5
Showing 16 changed files with 878 additions and 5 deletions.
5 changes: 5 additions & 0 deletions sdk/cosmos/azure-cosmos-test/CHANGELOG.md
@@ -14,6 +14,11 @@
#### Bugs Fixed
* Fixed an issue where `FaultInjectionRule` can not apply on partition level when using `Gateway` Mode and non-session consistency - See [40005](https://github.com/Azure/azure-sdk-for-java/pull/40005)

### 1.0.0-beta.7 (2024-05-03)

#### Bugs Fixed
* Fixed an issue where `FaultInjectionRule` can not apply on partition level when using `Gateway` Mode and non-session consistency - See [40005](https://github.com/Azure/azure-sdk-for-java/pull/40005)

### 1.0.0-beta.6 (2023-10-24)
#### Features Added
* Added support for `ReadFeed` operation type - See [PR 37108](https://github.com/Azure/azure-sdk-for-java/pull/37108)

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -3,6 +3,7 @@
### 4.60.0-beta.1 (Unreleased)

#### Features Added
* Added `cosmosVectorEmbeddingPolicy` in `cosmosContainerProperties` and `vectorIndexes` in `indexPolicy` to support vector search in CosmosDB - See [39379](https://github.com/Azure/azure-sdk-for-java/pull/39379)

* Added support for non-streaming OrderBy query and a query feature `NonStreamingOrderBy` to support Vector Search queries. - See [PR 39897](https://github.com/Azure/azure-sdk-for-java/pull/39897/)

@@ -13,7 +14,6 @@
#### Other Changes

### 4.59.0 (2024-04-27)

#### Features Added
* Added public APIs `getCustomItemSerializer` and `setCustomItemSerializer` to allow customers to specify custom payload transformations or serialization settings. - See [PR 38997](https://github.com/Azure/azure-sdk-for-java/pull/38997) and [PR 39933](https://github.com/Azure/azure-sdk-for-java/pull/39933)

@@ -120,6 +120,15 @@ public static final class Properties {
public static final String SPATIAL_INDEXES = "spatialIndexes";
public static final String TYPES = "types";

// Vector Embedding Policy
public static final String VECTOR_EMBEDDING_POLICY = "vectorEmbeddingPolicy";
public static final String VECTOR_INDEXES = "vectorIndexes";
public static final String VECTOR_EMBEDDINGS = "vectorEmbeddings";
public static final String VECTOR_INDEX_TYPE = "type";
public static final String VECTOR_DATA_TYPE = "dataType";
public static final String VECTOR_DIMENSIONS = "dimensions";
public static final String DISTANCE_FUNCTION = "distanceFunction";

// Unique index.
public static final String UNIQUE_KEY_POLICY = "uniqueKeyPolicy";
public static final String UNIQUE_KEYS = "uniqueKeys";
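The new constants are JSON property names, so it may help to see how they could nest in a container definition. The following is a minimal sketch, not the SDK's serializer: the class `VectorPolicyShapeSketch` and its helpers are hypothetical, the value `"path"` for `Constants.Properties.PATH` is an assumption (that constant is not shown in this diff), and so is the nesting of a `vectorEmbeddings` array inside `vectorEmbeddingPolicy`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of azure-cosmos.
final class VectorPolicyShapeSketch {
    // Property names copied from the constants added in this hunk.
    static final String VECTOR_EMBEDDING_POLICY = "vectorEmbeddingPolicy";
    static final String VECTOR_EMBEDDINGS = "vectorEmbeddings";
    static final String VECTOR_DATA_TYPE = "dataType";
    static final String VECTOR_DIMENSIONS = "dimensions";
    static final String DISTANCE_FUNCTION = "distanceFunction";
    // Assumption: the value of Constants.Properties.PATH is not shown in this diff.
    static final String PATH = "path";

    // Builds one embedding entry with the four per-path settings.
    static Map<String, Object> embedding(String path, String dataType, long dimensions, String distanceFunction) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put(PATH, path);
        entry.put(VECTOR_DATA_TYPE, dataType);
        entry.put(VECTOR_DIMENSIONS, dimensions);
        entry.put(DISTANCE_FUNCTION, distanceFunction);
        return entry;
    }

    // Assumed nesting: vectorEmbeddingPolicy -> vectorEmbeddings -> [entries].
    static Map<String, Object> containerFragment() {
        Map<String, Object> policy = new LinkedHashMap<>();
        policy.put(VECTOR_EMBEDDINGS, Collections.singletonList(embedding("/vector1", "float32", 3L, "cosine")));
        Map<String, Object> container = new LinkedHashMap<>();
        container.put(VECTOR_EMBEDDING_POLICY, policy);
        return container;
    }

    // Tiny JSON writer for maps, lists, strings and numbers; it does no string escaping.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
                first = false;
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(toJson(item));
                first = false;
            }
            return sb.append(']').toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(toJson(containerFragment()));
    }
}
```

`VECTOR_INDEXES` and `VECTOR_INDEX_TYPE` are left out of the sketch because the changelog places `vectorIndexes` under the indexing policy, whose classes are not shown in this part of the diff.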
@@ -10,6 +10,7 @@
import com.azure.cosmos.models.ClientEncryptionPolicy;
import com.azure.cosmos.models.ComputedProperty;
import com.azure.cosmos.models.ConflictResolutionPolicy;
import com.azure.cosmos.models.CosmosVectorEmbeddingPolicy;
import com.azure.cosmos.models.IndexingPolicy;
import com.azure.cosmos.models.ModelBridgeInternal;
import com.azure.cosmos.models.PartitionKeyDefinition;
@@ -24,6 +25,8 @@
import java.util.Collection;
import java.util.Collections;

import static com.azure.cosmos.implementation.guava25.base.Preconditions.checkNotNull;

/**
* Represents a document collection in the Azure Cosmos DB database service. A collection is a named logical container
* for documents.
@@ -40,6 +43,7 @@ public final class DocumentCollection extends Resource {
private UniqueKeyPolicy uniqueKeyPolicy;
private PartitionKeyDefinition partitionKeyDefinition;
private ClientEncryptionPolicy clientEncryptionPolicyInternal;
private CosmosVectorEmbeddingPolicy cosmosVectorEmbeddingPolicy;

/**
* Constructor.
@@ -410,6 +414,33 @@ public void setClientEncryptionPolicy(ClientEncryptionPolicy value) {
this.set(Constants.Properties.CLIENT_ENCRYPTION_POLICY, value, CosmosItemSerializer.DEFAULT_SERIALIZER);
}

/**
* Gets the Vector Embedding Policy containing paths for embeddings along with path-specific settings for the item
* used in performing vector search on the items in a collection in the Azure CosmosDB database service.
*
* @return the Vector Embedding Policy.
*/
public CosmosVectorEmbeddingPolicy getVectorEmbeddingPolicy() {
if (this.cosmosVectorEmbeddingPolicy == null) {
if (super.has(Constants.Properties.VECTOR_EMBEDDING_POLICY)) {
this.cosmosVectorEmbeddingPolicy = super.getObject(Constants.Properties.VECTOR_EMBEDDING_POLICY,
CosmosVectorEmbeddingPolicy.class);
}
}
return this.cosmosVectorEmbeddingPolicy;
}

/**
* Sets the Vector Embedding Policy containing paths for embeddings along with path-specific settings for the item
* used in performing vector search on the items in a collection in the Azure CosmosDB database service.
*
* @param value the Vector Embedding Policy.
*/
public void setVectorEmbeddingPolicy(CosmosVectorEmbeddingPolicy value) {
checkNotNull(value, "cosmosVectorEmbeddingPolicy cannot be null");
this.set(Constants.Properties.VECTOR_EMBEDDING_POLICY, value, CosmosItemSerializer.DEFAULT_SERIALIZER);
}

public void populatePropertyBag() {
super.populatePropertyBag();
if (this.indexingPolicy == null) {
@@ -93,7 +93,7 @@ public CompositePathSortOrder getOrder() {
}

/**
* Gets the sort order for the composite path.
* Sets the sort order for the composite path.
* <p>
* For example if you want to run the query "SELECT * FROM c ORDER BY c.age asc, c.height desc",
* then you need to make the order for "/age" "ascending" and the order for "/height" "descending".
@@ -347,6 +347,28 @@ public CosmosContainerProperties setClientEncryptionPolicy(ClientEncryptionPolic
return this;
}

/**
* Gets the Vector Embedding Policy containing paths for embeddings along with path-specific settings for the item
* used in performing vector search on the items in a collection in the Azure CosmosDB database service.
*
* @return the Vector Embedding Policy.
*/
public CosmosVectorEmbeddingPolicy getVectorEmbeddingPolicy() {
return this.documentCollection.getVectorEmbeddingPolicy();
}

/**
* Sets the Vector Embedding Policy containing paths for embeddings along with path-specific settings for the item
* used in performing vector search on the items in a collection in the Azure CosmosDB database service.
*
* @param value the Vector Embedding Policy.
* @return the CosmosContainerProperties.
*/
public CosmosContainerProperties setVectorEmbeddingPolicy(CosmosVectorEmbeddingPolicy value) {
this.documentCollection.setVectorEmbeddingPolicy(value);
return this;
}

Resource getResource() {
return this.documentCollection;
}
@@ -0,0 +1,59 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos.models;

import com.fasterxml.jackson.annotation.JsonValue;

import java.util.Arrays;

/**
* Data types for the embeddings in Cosmos DB database service.
*/
public enum CosmosVectorDataType {
/**
* Represents an int8 data type.
*/
INT8("int8"),

/**
* Represents a uint8 data type.
*/
UINT8("uint8"),

/**
* Represents a float16 data type.
*/
FLOAT16("float16"),

/**
* Represents a float32 data type.
*/
FLOAT32("float32");

private final String overWireValue;

CosmosVectorDataType(String overWireValue) {
this.overWireValue = overWireValue;
}

@JsonValue
@Override
public String toString() {
return this.overWireValue;
}

/**
* Method to retrieve the enum constant by its overWireValue.
* @param value the overWire value of the enum constant
* @return the matching CosmosVectorDataType
* @throws IllegalArgumentException if no matching enum constant is found
*/
public static CosmosVectorDataType fromString(String value) {
return Arrays.stream(CosmosVectorDataType.values())
.filter(vectorDataType -> vectorDataType.toString().equalsIgnoreCase(value))
.findFirst()
.orElseThrow(() -> new IllegalArgumentException(String.format(
"Invalid vector data type with value {%s} for the vector embedding policy.", value)));
}
}
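The lookup in `fromString` matches on the wire value rather than the constant name, and it ignores case. A standalone sketch of the same pattern, with no Jackson dependency, shows what that means for callers. The names `WireDataType` and `WireDataTypeDemo` are hypothetical stand-ins, not SDK types.

```java
import java.util.Arrays;

// Hypothetical demo class; listed first so single-file launch finds main.
final class WireDataTypeDemo {
    public static void main(String[] args) {
        // Mixed-case input still resolves, because the comparison ignores case.
        System.out.println(WireDataType.fromString("Float32").name());
        // toString returns the wire value, not the constant name.
        System.out.println(WireDataType.UINT8);
        try {
            WireDataType.fromString("float64"); // not one of the four wire values
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Stand-in for the enum in the diff: each constant carries its wire value.
enum WireDataType {
    INT8("int8"),
    UINT8("uint8"),
    FLOAT16("float16"),
    FLOAT32("float32");

    private final String wireValue;

    WireDataType(String wireValue) {
        this.wireValue = wireValue;
    }

    @Override
    public String toString() {
        return wireValue;
    }

    // Case-insensitive lookup by wire value; unknown or null input is rejected.
    static WireDataType fromString(String value) {
        return Arrays.stream(values())
            .filter(type -> type.wireValue.equalsIgnoreCase(value))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("Invalid vector data type: " + value));
    }
}
```

Because `String.equalsIgnoreCase(null)` returns `false`, a `null` argument ends in the same `IllegalArgumentException` as an unknown value, not a `NullPointerException`.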
@@ -0,0 +1,54 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos.models;

import com.fasterxml.jackson.annotation.JsonValue;

import java.util.Arrays;

/**
* Distance Function for the embeddings in the Cosmos DB database service.
*/
public enum CosmosVectorDistanceFunction {
/**
* Represents the euclidean distance function.
*/
EUCLIDEAN("euclidean"),

/**
* Represents the cosine distance function.
*/
COSINE("cosine"),

/**
* Represents the dot product distance function.
*/
DOT_PRODUCT("dotproduct");

private final String overWireValue;

CosmosVectorDistanceFunction(String overWireValue) {
this.overWireValue = overWireValue;
}

@JsonValue
@Override
public String toString() {
return this.overWireValue;
}

/**
* Method to retrieve the enum constant by its overWireValue.
* @param value the overWire value of the enum constant
* @return the matching CosmosVectorDistanceFunction
* @throws IllegalArgumentException if no matching enum constant is found
*/
public static CosmosVectorDistanceFunction fromString(String value) {
return Arrays.stream(CosmosVectorDistanceFunction.values())
.filter(vectorDistanceFunction -> vectorDistanceFunction.toString().equalsIgnoreCase(value))
.findFirst()
.orElseThrow(() -> new IllegalArgumentException(String.format(
"Invalid distance function with value {%s} for the vector embedding policy.", value)));
}
}
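The enum only names the three functions. For readers who want the underlying arithmetic, here is a sketch of the textbook definitions. It assumes nothing about how the Cosmos DB service computes or ranks scores, which this commit does not show. The class `VectorDistanceSketch` is hypothetical, and `cosine` here is the usual cosine similarity (dot product divided by the product of the norms).

```java
// Hypothetical illustration class; textbook formulas only.
final class VectorDistanceSketch {
    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of dimensions");
        }
    }

    // Sum of element-wise products.
    static double dotProduct(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Square root of the summed squared differences.
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product normalised by both vector lengths.
    static double cosine(double[] a, double[] b) {
        double normA = Math.sqrt(dotProduct(a, a));
        double normB = Math.sqrt(dotProduct(b, b));
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dotProduct(a, b) / (normA * normB);
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 2};
        double[] b = {2, 4, 4}; // same direction as a, twice the length
        System.out.println("dotproduct = " + dotProduct(a, b));
        System.out.println("euclidean  = " + euclidean(a, b));
        System.out.println("cosine     = " + cosine(a, b));
    }
}
```

With these inputs the dot product is 18, the euclidean distance is 3, and the cosine similarity is 1, since the two vectors point in the same direction.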
@@ -0,0 +1,128 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos.models;

import com.azure.cosmos.implementation.Constants;
import com.azure.cosmos.implementation.JsonSerializable;
import com.azure.cosmos.implementation.apachecommons.lang.StringUtils;
import com.fasterxml.jackson.annotation.JsonProperty;
import static com.azure.cosmos.implementation.guava25.base.Preconditions.checkNotNull;

/**
* Embedding settings within {@link CosmosVectorEmbeddingPolicy}
*/
public final class CosmosVectorEmbedding {
@JsonProperty(Constants.Properties.PATH)
private String path;
@JsonProperty(Constants.Properties.VECTOR_DATA_TYPE)
private String dataType;
@JsonProperty(Constants.Properties.VECTOR_DIMENSIONS)
private Long dimensions;
@JsonProperty(Constants.Properties.DISTANCE_FUNCTION)
private String distanceFunction;
private JsonSerializable jsonSerializable;

/**
* Constructor
*/
public CosmosVectorEmbedding() {
this.jsonSerializable = new JsonSerializable();
}

/**
* Gets the path for the cosmosVectorEmbedding.
*
* @return path
*/
public String getPath() {
return path;
}

/**
* Sets the path for the cosmosVectorEmbedding.
*
* @param path the path for the cosmosVectorEmbedding
* @return CosmosVectorEmbedding
*/
public CosmosVectorEmbedding setPath(String path) {
if (StringUtils.isEmpty(path)) {
throw new NullPointerException("embedding path is either null or empty");
}

if (path.charAt(0) != '/' || path.lastIndexOf('/') != 0) {
throw new IllegalArgumentException("embedding path must start with '/' and must not contain any other '/'");
}

this.path = path;
return this;
}

/**
* Gets the data type for the cosmosVectorEmbedding.
*
* @return dataType
*/
public CosmosVectorDataType getDataType() {
return CosmosVectorDataType.fromString(dataType);
}

/**
* Sets the data type for the cosmosVectorEmbedding.
*
* @param dataType the data type for the cosmosVectorEmbedding
* @return CosmosVectorEmbedding
*/
public CosmosVectorEmbedding setDataType(CosmosVectorDataType dataType) {
checkNotNull(dataType, "cosmosVectorDataType cannot be null");
this.dataType = dataType.toString();
return this;
}

/**
* Gets the dimensions for the cosmosVectorEmbedding.
*
* @return dimensions
*/
public Long getDimensions() {
return dimensions;
}

/**
* Sets the dimensions for the cosmosVectorEmbedding.
*
* @param dimensions the dimensions for the cosmosVectorEmbedding
* @return CosmosVectorEmbedding
*/
public CosmosVectorEmbedding setDimensions(Long dimensions) {
checkNotNull(dimensions, "dimensions cannot be null");
if (dimensions < 1) {
throw new IllegalArgumentException("Dimensions for the embedding has to be a long value greater than 0 " +
"for the vector embedding policy");
}

this.dimensions = dimensions;
return this;
}

/**
* Gets the distanceFunction for the cosmosVectorEmbedding.
*
* @return distanceFunction
*/
public CosmosVectorDistanceFunction getDistanceFunction() {
return CosmosVectorDistanceFunction.fromString(distanceFunction);
}

/**
* Sets the distanceFunction for the cosmosVectorEmbedding.
*
* @param distanceFunction the distanceFunction for the cosmosVectorEmbedding
* @return CosmosVectorEmbedding
*/
public CosmosVectorEmbedding setDistanceFunction(CosmosVectorDistanceFunction distanceFunction) {
checkNotNull(distanceFunction, "cosmosVectorDistanceFunction cannot be null");
this.distanceFunction = distanceFunction.toString();
return this;
}
}
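The path check in `setPath` is terse, so a standalone restatement may make its effect easier to see. This sketch copies only the condition from the diff into a boolean helper. `EmbeddingPathCheckSketch` and `isAcceptedPath` are hypothetical names, and the null-or-empty case is folded into the same result here, whereas the SDK method throws for it separately.

```java
// Hypothetical illustration class; mirrors the condition in setPath above.
final class EmbeddingPathCheckSketch {
    // True when the path is non-empty, starts with '/', and has no later '/'.
    static boolean isAcceptedPath(String path) {
        if (path == null || path.isEmpty()) {
            return false; // setPath throws NullPointerException for this case
        }
        return path.charAt(0) == '/' && path.lastIndexOf('/') == 0;
    }

    public static void main(String[] args) {
        String[] candidates = {"/vector1", "vector1", "/a/b", "/"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isAcceptedPath(candidate));
        }
    }
}
```

Under this condition only top-level paths such as `/vector1` pass, nested paths such as `/a/b` are rejected, and a bare `/` is accepted because it satisfies both clauses.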
