[Bug]: Collection Schema causes client to fail to connect #5826

@tjkrusinskichroma

Description

What happened?

I created a collection in Python with the following code:

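import os

import chromadb
from chromadb import K, Schema, SparseVectorIndexConfig, VectorIndexConfig
from chromadb.utils.embedding_functions import (
    ChromaBm25EmbeddingFunction,
    OpenAIEmbeddingFunction,
)

# Same sparse index key and collection name as the TypeScript code below
SPARSE_KEY = "bm25"
collection_name = "national_parks_11"

chroma_client = chromadb.CloudClient(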
    api_key=os.getenv("CHROMA_API_KEY"),
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE")
)

schema = Schema()

schema.create_index(
    config=VectorIndexConfig(
        space="cosine",
        embedding_function=OpenAIEmbeddingFunction(
            model_name="text-embedding-3-small"
        )
    )
)

schema.create_index(
    config=SparseVectorIndexConfig(
        source_key=K.DOCUMENT,
        embedding_function=ChromaBm25EmbeddingFunction(),
        bm25=True
    ),
    key=SPARSE_KEY
)

collection = chroma_client.get_or_create_collection(
    name=collection_name,
    schema=schema
)

If I then mirror the schema in TypeScript:

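import {
	CloudClient,
	Knn,
	Rrf,
	Schema,
	Search,
	SparseVectorIndexConfig,
	VectorIndexConfig,
} from "chromadb";
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";
import { ChromaBm25EmbeddingFunction } from "@chroma-core/chroma-bm25";

const chromaClient = new CloudClient({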
	apiKey: process.env.CHROMA_API_KEY,
	tenant: process.env.CHROMA_TENANT,
	database: process.env.CHROMA_DATABASE,
});

const chromaSchema = new Schema();

chromaSchema.createIndex(
	new VectorIndexConfig({
		space: "cosine",
		embeddingFunction: new OpenAIEmbeddingFunction({
			modelName: "text-embedding-3-small",
		}),
	})
);

chromaSchema.createIndex(
	new SparseVectorIndexConfig({
		sourceKey: "#document",
		embeddingFunction: new ChromaBm25EmbeddingFunction(),
		bm25: true,
	}),
	'bm25'
);

const chromaCollection = await chromaClient.getOrCreateCollection({
	name: "national_parks_11",
	schema: chromaSchema,
});

When I run the TypeScript file, I get:

ChromaConnectionError: Unable to connect to the chromadb server (status: 500). Please try again later.
 cause: undefined,

      at <anonymous> (/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs:3907:9)
      at async <anonymous> (/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs:334:32)
      at async getOrCreateCollection (/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs:4318:43)

If I remove the schema from the getOrCreateCollection call, I get this error instead:

error: Cannot find module '@chroma-core/default-embed' from '/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs'
1415 |     if (!knownEmbeddingFunctions.has(new DefaultEmbeddingFunction().name)) {
1416 |       registerEmbeddingFunction("default", DefaultEmbeddingFunction);
1417 |     }
1418 |   } catch (e) {
1419 |     console.error(e);
1420 |     throw new Error(
                     ^
error: Cannot instantiate a collection with the DefaultEmbeddingFunction. Please install @chroma-core/default-embed, or provide a different embedding function
      at <anonymous> (/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs:1420:15)
      at async <anonymous> (/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs:1447:44)
      at async getOrCreateCollection (/Users/tjkrusinski/Software/data/agents/mastra/node_modules/chromadb/dist/chromadb.mjs:4313:36)

This is my search code:

const search = async (query: string): Promise<Array<{ excerpt: string }>> => {
	const denseRank = Knn({
		query,
		key: "#embedding",
		returnRank: true,
		limit: 20
	});

	const sparseRank = Knn({
		query,
		key: "bm25",
		returnRank: true,
		limit: 20
	});

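	// Fuse the dense and BM25 rankings with reciprocal rank fusion
	const rrf = Rrf({
		ranks: [denseRank, sparseRank],
		weights: [0.7, 0.3],
		k: 60
	});

	// Named searchRequest to avoid shadowing the outer `search` function
	const searchRequest = new Search().rank(rrf);

	const results = await chromaCollection.search(searchRequest);

	console.log(results);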

	return [];
};
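For context on what the `Rrf` step is meant to compute, here is a minimal, library-independent sketch of weighted reciprocal rank fusion using the same weights (0.7 dense, 0.3 sparse) and `k = 60` as the search code. It assumes the textbook formula, score(d) = Σ w_i / (k + rank_i(d)), with 0-based ranks and no contribution from a list that does not contain the document. Chroma's own `Rrf` may differ in details such as sign convention, rank indexing, or weight normalization. The function name `reciprocalRankFusion` and the document IDs are hypothetical.

```typescript
// Weighted reciprocal rank fusion over lists of document IDs (best first).
// Assumptions: rank = 0-based position in each list; a document missing from
// a list gets no score from that list; higher fused score = better.
function reciprocalRankFusion(
	lists: string[][],
	weights: number[],
	k: number = 60,
): Array<{ id: string; score: number }> {
	if (lists.length !== weights.length) {
		throw new Error("lists and weights must have the same length");
	}
	const scores: Record<string, number> = {};
	lists.forEach((list, i) => {
		list.forEach((id, rank) => {
			// Each appearance adds weight / (k + rank) to the document's total.
			scores[id] = (scores[id] ?? 0) + weights[i] / (k + rank);
		});
	});
	return Object.keys(scores)
		.map((id) => ({ id, score: scores[id] }))
		.sort((a, b) => b.score - a.score);
}

// Hypothetical dense (embedding) and sparse (BM25) result lists.
const dense = ["a", "b", "c"];
const sparse = ["c", "a", "d"];
const fused = reciprocalRankFusion([dense, sparse], [0.7, 0.3], 60);
console.log(fused.map((r) => r.id)); // expected order: a, c, b, d
```

A large `k` flattens the difference between neighboring ranks, so a document that appears in both lists (like "c" here) can outrank one that is near the top of only the dense list (like "b").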

While it may not be very common to ingest and query from separate languages, I'm stuck on how to resolve this issue.

If I instead supply an embedding function to the getOrCreateCollection call, I don't think chromaCollection.search() will ignore the embedding function I provide there.

Versions

Chroma 1.2.1

All latest clients.

Labels: bug (Something isn't working), clients
