Difference in output when running via Transformers.js and when hosting on Hugging Face #46

I created an application that uses the UAE-Large-V1 model with Transformers.js, and it embeds sentences in the browser without issues. For a single input, the model returns a single vector:
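```javascript
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "WhereIsAI/UAE-Large-V1", {
  quantized: true,
});

const text = "Where are you";
// Mean-pool the token embeddings and L2-normalize them into one sentence vector.
const result = await extractor(text, { pooling: "mean", normalize: true });
```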

When I host the same model on Hugging Face using their [inference endpoint](https://huggingface.co/docs/inference-endpoints/en/index) solution, it no longer behaves as expected. Instead of a single vector, it returns a variable number of 1024-dimensional vectors.

Sample input:
```json
{
  "inputs": "Where are you"
}
```
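For reference, here is a minimal sketch of sending that sample input from JavaScript. It assumes a runtime with a global `fetch` (a browser or Node 18+) and Bearer-token authentication with a Hugging Face access token. `ENDPOINT_URL` is a hypothetical placeholder for the URL shown on the endpoint's page, and `buildEndpointRequest` / `queryEndpoint` are illustrative helper names, not part of any library.

```javascript
// Hypothetical placeholder: replace with the URL shown on your endpoint's page.
const ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud";

// Hypothetical helper: builds the POST request carrying the {"inputs": ...} payload.
function buildEndpointRequest(text, token) {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: text }),
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON response.
async function queryEndpoint(text, token) {
  const response = await fetch(ENDPOINT_URL, buildEndpointRequest(text, token));
  if (!response.ok) {
    throw new Error(`Endpoint returned HTTP ${response.status}`);
  }
  return response.json();
}
```

Usage would look like `const output = await queryEndpoint("Where are you", process.env.HF_TOKEN);`, where the `HF_TOKEN` environment variable name is only an assumption.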

This returns a list of lists of lists of numbers.
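One possible client-side workaround is sketched below. It assumes the endpoint's output for a single string has the shape `[1][numTokens][1024]`, with one vector per token and no padding. It then mean-pools and L2-normalizes those vectors, mirroring what `pooling: "mean", normalize: true` asks Transformers.js to do. `meanPoolAndNormalize` is a hypothetical helper name. The result may still differ slightly from the browser output because the browser used quantized weights (`quantized: true`).

```javascript
// Hypothetical helper: collapses token-level embeddings into one sentence vector.
// Assumes `output` has shape [1][numTokens][dim] for a single, unpadded input.
function meanPoolAndNormalize(output) {
  const tokens = output[0]; // drop the batch dimension (single input)
  const dim = tokens[0].length;
  const pooled = new Array(dim).fill(0);

  // Mean pooling: average each dimension across all token vectors.
  for (const token of tokens) {
    for (let i = 0; i < dim; i++) {
      pooled[i] += token[i] / tokens.length;
    }
  }

  // L2 normalization: scale the pooled vector to unit length.
  const norm = Math.sqrt(pooled.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? pooled : pooled.map((x) => x / norm);
}
```

With a batched request, padding tokens would need to be excluded using the attention mask. This sketch does not handle that case.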

Is there a way to make the hosted model return a single vector? And why does the model behave differently depending on where it's hosted?
