This notebook shows how to get embeddings from a large dataset.
1. Load the dataset
The dataset used in this example is Amazon's fine-food reviews. It contains 568,454 food reviews that Amazon users left up to October 2012. For illustration, we will use a subset consisting of the 1,000 most recent reviews. The reviews are in English and tend to be positive or negative. Each review has a ProductId, UserId, Score, review title (Summary), and review body (Text).
We will combine the review summary and review text into a single combined text. The model encodes this combined text and outputs a single vector embedding.
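The loading and combining step could be sketched roughly as follows. This is a minimal illustration using only the standard library, not necessarily how the notebook itself does it. Several things here are assumptions: a `Time` column holding a Unix timestamp (needed to pick the most recent reviews), the "Title: ...; Content: ..." layout of the combined text, the helper name `load_recent_reviews`, and the made-up sample rows that stand in for the real file.

```python
import csv
import io

# Tiny in-memory stand-in for the reviews file; these rows are invented
# for illustration. The "Time" column (Unix timestamp) is an assumption.
SAMPLE_CSV = """Time,ProductId,UserId,Score,Summary,Text
1351123200,P001,U001,5,Great snack,Crunchy and not too sweet.
1350000000,P002,U002,1,Stale,Arrived past its best-by date.
1349000000,P003,U003,4,,Good value for the price.
"""


def load_recent_reviews(handle, n=1000, time_field="Time"):
    """Return the n most recent reviews, each with a 'combined' text field.

    Hypothetical helper: the name, the time_field default, and the
    combined-text layout are illustrative choices.
    """
    rows = list(csv.DictReader(handle))
    # Drop reviews with an empty body; there is nothing to embed for them.
    rows = [r for r in rows if (r.get("Text") or "").strip()]
    # Newest first, then keep the first n rows.
    rows.sort(key=lambda r: int(r[time_field]), reverse=True)
    recent = rows[:n]
    for r in recent:
        summary = (r.get("Summary") or "").strip()
        text = r["Text"].strip()
        # One string per review: this is the text the model would encode.
        r["combined"] = f"Title: {summary}; Content: {text}"
    return recent


reviews = load_recent_reviews(io.StringIO(SAMPLE_CSV), n=2)
for r in reviews:
    print(r["combined"])
```

With the real dataset, the `io.StringIO` object would be replaced by an open file handle and `n` left at 1,000. Each `combined` string is then passed to the embedding model, which returns one vector per review.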