OpenAI Embeddings (and Controversy?!)
#mlnews #openai #embeddings
COMMENTS DIRECTLY FROM THE AUTHOR (thanks a lot for reaching out Arvind :) ):
1. The FIQA results you share also have code to reproduce the results in the paper using the API: https://twitter.com/arvind_io/status/... There's no discrepancy AFAIK.
2. We leave out 6 not 7 BEIR datasets. Results on msmarco, nq and triviaqa are in a separate table (Table 5 in the paper). NQ is part of BEIR too and we didn't want to repeat it. Finally, the 6 datasets we leave out are not readily available and it is common to leave them out in prior work too. For example, SPLADE v2 (https://arxiv.org/pdf/2109.10086.pdf) also evaluates on the same 12 BEIR datasets.
3. Finally, I'm now working on time travel so that I can cite papers from the future :)
END COMMENTS FROM THE AUTHOR
OpenAI launches an embeddings endpoint in their API, providing high-dimensional vector embeddings for use in text similarity, text search, and code search. While embeddings are universally recognized as a standard tool for processing natural language, people have raised doubts about the quality of OpenAI's embeddings: one blog post found that they are often outperformed by much smaller open-source models, which can embed text at a fraction of what OpenAI charges. In this video, we examine the claims made and determine what it all means.
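For readers new to the topic, here is a minimal sketch of how embeddings are typically used for text search: each text becomes a vector, and documents are ranked by cosine similarity to the query vector. The 3-dimensional vectors and the names `cosine_similarity`, `rank_documents`, `doc_a`, `doc_b` and `doc_c` are invented for illustration; real embeddings have hundreds or thousands of dimensions and would come from a model or API, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {
        doc_id: cosine_similarity(query_vec, vec)
        for doc_id, vec in doc_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors, made up for this example (not output of any real model).
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_documents(query, docs)
print(ranking)
```

Whichever model produces the vectors, the ranking step stays the same, which is why the comparison discussed in the video comes down to embedding quality, model size and price.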
OUTLINE:
0:00 - Intro
0:30 - Sponsor: Weights & Biases
2:20 - What embeddings are available?
3:55 - OpenAI shows promising results
5:25 - How good are the results really?
6:55 - Criticism: Open models might be cheaper and smaller
10:05 - Discrepancies in the results
11:00 - The author's response
11:50 - Putting things into perspective
13:35 - What about real world data?
14:40 - OpenAI's pricing strategy: Why so expensive?
Sponsor: Weights & Biases
https://wandb.me/yannic
Merch: store.ykilcher.com
ERRATA: At 13:20 I say "better", it should be "worse"
References:
https://openai.com/blog/introducing-t...
https://arxiv.org/pdf/2201.10005.pdf
https://beta.openai.com/docs/guides/e...
https://beta.openai.com/docs/api-refe...
https://twitter.com/Nils_Reimers/stat...
https://medium.com/@nils_reimers/open...
https://mobile.twitter.com/arvind_io/...
https://twitter.com/gwern/status/1487...
https://twitter.com/gwern/status/1487...
https://twitter.com/Nils_Reimers/stat...
https://twitter.com/gwern/status/1470...
https://www.reddit.com/r/MachineLearn...
https://mobile.twitter.com/arvind_io/...
https://mobile.twitter.com/arvind_io/...
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n