Cursor evaluated the impact of semantic search on agent code retrieval
performance. With semantic search available in addition to grep, the same agent
answered questions with up to 23.5% better accuracy than an agent that used
grep alone.
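A retriever of this kind can be sketched as the union of exact grep hits and the top-k files ranked by embedding similarity. The sketch below is illustrative only: it stands in a toy bag-of-words "embedding" and cosine similarity for a real embedding model, and the names `grep_search`, `semantic_search`, and `hybrid_search` are hypothetical — Cursor's actual retriever is not described in this excerpt.

```python
import math
import re
from collections import Counter

def grep_search(files, pattern):
    """Exact lexical match across files, like `grep -l`."""
    return {path for path, text in files.items() if re.search(pattern, text)}

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(files, query, k=2):
    """Top-k files by similarity to the query, even without exact matches."""
    q = embed(query)
    ranked = sorted(files, key=lambda p: cosine(q, embed(files[p])), reverse=True)
    return set(ranked[:k])

def hybrid_search(files, pattern, query, k=2):
    """Union of exact grep hits and the top-k semantic hits."""
    return grep_search(files, pattern) | semantic_search(files, query, k)
```

The point of the union is that grep finds files only when the agent guesses the right identifier, while the semantic side can surface files that describe the concept in different words.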
All models improve with semantic search
Model Relative Improvement (Cursor Context Bench)
────────────── ──────────────────────────────────────────────────────
Composer │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 23.5%
│
Gemini 2.5 Pro │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 8.7%
│
GPT-5 │▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 6.5%
│
Grok Code │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 11.9%
│
Sonnet 4.5 │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 14.7%
source: cursor.com/blog/semsearch
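The excerpt does not define "relative improvement"; assuming the standard definition — the accuracy gain divided by the grep-only baseline — the arithmetic looks like this. The accuracy scores below are hypothetical, chosen only to illustrate the formula; the blog reports the percentages, not the underlying scores.

```python
def relative_improvement(baseline, treatment):
    """Percent improvement of the treatment score over the baseline score."""
    return (treatment - baseline) / baseline * 100

# Hypothetical accuracies: a 0.600 -> 0.741 gain is a 23.5% relative improvement.
print(round(relative_improvement(0.600, 0.741), 1))  # 23.5
```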
Cursor ran A/B tests to measure how much agent-generated code end users
retained when the agent used semantic search versus grep alone, and how often
the code required follow-ups or corrections. They found that semantic search on
turbopuffer increased code retention by 2.6% on large codebases and decreased
dissatisfied user requests by 2.2%.
Semantic search improves code retention
and reduces dissatisfied user requests
Code Retention │▓ +0.3%
│
Code Retention (large codebases) │▓▓▓▓▓▓▓▓▓ +2.6%
│
Dissatisfied User Requests -2.2% ▓▓▓▓▓▓▓▓│
source: cursor.com/blog/semsearch
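Retention here is a rate compared across the two experiment arms. A minimal sketch of how such a delta might be computed, under the assumption that retention means retained code divided by generated code and that the reported figure is a relative lift (the blog excerpt does not spell out either definition); all counts are hypothetical.

```python
# Hypothetical per-arm counts; the blog reports only the resulting deltas.
control_retained, control_total = 700, 1000      # grep only
treatment_retained, treatment_total = 718, 1000  # grep + semantic search

control_rate = control_retained / control_total        # 0.700
treatment_rate = treatment_retained / treatment_total  # 0.718

# Relative lift of the treatment arm over the control arm.
lift = (treatment_rate - control_rate) / control_rate * 100
print(f"{lift:.1f}%")  # 2.6%
```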