Centroid Proximity Weighting
Here we embed each response as a vector, compute the centroid of those vectors, and measure each vector's distance from it. Vectors closer to the centroid receive higher weights, and the output with the largest total weight is selected as the most likely correct answer.
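The procedure described above can be sketched roughly as follows. This is a minimal illustration, not the exact implementation behind the reported results: the function name `centroid_proximity_vote` is hypothetical, Euclidean distance is an assumed choice of metric, and the weighting `1 / (1 + distance)` is only one plausible way to give closer vectors higher weight. The embeddings here are toy 2-D vectors standing in for real model embeddings.

```python
from collections import defaultdict

import numpy as np


def centroid_proximity_vote(embeddings, answers):
    """Return the answer with the largest total centroid-proximity weight.

    embeddings: one vector per sampled response, shape (n_responses, dim)
    answers:    the final answer extracted from each response
    """
    X = np.asarray(embeddings, dtype=float)
    centroid = X.mean(axis=0)

    # Assumption: Euclidean distance from each response to the centroid.
    distances = np.linalg.norm(X - centroid, axis=1)

    # Assumption: weight decays with distance as 1 / (1 + d).
    weights = 1.0 / (1.0 + distances)

    # Sum the weights of all responses that share the same answer.
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight

    best = max(totals, key=totals.get)
    return best, dict(totals)


# Toy example: three nearby responses answer "A", one outlier answers "B".
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
toy_answers = ["A", "A", "A", "B"]
best, totals = centroid_proximity_vote(toy_embeddings, toy_answers)
print(best, totals)
```

In this toy case the outlier both sits far from the centroid and is outnumbered, so "A" wins. With real embeddings the weighting matters most when the vote is close, since a minority answer whose responses cluster near the centroid can overtake a scattered majority.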
Dataset | Method/Metric | Llama 2 7B | Mistral 7B | GPT 3.5 | Llama 3 8B | GPT-4o mini |
---|---|---|---|---|---|---|
AQuA-RAT | SC baseline | 24.80 | 25.60 | 59.40 | 45.28 | 83.07 |
AQuA-RAT | CPW | 24.60 (-0.2) | 29.00 (+3.4) | 68.00 (+8.6) | 46.06 (+0.78) | 82.68 (-0.39) |
SVAMP | SC baseline | 46.50 | 68.50 | 79.80 | 73.33 | 89.80 |
SVAMP | CPW | 47.40 (+0.9) | 69.80 (+1.3) | 81.00 (+1.2) | 74.67 (+1.34) | 89.60 (-0.2) |
StrategyQA | SC baseline | 48.91 | 67.98 | 66.81 | 63.32 | 79.18 |
StrategyQA | CPW | 55.02 (+6.11) | 60.70 (-7.28) | 65.21 (-1.6) | 63.32 (+0.0) | 73.80 (-5.38) |