Chain-of-thought prompting for small LLMs

Coordinated with a team of 4 to evaluate a small LLM (Llama 3.2 1B) on three benchmark datasets: LogiQA, GSM8K, and QuaRTz.
Compared keyword and chain-of-thought (CoT) prompting techniques and found that keyword prompting remains effective, with only a 1–3% drop in accuracy.
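The comparison above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prompt templates, the reading of "keyword prompting" as supplying short cue terms instead of requesting step-by-step reasoning, and the function names (`cot_prompt`, `keyword_prompt`, `accuracy`, `accuracy_drop`) are all assumptions made for the example. It also assumes the reported drop is measured against CoT accuracy.

```python
from typing import Sequence


def cot_prompt(question: str) -> str:
    """Hypothetical CoT template: ask the model to reason step by step."""
    return f"Question: {question}\nLet's think step by step."


def keyword_prompt(question: str, keywords: Sequence[str]) -> str:
    """Hypothetical keyword template (assumed format): give short cue
    terms and ask for the answer directly, without written reasoning."""
    cues = ", ".join(keywords)
    return f"Question: {question}\nKey terms: {cues}\nAnswer:"


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold answers
    (case-insensitive, surrounding whitespace ignored)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


def accuracy_drop(cot_acc: float, keyword_acc: float) -> float:
    """Accuracy lost by switching from CoT to keyword prompting,
    in percentage points (assumes CoT is the baseline)."""
    return (cot_acc - keyword_acc) * 100
```

In use, each benchmark question would be rendered with both templates, sent to the model, and scored with `accuracy`; `accuracy_drop` then gives the per-dataset gap between the two techniques.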

Link to the report!