Automated short-answer grading and misconception detection using large language models