ChatGPT Generated Training Plans for Runners are not Rated Optimal by Coaching Experts, but Increase in Quality with Additional Input Information

June 15, 2026 | 473 words | 3min read

Paper Title: ChatGPT Generated Training Plans for Runners are not Rated Optimal by Coaching Experts, but Increase in Quality with Additional Input Information

Link to Paper: https://doi.org/10.52082/jssm.2024.56

Date: 2024

Paper Type: Running, Sport, LLM

Summary

The paper investigates how well ChatGPT can generate 6-week running training plans depending on how much detail the user provides. The authors created three different prompts for ChatGPT: a very basic request, a moderately detailed description of a runner, and a highly detailed, data-rich profile including training history, heart rate, goals, and available testing tools. These outputs were then evaluated by experienced endurance coaches using 22 scientifically grounded criteria covering training design, monitoring, progression, and supporting elements like nutrition and recovery. Each criterion was rated on a 1–5 Likert scale, and statistical tests were used to compare plan quality.
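
As a rough illustration of how ratings like these can be summarized, the sketch below takes the median rating per criterion and counts how many criteria fall below the neutral midpoint of the scale. The criterion names and numbers are made-up placeholders, not data from the paper, and the function name `summarize_plan` is my own; the paper's actual statistical tests are not reproduced here.

```python
from statistics import median

NEUTRAL = 3  # midpoint of a 1-5 Likert scale


def summarize_plan(ratings_by_criterion):
    """Return per-criterion medians and the count of criteria rated below neutral."""
    medians = {name: median(scores) for name, scores in ratings_by_criterion.items()}
    below = sum(1 for m in medians.values() if m < NEUTRAL)
    return medians, below


# Placeholder ratings from three hypothetical coaches; NOT data from the paper.
example_plan = {
    "progression": [2, 3, 2],
    "monitoring": [1, 2, 2],
    "recovery": [4, 3, 4],
}

medians, below_neutral = summarize_plan(example_plan)
print(medians, below_neutral)
```

The median is used here because Likert ratings are ordinal, so averaging them can be misleading.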

The results show a clear pattern: the more detailed the input, the better the training plan. The simplest prompt produced the worst-rated plans, with most criteria scoring below neutral. The medium-detail plan improved significantly, and the most detailed prompt produced the highest-rated plan overall. However, even the best ChatGPT-generated plan was still not considered optimal by experts and contained shortcomings, particularly in areas like progression control, individualized monitoring, injury risk management, and inclusion of evidence-based testing and feedback loops. Importantly, ChatGPT did not behave like a human coach: it did not ask clarifying questions or actively refine the plan based on missing information, which limited its ability to individualize training.

The authors conclude that while ChatGPT can generate structured running programs, their quality is inconsistent and strongly dependent on user input quality. They warn that novice runners, who tend to provide less detailed input, are especially at risk of receiving inadequate or potentially unsafe training advice. As a result, they do not recommend using ChatGPT-generated training plans without oversight from qualified coaches, although they acknowledge potential future improvements if AI systems become better integrated with evidence-based knowledge and personal physiological data.

My Thoughts

This is an interesting research direction, especially for digital coaching and AI-assisted training.
A more advanced follow-up study could:
- Use newer LLMs (since model capability likely affects plan quality and adaptability).
- Move from one-shot plan generation to an iterative, adaptive system, where the training plan is updated weekly based on real user feedback and performance data.
- Integrate wearable data (heart rate, pace, training load, sleep, etc.) to provide a more objective basis for adjustments.
- Evaluate the system in a controlled study design, comparing:
  - AI-assisted adaptive coaching vs.
  - Traditional coaching or static AI plans (control group)
- Measure outcomes like performance, injury rates, adherence, and satisfaction.
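
To make the iterative idea above more concrete, here is a minimal sketch of a weekly feedback loop. Everything in it is an assumption for illustration: the `WeekFeedback` fields, the function names, and especially the thresholds and percentages, which are not coaching advice and do not come from the paper. A simple rule-based function stands in for the step where an LLM would regenerate the plan.

```python
from dataclasses import dataclass


@dataclass
class WeekFeedback:
    completed_fraction: float  # share of planned sessions completed, 0.0-1.0
    avg_rpe: float             # average perceived exertion, 1-10
    pain_reported: bool


def next_week_volume(current_km: float, fb: WeekFeedback) -> float:
    """Hypothetical adjustment rule; all thresholds are illustrative only."""
    if fb.pain_reported:
        return round(current_km * 0.8, 1)   # back off after any pain report
    if fb.completed_fraction < 0.75 or fb.avg_rpe >= 8:
        return round(current_km, 1)         # hold volume after a hard or incomplete week
    return round(current_km * 1.05, 1)      # otherwise progress modestly


def simulate(start_km: float, feedback_by_week: list) -> list:
    """Apply the adjustment rule week by week and return the volume history."""
    volumes = [start_km]
    for fb in feedback_by_week:
        volumes.append(next_week_volume(volumes[-1], fb))
    return volumes


# Made-up feedback for three weeks.
feedback = [
    WeekFeedback(1.0, 6.0, False),
    WeekFeedback(0.5, 7.0, False),
    WeekFeedback(1.0, 5.0, True),
]
plan = simulate(30.0, feedback)
print(plan)
```

The point of the sketch is the shape of the loop: each week's plan depends on what actually happened in the previous week, which is exactly what the one-shot prompts in the paper could not capture.
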
The main limitation of the paper is not just “insufficient data,” but that the interaction model is static and non-adaptive. It treats ChatGPT like a one-time plan generator rather than a feedback-driven training system, which is closer to how real coaching works.
