Large language models (LLMs) have recently driven significant improvements across many domains, including text-to-speech synthesis. The move from more traditional text-to-speech models to LLM-based models, however, is not straightforward for many applications, as these models are much larger than traditional ones and require Classifier-Free Guidance (CFG) for optimal quality. This potentially limits the applications for which LLM-based text-to-speech models are suitable. In this paper, we aim to address these issues by exploring the use of knowledge distillation for transformer-based text-to-speech models. Specifically, we investigate using knowledge distillation to train the student directly on the CFG-guided output of the teacher, removing the need for CFG at inference time. In addition, we explore using knowledge distillation to substantially reduce the required model size. Altogether, we were able to halve the model size, double inference speed, and remove the need for CFG without any perceptible drop in voice quality.
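
As a sketch of the kind of objective this entails (notation ours; the guidance scale $\gamma$, teacher logits, and student distribution $p_\theta$ are illustrative assumptions rather than the exact formulation used here), CFG combines the teacher's conditional and unconditional predictions at each step $t$, and distillation trains the student's conditional distribution to match the resulting guided distribution:
\[
\tilde{\ell}_t \;=\; \ell_t^{\text{uncond}} \;+\; \gamma\left(\ell_t^{\text{cond}} - \ell_t^{\text{uncond}}\right),
\qquad
\mathcal{L}_{\text{KD}} \;=\; \mathrm{KL}\!\left(\operatorname{softmax}(\tilde{\ell}_t)\;\middle\|\;p_\theta(\cdot \mid x, y_{<t})\right),
\]
so that at inference a single conditional forward pass of the student suffices, with no second unconditional pass and no guidance scale to tune.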