[Paper] HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
Long-context training of large language models (LLMs) is commonly distributed with Context Parallelism (CP) and Head Parallelism (HP), but existing training sys...