Why do we need to aggregate frames into a multiframe in GSM networks?

I'm struggling to understand why frames need to be aggregated into multiframes and other higher-level frames.
What exactly do we gain from the aggregation? Is it for synchronisation across all the channels? Is this simplifying ...