Multiagent RL and Scalability Challenges for Random Access in MTC

A generic schematic of MTC Network in a small area

MARL-based Access Scheme Requirements for MTC

  • Devices in an MTC network are battery-constrained, have very limited computational capability, and follow variable sleep cycles, so all devices should use the same channel access policy, which can be deployed in a distributed way.
  • For the same reasons, training on such devices is not feasible, so we need centralized training with decentralized execution (CTDE) to learn a policy for the devices.
  • The scheme must scale to a large number of devices.
  • The scheme should have low signaling overhead; therefore, communication between devices is not feasible.

Parameter Sharing — a solution?

Parameter sharing is perhaps the simplest, yet very useful, form of multiagent RL with CTDE. The main idea is to extend a single-agent network to the multiagent system [2]. All agents use the same network, or the same function approximator, to estimate the value of each state, and the agents are homogeneous: they have the same action space, rewards, and state space, and therefore the same policy. As a result, agents in the same state obtain the same value for that state. To tell the agents apart, the agent ID is usually encoded in the state so that every agent has a unique state. One advantage of parameter sharing is that it scales better than other multiagent techniques. However, the extent of this scalability is not yet known, and it probably depends on the problem and the model of the environment. For MTC, parameter sharing is a reasonable route to scalability, since devices of the same type perform the same tasks. However, there’s an issue!
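To make the idea concrete, here is a minimal sketch of parameter sharing with the agent ID encoded in the state. It is an illustration under assumptions, not the scheme from [2]: the class name `SharedLinearQ`, the helper `one_hot`, the one-hot ID encoding, and the use of a linear function approximator in place of a neural network are all hypothetical choices made for brevity.

```python
import random


def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class SharedLinearQ:
    """One linear Q-function whose weights are shared by every agent (hypothetical)."""

    def __init__(self, obs_dim, num_agents, num_actions, seed=0):
        rng = random.Random(seed)
        self.num_agents = num_agents
        # Input = local observation followed by a one-hot agent ID.
        in_dim = obs_dim + num_agents
        # weights[a][j] is the weight of input feature j for action a.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(num_actions)
        ]

    def q_values(self, obs, agent_id):
        """Q-values of all actions for one agent, using the shared weights."""
        x = list(obs) + one_hot(agent_id, self.num_agents)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def act(self, obs, agent_id):
        """Greedy action for one agent."""
        q = self.q_values(obs, agent_id)
        return max(range(len(q)), key=q.__getitem__)


# Three agents see the same observation but query the same shared weights.
shared_q = SharedLinearQ(obs_dim=2, num_agents=3, num_actions=2)
obs = [1.0, 0.0]
q_per_agent = [shared_q.q_values(obs, agent_id) for agent_id in range(3)]
```

With this particular encoding, the input width is `obs_dim + num_agents`, so it grows with the number of devices, and each device has to know its own ID at execution time.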

Concluding Remarks

The scalability issue for MARL algorithms in machine-type communication comes down to the difficulty of designing an access policy. From the RL perspective, parameter sharing allows us to scale across agents of the same type. On the other hand, we need to design an unsourced grant-free random access policy, in which the identity of the agents does not need to be known, and this is at odds with encoding an agent ID in the state. Furthermore, the traffic characteristics of the MTC network need to be considered when designing the access policy. Traffic generation across devices in an MTC network can be independent as well as correlated. For instance, in the case of an event such as a fault in a machine, a fire, or an earthquake, there is a high probability that devices around the epicenter of the event become active together.
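The mix of independent and correlated traffic can be illustrated with a small simulation. This is only a sketch under assumed parameters: the function names, the background probability `p_background`, the event probability `p_event`, the Gaussian-style decay with distance, and the 100 x 100 deployment area are all hypothetical and are not taken from a specific MTC traffic model.

```python
import math
import random


def activation_probability(device_xy, epicenter_xy, p_background, p_event, radius):
    """Probability that a device is active in one slot.

    Without an event (epicenter_xy is None) only independent background
    traffic applies. With an event, an extra activation term decays with
    the distance between the device and the epicenter.
    """
    if epicenter_xy is None:
        return p_background
    dist = math.dist(device_xy, epicenter_xy)
    p_extra = p_event * math.exp(-((dist / radius) ** 2))
    # The device is active if background or event traffic triggers it.
    return 1.0 - (1.0 - p_background) * (1.0 - p_extra)


def sample_active(devices, epicenter_xy, rng,
                  p_background=0.01, p_event=0.9, radius=10.0):
    """Return the indices of the devices that are active in one slot."""
    return [
        i for i, xy in enumerate(devices)
        if rng.random() < activation_probability(
            xy, epicenter_xy, p_background, p_event, radius)
    ]


rng = random.Random(0)
devices = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
quiet_slot = sample_active(devices, None, rng)          # independent traffic only
event_slot = sample_active(devices, (50.0, 50.0), rng)  # event at the center
print(len(quiet_slot), len(event_slot))
```

In the event slot, the devices close to the epicenter are far more likely to transmit in the same slot, which is the bursty, correlated load an access policy would have to cope with.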


[1] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: physical and MAC-layer solutions,” IEEE Communications Magazine, vol. 54, no. 9, pp. 59–65, 20


Article written and edited by Muhammad A. Jadoon (Medium profile), a PhD researcher at CTTC.



ITN Windmill


You can always learn more about project Windmill on our homepage: