Multiagent RL and Scalability Challenges for Random Access in MTC
The machine-type communication (MTC) paradigm in wireless communication will help connect millions of devices that perform different tasks without human involvement. MTC has numerous use cases, such as factory automation, fleet management, smart homes and smart metering, e-health, and smart logistics. Mostly, the devices in an MTC network, called machine-type devices (MTDs), are power-constrained, low-complexity devices that are battery-operated. Moreover, as opposed to the devices in human-type communication (HTC), these devices generate short packets and have random sleep cycles. Due to these peculiarities, especially in terms of traffic characteristics, MTC systems demand novel multiple access techniques beyond the schemes used in HTC: grant-free access and grant-based access [1].
It has been established that grant-based (scheduling) techniques are not feasible for traffic with short packet lengths and battery-constrained devices. This is due to the fact that grant-based schemes such as TDMA and FDMA require an exchange of signaling, which consumes battery, and the control data used for signaling can be larger than the payload of the packet itself. A preferred option for MTC is uncoordinated grant-free random access (RA), such as slotted ALOHA. In uncoordinated RA, each user selects a physical resource randomly and sends its data to the common receiver. Such a scheme requires minimal signaling and is simple to implement; however, RA schemes suffer from a high number of collisions when the traffic intensity is high. For these reasons, there is a need to design and develop novel access schemes for MTC systems, or better collision resolution mechanisms based on new traffic models that are well suited to MTC.
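The collision problem is easy to reproduce in simulation. Below is a minimal sketch (the device count and probabilities are illustrative, not from any cited work): in slotted ALOHA, a slot delivers a packet only when exactly one device transmits in it.

```python
import random

def slotted_aloha_throughput(n_devices, p_tx, n_slots=10_000, seed=0):
    """Estimate slotted ALOHA throughput: a slot counts as a success
    only when exactly one device transmits in it."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_slots):
        transmitters = sum(rng.random() < p_tx for _ in range(n_devices))
        if transmitters == 1:
            successes += 1
    return successes / n_slots

# Theory says throughput peaks near 1/e (~0.37) when each of N devices
# transmits with probability 1/N, and collapses as the offered load grows.
print(slotted_aloha_throughput(50, 1 / 50))   # near the 1/e optimum
print(slotted_aloha_throughput(50, 5 / 50))   # collisions dominate
```

This is exactly the regime that motivates learning a transmit probability instead of fixing it by hand: the optimum depends on how many devices are simultaneously active, which the devices themselves do not know.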
In recent years, a significant number of research works have used machine learning, and specifically reinforcement learning (RL), for resource allocation in wireless networks. These works employ state-of-the-art RL techniques to design resource allocation policies. For random access, since each user in the network takes its decision independently, without knowing the actions and states of the other users, the problem is usually modelled as multiagent RL in a partially observable environment.
Roughly speaking, multiagent RL (MARL) techniques can be divided into centralized training with decentralized execution (CTDE) and fully decentralized training and execution. For an MTC network, the use of multiagent RL for RA becomes even more challenging due to the following requirements.
MARL-based Access Scheme Requirements for MTC
- Since the devices in an MTC network are battery-constrained, have very low computational capability, and have variable sleep cycles, they are required to share the same channel access policy, which can then be deployed in a distributed way.
- For the same reasons, training on such devices is not feasible, and we therefore need a CTDE mechanism to learn a policy for the devices.
- The scheme must be scalable to a large number of devices.
- The scheme should have a low signaling overhead; communication between users is therefore not feasible.
Parameter Sharing — a solution?
Parameter sharing is perhaps the simplest yet very useful form of multiagent RL with CTDE. The main idea is to extend a single-agent network to the multiagent system [2]. All agents use the same network, i.e., the same function approximator, to calculate the value of each state, and the agents are homogeneous: same action space, rewards, and state space, and therefore the same policy. This means that agents in the same state will compute the same values for that state. To distinguish between agents, an agent ID is usually encoded in the state so that every agent has a unique state. One of the advantages of parameter sharing is that it scales better than other multiagent techniques. However, the extent of this scalability is not yet known, and it probably depends on the problem and the environment model. For MTC, it is reasonable to use parameter sharing for scalability, since devices of the same type will have the same tasks to perform; however, there's an issue!
The Issue: Devices in an MTC network can join and leave the network randomly, and new devices can join at any time. Therefore, incorporating an agent ID in the state space doesn't really make sense and can only complicate things.
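To make the agent-ID mechanism concrete, here is a minimal sketch of parameter sharing (toy dimensions, and a linear Q-function standing in for the shared deep network; every name and size here is an illustrative assumption, not taken from the cited works):

```python
import numpy as np

N_AGENTS, OBS_DIM, N_ACTIONS = 4, 3, 2   # toy sizes, purely illustrative
rng = np.random.default_rng(0)

# One set of weights shared by ALL agents: a single linear Q-function
# Q(s) = W s standing in for the shared deep network.
W = rng.normal(size=(N_ACTIONS, OBS_DIM + N_AGENTS))

def shared_q_values(obs, agent_id):
    """Every agent evaluates the same function; the one-hot agent ID
    appended to the observation is what makes each agent's state unique."""
    state = np.concatenate([obs, np.eye(N_AGENTS)[agent_id]])
    return W @ state

obs = np.array([1.0, 0.0, 0.5])
q0 = shared_q_values(obs, 0)   # same observation, different IDs,
q1 = shared_q_values(obs, 1)   # hence (in general) different Q-values
```

Note that the one-hot encoding hard-wires a fixed, known set of agents into the input layer, which is precisely what random joining and leaving of MTDs breaks; dropping the ID restores symmetry but then identical observations map to identical values.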
We have presented a couple of our works in this area, [3] and [4], where we did not use any agent ID in the state and used a different reward function in each work. We have shown that parameter sharing behaves reasonably, but perhaps for the traffic model considered in both of these works there is nothing more to gain. We used a single resource and multiple users, and designed the transmit probability of the users; we also presented our scalability analysis. We used DQN, and we plan to use actor-critic methods to see if we can learn a better probability distribution over the action space. The problem with actor-critic multiagent RL with a centralized critic and decentralized actors is that the dimension of the centralized critic grows as the number of agents increases. Therefore, we can either compress this information, which is already limited, or use a local actor and a local critic for each agent, which we believe will not provide any performance gain over DQN.
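A back-of-the-envelope sketch of why the centralized critic is the bottleneck (the per-agent dimensions are assumed for illustration): the critic conditions on every agent's observation and action, so its input width grows linearly with the number of agents.

```python
OBS_DIM, ACT_DIM = 8, 2  # assumed per-agent observation/action sizes

def centralized_critic_input_dim(n_agents, obs_dim=OBS_DIM, act_dim=ACT_DIM):
    # The centralized critic sees (o_1, ..., o_N, a_1, ..., a_N),
    # so its input dimension scales linearly in N.
    return n_agents * (obs_dim + act_dim)

for n in (10, 100, 1000):
    print(n, "agents ->", centralized_critic_input_dim(n), "critic inputs")
```

A critic whose input layer must be resized whenever a device joins or leaves is a poor fit for the churn of an MTC network, which is one more argument for purely local, shared policies.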
The issue of scalability for MARL algorithms in machine-type communication is that it is challenging to design an access policy under all of these constraints. From an RL perspective, parameter sharing allows us to scale for agents of the same type, but on the other hand we need to design an unsourced grant-free random access policy in which we do not need to know the identities of the agents. Furthermore, the traffic characteristics of the MTC network need to be considered when designing the access policy. Traffic generation by devices in an MTC network can be both independent and correlated. For instance, in the case of an event such as a fault detected in a machine, a fire, or an earthquake, there is a high probability that the devices around the epicenter of the event become active together.
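This mixture of independent and correlated arrivals can be sketched in a few lines (all parameters and the geometry are illustrative assumptions): each device has background traffic of its own, and an event additionally wakes every device near its epicenter at once.

```python
import random

def slot_activations(positions, bg_prob=0.02, epicenter=None,
                     event_spread=0.1, rng=random):
    """One slot of MTC traffic: independent background arrivals plus,
    when an event occurs, a correlated burst of all devices located
    within `event_spread` of its epicenter (parameters illustrative)."""
    active = [rng.random() < bg_prob for _ in positions]   # independent part
    if epicenter is not None:  # correlated part: fault, fire, earthquake, ...
        for i, x in enumerate(positions):
            if abs(x - epicenter) < event_spread:
                active[i] = True
    return active

rng = random.Random(1)
positions = [rng.random() for _ in range(100)]  # devices on the unit interval
quiet = slot_activations(positions, rng=rng)                       # isolated arrivals
burst = slot_activations(positions, epicenter=0.5, rng=rng)        # neighbours wake together
```

An access policy tuned only for the quiet slots will be overwhelmed by the burst slots, which is why the traffic model has to enter the design of the policy itself.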
In this post, we have highlighted some of the high-level challenges in designing random access protocols for MTC networks using RL. Most existing works do not really address scalability, and most do not consider the signaling overhead factor when designing a multiagent RL system. Such techniques cannot be directly applied to MTC networks, and therefore the factors mentioned above should be considered when designing random access policies for MTC.
[1] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, "Massive machine-type communications in 5G: physical and MAC-layer solutions," IEEE Communications Magazine, vol. 54, no. 9, pp. 59–65, 2016.
[2] J. K. Gupta, M. Egorov, and M. Kochenderfer, "Cooperative multi-agent control using deep reinforcement learning," in Autonomous Agents and Multiagent Systems (G. Sukthankar and J. A. Rodriguez-Aguilar, eds.), Cham: Springer International Publishing, 2017, pp. 66–83.
[3] M. A. Jadoon, A. Pastore, M. Navarro, and F. Perez-Cruz, "Deep reinforcement learning for random access in machine-type communication," in Proc. IEEE WCNC, Apr. 2022.
[4] M. A. Jadoon, A. Pastore, and M. Navarro, "Collision resolution with deep reinforcement learning for random access in machine-type communication," in Proc. IEEE VTC2022-Spring, Jun. 2022.