Reinforcement Learning Approach to Cybersecurity in Space (RELACSS)
Securing satellite groundstations against cyber-attacks is vital to national security missions, yet these cyber threats are constantly evolving: as vulnerabilities are discovered and patched, new ones are found and exploited. To automate the process of discovering existing vulnerabilities and the means to exploit them, this report presents a reinforcement learning framework. We demonstrate that the framework can learn to navigate an unknown network and locate nodes of interest despite the presence of a moving target defense; the agent then exfiltrates a file of interest from the node as quickly as possible. The framework also incorporates a defensive software agent that learns to impede the attacking agent's progress. This setup pits the agents against each other so that both improve their abilities, and we anticipate that this capability will help uncover unforeseen vulnerabilities and the means to mitigate them.

The modular nature of the framework enables users to swap out learning algorithms and modify the reward functions to adapt the learning tasks to various use cases and environments. Several algorithms, viz., tabular Q-learning, deep Q-networks, proximal policy optimization, advantage actor-critic, and generative adversarial imitation learning, are explored for the agents and the results highlighted. The agent learns to solve its tasks in a lightweight abstract environment; once it performs sufficiently well, it can be deployed in a minimega virtual machine environment (or a real network) with wrappers that map abstract actions to software commands. The agent also uses a local representation of the actions, called a ‘slot-mechanism’, which allows it to learn in one network and generalize to different networks.

The defensive agent learns to predict the actions taken by an offensive agent and uses that information to anticipate the threat; this information can then be used either to raise an alarm or to take actions that thwart the attack. We believe that with an appropriate reward design, a representative environment, and a suitable action set, this framework can be generalized to tackle other cybersecurity tasks. By sufficiently training these agents, we can anticipate vulnerabilities, leading to more robust future designs, and deploy automated defensive agents that help secure satellite groundstations and their vital national security missions.
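To make the abstract-to-concrete mapping described above more tangible, the following is a minimal sketch in Python of a slot-based abstract environment and a wrapper that translates slot actions into software commands. All names here (AbstractAttackEnv, CommandWrapper, the slot-to-host binding, and the placeholder nmap command) are hypothetical illustrations under assumed interfaces, not the framework's actual API.

```python
# Hedged sketch: a toy abstract environment with a local "slot" action space,
# plus a wrapper that binds slots to concrete hosts and would issue real
# software commands when deployed (e.g., against minimega VMs).
import random
from typing import Dict, Tuple


class AbstractAttackEnv:
    """Toy abstract task: the agent probes slots until it finds the node of interest."""

    def __init__(self, num_slots: int = 4):
        self.num_slots = num_slots      # local, network-independent action slots
        self.target_slot = None
        self.steps = 0

    def reset(self) -> int:
        self.target_slot = random.randrange(self.num_slots)
        self.steps = 0
        return 0                        # trivial observation for this sketch

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.steps += 1
        done = action == self.target_slot
        reward = 1.0 if done else -0.1  # reward shaping is use-case specific
        return 0, reward, done


class CommandWrapper:
    """Maps abstract slot actions to concrete commands for one specific network."""

    def __init__(self, env: AbstractAttackEnv, slot_to_host: Dict[int, str]):
        self.env = env
        self.slot_to_host = slot_to_host  # binding of local slots to concrete hosts

    def step(self, action: int) -> Tuple[int, float, bool]:
        host = self.slot_to_host[action]
        command = f"nmap -sV {host}"      # placeholder command only; a deployment
        print(f"[wrapper] would run: {command}")  # would execute it on the network
        return self.env.step(action)


if __name__ == "__main__":
    env = AbstractAttackEnv(num_slots=4)
    wrapped = CommandWrapper(env, {0: "10.0.0.1", 1: "10.0.0.2",
                                   2: "10.0.0.3", 3: "10.0.0.4"})
    env.reset()
    action, done = 0, False
    while not done:
        _, reward, done = wrapped.step(action)
        action = (action + 1) % env.num_slots  # a trained policy would act here
```

Because the agent's policy is learned over the local slot indices rather than over network-specific addresses, retargeting to a different network amounts to supplying a new slot-to-host binding to the wrapper, which is the intent behind the ‘slot-mechanism’ described above.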