Malware is pervasive in networks and poses a critical threat to network security. However, our understanding of malware behavior in networks remains very limited. In this paper, we investigate how malware propagates in networks from a global perspective. We formulate the problem and establish a rigorous two-layer epidemic model for malware propagation from network to network. Based on the proposed model, our analysis indicates that the distribution of a given malware follows an exponential distribution at its early stage, a power law distribution with a short exponential tail at its late stage, and a power law distribution at its final stage. Extensive experiments on two real-world, global-scale malware data sets confirm our theoretical findings.
Malware Propagation in Large-Scale Networks
Malware are malicious software programs deployed by cyber attackers to compromise computer systems by exploiting their security vulnerabilities. Motivated by extraordinary financial or political rewards, malware owners strive to compromise as many networked computers as they can in order to achieve their malicious goals. A compromised computer is called a bot, and all bots compromised by a given malware form a botnet. Botnets have become the attack engine of cyber attackers, and they pose critical challenges to cyber defenders. To fight cyber criminals, defenders need to understand malware behavior, such as propagation or membership recruitment patterns, the size of botnets, and the distribution of bots.
We propose a two-layer malware propagation model to describe the development of a given malware at the Internet level. Compared with the existing single-layer epidemic models, the proposed model represents malware propagation better in large-scale networks. We find that the malware distribution in terms of networks follows an exponential distribution at its early stage, a power law distribution with a short exponential tail at its late stage, and a power law distribution at its final stage. These findings are first proved theoretically based on the proposed model, and then confirmed by experiments on two large-scale real-world data sets.
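To make the three distribution shapes named above concrete, the following minimal Python sketch builds each one as a normalized probability mass function over a finite range of sizes. It illustrates the standard textbook forms only and is not the model derived in the paper: the function names, the finite support `n_max`, the reading of `x` as the number of compromised hosts in a network, and the parameters `lam` and `alpha` are all assumptions made for illustration.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def exponential_pmf(n_max, lam):
    """p(x) proportional to exp(-lam * x) for x = 1..n_max (assumed early-stage shape)."""
    return normalize([math.exp(-lam * x) for x in range(1, n_max + 1)])


def power_law_cutoff_pmf(n_max, alpha, lam):
    """p(x) proportional to x^(-alpha) * exp(-lam * x): a power law with an exponential tail."""
    return normalize(
        [x ** -alpha * math.exp(-lam * x) for x in range(1, n_max + 1)]
    )


def power_law_pmf(n_max, alpha):
    """p(x) proportional to x^(-alpha) for x = 1..n_max (assumed final-stage shape)."""
    return normalize([x ** -alpha for x in range(1, n_max + 1)])


def log_log_slope(pmf, x1, x2):
    """Slope of log p(x) against log x between sizes x1 and x2 (sizes start at 1)."""
    rise = math.log(pmf[x2 - 1]) - math.log(pmf[x1 - 1])
    run = math.log(x2) - math.log(x1)
    return rise / run
```

On a log-log plot, a pure power law has a constant slope of `-alpha`, an exponential distribution bends downward ever more steeply, and the power law with an exponential tail follows the power law at small sizes before dropping below it. Comparing `log_log_slope` over different size ranges is one simple way to tell the three shapes apart.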