CTI and Blockchain Security Architectures
Initially, my research focused on three areas of Cyber Security: Malware and Phishing, Blockchain Architecture and Security, and Information Security through an incident-centered security framework. After narrowing down these three areas of interest, I found that Information Security through an incident-centered approach, focused on cyber threat intelligence and AI/ML, is the most connected to my future research. Offensive and defensive security ebb and flow with the threat landscape.
Information security is one of the most critical areas of information technology. While information and data gain an ever more significant role, secure methodologies lag behind due to the constantly changing nature of technological improvements and threat vectors. When an issue does arise, teams must form a plan that balances prevention and response. My experience in cyber security has been on the offensive side, utilizing exploits and payload-delivery systems to compromise networks. My goal is to investigate the crossroads between ML/AI and InfoSec via an incident-centered approach; thus, the three papers summarized here cover AI, Cyber Threat Intelligence, and threat vectors in Federated Learning systems.
Artificial intelligence and machine learning are quickly advancing in all industries. Information security is no different, with their effects being felt on both the offensive and defensive sides. Many companies do not use Cyber Threat Intelligence (CTI) because of the complexity of the data that moves through their systems daily. This article focuses on CTI and its implementation, since many organizations may not have the skills and knowledge to use CTI in an already complex InfoSec landscape. The paper also presents a framework for applying CTI and the types of tools needed to implement it. AI and machine learning contribute significantly to CTI processes, which can better assist organizations in their security approach, and they help organizations parse out actionable intelligence from a staggering amount of CTI data. Deciding what is important when dealing with such large volumes of data is very difficult, and ML/AI plays a large role in mitigating this issue, as the sketch below illustrates.
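As a toy illustration of that triage role, the following sketch trains a simple text classifier to separate actionable feed entries from noise. All feed entries, labels, and model choices here are invented for illustration; this is not the tooling described by Montasari et al. (2022).

```python
# Hypothetical sketch: triaging raw threat-feed text with a simple classifier.
# The feed entries and labels are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: feed entries labeled actionable (1) or noise (0).
feed_entries = [
    "ransomware payload observed beaconing to known C2 domain",
    "phishing kit reuses credential-harvesting template from prior campaign",
    "routine port scan from research scanner, no follow-up activity",
    "duplicate advisory already ingested last week",
]
labels = [1, 1, 0, 0]

# TF-IDF features plus logistic regression: a minimal triage model.
triage = make_pipeline(TfidfVectorizer(), LogisticRegression())
triage.fit(feed_entries, labels)

# Score a new entry; a high probability suggests an analyst should look first.
new_entry = ["malware sample contacts C2 infrastructure from a ransomware campaign"]
print(triage.predict_proba(new_entry)[0][1])
```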
According to Montasari et al. (2022), the best way to mitigate security breaches is a robust CTI plan, which “assists security teams in safeguarding their networks against cyber-attacks.” CTI is used by governments and organizations alike as both an offensive and a defensive measure, and it is the branch of cyber security that focuses on threat types and intelligence gathering.
CTI is a branch of Cyber Security that concerns the contextual information surrounding cyber-attacks, i.e. the understanding of the past, present, and future tactics, techniques, and procedures (TTPs) of a wide variety of threat actors. It is actionable and timely and has business value in that it can inform the security teams in organizations of adversarial entities so that they can prevent them. CTI is also a proactive security measure that involves the gathering, collation, and analysis of information concerning potential attacks in real-time so as to prevent data breaches and subsequent adverse consequences. (Montasari et al. 2022)
CTI’s goal is to deliver detailed information on the security threats that pose the highest risk to an organization’s infrastructure. One example of a real-world breach where CTI could have helped is the May 2021 Colonial Pipeline hack, a ransomware attack on an industrial pipeline operator later attributed to DarkSide, a Russia-based criminal group. The attack had an immediate impact on both industrial infrastructure and oil prices, since the pipeline carried a large share of East Coast fuel transportation. Had the operators enabled a CTI-style approach, they might have picked up intelligence on the ransomware earlier. According to Montasari et al. (2022), “A CTI network can be considered as a combination of regular updating and learning feeds that develop the basis of powerful layered network security. Such threat feeds enable individual devices and networks to take advantage of the intelligence of numerous devices to safeguard their endpoints and networks.”
The model of CTI is broken up into Tactical CTI, Technical CTI, Operational CTI, and Strategic CTI (summarized in the sketch after this list):
Tactical CTI (TaCTI) promotes a proactive cybersecurity posture and strengthens risk management policies. It focuses on the techniques and procedures of threat actors, such as their methodologies, tools, and tactics; relies on sufficient resources; and includes specific measures against malicious actors attempting to infiltrate a network or system.
Technical CTI “connects details associated with attacks rapidly and accurately.” (Montasari et al. 2022)
Operational CTI provides “context and relevance to large amounts of data … to gain better insight into how threat actors plan, carry out, and sustain offensive and major operations.” (Montasari et al. 2022)
Strategic CTI is focused on situational awareness in the threat landscape.
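For orientation, the four levels can be captured in a simple lookup structure. The focus strings below paraphrase the descriptions above; the consumer values are this summary’s illustrative assumptions, not a classification from Montasari et al. (2022).

```python
# Sketch: the four CTI levels as a lookup table. Focus strings paraphrase
# the list above; "typical_consumer" values are illustrative assumptions.
CTI_LEVELS = {
    "Tactical":    {"focus": "threat-actor TTPs and concrete countermeasures",
                    "typical_consumer": "SOC analysts"},
    "Technical":   {"focus": "rapidly connecting technical details of attacks",
                    "typical_consumer": "detection engineers"},
    "Operational": {"focus": "context on how actors plan and sustain operations",
                    "typical_consumer": "incident responders"},
    "Strategic":   {"focus": "situational awareness of the threat landscape",
                    "typical_consumer": "executives and risk owners"},
}

for level, info in CTI_LEVELS.items():
    print(f"{level} CTI -> {info['focus']} (for {info['typical_consumer']})")
```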
CTI offers a robust way of dealing with security functions at every level. “CTI can facilitate better understanding into cyber threats, enabling faster, more targeted responses.” (Montasari et al. 2022) This is particularly important because cyber security evolves at such a rapid pace that today’s information may be outdated within the hour. Adopting a CTI framework and shifting postures across Tactical, Technical, Operational, and Strategic CTI will benefit organizations of all kinds. More research can be done on CTI and its relationship to decentralized systems and specific threat vectors.
Federated Learning systems are intrinsically connected to learning models for Machine Learning (ML), Artificial Intelligence (AI), and blockchain-based architectural data structures. In these papers, “FL is defined as a machine learning paradigm in which multiple clients work together to train a model under the coordination of a central server, while the training data remains stored locally.” (Kairouz et al. 2019) The main purpose of a Federated Learning (FL) system is to let trusted workers contribute to a shared model without exposing their local training data. While FL is only one model for training, it has gained significant popularity in the past several years, particularly for large ML training workloads. The focus of this article is on the threats, attacks, and defenses to Federated Learning systems: with the prominence of cloud computing and blockchain decentralization of data, FL systems have gained significant traction, and so have the risks. The three phases of attack and defense in FL are:
Data and Behavior Auditing Phase
Training Phase
Predicting Phase
Each of these phases carries intrinsic attack/defense issues that can be placed in a taxonomy to assist proactive and reactive responses, which connects back to the initial premise of InfoSec and its relationship to ML/AI through a proactive-versus-reactive, incident-centric lens. FL models are trained locally and aggregated at a central server; as this process takes effect, a global model is obtained. FL can be categorized into Horizontal FL (HFL), Vertical FL (VFL), or Federated Transfer Learning (FTL). (Yang et al. 2019)
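As a minimal sketch of that train-locally, aggregate-centrally loop, the following federated-averaging example uses a plain linear model and invented data; it is a generic illustration, not the implementation from Liu et al. (2022) or Kairouz et al. (2019).

```python
# Minimal FedAvg-style sketch: clients train locally, server averages weights.
# The model, data, and hyperparameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_features = 5, 3

# Each client holds its own local data, which is never sent to the server.
client_data = [(rng.normal(size=(20, n_features)), rng.normal(size=20))
               for _ in range(n_clients)]

def local_update(w, X, y, lr=0.1, steps=10):
    """A few steps of local gradient descent on squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

global_w = np.zeros(n_features)
for _ in range(20):  # communication rounds
    # Clients start from the current global model and train locally.
    local_ws = [local_update(global_w.copy(), X, y) for X, y in client_data]
    # Server aggregation: a simple (unweighted) average of client models.
    global_w = np.mean(local_ws, axis=0)

print("global model:", global_w)
```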
Offensive Taxonomy of Security Risks with a focus on Privacy Inference:
Some of the security risks associated with Federated Learning include eavesdropping, privacy inference, model poisoning, data poisoning, and evasion. Since machine learning depends on high-quality data, the FL model and its central-server relationships raise serious security concerns and threat vectors. According to the article, the threat model depends on the fact that the “FL data of each local worker is available and invisible” (Liu et al. 2022). This invisibility, combined with the steady data stream from the local workers to the central server, makes model and data poisoning a real threat. Adversaries can “cause damage to data and systems by social engineering, penetration attacks, backdoor attacks, and advanced persistent threat (APT) attacks” (Liu et al. 2022). Since data is streamed through different devices at the local level, those devices can carry vulnerabilities that introduce risk, e.g. cross-device data sharing at the local level; such exposure can arise from authorization issues, exploits, and compromised data flows. Taxonomy was a large portion of this research. Attacks can be categorized into:
Privacy Inference Attacks Against FL in the Training Phase
Membership inference attacks infer whether a specific data record was used to train the model, giving the threat actor intelligence on the model parameters and training protocol (a minimal sketch follows this list).
Class representative inference attacks “aim to obtain prototypical samples on a target label that the adversary does not own.” (Liu et al. 2022) This type of attack can infer private information about victims’ data labels.
Property inference attacks “infer meta characteristics of other participants’ training data.” (Melis et al. 2019)
Data reconstruction attacks, according to Liu et al. (2022), aim “to reconstruct training samples and/or associated labels accurately that was used during training.” Techniques include DLG/iDLG (deep leakage from gradients), inverting gradients, GradInversion, and Generative Adversarial Networks (GANs). These attacks can result in the full reconstruction of the data sets being trained on, which obviously has huge privacy and security implications.
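A common baseline for membership inference is a simple loss threshold: samples the model fits unusually well are guessed to be training members. The sketch below is a generic illustration with an invented linear model and data, not an attack from Liu et al. (2022).

```python
# Loss-threshold membership inference sketch (invented data and model):
# an overfit model has noticeably lower loss on its training members.
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 25  # few samples relative to features, so the model overfits
X_train, X_out = rng.normal(size=(n, d)), rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y_train = X_train @ true_w + rng.normal(scale=0.1, size=n)
y_out = X_out @ true_w + rng.normal(scale=0.1, size=n)

# "Victim" model: unregularized least squares fit on the members only.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def per_sample_loss(X, y):
    return (X @ w - y) ** 2

# Attack: guess "member" whenever a sample's loss falls below a threshold.
all_losses = np.concatenate([per_sample_loss(X_train, y_train),
                             per_sample_loss(X_out, y_out)])
threshold = np.median(all_losses)
hit_rate = np.mean(per_sample_loss(X_train, y_train) < threshold)
print("fraction of true members correctly flagged:", hit_rate)
```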
Ranking data quality in an FL model can have defensive value, preventing compromise by poisoning-type attacks. One technique is “data quality assessment in FL” (Liu et al. 2022). The article discusses how data quality assessment, while challenging in an FL architecture, is possible by reading credibility from the historical behavior of local workers and central servers: “Malicious local workers usually behave differently than most trusted local workers, therefore by auditing the model behavior uploaded to the central server, the untrusted local workers can be eliminated.” (Liu et al. 2022)
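One hedged sketch of how such behavior auditing might be operationalized (a generic outlier filter, not the exact scheme from Liu et al. 2022) is to score each uploaded update against a robust consensus of the others and drop updates that point the wrong way:

```python
# Sketch: audit uploaded client updates by cosine similarity to a robust
# consensus of the others. A poisoned update pointing in a different
# direction scores low and is dropped before aggregation.
import numpy as np

rng = np.random.default_rng(2)
honest = [rng.normal(loc=1.0, scale=0.1, size=4) for _ in range(4)]
poisoned = -2.0 * np.ones(4)          # adversarial update, opposite direction
updates = honest + [poisoned]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

trusted = []
for i, u in enumerate(updates):
    # Coordinate-wise median of the other updates resists the outlier.
    consensus = np.median([v for j, v in enumerate(updates) if j != i], axis=0)
    if cosine(u, consensus) > 0:      # simple threshold on agreement
        trusted.append(u)

print("kept", len(trusted), "of", len(updates), "updates")
```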
Defensive Taxonomy of Security Risks with a focus on Privacy Inference:
Compression of gradients is used to improve communication between the local hosts and the central server while the training models are being run, and it doubles as a privacy defense: “Existing strategies to resisting private inference are usually based on processing shared gradient information.” (Liu et al. 2022)
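A minimal instance of this idea is top-k sparsification, where only the largest-magnitude gradient entries are shared each round. This is a generic sketch of one compression scheme, not a specific method from the paper.

```python
# Sketch: top-k gradient sparsification. Only the k largest-magnitude
# entries are uploaded; the rest are zeroed, shrinking both the message
# size and the gradient information an adversary could invert.
import numpy as np

def top_k_sparsify(grad, k):
    keep = np.argsort(np.abs(grad))[-k:]   # indices of the k largest entries
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]
    return sparse

grad = np.array([0.9, -0.05, 0.4, -1.2, 0.01, 0.3])
print(top_k_sparsify(grad, k=2))           # only -1.2 and 0.9 survive
```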
Cryptographic protection of gradients can be classified into homomorphic encryption (HE) and secure multi-party computation (SMC), each with specific cryptographic benefits and weaknesses. According to Liu et al. (2022), HE allows the data to be processed while fully encrypted, but at the cost of performance loss from the encryption and processing. SMC “enables individual participants to perform joint calculations on their inputs without revealing their own information.” Accuracy and privacy increase, though at the expense of performance.
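As a toy illustration of the SMC idea, the sketch below uses additive secret sharing, a generic building block behind many secure-aggregation schemes (not a specific protocol from the paper): each client splits its update into random shares that sum to the true value, so no single share reveals anything on its own.

```python
# Sketch: additive secret sharing. Each client's update is split into
# random-looking shares; individually they reveal nothing, but summing
# every share from every client reconstructs the aggregate update.
import numpy as np

rng = np.random.default_rng(3)

def make_shares(update, n_shares):
    shares = [rng.normal(size=update.shape) for _ in range(n_shares - 1)]
    shares.append(update - sum(shares))    # shares sum exactly to the update
    return shares

client_updates = [np.array([1.0, 2.0]), np.array([3.0, -1.0])]
all_shares = [make_shares(u, n_shares=2) for u in client_updates]

# The aggregator only ever sees shares, yet their total equals the sum
# of the true client updates.
aggregate = sum(s for shares in all_shares for s in shares)
print(aggregate)   # [4.0, 1.0] = [1+3, 2+(-1)]
```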
Perturbation of gradients adds calibrated “noise” to the shared gradients so that individual contributions cannot be inferred. However, the defense requires “adding sufficient calibration noise to guarantee the data privacy, which may impair the performance of the model…and can lead to high training costs.” (Seif et al. 2020)
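A hedged sketch of this trade-off in the style of differential privacy follows (clip each gradient’s norm, then add Gaussian noise; all parameters here are illustrative, not values from Seif et al. 2020):

```python
# Sketch: DP-style gradient perturbation. Clip the gradient's norm to bound
# any one contribution's influence, then add Gaussian noise scaled to that
# bound. More noise means more privacy but worse model utility.
import numpy as np

rng = np.random.default_rng(4)

def perturb_gradient(grad, clip_norm=1.0, noise_scale=0.5):
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))  # norm <= clip_norm
    noise = rng.normal(scale=noise_scale * clip_norm, size=grad.shape)
    return clipped + noise

grad = np.array([3.0, 4.0])        # norm 5, so it gets clipped to norm 1
print(perturb_gradient(grad))
```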
This article examined Federated Learning and the inherent security limitations that come with data silos and local/central-server relationships in Machine Learning frameworks. (Liu et al. 2022) Each offensive and defensive threat vector was placed in a taxonomy, which can benefit real-world applications. With the increasing use of Machine Learning and Artificial Intelligence data models and the training they require, companies and organizations are looking for better ways to train models, in this case Federated Learning models. Further research can be done on the behavior auditing phase, a new area that can increase efficiency, though it opens up another attack vector.
References:
Kairouz P, McMahan HB, Avent B, et al. (2019) Advances and open problems in federated learning. CoRR arXiv:1912.04977
Liu P, Xu X, Wang W (2022) Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives. Cybersecurity 5:4. https://doi.org/10.1186/s42400-021-00105-6
Melis L, Song C, De Cristofaro E, Shmatikov V (2019) Exploiting unintended feature leakage in collaborative learning. In: 2019 IEEE symposium on security and privacy (SP). IEEE, pp 691-706. https://doi.org/10.1109/SP.2019.00029
Montasari R, Carroll F, Jahankhani H, et al. (2022) Application of artificial intelligence and machine learning in producing actionable cyber threat intelligence. Springer. https://www.springer.com/series/5540
Seif M, Tandon R, Li M (2020) Wireless federated learning with local differential privacy. In: 2020 IEEE international symposium on information theory (ISIT). IEEE, pp 2604-2609. https://doi.org/10.1109/ISIT44484.2020.9174426
Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol 10(2):12:1-12:19. https://doi.org/10.1145/3298981