Every business wants more data: data on their customers, competition, operations, processes, employees, inventory and more. Data can be used to make business decisions and provide strategic insights that give companies a competitive advantage, whether in terms of efficiencies, enhancing the customer experience, or refining market strategy. Its uses are limitless. Over the last decade, computing power has advanced to the point where generating and storing data has become much simpler and more cost efficient.
With all that data available, most businesses struggle to figure out what to do with it all now that they have it. According to Forrester, up to 73% of data within an enterprise goes unused for analytics. We are so used to extracting targeted information from data that we simply ignore what we don’t understand and throw it away as noise. This problem is prevalent in every industry, but especially in the security world. Security teams are overwhelmed with the vast amounts of data generated from firewalls, intrusion detection systems, network appliances and other devices. It’s impossible to expect security teams to interpret all this data. We unintentionally end up focusing on what we already know and ignoring what we don’t.
Typical alerting systems are configured to raise alarms, but only when they encounter a binary event or reach a threshold. For example: if three or more failed authentication attempts are detected in succession, generate an alert. Successful authentication attempts, however, are mostly categorized as business as usual and ignored. The current mean time to detect a breach is over six months. Most organizations have all the data they need to identify a breach much faster than that average, yet they are still unable to detect and react to a breach in a reasonable amount of time. This is due to:
- The volume and velocity of the data being generated
- Not looking for patterns in all of the data available – the unknown unknowns
- Not having the proper context for the data available
If your system is ever breached, you don’t necessarily need to look at the failed authentication events, you need to look for anomalies in the successful ones!
Most organizations are well down the path on their journey of capturing and storing all of their data for future analytics. Data Lakes are large repositories of raw data in any format. Capturing, storing and securing that data is key. Once the data is available, it can be analyzed and its value maximized using a variety of methods. This is where the fun starts!
On HPE NonStop servers, XYGATE Merged Audit (XMA) gathers, normalizes and stores security audit data from the system and its applications. Merged Audit is your central repository for all NonStop security data. This is your NonStop Security Data Lake. In some environments, the data XMA gathers can amount to tens of millions of records per system, per day. With that kind of volume, you might think it's nearly impossible to draw all of the value out of this massive amount of data. This data can be fed to an external SIEM or SOAR for alerting, but most of it likely falls into that 73% of noise.
Machine learning is all the rage in the industry these days, and there is no doubt vendors seek to capitalize on the hype. Unlike statistical analysis, which has been used for decades to draw inferences about the relationships between variables, machine learning is about results and the predictability of new inputs. There are a variety of machine learning technologies. For years, the limited availability of data and computing power made it difficult to leverage machine learning to its full potential. Now, with large volumes of training data, Graphics Processing Units (GPUs) for fast computation of matrix operations, better activation functions and better architectures, it has become far easier to construct and train the deep networks needed for accurate machine learning. We'll discuss two approaches to maximizing the value of your data: supervised and unsupervised machine learning.
Supervised Machine Learning algorithms apply what has been learned from past data to new inputs, using labeled examples to predict future results. For example, in cancer diagnosis, a large amount of patient data is gathered regarding the characteristics of a tumor. Since we know which tumors are benign and which are malignant based on a variety of factors, we can label the data as such. Then, simply by knowing the cell density and tumor size of a new patient's input, we can predict whether the new data is benign or malignant – or, if you're a fan of HBO's Silicon Valley, whether the data is a hot dog or not a hot dog.
In Unsupervised Machine Learning, the data is not labeled. Rather, the algorithm finds the underlying patterns in the data without known, labeled outcomes. Unsupervised learning is most commonly implemented to identify previously unknown patterns in data. It is used for clustering, but it is especially useful in anomaly detection, which can identify fraudulent transactions, human errors and even cybersecurity attacks.
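To make the contrast concrete, here is a minimal sketch of both paradigms using only the Python standard library. The tumor measurements, event names and thresholds are purely illustrative assumptions, not real data: the supervised half predicts a label for a new input from labeled examples (a bare-bones nearest-neighbor classifier), while the unsupervised half simply treats rare event patterns as candidate anomalies.

```python
from collections import Counter

# Supervised: labeled examples of (cell_density, tumor_size) -> diagnosis.
# Values are made up for illustration.
labeled = [
    ((0.2, 1.1), "benign"),
    ((0.3, 0.9), "benign"),
    ((0.8, 3.5), "malignant"),
    ((0.9, 4.0), "malignant"),
]

def predict(point):
    """1-nearest-neighbor: return the label of the closest known example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labeled, key=lambda ex: dist(ex[0], point))[1]

print(predict((0.85, 3.8)))   # -> malignant

# Unsupervised: no labels at all. Count how often each event occurs and
# flag the rare ones as candidate anomalies (threshold is arbitrary here).
events = ["READ", "WRITE", "READ", "WRITE", "READ", "PURGE"]
counts = Counter(events)
anomalies = [e for e, c in counts.items() if c / len(events) < 0.2]
print(anomalies)              # -> ['PURGE']
```

The key difference is visible in the code itself: the supervised path needs the `"benign"`/`"malignant"` labels up front, while the unsupervised path derives "unusual" purely from the frequencies in the data.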
Supervised vs Unsupervised for Anomaly Detection
A supervised model “learns” by repeatedly comparing its predictions, given a set of inputs, against the “ground truth” label (the reality you want the model to predict) associated with those inputs. It then adjusts parameters such that the model’s predictions become more accurate. The model is essentially memorizing the input/output combinations. The goal is to have a model that makes good predictions against both the training data it has already seen as well as the future data that is yet to be seen.
In this way, the model learns a generalized way to recognize patterns within the data on which it's been trained. In most contexts, this is exactly what is desired, but the corollary is that these supervised models do not perform well in unusual circumstances, i.e., when faced with inputs dissimilar from those on which they've been trained. In layman's terms: if a security guard has been trained to recognize only faces and not patterns of comings and goings, they're not going to recognize that "Bob" coming into the office on a Tuesday is an anomaly and raise an alert. Because they have only been trained to know Bob's face, not Bob's working patterns, they will do nothing.
This is a key reason why supervised models are not typically used for anomaly detection. Mathematically, the supervised model is trying to determine the probability of an intrusion given a specific input vector whereas the unsupervised model is merely trying to determine the probability of seeing that specific input vector. In the unsupervised case for anomaly detection, probabilities below a determined threshold are flagged as anomalies.
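The unsupervised case described above can be sketched in a few lines: estimate the probability of seeing a given input from historical frequencies, then flag anything below a chosen threshold. The event names, the synthetic history and the threshold value are all illustrative assumptions.

```python
from collections import Counter

# Synthetic audit history: a routine pattern repeated many times, plus one
# rare privileged operation. Event names are hypothetical.
history = ["LOGON", "READ", "READ", "WRITE", "LOGOFF"] * 200 + ["ALTERPRIV"]

counts = Counter(history)
total = len(history)

def probability(event):
    """Estimated P(event) from observed frequencies; unseen events get 0."""
    return counts[event] / total

THRESHOLD = 0.001  # would be tuned per environment

def is_anomaly(event):
    """Flag inputs whose estimated probability falls below the threshold."""
    return probability(event) < THRESHOLD

print(is_anomaly("READ"))       # -> False: a common, expected event
print(is_anomaly("ALTERPRIV"))  # -> True: seen once in ~1,000 events
```

Note that nothing here models "intrusion given this input" – the model only knows how likely it is to see the input at all, which is exactly the distinction drawn above.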
NLP N-Grams: A Case Study in Pattern Recognition
Natural Language Processing (NLP) is a type of technology used to aid computers in understanding the natural human language. It is used in devices such as Amazon’s Alexa, Apple’s Siri and more. NLP relies on machine learning to derive meaning from human words, their sequences and the patterns they create together at varying frequencies. For example “See Jane Run” is a common pattern in English, where “Jane Run See” is not so common. A machine learning algorithm will churn through the data and learn that “See Jane Run” is common, where it may never see “Jane Run See”. If it ever does, that sequence can be identified as an anomaly.
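The "See Jane Run" versus "Jane Run See" idea can be shown with a tiny trigram model, again stdlib-only; the toy corpus is an assumption for illustration. The model records every trigram seen in "normal" text and flags a sequence as anomalous if it contains a trigram that was never observed.

```python
from collections import Counter

# Toy training corpus of "normal" language, tokenized on whitespace.
corpus = "see jane run . see jane play . run jane run".split()

def trigrams(tokens):
    """All consecutive 3-token sequences in a token list."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

known = Counter(trigrams(corpus))

def anomalous(sequence):
    """True if any trigram in the sequence was never seen in training."""
    return any(g not in known for g in trigrams(sequence.split()))

print(anomalous("see jane run"))  # -> False: a common pattern
print(anomalous("jane run see"))  # -> True: never observed
```

Real NLP systems smooth these counts and work with probabilities rather than hard membership, but the core mechanism – rare or unseen sequences stand out against learned frequencies – is the same one applied to audit data below.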
We experimented with this same n-gram approach for intrusion detection on an HPE NonStop server. The goal was to profile a system and identify normal behavior in order to quickly detect anomalous activity, which can then be further analyzed for context. Detection methods based on n-gram models have been widely used for the past decade.
Using a sample data set of 2.2 million XYGATE Merged Audit events, we identified a vocabulary of 31 unique operations (READ, RUN, WRITE, STOP, GIVE, etc.). Of the unique user sessions, 31% contained three or more command operations. We identified 359 unique 3-gram sequences out of a possible 29,791 combinations – just 1.2% of all possible combinations. For example, we frequently saw the sequence pattern "READ+WRITE+WRITE".
We also expanded our experiment to 4-grams. In the same data, we identified 797 unique 4-gram sequences out of a possible 923,521 combinations, accounting for 0.08% of the possible combinations. To put this in context, over 99% of the possible sequence patterns in a 4-gram model may be an anomaly or indicate a system compromise.
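The coverage arithmetic behind those figures is easy to reproduce: with a vocabulary of 31 operations, the number of possible n-grams is 31 raised to the n, and the observed counts cover only a sliver of that space.

```python
# Coverage arithmetic from the experiment described in the text.
VOCAB = 31                       # unique operations observed

possible_3 = VOCAB ** 3          # 31^3 = 29,791 possible 3-grams
possible_4 = VOCAB ** 4          # 31^4 = 923,521 possible 4-grams

observed_3 = 359                 # unique 3-gram sequences actually seen
observed_4 = 797                 # unique 4-gram sequences actually seen

print(f"3-grams: {observed_3}/{possible_3} "
      f"= {observed_3 / possible_3 * 100:.1f}% of possible sequences")
print(f"4-grams: {observed_4}/{possible_4} "
      f"= {observed_4 / possible_4 * 100:.3f}% of possible sequences")
```

Running this confirms the 1.2% figure for 3-grams and roughly 0.086% for 4-grams – meaning well over 99% of the possible sequence space was never seen in normal operation.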
Without machine learning algorithms, alerting on security incidents mainly relies on static rules within your alert system. For example, if a user attempts to read a secured file they don’t have access to, an alarm is generated. This method becomes unsustainable as the data gets more voluminous and patterns grow more complex. You would need to program every single pattern and variation to accurately generate alerts on suspicious behavior. Using machine learning, this data can be used to train algorithms to identify anomalous patterns so there isn’t a need to rely on programming for every single situation.
For HPE NonStop servers, the XYGATE Suite of products is able to generate the data necessary for analytics. It is important not only to generate data, but to collect and store it using XYGATE Merged Audit, rather than discarding what you think you don't need. XYPRO's newest analytics solution, XYGATE SecurityOne (XS1), is the only solution on the market that ingests NonStop security data, identifies anomalous patterns and raises alerts based on the context of the incident pattern detected.
Referring back to a previous article, "Proactive Security and Threat Detection – it's not That SIEMple", we projected the ROI over a three-year period for a large US financial institution with a multi-node NonStop environment. Those that invest in analytics for investigating "in flight" activities, with real correlation and the proper contextualization, can free up resources by nearly 80%.
The Bottom Line
Financial Analysis/Cost Savings
| Benefit | Year 1 | Year 2 | Year 3 | TOTAL |
| --- | --- | --- | --- | --- |
| Security Ops Improvements | $66,560 | $68,557 | $70,614 | $205,731 |
| Threat Intelligence Savings | $47,600 | $49,028 | $50,499 | $147,217 |
Steve Tcherchian, CISSP, PCI-ISA, PCIP is the Chief Product Officer and Chief Information Security Officer for XYPRO Technology. Steve is on the ISSA CISO Advisory Board, the NonStop Under 40 executive board and part of the ANSI X9 Security Standards Committee. With over 20 years in the cybersecurity field, Steve is responsible for strategy and innovation of XYPRO’s security product line as well as overseeing XYPRO’s risk, compliance and security to ensure the best experience to customers in the Mission-Critical computing marketplace. Steve is an engaging and dynamic speaker who regularly presents on cybersecurity topics at conferences around the world.