Big Data: It's Not About Size
A Means to an End, Big Data Is Beyond Hype

A shift to what Gartner analyst Neil MacDonald calls context-aware security will drive the big data trend and aid in better defending against threats.

MacDonald says that context-aware security will enable organizations to mitigate threats by using modeling and analytics to make a security decision, even when other tools such as authentication deem transactions safe.

For example, if a customer about to conduct an online money transfer has produced valid credentials, should the transaction be allowed? "What else do I have to go on?" he says in an interview with Information Security Media Group's Eric Chabrow [transcript below].

Other context that could help in allowing or denying that transaction includes an IP address that resolves to China; a time of day of 1 a.m. local time, not a normal time for this user to be logging in; and a prior login from the U.S. six hours earlier. "All of a sudden what appears to be a legitimate security decision ... we deny because the context points to anomalous behavior," he says.

"Context will drive vast amounts of data for information security," MacDonald explains. "Time of day, location, device reputation, URL reputation - all will be factors into real-time security decisions."

In the interview, MacDonald, who also is a Gartner fellow and distinguished analyst, explains:

  • How big data is not just about size;
  • Use of increased monitoring to compensate for the loss of direct control over data and systems;
  • Why it's important for IT security and operations to collaborate in the era of big data.

MacDonald, who is on Gartner's information security and privacy research team, joined the company in 1995. Previously, he worked as a network specialist responsible for the planning, deployment, security and support of a 9,500-node multiprotocol and multiserver network.

Defining Big Data

ERIC CHABROW: One of the initial things that you said [at a recent Gartner Security Summit session entitled "Big Data and Security: Integrating Security and Operations Data for Improved IT Intelligence"] was that a lot of people don't quite understand what big data is. Why don't you provide us with what you feel big data is?

NEIL MACDONALD: There's a lot of hype around big data and many people assume it's just all hype. I disagree with that and certainly as it pertains to information security, there are a couple of reasons people believe it's hype. Number one - they say we've dealt with these issues before. My counter argument is, it may have been dealt with before, but it's new to us. [It's] new because Moore's Law has lowered the bar, and open source technologies like Hadoop or Cassandra or MapReduce have lowered the barrier to entry for typical organizations to analyze vast amounts of data. Moore's Law has given us more processing power and more storage to deal with vast amounts of data. Number three is we need more data to make better information security decisions. So I disagree that this is all hype. I believe it actually is very real.

My other comment on big data is people tend to focus on the word big and they assume it's always about volume. In Gartner's definition of big data, there are four attributes: volume, velocity, variety and complexity. Any one of those attributes could create a new class of data processing problems. Collectively, [when] the term big data is used it encompasses all four. Most people tend to focus on the volume, which I also think leads them astray.

Context-Aware Security

CHABROW: You spoke of different drivers in your presentation. One was a shift to context-aware security. What do you mean by that?

MACDONALD: One of the drivers I talked about was this shift to context-aware security, a driver towards big data and big data analytics for information security. What is context-aware security? It's the use of context data at the point in time an information security decision is made in order to make a better information security decision. In the presentation today, I used a consumer-banking example where a lady was transferring money from one account to another. She had logged in, provided the proper credentials, was transferring money and my question to the audience was, "Should I allow or deny this transaction? She provided valid credentials. What else do I have to go on?" What's missing from the example is more context. Then I added additional context to this decision, such as the IP address resolved to China; the device that she was logging in from had not been previously profiled so it was unknown; the time of day was 1 a.m. local time, not a normal time for this user to be logging in; the prior login was in the United States six hours ago, so it was physically impossible for this person to be in China. All of a sudden what appears to be a legitimate security decision, "Yes I should allow, she has credentials," all of a sudden we deny because the context points to anomalous behavior.

Context will drive vast amounts of data for information security. Time of day, location, device reputation, URL reputation - all will be factors into real-time security decisions. This will be one of the primary drivers of the shift towards big data for information security.
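The banking example MacDonald describes can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Gartner or a vendor prescribes: the field names, the "usual hours" window, the 12-hour travel threshold and the rule that more than one anomaly signal leads to a denial are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoginContext:
    """Context gathered at decision time (all fields are hypothetical)."""
    credentials_valid: bool
    ip_country: str
    device_known: bool
    local_hour: int               # 0-23, in the user's local time
    login_time: datetime
    prior_login_country: str
    prior_login_time: datetime


def anomaly_signals(ctx, home_country="US", usual_hours=range(7, 23),
                    min_travel=timedelta(hours=12)):
    """Return the names of the context checks that look anomalous."""
    signals = []
    if ctx.ip_country != home_country:
        signals.append("foreign_ip")
    if not ctx.device_known:
        signals.append("unknown_device")
    if ctx.local_hour not in usual_hours:
        signals.append("unusual_hour")
    # Two logins from different countries, too close together in time.
    moved = ctx.ip_country != ctx.prior_login_country
    if moved and ctx.login_time - ctx.prior_login_time < min_travel:
        signals.append("impossible_travel")
    return signals


def decide(ctx, max_signals=1):
    """Valid credentials are necessary but not sufficient (assumed policy)."""
    if not ctx.credentials_valid:
        return "deny"
    return "allow" if len(anomaly_signals(ctx)) <= max_signals else "deny"


# The scenario from the interview: valid credentials, but an IP in China,
# an unprofiled device, 1 a.m. local time, and a U.S. login six hours before.
now = datetime(2012, 6, 15, 1, 0)
suspicious = LoginContext(
    credentials_valid=True,
    ip_country="CN",
    device_known=False,
    local_hour=1,
    login_time=now,
    prior_login_country="US",
    prior_login_time=now - timedelta(hours=6),
)
print(anomaly_signals(suspicious))
print(decide(suspicious))
```

A production system would more likely weight each signal and draw on reputation feeds rather than count flags, but the structure is the same: the credential check is one input among several.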


CHABROW: Does this mean organizations are going to have to rely on their various vendors?

MACDONALD: Where will all of this context come from? It will require integration between different vendors, and unfortunately there's no good standard for context information today. We're talking about pulling information, let's say of identity context, from identity systems, linking to things like Active Directory. I'll pull device context using either an IP or URL reputation. There are providers of that data that I can subscribe to.

Today, unfortunately there's no single source of context, but there are providers at each layer in the stack that have context that would be valuable. There are device reputation vendors. There are IP reputation vendors. There are URL reputation vendors. There are vendors that help with transaction anomaly monitoring. There are vendors that help me monitor sensitive data. Your point is these are different sources and that's why part of the story today is talking about bringing this together and most people are doing that in the short term with security information and event management systems, to bring together data across these different silos.

CHABROW: You say that not all these same types of services provide the ultimate solution that people seek?

MACDONALD: The SIM vendors will not always take on the role of a big data analytics provider for information security. Some will; some won't. There are multiple issues you run into. How much information can the SIM vendor handle in their architecture before the performance becomes unacceptable? The results take too long to deliver meaningful intelligence. Another is the licensing model of the SIM vendor and are they going to penalize me for moving to bigger and bigger data sets? Another consideration is whether or not they handle raw data versus normalized data, which is what traditionally security information event management systems do in order to deliver near real-time performance. [They] normalize the data sets, but in many cases we're seeing clients also wanting to retain the raw data for further post-capture analytics as well, so the SIM provider needs to have an option for that.

Not all SIM providers will make this transition, nor will they do it with a reasonable cost or performance. Some will; some won't. And that's why I would be asking my SIM provider these types of questions.

Security Staffing

CHABROW: Who in the staff will be charged to integrate all these types of systems?

MACDONALD: Type A organizations - which Gartner characterizes as leading edge technologically aggressive organizations - 40 percent of those by 2016 will create and staff a security analytics role, or you could call them a security data scientist. Whatever term you want to use for that role, someone in that organization will have the responsibility of mining these data sets looking for patterns of anomalous behavior. Not every organization will have that title. Notice I said 40 percent of type A organizations. Type B or type C organizations that tend to be technology followers or technology laggards, or slower in adoption, I believe there will be service provider markets emerge that will do this type of analytics for me on my behalf as a service, and that's a new and emerging market.

Control and Visibility

CHABROW: One thing I found interesting was an analogy that you made in your presentation about the new Airbus A380 which dealt with compensating for the loss of intimacy or control that people now feel they may have over or they used to feel they had over securing information but they would be losing now.

MACDONALD: Exactly. If you look at the Airbus A380, what's referred to as a fly-by-wire system, the pilots do not have direct control of the navigational surfaces of the aircraft. It's simulated but they don't have direct hydraulic control of the rudders and the lifts. It's the illusion of control, and we give them that confidence without direct ownership or control of these avionic surfaces through instrumentation and monitoring. The cockpit is full of monitors, thousands of sensors throughout the aircraft, 380 miles of cabling for the sensors, giving the pilots immersive amounts of information. You're compensating for the loss of direct control with visibility. The analogy then goes directly to the IT department. We don't own or control these cloud providers, but we can compensate for the loss of direct controls with vast amounts of monitoring and visibility. I think it's a very important point because that will drive this move towards big data and big data analytics for security. It's just the sheer amount of data we'll need to give that confidence and that visibility to IT operators, just like the pilots in the A380.

IT Security and Operations

CHABROW: You made a point about a collaboration among IT security and operations, I guess both at the vendor level and within organizations too. Why is that important in this era of big data?

MACDONALD: Big data is being brought to solve the next generation of information security problems, but big data is being brought to solve the next generation of IT operations problems, things like application performance monitoring and root-cause analysis and behavioral monitoring of applications and behavioral learning. It's the same type of problem. Help me understand where anomalous behavior exists in my systems. Sometimes it's going to be security issues; sometimes it's going to be an operational issue, but the fact that it's anomalous behavior is of interest to both.

So why do we duplicate eventing and monitoring across these two environments? Isn't there synergy by having a common eventing and monitoring system that can be used across IT operations and security? Some of the operational context, things like CMDBs, configuration management databases, and dependency mapping and business level of the asset, these are already stored in operational repositories. That's very valuable information to bring to bear on security dashboards. Which is a more serious vulnerability that represents more risk: a vulnerability in a Windows system that has an internal web server or a vulnerability in a Windows system that's hosting a financial transaction for an external customer? We need that type of business context and operational context to help us prioritize where to focus our efforts from a risk perspective.
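The prioritization question MacDonald poses can be illustrated with a short Python sketch. It assumes a CMDB-style record that carries a business-criticality rating and an external-exposure flag for each asset; the field names, the 1-to-5 scale and the multiplier for external exposure are invented for the example and do not reflect any particular product.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Operational context, as it might be pulled from a CMDB (hypothetical)."""
    name: str
    externally_facing: bool
    business_criticality: int     # 1 (low) to 5 (high), assumed scale


@dataclass
class Finding:
    """A vulnerability found on an asset, with a technical severity score."""
    asset: Asset
    severity: float               # assumed 0-10 scale


def risk_score(finding):
    """Scale technical severity by business and operational context."""
    exposure = 2.0 if finding.asset.externally_facing else 1.0
    return finding.severity * finding.asset.business_criticality * exposure


def prioritize(findings):
    """Order findings from highest to lowest contextual risk."""
    return sorted(findings, key=risk_score, reverse=True)


# The same vulnerability on two Windows systems with different roles.
intranet = Asset("intranet-web", externally_facing=False, business_criticality=2)
payments = Asset("payments-host", externally_facing=True, business_criticality=5)
findings = [Finding(intranet, 7.5), Finding(payments, 7.5)]

for f in prioritize(findings):
    print(f.asset.name, risk_score(f))
```

With identical technical severity, the system hosting external financial transactions ranks first purely because of the operational context attached to it.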
