Big Data Analytics: Starting Small

Why Experts Recommend an Incremental Approach

Fahmida Y. Rashid (ITsecuritytech) • October 14, 2013

Security teams struggling to detect signs of threats hidden in mountains of data are attracted to big data analytics. But experts advise security professionals to take an incremental approach, starting out with smaller projects. That's because the capabilities of the new analytical tools are still evolving.

Security professionals have always worked with large amounts of data, but they've struggled to extract actionable intelligence in a timely manner, says Mike Lloyd, CTO of RedSeal Networks, a network security management company. Big data analytics eventually will help security professionals obtain the information they need about ongoing threats. But the tools cannot yet perform all of the advanced predictive analytics tasks that security professionals want to accomplish, Lloyd contends.

Nevertheless, the latest tools can provide valuable insights in areas such as detecting fraud, identifying suspicious insider behaviors and discovering malicious activity, contends Adrian Lane, an analyst with technology research firm Securosis. "We are just learning what we can do," he says.

Big data security analytics tools are designed to take advantage of the wide variety of security data most organizations already collect. That includes data from security information and event management (SIEM) systems; operating system and user activity logs; user-level transactions; DNS and other network traffic data; and third-party threat intelligence feeds. External content, such as posts on social media networks, e-mails and documents, as well as financial data from credit reporting agencies, can also be analyzed.

Start Out Small

Security teams should start out by using the latest security analytics tools for relatively narrow initiatives, such as detecting fraud, crowdsourcing research about real-world phishing and malware attacks, and conducting forensics investigations after an incident, suggests Steve Durbin, a vice president at the Information Security Forum, a non-profit group developing best practices in security for banks, technology firms and other large organizations.

While big data analytics tools can't yet be used to help anticipate threats, using them to collect forensic data can be a first step toward being proactive, Durbin notes. Security teams can identify the pattern of an attack, beginning with the initial reconnaissance attempts, and share that information with counterparts in the industry so that others can take preventive measures, he suggests.

Historically, fraud detection has focused on looking for such factors as odd login times or known bad IP addresses, Lane says. But using big data analytics to scrutinize data from third-party sources, as well as internal data, enables banks, for example, to proactively identify credit risks, such as individuals who have defaulted on loans in the past, he says. Also, big data tools can be used to monitor social media websites and Twitter feeds for comments about fraud and to correlate credit activity with customer information, he adds.
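
For readers who want to see what the traditional checks Lane describes look like in practice, here is a minimal Python sketch. It is only an illustration: the IP addresses, the "usual hours" window, the event fields and the flag_login function are all assumptions made for this example, not details from Lane or any product.

```python
from datetime import datetime

# Hypothetical inputs: these values are illustrative, not from the article.
KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}
USUAL_HOURS = range(6, 23)  # logins from 06:00 through 22:59 count as normal


def flag_login(event):
    """Return the list of reasons a login event looks suspicious."""
    reasons = []
    if event["ip"] in KNOWN_BAD_IPS:
        reasons.append("known bad IP")
    if datetime.fromisoformat(event["time"]).hour not in USUAL_HOURS:
        reasons.append("odd login time")
    return reasons


logins = [
    {"user": "alice", "ip": "192.0.2.10", "time": "2013-10-14T09:15:00"},
    {"user": "bob", "ip": "203.0.113.7", "time": "2013-10-14T03:02:00"},
]

# Keep only the users whose login triggered at least one rule.
flagged = {e["user"]: flag_login(e) for e in logins if flag_login(e)}
print(flagged)
```

Rules like these only look at the login event itself. The broader analysis Lane describes would add data from outside sources, which this sketch does not attempt.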

Organizations can also use big data analytics to scrutinize more types of data, such as third-party intelligence feeds and command-and-control data, to look for machines in the network that may be compromised and transmitting sensitive data to remote servers, Lloyd notes. By correlating machine-generated information with full network packet data, IT can go beyond basic traffic analysis to identify what data is leaving the network and how widespread the problem is. This level of analysis can help detect network breaches and ongoing persistent threats, he says.
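
The correlation Lloyd describes can be pictured with a small Python sketch. It assumes, purely for illustration, that a threat intelligence feed supplies a set of command-and-control addresses and that network traffic has already been reduced to simple flow records. The addresses, the record layout and the suspected_hosts function are hypothetical.

```python
from collections import defaultdict

# Hypothetical threat-intelligence feed of command-and-control addresses.
C2_ADDRESSES = {"198.51.100.23", "203.0.113.99"}

# Hypothetical flow records: (internal host, remote address, bytes sent out).
flows = [
    ("10.0.0.5", "198.51.100.23", 120_000),
    ("10.0.0.5", "192.0.2.44", 3_000),
    ("10.0.0.8", "203.0.113.99", 45_000),
    ("10.0.0.5", "198.51.100.23", 80_000),
    ("10.0.0.9", "192.0.2.44", 1_500),
]


def suspected_hosts(flows, c2_addresses):
    """Total the outbound bytes each internal host sent to listed addresses."""
    totals = defaultdict(int)
    for host, remote, bytes_out in flows:
        if remote in c2_addresses:
            totals[host] += bytes_out
    return dict(totals)


report = suspected_hosts(flows, C2_ADDRESSES)
print(report)
```

The number of hosts in the report hints at how widespread the problem is, and the byte totals hint at how much data may have left the network. Identifying what that data actually was would require the full packet contents, which this sketch leaves out.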

Hunting Threats

One company that's already diving into using big data analytics is the Depository Trust & Clearing Corp., a financial services transaction clearing and settlement provider. It's using security intelligence tools from IBM to protect financial systems for customers.

The big data analytics tools give the company real-time visibility while adding context to historical activity to help detect network breaches and identify suspicious insider activity, says Mark Clancy, CISO. Instead of looking for specific signatures or outright violations, the company can piece together clues to look for dynamic patterns, he says.

DTCC moved "from a world where we 'farm' security data and alerts with various prevention and detection tools to a situation where we actively 'hunt' for cyber-attackers in our networks," Clancy says.

Beyond SIEM

Lane argues that it's a mistake to dismiss big data analytics as just another name for a next-generation SIEM system or an extension of what SIEM platforms can do.

SIEM systems are commonly used in many large organizations, especially in the financial services, government and healthcare sectors. They can analyze log and event data to find clues to attacks and relationships between seemingly unrelated pieces of information. But unlike SIEM systems, big data analytics tools are designed to be scalable and correlate unstructured data, Lane says.

In moving toward using big data for security analytics, however, organizations should build on what's already in place rather than ripping out their existing analytics and event management systems, Lane acknowledges.

The Challenges

Data collected for marketing or sales analytics projects can be useful for security analytics as well, Lloyd points out. However, there are some challenges in analyzing all these unrelated sources of information, including data quality and the maturity of available tools.

Even the best and most advanced analytics tools are ineffective if there are problems with the source data, Lloyd notes. Poor-quality data can make it harder to find relevant pieces of information, he warns.

Another challenge lies with the analytics tools themselves. Many of the query engines that come with big data analytics tools are not yet mature enough to automate data analysis, he says. These tools can derive key insights only if the organization poses the right questions in the first place. That means someone still has to design the query looking for a specific pattern, Lloyd says. Otherwise, manual analysis is still necessary to find the links between unrelated pieces of data.
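
To make Lloyd's point concrete, here is a sketch of the kind of hand-designed query he is describing, written in Python rather than in any particular tool's query language. The pattern chosen (a run of failed logins followed by a success), the threshold of three, the log layout and the failures_then_success function are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical authentication log, already sorted by time: (user, outcome).
events = [
    ("carol", "fail"), ("dave", "ok"), ("carol", "fail"),
    ("carol", "fail"), ("carol", "ok"), ("dave", "fail"), ("dave", "ok"),
]


def failures_then_success(events, threshold=3):
    """Return users who logged in right after `threshold` or more straight failures."""
    streak = defaultdict(int)  # current run of consecutive failures per user
    matched = []
    for user, outcome in events:
        if outcome == "fail":
            streak[user] += 1
        else:
            if streak[user] >= threshold and user not in matched:
                matched.append(user)
            streak[user] = 0  # any success resets the run
    return matched


hits = failures_then_success(events)
print(hits)
```

The code finds only the one pattern someone thought to ask about. An attack that follows a different pattern passes through unnoticed until an analyst writes a query for it.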

"There is a mountain of data, but it's not going to analyze itself," he says.