
Tanya Akutota¹, Swarnava Choudhury²
¹ UG Student, Computer Science Department, National Institute of Technology-Silchar
² UG Student, Electronics and Communication Department, National Institute of Technology-Silchar

Abstract- Automation and digitization of activities have resulted in a huge volume of data generated, called Big Data. Big Data helps many organizations gain useful insights, but at the same time, there are two types of risk involved: Security Risk to Big Data itself, and Privacy Risks of users and individuals. In this paper, the characteristics of big data, its applications, and the security and privacy challenges that come with it are discussed.

This paper also explores a novel Big Data Security Analytics method, called User Behavior Analytics, its functioning, use cases and advantages.

Keywords: Analytics, Big Data, Challenges, Security, SIEM, UBA

1. INTRODUCTION
The 21st century has seen human life shift towards digitalization: automated machines in industries, cellular phones, social networks and more have all led us to this. Such large-scale digitization means that huge, often complex sets of data are generated every day. These data may come from sensors, browsing reports, user statistics or any other source, and their volume is increasing exponentially with each passing day. As Tim Berners-Lee, the inventor of the World Wide Web, said, 'Data is a precious thing and will last longer than the systems themselves'. Big Data Analytics (BDA) is the tool that helps us realize the power of such large and complex datasets. Conventional database tools cannot process this amount of heterogeneous data, whereas Big Data Analytics uses the power of parallel processing to extract an enormous amount of valuable information, such as future market trends or developments in life science, from the data gathered from all possible and available sources.

Big Data has many unique characteristics which set it apart from a conventional database system. The types of data it works upon vary. There are basically 3 major classes of data, namely:
1. Structured data- These data are present in the form of rigid relational models, with specific data types and sizes. Conventional database techniques are efficient at this level.
2. Semi-structured data- A type of structured data, but hierarchical in nature, with the use of tags and markers. XML data is a perfect example of such data.
3. Unstructured data- It doesn't follow a predefined model. The data vary widely; this is where Big Data Analytics comes in.

Big Data can be best described using 5 characteristics, more popularly known as "the 5 V's":
• Volume- the scale of data; from Exabytes to Zettabytes!
• Velocity- the rate at which streaming data is generated and analysed.
• Variety- the different forms of data, from various external or internal sources.
• Veracity- the uncertainty of data, i.e., the different probabilities a value can take.
• Value- analysis and visualization of all the above components yields the final data, the precious information referred to as the Value.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 10 | Oct -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1545

With big companies and industries adopting BDA more and more over time, data is being generated at a rapid rate. Consider the adjacent comparison. The rate at which data is being generated is much steeper than the famous curve of Moore's law, which, as commonly stated, says that computing capacity doubles roughly every 2 years, at half the cost. Moore's law is partially responsible for this rise in generated data, along with other factors.
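To make the comparison with Moore's law concrete, the following is a minimal sketch of the arithmetic behind "doubling every 2 years". The function names are illustrative and not taken from the paper, and no actual data-growth figures are assumed; the sketch only shows which annual growth rate a data source would need to exceed in order to outpace such a doubling curve.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which a quantity grows after `years` if it doubles
    every `doubling_period_years` (the common statement of Moore's law)."""
    return 2.0 ** (years / doubling_period_years)


def equivalent_annual_rate(doubling_period_years: float = 2.0) -> float:
    """Annual growth rate that corresponds to one doubling per period."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0


def outpaces_doubling(annual_growth_rate: float,
                      doubling_period_years: float = 2.0) -> bool:
    """True if a given annual growth rate is steeper than the doubling curve."""
    return annual_growth_rate > equivalent_annual_rate(doubling_period_years)


if __name__ == "__main__":
    print(growth_factor(10))                    # 32.0 (five doublings in ten years)
    print(round(equivalent_annual_rate(), 4))   # 0.4142, i.e. about 41.4% per year
```

Under this reading, any data source growing by more than roughly 41.4% per year is on a steeper curve than a quantity that doubles every two years.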

This wild growth in data overlooks the security threats that such a huge well of information may attract. There is also the question of whether our personal data is being compromised; these are the privacy issues that we need to worry about. We discuss the limitations BDA faces in the following section.

2. BIG DATA AND SECURITY
The digital data generated is so huge and random that it may sometimes contain the personal information of users, thus compromising their privacy. The data also needs to be kept safe and out of the reach of hackers, who can use such vital information for their own benefit.

2.1 Challenges in BDA
The most important challenges faced by big data technology are:
• Random Distribution- The distribution of data storage and processing is vital in parallel processing; failing to do it properly results in security problems.
• Privacy- Currently, Big Data Analytics treats all data with the same priority. Encrypting the more valuable data may reduce the risk of a sniffing attack.
• Computations- The computations performed on big data determine crucial results. Any tampering with the computation may lead to deceptive results.
• Integrity- The raw data fed to Big Data Analytics must be checked for genuineness before it is relied upon.
• Communication- The nodes and clusters in Big Data Analytics communicate over ordinary networks, making the data vulnerable to interception.

• Access Control- The addition or removal of nodes, and the privileges among various nodes, must be controlled and supervised.

2.2 Techniques to ensure security:
The above challenges can be tackled by taking precautionary measures. Extensive research is under way to make big data systems more secure:
• R. Toshniwal et al. have presented in [8] an improved way of encrypting big data. It emphasises encrypting data selectively, instead of encrypting the entire database.
• The use of Virtual Private Networks (VPNs) reduces the chance of data being sniffed from the communication channel.
• A few of the nodes should be used as 'trap-nodes' or honeypots. The attacker is deceived and their behaviour can be analysed to improve security measures.

• Segregating the huge amount of data before it is analysed, so as to reject any personal data collected from random sources.
• Nodes in a cluster should have proper authorisation. Authentication software such as Kerberos distinguishes a malicious node from an authorised one.

3. USER BEHAVIOR ANALYTICS
Behavior analysis systems first appeared in the early 2000s as a tool to help marketing teams analyze and predict customer buying patterns; they used data about users' behavior to customize and tailor marketing strategies. User Behavior Analytics, or UBA, is a cyber-security tool that helps detect insider threats, targeted attacks, and financial fraud. UBA solutions look at patterns of human behavior, and then apply ML-based algorithms and statistical analysis to detect meaningful anomalies that indicate potential threats. Besides UBA, some other security terms are:
SIEM: Security Information and Event Management
DLP: Data Loss Prevention
NBA: Network Behavior Analysis
EDR: Endpoint Detection and Response
CASB: Cloud Access Security Brokers

3.1 UBA for Security:
According to the research firm Gartner, "User Behavior Analytics (UBA) is where the sources are variable (often logs feature prominently, of course), but the analysis is focused on users, user accounts, user identities – and not on, say, IP addresses or hosts. Some form of SIEM and DLP post-processing where the primary source data is SIEM and/or DLP outputs and enhanced user identity data as well as algorithms characterize these tools. So, these tools may collect logs and context data themselves or from a SIEM and utilize various analytic algorithms to create new insight from that data."
Through learning behavior and tracking anomalies, UBA makes it possible to detect and identify security risks or threats such as:
• Credential compromise
• Rogue / insecure insiders
• Privileged user abuse
• Malicious hackers
• Breaches
• Password brute force attacks
Some of the popular UBA vendors in the present-day market include Caspida (Splunk), Exabeam, Fortscale, Gurucul, Rapid7, Securonix, ObserveIT and Microsoft ATA.

3.2 Functioning of UBA:
• First, UBA tools determine a baseline of normal activities specific to the organization and its individual users.

• Second, they identify deviations from normal. UBA uses big data and machine learning algorithms to assess these deviations in near-real time. They shed light on cases in which abnormal behavior is underway.
In most standalone UBA products these days, there is a core engine, running specialized analytics algorithms, that is fed data from existing sources and analyzes it. The analytics algorithms are the distinguishing factor among the tools. The findings are displayed on a dashboard, and the aim is to provide actionable information. These tools don't take any defensive or corrective action themselves; rather, they provide security operators with the insight to decide whether an action is required. However, integrated tools, such as UBA + Firewall + Defensive systems, may plausibly become available in the near future.
UBA collects various types of data, such as user roles and titles (including access, accounts and permissions), user activity and geographical location, and security alerts. Machine learning algorithms allow UBA systems to reduce false positives and provide clearer and more accurate actionable risk intelligence.
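The two-step functioning described above (learn a baseline, then flag deviations from it) can be illustrated with a minimal sketch. It assumes a single numeric feature per user (a hypothetical daily count of file accesses) and a simple z-score test with an assumed threshold of 3 standard deviations; real UBA products use far richer features and models, so this is only an illustration of the idea, not any vendor's algorithm.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Step 1: summarize a user's historical daily activity counts
    as (mean, standard deviation). Needs at least two observations."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Step 2: flag a new observation whose distance from the baseline
    mean exceeds `threshold` standard deviations (assumed cut-off)."""
    mu, sigma = baseline
    if sigma == 0:
        # Perfectly constant history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical file-access counts for one user over a week.
    baseline = build_baseline([20, 22, 19, 21, 23, 20, 22])
    print(is_anomalous(24, baseline))    # False: within normal variation
    print(is_anomalous(300, baseline))   # True: far outside the baseline
```

As in the text, the sketch only reports the anomaly; deciding whether to act on it is left to the security operator.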

Here is a list of crucial features that UBA software needs in order to function efficiently:
• Process vast amounts of user file and email activity: The UBA system should be able to search through and analyze the activity of many users across huge volumes of data.
• Determine a baseline of "normal" file and email access activities, based on historical data about the employees' activities. The UBA engine should therefore have intimate knowledge of file metadata (access times, users, permissions, etc.) to produce accurate profiles of average user behavior.
• Real-time alerts: The UBA software must be able to track file activities across a large user population in real time. The decision algorithms must be fast enough to keep pace with real-time activity.

3.2 How is UBA different from SIEM?
UBA is a close variant of SIEM. SIEM mainly relies on analyzing events captured in firewall, OS, and other system logs in order to spot interesting correlations, usually through pre-defined rules. By prioritizing and focusing on user behavior instead of system events, UBA builds a profile of an employee based on their usage patterns, and sends out an alert if it sees abnormal user behavior.
UBA tools utilize both basic and advanced analytics approaches, ranging from rules-based models to deep machine learning. A SIEM tool may or may not utilize these advanced methods.
UBA engines work with narrow and highly relevant data for analysis. This results in higher-quality alerts with fewer false negatives and false positives, whereas SIEM tools take in an overwhelming amount of data, which tends to generate more noise in their alerts.
UBA tools build profiles of users' behavior over a period of time and use them as a baseline to detect malicious actions by recording any abrupt change in behavioral patterns. This functionality is not available in all SIEM tools.
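The contrast between a pre-defined rule and a per-user profile can be sketched as follows. Both the fixed threshold (100 downloads per day) and the profile test (more than five times the user's own median) are hypothetical values chosen for illustration; they are not taken from any SIEM or UBA product.

```python
from statistics import median
from typing import Sequence

FIXED_THRESHOLD = 100  # hypothetical pre-defined rule: > 100 downloads per day


def rule_based_alert(downloads_today: int) -> bool:
    """SIEM-style check: the same static rule for every account."""
    return downloads_today > FIXED_THRESHOLD


def profile_based_alert(history: Sequence[int], downloads_today: int,
                        factor: float = 5.0) -> bool:
    """UBA-style check: compare today's activity with the user's own
    typical (median) activity; `factor` is an assumed sensitivity."""
    typical = max(median(history), 1)
    return downloads_today > factor * typical


if __name__ == "__main__":
    # An employee who normally downloads about 5 files a day downloads 60.
    print(rule_based_alert(60))                            # False: under the rule
    print(profile_based_alert([4, 5, 6, 5, 5], 60))        # True: abrupt change

    # A backup account that routinely downloads about 500 files a day.
    print(rule_based_alert(505))                           # True: noisy alert
    print(profile_based_alert([480, 500, 520, 510, 490], 505))  # False: normal
```

In this toy setting the static rule misses the unusual employee and raises a noisy alert for the backup account, while the profile-based check does the opposite, which mirrors the difference in alert quality described above.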

3.3 Use Cases:
Situations and possible scenarios where UBA can play a key role are discussed below. They include (but are not limited to):
1. Account hijacking and credential compromise: Here, attackers exploit vulnerabilities through attacks such as Pass-the-Hash (PtH), Pass-the-Token, golden ticket, brute force and remote execution to gain access to user credentials. The underlying machine learning algorithms help detect these by inspecting various parameters such as timestamp, location, IP, device and transaction patterns to identify any deviation from the normal behavior of a particular account and its activities.

2. Privileged access compromise: Privileged user accounts can be at particular risk because they may have access to highly sensitive or classified information. UBA should be able to detect scenarios such as using HPA to assign special or elevated privileges to the user's own account, or transactions outside the check-out and check-in time window.
3. Insider threat: A rogue insider continues to be a source of data loss. Using ML behavior models along with data risk monitoring and identification of high-risk profiles, UBA can reveal anomalies in data that humans could not otherwise recognize or detect.

4. Data exfiltration alerts: UBA solutions can identify known patterns such as sensitive content downloaded and copied to external storage devices, large amounts of source code checked out from source code repositories, file uploads to cloud storage, emails to personal accounts, access to competitor websites, etc.
5. Account lockouts: UBA should also help identify whether an account lockout is an honest mistake or an attempt by hackers to compromise the account.

6. Continuous session tracking: More visibility is provided by tracking the user session state, even when a user navigates across different sources or applications using different accounts at the same time. This helps reduce false positives.
7. Anomalous behavior and watch lists: UBA addresses anomalous behavior with watch lists to quickly profile and keep track of unknowns and apply escalating predictive risk scores. Machine learning behavior models are designed to deliver feedback on false positives.
8. Aggregating risk scores: Unlike SIEM, UBA doesn't generate a huge number of alerts. Rather, it aggregates the risk scores at the user level.

3.4 Advantages of UBA:
• More efficient than SIEM in terms of detecting malicious user behavior.
• It need not collect data itself. Rather, it can use data collected by other security tools.

• As opposed to CASB gateways, UEBA (User and Entity Behavior Analytics) tracks every online and offline transaction, activity, and log.
• UBA is designed to reduce false positives with new types of algorithms. These algorithms concentrate on the aggregate of anomalies instead of each individual anomaly.
• UBA is more efficient in pointing out and alerting about insider threats (e.g., Edward Snowden's theft of critical information).
• Allows more comprehensive management and risk handling of privileged accounts.
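The idea of concentrating on the aggregate of anomalies, with risk scores rolled up at the user level, can be sketched as below. The anomaly types, their weights and the alert threshold are all hypothetical values invented for this illustration; an actual UBA product would derive them from its own models.

```python
from collections import defaultdict

# Hypothetical weights per anomaly type (illustrative only).
ANOMALY_WEIGHTS = {
    "off_hours_access": 10,
    "unusual_login_location": 30,
    "bulk_download": 40,
    "privilege_escalation": 50,
}
DEFAULT_WEIGHT = 5  # assumed weight for anomaly types not listed above


def aggregate_risk(anomalies):
    """Sum the weights of all observed anomalies per user.
    `anomalies` is an iterable of (user, anomaly_type) pairs."""
    scores = defaultdict(int)
    for user, anomaly_type in anomalies:
        scores[user] += ANOMALY_WEIGHTS.get(anomaly_type, DEFAULT_WEIGHT)
    return dict(scores)


def users_to_alert(scores, threshold=60):
    """Raise one alert per user whose aggregate score reaches the threshold,
    instead of one alert per individual anomaly."""
    return sorted(user for user, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    observed = [
        ("alice", "off_hours_access"),
        ("bob", "bulk_download"),
        ("bob", "unusual_login_location"),
        ("alice", "off_hours_access"),
    ]
    scores = aggregate_risk(observed)
    print(scores)                  # {'alice': 20, 'bob': 70}
    print(users_to_alert(scores))  # ['bob']
```

Four anomalies produce a single alert here, which is the reduction in alert volume that the advantages above attribute to aggregation.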

REFERENCES
[1] Jon Oltsik, "Time to Consider User Behavior Analytics (UBA)," https://csoonline.com, January 25th, 2016
[2] Andy Green, "What is User Behavior Analytics?" https://blog.varonis.com/what-is-user-behavioranalytics, July 21st, 2015
[3] Heather Howland, "What is UEBA and Why Does it Matter In Threat Detection? Part 1 – Blog Series" https://blog.preempt.com/part-1-what-is-ueba-andwhy-does-it-matter-in-threat-detection-blog-series, September 22nd, 2016
[4] Amit Singh, "Top 5 User Behaviour Analytics (UBA) Vendors at RSAC 2017" Fire Compass (https://www.firecompass.com/blog/top-5-userbehaviour-analytics-uba-vendors-at-rsa-conference2017/), January 26th, 2017
[5] Johna Till Johnson, "User behavioral analytics tools can thwart security attacks" TechTarget (http://searchsecurity.techtarget.com/feature/Userbehavioral-analytics-tools-can-thwart-security-attacks)
[6] Margaret Rouse, "User Behavior Analytics (UBA)" http://searchsecurity.techtarget.com/definition/userbehavior-analytics-UBA
[7] Gurucul, "User and Entity Behavior Analytics Use Cases" e Paper White Paper 2017
[8] Raghav Toshniwal, "Big Data Security Issues and Challenges" IJIRAE Issue 2 Vol. 2, February 2015
[9] William El Kaim, "Introduction to Big Data", October 2016 (https://www.slideshare.net/welkaim/introduction-tobig-data-65870623)
[10] Youssef Gahi, "Big Data Analytics: Security and Privacy Challenges" IEEE 2016
[11] K. Shanmugapriya, "Security Issues Associated with Big Data in Cloud Computing" IJCSIT Vol. 6(6), 2015