2018 International Conference on Computer Communication and Informatics (ICCCI -2018), Jan. 04 – 06, 2018, Coimbatore, INDIA 
A MACHINE LEARNING APPROACH TO THE 
DETECTION AND ANALYSIS OF ANDROID MALICIOUS APPS 
SHIBIJA K
M.Tech, Information Security and Cyber Forensics, SRM Institute of Science and Technology, Kattankulathur, Chennai, India [email protected]
JOSEPH RAYMOND V
Assistant Professor (O.G), I.T. Department, SRM Institute of Science and Technology, Kattankulathur, Chennai, India [email protected]
Abstract: Mobile phone use is growing in every area of life, and this has made mobile phones a constant target for cyber attackers. The main source of these attacks is malicious applications that users download from trusted sources such as the Play Store and the App Store. Given the millions of applications the Play Store hosts, a user cannot tell which ones are malicious and which are not. Even after installation, the user cannot see what an application is doing on the device. As a result, a great deal of confidential information is leaked from mobile devices. A platform that can distinguish malicious apps from benign ones is therefore important.
The proposed system is an Android application based on machine learning. It performs both static and dynamic analysis to identify the malicious activities of an application. The static analysis focuses on the manifest.xml file (AndroidManifest.xml) of an Android application, and the dynamic analysis is based on the actions the application triggers while running on a mobile device. The system can combine the results of both analyses. The main aim of this project is to develop an efficient and effective Android application with a high success rate in distinguishing malicious from benign applications.
Keywords: Android, Malicious Apps, Machine Learning
978-1-4673-8855-9/17/$31.00 ©2018 IEEE
I. INTRODUCTION
There is a constant open war between attackers and defenders. Defenders adopt new technologies to stop attackers, and attackers do their best to get past the wall the defenders build. For example, when anti-virus makers introduced signature analysis to protect platforms, attackers began creating new or encrypted signatures to bypass it. This created the need for a new technique, and that is what we implement in our system using machine learning, which we regard as the future.
Machine learning is often described as the future. The reason is that data is everywhere: text messages, Facebook, email, maps, and the list goes on. Managing this data efficiently has become necessary, and there is a limit to how much data a human can handle, so the remaining option is machine learning. Machine learning is the ability of a machine to learn without being explicitly programmed. Informally, the machine gets better at a task as it sees more examples of it, instead of following rules written by hand for every case. If machine learning works this well, then applying it to an Android application that differentiates malicious apps from benign ones should make detection more efficient. That is our aim: to develop a mobile application that detects and analyzes malicious apps on the basis of machine learning.

II. APP ANALYSIS
A. App Compatibility
A term that comes up often in Android development is compatibility, here meaning app compatibility. Android apps can run on many kinds of devices, including phones, tablets, and televisions, and features vary from device to device. Because Android runs on many device configurations and not every feature is available on every device, it is important to check that the application is compatible with all the device types it targets. Since this app uses machine learning, it is also important to make sure the algorithm is not affected by this variance.

 
B. Static Analysis of Apps
Static analysis examines a suspected malware sample without running it, using different tools and techniques to determine whether a file is malicious. In this system, static analysis happens before the application is installed and is based on the permissions declared in the manifest file. On Android, an APK file is the package format used to install software, much like the .exe files used by Windows software. The APK contains a manifest.xml file (AndroidManifest.xml) that declares all the permissions the application requests. This is what static analysis needs: the system examines the manifest.xml file and uses machine learning to determine whether the application is malicious.
Figure 1. The various stages of protection of the proposed system: analyze the downloaded app using static analysis and check the app behavior using dynamic analysis
C. Root of Trust
Another important term that comes up in connection with the manifest.xml file is root. Rooting refers to the act of obtaining access to commands, system files, and folder locations that are usually not available to the user. It can be treated as an escalation from user privilege to administrator privilege, which gives the user full access to the mobile device. In our case, rooting helps our application access the manifest.xml file that is given as input to the static analysis. Therefore, a Root of Trust is needed for this application to function properly on the web application.
D. Dynamic Analysis of Apps
Dynamic analysis is also known as behavioral analysis because it determines whether a file is malicious at runtime, based on the file's behavior while it is running. To perform an effective dynamic analysis, the Android app must be able to track the activities the targeted app engages in while it runs. These activities mainly include network traffic, the amount of battery the app uses during its run, and other background activities. The malicious app detection system should also continuously extract activities from API call event logs, because some applications do not show their malicious activity immediately after installation. Behavior analysis can also take advantage of the root of trust to produce better results.
Figure 2. Stages of behavior analysis
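The runtime signals named for dynamic analysis (network traffic, battery use, and API call event logs) could be folded into a small feature record per monitored app. The following is a minimal sketch only: the event format, the field names, and the list of sensitive API calls are assumptions made for illustration and do not come from the proposed system.

```python
from collections import Counter

# Hypothetical set of API calls treated as sensitive (an assumption for this
# sketch, not a list taken from the paper).
SENSITIVE_APIS = {"sendTextMessage", "getDeviceId", "getLastKnownLocation"}


def summarize_events(events):
    """Fold a sequence of runtime events into a behavioral feature dict.

    Each event is a dict with a "kind" key ("network", "battery" or
    "api_call"); this record layout is hypothetical.
    """
    features = {"network_bytes": 0, "battery_mah": 0.0, "sensitive_calls": 0}
    api_counts = Counter()
    for event in events:
        kind = event["kind"]
        if kind == "network":
            features["network_bytes"] += event["bytes"]
        elif kind == "battery":
            features["battery_mah"] += event["mah"]
        elif kind == "api_call":
            api_counts[event["name"]] += 1
            if event["name"] in SENSITIVE_APIS:
                features["sensitive_calls"] += 1
    # Number of different API calls seen, sensitive or not.
    features["distinct_apis"] = len(api_counts)
    return features
```

Because the function only accumulates counts, it can be called repeatedly on a growing log, which matches the requirement that monitoring continue after installation.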

 

 
E. Combining Static and Dynamic Analysis 
The system combines the results of static and dynamic analysis. Before an application is installed, our malware detection system performs a static analysis and derives a threshold value from its result. This threshold value is passed to the dynamic analysis, which uses it together with its own results to produce the final result. In short, the final result reflects both static and dynamic analysis.
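The paper does not state how the threshold is derived or applied, so the following is only one possible reading, sketched under explicit assumptions: both analyses produce a suspicion score between 0 and 1, and a higher static score lowers the bar that the dynamic score must clear. The function name, the base threshold of 0.5, and the scaling rule are all hypothetical.

```python
def combined_verdict(static_score, dynamic_score, base_threshold=0.5):
    """Combine static and dynamic suspicion scores into a final verdict.

    Assumption for this sketch: all inputs lie in [0, 1], and the threshold
    handed to the dynamic stage shrinks as static suspicion grows.
    """
    for name, value in (
        ("static_score", static_score),
        ("dynamic_score", dynamic_score),
        ("base_threshold", base_threshold),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Threshold derived from the static result and passed to the dynamic stage.
    threshold = base_threshold * (1.0 - static_score)
    return {"threshold": threshold, "malicious": dynamic_score >= threshold}
```

With these assumptions, an app whose permissions look harmless (static score 0.0) is flagged only if its runtime score reaches 0.5, while an app with a static score of 0.5 is flagged at a runtime score of 0.25. A static score of 1.0 drives the threshold to zero, so the app is always flagged; a real system might cap this effect.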
III. PROBLEM STATEMENT 
Most existing commercial anti-virus applications are based on signature analysis: they match the signature extracted from an application against the signatures already available in their database. Such applications are vulnerable to zero-day exploits, because malware writers can produce malware whose signature is not yet in the database and so bypass the anti-virus software. Furthermore, they can encrypt or obfuscate the malicious code to make signature analysis more difficult. The Play Store performs a security check to stop malicious applications from being uploaded, but many malicious applications remain available in the Play Store even after this check.
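The weakness of signature matching can be illustrated with a deliberately simplified model. In this sketch a signature is assumed to be the SHA-256 digest of the whole file, which is cruder than what commercial products use, and the payload bytes are placeholders invented for the example.

```python
import hashlib


def signature(payload: bytes) -> str:
    """Model a signature as the SHA-256 digest of the file contents."""
    return hashlib.sha256(payload).hexdigest()


def is_flagged(payload: bytes, database: set) -> bool:
    """Return True if the payload's signature is already in the database."""
    return signature(payload) in database


# Placeholder bytes standing in for a known malicious sample (hypothetical).
KNOWN_SAMPLE = b"hypothetical malicious payload"
SIGNATURE_DB = {signature(KNOWN_SAMPLE)}

# The same sample with a single extra byte appended, as a trivial
# modification a malware writer could make.
MODIFIED_SAMPLE = KNOWN_SAMPLE + b"\x00"
```

Here `is_flagged(KNOWN_SAMPLE, SIGNATURE_DB)` is true, while `is_flagged(MODIFIED_SAMPLE, SIGNATURE_DB)` is false even though the behavior of the sample is unchanged. This is the gap that a new or altered signature exploits.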
IV. PROPOSED SOLUTION 
The existing signature-based analysis relies on the signatures stored in an anti-virus application's database, and it is not at all difficult for malware writers to bypass it, because they can create a new signature of their own or modify it at runtime. Our system applies machine learning to the detection and analysis of malicious applications. This approach filters harmful applications more accurately and effectively because it includes both static and dynamic analysis. It addresses the issue above because the static analysis is based on the permissions in the manifest file rather than on signatures, and the dynamic analysis is based on the malicious activities the app triggers. We will create our own rule set so that the system achieves the maximum success rate. The system can combine the results of static and dynamic analysis, which also lets us adapt to the new attack methods that cybercriminals constantly introduce.
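As an illustration of permission-based static features, the sketch below reads the `uses-permission` entries from a manifest and turns them into a binary vector. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary XML format), and the permission vocabulary and sample manifest are hypothetical choices for the example, not the rule set of the proposed system.

```python
import xml.etree.ElementTree as ET

# XML namespace used for android:* attributes in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical feature vocabulary; a real system would choose its own.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
]


def requested_permissions(manifest_xml: str) -> set:
    """Return the set of permission names declared in a text manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"
    return {
        elem.get(name_attr)
        for elem in root.iter("uses-permission")
        if elem.get(name_attr)
    }


def permission_vector(manifest_xml: str) -> list:
    """Encode the manifest as 0/1 flags, one per vocabulary permission."""
    declared = requested_permissions(manifest_xml)
    return [1 if perm in declared else 0 for perm in PERMISSION_VOCAB]


# Made-up manifest used only to exercise the functions above.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Demo" />
</manifest>"""
```

For the sample manifest, `permission_vector(SAMPLE_MANIFEST)` yields `[1, 1, 0, 0]`. A vector of this form is the kind of input a learning algorithm could consume in place of a signature.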
V. THE EXPECTED RESULT 
The expected result of this project is a fully developed application that can determine whether an application available in the Play Store is malicious, using machine learning as the primary method. The expected success rate for the system is above 90%.
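The paper does not define "success rate". One common interpretation is classification accuracy, the share of apps labeled correctly, and the sketch below computes it together with the detection rate and false positive rate. The function name and the 1 = malicious, 0 = benign label encoding are assumptions for illustration.

```python
def detection_metrics(predicted, actual):
    """Compute accuracy-style metrics for binary malware labels.

    Labels are assumed to be 1 for malicious and 0 for benign.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # malicious, caught
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # benign, passed
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # benign, flagged
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # malicious, missed
    return {
        "success_rate": (tp + tn) / len(actual),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

Reporting the detection rate and false positive rate alongside the overall figure matters when benign apps greatly outnumber malicious ones, because a system that flags nothing can still score a high overall success rate.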
VI. RELATED WORK 
Most existing techniques are based on signature analysis, which in turn relies on the signatures stored in an anti-virus application's database. The anti-virus software compares the newly extracted signature with the signatures in the database and, if a match is found, treats the application as malicious. This makes the approach vulnerable to zero-day exploits, since it is not difficult for malware writers to create a new signature that bypasses it. Besides creating a new signature, malware writers can also modify signatures at runtime.
Many malicious apps remain in the Play Store even though a security check is performed before any application is uploaded to it. This increases the need for an application that can determine whether another application is malicious and that can monitor the targeted application continuously. The system will be based on a machine learning algorithm that identifies patterns in program properties in order to do this. The main aim is a system with a success rate of at least 90%.
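To make "identifying patterns in program properties" concrete, the sketch below trains a perceptron, one of the simplest learning algorithms, on binary feature vectors. The paper does not name its algorithm, so the perceptron is a stand-in chosen for brevity, and the toy data (three made-up features, with the first one deciding the label) is invented for the example.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Learn weights and a bias that separate label 1 from label 0."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted
            if error:
                # Nudge the weights toward the correct side of the boundary.
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 (malicious) or 0 (benign) for one feature vector."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Toy training set (hypothetical): columns could stand for three program
# properties; here the first property alone determines the label.
TOY_SAMPLES = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 1, 0]]
TOY_LABELS = [1, 1, 0, 0]
```

After `weights, bias = train_perceptron(TOY_SAMPLES, TOY_LABELS)`, calling `predict` on each training vector reproduces its label. On data this small the result says nothing about real-world accuracy; it only shows the mechanism of learning a decision rule from labeled examples.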
VII. CONCLUSION 
As technology opens up many new methods for attackers, we need to use the same technology to build countermeasures that safeguard our privacy. When anti-virus makers started using signature analysis to find malware, attackers started creating new signatures to bypass it, which made such solutions less reliable. The need for a different solution that is more reliable, secure, and efficient is therefore very high. That is where machine learning comes into play. We regard machine learning as the future, and in this system the machine learning technique looks for patterns in program properties. These patterns are the basis for differentiating a malicious application from a benign one. A problem will arise, however, when malware writers start developing new techniques to bypass the machine learning algorithms we implement. In other words, the future will remain an open war between malware authors and defenders.
VIII. FUTURE DIRECTIONS 
Our proposed system can be extended into the cloud, enhancing the reach of its security by protecting even low-end devices. The system can also be modified to prevent unauthorized access to devices, financial crimes carried out from mobile devices, and mobile phone spoofing.