
We embark on a quest to model a fully distributed environment. We analyse data traces from mobile phones in MobEmu, build an environment that assures consensus and a fully decentralized certificate infrastructure, and study the streaming capabilities of several opportunistic routing protocols. As future work, we propose a framework for data analysis that runs Machine Learning algorithms in a distributed manner and uses TensorFlow as back end.

Centralized environments and hierarchic structures dictate our traditional way of conceiving and evaluating solutions. Decoupling predicates over a chain of layers gives a decision tree the chance to reconsider a wrong paradigm at run time. In a centralized system, the master-slave paradigm relieves our algorithms of the responsibility of perfection: a feedback loop over one of the layers suffices to turn a bad result into a satisfying one. But one can only dream of a new infrastructure that models entities as independent and self-sufficient. At the storage and communication layers we do this by solving the consensus problem. To design a complete abstraction of a fully distributed environment, however, we need to be in full control of our context, both from a global and a local point of view.

This goal can be accomplished by prediction. Recent achievements in scaling Machine Learning algorithms \cite{gradient} are a great instrument for modelling a predictable distributed environment. We use as case studies group and human social behaviour, as well as vehicular dynamics in real-world traffic. Based on these models, we emulate solutions for a certificate infrastructure independent of a root CA, a streaming application and a distributed solution for solving consensus.


One of the greatest obstacles to building a framework that can analyse collected data in real time is the limit of physical resources. We take on this challenge by building ScienceOpp, a framework that uses TensorFlow as back end, distributes the load of Machine Learning algorithms to workers in a cluster, and monitors the failure status of connected nodes. This helps the data analysis effort by speeding up solution testing. Speeding up the trial-and-error method currently used in data science also improves the developer's capacity to find the best model for the desired prediction, and hopefully leads to rules for prediction in domains other than graphics, such as human or vehicular behaviour.

As we seek leverage in interpreting a totally distributed environment, we want to understand the nature and dynamics of traces taken in real social settings.

These traces were taken in academic environments. They may show a higher degree of social aggregation, as in the trace taken in 2012 at Politehnica Bucharest \cite{upb_analysis}, or a lower rate of group formation, as in an experimental setup organized in 2011, also by Politehnica Bucharest \cite{upb_social_aspect}. We also parse and use traces taken at SIGCOMM, a trace taken on the St Andrews campus with a very high rate of group forming, and a trace from the NCCU campus with many nodes (115) and sparse meetings \cite{nccu}. We also exploit and analyse the traces shared by the CRAWDAD project: the HAGGLE traces at INTEL, CAMBRIDGE and INFOCOM.

The platform used for testing the proposed solutions on these traces is MobEmu, an emulator developed by Politehnica Bucharest \cite{upb_gaussian}. This emulator analyses the encounters between nodes. Its structure is simple and robust, based on one main entity: the node. MobEmu is not spatially aware, as the emulator specializes in the social dimension of a mobile environment and aims for a high-level analysis of algorithms. Time is represented by a while loop that detects encounter events. As encounters are detected, the nodes interact through an implementation of a data exchange procedure and use message buffers to emulate data transmission. The nodes and the messages accumulate statistics that are examined at the end of the trace.
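The sketch below illustrates this tick-based structure. It is a minimal skeleton under our own naming assumptions (Node, on_data_exchange and run_trace are hypothetical), not MobEmu's actual code:

\begin{verbatim}
# Minimal sketch of a MobEmu-style emulation loop (hypothetical names).
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.buffer = []                 # message buffer emulating transmission
        self.stats = {"exchanged": 0}    # accumulated per-node statistics

    def on_data_exchange(self, peer):
        # Data exchange procedure: copy unseen messages from the peer.
        for msg in list(peer.buffer):
            if msg not in self.buffer:
                self.buffer.append(msg)
                self.stats["exchanged"] += 1

def run_trace(nodes, contacts, end_time):
    """contacts: list of (time, id_a, id_b) encounter events."""
    t = 0
    while t < end_time:                  # time as a while loop over ticks
        for (time, a, b) in contacts:
            if time == t:                # encounter event detected at tick t
                nodes[a].on_data_exchange(nodes[b])
                nodes[b].on_data_exchange(nodes[a])
        t += 1
    return [n.stats for n in nodes]      # examined at the end of the trace
\end{verbatim}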

As we want to study the environment and algorithms best suited for data streaming, and to validate our proposed solutions for a fully distributed certificate algorithm and a fully distributed consensus algorithm, we implement our own abstraction of nodes with its own initialization and data exchange procedure, modify the statistics analysis to match our purposes, and alter packet generation at simulation runtime.
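Continuing the hypothetical sketch above, such a customized node could override initialization and inject packet generation at runtime along these lines (StreamingNode and maybe_generate are our own illustrative names):

\begin{verbatim}
import random

# Builds on the Node sketch above: custom initialization plus
# 1-to-n multicast packet generation injected at simulation runtime.
class StreamingNode(Node):
    def __init__(self, node_id, gen_rate=0.1):
        super().__init__(node_id)
        self.gen_rate = gen_rate              # per-tick emission probability
        self.stats["generated"] = 0

    def maybe_generate(self, t, destinations):
        # Called every tick: tamper with packet generation at runtime.
        if random.random() < self.gen_rate:
            for dst in destinations:          # one packet per destination
                self.buffer.append(("msg", self.id, dst, t))
            self.stats["generated"] += 1
\end{verbatim}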

\subsection{A Distributed Certificate Environment}

Is it possible to implement a certificate infrastructure in a fully distributed environment?

To answer this question we revisit a paper published in 2001 that proposed the fabric for such a solution \cite{quest}. The main assumption is that we do not have a central authority at the root of a chain of certificates. We issue the first certificate as self-signed and place unrestricted trust only in ourselves.

The algorithm has two main phases:
\begin{itemize}
\item Building the trust threshold through an authentication procedure. The resulting certificate chain is similar to the standard one, with the node itself as root CA instead of a standard certificate authority.
\item Checking the identity of an encountered node by merging the two certificate chains and verifying the neighbour's chain, decrypting it from the destination back to ourselves.
\end{itemize}

The next figure represents node 2 meeting node 4, with a maximum depth of 4 in the certificate tree.

\begin{center}
\includegraphics[scale=0.8]{certdag.jpg}
\end{center}

This is the Hunter algorithm as initially implemented. However, as we do not intend to study the efficiency of authorizations, we use a graph of depth one and build a buffer of trusted certificates, in order to see whether a fully distributed solution is viable and to find what determines the optimal number of certificates kept as trusted so that the convergence time is minimal.
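The following is a minimal sketch of that depth-one variant under our own naming assumptions (TrustNode and on_encounter are hypothetical): each node keeps a bounded buffer of trusted certificates and counts the encounters resolved without a fresh authentication.

\begin{verbatim}
# Hedged sketch of the depth-one certificate buffer (hypothetical names).
class TrustNode:
    def __init__(self, node_id, threshold):
        self.id = node_id
        self.trusted = {node_id}      # self-signed root: trust only ourselves
        self.threshold = threshold    # certificates kept as trusted
        self.hits = 0
        self.misses = 0

    def on_encounter(self, peer):
        # Depth one: merging chains reduces to sharing a trusted node.
        if peer.id in self.trusted or self.trusted & peer.trusted:
            self.hits += 1            # identity recognized by the merge
        else:
            self.misses += 1          # fall back to authentication
            if len(self.trusted) < self.threshold:
                self.trusted.add(peer.id)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
\end{verbatim}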

We designate as hit rate the percentage of node encounters that can merge their certificate chains and recognize each other's identity without any special authentication procedure. We confirm in MobEmu that this solution is feasible, as we quickly reach a 99.98 percent hit rate after the threshold is built. The 0.02 percent miss rate is negligible, as it originates in isolated nodes that do not communicate with any group. In essence, the threshold size is the number of foreign nodes a node needs to trust in order for everybody to have full control over the identity of the whole batch.

A study of the number of certificates needed, carried out over all our traces, quickly reveals that the total number of certificates is around the number of nodes raised to the power of 1.5. This means that the size of the needed threshold is approximately equal to the square root of the number of nodes.
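As a worked instance of this empirical rule (these figures simply instantiate the formula above for the NCCU trace with $n = 115$ nodes; they are not a separately measured result):

\begin{align*}
\text{total certificates} &\approx n^{1.5} = 115^{1.5} \approx 1233,\\
\text{threshold per node} &\approx \sqrt{n} = \sqrt{115} \approx 10.7.
\end{align*}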

\begin{center}
\includegraphics[scale=1.1]{authentications.jpg}
\end{center}

The slight variation is determined by the environment's capacity to form groups. We measure this feature dynamically using the nature of the routing algorithms. Benchmarking against epidemic routing, we know that using Bubble Rap decreases the number of hops per packet. By its nature, Bubble Rap is in itself a measure of the capacity for group forming, as it detects groups based on the k-clique technique and uses the centrality of a node as a routing parameter for its decisions \cite{routing_bubble_rap}.

\begin{center}
\includegraphics[scale=1.1]{hop.jpg}
\end{center}

As we expect, the trace taken at UPB in 2011 has a low degree of social aggregation \cite{upb_social_aspect}, whereas groups form more easily in the trace obtained in 2012 \cite{upb_analysis}. A dataset that depicts a community forming stable groups was taken at SIGCOMM, and we consider it a sample that reflects a high degree of aggregation \cite{sigcomm}.

As we can see, the number of certificates is also linearly dependent on the capacity of the environment to form groups. In this graphic, indices 1 to 10 denote the traces processed in the previous chart.

\begin{center}
\includegraphics[scale=1.1]{aggregation.jpg}
\end{center}

As groups tend to form, the size of the ideal threshold rises. We explain this by the low number of relays between groups. As social entropy rises, the number of social relays also rises and their individual importance decreases.

\subsection{Streaming in Opportunistic Networks}

To what extent can we use a pervasive infrastructure? Messages can of course be routed over the transport layer \cite{upb_hyccups}, and routing can surely be done for individual messages if we ignore the delays. But a more complete measure of the communication capacity of such networks is obtained by emulating streaming requirements. To model communication in mobile networks without infrastructure, we have to rely on the application routing paradigm. IoT descendants of layer-3 protocols, whether link-state (OLSR) or vector-based (AODV), prove highly inefficient in MANETs due to high overhead or long convergence time. Therefore, we study routing solutions based on application environment analysis. We also emulate 1-to-n multicast packet generation in order to provide a benchmark for streaming applications.
\begin{itemize}
\item We take as benchmark the Epidemic Routing algorithm \cite{routing_epidemic}, a basic flooding mechanism that spreads the message to all neighbours until the destination is reached.
\item BubbleRap is a routing protocol that does the routing based on the community and centrality of the node, sending the message to the most central node of its group \cite{routing_bubble_rap}.
\item SPRINT is a predictive algorithm that routes the message based on a utility function \cite{upb_sprint}. The main weight of this utility function is given by the predicted outcome of sending the message: considering the history of the node's encounters so far, and based on Gaussian prediction \cite{upb_gaussian}, we want to determine whether the message has a high chance of reaching its destination. A large factor is, of course, the TTL of the message and its freshness, as the protocol gives priority to recent communication. A sketch of such a utility-based decision follows this list.
\item SENSE and IRONMAN are algorithms that weigh the selfishness of a node, determine malicious neighbours and analyse the trust of the network \cite{routing_ironman} \cite{upb_sense}.
\item Spray and Wait and Spray and Focus route packets based on the outcome of spreading a batch of control packets; this is the Spray phase, which seeks to map the network \cite{routing_spray_and_wait} \cite{routing_spray_and_focus}.
\end{itemize}
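The sketch below illustrates the shape of such a utility-based forwarding decision; the weights, the Gaussian-flavoured estimator and all names are our own illustrative assumptions, not the published SPRINT formula.

\begin{verbatim}
import math

# Illustrative utility-based forwarding in the spirit of SPRINT
# (hypothetical weights and names).
def encounter_probability(history, dst, now, window=3600.0):
    """Recent encounters with dst weigh more than old ones."""
    score = 0.0
    for (peer, t) in history:            # history: list of (peer_id, time)
        if peer == dst:
            age = now - t
            score += math.exp(-(age / window) ** 2)
    return 1.0 - math.exp(-score)        # squash into [0, 1)

def utility(msg, history, now, w_pred=0.7, w_fresh=0.3):
    pred = encounter_probability(history, msg["dst"], now)
    ttl_left = max(0.0, msg["expires"] - now)
    fresh = ttl_left / msg["ttl"]        # freshness in [0, 1]
    return w_pred * pred + w_fresh * fresh

def should_forward(msg, my_history, relay_history, now):
    # Hand the message over only if the relay improves on our utility.
    return utility(msg, relay_history, now) > utility(msg, my_history, now)
\end{verbatim}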

To stress their streaming capability, we emulate them on the UPB 2011 trace, as it is sparse and the low number of encounters and their random nature make it perfect for protocol evaluation.

\begin{center}
\includegraphics[scale=1.1]{protocols.jpg}
\end{center}

Further, to evaluate the ups and downs of using an application-routing-based protocol in a real environment, we use the SIGCOMM trace \cite{sigcomm} and Social Blue Conn. We use the SIGCOMM trace because it features multiple highly aggregated groups with a high number of nodes (76). Social Blue Conn manages to reach a 100 percent hit rate with a low number of encounters, therefore securing a high efficiency in communication.

\begin{center}
\includegraphics[scale=1.1]{performance.jpg}
\end{center}

We see that the delivery performance, represented by the number of messages consumed per message delivered, is greatly improved, at the cost of a slightly higher latency and a hit rate 9 percent lower. We could not simulate a SPRINT environment for the SIGCOMM trace due to lack of resources (the predictive component is very resource-demanding in simulations).

\begin{center}
\includegraphics[scale=1.1]{loss.jpg}
\end{center}

However, this loss was observed only in the SIGCOMM trace, as the performance obtained by emulating the Social Blue Conn environment was the same for Bubble Rap and SPRINT as for Epidemic Routing.

Therefore, we conclude that a multicast streaming application is best suited for groups of between 10 and 15 nodes with a high degree of communication efficiency. A larger group size brings the hit rate down to an inadequate point, while a smaller group size would be irrelevant for the proposed algorithms, as a layer-2 bus would be sufficient for the streaming purpose.

\subsection{Consensus in Opportunistic Networks}

We studied the feasibility of a distributed consensus algorithm in an environment that does not guarantee an end-to-end path. We simulated the One-Third Rule (OTR) algorithm \cite{consensus_otr}.

Each round in OTR consists of two steps (a sketch of the round logic follows the list):
\begin{itemize}
\item A sending step, in which process p sends its value for round r, followed by a transition step in which p either takes a decision or proceeds to the next round. A node p can tolerate not receiving messages from up to one third of the other participants in round r while still being able to decide or proceed; this translates into the condition of having at least 2n/3 participants sending contributions in each round. After initialization, the consensus session is initiated by calling the startSession procedure with the group identifier and the number of participants as parameters. A subscription is set for the group before starting the first round of the algorithm.
\item At each round, the current contribution is published along with the identifier of the message recorded in contribIds, so it can be cancelled later. Upon receiving a message, we decide whether it belongs to a new round (and start a new round), to the current round (and check whether a decision can be made), or to a former round (and cancel the message). The function decide is called when a decision is made locally, or when a decision message is received from another participant, to settle upon the value.
\end{itemize}
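The following is a minimal sketch of one OTR round for n participants, ignoring the publish/subscribe plumbing (otr_round is our own name); it captures only the 2n/3 quorum and the one-third-rule value update.

\begin{verbatim}
from collections import Counter

# Hedged sketch of one One-Third Rule (OTR) round.
# 'received' maps sender id -> the value that sender proposed this round.
def otr_round(my_value, received, n):
    """Returns (new_value, decision); decision is None while undecided."""
    if len(received) <= 2 * n / 3:
        # Fewer than 2n/3 contributions: keep the value, go to next round.
        return my_value, None
    counts = Counter(received.values())
    # Adopt the smallest among the most frequently received values.
    top = max(counts.values())
    new_value = min(v for v, c in counts.items() if c == top)
    # Decide if some value arrived from more than 2n/3 participants.
    decision = next((v for v, c in counts.items() if c > 2 * n / 3), None)
    return new_value, decision
\end{verbatim}

For instance, with n = 4 and contributions {0, 0, 0, 1}, the node adopts 0 and decides 0, since three contributions exceed 2n/3.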

After the UPB 2011 and 2012 simulations, we conclude that an implementation of this algorithm is realistic. The network overhead is directly proportional to the number of sequences needed to reach consensus.

\begin{center}
\includegraphics{message2011.jpg}
\includegraphics{message2012.jpg}
\end{center}

The distribution of the number of sequences in a group of 22 (as in the UPB 2011 trace) or 25 (as in the UPB 2012 trace) depends on the existence of one or more relay nodes that carry information at the right time.

Our main goal is to model and design an architecture suited for a fully decentralized system. The environmental abstraction might differ from one case to another: mobile networks have no stable end-to-end connection, while an IoT setup is highly heterogeneous. One thing is common, though: each node must take its own decisions. As a node's own point of view is limited to a small number of inputs, our best bet for tackling the challenges of constructing such a network is prediction.

Feeding data to a learning system is feasible in a static environment that can be trained for several days, but for real-time performance we need a distributed solution. For data analysis we propose ScienceOpp, a framework that distributes the load of machine learning algorithms to machines in the network. We do so by modelling Deep Learning and Reinforcement Learning solutions over a MapReduce paradigm embedded in a TensorFlow back end \cite{tensorflow}. We also want to transparently monitor tolerance to failures, as reported by the TensorFlow engine. We will test this framework with the traces used in MobEmu and in Sim2Car, and then compare our solution to the already existing framework proposed by Keras.
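As a hedged sketch of the intended back-end wiring (cluster addresses, the model and the dataset are placeholders; this is standard TensorFlow 2 multi-worker usage, not ScienceOpp code, and each worker runs the same script with its own task index):

\begin{verbatim}
import json, os
import tensorflow as tf

# Each worker sets TF_CONFIG before start-up; addresses are placeholders.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]},
    "task": {"type": "worker", "index": 0},
})

# Synchronous data-parallel training across the workers in the cluster.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():                 # variables replicated on each worker
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder for features extracted from the mobility traces.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((1024, 16)), tf.random.normal((1024, 1)))
).batch(64)
model.fit(dataset, epochs=2)
\end{verbatim}

The failure monitoring described above would sit on top of this layer, observing the worker status exposed by the TensorFlow cluster configuration.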
