Data Flow Optimization in the Internet of Things

The Internet of Things (IoT) consists of a large number of constrained nodes that are limited in energy, computation capacity, and storage capacity. These nodes produce a considerable amount of data, which increases the data flow in the network. Inefficient transmission of data via constrained nodes makes the network unstable, rapidly increases energy consumption, and significantly increases data delay. To overcome these limitations, we propose a new approach that allows nodes to select an efficient path for transmitting data from source nodes to base stations (BSs), in order to optimize the data flow in the constrained network. First, we group nodes using a density peaks (DP) clustering algorithm based on the location coordinates of these nodes. Second, using the node groups, we assign nodes to BSs, which act as the data collectors. Third, the nodes build a dynamic and automated path plan to optimize the data flow in the constrained network. Simulation results on a real network data set demonstrate that our proposal outperforms state-of-the-art approaches in terms of the number of hops to reach the cluster head (CH) node, the data delay, the network lifetime, and the number of alive nodes.


Introduction
Kevin Ashton coined the term Internet of Things in 1999. It is regarded as the next progression of the current Internet. The Internet of Things is a novel model for connecting devices to the Internet; it has attracted much attention due to its potential to transform our physical world into a digital cyber world with important information [1]. IoT devices are often small in size and large in number, with little memory, low power use, and limited computational capabilities. IoT objects are a common part of our everyday life and depend on the behaviour of users [2].
Sensors [3], radio-frequency identification (RFID) devices [4], and actuators [5] are essential components of embedded systems in IoT, used in numerous real-time applications [6]. The sensors detect changes in the environment, while the actuators transform an electrical signal into a physical action to control a physical process. The increasing number of communicating devices will produce massive data traffic, so the generators of data and the executors of tasks should collaboratively optimize data gathering and dissemination.
IoT has given rise to various platforms and applications that exploit communication among heterogeneous and homogeneous devices to execute tasks and provide real-time services [7,8,9,10,11,12]. It creates a new type

Related works
Low-Energy Adaptive Clustering Hierarchy (LEACH) [27] is the best-known cluster-based routing algorithm and is used in many IoT networks. In the LEACH algorithm, a node in the network is selected by the nodes in a given area as the CH for the cluster. This algorithm dynamically updates the CHs in the network, which balances the load efficiently. However, because the CHs are the aggregators of data, a node with low energy that is selected as CH dies rapidly, which decreases the network lifetime.
A distributed routing path planning algorithm for WSNs is proposed in [28]. The cluster heads are selected according to a timer value, computed based on residual energy, node density, and average distance to neighbors. The authors presented synchronized and periodic rounds to prevent the cluster heads from consuming much more energy than non-cluster heads. The proposed work is measured in terms of energy consumption, network lifetime, number of alive nodes, and network throughput. However, the data transmission delay is not considered.
A multi-path routing algorithm to transmit data from source nodes to the BS is proposed in [28]. The algorithm relies on the residual energy and the hop count to discover the optimized path, which is added to the routing table. The optimized path is based on ant colony optimization (ACO) [30]. The proposed algorithm considers the minimum number of hops, the maximum residual energy, and the weighted energy to extend the network lifetime. However, the clustering mechanism and the data transmission delay are not considered. A routing algorithm for improving the network lifetime is proposed in [31]. The routing algorithm is based on particle swarm optimization (PSO) [32], where the relay nodes, considered as gateways, are the principal elements for path optimization. The gateways are responsible for gathering, aggregating, and transmitting the data sent to the BS/sink. The simulation results show that the algorithm extends the network lifetime compared to other bio-inspired approaches. However, the clustering mechanism and the data transmission delay are not considered.
Two data aggregation and dissemination models are proposed in [33] for a constrained IoT network constructed of sensors, actuators, aggregators, gateways, and a central server. The first model considers the presence of actuators that can use the data generated by the sensors. The second model does not consider the presence of actuators; the data are transmitted to aggregator nodes, which send them to the gateway. Data transmission is performed over multiple hops via the optimal path to reduce energy consumption and extend the network lifetime. The proposed work is evaluated in terms of the total energy consumption, the network lifetime, and the time consumed.
The authors in [34] proposed a clustering mechanism in a constrained IoT network. Nodes select the most preferred nodes (the cluster heads) as relay nodes to transmit data using the genetic algorithm [35]. The selection of the most preferred nodes is based on the distance to the BS, the cluster head distance to the covered nodes, the standard deviation of the cluster head distance, the transfer energy, and the number of transmissions. The proposed work is evaluated in terms of the number of slots, the number of packets transmitted to cluster heads, and the number of packets transmitted to BSs. However, the energy consumption and the network lifetime are not considered.
In [36], a cluster head selection and clustering mechanism is proposed. In the first phase, the sensors select the near-optimal cluster heads using the Artificial Bee Colony algorithm [37] based on the remaining energy of the sensors, the number of neighbors, the Euclidean distance between the sensors and the sink, and the Euclidean distance between sensors and their neighbors. The second phase clusters devices with cluster heads based on the Euclidean distance between these devices and the cluster heads and on the data generated. The simulations are evaluated in terms of residual energy and data transmission delay. However, the network lifetime is not considered.
The authors in [38] proposed a mechanism to optimize the energy consumption in the IoT network, especially in WSN. The CHs are selected using a hybrid Whale Optimization Algorithm (WOA) with Simulated Annealing (SA) [39]. The parameters of the selection of CHs are energy consumption, path distance, delay, load, and temperature. The fitness function is computed based on these parameters. The proposed mechanism is measured in terms of temperature, the number of nodes alive, load, network energy, and costs. However, the data transmission delay is not considered.
The data generated by source devices in IoT networks is transmitted to high-level entities to respond to the requests of users and applications. As data transmission in a decentralized IoT network is performed via multi-hop nodes, it is essential to consider the crucial elements for optimizing the data flow in this constrained network, whose nodes are limited in energy, storage capacity, processing capacity, etc. As illustrated in Table 1, each previous study focuses on only some of the following parameters: clustering mechanism, data transmission delay, energy consumption, and network lifetime. In our contribution, we consider all these parameters at once and improve the performance compared to the previous studies.

Problem definition
The area of interest is a constrained IoT network constructed of sensor nodes, actuator nodes, and BSs/sinks. The sensor nodes sense the environment parameters; the actuator nodes or end-users utilize the sensed data. The actuator nodes can receive data from sensor nodes via intermediate nodes, or from end-users via the BS, in order to act on a physical parameter. To transmit data from a source node to a destination node in a constrained IoT network whose nodes are limited in energy, processing capacity, and storage capacity, the mechanisms adopted have to deal with these different constraints. The communication is one-hop (from one node to another) and unicast.
This contribution considers that the network contains a set of BSs, which act as middleware between the end-users and the sensors/actuators. The problem is to transmit data over an optimized path that minimizes the energy consumption, increases the network lifetime, decreases the data delay, and minimizes the path length and distance. It is an NP-hard problem, since these criteria have to be dealt with simultaneously. Figure 1 provides an example of data transmission from the source node S to the destination node D. As shown, there are two paths: (i) a short path (the bold line) in terms of distance and number of hops and (ii) a longer path (the dashed line). Transmitting data over the shortest path can decrease energy consumption and execution time. However, sending data over a short path does not guarantee efficient transmission and data flow minimization in a constrained network. Furthermore, a node on a short path that repeatedly transmits or processes data can exhaust its energy and fail, which decreases the network lifetime. Thus, the objective is to transmit the data over an optimized path, which is short and, at the same time, is composed of nodes with high capacities.

Energy model
The nodes are constrained in terms of energy, and energy consumption is a principal evaluation parameter for the solutions proposed in IoT.
A radio model [40] is used to analyze the parametric effect during the design and operation of a protocol. A sensor node in this model is made up of seven components: transmission and reception electronic equipment, a transmission and reception antenna, a transmission amplifier, and a data processor. The energy is consumed in the operations of data transmission, data reception, and data compression.
We adopted the simplified energy model introduced in [45]. Depending on the distance between the source and the destination, their model employs free space and multi-path fading channels.
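The transmission phase consumes energy in the radio electronics and the amplifier electronics. It is given as follows:

$$E_{Tx}(L, D) = \begin{cases} L \, E_{elec} + L \, E_{fs} \, D^{2}, & D < D_{0} \\ L \, E_{elec} + L \, E_{mp} \, D^{4}, & D \geq D_{0} \end{cases}$$

where
• $L$ is the number of bits (data) transmitted over a distance $D$.
• $E_{elec}$ (nJ/bit) is the total amount of electrical energy needed by the modulator, digital coding, and other electronic circuits to transmit or receive one bit of data.
• $E_{fs}$ (nJ/bit/m$^2$) is the amount of energy needed by the amplifier to transmit data directly to the receiver (free-space channel).
• $E_{mp}$ (per bit per m$^4$) is the amount of energy needed by the amplifier to send data using other nodes in a multi-hop manner (multi-path fading channel).
• $D_0$ is the threshold distance, computed as:

$$D_0 = \sqrt{\frac{E_{fs}}{E_{mp}}}$$

The energy consumed in the reception phase to receive data is given as follows:

$$E_{Rx}(L) = L \, E_{elec}$$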
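The energy model can be illustrated with a short Python sketch. It assumes the commonly used first-order radio form (free-space loss proportional to the squared distance below the threshold, multi-path loss proportional to the fourth power above it). The constant values and the function names `threshold_distance`, `tx_energy`, and `rx_energy` are illustrative assumptions, not values taken from this paper.

```python
import math

# Illustrative constants in joules (assumed values, not taken from the paper).
E_ELEC = 50e-9       # electronics energy per bit
E_FS = 10e-12        # free-space amplifier energy per bit per m^2
E_MP = 0.0013e-12    # multi-path amplifier energy per bit per m^4


def threshold_distance(e_fs=E_FS, e_mp=E_MP):
    """Distance D0 at which the free-space and multi-path terms are equal."""
    return math.sqrt(e_fs / e_mp)


def tx_energy(bits, distance, e_elec=E_ELEC, e_fs=E_FS, e_mp=E_MP):
    """Energy to transmit `bits` over `distance` metres."""
    d0 = threshold_distance(e_fs, e_mp)
    if distance < d0:
        # Short link: free-space amplifier term, proportional to D^2.
        return bits * e_elec + bits * e_fs * distance ** 2
    # Long link: multi-path amplifier term, proportional to D^4.
    return bits * e_elec + bits * e_mp * distance ** 4


def rx_energy(bits, e_elec=E_ELEC):
    """Energy to receive `bits`; only the electronics term applies."""
    return bits * e_elec
```

With these assumed constants the threshold distance is roughly 88 m, and the two branches of `tx_energy` meet at that distance, so the cost is continuous in the distance.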

Our proposal
This section presents a new contribution for efficient data flow in a constrained Internet of Things network. As stated above, several factors may impact this task, such as the density wobbling, the required time, the energy consumption, and the network lifetime. The main novelty of our contribution is to address these criteria simultaneously.

System overview
The data is generated by different nodes and transmitted to BSs; end-users or nodes exploit it to provide services and applications. As the network can be composed of many nodes, an inefficient path plan for data transmission leads to higher energy consumption, a shorter network lifetime, and a longer data delay. The following assumptions are adopted for the proposed model:
• Nodes are homogeneous and static with different initial energy and capacities.
• Nodes are distributed randomly in the network to monitor environmental parameters (sensors) or execute instructions (actuators).
To optimize the data flow in a constrained IoT network, Figure 2 shows the different phases of the proposed solution. The first phase is a clustering mechanism that groups nodes using DP clustering based on their locations. The second phase assigns nodes to BSs. The third phase is a dynamic path plan method that determines the efficient nodes for real-time data transmission.

Nodes clustering using DP
The nodes are distributed randomly in the network and have diverse communication ranges. The first phase of the solution is to group the nodes into a set of clusters. The clustering is based on the location of the nodes.
DP clustering [42] is a clustering algorithm recently proposed by Rodriguez and Laio, which is designed for large data sets. The DP algorithm is capable of detecting non-spherical clusters and needs no iteration. The flow chart of the DP algorithm is shown in Figure 3. The algorithm is based on two observations: (i) cluster centers (CHs) have higher local densities and are surrounded by neighbors that have lower local densities, and (ii) there is a large distance between cluster centers. Accordingly, DP calculates two metrics for each node: its local density ρ and its distance δ from other nodes with higher density. The local density of node $i$ is the number of nodes neighboring $i$; it is computed as:

$$\rho_i = \sum_{j \in S,\; j \neq i} \chi(r_{ij} - r)$$

where χ is a function defined as follows:

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}$$

$S$ is the set of all nodes, $r_{ij}$ is the Euclidean distance between two different nodes $i$ and $j$, and $r$ is the average communication range of all nodes (a cut-off distance), which is a threshold given as an input of the algorithm, such that $\rho_i$ is equal to the number of nodes within the communication range $r$ of node $i$.
The distance δ of node $i$ is computed as:

$$\delta_i = \min_{j:\; \rho_j > \rho_i} r_{ij}$$

It is the minimum distance from node $i$ to any other node whose local density is higher than that of node $i$. For the node $i$ that has the highest density among all nodes, δ is defined as follows:

$$\delta_i = \max_{j \in S} r_{ij}$$

The DP clustering algorithm draws a decision graph. In this graph, the abscissa is ρ and the ordinate is δ, and nodes with large values of both ρ and δ are chosen as cluster centers. Then, after selecting the centers, the nodes are arranged in descending order of local density, and each remaining node is assigned to the cluster of its nearest node with higher local density. In the next step, the nodes that are not within communication range of a center (the noise nodes) choose the nearest center that can be reached through a neighbor (or a neighbor of a neighbor, etc.).
Using this DP algorithm, the nodes are clustered, and the cluster centers are the nodes with a higher density (a higher number of connected nodes) than the others. The nodes that are indirectly linked to the cluster reach the cluster center via multi-hop paths, but all nodes belonging to a cluster share the same identifier (ID).
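The following Python sketch illustrates the core of density peaks clustering on node coordinates, assuming NumPy is available. It is a simplified illustration and not the authors' implementation: the centers are picked as the nodes with the largest product ρ·δ (a common heuristic that stands in for reading the decision graph), the number of centers `n_centers` is a hypothetical input, and the final step that attaches out-of-range noise nodes through neighbors is omitted.

```python
import numpy as np


def density_peaks(points, r, n_centers):
    """Simplified density peaks clustering with a hard cut-off distance r."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances r_ij.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Local density rho_i: number of other nodes strictly closer than r.
    rho = (dist < r).sum(axis=1) - 1
    # Nodes from highest to lowest density (stable sort breaks ties by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # The densest node gets its largest distance to any other node.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]  # nodes ranked before i (ties count as denser)
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Assumption: choose centers by the largest rho * delta.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # In density order, each remaining node inherits the label of its
    # nearest denser node, which has already been labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, centers, labels
```

For example, two well-separated groups of five nodes each, with one node in the middle of each group, yield the two middle nodes as centers when `r` is slightly larger than the distance from the middle node to its group members.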

Assignment of nodes to BSs
The first phase clusters the nodes using DP clustering; the second phase assigns the nodes to BSs.
In the network, the nodes and BSs are deployed randomly with different communication ranges. Some nodes can communicate directly with BSs, and others are connected indirectly to BSs via multi-hop communication. It is therefore essential to assign the correct nodes to each BS. The objective is to semi-balance the number of nodes assigned to each BS.
Base station density: The network is constructed of $N$ nodes and $M$ BSs. The density factor of BS $i$, denoted $d^{(BS)}_i$, is defined as:

$$d^{(BS)}_i = \frac{N_i}{N_r}$$

where $N_i$ is the number of nodes that are in communication with BS $i$ (the adjacent nodes) and $N_r$ is the total number of nodes that are in communication with BSs. The higher the value of $d^{(BS)}_i$, the higher the density of BS $i$. A node that is in communication with two BSs is assigned to just one BS (this case rarely happens), so each adjacent node is counted for exactly one BS.
Clusters assignment: The network is constructed of $N_c$ clusters; we assign to BS $i$ all nodes of the clusters that contain nodes within communication range of BS $i$. BS $i$ therefore receives

$$\sum_{j=1}^{N_{i,c}} |N_{i,j}|$$

nodes, where $N_{i,c}$ is the number of clusters that are connected to BS $i$ via adjacent nodes and $|N_{i,j}|$ is the number of nodes in cluster $j$. Each BS $i$ is thus assigned $d^{(BS)*}_i\,\%$ of the nodes, and the remaining nodes $N_{rest}$ that are not yet assigned to BSs are given by:

$$N_{rest} = N - \sum_{i=1}^{M} \sum_{j=1}^{N_{i,c}} |N_{i,j}|$$
The clusters that are not connected to adjacent nodes of any BS are not assigned yet, so the objective is to distribute the $N_{rest}$ nodes of these clusters among the BSs. The mechanism is given as follows:
• Each cluster from the unassigned list can select assigned neighboring clusters as candidate intermediate clusters to reach a given BS.
• Among the candidates, the assigned cluster whose center node is nearest to its BS is selected as the intermediate cluster by the unassigned cluster, and the unassigned cluster is assigned to the same BS as that intermediate cluster.
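The two rules above can be sketched in Python as a simple propagation over neighboring clusters. This is an illustrative reading of the mechanism, not the authors' code: the function name `assign_unassigned_clusters`, the dictionary-based inputs, and the pass-by-pass processing order are assumptions made for the example.

```python
import math


def assign_unassigned_clusters(centers, bs_positions, cluster_bs, neighbors):
    """Attach unassigned clusters to a BS through assigned neighboring clusters.

    centers:      {cluster_id: (x, y)} position of each cluster center
    bs_positions: {bs_id: (x, y)} position of each base station
    cluster_bs:   {cluster_id: bs_id} clusters already assigned to a BS
    neighbors:    {cluster_id: set of neighboring cluster ids}
    """
    assigned = dict(cluster_bs)
    pending = [c for c in centers if c not in assigned]
    progress = True
    while pending and progress:
        progress = False
        for cluster in list(pending):
            # Candidate intermediate clusters: neighbors that already have a BS.
            candidates = [n for n in sorted(neighbors.get(cluster, ()))
                          if n in assigned]
            if not candidates:
                continue
            # Select the candidate whose center is nearest to its own BS.
            best = min(candidates,
                       key=lambda n: math.dist(centers[n],
                                               bs_positions[assigned[n]]))
            assigned[cluster] = assigned[best]
            pending.remove(cluster)
            progress = True
    # Clusters with no path to an assigned cluster stay unassigned.
    return assigned
```

In this sketch a cluster assigned during a pass immediately becomes a candidate intermediate cluster for the clusters processed after it, so the result can depend on the processing order.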

Dynamic path plan for real-time data transmission
As noted above, the nodes are assigned to the BSs such that the distribution of nodes is semi-balanced among BSs, and the clusters determine the intermediate clusters through which the assigned BSs can be reached.
After that, the objective is to make nodes select an optimized path that minimizes the data flow in the network in real time. Each node has the identifier (ID) of its assigned BS and the IDs of the surrounding intermediate clusters assigned to BS $i$. In our contribution, the center nodes are not considered as aggregators of data when transmitting data from sensors/actuators to the BS; all nodes are responsible for transmitting data from source to destination. Aggregator nodes lose more energy than the others, which rapidly decreases the network lifetime.
A node $k$ can decide by itself the receiver of its data using a sorting factor computed for each neighboring node $j$, where $L_j$ is the minimal number of hops to reach its BS $i$ from node $j$, $T_j(t)$ is the amount of data received from node $k$ from the beginning up to instant $t$, $E^{expected}_j$ is the residual energy of node $j$ normalized to the range 0 to 1, $q$ is the average residual energy of all neighboring nodes of $k$, and α, β, and γ are the influence parameters in the range [0; 1], such that α + β + γ = 1.
Each node $k$ calculates the sorting factor of its neighboring nodes, and the neighbor with the highest value of the sorting factor at instant $t$ is chosen as the receiver of the data; this value is adjusted over time. Node $k$ can estimate the residual energy of a neighboring node from the amount of data it has sent to that node.
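The selection step can be illustrated with the Python sketch below. The exact expression of the sorting factor is not reproduced in the text above, so the weighted sum used here is purely an assumed form: it rewards fewer hops, residual energy above the neighborhood average $q$, and a lower amount of data already received. The pairing of α, β, and γ with these three terms, the `Neighbor` class, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    node_id: str
    hops_to_bs: int         # L_j, assumed to be at least 1
    data_received: float    # T_j(t), data already sent to this neighbor
    expected_energy: float  # expected residual energy of j, in [0, 1]


def sorting_factor(nb, q, alpha=0.25, beta=0.5, gamma=0.25):
    """Assumed score: NOT the paper's formula, only an illustrative stand-in."""
    hop_term = 1.0 / nb.hops_to_bs              # fewer hops scores higher
    energy_term = nb.expected_energy - q        # energy relative to average
    load_term = 1.0 / (1.0 + nb.data_received)  # lighter load scores higher
    return alpha * hop_term + beta * energy_term + gamma * load_term


def choose_receiver(neighbors, alpha=0.25, beta=0.5, gamma=0.25):
    """Return the neighbor with the highest sorting factor."""
    q = sum(nb.expected_energy for nb in neighbors) / len(neighbors)
    return max(neighbors,
               key=lambda nb: sorting_factor(nb, q, alpha, beta, gamma))
```

Because the score is recomputed at each decision from the current load and expected energy, a neighbor that has already relayed many packets gradually loses priority, which matches the idea of adjusting the choice over time.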

Simulation setup
In this simulation, we used Python 3 to implement our mechanism on a Dell Inspiron 15 5567 computer with an Intel i7 CPU running at 2.4 GHz, Windows 7 (64-bit), and 8 GB of RAM.
To implement our contribution, we extracted data from the Brightkite dataset, a location-based online social network dataset obtained from the Stanford Large Network Dataset Collection [44]. The extracted dataset includes 5000 nodes with their location coordinates. We deployed 25 BSs in the network. The nodes are initialized with different residual energies between 0.3 and 0.9.

For the energy consumption during the data flow, according to [45,46], we consider that the energy for reception and transmission is 0.001 J and 0.004 J, respectively, for each packet. The energy for processing information (by an intermediate node) is 0.0015 J. The packet size is not considered; each transmission is counted as one packet (one data item). We send 10000 packets into the network, generated from random nodes, and each packet should reach the BS. Each packet spends 1 ms at each node. The influence parameters are α = 0.25, β = 0.5, and γ = 0.25. For performance comparison, we compared our approach with two algorithms reviewed above: the LEACH algorithm [27] and the OPEN algorithm [28]. In the LEACH algorithm, the CHs are selected randomly and updated at a fixed period of time, and the data is transmitted randomly from the CH to a given BS. In the OPEN algorithm, the CHs are selected based on the residual energy, density, and average distance to neighbors. The data is transmitted from the CH to a given BS via the shortest path.
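As an illustration of how these per-packet costs might be applied along a path, the Python sketch below updates the residual energy of each node that forwards one packet. The accounting rules are assumptions made for the example: the source only transmits, each intermediate node receives, processes, and transmits, the BS is not energy-constrained, and the delay is counted as 1 ms per constrained node on the path. The function name `forward_packet` is hypothetical.

```python
E_RX = 0.001       # J per packet received (value from the setup above)
E_TX = 0.004       # J per packet transmitted (value from the setup above)
E_PROC = 0.0015    # J per packet processed by an intermediate node
NODE_DELAY_MS = 1  # ms spent by a packet at each node


def forward_packet(path, energy):
    """Charge one packet to every node on `path` and return its delay in ms.

    path:   node ids from the source to the last node before the BS
            (the BS itself is excluded and assumed not energy-constrained)
    energy: {node_id: residual energy in J}, updated in place
    """
    for index, node in enumerate(path):
        if index > 0:
            # Intermediate nodes first receive and process the packet.
            energy[node] -= E_RX + E_PROC
        # Every node on the path transmits the packet once.
        energy[node] -= E_TX
    return len(path) * NODE_DELAY_MS
```

Under these assumptions a source spends 0.004 J per packet and each relay spends 0.0065 J, which shows why nodes that sit on many paths are depleted first.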

Simulation results
To measure the efficiency of the DP clustering compared to the LEACH and OPEN clustering algorithms, Table 2 shows the main differences in terms of the number of clusters and the number of hops to reach the CHs. As illustrated, the number of clusters and the number of nodes that are more than one hop away from their CH are lower in our approach than in the other algorithms. Furthermore, the maximal number of hops to reach a CH is 3 in our approach, whereas it is 4 and 5 for the LEACH and OPEN algorithms, respectively. This is because the applied DP clustering algorithm is more efficient on a large scale than the other clustering algorithms.
As we send 10000 packets into the network, the total execution time is measured for our approach and the others, as shown in Figure 4. The time consumed by our approach and by the OPEN algorithm is less than that of the LEACH algorithm because the packets are transmitted via the shortest path, which leads to a minimal number of hops to the BS. The clustering mechanism and the assignment of nodes to BSs help find the shortest path to the BS, which decreases the data delay in the network.
The energy consumption is measured during the packet transmission for each approach, as shown in Figure 5. As demonstrated, the total energy consumption of our solution and of the OPEN algorithm are nearly equal and lower than that of the LEACH algorithm. This is a consequence of the residual-energy-based mechanism used to send data packets in our approach and in the OPEN algorithm.
To study the energy consumption behavior of nodes in the network, we measure the deviation of the residual energies of all nodes during the data transmission, as shown in Figure 6. As illustrated, during the first data transmissions, the deviation of residual energies for our approach is higher, but it then converges to zero. In contrast, the deviation of residual energies for the other algorithms fluctuates and does not converge to zero. Consequently, the network lifetime is decreased for the others, especially for the OPEN algorithm, whose total energy consumption is nearly equal to ours: equal energy consumption does not imply an increased network lifetime.
To examine the effect on the network lifetime, we determine the number of alive nodes while data is sent from sources to BSs, as shown in Figure 7. The percentage of alive nodes for our approach remains at 100% as the number of packets increases and decreases to 97% only after a higher number of packets; this is a result of an extended network lifetime. In contrast, the percentage of alive nodes decreases rapidly to 89% and 92% for the LEACH algorithm and the OPEN algorithm, respectively.
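The two lifetime indicators discussed above can be computed from the list of residual energies with a few lines of Python. This is a minimal sketch under stated assumptions: the "deviation" is taken to be the population standard deviation, and a node is considered alive while its residual energy is above a threshold of zero. Neither definition is stated explicitly in the text, and the function names are hypothetical.

```python
import statistics


def energy_deviation(residual):
    """Population standard deviation of the residual energies (assumed metric)."""
    return statistics.pstdev(residual)


def alive_percentage(residual, threshold=0.0):
    """Percentage of nodes whose residual energy is above `threshold`."""
    alive = sum(1 for e in residual if e > threshold)
    return 100.0 * alive / len(residual)
```

A deviation close to zero means the remaining energy is spread evenly across nodes, so no single node is exhausted long before the others.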

Conclusion
In this paper, we proposed a new approach to optimize the data flow in a constrained IoT network, where the devices are constrained in terms of energy, storage capacity, and processing capacity. The approach allows these devices to make their own decisions based on the state of the data flow in the network. The nodes are grouped using the DP clustering algorithm, which is designed for large scale. The nodes are then assigned to BSs based on centrality and distance. After that, the nodes perform a real-time path plan to determine the optimized path for transmitting data from source nodes to the BSs. The results show the efficiency of our approach compared to other approaches in terms of the number of hops to CH nodes, the execution time, the network lifetime, and the number of alive nodes.
The current work can benefit researchers and practitioners in developing and creating applications designed for lossy networks. Furthermore, systems for smart agriculture, smart industry, smart cities, and so on can exploit the data generated by the deployed sensors to create new services for end-users and applications.
In the future, we plan to incorporate a caching system in the constrained IoT network, where the nodes can store the data of source nodes without sending it to the BSs, so that it can be exploited by other nodes without requesting the central entities.