Adaptive Learning-Based Delay-Sensitive and Secure Edge-End Collaboration for Multi-Mode Low-Carbon Power IoT

2022-08-22 03:06:56 Haijun Liao, Zehan Jia, Ruiqiuyu Wang, Zhenyu Zhou, Fei Wang, Dongsheng Han, Guangyuan Xu, Zhenti Wang, Yan Qin

China Communications, 2022, Issue 7

Haijun Liao, Zehan Jia, Ruiqiuyu Wang, Zhenyu Zhou*, Fei Wang, Dongsheng Han, Guangyuan Xu, Zhenti Wang, Yan Qin

1 Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, Hebei, China

2 State Grid Anhui Electric Power Company Chuzhou Power Supply Company, Chuzhou 239000, Anhui, China

Abstract: Multi-mode power internet of things (PIoT) combines various communication media to provide spatio-temporal coverage for low-carbon operation in smart parks. Edge-end collaboration is a feasible way to fully utilize heterogeneous resources and resist eavesdropping. However, edge-end collaboration-based multi-mode PIoT faces the challenges of mutual contradiction between communication and security quality of service (QoS) guarantees, inadaptability of resource management, and multi-mode access conflict. We propose an Adaptive learning based delAy-sensitive and seCure Edge-End Collaboration algorithm (ACE2) to optimize multi-mode channel selection and split device power between artificial noise (AN) transmission and data transmission for secure data delivery. ACE2 achieves multi-attribute QoS guarantee, adaptive resource management and security enhancement, and access conflict elimination through the combined power of deep actor-critic (DAC), the “win or learn fast (WoLF)” mechanism, and edge-end collaboration. Simulations demonstrate its superior performance in queuing delay, energy consumption, secrecy capacity, and adaptability to differentiated low-carbon services.

Keywords: multi-mode low-carbon PIoT; edge-end collaboration; multi-attribute QoS guarantee; security enhancement; adaptive deep actor-critic

I. INTRODUCTION

Smart parks, which integrate rooftop photovoltaics, distributed energy storage, and controllable loads, can achieve low-carbon operation through bidirectional energy interaction with the power grid [1]. To support renewable energy utilization, real-time energy management, and electricity-carbon computing, massive power internet of things (PIoT) devices are deployed in smart parks to collect data on voltage, current, active/reactive power, CO2, CH4, O3, water, gas, heat, etc. [2, 3]. Considering the access and data transmission requirements of PIoT devices [4], as well as the complex network topology of smart parks, it is necessary to construct multi-mode PIoT by combining various existing communication media, including alternating current/direct current (AC/DC) power line communication (PLC), micro-power wireless communication, and wireless local area network (WLAN), to provide full spatio-temporal coverage [5, 6]. However, the spectrum resources and channel features of different communication media are markedly heterogeneous. For example, the switching of distributed renewable energy introduces electromagnetic interference (EMI) to PLC, while micro-power wireless communication and WLAN are susceptible to eavesdropping attacks [7]. Besides, the operation of low-carbon services, as well as the real-time transmission and processing of low-carbon data, poses stringent requirements on multi-attribute quality of service (QoS) in terms of delay, energy consumption, secrecy capacity, and so on [8], which cannot be satisfied by existing PIoT technologies without multi-mode heterogeneous networking.

Edge-end collaboration provides a feasible solution by integrating edge intelligence and device-side ubiquitous perception capability [9–11]. Based on the data processing results and performance feedback provided by edge-computing gateways, device-side channel selection can be optimized according to differentiated service requirements and the dynamic characteristics of communication media to achieve efficient utilization of multi-mode heterogeneous resources [12, 13]. Besides, edge-end collaboration can be combined with advanced security mechanisms to defend against attacks. For example, eavesdropping attacks can be countered by emitting artificial noise (AN) to suppress the data reception of eavesdroppers and improve secrecy capability [14, 15]. However, research on edge-end collaborative multi-mode PIoT for low-carbon smart parks is in its infancy. The major technical challenges are introduced below.

Mutual contradiction in joint communication and security QoS guarantee: Simultaneous improvement of communication and security performance is constrained by the limited resources of PIoT devices. For example, allocating more transmission power to AN-based anti-eavesdropping inevitably degrades communication throughput and delay performance. Therefore, the limited power resources should be intelligently split between AN transmission and data transmission to balance communication and security QoS.

Inadaptability of resource management to low-carbon services: It is unrealistic to obtain the global state information (GSI) of the overall network in a smart park due to the wide-range infrastructure distribution, complex asset attributes, and differentiated geographic features. Traditional resource management methods based on GSI lead to an enormous deviation between theoretical optimization performance and practical application, and thus cannot support the reliable operation of low-carbon services such as carbon footprint monitoring, load monitoring for automatic control, and electricity spot market.

Multi-mode access conflict due to the lack of edge-end collaboration: Access conflict occurs when massive PIoT devices compete for the same channel, which leads to high latency and low resource utilization. Traditional device-oriented distributed channel selection schemes cannot perceive the access behaviors of other devices or eliminate access conflict due to the lack of unified coordination and optimization. On the other hand, edge-oriented centralized channel selection optimization suffers from the curse of dimensionality.

Edge-end collaboration based heterogeneous network resource management has attracted intensive attention. In [16, 17], a joint optimization algorithm of task offloading, task scheduling, and resource allocation was proposed for edge-end collaborative 5G communication networks, but the integration of multi-mode wired and wireless communication resources was not considered. In [18], Filomeno et al. addressed the power allocation problem in an edge-end collaboration based hybrid PLC and wireless system to maximize the sum transmission rate. Only a single QoS metric is optimized, which cannot provide multi-attribute QoS guarantee for low-carbon services. In [19], Wu et al. proposed a game-theoretical approach for energy-efficient task offloading in edge-end collaboration based heterogeneous networks to balance energy consumption and delay. However, the aforementioned works rely on perfect knowledge of GSI and do not consider eavesdropping attacks in multi-mode data transmission.

Deep reinforcement learning (DRL) provides an effective solution for intelligent resource management by integrating the complex environment learning capability of deep learning and the sequential decision making optimization of reinforcement learning [20, 21]. DRL has been widely studied under incomplete GSI to guarantee multi-attribute communication and security QoS. In [22], Dinh et al. proposed a blockchain and multi-agent DRL-based channel selection and power allocation optimization algorithm to achieve a high degree of security and trust. A DRL-based task offloading approach for delay-sensitive services was also developed to reduce communication delay and improve security. However, these works have not considered the heterogeneity of multi-mode wired and wireless communication resources, or the EMI caused by renewable energy switching. Moreover, they lack adaptability to differentiated low-carbon services because the learning rate of DRL is not adjusted dynamically in accordance with performance feedback and environment interaction.

Motivated by the aforementioned challenges, we address the problem of how to support multi-mode heterogeneous networking in low-carbon PIoT through delay-sensitive and secure edge-end collaboration [23]. The objective is to jointly maximize network secrecy capacity and minimize device energy consumption and queuing delay under the constraint of device-side queue stability. First, the long-term stochastic joint optimization problem is transformed into a series of short-term subproblems of joint channel selection and power split by minimizing the upper bound of a drift-minus-reward term. Next, we propose an Adaptive learning based delAy-sensitive and seCure Edge-End Collaboration algorithm (ACE2), which allows each device to solve the transformed subproblems in parallel with the assistance of the gateway. Finally, ACE2 is compared with state-of-the-art algorithms through extensive simulations to demonstrate its superior performance in terms of queuing delay, energy consumption, secrecy capacity, and adaptability to differentiated low-carbon services.

The main contributions are summarized as follows.

· Multi-attribute QoS guarantee for low-carbon services: ACE2 optimizes the weighted difference among secrecy capacity, energy consumption, and queuing delay. In particular, the weights of the optimization objectives can be adjusted in accordance with the multi-attribute QoS requirements of differentiated low-carbon services.

· Adaptive intelligent resource management optimization and security enhancement: ACE2 combines actor neural network based decision making and critic neural network based system state evaluation to learn the optimal joint optimization policy based on the performance feedback provided by the edge-computing gateway. Moreover, ACE2 intelligently splits device power between AN transmission and data transmission to enhance data delivery security.

· Edge-end collaboration-based multi-mode access conflict elimination: Access conflict caused by massive devices competing for the same multi-mode channel is eliminated through slot-level device-gateway interaction. The edge-computing gateway compares the device-side action-state values and grants the channel to the device with the maximum one.

The rest of this paper is organized as follows. Section II and Section III introduce the system model and problem formulation, respectively. ACE2 is elaborated in Section IV. Section V presents the simulation results. Section VI concludes the paper.

II. SYSTEM MODEL

An edge-end collaborative multi-mode low-carbon PIoT network is shown in Figure 1. It consists of I devices and an edge-computing gateway. The devices are deployed in photovoltaic (PV) panels and charging piles of smart parks to collect data. The device set is U = {u1, ···, ui, ···, uI}. The gateway interacts with devices through multi-mode communications such as PLC, WLAN, and micro-power wireless communications, and uploads data to low-carbon service platforms through the 5G networks of telecom operators or the private network of the power grid corporation. There are J multi-mode channels, including J1 PLC channels, J2 WLAN channels, and J3 micro-power wireless channels, i.e., J1 + J2 + J3 = J. The channel set is defined as C = {c1, ···, cj, ···, cJ}, where cj, j = 1, ···, J1, are PLC channels, cj, j = J1 + 1, ···, J1 + J2, are WLAN channels, and cj, j = J1 + J2 + 1, ···, J, are micro-power wireless channels. The heterogeneity of multi-mode communication media lies in channel quality, energy consumption, and secrecy [24]. On one hand, PLC channels suffer from severe EMI introduced by the switching of distributed renewable energy. On the other hand, WLAN provides the highest transmission rate, and micro-power wireless communication consumes the least energy. Besides, WLAN and micro-power wireless channels are more susceptible to eavesdropping attacks than PLC [25]. An attacker launches an active eavesdropping attack by proactively transmitting noise to interfere with data transmission in the legitimate channel between devices and the gateway [26]. Devices transmit AN in the null space of the legitimate channel to achieve anti-eavesdropping [14].
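The channel indexing described above can be illustrated with a small helper. This is only a sketch: the function name `channel_medium` and the string labels are hypothetical, and only the index ranges are taken from the text.

```python
def channel_medium(j: int, J1: int, J2: int, J3: int) -> str:
    """Return the medium of channel c_j (1-indexed), using the ordering in the text:
    the first J1 channels are PLC, the next J2 are WLAN, the last J3 are micro-power wireless."""
    if not 1 <= j <= J1 + J2 + J3:
        raise ValueError("channel index out of range")
    if j <= J1:
        return "PLC"
    if j <= J1 + J2:
        return "WLAN"
    return "micro-power wireless"
```

With five channels per medium, for instance, `channel_medium(6, 5, 5, 5)` falls in the WLAN range.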

Figure 1. Multi-mode low-carbon PIoT network.

There are T time slots. The slot length is τ, and the slot set is T = {1, ···, t, ···, T} [27]. In each slot, each device collects Ai(t) amount of data and jointly optimizes channel selection and power split for task offloading. Then, access conflicts among devices are eliminated by edge-end collaboration. An example is shown in Figure 1. u1 selects PLC channel c1 with severe EMI, and all of its transmission power is allocated to data transmission. Both u5 and u6 first select c5. Then, with edge-end collaboration, u6 selects c6 and allocates part of its power to AN-based anti-eavesdropping. u7 selects micro-power wireless channel c7 for energy saving.
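To make the power-split trade-off concrete, the following is a minimal sketch of how secrecy capacity could vary with the split. It is not the secure transmission model of the paper: it assumes a fraction `rho` of the power carries data and the remainder carries AN, that AN in the null space of the legitimate channel degrades only the eavesdropper, and it ignores the noise transmitted by the active attacker. All parameter names are hypothetical.

```python
import math

def secrecy_capacity(p_max, rho, g_legit, g_eve, noise, bandwidth=1.0):
    """Secrecy capacity for a power split rho in [0, 1] (illustrative assumptions only).

    rho * p_max is used for data and (1 - rho) * p_max for artificial noise (AN).
    AN is assumed to be invisible to the gateway and to act as interference
    at the eavesdropper.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    p_data = rho * p_max
    p_an = (1.0 - rho) * p_max
    snr_legit = p_data * g_legit / noise
    sinr_eve = p_data * g_eve / (noise + p_an * g_eve)
    rate_legit = bandwidth * math.log2(1.0 + snr_legit)
    rate_eve = bandwidth * math.log2(1.0 + sinr_eve)
    # Secrecy capacity is the non-negative rate gap between gateway and eavesdropper.
    return max(rate_legit - rate_eve, 0.0)
```

Under these assumptions, putting all power into data (`rho = 1`) yields zero secrecy capacity whenever the eavesdropper's channel gain exceeds the legitimate one, while an intermediate split can make it positive at the cost of a lower data rate.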

Figure 2. Algorithm structure of ACE2.

2.1 Device-Side Data Queue Model

The collected data of ui are stored in a local buffer, which is modeled as a device-side data queue Hi(t). The queue backlog is updated as

where Ui(t) is the throughput in the t-th slot.
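A minimal sketch of such a backlog update is given below. It assumes the commonly used form in which the queue is first drained by the throughput, floored at zero, and then increased by the new arrivals; the exact expression in the paper may differ, and the function name is hypothetical.

```python
def update_queue(backlog: float, throughput: float, arrival: float) -> float:
    """One-slot update of the device-side data queue H_i(t).

    Assumed form: H_i(t+1) = max(H_i(t) - U_i(t), 0) + A_i(t).
    """
    return max(backlog - throughput, 0.0) + arrival
```

The floor at zero reflects that a device cannot transmit more data than it has buffered.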

2.2 Secure Transmission Model

2.3 Device-Side Energy Consumption Model

2.4 Device-Side Queuing Delay Model

The queuing delay of ui is defined as the ratio of the average queue length to the average data arrival rate [29], which is given by
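The verbal definition above can be sketched as follows. The code assumes that both averages are plain time averages over the slots observed so far, which the text does not state explicitly; with backlog measured in bits and arrivals in bits per slot, the result is expressed in slots.

```python
def queuing_delay(backlogs, arrivals):
    """Ratio of the time-averaged queue length to the time-averaged arrival rate."""
    if not backlogs or len(backlogs) != len(arrivals):
        raise ValueError("need two non-empty sequences of equal length")
    mean_backlog = sum(backlogs) / len(backlogs)
    mean_arrival = sum(arrivals) / len(arrivals)
    if mean_arrival == 0:
        raise ValueError("average arrival rate must be positive")
    return mean_backlog / mean_arrival
```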

III. PROBLEM FORMULATION

This paper addresses the problem of how to support multi-mode heterogeneous networking in low-carbon PIoT through delay-sensitive and secure edge-end collaboration. The objective is to jointly maximize network secrecy capacity and minimize device energy consumption and queuing delay under the constraint of device-side queue stability. The joint optimization problem of channel selection and power split is formulated as

The long-term stochastic optimization problem P1 is NP-hard. To provide a tractable solution, P1 is transformed into a series of short-term subproblems based on Lyapunov optimization [30]. H(t) is the set of data queues in the t-th slot. Define a drift-minus-reward term to maximize the optimization objective of P1 under the queue stability constraint, which is given by

where V is a weight to trade off “queue stability” and “reward maximization”. ΔL(H(t)) is the Lyapunov drift to parameterize queue fluctuation, which is the conditional expected difference of the Lyapunov function L(H(t)) between two adjacent slots. L(H(t)) is defined as the quadratic sum of queue backlogs. η(t) denotes the short-term optimization objective, i.e.,
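For reference, the Lyapunov quantities described in words above are conventionally written as follows. This is a sketch based on the verbal definitions only; the factor 1/2 and the exact form of the conditioning are standard conventions in Lyapunov optimization and are assumed here rather than taken from the paper.

```latex
\begin{align}
L(\mathbf{H}(t)) &= \frac{1}{2}\sum_{i=1}^{I} H_i^2(t), \\
\Delta L(\mathbf{H}(t)) &= \mathbb{E}\left[ L(\mathbf{H}(t+1)) - L(\mathbf{H}(t)) \,\middle|\, \mathbf{H}(t) \right], \\
\mathrm{DMR}_V(\mathbf{H}(t)) &= \Delta L(\mathbf{H}(t)) - V\,\mathbb{E}\left[ \eta(t) \,\middle|\, \mathbf{H}(t) \right].
\end{align}
```

A larger V places more emphasis on the reward η(t), while a smaller V favors keeping the queue backlogs small.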

Theorem 1. The upper bound of DMRV(H(t)) is derived as [30]

where C(t) is irrelevant to η(t).

Proof. See Appendix VI.

Therefore, the upper bound of DMRV(H(t)) derived in (15) is minimized under constraint C1 in the t-th slot. The slot-by-slot short-term subproblem is formulated as

P2 can be further decomposed into I subproblems solved by each device in parallel. Define the combination of channel selection and power splitting variables as Xi(t) = {xi,1(t), ···, xi,j(t), ···, xi,J(t), ρi(t)}. The i-th subproblem solved by ui is formulated as

However, due to the lack of centralized coordination, the individual optimization carried out by each device results in access conflict when more than one device selects the same channel.

IV. ACE2: ADAPTIVE LEARNING-BASED DELAY-SENSITIVE AND SECURE EDGE-END COLLABORATION

In this section, we propose ACE2 to solve P3. First, P3 is modeled as a Markov decision process (MDP) problem. The fundamental elements are introduced as follows.

1) State: The state space Si(t) consists of the device-side queue backlog, the empirical task arrival rate, and the information of the (t - 1)-th slot, including channel selection, power splitting, and secrecy capacity.

2) Action: The action space is defined as the set of optimization variables of P3, i.e., Xi(t).

3) Reward: Define the reward as the optimization objective of P3, i.e., Γ(Xi(t)).

Algorithm 1. ACE2.
1: Input: Si(t).
2: Output: Xi(t).
3: For time slot t = 1, ···, T do
4:   Phase 1: Device-Side Action Drawing
5:   Each ui ∈ U draws an action x*i(t) based on policy π(Si(t), xi(t)|θi) and obtains Q(Si(t), x*i(t)|ωi).
6:   Phase 2: Edge-Side Coordination
7:   Each ui ∈ U transmits Q(Si(t), x*i(t)|ωi) and x*i(t) to the gateway.
8:   If multiple devices select the same channel then
9:     Gateway assigns the channel to the device with the maximum Q(Si(t), x*i(t)|ωi) and rejects the other devices.
10:    Rejected devices redraw their actions.
11:    Return to line 8.
12:  end if
13:  Phase 3: Device-Side Learning Rate Adjustment
14:  Each ui ∈ U executes action x*i(t) and calculates reward Γ(Xi(t)).
15:  Update the average mixed policy π̄(Si(t), xi(t)|θ̄i) as (21) and adjust learning rates as (19).
16:  Transfer to the next state Si(t + 1), update the queue backlog Hi(t + 1) as (1), and obtain Q(Si(t + 1)|ωi).
17:  Phase 4: Device-Side Network Updating
18:  Calculate TD error φi(t) as (22).
19:  Update actor neural network and critic neural network as (23) and (24).
20: end for
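The edge-side coordination in Phase 2 can be sketched as below. This is an illustrative simplification rather than the gateway procedure of the paper: it assumes that rejected devices are told which channels are already granted and redraw among the remaining ones, and that there are at least as many channels as devices. The names `coordinate`, `greedy`, and `q_table`, as well as the Q values, are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# A draw function returns (channel, q_value) given the set of channels already granted.
DrawFn = Callable[[Set[int]], Tuple[int, float]]

def coordinate(draw_fns: Dict[str, DrawFn], max_rounds: int = 100) -> Dict[str, int]:
    """Grant each contested channel to the requesting device with the largest Q value."""
    granted: Dict[str, int] = {}   # device -> channel
    pending = set(draw_fns)        # devices that still need a channel
    for _ in range(max_rounds):
        if not pending:
            break
        taken = set(granted.values())
        # Pending devices (re)draw an action and report (channel, Q) to the gateway.
        requests: Dict[int, List[Tuple[float, str]]] = {}
        for dev in sorted(pending):
            channel, q = draw_fns[dev](taken)
            requests.setdefault(channel, []).append((q, dev))
        for channel, bids in requests.items():
            if channel in taken:
                continue           # channel already granted; requesters stay pending
            _, winner = max(bids)  # device with the maximum Q value wins
            granted[winner] = channel
            pending.discard(winner)
    return granted

def greedy(q_row: Dict[int, float]) -> DrawFn:
    """Hypothetical stand-in for the actor: pick the free channel with the largest Q."""
    def draw(taken: Set[int]) -> Tuple[int, float]:
        free = {c: q for c, q in q_row.items() if c not in taken}
        channel = max(free, key=free.get)
        return channel, free[channel]
    return draw

# Two devices that both prefer channel 5, as in the Figure 1 example.
q_table = {"u5": {5: 0.9, 6: 0.4}, "u6": {5: 0.7, 6: 0.6}}
assignment = coordinate({dev: greedy(row) for dev, row in q_table.items()})
```

In this toy run both devices first request channel 5; the device with the larger Q value keeps it and the other redraws and obtains channel 6.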

Due to network uncertainties such as channel gain, EMI power, and eavesdropping information, it is nontrivial to derive the MDP transition probability accurately. We propose ACE2, which combines device-side deep actor-critic (DAC) and edge-side coordination to learn channel selection and power split optimization without the MDP transition probability. To further improve convergence speed and learning adaptability, ACE2 exploits “win or learn fast (WoLF)” to dynamically adjust learning rates based on edge-side performance feedback. The algorithm structure of ACE2 is shown in Figure 2, which includes four parts, i.e., 1) device-side action drawing, 2) edge-side coordination, 3) device-side learning rate adjustment, and 4) device-side network updating. On the device side, each device possesses a policy-based actor neural network and a value-based critic neural network. The actor network interacts with the environment and draws actions according to the policy, while the critic network evaluates the current policy and guides policy updating. On the edge side, the gateway eliminates access conflicts by comparing device-side action-state values. Detailed implementation procedures are summarized in Algorithm 1 and introduced as follows.
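The learning rate adjustment and the temporal-difference (TD) error can be sketched with the textbook WoLF and actor-critic formulations. These are not equations (19), (21), and (22) of the paper; the tabular policy representation, the function names, and the rates `lr_win` and `lr_lose` are assumptions made for illustration.

```python
def wolf_learning_rate(policy, avg_policy, q_values, lr_win, lr_lose):
    """WoLF principle: learn cautiously when winning, learn fast when losing.

    The agent is 'winning' when its current policy has a higher expected
    value than its long-run average policy under the same Q estimates.
    """
    expected = sum(p * q for p, q in zip(policy, q_values))
    expected_avg = sum(p * q for p, q in zip(avg_policy, q_values))
    return lr_win if expected > expected_avg else lr_lose

def update_average_policy(avg_policy, policy, visits):
    """Incremental mean of the policies seen so far (visits counts the current one)."""
    return [a + (p - a) / visits for a, p in zip(avg_policy, policy)]

def td_error(reward, q_next, q_current, discount):
    """One-step TD error used to update both the critic and the actor."""
    return reward + discount * q_next - q_current
```

Typically `lr_lose` is chosen larger than `lr_win`, so that a device adapts quickly after the environment changes, for example when an attack starts on a channel, and settles down once its policy performs well again.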

Figure 3. System performances versus time slots: (a) Queuing delay; (b) Energy consumption; (c) Secrecy capacity.

Figure 4. Queuing delay and secrecy capacity of different services.

Figure 5. Energy consumption of different services.

Figure 6. Selection probability of multi-mode channels under different services.

Figure 7. Average service utility versus time slot.

V. SIMULATION RESULTS

In this paper, we consider an edge-end collaborative multi-mode low-carbon PIoT network, which contains 15 PIoT devices for three low-carbon services. ui, i = 1, ···, 5, are deployed for carbon footprint monitoring, ui, i = 6, ···, 10, are deployed for load monitoring for automatic control, and ui, i = 11, ···, 15, are deployed for electricity spot market. Carbon footprint monitoring has the largest energy consumption weight due to its stringent requirement on energy consumption. Load monitoring for automatic control has the largest weights of delay and secrecy capacity due to its strict requirements on delay and secrecy capacity. Electricity spot market has the largest data arrival, i.e., Ai(t) ∈ [0.8, 1.2] Mbits. There are 5 PLC channels, 5 WLAN channels, and 5 micro-power wireless channels. The maximum transmission powers on PLC, WLAN, and micro-power wireless channels are set to 0.2 W, 0.1 W, and 0.01 W, respectively. In the simulation, an attacker is assumed to randomly eavesdrop on a WLAN channel when t = 101 to 200 and on a micro-power wireless channel when t = 201 to 300. The other simulation parameter settings are shown in Table 1. Two state-of-the-art algorithms are employed for comparison. The first one is the actor-critic-based power allocation algorithm (ACPA) against active eavesdroppers [32]. The second one is the WoLF policy hill climbing-based power allocation algorithm (WoLF-PHC) against smart attacks [33]. Edge-end collaborative channel selection is not considered in either ACPA or WoLF-PHC.

Table 1. Simulation parameters.

Figures 3a-3c show the queuing delay, energy consumption, and secrecy capacity versus time slots. When t = 1 to 100, the queuing delay and energy consumption of the three algorithms gradually decrease and the secrecy capacity increases due to continuous learning. When t = 101 to 200, system performance degrades since part of the transmission power is split for anti-eavesdropping on the WLAN channel, thereby increasing queuing delay and energy consumption. When t = 201 to 300, energy consumption and secrecy capacity improve due to the gains of continuous learning and the smaller impact of the attack on the micro-power wireless channel. Compared with ACPA and WoLF-PHC, ACE2 reduces queuing delay by 57.96% and 61.81%, reduces energy consumption by 14.96% and 18.66%, and improves secrecy capacity by 5.57% and 7.21%, respectively. On one hand, edge-end collaboration-based access conflict elimination and learning rate adjustment adapt resource management optimization to the multi-attribute QoS requirements of low-carbon services, thereby reducing queuing delay and energy consumption. On the other hand, the additional consideration of multi-mode channel selection improves secrecy capacity by switching to non-eavesdropped channels.

Figure 4 and Figure 5 show the queuing delay, secrecy capacity, and energy consumption of different services. ACE2 provides multi-attribute QoS guarantee for low-carbon services by consuming the least energy for carbon footprint monitoring and achieving the smallest queuing delay and largest secrecy capacity for load monitoring. Taking carbon footprint monitoring as an example, compared with ACPA and WoLF-PHC, ACE2 reduces energy consumption by 18.53% and 21.39%, respectively. The reasons are further explained in Figure 6.

Figure 6 shows the selection probability of multi-mode channels for the low-carbon services. ACE2 selects the micro-power wireless channel for carbon footprint monitoring with a 79.40% probability due to its lower energy consumption, the PLC channel for load monitoring with an 88.07% probability due to its large secrecy capacity, and the WLAN channel for electricity spot market with an 85.07% probability due to its large transmission rate. ACPA and WoLF-PHC cannot achieve adaptive intelligent resource management and security enhancement because edge-end collaborative channel selection is not considered.

Figure 7 shows the average service utility versus time slots, which is defined as the weighted difference among secrecy capacity, energy consumption, and queuing delay. Compared with WoLF-PHC, ACE2 increases the average service utility by 34.5%. Compared with ACE2 without WoLF and ACE2 without edge-end collaboration, the performance gains of learning rate adjustment and edge-end collaborative access conflict elimination are 14.6% and 37.1%, respectively.
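A service utility of this kind can be sketched as a weighted difference. The weight values below are purely hypothetical, chosen only to mirror the qualitative description of the three services, and any normalization of the three metrics used in the paper is not reproduced here.

```python
def service_utility(secrecy_capacity, energy, delay, w_secrecy, w_energy, w_delay):
    """Weighted difference: reward secrecy capacity, penalize energy and delay."""
    return w_secrecy * secrecy_capacity - w_energy * energy - w_delay * delay

# Hypothetical per-service weights (not the values used in the paper).
SERVICE_WEIGHTS = {
    "carbon footprint monitoring": {"w_secrecy": 1.0, "w_energy": 3.0, "w_delay": 1.0},
    "load monitoring": {"w_secrecy": 3.0, "w_energy": 1.0, "w_delay": 3.0},
    "electricity spot market": {"w_secrecy": 1.0, "w_energy": 1.0, "w_delay": 1.0},
}

def utility_for(service, secrecy_capacity, energy, delay):
    """Evaluate the utility of one service with its own weight profile."""
    return service_utility(secrecy_capacity, energy, delay, **SERVICE_WEIGHTS[service])
```

Changing the weight profile is what lets the same learning procedure favor low energy for one service and low delay with high secrecy for another.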

VI. CONCLUSION

In this paper, we addressed the problem of how to support multi-mode heterogeneous networking in low-carbon PIoT. We proposed an adaptive learning-based delay-sensitive and secure edge-end collaboration algorithm named ACE2 to provide multi-attribute QoS guarantee for low-carbon services, security enhancement, and multi-mode access conflict elimination. Compared with ACPA and WoLF-PHC, ACE2 reduces queuing delay by 57.96% and 61.81%, reduces energy consumption by 14.96% and 18.66%, and improves secrecy capacity by 5.57% and 7.21%, respectively. Three low-carbon services were employed to verify that ACE2 can adapt resource management to differentiated low-carbon services. In the future, multi-layer multi-timescale resource scheduling in cloud-edge-end collaborative low-carbon IoT networks will be considered.

ACKNOWLEDGEMENT

This work was supported by the Science and Technology Project of State Grid Corporation of China under Grant Number 52094021N010 (5400-202199534A-0-5-ZN).