On the Interplay Between Coded Distributed Inference and Transmission in Mobile Edge Computing Systems


Introduced by the European Telecommunications Standards Institute (ETSI), the concept of mobile edge computing is by now established as a pillar of the 5G network architecture and an enabler of computation-intensive applications on mobile devices. As illustrated in the figure, with mobile edge computing, users offload local data to edge servers connected to wireless Edge Nodes (ENs). The ENs in turn carry out the necessary computations and return the desired output to the users on the wireless downlink.

As a baseline application, assume that each user wishes to compute a linear function Wx of a local data vector x, e.g., an image taken by the user’s camera, and a network-side model matrix W. Each EN acquires the users’ local data points x through uplink transmission at runtime, while the matrix W can be pre-stored at the ENs offline. Matrix W is generally large and hence it is split across the servers of multiple ENs. After the computing phase, the ENs transmit the computed outputs back to the users in the downlink.
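The row-wise split described above can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions: the helper names `split_rows` and `edge_compute`, the matrix sizes, and the use of three ENs are all hypothetical, and uplink and downlink transmissions are abstracted away.

```python
import numpy as np

def split_rows(W, num_ens):
    """Partition the rows of W into num_ens contiguous blocks, one per EN."""
    return np.array_split(W, num_ens, axis=0)

def edge_compute(blocks, x):
    """Each EN multiplies its stored block by the uploaded vector x."""
    return [block @ x for block in blocks]

rng = np.random.default_rng(0)
W = rng.standard_normal((12, 5))  # model matrix, pre-stored offline (assumed size)
x = rng.standard_normal(5)        # user data, uploaded at runtime

blocks = split_rows(W, num_ens=3)
partial_outputs = edge_compute(blocks, x)

# The user reassembles the outputs received on the downlink.
y = np.concatenate(partial_outputs)
```

With no redundancy, every EN must finish before the user can recover the full product, which is why slow ENs directly affect the overall latency.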

Linear operations of the type illustrated above are of practical importance. For example, they underlie the implementation of recommendation systems based on collaborative filtering, or similarity searches based on the cosine distance. In both cases, the user-side data is a vector x that embeds the user profile or a query, and the goal is to search through the matrix of all items on the basis of the inner products between the rows of matrix W and the user data x.
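As a small illustration of the similarity-search use case, the sketch below ranks the rows of W by cosine similarity with x. The function name `top_k_cosine` and the toy item matrix are assumptions made for the example, and all rows of W are assumed to be nonzero.

```python
import numpy as np

def top_k_cosine(W, x, k):
    """Return the indices of the k rows of W most similar to x, plus all scores."""
    row_norms = np.linalg.norm(W, axis=1)
    # The inner products W @ x are the linear operation computed at the edge.
    scores = (W @ x) / (row_norms * np.linalg.norm(x))
    return np.argsort(-scores)[:k], scores

# Toy item matrix (one item per row) and a query vector.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
x = np.array([2.0, 0.1])

idx, scores = top_k_cosine(W, x, k=2)
```

The only operation that involves the full matrix is the product W @ x; the normalization and the ranking are cheap by comparison.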

In the presence of storage redundancy, matrix W can be stored at the ENs in uncoded or coded form. In the first case, the rows of the matrix are duplicated across different ENs. As a result, the ENs can transmit any shared computed output back to the users using cooperative transmission techniques. In contrast, with coding, no cooperative transmission is possible, but downlink transmission can start as soon as a subset of ENs has completed its computations. The main question is: How should one balance the robustness to straggling ENs afforded by coding with the cooperative downlink transmission advantages of uncoded repetition storage in order to reduce the overall computation-plus-communication latency?

Some Results

Our work investigates three approaches: Uncoded Storage and Computing (UC), MDS coded Storage and Computing (MC), and a proposed Hybrid Scheme (HS) that concatenates an MDS code with a repetition code. The main contribution of this research is to demonstrate that HS is able to combine the robustness to stragglers afforded by MC and the cooperative downlink transmission advantages of UC.
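To make the straggler robustness of MDS coded computing concrete, here is a minimal sketch in which W is split into k row blocks and n > k coded blocks are stored at n ENs, so that the outputs of any k ENs suffice to recover Wx. The real-valued Vandermonde generator, the helper names `mds_encode` and `mds_decode`, and the sizes are assumptions chosen for illustration, not the construction used in the paper. The number of rows of W is assumed to be divisible by k.

```python
import numpy as np

def mds_encode(W, n, k):
    """Split W into k row blocks and form n coded blocks (one per EN)."""
    blocks = np.stack(np.split(W, k, axis=0))   # shape (k, rows/k, cols)
    nodes = np.arange(1, n + 1, dtype=float)    # distinct evaluation points
    G = np.vander(nodes, k, increasing=True)    # (n, k); any k rows are invertible
    coded = np.tensordot(G, blocks, axes=1)     # shape (n, rows/k, cols)
    return G, coded

def mds_decode(G, outputs, finished):
    """Recover W @ x from the outputs of any k ENs that have finished."""
    G_s = G[finished]                               # (k, k) sub-generator
    Y_s = np.stack([outputs[i] for i in finished])  # (k, rows/k) coded outputs
    Y = np.linalg.solve(G_s, Y_s)                   # outputs of the uncoded blocks
    return Y.reshape(-1)

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

G, coded = mds_encode(W, n=5, k=3)
outputs = {i: coded[i] @ x for i in range(5)}  # what each EN would compute
finished = [0, 2, 4]                           # ENs 1 and 3 are stragglers
y = mds_decode(G, outputs, finished)
```

Each EN holds a different linear combination of the blocks, so no two ENs share a computed output. This is why the coded scheme gives up cooperative downlink transmission in exchange for not having to wait for the slowest ENs. A real Vandermonde matrix becomes ill-conditioned as n grows, so this sketch is only suitable for small examples.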

To illustrate this point, consider the figure where we plot overall communication-plus-computation latency as a function of the ratio γ between the communication and computation latencies. The variability in the computing times is defined by a parameter η. It is observed that as γ increases, the total latencies of both UC and MC grow linearly. When the variability in the computing times of the ENs is high, which here corresponds to η=0.8, MDS coding for the most part outperforms the UC scheme due to its robustness to stragglers. This holds unless γ is large enough, in which case downlink transmission latency becomes dominant and the UC scheme can benefit from redundant computations via cooperative EN communication. In contrast, when the computing times have low variability, here for η=8, MDS coding is uniformly outperformed by the UC scheme. The proposed hybrid coding strategy is seen to be effective in trading off computation and communication latencies by controlling the balance between robustness to stragglers and cooperative opportunities.

The full paper can be found at ieeexplore (open access: arxiv)  

Combining Cloud and Edge Processing for Optimal Wireless Content Delivery


Content delivery is one of the most important use cases for mobile broadband services in 5G networks. As seen in Fig. 1, in 5G systems, content can potentially be stored at distributed units, or edge nodes (ENs), and hence closer to the user, with the aim of minimizing delivery latency and network congestion. Furthermore, a cloud processor, also known as central unit, typically has access to the content library and connects to the ENs via finite-capacity fronthaul links. The central unit is not only necessary to enable content delivery when the overall edge cache capacity is insufficient, but it can also foster cooperative transmission from the ENs to the users by sharing common information with the ENs. However, any transmission from the cloud unit to the ENs comes at a latency cost due to the use of fronthaul links. How should edge and fronthaul resources be optimally combined to minimize delivery latency?

In a recent work just published in IEEE Transactions on Information Theory, we provided a conclusive answer to this question by taking an information-theoretic viewpoint, and making the following simplifying assumptions:

1) only uncoded edge caching is allowed;
2) the cloud can only send fractions of contents via the fronthaul links;
3) the ENs are constrained to use standard linear precoding on the wireless channel;
4) the signal-to-noise ratio is sufficiently large.

Some Results

Our work derives a caching and delivery policy that is able to offer a near-optimal trade-off between fronthaul latency overhead and downlink transmission latency from the ENs to the users. Two key scenarios are identified, depending on system parameters such as fronthaul capacity, edge cache capacity, and number of antennas per edge node:

1) When the overall cache capacity of the ENs is smaller than a given threshold, as illustrated in Fig. 2, it is necessary to use both fronthaul and edge caching resources in order to minimize latency. Importantly, even when the edge resources alone would be sufficient to deliver all requested contents, the policy generally requires the use of fronthaul resources in order to foster EN cooperative transmission. In fact, when the fronthaul capacity is sufficiently large, the latency cost caused by a fronthaul delay does not offset the cooperative transmission gains in the downlink;

2) Otherwise, when edge cache capacity is above the given threshold, as seen in Fig. 2, only edge caching should be used. Under this condition, the gains due to enhanced EN cooperation do not overcome the latency associated with fronthaul transmission. Interestingly, the threshold on the edge cache capacity increases as the number of ENs’ antennas increases, since edge processing becomes more effective when more antennas are deployed.

The full paper can be found at ieeexplore (open access: arxiv)

How can heterogeneous 5G services coexist on a shared Fog-Radio architecture?


Figure 1: A Fog-Radio Architecture with coexisting 5G services (URLLC and eMBB)

In 5G, Ultra-Reliable Low-Latency Communications (URLLC), catering to use cases such as vehicular-to-cellular communications and Industry 4.0, and enhanced Mobile Broadband (eMBB), with its support of applications such as virtual reality, will share the same radio interface and network architecture. The 5G network architecture will be fog-like (see Fig. 1), enabling a flexible split of network functionalities between cloud and edge nodes. The cloud generally enables centralised processing, but at the cost of an increased latency for fronthaul transfer, while the edge can provide low-latency feedback but subject to the constraints of local processing.

This raises the following questions:

  • How should radio resources be shared between the two services?
  • How should the URLLC and eMBB network slices be configured?

A Novel Solution

In a recent work just published in IEEE Access, we proposed a novel solution, illustrated in Fig. 1, whereby

  • Baseband processing is carried out at the edge for the URLLC slice, hence ensuring low latency, and centrally at the Base Band Unit (BBU) as in a C-RAN for the eMBB slice, with the aim of increasing spectral efficiency;
  • eMBB and URLLC services can share the same radio resources in a non-orthogonal fashion, an approach we define as Heterogeneous Non-Orthogonal Multiple Access (H-NOMA).

Towards the goal of managing the interference between URLLC and eMBB packets arising from H-NOMA, we consider a number of practical approaches in order of complexity. For the uplink, we have:

  • Treating URLLC interference as noise: each edge node forwards both eMBB and URLLC signals to the BBU, where the eMBB signal is decoded while treating the URLLC signal as noise;
  • Puncturing: each edge node discards the received eMBB signal whenever a URLLC user is transmitting;
  • Successive Interference Cancellation (SIC): each edge node decodes and cancels the URLLC signal before transmitting only the eMBB signal to the cloud.
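To give a rough feel for how the three uplink strategies differ, the sketch below compares the average eMBB rate on a single Gaussian link with unit noise power, where a URLLC user is active with probability q. This toy model is an assumption made for illustration: it ignores the fronthaul, multi-cell processing, and URLLC decoding errors, and it is not the system model analyzed in the paper. The function name and parameters are hypothetical.

```python
import math

def embb_uplink_rate(snr_embb, snr_urllc, q, strategy):
    """Average eMBB rate (bits per channel use) in a toy single-link model.

    q is the probability that a URLLC user is transmitting.
    """
    clean = math.log2(1 + snr_embb)  # rate when no URLLC user is active
    if strategy == "treat_as_noise":
        # The URLLC signal adds to the noise floor whenever it is present.
        interfered = math.log2(1 + snr_embb / (1 + snr_urllc))
        return (1 - q) * clean + q * interfered
    if strategy == "puncturing":
        # The eMBB signal is discarded whenever a URLLC user is active.
        return (1 - q) * clean
    if strategy == "sic":
        # Idealized: the URLLC signal is always decoded and cancelled.
        return clean
    raise ValueError(f"unknown strategy: {strategy}")

rates = {
    s: embb_uplink_rate(snr_embb=10.0, snr_urllc=5.0, q=0.3, strategy=s)
    for s in ("treat_as_noise", "puncturing", "sic")
}
```

In this idealized setting, successive interference cancellation upper-bounds the other two strategies, and puncturing loses the most as q grows, since the eMBB signal is dropped whenever URLLC traffic is present.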

And for the downlink we consider:

  • Superposition coding: each edge node transmits a superposition of both eMBB and URLLC signals to the corresponding users;
  • Puncturing: each edge node discards the eMBB signal whenever a URLLC signal is generated at the edge node.

It is noted that there is no counterpart of successive interference cancellation for the downlink.

Some Results

Figure 2

To give a taste of the results in the paper, we now provide an example. In Fig. 2, we plot the eMBB average per-cell sum-rates (black curves) and URLLC per-cell outage capacity (red curves) for the uplink as a function of the URLLC activation probability. The latter is a measure of the URLLC traffic load. In general, the results demonstrate the potential advantages of H-NOMA for both services, especially when the URLLC traffic load is sufficiently large and successive interference cancellation is enabled at the edge nodes.

Link to our paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8612914

Hello, world!

Welcome to the King’s Centre for Learning and Information processing research blog.

We’re excited to share with you our findings in the future!