Author: Jingjing Zhang

Using Machine Learning to Measure Intrinsic and Synergistic Information Flows

Context

Quantifying the causal flow of information between different components of a system is an important task for many natural and engineered systems, such as neural, genetic, transportation and social networks. A well-established metric of the information flow between two time sequences X and Y that has been widely applied for this purpose is the information-theoretic measure of Transfer Entropy (TE). The TE from X to Y equals the mutual information between the past of sequence X and the current value of Y at time t when conditioning on the past of Y. However, the TE has limitations as a measure of intrinsic, or exclusive, information flow from sequence X to sequence Y. In fact, as pointed out in this paper, the TE captures not only the amount of information on the current value of Y that is contained in the past of X in addition to that already present in the past of Y, but also the information about the current value of Y that is obtained only when combining the past of both X and Y. Only the first type of information flow may be defined as intrinsic, while the second can be thought of as a synergistic flow of information involving both sequences.

In the same paper, the authors propose to decompose the TE as the sum of an Intrinsic TE (ITE) and a Synergistic TE (STE), and introduce a measure of the ITE based on cryptography. The idea is to measure the ITE as the size (in bits) of a secret key that can be generated by two parties, one holding the past of sequence X and the other the current value of Y, via public communication, when the adversary has the past of sequence Y.
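To make the distinction concrete, the following is a minimal sketch, not taken from the paper, of a plug-in TE estimate for binary sequences with a history of one sample. The function name transfer_entropy and the two toy processes (a copy process and an XOR process) are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


def transfer_entropy(x, y):
    """Plug-in estimate, in bits, of I(X_{t-1}; Y_t | Y_{t-1}) for discrete sequences."""
    triples = Counter((x[t - 1], y[t - 1], y[t]) for t in range(1, len(y)))
    n = sum(triples.values())
    xy_past = Counter()  # counts of (x_{t-1}, y_{t-1})
    y_pair = Counter()   # counts of (y_{t-1}, y_t)
    y_past = Counter()   # counts of y_{t-1}
    for (xp, yp, yc), c in triples.items():
        xy_past[(xp, yp)] += c
        y_pair[(yp, yc)] += c
        y_past[yp] += c
    te = 0.0
    for (xp, yp, yc), c in triples.items():
        p_full = c / xy_past[(xp, yp)]          # p(y_t | x_{t-1}, y_{t-1})
        p_self = y_pair[(yp, yc)] / y_past[yp]  # p(y_t | y_{t-1})
        te += (c / n) * math.log2(p_full / p_self)
    return te


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20000)]

# Copy process (assumed toy model): Y_t = X_{t-1}, so the past of X alone determines Y_t.
y_copy = [0] + x[:-1]

# XOR process (assumed toy model): Y_t = X_{t-1} XOR Y_{t-1}, so the past of X
# is informative about Y_t only when combined with the past of Y.
y_xor = [0]
for t in range(1, len(x)):
    y_xor.append(x[t - 1] ^ y_xor[t - 1])

te_copy = transfer_entropy(x, y_copy)
te_xor = transfer_entropy(x, y_xor)
print(f"TE copy: {te_copy:.3f} bits, TE XOR: {te_xor:.3f} bits")
```

Both estimates come out close to 1 bit, even though the flow in the copy process is intrinsic while the flow in the XOR process arises only from combining the two pasts. The TE alone cannot tell the two cases apart, which is the motivation for the ITE/STE decomposition.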

The computation of the ITE is generally intractable. In recent work, we proposed an estimator of the ITE, referred to as the ITE Neural Estimator (ITENE), that is based on a variational bound on the KL divergence, two-sample neural network classifiers, and the pathwise estimator of Monte Carlo gradients.
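As a rough illustration of what a variational bound on the KL divergence looks like, the sketch below evaluates the Donsker-Varadhan lower bound from samples. This is an assumption made for illustration: the text above does not state which bound ITENE uses, and here the critic function is fixed in closed form for two Gaussians rather than learned by a neural network classifier. The name dv_bound is hypothetical.

```python
import math
import random


def dv_bound(critic, samples_p, samples_q):
    """Donsker-Varadhan lower bound on KL(P || Q), in nats, for a given critic."""
    mean_p = sum(critic(s) for s in samples_p) / len(samples_p)
    mean_exp_q = sum(math.exp(critic(s)) for s in samples_q) / len(samples_q)
    return mean_p - math.log(mean_exp_q)


rng = random.Random(1)
samples_p = [rng.gauss(1.0, 1.0) for _ in range(100000)]  # P = N(1, 1), assumed toy choice
samples_q = [rng.gauss(0.0, 1.0) for _ in range(100000)]  # Q = N(0, 1), assumed toy choice

# For these two Gaussians the log density ratio is log p(x)/q(x) = x - 0.5,
# and the true divergence is KL(P || Q) = 0.5 nats.
kl_optimal = dv_bound(lambda s: s - 0.5, samples_p, samples_q)

# A mismatched critic still gives a valid, but looser, lower bound.
kl_loose = dv_bound(lambda s: 0.5 * s, samples_p, samples_q)

print(f"optimal critic: {kl_optimal:.3f} nats, mismatched critic: {kl_loose:.3f} nats")
```

The bound is tight when the critic equals the log density ratio, which is the quantity a two-sample classifier can be trained to approximate; any other critic yields a smaller value.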

 

Some Results

We first apply the proposed estimator to the following toy example. The joint processes are generated according to

for some threshold λ, where variables are independent and identically distributed as .  Intuitively, for large values of the threshold λ, there is no information flow between the two sequences, while for small values, there is a purely intrinsic flow of information. For intermediate values of λ, the information flow is partly synergistic, since knowing the past of both sequences is instrumental in obtaining information about the current value of the target sequence. As illustrated in Fig. 1, the results obtained from the estimator are consistent with this intuition.

Figure 1

 

Figure 2

For a real-world example, we apply the estimators at hand to historical data of the values of the Hang Seng Index (HSI) and of the Dow Jones Industrial Average (DJIA) between 1990 and 2011 (see Fig. 2). As illustrated in Fig. 3, both the TE and ITE from the DJIA to the HSI are much larger than in the reverse direction, implying that the DJIA influenced the HSI more significantly than the other way around for the given time range. Furthermore, we observe that not all the information flow is estimated to be intrinsic, and hence the joint observation of the history of the DJIA and of the HSI is partly responsible for the predictability of the HSI from the DJIA.

Figure 3

The full paper will be presented at the 2020 International Zurich Seminar on Information and Communication and can be found here.

On the Interplay Between Coded Distributed Inference and Transmission in Mobile Edge Computing Systems

Problem

Introduced by the European Telecommunications Standards Institute (ETSI), the concept of mobile edge computing is by now established as a pillar of the 5G network architecture and an enabler of computation-intensive applications on mobile devices. As illustrated in the figure, with mobile edge computing, users offload local data to edge servers connected to wireless Edge Nodes (ENs). The ENs in turn carry out the necessary computations and return the desired output to the users on the wireless downlink.

As a baseline application, assume that each user wishes to compute a linear function Wx of a local data vector x, e.g., an image taken by the user’s camera, and a network-side model matrix W. Each EN acquires the users’ local data points x through uplink transmission at runtime, while the matrix W can be pre-stored at the ENs offline. Matrix W is generally large and hence it is split across the servers of multiple ENs. After the computing phase, the ENs transmit the computed outputs back to the users in the downlink.
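The split of the computation across ENs can be sketched as follows. This is a minimal illustration with a toy matrix and two ENs; the helper names matvec and split_rows, and the choice of contiguous row blocks, are assumptions rather than details from the paper.

```python
def matvec(rows, x):
    """Multiply a matrix, given as a list of rows, by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]


def split_rows(matrix, num_ens):
    """Partition the rows of the matrix into num_ens contiguous blocks, one per EN."""
    base, extra = divmod(len(matrix), num_ens)
    blocks, start = [], 0
    for i in range(num_ens):
        size = base + (1 if i < extra else 0)
        blocks.append(matrix[start:start + size])
        start += size
    return blocks


W = [[1, 0, 2], [0, 1, 1], [3, 1, 0], [2, 2, 2], [1, 1, 1]]  # toy model matrix
x = [1, 2, 3]                                                  # toy user data vector

blocks = split_rows(W, num_ens=2)               # offline: each EN stores one block of rows
partial = [matvec(b, x) for b in blocks]        # runtime: each EN computes its share of Wx
result = [v for part in partial for v in part]  # user side: concatenate the downlink outputs
print(result)
```

Concatenating the per-EN outputs gives the same vector as computing Wx in one place, which is why the matrix can be distributed without changing the result.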

Linear operations of the type illustrated above are of practical importance. For example, they underlie the implementation of recommendation systems based on collaborative filtering, or similarity searches based on the cosine distance. In both cases, the user-side data is a vector x that embeds the user profile or a query, and the goal is to search through the matrix of all items on the basis of the inner products between the rows of matrix W and the user data x.
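A similarity search of this kind can be sketched in a few lines. The function name top_k_cosine and the toy item embeddings are illustrative assumptions; the sketch also assumes that no row of W and no query is the all-zero vector.

```python
import math


def top_k_cosine(W, x, k):
    """Return the indices of the k rows of W with the largest cosine similarity to x."""
    x_norm = math.sqrt(sum(v * v for v in x))
    scores = []
    for idx, row in enumerate(W):
        inner = sum(w * v for w, v in zip(row, x))     # one entry of the product Wx
        row_norm = math.sqrt(sum(w * w for w in row))  # assumed nonzero
        scores.append((inner / (row_norm * x_norm), idx))
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]


items = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]  # toy item embeddings (rows of W)
query = [0.6, 0.4]                                         # toy user profile or query x
best = top_k_cosine(items, query, k=2)
print(best)
```

If the rows of W are normalized offline, the ranking depends only on the inner products Wx, so the runtime work at the ENs reduces to the linear operation described above.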

In the presence of storage redundancy, matrix W can be stored at the ENs in uncoded or coded form. In the first case, the rows of the matrix are duplicated across different ENs. As a result, the ENs can transmit any shared computed output back to the users using cooperative transmission techniques. In contrast, with coding, no cooperative transmission is possible, but downlink transmission can start as soon as a subset of ENs has completed its computations. The main question is: How should one balance the robustness to straggling ENs afforded by coding with the cooperative downlink transmission advantages of uncoded repetition storage in order to reduce the overall computation-plus-communication latency?
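The coded option can be illustrated with the simplest possible example, assumed here for illustration: a (3, 2) MDS code over the reals in which W is split into two blocks W1 and W2, and three ENs store W1, W2 and W1 + W2. The EN labels and the decode helper are hypothetical.

```python
def matvec(rows, x):
    """Multiply a matrix, given as a list of rows, by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]


W1 = [[1, 2], [3, 4]]  # first block of rows of W (toy values)
W2 = [[5, 6], [7, 8]]  # second block of rows of W (toy values)
x = [1, 2]             # toy user data vector

# Offline storage: two systematic blocks and one parity block.
stored = {
    "EN1": W1,
    "EN2": W2,
    "EN3": [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W2)],
}


def decode(outputs):
    """Recover (W1 x, W2 x) from the outputs of any two of the three ENs."""
    if "EN1" in outputs and "EN2" in outputs:
        return outputs["EN1"], outputs["EN2"]
    if "EN1" in outputs:
        return outputs["EN1"], [s - a for s, a in zip(outputs["EN3"], outputs["EN1"])]
    return [s - b for s, b in zip(outputs["EN3"], outputs["EN2"])], outputs["EN2"]


# Suppose EN1 is a straggler: only EN2 and EN3 have finished computing.
outputs = {name: matvec(stored[name], x) for name in ("EN2", "EN3")}
top, bottom = decode(outputs)
print(top + bottom)
```

Any two outputs suffice, so the user never waits for the slowest EN. The price is that no two ENs hold the same output, so they cannot transmit it cooperatively.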

Some Results

Our work investigates three approaches: Uncoded Storage and Computing (UC), MDS coded Storage and Computing (MC), and a proposed Hybrid Scheme (HS) that concatenates an MDS code with a repetition code. The main contribution of this research is to demonstrate that HS is able to combine the robustness to stragglers afforded by MC and the cooperative downlink transmission advantages of UC.
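The robustness to stragglers afforded by coding can be sketched with a small Monte Carlo experiment. The model below is an assumption made for illustration and is not the model used in the paper: computing times are independent exponentials with unit mean, only the computing phase is simulated, and the downlink (where uncoded storage gains from cooperation) is ignored. The function name mean_completion_time is hypothetical.

```python
import random


def mean_completion_time(num_ens, num_needed, trials, rng):
    """Average time until num_needed out of num_ens ENs have finished computing."""
    total = 0.0
    for _ in range(trials):
        # Assumed model: i.i.d. exponential computing times with unit mean.
        times = sorted(rng.expovariate(1.0) for _ in range(num_ens))
        total += times[num_needed - 1]
    return total / trials


rng = random.Random(2)
# Without coding across ENs, every EN holding a distinct part must finish.
wait_all = mean_completion_time(num_ens=6, num_needed=6, trials=20000, rng=rng)
# With an MDS code, the outputs of the fastest 4 out of 6 ENs are enough.
wait_four = mean_completion_time(num_ens=6, num_needed=4, trials=20000, rng=rng)
print(f"wait for all 6: {wait_all:.2f}, wait for fastest 4: {wait_four:.2f}")
```

Under this assumed model, waiting for the fastest four ENs takes well under half the time of waiting for all six on average. A hybrid scheme can spend part of the storage redundancy on this effect and the rest on repetition, which enables cooperative downlink transmission.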

To illustrate this point, consider the figure, where we plot the overall communication-plus-computation latency as a function of the ratio γ between the communication and computation latencies. The variability in the computing times is defined by a parameter η. It is observed that, as γ increases, the total latencies of both UC and MC grow linearly. When the variability in the computing times of the ENs is high, here for η=0.8, MDS coding for the most part outperforms the UC scheme due to its robustness to stragglers. The exception is when γ is large enough, in which case downlink transmission latency becomes dominant and the UC scheme can benefit from redundant computations via cooperative EN communication. In contrast, when the computing times have low variability, here for η=8, MDS coding is uniformly outperformed by the UC scheme. The proposed hybrid coding strategy is seen to be effective in trading off computation and communication latencies by controlling the balance between robustness to stragglers and cooperative opportunities.

The full paper can be found at IEEE Xplore (open access: arXiv)

Combining Cloud and Edge Processing for Optimal Wireless Content Delivery

Problem

Content delivery is one of the most important use cases for mobile broadband services in 5G networks. As seen in Fig. 1, in 5G systems, content can potentially be stored at distributed units, or edge nodes (ENs), and hence closer to the user, with the aim of minimizing delivery latency and network congestion. Furthermore, a cloud processor, also known as the central unit, typically has access to the content library and connects to the ENs via finite-capacity fronthaul links. The central unit is not only necessary to enable content delivery when the overall edge cache capacity is insufficient, but it can also foster cooperative transmission from the ENs to the users by sharing common information with the ENs. However, any transmission from the cloud unit to the ENs comes at a latency cost due to the use of fronthaul links. How should edge and fronthaul resources be optimally combined to minimize delivery latency?

In a recent work just published in IEEE Transactions on Information Theory, we provided a conclusive answer to this question by taking an information-theoretic viewpoint, and making the following simplifying assumptions:

1) only uncoded edge caching is allowed;
2) the cloud can only send fractions of contents via the fronthaul links;
3) the ENs are constrained to use standard linear precoding on the wireless channel;
4) the signal-to-noise ratio is sufficiently large.

Some Results

Our work derives a caching and delivery policy that is able to offer a near-optimal trade-off between fronthaul latency overhead and downlink transmission latency from the ENs to the users. Two key scenarios are identified that depend on key system parameters such as fronthaul capacity, edge cache capacity, and number of antennas per edge node:

1) When the overall cache capacity of the ENs is smaller than a given threshold, as illustrated in Fig. 2, it is necessary to use both fronthaul and edge caching resources in order to minimize latency. Importantly, even when the edge resources alone would be sufficient to deliver all requested contents, it is generally necessary to make use of fronthaul resources in order to foster cooperative transmission among the ENs. In fact, when the fronthaul capacity is sufficiently large, the latency cost caused by the fronthaul delay does not offset the cooperative transmission gains in the downlink;

2) Otherwise, when the edge cache capacity is above the given threshold, as seen in Fig. 2, only edge caching should be used. Under this condition, the gains due to enhanced EN cooperation do not overcome the latency associated with fronthaul transmission. Interestingly, the threshold on the edge cache capacity increases as the number of antennas at the ENs increases, since edge processing becomes more effective when more antennas are deployed.

The full paper can be found at IEEE Xplore (open access: arXiv)