Pairwise relationships are prevalent in real life. For example, friendships between people, communication links between computers and pairwise similarity of images. Networks provide a way to represent a group of relationships. The entities in question are represented as network nodes and the pairwise relations as edges. In real network data, there are often missing edges between nodes. This can be due to a bug or deficiency in the data collection process, a lack of resources to collect all pairwise relations or simply there is uncertainty about those relationships. Analysis performed on incomplete networks with missing edges can bias the final output, e.g., if we want to find the shortest path between two cities in a road network, but we are missing information of major highways between these cities, then no algorithm will able to find this actual shortest path. Furthermore, we might want to predict if an edge will form between two nodes in the future. For example, in disease transmission networks, if health authorities determine a high likelihood of a transmission edge forming between an infected and uninfected person, then the authorities might wish to vaccinate the uninfected person. In this way, being able to predict (and correct for) missing edges is an important task. Your task: In this project, you will be learning from a training network and trying to predict whether edges exist among test node pairs. The training network is a fragment of the academic co-authorship graph. The nodes in the network—authors— have been given randomly assigned IDs, and an undirected edge between node A and B represents that authors A and B have published a paper together as co-authors. The training network is a subgraph of the entire network, focussing on individuals in a specific academic subcommunity. The test data is a list of 2,000 edges, and your task is to predict if each of those test edges are really edges in the authorship network or are fake ones. 1,000 of these test edges are real and withheld from the training network, while the other 1,000 do not actually exist. To make the project fun, we will run it as a Kaggle in-class competition. Your assessment will be partially based on your final ranking in the privately-held competition, partially based on your absolute performance and partially based on your report.