1.022 | Fall 2018 | Undergraduate

Introduction to Network Models

Final Project

General Description

Below is a general description of the final course project. We would prefer students to work individually but teams of two are also acceptable. For the project, you may either choose to work on the default topic of controlling epidemics or you may propose your own topic of interest.

The project consists of three main parts, covering both theoretical and numerical analysis of a real-world application of networks, and culminates in a report and a short presentation. Regardless of your choice of topic, your project must include the following:

Part I

Identify a network dataset of interest. Download it, build the network using Python, and conduct a descriptive analysis. You should visualize the network and compute some of the network summary statistics we have discussed in class. This should include degree distribution, diameter, average shortest path length, clustering coefficient, and centralities. Include as many descriptive statistics as necessary to convey the characteristics and structure of your chosen network.
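As a rough illustration of the statistics listed above, the sketch below computes them for a small made-up edge list using only the Python standard library. The edge list, the function names, and the choice of degree centrality as the example centrality are assumptions for illustration. Your own dataset will need its own loading step, and a graph library such as NetworkX offers equivalent built-in routines.

```python
from collections import Counter, deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (diameter, average shortest path length) of a connected graph."""
    n = len(adj)
    total, longest = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected; analyze one component at a time")
        total += sum(dist.values())
        longest = max(longest, max(dist.values()))
    return longest, total / (n * (n - 1))


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of node that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


# Made-up toy network: a triangle A-B-C with a tail C-D-E.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = build_graph(toy_edges)
diameter, avg_path = path_statistics(graph)
print("degree distribution:", degree_distribution(graph))
print("diameter:", diameter, "average path length:", avg_path)
print("average clustering:", average_clustering(graph))
print("degree centrality:", degree_centrality(graph))
```

The all-pairs breadth-first search is fine for small graphs but scales poorly, so for a large network you may prefer to sample source nodes or rely on a library implementation.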

Part II

Formulate a research question tailored to your dataset. This should be the thesis of your final report and should describe exactly what you are interested in studying or answering. This part should also include a description of the mathematical model you intend to use to answer your research question. This could be a model you have learned in the course or one taken from a textbook or paper related to your research question.¹

Part III

Implement a strategy or solution to your research question posed in Part II and report your results and findings. This should be the main section of your final report.

For the default epidemic project we will provide the dataset, the research question, the model that you should use for the spread of a virus over networks, and the list of the tasks you should perform for each part of the project. You may alternatively propose your own project of interest, contingent on our approval.

For details on the default epidemic project, see Controlling Epidemics on Networks. It will give you an idea of the type of research question and the deliverables expected.

For details on how to propose your own topic of interest together with some suggestions regarding the alternative topics, see Proposing Your Own Application of Interest.

Grading Scheme

The most important part of the project will, of course, be your final report, so make sure it is complete and understandable. Complete all three parts listed above and try to draw some interesting conclusions after finishing Part III (i.e., don’t just list a bunch of results, but provide a clear answer to the question you posed in Part II). You will also be graded on the clarity of your presentation. Remember, this project is worth 40% of your grade, so please do not leave it to the last minute.

¹ You need to provide proper citation.

The default application for the project is “Controlling Epidemics on Networks.” In this project, we consider the problem of epidemics spreading on networks and how to prevent outbreaks. You will use a global network of flight connections as your dataset. The original dataset was made available by Tore Opsahl and is discussed in the blog post:

We will provide you with two CSV files that contain the airports (nodes) and flights (edges) that make up the network. The research question of interest (Part II) is: given a virus that can spread throughout the network and a model of spread on networks (also described in Part II), what immunization strategy is most effective at stopping or slowing its spread (i.e., which nodes should be vaccinated if you can vaccinate only a limited subset)? The goal of Part III is to develop, implement, and compare several immunization strategies to find the most effective one. The report should describe the various strategies and their performance on the dataset.

For further detail, see Controlling Epidemics on Networks (PDF).
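To make the idea of comparing immunization strategies concrete, here is a minimal sketch. It assumes a simple discrete-time SIR process in which each infected node gets one step to infect each susceptible neighbor with probability `beta` and then recovers. That model, the star-shaped toy network, and all function names are assumptions for illustration; the model you should actually use is the one specified in the project handout.

```python
import random


def top_degree_nodes(adj, budget):
    """Immunize the `budget` nodes of highest degree (ties broken by node label)."""
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), str(v)))
    return set(ranked[:budget])


def random_nodes(adj, budget, rng):
    """Immunize `budget` nodes chosen uniformly at random."""
    return set(rng.sample(sorted(adj, key=str), budget))


def simulate_sir(adj, immunized, seed_node, beta, rng):
    """Run one outbreak and return the number of nodes that were ever infected."""
    if seed_node in immunized:
        return 0
    infected = {seed_node}
    ever_infected = {seed_node}
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v in ever_infected or v in immunized:
                    continue  # recovered, currently infected, or vaccinated
                if rng.random() < beta:
                    newly_infected.add(v)
        ever_infected |= newly_infected
        infected = newly_infected  # the previous generation recovers
    return len(ever_infected)


def mean_outbreak_size(adj, immunized, beta, runs, rng):
    """Average outbreak size over `runs` outbreaks seeded at random unvaccinated nodes."""
    candidates = [v for v in sorted(adj, key=str) if v not in immunized]
    total = 0
    for _ in range(runs):
        total += simulate_sir(adj, immunized, rng.choice(candidates), beta, rng)
    return total / runs


# Made-up toy network: a hub (node 0) connected to five leaves.
star = {0: set(range(1, 6))}
for leaf in range(1, 6):
    star[leaf] = {0}

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
no_immunization = mean_outbreak_size(star, set(), 1.0, 20, rng)
hub_immunized = mean_outbreak_size(star, top_degree_nodes(star, 1), 1.0, 20, rng)
random_immunized = mean_outbreak_size(star, random_nodes(star, 1, rng), 0.5, 200, rng)
print("no immunization:", no_immunization)
print("hub immunized:", hub_immunized)
print("one random node immunized (beta=0.5):", random_immunized)
```

On this toy network, vaccinating the hub disconnects every leaf, so an outbreak never spreads beyond its seed. On the flight network the gap between strategies will be less extreme, which is why averaging over many simulated outbreaks matters.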

Default Project Data.zip

You have the option of proposing your own application of interest for the project. To do so, you will first need to write a short one-page proposal:

  • Briefly describe the application you have in mind and the model you want to use (this could be a reference to a paper on the topic from which you want to take the model).
  • Specify the dataset you want to work with (with the link to the dataset).
  • Discuss what you want to do with the data.

You should email your proposal to both the Teaching Assistant and the instructor by Lecture 18. Be sure to specify your research question (Part II) and your proposed approach (Part III). Don’t worry about the bumps that you may face by going with your own project; we will help you get it done. If the findings are novel and interesting, they may lead to a publication, and you can continue working on the project as a paid job at the MIT Institute for Data, Systems, and Society (IDSS) over the summer.

Here are some papers (and ideas based on them) that may help you come up with an idea for your proposal:

Analyze inter-bank connections to understand how the network (for example, its degree distribution, centralities, …) changes with the state of the economy (e.g., in a volatile economy like 2008 versus a more stable economy in the years before). Consider the systemic risk in the network, such as how defaults would spread from bank to bank. Does it depend on the centrality of the defaulting banks or on the underlying network structure? A good database for this idea is provided by the International Monetary Fund website.

Given some novel information and a social network, which individuals would you inform in order to maximize its spread if you can only contact a few people? First define an underlying model of diffusion for a network and then compare several seeding strategies for different realistic networks. Compare their performance and its dependence on the underlying network structure and diffusion model. Does it make sense to use centrality-based measures, or to exploit the friendship paradox? How do these strategies compare to random seeding?
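The friendship paradox mentioned above (a random neighbor of a random node tends to have higher degree than the random node itself) suggests a seeding rule that needs no global knowledge of the network. The sketch below, on a made-up star network and with hypothetical function names, compares the two average degrees and implements both random seeding and neighbor-based seeding. The diffusion model itself is left out, since its choice is part of the project.

```python
import random


def mean_degree(adj):
    """Average degree of a uniformly random node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def mean_neighbor_degree(adj):
    """Average degree of a random neighbor of a random node (isolated nodes skipped)."""
    per_node = [
        sum(len(adj[v]) for v in nbrs) / len(nbrs) for nbrs in adj.values() if nbrs
    ]
    return sum(per_node) / len(per_node)


def random_seeds(adj, k, rng):
    """Baseline: choose k seeds uniformly at random."""
    return set(rng.sample(sorted(adj), k))


def friend_seeds(adj, k, rng):
    """Pick a random node, then seed one of its random neighbors; repeat until k seeds."""
    reachable = [v for v in adj if adj[v]]  # any non-isolated node is someone's neighbor
    if k > len(reachable):
        raise ValueError("k exceeds the number of nodes that have neighbors")
    nodes = sorted(reachable)
    seeds = set()
    while len(seeds) < k:
        u = rng.choice(nodes)
        seeds.add(rng.choice(sorted(adj[u])))
    return seeds


# Made-up toy network: a hub (node 0) connected to five leaves.
star = {0: set(range(1, 6))}
for leaf in range(1, 6):
    star[leaf] = {0}

rng = random.Random(0)  # fixed seed for reproducibility
print("mean degree:", mean_degree(star))
print("mean neighbor degree:", mean_neighbor_degree(star))
print("random seeds:", random_seeds(star, 2, rng))
print("friend seeds:", friend_seeds(star, 2, rng))
```

On the star, the mean degree is 10/6 while the mean neighbor degree is 26/6, so neighbor-based seeding reaches the hub far more often than uniform sampling does.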

Consider a rumor that originates at a single node and spreads through a network. If you only observe the end result of the spread (which nodes end up hearing the rumor), can you come up with an algorithm to determine which node initiated the rumor? This paper proposes an algorithm based on what they call rumor centrality. See if you can apply it to some realistic or simulated networks. Propose alternative algorithms and test their performance.
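As a starting point for experiments, the sketch below scores each node of an infected tree with the tree-case formula usually given for rumor centrality, R(v) = n! / ∏ T_u, where T_u is the size of the subtree rooted at u when the tree is rooted at v. Treat the formula as an assumption to check against the paper before relying on it. The path network and function names are made up, and general (non-tree) networks need an extra step, such as working on a spanning tree, that is not shown here.

```python
from math import factorial


def subtree_sizes(tree, root):
    """Size of the subtree under every node when the tree is rooted at `root`."""
    parent = {root: None}
    order = []  # nodes in depth-first preorder
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in tree[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    size = {u: 1 for u in order}
    # Visit descendants before ancestors so each size is complete when added.
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]
    return size


def rumor_centrality(tree, node):
    """n! divided by the product of all subtree sizes, with the tree rooted at `node`."""
    product = 1
    for s in subtree_sizes(tree, node).values():
        product *= s
    return factorial(len(tree)) // product


def estimate_source(tree):
    """Return the node with the largest rumor centrality (ties broken by label)."""
    return max(sorted(tree), key=lambda v: rumor_centrality(tree, v))


# Made-up infected tree: a path a - b - c - d - e.
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
for v in sorted(path):
    print(v, rumor_centrality(path, v))
print("estimated source:", estimate_source(path))
```

Each score counts the orders in which the rumor could have reached every node starting from that candidate source, which is why the middle of the path scores highest.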

You can also look at the datasets available at the following databases to come up with an idea:

Timeline

Each individual (or group, if you are working with someone else) will have to write and submit a report along with their Python code.¹ The report should describe the motivation of your project, the network dataset, the descriptive information from Part I, the research question, model, and thesis from Part II, and finally your approach and results from Part III. The Part I descriptive write-up will be due before the final report, in order to make sure everyone is on track. You will also have to give a 15-minute presentation on your project during the last week of class. The deadlines are as follows:

  • Lecture 21: Part I descriptive write-up.
  • Lecture 25: Oral Presentation (15 minutes).
  • One week after Lecture 25: Final project report (including all code).

¹ You may use other programming languages, though Python is preferred.
