Collaborative Research Network Analysis: A Case Study of Harvard’s Biomedical Research Community


The high quality of medical care in our society is built upon scientific knowledge generated by medical research. While the literature examining citation and co-author networks across academic fields has grown, important questions remain open, including community detection and time series-based analysis of networks (Newman [2004]). Motivated by these open questions, we seek to better understand the role that academic networks and greater collaboration within the biomedical research field play in enabling successful research. For this project, we focus on the research community at Harvard Medical School and organize our analyses around the following core research questions:

  • Does stronger collaboration in the biomedical research co-author network translate into greater impact?
  • What effects, if any, did the NIH’s Clinical and Translational Science Award Program (fully implemented in 2012) have on advancing collaborative research?
  • In what ways do the examined collaboration networks’ centrality measurements and community structures change pre-2012 and post-2012?
  • What is the nature of community structures in the biomedical research field? Is there inter- or intra-field research?
  • Do researchers in the biomedical network tend to collaborate in an intra- or inter-departmental manner?

To answer these research questions, we scraped data from two sources: the Harvard Catalyst, a repository for all Harvard faculty, and the Web of Science. The data used for this project include the title, year, journal of publication, author names, affiliation (research area), institution, and citation information for papers from 2003 to 2017. Having organized and combined these files, we built a co-author network and worked with several adjacency matrix versions of the data: (1) the full (weighted and undirected) network, (2) the full (weighted and undirected) network without isolated nodes (i.e., nodes of degree 0), (3) the previous two networks with author labels included, and (4) a subset of the data that includes only papers with at least two authors on the Harvard Catalyst Principal Investigator list.
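The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual pipeline: the function names (`build_coauthor_weights`, `drop_isolated`), the author labels, and the toy paper list are all hypothetical, and we assume each paper is simply a list of author names.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_weights(papers, investigators):
    """Count shared papers for every pair of listed investigators.

    Returns a Counter mapping a sorted (author, author) pair to the number
    of papers the pair co-authored, i.e. a weighted, undirected edge list.
    """
    weights = Counter()
    for authors in papers:
        # Keep only authors on the investigator list, without duplicates.
        listed = sorted(set(authors) & investigators)
        if len(listed) < 2:
            continue  # fewer than two listed authors: the paper adds no edge
        for a, b in combinations(listed, 2):
            weights[(a, b)] += 1
    return weights


def drop_isolated(investigators, weights):
    """Return the investigators that appear in at least one edge."""
    connected = {name for pair in weights for name in pair}
    return investigators & connected


# Toy data (hypothetical): "X" is not on the investigator list.
toy_papers = [["A", "B", "C"], ["A", "B"], ["C", "X"], ["D"]]
toy_investigators = {"A", "B", "C", "D", "E"}

toy_weights = build_coauthor_weights(toy_papers, toy_investigators)
toy_connected = drop_isolated(toy_investigators, toy_weights)
```

In this toy example, A and B share two papers, so their edge has weight 2, while D and E have no listed co-authors and are dropped as isolated nodes.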

We then took several approaches to analyze the networks. First, a regression analysis was conducted to evaluate the relationship between collaboration (assessed by degree centrality) and scientific impact. Next, 16 communities were detected using Louvain modularity optimization. Last, a temporal analysis of network collaboration patterns allowed us to examine the potential effect of the NIH Clinical and Translational Science Award Program, enacted in 2012, on advancing collaboration in this biomedical co-author network. We ran a Welch two-sample t-test on the average edge density for the pre-period (2009-2012) and post-period (2013-2016), and a two-sample Kolmogorov-Smirnov test comparing the degree distributions of the 5024 PIs in the pre-period and the post-period.
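For reference, the quantities behind the two tests can be written out with the Python standard library alone. The sketch below is illustrative rather than the code used in the project: the function names are hypothetical, we assume the t-test is applied to one edge-density value per year in each period, and only the test statistics are computed (no p-values).

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, variance


def edge_density(n_nodes, n_edges):
    """Density of an undirected graph: edges present / edges possible."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # sample variance of x over its size
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )
```

With SciPy available, `scipy.stats.ttest_ind(pre, post, equal_var=False)` and `scipy.stats.ks_2samp(pre, post)` would compute the same statistics and also return p-values.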

Through our co-author analysis of the biomedical research network of Harvard University, we arrive at a number of illustrative results about the structure, growth, and impact of collaboration between researchers over the last decade. In particular, our three-pronged methodological approach yields three complementary sets of findings.

First, the regression analysis assessing the effects of collaboration (as measured by degree centrality and average collaborations per paper) on impact (as measured by number of papers published, citations per paper, and the H-index within our study time frame) reveals a highly statistically significant effect of collaboration on the success and impact of a researcher across all of these measures.
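To make two of these ingredients concrete, the sketch below computes an H-index from a list of per-paper citation counts and fits a one-predictor least-squares line. It is a minimal illustration under our own assumptions: the function names are hypothetical, and the single-predictor fit is a stand-in, since the exact regression specification is not detailed here.

```python
from statistics import mean


def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def simple_ols(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Here `x` would hold a collaboration measure such as degree centrality and `y` an impact measure such as the H-index, one value per researcher.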

Second, we examine the structures and patterns of collaboration in the biomedical research community at Harvard more closely by applying a community detection algorithm based on Louvain modularity, together with measures of homophily. We find that the biomedical research network possesses a strong community structure, in which closely related departments collaborate mostly within communities.
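The Louvain method searches for a partition that maximizes modularity. The sketch below does not implement Louvain itself; it only evaluates the modularity of a given partition on a weighted, undirected edge list, along with one simple homophily-style measure (the share of edge weight that stays within a group). The function names, the edge-list format, and the choice of homophily measure are our own assumptions for illustration.

```python
from collections import defaultdict


def modularity(weights, community):
    """Modularity Q of a partition of a weighted, undirected graph.

    weights maps (node, node) pairs to edge weights; community maps each
    node to a community label. Q sums, over communities, the share of
    edge weight inside the community minus the share expected at random.
    """
    total = sum(weights.values())
    inside = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)  # summed weighted degree per community
    for (a, b), w in weights.items():
        degree[community[a]] += w
        degree[community[b]] += w
        if community[a] == community[b]:
            inside[community[a]] += w
    return sum(
        inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree
    )


def within_group_share(weights, group):
    """Fraction of total edge weight joining nodes with the same label."""
    same = sum(w for (a, b), w in weights.items() if group[a] == group[b])
    return same / sum(weights.values())


# Toy graph (hypothetical): two triangles joined by a single edge.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
```

The same `within_group_share` function can be called with department labels instead of detected communities to gauge how much collaboration stays within a department.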

Third, in our temporal case study of the effects of the Clinical and Translational Science Award (CTSA) Program, we find that the CTSA seems to have had an impact on collaboration patterns by consolidating communities, and that edge density and degree distribution measures increased strongly after its full implementation in 2012.


This work was done as a part of the class taught by Professor Caroline Uhler and Professor Stefanie Jegelka at MIT.

This was a group project with four other students.