How do Scientists Develop Software

Gustavo Pinto
Feb 26, 2018

This is a summary of the paper “How Do Scientists Develop Scientific Software? An External Replication”. As the name suggests, we conducted an external replication (different authors in a different context) of a well-known paper about how scientists develop and use software. The original paper was published in 2009. Given the changes in software development practices since then (e.g., the introduction of social coding websites and the prevalence of online learning platforms) and the unique challenges of scientific research (e.g., frequent and unforeseen changes in requirements and the need for both highly specialized domain knowledge and programming expertise), we believe it is time to update our understanding of this topic.

In the original study, the authors conducted an online survey with 2,000 scientists. In our replication, we surveyed 1,574 R developers. One might argue that not all R developers are scientists. We concur. To mitigate this threat, our invitation email kindly asked R developers who do not consider themselves scientists not to answer the questionnaire. We also asked them not to share the questionnaire with their peers, since some of those peers might not be scientists either. In addition, we mined their R repositories to complement some of the findings. For instance, the following figure shows the number of projects with commits in the master branch over the last decade.

The following compares the main findings of the two studies:

Original Study: 96.9% of the respondents state that informal self study is important or very important for developing scientific software (60.1% state that informal learning from peers is important or very important). Only 34.4% state that formal education at an educational institution is important or very important.

Replication Study: 99% of the respondents agree that self-study is important or very important (60% state that learning from peers is important or very important). Similarly, only 35% of the respondents believe that formal education is important or very important.

Original Study: The importance of graduate studies is clearly greater than undergraduate studies. This, in turn, is clearly greater than that of high school studies. Formal training at work was considered as important or very important for only 13.1%.

Replication Study: Similarly, most of the respondents believe that graduate studies (72%) and their last five years at work (76%) were important or very important for learning what they know about software development. Our respondents also perceived high school (61%) and the last 15 years at work (51%) as not important or not important at all.

Original Study: 84.3% of the responses state that developing scientific software is important or very important for their own research. 46.4% state that developing scientific software is important or very important for the research of others.

Replication Study: 86% of the respondents believe that scientific software is important or very important to their own research. However, 63% of our respondents stated that developing software for the research of others is important or very important.

Original Study: On average, scientists spend approximately 30% of their work time developing scientific software.

Replication Study: On average, our respondents spent 30% of their working hours developing software per week (min: 0%, 1st quartile: 10%, median: 20%, max: 100%, std dev: 23.73%).
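The gap between the mean (30%) and the median (20%) reported above suggests a right-skewed distribution: a few respondents who spend most of their time programming pull the mean up. The following is a minimal sketch of how such a summary can be computed with Python's standard library. The `shares` list is made-up illustrative data, not the survey responses.

```python
import statistics

# Hypothetical percentages of weekly working time spent on software.
# These values are invented for illustration; they are NOT the survey data.
shares = [0, 5, 10, 10, 20, 20, 30, 50, 60, 100]

summary = {
    "min": min(shares),
    "median": statistics.median(shares),
    "mean": statistics.mean(shares),
    "max": max(shares),
    "std_dev": statistics.stdev(shares),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this toy sample the mean also sits above the median, which is the same pattern the replication reports.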

Original Study: 53.5% of the respondents state that they spend more or much more time developing scientific software than they did 10 years ago (44.7% spend more or much more time than they did 5 years ago and 14.5% spend more or much more time than they did 1 year ago).

Replication Study: 82% of the respondents believe that they spend “much more time” or “more time” developing software than 10 years ago; 72% when compared to the last five years. Over the last decade, there is also an increase in activity, in terms of commits, in the studied projects.

Original Study: Scientific software is either used by a very large number of people (more than 5,000 users) or by a very small number of people (fewer than three).

Replication Study: Our respondents work mostly alone (53% of the respondents develop software alone) or in small teams (another 42% develop software within a team of two to five). Across the analyzed projects, the maximum values we found were 723 stars and 139 forks.

Original Study: The level of importance that scientists assigned to a software engineering concept is mostly consistent with their understanding of this concept. The exceptions are “software testing” and “software verification”: scientists assigned a higher level of importance to these concepts than to their own understanding of them.

Replication Study: Generally speaking, the respondents rated their knowledge of the concepts lower than the importance they assigned to them. A chi-square test confirmed that the scientists' knowledge depends on what they judge important.
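To make the chi-square test mentioned above more concrete, here is a minimal sketch of a Pearson chi-square test of independence on a 2x2 contingency table, using only Python's standard library. The function name `chi_square_2x2` and the counts are hypothetical, and the paper's actual analysis may have used a different table layout or tool. The sketch only shows the general idea of testing whether two categorical judgments are related.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, for which
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, NOT data from either study.
# Rows: concept judged important / not important.
# Columns: concept judged well understood / not well understood.
example = [[30, 10], [10, 30]]
stat, p = chi_square_2x2(example)
print(f"chi2 = {stat:.2f}, p = {p:.6f}")
```

A p-value below the chosen significance level (commonly 0.05) would lead us to reject the hypothesis that the two judgments are independent.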

Original Study: There is no consistent trend of association that links an increase of project or team size to perceived importance of software engineering concepts.

Replication Study: After performing an ANOVA test, we could not reject the null hypothesis that project and team size do not influence the respondents' perceptions of the software engineering concepts, since all p-values were higher than 0.05.
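As a rough illustration of what an ANOVA test computes, the sketch below derives the F statistic of a one-way ANOVA using only Python's standard library. It assumes a one-way design, which may not match the paper's exact setup. The function name `one_way_anova_f` and the ratings are made up for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical importance ratings grouped by team size.
# These values are invented; they are NOT data from either study.
ratings_by_team_size = [
    [3, 4, 5],   # works alone
    [4, 5, 6],   # team of two to five
    [5, 6, 7],   # larger team
]
f_stat, df_b, df_w = one_way_anova_f(ratings_by_team_size)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The p-value is the upper-tail probability of this F statistic under the F distribution with those degrees of freedom. When it exceeds 0.05, as in the replication, the null hypothesis of equal group means cannot be rejected.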

Categories: blog

Updated: February 26, 2018


Originally published at www.gustavopinto.org on February 26, 2018.
