Notebooks vs. Scripts

Project details

In recent years, computational notebooks such as Jupyter notebooks have become popular. Many researchers, especially in machine learning, use notebooks for the benefits they offer, such as the ability to add narration and to view execution output interspersed with source code. However, scripts are still used by many data scientists… or are they? What is the role of scripts and notebooks in the current toolbox of data workers? I wanted to investigate these questions with this project.

 

The two programming modalities in data science programming

 

Study

Data collection

We collected data from 21 data workers. They came from different fields, had various levels of experience, and used one or more of R, Python, and MATLAB. Whenever possible, we observed our participants as they worked on their data science tasks, noting how they used the two main programming modalities: notebooks and script files. We encouraged participants to think aloud during the session and asked them to walk us through a recent project. We recorded audio and video of each session and took notes on the interesting insights that came up.

Analysis

We analyzed the interview transcripts using a rigorous grounded-theory approach that employed various coding techniques. The codebook can be found here.

Survey

In addition to the study, we conducted an online survey with 62 respondents. Participants were recruited through word of mouth and social media.

 

Key findings

Some of the key findings of our work are given below:

  • The initial phase of experimentation is often the most time-consuming one. Notebooks support this phase better than scripts.
  • The later phase of curating and storing source code is better supported by scripts. Notebooks lack support in production pipelines, making scripts indispensable.
  • Scripts were considered to be “rigid”, “formal”, and “outdated”, whereas notebooks were considered to be “fun”, “interactive”, and “casual”.

Beyond these findings, we also provide design recommendations.

For details, see the project page.