StatWire is an attempt to marry the power of text-based programming with the simplicity of visual programming. Text-based programming is very powerful but can be hard to learn and use. Visual programming, on the other hand, uses predefined blocks to ease beginners into programming, but is limited by the blocks it offers. We combined the two into a seamless interface called StatWire.
To better understand the state of R scripts written by data scientists, we collected 40 scripts from the Open Science Framework and from researchers at the local university. In total, these scripts contained 20,303 source lines of code. We found that:
- The scripts contained a significant amount of cloned code: over 70% of the script files were clones.
- Script files contained very little modular code. Functions were seldom used, and most code was written to be executed in a top-down manner.
StatWire: A hybrid interface for statistics
Motivated by the above findings, we implemented StatWire. It is an IDE for R that closely integrates the traditional text-based editor with a visual data flow editor to better support statistical programming.
The idea behind StatWire is to augment the power of text-based programming with the ease of visual programming.
Evaluation with users
We evaluated StatWire with four R users. Participants used StatWire, RStudio, and RapidMiner/KNIME (two other hybrid programming environments) to perform statistical analyses. Our analysis of participants’ utterances suggested that the hybrid approach in StatWire could result in source code that is more understandable and efficient.