Jupyter and science
Some weeks ago I was reading a paper for a course I am taking for my Digital Humanities minor. It is an introduction to a journal issue about digital applications in the humanities. It contains some very interesting insights about how methods from data science and database analysis can be used to improve research in the humanities.
Specifically, it made me think about Jupyter, a tool that has revolutionised research communication in scientific fields that rely on data science and database analysis, such as biology, psychology and neuroscience.
Notebooks for literate programming
Jupyter is a tool for interactive programming. It lets you combine executable code with explanations in natural language (an approach called literate programming) in a single ‘notebook’ file.
In a Jupyter notebook, you can write in natural language complete with markup, but also insert pieces of code between other text. This allows you to guide a reader through the contents step-by-step.
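To make this a bit more concrete, here is a minimal sketch of what such a notebook looks like behind the scenes. A notebook file (.ipynb) is stored as a JSON document containing a list of cells, and the sketch builds one using only Python's standard library. It assumes version 4 of the notebook format; the function name make_notebook and the cell contents are made up for illustration.

```python
import json


def make_notebook():
    """Build a tiny notebook as a plain dictionary (notebook format 4 assumed)."""
    # A text cell: natural language with markup, shown to the reader as prose.
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Word counts\n", "First we count the words in a sentence."],
    }
    # A code cell: executable code placed between the text cells.
    code_cell = {
        "cell_type": "code",
        "execution_count": None,  # the cell has not been run yet
        "metadata": {},
        "outputs": [],
        "source": ["len('to be or not to be'.split())"],
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook()
notebook_text = json.dumps(notebook, indent=1)
print(notebook_text)
```

Saving the printed text in a file whose name ends in .ipynb should give a file that Jupyter can open, with the explanation shown as formatted text and the code cell ready to run.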
Those also following the Digital Humanities minor will already have some experience with this tool through Python, but it also supports many other programming languages, including ones used mainly for data analysis such as Julia and R.
It is used for education and in tutorials, but lately it has been used increasingly often to publish scientific results, allowing a critical reader to interact with the researchers’ data set in real time.
Before Jupyter
This last point is the main reason for its popularity in academia. The data science community has had to endure quite a few scandals over the last few years.
Scientists often don’t share the raw data sets they used, and even if they did, the methods described in their papers would not be clear enough to allow someone else to get the same result.
The emergence of Jupyter as a standard for data science communication has helped researchers to transparently clean up, plot, and run statistical analysis on data, ensuring reproducibility of their work.
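As an illustration of what such a transparent analysis step might look like in a notebook cell, here is a small sketch using only Python's standard library. The readings are made-up values, not real data, and the helper name clean_readings is hypothetical. Plotting is left out to keep the sketch free of extra dependencies.

```python
import statistics

# Made-up raw readings for illustration only; None and "" stand in for missing values.
raw_readings = ["4.1", "3.9", None, "4.4", "", "4.0"]


def clean_readings(values):
    """Drop missing entries and convert the remaining ones to floats."""
    return [float(v) for v in values if v not in (None, "")]


readings = clean_readings(raw_readings)
mean = statistics.mean(readings)
spread = statistics.stdev(readings)  # sample standard deviation

print(f"n={len(readings)} mean={mean:.2f} stdev={spread:.2f}")
```

Because the cleaning rule and the statistics sit in the same cell as the data they act on, a reader can rerun the cell, change the rule, and see how the result shifts.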
Today
Jupyter notebooks are incredibly popular, with more than 10 million notebooks uploaded on the GitHub code-sharing platform as of today.
The popularity in academia is so great that The Atlantic has called the scientific paper obsolete when compared with literate programming notebooks such as Jupyter notebooks.
Besides research and science communication, Jupyter notebooks have found use in tutorials and demonstrations of software. Notebooks are used to teach (for example in the Hacking the Humanities course), to great effect: research suggests that using Jupyter notebooks helps students with developing their written communication.
The future for Jupyter
There are many exciting developments going on that will make Jupyter even more powerful.
Frameworks for automated Jupyter notebook grading already exist, and researchers have now made tools that can integrate them with learning management systems such as Canvas, and perhaps soon also BrightSpace.
Extensions of the Jupyter framework can make it possible to use multiple programming languages in the same notebook.
As data sets become increasingly large, cloud computing has a chance to step in. For example, Google’s Colaboratory project allows anyone to interactively run notebooks that were shared by others on GitHub.
JupyterLab is a next-generation notebook interface that more closely resembles the integrated development environments traditionally used by developers.
Future tools for the humanities
It’s clear that Jupyter has begun a revolution in data science, and its possible uses are likely not yet fully explored. I am curious whether it can kick off a similar change of paradigm in the humanities. Perhaps there are gaps in humanities research that digital tools can fill.
Hopefully we’ll gain some more insights on this over the course of the Digital Humanities minor!
It’s a very interesting post; thank you! It took a while to understand the IDE thingy when learning to code, and I didn’t look into their background, so it’s really interesting to see a post about it
I didn’t know that the Jupyter Project was specifically aimed at academics! I’m also curious if they will keep their focus on making Python more readable for all audiences or if they will shift their focus to other programming languages or use cases