Writing history in the age of Big Data

Our age is truly the age of Big Data. Every day we tweet, write posts, upload photos to Facebook and Instagram, and send messages through WhatsApp to keep in touch with friends and relatives. While doing so, we may not be aware that we are constantly producing new data and adding to the amount that is already out there. As someone who is used to working with a limited number of sources to analyse history, I find the contrast with the volume of data now available for our own age striking. It makes me wonder: how is Big Data going to shape the way future historians write about our day and age?

From data shortage to data overload

Generally speaking, the further we go back in time, the fewer sources are available to us. This not only makes it difficult to make confident claims about past periods, it also tends to shift our perspective towards the parts of the population we have more information about, which are mostly the elites of a society. The availability of more sources for recent centuries has already led to a certain democratization of history, as it has enabled us to write more in-depth narratives about the lower classes of society. The age of Big Data will undoubtedly bring a further democratization of history.

For previous ages, historians have to create consistent narratives on the basis of a limited number of sources. As Ian Milligan describes in his book History in the Age of Abundance? How the Web Is Transforming Historical Research, the age of Big Data is going to transform the workflow of historians.[1] Under data shortage, we never have all the data we might want, and the availability of sources can steer the kinds of research we do. We now seem to encounter the opposite situation: data overload. Just as data shortage limits what historians can research, data overload brings problems of its own. How are we going to analyse a dataset that consists of millions of records? As Milligan argues, historians have to be aware of the upcoming transition, as the field of history is already starting to take up topics from the 1990s. It will be crucial for future historians to be able to filter data instead of being overwhelmed by it.

The vulnerability of digital data

From another perspective, there is also concern about the vulnerability of digital data and the possibility that it might eventually be lost. Instead of an age of data abundance, we might thus be heading towards what is already called a ‘digital dark age’.[2] This concern reminds us that although digital data is available to us now, it will be lost after a while if it is not documented properly. Websites, including social media, come and go, and there is of course a possibility that when an online platform shuts down, all of its user data will be gone. It is thus important to think about the documentation and archiving of digital data. The Wayback Machine is a well-known initiative that archives websites, thus saving the data they contain. Besides that, some companies offer products to archive social media, which are mostly used by governmental agencies. At the moment these methods are able to save a lot of digital data, but they cannot document all online content. From an ethical point of view, it might be asked to what extent all this data has to be documented and preserved forever. Few will argue against archiving the social media posts of a politician, but should this also be done with the online behaviour of a regular citizen? Privacy concerns are at the heart of debates about the availability of digital data and might (and maybe should) limit the amount of data that will finally be preserved.

Besides data being lost due to improper documentation, the fear of a digital dark age is also fuelled by both the inability of modern devices to work with older file formats and a concern about attacks on our digital infrastructure.[3] Although there are some solutions to these issues, it cannot be guaranteed that no data will be lost. I think it goes too far, however, to state that this will result in a digital dark age in which all digital data is lost.

Embracing the future

I would like to conclude that the age of Big Data is not only something to worry about. Although there are some concerns, and Big Data has its limitations, it still seems that there will be an abundance of digital data that will give future historians a lot of material to work with. Not only will it further democratize historical narratives, but the digital age itself will also become the subject of new historical fields.

[1] Ian Milligan, History in the Age of Abundance? How the Web Is Transforming Historical Research (2019).

[2] Sarah Ditum, ‘Why we are in danger of entering a digital dark age, losing huge amounts of online information’, New Statesman (2019) <https://www.newstatesman.com/science-tech/internet/2019/03/why-we-are-danger-entering-digital-dark-age-losing-huge-amounts-online>.

[3] Pallab Ghosh, ‘Google’s Vint Cerf warns of “digital Dark Age”’, BBC (2015) <https://www.bbc.com/news/science-environment-31450389>.

Header: Camelia.boban, ‘File:BigData 2267×1146 trasparent.png’ (2014, online image) <https://commons.wikimedia.org/wiki/File:BigData_2267x1146_trasparent.png>.