Bar Chart Race of Artists

Introduction: For the midterm, I used data from the Tate gallery collection, specifically the artwork data from the 1980s, to make a bar chart race showing the number of artworks each artist had in the gallery for each year. I used OpenRefine and R to clean and format the data, and created the visualization in Flourish. The resulting bar chart race can be seen below.

Sources: The dataset comes from the Tate gallery collection, a GitHub repository that contains data on all of the artworks that Tate owns. It is a subset of the large artwork CSV file found in that repository.

Process: The data was cleaned using a combination of OpenRefine and R. I first used R to add a column holding the count of artworks for each artist in a given year. Next, I used OpenRefine to keep only the artist, year, and count columns, and to pivot the years into separate columns holding the counts, so that each row represented one artist. Finally, I used R again to filter the data to the 10 artists with the most artworks in the gallery for each year, and to change NA values to 0. All of this cleaning was necessary so that Flourish could process the dataset.
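The cleaning itself was done in R and OpenRefine, but the same reshaping can be sketched in a few lines of Python for readers who want to see the logic in one place. This is only an illustration under stated assumptions: the function name `to_race_table`, the `records` input of (artist, year) pairs, the `top_n` parameter, and the sample data are all hypothetical, and "top 10 for each year" is interpreted here as keeping any artist who ranks in the top `top_n` in at least one year.

```python
from collections import Counter, defaultdict


def to_race_table(records, top_n=10):
    """Reshape (artist, year) records into one row per artist, one column per year."""
    # Step 1: count artworks per (artist, year) pair.
    counts = Counter(records)

    # Step 2: group the counts by year so artists can be ranked within each year.
    by_year = defaultdict(dict)
    for (artist, year), n in counts.items():
        by_year[year][artist] = n

    # Step 3: keep any artist who ranks in the top_n for at least one year.
    # Ties are broken alphabetically; that tie rule is an assumption.
    keep = set()
    for per_artist in by_year.values():
        ranked = sorted(per_artist.items(), key=lambda kv: (-kv[1], kv[0]))
        keep.update(artist for artist, _ in ranked[:top_n])

    # Step 4: pivot to wide form, filling missing artist/year pairs with 0.
    years = sorted(by_year)
    rows = []
    for artist in sorted(keep):
        row = {"artist": artist}
        for year in years:
            row[str(year)] = by_year[year].get(artist, 0)
        rows.append(row)
    return rows


# Made-up sample records, not real Tate data.
sample = [
    ("Artist A", 1980), ("Artist A", 1980), ("Artist B", 1980),
    ("Artist B", 1981), ("Artist C", 1981), ("Artist C", 1981),
]
table = to_race_table(sample, top_n=1)
```

Each dictionary in `table` corresponds to one row of the wide layout described above, with an artist label followed by one count per year, and zeros where an artist has no artworks for that year.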

Presentation: Flourish was then used to create a visualization of this data. A bar chart race was chosen because it shows how the data changes from year to year. A line graph is typically a good option for time series, but there were too many unique artists for one to look aesthetically appealing in this case. Additionally, a bar chart race can be paused at a given moment, and it engages the viewer more interactively than a static graph or a screenshot. One adjustment needed to make the chart presentable was widening the side margins so that artist names were not cut off.

Significance: The bar chart race approach used to visualize the number of artworks per year at an art gallery provides some important insights into the trends and patterns of the data. Firstly, it gives a clear view of how the number of artworks produced by each artist changes over time, which could indicate how productive an artist was in a given year. Additionally, the bar chart race shows how volatile an artist’s output was from year to year, and allows us to compare the output of different artists across the same time period.

The application of this digital approach to the data is particularly relevant to the field of Digital Arts & Humanities, as it provides a powerful tool for visualizing and analyzing large datasets related to the arts. In the traditional sense of data science, such a visualization would not be used to show the data at hand; line charts would be a more likely choice given the discrete nature of measuring artwork counts by year. A pitfall of this method is that it is less statistically accurate than other methods, as the number of artworks does not change at a linear rate between years, as the animation might suggest. However, what the approach loses in accuracy it gains in showing the productivity of each artist and the competition among artists in output, which data science would likely fail to capture. Overall, the use of digital approaches in the study of the arts represents a significant shift in the way we understand and engage with artistic production, and highlights the importance of interdisciplinary collaboration between the fields of arts, humanities, and data science.