NYARC has developed a robust web archiving program that encompasses a range of resources valuable to a diverse field of researchers. Many web archiving issues, including the varied technical and logistical aspects of collection and preservation, have been and continue to be addressed. However, access and use remain significant challenges. As NYARC and other institutions develop strategies to meet their users’ needs, it is vital to know who those researchers are or could be and what they hope to accomplish.
My fellowship at NYARC centered on the technical work of collecting and preserving web archives, but professional development experiences associated with the fellowship inspired the direction of this project. During a Continuing Education to Advance Web Archiving (CEDWARC) workshop in the fall, several colleagues raised questions about who is using web archives in their research, and who might in the future. What obstacles keep researchers from using web archives in their work? The initial step of this project assesses who the users of web archives like those in the NYARC collections are or might be.
There have been only a few surveys of web archive usage, the most recent of which was conducted by Maria-Dorina Costea in 2018. Costea’s survey, completed by humanities and social sciences members of the University of Copenhagen community, found that the community’s general knowledge of web archives and how to use them as a resource was limited. Considering the speed at which the internet grows and changes, and the various ways we adapt to those changes, has knowledge of web archives changed since that survey? The present survey aimed to build on Costea’s results while gathering data from North American users of web archives.
I created an online survey addressing opportunities and obstacles in web archives research to collect quantitative and qualitative information about researchers’ familiarity with and uses of web archives. Participants who had used web archives in their research received a different set of questions from those who had not. To collect information from a wide range of researchers, I solicited participation through several specialized listservs. The survey launched on March 4, 2020, and closed on April 13, 2020.
To evaluate the resulting data for how researchers are or are not using web archives in their work, I employed descriptive statistical analysis of the quantitative data, processing the data and creating visualizations of the results. In analyzing the qualitative data, I employed emergent-category coding to determine common themes among short-answer responses.
A total of 66 participants completed the survey to some degree, with a few participants opting not to answer some questions. Because of my involvement in the library and archives field, many of the listservs with which I shared the survey are related to that field. This likely explains why the majority of participants, 40 of 66 (about 61%), indicated their current area of research is library and information science, with responses from the fields of art history, digital humanities, computer science, and others as well. The largest group of respondents categorized their current roles as librarians/archivists (27 of 66, or about 41%), with a similar number identifying as students (22 master’s and 4 PhD). Most participants were between 18 and 39 years of age (48 out of 65, or about 74%).
Though about 32% of participants (21 out of 65) indicated they have never used web archives in their research, most expressed familiarity with their existence (however, as will be shown, some participants may misunderstand what a web archive actually is). Participants also expressed a fair amount of confidence in their ability to gather research data from web archives.
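The rounded percentages reported above can be rechecked with a few lines of Python. This is only a minimal sketch: the counts are copied from the text, while the dictionary labels and the helper name `share` are illustrative choices of mine, not part of the survey instrument or the original analysis.

```python
# Counts as reported in the text: (number of participants, number who answered).
# The labels are shorthand for this sketch, not the survey's own wording.
reported_counts = {
    "library and information science": (40, 66),
    "librarians/archivists": (27, 66),
    "students": (22 + 4, 66),  # 22 master's and 4 PhD
    "aged 18 to 39": (48, 65),
}


def share(count, total):
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


percentages = {
    label: share(count, total)
    for label, (count, total) in reported_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: about {pct}%")
```

The denominators differ (66 and 65) because not every participant answered every question, so each share is computed against the number who answered that particular question.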
Those participants who responded that they had not used web archives in their research (21 out of 65) were pointed to this next set of questions. First, they were asked why they haven’t used web archives in their research, and were provided several possible options, as well as offered a space to write in additional reasons. The most common reason selected was that they hadn’t considered using web archives in their research (indicated by 13 out of 21 participants). This was followed by a lack of awareness of web archives that might apply to their research (9 out of 21) and the idea that the content contained in existing web archives isn’t relevant to their research (8 out of 21).
Participants were then offered the opportunity to expand on their answers in an open-text field. One participant expressed concurrent challenges: “It’s a mix of being unaware of what applicable web archives exist and also being uncertain on how to apply web archives in research.”
When asked why they might use web archives in their research, most participants (15 out of 20) noted that if the collections included content that they were researching, then those collections might be of use to them.
The 45 participants who responded that they had used web archives in their research were directed to this next set of questions. Of those, 43 participants provided responses in an open-text field to the first question, which asked them to provide names of web archives they have consulted in their research. These responses were occasionally difficult to parse, and many required the context of the participant’s other responses to determine if the web archives collections that participants mentioned included actual web archives, or if what they thought of as “web archives” were actually digital archives that do not include archived websites. For instance, if a participant referenced having used “NYPL,” were they referring to New York Public Library’s extensive digital archives, or the web archive project underway at the system’s Schomburg Center? Some responses were plainly not web archives, and those were counted as such. Those web archives that received two or more mentions are included in the list below; if a web archive received one mention, it was counted as “other.”
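The counting rule described above (list an archive by name only if it received two or more mentions, and fold single mentions into “other”) might be sketched in Python as follows. This assumes the open-text responses have already been hand-coded into archive names; the function name `tally_mentions` and the placeholder names such as “Archive A” are hypothetical and do not reflect the survey data.

```python
from collections import Counter


def tally_mentions(coded_mentions, min_mentions=2):
    """Count mentions per archive, grouping rarely mentioned archives as 'other'.

    Archives with at least `min_mentions` mentions keep their own entry;
    every mention of an archive below that threshold adds one to 'other'.
    """
    counts = Counter(coded_mentions)
    tally = {name: n for name, n in counts.items() if n >= min_mentions}
    other = sum(n for n in counts.values() if n < min_mentions)
    if other:
        tally["other"] = other
    return tally


# Placeholder input for illustration only; these are not survey responses.
example_mentions = [
    "Archive A", "Archive B", "Archive A", "Archive C",
    "Archive B", "Archive A", "Archive D",
]
example_tally = tally_mentions(example_mentions)
print(example_tally)
```

Responses judged not to be web archives at all would need to be coded separately before this step, since the tally itself cannot tell a web archive from a digital collection.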
For the next question, 41 participants described the ways in which they’ve used web archives in their research. Several common uses emerged, and I categorized the responses accordingly. Again, some participants likely conflated digital archives and web archives, so what they perceived as use of web archives may in fact have been use of digital archives. Some of these responses did not clearly match potential research using web archives, but those that matched the commonly expressed uses were included, as they may indicate what researchers are interested in finding in web archives.
Responses reflect the wide variety of research possibilities that web archives provide. Several participants noted research activities on the temporal aspects of websites (including responses such as “compare how website changes over time” and “record changes over time about individuals and organizations as presented on their websites”), as well as using web archives to collect or verify information that is no longer available on the live web. Only a few participants specifically noted research activities that can be applied to the type of large data collections that web archives can provide, such as text or network analysis.
Several responses that I classified as “other” came from participants who work with web archives themselves, for example in collection development and maintenance or in instruction on their use.
The reasons participants gave for having used web archives in their research were largely that the archives contained content that was important to their work, though many participants also said that the web archives themselves are the focus of study. Of those who added additional information about why they use web archives, a common note was on studying web archives to be more familiar with them as a resource to share with others.
Following the separate sets of questions for those who have or have not used web archives in their research, all participants were directed to a common set of questions.
Most participants (39 out of 65) indicated that they find the use of web archives for research to be very valuable. When prompted to expand on why they feel that way, the most common sentiment, expressed by 17 participants, was that web archives provide access to information that is unavailable elsewhere. One participant summed it up this way: “A lot of information is published on the web, and sometimes exclusively on the web. Ignoring web publications means ignoring valuable data.” Another common response, expressed by 13 participants, was the immediate accessibility of research content that web archives provide, with several noting this as an advantage over physical archives.
Those participants who said they did not find value in web archives for research mainly noted that they don’t believe web archives are relevant to their own research, or that they doubt collections are available that would be relevant to their research. A few participants pointed to difficulty accessing and/or using them.
When the responses to the question of value were separated by those who have not used web archives in their research and those who have, they show slightly different results. Though the sentiment of value is still strong, those who have not used them in their research responded less enthusiastically about the perceived value of web archives.
Asked to consider what web archives might be used for, beyond what participants may have already used them for, many participants suggested forms of accountability: using an archived web page in court to support a claim, citing it as a record of a promise previously made by an authority, or simply preserving a representation of contemporary life, a large part of which, for some, is recorded on the web. One participant noted that much like the web itself, web archives may provide a more egalitarian information ecosystem: “Wider access to history, individuals who are not scholars can find more information about a topic without necessarily knowing how to conduct academic research,” they wrote, adding, “Public involvement in curating archival collections.”
So what is needed to help improve that connection, access, and use? Making more information available about existing web archives was most important to participants, though better tools and training to use those archives were also seen as necessary. Some participants also noted that improved partnerships might help researchers, and the public at large, to become more familiar with web archives, suggesting that web browsers integrate connections to web archives more plainly, and that education on web archives as research resources should be integrated into high school and undergraduate courses.
Discussion and Future Research
“We’re not very long into the history of the internet, and I think web archives will become increasingly relevant,” one participant noted. Web archives as a resource may seem young, but some of the content dates back about 25 years. So while relevance may increase with time, we’re still dealing with information that may already be of interest to a variety of researchers. Knowing what’s available, what we might still want to collect, how to access and to help others access it, the different ways we might use it, all seems necessary now, not later.
This survey represents a limited scope and quantity of researchers; responses from more researchers, including those who represent a broader range of disciplines, may present a more complete understanding of what might be needed to best assist them.
Even so, these results indicate that many researchers still misunderstand web archives, and many may not fully realize they exist, results similar to Costea’s findings (2018). Of the participants who said they’ve used web archives in their research, many provided responses demonstrating that the concept of a web archive is still unclear to scholars, and that the line between a digital collection of any kind (e.g., digitized historic journals or born-digital photos) and a web archive is not always clearly distinguishable. With more time, follow-up interviews with some of these participants may be useful, particularly to clarify their understanding of web archives in general.
Another issue facing both researchers and the librarians/archivists in a position to assist them is that even if they have a general understanding of web archives, they may not know how to go about using them. Additional information about interests in close and distant reading, and what training may be helpful in building those skills, could provide guidance on where to focus instruction in colleges and universities and in professional development programs.
As Johanna Drucker has noted (2013), a range of humanists including art historians should already be devising ways to work with these changes to materials, investigating how their research methods and the skills required to work with digital materials like web archives may need to change along with them. Otherwise, as Drucker warns, the scholarship may be taken up by those outside these fields. A greater push toward interdisciplinary cooperation may help with these challenges. Ian Milligan (2019) advises historians to learn some of the technical skills needed to work with web archives, or at least commit to working on teams with those who have those skills. He adds that librarians and archivists need to be part of that collaboration as well, and as this survey helps demonstrate, the more familiar those professionals are with web archives, the more likely they are to believe they are valuable. For librarians and archivists to play a meaningful role, however, this survey suggests that our profession may benefit from additional training, such as the CEDWARC workshop, to better understand web archives and the ways we might be able to support scholars in their use.
As for that use, providing more examples of what web archives research might look like may help increase it: the more ubiquitous these projects become, the easier it may be for scholars to think up ways to use web archives in their own research. For NYARC’s part, Sumitra Duncan (2017) describes NYARC’s web archive collection of New York City galleries as one candidate for future study because of the galleries’ somewhat ephemeral nature, opening and closing or moving within relatively short periods of time. A previous pair of NYARC Web Archiving Fellows presented one way these collections might be explored by mapping the galleries included in that web archive to illustrate some of the changes the galleries had experienced (Milliken & Senturk, 2019). I hope my own explorations using the NYARC web archive collections add to this, helping to demonstrate the wide variety of ways these collections might be used.