A version of this article originally appeared on Tedium, a twice-weekly newsletter that hunts for the end of the long tail.
Look, I’m not going to tell you that Yahoo Answers was the height of cultural artifacts.
But the thing is, it had value. And the reason it did was because of the amount of time that it was online, the sheer number of its answers, and its public-facing nature. But sites do not stay stationary, encased in amber, and there is significant financial motivation for large companies to only play the hits. After all, it’s why Top 40 radio isn’t all Dishwalla, all the time.
But after seeing yet another situation where a longstanding Yahoo-owned website is shutting down, I’m left to wonder if the problem is that the motivations for maintaining sites built around user-generated content simply do not favor preservation, and never will without outside influence.
How can we change that motivation? In a follow-up to an argument I made about historic preservation as Yahoo Groups was getting shut down, here’s my attempt to see the issue of preservation from the corporate perspective.
“I understand your usage of groups is different from the majority of our users, and we understand your frustration. However, the resources needed to maintain historical content from Yahoo Groups pages is cost-prohibitive, as they’re largely unused.”
— A statement sent to an archivist in 2019 as Verizon took steps to shut down the vast majority of the existing Yahoo Groups, the last major element of Yahoo’s user-generated content apparatus that was dismantled, with Groups meeting its maker a little over a year ago. It’s worth keeping in mind that at the scale Verizon works—making billions of dollars per year, on average—the costs of continuing to host such content would have been relatively minimal—especially given the fact that, uh, it owns a big chunk of the network through which that content is distributed.
The problem with corporate motivations is that they aren’t the same as the user’s, even when the user made the content.
Users very much have an expectation of permanence just as they did with physical media, but in the context of online distribution, these companies have competing interests driving their decision-making that discourage them from not taking steps to protect historic or vintage content.
And in the case of user-generated content, there might be outside considerations at play. Perhaps they are concerned that something within an old user agreement might come to bite them if they leave a website online past its sell-by date, opening up to liabilities. Perhaps the concern is old, outdated code that may look novel on the outside but is effectively a potential attack surface in the wrong hands. After all, if they’re not keeping an eye on it, who’s to say someone can’t take advantage of that?
And then there are reasons that are a little more consumer-hostile. Nintendo recently ended sales for a bunch of old Mario content in both digital and physical form. It evokes the old gating of home video releases that Disney used to do in an effort to keep its old content fresh and make more money from that old content.
When it comes to websites, though, much of that content is user-generated, even if a technology company technically maintains it. I have to imagine that there’s an expectation that a company only has limited capability for maintenance costs, and the motivation for doing so is limited.
But on the other hand, as digital preservationist David Rosenthal has pointed out, in the grand scheme, preservation is not really all that expensive. The Internet Archive has a budget—soup to nuts—of around $20 million or less per year, around half of which goes to pay for the salaries of the staff. And while they don’t get all of it (in part because they can’t!), they cover a significant portion of the entire internet, literally millions of websites. They have a fairly complex infrastructure, with some of its 750 servers online for as long as nine years and petabyte capacity in the hundreds, but given that they are trying to store decades worth of digitized content—including entire websites that were long-ago forgotten—it’s pretty impressive!
So the case that it costs too much to continue to simply publicly host a site that contains years of historically relevant user-generated content is bunk to me. It feels like a way of saying “we don’t want to shoulder the maintenance costs of this old machine,” as if content generated by users can be upgraded in the same way as a decade-old computer.
One thought I have is that this issue repeatedly comes up because the motivations for corporations naturally lean in favor of closure when the financial motivation has dried up. Legislation could be one way to manage this to sort of right the axis in favor of preservation—but legislation could be difficult to pass. (This was the crux of my case for trying to make the existing legislation for the National Register of Historic Places apply to websites.)
In my frustration about this issue recently on Twitter, I found myself arguing for legislation that balances liability in favor of preservation of public-facing content. But I’m a realist—a law like that would have many moving parts and may be a tough sell. So, if we can’t encourage a law, maybe we need to build strategies to make maintaining a historic website easier to lift.
was the year that the genealogy platform Ancestry.com launched a new site, Newspapers.com, to offer paid archives of newspapers to interested parties. The company, which charges about $150 per year for access to the archive, has helped maintain access to the historic record for researchers who need it. (I’m a subscriber and it is worth it.) With the exception of paid services for Usenet like Giganews, this model has not really been tried for vintage digital-only content, which seems like a major missed opportunity for companies raising concerns about financial costs for maintaining old platforms, like Yahoo/Verizon. Certainly I would prefer it to be free, but if I had to have a choice between free and non-existent, I’d pay money to access old content. Just throwing that out there.
A middle ground: An “analog nightlight” mode for websites
In some ways, I think that part of the motivation for taking down old or outdated websites is the expectation that the internal systems must also stay online.
But I think archivists and historians would be more than happy if public-facing content—that is, content that appeared on search engines, or was a part of the main experience when logged in at a basic level—was prioritized and protected in some way, which would at least keep the information alive even if its value was limited.
There’s something of a comparison here that I’d make: When the U.S. dropped the vast majority of its analog signals in favor of digital tuning, it led to something called the “analog nightlight,” in which very minimal, basic information was presented on analog stations was presented during the period before it was turned off. A TV host parlayed basic information to viewers about the transition, and told them what to do next. It didn’t entirely work—TV stations in smaller markets didn’t actually air the analog nightlight—but it helped give a sense of continuity as a new medium found its footing.
This approach, to me, feels like a path forward that could minimize the crushing pain of a loss of historic content while taking away much of the risks that come with continuing to host a site that may no longer be popular in the modern day but still continues to have value in a long-tail sense.
In the case of an “analog nightlight” equivalent for websites, the goal would be to essentially shut down any sort of attack surface through good design and planning. Before the site is taken offline in its original form, users are given the chance to download their old content or remove it from the website over a period of, say, 60 days. This is not too dissimilar to the warnings that site operators offer when they shut down currently—and looks like what Yahoo Answers is doing.
But once the deadline is hit, the site operators launch a minimal version of the original platform, with no way to log in or comment. The information is static, and there’s no directly accessible backend. That’s actually the important part of this—the site needs to be untethered from its original content-management system so no new content can be added. Instead, the content would be served up as a barebones static site (perhaps with advertising, if they roll that way), so as to minimize the “attack surface” left by a site that is not actively being maintained.
This reflects relatively recent best practice in the content-management space. Platforms like Netlify have gained popularity in recent years because they actively separate the form of distribution from the means of production, meaning that security risks are minimized. This is a great approach for live-production sites, but for sites that are intentionally meant to stay static, it removes one of the biggest risk factors that might discourage a content owner from continuing to maintain the work.
As far as liability concerns go, language could be included on the page to allow for users to remove old content if they so choose, along the lines of the “right to be forgotten” measure of the European Union’s General Data Protection Regulation (GDPR), though that measure includes a carve-out for purposes of historical research, which an archived version of a website would presumably cover. But the thing is, sites that are driven by user-generated content are generally protected by Section 230 in the United States anyway, so the onus for liability for the content itself falls onto the end user.
And if, even after these steps, a company still feels uncomfortable about hosting a dead website, they should reach out to librarians and archivists to donate the collection for maintenance purposes—perhaps with a corresponding donation to said nonprofit so they can cover the hosting costs. The Internet Archive actually offers a service like this!
The one site that makes me think that a model like this could work is Gawker. The news and gossip site, which was taken offline by the combination of a lawsuit and a corporate asset sale that specifically excluded it, remains online nearly five years after its closure in a mode very similar to this. Comments are closed and not visible to end users, which is a true shame as those comments often fed into the writing. But the content—the part that was truly valuable and important—is still out there, accessible and readable, even if you can’t do anything with it other than read it.
There are no ads. It’s a shrine to a platform that a lot of people cared about, even if others found it controversial. And there’s no reason what Gawker did couldn’t work in an equivalent way for Yahoo Answers.
Look, I’m going to be the first to fully admit that the motivations for protecting publicly accessible user-generated content simply remain only if the owner of that content feels “nice” about it.
And even then it feels like a bit of a surprise.
Recently, Warner Bros. got a little bit of flak for replacing its long-online Space Jam website, which dated back a quarter-century in its original form, with a site for the sequel. But I think what the company did was actually shockingly noble. They not only left the old site online, but they made it accessible from the new one. The work done to maintain this was not perfect—I think they should do archivists a solid by putting in 301 redirects on the old URLs of the vintage site, so they go to the new place—but the fact that they showed the initiative at all is incredibly impressive given what we’ve seen of corporate motivations when it comes to preservation.
Honestly, part of this was a result of people who were associated with the website’s creation still being at the company years later and being willing to speak up for preserving it—a 2015 Rolling Stone article explains that the site actually briefly was taken down after it went viral in 2010, only for employees involved in the creation of the site (now with leadership roles in the company) to swoop in and save it after some executive made the call to shut it down.
“If we had left the company, the site probably would not exist today,” said Andrew Stachler, one of the employees involved with saving the effort. “It would’ve gone down for good at that time.”
But imagine if they weren’t there. We’d be telling a different story right now.
And perhaps that’s what many companies need—someone who is willing to go to bat for the purposes of archival and protection of historic content.
In the digital age, preservation is the act of doing nothing but minimal upkeep and being comfortable with that fact. As proven time and time again, companies are more than comfortable with killing services entirely rather than leaving well enough alone.
Perhaps the way to save user-generated content is by making it as painless as possible to keep the status quo.
This post has been read 20 times!