This article originally appeared in the July/August 2017 issue of Intercom magazine, published by the Society for Technical Communication. I meant to re-post it sooner, but got bogged down in other projects. The target audience is technical writers and anyone else who uses Atlassian Confluence to manage content. I made some minor edits to fit the blog format.
I work at a company that loves wikis a little too much.
“But wait,” you say. “Isn’t an active wiki user base a good thing?”
Certainly it’s wonderful that so many people at our company embrace our instance of Atlassian Confluence. Our users create pages, contribute knowledge, post comments, and answer each other’s questions. And my own team’s wiki space is frequently used and widely regarded as a trustworthy, mission-critical source of knowledge.
At the same time, our wiki has grown so large that it has been called the Tower of Babel. There are thousands of pages, and if you don’t know exactly what terms to use, running a search quickly feels overwhelming. There is an embarrassing amount of redundant, outdated information which junks up the search results. Not only does this cause immense frustration, it prompts users to create new silos of information—which of course only compounds the problem.
One of the most important ways we addressed this problem was by defining a method for page archiving using Confluence’s out-of-box functionality—a reliable manual method that won’t require you to invest money in any third-party plugins. I’ll explain that method in this post.
Note: Below I use terms and concepts (such as “spaces,” “macros,” and “plugins”) that assume a basic understanding of Confluence. If you’re new to that tool, you can get an overview here.
Challenges with Archiving
It seems that archiving pages in Confluence should be simple. Why can’t you just query a space, find all the old pages, and either delete them in bulk or move them to another space that no one uses?
Confluence does provide some easy solutions for this. For example, you can create and designate an archive space, and use it to store lists of wiki pages sorted by timestamp. But as I discuss below, these solutions only go so far in helping you identify exactly which pages can be archived. The fact is that some wiki pages are very old, and yet they still receive frequent page views and therefore shouldn’t be archived just yet.
The other challenge is time. Assessing the quality of a wiki page is laborious and subjective. Arguably the best person for the job is the page author, but what if that person has left the company? Even if the author is still around, he or she may have created hundreds of pages. And few people have the time and patience to go back through that much information to determine what to deprecate and what to keep.
Automated Methods: Pros and Cons
Ideally, you could automatically archive old wiki pages based on a combination of factors such as a page’s age (when it was last updated) and relevance (how often it has been viewed). The best solution that my team found for this is a third-party tool called the Archiving Plugin for Confluence. Developed by Midori Global Consulting, this plugin lets you do the following, among other things:
- Run a report showing you the quality of wiki pages in any given space. Quality is determined through a combination of the age of a page and how often it has been viewed within a configurable date range.
- Send bulk email notifications to authors whose pages have “expired” in terms of quality. This gives the authors a chance to update, delete, or archive the page.
- Bulk archive pages based on configurable criteria and send automatic email notifications to the authors in case they need to restore the pages.
However, the plugin is fairly costly. For a company of 500 or more users, a license will cost you about $3,000 a year. Moreover, there is a learning curve and an opportunity cost: you could end up spending substantial time as an administrator of the plugin and lose your focus on other valuable writing tasks.
I would argue that the return on investment outweighs these costs, but if your company is anything like mine, approvals for third-party software take time, and budgets are tight. In fact, your company may already be paying for multiple third-party plugins, making it all the more difficult to decide which ones justify the investment.
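To make the age-plus-views idea concrete, here is a minimal sketch of how such a rule could be expressed. It is not the plugin's algorithm; the `PageStats` record, the thresholds, and the sample pages are all hypothetical, and the view counts are assumed to come from whatever analytics export you happen to have.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PageStats:
    title: str
    last_updated: date
    views_in_window: int  # hypothetical: taken from an analytics export


def is_archive_candidate(page, today, max_age_days=365, min_views=5):
    """Flag a page only when it is both old AND rarely viewed."""
    age_days = (today - page.last_updated).days
    return age_days > max_age_days and page.views_in_window < min_views


# Made-up sample data for illustration only.
pages = [
    PageStats("2014 Release Checklist", date(2014, 3, 1), 0),
    PageStats("VPN Setup Guide", date(2013, 6, 15), 240),  # old but popular
    PageStats("Sprint 42 Notes", date(2017, 5, 20), 2),    # unpopular but recent
]
today = date(2017, 7, 1)
candidates = [p.title for p in pages if is_archive_candidate(p, today)]
print(candidates)
```

Requiring both conditions is what keeps an old but heavily used page out of the archive, which is the case that a simple sort by timestamp gets wrong.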
A Reliable Manual Method
The good news is that you can still develop a reliable manual method for archiving wiki pages that doesn’t cost you extra money. What I describe below is certainly not a perfect solution, but it is systematic and intuitive, and my team has implemented it with great success.
Step 1: Create an archive space and update the global space header.
In Confluence you can easily create a special archive space and move outdated pages into it. This is done by going into the space settings and changing the space’s status to Archived. See full instructions at Archive a Space.
The effect of this change is that any pages within the space no longer appear in the search results and activity feeds. This reduces the noise in the wiki while still giving you the ability to reference or restore the archived pages if necessary.
Additionally, you can display a prominent message in the global header of the space to cue readers about the space’s purpose. For example, the header could say: This is an archived page. The content is outdated and should not be trusted. This too can be added in the space’s settings. You can even style the message using standard wiki markup so that it displays more effectively. For guidance on this step (and the one before it), consult the Confluence online help.
Step 2: Create a single level of subdirectories within the archive.
After creating an Archive space, you may find that it soon becomes filled with thousands of pages in no particular order. In our case, the archive space got so large that there was a major performance lag whenever someone tried to visit the space. A related problem emerged whenever a user needed to return to the space and restore a page, but couldn’t remember the page’s name. That meant we had to browse for it, which was all the more difficult due to the performance issue and the haphazard page structure.
To alleviate these difficulties, I recommend creating a single set of subdirectories within the space based on the major organizations in your company. When my team did this, the result was an alphabetically sorted list of sub-archives that looked like the following:
- Archive Home
- Analytics Archive
- Client Success Archive
- Engineering Archive
- Human Resources Archive
- Marketing Archive
- Product Management Archive
- Regulatory Archive
- Sales Archive
With this structure in place, users could move an archived page into a more meaningful subsection and have a somewhat easier time restoring it if they needed to. For example, if you worked on the HR team, you would move your wiki page under the Human Resources Archive subdirectory, and could start by looking there if you ever needed to resurrect it.
If you go this route, make sure you commit to it. We allowed users to create whatever structure they wanted under the second layer, but we required the first layer to remain intact and had to monitor it to keep it clean. We did not create any additional layers, knowing that too much structure would complicate matters.
Again, this is not a perfect solution, but arguably it’s better than nothing if your archive space expands so much that it starts having performance issues. And it’s prudent to consider building this layer ahead of time, since organizing pages ad hoc is time-consuming and error-prone.
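If you have many organizations, you could prepare the first layer in one pass rather than by hand. The sketch below only builds the request bodies; it sends nothing. The payload shape is my understanding of what the Confluence REST API expects when creating a page, so verify it against the documentation for your version. The space key `ARCHIVE` and the parent page ID `12345` are hypothetical.

```python
from html import escape


def sub_archive_payloads(space_key, home_page_id, organizations):
    """Build one page-creation payload per organization, sorted by title."""
    payloads = []
    for org in sorted(organizations):
        payloads.append({
            "type": "page",
            "title": f"{org} Archive",
            "space": {"key": space_key},
            # Parent page: the archive home page (hypothetical ID).
            "ancestors": [{"id": home_page_id}],
            "body": {
                "storage": {
                    # Escape the name so characters like "&" stay valid markup.
                    "value": f"<p>Archived pages for {escape(org)}.</p>",
                    "representation": "storage",
                }
            },
        })
    return payloads


payloads = sub_archive_payloads(
    "ARCHIVE", "12345", ["Sales", "Analytics", "Engineering"]
)
titles = [p["title"] for p in payloads]
print(titles)
```

Sorting the names up front gives you the same alphabetical order as the list shown earlier, regardless of the order in which you supply them.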
Step 3: Identify pages to archive.
One of the more complicated aspects of the archiving process is identifying exactly which pages can be archived. In some cases, it’s easy. You will be familiar with your own pages, or the pages created by your teammates, and will thus have a good sense of what can be removed. But inevitably you will come across many other pages of dubious quality. You will also feel the urge to archive sets of outdated pages in bulk. What can you do in these cases?
Share the page with the page author. If you’re uncertain as to a page’s relevance, share it with the page author and ask if it can be archived. If the author is not around, ask someone on his or her team. You might be surprised at how quickly they respond.
Use Confluence’s content macros. You can use the Content by Labels and Content Report Table macros (both come packaged with Confluence) to generate page lists and sort them by their last updated date. The downside of these macros is that they require users to have meticulously labeled their pages, which is by no means a guarantee. Moreover, neither of these macros offers insight into how often the pages are viewed.
If neither of these methods gets you anywhere, you’ll have to use your judgment. But remember that you’re not deleting the page, so even if you make a hasty or inaccurate judgment, you can still restore the page if necessary.
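If your Confluence version supports CQL (the Confluence Query Language used by advanced search), a query on the last-modified date is another way to list stale pages without relying on labels. The helper below is a sketch that only builds the query string; the function name, the space key `ENG`, and the 52-week default are assumptions for illustration, and you should test the resulting query in your own instance before relying on it.

```python
def stale_pages_cql(space_key, older_than_weeks=52, label=None):
    """Build a CQL query for pages not modified within the given window."""
    clauses = [
        f'space = "{space_key}"',
        "type = page",
        f'lastmodified < now("-{older_than_weeks}w")',
    ]
    if label:
        # Optional: narrow the list further when pages are labeled.
        clauses.append(f'label = "{label}"')
    # Oldest pages first, so the likeliest candidates appear at the top.
    return " AND ".join(clauses) + " ORDER BY lastmodified ASC"


query = stale_pages_cql("ENG")
print(query)
```

Like the macros, this approach says nothing about page views, so treat the result as a list of candidates to review rather than a list of pages to archive.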
Step 4: Prepare your wiki pages for archival.
Not all wiki pages are created equal. By that I mean you can’t always move a page to the archive space without some unintended consequences, the chief one being the effect on incoming links. Technically the links will still work, but they will be misleading because they will direct users to pages that have been deemed no longer relevant or useful.
Fortunately, Confluence makes it easy to identify incoming links. This information is shown in the Page Information menu of an individual page. You can then follow each incoming link and edit the source page to redirect or remove the link. This is a time-consuming step, but an important one—especially if the page you’re archiving has been replaced with a new one that you want people to use.
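When a page has many incoming links, it can help to turn them into a checklist before you start editing. The sketch below assumes you have already copied the link information (for example, from Page Information) into a simple mapping; the page titles and function names are hypothetical.

```python
def incoming_links(outgoing, target):
    """Return the titles of pages that link to `target`, sorted by title."""
    return sorted(
        source for source, links in outgoing.items()
        if target in links and source != target
    )


def link_fix_plan(outgoing, target, replacement=None):
    """Map each linking page to the edit it needs before `target` is archived."""
    action = f"redirect link to '{replacement}'" if replacement else "remove link"
    return {source: action for source in incoming_links(outgoing, target)}


# Hypothetical link map: page title -> set of titles it links to.
outgoing = {
    "Onboarding Home": {"Old VPN Guide", "Benefits FAQ"},
    "IT Quick Links": {"Old VPN Guide"},
    "Benefits FAQ": {"Onboarding Home"},
}
plan = link_fix_plan(outgoing, "Old VPN Guide", replacement="VPN Setup Guide")
for source, action in plan.items():
    print(f"{source}: {action}")
```

The plan distinguishes the two cases described above: redirect the link when a replacement page exists, and remove it when there is none.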
Step 5: Move the page to the archive space and (optionally) document your decision in the page itself.
After all incoming links have been deleted or redirected, you can safely move the page to the relevant subdirectory in the archive space. By default, any child pages will be moved along with the parent page, but you have the option to leave the child pages behind if you like.
But what if multiple people are watching the page, and they don’t understand why the page has been archived? This could be a problem if the page is frequently used (a relatively common scenario in our company).
In such cases, consider adding a note at the top of the page to explain your rationale. To facilitate this step, my team created boilerplate explanations to cover the most common archiving scenarios, and encouraged users to modify the explanations as needed. Of course, this is another manual step, so unless the page is high-profile, consider it optional.
Archive Scenario | Boilerplate Explanation |
Page was outdated and irrelevant | This page was archived because it was deemed outdated or irrelevant. See <TICKET NUMBER> for background information. |
Page contents were combined with another page | This page was archived because its contents were merged into another page: <PAGE NAME>. See <TICKET NUMBER> for background information. |
Page was replaced | This page was archived because it was replaced by a new page: <PAGE NAME>. See <TICKET NUMBER> for background information. |
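If you want to generate these notes consistently, the boilerplate in the table can be treated as templates with the placeholders filled in. This is only a sketch: the scenario keys, the function name, and the sample ticket number `DOC-123` are hypothetical.

```python
# Boilerplate from the table above; {page} and {ticket} stand in for
# the <PAGE NAME> and <TICKET NUMBER> placeholders.
BOILERPLATE = {
    "outdated": (
        "This page was archived because it was deemed outdated or irrelevant. "
        "See {ticket} for background information."
    ),
    "merged": (
        "This page was archived because its contents were merged into another "
        "page: {page}. See {ticket} for background information."
    ),
    "replaced": (
        "This page was archived because it was replaced by a new page: {page}. "
        "See {ticket} for background information."
    ),
}


def archive_note(scenario, ticket, page=None):
    """Fill in the boilerplate explanation for one archiving scenario."""
    template = BOILERPLATE[scenario]
    if "{page}" in template and not page:
        raise ValueError(f"Scenario '{scenario}' needs a page name")
    return template.format(ticket=ticket, page=page)


note = archive_note("replaced", "DOC-123", page="New Onboarding Guide")
print(note)
```

Users can still edit the generated text afterward, which keeps the notes consistent without making them rigid.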
Step 6: Write instructions for the process and evangelize it.
The final step is to create a user-friendly set of instructions on how to execute the page archive process and then publish it in a prominent place on the wiki. Here are the subtasks that I recommend you include in the page:
- Identify pages to archive
- Prepare the pages for archival
- Move the pages to the archive space
- Document your rationale (optional)
- Restore an archived page
- Create a new archive subdirectory
Conclusion
It is true that the above method of page archival relies on end users to proactively identify and move pages from one space to another. A more automated approach would be preferable (especially one that determines page quality based on age and page views). Still, Confluence equips you with valuable tools to move you toward a reliable, albeit non-automated, solution.
Once you have developed your process, share it regularly with users in your organization and include it in any relevant trainings. Archiving pages should be something that anyone in the organization can do. If your wiki is large and active, and you’re limited to out-of-box features, then getting as many people educated as possible is your best long-term strategy. Indeed, that kind of collaborative crowdsourcing is what wiki gardening is all about.
