These changes are made only when another fix is also being applied:
1. Add https if missing from the archive.org URL.
2. Add the "web" subdomain if missing (archive.org/web/ --> web.archive.org/web/).
3. Add the "/web" path if missing (web.archive.org/2016/ --> web.archive.org/web/2016/).
4. Remove ":80" (e.g. https://web.archive.org/web/2016/http://example.com:80/). Port 80 is added by the API and is not needed. Other port numbers are retained.
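The four URL changes can be sketched as a chain of regular-expression rewrites. This is an illustrative sketch, not WaybackMedic's actual code: the function name `normalize_wayback_url` is hypothetical, and it assumes that "add https" covers both a missing scheme and a plain `http://` scheme.

```python
import re

def normalize_wayback_url(url: str) -> str:
    """Apply the four URL fixes to a Wayback Machine URL (hypothetical helper)."""
    # Strip any existing scheme so it can be replaced with https.
    rest = re.sub(r"^(?:https?:)?//", "", url, flags=re.I)
    # Leave anything that is not an archive.org URL untouched.
    if not re.match(r"(?:web\.|www\.)?archive\.org/", rest, flags=re.I):
        return url
    # 1. Add https (assumption: http is upgraded as well).
    url = "https://" + rest
    # 2. archive.org/web/ --> web.archive.org/web/
    url = re.sub(r"^https://(?:www\.)?archive\.org/web/",
                 "https://web.archive.org/web/", url, flags=re.I)
    # 3. web.archive.org/2016/ --> web.archive.org/web/2016/
    url = re.sub(r"^(https://web\.archive\.org/)(?=\d)", r"\1web/", url, flags=re.I)
    # 4. Drop ":80" from the archived host; other ports are retained.
    url = re.sub(
        r"^(https://web\.archive\.org/web/[^/]+/(?:https?://)?[^/:]+):80(?=/|$)",
        r"\1", url, flags=re.I)
    return url
```

For example, `normalize_wayback_url("http://archive.org/web/2016/http://example.com:80/")` yields `https://web.archive.org/web/2016/http://example.com/`, while a URL with port 8080 keeps its port.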
If |archiveurl= is empty, remove |archiveurl= and |archivedate= and add {{dead link}}, unless the |url= is still working, in which case leave the citation alone.
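A minimal sketch of this rule on raw citation wikitext, assuming the caller has already tested whether |url= is live. The function name `fix_empty_archiveurl` and the `url_is_working` flag are hypothetical, and the regular expressions only handle simple, non-nested parameter values.

```python
import re

def fix_empty_archiveurl(cite: str, url_is_working: bool) -> str:
    """Drop an empty |archiveurl= (and its |archivedate=) and tag the link dead."""
    # An empty parameter is followed directly by the next "|" or the closing "}}".
    empty = re.search(r"\|\s*archiveurl\s*=\s*(?=\||\}\})", cite)
    if not empty or url_is_working:
        # Nothing to fix, or the original URL still works: leave the citation alone.
        return cite
    # Remove both archive parameters together with their values.
    cleaned = re.sub(r"\|\s*archive(?:url|date)\s*=[^|}]*", "", cite)
    # Append the dead-link tag right after the citation template.
    return cleaned + "{{dead link}}"
```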
Check all Wayback Machine URLs for response code errors (anything other than 200). If an error code is returned, try to find a better URL via the Wayback API, first using the accessdate, then using the earliest date available. If none is found, remove |archiveurl= and |archivedate= and add {{dead link}}.
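The two-step lookup can be sketched against the public Wayback availability endpoint (`https://archive.org/wayback/available`). The helper names are hypothetical, and `fetch_json` is an injected callable standing in for whatever HTTP client is used. Asking for the snapshot closest to a very early timestamp is used here as a stand-in for "earliest date available"; that is an assumption, not necessarily how WaybackMedic does it.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

API = "https://archive.org/wayback/available"
EARLIEST = "19960101"  # assumption: a date before any snapshot could exist

def availability_query(url: str, timestamp: str) -> str:
    """Build the availability API query for a URL and a YYYYMMDD timestamp."""
    return API + "?" + urlencode({"url": url, "timestamp": timestamp})

def closest_snapshot(response: dict) -> Optional[str]:
    """Return the snapshot URL from an API response, or None if unusable."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None

def find_better_snapshot(url: str, accessdate: str,
                         fetch_json: Callable[[str], dict]) -> Optional[str]:
    """Try the accessdate first, then the earliest snapshot available."""
    for timestamp in (accessdate, EARLIEST):
        snapshot = closest_snapshot(fetch_json(availability_query(url, timestamp)))
        if snapshot:
            return snapshot
    return None  # caller removes the archive parameters and adds {{dead link}}
```

In real use `fetch_json` would perform the HTTP GET and decode the JSON body. Keeping it injectable lets the lookup order be tested without network access.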
The wayback template is mangled in a certain way. Action: re-assemble. It won't delete multiple instances if they exist in the same ref (as in the Example).
Bare Wayback URLs outside templates. If these return an error such as 404, replace them with the original URL. WM is currently unable to add {{dead link}} in this case.
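Recovering the original URL from a bare Wayback link is mostly a matter of stripping the archive prefix and timestamp. The sketch below assumes the common `/web/<timestamp>/<url>` layout, optionally with a two-letter modifier such as `id_` after the timestamp. Both function names are hypothetical.

```python
import re
from typing import Optional

WAYBACK_RE = re.compile(
    r"^https?://(?:web\.)?archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$", re.I)

def original_url(wayback_url: str) -> Optional[str]:
    """Return the archived page's own URL, or None if this is not a Wayback link."""
    match = WAYBACK_RE.match(wayback_url)
    if not match:
        return None
    target = match.group(2)
    # Snapshots are sometimes stored without a scheme; assume http in that case.
    if not re.match(r"^[a-z][a-z0-9+.-]*://", target, re.I):
        target = "http://" + target
    return target

def replace_dead_bare_link(wayback_url: str, status_code: int) -> str:
    """Keep a working Wayback link; otherwise fall back to the original URL."""
    if status_code == 200:
        return wayback_url
    return original_url(wayback_url) or wayback_url
```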
WM design
Multiple HTTP checks at the application layer if Wayback reports an error, to account for brief outages or intermittent responses.
In addition, timeouts and retries are built into the settings of the web transfer agent (wget).
Multiple checks of the Wayback API using multiple dates to ensure a page really is unavailable.
Re-checks the API results by looking at the header to ensure it really is a good page.
If IA returns a 404 with the message "Bummer. The machine that serves this file is down.", treat it as a code 200 and leave the link alone.
If no Wayback available, checks Memento for alternative archives such as Library of Congress, WebCite and a few dozen others.
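The retry and "Bummer" handling described in this list might look roughly like the following. This is a sketch under stated assumptions: `fetch` is a hypothetical callable returning a `(status_code, body)` pair, and the attempt count and delay are illustrative values, not WaybackMedic's actual settings.

```python
import time
from typing import Callable, Tuple

BUMMER = "The machine that serves this file is down"

def effective_status(status: int, body: str) -> int:
    """Treat the IA 'Bummer' outage page as a 200 so the link is left alone."""
    if status == 404 and BUMMER in body:
        return 200
    return status

def check_with_retries(url: str, fetch: Callable[[str], Tuple[int, str]],
                       attempts: int = 3, delay: float = 2.0) -> int:
    """Re-check a URL several times before accepting an error status."""
    status = 0
    for attempt in range(attempts):
        code, body = fetch(url)
        status = effective_status(code, body)
        if status == 200:
            return status
        # Pause between attempts to ride out brief outages.
        if attempt < attempts - 1:
            time.sleep(delay)
    return status  # last error seen; caller moves on to the API and Memento
```

Separating `effective_status` from the retry loop keeps the special case for the outage page in one place, so the same rule applies to every check.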
Statistics
August 2015 to June 6, 2016
WaybackMedic checked ~140k articles edited by Cyberbot II from August 2015 to June 6, 2016. It found ~374596 Wayback links (including duplicates), of which 29171, spread across 17978 articles, were dead. It was able to fix 8785 by finding a new snapshot date, and 661 by finding an alternative archive service through Mementoweb.org. The remaining 19602 were deleted from Wikipedia (blocked by robots.txt, a missing page, or a link that was never good to begin with). Other fixes and problems were logged and corrected.
WaybackMedic Stats
| Type | Number | Description |
|---|---|---|
| Bummer | 215 | Wayback links that return "Bummer page not found" |
| API mismatch | 15596 | Wayback API returned fewer records than sent |
| Bogusapi | 4360 | Wayback API-returned links that don't match real status code |