User talk:RetractionBot/Archive 1

From WikiProjectMed
Archive 1

Hi Sam

Hi User:Samwalton9. Is this really yours? Anna Frodesiak (talk) 12:10, 1 December 2018 (UTC)

I also have this same question. Pinging Samwalton9. ~Oshwah~(talk) (contribs) 12:19, 1 December 2018 (UTC)
@Anna Frodesiak and Oshwah: It is indeed, thanks for checking! Sam Walton (talk) 12:34, 1 December 2018 (UTC)
Samwalton9 - Cool deal; thanks for confirming! :-) ~Oshwah~(talk) (contribs) 12:35, 1 December 2018 (UTC)
Thanks for confirming. :) Anna Frodesiak (talk) 16:57, 1 December 2018 (UTC)

{{Errata}}

You might want to add support for this to the bot too. Headbomb {t · c · p · b} 01:05, 16 February 2019 (UTC)

Update to {{Retracted}}

Instead of using {{retracted|{{PMID|123456}}}}, the new code is {{retracted|pmid=123456}}. It also supports |bibcode=, |doi=, |pmid=, and |pmc=. Headbomb {t · c · p · b} 18:00, 22 January 2019 (UTC)

@Headbomb: Thanks for the heads up! Bot progress is still waiting on some conversations regarding the data source, but I hope to continue with development soon :) Sam Walton (talk) 18:12, 22 January 2019 (UTC)
I figured it'd just be more reliable to have dedicated parameters, and support multiple ones, than to have a general WYSIWYG field where people could put a lot of crap in it. Headbomb {t · c · p · b} 18:16, 22 January 2019 (UTC)
@Samwalton9: While the Retraction Watch thing is going on, you could also consider using PubMed directly. It'll be medicine-centric, but it'll be a good start. For example, PMID 23432189 links to a retraction notice at PMID 29897867 (and mentions an erratum in "N Engl J Med. 2014 Feb 27;370(9):886"). Likewise, PMID 29307074 links to an erratum at PMID 30798464. Headbomb {t · c · p · b} 09:06, 25 February 2019 (UTC)
@Headbomb: Yes, that was my original plan - to create a fresh database by querying the Pubmed and Crossref APIs for papers flagged as retracted and keeping it updated. From my understanding, RetractionWatch take this data and do further cleanup to ensure that everything flagged as a retraction indeed is a retraction, and they therefore have a more reliable data source. That said, if I don't hear from them soon and find some time to work on this I'll go ahead with the original plan for now. Sam Walton (talk) 09:27, 25 February 2019 (UTC)
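For the template change described at the top of this thread, a minimal Python sketch of migrating the old nested form to the dedicated |pmid= parameter might look like the following. The function name `modernize_retracted` and the regex are illustrative assumptions, not the bot's actual code, and only the exact legacy pattern quoted above is handled.

```python
import re

# Assumption: the legacy form is exactly {{retracted|{{PMID|<digits>}}}},
# with optional whitespace and any capitalisation of the template names.
LEGACY_RETRACTED = re.compile(
    r"\{\{\s*(retracted)\s*\|\s*\{\{\s*PMID\s*\|\s*(\d+)\s*\}\}\s*\}\}",
    re.IGNORECASE,
)


def modernize_retracted(wikitext: str) -> str:
    """Rewrite legacy nested-PMID usage to the dedicated |pmid= parameter."""
    return LEGACY_RETRACTED.sub(
        # Keep the template name as written; move the PMID into |pmid=.
        lambda m: "{{" + m.group(1) + "|pmid=" + m.group(2) + "}}",
        wikitext,
    )
```

Calls that already use |pmid=, |doi=, |pmc= or |bibcode= do not match the pattern and are left untouched.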

How to avoid an edit warring bot

Hi! Is this enough to hinder the bot from tagging the same citation again, or is there some way to stop the bot from editing one specific citation again? (t) Josve05a (c) 22:39, 28 April 2019 (UTC)

Semi-relatedly, it might be good to have {{Outdated publication}} to flag old Cochrane reviews and other Living reviews. And possibly old preprints too, like arXiv:1801.02658v1. Headbomb {t · c · p · b} 22:48, 28 April 2019 (UTC)
Great point on potential edit warring. I've disabled tomorrow's bot run and will ensure it won't edit war before restarting it. Hadn't considered this. I'm not sure about Outdated publication though - while it's a good idea I'm not sure how feasible it will be to have the bot be able to distinguish between that and a retraction given that the Crossref data (and their own definition) simply flags them as retractions. I'm a little confused by the undo, however. @BallenaBlanca: As far as I can tell the tag was correct - Cochrane issued a retraction of this review on the basis that it was outdated, and the retraction notice is located here, so the template links to the retraction notice (the latter). Could you explain what you mean by this not being the retracted article? Sam Walton (talk) 23:15, 28 April 2019 (UTC)
Hi! I understand that this can be confusing. I will try to explain it.
We have a 2008 Cochrane review. This is what the bot labeled by mistake. It is valid:
PMID 18425890
Cochrane Database Syst Rev. 2008 Apr 16;(2):CD003498. doi: 10.1002/14651858.CD003498.pub3.
Gluten- and casein-free diets for autistic spectrum disorder.
Millward C, Ferriter M, Calver S, Connell-Jones G.
In April 2019, Cochrane published a new revision on the same subject, with the same title and the same authors:
PMID 30938835
Cochrane Database Syst Rev. 2019 Apr 2;4:CD003498. doi: 10.1002/14651858.CD003498.pub4. [Epub ahead of print]
WITHDRAWN: Gluten- and casein-free diets for autistic spectrum disorder.
Millward C1, Ferriter M, Calver SJ, Connell-Jones GG.
But what happened? In reality, no new RCTs had been conducted since 2008. Therefore, this 2019 review did not bring anything new. That was the reason for its withdrawal.
I explained it in all the edit summaries [1] [2] [3] [4] [5] [6] [7] [8] "This is not the retracted article, but the 2019 one PMID 30938835. Reason "This review was withdrawn from the Cochrane Library in Issue 4, 2019, as it has not been updated since its last revision in 2008". https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003498.pub4/full", but it seems that I did not do it well ... --BallenaBlanca 🐳 ♂ (Talk) 00:25, 29 April 2019 (UTC)
@BallenaBlanca: To clarify, the new revision (the second 'publication' linked there) is the retraction notice, right? As such, it's what the bot linked to. If we want to continue linking to a retracted article (in this case because it was only retracted due to being outdated), shouldn't the way forward be to use the |intentional=yes field of the retracted template, rather than reverting that template addition entirely? Sam Walton (talk) 09:00, 29 April 2019 (UTC)
"To clarify, the new revision (the second 'publication' linked there) is the retraction notice, right?" No, it is not the retraction notice of the 2008 review.
There are two different reviews: one of 2008, which has not been withdrawn, and another one of 2019, which has been withdrawn.
In other words: Cochrane made a new revision on the same subject. The conclusions are identical to those of 2008 because, during these eleven years, there have been no new studies that meet the criteria to be included in the systematic review. Therefore, a new review does not make sense. It is duplicating the same article. That's why they withdrew the new 2019 paper.
Look at the two papers, each one identified with its own PMID and doi, and published on different dates: PMID 18425890 (2008 Apr 16) and PMID 30938835 (2019 Apr 2). They are identical. It is absurd to publish two identical papers. See the data of the withdrawn article, it is the one published in April 2019 [9]
Sorry if I'm not explaining myself correctly. The language barrier makes it more difficult for me .... --BallenaBlanca 🐳 ♂ (Talk) 21:26, 29 April 2019 (UTC)
@BallenaBlanca: Ah! I understand now, thank you for explaining in detail, I appreciate it. It's strange that these are logged in Crossref as retractions, given that the retraction should be issued for the "updated-but-not-actually" review rather than the outdated review, but I guess that's the data we have to deal with. Do you think it would make sense to simply blacklist Cochrane Systematic Reviews from the bot's operation? It seems like a rough approach, we'd miss a genuine retraction of one of those reviews, but I'm not sure there's a better way given how these are recorded in Crossref. Sam Walton (talk) 22:06, 29 April 2019 (UTC)
You are welcome! Thank you very much for your effort and your kindness.
I do not think that's necessary, but it's just my opinion. I have not reviewed all the bot notices, just a few of them, but the ones I have seen are correct (with this exception).
Probably the best thing is that you ask this in the Wikipedia talk:WikiProject Medicine to check if there are frequent errors. --BallenaBlanca 🐳 ♂ (Talk) 22:49, 29 April 2019 (UTC)
@Samwalton9: in the meantime, possibly untag the Cochrane retractions? Headbomb {t · c · p · b} 00:30, 1 May 2019 (UTC)

@Samwalton9: It's sad to see that this caused you to stop the bot completely... any plans for the future? Jonatan Svensson Glad (talk) 02:23, 15 December 2019 (UTC)

Mistake

See [10]

It matched doi:10.1038/380707a0 with doi:10.1038/36382, rather than doi:10.1038/36384. Headbomb {t · c · p · b} 21:00, 26 May 2024 (UTC)

I've reviewed this today - there appears to be an error in the data source on this one, rather than the bot doing anything unexpected. I will try to get in touch with them to confirm. I can add a Crossref check back in if these sorts of errors are common. Mdann52 (talk)

Mistake 2

[11]

|pmed=0 makes no sense. First the parameter is |pmid= not |pmed=, and a value of 0 shouldn't be allowed.

Headbomb {t · c · p · b} 21:05, 26 May 2024 (UTC)

See also [12] Headbomb {t · c · p · b} 21:12, 26 May 2024 (UTC)

Fixed in the code with PR2. I spotted these while it was running - test of changed code here. Mdann52 (talk) 06:23, 27 May 2024 (UTC)
[12] appears to have been a case the bot wasn't set up for, where both an expression of concern and a retraction were issued; I've fixed the assumption that led to this being missed. Mdann52 (talk) 07:04, 27 May 2024 (UTC)
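The |pmed=0 problem raised in this thread amounts to emitting an unknown parameter name with a placeholder value. A hedged Python sketch of a guard against both is shown below; `build_retracted` and `ALLOWED_PARAMS` are hypothetical names, the parameter list is the one quoted earlier on this page (|bibcode=, |doi=, |pmid=, |pmc=), and treating "0" as a missing value is an assumption about how the data source marks absent identifiers.

```python
# Parameter names taken from the template description on this page.
ALLOWED_PARAMS = ("doi", "pmid", "pmc", "bibcode")


def build_retracted(**ids) -> str:
    """Build a {{Retracted}} call from known identifiers only.

    Unknown parameter names (e.g. 'pmed') raise ValueError; empty or
    zero values are dropped rather than written into the template.
    """
    unknown = set(ids) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameter(s): {sorted(unknown)}")

    parts = []
    for name in ALLOWED_PARAMS:  # fixed order keeps the output stable
        value = ids.get(name)
        if value is None:
            continue
        text = str(value).strip()
        if not text or text == "0":  # assumption: 0 means "no identifier"
            continue
        parts.append(f"{name}={text}")

    if not parts:
        raise ValueError("no usable identifier supplied")
    return "{{Retracted|" + "|".join(parts) + "}}"
```

With this guard, `build_retracted(doi="10.1/x", pmid=0)` yields a template with only |doi=, and a misspelled name fails loudly instead of reaching the article.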

Wikidata

@Samwalton9: If you ever get this bot up and running again, perhaps it can be adjusted to also do edits such as this. Jonatan Svensson Glad (talk) 16:56, 31 March 2020 (UTC)

@Josve05a: Is this still likely to be useful, do you know? Mdann52 (talk) 17:28, 28 May 2024 (UTC)
Given that 4 more academic years have passed, and I don't see a bot having done such edits in sight, I'd wager a yes. Jonatan Svensson Glad (talk) 19:31, 28 May 2024 (UTC)
However, given their new public database, I'm sure someone will make something with the data - given enough time and encouragement :) Jonatan Svensson Glad (talk) 19:40, 28 May 2024 (UTC)

Mistake 3

https://en.wikipedia.org/w/index.php?title=Stress_granule&diff=prev&oldid=1227681328%7C

It added {{Retracted|doi=10.1091/mbc.E14-11-1497-retraction|pmid=35041471}} a few too many times. Headbomb {t · c · p · b} 01:32, 9 June 2024 (UTC)

@Headbomb: The bot didn't seem to like tables, due to a recursion in the regex. Will deploy a fix tomorrow. Mdann52 (talk) 20:08, 10 June 2024 (UTC)

Possible issue

In this edit RetractionBot doesn't notice that a citation is already tagged with the retracted template. Thanks, Sunrise (talk) 04:22, 17 June 2024 (UTC)

@Sunrise: - fixed this morning - looks like all the whitespace confused the code handling detecting previous templates. Mdann52 (talk) 05:36, 17 June 2024 (UTC)
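One way to make the "already tagged" check tolerant of whitespace, as discussed in this thread, is sketched below in Python. This is an assumption about the general approach rather than the bot's actual fix; `is_already_tagged` and `cite_end` (the index just past the citation's closing braces) are hypothetical names.

```python
import re

# Any run of spaces/newlines may sit between the citation and the tag,
# and inside the braces around the template name.
ALREADY_TAGGED = re.compile(r"\s*\{\{\s*retracted\s*[|}]", re.IGNORECASE)


def is_already_tagged(wikitext: str, cite_end: int) -> bool:
    """Return True if a {{retracted}} template directly follows cite_end,
    ignoring intervening whitespace."""
    return ALREADY_TAGGED.match(wikitext, cite_end) is not None
```

Because the pattern is anchored at `cite_end`, a retraction tag that belongs to a later citation, separated by other text, is not mistaken for a tag on this one.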

Removes existing parameter checked=yes

https://en.wikipedia.org/w/index.php?title=MDPI&curid=13943035&diff=1232637577&oldid=1231081226

Headbomb {t · c · p · b} 20:24, 4 July 2024 (UTC)

@Headbomb Cheers - the bot has been stopped, a fix has been put in locally and I'll test tomorrow before starting it up again. Mdann52 (talk) 20:30, 4 July 2024 (UTC)
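The issue in this thread is an update that rebuilds the template and loses human-added parameters such as |checked=yes. A minimal Python sketch of a merge that keeps them follows; `merge_retracted` is a hypothetical helper, and it assumes a flat template with named parameters only (no nested templates or positional arguments), which is simpler than real wikitext.

```python
def merge_retracted(existing: str, updates: dict) -> str:
    """Update identifiers in a flat {{Retracted|...}} call, keeping every
    parameter that is not explicitly overridden (e.g. checked=yes)."""
    inner = existing.strip()[2:-2]           # drop the outer {{ and }}
    name, *raw_params = inner.split("|")     # assumption: no nested templates

    params = {}
    for raw in raw_params:
        key, _, value = raw.partition("=")   # assumption: named params only
        params[key.strip()] = value.strip()

    params.update(updates)                   # new identifiers win; others stay

    body = "|".join(f"{key}={value}" for key, value in params.items())
    if not body:
        return "{{" + name.strip() + "}}"
    return "{{" + name.strip() + "|" + body + "}}"
```

For example, merging `{"doi": "10.1/new"}` into `{{Retracted|doi=10.1/old|checked=yes}}` replaces the DOI and leaves checked=yes in place.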

If it's fixed, it's fixed. Unleash the bot!

Do you only process a few articles per day, or otherwise limit the bot? Because I find it strange that it only does 10-20 articles a day. I'd much rather the bot tagged everything it could asap. Headbomb {t · c · p · b} 20:52, 4 July 2024 (UTC)

So it's rate limited to 5s between API calls (this was 20s and has been reduced, but I'm not going below 5s as I don't consider this an "urgent" task that needs to run quicker than that), and 5s (was 60s) between edits. I'm using the search API to scan for the identifiers, which does risk false positives, and those slow things down somewhat.
I did look at using the dumps instead, but that actually slows the task down significantly so I've not taken that option. There are currently ~49600 retractions I'm scanning for, and I've significantly optimised the code as of late, which has further sped things up. I can't think of a better way to capture the data, short of pulling every DOI out of the dump and running that way. Mdann52 (talk) 06:00, 5 July 2024 (UTC)
How long do you estimate it would take to process the ~50K or so retractions?
I notice that the majority of retractions concern papers with a DOI prefix of ~10.1400 or under. It might be worth processing all 10.1517/... original DOIs with retractions together, and comparing with [13], so you can do it prefix by prefix.
I don't know how well that would scale to big prefixes like 10.1002 (85K hits on Wikipedia) or 10.1007 (125K hits) though. Headbomb {t · c · p · b} 07:49, 5 July 2024 (UTC)
Now that I've sped the bot up significantly and added a bit more code for efficiency, and going by this morning's run speeds, I estimate around 3.5 days if I were just scanning for all references, and hopefully 10 days to do everything (it previously looked like 30 days; however, I've had some issues with the database driver I'm using on Toolforge crashing). I did consider fetching by prefix, but I don't think this saves time, as it just means more pages to fetch/scan. My current approach is to take all the DOIs on the page, check them against the database and fix any hits, which I think strikes a good balance given that a number of pages seem to have multiple retracted DOIs (with different prefixes). Mdann52 (talk) 08:08, 5 July 2024 (UTC)
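The page-level approach described in the comment above (collect every identifier on a page, then check the whole set against the retraction database) could be sketched in Python roughly as follows. The regexes, the function name `find_retracted`, and the use of in-memory sets in place of the real database are all assumptions made for illustration.

```python
import re

# Assumption: identifiers appear as |doi=... and |pmid=... template parameters.
DOI_PARAM = re.compile(r"\|\s*doi\s*=\s*(10\.\d{4,9}/[^\s|}]+)", re.IGNORECASE)
PMID_PARAM = re.compile(r"\|\s*pmid\s*=\s*(\d+)", re.IGNORECASE)


def find_retracted(wikitext, retracted_dois, retracted_pmids):
    """Return (dois, pmids) found on the page that are known retractions.

    retracted_dois must hold lowercase DOIs, since DOIs are compared
    case-insensitively; retracted_pmids holds PMIDs as strings.
    """
    page_dois = {doi.lower() for doi in DOI_PARAM.findall(wikitext)}
    page_pmids = set(PMID_PARAM.findall(wikitext))
    # One set intersection per page covers every prefix at once.
    return sorted(page_dois & retracted_dois), sorted(page_pmids & retracted_pmids)
```

A single pass of this kind handles pages that cite several retracted papers with different prefixes, which is the balance the comment above is aiming for.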
Under 10 days for the whole database? That's pretty good! I was worried we'd be looking at months+. Headbomb {t · c · p · b} 08:18, 5 July 2024 (UTC)
I'd also alternate between DOI and PMID, after full runs, this way papers without DOIs but PMIDs get picked up, and vice versa. Headbomb {t · c · p · b} 08:19, 5 July 2024 (UTC)
The bot is already doing a "DOI" OR "PMID" in the searches, and should pick up instances of either (but if it's missed one please shout!) - 10 days is an estimate, and I'm not going to be held to that, but it shouldn't take months now! I prefer starting bots slow and sorting out the niggles. Mdann52 (talk) 08:32, 5 July 2024 (UTC)
Even if it's 12.3 days, that still would be pretty good. The bot could process the entire db twice a month, ish. Headbomb {t · c · p · b} 08:35, 5 July 2024 (UTC)
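As a rough cross-check of the figures quoted in this thread, the throttle alone sets a floor on how long a full scan can take. The Python sketch below assumes one search API call per retraction, which is a simplification not stated in the discussion; `min_scan_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def min_scan_days(n_items: int, delay_s: float, calls_per_item: int = 1) -> float:
    """Lower bound on scan time in days when every API call is followed
    by a fixed delay (network and processing time are ignored)."""
    return n_items * calls_per_item * delay_s / SECONDS_PER_DAY


print(round(min_scan_days(49_600, 5), 2))   # 2.87 days at the current 5s delay
print(round(min_scan_days(49_600, 20), 2))  # 11.48 days at the earlier 20s delay
```

Under that assumption, ~49600 retractions at 5s per call take at least about 2.9 days, which is in the same range as the "around 3.5 days" scanning estimate given above; time spent making edits comes on top of that.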