Wikipedia talk:Contributor copyright investigations/Archive 3


New tool for CCI casepages (automatic Earwig linking bot)

After reading a couple suggestions on Enterprisey's talk page, I decided to throw together a script for automatically adding links to Earwig's copyvio checker to CCI casepages. The repository is at https://github.com/jp-x-g/wigout and you can see an example of its output here. Essentially, it processes a CCI casepage, and adds little links to all the diffs. Example:

[Before and after screenshots not preserved]

I think that this does make the source look a little unwieldy (and the way the links are displayed could definitely stand to be improved), but I figure it's a start. What do you all think? Anyway, I am looking to develop additional tools for CCI as well -- if anyone has any ideas, let me know here (or on my talk page). jp×g 05:24, 20 August 2021 (UTC)

JPxG, I like this! :) Could we also have a link to the Earwig report for the current revision of a page? Useful for quick checks on recently created pages. firefly ( t · c ) 06:36, 20 August 2021 (UTC)
@Firefly: Gottem. Check the sandbox.
N Eric LichtblauC​: (2 edits, 2 major, +1589) (+1589)C(+480)C
If anyone wants to start coming up with a list of pages to run this on, I'd be happy to (not quite ready to just let people feed things straight into the bot yet, so I will do it manually). jp×g 07:36, 20 August 2021 (UTC)
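
For readers curious what such a link might look like, here is a minimal sketch of building a checker URL for either a specific revision or the current revision of a page. It is not taken from the wigout repository; the helper name `earwig_url` is hypothetical, and the base URL and query parameter names are assumptions based on the public form of Earwig's tool, so check them against the live tool before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL of Earwig's copyvio detector (not taken from wigout).
EARWIG_BASE = "https://copyvios.toolforge.org/"


def earwig_url(*, title=None, oldid=None, lang="en", project="wikipedia"):
    """Build a checker link for a page title (current revision) or a revision ID.

    Hypothetical helper: the query parameter names below are assumptions.
    """
    if (title is None) == (oldid is None):
        raise ValueError("pass exactly one of title or oldid")
    params = {"lang": lang, "project": project}
    if title is not None:
        params["title"] = title       # current revision of the page
    else:
        params["oldid"] = str(oldid)  # one specific revision
    # Assumed flags: run the search engine and follow links in the page.
    params.update({"action": "search", "use_engine": "1", "use_links": "1"})
    return EARWIG_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(earwig_url(title="Eric Lichtblau"))
    print(earwig_url(oldid=123))
```

Passing a title covers the "current revision" case Firefly asked about, while passing a revision ID covers the per-diff links.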

Conflicting advice

Following Moneytrees' recent and heart-rending 'cri de coeur' at WP:AN about lack of editor involvement in CCI, I have been trying to get my head around understanding the processes involved at CCI. I've already made a few minor tweaks to the Header text wording to improve clarity of the instructions (which I hope will be found acceptable), but have encountered a significant conflict of advice in those instructions relating to those users found responsible for repeat COPYVIO issues.

On the one hand we say in the /Header template text shown on this page: "...However, if concerns are substantiated, you may best demonstrate your willingness to comply with Wikipedia's copyright policies by helping to identify and address copyright issues." (now slightly modified by me to read "..to identify and address the copyright issues you have caused.")

Yet on individual CCI pages the guidance notes say exactly the opposite. viz:

"All contributors with no history of copyright problems are welcome to contribute to clean up. Contributors who are the subject of a contributor copyright investigation are among contributors with a history of copyright problems and so are not welcome to contribute to clean up of their own or others' copyright violations."

It seems to me that allowing certain subsets of 'good faith', but wayward, editors to find and successfully fix their own copyvio mistakes is not only an excellent opportunity for them to fix their own mistakes and to demonstrate they now understand the issues around copyright and close paraphrasing, but it also helps address the backlog at CCI. Not only that, but these same editors may well be the ones most likely to be motivated enough to fix things in that topic area (especially if a permanent block/unblock is part of the inducement or deal to participate in the monitored clean-up of their own mistakes).

But which instruction is it? We can't have it both ways, surely! Nick Moyes (talk) 23:52, 31 August 2021 (UTC)

@Nick Moyes: I think the conflicting part is "so are not welcome to contribute to clean up of their own or others' copyright violations". From the sentences, I think anyone with a CCI on them cannot clean up other people's vios in other people's cases. In terms of their own CCI case, I think they might be able to identify whether or not it was a copyvio but can't actually do the removing in their own case. I'm not 100% sure so I'm pinging @Moneytrees: for advice. --MrLinkinPark333 (talk).

New PD template

I'm working on Wikipedia:Contributor copyright investigations/20190125, which involves frequent use of a specific old PD source. The source is thousands of pages long, so it's often very hard to figure out where exactly the content came from, especially when the editor didn't indicate which volume of the work things come from. Given the circumstances, I've been acting out of caution and marking usages of that source that aren't obvious non-copies with the {{PD-notice}} template just to be safe. To be more specific, I've created {{Official Records}}. 1) Is the presumptive PD attributing appropriate and 2) Is that template good enough to at least be useful? Hog Farm Talk 20:28, 3 September 2021 (UTC)

@Hog Farm: I haven't come across presumptive PD attribution in the past. I'm not sure if presumptively tagging would work, as it needs to be verified whether the text is indeed cited in that reference. University of North Texas has a search bar for the series, so it might help determine whether it was copied or not if no volume is provided. As for the template, it looks fine but I'm not a template expert. --MrLinkinPark333 (talk) 03:22, 4 September 2021 (UTC)

Notifying subjects of a closed CCI

Hi, per request from some users I've created {{CCI-subject}}, which can be used to notify subjects that a CCI into them has been closed. — Berrely • TalkContribs 16:59, 5 September 2021 (UTC)

Clerk ish notice

Moneytrees gave me the go-ahead on Discord to clerk at CCI since I am a clerk at WP:CP. Noting here for transparency and for anyone to object. Sennecaster (Chat) 12:31, 8 October 2021 (UTC)

In my opinion the roles could be merged; if you're good at one then you're probably fine with the other. Moneytrees🏝️Talk/CCI guide 16:02, 8 October 2021 (UTC)

Archiving

Archiving was turned off for some reason. I've turned it back on, in light of multiple discussions sitting for over a year. Sennecaster (Chat) 02:21, 11 November 2021 (UTC)

Aditya soni

Please help to check Copyright violations in the articles created by Aditya soni. Wikipedia:Articles for deletion/Pashyanti was deleted for the same and 3 more articles are found having problems. Venkat TL (talk) 08:06, 16 November 2021 (UTC)

Hi Venkat TL! I would follow the procedures listed on the corresponding page to this talk page if you want to request a review into an editor's contributions. If there have been 5 violations, we generally look at the case. Sennecaster (Chat) 14:42, 16 November 2021 (UTC)
Yes, I understand. But checking for this takes too much of my time. I posted here to seek help from experts. Venkat TL (talk) 14:55, 16 November 2021 (UTC)
I'm looking at the talk page. This person has had 3 AfC declines for copyright violations, and the Pashyanti AfD. I will wait to notify an admin until the G12 requests are handled. Sennecaster (Chat) 15:35, 16 November 2021 (UTC)

Rapid grant for the creation of a CCI workflow tool

Hello, everyone! You may know me as the creator of the copyright-related userscripts CopiedTemplateEditor and InfringementAssistant. I'll be developing a userscript that better handles the processing of CCI cases and I thought it would be a good idea to request a rapid grant from the Wikimedia Foundation. You can find more information about the scope of the tool and the grant on m:Grants:Project/Rapid/Chlod/Contributor copyright investigation tool. Feel free to leave comments, questions, suggestions, ideas, and endorsements on the grant talk page. Thanks! Chlod (say hi!) 02:19, 10 February 2022 (UTC)

Copying open license sources without attribution

Does CCI cover copying open license sources without attribution? I understand that they can be repaired and are considered less serious. I came across a user with 3 examples copied from Fandom (website) (formerly Wikia, see also {{Fandom content}}) and 5 within Wikipedia. I will notify the user if and when I submit a case. Flatscan (talk) 04:22, 5 April 2022 (UTC)

@Flatscan For serial cases, yes, we do. If it's extensive and can't be cleaned up in one go, it definitely deserves a looking at. Sennecaster (Chat) 04:04, 7 April 2022 (UTC)
One additional consideration is that sometimes the material on Fandom is not actually original to Fandom, so at least spot checks should be made for that before we just assume everything is fine after attribution. VernoWhitney (talk) 13:56, 7 April 2022 (UTC)
Thanks for the replies! The Fandom pages were the closest results I found when searching, but expert double-checking will be helpful. Flatscan (talk) 04:34, 8 April 2022 (UTC)

I had the same question about unattributed open license copying and was going to raise a new section, until I saw this discussion. My uncertainty arose from the fact that there's nothing on the project page at all about missing attribution (the string 'attribution' occurs only in the template section near the bottom). Can we add something to the project page to clarify this, so questions like Flatscan's and mine needn't be raised here on the Talk page? Here's a "before" (black) and "after" (green xt font) proposal:

This process is intended for large-scale systematic copyright violations only. Persistent unattributed copying or translation within Wikipedia or from other Wikimedia properties or compatibly licensed sources [are less serious but] may also be raised here. To list individual media files or articles for evaluation, see Wikipedia:Copyright problems or Wikipedia:Files for discussion; however, attempting resolution is not required prior to filing a CCI request.

With the bracketed bit perhaps not necessary, but could be mentioned later if needed. Thoughts?

P.S., this is not just theoretical; I am trying to decide whether to raise a CCI regarding KatBet (talk · contribs)[noping], all of whose edits are unattributed machine translations from de-wiki or it-wiki. Thanks, Mathglot (talk) 19:10, 4 May 2022 (UTC)

Unattributed translations & copying CC licensed sources without attribution are indeed an issue, and do indeed need checking. I don't think the bracketed part is needed, especially since Wikipedia:Copying within Wikipedia specifically states attribution. It also could discourage editors from mentioning those issues if the instructions stated they were less of an issue. In any case, unattributed copying still needs fixing, even if it's not from copyrighted sources. MrLinkinPark333 (talk) 19:24, 4 May 2022 (UTC)
It makes sense to me, I've added it. Moneytrees🏝️(Talk) 20:21, 4 May 2022 (UTC)
(edit conflict)Agreed; thank you very much for your response, MrLinkinPark333. It occurred to me that I could just make a bold edit to add something of that nature and see what happens, but as I haven't been active on this before I thought it better to at least discuss here first, and either a regular here could make the change, or if it seemed acceptable after some discussion, then I'd be okay making a change. I do worry it's a bit long, but I don't see how to shorten it much, given the cases of copying, translation, and in- or out of wikimedia properties.
Post-ec update: I see that Moneytrees has made the change in the interim; many thanks! Mathglot (talk) 20:31, 4 May 2022 (UTC)

Can we get one more closed soon?

I've been chipping away (mostly by myself) at Wikipedia:Contributor copyright investigations/Hary1mo for almost two months, and have the majority of the images addressed and almost all of the text checked. Would anyone be willing to help push this one over the line? We're down to mainly images I can't find the original of or aren't sure what to do about. Note to anyone who chips in here - the DANFS/NHHC text source is public domain, and claims of "own work" for this uploader generally can't be trusted. Hog Farm Talk 07:02, 9 July 2022 (UTC)

@Hog Farm: I'll try to bonk this one down over the next few days. Sennecaster (Chat) 19:18, 9 July 2022 (UTC)
Thank you very much! I've been keeping track of the SPI and can get anything that's new from there. Hog Farm Talk 19:31, 9 July 2022 (UTC)
Sockfarm CCIs are the absolute worst. Glad this one is easy in subject and the only tedious or annoying part is navigating Commons deletion politics. Sennecaster (Chat) 19:36, 9 July 2022 (UTC)
@Hog Farm I've narrowed down the first section to all but one which I have no idea about, File:Blockade runner Hope.jpg. Maybe you'll have better luck. — Berrely • TalkContribs 19:37, 11 July 2022 (UTC)
@Berrely: - I can't find a source that states exactly where this is originally from, but I'm thinking the image is of a model ship which would mean that it's quite likely a violation. Nominated for deletion under Commons's precautionary principle. Hog Farm Talk 01:44, 12 July 2022 (UTC)
Yes. But it's Billy Hathorn. Finally. –♠Vamí_IV†♠ 01:53, 12 July 2022 (UTC)
@Hog Farm: Closed. Sennecaster (Chat) 05:44, 20 July 2022 (UTC)

Clerk role merge proposal

WT:CP#Merging of clerk roles for those of you who are interested. Sennecaster (Chat) 00:05, 22 July 2022 (UTC)

Help close one?

Wikipedia:Contributor copyright investigations/Lightburst, on an active editor, has 11 pages left. Not too meaty but I'd like to have it closed soon. Sennecaster (Chat) 03:13, 31 July 2022 (UTC)

Accepted case with no subpage

This is probably a dumb question, because I'm not a regular here. When a case is accepted and removed from the main page, such as in this edit accepting a case on 1 August, doesn't that mean that a CCI subpage is created for the user more or less at the same time? I thought I would find a subpage here, but apparently not. There's probably some elementary bit of process I'm not familiar with. Thanks, Mathglot (talk) 07:43, 8 August 2022 (UTC)

@Mathglot: Sometimes the subpage is "anonymized" as a courtesy, or for other reasons, especially if the editor in question has an (otherwise) long and good standing with the community. The subpage you are looking for may be found at Wikipedia:Contributor copyright investigations/20220720. Iazyges Consermonor Opus meum 08:03, 8 August 2022 (UTC)
Thank you, that was definitely beyond my CCI knowledge horizon. Mathglot (talk) 08:10, 8 August 2022 (UTC)

Does CCI also address Portal pages?

Through copyvio issues at 2021 in Vatican City, User:Justlettersandnumbers and I discovered at least two apparently serial copyright violators at Portal:Current events. See Talk:2021 in Vatican City for the details. If a CCI would be opened about these two editors, would their edits to the portal also be included? Fram (talk) 07:28, 16 September 2022 (UTC)

 You are invited to join the discussion at Wikipedia talk:Contributor copyright investigations/20220720 § Requested move 20 September 2022. DanCherek (talk) 12:24, 20 September 2022 (UTC)

Deputy – a toolkit for copyright cleanup

Hello! After months of development, I've published the first working version of Deputy, a userscript that contains a few tools useful for copyright cleanup. This is a merger of two of my old scripts, the {{copied}} Template Editor and the Infringement Assistant, piling on more features and also creating a better interface for working with CCI case pages — Deputy's main goal. Because the script is new, your feedback on the toolkit (specifically what can be improved or what features you would like) would be greatly appreciated. Hopefully this can help with all the copyright backlogs that need to be addressed. If you'd like to discuss, please do so at Wikipedia talk:WikiProject Copyright Cleanup § Deputy – a toolkit for copyright cleanup to keep discussions centralized. Thank you! Chlod (say hi!) 02:37, 30 September 2022 (UTC)

Survey tool

Can someone give me a link to the survey tool or is it not online any more? I'd like to use it to investigate some undisclosed paid editing, where it comes in just as handy as here. If it's not online, if someone could run it on Timtempleton and TechnoTalk and post the results somewhere that would be helpful. @MER-C: since you seem to have used it recently. Thanks SmartSE (talk) 12:05, 2 November 2022 (UTC)

@Smartse: I believe you're looking for this online version. Chlod (say hi!) 12:12, 2 November 2022 (UTC)
@Chlod: Thanks - that's the one! I'll add a link to the page. SmartSE (talk) 12:16, 2 November 2022 (UTC)
@Smartse: Beware that it only works on editors with less than 10k mainspace edits; I'll add a note as well. Sennecaster (Chat) 22:40, 2 November 2022 (UTC)

DC GAR/CCI

The mass messages for the Doug Coldwell Good article/CCI assessment will be going out later today. Please speak up (on the talk page there) if anything in that writeup, or any of the other linked messages at WP:DCGAR, needs attention. SandyGeorgia (Talk) 20:04, 8 February 2023 (UTC)

Deputy June 2023 User Experience Survey

Hello! I'm Chlod, the developer of Deputy. You may have heard of this tool before or you may be an active user, but in case you haven't heard of it prior, it's a tool for streamlining some of the copyright cleanup work on this wiki.

I'm currently holding a survey to gauge editors' experience of the tool, be it as a user or a non-user. You're invited to participate, even if you don't use Deputy! The responses collected will help improve the tool and, if you're not a user, help make Deputy work smoothly with how you do your work.

The survey can be found here, and you can learn more about this survey on the Wikimedia Meta-Wiki. Thanks! Chlod (say hi!) 23:19, 17 June 2023 (UTC)

 You are invited to join the discussion at Template talk:Copyvio § Making the template more intuitive. – Isochrone (T) 13:21, 1 August 2023 (UTC)

Lost CCI nomination for Victuallers?

Hi, can you help? I was notified two or three months ago that a CCI investigation had been opened on 29 June 2023 Wikipedia:Contributor copyright investigations#Victuallers. I realise there is a queue while you work out whether to take the case, investigate, turn it down or decide that it's an abusive nomination... but I was wondering how the nomination was progressing. The link is now dead and I cannot find it in your queues. Sorry if I have misunderstood your system. Could you advise me on the status of this nomination? Thanks Victuallers (talk) 13:19, 9 September 2023 (UTC)

Hello @Victuallers. Wizardman decided on Sept 6 that there was not enough going on to warrant opening a case. Diff. — Diannaa (talk) 14:23, 9 September 2023 (UTC)
Thanks Diannaa, I agree with Wizardman (if maybe not the speculation). Victuallers (talk) 15:25, 9 September 2023 (UTC)

Any ways to automate any of this?

I've been doing a bit on CCI/20220720. This leads me to ask whether there may be any ways to part-automate the review process. I should stress, first off, that I'm a technical numpty, but I'm well aware that many editors are not, and I would have thought a group of these would be able to suggest some practical automations. Secondly, given the importance of Copyright Compliance to Wikipedia, I would have thought it was an issue where a well-drafted proposal may elicit a positive funding response from WMF. I see that other proposals around automating the process workflow have been supported. Thirdly, I'm sure that this issue has been discussed previously, for example there's an interesting discussion, Tangential discussion on possibility of a Copyvio algorithm here. In this regard, am I overlooking any existing tools that could expedite the review process? Nonetheless, considering the time it takes to review these cases, and the number of such cases outstanding (some dating back over a decade and still incomplete), I do wonder whether alternatives ought to be considered. I'm basically thinking of some variant of Earwig, as has already been suggested by others, that could, at the least, do a really valuable job in weeding out the false positives, enabling editors to focus on the areas of real concern.

I'd be interested if other editors thought this was something warranting further thought/discussion. KJP1 (talk) 08:52, 23 November 2023 (UTC)

Hey, @KJP1!
Yeah, there really actually is a deficit of automation in this space. There was an idea previously floated around about having Turnitin or Earwig run on all revisions of past cases; I'd say this is probably the general idea when talking about automation for CCI cases. When it actually comes down to making it happen, though, it's a spider web of caveats and limitations that make it hard to get off the ground. Here's a more-organized explanation of my thoughts that I randomly collected in the past few months:
  • First is the issue of cost. There's around 508 thousand revisions left to check (as of May this year), but we only ever have a finite amount of Earwig search engine searches or Turnitin credits. Processing all of these automatically means we have to work with the WMF to get more credits for a one-time run-through, and we're not sure if we'll get decent results for a majority of those checks.
    • We could work around this by completely disabling search engine checks, as the thread you linked discussed, but this can either work for or against us based on the case. We could also work around this by only selecting a few cases which rely mostly on web sources or (for Turnitin) sources that we know would probably be indexed. This significantly cuts down on the amount of revisions to check. But then there's the next issue:
  • A lot of the older cases, especially the ones over three years old, start getting a lot of false positives. As article text remains on the wiki for long periods of time, SEO spam sites, academic documents, slideshows, and others start copying from Wikipedia. We filter out a lot of these already (like those in this list and a bunch of others), but we still hit them every once in a while and enough that it clogs up what reports we would otherwise get from Earwig/Turnitin.
    • A possible solution to this would be human intervention (which is more or less a given with something like this), where editors will double-check to see if a flagged revision actually is copied from somewhere, or if it's just a false positive. Human intervention will weed out false positives, but then it won't weed out the false negatives.
  • At the end of the day, copyvio checking is a really hard computer science problem that humanity is still in the middle of solving. False negatives (like when a revision flies under the radar because a source it copied from has died, or when the text has been paraphrased enough to make checkers think it's completely original text) will always be one of the biggest brick walls we face. False positives waste editor time, yes, but false negatives arguably take up more time, because we then need to re-check the case. It also wouldn't be a good look for us or the WMF if it turns out that we get a lot of false positives and negatives, since that could be perceived by the community as a waste of funds. Perhaps this is still something that could benefit from research and testing.
Unfortunately, I still haven't thought of a good way to deal with that last one. I've talked with a few WMF staff and other technical contributors about this to get their insight, but I'm still unable to find a good compromise. I'm happy to hear from anyone on what we can do to make this happen, because this could be one of the ways we can make a good dent in the massive backlog. :) Chlod (say hi!) 13:02, 24 November 2023 (UTC)
Chlod - Hi, thanks very much indeed for getting back, and for the very interesting suggestions. As I said, I'm not strong technically, but I thought kicking off a discussion may get editors who are, interested in thinking around the issue. I agree it is likely to be complex, and with a whole load of brick walls to surmount. But we know the problem is very real, is generally getting worse rather than better, and I think that individual editors grinding through individual CCIs is unlikely to be the complete answer, although I also agree that any solution will certainly still need individual editors making judgement calls. Let's see if our exchange sparks further interest/discussion. All the best. KJP1 (talk) 06:55, 25 November 2023 (UTC)
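
As an illustration of the mirror-filtering step mentioned in the thread above, here is a rough sketch of dropping matches whose source is a known Wikipedia mirror before a human looks at the report. It is not how Earwig or Turnitin actually does it; the domain list, the function names, and the shape of a "match" (a dict with a `url` key) are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical mirror list; a real one would come from a maintained exclusion list.
KNOWN_MIRRORS = {"mirror.example", "wiki-copy.example"}


def is_known_mirror(url, mirrors=KNOWN_MIRRORS):
    """Return True if the URL's host is a listed mirror or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == m or host.endswith("." + m) for m in mirrors)


def drop_mirrors(matches, mirrors=KNOWN_MIRRORS):
    """Keep only matches whose source URL is not a known mirror.

    Each match is assumed to be a dict with a "url" key (hypothetical shape).
    """
    return [m for m in matches if not is_known_mirror(m["url"], mirrors)]


if __name__ == "__main__":
    report = [
        {"url": "https://en.mirror.example/wiki/Foo", "score": 0.91},
        {"url": "https://news.example.org/story", "score": 0.64},
    ]
    print(drop_mirrors(report))
```

A filter like this only reduces false positives from sites that copy Wikipedia; it does nothing for the false negatives described above, which still need a human check.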

I'm not sure if this was the proper place, but in 2021, Vergel Meneses was practically blanked due to User:Obetpaguia's problematic edits. That user's revisions were relatively minor. Would it be possible to restore the removed content, most of which was sourced and written by other people such as myself? Howard the Duck (talk) 01:58, 25 December 2023 (UTC)

Claudine Gay

I'm not sure if this is the best place for what I'm about to say, but I think it's the place I want to be heard. Claudine Gay, IMO, had actual and real copyright issues in her writing. They were fairly minor and of a type that can be fairly common both here and in the real world. And they were used to drive someone out of their job for political reasons. I feel like that happens here sometimes--find someone you don't like and find some close paraphrasing and drive them out. I don't want to say that is what happens on a regular basis, or that significant copyright issues aren't a problem. I do want to say that writing well enough to avoid any such issues requires skills that even the best educated don't always have. This is particularly true wrt "close paraphrasing". But on Wikipedia it can be even harder. We often aren't subject experts. We read maybe 3 or 4 sources (at most) and put together a sentence or a paragraph that reflects what they said. It is often easy to end up with phrasings that are close to one of the originals without intending to or noticing. And yes, often people are just too fast. I'd prefer we take a less punitive approach with folks--more coaching, fewer accusations and less negativity. If people are taking more than a few words and directly copying them without quotations, that's an issue. But the copyright issues that I often see discussed often appear to be in good faith and well within fair use (thus not putting the project at risk). That we go after the issues with vehemence is praiseworthy. That we sometimes go after the people when the errors are, or reasonably could be, good faith issues is suboptimal and I think has caused us to lose some good contributors who needed coaching, not attacks. Thanks for listening and sorry if the issue has already been brought up elsewhere. Hobit (talk) 18:13, 3 January 2024 (UTC)

Courtesy blanking

Wikipedia:Contributor copyright investigations/20220720 03

Could somebody with the authority to do so undertake courtesy blanking of the above, as it is now completed. KJP1 (talk) 09:53, 9 February 2024 (UTC)
 Done. To the best of my knowledge anyone can blank a subpage of a CCI once it's done, so nothing would've prevented you from doing it yourself. It's only the main pages of individual CCIs that should only be blanked and closed by admins or copyright clerks. Thanks for all your hard work! Callitropsis🌲[talk · contribs] 15:27, 9 February 2024 (UTC)
Many thanks, for that and for the advice. Fortunately, there’s only one left to do. KJP1 (talk) 18:09, 9 February 2024 (UTC)

Contributor copyright investigations/20220720

Having today completed and closed up Wikipedia:Contributor copyright investigations/20220720 02, I think Wikipedia:Contributor copyright investigations/20220720 is now concluded. I would be most grateful if somebody could courtesy blank Section 2, and undertake any other administrative actions that might be needed to conclude/close the CCI. Do please ping if anything more is needed of me. Thanks and regards. KJP1 (talk) 16:31, 1 March 2024 (UTC)

@KJP1; Undertaken by MER-C. Thank you for your assistance in the closure of this case. Sennecaster (Chat) 02:27, 2 March 2024 (UTC)