Wikipedia talk:Articles for deletion/Anybot's algae articles

AfD notices

Is there some straightforward automated method of putting AfD notices on all the articles, or is that too burdensome a task at this magnitude? Melburnian (talk) 07:59, 22 June 2009 (UTC)

Anyone with a bot account could do it with AWB. But is there any point? 4000 edits to save half a dozen deletions and restorations? Hesperian 10:35, 22 June 2009 (UTC)
Well, for my own contributions it was nine, so we're over half a dozen right there. For me the scary part is that without going through my contributions I had no real way of knowing which articles I had worked on. I guess I'd notice either a deletion or an AfD notice on my watchlist, so maybe it is OK either way, but I'd lean towards erring on the side of notifying, if we think there is a chance it would let someone know who would otherwise never notice, or would only find out about this well after the fact. Kingdon (talk) 03:57, 24 June 2009 (UTC)
This is a good idea. The articles would then appear on watchlists. I want to see the articles disappear ASAP, but I agree that deleting an article someone has improved is tough for the writer. One of the least appreciated jobs on wikipedia is sourcing articles. I would not like to read half a dozen sources, pick the most authoritative, add it to an article, and then have the article deleted without my knowledge. Still, I don't like to delay the removal of 2 billion years of misinformation. I like what Hesperian said, something about calling an animal a plant only far more wrong. --69.226.103.13 (talk) 18:17, 24 June 2009 (UTC)
This error has now been fixed in most articles; the rest are being fixed. Martin (Smith609 – Talk) 21:14, 24 June 2009 (UTC)
You mean you're running the bot now? Is this agreed upon? It seems we're still discussing the issue. --69.226.103.13 (talk) 22:29, 24 June 2009 (UTC)

You're introducing more and new errors and deleting stuff from the wrong place. --69.226.103.13 (talk) 22:57, 24 June 2009 (UTC)

(after edit conflict) Yes, it seems that he ran the bot this afternoon. Care to take a look at some of the edits and see what you think? As I said, making a small number of test edits and waiting for feedback would have been wise before doing this - but I suppose that the end result is either going to be 'slightly better than before' or 'as broken as before'... --Kurt Shaped Box (talk) 23:02, 24 June 2009 (UTC)

I have reblocked the bot. Given the context (discussion proceeding on the understanding that the bot was blocked; concerns about unauthorised access; the BAG notified that the bot was blocked, but not notified of the unblocking; recommendations that the bot be blocked and/or deapproved; expressed opposition to the bot being deployed to fix its own errors; no BAG approval for the new task; apparently no test run), I think it was inappropriate for the bot operator to unilaterally unblock and deploy. And then there is the problem that it was introducing new errors. Hesperian 23:40, 24 June 2009 (UTC)

My finger was actually hovering over the block button a few minutes ago, though I decided that I would wait for Smith to comment here before taking any action. Yes, I fully agree that it was an unwise move on his part to unleash the bot in this manner. Speaking frankly, I'm beginning to wonder if Smith is sufficiently familiar with this field and these particular organisms to be able to program a bot capable of outputting non-error-ridden stubs. I'm 100% certain that his intentions are good and I mean no insult to the man - but this just doesn't seem to be working. --Kurt Shaped Box (talk) 23:59, 24 June 2009 (UTC)


I appreciate your concerns; however:
  • The discussion taking place hinges on the quality of the articles the bot has created.
  • The first run of 'bot version 1' contained major errors, spotted in February.
  • The second (April?) run of 'bot version 2' fixed many of these errors (whilst some escaped my attention and still need addressing). These edits stood for several months without attracting complaint.
  • The third (June?) run of 'bot version 1', which took place without my knowledge, reintroduced the errors which had been fixed in the second run.
  • Editors had expressed the opinion that errors should be fixed ASAP; restoring the version produced by 'bot version 2' goes a long way to fixing many of the most important errors, and allows editors to form a fair view of what the bot created when it was originally run.
  • Remember, the bot only edited articles that humans had not touched. The only 'new' amendment it made was to fix higher taxonomy errors, which had occurred because I had assumed that Algaebase only contained algae; its edits could therefore only improve the articles.
Further,
  • Unauthorised access is now impossible
There appears to be consensus that errors in the bot should be fixed ASAP - something which I wholeheartedly agree with. Running the bot in this fashion cannot increase the amount of errors; nor can it remove any human input (except my own); nor does it make it any harder to delete some or all articles, should this be the outcome of the ongoing debate.
Perhaps I am being short-sighted, but I cannot see what harm will be done by using the bot in this 'roll-back' capacity, and I do see some benefits. Can anyone provide me an example of how it is harming WP that I have missed? By blocking the bot account, you are effectively saying that you don't want the errors that previous scripts created to be fixed; as far as I can tell this is the opposite sentiment to the dominant one thus far. If the edits had been performed by a user who wasn't called 'Anybot', would there have been any reason to block that user?
Martin (Smith609 – Talk) 02:34, 25 June 2009 (UTC)
  • Martin, one result of this fiasco is that your assurances with respect to your bot have lost a degree of weight with the community. A number of people have said that they don't trust your bot to fix these errors. By unblocking and running the bot in the face of this opposition, you are thumbing your nose at these concerns. If you are right that this bot cannot increase the amount of errors, then you are also right that there is no harm in running it. But again I say, your hubris does not suffice to establish that the bot is safe. As I said on your talk page, if you want the bot unblocked, then get a BAG member to unblock it. If you can convince a BAG member that the bot will do no harm, then I won't stand in the way of you running it. Hesperian 03:47, 25 June 2009 (UTC)
Put it this way, Martin: if a user (not a bot) were trying to pass hundreds of algae off as cyanobacteria, trust me, that person's account would be blocked indef very quickly, and I doubt any admins would unblock to allow this person to correct his/her mistakes. OhanaUnitedTalk page 04:57, 25 June 2009 (UTC)
It now emerges that this latest run was putting articles into a state they had never been in before, including introducing novel errors, but with edit summaries that say "Restore article to last good version", and flagged as minor. That is seriously, seriously dodgy, Martin. That is Fucked Up Shit. Hesperian 07:37, 25 June 2009 (UTC)

New and different errors

The bot is introducing errors because Martin did not read the list of errors that the bot introduced the first time. I suggested the bot use wikipedia's existing higher level taxonomies. It didn't. So, now, the red algae are one thing at the genus level and something different at the family level.[1]

It still can't count.[2] Or Martin is using more than one source but omits that information from the article. There are 12 species total in AlgaeBase, and only 7 valid ones, but our bot says there are 12 valid species. There's no such information in AlgaeBase, and if it's correcting its mistakes, the one mistake people have been pointing out since February is that the bot can't count. It still can't count. This is pre-101 computer programming: making your program count. Bots that can't count should not be operated to count things. Or to correct their own mistakes. Or operated at all.
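
(An aside, for illustration only: a minimal Python sketch of the counting logic being asked for. The record layout and the "C"/"S"/"P" status letters are assumptions drawn from this discussion, not AlgaeBase's real schema.)

  # Count only taxonomically accepted species, not every record returned.
  # "C" (current), "S" (synonym) and "P" (preliminary) are hypothetical
  # status codes standing in for whatever the database actually uses.
  def count_species(records):
      """Return (total, accepted) species counts for one genus."""
      total = len(records)
      accepted = sum(1 for r in records if r.get("status") == "C")
      return total, accepted

  records = [
      {"name": "Examplea alpha", "status": "C"},
      {"name": "Examplea beta", "status": "S"},
      {"name": "Examplea gamma", "status": "P"},
  ]
  total, accepted = count_species(records)
  # An article should report `accepted`, never `total`:
  print(f"{accepted} of {total} species are taxonomically accepted")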

It's NOW editing articles it did not start.[3]

It's making the diatoms plants,[4] an archaic classification wikipedia doesn't use.

The cyanobacteria now say prokaryote, but they still say "it's an alga." And it's not. So, the articles remain as bad as they were.[5] It claims a preliminary addition to the database is the only taxonomically valid species.

Its edit summaries do not match what it's doing. Here it deleted text, then blanked the page, then did something else.[6]

The edit summary states, "Restore article to last good version," for every edit, regardless of whether it is rewriting its owner's articles, blanking and tagging the page for deletion, or whatever else it has done.

And it is still not pulling data from the database correctly. The article it blanked for having no valid taxonomy was actually a valid species.

So, in addition to not being able to count, the bot does not distinguish "C", "S", and "P". Again, pre-101 computing. Have your bot read a letter and verify it is the same letter.
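
(Again purely illustrative, using the same hypothetical status codes as above: the kind of defensive check being described.)

  # Refuse to process a record whose status letter is not recognised,
  # rather than silently treating "C", "S" and "P" records identically.
  KNOWN_STATUSES = {"C", "S", "P"}  # hypothetical status codes

  def read_status(record):
      status = record.get("status", "")
      if status not in KNOWN_STATUSES:
          raise ValueError(f"unrecognised status letter: {status!r}")
      return status

  # read_status({"status": "C"}) returns "C";
  # read_status({"status": "X"}) raises ValueError instead of guessing.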

The algorithm is wrong. No matter how many times this bot is run, a bot programmed this poorly is not going to improve things.

It's removing the taxonomy boxes and leaving a stub sentence where the taxonomy was bad, but it still leaves the sentence that it's the only taxonomically valid species.[7] Without an accurate taxonomy box it may be hard to find the genus, because algae genera are not all unique like plant genera. They may have some duplicate names (I think).

Wikipedia uses Archaeplastida for Rhodophyta, but anybot made them all plant kingdom, so it introduced yet another new error.[8]

The bot was not "secure" in the first place. How was that allowed to happen? These other mistakes weren't fixed, we're supposed to trust the security to someone who didn't know to not allow others to use his code in the first place? Is this how bots are routinely coded on wikipedia? Anyone with half a brain for a hack could find the bot code.

I'm not going to edit any of the articles. Anyone who does is being played for a fool by this "programmer" who is having too much fun playing with the community. --69.226.103.13 (talk) 07:04, 25 June 2009 (UTC)

Well, shit. I was under the impression that all this latest run of edits did was revert to a previous version of the articles; i.e. reverting the edits that ought not to have happened post-April. Of course, the pre-April versions were also wrong, so that means changing articles from a wrong version to a wrong version. Putting "last good version" in the edit summary was silly.
But you're right, it is editing articles it has never edited before, into novel versions, with edit summaries that say "Restore article to last good version". That is seriously, seriously dodgy.
I have re-blocked the bot. Martin won't unblock it and continue, because we have both social and policy strictures against that sort of thing. I still say that Martin has been acting in good faith; he has just bitten off more than he can chew, and seems to think the solution is to take another bite.
"I'm not going to edit any of the articles." That's a shame; the 'pedia has been fortunate to have had your help thus far. If we don't see you again, at least take my personal thanks away with you.
Hesperian 07:27, 25 June 2009 (UTC)
I don't know what's happening, but as long as Martin is running his bot these articles are going to continue to add to the level of misinformation on the internet, all traced back to one source: Martin's bot. You say there are safeguards, but Martin ignored all of them. New bots (as he claims his is) require approval by the bot approvals group and a flag from a bureaucrat, it seems. Martin did not get this; he says this is not anybot but a new, unapproved, unflagged bot. To say Martin won't unblock it seems to ignore all the evidence of his behavior in creating this mess: he was told repeatedly about errors and did not correct them; he has the power to unblock his own bot, and will, and does; and he got "scolded" for not getting prior approval to run a bot before (it's in the archive). The evidence appears to say he'll do whatever he wants regardless of lacking community approval. It's a waste of time for all the editors working to find consensus on the means of fixing this mess he created.
But, seriously folks, I spent a lot of time figuring out what was wrong with these articles before I came to the conclusion that they were not fixable by a bot. It took hours each day for a number of days looking at articles, edit histories, and writers' contributions. In this discussion I listened to and weighed what others had to say. I considered other suggestions and was in the middle of considering other possibilities (check the bot's test edits, have an administrator check articles for substantive edits; I offered to read the algorithm and debug it). Then Martin made it plain he was not listening to anyone.
If the community is discussing something, trying hard to find a way to make the decision benefit
  1. the encyclopedia (by creating good articles) and
  2. the community (by not wasting the IP's extensive editing),
and then someone comes along and unilaterally disrespects all the work being done, it makes editing wikipedia stink. Wikipedia is a hobby. Hobbies are fun. A=B!=C in this case. Also, you know what, I even offered to verify the algorithm for him for free! Do you know how much I could get paid to do that? --69.226.103.13 (talk) 08:33, 25 June 2009 (UTC)
I hear what you're saying. I've been here long enough to have a pretty clear understanding of community standards. Martin has arguably operated in some grey areas, by unblocking his bot and running a new, unapproved automatic task on his bot account without seeking approval. On the other hand, he can argue that he was undoing his own block, having addressed the reason he applied the block in the first place; and that fixing his own errors falls within the scope of his original bot request. Like I said: grey areas. We have an ignore all rules rule, and this kind of behaviour is routinely accepted so long as it ends up all for the best. In this case it hasn't, and I have responded by reblocking his bot. You'll have to take my word for it that undoing your own block on your own bot is a grey area, but undoing someone else's block on your bot is black. There is no evidence that Martin is going rogue on us; but if it is any comfort to you, if Martin were to unilaterally unblock his bot at this point, that would be grounds for an emergency desysop i.e. immediate loss of the ability to block and unblock other editors. The safeguards are there; we have just been a little slow in applying them. And why not?—I see incompetence and a degree of culpable negligence here; but I don't see any malice.
Hesperian 11:23, 25 June 2009 (UTC)
You're right; incompetence suffices to explain it all.
I don't think Martin's a programmer. His programming errors are too basic. He may have learned a programming language on his own without studying introductory programming logic. Maybe that's why he declined/ignored my offer to debug his algorithm: he didn't write one.
Whatever his intentions, giving someone who isn't a programmer the right to run a bot and the ability to block and unblock that bot can go wrong for wikipedia. That's especially true when the person isn't listening to the community's input. This point might count in Martin's favor on the question of malice: someone whose intentions were bad, and who had the ability and authority to run and unblock his own bots, could have created tens of thousands of bad articles in the same time.
I wish all of you luck in correcting this in a timely fashion. --69.226.103.13 (talk) 15:40, 25 June 2009 (UTC)

Response to bot owner's post

Delete all that contain errors, improve bot code in consultation with 'experts', and run bot again. I had been unaware of the lengthy discussion at WikiProject Algae; first I should note that the original version of the bot contained some errors, which a later version of the bot corrected as soon as they were pointed out. The original version seems to have been run since April, replicating some of the errors, which has inflamed the discussion.

All articles I checked, except for a small number corrected by other editors, contain huge errors, and a diversity of errors. If it is required that each of the 4000+ articles be checked individually before being deleted, how can that be done, and by whom? You said you don't have time to recode the bot until October. Should wikipedia maintain these articles as they are until October if the 4000 cannot be individually checked for errors? I checked hundreds and found not a single accurate article created by anybot.
A user named FingersOnRoids informed you of many of the errors in February. These errors were not removed from the articles or code, because I found the same errors.[9]
I checked articles written by the bot before, during, and after the April problems, including its initial run of articles and later corrections made by the bot. These are all bad, all contain a variety of errors, and all need to be scrapped to a stub that says “Thisgenus is” if edited by a bot, because many of the taxonomy boxes are wrong and the errors are so diverse. In addition to cyanobacteria identified as eukaryotes, there are all different types of macroalgae, algae, angiosperms, and at least one fungus identified as diatoms.
The lengthy discussion on WikiProject Plants was linked on your discussion page on June 16.

Now, in my opinion, articles that contain small errors (e.g. the wrong tense) but cite a reliable source are better than no article at all - and if all such pages were deleted from WP the encyclopedia would probably shrink by a factor of two. As evidenced by the work of some dedicated IP editors, the existence of a skeleton article is often the seed from which a useful and correct article is developed. And as all of the articles use information attributed to a reliable source, it is possible for people to check the data against the facts (nobody should ever use WP as a reliable source in itself). Again, this makes the articles more useful than many other unsourced articles on WP.

I identified no articles that contained only a wrong tense. I tried to find systematic errors that would allow the keeping of, say, all articles about diatoms or coccoliths or red algae, with a bot fixing them. Each group had huge errors in the taxonomy boxes, in the writing of the descriptions where included, and in the writing of extinct species as living taxa, in addition to errors such as incorrect species counts and the error, in every article, of attributing AlgaeBase’s verified data to the number of species in the genus. Again, other than stubifying to "Thisgenus is an organism" I don't see how it can be done. The organisms are not even all photosynthetic.
AlgaeBase does not list whether a species is extinct or extant. Every species would have to be checked for this in other sources. This is why I suggested asking phycologists to check the information. A number of species are fairly well known and are getting a couple of hundred page hits a month (bad page hits, sadly). I recognized a common extinct species when I went to check strange information in a paper I was editing.
The main IP editor corrected only the higher level taxonomy. Another IP editor corrected some more serious errors. These IPs did not correct the entire article. Their articles would require identification and additional editing. I offered to edit these if a list is made.
The source is reliable, but the data were not gathered in a reliable manner. Therefore all data would have to be verified. Who would do this? When? How many more Wikipedia mirrors would copy the wrong information before it was done?
Source material that copies the name of an angiosperm and then calls it an alga, or copies the information about a macroalga and then fills a taxonomy box with information about a diatom, is not usable and is not useful. This was discussed at Wikipedia plants. The seagrass that was called an alga in its article, Thalassia, is also a major marine angiosperm.

However, I am embarrassed that widespread errors do exist. Systematic errors - such as the use of 'alga' instead of 'cyanobacterium' - are very easy to fix automatically. If I had a list of the errors that have been spotted, so that I could easily see what is currently said that is wrong and what should be said instead, I could re-code the bot until it got everything right, and then put it up for retesting (hopefully it is now notorious enough that people will be willing to check its output). At that point it would be possible to run the bot again and create error-free articles. In the meantime, perhaps it is a good idea to delete articles which contain factual errors. (I will never support the deletion of any article which details a notable subject, and contains factually correct information attributed to a reliable source.)
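
(Editorial aside, purely illustrative and not Martin's actual code: any such automatic fix would have to be conditioned on a verified taxonomy, roughly along these lines. The phylum test and the replaced wordings are placeholders.)

  # Replace "alga" wording only when a trusted taxonomy says the
  # organism really is a cyanobacterium; a blind find-and-replace
  # would just introduce the opposite error.
  def fix_wording(text, phylum):
      if phylum == "Cyanobacteria":
          text = text.replace("is an alga", "is a cyanobacterium")
          text = text.replace("genus of algae", "genus of cyanobacteria")
      return text

  print(fix_wording("Examplea is an alga.", "Cyanobacteria"))
  # prints: Examplea is a cyanobacterium.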

Cyanobacteria morphologically described as red algae within their articles is a data-gathering error that is within the code or, more probably and primarily, within the coding algorithm.
For example, the article Aphanocapsaopsis is from the AlgaeBase page that lists two species of uncertain taxonomy.[10] Our article calls the organism an alga, lists it as a eukaryotic cyanobacterium, then goes on to say it produces carpospores and tetraspores! There are also golden algae (Hibberdia) reproducing with tetraspores and carpospores. This information is NOT from AlgaeBase.
I don’t think “deleting articles which detail a notable subject and contain factually correct information attributed to a reliable source” is going to be an issue with any of anybot’s articles. Please read the nature of the mistakes on the WP:Plants page, which lists a variety of genera with a variety and number of errors. There are so many errors and so many different types of errors. Every article, as far as I can tell, has at least two major errors making it entirely useless.
When these errors were initially pointed out to you, beginning in February, you did not properly recode the bot to correct the errors or correct the articles with the errors already in them. You were asked to fix the cyanobacteria articles in March. You did not fix them. You told me when I started pointing out these errors that you do not have time to fix them until October.
I just do not think that you wrote an algorithm that can scrape a database. I would be concerned about using your bot to rewrite the articles without carefully checking the algorithm. I am willing to do that if you post the algorithm and if the other writers agree that this is a useful solution. I don’t think it can wait until October. I think it needed to be fixed in February when first brought to your attention, or at least the cyanobacteria needed to be fixed in March when mentioned by Rkitko.

I think that the worst case scenario would be to delete articles willy-nilly and thereby deplete WP. We have the potential to use the Algaebase material to generate useful information - if it's not entirely up to date, then neither are most textbooks; and if the classification needs systematically updating, the bot can do that as taxonomy is updated. If this is done regularly, WP can keep up to date and become as useful a resource as Algaebase is today. Let's be careful to produce the best quality output we can before the deadline. Martin (Smith609 – Talk) 13:39, 22 June 2009 (UTC)

The worst case scenario in my opinion is that one more wikipedia mirror copies anybot's cyanobacteria articles.
I do not care if the articles are not entirely up to date, but 2 billion years is too much. The taxonomies of single-celled marine photosynthetic organisms are difficult. I offered a solution to this difficulty, namely, using the existing taxonomies on Wikipedia: find the family or order, and simply use the higher level taxonomy already in place. Again, you don't have time for this until October.
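
(Illustrative Python sketch of this suggestion. It naively pulls rank parameters out of an existing family article's raw taxobox wikitext; real taxoboxes vary, so the parameter names and regex are assumptions, and the returned values may still contain wiki markup.)

  import re
  import urllib.parse
  import urllib.request

  # Reuse the higher taxonomy Wikipedia already has: fetch the raw
  # wikitext of the family's article and read the taxobox ranks from
  # it, instead of inventing a new, conflicting classification.
  def higher_taxonomy(family_article):
      url = ("https://en.wikipedia.org/w/index.php?action=raw&title="
             + urllib.parse.quote(family_article))
      wikitext = urllib.request.urlopen(url).read().decode("utf-8")
      ranks = {}
      for rank in ("regnum", "divisio", "classis", "ordo"):
          m = re.search(r"\|\s*" + rank + r"\s*=\s*(.+)", wikitext)
          if m:
              ranks[rank] = m.group(1).strip()
      return ranks

  # e.g. higher_taxonomy("Bangiaceae") returns whatever ranks that
  # article's taxobox already declares, rather than a fresh guess.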
AlgaeBase is maintained by phycologists. Its primary use is to other scientists who understand the data. It is an amazing resource on the internet. For wikipedia to incorporate some of its information editors must understand the nature of the database.
--69.226.103.13 (talk) 16:47, 22 June 2009 (UTC)
  • I guess I'm one of the "experts" who would be asked to fix the bot, and I'd have to say "no thanks". It just doesn't seem worth the effort any more (and this isn't just true of AnyBot, I've been saying the same about other proposed bots to mass-create species articles). Kingdon (talk) 04:00, 24 June 2009 (UTC)


There is the "proverbial ghastly pun somewhere" involving scraping algae, pooled ignorance ("are lichens and seaweeds really classed as algae?" and a broad division into "green gunge in pools" and "that red stuff near the shore that causes problems") and suchlike. —Preceding unsigned comment added by 83.104.132.41 (talk) 09:14, 30 June 2009 (UTC)[reply]