Wikipedia talk:Version 1.0 Editorial Team/Assessment rewrite

Material was moved here from Wikipedia_talk:Version_1.0_Editorial_Team/Assessment#Overhaul_and_rewrite_of_the_assessment_scheme and related discussions on Wikipedia talk:Version 1.0 Editorial Team/Assessment. Walkerma (talk) 17:35, 12 May 2008 (UTC)[reply]

Fewer words?

For the assessment table, I'm wondering if we couldn't say more or less the same thing using fewer words, here's a suggestion. I've included a description of a "list", which was commented out of the table; feel free to delete it if it's not relevant.

Article progress grading scheme

B ({{B-Class}})
  Criteria: Anything that's definitely better than the "Start" category, but doesn't meet higher standards.
  Reader's experience: It gives the impression that a typical reader would learn something.
  Editor's experience: Improve the article by trying to meet higher standards.
  Example: Jammu_and_Kashmir (as of October 2007) has a lot of helpful material but needs more.

Start ({{Start-Class}})
  Criteria: A good article that is still weak in many areas. Has at least a particularly useful picture or graphic, or multiple links that help explain or give examples of the topic, or a subheading that covers one topic more deeply, or multiple subheadings that suggest material that could be added to complete the article.
  Reader's experience: Useful to some, provides more than a little information, but many readers will need more.
  Editor's experience: Major editing is needed, not a complete article.
  Example: Real analysis (as of November 2006)

Stub ({{Stub-Class}})
  Criteria: Either a very short article or a rough collection of information that needs a lot of work.
  Reader's experience: Possibly useful. It might be just a dictionary definition.
  Editor's experience: Any editing or additional material can be helpful.
  Example: Coffee table book (as of July 2005)

List ({{List-Class}})
  Criteria: An article that meets the definition of a Stand-alone List. It should contain many wikilinks, with descriptions.
  Reader's experience: There is no one way to make a list, but it should be logical and useful to the reader.
  Editor's experience: Lists can be anything from a stub to a Featured List.
  Example: List of aikidoka (as of June 2007)

- Dan Dank55 (talk) 19:45, 4 April 2008 (UTC)[reply]

Hm, no response. Let me add that one of my favorite essays, WP:KISS (linked from K.I.S.S., linked from WP:Instruction creep), says, in its entirety: "Keep policy, guideline and procedure pages short, or else people won't read them, more people will leave the project, and less people will join the project." That's my experience too ... the shorter the instructions, the more likely they are to get read. - Dan Dank55 (talk) 19:57, 5 April 2008 (UTC)[reply]
Sorry I didn't see this, I've been very busy offline; this proposal is well worth looking at. I'm a great supporter of KISS, but the wiki approach tends to mean that often people simply add rather than rewriting. There has also been a proposal for tightening up the assessment scheme, and it would make sense to do everything at once. If you're up for working on this, I'll try and recruit a few others for their input. Bearing in mind that this scheme is used by over 1000 projects, we need to make sure that any rewrite represents a consensus of several interested people. Thanks! Walkerma (talk) 07:15, 6 April 2008 (UTC)[reply]
Sure, I'll follow your lead, Martin. Note that there's a new way to notify wikiprojects if you like, here. I can't speak for WP:GAN or WP:FAC ... perhaps a link to those pages would be better than trying to sum up what's needed in a table ... and I didn't know what to say about A-class. I should add that I and others are putting major energy into the GA process. I know how A-class is defined, but I'm kind of wondering what the function of A-class is ... there might have been a feeling that the GA process should be avoided for a variety of reasons. This would be a good time for people to post a message at WT:WGA if they have been dissatisfied with the goals or output of the GA review, we're very much on it. - Dan Dank55 (talk) 11:49, 6 April 2008 (UTC)[reply]
P.S. I'd be happy to drop "Lists" if it's not needed, and it seems to me the "Editor's experience" column could be dropped, since it logically follows from everything else. - Dan Dank55 (talk) 12:04, 6 April 2008 (UTC)[reply]
Good! A-Class predates GA, and GA was added into the scheme later. It still fulfils a useful role for some projects, and there are some projects (such as WP:MILHIST) that have had a wariness concerning the GA process - I don't think that would change at this point, since they don't relate any more to specific points of procedure. I personally think that GA-Class is the one that doesn't belong in the scheme, but not because I don't like GA (I'm a big fan!). It's because it's not a project-based assessment. However, we need to work with the system that the consensus likes to use, and that will (for the foreseeable future) include both GA and A. I'll try to get things started this week from my side. Thanks again, Walkerma (talk) 05:41, 7 April 2008 (UTC)[reply]
Well, one great thing about A-class is that it's not work for me :) To the extent that it's useful, absolutely, keep it. I look forward to learning more about the individual review processes of the wikiprojects. - Dan Dank55 (talk) 12:42, 7 April 2008 (UTC)[reply]
The person interested in tightening up the assessment scheme looks to be quite busy elsewhere (he had warned me), so maybe we should make a start here anyway. However, I have an idea - I wonder if we should consider having a simple form, but if people want to know more detail they can click for more information? Bearing in mind the fact that thousands of Wikipedians need to consult this page (see these stats for proof), it would be perfectly reasonable to set up subpages as needed for this - especially if we add a range of examples, as planned. That's one thing I like about the wiki framework - you can keep it simple on the main page, but have more detail for those who want to go deeper. I think I'd like to write up an FAQ, because we do have some standard questions that keep getting asked. Does this sound like a reasonable idea? Walkerma (talk) 14:49, 10 April 2008 (UTC)[reply]
Sounds great. Or, if you'd like to put more information in the chart, based on the questions people ask, that's fine too. What didn't look right to me about it was the low meaning-to-words ratio, or maybe I just didn't get the meaning. - Dan Dank55 (talk) 20:22, 10 April 2008 (UTC)[reply]

Overhaul and rewrite of the assessment scheme

There have been two proposals recently relating to assessment, and both seem to be reasonable (IMHO). They would both involve some rewriting and recalibrating, and therefore I think we should consider both proposals at the same time. I'm adding a third proposal, which is in effect how I think the first two would best be implemented together. There's also a fourth, which came up in discussions, and which I'll throw in for good measure. Walkerma (talk) 18:01, 12 April 2008 (UTC)[reply]

Simplifying the descriptions

(Described right above here) We should simplify the basic definitions of each class. The descriptions are quite detailed, but that may mean simply that people don't bother to read them properly. We could simply do a copyedit and chop out a lot of wording; that would make them easier to follow, but we may lose some of the rigour if actual examples or nuances of meaning are lost. Hence my proposal for a "summary style" approach; this will allow us to have very clear, simple definitions for routine use. Walkerma (talk) 18:01, 12 April 2008 (UTC)[reply]

Comments
I want to help with the rewrite

Be happy to help - Dan Dank55 (talk) 02:00, 13 April 2008 (UTC)[reply]

I can do what I can, anyway. John Carter (talk) 15:36, 20 April 2008 (UTC)[reply]
I do. Arman (Talk) 03:56, 8 May 2008 (UTC)[reply]

Refining the assessment scheme

See Wikipedia_talk:Version_1.0_Editorial_Team/Work_via_Wikiprojects#Assessment for the original proposal. We have a good scheme that works well, but there are variations in standards. It should be possible to sharpen the boundaries of the scheme by including additional examples to indicate specific detail about the levels (the lowest standard for Start-Class, vs. the highest standard for Start-Class). We may also be able to consider how we handle the different aspects of assessment (article length, quality, technical aspects, aesthetics, etc). We have one very knowledgeable contributor offering to help, and I think we should use this opportunity to make the scheme more rigorous. Any thoughts? Walkerma (talk) 18:01, 12 April 2008 (UTC)[reply]

Comments
I want to help with the refining process

Happy to help with style and language issues - Dan Dank55 (talk) 02:09, 13 April 2008 (UTC)[reply]

Good idea. A few ideas that come to mind:
  • Clearly defining the standards for Start and B, specifically regarding need for referencing, if any, for B
  • Deciding what if anything to do with the GA/A conundrum
  • And something that has arisen with a few Biography articles where even everything known about a notable, obscure subject still isn't much: maybe some sort of "bastard" A/Start grade for articles that are as complete as reasonably possible, but still so very short that many stubs are longer. John Carter (talk) 15:36, 20 April 2008 (UTC)[reply]

Converting the scheme to summary style

This was my suggestion for dealing with the first two proposals, which at first glance would appear to be irreconcilable. How can we make the scheme even more nuanced and rigorous, yet make it simpler to understand? I think we can accomplish this through use of the summary style approach: have one short, succinct description of the scheme, but then have a sub-page (or sub-pages) to give more detail. That way, someone who just wants to "get the general idea" can do so, but the reviewer who is agonising over whether something is B or Start can look for some more detailed guidance. Is this a good approach to the problem?

Comments

Add an FAQ

The scheme is now well into its third year, and some of the standard questions and proposals keep coming up over and over again:

  • Why is A-Class above GA-Class?
  • Why is A-Class (or GA-Class) even needed?
  • Are citations required for B-Class?
  • How are articles promoted to A-Class?
  • How do I request use of the bot for our WikiProject?
  • I think we should have one more/fewer level in the assessment scheme!
  • Can our project use an extra level or categories in its assessment scheme?
  • Our project uses its own descriptor, "Foo-Class": Can this be added into the statistics table?
  • etc.

I think it's about time we wrote a simple FAQ to deal with these questions; for every one person who posts on one of these, there are probably ten who are simply baffled and leave.

Comments and suggestions for FAQs

We have Wikipedia:WikiProject Council/Assessment FAQ, which we can always expand for this purpose. Titoxd(?!? - cool stuff) 20:42, 11 May 2008 (UTC)[reply]

I'm willing to help write the FAQ page

Happy to help with style and language issues - Dan Dank55 (talk) 02:14, 13 April 2008 (UTC)[reply]

Happy to help address "I think we should have one more/fewer levels in the scheme", and there may be other things I can help with also. Holon (talk) 10:10, 15 April 2008 (UTC)[reply]
Me too. We have a FAQ (or two) at Milhist we can probably draw from. --ROGER DAVIES talk 09:04, 20 June 2008 (UTC)[reply]

General comments

Our scheme has grown from around 2000 articles when the scheme was automated two years ago, to around 1.1 million today - that's more than the growth of Boston in 1776 to the Boston of today. The scheme is holding up remarkably well, IMHO, but I think we need to revamp the "architecture" a bit. Walkerma (talk) 18:01, 12 April 2008 (UTC)[reply]

On Simplifying

Hi all. On Dank55's suggested simplification: I would certainly keep the current version with a more detailed description. However, for those who have become accustomed to it, I doubt they will refer to it in detail often, and an abbreviated version could be used to complement (not replace) the more detailed version -- i.e. a kind of quick reference version that people can go to if they prefer. An option to consider anyhow.

From experience, the examples tend to be the most powerful part of the process, and as I've said elsewhere I think it is excellent you have examples. The description of an article (like the description of most complex things) can be interpreted in different ways, and most importantly here, more or less strictly/harshly or leniently, and with different assumed interpretation of the various elements. Don't get me wrong, I think it's very important to describe, to orient to what features people need to look at, but then at the end of the day someone can always ask: so what does that actually look like? Just as a picture tells a thousand words, so does an example of an article!

So a quick reference with the same examples could be useful for those assessing a lot of stuff, or even those who assess just a few things after becoming familiar with the scheme.

The more detailed version is also likely to be important in cases where there is some dispute. Holon (talk) 09:26, 15 April 2008 (UTC)[reply]

I've just noticed my comments are similar to the summary style idea above. If many are familiar with the existing scheme though, I'd still argue that keeping it and adding a short version would be easiest, but either way the principle is the same. Holon (talk) 11:14, 15 April 2008 (UTC)[reply]

On examples

I'd strongly advise against using different examples/exemplars in different versions of the generic scheme (not that anyone has suggested it) because I have seen empirical cases in which exemplars are changed in a scheme that otherwise remains the same, and there is a severe impact on ratings (e.g. becomes much harder to be deemed in a category). The relevant research was thorough and very controlled. I can't say exactly to what extent it applies to this scheme, but in general it's better to keep things consistent as much as possible (provided of course they're sound and working!).

On that note, I'd also be careful changing the examples over time if you want consistent grading over time. Having said that, there are ways to link new to old if this is a must and I can advise and help. It takes some time and effort to make sure changing examples doesn't change the relative difficulties of the 'grades' though.

However, it may be very useful to have specific examples for more unusual kinds of article. In these cases, for the sake of comparability, I would advise people in the relevant projects to very carefully select examples that are as close to the same quality as those in the generic scheme as possible (for the same reason, they're very powerful). The basic principle is this. In cases where there are unique considerations and/or some of the generic considerations are not applicable (or less so), people have to take into account the considerations when deciding the grade of a given article. Now, assuming some effort goes into this, you may as well do it once then save having to do it every time thereafter. However, because the exemplars may have a strong impact on the assessments, ideally the first time a decision is made, an exemplar should be selected that is considered as close to the generic one as possible. Another thing worth considering.

Cheers Holon (talk) 09:26, 15 April 2008 (UTC)[reply]

On borderline examples

In keeping with the general principle of having a simple scheme with flexibility for cases that require or warrant special attention, I want to add that additional borderline examples could also be listed on a separate page, only to be used when necessary (e.g. if start vs B is a difficult call).

The scheme gives broad classifications, which is fine for many purposes. If people are interested in adding, I would recommend selecting candidate articles and experienced assessors quickly doing pairwise comparisons between the candidates and existing exemplars. Given relatively little data, I can analyse and report back scaled locations in order so a decision can be made about borderline below/above articles. If there is enough data, I can also advise which were most consistently judged, which are better to use.

Just as extra increments on a tape measure (or any instrument) provide additional precision in a region of a continuum, so additional exemplars provide additional precision in the region (border between adjacent classifications). So additional exemplars in selected regions allow greater precision when desired, provided they've been carefully selected and calibrated. Incidentally, this also answers the question in FAQ about more or less grades. The thing people are generally thinking when they ask this is: I think there should be more precision (or less, though I wouldn't recommend less here). So this is one way to get the best of both worlds -- simplicity plus precision when needed. Anyhow, hope this background and explanation is useful but fire away with questions if not. Cheers Holon (talk) 11:07, 15 April 2008 (UTC)[reply]

What would it take to establish a first-class foundation for Wikipedia standards?

After looking through the discussion here, I think it might be instructive to describe an 'ideal' process, and to work back from there to what's doable. Pretty much every issue that has come up here is fairly common in assessment. I hope it will be easier to see why from the ideal. Please keep in mind that the work put into what I outline to follow overlaps with normal work on articles anyway, and in the long run would likely make that work far easier by helping to identify what needs to be done and when.

It may also turn out that something closer to the ideal is achievable with available skills than I realize. With some ingenuity, Wikipedia could be a first for online ratings en masse by developing a top-class process based on solid foundations! OK, not likely, but possible. It's already considerably better than the crude methods normally used, such as ratings of 1-10 plucked out of the air.

Ideal

Given the nature of articles, the following process in the ideal is what I would (and do) recommend.

  1. Compare (pairwise) a set or sets of exemplars and scale them.
  2. List in order from worst to best (by links). This provides an ordered set analogous to a ruler with many points of possible distinction.

This achieves two things:

  1. When the scheme is refined, it can be refined not only based on what 'should' distinguish better from worse, but what is seen to in a carefully calibrated and ordered set of examples.
  2. The set of examples can sit in the background and be used whenever anyone wishes to, for greater precision (you can always go from cm to meters).

The last point is key when articles are near a threshold for going from one "grade" to the next.

Probably more important than all else, an ordered set of examples provides a clear picture for editors of what it takes for an article to progress toward the highest standard.
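
As an illustration only, the "compare pairwise, then scale and list from worst to best" steps could be sketched as below. The post does not say which scaling model is intended, so this sketch swaps in the Bradley-Terry model as an assumption; the function names, the smoothing constant and the article labels are all hypothetical.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200, smoothing=0.1):
    """Estimate a strength per article from (better, worse) judgments.

    Assumption: Bradley-Terry is used here as one standard way to scale
    pairwise judgments; the discussion does not name a specific model.
    """
    items = sorted({label for pair in comparisons for label in pair})
    if not items:
        return {}
    wins = defaultdict(int)   # times each article was judged better
    games = defaultdict(int)  # times each unordered pair was compared
    for better, worse in comparisons:
        wins[better] += 1
        games[frozenset((better, worse))] += 1

    strength = {label: 1.0 for label in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items if j != i
            )
            # Hypothetical smoothing so an article that never "wins"
            # does not collapse to a strength of exactly zero.
            updated[i] = (wins[i] + smoothing) / (denom + smoothing)
        scale = len(items) / sum(updated.values())
        strength = {label: value * scale for label, value in updated.items()}
    return strength


def worst_to_best(comparisons):
    """Order the articles like marks on a ruler, lowest strength first."""
    strength = bradley_terry(comparisons)
    return sorted(strength, key=strength.get)


# Hypothetical judgments: each tuple is (article judged better, article judged worse).
judgments = [
    ("Article A", "Article B"),
    ("Article B", "Article C"),
    ("Article A", "Article C"),
]
print(worst_to_best(judgments))
```

The ordered list is the "ruler"; the estimated strengths indicate roughly how far apart neighbouring exemplars sit, which is the extra information needed near a grade boundary.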

Common reaction

A common reaction to this is that it's too time consuming because most are used to easy, but poor, rating processes (e.g. pluck a number from 1 to 10 out of the air or a grade based on best guess).

I understand, but my standard response is that the payoffs outweigh the up-front time, often by a large factor, and of course anything worth doing takes some effort and coordination. The only reason most of us can buy a thermometer and easily, yet precisely, measure temperature at will is that a lot of work lies behind its development and construction. Like anything else, including articles on Wikipedia themselves, quality products require some work.

Good measurement instruments and procedures are a cornerstone of industry and technology -- without common standards, many things are impossible in industry. The same idea applies to Wikipedia as a whole. If editors can quickly, yet precisely, measure against calibrated standards as they work and assess articles, there are similar payoffs. There is a lot more clarity on standards and how to know where you are and what it takes to progress.

I believe around a million have been assessed, is that right?

However, it's like everything, it does take time and coordination. Hopefully though, this helps in explaining various issues and how they all fit together in the bigger picture even if nobody actually ends up participating.

Small-scale test

I can offer this to anyone who wishes to do a small-scale test in their own project. I don't think I have yet encountered a case in assessment where people have not found the process informative and useful.

Send me a set of article labels, preferably 15 or more, and I'll send back a spreadsheet with a set of pairwise comparisons to be done: each to be compared with each other and a judgment made about which is better. Do these and send me back the results. I will scale it, put them in order, and tell you how consistent you were overall and tell you which articles were anomalous, if any. Include at least two or three of the articles in the scheme so you will be able to see how the rest scale in between. If you can organize more than one judge to make comparisons, even better, and I can give you feedback on each judge's consistency and the agreement between them.

This should be quite quick for someone who is reasonably familiar with the set of articles, if the assessor only needs to refer to them when it's hard to say which is better. Most judgments should be quick and only a portion take more time. The payoff -- for your project you get a much clearer picture of the way articles progress from worse to better quality, and you have a far more precise basis for judging when an article should move up a grade.
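
To give a feel for the size of the exercise, here is a minimal sketch of building the comparison sheet and running one simple consistency check. Comparing each article with each other means n(n-1)/2 judgments, so 15 articles give 105. The check shown, counting circular triads (A judged better than B, B better than C, yet C better than A), is a classical one and is an assumption here, not necessarily the analysis described above; all names are hypothetical.

```python
from itertools import combinations


def comparison_sheet(labels):
    """Every unordered pair of articles, each to be judged once."""
    return list(combinations(labels, 2))


def circular_triads(labels, better):
    """Count triples judged in a circle (A > B, B > C, C > A).

    `better` maps each pair from comparison_sheet(labels) to the label
    judged better. Zero means the judgments form a consistent order.
    """
    def beats(x, y):
        key = (x, y) if (x, y) in better else (y, x)
        return better[key] == x

    count = 0
    for a, b, c in combinations(labels, 3):
        forward = beats(a, b) and beats(b, c) and beats(c, a)
        backward = beats(b, a) and beats(c, b) and beats(a, c)
        if forward or backward:
            count += 1
    return count


# Hypothetical labels standing in for 15 real article titles.
articles = [f"Article {n}" for n in range(1, 16)]
sheet = comparison_sheet(articles)
print(len(sheet), "comparisons to make")
```

A judge who produces many circular triads is either finding the articles very close in quality or applying shifting criteria; either way those articles are the ones worth a second look.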

This can be extended across projects. This would simply require choosing a number of articles in your project as well as some from another project also doing a calibration exercise. All articles can be scaled jointly and tests conducted to see how successful the exercise was. It's preferable that the assessors have some knowledge of the other articles, but I doubt it would be necessary for them to be experts on the content to get worthwhile results.

Obviously, this requires coordination if it crosses editors and particularly projects. However, the result could be a nice list across projects of articles from the worst to higher quality that everyone can refer to, plus the benefits to the project mentioned.

So to reiterate, this process is beneficial for

  • refining the scheme by seeing what actual progression looks like, according to consistent judgments by a methodical process.
  • providing a set of examples (behind the scenes) that includes examples in the scheme, and can be used when the call between one grade and the next is getting difficult, avoiding debate about the number of classifications (there is more precision if you want it and editors would know more clearly when an article is getting close to progressing to the next grade).
  • giving a clear summary picture of what it takes to progress articles for editors, which would probably also reveal things not anticipated up front.
  • founding refinements on the information to make the criteria more accurate, so more efficient to use and more credible.

I know there's a lot, but I hope it gives a clear picture of the ideal, and it might spark ideas even if nobody elects to do a trial.

Don't hesitate to criticize -- believe me it's unlikely you'll raise anything I haven't heard many times, and if you do, I'll be grateful for the challenge.

Cheers all. Holon (talk) 10:45, 11 May 2008 (UTC)[reply]

I'm not entirely sure what you mean by all of the above. Having said that, the new Wikipedia:WikiProject Christianity/Christianity in China work group has about 400 pages total tagged to date in Category:Christianity in China work group articles. It might work for the purposes you're suggesting. I expect there to be a lot of deviation there, though, because many of the assessments seem to have been copies of preexisting assessments. John Carter (talk) 20:17, 12 May 2008 (UTC)[reply]
I'm not sure I followed that either, but the general thrust seems to be adding examples, or proceeding to describe the problem by comparing examples, and I'm all for that. My intention in simplifying the table was only to take out some words that I couldn't follow, to make the table easier to read, at least for "stub" and "start". It's perfectly okay with me to add detail to the table, as long as the table is easy to read and understand.

Lists

I noticed something about possibly including List class into the table, so here's something I dug up deep within WP:VG/A. --.:Alex:. 21:00, 20 June 2008 (UTC)[reply]

List progress grading scheme

FL ({{FL-Class}})
  Criteria: Reserved exclusively for lists that have received "Featured list" status, and meet the current criteria for featured lists.
  Reader's experience: Definitive. Outstanding, thorough list; a great source for encyclopedic information.
  Editor's experience: No further additions are necessary unless new published information has come to light, but further improvements to the text are often possible.

List ({{List-Class}})
  Criteria: Reserved exclusively for stand-alone lists, which are articles consisting of a lead section followed by a list. Articles with lists embedded within a small section of an article are prosaic articles and are not considered lists.
  Reader's experience: Useful to many, but not all, readers. The reader doing in-depth research may find insufficient information or excessive information only useful to fans.
  Editor's experience: Any editing or additional material can be helpful. Considerable editing is needed to reach Featured list status. In particular, issues of breadth, completeness, and balance may need work. Peer-review would be helpful at this stage.

Sounds like B class combined with Start class to me. However it might be quickest for assessors (and editors) just to define it as 'not a Featured List'. --Hroðulf (or Hrothulf) (Talk) 23:10, 20 June 2008 (UTC)[reply]

For the most part, "list class" would refer to articles that can never be more than mere lists or indexes (as opposed to categories), for instance List of symphonies by name; refer to Wikipedia:Categories, lists, and navigational templates in this regard. I would rather that lists which are seen as articles follow the normal schema (Stub>Start>C>B...) G.A.S 05:19, 25 June 2008 (UTC)[reply]

Comment regarding rewrite

  • We should use a single article to show progression through each of the classes; in a similar way that was done with atom at Wikipedia talk:Version 1.0 Editorial Team/Assessment#Evolution of an article - an example. We could use multiple articles for this purpose to have a comprehensive set of examples. G.A.S 05:51, 25 June 2008 (UTC)[reply]
  • Either FA/FL/GA has to be integrated into the list at the top, or split into a new table. G.A.S 05:51, 25 June 2008 (UTC)[reply]
  • I am not sure about the comment about considering a peer review at "A Class" level, as it is done at high-B class for most articles in preparation for GAC. (This may be the reason why Peer review is flooded with articles—but unless Peer review change their policies to review articles only when they are at A-class, this will not change.) G.A.S 05:51, 25 June 2008 (UTC)[reply]
  • We may consider adding list class to the table, and specify that this is only to be used for Wikipedia:Lists (stand-alone lists) and lists which act as navigational aid per Wikipedia:Categories, lists, and navigational templates.
    Reader's experience = Helps to navigate across related topics.
    It should also be specified that "article lists" should follow the normal scheme (except for FL status??).
    G.A.S 05:51, 25 June 2008 (UTC)[reply]
Thanks! I think I agree with some of these. For the second point, we're disagreeing about the meaning of "peer review". I meant an internal peer review WITHIN THE WIKIPROJECT. I should probably find a better way of saying that. I love that Atom example, that's a must for this page. I put FA etc at the bottom because I was trying to avoid people thinking "GA is better than A" or vice versa; it's not better, it's just different. (And it's obvious that an FA is better than a Stub) If we could put a border between the two sections, it'd be even better. My preference is to have the WikiProject-based assessments at the top, simply because that's what the page is mainly about; we're not really trying to define what a Featured Article is. If the consensus is for FA etc at the top, I don't mind switching them, though. Adding List-Class is definitely a good idea, I should have included that, my mistake. Very useful comments, thanks! Walkerma (talk) 14:26, 25 June 2008 (UTC)[reply]
The link threw me off (I meant "Editor's experience"—where we say that PR may help, but most articles go through PR before GA/late B class).
Still not with you on the A Class and peer review/project review: The project has to review the article for it to become A class, yet only then is it a candidate for a project review!?←N/A after seeing diff Regards, G.A.S 17:13, 25 June 2008 (UTC)[reply]

What's happening then?

So... how, if at all, is this going to progress? What do we have to work with, and work on? Happymelon 18:27, 30 June 2008 (UTC)[reply]

OK, I'm glad you asked that! I think that on Friday I'd like to officially announce that C-Class is going ahead, and these are the new definitions, etc. We will have to leave the GA/A issue for now, though I think we can tighten up on peer review a little bit. I think we can reach consensus on some changes to B and stub. I'm away at a conference at the moment, but I should be able to work on it on Tuesday night. I should get back on WP properly again soon. Feel free to tweak the write-up in the meantime.
As for making the announcement, should we use AWB to spam all the WikiProject pages? Should we divide up the work between us? Is there a better way? I'll cross post this on the main assessment page. Walkerma (talk) 03:56, 1 July 2008 (UTC)[reply]
Friday could work. I should be able to have Igor ready by then, even if I have to skimp on a couple of new features to get it out. – ClockworkSoul 04:35, 1 July 2008 (UTC)[reply]

This went live yesterday? It seems from the revision history that changes were still being made until late yesterday, and no-one mentioned here that it was considered complete. I don't think there are any big problems, as it doesn't deviate in substance from WPBIO or MILHIST's criteria, but it would have been nice to know that it was considered ready for launch. --Hroðulf (or Hrothulf) (Talk) 11:27, 5 July 2008 (UTC)[reply]

I announced last Monday/Tuesday that I wanted to do this on Friday, and that we would be implementing the results of the discussions held during late June. On Wednesday I closed the discussions and announced that we would go live on Friday. I was at a conference until the early hours of Thursday morning, with limited internet access, so I could only do a certain amount. Some things got missed - such as a cross-post here - there are all sorts of details to be handled. I've used nearly all of my spare time since the conference getting these details done; fortunately, the group of us active here over the last couple of days were unanimous in our opinions about changes (because we'd agreed on the substance of these earlier). The alternative - letting things drift along for a few more days - was much worse than last-minute wording changes. Sorry that we didn't make things clearer. Walkerma (talk) 17:50, 5 July 2008 (UTC)[reply]

Less than fewer words

I agree with the discussion above that there should be a simplified version of the assessment scheme, even if just as a summary or supplement of the more complete and nuanced version. How about taking it to the extreme? A simple checklist could help give an overview at a glance without having to read (almost) anything! For example, something like the table below:

Class Completeness NPOV References Figures/tables Readability MOS Headings
A 100 100 100 100 100 100 100
B 75 75 75 50 85 75 100
C 50 50 25 20 70 25 66
Start 25 0 0 5 60 5 33
Stub 1 0 0 0 50 0 0

In this table, each number is a "semi-quantitative" percent progress towards achieving the goal for that column. Of course, the numbers are a bit fuzzy and subjective, and I just made them up, so they are certainly open to discussion. I'm just presenting this table as a proof of concept; the point is to find a way of showing how the various requirements change at different rates from one level to the next, and to provide a checklist for quick reference. Naturally, we would hope that the raters would also read the detailed descriptions, which would help in understanding qualitatively what "75%" means. :) --Itub (talk) 16:43, 1 July 2008 (UTC)[reply]
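
For what it's worth, the checklist idea can be tried out mechanically. The sketch below copies the proof-of-concept numbers from the table above and suggests the highest class whose column targets are all met. That "every column must reach its target" rule is an assumption made for this illustration (the proposal does not say how the columns should be combined), and the function name and sample scores are hypothetical.

```python
COLUMNS = ["Completeness", "NPOV", "References", "Figures/tables",
           "Readability", "MOS", "Headings"]

# Proof-of-concept percentages copied from the table above.
CHECKLIST = {
    "A":     [100, 100, 100, 100, 100, 100, 100],
    "B":     [75, 75, 75, 50, 85, 75, 100],
    "C":     [50, 50, 25, 20, 70, 25, 66],
    "Start": [25, 0, 0, 5, 60, 5, 33],
    "Stub":  [1, 0, 0, 0, 50, 0, 0],
}


def suggest_class(scores):
    """Return the highest class whose targets are all met.

    Assumption: every column must reach the class's target. Anything
    that misses even the Start targets is treated as a Stub.
    """
    for label in ["A", "B", "C", "Start"]:
        targets = dict(zip(COLUMNS, CHECKLIST[label]))
        if all(scores.get(col, 0) >= need for col, need in targets.items()):
            return label
    return "Stub"


# Hypothetical rater estimates for one article.
example = {"Completeness": 80, "NPOV": 80, "References": 30,
           "Figures/tables": 50, "Readability": 90, "MOS": 75,
           "Headings": 100}
print(suggest_class(example))
```

With these sample scores the thin referencing is what holds the article below B, which is the kind of at-a-glance diagnosis the checklist is meant to give.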

Yes, I like this! As an organic chemist, I'm happy to use numbers in a fuzzy, subjective way to convey concepts, but I'm sure the physical chemists (and some others!) might say, "How do you define these percentages!" I wonder if we could put these numbers into a graphic format, which might put the numbers into a nice visual - and also it might be more apparent that this is a fuzzy approximation. (A pie chart would be easiest, but someone artistic could probably come up with something better and prettier.) Thanks a lot, Itub! Walkerma (talk) 02:13, 2 July 2008 (UTC)[reply]
This is an excellent idea! G.A.S 05:24, 2 July 2008 (UTC)[reply]
I'm going to try doing a graphical version of this in Excel; if I succeed, I'll post tomorrow. Walkerma (talk) 07:45, 4 July 2008 (UTC)[reply]

B = 0.5?

With the new B-class criteria (and overall upgrade of standards for B), should we not change "A well written B-class may correspond to the "Wikipedia 0.5" or "usable" standard" to something like "B-class corresponds to the "Wikipedia 0.5" or "usable" standard"? Or at the very least, "usually corresponds"?--Father Goose (talk) 20:40, 4 July 2008 (UTC)[reply]

That phrasing was in the draft, but got omitted from the final version we used. This description (0.5=usable) dates from the beginnings of the WP:1.0 project, before an assessment scheme had been organized. This was relevant when we wrote the assessment scheme in 2006, but I doubt if many people remember this system now! Please take a look at the version that has been uploaded, and let us know if there's anything else you see that doesn't fit. Thanks, Walkerma (talk) 21:21, 4 July 2008 (UTC)[reply]
I just came back from a trip, and I haven't really seen if this still remains, but generally, this is useful for WP:WPRVN, as it sets a sort of "quality floor" that was once not very well defined. Originally, articles that were A's and GA's were accepted, with some B's and some Starts either accepted or not accepted in the release versions. This now sets some sort of "hard" requirement. Titoxd(?!? - cool stuff) 05:03, 8 July 2008 (UTC)[reply]