Wikipedia talk:Requests for comment/User page indexing

From WikiProjectMed

Notification

I hope I didn't miss anyone while notifying participants of the move. If I did, it was not because I was trying to do any selective canvassing. If you notice that I missed someone who has participated in one of the discussions, please notify them of the move/consolidation. Gigs (talk) 18:45, 26 June 2009 (UTC)

Thank you for doing it the right way. Unomi (talk) 19:14, 26 June 2009 (UTC)

Replies to Statement by Xeno

Google's purpose is to index the internet and provide a useful search tool. If we have content we don't want The Internet finding, we should remove it entirely, not hide it from Google - that's more like sweeping the problem under the rug.

  • Many a notable web site chooses not to let Google index certain pages and content via the robots "noindex" meta tag. So why is Wikipedia user space so much more special that it absolutely must be indexed by Google? To your second point, I totally agree that if we have content that we don't want indexed, we should remove it immediately. How do you propose knowing when a BLP-violating article has been created in or moved to user space? If you don't know about it, you can't delete it. There are far too many WP:OR- and WP:BLP-violating "articles" resting in user space that nobody knows about, so they never get deleted in the first place. Google is throwing these up in its search results, misleading users who may think they are reading an actual article when they are really reading a page created in userspace without any scrutiny by the community as to the validity of its claims. These should not be indexed, and it's impossible for us to know how many exist and when they exist. - ALLSTRecho wuz here 19:13, 26 June 2009 (UTC)
  • What would be an example of a Google search that you could not do with the internal search at this stage? In other words, what advantage does Google give you? Unomi (talk) 19:16, 26 June 2009 (UTC)
But you agree that this possibly useful content could be found using Wikipedia's internal search mechanism? Unomi (talk) 20:15, 26 June 2009 (UTC)
I suppose, if I wanted to do two separate searches on every Google search I made. But that would be a pain. –xenotalk 20:18, 26 June 2009 (UTC)
I use Google Toolbar and thus can search the internet, a single website (like Wikipedia), or the current page, and can highlight each word and phrase with colors, and can hop to each instance as desired. I can even print the colored version! It works great and it would be a disadvantage to me not to be able to use it. Using it I can also find content here that shouldn't be here and thus take action to get it deleted. Otherwise it can exist without anyone noticing it. So from that angle, I'd hate to give up using Google to search ALL of Wikipedia, but I suspect that we could get Google to provide an internal search engine that could search all of the indexed AND nonindexed content. That way we'd have the advantage of keeping the outside world from searching usertalk and subpages, while (preferably only registered) editors could do so. -- Brangifer (talk) 01:59, 27 June 2009 (UTC)
I use Googlepedia along with Google Toolbar. It basically pulls up the article most relevant to your search. When there's no article title that exactly matches your search term, it pulls up something it considers to be most relevant. I've never seen it pull up anything out of main space, but how entirely unprofessional it would be for it to display a user page or some half-assed user sandbox draft page. Chances are fair that many readers would not even notice it's a user space page and would just consider it a horrible, unreferenced mess (on a non-notable person, if it's a user page). Or, just as bad, it pulls up a user talk page where two nerds are arguing over some stupid, pointless aspect of the subject, tossing around CAPS and archaic words used nowhere outside of WP, like "tendentious", all of which no common reader would understand. It's equally unprofessional for any type of Google search to yield such results. لennavecia 14:10, 27 June 2009 (UTC)
Xeno, I don't use the Google Toolbar because I don't like its privacy policy. That being what it is, doesn't the Google Toolbar just return one or two results per site like any other Google search of the web? In order to find all the occurrences of a search term on a site, don't you then have to click "more results from this site"? That is 2 searches, and it is what you do if you google a term through regular Google. Now, you can go directly to Google site search, but that will not give you the results from the rest of the web. So I don't quite see the advantage. Can you give me an example where you find, in one Google search, all the occurrences of a search term in User space and also get back results from the web? --stmrlbs|talk 19:24, 27 June 2009 (UTC)
I don't recall saying anything about the Google toolbar. –xenotalk 12:59, 29 June 2009 (UTC)
The point is you would still have to do 2 searches to find every occurrence of something on both the web and in wikipedia. Just doing one search only gives you a couple of hits in wikipedia, but not all occurrences of a phrase. examples:
* google search for horticulture - 1 result (found on wikipedia)
* wikipedia search for horticulture - 6,015 results
* google search for tiger - 2 results shown from wikipedia with 1st search / click "more results from this site" to get rest - 44,500 results
* wikipedia search for tiger - 59,027 results shown from wikipedia. A difference of over 14,500 more results from the wikipedia search.
bottom line, Xeno - you might think you are getting all occurrences of a word/phrase by using Google, but you aren't
--stmrlbs|talk 16:25, 29 June 2009 (UTC)
No, but if Google is working properly you might get the most relevant one. –xenotalk 22:17, 29 June 2009 (UTC)
gosh, I hope it is the most relevant result, if google only gives you one result as compared to 6015 that you get with the wikipedia search.
btw, the wikipedia search has ranking, too, where it tries to order the results with the most relevant at the top. --stmrlbs|talk 07:34, 30 June 2009 (UTC)

Replies to Statement by Gigs

The current practice at WP:MfD allows far too much leniency in user space to allow indexing: POV versions of articles (previously deleted, or never in mainspace), "pseudoarticles" (often promotional) that have little to no chance of ever going into mainspace, and often what amount to personal home pages. Additionally, many people were under the incorrect impression that userspace was already not indexed. Therefore, I propose that user space should not be indexed by default; instead, pages should opt in using {{INDEX}} tags. The list of user pages that are indexed could then be monitored.

and such opting in should not be allowed on draft articles or articles copied to user space 70.71.22.45 (talk) 00:02, 27 June 2009 (UTC)
Agree with IP. لennavecia 14:38, 27 June 2009 (UTC)

MfD enforcement (or lack thereof)

I have created User:Gigs/Kept in userspace Jan 2009. It examines 1 month of Userspace MfDs, and lists my opinion on them. I believe that a number of them failed to reach the proper conclusion in application of our userspace policies, and that a few of them are actively damaging to the encyclopedia if userspace remains indexed, mainly because they are easily confused with real articles. Some of this might have been because the !voters were under the impression that userspace was already not indexed. Gigs (talk) 20:03, 26 June 2009 (UTC)

I just assumed that the User pages were not indexed. I was really surprised to find that they were, and that sometimes they would appear ahead of the main articles in google depending on what phrase you used in the search. --stmrlbs|talk 06:02, 27 June 2009 (UTC)
Yeah, I think that is a common assumption. Gigs (talk) 16:27, 27 June 2009 (UTC)

Query on an alternative

If the consensus is to keep userspace indexed, might we be able to implement some sort of prominent disclaimer box on userspace pages that clearly states "This is NOT part of the encyclopedia; don't believe anything you read here" or something of that sort? Powers T 20:17, 26 June 2009 (UTC)

Something like Template:Userpage? Seems like a good idea. –xenotalk 20:21, 26 June 2009 (UTC)
I think this might be a good idea even if we do disable indexing on userspace, since some Wikipedia mirrors are mirroring userspace and allowing it to be indexed as well. One of my main concerns is things that look like mainspace articles in userspace, and this would help there. But only if it were mandatory. Gigs (talk) 20:23, 26 June 2009 (UTC)
You should statementfy this suggestion. –xenotalk 20:26, 26 June 2009 (UTC)
Done. Powers T 02:18, 27 June 2009 (UTC)
Just be aware this would probably require a Wikipedia software change to make this appear on every user page. This is a lot more effort than adding a "disallow" to the robots.txt file. Otherwise, you are back to depending on users putting it in themselves, and the users who want to use Wikipedia for promotional purposes probably won't be jumping to put this in. Why don't we just put in the guidelines "If you are spamming, please put a banner on your page to let people know this isn't a legitimate Wikipedia article"? --stmrlbs|talk 07:00, 27 June 2009 (UTC)
Whichever way will make it the default will be best. Your logic about spammers doesn't make sense. Spammers will obviously NOT add a NOINDEX template. If we create an even playing field by making this a part of the software, there will be no problem. It will be part of the rules for editing here. It will also eliminate a lot of the conspiracy theories and assumptions of bad faith that are going on here. Lots of stuff exists here without any thought of spamming or getting noticed by Google. I was rather surprised to discover that NOINDEX wasn't the norm in userspace. It turns out I was mistaken. It was the "nofollow" for external links I was getting confused about. Based on the comments I have read during these discussions, and on some specific searches only people in the know would do, I can find some of my own userspace pages through Google. They are innocent, but they are being indexed! Now it appears that Google is also our best friend for discovering policy violations. Without it there will be lots of hidden violations going on that won't be discovered. That's the downside to making userspace NOINDEX. Crap, I just added NOINDEX to a bunch of my subpages. I still have some, but have forgotten where they are. I'm beginning to realize why this sounds good, but has been repeatedly defeated. -- Brangifer (talk) 07:31, 27 June 2009 (UTC)
Wow, you do have a lot of subpages. View them all here. Powers T 11:53, 27 June 2009 (UTC)
Thanks for that link! I knew that such a link existed but had forgotten what it was. I see there are some pretty old "storage bins" and redirects there. Time to clean up... as I get time. -- Brangifer (talk) 14:31, 27 June 2009 (UTC)
BullRangifer, my statement about asking spammers to put a banner up on their user page was meant to be ridiculous, an obvious example of why voluntary placement of "userpage banners" is not going to work. So, thanks for proving my point. --stmrlbs|talk 17:59, 27 June 2009 (UTC)
Stmrlbs, I admit I don't know how hard this would be, but I'm hoping it wouldn't actually require a software change. Something like showing the "This is a talk page" disclaimer when editing a talk page. Powers T 11:53, 27 June 2009 (UTC)
It could probably be done by editing the UI, which I believe any admin can do. Gigs (talk) 16:44, 27 June 2009 (UTC)
what is the UI (user interface, I assume)? is it defined somewhere? --stmrlbs|talk 17:59, 27 June 2009 (UTC)

Replies to Statement by MZMcBride

A few points to consider:

  1. Extreme views rarely work
    • Some people have called for indexing only the article namespace, which is a great idea until you consider things like files, categories, and portals. And, hey, is it really necessary to block Google from indexing "Assume good faith" or "Ignore all rules" when these ideas have permeated our culture?
    • Even when discussing only the user space, this isn't an all-or-none battle. More creative options might include removing user subpages from indexes, but leaving the root pages. (I've always found it a bit odd to see my sandboxes in search results, personally.)
  2. Local decisions are local
    • Even if it is decided to remove all User: pages from search engine indexes, it does not impact the other wikis and sites that contain similar content. In fact, it will push the "less important" content up in the results. (Meta user pages, Commons user pages, etc.)
  3. There's not an expectation of privacy on a public website
    • Some have tried to claim that users are posting material on their user pages and are (seemingly) surprised when the content reappears in search engine results. The English Wikipedia is one of the most indexed, crawled, and visited sites in the world. No user should ever think they can post information on a public website and expect it to stay private.
  • For each point
    • 1) I would assume that when there's a call for not automatically indexing userspace, that's all it means, literally. Other spaces, as well as article space, would presumably not be included in automatic noindexing. It only makes sense that when "some people" call for userspace to stop being indexed, they don't mean article space, category space, portal space, etc. Although, since Portal is a pseudo-namespace, it really shouldn't be indexed either.
    • 2)We can deal with those local projects, at those projects just as we are doing here.
    • 3)I don't think privacy is the issue. I think the issue is misleading readers that an article in someone's userspace is the definitive, be all article on the subject. Likewise, the BLP issues that exist in some userspace articles cause more concern with indexing.
Just my thoughts on your points. - ALLSTRecho wuz here 23:23, 26 June 2009 (UTC)
I agree... especially on point 1. This is about userspace, not other non-mainspace indexing... we shouldn't get sidetracked from the issue 70.71.22.45 (talk) 00:04, 27 June 2009 (UTC)

Voting more than once?

What is meant by this statement:

  • "Users should only edit one summary or view, other than to endorse."

Does it mean that multiple votes are allowed? I notice that it's happening. -- Brangifer (talk) 03:59, 27 June 2009 (UTC)

I think it means that users can endorse more than one summary or view (they may be different but not opposed), but that you aren't allowed to create more than one view 70.71.22.45 (talk) 04:57, 27 June 2009 (UTC)
some people come up with the same proposal, but perhaps for different reasons. I think the idea is that you are endorsing not just the proposal, but the summary and reasons for the proposal. So you can vote for more than one, because you agree with both summaries for the same proposal. --stmrlbs|talk 06:44, 27 June 2009 (UTC)

RFC is not a vote :-P --Kim Bruning (talk) 13:08, 27 June 2009 (UTC) (it's a request for comments, like it says on the tin)

Replies to statement by BullRangifer

Actually the only change required for this to happen is part two. Part one is the default way things already work, so it's definitely already "doable", with no change required. I have now added parenthetical remarks to make this clear. -- Brangifer (talk) 05:24, 27 June 2009 (UTC)

perhaps the person who said it wasn't "doable" meant that it isn't working properly the way it is set up 70.71.22.45 (talk) 06:37, 27 June 2009 (UTC)
Maybe so. The solution is to do what I have seen done many times: do something about it. That works. There is no need for added bureaucracy or policies. The existing ones work. -- Brangifer (talk) 06:56, 27 June 2009 (UTC)

Replies to statement by Stmrlbs

this is just a reminder to User:BullRangifer that any discussion should be taking place on the discussion page... "All signed comments and talk not related to an endorsement should be directed to this page's discussion page. Discussion should not be added below. Discussion should be posted on the talk page. Threaded replies to another user's vote, endorsement, evidence, response, or comment should be posted to the talk page." 70.71.22.45 (talk) 06:37, 27 June 2009 (UTC)

Of course. I got carried away. At least you noticed it, but apparently won't do anything to lessen the increasingly confusing chaos on the page. We're getting too many proposals that overlap each other. You should just add your comments to Gigs's existing section. -- Brangifer (talk) 06:59, 27 June 2009 (UTC)
It's a common problem with this style of discussion that you get several statements that largely overlap. Gigs (talk) 16:29, 27 June 2009 (UTC)

So you point out a valid reason to want to search userspace using Google, then say we shouldn't allow searching userspace with Google? The spam doesn't go away just because we cover our eyes. Mr.Z-man 22:14, 27 June 2009 (UTC)

uh.. what? I don't think there is any situation where you would have to use google to find something in a User space that couldn't be found with Wikipedia search. So, I don't get your point. Take the user pages out of google, and you take away the motivation for putting spam on the user pages to begin with. --stmrlbs|talk 22:48, 27 June 2009 (UTC)
Stmrlbs, what kind of "spam" are you referring to? Some hate sites are forbidden here, but that applies to all of Wikipedia, not just to userpages. Personal websites are allowed on userpages and even on article talk pages when it's relevant to a discussion. So what are you talking about? Are you suggesting that we change the rules for userpage content? I don't understand your point. -- Brangifer (talk) 23:28, 27 June 2009 (UTC)
from my summary (which I take it you didn't read) - Take a look at Template listing of User pages that have a high probability of spam/personal use. Here is an example from one of the searches of an advertisement for a graphic design company that has been around since 2006. Main User page, too. I'd find more examples, but there are 47,709,651 Registered Users (and growing). So, I'll let you look through them.
This is the problem with "scans".. they help, but you still need a lot of time to go through all the results of a scan to determine which users are using the wikipedia google ranking for non-wikipedia purposes. --stmrlbs|talk 07:44, 28 June 2009 (UTC)
You linked to a page that uses Google searches to find spam in userspace. Mr.Z-man 23:37, 27 June 2009 (UTC)
Who are you talking to in both instances in this thread? I'm having a hard time figuring out what you mean. -- Brangifer (talk) 23:47, 27 June 2009 (UTC)
My first comment was about Stmrlbs' RFC comment in general, hence it wasn't indented at all and added at the end of the section. My 2nd comment was to Stmrlbs' reply to my first, hence it was indented 1 ":" more than his. Mr.Z-man 15:33, 28 June 2009 (UTC)
Mr. Z-man, all of these searches in this template could be done with the wikipedia search, but the purpose of this spamsearch template is to show how these pages appear in google. The point is a lot of these pages are spam/promotional pages, and they shouldn't be on google as part of wikipedia. Don't you wonder why people would put this stuff on Wikipedia when they can put their ad, or resume, or political piece on one of those free blogs? The reason is because their ad/resume/political-article won't appear on the first page of google if they do that. By putting their ad/resume/political piece on a wikipedia user page, they can ride the back of wikipedia to the first page of google. --stmrlbs|talk 07:44, 28 June 2009 (UTC)
they shouldn't be on google as part of wikipedia - No, they shouldn't be on Wikipedia at all. The real solution is to remove the content, not hide it. Mr.Z-man 15:33, 28 June 2009 (UTC)
No one is saying that we should not delete 'spammy stuff', simply that userspace rightfully allows a range of material that would not be allowed in mainspace, but that as of now is actively indexed by Google and achieves prominent listings from its association with Wikipedia when in all likelihood it should not. Unomi (talk) 17:09, 28 June 2009 (UTC)
Every namespace has stuff that we wouldn't allow in mainspace. That's why we have different namespaces. If something is so bad that we don't want the general public to be able to find it, it should not exist on the site at all, and if it isn't bad, we shouldn't hide it. Mr.Z-man 17:20, 28 June 2009 (UTC)
Until we find it so we can delete it, we can choose not to display it. As one of the statements effectively said, sure you want to keep your closet clean, but in practice you often get behind - so why publicise its contents and current state? Effectively, we have a dirty closet and are choosing to point a webcam at it. Yeah we should clean it (continuously...), but why point a webcam at it? Disembrangler (talk) 17:39, 28 June 2009 (UTC)

Replies to statement by Kim Bruning

People who contribute to wikipedia obtain reputation as their reward. This reputation spills over into things like google results. This proposal will deny this form of reward, and chase away many skilled contributors.

Like many open source and open content projects, Wikipedia doesn't provide a monetary reward to the majority of its contributors. However, the GFDL and CC-BY licenses both reward people by requiring that their names are attached to their work, and thus reward reputation.

A user page is part of the semantics that provide the reward portion of our open content licensing.

While we don't actually earn Whuffies here, and no one quite lives the life of Manfred Macx (except maybe User:Jimbo Wales?), I've found that on-line reputation does actually help a lot in the real world too. In my own case, it has gotten me support or jobs I otherwise would not have had access to, and (since the jobs are all wiki-related) it allows me to gain experience and do things that help wikipedia directly, or at least indirectly.

While I'm sure most folks here don't care one whit about whether their fellow wikipedians or wikimedians starve or not in these financially harrowing times :-P; I'm pretty sure that you'll be less happy if certain folks who are maintaining and creating support infrastructure for wikipedia stop doing so, and instead start a promising career as burger-flip-engineers.

So in summary: By removing indexing from user pages, you are denying reputation to people. Some people who rely on that reputation to support wikipedia will have more trouble doing so in future. Please at least allow people to have their own user pages indexed.

  • They could just add {{INDEX}} to their page. لennavecia 14:25, 27 June 2009 (UTC)
    • So wouldn't noindexing everything else give these folks looking to promote themselves even more coverage? Antithetical to the initial motivation of this RFC, it seems. –xenotalk 14:40, 27 June 2009 (UTC)
      • The point is that then it can be monitored more easily. Unomi (talk) 14:59, 27 June 2009 (UTC)
        • And then people will be at each other's throats over what can and cannot be indexed. No, I don't think this is a good solution at all, nor am I convinced there's much of a problem to solve. –xenotalk 15:05, 27 June 2009 (UTC)
          • There's a difference between promotion and using your wiki-accomplishments for your benefit. My user page is wholly within policy. Would I put it on my resume for any reason? Surely not, but if I did, is that considered inappropriate promotion? No way. Especially in the sense Kim notes, where his profession is with wikis. An example of a promotional user page that violates policy, in the opinion of some, is User:Jimbo Wales. That said, I don't see how this is antithetical to the motivation of the RFC. I'm sort of lost on your argument. We can easily monitor which pages utilize the INDEX template. لennavecia 15:18, 27 June 2009 (UTC)
            • The fact that we'll be giving greater promotional coverage to people who choose to index their subpages, and we'll have to now create an index-police to ensure nothing "too spammy" gets in there. And where do we draw the line? Shall the next lame edit war be over the indexing of User:Jimbo Wales? –xenotalk 15:21, 27 June 2009 (UTC)
So in summary, you're saying that the whole NOINDEX concept might not be worth the trouble? :-)
In furtherance of my position: Admittedly, I'm an extreme example, because I actually work with wikis all day. But even people who only casually do things with wikis should at least "see their name in print." It's a great motivator.
Also, while NOINDEX doesn't technically violate GFDL or CC-BY-SA, the user pages do help show appreciation for people, and doing so is very much in the spirit of open source and open content. --Kim Bruning (talk) 15:43, 27 June 2009 (UTC)
Yes, I've never been convinced that NOINDEXing on a wide scale is necessary. Perhaps on a limited scale for noticeboards, AFDs, and the like. But the rest should be available for Google to see. –xenotalk 15:46, 27 June 2009 (UTC)
  • Not everyone edits for the reward of "public wikipedia reputation". In fact, I think a majority of people do not wish their real life identity to be publicly accessible to the world through google. This is why there is a policy of no outing of USER real world identity.
However, for those who do want to have the reward of public acknowledgement of their work on wiki, the use of {{INDEX}} would accomplish this. So, I think {{INDEX}} would be a much more workable solution for people wanting public acknowledgement, than indexing all USERS for the needs of a few. Indexing User pages should be the exception, not the rule. --stmrlbs|talk 17:35, 27 June 2009 (UTC)
I'm fine with giving people a choice, though I would very much prefer for people to allow their pages to be indexed as the default, not the exception (reputation works both ways!). --Kim Bruning (talk) 18:28, 27 June 2009 (UTC) that, and I actually try to search user pages from time to time, but wiki-search sucks. :-/
I don't understand "reputation works both ways". And can you give me an example of something that you can find with google that you can't find with the wiki special search? Check the User checkbox and User talk checkbox if you want to search the user space. --stmrlbs|talk 19:04, 27 June 2009 (UTC)
You can eventually find it with both, if you have an infinite amount of time, but only google has the superior google algorithm.
Reputation works both ways, in the sense that the more smart people are associated with Wikipedia, the better the reputation of Wikipedia.
(A cynical SEO flunkie might also remark that it also improves wikipedia's page-rank in general :-P )
--Kim Bruning (talk) 20:19, 27 June 2009 (UTC)
If someone needs awards to contribute, then their contribution is meaningless. Sorry, but the products of corruption are corrupt and therefore tainted. Ottava Rima (talk) 21:50, 8 July 2009 (UTC)
I'm sorry to hear that you disagree with our foundation principle that everything is to be open content. People do deserve to be rewarded in reputation, for the work they do here. Either you fulfill the contract both in letter and spirit, or you do not. --Kim Bruning (talk) 01:10, 11 July 2009 (UTC)

Replies to statement by Beetstra

If allowing people to choose to INDEX their pages (if NOINDEX is default) causes problems, the fairly obvious solution is to put some hurdles in the path of INDEX - find some way so that INDEXing requires some degree of community approval (before it has any effect - i.e. not just monitoring INDEXed pages). Can't immediately think how, but I'm sure we can come up with something. Rd232/Disembrangler (talk) 08:43, 28 June 2009 (UTC)[reply]

Also what does xeno's comment "the other chaff being thrown aside to ensure only the spammy stuff gets out there" mean? Is non-spammy WP userpage stuff really (in a material way) competing with spammy WP userpage stuff for any given Google search? Disembrangler (talk) 15:32, 28 June 2009 (UTC)[reply]
And (@Z-man) if __INDEX__ can be abused, maybe the solution is to prevent that abuse somehow, not to make such abuse irrelevant by indexing everything. Disembrangler (talk) 15:32, 28 June 2009 (UTC)[reply]
Yes, I had no idea what Xeno meant either. --stmrlbs|talk 19:05, 28 June 2009 (UTC)[reply]
You could use an abuse filter to prevent people with a low editcount from adding it to userspace pages, but in the time it took me to think of that, I also thought of 3 ways that it could be (fairly easily) circumvented, and all but one of those ways (probably the hardest way) would also have the effect of making abuse harder to find. Mr.Z-man 15:45, 28 June 2009 (UTC)[reply]
OK, so how would that be worse than the status quo? Disembrangler (talk) 15:49, 28 June 2009 (UTC)[reply]
Also, I'm not convinced that we really need to apply INDEX to any userspace pages, in which case, a software tweak could simply make INDEX inapplicable to that namespace. Disembrangler (talk) 15:51, 28 June 2009 (UTC)[reply]
Another point: "Do all search engines follow Google's example of not following noindexed pages?" - AFAIK all the ones worth caring about do. Disembrangler (talk) 15:49, 28 June 2009 (UTC)[reply]

This does not address the problems with mirroring or simply pointing to the page from elsewhere, and setting up new processes for deciding which pages to index and which not. And it still needs to be checked. And they still needs to go in the end, or do we leave them lying indef around now (but delete e.g. user talkpages of indef blocked users, as they are useless!)? There is one very, very simple solution to not having purely promotional pages showing up in google results .. DELETE the it. Yes, usefying is nice, so people can work on it, but we sometimes just have to realise that a rewrite from scratch is better. Having no-index on the userpages does not help, as, as I said, other measures are necessery to keep it in place, and to keep it controlled (and also, what are the limits, Spammers come back after not using an account for weeks, months, years, and start of with doing some good edits to get accounts started, it does not help.

Noindexing userspace is fighting symptoms, not creating a solution. The solution is to delete advertising. --Dirk Beetstra T C 16:01, 28 June 2009 (UTC)[reply]

How does one exclude the other? Disembrangler (talk) 16:06, 28 June 2009 (UTC)[reply]

One is not excluding the other, but making it less useful. As a negative point to noindexing there is that noindexing would take away the urge to remove spam from userspace. That need is now at least significant (but maybe underestimated). --Dirk Beetstra T C 16:13, 28 June 2009 (UTC)[reply]

Mmm. I reminded of the economist who said if we put a giant spike in the middle of our steering wheels we'd all drive really carefully... Disembrangler (talk) 16:32, 28 June 2009 (UTC)[reply]
I hope that if one puts a giant spike (read: noindex) in the middle of the steering wheels of spammers, that they will start 'spamming' really carefully (my personal experiences tell me, that that it will not help too much, there is too much too gain ...). --Dirk Beetstra T C 16:40, 28 June 2009 (UTC)[reply]
I actually was conceiving of the spike being Indexing by Default: spam we allow in this situation will do well in searches (off WP pagerank), so we work extra hard to remove it. (But how hard/effectively in practice?) Anyway, if userspace pages aren't indexed, doesn't spamming become virtually impossible? Abuse in terms of hosting bad pages people can link to, sure, but not in terms of spamming search engines. And once the latter incentive is gone, I don't see much point in people doing the former. Disembrangler (talk) 16:50, 28 June 2009 (UTC)[reply]
The thing is, en.wikipedia has high Google ranks, which makes it likely that en.wikipedia is also the most mirrored site.[citation needed] Although userpage spamming is not a huge issue, spamming your pages there will still result in them appearing on mirrors that ignore noindex, and hence it still has its results. True, noindexing them puts a spike in that, but it does not make spamming useless. Even if we disallowed all indexing of userpages (also ignoring the index tag), it would still be worthwhile to have your page here. So the problem does not go away, and therefore I don't see the use of noindex. It is quite a spike for a not-too-huge problem for which better and more efficient solutions exist. If you really want to spike the system, noindex the whole of Wikipedia, except for certain checked/reviewed versions of a page. This is too easy to circumvent (you don't even have to do anything to get mirrored), I do not believe it will help a lot, the neglected work is still there, and it may break some functionality (having to use the internal search instead of an external search, or both). --Dirk Beetstra T C 16:51, 28 June 2009 (UTC)[reply]
(previous comment was edit conflicted). Question: how much do people here believe that the mediawiki-wide nofollow on external links has slowed down/hampered spamming of external links (see WT:WPSPAM, local spam blacklist, meta spam blacklist, &c. &c.)? My guess: ' limit to nihil'. --Dirk Beetstra T C 16:56, 28 June 2009 (UTC)[reply]
I believe that this initiative has less to do with offering an alternative to rooting out 'spammy stuff'; it is not an alternative. It merely reduces the impact of userspace material, which would not be allowed in mainspace, achieving prominent Google listings from merely being hosted on Wikipedia. I am not sure what otherstuff has to do with this. Unomi (talk) 17:14, 28 June 2009 (UTC)[reply]
Do mirrors copy userpages? And if so, can't we stop them? Disembrangler (talk) 17:43, 28 June 2009 (UTC)[reply]
imo, it has slowed it down because intelligent spammers look to see where they can get the most bang for the buck. The fact that Wikipedia has the NOFOLLOW policy seems to be pretty well known. However, they might still put it in for click throughs and exposure, but they aren't going to get increased SEO rank by doing this anymore. Google answers to questions about NOFOLLOW (and wikipedia). --stmrlbs|talk 20:32, 28 June 2009 (UTC)[reply]
All of our content is released under a free license, so we can't stop anyone from copying it, as long as they're compliant with the license (which just got easier with the switch to CC-BY-SA). We could stop providing database dumps that include userspace, but that assumes all mirrors get their content from the dumps, and it would also hurt bot operators and researchers who use the dumps. Mr.Z-man 18:45, 28 June 2009 (UTC)[reply]
Replies:
  • Unomi: Exactly, it reduces the impact, it is not an alternative.
  • Disembrangler: Don't think so, you download the whole database, and do with it whatever you want to do with it.
  • Stmrlbs: Yes, it has slowed it down (but when looking at the feeds, there is still a lot of spamming going on). Ask A. B. how many SEOs are still active putting pages on Wikipedia. And given the cases that I have encountered or have been involved in, they are very persistent. There is still quite some gain in having your company/organisation visible here, apparently.
This may slow it down (and that would for me be a reason to nofollow), but it may make some other things more difficult (it is difficult to see what the reasons are behind finding userpages, but I imagine there are reasons for it, and that is a negative point here). In the end, it will not significantly increase the amount of work that needs to be done (I'm now thinking: people may only become more persistent about having their pages in mainspace instead of userspace). --Dirk Beetstra T C 09:49, 29 June 2009 (UTC)[reply]

my replies by points

I realize that I am repeating some of what others have said, I would like to address Beetstra's summary points (Beetstra's statements italicized):

The search Special:Whatlinkshere/Template:INDEX counts at this moment 11 entries in userspace (soon to be 12, and more to come).

Purely as an example, say that User:XXXXX now has promotional material on their userpage, and User:YYYYY now does not have promotional material on their userpage. Our smart spammer XXXXX has however added {{INDEX}} to his userpage (so he can be found by Google!), and smart user YYYYY has added it, but has not added promotional material .. yet. So in a first scan we remove the {{index}} of the first (but leave it on the second, no inappropriate information, right?). However, in a later go, he could turn it into promotional material.

Now we have only 11 .. what if we have 5000 .. or 99,708 (less than one percent of the current registered users, and then they may have numerous subpages ...)? Scan them all on a regular basis? Yes, the ones with a clearly promotional username are easy to find (but some good accounts have a COI username), but others have a seemingly normal name and are only here to promote.

What do we have now? We have 47,709,651 registered users who, in essence, because the default for User pages is INDEX, all have {{INDEX}} on all their pages. How do we check for this now? Wikipedia is fighting a losing battle with this because the reward of getting a Wikipedia User page with promotional material without getting caught is great. We are fighting the symptoms instead of addressing the cause. You can run scans from here to kingdom come, but even if these scans catch spammers or anyone using Wikipedia for high Google placement, the User page that they create is out on Google for a few days/weeks/months/years. The payoff is great for that time. I'm sure a lot of them don't care if they are caught if they get some Google exposure. Even after the page has been deleted, it is out there in the Google cache for quite a while. We are sweeping up afterwards, without addressing the motivation.

In other words, noindexing has absolutely no effect

I disagree. NOINDEXING removes the easy access to top Google billing by just registering and setting up a user page. Heck, you don't even have to register.. just set up a user page with whatever your IP is at the moment. Set it up with the right keywords in the right places and bingo! You got your reward by getting your page in Google, at least for a while. It is a HUGE incentive. Take away that incentive, and even though scans should still be run, Wikipedia won't be cleaning up after the damage is done.

setting up a proper system to remove and delete promotional material (and not userfying promotional material but ask for a rewrite from scratch) would help much more.

What do you suggest? That hasn't already been tried? What system do you think would be good for evaluating 47,709,651 registered users, a number that is still growing? Who is going to evaluate all these pages? I just found a one-page ad from 2006.

Do we really believe that spammers will not be able to circumvent this (and do they have to? Do all search engines follow Google's example of not following noindexed pages;

The major search engines, Google, Yahoo, etc. all follow the robots.txt file and whatever that file says is ok or not ok to index for that site. It is a standard that is followed by all the big legitimate search engines. So, the answer to your question is that the big search engines all follow the indexing rules set up by the site owners.
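To make the robots.txt point above concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules in ROBOTS_TXT are hypothetical, written only for illustration under the assumption that userspace were blocked at the robots.txt level; they are not copied from en.wikipedia.org's real robots.txt. Note also that MediaWiki's NOINDEX works differently: it emits a per-page robots meta tag rather than a robots.txt rule. Either way, compliance is voluntary on the crawler's side, which is exactly the concern raised about mirrors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only; they are NOT the
# real rules served by en.wikipedia.org.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:
Disallow: /wiki/User:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    urls = [
        "https://en.wikipedia.org/wiki/Railroad",
        "https://en.wikipedia.org/wiki/User:Example",
        "https://en.wikipedia.org/wiki/Special:Search",
        "https://en.wikipedia.org/wiki/User_talk:Example",
    ]
    for url in urls:
        # can_fetch() only reports what the rules say; whether a crawler
        # actually honours them is up to the crawler.
        print(url, rp.can_fetch("Googlebot", url))
```

Robots.txt rules are prefix matches, so in this sketch a rule for /wiki/User: leaves User_talk: pages crawlable. Any such scheme would need to list each namespace separately.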

what about creating a non-indexed userpage and saying on other webpages 'find our free webpage on Wikipedia!';

This would not get the same coverage as an indexed wikipedia page automatically showing up in a search for certain phrases. The page containing the link to the non-indexed page would have to have high rank for people to find it.

or wait until your page gets mirrored on a site that does not noindex)?

Wikipedia can't control what other sites do - regardless of whether this is indexing or not.

I would suggest that the people who do spend some time following our wikiproject on spam, seeing how inventive ({{index}}ing is easy) and persistent (do something harmless or good now, come back in 2 months or 2 years, or simple readd and readd and readd until someone gets enough of it) spammers (SEO's) can be (it's their job to sell).

I know how persistent and inventive spammers can be. I've maintained some websites and forums. This is why I was so surprised that Wikipedia allowed indexing of user pages. To just rely on scans, especially after the page has been posted, is like trying to drain Lake Erie with a spoon. I will assume that Wikipedia also does filtering on the server side before a page is even created. I don't know, but most sites do this, so I will assume that Wikipedia does too. So, you have filtering before a page is created, and you have scans to try to catch stuff like this after a page is created. But as long as you have the motivation of high Google exposure, the reward of high placement on Google for a period of time makes it worth it for people using Wikipedia in this way. NOINDEXing of user pages will not eliminate spam, but it will sure take the motivation out of using the USER pages for promotional purposes if the pages no longer show up in Google.--stmrlbs|talk 20:22, 28 June 2009 (UTC)[reply]

Answers to points I think need an answer:

  • "Wikipedia is fighting a losing battle" -> noindex is not going to make a difference, the spam still needs to be deleted.
  • "NOINDEXING removes the easy access to top google billing by just registering and setting up a user page." -> no, it does not; as soon as it gets mirrored on a wiki that does index it, it still goes, and since en.wikipedia is the biggest, it is likely to be the most mirrored one as well.[citation needed] So it still pays. Nofollow on external links did slow down spamming, but given the enormous amounts of spam added daily, it does not deter too much; it still pays. Sure, noindex will have a positive effect, but other solutions still need to be followed.
  • "What do you suggest? That hasn't already been tried?" -> sure, we still do, as I said, having noindex or not is not going to make the difference, it still needs to be done in the other way as well. Don't userfy purely promotional pages, let it be rewritten from scratch. And don't let those userfied pages stay forever, give it 1 or 2 weeks, and then still, bye bye. And/or tag them (and have the tag include noindex): template on top: "this page is here to be improved so it can be copied to mainspace. page copied here <date>, if no improvements have been made on <date+2 weeks> then delete this under <some speedy deletion number>. this page is noindexed so it can not be found by the major search engines"
  • "Wikipedia can't control what other sites do - regardless of whether this is indexing or not." -> true, but if the information is deleted here they won't get it from us, and that deletion still is needed.
  • "NOINDEXing of user pages will not eliminate spam - but it will sure take the motivation out of using the USER pages for promotional purposes" -> yes, but I don't believe significantly, nofollow still is not a huge incentive not to add spam.

So my feeling is, that noindexing is not going to have a major effect (yes, it will have some, maybe even quite some effect), it will not take away the work that needs to be done anyway (cleaning the stuff that is not Wikipedia worthy, and quite some of it just needs to go anyway), and there will on the other hand be examples where indexing would have been nice, or where indexing helps (and that is where it differs mainly from nofollow, there is no need to find that information from outside, here there are/may be cases that we might want to find). Cure it, don't hide it. --Dirk Beetstra T C 09:38, 29 June 2009 (UTC)[reply]

Beetstra, in an ideal world I think I'd agree with you. If MfD always did its job and didn't close debates as "Keep - No harm" so often, then maybe I'd be of a different view. I just don't see a resolve to really crack down on userspace the way you are asking for. Go look at recent MfD results. Your idea of how userspace should operate is way out of line with how it is actually being operated. Gigs (talk) 12:39, 29 June 2009 (UTC)[reply]
I know, Gigs, but I don't think that this will help a real lot. It will help quite a bit (but far from total), and also 'break' a little. --Dirk Beetstra T C 13:08, 29 June 2009 (UTC)[reply]
What is it that will "break" a little? --stmrlbs|talk 19:53, 29 June 2009 (UTC)[reply]
You lose the possibility to search userspace via Google, something that quite a few users find useful. --Dirk Beetstra T C 08:20, 30 June 2009 (UTC)[reply]

Examples of useful non-mainspace pages

For a while, I have been transcribing relevant portions of Interstate Commerce Commission valuation reports on railroads, mostly from the late 1910s. I initially did them as user subpages, but later moved them to Wikipedia:WikiProject Trains/ICC valuations and subpages, because I was pretty sure userspace would eventually be noindexed without the capability of opting out. These reports are useful to Google searchers as transcriptions of hard-to-find documents, and to Wikipedia editors through the use of 'what links here'. Since I am constantly making additions, manually mirroring them at Wikisource is not an option. I know that this RFC is only about userspace, but some people have the idea that Wikipedia: space should also be noindexed, and so I'm giving an example of "internal" pages that both help us in article work and help other people. --NE2 08:10, 28 June 2009 (UTC)[reply]

Wow! Looks like you put a lot of work into this! I have some friends that work for the railroad. I will have to point this out to them. This RFC is only about User Pages, though. I would think WikiProjects would be a completely different animal, so to speak. --stmrlbs|talk 09:01, 28 June 2009 (UTC)[reply]
Honestly, your friends probably won't care unless they're interested in history. --NE2 09:16, 28 June 2009 (UTC)[reply]
They are railroad buffs. They would be interested. :) --stmrlbs|talk 21:02, 28 June 2009 (UTC)[reply]
"These reports are useful to Google searchers as transcriptions of hard-to-find documents"? No offense, but that sounds like a violation of WP:NOT#HOST to me. --Orange Mike | Talk 17:33, 29 June 2009 (UTC)[reply]
Uh... you seem to have ignored the part about them being useful for article work... --NE2 01:13, 30 June 2009 (UTC)[reply]
That's an argument for having them in userspace, not for indexing them in Google. Disembrangler (talk) 15:12, 2 July 2009 (UTC)[reply]
I guess I don't see what makes "what links here" useful for these? Powers T 14:20, 28 June 2009 (UTC)[reply]
I came across Danville, Olney and Ohio River Railroad, clicked "what links here", and got the information to add this. --NE2 00:46, 29 June 2009 (UTC)[reply]

I would argue very strongly against making the Wikipedia namespace NOINDEX, for what it's worth. I don't really buy all the slippery slope arguments. I know some more sweeping proposals were made in the past, and I think that was a mistake, since it's hard enough to get a consensus on just userspace, which should be the least controversial. Gigs (talk) 12:47, 29 June 2009 (UTC)[reply]

Yes, I agree. Each area needs to be evaluated on its own merits. --stmrlbs|talk 19:58, 29 June 2009 (UTC)[reply]

This Rfc is only about user space, not article space. User: and User talk: and their subpages only. Just to clarify. - ALLSTRecho wuz here 07:20, 30 June 2009 (UTC)[reply]

Reply to statement by Ned Scott

the following discussion was moved from the main page (--stmrlbs|talk 21:18, 30 June 2009 (UTC)):[reply]

Hi, this may be a valid issue, but I think you need to provide more information on how noindexing would cause those problems. Aren't these wiki sites using database dumps? Excluding userpages from those dumps doesn't seem to be an option, and certainly isn't affected by noindexing. And Wikipedia search will still be available for manual searches by users of those sites. Disembrangler (talk) 07:05, 29 June 2009 (UTC)[reply]
Some use database dumps, some export individual files using special:export, some just copy and paste then attribute via template or by copy/pasting the edit history to the talk page. Using Wikipedia search doesn't help the guy who's hunting down information that might have come from Wikipedia or not, but doesn't know ahead of time.
For example, let's say you are looking for the author of a userscript that you're using that just got broken on your wiki. Searching the name of that script will most likely lead you to their user page on the wiki they're most active on. -- Ned Scott 07:34, 29 June 2009 (UTC)[reply]

Ned Scott raises the issue that some off-Wikipedia wikis find it helpful to use Google to search for, eg, userscripts. Responses: (a) such wikis can still use Wikipedia search (b) that's not really Wikipedia's problem (c) some of that material (eg widely used userscripts) may be accommodatable in a different, indexed namespace, notably the Wikipedia namespace. Rd232/Disembrangler (talk) 22:15, 29 June 2009 (UTC)[reply]

Also I think LtPowers' suggestion solves a lot of those problems: mirrored userpages then point to the original Wikipedia userpage. Rd232/Disembrangler (talk) 11:07, 30 June 2009 (UTC)[reply]

Statement by Collect

Wow - call the users "dumb" because they don't know exactly how Wikipedia works, and may not be able to judge what value pages in userspace have. Yeah, OK. And putting "user article" in the title of every userspace page - yes, that'll really clear things up. I can't help remarking that LtPowers already made the suggestion to (more) clearly mark userpages, which I've endorsed; and that if editors don't have anything constructive to add to existing statements, maybe they should stick to endorsements or the talk page, because multiplying the RFC statements unnecessarily is just not very helpful for later editors who want to see what's going on. Rd232/Disembrangler (talk) 14:31, 2 July 2009 (UTC)[reply]

Perhaps Collect or an uninvolved party could move his comments to an endorsement of LtPowers. –xenotalk 14:43, 2 July 2009 (UTC)[reply]
I left a note on his user page. Gigs (talk) 15:53, 2 July 2009 (UTC)[reply]
I am just about ready to post a statement that we don't need any more statements ;-) especially those which just give the same suggestions already posed by other users in their statements. It seems to me that this is getting convoluted when it really should be rather simple. I think we can narrow this all down to two positions and let user support reflect those: 1. We should noindex userpages by default with the option to index them case-by-case. 2. We should leave everything as is. I realize that there are some other nuanced suggestions involved here, but maybe we should just be looking at this concept broadly at first and then tackle the minutia. Yes? -- əʌləʍʇ əuo-ʎʇuəʍʇ ssnɔsıp 16:01, 2 July 2009 (UTC)[reply]
I don't think it is that simple. I see substantial support for (a) status quo (b) noindexing by default with optional indexing by anyone (c) noindexing, end of story (d) noindexing by default, but restrictions on indexing. If we did voting, I'd suggest a two-stage procedure: i) status quo vs some form of default no-indexing ii) if default no-indexing is agreed, decide between alternative forms of that. As it is, how should we proceed? Wait a week or two and see if clarity emerges via the RFC alone? Disembrangler (talk) 16:07, 2 July 2009 (UTC)[reply]
I highly doubt that clarity will emerge on its own. We could try to collapse the popular statements into more concise proposals, and then run a new straw poll. One issue is that we may not get people to come back that commented or endorsed earlier. Gigs (talk) 17:14, 2 July 2009 (UTC)[reply]
Well we could let it run a while, and then do another RFC for step i, and then another for step ii. It'll take a while, but WP:DEADLINE. Disembrangler (talk) 17:45, 2 July 2009 (UTC)[reply]
Take a look at what I did below. If we can develop a consensus about what the questions are based on the suggestions and arguments given, then we can probably archive the current statements and open a second stage straw poll/discussion in a few days or so. I don't think we should open a new RfC because people have made some solid arguments for or against proposals in their statements, and we shouldn't throw those out. We can archive/collapse as needed, to help with readability. Gigs (talk) 17:52, 2 July 2009 (UTC)[reply]

Imo, it is an isolationist concentric close-mindedness type of ignorance on the part of some wikipedia users to think the world revolves around Wikipedia and to expect the world to understand - or even care - what the Wikipedia standards are. I would say the majority of people on the internet would not get that USER: means some editor on Wikipedia and don't care. Wikipedia is an encyclopedia to most people that use the internet. In fact, to a person using the internet, from their perspective, they are THE USER of wikipedia. They just do a search and see results from Wikipedia returned by google.
Perhaps the difference of opinion on whether to index the UserPages or not stems in part from a difference of opinion as to what Wikipedia is and who it is for. I think Wikipedia is a great site, a historical social/information experiment that is a fascinating use of the internet, and I love the open part of it.. but I think sometimes that we, as editors, get so wrapped up in what Wikipedia means to us, that we forget what it looks like to people not involved with wikipedia. I'm not saying that our needs are unimportant - but, I think as far as this RFC, we should try to think about the outside perspective as well as our own personal perspectives and needs. --stmrlbs|talk 19:10, 2 July 2009 (UTC)[reply]


I would like to note that I do not consider people "dumb" and I added "or ignorant" as suggested by one response -- my point was that this is the position being taken by many of those opining on the project page, not to assert that it is my position. I, in fact, think it is more important for WP to educate non-editors as to what is, and is not, encyclopedic content than to rail at those who index the userspace pages. It is also my personal opinion that we are in danger of fixing something which is not actually broken -- which is why my comment actually echoes parts of several others. If we can actually get it "out there" that userspace is not part of the encyclopedia, that, frankly, should be sufficient. Collect (talk) 21:23, 2 July 2009 (UTC)[reply]

OK, well Stmrlbs has an excellent response above on the "dumb/ignorant" issue so I won't address that. I will ask how getting it "out there" that userspace isn't part of the encyclopedia will have any effect on spam and other abuse. Generally, I don't understand the position that there is no problem. You might think the benefits of the status quo (can easily find everything through a third-party search engine, rather than having to use WP) are higher than the costs (spam and other abuse), but why would you think there is zero abuse now and won't be any in future? (I'd like a piece of that optimism...) Alternatively, why do you think it's worth editors' time to police userspace so effectively that there is no problem (assuming this is even possible)? Noindexing (especially without optional indexing) makes spam impossible at zero ongoing time cost to editors. Clearly, a problem solved! Disembrangler (talk) 22:06, 2 July 2009 (UTC)[reply]
Um -- I thought my position that WP needs to make sure that non-encyclopedia pages are clearly marked as such would be in accord with your position. Did I misread your post, or you, mine? Collect (talk) 19:57, 11 July 2009 (UTC)[reply]

Summary of positions.

This section should be used for sorting out what the wording of the positions should be, not for discussing their relative merits.

Main Issues

  1. Indexing, yes or no on each (three separate issues).
    1. User
    2. User_talk
    3. User/Subpages
  2. Should users be able to opt in for indexing?
  3. If there is consensus for opt-in indexing, should there be a threshold for usage, like autoconfirmation?

Other issues

  1. Include a mandatory and obvious warning on all user and user_talk pages and subpages, such as a different background or a textual warning that informs the reader they are not looking at an encyclopedia article.
  2. WP:MfD should be more vigilant and strict about removing material that violates the policies.
  3. Open a dialog with search engine providers to tell them that User and User_talk should be ranked lower.

Discussion of position wording

I have tried to group these by mutual exclusivity. There was a proposal in one of the statements to treat user and user_talk separately, so I broke those up. Issue 4 could be similarly split, though I'm not sure that it makes as much sense to split that up based on user vs talk and subpages. Issue 5 is sort of a vague mandate, but there did seem to be some calls for stricter enforcement at MfD as a way to help address the problems here. I am a biased participant here, so let's hash out whether these really reflect the current proposals. Remember we aren't discussing merits here, just what our options are based on suggestions offered in statements. Gigs (talk) 17:45, 2 July 2009 (UTC)[reply]

This is simpler. So.. what do we do? voice support here? or are you going to start a new RFC? --stmrlbs|talk 03:41, 6 July 2009 (UTC)[reply]

Not a new RFC. I can archive all the current statements, and change to a proposal/discussion format instead. I'll try to get that going tonight. Gigs (talk) 22:00, 6 July 2009 (UTC)[reply]

An Alternate 2 Step Summary of Positions

I think Levine2112 and Disembrangler have a point. We need to keep this issue simple or we are never going to get anywhere. I think this could be resolved in 2 steps: Decision 1:

  • Should all User Pages be INDEXED by Default? (The point of the original RFC)
  • YES - this is the status quo, and that means no change, and no need to go any further
  • NO - this should be changed. The next step is to determine how the status quo should be changed.

Decision 2: If all User Pages should NOT be indexed by Default, then this means the default for USER PAGES is NOINDEX. Start discussion with the assumption that the default for USER PAGES is NOINDEX:

  • The next discussion should be on whether there should be exceptions, and if so - how are the exceptions determined - with what criteria?

--stmrlbs|talk 08:27, 3 July 2009 (UTC)[reply]

  1. I suspect that the last word in your "Decision 2" is a typo (DEFAULT may be the wrong word).
  2. I do think we need to get back to basics and simplify things. What's happened so far has given people an opportunity to brainstorm. Now we need to distill it into a few very simple points, and as suggested it needs to be done in steps like a flowchart. -- Brangifer (talk) 17:03, 5 July 2009 (UTC)[reply]
thanks for noticing that, Brangifer. I fixed it. --stmrlbs|talk 17:40, 5 July 2009 (UTC)[reply]
Good. Now proceed. You're on to something. -- Brangifer (talk) 18:16, 5 July 2009 (UTC)[reply]

There's obviously no consensus on "noindex all user pages", one way or the other (15 xeno/26 Gigs == no consensus). There's no need to put that back up for another straw poll. The point here isn't to keep running new straw polls until we get the result we want, it's to work toward something that there is consensus on. So Decision 1 is pretty much done, and the result is "no consensus". We need to work toward proposals that we can actually get consensus on. I have tried to simplify my earlier issue statements, see what you think. Gigs (talk) 02:32, 6 July 2009 (UTC)[reply]

Good points. Then let's move on and find out what we can agree on. -- Brangifer (talk) 03:02, 6 July 2009 (UTC)[reply]
Be my guest. I suppose if you contacted everyone who weighed in before - and only them - then there wouldn't be a WP:CANVASS issue. -- əʌləʍʇ əuo-ʎʇuəʍʇ ssnɔsıp 04:01, 8 July 2009 (UTC)[reply]
 Done. –xenotalk 14:10, 8 July 2009 (UTC)[reply]

Wikipedia areas NOT indexed by Google and other major search engines

I would like to list the areas of wikipedia (the en.wikipedia.org) that are NOT indexed by Google as of July 4, 2009. I think a lot of people are under the impression that everything in Wikipedia is "open content", meaning the major search engines have access to all of wikipedia. This is not true. To find anything in the following areas, you have to use the wikipedia search. Each link will list a group of areas that are not indexed by google. Try looking up something in these areas, and see if you can find it with google. Then try finding it with the Wikipedia search:

--stmrlbs|talk 04:46, 5 July 2009 (UTC)[reply]

about threshold for usage requirement for opt-in indexing

I don't think this is a good way to determine which editors get opt-in indexing. I know too many editors whose main activity is reverting edits for any little reason, but who do little to contribute to articles. It takes a lot more time to go over an article, research it, and find something appropriate to add. Reverts with no discussion take a minute. It is a quick and dirty way to build up edit counts.
I think instead that opt-in indexing should be decided on a case-by-case basis, with the editor explaining why they want opt-in indexing. --stmrlbs|talk 23:15, 6 July 2009 (UTC)[reply]

I don't think this is possible (technically) or a good idea (WP:CREEP). –xenotalk 23:09, 7 July 2009 (UTC)[reply]
I agree. I don't think it should be based on editor usage history. I think it should be determined by the community on a "per page" basis, rather than a "per user" basis. Any editor can add the INDEX tag to their page, but it can be challenged by any user, reverted by any user, and brought to the community for the final word if needed. -- əʌləʍʇ əuo-ʎʇuəʍʇ ssnɔsıp 23:19, 7 July 2009 (UTC)[reply]
That sounds more reasonable. That way only possibly objectionable pages will take up any time, while all the other innocent pages will be left alone. There is no need to waste the project's time with examining each request. -- Brangifer (talk) 04:54, 8 July 2009 (UTC)[reply]

Forum shopping

I'm sending the notes out now, but the whole history of this push to noindex userspace really smacks of forum shoppery. First the discussions in three different forums. Then the RFC. Ok, so the RFC didn't get us what we wanted. Let's repurpose it into a straw poll with some statements that say the exact same thing we said before! Maybe then only the people who want NOINDEXing will vote. Meh. –xenotalk 14:05, 8 July 2009 (UTC)[reply]

Sour grapes? The previous discussions wandered around a bit, so an RFC was warranted to clarify. The outcome of that pointed fairly clearly in the direction of some form of noindexing - though there was substantial opposition too. Now it needs to be clarified what form that noindexing will take (including allowing optional indexing, which for low thresholds of eligibility is the key question for what noindexing means). However, I am a bit sceptical of the straw poll format. I think we need to find some way to do collaborative statements, so that arguments for particular positions evolve as articles do, instead of discussions back and forth like a talk page. I mean, it's not really that difficult a concept, it's just a different way of doing things which I think we should try! Rd232 talk 14:22, 8 July 2009 (UTC)
Pointing out a fairly obvious abuse of process (repeated asking the same question to get the answer you want) is not sour grapes. –xenotalk 14:25, 8 July 2009 (UTC)
What was the result of the RFC, in your opinion? لennavecia 14:40, 8 July 2009 (UTC)
No consensus. –xenotalk 14:43, 8 July 2009 (UTC)
How do you figure? Apart from anything else, the RFC only ran from 26 June to 7 July, before being collapsed into a straw poll. Rd232 talk 15:25, 8 July 2009 (UTC)
No consensus because it suffered from (likely unintentional) selection bias in who was contacted to participate and furthermore because it was prematurely ended. And why was it? Why didn't we just create a straw poll somewhere else? In a subpage? Here? Why completely eliminate the carefully prepared statements people had written? –xenotalk 15:29, 8 July 2009 (UTC)
That's a funny interpretation of "no consensus" - the first Status Quo statement got 15 endorsements, the first Change statement 26 [1]. Also disagree on selection bias - it's advertised widely enough, and if anything, selection bias is likely to work the other way, because the few people who have Useful Things Which Must Be Indexed in their userspace are more likely to be motivated to participate than the vast majority who don't. Agreed though on the straw poll part - but when I tried to un-straw-poll it, you reverted me. So what now? Rd232 talk 15:34, 8 July 2009 (UTC)
The selection bias is because the people who were involved in the previous discussions were mostly those wanting the noindexing. Though each discussion was soundly rejected, the folks rejecting it probably don't care that much to continually find and refute the discussion each time it was raised again. So you got 63% in favour, in an RFC that didn't run the full 30 days, because all the right people who want to sweep the problem under the rug were canvassed. So what? That doesn't mean anything. That's not consensus. –xenotalk 15:38, 8 July 2009 (UTC)
So your logic is (a) people involved in discussion were mostly those who wanted noindexing and (b) they're wrong, nah ("soundly rejected") and (c) there are more of them, but they won't shut up. ... Well that's one way to look at it. Evidence of selection bias? No. Rd232 talk 16:59, 8 July 2009 (UTC)
Also the option of sticking with the status quo is still very much on the table. It's not like the issue has been closed, and the only question is how to noindex. Rd232 talk 17:01, 8 July 2009 (UTC)
O RLY? I didn't know we had moved on from whether or not to noindex. –xenotalk 17:33, 8 July 2009 (UTC)
Now you're not even reading what I write. We haven't moved on. That's what "still on the table" means. Rd232/Disembrangler (talk) 18:49, 8 July 2009 (UTC)
Yes, I mis-parsed your comment. Apologies. –xenotalk 19:08, 8 July 2009 (UTC)

Could someone notify me if anyone actually tries to implement this. Thanks, R. Baley (talk) 14:35, 8 July 2009 (UTC)

What does "this" refer to? Rd232 talk 17:02, 8 July 2009 (UTC)

I'm with xeno on this. The pro-noindexing side didn't get what they wanted from the first RfC, so now it's a vote. -- Ned Scott 03:41, 10 July 2009 (UTC)

And I sure as hell don't appreciate pro-noindexers removing parts of my statement in the first RfC [2]. -- Ned Scott 03:48, 10 July 2009 (UTC)
Ned, I didn't delete anything.. I just moved the discussion from the RFC to Reply_to_statement_by_Ned_Scott on the talk page. I left the original comment you left with your support statement. It wasn't anything personal, and I didn't look at what you said and think "Oh! I have to get that off the project page because I don't agree with it!". The discussions are supposed to be on the talk page, and the Support/Oppose on the project page. Hence the instruction at the top of the Project page that says

Discussion should be posted on the talk page. Threaded replies to another user's vote, endorsement, evidence, response, or comment should be posted to the talk page.

Otherwise, everyone starts discussing everything on the project page, and you can't see who voted for what. But you will find your answer to the question asked here. And also a note that it was moved from the project page. But hey, I see you added it back, now your discussion is on both pages. --stmrlbs|talk 07:46, 10 July 2009 (UTC)
There was no discussion occurring. I quoted my answer to another user (including his question for context) as an example. -- Ned Scott 21:30, 12 July 2009 (UTC)
Actually, if you bother to look at the prematurely-stopped first RFC at the point it was stopped, you'll see a substantial majority (about 1/3 to 2/3) in favour of (more) noindexing, and the trend was towards noindexing, which I think is reflected in the current RFC. Also my attempt to refocus the straw poll back onto the core issue was undone by xeno, who supports the status quo. Rd232 talk 08:04, 10 July 2009 (UTC)

Xeno, you gave feedback on the current positions before I posted them, which is why talk is broken out from user. Why didn't you object then if you didn't think those positions reflected the statements? Why object now? Regarding forum shopping, repeated polling, I specifically wanted to do this here, rather than start a new RfC, so that it wouldn't be forum shopping/audience shopping. Doing it here means that everyone who comes here can see the full discussion that led up to this. Gigs (talk) 18:59, 12 July 2009 (UTC)

Inherently flawed

This RFC has completely lost its way and is now likely irredeemably flawed.

It's been manipulated from a discussion into a vote in a graceless manner, with options that simply clutter the relevant issues. Additionally, the voting options are all skewed toward a particular outcome (noindexing).

Look, for example, at the option "Make User_talk: namespace NOINDEX, excluding it from search engines." You can't make the User_talk: namespace noindexed. It already is. It has been for months. How can someone propose and support implementing something that's already been implemented?

Given the number of issues, I'd strongly suggest starting over and doing it properly. The current mess is simply unacceptable. --MZMcBride (talk) 14:54, 8 July 2009 (UTC)

Agree, per my above noted concerns. And on the user talk thing... I wasn't quite sure of this. Is there a page that shows exactly what is and what isn't indexed at present? There ought to be. –xenotalk 15:23, 8 July 2009 (UTC)
I'm inclined to agree. I did try to change the structure of the straw poll (but Xeno reverted), and the next step was going to be bring back the arguments. I was going to try out the Collaborative Position Statement approach. Anyway, what do you suggest? (Also, what's your citation for usertalk being noindexed?) Rd232 talk 15:24, 8 July 2009 (UTC)
I'd suggest archiving the whole mess and trying again in a month or two. The selection bias now is too great to conclusively say there is consensus for any of this. –xenotalk 15:31, 8 July 2009 (UTC)
Any evidence whatsoever of selection bias towards noindexing? Rd232 talk 15:35, 8 July 2009 (UTC)
See my above statement [3]. –xenotalk 15:39, 8 July 2009 (UTC)
So that's a no then. Rd232 talk 17:00, 8 July 2009 (UTC)
If you don't see how that's a self-selection bias that was compounded by canvassing them to participate in this RFC, then I'm not sure what more to say. –xenotalk 17:02, 8 July 2009 (UTC)
What canvassing? What self-selection bias? Please connect the dots here. The only self-selection bias I see, as noted above somewhere, is working the other way. Rd232 talk 17:07, 8 July 2009 (UTC)
Is it really that unclear? The folks who were contacted to participate in this RFC were those that had initiated or participated in several previous discussions on noindexing userspace. It stands to reason that the folks that think NOINDEXing userspace is necessary would self-select by participating in those discussions. Contacting them all to participate in this RFC creates an obvious selection bias.
Furthermore, by nuking the statement-endorse RFC and replacing it with a very slanted straw poll, those same folks who so vehemently think userspace needs to be NOINDEXed will, as a result of their very persistent nature, come back in droves to vote again. However, the folks that rejected it in the statement-endorse format might not care enough to constantly have to re-register their opinion that "no, this really doesn't matter, let google do what they want". This is the very definition of forum shopping: repeated re-asking and re-framing the question to get the answer you want. –xenotalk 17:26, 8 July 2009 (UTC)
(a) who was contacted to participate in this RFC? what evidence is there of bias in such contacting? (b) what makes you think noindexers are more persistent or have stronger opinions? as I said before, a priori one would think the opposite (c) How is the straw poll slanted? And if it gradually emerges that you're concerned that people who expressed an opinion in the RFC won't all do so in the straw poll, well (i) you've already xenobotted them, haven't you? and (ii) we can also check in a while how many people that applies to, and weigh it up in the conclusion. Not an earthshatteringly difficult problem. (d) your whole forum-shopping etc frame is really very problematic, and frankly the vociferousness of your "won't the people who disagree with me just shut up" is starting to grate. Rd232/Disembrangler (talk) 18:48, 8 July 2009 (UTC)
As is the vociferousness of those who think allowing search engines to index this stuff is at all a problem. –xenotalk 19:12, 8 July 2009 (UTC)

User talk pages are indexed.. but it doesn't seem to be as widespread as the User pages and subpages. Perhaps Google doesn't rank them as high. Google seems to skip indexing anything in the external links section, too. Don't know why this is either. --stmrlbs|talk 20:09, 8 July 2009 (UTC)

Please stop spreading inaccurate information. I realize that likely isn't your intention, but you're flatly wrong. Those user talk pages have been specifically tagged with {{INDEX}} (see also: WhatLinksHere for the User_talk: namespace). That's why they're indexed.

Given the amount of misinformation being spread around here, I'm inclined to move to shut down this entire RFC as soon as possible. --MZMcBride (talk) 20:24, 8 July 2009 (UTC)
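MZMcBride's WhatLinksHere check can also be run against the MediaWiki web API. A minimal sketch, assuming the opt-in works by transcluding Template:INDEX: a `list=embeddedin` query restricted to `einamespace=3` (the User talk: namespace) lists the pages that transclude it. The helper name `index_transclusions_url` is hypothetical, and the sketch only builds the request URL rather than fetching it.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
USER_TALK_NS = 3  # MediaWiki namespace number for User talk:


def index_transclusions_url(namespace: int = USER_TALK_NS, limit: int = 500) -> str:
    """Build a query listing pages in `namespace` that transclude Template:INDEX."""
    params = {
        "action": "query",
        "list": "embeddedin",         # pages that transclude the title below
        "eititle": "Template:INDEX",
        "einamespace": namespace,     # restrict results to one namespace
        "eilimit": limit,             # batch size (500 is the usual cap for ordinary users)
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(index_transclusions_url())
```

The API returns results in batches; a real client would send back the `continue` values from each response to collect more than `eilimit` pages.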

ok, I see what the difference is. The User Talk pages are allowed for indexing via the robots.txt file, which is the first file checked by Google. But on the user talk pages, it has this: <meta name="robots" content="noindex,follow" /> which tells Google not to index this page, but to follow the links in this page. However, this can be overridden with the INDEX template.
I wonder why there are fewer than 250 user talk pages listed as using INDEX, and yet there are over 4,000 user talk pages listed in Google? --stmrlbs|talk 22:25, 8 July 2009 (UTC)
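The two layers described above can be illustrated with Python's standard library: robots.txt decides whether a crawler may fetch a URL at all, and only a page that was actually fetched can then opt out via its `<meta name="robots">` tag. This is a simplified sketch; the robots.txt rules, URLs and page markup are hypothetical stand-ins rather than Wikipedia's real files, and it assumes a page opted in with {{INDEX}} simply carries no noindex directive.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it blocks /w/ but not /wiki/User_talk:..., so those pages may be fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
""".splitlines()

TALK_URL = "https://en.wikipedia.org/wiki/User_talk:Example"  # hypothetical page
# Hypothetical markup: a user talk page without {{INDEX}}, and one assumed to be opted in.
USER_TALK_HTML = '<html><head><meta name="robots" content="noindex,follow" /></head><body></body></html>'
INDEXED_HTML = "<html><head><title>User talk:Example</title></head><body></body></html>"


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def crawl_decision(url: str, html: str, agent: str = "Googlebot") -> tuple:
    """Return (may_fetch, may_index, may_follow) under this two-layer model."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT)
    if not robots.can_fetch(agent, url):
        # Never fetched, so the meta tag is never read; treated here as not indexed
        # (real search engines may still list the bare URL).
        return (False, False, False)
    meta = RobotsMetaParser()
    meta.feed(html)
    return (True, "noindex" not in meta.directives, "nofollow" not in meta.directives)


print(crawl_decision(TALK_URL, USER_TALK_HTML))  # fetched, not indexed, links followed
print(crawl_decision(TALK_URL, INDEXED_HTML))    # fetched, indexed, links followed
```

The asymmetry this models: a URL blocked in robots.txt is never fetched, so its meta tag is never seen, while a page carrying `noindex,follow` is fetched and its links followed but the page itself is kept out of the index.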
Look carefully. "User:Denni/User talk:Denni/2005 July Archive - Wikipedia, the free ..." etc. --MZMcBride (talk) 22:44, 8 July 2009 (UTC)
MZM, this isn't 20 clues. If you have an explanation, please help by explaining it to the people discussing it, as I'm sure there are those who might not understand all the details. Also, what Denni did with making his user talk page a subpage of his Main User page is certainly not the norm - nor is it the reason for so many wikipedia talk pages in google. The reason is other areas of en.wikipedia.org - Wikinews, Wiktionary, etc. are evidently subdirectories of en.wikipedia.org, and picked up by google as part of en.wikipedia.org - but they evidently aren't governed by the same search policies as the main part of en.wikipedia.org. Example: User:Cometstyles on Wikinews has no INDEX template on his talk page. He is also here in the first 50 results - just look for "Russian whores pregnant after sperm blast, penis suspected" in the synopsis presented to the public. Lovely --stmrlbs|talk 06:48, 10 July 2009 (UTC)
I believe you have a fundamental misunderstanding of the way in which indexing and search engines operate and I really wish you would stop making claims or trying to present facts about them. There are plenty of resources available to you on the Web and elsewhere to learn more about Google and <meta> tags; however it is completely unhelpful (and even harmful) to spread information that simply isn't true. --MZMcBride (talk) 15:39, 10 July 2009 (UTC)
A bit more: My problem with your posts is not that you have questions or that you don't know everything there is to know about search engines and indexing. My problem is that you're speaking with authority and making claims that you will admit you're unfit to make. You're drawing conclusions based on random Google searches and trying to spread your conclusions without bothering to fact check them or ensure that they are accurate. You're not saying "I tried this, got these results, and I think it means this." Instead, you're saying, "I did this search, got these results, this is what it means." Which is a fine thing to do if you're right, but you seem to continually be wrong. --MZMcBride (talk) 15:42, 10 July 2009 (UTC)
The "random" search I did was a site search of en.wikipedia.org, where I search for the exact phrase "User_talk" only in the URL. Are you saying that this "random" search would not show what User talk pages are indexed by Google for en.wikipedia.org? I would love to hear how you would do a google search to find all en.wikipedia.org User talk pages indexed by Google. There were many User pages returned by the en.wikipedia.org site search that did not have an INDEX template, and did not have a "<meta name="robots" content="noindex,follow" />" in the User talk page source, yet you offered no explanation for that. When I asked (in general, not you) why the big difference in results between the number returned by searching for users that use the INDEX template and the user talk pages returned with the google site search of en.wikipedia.org, you replied with the vague "Look carefully. "User:Denni/User talk:Denni/2005 July Archive - Wikipedia, the free ..." etc." Which is not explaining, but playing games. Nor did your Denni "clue" explain all the results from the sister projects - which do not seem to be included in the User talk page NOINDEX change. MZM, the problem I have with you is that you seem to enjoy pointing out what is wrong with everyone else's explanations, without giving any clear explanations yourself. "Dropping a clue" is a safe way to imply that you know why, without actually making the commitment of giving a clear explanation. --stmrlbs|talk 05:57, 11 July 2009 (UTC)
MZMcBride, instead of complaining about misinformation, why don't you help the rest of us by showing us where you saw this policy stated? I am going by looking at the actual files used by Google, but a lot of time could be saved if you could give us a link to where Wikipedia says what is searched and what isn't. --stmrlbs|talk 20:48, 8 July 2009 (UTC)
http://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php specifically 'wgNamespaceRobotPolicies' => array() --MZMcBride (talk) 22:44, 8 July 2009 (UTC)
Which tells us that enwiki currently noindexes user_talk; dewiki (German) noindexes all talk pages, including user_talk; dawiki (Danish) noindexes user and user_talk; hewiki (Hebrew) noindexes user pages. Rd232 talk 13:55, 9 July 2009 (UTC)
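Rd232's reading of `wgNamespaceRobotPolicies`, combined with the per-page {{INDEX}} opt-in discussed above, can be modelled as a namespace default plus a page-level override. This is a simplified, hypothetical sketch: the per-wiki table is transcribed from the comment above rather than from the live configuration, it assumes {{INDEX}} works by emitting the `__INDEX__` behaviour switch, and it ignores the further restrictions real MediaWiki places on where such switches are honoured.

```python
from typing import Optional

# MediaWiki namespace numbers: even = subject namespace, odd (positive) = its talk namespace.
NS_USER, NS_USER_TALK = 2, 3

# Hypothetical per-wiki defaults, transcribed from the comment above (not the live config).
NAMESPACE_ROBOT_POLICIES = {
    "enwiki": {NS_USER_TALK: "noindex"},
    "dawiki": {NS_USER: "noindex", NS_USER_TALK: "noindex"},
    "hewiki": {NS_USER: "noindex"},
}
# dewiki is described as noindexing every talk namespace, i.e. every positive odd number.
ALL_TALK_NOINDEX = {"dewiki"}


def default_policy(wiki: str, namespace: int) -> str:
    """Namespace-level default: 'index' unless the wiki's table says otherwise."""
    if wiki in ALL_TALK_NOINDEX and namespace > 0 and namespace % 2 == 1:
        return "noindex"
    return NAMESPACE_ROBOT_POLICIES.get(wiki, {}).get(namespace, "index")


def effective_policy(wiki: str, namespace: int, switch: Optional[str] = None) -> str:
    """A page-level __INDEX__ or __NOINDEX__ switch overrides the namespace default."""
    if switch == "__INDEX__":
        return "index"
    if switch == "__NOINDEX__":
        return "noindex"
    return default_policy(wiki, namespace)


print(effective_policy("enwiki", NS_USER_TALK))               # noindex
print(effective_policy("enwiki", NS_USER_TALK, "__INDEX__"))  # index
print(effective_policy("enwiki", NS_USER))                    # index
```

Under this model, noindexing User: pages on enwiki would roughly amount to adding an `NS_USER: "noindex"` entry to the enwiki row, with the per-page opt-in still available.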

This new format is worse than the old one, and is full of bias and turns the whole thing into a mindless vote. People have some major misconceptions about what actually happens with search engines, and there's little to no evidence to prove either side of this debate. How much of a problem is promo userpages? Are there a lot of people getting userpages in search results that are not relevant to their search? It's not like userpages pop up when you just search some random word or term. -- Ned Scott 03:33, 10 July 2009 (UTC)


I'm going to agree with the above persons. This RFC has been gutted, and no longer represents consensus in any form.

Also, I have been disenfranchised: my position had been removed from consideration, and I needed to re-add it by hand. You Do Not Do That In An RFC, even (or especially) to a current minority position. Worse if it's founding principles.

I could be wrong, people could correct me, but it certainly must be considered and checked!

I would suggest we close the RFC within 24 hours. However, I see that many people hold particular positions on user pages that need to be discussed. We should certainly provide a venue for discussion on the matter. I'm just not happy how a small part of the community is pushing their position on the rest of us in this case.

And how do we calm down and make sure this doesn't become a big blowup? :-) --Kim Bruning (talk) 09:23, 11 July 2009 (UTC)

Your statement did not include any actionable ideas. It was just an argument for why user pages should remain indexed. The straw polls were formulated based on paths forward. All of the arguments for and against remain, on here and in the collapsed statements. Nothing has been removed. Gigs (talk) 18:45, 12 July 2009 (UTC)

discussion on "Make User/Subpages NOINDEX, excluding them from search engines."

(moved from RFC page - discussion is supposed to be on talk page):
Support - based on statements and concerns I stated previously, including the fact that indexing would only encourage Wikipedia becoming a myspace/home page and distract from it being an encyclopedia. Ottava Rima (talk) 14:10, 8 July 2009 (UTC)

People have been claiming this is a problem since at least 2005. It hasn't happened. It is not going to happen and fundamentally the software isn't conducive to it happening. Of course the world's largest online encyclopedia does have extensive social networking features but that's life.©Geni 14:14, 8 July 2009 (UTC)
How can people claim that allowing Google to search user pages has been a problem since 2005 when it has yet to happen? And user pages are already treated as websites by many people - it will only increase if there is more attention on them via searching. Ottava Rima (talk) 15:20, 8 July 2009 (UTC)
Google can and does index user pages. Has done for years. Over that time people have claimed all sorts of things will turn wikipedia into myspace (userboxes would be the most obvious example). Hasn't happened. Not going to happen. Available evidence suggests it might not be such a bad thing if it did.©Geni 15:24, 8 July 2009 (UTC)
Actually, then if Google does do this and we do have massive myspacing, your argument only defeats itself. I see people using their user pages like myspace all the time. It really needs to end and now we know its source. Ottava Rima (talk) 15:27, 8 July 2009 (UTC)
We do not have a "massive myspacing". While people have tended to improve the layout of their userpages over time it has nothing to do with google and everything to do with making a good impression within the wikipedia community. The myspacing boogeyman ceased to be credible in 2007. Please come up with a new one.©Geni 16:46, 8 July 2009 (UTC)
"We do not have a "massive myspacing"." Saying that makes you seem like you are an ostrich with your head in the sand. I cannot trust in your ability to see the reality of Wikipedia or your willingness to acknowledge it. Ottava Rima (talk) 21:55, 8 July 2009 (UTC)

Evidently, the people that say this hasn't happened just haven't looked very hard. From my previous summary (which is now hatted with the previous proposal) - Take a look at Template listing of User pages that have a high probability of spam/personal use. Here is an example from one of the searches of an advertisement for a graphic design company that has been around since 2006. Main User page, too. I'd find more examples, but there are 47,709,651 Registered Users (and growing). So, I'll let you look through them. This is the problem with just using scans to look at Userspace.. they help, but you still need a lot of time to go through all the results of a scan to determine which users are using the wikipedia google ranking for non-wikipedia purposes. As far as searching, you would still have to do 2 searches to find every occurrence of something on both the web and in wikipedia. Just doing one search only gives you a couple of hits in wikipedia, but not all occurrences of a phrase. Examples:

  • google search for horticulture - 1 result (found on wikipedia)
  • wikipedia search for horticulture - 6,015 results
  • google search for tiger - 2 results shown from wikipedia with 1st search / click "more results from this site" to get rest - 44,500 results
  • wikipedia search for tiger - 59,027 results shown from wikipedia. A difference of over 14,500 more results from the wikipedia search.

bottom line - you might think you are getting all occurrences of a word/phrase by using Google, but you aren't--stmrlbs|talk 17:39, 8 July 2009 (UTC)

Stmrlbs, the problem is not what Wikipedia finds and Google not (as that is not a reason to noindex, google already can't find it .. so those userpages are not a problem then!), the problem is, what can Google find and Wikipedia not. Your bottom line rewritten: you might think you are getting all occurrences of a word/phrase by using one of the search engines, but for a full result you need to use both Google ánd the Wikipedia search (or even more). --Dirk Beetstra T C 09:48, 9 July 2009 (UTC)

Search engine difference ..

OK, a small example, which maybe can also be solved in a different way, but the wikiproject on chemicals gave me the following example:

Note that the two searches give different results, and the page one actually would like to find (Glucose) does not show up in the Wikipedia search (though you will get there through the disambig that is the top result). For me this is yet another reason for oppose (next to that it does not actually solve anything (maybe make the problem a bit smaller), it really does break things). --Dirk Beetstra T C 18:25, 8 July 2009 (UTC)

WTF does that prove? You put "C6H12O6" into a search engine, you don't expect to find "glucose". You expect to find what it may be, and it may be a number of things, one of which is glucose - info which the top hit on the WP search provides, and the second hit on Google. And what exactly does this have to do with userspace?? Disembrangler (talk) 18:55, 8 July 2009 (UTC)
Dirk, the pages that you give as an example of what you want to find are MAINSPACE articles. No one is contesting the fact that the MAIN articles should be indexed by Google. They should be. The Articles on Main Space is what Wikipedia is about - they are Wikipedia encyclopedic articles. Can you give us an example of something you needed to find on the User pages that you couldn't find with Wikipedia search? --stmrlbs|talk 18:59, 8 July 2009 (UTC)
I believe Dirk's example succinctly demonstrated why the Google algorithm is superior to our own. –xenotalk 19:10, 8 July 2009 (UTC)
Thanks, Xeno. Yes. What I mean is, that the wikipedia engine misses mainspace articles. Which also means, that the wikipedia engine will for certain terms miss userspace articles, while you are explicitly looking for those! I am sorry if I was unclear, but I don't think that I deserved being yelled at! Thanks for the understanding, Disembrangler and Stmrlbs! --Dirk Beetstra T C 19:12, 8 July 2009 (UTC)
Sorry, didn't mean to yell, but I was in a hurry and it seemed like a terrible example, not least because to me the WP result is logically superior here. Disembrangler (talk) 08:29, 9 July 2009 (UTC)

You want examples, fine:

vs

Note, that a) the results are different, and b) that I really would be looking for someone producing fireworks, rather than for the mere 5 links that the internal search gives me.

Next beans warning: if you want to write an advertising page here, make sure that you riddle your page with useless marks, waiting for it to be mirrored and found on internet (as wikipedia userspace is going to be noindexed), as the internal engine is likely not able to find it (and google for sure not). It still pays to have your cruft here. Please, can someone now try to solve the problem? --Dirk Beetstra T C 19:23, 8 July 2009 (UTC)

search issue: I'm not convinced that this is a common problem (searching with subscripts) or more generally, a realistic way to find useful userspace content. Normally it's for the user's benefit only, and if not, then linked from somewhere relevant (particularly, a relevant talk page). Also, as stated previously, no solution has zero costs. The question is which has the highest cost-benefit. Disembrangler (talk) 08:29, 9 July 2009 (UTC)
beans issue: you're missing the obvious point that junk mirrored is only problem if it shows up in Google (et al), in which case we can find it and remove it from Wikipedia, which will eventually be reflected in the mirrors. But in the mean time, the junk isn't on Wikipedia, not associated with it, and not benefiting from its page rank. Disembrangler (talk) 08:29, 9 July 2009 (UTC)

This is a no-discussion issue I think. Google will always have superior search because:

  1. it has the whole internet as a repository of information, so if a person links a WP userpage from his external homepage, it is likely this page is going to be a hit for his real name and other information
  2. they actually have people working on it, and have invested more than the $0 WMF did
  3. because of the last two, it is better at guessing "what you might want", instead of finding the most relevant article with the string you entered

This particular example is some kind of parsing weirdness with formulas that have lots of subscripts... A corner case no one has had time to work out, test and fix. In any case, whether WP search is good or not shouldn't be an issue here; the issue is whether the community wants user namespace stuff to be public (as in easily accessible) or not. --rainman (talk) 19:57, 8 July 2009 (UTC)

I agree. "The question is what does Wikipedia want to present to the public?". However, no matter what the decision here, it would be nice if Wikipedia spelled out somewhere what it allows Google to index and what it doesn't. I think a lot of people have the idea that they can find anything on wikipedia with google, and that is not the case. Probably never will be, because there are areas that most people don't want out in google. --stmrlbs|talk 21:22, 8 July 2009 (UTC)

We can't say either way if one is superior to the other (re to Disembrangler); they simply give (sometimes completely) different results (re to Disembrangler and Stmrlbs). Sometimes the Wikipedia search is sufficient and suitable enough; for other things it simply is not. Superiority depends on what you are actually looking for. For me, as an active spam fighter, if I want to look for bad information, then the above example (of the fireworks) gets close to something that I would like to find and eradicate, and the wikipedia engine is then, for me, a) really insufficient and b) one can circumvent it to hide information which shows up after mirroring. I can go on with my beans (my nose is almost full); noindexing and proper hiding can make userspace an even better medium to store (hide) information or to use as a communication medium for malicious things.

It is all solvable: we could write a better, stronger, specialised search engine on the toolservers for specific purposes, and we could also improve the ways of MfD and Speedy deletion of 'rubbish' in these namespaces. If those mechanisms are there, then I would have much weaker opposition, but for now, and as I have argued earlier, it breaks some things, and it makes the problem that it is intended to solve (a bit) smaller (but it does not solve it!), and that is why I am not a supporter of the idea, at all.

Let me be clear, I am sure that noindexing userspace is going to help! I am not doubting that. But, I am afraid that it is not a solution, as there still is a gain in having your pages here (e.g. after mirroring, the noindexed pages will still give you results, maybe not as high, but well), and, in my opinion, the rubbish still has to go! And given the results that MfDs on userspace pages now have ('mwaagh, it is in userspace, who cares'), that will just be worse after noindexing ('mwaagh, it is in userspace, who cares, and it is noindexed anyway'). --Dirk Beetstra T C 09:37, 9 July 2009 (UTC)

Re to "a realistic way to find useful userspace content", I am not worried about the useful content, I am worried about the useless content. The stuff that has to go. Sure, no solution has zero costs, but this solution does not solve it, it makes (a part of) the problem only smaller. --Dirk Beetstra T C 09:41, 9 July 2009 (UTC)

OK, but I'm a bit confused about how that fits with the point I made above: once it's mirrored, it will then appear in Google, and then we can find it (using the Firefox plugin to exclude Wikipedia from Google results, maybe). BUT: Having checked out Wikipedia:Mirrors and forks, I'm looking now at the most recent enwiki dump [4], and it suggests most mirrors will want the dump without userspace in it. Do we have a listing of how many mirrors (if any??) choose to use the larger dump including userspace? There is Wikipedia:Mirrors and forks/All but the issue doesn't seem to be systematically covered there (it's mostly about GFDL compliance). Rd232 talk 12:14, 9 July 2009 (UTC)
Just try it. At least a couple dozen, but no one will be able to give you an exact number. However, I'm not at all worried about mirrors. If they want to mirror our user (talk) pages, fine by me, search engines will list them with an appropriate page rank.
What I'm worried about is pages that live in user space, with the associated leeway MfD grants. Google will sometimes list them higher than the actual home page of the entity. Take "Americans for Educational Testing Reform" for example. The userfied page is the top Google result, higher than other Wikipedia pages mentioning it, and higher than the official page. It's being worked on from time to time by the user, doesn't violate any userspace guidelines, but would at the moment be inappropriate as an article (WP:N, WP:POV, WP:OR, from a glance).
There are several ways to handle this. We could just not care and say Wikipedia is big enough, it's Google's problem to rank the pages correctly. We could be much more strict with user pages. Or we could tell Google which namespaces we don't deem index-worthy.
All of those solutions have drawbacks. Google et al. adapting to Wikipedia would certainly be best, but I don't see that happening in the near future. It boils down to what we find more important: having everything indexed for easy access (and I'm sure that many of those user space pages can be useful resources, Dirk mentions one example above), or acknowledge the consequences of an inherently high page rank and prevent indexing of all pages that follow less strict guidelines and/or aren't policed as well, either because we want to show our best side (=article space) or because we acknowledge that unless people start patrolling user space, there is no way to control the bad and harmful pages there (WP:SPAM, WP:BLP).
Amalthea 16:05, 9 July 2009 (UTC)
Whoa! An old version of my talk page is mirrored on something called "chemistry daily"... [5] Um. Really, can't we try harder to discourage mirrors from taking userspace pages, unless there's an actual good reason for taking them? Rd232 talk 16:33, 9 July 2009 (UTC)
Is there an "official" statement as to what Wikipedia wants to present to the public? I am making assumptions as to what that is based on what I've read on Wikipedia policy (like WP:Blog), and what has been said publicly about Wikipedia by Wikipedia in the news. But even though I registered a couple of years ago, I only started editing regularly a few months ago. Who would decide this? Usually the site owner decides what to present to the public.. but who is that for Wikipedia? Us? The editors? Wikimedia Foundation? Jimbo Wales? Who? --stmrlbs|talk 06:16, 11 July 2009 (UTC)

Missing proposal

So we have "Proposal 3: Make User/Subpages NOINDEX, excluding them from search engines", which deals with User/Subpages, but no matching one for User talk/Subpages? - ALLSTRecho wuz here 19:54, 8 July 2009 (UTC)[reply]

I think that's worth adding to the mix. The sooner the better, so feel free. -- ǝʌlǝʍʇ ǝuo-ʎʇuǝʍʇ ssnɔsıp 20:06, 8 July 2009 (UTC)[reply]
According to MZM above, and a very brief test I did, user talk pages aren't indexed as of now presently. –xenotalk 21:56, 8 July 2009 (UTC)[reply]
Groovy. Thanks xeno. -- ǝʌlǝʍʇ ǝuo-ʎʇuǝʍʇ ssnɔsıp 22:08, 8 July 2009 (UTC)[reply]

(unindent) As of now? That makes it sound as though they have been for months previously or that there's a proposal to index them now. Bah. --MZMcBride (talk) 22:45, 8 July 2009 (UTC)[reply]

freaking grmr nazi ;p –xenotalk 22:54, 8 July 2009 (UTC)[reply]

Reasons behind User Talk page change to NOINDEX with opt-in INDEX

Does anyone know of the rationale behind the decision to not index User talk subpages? Has it always just been like that? -- ǝʌlǝʍʇ ǝuo-ʎʇuǝʍʇ ssnɔsıp 22:58, 8 July 2009 (UTC)[reply]
Yes, I would like to know that, too. I would also like to know the reasons behind the different policies of the different language wikis. I see the German (Deutsch) Wikipedia has NOINDEX for the talk pages in most namespaces. The Danish Wikipedia has all user pages NOINDEX. --stmrlbs|talk 23:36, 8 July 2009 (UTC)[reply]
Why would you want to be informed before making decisions? That's just silliness. ;-) Search Bugzilla and the proposals village pump archives for details about not indexing User_talk:. It's in there somewhere; too lazy to look myself. --MZMcBride (talk) 01:52, 10 July 2009 (UTC)[reply]
so, you don't know the reasons either, I take it. --stmrlbs|talk 03:58, 10 July 2009 (UTC)[reply]
No. See my comments to you above in the "Inherently flawed" section. The logic and reasoning skills on this page are quite simply appalling. For what it's worth, a quick Bugzilla search revealed that it was bug 13890. The relevant on-wiki discussion is linked to from the bug report. --MZMcBride (talk) 15:47, 10 July 2009 (UTC)[reply]
MZM, I see that you were the one who announced the change of the USER Talk page search indexing policy on the Administration Noticeboard. You were also a participant, not in the initial discussion that prompted the bug report, but in the discussion after where the change in User Talk Pages was discussed. Perhaps your participation in this policy change might be the reason for your intimate knowledge of this policy? Perhaps you could have just explained to the rest of the group why this decision was made, instead of again blustering about the lack of knowledge on everyone else's part because they were not privy to this 5 paragraph initial discussion which prompted changing the way the Search Engines indexed User Pages? You would have saved this group a lot of time. --stmrlbs|talk 19:52, 10 July 2009 (UTC)[reply]
The ad-hominem arguments ("so, you don't know the reasons either", "for your intimate knowledge of this policy") are definitely unhelpful. Because he made an announcement on AN about a technical change 10 months ago, you're alleging that he's purposely being unhelpful and lying about his "intimate knowledge" of our no-indexing "policy" rather than considering the more obvious answer, that he doesn't remember or write down every single discussion that he involves himself in, like most people. Mr.Z-man 20:14, 10 July 2009 (UTC)[reply]

For everyone else - why the User Talk Pages are NOINDEX by default:

  1. This discussion started the ball rolling for the User Talk Page indexing change
  2. This discussion in Village Pump followed, and was the basis for the Bugzilla Report submission
  3. A Bugzilla Report was submitted [here] and acted upon, resulting in the USER TALK PAGE default of NOINDEX
  4. MZMcBride announces the change on the Administrative Noticeboard
  5. Then, after the USER TALK PAGE default of NOINDEX was in force and had become the status quo, Ned Scott opened this discussion asking why the change was made without greater community discussion. A good question, imo. Ned requested re-enabling User Talk Page indexing. But because there was no consensus to change the new status quo of USER TALK PAGES not being indexed, the discussion was closed.

--stmrlbs|talk 19:52, 10 July 2009 (UTC)[reply]

Your link in item 3 is wrong and you've made a distinction IN ALL CAPITAL LETTERS FOR SOME REASON between User: and User_talk: pages that isn't accurate. (Specifically "Then after the USER PAGE default of NOINDEX was in force" ← you mean USER TALK PAGE. There's a similar error in item 2.) I really think it would be best if you stopped posting here. --MZMcBride (talk) 19:58, 10 July 2009 (UTC)[reply]
Thanks for pointing out that I posted USER PAGE instead of USER TALK PAGE in a couple of places in my last post. I fixed that, and I fixed the link so it correctly points to where you announced the change to the USER TALK PAGE default of NOINDEX on the Administrators' Noticeboard. I think this will answer the question as to why the policy was changed for the User Talk pages. As for your comment "I really think it would be best if you stopped posting here", excuse me? I will ignore your WP:Bait. Please remember WP:Civil and WP:Battle. --stmrlbs|talk 22:54, 10 July 2009 (UTC)[reply]

Another discussion on the User Talk pages, this time on the Wikimedia Foundation blog: [6]

Moving Forward

OK, so let's try and see where we can go from here. I'm going to summarise as I see it, hopefully to not too widespread disagreement.

  • (a) a 2/3 majority of those who took part in the RFC supported the general idea of noindexing
  • (b) the RFC was prematurely (after less than 2 weeks) turned into a straw poll, in a form which cut off continuing RFC discussion by collapsing it with a hat tag. (i) This severely limits the legitimacy of the outcome (ii) it was clearly premature in terms of the discussion, as demonstrated by the fact that it was only subsequently that we found out that one of the straw poll options (user talk page noindexing) had been implemented months ago.
  • (c) a weak majority in the present straw poll supports noindexing all userspace, without optional indexing
  • (d) the remaining opinion divides about equally between those supporting noindexing by default, with well-policed optional indexing, and those supporting indexing everything by default
  • (e) The weight of opinion seems to favour the argument that effectively implemented noindexing (either with well-policed optional indexing, or without optional indexing) will prevent spam being a problem, and that this outweighs the benefit of being able to use external search engines for userspace besides the internal Wikipedia search engine. WP:NOTWEBHOST summarises that position.
  • (f) clear evidence of the costs and benefits of different options remains outstanding. Some examples have been given, but to little effect in terms of convincing anyone.

So, to move things forward, I think we need to both collect more evidence, if possible, and develop the arguments, with a view to holding another RFC in the future (because the outcome of this one doesn't seem likely to have the required legitimacy). I propose trying my Collaborative Position Statement idea: instead of "one person, one statement", we have "one position, one statement", and permit collaborative editing of those positions. I'm going to kick this off below. Please feel free to edit that statement, and to add new ones, but try not to proliferate statements for similar positions. Note: statements are not endorsed at this point; we're just developing the arguments. Later on we can use these statements as the basis of an RFC.

Rd232 talk 18:03, 11 July 2009 (UTC)[reply]

Now moved to the main RFC page. Rd232 talk 09:18, 15 July 2009 (UTC)[reply]
In response to (b), I don't see the "statement and support" style of RFC as a "discussion"... just a way to roughly gauge where people are and where consensus might already exist. It lasted long enough to serve that purpose. The discussion has happened and will happen here on the talk page. I agree with the rest of your summary, and I'm interested to see if your method can move us toward a consensus. Gigs (talk) 18:38, 12 July 2009 (UTC)[reply]
One thing about this RFC: I've learned a lot about procedure. I looked this up after reading Rd232's comments, because I wasn't sure what the difference was. Wikipedia:POLLS. --stmrlbs|talk 21:45, 12 July 2009 (UTC)[reply]
I'm familiar with the community views on polling and building consensus. I believe the "statement + support" style RfC is very bad at building a consensus, since it does tend to discourage discussion. It's good at seeing where people stand though, and where potential consensus may exist. Gigs (talk) 22:25, 12 July 2009 (UTC)[reply]
I'm also concerned about point F, that we don't have clear evidence about what's going on. Lots of assertions but nothing to back it up. Would anyone oppose putting off any action of indexing/noindexing until we can get some data? -- Ned Scott 21:36, 12 July 2009 (UTC)[reply]
Imo, I think input from someone from the tech team of wikipedia who does the actual programming of setting up the robots.txt file and the programming changes which determine which pages are indexed and which are not would be a big help. Also, if there was a page where search policies were explained in English, that would be a big help to everyone on the project - not everyone has a programming background. Then at least the current status of search policy would be clear. Then we could go from there. --stmrlbs|talk 21:55, 12 July 2009 (UTC)[reply]
I agree that getting more technical information will help with implementation, and the communication of the eventual policy. However, the overall question before us is not a technical one, it is about what content we want to present to the world, and how we want to present it. On that question, the implementation details are irrelevant. Gigs (talk) 22:25, 12 July 2009 (UTC)[reply]
In principle, I agree. But we aren't creating the Search Policy from scratch. The current status, the status quo, is relevant because any change to the status quo is harder to implement and has to be justified (which is why there is an RFC to begin with). --stmrlbs|talk 22:58, 12 July 2009 (UTC)[reply]
The way I understand it, the current "policy" is just a de facto one, not one decided by consensus, or even by fiat. To me that says it's all up for review. Gigs (talk) 23:04, 12 July 2009 (UTC)[reply]
you still have to know what current policy is in order to review it. Part of reviewing anything is to see how well it is currently working. But, I agree, the whole question of what should be searched (or not searched) and why should be reviewed. --stmrlbs|talk 23:11, 12 July 2009 (UTC)[reply]

Some compromise on GFDL/CC-BY(-SA)

To build a consensus which fits inside both letter and principle of our founding principles, we'll need some sort of compromise on the GFDL/CC-BY-SA requirements too.

Imagine if some outside site were to use pages and/or material from wikipedia, but using (google) advanced search didn't show up any reference or credits or even the word "wikipedia". Would you say they had met their requirements under CC-BY(-SA)/GFDL?

What if they've squirreled away their GFDL/CC-BY-SA compliance data on some noindexed page which can only be found by site regulars?

I think they'd be getting a nice friendly letter from Mike Godwin to tighten up their compliance procedures. ;-)

Now, noindexing both user pages and history pages basically means that it becomes somewhat hard to find our own compliance data (because afaict little or no compliance data whatsoever remains indexed).

This means that we ourselves will be doing a barely adequate job on compliance (if even that) and that in turn doesn't exactly send the best of signals to our re-users either.

In conclusion:

  • We need *some sort of* compliance data to be indexed
  • I think we have to find either some sort of either/or compromise or some out-of-the-box solution, where even if we decide to noindex user pages, some other aspect of user contributions will be indexed.
  • Personally I have no preference for what specifically gets indexed, as long as it is clear, has no performance issues, and is easily policed.
  • If people want to waive their rights under GFDL/CC-BY-SA, they are free to do so. However, I don't intend to waive my own rights.

So how do we ensure some kind of indexed compliance data? Perhaps a special: page might work? Or... what can people come up with? —Preceding unsigned comment added by Kim Bruning (talkcontribs) 15:38:07, 12 Jul 2009 (UTC)

How are people given credit now - can you give a specific example? And how is this credit dependent on user pages being indexed? --stmrlbs|talk 20:50, 12 July 2009 (UTC)[reply]
What is "compliance data"? The licenses we use don't mention search engines. Gigs (talk) 21:21, 12 July 2009 (UTC)[reply]
IPs such as me don't even have a user page... how do I get credit then? 70.71.22.45 (talk) 01:35, 13 July 2009 (UTC)[reply]
You do get credit (via history pages), but those aren't indexed. I still have a user talk page or two around from when I was an IP, and nowadays those are no longer indexed either. --Kim Bruning (talk) 01:56, 13 July 2009 (UTC)[reply]

As for what compliance data is: it's shorthand for attribution, as required for compliance with CC-BY (paragraph 4b), CC-BY-SA (paragraph 4b) and GFDL (paragraph 2). Currently authors are attributed in page history, and also somewhat indirectly through user pages.

Now, strictly speaking, yes you are right, by simply publishing our data, we are probably in compliance... in theory.

However, by NOINDEXing *all* of it , the data is essentially off the (indexed) web. Understandably, I'm not very happy about that.

(I should probably reiterate the fact that /w/ is disallowed in our robots.txt, thus hiding page history etc from search engines)

--Kim Bruning (talk) 01:56, 13 July 2009 (UTC)[reply]
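The point about /w/ can be checked mechanically. This is a minimal sketch using Python's standard urllib.robotparser, assuming a simplified robots.txt that contains only the "Disallow: /w/" rule mentioned above (the real file is much longer); the example URLs are illustrative. It shows why page histories, served through /w/index.php, are off limits to crawlers while /wiki/ pages are not.

```python
from urllib.robotparser import RobotFileParser

# Simplified, hypothetical robots.txt: only the "Disallow: /w/" rule is taken
# from the discussion; the real Wikipedia file contains many more rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


# Illustrative URLs: articles and user pages live under /wiki/,
# page histories are served through /w/index.php.
EXAMPLES = {
    "article": "https://en.wikipedia.org/wiki/Linux",
    "user page": "https://en.wikipedia.org/wiki/User:Example",
    "page history": "https://en.wikipedia.org/w/index.php?title=Linux&action=history",
}

if __name__ == "__main__":
    for label, url in EXAMPLES.items():
        print(f"{label}: {'crawlable' if is_crawlable(url) else 'blocked'}")
```

Note that robots.txt only controls crawling. The NOINDEX under debate here is a separate, per-page directive (a robots meta tag in the page itself), which is how user pages can be kept out of search results even though /wiki/ as a whole is crawlable.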

I don't really see how this is any different from your original statement, that you view allowing indexed self-promotion in User: space as some kind of "reward" for editing wikipedia. Gigs (talk) 02:24, 13 July 2009 (UTC)[reply]

Not self promotion, per-se. I've never believed in self-promotion; probably to my detriment ;-)
As to this being my view ... well... I think I'm quoting relevant chapter and verse of the legalese underpinning our site correctly, aren't I? At least as far as the reward part goes. The GFDL does attribution as a matter of course. Lawrence Lessig recognized its importance, and actually made it explicit in 2 licenses he's promoting (CC-BY, CC-BY-SA), and of course sf-writers like Cory Doctorow really run with the concept. And yes, Cory Doctorow only *seems* crazy, to this Future Shocked century, in fact he does seem to know what he's talking about ;-). --Kim Bruning (talk) 02:33, 13 July 2009 (UTC)[reply]
Kim, the article histories are not indexed now that I can see. So, how does a person see what you have worked on? How does google indexing Wikipedia give you credit for the articles that you have worked on? --stmrlbs|talk 02:23, 13 July 2009 (UTC)[reply]
If you know your way around wikipedia, you can look at a persons contributions, but that isn't indexed.
And that's the rub: none of this stuff is indexed, at least, not anymore. Over time it's been whittled down to almost nothing; only the user pages are left, really; and admittedly, they're not so great.
I'm open to suggestions for alternatives. --Kim Bruning (talk) 02:39, 13 July 2009 (UTC)[reply]
so, really, what you want is your user page to be indexed. You want your userpage to be displayed on the internet as a Wikipedia editor. This really doesn't affect the credit given to work, because changes to user page indexing will not affect article history. Then I would think the ability to have opt-in indexing is what you want. If not, why not? --stmrlbs|talk 03:45, 13 July 2009 (UTC)[reply]

Kim, I respect what you're trying to do here, but I admit I just don't understand it. I honestly don't see how search engines enter into the attribution requirement. We have a clearly-visible button right at the top of every page that leads directly to the history of the page, which contains all of the attribution information necessary right there. I cannot even begin to fathom why someone would need to do a Google search for that information, when it's linked from the page itself. Powers T 11:05, 13 July 2009 (UTC)[reply]

Yes. The attribution information is right there, for everyone who knows where to look.
Great. So we've met our legal obligations, and now we have all this lovely attribution information. But it's just sitting there, doing nothing.
So what's the point of having it? Did the open-content gods just decide to require attribution information because they were having a good laugh, and wanted us to sweat for their amusement? ;-)
Well, I don't think so, so I posted an RFC post Wikipedia:Requests_for_comment/User_page_indexing#Statement_by_Kim_Bruning.
And I'm willing to compromise, maybe we can make *something* indexable?
--Kim Bruning (talk) 17:44, 13 July 2009 (UTC)[reply]
Kim, why don't you quote the part you are talking about here, for those of us who "don't know where to look". Because I just don't see where indexing is going to affect anything in where you've pointed to. --stmrlbs|talk 18:09, 13 July 2009 (UTC)[reply]
I can summarize: "thou shalt provide attribution", which basically boils down to having a page somewhere that says who did what when. It does not -I admit- explicitly mention indexing. --Kim Bruning (talk) 22:55, 13 July 2009 (UTC)[reply]
is it too much trouble for you to provide a link and quote the text you are talking about? It seems to be. But basically, I think this boils down to attribution is done on the history page of the article, which is not even a part of User space. So.. I don't think your point applies, as attribution for the articles will remain as is no matter where this RFC goes. --stmrlbs|talk 23:06, 13 July 2009 (UTC)[reply]
No, no, I was thinking I was saving you trouble. But you have every right to insist on links, and I am obligated to provide them where possible.
  • CC-BY legal code, section 4b; it's quite a long quote, most of which is relevant. Summarizing: "You are required to provide, reasonable to the medium: " i. author name, ii. title, iii. URI, iv. Adaptation credit. (And I think we could argue whether or not allowing indexing is reasonable to the medium, if you want.)
  • CC-BY-SA legal code, this time section 4c, summary of which is similar to CC-BY section 4b (see above)
  • GFDL section 4 (more than section 2), which includes 4B: "list at least five of the principal authors on the title page" (not really us), and 4I: "Maintain a history section listing all authors." (definitely us).
I agree that attribution happens on history pages, but see discussion with LtPowers below.
I still don't understand how having userspace indexed somehow is making the attribution information "do something". What is it doing under that case that it isn't doing now? And for history pages, I'm even more confused. Powers T 19:44, 13 July 2009 (UTC)[reply]
I sense an "impedance mismatch" here: that is to say, we seem to be talking past each other somehow. If we're both a bit patient, I think we can work it out though.
In your view; what do you believe the attribution is there for, besides us being legally required to put it there? (In other words: why do we attribute, according to you?) --Kim Bruning (talk) 22:55, 13 July 2009 (UTC)[reply]
I hadn't thought about it before you brought up the subject, so I'm perfectly willing to accept your suggestion that it is intended to provide reward in the form of recognition of efforts. That seems reasonable to me, and is most likely the reason I license my own photographs with an attribution requirement. Powers T 00:14, 14 July 2009 (UTC)[reply]
Right. So it occurred to me that attribution doesn't go very far, if it's all NOINDEXed. I think it's good to have some form of attribution that does get indexed by search engines.
(I'm actually not married to it being user pages. User pages are not really great for attribution, I know, but they happen to be just about the only things still indexed. :-/ )
And that's basically all there is. Does that manage to make a bit more sense for you now?
--Kim Bruning (talk) 01:49, 14 July 2009 (UTC)[reply]
Ah, so just to be clear: what you actually want is to type in "Kim Bruning" in Google, and get recognition of your article work by getting lots of hits? Amalthea 08:13, 14 July 2009 (UTC)[reply]
What she said ^. I guess I don't think of attribution from that perspective, Kim. I think "here's a work; who made it?" not "here's a person, what has she done?" Using a search engine for that purpose seems a bit kludgy. This is the web; it should be done with links, ideally. If I want to publicize my efforts here at Wikipedia, I can do so with links to my user page and contributions history. Powers T 12:16, 14 July 2009 (UTC)[reply]
Or Amalthea, or LtPowers, or etc. Yes.
It's an intended main effect of the license. People that gain reputation through attribution can then apply that reputation to do other cool things, either to help the wiki, or to otherwise further free/open content.
There's a lot of theory behind how and why that works, and how we can apply it to improving society in general. A lot of friends and acquaintances of mine are really into this kind of thing, AND some of those friends are actually doing stuff that shows up on world scale already. ;-)
Wikipedia is a lot of fun, but you didn't think it was *just* for fun, right? And it's part of a larger community, all working towards similar goals.
But bottom line for today, for my small part, yes it does help if (some aspect of) my contributions can be found by search engines; and I'm also pretty sure I'm not the only person with that concern. --Kim Bruning (talk) 12:32, 14 July 2009 (UTC)[reply]
Well, no, that wasn't at all the "intended main effect". GFDL was used, at the time, because it was an established license for textual content with a "viral" copyleft license. To the best of my knowledge, the reputation of the specific contributors was of no concern, and the founding principles you quoted quite explicitly focus on free licensing. I don't think there was a proper alternative to GFDL at the time, but the reputation of Wikipedia as a benefactor of attribution was certainly welcomed.
Furthermore, all the people who help here anonymously or are using a pseudonym (which I dare say are more than 50%) are quite obviously not concerned about gathering reputation and do help out here as an end in itself. Also, all we require from re-users to fulfill the attribution requirement is typically that they link back here, or at least say that they got their content from Wikipedia – nobody even tries to enforce that they list all authors.
That being said, with your reworded Item 8 I now understand what you want and can relate to it, but don't support it since I find it of no importance, possibly even detrimental to getting good search results from search engines. Amalthea 15:54, 14 July 2009 (UTC)[reply]
You are correct that Attribution is not one of the 4 freedoms. However, most current free content licenses do provide Attribution, as a means to induce people to contribute to the free content ecosystem.
Some people pick up on that promise.
Feel free to suggest a new free content license for wikipedia which leaves out the attribution requirement. Some people might even cross-license their content, as they do and did with CC-BY-SA. I would respect other people's choice to use such a license, but I do hope that you equally respect my own choice not to do so.
Note that over time a number of people have learned some lessons, and switched to using real names (google for Snowspinner), or at least have connected a real name to their pseudonym (google for Eloquence or Mindspillage).
--Kim Bruning (talk) 16:46, 14 July 2009 (UTC)[reply]
What you are looking for though is a kind of reverse attribution: Not having a piece of work and through it getting to a list of who contributed, but having a contributor and finding out where he contributed. And seriously, that attribution is "a means to induce people to contribute to the free content ecosystem" is your opinion, nothing more, as is the supposed argument that people switch to real names because they learned any lesson (I assume you mean to imply that said lesson is a desire for reputation). I understand your desire for reputation, as I said, but highly doubt your assumptions and conclusions. Amalthea 17:30, 14 July 2009 (UTC)[reply]
Hmmm, now we're getting to philosophy. Have you read anything by Alvin Toffler or Cory Doctorow? In the meantime, why do you think we have attribution? --Kim Bruning (talk) 17:32, 14 July 2009 (UTC)[reply]
I do not know why. I'm certain you can find the discussions that led to the GFDL in the old Nupedia mailing lists, or by asking one of the veterans, or Jimbo. If I had to guess, I would say that GFDL was chosen quite pragmatically because it was a successful and established copyleft license that was there, and fulfilled all other requirements. Creative Commons didn't exist at the time (or at least hadn't developed their licenses yet), so something like CC-SA was not an option.
Wikipedia's purpose is to build an encyclopaedia. The page history helps with its approach. Its main purpose is not the glorification of the contributors. Amalthea 19:52, 14 July 2009 (UTC)[reply]
IIRC among other things it was a requirement for the merger with GNUPedia. And it was accepted. Currently, we are leaving the GFDL behind us, and are switching to CC-BY-SA. But CC-BY-SA still has the attribution requirement. If you are so opposed to attribution, why didn't you speak up against the explicit BY (attribution) requirement when people proposed the license switch?
But even if you had spoken up, I think you would have found that all the alternatives also have some form of attribution requirement (For starters, creative commons does not provide a license option without attribution).
Apparently attribution is something very fundamental to free content. Have you thought about why that is before? --Kim Bruning (talk) 04:28, 29 July 2009 (UTC)[reply]
I don't recall ever speaking out against attribution. I'm saying it's unrelated to the question about search engine indexing directives.
And the recent license switch is irrelevant, too: once we had GFDL, switching existing licensed content to a license without attribution would have been impossible, I assume, and as you say, CC has deprecated all non-BY licenses anyway. Amalthea 06:12, 29 July 2009 (UTC)[reply]
Oh, excellent! I was under the impression that you were opposed to attribution.
Now my argument is admittedly somewhat weaker when we start getting into details over whether our attribution is actually still valuable if it isn't indexed by any search engine.
But I'm happy to leave things at the point where attribution is a good idea, just that personally I'd like to do more of it (both in quantity and quality). ;-) --Kim Bruning (talk) 21:22, 29 July 2009 (UTC)[reply]
@LtPowers... I'm confused again; If you say it should be done with links, you DO realize that a search engine is basically a tool to manage and organize links (and that which they link to), right? (NOINDEX and NOFOLLOW are search engine directives). And there are whole bunches of people who ask either or both your questions "who made it", and "what did this person make".
Example 1: 2001:_A_Space_Odyssey_(novel) was written by Arthur Clarke. If we then search for Arthur Clarke, we might find that he also wrote The Sentinel (short story). If you read on, you can find that the two stories are related (the sentinel from the short story evolved into TMA-1 in 2001).
Example 2: Linux is written by Linus Torvalds, as you can find out fairly quickly. Quick perusal of google finds that Torvalds also wrote something called Git (software), which is apparently a great tool for source code management, and this tool is used to maintain Linux.
This chain of work, person, work is often useful because one person will often be working on different things that have some relationship with each other. I might now decide to read The Sentinel, use and/or contribute to either Linux or Git, or I might decide to send Mr. Torvalds some virtual beer (no anchor, so you'll have to search the text).
And those are just two examples of search strategies using names, and of how having your name indexed might help you or your works in life. :-)
--Kim Bruning (talk) 16:17, 14 July 2009 (UTC)[reply]
Why in the world would you go to Google to search for Arthur Clarke, when you have a link right to his article? That's what I don't get -- if you read about "2001" and read that it was written by Arthur Clarke, the power of the World Wide Web lies in the fact that "Arthur Clarke" should be linked to a page of information about Arthur Clarke. Powers T 17:58, 14 July 2009 (UTC)[reply]
I'm giving examples from wikipedia, because that happens to be easy to quickly read and check. The web is a bit bigger than wikipedia alone, and not everyone uses wikipedia every day. ;-) --Kim Bruning (talk) 18:32, 14 July 2009 (UTC)[reply]
Two answers: One, I never mentioned Wikipedia in my response. Two, even if I did, that is what we're talking about -- attribution on Wikipedia. My point is that proper deployment of links should be far more effective at disseminating one's corpus than just having a user page available and hoping someone will stumble upon it via a Google search. Powers T 19:22, 14 July 2009 (UTC)[reply]
I suppose there's a lot of things that could be indexed that would be superior to user pages. Can you outline how you would go about things? --Kim Bruning (talk) 19:27, 14 July 2009 (UTC)[reply]
I can't say I'd change anything as far as attribution goes. I'm personally satisfied with the available methods of describing and publicizing my contributions. Powers T 01:30, 15 July 2009 (UTC)[reply]

Response to JoshuaZ's comment

Regarding mandatory warning on user pages:

Measures to limit user page exposure go against spirit of meta:founding principles

Something no one has really discussed properly so far: the Foundation principles state that we use free licensing (#4). The core requirement of the most minimal of free licenses we accept (CC-BY) is that authors' contributions are recognized. This is part of the implicit and explicit contract that goes with the territory of free/open content. Sure, we still have history pages too, but those aren't exactly handled entirely to spec either.

Some of you whipper-snappers might have much higher ideals (such as donating your time to public domain) but not all of us here are public domain people. Check the top left corner of the page: Wikipedia is the free encyclopedia, not the public domain encyclopedia. Some rights reserved, including the right to recognition.

  • Support I never thought I'd have the *least* starry-eyed opinion on wikipedia for once. But there is a trade being made. We should live up to the deal! --Kim Bruning (talk) 01:22, 11 July 2009 (UTC) I just checked... our robots.txt disallows /w/, so that includes page history. So if the proposal to disallow user page indexing goes through, there will be much less mention of Wikipedians in Google. That's rather sad on many levels. [reply]

And wait just a minute. Did someone actually go through a proper RFC, and refactor it into poll format, whilst semi-arbitrarily assigning opinions.... and no one protested this? --Kim Bruning (talk) 09:14, 11 July 2009 (UTC)[reply]

I've just been ignoring it. Should someone actually try to do something on the basis of this flawed survey, or whatever it is (in its latest incarnation), I'll protest. R. Baley (talk) 09:32, 11 July 2009 (UTC)
Giant mess all around. ^^;; <sigh> --Kim Bruning (talk) 13:20, 11 July 2009 (UTC)
Of the meta:founding principles, which one(s) do you feel "noindex" of user/user talk pages violates, specifically, and why? -- ǝʌlǝʍʇ ǝuo-ʎʇuǝʍʇ ssnɔsıp 20:30, 12 July 2009 (UTC)
Wooh! Red herring! Fire up the BBQ! User page and User talk contributions are credited in giant black nasty bold letters at the top of the page where they are created. Promotion you have to do yourself. Aw, the red herrings are gone, so soon. Ah, wait, there's another scrap... The death of this RFC has been greatly exaggerated. There. Yummy. :) Anarchangel (talk) 20:43, 12 July 2009 (UTC)
This might actually count as talk; would you folks mind if we moved the thread there? I already posted some things about this on the talk page (last section, in fact ;-) )
I believe there are some issues with the 4th, since if some of these positions are implemented, all of the ways in which we fulfill the BY requirement of CC-BY, and paragraph 2 of the GFDL (in whole or in part), will be NOINDEXed. Legally, we might just be able to get away with it, but it's definitely a tad dirty, imho. --Kim Bruning (talk) 20:55, 12 July 2009 (UTC)
No, advising search engines to ignore user and user talk pages has not the least impact on GFDL or CC-BY compliance. Amalthea 21:32, 12 July 2009 (UTC)
Taken on its own, and looked at from a 100% legal standpoint, you are obviously correct. But see the talk page for more details. --Kim Bruning (talk) 22:26, 12 July 2009 (UTC)

Kim, now that I have reopened the statement section, please work your additional statement here into that, and move the discussion to the talk page. I'll leave it for you to do since it's not entirely straightforward. You may delete my comments here when you do it. Gigs (talk) 21:36, 12 July 2009 (UTC)

OK, I'll work on that in the morning, and add the details from the talk page too. Give me a kick in the pants, er, a reminder, if I forget. ;-) Thanks for looking in! --Kim Bruning (talk) 22:26, 12 July 2009 (UTC)
Hmph, I promised to look at it today, and now that I'm looking at it myself, it seems that's going to be seriously non-trivial. <scratches head>
There are two different modes of communication above, and by the look of things neither of them is really building a consensus. So I'd just be making a big mess bigger :-/ We need to do something more constructive somehow... --Kim Bruning (talk) 00:57, 14 July 2009 (UTC)
I don't think this section giving extra weight to your position is helping build consensus either. Gigs (talk) 01:07, 14 July 2009 (UTC)
Well, I guess I broke it, so I'm going to have to figure out a way to fix it. Please don't shoot! --Kim Bruning (talk) 01:51, 14 July 2009 (UTC)
Thanks. Gigs (talk) 12:48, 14 July 2009 (UTC)

Back to statements?

It seems the subject-space page is now back to using statements like a proper RFC should. This is a Good Thing. A few points are still very concerning:

  • "Proposal 2: Make User_talk: namespace NOINDEX, excluding it from search engines." — You can't make the User_talk: namespace noindex'd. It already is. This seems like a fundamental misrepresentation that people are voting on.
  • "Proposal 6: Include a mandatory and obvious warning on all user and user_talk pages and subpages, such as a different background or a textual warning that informs the reader they are not looking at an encyclopedia article." — This also seems like a misrepresentation. Backgrounds for non-articles have been colored for years.

I'm very concerned that people are voting without knowing all (or any) of the relevant facts or background. This is one of the reasons Wikipedians don't like polls. --MZMcBride (talk) 22:55, 14 July 2009 (UTC)

At the time those were drafted, no one participating had brought up the fact that user_talk wasn't indexed. Since there was a question about the consensus when that decision was made, I don't think it's inappropriate to examine the issue again "de novo". Regarding proposal 6, the wording was supposed to suggest that the exact means of more prominent communication was yet to be determined. Light blue vs white isn't very prominent or obvious to the casual observer. Since it does seem that there is a pretty good consensus for proposal 6, we might want to start talking about what sort of message/technique should be used to make it obvious to the casual reader that they are not looking at a mainspace article.
Where do you think we should go from here? Gigs (talk) 00:23, 15 July 2009 (UTC)

My summary of consensus

  • Proposal 1: Make User: namespace NOINDEX, excluding it from search engines. 35/10, with vehement and valid objections from some: no consensus.
  • Proposal 2: Make (Keep) User_talk: namespace NOINDEX, excluding it from search engines. (status quo/revote) 33/7, with mostly mild opposition: support for the status quo of NOINDEX.
  • Proposal 3: Make User/Subpages NOINDEX, excluding them from search engines. 32/10/1: pretty much no consensus again.
  • Proposal 4: Allow users to opt-in to bypass the NOINDEX, with a template or with __INDEX__. 19/20: this one seems to be a non-starter, with significant opposition.
    • Proposal 5: If there is consensus for opt-in indexing, should there be a threshold for usage, like autoconfirmation. Moot.
  • Proposal 6: Include a mandatory and obvious warning on all user and user_talk pages and subpages, such as a different background or a textual warning that informs the reader they are not looking at an encyclopedia article. 26/5; the few opposes generally gave no rationale or were weak: strong consensus to implement.
  • Proposal 7: Open a dialog with search engine providers to tell them that User and User_talk should be ranked lower. No consensus/significant opposition.

It looks to me like our main action going forward should be on Proposal 6. We should formulate what kind of automatic warning we want to have on user pages and subpages. Gigs (talk) 17:52, 19 July 2009 (UTC)

A sound reading. Proposal 6 should be easy to implement, and to reverse if the community unexpectedly screams. --SmokeyJoe (talk) 22:15, 19 July 2009 (UTC)
I actually don't see much evidence that there is a great difference in the current levels of support and opposition between proposal 2 and proposals 1 + 3. From the discussion, it seems to me that proposal 2 was implemented without much discussion, and many people weren't even aware of it during the initial discussion. I think this is relevant because it goes to the heart of how issues like this are decided. We shouldn't be pretending that there is great, insurmountable opposition to 1 and 3 and then in the same breath ignore the fact that there is nearly the same level of opposition to something that, while the recent status quo, never went through the same level of review as 1 and 3. While I freely admit I support 1 most of all (but also 2 and 3), I'm not saying we should implement 1 or 3 right now given the status quo. Rather, we need to continue discussion until there is a really clear and undeniable consensus for 2 (and clearly no consensus for 1 and 3) or, in the absence of that, consider whether it's appropriate to continue 2. Nil Einne (talk) 10:43, 20 July 2009 (UTC)
Nil, it's also the nature of the opposition that matters. The few opposes to proposal 2 generally gave no rationale or a weak one, or expressed only weak opposition. Gigs (talk) 14:32, 21 July 2009 (UTC)

Well, that encapsulates this RFC, which itself encapsulates what is wrong with WP governance. You can't get anything done with this RFC structure, not least because discussion is either messy on talk pages or badly structured on the main page (lots of Comments and reams of Endorsements, little progress in discussion), which is fine for simple things but awful for establishing, clarifying, and debating complex issues. Then it gets boiled down to a vote (sorry, !vote), and generally the conclusion is "no consensus" (either formally or because the debate just dies) because nothing's really been achieved. After all this talk, we still don't have clarity about the arguments, about the pros and cons of different scenarios, etc. Just a load of pointless hot air. I tried pointing in a more useful direction with the Collaborative Position Statement, but perhaps by then it was too late (too much energy expended already, people had given up). Can we please, please, though, try to draw some lessons from this debacle? Rd232/Disembrangler (talk) 12:11, 20 July 2009 (UTC)

We draw lessons, and we write them down, but people keep reading in their own biases and, over time, tend to bend the policy/guidelines/essays back to the Wrong Way To Do It. This is not entirely surprising if you remember that Wikipedia is quite counter-intuitive to a lot of people.
We have an annual or biannual cycle to keep things ship-shape, and we could definitely use more help! :-)
--Kim Bruning (talk) 11:31, 21 July 2009 (UTC)
The "reading in bias" works both ways: to keep the status quo of a policy because it benefits a small group, as well as to change a policy. --stmrlbs|talk 18:46, 21 July 2009 (UTC)
Yup, and that's why "status quo" is explicitly not a good reason to keep things the way they are: consensus can change, after all. --Kim Bruning (talk) 21:29, 25 July 2009 (UTC)

Time for a new RfC?

@Gigs: I see that you were involved in setting up this RfC in 2009. Since then there has been good agreement at the incubator, and now in draftspace, that such pages should automatically be noindexed. I see many user space pages taken to MfD, and I have to wonder if there would be fewer such MfDs if user space were marked noindex. Is it time for a new RfC? Unscintillating (talk) 21:35, 12 April 2014 (UTC)

@Unscintillating: With Draft: being noindex by default, I think this proposal might get even less traction. At a minimum, we should give it some time to see how Draft winds up being used or not used, as the case may be. Gigs (talk) 21:25, 15 April 2014 (UTC)