User:Monkbot/task 17: remove replace deprecated last-author-amp params

From WikiProjectMed
Jump to navigation Jump to search

The new version of the Module:Citation/CS1 suite deprecates |last-author-amp= and |lastauthoramp= because these two parameters style all name lists (author, contributor, editor, interviewer, and translator) even when there is no author name-list. Because of this change, |name-list-format= will also be deprecated in favor of a new parameter, |name-list-style=, which will accept a variety of keywords.


description

|last-author-amp= and |lastauthoramp= accept a limited set of keywords that control the rendering of all name-lists in a cs1|2 citation template. The keywords that concern this task are:

yes, y, true

These deprecated parameters with any of these keywords as assigned values will be converted to |name-list-style=amp.

|name-list-format= accepts only the single keyword vanc. These parameters will be converted to |name-list-style=vanc.

|last-author-amp=, |lastauthoramp=, and |name-list-format= are styling parameters that, in the ideal world, should be included in every cs1|2 template where it makes sense to have them. When it does not make sense to have these parameters, task 17 will delete them:

|last-author-amp=, |lastauthoramp= will be deleted when:
  • the parameter does not have an assigned value
  • the no name-list in a cs1|2 template has more than one name
|name-list-format= will be deleted when
  • the parameter does not have an assigned value
  • the cs1|2 template does not an author name-list and does not have an editor name list
  • the cs1|2 template uses either of |vauthors= or |veditors= (with assigned value)

Templates that contain an <!--<html comment>--> are skipped.

ancillary tasks

Task 17 deletes all empty parameters from templates that are repaired. |url-status= with or without assigned value is deleted as pointless when |archive-url= and |archive-date= are empty or missing. Also deletes |deadurl=y which is pointlessly and incorrectly added by no-longer-maintained user-script WP:REFILL.

This task does not do awb general fixes.

edit summaries

Task 17 writes an edit summary message that tallies the number of replacements, the number of deletions, and an indication of the parameter's usage in the article. The message has several forms:

  • replace deprecated: |last-author-amp= (<replace count>× replaced; usage: <replace count> of <eligible count>);
  • delete deprecated: |last-author-amp= (<delete count>× deleted; usage: <replace count> of <eligible count>);
  • replace / delete deprecated: |last-author-amp= (<replace count>× replaced; <delete count>× deleted; usage: <replace count> of <eligible count>);

The edit summaries for |name-list-format= are similar except that the lead-in text reads:

  • to-be-deprecated: |name-list-format=

When task 17 makes both fixes, the |name-list-format= summary is concatenated onto the end of the |last-author-amp= summary.

Definitions:

  • <replace count> – the number of parameters that have been replaced with the new parameter name
  • <delete count> – the number of parameters that have been deleted
  • <eligible count> – the number of cs1|2 templates that are eligible to use the parameter (count before any replacements/deletions)

When task 17 deletes empty parameters, it appends 'empty parameters removed;' to the end of the edit summary.

The edit summary has a link to this page.

script

// This script
//		replaces |lastauthoramp=<[yes|y|true]> and |last-author-amp=<[yes|y|true]> with |name-list-format=amp if
//			|name-list-format=<anything> is not present
//		deletes |lastauthoramp=<[yes|y|true]> and |last-author-amp=<[yes|y|true]> when
//			cs1|2 template does not have multiple author, contributor, editor, interviewer, or translator (one of these must have |<name>2= or alias)
//
// wikitext search:
//		hastemplate:"Module:Citation/CS1" insource:/\| *last\-?author\-?amp *=/
//		hastemplate:"Module:Citation/CS1" insource:/\| *name\-list\-format *=/
//


public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
	{
	Skip = false;
//	Summary = "[[User:Monkbot/task 17: remove replace deprecated last-author-amp params|Task 17]] (developmental testing):";
//	Summary = "[[User:Monkbot/task 17: remove replace deprecated last-author-amp params|Task 17]] ([[Wikipedia:Bots/Requests_for_approval/Monkbot_17|BRFA]] trial):";
	Summary = "[[User:Monkbot/task 17: remove replace deprecated last-author-amp params|Task 17]]:";
	
	string	pattern;
	string	replace;
	int		laa_count = 0;
	int		laa_eligible = 0;
	int		laa_replaced = 0;
	int		laa_deleted = 0;
	int		laa_skipped = 0;
	
	int		nlf_replaced = 0;
	int		nlf_eligible = 0;
	int		nlf_deleted = 0;
	
	int		comment_skip = 0;
	
	bool	params_removed = false;


	string	IS_CS1 = @"(?:[Cc]ite[_\-\s]*(?=(?:AV [Mm]edia(?: notes)?)|album\-notes|[Aa][Vv] media|[Aa][Vv] media notes|article|ar[Xx]iv|audio|biorxiv|blog|book|chapter|conference|contribution|dictionary|dissertation|document|DVD|dvd|encyclopa?edia|episode|iucn|image|interview|[Jj]ournal|letter|liner notes|[Mm]agazine|mailing ?list|manual|map|media release|media|newsgroup|newspaper|(?:[Nn]ews(?!group|paper))|[Nn]ew|paper|plaque|podcast|press release|press|publication|pr|radio|report|serial|sign|speech|techreport|thesis|video|url|wb|[Ww]eb|[Ww]ork|act|[Hh]ansard|periodical)|[Cc]itation|[Cc]ite(?=\s*\|)|AIOH|[Cc]it news|[Cc]it web|[Cc]ita web|[Cc]itar notícia|[Cc]itat web|[Cc]ite DANFS|[Cc]ite [Ss]ports\-[Rr]eference|[Cc]ite tweet|[Cc]ite we|[Cc]ite vob|[Cc]w|eFloras|Gilliland|PFAF|PLANTS|SA Rugby Article|Silvics|[Ww]eb cite)";

	string	laa_pattern = @"(\|\s*)last\-?author\-?amp(\s*=\s*)\b(?:yes|y|true)\b";		// always test this first; look for valid param values
	string	laa_invalid_pattern = @"\|\s*last\-?author\-?amp\s*=[^\|\}]";				// always test after laa_pattern; finds param with any or no value
	string	nlf_pattern = @"(\|\s*)name\-list\-format(\s*=\s*vanc\s*)";

	string	laa_names_pattern = @"last2|author\-last2|author2\-last|author2|surname2|subject2|host2|contributor\-last2|contributor2\-last|contributor2|contributor\-surname2|contributor2\-surname|editor\-last2|editor#\-last|editor2|editor\-surname2|editor2\-surname|interviewer\-last2|interviewer2\-last|interviewer2|translator\-last2|translator2\-last|translator2|translator\-surname2|translator2\-surname";

	string	nlf_names_pattern = @"first\d*|author\-first\d*|author\d*\-first|given\d*|author\-given\d*|author\d*\-given|contributor\-first\d*|contributor\d*\-first|contributor\-given\d*|contributor\d*\-given|editor\-first\d*|editor\d*\-first|editor\-given\d*|editor\d*\-given|interviewer\-first\d*|interviewer\d*\-first|interviewer\-given\d*|interviewer\d*\-given|translator\-first\d*|translator\d*\-first|translator\-given\d*|translator\d*\-given";

	string	nlf_vnames_pattern = @"vauthors|veditors";

//---------------------------< B E G I N >--------------------------------------------------------------------

	ArticleText = hide (ArticleText, IS_CS1);						// hide all templates except cs1|2 and hide wikilinks


//---------------------------< L A S T - A U T H O R - A M P >------------------------------------------------
//
// count the number of cs1|2 templates
//

	pattern = @"\{\{\s*" + IS_CS1 + @"[^\}]+\}\}";
	if (Regex.Match (ArticleText, pattern).Success)
		ArticleText = Regex.Replace (ArticleText, pattern,
			delegate(Match match)
				{
				string	raw_template = match.Groups[0].Value;						// this will be returned if no changes
				string	fixed_template = raw_template;
				bool	changed = false;

				laa_count++;														// bump total number of cs1|2 templates tally
				
				Match	laa_match = Regex.Match (fixed_template, laa_pattern);		// look for |last-author-amp= param with valid assigned value
				Match	laa_names_match = Regex.Match (fixed_template, laa_names_pattern);	// look for second author, contributor, editor, interviewer, translator name; assumes that there is a matching first name
				Match	nlf_match = Regex.Match (fixed_template, nlf_pattern);		// look for |name-list-format= param with valid assigned value (vanc)
				Match	nlf_names_match = Regex.Match (fixed_template, nlf_names_pattern);	// look for first, contributor-first, editor-first, interviewer-first, translator-first; assumes that there is a matching last name
				Match	nlf_vnames_match = Regex.Match (fixed_template, nlf_vnames_pattern);	// look for |vauthors=, |veditors=
				
				if (laa_names_match.Success)
					laa_eligible++;

				if (laa_match.Success)												// template has param with valid value?
					{
					pattern = @"\<!\-\-";
					if (Regex.Match (fixed_template, pattern).Success)				// if html comment is found in this template
						{
						comment_skip++;
						return raw_template;										// abandon this template
						}

					if (laa_names_match.Success)									// if a second name, replace it
						{
						if (nlf_match.Success)										// if |name-list-format=vanc, |last-author-amp=<anything> is pointless so delete
							{
							fixed_template = Regex.Replace (fixed_template, laa_pattern, "");
							nlf_deleted++;
							changed = true;
							}

						fixed_template = Regex.Replace (fixed_template, laa_pattern, "$1name-list-style$2amp");		// with new parameter name
						laa_replaced++;
						changed = true;
						}
					else															// no second name, delete
						{
						fixed_template = Regex.Replace (fixed_template, laa_pattern, "");
						laa_deleted++;
						changed = true;
						}
					}
				
				if (Regex.Match (fixed_template, laa_invalid_pattern).Success)		// when |last-author-amp= exists with empty or invalid parameter value
					{
					fixed_template = Regex.Replace (fixed_template, laa_invalid_pattern, "");	// delete
					laa_deleted++;
					changed = true;
					}

//-----

				if (nlf_names_match.Success)
					nlf_eligible++;

				if (nlf_match.Success)												// if |name-list-format=vanc found
					{
					pattern = @"\<!\-\-";
					if (Regex.Match (fixed_template, pattern).Success)				// if html comment is found in this template
						{
						comment_skip++;
						return raw_template;										// abandon this template
						}

					if (nlf_vnames_match.Success || !nlf_names_match.Success)				// if template has |vauthors= / |veditors= or does not have first names
						{
						fixed_template = Regex.Replace (fixed_template, nlf_pattern, "");	// delete |name-list-format=vanc as unnecessary
						nlf_deleted++;
						changed = true;
						}
					else
						{
						fixed_template = Regex.Replace (fixed_template, nlf_pattern, "$1name-list-style$2");		// rename with new parameter name
						nlf_replaced++;
						changed = true;
						}
					}

				if (changed)
					{
					fixed_template = empty_param_remove (fixed_template, ref params_removed);
					return fixed_template;
					}

				return raw_template;
				});

		
//---------------------------< F I N I S H >------------------------------------------------------------------

	ArticleText = unhide (ArticleText);													// unhide all that is hidden

	string	laa_fixes = @"";
	string	nlf_fixes = @"";
	
	if (0 != laa_replaced)																// first do last-author-amp summary
		laa_fixes = laa_fixes + laa_replaced + @"× replaced; ";
	if (0 != laa_deleted)
		laa_fixes = laa_fixes + laa_deleted + @"× deleted; ";
	if (0 != laa_skipped)
		laa_fixes = laa_fixes + laa_skipped + @"× skipped; ";

	if ((0 != laa_replaced) && (0 != laa_deleted))
		Summary = Summary + " replace / delete deprecated:";
	else if (0 != laa_replaced)
		Summary = Summary + " replace deprecated:";
	else if (0 != laa_deleted)
		Summary = Summary + " delete deprecated:";

	if ((0 != laa_replaced) || (0 != laa_deleted))
		{
		Summary = Summary + " |last-author-amp= ";
		Summary = Summary + "(" + laa_fixes + " usage: " + laa_replaced + " of " + laa_eligible + ");";
		}

	if ((0 != nlf_replaced) && (0 != nlf_deleted))
		Summary = Summary + " replace / delete to-be-deprecated:";
	else if (0 != nlf_replaced)
		Summary = Summary + " replace to-be-deprecated:";
	else if (0 != nlf_deleted)
		Summary = Summary + " delete to-be-deprecated:";

	if (0 != nlf_replaced)																// now do name-list-format summary
		nlf_fixes = nlf_fixes + nlf_replaced + @"× replaced; ";
	if (0 != nlf_deleted)
		nlf_fixes = nlf_fixes + nlf_deleted + @"× deleted; ";
	
	if ((0 != nlf_replaced) || (0 != nlf_deleted))
		{
		Summary = Summary + " |name-list-format= ";
		Summary = Summary + "(" + nlf_fixes + " usage: " + nlf_replaced + " of " + nlf_eligible + ");";
		}

	if (params_removed)
		Summary = Summary + @" empty parameters removed;";
	
	if (1 == comment_skip)
		Summary = Summary + comment_skip + @"× template skipped;";
	else if (1 < comment_skip)
		Summary = Summary + comment_skip + @"× templates skipped;";

	return ArticleText;
	}


//===========================<< S U P P O R T >>==============================================================

//---------------------------< H I D E >----------------------------------------------------------------------
//
// HIDE TEMPLATES: find templates that are not <dont_hide>; replace the opening {{ with __0P3N__, the closing }}
// with __CL0S3__, and internal | (pipes) with __P1P3__
//
// single curly braces in urls and other parameter values can confuse other regex in this code so replace {
// with __0CU!21Y__ and } with __CCU!21Y__
//

private string hide (string ArticleText, string dont_hide)
	{
	string pattern = @"\{\{(?!\s*" + dont_hide + @")[^\{\}]*\}\}";
	if (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern,
			delegate(Match match)
				{
				string	fixed_template;										// a hidden template is assembled here
				string	raw_template = match.Groups[0].Value;				// the whole template

				pattern = @"\{\{";											// hide the opening {{
				fixed_template = Regex.Replace (raw_template, pattern, "__0P3N__");

				pattern = @"\}\}";											// hide the closing }}
				fixed_template = Regex.Replace (fixed_template, pattern, "__CL0S3__");

				pattern = @"\|";											// and hide the pipes
				fixed_template = Regex.Replace (fixed_template, pattern, "__P1P3__");

				return fixed_template;
				});
		}

	pattern = @"([^\{])\{([^\{])";											// single opening curly brace
	ArticleText = Regex.Replace(ArticleText, pattern, "$1__0CU!21Y__$2");

	pattern = @"([^\}])\}([^\}])";											// single closing curly brace
	ArticleText = Regex.Replace(ArticleText, pattern, "$1__CCU!21Y__$2");

	pattern = @"\[\[(?![Ff]ile|[Ii]mage)([^\|\]]+)\|([^\]]+)\]\]";			// HIDE complex wikilinks: [[article title|label]] to __WL1NK_O__article title__P1P3__label__WL1NK_C__
	ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__P1P3__$2__WL1NK_C__");	// [[File: with wikilinks inside can be confusing

	pattern = @"\[\[([^\]]+)\]\]";											// HIDE simple wikilinks: [[article title]] to __WL1NK_O__article title__WL1NK_C__
	ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__WL1NK_C__");

	return ArticleText;
	}


//---------------------------< U N H I D E >------------------------------------------------------------------
//
// UNHIDE TEMPLATES: find templates and wikilinks that are hidden; replace the 'hide' keywords with the
// appropriate wiki markup
//

private string unhide (string ArticleText)
	{
	ArticleText = Regex.Replace(ArticleText, @"__WL1NK_O__", "[[");		// UNHIDE: replace __WL1NK_O__ with [[
	ArticleText = Regex.Replace(ArticleText, @"__WL1NK_C__", "]]");		// UNHIDE: replace __WL1NK_C__ with ]]
	ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|");			// UNHIDE: replace __P1P3__ with |

	ArticleText = Regex.Replace(ArticleText, @"__0CU!21Y__", "{");		// UNHIDE: replace __0CU!21Y__ with {
	ArticleText = Regex.Replace(ArticleText, @"__CCU!21Y__", "}");		// UNHIDE: replace __CCU!21Y__ with }

	ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{");		// UNHIDE: replace __0P3N__ with {{
	ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}");		// UNHIDE: replace __CL0S3__ with }}

	return ArticleText;
	}


//---------------------------< E M P T Y _ P A R A M _ R E M O V E >------------------------------------------
//
// This function removes all empty named parameters from a template, attempting to leave what remains the same form.
//
// this is a multi-step process that attempts to handle most of the vagaries of how templates are written in
// wikitext.  In general there are three basic 'styles': horizontal – all parameters written on a single
// line of text, vertical – all parameter written singly one-to-a-line, and a mix of the two – multiple lines
// where each has one or more parameters.
//
// 1. where the parameter name & '=' are on one line and the value on a following line, put the value on the same line as the '='
// 2. for mixed, when empties are followed by new line; remove the empty but leave the newline
// 3. for any, empties are followed by pipe closing }; remove the empty but leave the | or }
// 4. the preceding steps can leave blank lines; remove the blank lines
//

private string empty_param_remove (string template, ref bool params_removed)
	{
	int init_len = template.Length;
	string pattern;
	
	pattern = @"(\|[^=]+=[ \t]*)[\r\n]+(?!\s*[\|\}])";					// parameter name & '=' on one line, value on a following line
	while (Regex.Match(template, pattern).Success)						// put them on the same line
		template = Regex.Replace(template, pattern, "$1");

	pattern = @"\|[^=]+=[ \t]*([\r\n]+)";								// empty followed by new line
	while (Regex.Match(template, pattern).Success)
		template = Regex.Replace(template, pattern, "$1");

	pattern = @"\|[^=]+=\s*([\|\}])";									// empty followed by pipe or at end of template
	while (Regex.Match(template, pattern).Success)
		template = Regex.Replace(template, pattern, "$1");
//--
	pattern = @"\|\s*archive";											// remove |url-status=<anything> when |url-archive= and |archive-date= not present
	if (!Regex.Match(template, pattern).Success)
		{
		pattern = @"\|\s*url-status\s*=\s*\b[^\|\}]*([\|\}])";
		template = Regex.Replace(template, pattern, "$1");
		}
	
	pattern = @"\|deadurl=y([\s\|\}])";									// remove reFill-added |deadurl=y
	if (!Regex.Match(template, pattern).Success)
		template = Regex.Replace(template, pattern, "$1");
//--
	if (false == params_removed)
		{
		if (2 < (init_len - template.Length))							// smallest cs1|2 param is |p= (3 chars)
			params_removed = true;
		}

	pattern = @"([\r\n]+)[ \t]*[\r\n]+";								// close up multiple new lines
	while (Regex.Match(template, pattern).Success)
		template = Regex.Replace(template, pattern, "$1");

	return template;
	}


// |last-author-amp= to |name-list-format=