Module talk:Delink

From WikiProjectMed

Optimization of nested links

Hello,

When it is used many times on a page (several hundred times), this delink operation takes quite long (several seconds), especially if the text to delink is not short. To avoid the 10 s limit for Lua execution, I've tried to optimize it for French Wikipedia. My new version seems 4 to 10 times faster, depending on string length.

Here are my modifications. I think the main effect comes from the way nested brackets are handled.

Please feel free to evaluate, criticize and/or use it.

Zebulon84 (talk) 09:07, 17 April 2014 (UTC)

That's a neat trick! It's certainly better than the character-by-character concatenation mess that I created. We should be able to increase the performance further by using the Lua string library rather than the mw.ustring library where possible. Using mw.ustring has the drawback of having to cross back and forth between Lua and PHP all the time, which reduces the performance by quite a bit. Also, your version has some regressions in dealing with interwiki links, but that should be fixed easily enough. I'll have a look and see if I can improve it. Best — Mr. Stradivarius ♪ talk ♪ 14:51, 17 April 2014 (UTC)
I've noticed that calling a simple function from mw.ustring is also about 10 times slower than calling the same function from string, but I do not know what can be replaced without risk, and to be able to measure this difference I had to run the same function 1 million times, so I'm just keeping mw.ustring for now.
I'd be glad to know about the regressions. I've tried to keep the same results, except where I noticed that MediaWiki did not give the same result as the delink function. My goal was to have the same text as seen on screen (even if there are brackets).
Zebulon84 (talk) 19:56, 17 April 2014 (UTC)
You can see the regressions in the test cases (Module talk:Delink/testcases). Not all of those tests pass with the main module, though, so beware. As for when to use the string library instead of the mw.ustring library, I got this very helpful reply from Anomie in February which made things much clearer for me. Basically, if we are only looking for ASCII text, then we should be fine to use the Lua string library functions. Also, another thing which can make things faster is to anchor your patterns where possible. For example, inside the delinkWikilink function you were doing a gsub of the pattern '%[%[.-%]%]'. In this case we know that the end two brackets are at the end of the string, but Lua doesn't know this, so it checks every possible location of both the starting brackets and the ending brackets to see if there is a match. If we anchor the string like '%[%[.-%]%]$', then Lua only has to check every possible location of the starting brackets, which is a lot quicker. I'm also wondering if we could make things more efficient by splitting the wikilink up into a table of different parts before processing each part, but that plan is still only in its early stages. I'll report back here when I have some results. — Mr. Stradivarius ♪ talk ♪ 12:11, 18 April 2014 (UTC)
Thanks for all these details.
I've applied your "use string library functions" modifications. I prefer the single quotes too, so I've taken this part too.
I've analyzed the unit tests to improve the results. I eventually understood how the wiki decodes links, and have all this correct:
Just one question about your sanitizing: local functions are quicker than functions in a table. So why do you declare all the functions as part of the returned p table? Their names starting with an underscore show that you don't expect them to be used outside this module anyway.
Zebulon84 (talk) 16:51, 25 April 2014 (UTC)
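
For readers following the anchoring tip in this thread, here is a minimal sketch in plain Lua. It is not the module's code: stripBrackets is a hypothetical helper, and it assumes the input is already known to be exactly one wikilink. Because the brackets are ASCII, the standard string library is enough and mw.ustring is not needed.

```lua
-- Hypothetical helper (not part of Module:Delink): strip the outer
-- brackets from a string assumed to be exactly one wikilink.
local function stripBrackets(link)
    -- '^' and '$' pin the match to both ends of the string, so Lua
    -- tries a single starting position instead of scanning for one.
    -- The extra parentheses drop gsub's second return value (the count).
    return (link:gsub('^%[%[(.-)%]%]$', '%1'))
end

local inner = stripBrackets('[[Foo|bar]]')
local untouched = stripBrackets('plain text')
```

Text that is not wrapped in double brackets fails the anchored match immediately and is returned unchanged.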

delinkURL sometimes fails with "Tried to write global s_decoded" if used from a module with Module:No_globals

Hi. My apologies for using the wrong "edit template-protected". I know this is a Module, and that it is unprotected, but I think it's being used on many pages and I wasn't sure if I should make the edit myself.

My proposal is to add local to the s_decoded variable declaration. Specifically:

  • old:
 s_decoded = mw.text.decode(s, true)
  • new:
 local s_decoded = mw.text.decode(s, true)

Without the change, the call may fail from a module using require('Module:No globals').

An example of such a module is Module:HS listed building

An example of a failed invocation is as follows:

 {{#invoke:Gnosygnu|delink_test|[http://a.org b]}}

No results will be returned. Instead, the following error will be generated:

 Script error<!--Lua error: Tried to write global s_decoded.-->

Let me know if you need any other info. Thanks. gnosygnu 23:51, 19 July 2014 (UTC)

I updated Module:Delink/sandbox with the latest code and added "local" there. You can use the following to test the new result:

 {{#invoke:Gnosygnu|delink_sandbox|[http://a.org b]}}

Note that it returns "b" now, instead of "Script error". gnosygnu 23:58, 19 July 2014 (UTC)

Done Jackmcbarn (talk) 02:21, 20 July 2014 (UTC)
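
The failure reported in this section can be reproduced in standalone Lua with a rough sketch. Several things here are assumptions made for illustration: decode is a stand-in for mw.text.decode (which exists only inside Scribunto), leaky and fixed are hypothetical names, and the metatable guard is a simplified imitation of a strict-globals check, not the actual source of Module:No globals.

```lua
-- Stand-in for mw.text.decode, which is only available in Scribunto.
local function decode(s) return s end

local function leaky(s)
    s_decoded = decode(s)        -- no 'local': this writes a global
    return s_decoded
end

local function fixed(s)
    local s_decoded = decode(s)  -- scoped to this function only
    return s_decoded
end

-- Simplified guard (assumption): reject any write to a new global.
local previous = getmetatable(_G)
setmetatable(_G, {
    __newindex = function(_, name)
        error('Tried to write global ' .. tostring(name), 2)
    end,
})

local okLeaky, errLeaky = pcall(leaky, 'b')
local okFixed, resultFixed = pcall(fixed, 'b')

setmetatable(_G, previous)       -- restore the original behaviour
```

Under the guard, the call to leaky fails with the "Tried to write global" message, while fixed returns its result normally.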

Help for writing code

Hello everyone, I need help on delinking wikilinks. The current template {{delink}} works like this:

  • {{delink|[[article]]}} returns article
  • {{delink|[[article|display name]]}} returns display name

Can someone with experience provide the code to make a template, let's call it X, so that

  • {{X|[[article]]}} returns article
  • {{X|[[article|display name]]}} returns article

That is, how can I get the target of the wikilink instead of the label? I guess we can use Module:String, 'cause I see many string-manipulating templates are based on it but I have no knowledge about Lua, so... Thank you in advance. Tran Xuan Hoa (talk) 23:24, 10 September 2016 (UTC)

I checked around on IRC to see if I could find someone who might be able to help you and it was suggested that I direct you to Mr. Stradivarius. I'm going to mark this as needing a specific user to answer. Cheers! --Cameron11598 (Talk) 05:26, 11 September 2016 (UTC)
@Tran Xuan Hoa: You can use:
  • wikicode: {{#invoke:String|replace|{{{1|}}}|%[%[ *([^%[%]{{!}}]+)[^%[%]]*%]%]|%1|plain=false}}
  • lua: article = article:gsub( '%[%[ *([^%[%]|]+)[^%[%]]*%]%]', '%1' )
--Zebulon84 (talk) 11:46, 24 September 2016 (UTC)

@Zebulon84: It worked. Actually I'm writing a template on my wiki. I myself was able to make up the code to achieve the same result but yours is more efficient. I will apply yours now. Thank you so much! Tran Xuan Hoa (talk) 13:39, 24 September 2016 (UTC)
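
As a small illustration, Zebulon84's Lua one-liner from this thread can be wrapped in a function and tried on the two cases from the question. The function name linkTarget is hypothetical; the pattern is the one posted above.

```lua
-- Hypothetical wrapper around the pattern posted in this thread:
-- replace every [[target]] or [[target|label]] with its target.
local function linkTarget(text)
    -- The capture stops at '|' or ']' so only the target is kept;
    -- the rest of the link (the piped label, if any) is discarded.
    -- Parentheses drop gsub's second return value (the match count).
    return (text:gsub('%[%[ *([^%[%]|]+)[^%[%]]*%]%]', '%1'))
end

local plain = linkTarget('[[article]]')
local piped = linkTarget('[[article|display name]]')
```

Note that in the posted one-liner the assignment keeps only the first return value of gsub, so the result is the same as with the parenthesised form used here.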

Where is this line break coming from?

I am trying to delink text that starts with a pound sign (#), and I am getting unexpected results.

Foo bar {{delink|#SomethingNew}}

Biz Baz {{delink|Foo bar #SomethingNew}}

I am expecting:

Foo bar #SomethingNew

Biz Baz Foo bar #SomethingNew

Actual results:

Foo bar

  1. SomethingNew

Biz Baz Foo bar #SomethingNew

In the first example, there is a line break before the pound sign. Where is the extra line break coming from in the first example?

And yes, I know that the text contains no wikilinks; I am trying to strip wikilinks from all text in a template parameter (see {{YouTube/sandbox}} and Template:YouTube/testcases#Playlist) and need to ensure that delinking does not affect unlinked text that was working fine before my changes. – Jonesey95 (talk) 06:39, 5 January 2020 (UTC)

The module returns the correct text without a newline. However, something outside our control inserts a newline when a module returns text beginning with certain characters, and one of the characters is #. See Template talk:Weather box#Spacing. Johnuniq (talk) 09:05, 5 January 2020 (UTC)
Additional unexpected lines are often related to parser bug T18700. A <nowiki/> before a template call can help for templates returning an HTML table.   Jts1882 | talk  09:28, 5 January 2020 (UTC)
This is expected behaviour and has nothing to do with tables. If the first non-whitespace character of a parameter is one of those used to generate list markup (: ; * #), then a list will be started. See H:T#Problems and workarounds. --Redrose64 🌹 (talk) 10:02, 5 January 2020 (UTC)
T14974 is the bug here. Anomie 14:31, 5 January 2020 (UTC)
Thanks, all. That's a strange one. I have added <nowiki/>, which seems to have done the trick. – Jonesey95 (talk) 15:43, 5 January 2020 (UTC)

Handling HTML line breaks

Hello!

On the sandbox I've made a small change which means that HTML line breaks (<br>, <br/>, <br />, etc.) are replaced by newline characters and thus treated in the same way as normal newlines.

I've added new test cases, and it doesn't seem to have broken any existing tests.

Thanks - odg (talk) 00:25, 18 August 2020 (UTC)

 Not done for now: It seems to remove newlines completely:
{{delink/sandbox|[http://www.example.com HTML line breaks] between<br>two [http://www.example.com links]}}
HTML line breaks between
two links
{{delink|[http://www.example.com HTML line breaks] between<br>two [http://www.example.com links]}}
HTML line breaks between
two links
Please try again. – Jonesey95 (talk) 02:52, 18 August 2020 (UTC)
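
For reference, the substitution step proposed in this section could look roughly like the sketch below. This is an assumption about the approach, not the sandbox's actual code; brToNewline is a hypothetical name, and the sketch does not address what later processing does with the resulting newlines.

```lua
-- Hypothetical helper: turn HTML line breaks into newline characters.
local function brToNewline(text)
    -- Matches <br>, <br/>, <br /> and upper-case variants; Lua patterns
    -- have no case-insensitive flag, hence the character classes.
    return (text:gsub('<[bB][rR]%s*/?>', '\n'))
end

local converted = brToNewline('between<br>two<br />links<BR/>here')
```

Only ASCII characters are matched, so the standard string library is sufficient here.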

A link with a question mark does not get delinked

A link with a question mark does not get delinked.

  • See simple example: {{Delink|[[Name?]]}} -> Name?
  • Real use case: {{Delink|[[What If...? (TV series)|What If...?]]}} -> What If...? Gonnym (talk) 10:10, 29 September 2021 (UTC)
@Gonnym I see that the issue was solved by adding a second condition at line 84. However, question marks are getting matched at that line only because the pattern includes an invisible control character (U+007F). I assume it was added by mistake and that it can be removed along with the second condition. Sakretsu (talk) 14:56, 31 March 2024 (UTC)

Performance enhancement?

I have made a few changes to the /sandbox version

  1. delinkLinkClass now searches forward for the next '[' rather than one char at a time
  2. a check in _delink is made for the existence of '[' as Module:Delink is called often without any links to delink (eg 2018–19_UEFA_Europa_League_qualifying_phase_and_play-off_round_(Main_Path))
  3. in function getDelinkedLabel a check is made for the 'colon trick' - it will be the third byte or not at all

I believe this to be a helpful improvement. Desb42 (talk) 07:01, 30 April 2022 (UTC)
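
The three ideas listed in this section can be sketched in plain Lua as follows. None of this is the sandbox code: the function names are hypothetical, and hasColonTrick assumes its argument starts with '[['.

```lua
-- 1. Jump from one '[' to the next with a plain-text find instead of
--    inspecting every character. The fourth argument (true) disables
--    pattern matching, so '[' is treated literally.
local function bracketPositions(text)
    local positions = {}
    local pos = text:find('[', 1, true)
    while pos do
        positions[#positions + 1] = pos
        pos = text:find('[', pos + 1, true)
    end
    return positions
end

-- 2. Fast path: text without any '[' cannot contain a link, so it is
--    returned untouched and delinkFn (a caller-supplied function) is
--    never invoked.
local function delinkIfNeeded(text, delinkFn)
    if not text:find('[', 1, true) then
        return text
    end
    return delinkFn(text)
end

-- 3. Colon trick: in '[[:Category:Foo]]' the colon is the third byte.
--    58 is the byte value of ':'.
local function hasColonTrick(link)
    return link:byte(3) == 58
end

local found = bracketPositions('a [[b]] c [d]')
```

The fast path matters most on pages that call the module many times with text that contains no links at all.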