User:Sun Creator/Avoid domains and URLs

From WikiProjectMed

Generic code

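Append the following to any RegEx rule so that it does not match inside correctly formed domains and URLs. The negative lookahead rejects a match that is followed, within the same run of non-whitespace characters, by a dot and a word character; the negative lookbehind rejects a match that has a dot earlier in the same run.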

(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})
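
As a minimal sketch of how the guard behaves, the TypeScript below appends it to a sample rule and lists where the rule still matches. The helper names `withUrlGuard` and `matchIndices` and the sample sentence are assumptions made for illustration; they are not part of the original page. It also assumes a regex engine that accepts quantifiers inside lookbehind (for example a recent V8).

```typescript
// Guard copied verbatim from the page, kept as a raw string so the
// backslashes do not need doubling.
const URL_GUARD = String.raw`(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})`;

// Hypothetical helper: append the guard to an existing rule.
function withUrlGuard(rule: string, flags: string = "g"): RegExp {
  return new RegExp(rule + URL_GUARD, flags);
}

// Hypothetical helper: collect the start index of every match.
function matchIndices(re: RegExp, text: string): number[] {
  const out: number[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push(m.index);
    if (m[0] === "") re.lastIndex++; // step past zero-width matches
  }
  return out;
}

const sample = "The state of www.state.gov and example.com/state is a state.";

// Unguarded rule: matches all four occurrences, including those in the URLs.
console.log(matchIndices(new RegExp(String.raw`\bstate\b`, "g"), sample));
// → [4, 17, 43, 54]

// Guarded rule: only the two occurrences in running prose remain.
console.log(matchIndices(withUrlGuard(String.raw`\bstate\b`), sample));
// → [4, 54]
```

The final "state." is still matched because the trailing full stop is not followed by a word character, so the lookahead does not treat it as part of a domain.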

Tests

On '\b', which matches at the start and end of every word, for a very high number of occurrences
  • \b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 491ms
  • \b(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 493ms //Somewhat unexpected given the results for 'a' words below.
  • \b(?<!\.[^\s\.]*) 408ms
  • \b(?![^\s\.]*\.\w) 385ms
  • \b 319ms
Words beginning with 'a'
  • \ba(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 24ms
  • \ba(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 22ms
  • \ba(?![^\s\.]*\.\w) 20ms
  • \ba(?<!\.[^\s\.]*) 18ms
  • \ba 17ms
Realistic test: 'state' occurs six times in the article used for the test.
  • \bstate\b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 2ms
  • \bstate\b(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 2ms
  • \bstate\b(?<!\.[^\s\.]*) 2ms
  • \bstate\b(?![^\s\.]*\.\w) 2ms
  • \bstate\b 2ms
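
A comparison of this kind could be reproduced with a small timing harness along the following lines. This is only a sketch under stated assumptions: the helper `timeRule`, the stand-in article text, and the run count are hypothetical, and the original test article and tool are not given on the page. The patterns are copied from the lists above, including the unbounded `*` lookbehind used there. Timings from a different regex engine will not be comparable to the figures listed.

```typescript
// The two guard halves as tested above, kept as raw strings.
const AHEAD = String.raw`(?![^\s\.]*\.\w)`;
const BEHIND = String.raw`(?<!\.[^\s\.]*)`;

// Hypothetical helper: count matches of `pattern` in `text` and time `runs` full passes.
function timeRule(
  pattern: string,
  text: string,
  runs: number = 10
): { count: number; ms: number } {
  const re = new RegExp(pattern, "g");
  let count = 0;
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    count = 0;
    re.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      count++;
      if (m[0] === "") re.lastIndex++; // step past zero-width matches such as \b
    }
  }
  return { count, ms: Date.now() - start };
}

// Stand-in article text (an assumption; substitute a real article to test).
const article =
  "The state of www.state.gov and example.com/state is a state. ".repeat(2000);

// Same base rules and guard orderings as in the lists above.
for (const base of [String.raw`\b`, String.raw`\ba`, String.raw`\bstate\b`]) {
  for (const guard of [AHEAD + BEHIND, BEHIND + AHEAD, BEHIND, AHEAD, ""]) {
    const { count, ms } = timeRule(base + guard, article);
    console.log(`${base}${guard}  ${count} matches  ${ms}ms`);
  }
}
```

Printing the match count next to the time makes it easy to check that both orderings of the combined guard select the same matches before comparing their speed.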