Talk:Tikhonov regularization/Archive 1

Least Squares Solution

"For α = 0 this reduces to the least squares solution of an overdetermined problem (m > n)."

This isn't correct. $(A^T A)^{-1}$ won't exist unless A has full column rank, and m > n does not imply that A has full column rank. The sentence should read "For α = 0 this reduces to the least squares solution, provided that $(A^T A)^{-1}$ exists."
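A quick numerical illustration of the point (a minimal NumPy sketch; the matrix below is a made-up example, not taken from the article): when A is rank-deficient, $A^T A$ is singular and the α = 0 normal equations fail, while any α > 0 makes the regularized system solvable.

```python
import numpy as np

# Tall matrix (m > n) whose second column is a multiple of the first,
# so A does not have full column rank even though m > n.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

AtA = A.T @ A
print(np.linalg.matrix_rank(AtA))     # 1, so (A^T A)^{-1} does not exist

alpha = 0.1
# Tikhonov-regularized normal equations: (A^T A + alpha^2 I) x = A^T b
x_reg = np.linalg.solve(AtA + alpha**2 * np.eye(2), A.T @ b)
print(x_reg)                          # well defined despite the rank deficiency
```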

Isn't this equivalent to saying just that $A^{-1}$ exists? —Simetrical (talk • contribs) 21:37, 14 August 2007 (UTC)
No: A doesn't have to be square. If A has full column rank then $A^T A$ is invertible. --Zvika 06:28, 15 August 2007 (UTC)

Textbooks?

I take it that Tikhonov regularization is not discussed in textbooks yet? If it is, I would love a reference. 17:50, 16 October 2006 (UTC)

Yes of course. See some of the references in Inverse problem. Billlion 08:22, 15 August 2007 (UTC)

explain what the variables represent

like the "m" and "q" in the bottom section. They just appear out of nowhere. Not very helpful. It's kinda like making up words adn randomly using them without explaining what they mean. Not very keaulnor. 65.26.249.208 (talk) 22:58, 9 December 2007 (UTC)

Also "and q is the rank of A" pops out of nowhere in Wiener filter formula (there is no q in the formula!) Merilius (talk) 08:26, 17 June 2008 (UTC)

There is indeed some voodoo to this technique, and it requires some time spent on it. It is usually reached for when it is the last straw for a fitting problem. Consider that you have to fit experimental data covering a large dynamic range and you're doing the approximation on a log scale. A normal least-squares fit favours the bigger values and thus dumps the finer details. Tikhonov with a parameter lets you choose how the weight between smaller and bigger features is distributed. —Preceding unsigned comment added by 84.227.21.231 (talk) 16:40, 18 January 2009 (UTC)

Notation suggestions

It would be helpful to match notation here with the general linear regression page. Further, the reader is left to infer which symbol is a predictor, parameter, and response variable. This is also complicated by the traditional Ax=b notation used in systems of linear equations. — Preceding unsigned comment added by 24.151.151.137 (talk) 21:18, 21 December 2011 (UTC)

Generalized Tikhonov

The formula

$x_0 + (A^T P A + Q)^{-1} A^T P (b - A x_0)$

is from Tarantola, e.g. equation (1.93), page 70. There are several other equivalent formulas.
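Read literally, the formula can be evaluated as below (a minimal NumPy sketch; the function name and the reading of P and Q as data-weighting and regularization matrices are my assumptions, not a quotation from Tarantola):

```python
import numpy as np

def generalized_tikhonov(A, b, P, Q, x0):
    """Evaluate x0 + (A^T P A + Q)^{-1} A^T P (b - A x0).

    P: data-weighting matrix, Q: regularization matrix, x0: prior estimate.
    """
    AtP = A.T @ P
    return x0 + np.linalg.solve(AtP @ A + Q, AtP @ (b - A @ x0))
```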

Although the present article only treats linear inverse problems, Tikhonov regularization is widely used in nonlinear inverse problems. Under some conditions it can be shown that the regularized solution approximates the theoretical solution. See H.W. Engl, M Hanke, A Neubauer, Regularization of Inverse Problems, Springer 1996. — Preceding unsigned comment added by 187.40.219.120 (talk) 13:35, 26 May 2012 (UTC)

Description of conditioning needs more detail

The article says, "This demonstrates the effect of the Tikhonov parameter on the condition number of the regularized problem." I can't see how. I can see how it demonstrates the effect on the condition number of the matrix A; the condition number of the least squares problem is more subtle than that, as demonstrated briefly in Lecture 18 of Trefethen and Bau, and more fully in many other texts. If anyone knows the details, or at least where to look them up, could you please add them? It would be nice to state precisely what the effect is, for a start.

Actually, this seems to be an open research problem. See for instance Chu et al. 136.186.19.229 (talk) 01:05, 4 December 2012 (UTC)
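To make the question concrete, here is a small numerical experiment (my own sketch, not from the article or from Chu et al.) showing how the Tikhonov parameter changes the condition number of the regularized normal-equations matrix $A^T A + \alpha^2 I$; whether that is the right notion of conditioning for the least-squares problem itself is exactly the point raised above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
A[:, 4] = A[:, 3] + 1e-6 * rng.standard_normal(50)   # nearly dependent columns

for alpha in [0.0, 1e-3, 1e-1, 1.0]:
    M = A.T @ A + alpha**2 * np.eye(5)
    print(alpha, np.linalg.cond(M))   # condition number drops as alpha grows
```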

Some examples?

I came across Tikhonov years ago in reading, and recently started using it. Would an examples section be appropriate? When I get time, I can type up some examples from image reconstruction, using weighted Fourier operators as the T matrix. From there, it would be simple to show how any particular type or location of details within a reconstructed image can be minimized or maximized via Tikhonov.

90.194.203.28 (talk) 13:17, 18 January 2013 (UTC)

Merge suggestion

I have added a {{mergefrom}} tag, suggesting that Bayesian linear regression be merged into here, since the content of that article (introducing a quadratic penalty on the size of the regression coefficients) is exactly the same as that considered in this article. The only twist is that the quadratic penalty is interpreted as a multivariate Gaussian prior probability on the regression coefficients, and that, having put the model into a Bayesian framework, Bayesian model selection can be used to fix the size of the trade-off parameter α.

(More on this can be found eg in David J. C. MacKay's 2003 book Information Theory, Inference, and Learning Algorithms, the relevant chapters being developed from his 1992 CalTech PhD thesis; the Bayesian linear regression article also cites a number of book sources.) Jheald (talk) 17:52, 11 November 2012 (UTC)
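For reference, the correspondence being invoked is the standard one (stated here in generic notation, which need not match either article's symbols): with Gaussian noise and a zero-mean Gaussian prior, the MAP estimate minimizes a Tikhonov objective.

```latex
% Model:  b = A x + e,   e ~ N(0, \sigma^2 I),   prior:  x ~ N(0, \tau^2 I).
% Then  -\log p(x \mid b) = \tfrac{1}{2\sigma^2}\|Ax-b\|^2
%                          + \tfrac{1}{2\tau^2}\|x\|^2 + \text{const},
% so the MAP estimate is the ridge / Tikhonov solution with \alpha^2 = \sigma^2/\tau^2:
\hat{x}_{\mathrm{MAP}}
  = \arg\min_{x}\Bigl\{\|Ax-b\|^{2} + \tfrac{\sigma^{2}}{\tau^{2}}\|x\|^{2}\Bigr\}
  = \Bigl(A^{T}A + \tfrac{\sigma^{2}}{\tau^{2}} I\Bigr)^{-1} A^{T} b .
```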

Minor comment -- why merge Bayesian linear regression in? Tikhonov regularization is important enough in its own right and I would not confuse things. 165.124.167.130 (talk) 22:58, 8 January 2013 (UTC)

I also disagree, Tikhonov is not Bayesian anything and it would be confusing. The fact that Bayesians figure out ways to do the same calculation does not mean we have to rewrite everything under Bayesian lights.

In fact, I find it surprising that the first section is Bayesian Interpretation, pushing Generalized Tikhonov regularization into second place. How is it possible that stating the Bayesian translation of the method is more important than the very subject of the article, which is Tikhonov? Viraltux (talk) 16:22, 18 March 2013 (UTC)

Relation to probabilistic formulation -- Dangling reference

The reference to appears to be dangling. Csgehman (talk) 19:46, 6 January 2014 (UTC)

What is matrix A?

In the article, it is not clear what A, x, and b represent. In the case of linear regression, which of these represent the predictors, the outcomes, and the parameters? — Preceding unsigned comment added by 161.130.188.133 (talk) 17:15, 1 April 2014 (UTC)

Removal of SVD & Wiener filter information

Why were these sections removed? There does not appear to have been any discussion here, and they clearly contained useful information. If that information was redundant or should be somewhere else (though I don't see where that would be exactly) then there should be a link here, but there isn't. Here is the diff for the two edits that removed all this information. Unless someone has a good reason for removing these sections, I'd like to revert these egregious edits.

Caliprincess (talk) 20:30, 3 August 2013 (UTC)

Thank you for pointing this out, I completely agree. Reverted the edit now. Jotaf (talk) 19:24, 3 April 2014 (UTC)

Link between constrained optimization and filtering

The difference operator is high-pass, not low-pass (quick check: diff_t(exp(i*omega*t)) = i*omega*exp(i*omega*t); the higher omega, the higher the derivative). It is the Tikhonov regularization process as a whole that is low-pass (i.e. enforcing smoothness) when the finite difference operator is used. More generally, in penalized optimization, the constraints express what the result should not look like, so if one puts a high-pass as the operator, then the Tikhonov system is low-pass, and conversely. — Preceding unsigned comment added by 12.54.94.28 (talk) 17:24, 24 April 2014 (UTC)
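A quick way to see the high-pass behaviour numerically (a minimal sketch, not from the article): the magnitude response of the first-difference filter [1, -1] vanishes at zero frequency and grows toward the Nyquist frequency.

```python
import numpy as np

# Frequency response of the first-difference filter h = [1, -1]:
# H(omega) = 1 - exp(-1j*omega), so |H(omega)| = 2*|sin(omega/2)|,
# which is 0 at omega = 0 and largest at omega = pi (high-pass).
omega = np.linspace(0, np.pi, 5)
H = 1 - np.exp(-1j * omega)
print(np.abs(H))                      # increases monotonically from 0 to 2
print(2 * np.abs(np.sin(omega / 2)))  # same values, closed form
```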

"Known as ridge regression"

Isn't ridge regression the special case of L₂ regularization applied to regression analysis, rather than a synonym of general Tikhonov regularization? QVVERTYVS (hm?) 12:26, 27 January 2014 (UTC)

  This is my understanding as well. The support vector machine is also Tikhonov regularization, just with hinge loss and the L1 norm. Robbieboy74 (talk) 02:01, 4 December 2014 (UTC)

"Lowpass operators"

Isn't a difference operator a highpass operator? MaigoAkisame (talk) 04:17, 16 December 2014 (UTC)

Bayesian interpretation section - are some definitions missing?

1. "the matrix seems rather arbitrary": is some specific choice of the matrix meant? 2. "the Tikhonov-regularized solution is the most probable..."; "the solution is the minimal unbiased estimator": with what value of ? — Preceding unsigned comment added by 78.56.151.70 (talk) 06:58, 2 June 2015 (UTC)

Multicollinearity

One use of ridge regression that is not mentioned here is as a remedy for approximate multicollinearity. This doesn't lead, exactly, to an ill-posed problem; however, it becomes worth adding some bias to reduce the variance. PeterLFlomPhD (talk) 21:35, 20 July 2015 (UTC)
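A small illustration of that bias-variance point (my own sketch, not from the article): with two nearly collinear predictors, the ordinary least-squares coefficients are very unstable, while a modest ridge penalty pulls them back toward the true values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)       # nearly collinear predictor
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.standard_normal(n)    # true coefficients are (1, 1)

ols = np.linalg.solve(X.T @ X, X.T @ y)
ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(2), X.T @ y)
print(ols)    # high-variance estimates; can land far from (1, 1)
print(ridge)  # both coefficients pulled close to 1
```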

Derivation

Any hopes of a derivation similar to the one for the ordinary least squares problem found here: Linear least squares (mathematics)? It would be great to see how we go from here:

$\|Ax - b\|^2 + \|\Gamma x\|^2$

to here:

$\hat{x} = (A^T A + \Gamma^T \Gamma)^{-1} A^T b$

At least a derivation in matrix notation would show how the smoke clears to give us such a simple form. Also, I suspect that this will show what conditions are necessary on the matrix to arrive at the solution. Thanks! Hamsterlopithecus (talk) 15:51, 3 November 2015 (UTC)

Well, it is trivial. Just form a new matrix B with a block A above a block $\Gamma$ and apply standard least squares to that. Billlion (talk) 19:38, 26 May 2016 (UTC)
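Spelling that trick out (in generic notation; a sketch rather than a quotation from any particular textbook): stack A on top of $\Gamma$, pad b with zeros, and apply the ordinary normal equations to the stacked system.

```latex
% Stacked least-squares form of the Tikhonov objective:
\|Ax-b\|^{2} + \|\Gamma x\|^{2}
  = \left\| \begin{pmatrix} A \\ \Gamma \end{pmatrix} x
            - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\|^{2},
\qquad
B = \begin{pmatrix} A \\ \Gamma \end{pmatrix}.
% The ordinary normal equations  B^T B \hat{x} = B^T (b; 0)  then read
%   (A^T A + \Gamma^T \Gamma)\,\hat{x} = A^T b,
% i.e.  \hat{x} = (A^T A + \Gamma^T \Gamma)^{-1} A^T b,
% which only requires A^T A + \Gamma^T \Gamma to be invertible.
```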

Low-pass and high-pass filter analogy

The article explains that the problem Ax = b has the effect of a low-pass filter and the inverse problem that of a high-pass filter. I think this needs more explanation. It is quite confusing as it stands, without any explanation or references. -- 2001:630:12:2E31:E41C:F57D:906D:21DC (talk) 14:58, 18 May 2018 (UTC) MaxZhao

High- and low-pass filters?

"Most real-world phenomena have the effect of low-pass filters in the forward direction where {\displaystyle A} A maps {\displaystyle \mathbf {x} } \mathbf {x} to {\displaystyle \mathbf {b} } \mathbf {b} . Therefore, in solving the inverse-problem, the inverse mapping operates as a high-pass filter that has the undesirable tendency of amplifying noise"

It's not clear what this means. In mapping $\mathbf{x}$ to $\mathbf{b}$, details of the functional relationship are drowned out by noise, leaving only general trends? Also, in the inverse problem, what is the signal that's being filtered? How could any kind of high-pass filter be at work in a linear model? There's no lower-frequency signal than a straight line.

Also:

"In other cases, high-pass operators may be used to enforce smoothness"

Wouldn't it be low-pass filters that do this? — Preceding unsigned comment added by Ise kin (talkcontribs) 18:49, 10 September 2018 (UTC)

over-fitted vs. overdetermined?

The article uses the term "over-fitted" in a way that seems to be opposite to how I would apply it in Machine Learning terminology: I would expect a solution to be "over-fitted" for an "underdetermined" problem (i.e. too few constraints/too little data for too many free parameters), while the article here suggests the opposite association.

Is this just me being confused, or would equating "over-fitted" with "underdetermined" make more sense (and vice versa)?

Mgroeber (talk) 13:52, 24 March 2016 (UTC)

I agree with your opinion too. Who should update this part? Priancho (talk) 02:29, 13 September 2016 (UTC)
Great points! I made the change before seeing your comment here, with the edit summary: "Underfitting happens in overdetermined systems, not underdetermined systems. As long as sufficient data exists to cause an overdetermined system, over/underfitting is a measure of model complexity relative to the underlying phenomena, rather than relative to the amount of data." While I think it's true that overdetermined usually means underfitting, it can also be overfitting, as shown by https://cdn-images-1.medium.com/max/1600/1*9hPX9pAO3jqLrzt0IE3JzA.png (imagine a cubic model, which would still be overdetermined with respect to the number of data points there, but overfitting because the underlying phenomenon is a parabola). J6he (talk) 20:20, 6 June 2019 (UTC)

Difference from Ridge Regression

According to https://stats.stackexchange.com/questions/234280/is-tikhonov-regularization-the-same-as-ridge-regression "Tikhonov regularization is a larger set than ridge regression", although in my experience the terms are used interchangeably. Wqwt (talk) 06:05, 15 April 2020 (UTC)

Inconsistent notation /

It would be good if only one of either or was used throughout the article.

Have a look at section 6 "Determination of the Tikhonov factor" for example. It starts by defining the objective in terms of . It then goes on to use the SVD from the previous section, which was an SVD of , not .

--cfp (talk) 08:52, 5 June 2020 (UTC)

Note also that the first equation in that section is in terms of while the second is in terms of ! --cfp (talk) 09:36, 5 June 2020 (UTC)