Talk:Linear discriminant analysis

Psychology Low‑importance

	This article is within the scope of WikiProject Psychology, a collaborative effort to improve the coverage of Psychology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

Robotics Mid‑importance

	This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mid	This article has been rated as Mid-importance on the project's importance scale.
	This article has been marked as needing immediate attention.

Statistics Low‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the importance scale.

Special case in "LDA for two classes" is inconsistent. Also, is the definition of c necessary?

In the section "LDA for two classes", the "additional simplifying homoscedasticity assumption" is made so that $\Sigma _{0}=\Sigma _{1}=\Sigma$ . However, $\Sigma _{0}$ and $\Sigma _{1}$ appear in the equation for c. This should simplify if they are replaced with $\Sigma$ . Moreover, is the explicit expression for c even necessary? If T can be any number, then so can c. Hence, it would seem that all the constants added to T are unnecessary. That is, it would be equivalent if we just wrote ${\vec {w}}\cdot {\vec {x}}>T$ in place of ${\vec {w}}\cdot {\vec {x}}>c$ where $c=T+...$ — Preceding unsigned comment added by Llmmnnoopp (talk • contribs) 15:43, 6 February 2018 (UTC)
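Under the homoscedasticity assumption the simplification can be written out explicitly. A sketch of the algebra, assuming the quadratic criterion assigns ${\vec {x}}$ to class 1 when its left-hand side exceeds T; the quadratic terms in ${\vec {x}}$ and the log-determinants cancel once both covariances equal $\Sigma$:

```latex
\begin{aligned}
&({\vec x}-{\vec \mu}_0)^T\Sigma^{-1}({\vec x}-{\vec \mu}_0)+\ln|\Sigma|
 -({\vec x}-{\vec \mu}_1)^T\Sigma^{-1}({\vec x}-{\vec \mu}_1)-\ln|\Sigma| \\
&\qquad = 2({\vec \mu}_1-{\vec \mu}_0)^T\Sigma^{-1}{\vec x}
 +{\vec \mu}_0^T\Sigma^{-1}{\vec \mu}_0-{\vec \mu}_1^T\Sigma^{-1}{\vec \mu}_1 \;>\; T \\
&\iff {\vec w}\cdot{\vec x}>c, \qquad {\vec w}=\Sigma^{-1}({\vec \mu}_1-{\vec \mu}_0), \\
&\qquad c=\tfrac{1}{2}\left(T-{\vec \mu}_0^T\Sigma^{-1}{\vec \mu}_0+{\vec \mu}_1^T\Sigma^{-1}{\vec \mu}_1\right)
 =\tfrac{1}{2}T+{\vec w}\cdot\tfrac{1}{2}({\vec \mu}_0+{\vec \mu}_1).
\end{aligned}
```

The last equality uses the symmetry of $\Sigma^{-1}$. The explicit form of c only matters once T is given a specific meaning; with T = 0 the threshold is the projection of the midpoint between the class means.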

Is T usually 0? It disappears when the LDA is introduced. Unhandyandy (talk) 18:44, 10 March 2023 (UTC)
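One possible answer, assuming Gaussian class densities: the quadratic expression in the two-class section equals twice the log-likelihood ratio $\ln p({\vec {x}}|y=1)-\ln p({\vec {x}}|y=0)$ (the $(2\pi )^{d/2}$ factors cancel). Comparing it with $T=2\ln(\pi_0/\pi_1)$, where $\pi_0,\pi_1$ are the class priors, gives the Bayes rule, so T = 0 corresponds to equal priors. The sketch below checks this numerically for the shared-covariance case; `gaussian_logpdf`, `bayes_threshold` and `lda_rule` are hypothetical helper names, and the means and covariance are arbitrary illustrative values.

```python
import numpy as np

def gaussian_logpdf(x, mu, cov):
    """Log density of the multivariate normal N(mu, cov) at x."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)   # (x-mu)^T cov^{-1} (x-mu)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

def bayes_threshold(prior0, prior1):
    """T for 'expression > T' when the expression is 2 * log-likelihood ratio."""
    return 2.0 * np.log(prior0 / prior1)

def lda_rule(mu0, mu1, cov, prior0=0.5, prior1=0.5):
    """Shared-covariance rule: predict class 1 iff w @ x > c."""
    w = np.linalg.solve(cov, mu1 - mu0)
    T = bayes_threshold(prior0, prior1)
    c = 0.5 * T + w @ (0.5 * (mu0 + mu1))
    return w, c

mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
x = np.array([1.5, -0.2])

w, c = lda_rule(mu0, mu1, cov)                 # equal priors, so T = 0
llr = gaussian_logpdf(x, mu1, cov) - gaussian_logpdf(x, mu0, cov)
# With T = 0, w @ x - c equals the log-likelihood ratio itself.
```

With unequal priors, `w @ x - c` becomes the log posterior odds, so the rule still picks the more probable class.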

Notation

The arrow notation for vectors in this article looks really ugly. No statistics text would use that. In this article, it would be easy enough just to have a box containing definitions for each symbol, e.g. x, w, $\mu$ = vectors ...

YPawitan 14:48, 10 December 2015 (UTC)

References

There is a problem with the references. The inline citation of Martinez gives a different year from the one in the general reference list. This should be corrected.

Machine Learning vs Stats

Changed 'machine learning' to 'statistics': FLD was invented and used by statisticians a long time before all that ML nonsense!

---

That's true, but the wording said that it's currently used (rather than was developed in) the area called machine learning, so it was not an incorrect statement (not that I'm particularly bothered by the change, but a reader looking for related techniques would be better served by being referred to machine learning than to statistics in general).

BTW: I notice two references by H. Abdi have been added by user 129.110.8.39. Looking at this user's other edits, it seems as though many other statistics-based articles have been edited to refer to these references, which leads me to believe that the author is trying to publicise his or her books. Is there a Wikipedia policy on this situation? My gut reaction would be to remove all of the references he added.

--Tcooke 02:30, 13 October 2006 (UTC)

Linear?

A few questions I had while learning about this technique that could be addressed here:

What is the significance of the word discriminant in this technique?
What about this technique is linear?

The problem to be solved is the discrimination between two classes of objects/events based on a number of measurements. The discriminant is a single variable which tries to capture all of the discriminating ability of these measurements. In this case, the discriminant function is a linear combination of the measurements.

--Tcooke 12:49, 22 July 2005 (UTC)

Implementation Details

I recently implemented Fisher's linear discriminant and found that internet resources (including Wikipedia) were lacking in two respects:

finding the threshold value $c$
finding the sign of ${\vec {w}}$

Most of the examples that I saw assumed that the data were centered about the origin and therefore required a zero threshold.

My solution for finding $c$ was to naively search for the best value for my training set. I'm sure that this approach does not give the best generalization - I would guess calculating the maximal margin would be better.

With regard to the sign:

$S={\frac {\sigma _{between}^{2}}{\sigma _{within}^{2}}}={\frac {({\vec {w}}\cdot {\vec {\mu }}_{y=1}-{\vec {w}}\cdot {\vec {\mu }}_{y=0})^{2}}{{\vec {w}}^{T}\Sigma _{y=1}{\vec {w}}+{\vec {w}}^{T}\Sigma _{y=0}{\vec {w}}}}={\frac {({\vec {w}}\cdot ({\vec {\mu }}_{y=1}-{\vec {\mu }}_{y=0}))^{2}}{{\vec {w}}^{T}(\Sigma _{y=0}+\Sigma _{y=1}){\vec {w}}}}$

does not contain any information about the direction of the separator, since S is unchanged when ${\vec {w}}$ is replaced by $-{\vec {w}}$. What is the best way to find the direction when using this formulation?

Are implementation details for algorithms relevant to Wikipedia articles? If so, I'm sure a short note on the page would add to its usefulness.

128.139.226.34 06:58, 7 June 2007 (UTC)
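Both implementation points can be sketched in a few lines. Because S is invariant under ${\vec {w}}\to -{\vec {w}}$, the sign has to be fixed separately; one simple convention (an assumption, not taken from the article) is to orient ${\vec {w}}$ so that class 1 projects higher on average, which the closed form ${\vec {w}}\propto (\Sigma _{y=0}+\Sigma _{y=1})^{-1}({\vec {\mu }}_{y=1}-{\vec {\mu }}_{y=0})$ already does. The threshold search mirrors the naive training-set search described above. All function names are hypothetical and the data are synthetic.

```python
import numpy as np

def fisher_direction(X0, X1):
    """w proportional to (S0 + S1)^{-1} (mu1 - mu0).

    For positive definite S0 + S1 this gives w @ (mu1 - mu0) > 0,
    so class 1 projects to larger values on average.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return np.linalg.solve(S, mu1 - mu0)

def fisher_ratio(w, X0, X1):
    """S(w) = (w.(mu1 - mu0))^2 / (w^T (S0 + S1) w); note S(w) == S(-w)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return (w @ (mu1 - mu0)) ** 2 / (w @ S @ w)

def orient(w, X0, X1):
    """Restore the sign that S cannot see: class 1 should project higher."""
    if w @ (X1.mean(axis=0) - X0.mean(axis=0)) < 0:
        return -w
    return w

def best_threshold(w, X0, X1):
    """Naive search over midpoints between sorted projections,
    minimizing training errors of the rule 'class 1 iff w @ x > c'."""
    p0, p1 = X0 @ w, X1 @ w
    values = np.sort(np.concatenate([p0, p1]))
    candidates = (values[:-1] + values[1:]) / 2
    errors = [np.sum(p0 > c) + np.sum(p1 <= c) for c in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 2))
X1 = rng.normal([3.0, 3.0], 1.0, size=(100, 2))

# Deliberately start from the "wrong" sign to show that orient() repairs it.
w = orient(-fisher_direction(X0, X1), X0, X1)
c = best_threshold(w, X0, X1)
accuracy = (np.sum(X0 @ w <= c) + np.sum(X1 @ w > c)) / 200
```

As the original comment suggests, a threshold tuned only on training error may not generalize best; this sketch makes no claim about that.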

LDA for two classes

This is very well written. However, a little more definition of $\Sigma$ and $\Sigma ^{-1}$ might be nice. I realize they are mentioned as the "class covariances" but a formula or a ref would be great.

Also, the problem is stated as "find a good predictor for the class y .. given only an observation x." However, then the result is an enormous formula (QDA) or the simpler LDA. It would be nice to state the decision criterion in the original problem terms.

That is, the next-to-last sentence would be (I think!) something like: a sample x is from class 1 if p(x|y=1) or w * x < c. Maybe I'm wrong, but using the language of the original problem would be good.

dfrankow (talk) 21:49, 27 February 2008 (UTC)
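For reference, the usual sample estimate of a class covariance is $\Sigma _{y}={\frac {1}{n_{y}-1}}\sum _{i}({\vec {x}}_{i}-{\vec {\mu }}_{y})({\vec {x}}_{i}-{\vec {\mu }}_{y})^{T}$ over the $n_{y}$ observations in class y, and under $\Sigma _{0}=\Sigma _{1}=\Sigma$ the shared $\Sigma$ is commonly estimated by pooling. A sketch, assuming the unbiased $1/(n-1)$ convention (maximum-likelihood texts use $1/n$); the helper names are hypothetical and the data synthetic.

```python
import numpy as np

def class_covariance(X):
    """Sample covariance of one class (rows of X are observations):
    1/(n-1) * sum_i (x_i - mu)(x_i - mu)^T."""
    D = X - X.mean(axis=0)
    return D.T @ D / (len(X) - 1)

def pooled_covariance(X0, X1):
    """Shared covariance under Sigma_0 = Sigma_1: class covariances
    weighted by their degrees of freedom."""
    n0, n1 = len(X0), len(X1)
    return ((n0 - 1) * class_covariance(X0)
            + (n1 - 1) * class_covariance(X1)) / (n0 + n1 - 2)

rng = np.random.default_rng(42)
X0 = rng.normal(size=(30, 3))
X1 = rng.normal(size=(50, 3)) + 1.0
Sigma = pooled_covariance(X0, X1)

# Sigma^{-1} v is usually computed with a linear solve rather than an explicit inverse.
v = X1.mean(axis=0) - X0.mean(axis=0)
w = np.linalg.solve(Sigma, v)
```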

Fisher Linear Discriminant VS LDA

My understanding is that Fisher's linear discriminant is the ONE-dimensional space onto which it is best to project the data if you are trying to separate it into classes. LDA is a quite different idea in that you are actually trying to find a hyperplane which divides the data. Can someone confirm or deny this? If it is correct, then I think Fisher's LD should just be mentioned here, but should have a separate article. daviddoria (talk) 13:42, 2 October 2008 (UTC)

The hyperplane you mentioned is orthogonal to the one-dimensional space onto which the data are projected. In other words, I perceive your two "different" statements of the problem to be essentially equivalent -- or at best "dual" statements of the problem. DavidMCEddy (talk) 21:41, 29 May 2016 (UTC)
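The duality can be checked directly: the decision boundary $\{{\vec {x}}:{\vec {w}}\cdot {\vec {x}}=c\}$ is a hyperplane whose normal is the projection direction ${\vec {w}}$, so moving a point along any direction orthogonal to ${\vec {w}}$ leaves its projected score, and hence its class, unchanged. A minimal sketch; `project` and `orthogonal_component` are hypothetical helpers and the vectors are arbitrary.

```python
import numpy as np

def project(x, w):
    """Discriminant score: the coordinate of x along the direction w."""
    return x @ w

def orthogonal_component(v, w):
    """Remove from v its component along w (one Gram-Schmidt step)."""
    return v - (v @ w) / (w @ w) * w

w = np.array([2.0, -1.0, 0.5])
x = np.array([0.3, 1.2, -0.7])
v = orthogonal_component(np.array([1.0, 1.0, 1.0]), w)

# x and x + v lie on the same hyperplane {z : z @ w = const},
# so thresholding the 1-D projection and splitting by a hyperplane agree.
```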

It says "which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances". I think the first part might have to be removed; otherwise, what do the covariance matrices then refer to? —Preceding unsigned comment added by 89.150.104.58 (talk) 15:32, 21 April 2011 (UTC)

Multiclass LDA

The Section "Multiclass LDA" contains the following text

This means that when ${\vec {w}}$ is an eigenvector of $\Sigma ^{-1}\Sigma _{b}$ the separation will be equal to the corresponding eigenvalue. Since $\Sigma _{b}$ is of most rank C-1, then these non-zero eigenvectors identify a vector subspace containing the variability between features. These vectors are primarily used in feature reduction, as in PCA.

The text only says what separation the eigenvectors will produce (their corresponding eigenvalue), but it does not say if those separation values are in some way optimal.

It also does not say why the plane spanned by the eigenvectors is a good choice for the projection plane. —Preceding unsigned comment added by 217.229.206.43 (talk) 23:57, 10 July 2009 (UTC)
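On the optimality question, a standard result may help: the separation ${\vec {w}}^{T}\Sigma _{b}{\vec {w}}/{\vec {w}}^{T}\Sigma {\vec {w}}$ is a generalized Rayleigh quotient, so its maximum over all ${\vec {w}}$ is the largest eigenvalue of $\Sigma ^{-1}\Sigma _{b}$, attained at the corresponding eigenvector, and each further eigenvector maximizes it among directions $\Sigma$-orthogonal to the earlier ones. A numpy-only sketch, assuming the within-class matrix is the summed class scatter and the between-class matrix weights class means by class size (conventions vary; with balanced classes, as here, equal weighting only rescales it). Function names are hypothetical and the data synthetic.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        Dk = Xk - mk
        Sw += Dk.T @ Dk
        diff = (mk - mu)[:, None]
        Sb += len(Xk) * (diff @ diff.T)
    return Sw, Sb

def lda_directions(Sw, Sb):
    """Solve Sb w = lam Sw w by whitening with the Cholesky factor of Sw."""
    L = np.linalg.cholesky(Sw)            # Sw = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ Sb @ Linv.T                # symmetric; same eigenvalues as Sw^{-1} Sb
    lam, V = np.linalg.eigh(M)            # ascending order
    order = np.argsort(lam)[::-1]
    W = Linv.T @ V[:, order]              # columns satisfy Sb w = lam Sw w
    return lam[order], W

def separation(w, Sw, Sb):
    """Between-class over within-class spread along direction w."""
    return (w @ Sb @ w) / (w @ Sw @ w)

rng = np.random.default_rng(1)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in means])
y = np.repeat(np.arange(3), 50)

Sw, Sb = scatter_matrices(X, y)
lam, W = lda_directions(Sw, Sb)
# With C = 3 classes, Sb has rank at most C - 1 = 2, so at most two
# eigenvalues are non-zero; the first column of W gives the largest separation.
```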

Robotics attention needed

Refs - inline need adding
Check content and structure
Reassess

Chaosdruid (talk) 11:10, 24 March 2012 (UTC)

Equation seems weird

~~The equation for two classes seems a bit strange to me:~~
$({\vec {x}}-{\vec {\mu }}_{0})^{T}\Sigma _{y=0}^{-1}({\vec {x}}-{\vec {\mu }}_{0})+\ln |\Sigma _{y=0}|-({\vec {x}}-{\vec {\mu }}_{1})^{T}\Sigma _{y=1}^{-1}({\vec {x}}-{\vec {\mu }}_{1})-\ln |\Sigma _{y=1}|\ <\ T$

~~Why are we summing from y=0 to y=-1? what does y=-1 even mean in this context? I thought the classes were labelled 0 and 1?~~

~~What do $\ln |\Sigma _{y=0}|$ and $\ln |\Sigma _{y=1}|$ even mean? What is being summed?~~

--Slashme (talk) 08:16, 27 August 2013 (UTC)

Edit: I now realise that the Σ here is the covariance, ~~but that article doesn't explain what the -1 means. I'd be interested to know the answer!~~ - Of course, the -1 is the transpose of the covariance matrix. --Slashme (talk) 08:26, 27 August 2013 (UTC)

NO: The -1 denotes the inverse, not the transpose, of the covariance matrix. DavidMCEddy (talk) 21:42, 29 May 2016 (UTC)
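The quantities asked about in this thread can be checked numerically: $\Sigma ^{-1}$ is the matrix inverse (a covariance matrix is symmetric, so its transpose is just $\Sigma$ again), and $\ln |\Sigma |$ is the natural log of the determinant. A minimal sketch with an arbitrary 2×2 covariance; `quadratic_criterion` is a hypothetical name for the left-hand side of the criterion quoted at the top of this thread.

```python
import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Transpose of a symmetric matrix is the matrix itself; the inverse
# is the matrix satisfying Sigma @ Sigma_inv == identity.
Sigma_inv = np.linalg.inv(Sigma)

# |Sigma| is the determinant; slogdet returns (sign, ln|det|) stably.
sign, log_det = np.linalg.slogdet(Sigma)

def quadratic_criterion(x, mu0, S0, mu1, S1):
    """(x-mu0)^T S0^{-1} (x-mu0) + ln|S0| - (x-mu1)^T S1^{-1} (x-mu1) - ln|S1|."""
    d0, d1 = x - mu0, x - mu1
    return (d0 @ np.linalg.solve(S0, d0) + np.linalg.slogdet(S0)[1]
            - d1 @ np.linalg.solve(S1, d1) - np.linalg.slogdet(S1)[1])
```

Nothing is summed in $\ln |\Sigma _{y=0}|$: it is a single number per class, and with equal covariances the two log-determinant terms cancel.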

“Linear discriminant analysis” and “Discriminant function analysis”

What's the difference between “Linear discriminant analysis” and “Discriminant function analysis”? They look the same to me. DavidMCEddy (talk) 21:44, 29 May 2016 (UTC)

Go ahead and merge. Speaking as an expert with proficiency in the aforementioned technique: try to keep as much of the content from the "Linear Discriminant Analysis" page as possible, and just add the missing bits from "discriminant function analysis". There shouldn't be much to add. FYI: the "discriminant function analysis" page is much poorer in quality than the "discriminant analysis" page (IMO). I didn't even know there was a discriminant function analysis page until you posted the merge request (thanks for that, DavidMCEddy). 174.3.155.181 (talk) 21:03, 31 May 2016 (UTC)

If someone else can take the lead in merging these articles, I can help. However, I doubt if I'll ever get the time to do it myself. DavidMCEddy (talk) 15:26, 1 June 2016 (UTC)

Same. I wouldn't be opposed to deleting the discriminant function page entirely. It makes a lot of unusual claims, and a lot of unreferenced ones too. Maybe you could put up a deletion tag and see how it fares? 174.3.155.181 (talk) 17:17, 1 June 2016 (UTC)

Merged given the consensus for action, no opposition to a merge, and no formal deletion proposal (which I don't think would work, as the page contained appropriate references and there was some material not at the target). I may have moved more than I should, but am very happy for any overlap to be judiciously culled. Klbrain (talk) 16:12, 29 March 2018 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Linear discriminant analysis. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

Added archive https://web.archive.org/web/20150405124836/http://biostat.katerynakon.in.ua/en/prognosis/discriminant-analysis.html to http://www.biostat.katerynakon.in.ua/en/prognosis/discriminant-analysis.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 10:03, 16 May 2017 (UTC)

LDA vs FLDA

There seems to be a confusing mixture of usages of the term LDA. The section "Fisher's linear discriminant" says the "terms Fisher's linear discriminant and LDA are often used interchangeably"; however, as far as I am aware, there are two related but distinct methods here. One of them, used in Hastie et al. and Murphy (and in the definition used in this article), just assumes that the covariance matrices are the same in each class. The other, which Murphy calls Fisher LDA, involves dimension reduction (which I think is described in Multiclass LDA). It would be nice if someone could clearly indicate the differences and mention any conflicting terminology in the article. — Preceding unsigned comment added by Pabnau (talk • contribs) 06:03, 13 February 2019 (UTC)
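For two classes the relationship can be made concrete. In one common reading (an assumption; texts differ in detail), Fisher's criterion uses only class means and within-class scatter and yields a direction but no threshold, while Gaussian LDA fits a shared-covariance model and also fixes a threshold from the class priors. If the within-class matrix is taken as the summed class scatter, the two directions coincide up to a positive factor. The multiclass dimension-reduction use is not shown. Function names are hypothetical and the data synthetic.

```python
import numpy as np

def scatter(X):
    """Class scatter matrix: sum_i (x_i - mu)(x_i - mu)^T."""
    D = X - X.mean(axis=0)
    return D.T @ D

def fisher_direction(X0, X1):
    """Fisher: direction only, from means and summed within-class scatter."""
    Sw = scatter(X0) + scatter(X1)
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def gaussian_lda(X0, X1):
    """Gaussian LDA with pooled covariance; returns (w, c) for 'class 1 iff w @ x > c'."""
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = (scatter(X0) + scatter(X1)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    # Priors estimated from class frequencies enter only through the threshold.
    c = w @ (mu0 + mu1) / 2 + np.log(n0 / n1)
    return w, c

rng = np.random.default_rng(3)
X0 = rng.normal(size=(40, 3))
X1 = rng.normal(size=(60, 3)) + np.array([1.0, 2.0, 0.0])

wf = fisher_direction(X0, X1)
wg, c = gaussian_lda(X0, X1)
cosine = wf @ wg / (np.linalg.norm(wf) * np.linalg.norm(wg))
```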

log of likelihoods

In the first, more general, formula for the log of the likelihood ratio, it seems that the probability density at a point is being used, rather than a cumulative probability such as P(|Z|>|z|) for an appropriate z-score analog. Why is the density appropriate here? Doesn't that make it possible to get different results just by multiplying the x's by constants? Unhandyandy (talk) 18:53, 10 March 2023 (UTC)
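On the scaling question, a minimal 1-D check, assuming both classes are Gaussian and the same rescaling x → kx is applied to the data and to each class's parameters: every density is divided by the same Jacobian factor |k|, so the ratio, and hence the classification, does not change. The helper names and numbers are illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood_ratio(x, mu0, s0, mu1, s1):
    """ln p(x | class 1) - ln p(x | class 0)."""
    return math.log(normal_pdf(x, mu1, s1)) - math.log(normal_pdf(x, mu0, s0))

k = 1000.0
x, mu0, s0, mu1, s1 = 0.7, 0.0, 1.0, 2.0, 1.5

llr = log_likelihood_ratio(x, mu0, s0, mu1, s1)
# Rescale the observation and both class models by the same factor k.
llr_scaled = log_likelihood_ratio(k * x, k * mu0, k * s0, k * mu1, k * s1)
```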

"Assumptions" list includes non-assumptions

Specifically "Multicollinearity: Predictive power can decrease with an increased correlation between predictor variables" is not an assumption, it's simply an observation. It doesn't belong in the assumptions. Glenbarnett (talk) 07:54, 23 May 2023 (UTC)