P(X = x) for continuous data is problematic

https://github.com/GeostatsGuy/DataScienceInteractivePython/blob/adf0515484c587a900d6991d81c504e99171611f/Interactive_Model_Fitting.ipynb#L56

The notebook uses the P(X | ...) notation, which I would interpret as the conditional probability of the data. However, linear models are typically used for continuous response data, where P(X_i = <value> | ...) is zero. Instead, one would use densities, i.e. a lowercase p or f.
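To illustrate the point, here is a minimal sketch (not taken from the notebook) using a standard normal as an assumed example distribution. The helper names `normal_cdf`, `normal_pdf` and `interval_prob` are my own. It shows that the probability of landing in a shrinking interval around a value goes to zero, while the probability divided by the interval width approaches the density.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def interval_prob(x, eps):
    """P(x - eps < X < x + eps) for a standard normal X."""
    return normal_cdf(x + eps) - normal_cdf(x - eps)

x0 = 0.5                       # arbitrary value, chosen for illustration
widths = [1e-1, 1e-3, 1e-6]    # half-widths of the shrinking interval
probs = [interval_prob(x0, eps) for eps in widths]

# The probability itself vanishes as the interval shrinks ...
for eps, p in zip(widths, probs):
    print(f"eps={eps:g}  P(|X - x0| < eps) = {p:.3e}")

# ... but probability / width converges to the density at x0.
ratio = probs[-1] / (2 * widths[-1])
density = normal_pdf(x0)
print(f"ratio = {ratio:.6f}, density = {density:.6f}")
```

This is why a likelihood for continuous data is usually written as a product of densities rather than of probabilities.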

Furthermore, since a product is used, this implies that the observations are independent of each other. Hence, the following statement, written a little further down, is incomplete, because the same is assumed for the ML approach:

*OLS: - assumes that the errors have a mean of zero, constant variance and are independent of eachother (no correlation in error).*
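To make the shared assumptions explicit, here is a sketch in notation that I am assuming, not taking from the notebook: responses $y_i$, predictors $x_i$, coefficients $\beta$, and independent Gaussian errors with constant variance $\sigma^2$. The product form of the likelihood already encodes independence, and the common $\sigma$ encodes constant variance:

$$
L(\beta, \sigma) = \prod_{i=1}^{n} f(y_i \mid x_i, \beta, \sigma)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2}\right)
$$

$$
\log L(\beta, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
$$

For fixed $\sigma$, maximizing $\log L$ over $\beta$ is the same as minimizing the sum of squared residuals, which is the OLS criterion.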

Altogether, I find the post a little confusing. As far as I know, for a Gaussian response distribution with KNOWN $\sigma$ the OLS and ML estimates should be identical. Because of the amount of code, I do not fully understand the exact data-generating mechanism in the example, but for a simple normal sample $X_1, \dots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, aren't explicit solutions available? As a suggestion: maybe write the data-generating mechanism more clearly in math notation.
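As a rough check of the claim for the simple i.i.d. normal case, here is a minimal sketch with simulated data. The values of `mu_true`, `sigma`, the sample size and the grid resolution are assumptions of mine and have nothing to do with the notebook's data. It compares a grid-search maximizer of the Gaussian log-likelihood, a grid-search minimizer of the sum of squares, and the explicit solution (the sample mean).

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma = 2.0, 1.5            # assumed values, sigma treated as known
x = rng.normal(mu_true, sigma, size=200)

def log_likelihood(mu, x, sigma):
    """Gaussian log-likelihood of an i.i.d. sample: a sum of log-densities."""
    n = x.size
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((x - mu) ** 2) / (2 * sigma**2))

def sum_of_squares(mu, x):
    """Least-squares criterion for an intercept-only model."""
    return np.sum((x - mu) ** 2)

# Evaluate both criteria on the same grid of candidate means.
mu_grid = np.linspace(x.min(), x.max(), 20001)
grid_step = mu_grid[1] - mu_grid[0]
ll = np.array([log_likelihood(m, x, sigma) for m in mu_grid])
ss = np.array([sum_of_squares(m, x) for m in mu_grid])

mu_mle = mu_grid[np.argmax(ll)]      # numerical maximum likelihood estimate
mu_ols = mu_grid[np.argmin(ss)]      # numerical least-squares estimate
mu_closed = x.mean()                 # explicit solution for both

# If sigma were unknown, its MLE would also be explicit (divides by n, not n - 1).
sigma2_mle = np.mean((x - mu_closed) ** 2)

print(mu_mle, mu_ols, mu_closed, sigma2_mle)
```

Up to the grid resolution, all three estimates of the mean coincide, which is what one would expect when the errors are Gaussian with constant variance.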
