P(X = x) for continuous data is problematic #6

@mhoehle

$$P(X \mid \hat{f}_{\beta}) = \prod_{\alpha = 1}^{n} P(X_{\alpha} \mid \hat{f}_{\beta}(X)), \quad \alpha = 1,\ldots,n$$

The notebook uses the P(X | ...) notation, which I would interpret as the conditional probability of the data. However, linear models are typically used for continuous response data, where P(X_i = x_i | ...) is zero. Instead, one would use densities, i.e. lowercase p or f.
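As a sketch of what the density version could look like, assuming a linear model with independent Gaussian errors (the symbols $y_i$, $x_i$, $\beta$ and $\sigma^2$ are my own notation here, not necessarily the notebook's):

```latex
L(\beta, \sigma^2)
  = \prod_{i=1}^{n} f(y_i \mid x_i; \beta, \sigma^2)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\left( -\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2} \right)
```

Here $f$ is a density, so the likelihood is well defined even though each individual value has probability zero.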

Furthermore, since a product is used, this implies that the observations are independent of each other. Hence, the following statement, written a little further down:

OLS: - assumes that the errors have a mean of zero, constant variance and are independent of eachother (no correlation in error).

is incomplete, because the same assumptions were made for the ML approach.

Altogether, I find the post a little confusing. As far as I know, for a Gaussian response distribution with KNOWN $\sigma$, the OLS and ML estimates should be identical. I fail to completely understand what the exact data generating mechanism in the example is, due to the large amount of code, but for a simple normal sample $X_1,\ldots,X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ explicit solutions should be available. As a suggestion: maybe write the data generating mechanism more clearly in math notation.
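For the simple iid normal case, the explicit solutions are the sample mean and the variance with divisor n. A minimal sketch in plain Python follows; the function names and the simulated values (mu = 2, sigma = 1.5) are made up for illustration and are not taken from the notebook. It also checks numerically that the sample mean both minimizes the sum of squares and maximizes the Gaussian log-likelihood:

```python
import math
import random


def normal_mle(xs):
    """Closed-form MLE for iid N(mu, sigma^2) data."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


def log_likelihood(xs, mu, sigma2):
    """Gaussian log-likelihood, built from densities rather than probabilities."""
    n = len(xs)
    rss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def sum_of_squares(xs, mu):
    """Least-squares criterion for the intercept-only model."""
    return sum((x - mu) ** 2 for x in xs)


random.seed(1)  # arbitrary seed, only for reproducibility
# Hypothetical data generating mechanism: mu = 2, sigma = 1.5
xs = [random.gauss(2.0, 1.5) for _ in range(200)]

mu_hat, sigma2_hat = normal_mle(xs)

# Moving mu away from the sample mean worsens both criteria.
for delta in (-0.1, 0.1):
    assert sum_of_squares(xs, mu_hat + delta) > sum_of_squares(xs, mu_hat)
    assert (log_likelihood(xs, mu_hat + delta, sigma2_hat)
            < log_likelihood(xs, mu_hat, sigma2_hat))

print(mu_hat, sigma2_hat)
```

For a fixed $\sigma^2$ the log-likelihood is a decreasing function of the sum of squares, which is why the two criteria pick the same $\hat{\mu}$.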
