@jadianes Nice tutorial on Logistic Regression, thank you.
I ran the tutorial on Spark 1.6.2 and 2.1.0. Both ran fine, and I could reproduce your results exactly in 1.6.2, but I would like to offer the following observation about 2.1.0: the process takes about 3 times longer to run and produces a different answer from the one produced by 1.6.2. I thought this was strange, and found in the list of Spark tasks that 2.1.0 was calling a non-LBFGS algorithm. I raised this in a JIRA issue (https://issues.apache.org/jira/browse/SPARK-16768). It seems that even though you can import the LBFGS version into pyspark, call help on it, and actually call it, I don't think what runs is actually an LBFGS implementation.
http://spark.apache.org/docs/latest/mllib-optimization.html has some other information on LBFGS in Spark.
Later, when 2.1.0 becomes the standard, your readers may find that they don't get your accuracy results. Or maybe I just missed something. Can anyone confirm my observations?
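In case it helps anyone reproduce the comparison, here is a minimal sketch of how the training time and accuracy could be measured the same way on both versions. The helper names (`timed`, `accuracy`, `run_on_spark`) are my own and not from the tutorial, and `training_data` and `test_data` are assumed to be RDDs of `LabeledPoint` like the ones the tutorial builds. The pyspark import sits inside the function so the helpers can be tried without a Spark installation.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start


def accuracy(labels_and_preds):
    """Fraction of (label, prediction) pairs where the two agree."""
    pairs = list(labels_and_preds)
    return sum(1 for label, pred in pairs if label == pred) / float(len(pairs))


def run_on_spark(training_data, test_data):
    """Train with the MLlib LBFGS entry point and report time and accuracy.

    Assumes both arguments are RDDs of LabeledPoint (hypothetical inputs,
    prepared as in the tutorial).
    """
    # Imported here so the helpers above work without pyspark installed.
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS

    model, train_secs = timed(LogisticRegressionWithLBFGS.train, training_data)
    labels_and_preds = test_data.map(
        lambda p: (p.label, model.predict(p.features))
    ).collect()
    return train_secs, accuracy(labels_and_preds)
```

Running `run_on_spark(training_data, test_data)` on each version and printing `sc.version` next to the two numbers should make the difference in time and accuracy easy to compare side by side.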