Article - CS294545

Error "All learners failed to train" is reported when training a specific model in ThingWorx Analytics 8.3

Modified: 10-Dec-2019   


Applies To

  • ThingWorx Analytics 8.3.1

Description

  • The following error appears in Job Details after a specific training job fails:
Error training Learner [trainer=MultiGoalTrainer [Learner [trainer=SoloistModel []All learners [learners=[Learner [trainer=com.thingworx.analytics.training.neuralnet.SparkAnnTrainer@7688546, transformer=NeuralNetTransformerFactory [maxNumberOfMiningFields=15 useRedundancyFilter=true expanding=true]]]], transformer=com.thingworx.analytics.training.xforms.NormalizeTransformerFactory@5abe2624]], transformer=DoNothingTransformer []] Error training Learner [trainer=SoloistModel []All learners [learners=[Learner [trainer=com.thingworx.analytics.training.neuralnet.SparkAnnTrainer@7688546, transformer=NeuralNetTransformerFactory [maxNumberOfMiningFields=15 useRedundancyFilter=true expanding=true]]]], transformer=com.thingworx.analytics.training.xforms.NormalizeTransformerFactory@5abe2624] All learners failed to train. For support cases please provide this log tag: c6ab0554-a524-4a2c-a7a2-865dba986ac4
 
  • The Worker.log file contains the following errors:
com.thingworx.analytics.training.TrainingFailedException: Error training Learner [trainer=MultiGoalTrainer ...
    at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:109)
    at com.thingworx.analytics.training.Learner.trainForGoals(Learner.java:186)
    at com.thingworx.analytics.training.Learner.trainWithSpecificLookback(Learner.java:180)
    at com.thingworx.analytics.training.Learner.trainByParameters(Learner.java:128)
    at com.thingworx.analytics.training.core.TrainingRunner.run(TrainingRunner.java:94)
Caused by: java.lang.RuntimeException: All learners failed to train.
    at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.executeTrainingOnLearners(AbstractEnsembleModel.java:81)
    at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.trainAllTrainers(AbstractEnsembleModel.java:100)
    at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.ensembleTrainModel(AbstractEnsembleModel.java:95)
    at com.thingworx.analytics.training.ensemble.SoloistEnsemble.trainModel(SoloistEnsemble.java:32)
    at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:101)
    ... 19 common frames omitted
  • The same log also contains:
com.thingworx.analytics.training.TrainingFailedException: Error training Learner [trainer=com.thingworx.analytics.training.neuralnet.SparkAnnTrainer@29a88d64, transformer=NeuralNetTransformerFactory [maxNumberOfMiningFields=10 useRedundancyFilter=true expanding=true]]
    at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:109)
    at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.lambda$executeTrainingOnLearners$0(AbstractEnsembleModel.java:72)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 18.0 failed 1 times, most recent failure: Lost task 2.0 in stage 18.0 (TID 34, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.spark.storage.BlockManager$$anonfun$19.apply(BlockManager.scala:1140)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
...
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.spark.storage.BlockManager$$anonfun$19.apply(BlockManager.scala:1140)
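
The traces above nest several "Caused by:" entries, and the last one (java.lang.OutOfMemoryError: Java heap space) is the underlying failure behind "All learners failed to train". As a rough illustration only, the following Python sketch pulls the "Caused by:" chain out of log text so the innermost cause is easy to spot. The function names (caused_by_chain, has_heap_exhaustion) and the sample text are hypothetical helpers written for this example and are not part of ThingWorx Analytics; the sample lines are copied from the traces above.

```python
import re

# Matches "Caused by: <exception class>[: <message>]" anywhere in the text,
# including when it is fused onto the end of a stack frame line.
CAUSED_BY = re.compile(r"Caused by: ([\w.$]+)(?::[ \t]*(.*))?")


def caused_by_chain(log_text):
    """Return (exception_class, message) for each 'Caused by:' entry, in log order."""
    chain = []
    for match in CAUSED_BY.finditer(log_text):
        message = (match.group(2) or "").strip()
        chain.append((match.group(1), message))
    return chain


def has_heap_exhaustion(log_text):
    """True if the log text contains the Java heap space OutOfMemoryError."""
    return "java.lang.OutOfMemoryError: Java heap space" in log_text


# Hypothetical sample assembled from lines shown in this article.
sample = (
    "    at com.thingworx.analytics.training.core.TrainingRunner.run(TrainingRunner.java:94)"
    "Caused by: java.lang.RuntimeException: All learners failed to train.\n"
    "    at com.thingworx.analytics.training.ensemble.SoloistEnsemble.trainModel(SoloistEnsemble.java:32)\n"
    "Caused by: java.lang.OutOfMemoryError: Java heap space\n"
    "    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)\n"
)

chain = caused_by_chain(sample)
innermost = chain[-1]  # the last "Caused by:" entry is the underlying cause
print(innermost)
print(has_heap_exhaustion(sample))
```

To apply this to a real log, read the contents of Worker.log into a string and pass it to caused_by_chain; the location of that file depends on the installation and is not stated in this article.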
This is a PDF version of Article CS294545 and may be out of date. For the latest version click https://www.ptc.com/en/support/article/CS294545