Article - CS294545

Error "All learners failed to train" is reported when training a specific model in ThingWorx Analytics

Modified: 19-Mar-2024

Applies To

ThingWorx Analytics 8.3.1 to 9.5

Description

The following error is shown in Job Details after the training job for a specific model fails:

Error training Learner [trainer=MultiGoalTrainer [Learner [trainer=SoloistModel []All learners [learners=[Learner [trainer=com.thingworx.analytics.training.neuralnet.SparkAnnTrainer@7688546, transformer=NeuralNetTransformerFactory [maxNumberOfMiningFields=15 useRedundancyFilter=true expanding=true]]]], transformer=com.thingworx.analytics.training.xforms.NormalizeTransformerFactory@5abe2624]], transformer=DoNothingTransformer []] Error training Learner [trainer=SoloistModel []All learners [learners=[Learner [trainer=com.thingworx.analytics.training.neuralnet.SparkAnnTrainer@7688546, transformer=NeuralNetTransformerFactory [maxNumberOfMiningFields=15 useRedundancyFilter=true expanding=true]]]], transformer=com.thingworx.analytics.training.xforms.NormalizeTransformerFactory@5abe2624] All learners failed to train. For support cases please provide this log tag: c6ab0554-a524-4a2c-a7a2-865dba986ac4
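The log tag at the end of this message is the identifier to provide when opening a support case. When several failed jobs need to be reported, a short script can pull the tag out of the copied Job Details text. The following Python sketch is not part of ThingWorx Analytics. The function name extract_log_tag is hypothetical, and the tag format (a standard 8-4-4-4-12 hexadecimal UUID) is assumed from the single example above.

```python
import re
from typing import Optional

# Assumed format: the phrase "log tag:" followed by a UUID, as in the example
# Job Details message above.
LOG_TAG_PATTERN = re.compile(
    r"log tag:\s*([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
)


def extract_log_tag(job_error: str) -> Optional[str]:
    """Return the support log tag from a Job Details error message, or None."""
    match = LOG_TAG_PATTERN.search(job_error)
    return match.group(1) if match else None


if __name__ == "__main__":
    message = (
        "... All learners failed to train. For support cases please provide "
        "this log tag: c6ab0554-a524-4a2c-a7a2-865dba986ac4"
    )
    print(extract_log_tag(message))  # c6ab0554-a524-4a2c-a7a2-865dba986ac4
```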

The Worker.log file contains the following errors:

com.thingworx.analytics.training.TrainingFailedException: Error training Learner [trainer=MultiGoalTrainer ...
   at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:109)
   at com.thingworx.analytics.training.Learner.trainForGoals(Learner.java:186)
   at com.thingworx.analytics.training.Learner.trainWithSpecificLookback(Learner.java:180)
   at com.thingworx.analytics.training.Learner.trainByParameters(Learner.java:128)
   at com.thingworx.analytics.training.core.TrainingRunner.run(TrainingRunner.java:94)
Caused by: java.lang.RuntimeException: All learners failed to train.
   at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.executeTrainingOnLearners(AbstractEnsembleModel.java:81)
   at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.trainAllTrainers(AbstractEnsembleModel.java:100)
   at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.ensembleTrainModel(AbstractEnsembleModel.java:95)
   at com.thingworx.analytics.training.ensemble.SoloistEnsemble.trainModel(SoloistEnsemble.java:32)
   at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:101)
   ... 19 common frames omitted

The log also contains:

com.thingworx.analytics.training.TrainingFailedException: Error training Learner [trainer=com.thingworx.analytics.training.neuralnet.SparkAnnTrainer@29a88d64, transformer=NeuralNetTransformerFactory [maxNumberOfMiningFields=10 useRedundancyFilter=true expanding=true]]
   at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:109)
   at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.lambda$executeTrainingOnLearners$0(AbstractEnsembleModel.java:72)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 18.0 failed 1 times, most recent failure: Lost task 2.0 in stage 18.0 (TID 34, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
   at org.apache.spark.storage.BlockManager$$anonfun$19.apply(BlockManager.scala:1140)

Driver stacktrace:
   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
...
Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
   at org.apache.spark.storage.BlockManager$$anonfun$19.apply(BlockManager.scala:1140)
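In these traces, "All learners failed to train" is only a wrapper. Following the "Caused by:" entries from outermost to innermost leads to java.lang.OutOfMemoryError: Java heap space, raised while Spark's BlockManager was allocating a heap buffer during neural network training (SparkAnnTrainer). The Python sketch below is not a PTC tool; it scans a copied Worker.log excerpt for these entries. The helper names are hypothetical. Treating the last "Caused by:" line as the root cause is a heuristic that only holds for a single stack trace; for a whole log file containing several traces, checking the full chain (as is_heap_exhaustion does) is safer.

```python
from typing import List, Optional

MARKER = "Caused by:"
HEAP_OOM = "java.lang.OutOfMemoryError: Java heap space"


def caused_by_chain(log_text: str) -> List[str]:
    """Collect the exception text after every 'Caused by:' marker, in log order.

    The marker is searched anywhere in a line, not only at its start, because
    copied logs sometimes lose the line break in front of it.
    """
    chain = []
    for line in log_text.splitlines():
        idx = line.find(MARKER)
        if idx != -1:
            chain.append(line[idx + len(MARKER):].strip())
    return chain


def root_cause(log_text: str) -> Optional[str]:
    """Return the last 'Caused by:' entry (innermost cause of one trace), or None."""
    chain = caused_by_chain(log_text)
    return chain[-1] if chain else None


def is_heap_exhaustion(log_text: str) -> bool:
    """True if any cause in the chain is a Java heap OutOfMemoryError."""
    return any(HEAP_OOM in cause for cause in caused_by_chain(log_text))


if __name__ == "__main__":
    # Shortened excerpt of the Worker.log traces shown above.
    sample = (
        "   at com.thingworx.analytics.training.core.TrainingRunner.run"
        "(TrainingRunner.java:94)Caused by: java.lang.RuntimeException: "
        "All learners failed to train.\n"
        "Caused by: java.lang.OutOfMemoryError: Java heap space\n"
        "   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)\n"
    )
    print(root_cause(sample))          # java.lang.OutOfMemoryError: Java heap space
    print(is_heap_exhaustion(sample))  # True
```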
