Article - CS320043
Training job fails on ThingWorx Analytics Server with the error "org.apache.spark.SparkException: Job aborted due to stage failure"
Modified: 22-Dec-2022
Applies To
- ThingWorx Analytics 8.3.3
Description
- Running the Random Forest learner with 150 trees and a depth of 24 fails with the following error:
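Each additional level of a binary decision tree can double the number of nodes, which helps put the configuration above in scale. The sketch below computes the worst-case node count for a fully grown tree and forest. It assumes Spark MLlib's convention that a depth of 0 is a single leaf node. The helper names are hypothetical and are not part of ThingWorx Analytics or Spark. Real trees are usually much smaller because splitting stops early, so treat these figures as an upper bound rather than the actual model size.

```python
# Hypothetical helpers (not a ThingWorx Analytics or Spark API).
# Assumption: depth follows Spark MLlib's maxDepth convention,
# where depth 0 is one leaf and depth 1 is one internal node plus two leaves.

def max_nodes_per_tree(max_depth: int) -> int:
    """Upper bound on node count for one fully grown binary tree."""
    return 2 ** (max_depth + 1) - 1


def max_nodes_in_forest(num_trees: int, max_depth: int) -> int:
    """Upper bound on total node count across all trees in the forest."""
    return num_trees * max_nodes_per_tree(max_depth)


if __name__ == "__main__":
    # Compare a few depths for a 150-tree forest, as in the failing job.
    for depth in (5, 10, 24):
        print(f"depth={depth:2d}  per tree={max_nodes_per_tree(depth):>12,}  "
              f"150 trees={max_nodes_in_forest(150, depth):>15,}")
```

At depth 24 the bound is 2^25 - 1 = 33,554,431 nodes per tree, or about 5.03 billion across 150 trees. At depth 10 the bound for the same forest is 307,050 nodes. This exponential growth is one plausible reason a deep forest can keep training busy in `findBestSplits` for a long time. This article does not confirm the root cause.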
Training with SparkRandomForestTrainer appears to have failed.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1473.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1473.0 (TID 4111, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 235516 ms
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$collectAsMap$1.apply(PairRDDFunctions.scala:746)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$collectAsMap$1.apply(PairRDDFunctions.scala:745)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:745)
at org.apache.spark.ml.tree.impl.RandomForest$.findBestSplits(RandomForest.scala:563)
at org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:198)
at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:94)
at org.apache.spark.mllib.tree.RandomForest$.trainRegressor(RandomForest.scala:218)
at org.apache.spark.mllib.tree.RandomForest$.trainRegressor(RandomForest.scala:258)
at org.apache.spark.mllib.tree.RandomForest$.trainRegressor(RandomForest.scala:274)
at org.apache.spark.mllib.tree.RandomForest.trainRegressor(RandomForest.scala)
at com.thingworx.analytics.training.trees.SparkRandomForestTrainer.train(SparkRandomForestTrainer.java:48)
at com.thingworx.analytics.training.trees.SparkRandomForestTrainer.train(SparkRandomForestTrainer.java:21)
at com.thingworx.analytics.training.trees.SparkTreeTrainer.trainModel(SparkTreeTrainer.java:63)
at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:101)
and
com.thingworx.analytics.training.TrainingFailedException: Error training learner [trainer=com.thingworx.analytics.training.trees.SparkRandomForestTrainer@65df41a, transformer=DecisionTreeTransformerFactory [maxNumberOfMiningFields=25 useRedundancyFilter=false expand=false]]
at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:109)
at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.lambda$executeTrainingOnLearners$0(AbstractEnsembleModel.java:72)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.executeTrainingOnLearners(AbstractEnsembleModel.java:78)
at com.thingworx.analytics.training.ensemble.AbstractEnsembleModel.trainAllTrainers(AbstractEnsembleModel.java:100)
at com.thingworx.analytics.training.ensemble.EliteAverageEnsembleModel.trainModel(EliteAverageEnsembleModel.java:60)
at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:101)
at com.thingworx.analytics.training.MultiGoalTrainer.trainMultipleModels(MultiGoalTrainer.java:53)
at com.thingworx.analytics.training.MultiGoalTrainer.trainModel(MultiGoalTrainer.java:38)
at com.thingworx.analytics.training.Learner.internalTrainModel(Learner.java:101)
at com.thingworx.analytics.training.Learner.trainForGoals(Learner.java:186)
at com.thingworx.analytics.training.Learner.trainByParameters(Learner.java:133)
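You may need to triage several training logs for this failure. The key fields can be pulled out of the Spark exception message with a small parser. The sketch below is a hypothetical helper, not part of ThingWorx Analytics or Spark. It assumes the English wording Spark emits for this failure, as in the sample message embedded in the code. Other Spark versions or localized logs may word it differently, in which case the helper returns `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the message matches the English Spark wording used in the sample below.
_PATTERN = re.compile(
    r"Task (?P<task>\d+) in stage (?P<stage>\d+\.\d+) failed (?P<attempts>\d+) times?"
    r".*?\(TID (?P<tid>\d+), (?P<host>[^,]+), executor (?P<executor>[^)]+)\)"
    r".*?Executor heartbeat timed out after (?P<timeout_ms>\d+) ms"
)


@dataclass
class HeartbeatTimeout:
    """Fields extracted from a heartbeat-timeout ExecutorLostFailure message."""
    stage: str
    task: int
    attempts: int
    tid: int
    host: str
    executor: str
    timeout_ms: int

    @property
    def timeout_seconds(self) -> float:
        return self.timeout_ms / 1000.0


def parse_heartbeat_timeout(line: str) -> Optional[HeartbeatTimeout]:
    """Return the parsed fields, or None if the line is not this kind of failure."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return HeartbeatTimeout(
        stage=m["stage"],
        task=int(m["task"]),
        attempts=int(m["attempts"]),
        tid=int(m["tid"]),
        host=m["host"],
        executor=m["executor"],
        timeout_ms=int(m["timeout_ms"]),
    )


if __name__ == "__main__":
    sample = (
        "org.apache.spark.SparkException: Job aborted due to stage failure: "
        "Task 0 in stage 1473.0 failed 1 times, most recent failure: "
        "Lost task 0.0 in stage 1473.0 (TID 4111, localhost, executor driver): "
        "ExecutorLostFailure (executor driver exited caused by one of the running tasks) "
        "Reason: Executor heartbeat timed out after 235516 ms"
    )
    info = parse_heartbeat_timeout(sample)
    print(info, f"timeout={info.timeout_seconds:.1f} s")
```

For the message in this article, the timeout of 235516 ms is about 235.5 seconds (roughly 3.9 minutes). The executor is reported as `driver` on `localhost`, which is what Spark shows when running in local mode.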