public final class RDFUpdate extends MLUpdate<String>
Fields inherited from class MLUpdate: MODEL_FILE_NAME
Constructor and Description |
---|
RDFUpdate(com.typesafe.config.Config config) |
Modifier and Type | Method and Description |
---|---|
org.dmg.pmml.PMML | buildModel(org.apache.spark.api.java.JavaSparkContext sparkContext, org.apache.spark.api.java.JavaRDD<String> trainData, List<?> hyperParameters, org.apache.hadoop.fs.Path candidatePath) |
double | evaluate(org.apache.spark.api.java.JavaSparkContext sparkContext, org.dmg.pmml.PMML model, org.apache.hadoop.fs.Path modelParentPath, org.apache.spark.api.java.JavaRDD<String> testData, org.apache.spark.api.java.JavaRDD<String> trainData) |
List<HyperParamValues<?>> | getHyperParameterValues() |
Methods inherited from class MLUpdate: canPublishAdditionalModelData, getTestFraction, publishAdditionalModelData, runUpdate, splitNewDataToTrainTest
public List<HyperParamValues<?>> getHyperParameterValues()
getHyperParameterValues in class MLUpdate<String>
Returns: a list with one HyperParamValues per hyperparameter. Different combinations of the values derived from the list will be passed back into MLUpdate.buildModel(JavaSparkContext,JavaRDD,List,Path)
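To illustrate what "different combinations of the values" can mean, here is a minimal, self-contained sketch that enumerates every combination from per-hyperparameter candidate lists (a full grid). It does not use the Oryx API: the class `HyperParamCombos`, the method `combinations`, and the candidate values are hypothetical. This page does not say how MLUpdate actually selects combinations, so the full grid is an assumption. Each resulting combination is an ordered list, matching the shape of the `hyperParameters` argument of buildModel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HyperParamCombos {

  /**
   * Returns the Cartesian product of the per-hyperparameter candidate lists.
   * Each combination keeps the hyperparameters in their original order.
   * (Hypothetical helper; not part of the documented API.)
   */
  static List<List<Object>> combinations(List<List<?>> valuesPerParam) {
    List<List<Object>> combos = new ArrayList<>();
    combos.add(new ArrayList<>()); // start from the single empty combination
    for (List<?> candidates : valuesPerParam) {
      List<List<Object>> extended = new ArrayList<>();
      for (List<Object> prefix : combos) {
        for (Object value : candidates) {
          // Extend every existing prefix with every candidate value.
          List<Object> combo = new ArrayList<>(prefix);
          combo.add(value);
          extended.add(combo);
        }
      }
      combos = extended;
    }
    return combos;
  }

  public static void main(String[] args) {
    // Illustrative candidate values only; the meanings in the comments are assumptions.
    List<List<?>> valuesPerParam = Arrays.asList(
        Arrays.asList(10, 20),            // e.g. a maximum tree depth
        Arrays.asList("entropy", "gini")  // e.g. an impurity measure
    );
    for (List<Object> combo : combinations(valuesPerParam)) {
      System.out.println(combo);
    }
  }
}
```

With two candidates for each of two hyperparameters, the sketch yields four ordered combinations. Each one could be handed to a model-building step as its list of hyperparameter values.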
public org.dmg.pmml.PMML buildModel(org.apache.spark.api.java.JavaSparkContext sparkContext, org.apache.spark.api.java.JavaRDD<String> trainData, List<?> hyperParameters, org.apache.hadoop.fs.Path candidatePath)
buildModel in class MLUpdate<String>
Parameters:
sparkContext - active Spark Context
trainData - training data on which to build a model
hyperParameters - ordered list of hyper parameter values to use in building model
candidatePath - directory where additional model files can be written
Returns: PMML representation of a model trained on the given data

public double evaluate(org.apache.spark.api.java.JavaSparkContext sparkContext, org.dmg.pmml.PMML model, org.apache.hadoop.fs.Path modelParentPath, org.apache.spark.api.java.JavaRDD<String> testData, org.apache.spark.api.java.JavaRDD<String> trainData)
evaluate in class MLUpdate<String>
Parameters:
sparkContext - active Spark Context
model - model to evaluate
modelParentPath - directory containing model files, if applicable
testData - data on which to test the model performance
trainData - data on which model was trained, which can also be useful in evaluating unsupervised learning problems

Copyright © 2014–2018. All rights reserved.