Exam Certified Machine Learning Professional topic 1 question 41 discussion

Actual exam question from Databricks's Certified Machine Learning Professional

Question #: 41
Topic #: 1

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

A.
B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
C.
D.
E.

Suggested Answer: D

by BokNinja at Dec. 19, 2023, 1:46 a.m.

Comments

64934ca

1 day, 13 hours ago

Selected Answer: E

The SparkSession is passed as the first argument to mlflow.pyfunc.spark_udf because the function needs it to create and run the UDF in the Spark environment. The model_uri is passed as the second argument and specifies which MLflow model to load for predictions. The function's signature fixes this order, so the first argument must be the SparkSession and not the DataFrame spark_df.
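A minimal sketch of that argument order, assuming the model was logged under the artifact path random_forest_model as the question states. The helper names model_uri_for_run and make_predict_udf are hypothetical, introduced only for illustration, and the usage comment assumes every column of spark_df is a model feature:

```python
def model_uri_for_run(run_id: str, artifact_path: str = "random_forest_model") -> str:
    """Build the runs:/ URI for a model logged under artifact_path in the given run."""
    return f"runs:/{run_id}/{artifact_path}"


def make_predict_udf(spark, run_id: str):
    """Return a Spark UDF that scores rows with the logged model.

    `spark` must be the active SparkSession, not the DataFrame to score.
    """
    # Imported here so the URI helper above works without MLflow installed.
    import mlflow.pyfunc

    # SparkSession first, model URI second.
    return mlflow.pyfunc.spark_udf(spark, model_uri=model_uri_for_run(run_id))


# Hypothetical usage where `spark`, `run_id` and `spark_df` already exist
# (assumes all columns of spark_df are model features):
#   predict = make_predict_udf(spark, run_id)
#   scored_df = spark_df.withColumn("prediction", predict(*spark_df.columns))
```

The DataFrame only appears when the returned UDF is applied to its columns, which is why spark_df is not an argument to spark_udf itself.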

upvoted 1 time

spaceexplorer

5 months ago

Selected Answer: E

E is correct

upvoted 3 times

JaydeepT

5 months, 1 week ago

Selected Answer: A

spark_df is the DataFrame that is evaluated at runtime.

upvoted 1 time

BokNinja

6 months, 3 weeks ago

E.
import mlflow
logged_model = 'runs:/e905f5759d434a131bbe1e54a2b/best-model'
# Load model as a Spark UDF.
loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)
# Predict on a Spark DataFrame.
df.withColumn('predictions', loaded_model(*columns)).collect()

upvoted 2 times

victorcolome

5 months, 2 weeks ago

Must be A, not E, as the question states that the variable is called "spark_df".

upvoted 2 times

victorcolome

5 months, 2 weeks ago

My bad, it is E, because the spark_udf function expects the SparkSession as its first parameter, not the DataFrame!

upvoted 4 times
