Exam Certified Machine Learning Professional topic 1 question 41 discussion

Actual exam question from Databricks's Certified Machine Learning Professional
Question #: 41
Topic #: 1

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

  • A.
  • B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
  • C.
  • D.
  • E.
Suggested Answer: D

Comments

64934ca
1 day, 13 hours ago
Selected Answer: E
The SparkSession is passed as the first argument to mlflow.pyfunc.spark_udf so that the UDF can be created and executed within the Spark environment. The model_uri is passed as the second argument to specify which MLflow model to load and use for predictions. The function's signature defines this order.
upvoted 1 times
spaceexplorer
5 months ago
Selected Answer: E
E is correct
upvoted 3 times
JaydeepT
5 months, 1 week ago
Selected Answer: A
spark_df is the frame to be used for variable evaluation in runtime
upvoted 1 times
BokNinja
6 months, 3 weeks ago
E.
    import mlflow
    logged_model = 'runs:/e905f5759d434a131bbe1e54a2b/best-model'
    # Load model as a Spark UDF.
    loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)
    # Predict on a Spark DataFrame.
    df.withColumn('predictions', loaded_model(*columns)).collect()
upvoted 2 times
victorcolome
5 months, 2 weeks ago
Must be A, not E, as the question states that the variable is called "spark_df".
upvoted 2 times
victorcolome
5 months, 2 weeks ago
My bad, it is E, because the spark_udf function expects the SparkSession as its first parameter, not the DataFrame!
upvoted 4 times
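For anyone comparing the options, here is a minimal sketch of the argument order discussed above: the SparkSession goes first and the model URI second. The helper names model_uri_for_run and make_predict_udf are hypothetical, made up for illustration. The sketch also assumes the model was logged under the artifact path random_forest_model, as the question states, and that every column of spark_df is a model feature.

```python
def model_uri_for_run(run_id: str, artifact_path: str = "random_forest_model") -> str:
    """Build the runs:/ URI for a model logged under the given run (hypothetical helper)."""
    return f"runs:/{run_id}/{artifact_path}"


def make_predict_udf(spark, run_id: str):
    """Wrap the logged model as a Spark UDF (hypothetical helper).

    The SparkSession is the first argument to spark_udf, the model URI the second.
    """
    # Imported lazily so the URI helper above can be used without MLflow installed.
    import mlflow.pyfunc

    return mlflow.pyfunc.spark_udf(spark, model_uri=model_uri_for_run(run_id))


# Usage on a cluster (assumes `spark`, `run_id`, and `spark_df` already exist,
# and that every column of spark_df is a model feature):
#   predict = make_predict_udf(spark, run_id)
#   scored_df = spark_df.withColumn("prediction", predict(*spark_df.columns))
```

Passing spark_df in place of spark would hand a DataFrame to a parameter that expects a SparkSession, which is the confusion between A and E in this thread.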