Databricks-Machine-Learning-Associate Databricks Certified Machine Learning Associate Exam Questions and Answers

Questions 4

A data scientist has written a feature engineering notebook that uses the pandas library. As the size of the data processed by the notebook increases, the notebook's runtime increases drastically and processing becomes slow.

Which of the following tools can the data scientist use to spend the least amount of time refactoring their notebook to scale with big data?

Options:

A.

PySpark DataFrame API

B.

pandas API on Spark

C.

Spark SQL

D.

Feature Store

Questions 5

A machine learning engineering team has a Job with three successive tasks. Each task runs a single notebook. The team has been alerted that the Job has failed in its latest run.

Which of the following approaches can the team use to identify which task is the cause of the failure?

Options:

A.

Run each notebook interactively

B.

Review the matrix view in the Job's runs

C.

Migrate the Job to a Delta Live Tables pipeline

D.

Change each Task’s setting to use a dedicated cluster

Questions 6

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Options:

A.

import pyspark.pandas as ps

df = ps.DataFrame(spark_df)

B.

import pyspark.pandas as ps

df = ps.to_pandas(spark_df)

C.

spark_df.to_sql()

D.

import pandas as pd

df = pd.DataFrame(spark_df)

E.

spark_df.to_pandas()

Questions 7

A data scientist wants to explore summary statistics for the Spark DataFrame spark_df. The data scientist wants to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature.

Which of the following lines of code can the data scientist run to accomplish the task?

Options:

A.

spark_df.summary()

B.

spark_df.stats()

C.

spark_df.describe().head()

D.

spark_df.printSchema()

E.

spark_df.toPandas()
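
As a plain-Python illustration of the statistics this question lists, the sketch below computes them for a single numeric column held in a list. It does not use Spark; the function name `summarize` and the choice of the "inclusive" quartile method are assumptions made for this example, and Spark's own percentile calculations may differ in detail.

```python
import statistics

def summarize(values):
    """Return count, mean, stddev, min, quartiles, max and IQR for one column."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range is derived from the quartiles
    }

stats = summarize([1, 2, 3, 4, 5])
```

The IQR is not a separate statistic to request: it follows from the 25% and 75% quartiles, so any summary that includes those two values is enough to derive it.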

Questions 8

A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult.

Which of the following describes why?

Options:

A.

Gradient boosting is not a linear algebra-based algorithm which is required for parallelization

B.

Gradient boosting requires access to all data at once which cannot happen during parallelization.

C.

Gradient boosting calculates gradients in evaluation metrics using all cores which prevents parallelization.

D.

Gradient boosting is an iterative algorithm that requires information from the previous iteration to perform the next step.
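
A toy sketch of how a boosted ensemble is built may help when weighing these options. It assumes squared-error loss and one-feature decision stumps, and every name in it (`fit_stump`, `predict_stump`, `boost`) is hypothetical rather than taken from any library.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on xs that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must put at least one row on each side
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def boost(xs, ys, n_rounds=3, learning_rate=0.5):
    """Fit n_rounds stumps, each one to the residuals left by the earlier ones."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the mean label
    stumps = []
    for _ in range(n_rounds):
        # The residuals are computed from the running predictions,
        # so this round cannot start until the previous round has finished.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return stumps, preds

stumps, preds = boost([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
```

In this sketch the work inside one round (searching thresholds) could be split across workers, but the rounds themselves form a chain.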

Questions 9

Which of the following machine learning algorithms typically uses bagging?

Options:

A.

Gradient boosted trees

B.

K-means

C.

Random forest

D.

Linear regression

E.

Decision tree
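
For readers who want a concrete picture of bagging (bootstrap aggregating), here is a minimal sketch. The base learner is deliberately trivial (it predicts the mean of its sample), and the names `bootstrap_sample`, `fit_mean_model`, and `bagged_prediction` are made up for this example.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_mean_model(sample):
    """A deliberately trivial base learner: predict the sample mean."""
    return sum(sample) / len(sample)

def bagged_prediction(rows, n_models=10, seed=0):
    rng = random.Random(seed)
    # Each base model is fit on its own bootstrap sample and does not
    # depend on any other model, so the fits could run independently.
    models = [fit_mean_model(bootstrap_sample(rows, rng)) for _ in range(n_models)]
    # Aggregate by averaging; a classifier ensemble would take a vote instead.
    return sum(models) / len(models), models

data = [1.0, 2.0, 3.0, 4.0, 5.0]
prediction, models = bagged_prediction(data, n_models=20, seed=1)
```

The two ingredients are the resampling step and the aggregation step; an ensemble method that fits many base models this way is using bagging.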

Questions 10

Which statement describes a Spark ML transformer?

Options:

A.

A transformer is an algorithm which can transform one DataFrame into another DataFrame

B.

A transformer is a hyperparameter grid that can be used to train a model

C.

A transformer chains multiple algorithms together to transform an ML workflow

D.

A transformer is a learning algorithm that can use a DataFrame to train a model

Questions 11

A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.

Which of the following possible explanations for this difference is invalid?

Options:

A.

The second model is much more accurate than the first model

B.

The data scientist failed to exponentiate the predictions in the second model prior to computing the RMSE

C.

The data scientist failed to take the log of the predictions in the first model prior to computing the RMSE

D.

The first model is much more accurate than the second model

E.

The RMSE is an invalid evaluation metric for regression problems

Questions 12

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

Options:

A.

MLflow Experiment Tracking

B.

Spark ML

C.

Autoscaling clusters

D.

Autoscaling clusters

E.

Delta Lake

Questions 13

A health organization is developing a classification model to determine whether or not a patient currently has a specific type of infection. The organization's leaders want to maximize the number of positive cases identified by the model.

Which of the following classification metrics should be used to evaluate the model?

Options:

A.

RMSE

B.

Precision

C.

Area under the residual operating curve

D.

Accuracy

E.

Recall
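
To make the difference between two of the listed metrics concrete, the sketch below computes precision and recall from binary labels by hand. The function name `precision_recall` and the sample labels are invented for illustration.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: of the cases flagged positive, how many really are positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the cases that really are positive, how many were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Four infected patients, two of whom the model misses.
precision, recall = precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1])
```

In this sample the model finds only half of the true positive cases, which shows up in recall (0.5) rather than in precision (2/3).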

Questions 14

Which of the following approaches can be used to view the notebook that was run to create an MLflow run?

Options:

A.

Open the MLmodel artifact in the MLflow run page

B.

Click the "Models" link in the row corresponding to the run in the MLflow experiment page

C.

Click the "Source" link in the row corresponding to the run in the MLflow experiment page

D.

Click the "Start Time" link in the row corresponding to the run in the MLflow experiment page

Questions 15

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

[Code blocks A) through E) were images in the source and are not reproduced here.]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E
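
For reference, the quantity this question asks for is the standard root-mean-squared error over the n rows of preds_df. Written with the two column names from the schema in the question, it is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{prediction}_i - \mathrm{actual}_i\right)^2}
```

Any candidate code block can be checked against this definition: it needs to compare the prediction column with the actual column and report the root of the mean squared difference.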

Questions 16

A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df.

They are using the following code block to evaluate the model:

regression_evaluator.setMetricName("rmse").evaluate(preds_df)

Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?

Options:

A.

They should exponentiate the computed RMSE value

B.

They should take the log of the predictions before computing the RMSE

C.

They should evaluate the MSE of the log predictions to compute the RMSE

D.

They should exponentiate the predictions before computing the RMSE
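
The effect of the label scale on RMSE can be seen with a few made-up numbers. The sketch below is plain Python rather than Spark; the `rmse` helper and the price values are assumptions for illustration only.

```python
import math

def rmse(predictions, actuals):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

prices = [100.0, 200.0, 400.0]
# Pretend a model trained on log(price) produced these predictions.
log_preds = [math.log(110.0), math.log(190.0), math.log(380.0)]

# Error measured on the log scale: small, and not in price units.
rmse_log_scale = rmse(log_preds, [math.log(p) for p in prices])

# Map predictions back to the price scale first, then measure the error.
rmse_price_scale = rmse([math.exp(lp) for lp in log_preds], prices)
```

Here the log-scale RMSE is well under 1 while the price-scale RMSE is about 14.14, so the two numbers are only comparable once predictions and labels are on the same scale.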

Questions 17

A machine learning engineer has created a Feature Table new_table using Feature Store Client fs. When creating the table, they specified a metadata description with key information about the Feature Table. They now want to retrieve that metadata programmatically.

Which of the following lines of code will return the metadata description?

Options:

A.

There is no way to return the metadata description programmatically.

B.

fs.create_training_set("new_table")

C.

fs.get_table("new_table").description

D.

fs.get_table("new_table").load_df()

E.

fs.get_table("new_table")

Questions 18

Which of the following evaluation metrics is not suitable to evaluate runs in AutoML experiments for regression problems?

Options:

A.

F1

B.

R-squared

C.

MAE

D.

MSE

Questions 19

Which of the following machine learning algorithms typically uses bagging?

Options:

A.

Gradient boosted trees

B.

K-means

C.

Random forest

D.

Decision tree

Questions 20

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

Options:

A.

They need to specify the method parameter to the OneHotEncoder.

B.

They need to remove the line with the fit operation.

C.

They need to use StringIndexer prior to one-hot encoding the features.

D.

They need to use VectorAssembler prior to one-hot encoding the features.

Questions 21

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

[Code blocks A) through D) were images in the source and are not reproduced here.]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Questions 22

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

Options:

A.

Leave-one-out encoding

B.

Target encoding

C.

One-hot encoding

D.

Categorical

E.

String indexing
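
As an illustration of what "a series of binary indicator feature variables" looks like, the sketch below encodes a small categorical column by hand. The function name `one_hot` and the color values are invented for this example; it is not how any particular library implements the technique.

```python
def one_hot(values):
    """Return the sorted category list and one binary indicator row per value."""
    categories = sorted(set(values))
    # Each row has exactly one 1, in the position of that value's category.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, encoded = one_hot(["red", "blue", "red"])
```

With the categories ordered as ["blue", "red"], the three input values become [0, 1], [1, 0], and [0, 1].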

Exam Name: Databricks Certified Machine Learning Associate Exam
Last Update: Dec 26, 2024
Questions: 74