Databricks-Certified-Professional-Data-Scientist Databricks Certified Professional Data Scientist Exam Questions and Answers

Questions 4

Select the correct statement that applies to principal component analysis (PCA).

Options:

A.

Is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables.

B.

Is a mathematical procedure that transforms a number of (possibly) correlated variables into a (higher) number of uncorrelated variables.

C.

Increases the dimensionality of the data set.

D.

A and C are correct

E.

A and B are correct
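
As a rough illustration of what PCA does, the sketch below rotates two correlated variables onto their principal axes using the closed-form solution for a 2x2 covariance matrix. The data and the helper names (`covariance`, `pca_2d`) are hypothetical and not part of the question; this is a minimal stdlib-only sketch, not a general PCA implementation.

```python
import math
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def pca_2d(points):
    """Return the scores of 2-D points on their two principal axes."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cx = [x - mx for x in xs]  # centered first variable
    cy = [y - my for y in ys]  # centered second variable
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [a * c + b * s for a, b in zip(cx, cy)]
    pc2 = [-a * s + b * c for a, b in zip(cx, cy)]
    return pc1, pc2


# Hypothetical, strongly correlated measurements.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
pc1, pc2 = pca_2d(data)

print("cov(pc1, pc2):", covariance(pc1, pc2))  # approximately zero
print("var(pc1):", covariance(pc1, pc1))
print("var(pc2):", covariance(pc2, pc2))
```

The two score columns are uncorrelated, and almost all of the variance sits in `pc1`, so keeping only `pc1` would reduce the two original variables to one.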

Questions 5

You have data for 1,000 patients, with age in years and height in meters. You want to cluster the patients using these two attributes, and you want age and height to have a nearly equal effect on the clustering. What can you do?

Options:

A.

Add the numeric value 100 to each height value

B.

Convert each height value to centimeters

C.

Divide both age and height by their respective standard deviations

D.

Take the square root of each height value
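
To see why scale matters for distance-based clustering, the sketch below divides each attribute by its own sample standard deviation. The sample values and the helper `scale_by_std` are hypothetical and only meant to illustrate the rescaling idea.

```python
from statistics import stdev

# Hypothetical sample: ages in years, heights in meters.
ages = [25, 40, 31, 58, 63, 47]
heights = [1.62, 1.75, 1.80, 1.68, 1.71, 1.90]


def scale_by_std(values):
    """Divide every value by the sample standard deviation of the list."""
    s = stdev(values)
    return [v / s for v in values]


scaled_ages = scale_by_std(ages)
scaled_heights = scale_by_std(heights)

# Before scaling, age varies far more than height in raw units.
print("raw spread:", stdev(ages), stdev(heights))
# After scaling, both attributes have unit standard deviation.
print("scaled spread:", stdev(scaled_ages), stdev(scaled_heights))
```

With unit spread on both axes, a Euclidean distance no longer lets the attribute with the larger raw range dominate.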

Questions 6

Suppose you are given two random variables X and Y whose joint distribution is known. The marginal distribution of X is the probability distribution of X averaged over the information about Y; that is, it is the probability distribution of X when the value of Y is not known. How do you calculate the marginal distribution of X?

Options:

A.

This is typically calculated by summing the joint probability distribution over Y.

B.

This is typically calculated by integrating the joint probability distribution over Y.

C.

This is typically calculated by summing (in the case of discrete variables) the joint probability distribution over Y.

D.

This is typically calculated by integrating (in the case of continuous variables) the joint probability distribution over Y.
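
For the discrete case, marginalization is just a sum over the other variable. The sketch below uses a hypothetical joint table (the outcome labels and probabilities are made up for illustration); in the continuous case the sum would be replaced by an integral over Y.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y) of two discrete variables.
joint = {
    ("sunny", "hot"): Fraction(3, 10),
    ("sunny", "cold"): Fraction(1, 10),
    ("rainy", "hot"): Fraction(1, 5),
    ("rainy", "cold"): Fraction(2, 5),
}


def marginal_x(joint_pmf):
    """Sum the joint probabilities over every value of Y for each value of X."""
    marginal = defaultdict(Fraction)  # Fraction() is 0
    for (x, _y), p in joint_pmf.items():
        marginal[x] += p
    return dict(marginal)


p_x = marginal_x(joint)
print(p_x)  # P(X = sunny) = 2/5, P(X = rainy) = 3/5
```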

Questions 7

Which classifier assumes independence among all of its features?

Options:

A.

Neural networks

B.

Linear Regression

C.

Naive Bayes

D.

Random forests
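
The independence assumption asked about here can be made concrete with a tiny Naive Bayes calculation: the class score is the prior multiplied by one likelihood per feature, as if the features were independent given the class. All class names, features, and probabilities below are hypothetical.

```python
# Hypothetical class priors and per-feature likelihoods P(feature | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": 0.7, "all_caps": 0.5},
    "ham": {"has_link": 0.2, "all_caps": 0.1},
}


def naive_bayes_posterior(observed_features):
    """Posterior over classes, treating features as independent given the class."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed_features:
            # Independence assumption: multiply the per-feature likelihoods.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


posterior = naive_bayes_posterior(["has_link", "all_caps"])
print(posterior)
```

No joint likelihood over feature combinations is ever estimated, which is what makes the model "naive".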

Questions 8

Which technique would you use to solve the following problem statement? "What is the probability that an individual customer will not repay the loan amount?"

Options:

A.

Classification

B.

Clustering

C.

Linear Regression

D.

Logistic Regression

E.

Hypothesis testing
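
A model that answers a "what is the probability that..." question has to output a value between 0 and 1. The sketch below shows how a logistic model does this by passing a linear score through the sigmoid function. The feature names and coefficients are hypothetical, chosen only for illustration; a real model would estimate them from data.

```python
import math


def default_probability(income_k, debt_ratio,
                        intercept=-1.0, w_income=-0.03, w_debt=4.0):
    """Map a linear score to a probability with the logistic (sigmoid) function.

    The intercept and weights are made-up illustrative values.
    """
    z = intercept + w_income * income_k + w_debt * debt_ratio
    return 1.0 / (1.0 + math.exp(-z))


p = default_probability(income_k=50, debt_ratio=0.6)
print(f"Estimated probability of non-repayment: {p:.3f}")
```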

Questions 9

Which statement describes a true limitation of the logistic regression method?

Options:

A.

It does not handle redundant variables well.

B.

It does not handle missing values well.

C.

It does not handle correlated variables well.

D.

It does not have explanatory values.

Questions 10

Select the problems that can be solved using SVMs.

Options:

A.

SVMs are helpful in text and hypertext categorization

B.

Classification of images can also be performed using SVMs

C.

SVMs are also useful in medical science to classify proteins with up to 90% of the compounds classified correctly

D.

Hand-written characters can be recognized using SVM

Questions 11

Regularization is a very important technique in machine learning for preventing overfitting. Mathematically, it adds a regularization term that keeps the coefficients from fitting the training data so closely that the model overfits. The difference between L1 and L2 is...

Options:

A.

L2 is the sum of the squares of the weights, while L1 is just the sum of the weights

B.

L1 is the sum of the squares of the weights, while L2 is just the sum of the weights

C.

L1 gives non-sparse outputs, while L2 gives sparse outputs

D.

None of the above
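
The two penalty terms can be written down directly. In the usual definition, the L1 penalty sums the absolute values of the weights and the L2 penalty sums their squares; the weight vector below is hypothetical.

```python
def l1_penalty(weights):
    """L1 regularization term: sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2 regularization term: sum of the squared weights."""
    return sum(w * w for w in weights)


# Hypothetical weight vector.
weights = [0.5, -2.0, 0.0, 3.0]
print(l1_penalty(weights))  # 5.5
print(l2_penalty(weights))  # 13.25
```

In practice either term is multiplied by a regularization strength and added to the training loss.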

Questions 12

Refer to image below

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Questions 13

Which of the following are advantages of support vector machines?

Options:

A.

Effective in high dimensional spaces.

B.

Memory efficient.

C.

Possible to specify custom kernels.

D.

Effective in cases where the number of dimensions is greater than the number of samples.

E.

Still gives good performance when the number of features is much greater than the number of samples.

F.

SVMs directly provide probability estimates

Questions 14

Suppose you are given a relatively high-dimensional set of independent variables and are asked to build a model that predicts one of two possible outcomes, such as "YES" or "NO". Which of the following techniques fits best?

Options:

A.

Support vector machines

B.

Naive Bayes

C.

Logistic regression

D.

Random decision forests

E.

All of the above

Questions 15

Select the correct statement that applies to logistic regression.

Options:

A.

Computationally inexpensive, easy to implement, and its knowledge representation is easy to interpret

B.

May have low accuracy

C.

Works with numeric values

D.

Only A and C are correct

E.

A, B, and C are all correct

Questions 16

Which of the following statements is true with regard to the linear regression model?

Options:

A.

Ordinary least squares can be used to estimate the parameters in a linear model.

B.

A linear model tries to find multiple lines that approximate the relationship between the outcome and the input variables.

C.

Ordinary least squares is the sum of the individual distances between each point and the fitted line of the regression model.

D.

Ordinary least squares is the sum of the squared individual distances between each point and the fitted line of the regression model.
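
For a single input variable, the ordinary least squares fit has a closed form, and the quantity it minimizes is the sum of squared vertical distances between the points and the line. The sketch below assumes simple linear regression on hypothetical data; the function names are illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical noisy observations that roughly follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
print(slope, intercept, best)

# Any other line has a larger (or equal) sum of squared residuals.
print(sum_squared_residuals(xs, ys, slope + 0.1, intercept) >= best)
```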

Questions 17

Suppose there are three events. Which formula must always be equal to P(E1|E2,E3)?

Options:

A.

P(E1,E2,E3)P(E1)/P(E2,E3)

B.

P(E1,E2,E3)/P(E2,E3)

C.

P(E1,E2|E3)P(E2|E3)P(E3)

D.

P(E1,E2|E3)P(E3)

E.

P(E1,E2,E3)P(E2)P(E3)
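
By the definition of conditional probability, P(E1|E2,E3) = P(E1,E2,E3)/P(E2,E3) whenever P(E2,E3) > 0. The sketch below checks this identity by brute-force enumeration on a small sample space. The three events are hypothetical choices on three fair coin flips.

```python
from fractions import Fraction
from itertools import product

# Sample space: three fair coin flips, all eight outcomes equally likely.
outcomes = list(product("HT", repeat=3))


def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


# Hypothetical events.
def e1(o):
    return o.count("H") >= 2  # at least two heads


def e2(o):
    return o[0] == "H"  # first flip is heads


def e3(o):
    return o[2] == "T"  # third flip is tails


# Ratio form: P(E1, E2, E3) / P(E2, E3).
ratio = prob(lambda o: e1(o) and e2(o) and e3(o)) / prob(lambda o: e2(o) and e3(o))

# Direct form: restrict to outcomes where E2 and E3 hold, then count E1.
conditioned = [o for o in outcomes if e2(o) and e3(o)]
direct = Fraction(sum(1 for o in conditioned if e1(o)), len(conditioned))

print(ratio, direct)  # 1/2 1/2
```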

Questions 18

A user opens a website 3 times. The probability that the user clicks the advertisement exactly 2 times is best calculated by which distribution?

Options:

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above

Questions 19

A problem statement is given as below

Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?

Which of the following models will you use to solve it?

Options:

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above
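
If each patient is treated as an independent trial with a fixed recovery probability (an assumption, though a common reading of the problem), the numbers in the statement can be plugged into the binomial probability mass function. The helper `binomial_pmf` is an illustrative name.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# 75% die of the disease, so each patient recovers with probability 1/4.
p_recover = 1 - Fraction(3, 4)

# Probability that exactly 4 of 6 randomly selected patients recover.
probability = binomial_pmf(4, 6, p_recover)
print(probability)         # 135/4096
print(float(probability))  # roughly 0.033
```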

Questions 20

Select the correct option from those below.

Options:

A.

If you're trying to predict or forecast a target value, then you need to look into supervised learning.

B.

If you've chosen supervised learning, with a discrete target value like Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black, then look into classification.

C.

If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or +∞ to -∞, then you need to look into unsupervised learning

D.

If you're not trying to predict a target value, then you need to look into unsupervised learning

E.

Are you trying to fit your data into some discrete groups? If so and that's all you need, you should look into clustering.

Exam Name: Databricks Certified Professional Data Scientist Exam
Last Update: Dec 26, 2024
Questions: 138