Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Questions 4

The code block displayed below contains an error. The code block is intended to add a column itemNameElements to DataFrame itemsDf that includes an array of all words in column itemName. Find the error.

Sample of DataFrame itemsDf:

+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+

Code block:

itemsDf.withColumnRenamed("itemNameElements", split("itemName"))

Options:

All column names need to be wrapped in the col() operator.

Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument "," needs to be passed to the split method.

Operator withColumnRenamed needs to be replaced with operator withColumn and the split method needs to be replaced by the splitString method.

Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument " " needs to be passed to the split method.

The expressions "itemNameElements" and split("itemName") need to be swapped.

Questions 5

Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?

Schema of first partition:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

Schema of second partition:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- rollId: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- tax_id: integer (nullable = false)

Options:

spark.read.parquet(filePath, mergeSchema='y')

spark.read.option("mergeSchema", "true").parquet(filePath)

spark.read.parquet(filePath)

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1
df

Answer: spark.read.option("mergeSchema", "true").parquet(filePath)

Explanation:

This is a very tricky question that requires knowledge of both schema merging and how schemas are handled when reading parquet files.

spark.read.option("mergeSchema", "true").parquet(filePath)

Correct. The DataFrameReader's mergeSchema option works well here, since columns that appear in both partitions have matching data types. Note that mergeSchema would fail if one or more columns with the same name had different data types in the two partitions.
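The merge rule described above can be illustrated with a toy sketch in plain Python. This is not Spark's implementation: each partition schema is modelled as a dict from column name to type name, and merge_schemas is a hypothetical helper introduced only for this illustration.

```python
# Toy model of schema merging; NOT how Spark implements mergeSchema.
# Each schema is a dict {column name: type name}; merge_schemas is a hypothetical helper.

def merge_schemas(first, second):
    """Return the union of two schemas, keeping each column exactly once."""
    merged = dict(first)
    for name, dtype in second.items():
        if name in merged and merged[name] != dtype:
            # Same column name with a different type: the failure case noted above.
            raise TypeError(f"conflicting types for column {name!r}")
        merged[name] = dtype
    return merged


first_partition = {
    "transactionId": "integer", "predError": "integer", "value": "integer",
    "storeId": "integer", "productId": "integer", "f": "integer",
}
second_partition = {
    "transactionId": "integer", "predError": "integer", "value": "integer",
    "storeId": "integer", "rollId": "integer", "f": "integer", "tax_id": "integer",
}

merged = merge_schemas(first_partition, second_partition)
print(list(merged))
```

In this sketch the shared columns appear once and the partition-specific columns (productId, rollId, tax_id) are all kept, which is the outcome the question asks for.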

spark.read.parquet(filePath)

Incorrect. While this would read in data from both partitions, only the schema of the parquet file that is read in first would be considered, so columns that appear only in the second partition (e.g. tax_id) would be lost.

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1

Wrong. The key idea of this solution is the DataFrame.union() command. While this command merges all data, it requires that both partitions have exactly the same number of columns with identical data types.

spark.read.parquet(filePath, mergeSchema="y")

False. While using the mergeSchema option is the correct way to solve this problem, and it can even be passed to DataFrameReader.parquet() as in the code block, it accepts the value True as a boolean or as a string. 'y' is not a valid value.

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1

No. This performs a full outer join. While the resulting DataFrame will have all columns of both partitions, columns that appear in both partitions will be duplicated, whereas the question says that all columns included in the partitions should appear exactly once.

More info: Merging different schemas in Apache Spark | by Thiago Cordon | Data Arena | Medium

Static notebook | Dynamic notebook: See test 3, question 37 (Databricks import instructions)

Questions 6

Which of the following code blocks generally causes a great amount of network traffic?

Options:

DataFrame.select()

DataFrame.coalesce()

DataFrame.collect()

DataFrame.rdd.map()

DataFrame.count()

Questions 7

Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned DataFrame?

Options:

transactionsDf.resample(0.15, False, 3142)

transactionsDf.sample(0.15, False, 3142)

transactionsDf.sample(0.15)

transactionsDf.sample(0.85, 8429)

transactionsDf.sample(True, 0.15, 8261)

Answer: transactionsDf.sample(True, 0.15, 8261)

Explanation:

Answering this question correctly depends on whether you understand the arguments to the DataFrame.sample() method (link to the documentation below). The arguments are as follows:

DataFrame.sample(withReplacement=None, fraction=None, seed=None).

The first argument, withReplacement, specifies whether a row can be drawn from the DataFrame multiple times. By default, this option is disabled in Spark. But we have to enable it here, since the question asks for a row being able to appear more than once. So, we need to pass True for this argument.

About replacement: "Replacement" is most easily explained with the example of removing random items from a box. Removing items "with replacement" means that after you have taken an item out of the box, you put it back inside. So, if you randomly take 10 items out of a box with 100 items, there is a chance that you take the same item twice or more. "Without replacement" means that you do not put the item back into the box after removing it. So, every time you remove an item from the box, there is one less item in the box and you can never take the same item twice.

The second argument to the sample() method is fraction. This refers to the fraction of items that should be returned. In the question we are asked for 150 out of 1000 items, a fraction of 0.15.

The last argument is a random seed. A random seed makes a randomized process repeatable. This means that if you re-run the same sample() operation with the same random seed, you get the same rows returned from the sample() command. The question does not specify any behavior around the random seed. The varying random seeds are only there to confuse you!
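The box analogy and the role of the seed can be tried out with Python's standard random module. This is only an analogy and not Spark code: the list population stands in for 1000 rows, and unlike random.choices and random.sample, Spark's sample() works with a fraction and does not guarantee an exact row count.

```python
import random

population = list(range(1000))  # stand-in for the ids of 1000 rows

rng = random.Random(8261)  # seeded generator, so the draws are repeatable

# With replacement: every drawn item is "put back", so duplicates are possible.
with_replacement = rng.choices(population, k=150)

# Without replacement: each item can be drawn at most once.
without_replacement = rng.sample(population, k=150)

# A fresh generator with the same seed reproduces the first draw exactly.
repeat = random.Random(8261).choices(population, k=150)

print(len(set(without_replacement)))  # always 150 distinct items
print(repeat == with_replacement)
```

The seed value itself carries no meaning; only reusing the same seed matters for repeatability.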

More info: pyspark.sql.DataFrame.sample — PySpark 3.1.1 documentation

Static notebook | Dynamic notebook: See test 1, question 49 (Databricks import instructions)

Questions 8

The code block shown below should return the number of columns in the CSV file stored at location filePath. From the CSV file, only lines should be read that do not start with a # character. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:

1. size
2. spark
3. read()
4. escape='#'
5. columns

1. DataFrame
2. spark
3. read()
4. escape='#'
5. shape[0]

1. len
2. pyspark
3. DataFrameReader
4. comment='#'
5. columns

1. size
2. pyspark
3. DataFrameReader
4. comment='#'
5. columns

1. len
2. spark
3. read
4. comment='#'
5. columns

Answer: 1. len, 2. spark, 3. read, 4. comment='#', 5. columns

Explanation:

Correct code block:

len(spark.read.csv(filePath, comment='#').columns)

This is a challenging question with difficulties in an unusual context: the boundary between the DataFrame and the DataFrameReader. It is unlikely that a question of this difficulty level appears in the exam. However, solving it helps you get more comfortable with the DataFrameReader, a subject you will likely have to deal with in the exam.

Before dealing with the inner parentheses, it is easier to figure out the outer parentheses, gaps 1 and 5. Given the code block, the object in gap 5 would have to be evaluated by the object in gap 1, returning the number of columns in the read-in CSV. One answer option includes DataFrame in gap 1 and shape[0] in gap 5. Since DataFrame cannot be used to evaluate shape[0], we can discard this answer option.

Other answer options include size in gap 1. size() is not a built-in Python command, so if we use it, it would have to come from somewhere else. pyspark.sql.functions includes a size() method, but this method only returns the length of an array or map stored within a column (documentation linked below). So, using a size() method is not an option here. This leaves us with two potentially valid answers.

We have to pick between gaps 2 and 3 being spark.read or pyspark.DataFrameReader. Looking at the documentation (linked below), the DataFrameReader is a class in the pyspark.sql module, which means that we cannot access it as pyspark.DataFrameReader. Moreover, spark.read makes sense because on Databricks, spark references the current Spark session (pyspark.sql.SparkSession) and spark.read therefore returns a DataFrameReader (also see documentation below). Finally, there is only one correct answer option remaining.
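What the correct code block computes can be mimicked with the standard library alone. The sketch below is an analogy, not the DataFrameReader: the CSV text in raw is made-up sample data, lines starting with # are skipped by hand, and the built-in len() counts the columns of the header row.

```python
import csv
import io

# Made-up sample file content; the first line is a comment line.
raw = (
    "# exported by a hypothetical tool\n"
    "itemId,itemName,supplier\n"
    "1,Outdoors Backpack,Sports Company Inc.\n"
)

# Drop lines that start with '#', then parse what is left as CSV.
data_lines = [line for line in io.StringIO(raw) if not line.startswith("#")]
rows = list(csv.reader(data_lines))

# len() is the Python built-in; here it counts the entries of the header row.
num_columns = len(rows[0])
print(num_columns)
```

In the exam's code block, len() plays the same role: DataFrame.columns is a plain Python list of column names, so the built-in can count it directly.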

More info:

- pyspark.sql.functions.size — PySpark 3.1.2 documentation

- pyspark.sql.DataFrameReader.csv — PySpark 3.1.2 documentation

- pyspark.sql.SparkSession.read — PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, question 50 (Databricks import instructions)

Questions 9

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should only be listed once.

Options:

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

itemsDf.select(~col('supplier').contains('X')).distinct()

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()

Questions 10

Which of the following is a characteristic of the cluster manager?

Options:

Each cluster manager works on a single partition of data.

The cluster manager receives input from the driver through the SparkContext.

The cluster manager does not exist in standalone mode.

The cluster manager transforms jobs into DAGs.

In client mode, the cluster manager runs on the edge node.

Questions 11

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

The column names should be listed directly as arguments to the operator and not as a list.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

The select operator should be replaced by a drop operator.

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

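Answer: The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

Explanation: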

Correct code block: transactionsDf.drop("productId", "f")

This question requires a lot of thinking to get right. To solve it, you may take advantage of the digital notepad that is provided to you during the test. You have probably seen that the code block includes multiple errors. In the test, you are usually confronted with a code block that contains only a single error. However, since you are practicing here, this challenging multi-error question will make it easier for you to deal with single-error questions in the real exam.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

Correct! Here, you need to figure out the many, many things that are wrong with the initial code block. While the question can be solved by using a select statement, a drop statement, given the answer options, is the correct one. Then, you can read in the documentation that drop does not take a list as an argument, but just the column names that should be dropped. Finally, the column names should be expressed as strings and not as Python variable names as in the original code block.

The column names should be listed directly as arguments to the operator and not as a list.

Incorrect. While this is a good first step and part of the correct solution (see above), this modification is insufficient to solve the question.

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

Wrong. If you use the same pattern as in the original code block (col(productId), col(f)), you are still making a mistake. col(productId) will trigger Python to search for the content of a variable named productId instead of telling Spark to use the column productId. For that, you need to express it as a string.
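The difference between a bare name and a string can be seen without Spark. In the sketch below, col is a hypothetical stand-in that merely echoes its argument; it is not pyspark.sql.functions.col, and it only serves to show that Python resolves the bare name before the function is ever called.

```python
def col(name):
    """Hypothetical stand-in for pyspark.sql.functions.col; it just echoes the name."""
    return f"Column<{name}>"


# A string is passed through unchanged and can be used as a column name.
as_string = col("productId")

try:
    # A bare name makes Python look up a variable called productId first.
    col(productId)
except NameError as err:
    error_message = str(err)

print(as_string)
print(error_message)
```

The call with the bare name never reaches col at all, which is why removing the col() wrappers alone does not fix the original code block.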

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

No. This still leaves you with Python trying to interpret the column names as Python variables (see above).

The select operator should be replaced by a drop operator.

Wrong, this is not enough to solve the question. If you do this, you will still face problems since you are passing a Python list to drop and the column names are still interpreted as Python variables (see above).

More info: pyspark.sql.DataFrame.drop — PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, question 30 (Databricks import instructions)

Questions 12

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

spark.read.json(filePath)

spark.read.path(filePath, source="json")

spark.read().path(filePath)

spark.read().json(filePath)

spark.read.path(filePath)

Questions 13

The code block shown below should return an exact copy of DataFrame transactionsDf that does not include rows in which values in column storeId have the value 25. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Options:

transactionsDf.remove(transactionsDf.storeId==25)

transactionsDf.where(transactionsDf.storeId!=25)

transactionsDf.filter(transactionsDf.storeId==25)

transactionsDf.drop(transactionsDf.storeId==25)

transactionsDf.select(transactionsDf.storeId!=25)

Questions 14

Which of the following describes how Spark achieves fault tolerance?

Options:

Spark helps fast recovery of data in case of a worker fault by providing the MEMORY_AND_DISK storage level option.

If an executor on a worker node fails while calculating an RDD, that RDD can be recomputed by another executor using the lineage.

Spark builds a fault-tolerant layer on top of the legacy RDD data system, which by itself is not fault tolerant.

Due to the mutability of DataFrames after transformations, Spark reproduces them using observed lineage in case of worker node failure.

Spark is only fault-tolerant if this feature is specifically enabled via the spark.fault_recovery.enabled property.

Questions 15

Which of the following describes a narrow transformation?

Options:

A narrow transformation is an operation in which data is exchanged across partitions.

A narrow transformation is a process in which data from multiple RDDs is used.

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

A narrow transformation is an operation in which data is exchanged across the cluster.

A narrow transformation is an operation in which no data is exchanged across the cluster.

Questions 16

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

itemsDf.join(transactionsDf, itemId == transactionId)

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))

Questions 17

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

Options:

itemsDf.persist(StorageLevel.MEMORY_ONLY)

itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

itemsDf.store()

itemsDf.cache()

itemsDf.write.option('destination', 'memory').save()

Questions 18

Which of the following is a problem with using accumulators?

Options:

Only unnamed accumulators can be inspected in the Spark UI.

Only numeric values can be used in accumulators.

Accumulator values can only be read by the driver, but not by executors.

Accumulators do not obey lazy evaluation.

Accumulators are difficult to use for debugging because they will only be updated once, independent if a task has to be re-run due to hardware failure.

Answer: Accumulator values can only be read by the driver, but not by executors.

Explanation:

Accumulator values can only be read by the driver, but not by executors.

Correct. So, for example, you cannot use an accumulator variable for coordinating workloads between executors. The canonical use case of an accumulator is to report data, for example for debugging purposes, back to the driver. If you wanted to count values that match a specific condition in a UDF for debugging purposes, an accumulator provides a good way to do that.

Only numeric values can be used in accumulators.

No. While PySpark's Accumulator only supports numeric values (think int and float), you can define accumulators for custom types via the AccumulatorParam interface (documentation linked below).

Accumulators do not obey lazy evaluation.

Incorrect. Accumulators do obey lazy evaluation. This has implications in practice: when an accumulator is encapsulated in a transformation, that accumulator will not be modified until a subsequent action is run.
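A rough analogy in plain Python, not Spark: the built-in map() is lazy in Python 3, so a counter that is updated inside the mapped function stays untouched until something consumes the result. The dict counter below is a made-up stand-in for an accumulator, and list() plays the role of an action.

```python
counter = {"seen": 0}  # made-up stand-in for an accumulator


def double_and_count(x):
    counter["seen"] += 1  # side effect, comparable to an accumulator update
    return x * 2


lazy = map(double_and_count, [1, 2, 3])  # nothing has run yet
before = counter["seen"]

result = list(lazy)  # consuming the iterator forces the work, like an action
after = counter["seen"]

print(before, after, result)
```

The counter only changes once the lazy pipeline is actually evaluated, which is the behavior the explanation above describes for accumulators inside transformations.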

Accumulators are difficult to use for debugging because they will only be updated once, independent if a task has to be re-run due to hardware failure.

Wrong. A concern with accumulators is in fact that under certain conditions they can be updated more than once per task. For example, if a hardware failure occurs during a task after an accumulator variable has been increased but before the task has finished, and Spark relaunches the task on a different worker in response to the failure, the accumulator increases that were already executed will be repeated.
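The double counting can be reproduced with a toy simulation in plain Python. Nothing here is Spark code: run_task, FlakyWorkerError and the dict acc are hypothetical names, and the retry is triggered by hand. In Spark itself, the RDD Programming Guide notes that updates made inside actions are applied only once per task, while updates inside transformations may be applied more than once when tasks are re-executed.

```python
class FlakyWorkerError(Exception):
    """Hypothetical error that simulates a worker failing mid-task."""


def run_task(rows, accumulator, fail_after=None):
    """Count even rows into the accumulator; optionally fail before row index fail_after."""
    for index, row in enumerate(rows):
        if fail_after is not None and index == fail_after:
            raise FlakyWorkerError("worker lost")
        if row % 2 == 0:
            accumulator["count"] += 1


rows = [2, 4, 5, 6]
acc = {"count": 0}  # made-up stand-in for an accumulator

try:
    run_task(rows, acc, fail_after=2)  # counts 2 and 4, then the worker fails
except FlakyWorkerError:
    run_task(rows, acc)  # the task is re-run from the start and counts again

true_count = sum(1 for row in rows if row % 2 == 0)
print(acc["count"], true_count)
```

The simulated accumulator ends up larger than the true number of even rows because the increments from the failed attempt were never rolled back.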

Only unnamed accumulators can be inspected in the Spark UI.

No. Currently, in PySpark, no accumulators can be inspected in the Spark UI. In the Scala interface of Spark, only named accumulators can be inspected in the Spark UI.

More info: Aggregating Results with Spark Accumulators | Sparkour, RDD Programming Guide - Spark 3.1.2 Documentation, pyspark.Accumulator — PySpark 3.1.2 documentation, and pyspark.AccumulatorParam — PySpark 3.1.2 documentation

Questions 19

The code block displayed below contains an error. The code block should display the schema of DataFrame transactionsDf. Find the error.

Code block:

transactionsDf.rdd.printSchema

Options:

There is no way to print a schema directly in Spark, since the schema can be printed easily through using print(transactionsDf.columns), so that should be used instead.

The code block should be wrapped into a print() operation.

printSchema is only accessible through the spark session, so the code block should be rewritten as spark.printSchema(transactionsDf).

printSchema is a method and should be written as printSchema(). It is also not callable through transactionsDf.rdd, but should be called directly from transactionsDf.

(Correct)

printSchema is not a method of transactionsDf.rdd. Instead, the schema should be printed via transactionsDf.print_schema().

Questions 20

Which of the following code blocks returns a new DataFrame with only columns predError and value of every second row of DataFrame transactionsDf?

Entire DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

Options:

transactionsDf.filter(col("transactionId").isin([3,4,6])).select([predError, value])

transactionsDf.select(col("transactionId").isin([3,4,6]), "predError", "value")

transactionsDf.filter("transactionId" % 2 == 0).select("predError", "value")

transactionsDf.filter(col("transactionId") % 2 == 0).select("predError", "value")

(Correct)

transactionsDf.createOrReplaceTempView("transactionsDf")
spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")

transactionsDf.filter(col(transactionId).isin([3,4,6]))

Questions 21

Which of the following describes characteristics of the Dataset API?

Options:

The Dataset API does not support unstructured data.

In Python, the Dataset API mainly resembles Pandas' DataFrame API.

In Python, the Dataset API's schema is constructed via type hints.

The Dataset API is available in Scala, but it is not available in Python.

The Dataset API does not provide compile-time type safety.

Questions 22

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

Options:

transactionsDf.sort("storeId", asc("productId"))

transactionsDf.sort(col(storeId)).desc(col(productId))

transactionsDf.order_by(col(storeId), desc(col(productId)))

transactionsDf.sort("storeId", desc("productId"))

transactionsDf.sort("storeId").sort(desc("productId"))

Questions 23

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

Options:

itemsDf.withColumn(["supplier", "manufacturer"])

itemsDf.withColumn("supplier").alias("manufacturer")

itemsDf.withColumnRenamed("supplier", "manufacturer")

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Answer: itemsDf.withColumnRenamed("supplier", "manufacturer")

Explanation:

itemsDf.withColumnRenamed("supplier", "manufacturer")

Correct! This uses the relatively trivial DataFrame method withColumnRenamed for renaming column supplier to column manufacturer.

Note that the question asks for "a copy of DataFrame itemsDf". This may be confusing if you are not familiar with Spark yet. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in the DataFrame therefore necessarily returns a copy, or a new version, of it that has the changes applied.

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Incorrect. Spark's DataFrame API does not have a withColumnsRenamed() method.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

No. Watch out: although the col() method works for many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings.

itemsDf.withColumn(["supplier", "manufacturer"])

Wrong. While DataFrame.withColumn() exists in Spark, it has a different purpose than renaming columns. withColumn is typically used to add columns to DataFrames, taking the name of the new column as a first, and a Column as a second argument. Learn more via the documentation that is linked below.

itemsDf.withColumn("supplier").alias("manufacturer")

No. While DataFrame.withColumn() exists, it requires 2 arguments. Furthermore, the alias() method on DataFrames would not help the cause of renaming a column much. DataFrame.alias() can be useful in addressing the input of join statements. However, this is far outside of the scope of this question. If you are curious nevertheless, check out the link below.

More info: pyspark.sql.DataFrame.withColumnRenamed — PySpark 3.1.1 documentation, pyspark.sql.DataFrame.withColumn — PySpark 3.1.1 documentation, and pyspark.sql.DataFrame.alias — PySpark 3.1.2 documentation (https://bit.ly/3aSB5tm , https://bit.ly/2Tv4rbE , https://bit.ly/2RbhBd2)

Static notebook | Dynamic notebook: See test 1, question 31 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/31.html , https://bit.ly/sparkpracticeexams_import_instructions)

Questions 24

The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.

Code block:

transactionsDf.agg("storeId").avg("value")

Options:

Instead of avg("value"), avg(col("value")) should be used.

The avg("value") should be specified as a second argument to agg() instead of being appended to it.

All column names should be wrapped in col() operators.

agg should be replaced by groupBy.

"storeId" and "value" should be swapped.

Questions 25

Which of the following options describes the responsibility of the executors in Spark?

Options:

The executors accept jobs from the driver, analyze those jobs, and return results to the driver.

The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.

The executors accept tasks from the driver, execute those tasks, and return results to the driver.

The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.

The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.

Questions 26

Which of the following code blocks produces the following output, given DataFrame transactionsDf?

Output:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

transactionsDf.schema.print()

transactionsDf.rdd.printSchema()

transactionsDf.rdd.formatSchema()

transactionsDf.printSchema()

print(transactionsDf.schema)

Questions 27

Which is the highest level in Spark's execution hierarchy?

Options:

Task

Executor

Slot

Job

Stage

Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Last Update: Jul 3, 2025

Questions: 180
