Exam: Certified Data Engineer Professional — Topic 1, Question #30 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 30
Topic #: 1
[All Certified Data Engineer Professional Questions]

A nightly job ingests data into a Delta Lake table using the following code:

The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.
Which code snippet completes this function definition?
def new_records():

  • A. return spark.readStream.table("bronze")
  • B. return spark.readStream.load("bronze")
  • C.
  • D. return spark.read.option("readChangeFeed", "true").table("bronze")
  • E.
Suggested Answer: D

Comments

AzureDE2522
Highly Voted 7 months, 4 weeks ago
Selected Answer: D
# not providing a starting version/timestamp will result in the latest snapshot being fetched first
spark.readStream.format("delta") \
    .option("readChangeFeed", "true") \
    .table("myDeltaTable")
Please refer: https://docs.databricks.com/en/delta/delta-change-data-feed.html
upvoted 5 times
...
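The comment above describes how Delta's change data feed works: each commit creates a new table version that records row-level changes, and a reader can request all changes from a starting version onward. A minimal conceptual sketch in plain Python (not PySpark; the `ChangeFeed` class and its data are hypothetical illustrations, not Delta Lake APIs):

```python
# Conceptual sketch of a change data feed: each table version records
# its change rows, and readers ask for changes from a starting version.
class ChangeFeed:
    def __init__(self):
        self.versions = []  # list of (version, [change rows])

    def commit(self, rows):
        """Append a new table version containing the given change rows."""
        version = len(self.versions)
        self.versions.append((version, rows))
        return version

    def read_changes(self, starting_version=0):
        """Return change rows decorated with CDF-style metadata columns."""
        out = []
        for version, rows in self.versions:
            if version >= starting_version:
                for row in rows:
                    out.append({**row, "_change_type": "insert",
                                "_commit_version": version})
        return out

feed = ChangeFeed()
feed.commit([{"id": 1}])       # version 0: initial load
v1 = feed.commit([{"id": 2}])  # version 1: the nightly batch

# A downstream step reads only the changes it has not processed yet.
new_rows = feed.read_changes(starting_version=v1)
print([r["id"] for r in new_rows])  # → [2]
```

This mirrors why the `startingVersion` option matters in the real CDF read: without tracking which version was last processed, a reader would see the full history again.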
Laraujo2022
Highly Voted 7 months, 3 weeks ago
In my opinion E is not correct because no parameters (year, month, and day) are passed into the function... the definition is simply def new_records():
upvoted 5 times
...
zhiva
Most Recent 1 week, 3 days ago
Selected Answer: A
Both E and A could be correct, but the function definition has no input parameters, so we can't use them in the return statement given only the information in the question. This is why I vote for A.
upvoted 1 times
...
imatheushenrique
1 month ago
The E option makes more sense because the whole partition would be filtered. It can't be one of the options that use CDF, because there is no readChangeFeed option in a DataFrame read.
upvoted 1 times
...
arik90
3 months, 1 week ago
Selected Answer: E
Since the ingest_daily_batch function writes to the "bronze" table in batch mode using spark.read and write operations, we should not use readStream to read from it in the subsequent function.
upvoted 1 times
...
alexvno
3 months, 3 weeks ago
Selected Answer: E
Probably E, but the filename is still not specified, only the folder path.
upvoted 1 times
...
vikram12apr
4 months ago
Selected Answer: E
Please read the question again: it asks to move data from the bronze table to some downstream table. Since it is an append-only nightly job, the filter on the file name will give the new data available in the bronze table that has not yet flowed down the pipeline.
upvoted 2 times
...
agreddy
4 months, 2 weeks ago
D is correct. https://delta.io/blog/2023-07-14-delta-lake-change-data-feed-cdf/ CDF can be enabled on a non-streaming Delta table; "delta" is the default table format.
upvoted 1 times
...
ojudz08
4 months, 3 weeks ago
Selected Answer: D
The question here is how to manipulate new records that have not yet been processed to the next table. Since the data has been ingested into the bronze table, you need to check whether the daily ingested data is already in the silver table, so I think the answer is D. Enabling change data feed allows you to track row-level changes between Delta table versions: https://docs.databricks.com/en/delta/delta-change-data-feed.html
upvoted 1 times
...
guillesd
5 months ago
The problem here is that both A and E are correct. E just follows the previous filtering logic, while A uses the readStream method, which will have to maintain a checkpoint. But both can work.
upvoted 1 times
...
DAN_H
5 months, 1 week ago
Selected Answer: A
A, as Structured Streaming incrementally reads Delta tables. While a streaming query is active against a Delta table, new records are processed idempotently as new table versions commit to the source table.
upvoted 3 times
...
adenis
5 months, 1 week ago
Selected Answer: A
A is Correct
upvoted 1 times
...
Jay_98_11
5 months, 3 weeks ago
Selected Answer: E
Can't be D since there is no such read option in CDF. https://docs.databricks.com/en/delta/delta-change-data-feed.html
upvoted 1 times
mht3336
5 months, 2 weeks ago
spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .option("endingVersion", 10) \
    .table("myDeltaTable")
upvoted 1 times
...
...
RafaelCFC
6 months ago
Selected Answer: E
E addresses the desired filtering while keeping with the logic of the first step being a batch job, and it has no code errors.
upvoted 1 times
...
alaverdi
6 months, 3 weeks ago
Selected Answer: A
In my opinion A is the correct answer. You read the Delta table as a stream and process only newly arrived records; this is maintained when writing the stream, with state stored in the checkpoint location.
spark.readStream.table("bronze") \
    .writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", "/path/to/checkpoints/") \
    .toTable("silver")
upvoted 3 times
...
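The checkpoint behavior described in the comment above is what lets a streaming read process each record exactly once: the checkpoint persists the last committed offset, so a later run resumes from where the previous one stopped. A minimal sketch in plain Python (not Structured Streaming; the function, table, and checkpoint here are hypothetical illustrations):

```python
# Conceptual sketch of checkpointed incremental reads: the checkpoint
# stores how far we have read, so each row is processed only once.
def process_new_records(table, checkpoint):
    """Return rows appended since the checkpointed offset, then advance it."""
    offset = checkpoint.get("offset", 0)
    new_rows = table[offset:]          # only records not yet processed
    checkpoint["offset"] = len(table)  # persist progress for the next run
    return new_rows

bronze = [{"id": 1}, {"id": 2}]
ckpt = {}

first = process_new_records(bronze, ckpt)   # initial run sees everything
bronze.append({"id": 3})                    # nightly batch lands
second = process_new_records(bronze, ckpt)  # next run sees only the new row

print([r["id"] for r in first], [r["id"] for r in second])  # → [1, 2] [3]
```

This is the contrast with option E: a partition filter recomputes "what is new" from column values on every run, while a checkpointed stream tracks progress itself.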
chokthewa
8 months, 3 weeks ago
E is correct. D uses an invalid option; refer to the samples in https://docs.databricks.com/en/delta/delta-change-data-feed.html. A and B don't filter, so they will gather the whole table's data. E uses the known values to filter.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
