Exam DP-203 topic 1 question 49 discussion

Actual exam question from Microsoft's DP-203
Question #: 49
Topic #: 1
[All DP-203 Questions]

You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?

  • A. Use a Conditional Split transformation in an Azure Synapse data flow.
  • B. Use a Get Metadata activity in Azure Data Factory.
  • C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
  • D. Load the data by using PySpark.
Suggested Answer: C
Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools.
Serverless SQL pool enables you to query data in your data lake. It offers a T-SQL query surface area that accommodates semi-structured and unstructured data queries.
To support a smooth experience for in-place querying of data located in Azure Storage files, serverless SQL pool uses the OPENROWSET function with additional capabilities.
The easiest way to see the content of your JSON file is to provide the file URL to the OPENROWSET function and to specify CSV FORMAT.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-json-files
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage

Comments

galacaw
Highly Voted 2 years, 2 months ago
Should be D, it's about Apache Spark pool, not serverless SQL pool.
upvoted 39 times
Joanna0
Highly Voted 5 months, 4 weeks ago
Selected Answer: D
If your JSON files have a consistent structure and data types, then OPENROWSET is a good option. However, if your JSON files have a varying structure and data types, then PySpark is a better option.
upvoted 5 times
e56bb91
Most Recent 10 hours, 32 minutes ago
Selected Answer: D
Per ChatGPT 4o: Using PySpark in an Apache Spark pool within Azure Synapse Analytics is the most flexible and powerful way to handle JSON files with varying structures and data types. PySpark can infer schema and handle complex data transformations, making it well-suited for loading heterogeneous JSON data into tables while preserving the original data types.
upvoted 1 times
Okkier
1 day, 15 hours ago
Selected Answer: D
When loading data into an Apache Spark pool, especially when dealing with inconsistent file structures, PySpark (the Python API for Spark) is generally the better choice over OPENROWSET. This is because PySpark offers greater flexibility, better performance, and more robust handling of varied and complex data structures.
upvoted 1 times
kldakdlsa
1 week, 1 day ago
Should be D.
upvoted 1 times
ellala
9 months ago
Selected Answer: D
We have an "Azure Synapse Analytics Apache Spark pool"; therefore, we use Spark. There is no information about a serverless SQL pool.
upvoted 2 times
kkk5566
10 months, 1 week ago
Selected Answer: D
Should be D
upvoted 2 times
vctrhugo
1 year ago
Selected Answer: D
PySpark provides a powerful and flexible programming interface for processing and loading data in Azure Synapse Analytics Apache Spark pools. With PySpark, you can leverage its JSON reader capabilities to infer the schema and maintain the source data types during the loading process.
upvoted 3 times
vctrhugo
1 year, 1 month ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into tables in an Azure Synapse Analytics Apache Spark pool, you can use PySpark. PySpark provides a flexible and powerful framework for working with big data in Apache Spark. Therefore, the correct answer is: D. Load the data by using PySpark. You can use PySpark to read the JSON files from Azure Data Lake Storage Gen2, infer the schema, and load the data into tables in the Spark pool while maintaining the source data types. PySpark provides various functions and methods to handle JSON data and perform transformations as needed before loading it into tables.
upvoted 4 times
janaki
1 year, 1 month ago
Option D: Load the data by using PySpark
upvoted 1 times
henryphchan
1 year, 2 months ago
Selected Answer: D
The question states "You have an Azure Synapse Analytics Apache Spark pool named Pool1.", so this question is about the Spark pool.
upvoted 1 times
Victor_Kings
1 year, 2 months ago
Selected Answer: C
As stated by Microsoft, "Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools.". So even though the files in Azure Storage were created with Apache Spark, you can still query them using OPENROWSET with a serverless SQL Pool https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-spark-tables
upvoted 3 times
dgerok
2 months, 3 weeks ago
We are dealing with varying JSON. The link you've provided says nothing about that scenario. The correct answer is D.
upvoted 1 times
Tejashu
7 months, 2 weeks ago
As the question states "You need to load the files into the tables": a serverless SQL pool cannot load data into tables, so the answer should be D.
upvoted 2 times
esaade
1 year, 3 months ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into the tables in an Apache Spark pool in Azure Synapse Analytics while maintaining the source data types, you should use PySpark.
upvoted 3 times
haidebelognime
1 year, 4 months ago
Selected Answer: D
PySpark is the Python API for Apache Spark, which is a distributed computing framework that can handle large-scale data processing.
upvoted 2 times
brzhanyu
1 year, 7 months ago
Selected Answer: D
Should be D, it's about Apache Spark pool, not serverless SQL pool.
upvoted 2 times
smsme323
1 year, 9 months ago
Selected Answer: D
It's a Spark pool.
upvoted 2 times
Deeksha1234
1 year, 11 months ago
Both C and D look correct.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other