Exam DP-203 topic 1 question 49 discussion

Actual exam question from Microsoft's DP-203
Question #: 49
Topic #: 1
[All DP-203 Questions]

You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?

  • A. Use a Conditional Split transformation in an Azure Synapse data flow.
  • B. Use a Get Metadata activity in Azure Data Factory.
  • C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
  • D. Load the data by using PySpark.
Suggested Answer: C
Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools.
Serverless SQL pool enables you to query data in your data lake. It offers a T-SQL query surface area that accommodates semi-structured and unstructured data queries.
To support a smooth experience for in-place querying of data located in Azure Storage files, serverless SQL pool uses the OPENROWSET function with additional capabilities.
The easiest way to see the content of your JSON file is to provide the file URL to the OPENROWSET function and to specify CSV FORMAT.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-json-files
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage

Comments

galacaw
Highly Voted 2 years, 2 months ago
Should be D, it's about Apache Spark pool, not serverless SQL pool.
upvoted 39 times
Joanna0
Highly Voted 5 months, 4 weeks ago
Selected Answer: D
If your JSON files have a consistent structure and data types, then OPENROWSET is a good option. However, if your JSON files have a varying structure and data types, then PySpark is a better option.
upvoted 5 times
e56bb91
Most Recent 10 hours, 32 minutes ago
Selected Answer: D
Per ChatGPT 4o: Using PySpark in an Apache Spark pool within Azure Synapse Analytics is the most flexible and powerful way to handle JSON files with varying structures and data types. PySpark can infer schema and handle complex data transformations, making it well-suited for loading heterogeneous JSON data into tables while preserving the original data types.
upvoted 1 times
Okkier
1 day, 15 hours ago
Selected Answer: D
When loading data into an Apache Spark pool, especially when dealing with inconsistent file structures, PySpark (the Python API for Spark) is generally the better choice over OPENROWSET. This is because PySpark offers greater flexibility, better performance, and more robust handling of varied and complex data structures.
upvoted 1 times
kldakdlsa
1 week, 1 day ago
Should be D.
upvoted 1 times
ellala
9 months ago
Selected Answer: D
We have an "Azure Synapse Analytics Apache Spark pool"; therefore, we use Spark. There is no information about a serverless SQL pool.
upvoted 2 times
kkk5566
10 months, 1 week ago
Selected Answer: D
Should be D
upvoted 2 times
vctrhugo
1 year ago
Selected Answer: D
PySpark provides a powerful and flexible programming interface for processing and loading data in Azure Synapse Analytics Apache Spark pools. With PySpark, you can leverage its JSON reader capabilities to infer the schema and maintain the source data types during the loading process.
upvoted 3 times
vctrhugo
1 year, 1 month ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into tables in an Azure Synapse Analytics Apache Spark pool, you can use PySpark. PySpark provides a flexible and powerful framework for working with big data in Apache Spark. Therefore, the correct answer is: D. Load the data by using PySpark. You can use PySpark to read the JSON files from Azure Data Lake Storage Gen2, infer the schema, and load the data into tables in the Spark pool while maintaining the source data types. PySpark provides various functions and methods to handle JSON data and perform transformations as needed before loading it into tables.
upvoted 4 times
janaki
1 year, 1 month ago
Option D: Load the data by using PySpark
upvoted 1 times
henryphchan
1 year, 2 months ago
Selected Answer: D
The question states "You have an Azure Synapse Analytics Apache Spark pool named Pool1.", so this question is about the Spark pool.
upvoted 1 times
Victor_Kings
1 year, 2 months ago
Selected Answer: C
As stated by Microsoft, "Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools.". So even though the files in Azure Storage were created with Apache Spark, you can still query them using OPENROWSET with a serverless SQL Pool https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-spark-tables
upvoted 3 times
dgerok
2 months, 3 weeks ago
We are dealing with varying JSON. The link you've provided says nothing about that scenario. The correct answer is D.
upvoted 1 times
Tejashu
7 months, 2 weeks ago
As the question states "You need to load the files into the tables": a serverless SQL pool cannot load data into tables, so the answer should be D.
upvoted 2 times
esaade
1 year, 3 months ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into the tables in an Apache Spark pool in Azure Synapse Analytics while maintaining the source data types, you should use PySpark.
upvoted 3 times
haidebelognime
1 year, 4 months ago
Selected Answer: D
PySpark is the Python API for Apache Spark, which is a distributed computing framework that can handle large-scale data processing.
upvoted 2 times
brzhanyu
1 year, 7 months ago
Selected Answer: D
Should be D, it's about Apache Spark pool, not serverless SQL pool.
upvoted 2 times
smsme323
1 year, 9 months ago
Selected Answer: D
It's a Spark pool.
upvoted 2 times
Deeksha1234
1 year, 11 months ago
Both C and D look correct.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other