DSA-C02 Study Guide Brilliant DSA-C02 Exam Dumps PDF [Q37-Q54]

NEW QUESTION # 37
Mark the correct steps for saving the contents of a DataFrame to a Snowflake table when moving data from Spark to Snowflake.

  • A. Step 1.Use the PUT() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the NAME() method.
    Step 3.Use the dbtable option to specify the table to which data is written.
    Step 4.Specify the connector options using either the option() or options() method.
    Step 5.Use the save() method to specify the save mode for the content.
  • B. Step 1.Use the write() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Specify the connector options using either the option() or options() method.
    Step 4.Use the dbtable option to specify the table to which data is written.
    Step 5.Use the mode() method to specify the save mode for the content.
    (Correct)
  • C. Step 1.Use the PUT() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Specify the connector options using either the option() or options() method.
    Step 4.Use the dbtable option to specify the table to which data is written.
    Step 5.Use the save() method to specify the save mode for the content.
  • D. Step 1.Use the writer() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Use the dbtable option to specify the table to which data is written.
    Step 4.Specify the connector options using either the option() or options() method.
    Step 5.Use the save() method to specify the save mode for the content.

Answer: B

Explanation:
Moving Data from Spark to Snowflake
The steps for saving the contents of a DataFrame to a Snowflake table are similar to writing from Snowflake to Spark:
1. Use the write() method of the DataFrame to construct a DataFrameWriter.
2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
3. Specify the connector options using either the option() or options() method.
4. Use the dbtable option to specify the table to which data is written.
5. Use the mode() method to specify the save mode for the content.
Example
df.write
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(sfOptions)
    .option("dbtable", "t2")
    .mode(SaveMode.Overwrite)
    .save()
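
The same five steps can be sketched in PySpark. This is a minimal illustration, not the connector's reference code: the function name save_to_snowflake and its parameters are hypothetical, and sf_options is assumed to be a dict of connector settings (such as sfURL, sfUser, sfDatabase) that you supply yourself. In PySpark, write is a property and the save mode is passed as a string.

```python
# Source name of the Snowflake Spark connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def save_to_snowflake(df, sf_options, table, mode="overwrite"):
    """Write a Spark DataFrame to a Snowflake table (hypothetical helper)."""
    (
        df.write                          # 1. obtain a DataFrameWriter
        .format(SNOWFLAKE_SOURCE_NAME)    # 2. select the Snowflake connector
        .options(**sf_options)            # 3. pass the connector options
        .option("dbtable", table)         # 4. name the target table
        .mode(mode)                       # 5. choose the save mode
        .save()                           # run the write
    )
```

Calling save_to_snowflake(df, sf_options, "t2") mirrors the example above; save() only triggers the write, while mode() is what sets the save mode.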


NEW QUESTION # 38
A data scientist can query, process, and transform data in which of the following ways using Snowpark Python? [Select 2]

  • A. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
  • B. Snowpark currently does not support writing UDTFs.
  • C. Transform Data using DataIKY tool with SnowPark API.
  • D. Query and process data with a DataFrame object.

Answer: A,D

Explanation:
Query and process data with a DataFrame object. Refer to Working with DataFrames in Snowpark Python.
Convert custom lambdas and functions to user-defined functions(UDFs) that you can call to process data.
Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
Write a stored procedure that you can call to process data, or automate with a task to build a data pipeline.
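
As a rough illustration of the UDTF option, a handler for a tabular function is a class whose process method yields one tuple per output row. The class name SplitWords and its two-column output are assumptions made for this sketch; registering the handler with Snowpark needs a live session and is left out here.

```python
class SplitWords:
    """Handler for a hypothetical UDTF: one output row per word in the input."""

    def process(self, text):
        # Each yielded tuple becomes one output row with two columns.
        for position, word in enumerate(text.split(), start=1):
            yield (position, word)


# Local check of the handler logic, without a Snowflake session.
rows = list(SplitWords().process("query and process data"))
print(rows)
```

One input value produces several rows, which is what separates a tabular function from a scalar UDF.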


NEW QUESTION # 39
Which of the following are key actions in the data collection phase of machine learning?

  • A. Label
  • B. Measure
  • C. Probability
  • D. Ingest and Aggregate

Answer: A,D

Explanation:
The key actions in the data collection phase include:
Label: Labeled data is the raw data that was processed by adding one or more meaningful tags so that a model can learn from it. It will take some work to label it if such information is missing (manually or automatically).
Ingest and Aggregate: Incorporating and combining data from many data sources is part of data collection in AI.
Data collection
Collecting data for training the ML model is the basic step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. Following are some of the problems that can arise in data collection:
Inaccurate data. The collected data could be unrelated to the problem statement.
Missing data. Sub-data could be missing. That could take the form of empty values in columns or missing images for some class of prediction.
Data imbalance. Some classes or categories in the data may have a disproportionately high or low number of corresponding samples. As a result, they risk being under-represented in the model.
Data bias. Depending on how the data, subjects and labels themselves are chosen, the model could propagate inherent biases on gender, politics, age or region, for example. Data bias is difficult to detect and remove.
Several techniques can be applied to address those problems:
Pre-cleaned, freely available datasets. If the problem statement (for example, image classification, object recognition) aligns with a clean, pre-existing, properly formulated dataset, then take advantage of existing, open-source expertise.
Web crawling and scraping. Automated tools, bots and headless browsers can crawl and scrape websites for data.
Private data. ML engineers can create their own data. This is helpful when the amount of data required to train the model is small and the problem statement is too specific to generalize over an open-source dataset.
Custom data. Agencies can create or crowdsource the data for a fee.


NEW QUESTION # 40
Which is the visual depiction of data through the use of graphs, plots, and informational graphics?

  • A. Data Mining
  • B. Data Interpretation
  • C. Data Virtualization
  • D. Data visualization

Answer: D

Explanation:
Data visualization is the visual depiction of data through the use of graphs, plots, and informational graphics.
Its practitioners use statistics and data science to convey the meaning behind data in ethical and accurate ways.


NEW QUESTION # 41
Mark the incorrect statement regarding usage of Snowflake Stream & Tasks?

  • A. Snowflake automatically resizes and scales the compute resources for serverless tasks.
  • B. A standard-only stream tracks row inserts only.
  • C. Streams support repeatable read isolation.
  • D. Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a DAG) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.

Answer: B

Explanation:
All are correct except a standard-only stream tracks row inserts only.
A standard (i.e. delta) stream tracks all DML changes to the source object, including inserts, updates, and deletes (including table truncates).


NEW QUESTION # 42
Mark the incorrect statement regarding Python UDF?

  • A. A scalar function (UDF) returns a tabular value for each input row
  • B. For each row passed to a UDF, the UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows.
  • C. Python UDFs can contain both new code and calls to existing packages
  • D. A UDF also gives you a way to encapsulate functionality so that you can call it repeatedly from multiple places in code

Answer: A

Explanation:
A scalar function (UDF) returns one output row for each input row. The returned row consists of a single column/value


NEW QUESTION # 43
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10'].
What does the aggregate method shown in below code do?
g = df.groupby(df.index.str.len())
g.aggregate({'A':len, 'B':np.sum})

  • A. Computes length of column A and Sum of Column B values
  • B. Computes Sum of column A values
  • C. Computes length of column A
  • D. Computes length of column A and Sum of Column B values of each group

Answer: D

Explanation:
Computes length of column A and Sum of Column B values of each group
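
To see why the result is per group, the grouping can be emulated in plain Python. This sketch mirrors what groupby(df.index.str.len()) followed by aggregate does, but it is not pandas itself, and the values chosen for columns A and B are made up because the question does not give any.

```python
from collections import defaultdict

index = ["r1", "r2", "r3", "row4", "row5", "row6", "r7", "r8", "r9", "row10"]
# Hypothetical column values; the question does not specify them.
col_a = list(range(10))
col_b = [1.0] * 10

groups = defaultdict(lambda: {"A": [], "B": []})
for label, a, b in zip(index, col_a, col_b):
    key = len(label)  # same grouping key as df.index.str.len()
    groups[key]["A"].append(a)
    groups[key]["B"].append(b)

# One result row per group: length of A and sum of B.
result = {k: {"A": len(g["A"]), "B": sum(g["B"])} for k, g in sorted(groups.items())}
print(result)
```

The labels have lengths 2, 4, and 5, so there are three groups, and each group gets its own length of A and sum of B.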


NEW QUESTION # 44
Which of the following is not a type of window function in Snowflake?

  • A. Association functions.
  • B. Window frame functions.
  • C. Aggregation window functions.
  • D. Rank-related functions.

Answer: A,C

Explanation:
Window Functions
A window function operates on a group ("window") of related rows.
Each time a window function is called, it is passed a row (the current row in the window) and the window of rows that contain the current row. The window function returns one output row for each input row. The output depends on the individual row passed to the function and the values of the other rows in the window passed to the function.
Some window functions are order-sensitive. There are two main types of order-sensitive window functions:
Rank-related functions.
Window frame functions.
Rank-related functions list information based on the "rank" of a row. For example, if you rank stores in descending order by profit per year, the store with the most profit will be ranked 1; the second-most profitable store will be ranked 2, etc.
Window frame functions allow you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window.
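
The two order-sensitive kinds can be imitated in plain Python to show what they compute. This is only an analogy, not Snowflake SQL: the store names and figures are invented, ties share a rank in the way a RANK-style function behaves, and the moving average uses a frame of the current row plus one preceding row.

```python
from itertools import accumulate

# Hypothetical yearly profit per store.
profits = {"store_a": 120, "store_b": 300, "store_c": 300, "store_d": 80}

# Rank-related: rank 1 is the most profitable store; tied stores share a rank.
ordered = sorted(profits.items(), key=lambda kv: kv[1], reverse=True)
ranks = {}
for position, (store, profit) in enumerate(ordered, start=1):
    previous = ordered[position - 2] if position > 1 else None
    if previous is not None and previous[1] == profit:
        ranks[store] = ranks[previous[0]]
    else:
        ranks[store] = position

# Window frame: rolling operations over an ordered series.
monthly = [10, 20, 30, 40]
running_total = list(accumulate(monthly))
moving_avg = []
for i in range(len(monthly)):
    frame = monthly[max(0, i - 1): i + 1]  # current row and one preceding row
    moving_avg.append(sum(frame) / len(frame))
```

As with a window function, each input row yields exactly one output value, and that value depends on the other rows in the window.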


NEW QUESTION # 45
Which type of machine learning do data scientists generally use for solving classification and regression problems?

  • A. Supervised
  • B. Regression Learning
  • C. Unsupervised
  • D. Instructor Learning
  • E. Reinforcement Learning

Answer: A

Explanation:
Supervised Learning
Overview:
Supervised learning is a type of machine learning that uses labeled data to train machine learning models. In labeled data, the output is already known. The model just needs to map the inputs to the respective outputs.
Algorithms:
Some of the most popularly used supervised learning algorithms are:
Linear Regression
Logistic Regression
Support Vector Machine
K Nearest Neighbor
Decision Tree
Random Forest
Naive Bayes
Working:
Supervised learning algorithms take labelled inputs and map them to the known outputs, which means you already know the target variable.
Supervised Learning methods need external supervision to train machine learning models. Hence, the name supervised. They need guidance and additional information to return the desired result.
Applications:
Supervised learning algorithms are generally used for solving classification and regression problems.
A few of the top supervised learning applications are weather prediction, sales forecasting, and stock price analysis.
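
A tiny K Nearest Neighbor classifier shows the idea of learning from labeled data. The training points and the labels "low" and "high" are invented for this sketch, and a real project would normally use a library implementation instead.

```python
import math

# Labeled training data: (feature vector, known output). Values are made up.
training = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((4.0, 4.2), "high"),
    ((3.8, 4.1), "high"),
]


def predict(point, k=3):
    """Classify a point by majority vote among its k nearest labeled examples."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], point))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```

The known labels supply the supervision: the model only maps a new input to the output most common among similar labeled inputs.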


NEW QUESTION # 46
Mark the incorrect statement regarding the MIN / MAX functions.

  • A. NULL values are skipped unless all the records are NULL
  • B. For compatibility with other systems, the DISTINCT keyword can be specified as an argument for MIN or MAX, but it does not have any effect
  • C. NULL values are ignored unless all the records are NULL, in which case a NULL value is returned
  • D. The data type of the returned value is the same as the data type of the input values

Answer: C

Explanation:
NULL values are ignored unless all the records are NULL, in which case a NULL value is returned


NEW QUESTION # 47
Which of the following processes best covers all of the following characteristics?
Collecting descriptive statistics like min, max, count and sum.
Collecting data types, length and recurring patterns.
Tagging data with keywords, descriptions or categories.
Performing data quality assessment, risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.

  • A. Data Visualization
  • B. Data Collection
  • C. Data Profiling
  • D. Data Virtualization

Answer: C

Explanation:
Data processing and analysis cannot happen without data profiling, that is, reviewing source data for content and quality. As data gets bigger and infrastructure moves to the cloud, data profiling is increasingly important.
What is data profiling?
Data profiling is the process of reviewing source data, understanding structure, content and interrelationships, and identifying potential for data projects.
Data profiling is a crucial part of:
Data warehouse and business intelligence (DW/BI) projects: data profiling can uncover data quality issues in data sources, and what needs to be corrected in ETL.
Data conversion and migration projects: data profiling can identify data quality issues, which you can handle in scripts and data integration tools copying data from source to target. It can also uncover new requirements for the target system.
Source system data quality projects: data profiling can highlight data which suffers from serious or numerous quality issues, and the source of the issues (e.g. user inputs, errors in interfaces, data corruption).
Data profiling involves:
Collecting descriptive statistics like min, max, count and sum.
Collecting data types, length and recurring patterns.
Tagging data with keywords, descriptions or categories.
Performing data quality assessment, risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
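
A few of these profiling steps can be sketched for a single column in plain Python. The function name profile_column and the keys of its report are hypothetical, and the key-candidate check is deliberately simple (no nulls and no repeated values).

```python
def profile_column(values):
    """Collect simple descriptive statistics and type information for one column."""
    present = [v for v in values if v is not None]
    numeric = [v for v in present if isinstance(v, (int, float))]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "types": sorted({type(v).__name__ for v in present}),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "sum": sum(numeric) if numeric else None,
        # A column with no nulls and no repeats is a key candidate.
        "key_candidate": len(present) == len(values)
        and len(set(present)) == len(values),
    }


report = profile_column([3, 7, None, 7, 10])
print(report)
```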


NEW QUESTION # 48
Which ones are the known limitations of using External function?

  • A. External functions have more overhead than internal functions (both built-in functions and internal UDFs) and usually execute more slowly
  • B. Currently, external functions must be scalar functions. A scalar external function returns a single value for each input row.
  • C. An external function accessed through an AWS API Gateway private endpoint can be accessed only from a Snowflake VPC (Virtual Private Cloud) on AWS and in the same AWS region.
  • D. Currently, external functions cannot be shared with data consumers via Secure Data Sharing.

Answer: A,B,C,D


NEW QUESTION # 49
You are training a binary classification model to support admission approval decisions for a college degree program.
How can you evaluate if the model is fair, and doesn't discriminate based on ethnicity?

  • A. None of the above.
  • B. Compare disparity between selection rates and performance metrics across ethnicities.
  • C. Remove the ethnicity feature from the training dataset.
  • D. Evaluate each trained model with a validation dataset and use the model with the highest accuracy score.

Answer: B

Explanation:
By using ethnicity as a sensitive field, and comparing disparity between selection rates and performance metrics for each ethnicity value, you can evaluate the fairness of the model.
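
The selection-rate comparison can be sketched as follows. The group labels "a" and "b", the predictions, and the function name selection_rates are all invented for illustration; the same grouping could be applied to accuracy, precision, or recall to compare performance metrics across groups.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of positive (approve = 1) predictions within each group."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        selected[group] += pred
    return {g: selected[g] / totals[g] for g in totals}


# Hypothetical sensitive-feature values and model outputs.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
disparity = max(rates.values()) - min(rates.values())
print(rates, disparity)
```

A large gap between the groups' selection rates is a signal to investigate. Note that the sensitive feature is needed for this evaluation, so simply dropping it from the data does not demonstrate fairness.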


NEW QUESTION # 50
The most widely used metrics and tools to assess a classification model are:

  • A. Confusion matrix
  • B. Cost-sensitive accuracy
  • C. Area under the ROC curve
  • D. All of the above

Answer: D


NEW QUESTION # 51
Which metric is not used for evaluating classification models?

  • A. Recall
  • B. Mean absolute error
  • C. Precision
  • D. Accuracy

Answer: B

Explanation:
The four commonly used metrics for evaluating classifier performance are:
1. Accuracy: The proportion of correct predictions out of the total predictions.
2. Precision: The proportion of true positive predictions out of the total positive predictions (precision = true positives / (true positives + false positives)).
3. Recall (Sensitivity or True Positive Rate): The proportion of true positive predictions out of the total actual positive instances (recall = true positives / (true positives + false negatives)).
4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics (F1 score = 2 * ((precision * recall) / (precision + recall))).
Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are metrics used to evaluate a regression model. They indicate how accurate the predictions are and how far they deviate from the actual values.
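
The formulas above translate directly into code. This is a small sketch with made-up labels; the function names are hypothetical, and the classifier metrics assume binary labels where 1 is the positive class.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_absolute_error(y_true, y_pred):
    """Regression metric: average absolute deviation from the actual values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


metrics = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
mae = mean_absolute_error([3.0, 5.0], [2.5, 6.0])
```

MAE measures a numeric distance between prediction and target, which is why it belongs to regression and not to classification.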


NEW QUESTION # 52
Which type of Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series?

  • A. Hybrid Python UDFs
  • B. Scalar Python UDFs
  • C. Vectorized Python UDFs
  • D. MPP Python UDFs

Answer: C

Explanation:
Vectorized Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series. You call vectorized Python UDFs the same way you call other Python UDFs.
Advantages of using vectorized Python UDFs compared to the default row-by-row processing pattern include:
The potential for better performance if your Python code operates efficiently on batches of rows.
Less transformation logic required if you are calling into libraries that operate on Pandas DataFrames or Pandas arrays.
When you use vectorized Python UDFs:
You do not need to change how you write queries using Python UDFs. All batching is handled by the UDF framework rather than your own code.
As with non-vectorized UDFs, there is no guarantee of which instances of your handler code will see which batches of input.


NEW QUESTION # 53
Which method is used for detecting data outliers in Machine learning?

  • A. Scaler
  • B. CMIYC
  • C. Z-Score
  • D. BOXI

Answer: C

Explanation:
What are outliers?
Outliers are values that look different from the other values in the data. They can appear at both extremes of the data.
Reasons for outliers in data
Errors during data entry or a faulty measuring device (a faulty sensor may result in extreme readings).
Natural occurrence (salaries of junior level employees vs C-level employees).
Problems caused by outliers
Outliers in the data may cause problems during model fitting (esp. linear models).
Outliers may inflate the error metrics which give higher weights to large errors (example, mean squared error, RMSE).
The Z-score method is one of the methods for detecting outliers. It is generally used when a variable's distribution looks close to Gaussian. The Z-score is the number of standard deviations a value of a variable is away from the variable's mean.
Z-Score = (X-mean) / Standard deviation
The IQR method and box plots are further examples of methods used to detect outliers in data science.
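
The Z-score formula can be applied with the standard library alone. In this sketch the readings are invented, the function name zscore_outliers is hypothetical, and the cut-off of 3 standard deviations is a common convention assumed here, not something the question fixes.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [x for x in values if abs((x - mu) / sigma) > threshold]


# Hypothetical sensor readings with one faulty measurement.
readings = [10, 11, 12, 11, 10, 12, 11, 13, 11, 10,
            12, 11, 11, 12, 10, 11, 12, 11, 10, 95]
outliers = zscore_outliers(readings)
print(outliers)
```

With very small samples no value can reach a Z-score of 3, because a single extreme value also inflates the standard deviation, so the threshold should be chosen with the sample size in mind.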


NEW QUESTION # 54
......

Free DSA-C02 Test Questions Real Practice Test Questions: https://www.topexamcollection.com/DSA-C02-vce-collection.html

DSA-C02 Dumps Updated May 05, 2024 With 67 Questions: https://drive.google.com/open?id=1RoCBtOGHfdghcOx6l2pN2JBoqo7VnAFG