
Wednesday, December 12, 2018

Top 25 most frequently asked SAS interview questions and answers


1. What is the Program Data Vector (PDV)? What are its functions?
  • The PDV (Program Data Vector) is a logical area in memory where SAS builds a data set one observation at a time; its function is to hold the current observation. SAS processes a DATA step in two phases: compilation and execution. During compilation, SAS creates an input buffer to hold a record when the step reads raw data from an external file, and then creates the PDV. The PDV contains every variable referenced explicitly or implicitly in the DATA step, plus two automatic variables, _N_ and _ERROR_. During execution, it is where the working values of variables are stored as the DATA step program processes them.
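To see the PDV at work, a small sketch like the one below can help. It assumes the SASHELP.CLASS sample data set is available; the output data set name and the derived variable are made up for illustration.

```sas
/* Sketch: write the PDV contents to the log on each iteration */
data work.pdv_demo;                  /* hypothetical output data set */
    set sashelp.class(obs=3);        /* read only three observations */
    ratio = weight / height;         /* hypothetical derived variable */
    put _all_;                       /* lists every PDV variable, including _N_ and _ERROR_ */
run;
```

The log shows one line per iteration, with _N_ increasing each time the DATA step loops.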

2. What is the difference between IF and WHERE?
  • IF: The subsetting IF can only be used in a DATA step. Several IF statements can be used in one DATA step. The record must be read into the program data vector before IF can perform the selection.
  • WHERE: Can be used in a DATA step as well as in a PROC. A second WHERE statement replaces the first unless WHERE ALSO is used. The data is subset before the record is read into the PDV.
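The difference can be sketched with the SASHELP.CLASS sample data set. The output data set names and the height_cm variable are assumptions made for this example.

```sas
/* WHERE filters observations before they reach the PDV */
data work.teens_where;
    set sashelp.class;
    where age >= 13;
run;

/* Subsetting IF filters after the observation is in the PDV,
   so it can test a variable created in the same step */
data work.tall_if;
    set sashelp.class;
    height_cm = height * 2.54;       /* hypothetical derived variable */
    if height_cm > 160;
run;

/* WHERE ALSO adds a condition instead of replacing the first WHERE */
proc print data=sashelp.class;
    where age >= 13;
    where also sex = 'F';
run;
```

A WHERE statement could not be used in the second step, because height_cm does not exist in the input data set.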


3. What are the ways to create a macro variable?
  • In addition to the %LET statement, other features of the macro language that create macro variables are:
    • iterative %DO statement
    • %GLOBAL statement
    • %INPUT statement
    • INTO clause of the SELECT statement in SQL
    • %LOCAL statement
    • %MACRO statement
    • SYMPUT routine in the DATA step and SYMPUTN routine in SCL
    • %WINDOW statement
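Three of the most common techniques from the list can be sketched as follows. The macro variable names are made up for illustration, and SASHELP.CLASS is assumed to be available.

```sas
/* 1. %LET statement */
%let cutoff = 13;

/* 2. INTO clause of the SELECT statement in PROC SQL */
proc sql noprint;
    select count(*) into :n_students
    from sashelp.class;
quit;

/* 3. CALL SYMPUT in a DATA step */
data _null_;
    set sashelp.class end=last;
    if last then call symput('last_name', strip(name));
run;

/* Write the values to the log */
%put cutoff=&cutoff n_students=&n_students last_name=&last_name;
```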

4. How do you convert rows to columns or columns to rows in SAS?
  • PROC TRANSPOSE is used to convert rows to columns and columns to rows.
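A minimal sketch, assuming the SASHELP.CLASS sample data set; the output names and the prefix are chosen only for this example.

```sas
/* BY-group processing needs sorted input */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* One output row per value of sex; heights become columns height_1, height_2, ... */
proc transpose data=work.class_sorted out=work.class_wide prefix=height_;
    by sex;
    var height;
run;
```

Adding an ID statement would name the new columns from the values of a variable instead of numbering them.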

5. What is the difference between using the DROP= data set option in the DATA statement and in the SET statement?
  • If you do not want to process certain variables and do not want them to appear in the new data set, specify the DROP= option in the SET statement. The variables are never read into the PDV.
    • Example: set datasetname(drop=var1 var2);
  • If you want to process certain variables but do not want them to appear in the output data set, specify the DROP= option in the DATA statement.
    • Example: data newdatasetname(drop=var1 var2);
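Both placements can be compared side by side in a short sketch. It assumes SASHELP.CLASS; the output data set names and the ratio variable are hypothetical.

```sas
/* DROP= on SET: weight is never read, so it cannot be used in this step */
data work.no_weight;
    set sashelp.class(drop=weight);
run;

/* DROP= on DATA: weight and height are available for the calculation
   but are not written to the output data set */
data work.ratio_only(drop=weight height);
    set sashelp.class;
    ratio = weight / height;         /* hypothetical derived variable */
run;
```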

6. What is the difference between the "+" operator and the SUM function?
  • The "+" operator returns a missing value if any operand is missing.
    • Example: Y = 3 + . + 2; output: Y=.
  • The SUM function returns the sum of the nonmissing arguments and ignores missing values.
    • Example: Y = sum(3, ., 2); output: Y=5

7. How many datatypes are there in SAS?
  • We have 2 datatypes in SAS: 
    • Numeric
    • Character
  • A date is stored as a numeric value.


8. How do you remove duplicate observations in SAS?
  • Here are a few techniques to remove duplicates:
    • Using the NODUP or NODUPKEY option in PROC SORT
      • Example: proc sort data=datasetname nodup; by var1; run;
    • Using the FIRST. and LAST. variables in a DATA step (the input must be sorted by the BY variable)
      • Example: 
                                data datasetname;
                                set inputdata;
                                by id;
                                if first.id; /* keep the first observation of each id */
                                run;

9. What is the difference between the INPUT and PUT functions in SAS?
  • PUT: converts a numeric value to character using a format
    • Example:
                                data put_function;
                                    pincode_num = 123456;
                                    pincode_char = put(pincode_num, 6.);
                                run;
  • INPUT: converts a character value to numeric using an informat
    • Example:
                                data input_function;
                                    salary_char = '12345678';
                                    salary_num = input(salary_char, 8.);
                                run;


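10. What is the default length of the variable returned by the SCAN function?
  • 200 characters in releases before SAS 9.4, when the result is assigned to a variable that has not yet been given a length. In SAS 9.4 the new variable takes the length of the first argument.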
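Because the default depends on the release, one way to check the behavior on your own installation is a sketch like this one. The data set and variable names are made up for illustration.

```sas
data work.scan_len;                        /* hypothetical data set */
    length full_name $ 40 first $ 15;      /* first gets an explicit length */
    full_name = 'Ada Lovelace';
    first = scan(full_name, 1, ' ');
    last  = scan(full_name, 2, ' ');       /* no LENGTH statement: SAS picks the length */
run;

/* Compare the lengths assigned to first and last */
proc contents data=work.scan_len;
run;
```

Declaring the length explicitly, as done for first, avoids depending on the default at all.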

11. Name a few SAS functions that you have worked with.
  • substr
  • find
  • intnx
  • intck
  • index
  • catx
  • sum
  • anyalnum
  • scan

12. How do SAS dates work?
  • A SAS date is stored as a numeric value: the number of days since January 1, 1960, which has the SAS date value 0. Later dates are positive and earlier dates are negative. For example, January 1, 1961 is 366, because 1960 was a leap year.
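The day counts can be checked with date literals in a DATA _NULL_ step. The variable names are chosen only for this example.

```sas
data _null_;
    d0     = '01JAN1960'd;
    d1     = '01JAN1961'd;
    before = '31DEC1959'd;
    put d0= d1= before=;     /* raw values: d0=0 d1=366 before=-1 */
    put d1= date9.;          /* same value shown with a date format: d1=01JAN1961 */
run;
```

The format changes only how the value is displayed; the stored value stays numeric.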

13. What are the default statistics of PROC MEANS?
  • N, mean, standard deviation, minimum, maximum

14. What are a few procedures that you have worked with?
  • PROC SORT: Sorts the data by the variables listed in the BY statement. With NODUPKEY, it also removes observations that have duplicate BY values.
  • PROC APPEND: Adds the observations of one data set to the end of another. The FORCE option is needed when the data sets do not match, for example when a variable has different lengths in the two data sets.
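The FORCE case can be sketched with two small data sets whose character variable has different lengths. All names here are hypothetical.

```sas
data work.base;
    length name $ 10;
    name = 'Alice';
run;

data work.new;
    length name $ 20;
    name = 'Bartholomew Jones';
run;

/* Without FORCE the append is rejected because name is longer in work.new;
   with FORCE the value is truncated to the length in the base data set */
proc append base=work.base data=work.new force;
run;
```

SAS writes a warning to the log about the length mismatch, so the log should still be checked after a forced append.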

15. What is the difference between NODUPKEY and NODUP in PROC SORT?
  • NODUP (an alias of NODUPRECS) compares all variables of each observation with the previous observation written to the output and removes it if they are identical. NODUPKEY compares only the BY variable values and keeps one observation for each unique combination of BY values.
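A small made-up data set shows the difference between the two options. The data set and variable names are assumptions for this sketch.

```sas
data work.visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

/* NODUPKEY: one observation per id, so 2 observations remain */
proc sort data=work.visits out=work.by_key nodupkey;
    by id;
run;

/* NODUPRECS: only the fully identical row is removed, so 3 observations remain */
proc sort data=work.visits out=work.by_record noduprecs;
    by id visit;
run;
```

Sorting by all variables in the NODUPRECS step places identical observations next to each other, which is what the option needs in order to detect them.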

16. What is the difference between VAR V1 - V3 and VAR V1 -- V3?
  • VAR V1 - V3 is a numbered range list: it returns the variables V1, V2 and V3.
  • VAR V1 -- V3 is a name range list: it returns all the variables from V1 through V3 in the order in which they are stored in the data set.
    • For example, if the data set has 5 variables in the order V1 name id V2 V3, then all five variables are returned.
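The five-variable example above can be reproduced with a one-row data set. The data set name and the values are made up.

```sas
/* Variables are stored in the order in which they first appear */
data work.vars;
    v1 = 1; name = 'x'; id = 10; v2 = 2; v3 = 3;
run;

proc print data=work.vars;
    var v1-v3;      /* numbered range: v1 v2 v3 */
run;

proc print data=work.vars;
    var v1--v3;     /* name range: v1 name id v2 v3 */
run;
```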

17. What is the difference between a format and an informat?
  • An informat is a specification for how raw data should be read.
  • A format is a layout specification for how a variable should be printed or displayed.
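A short sketch that uses both in one step. The data set name, variable names and the data line are invented for illustration.

```sas
data work.fmt_demo;
    /* Informats tell SAS how to read the raw text */
    input sale_date :mmddyy10. amount :comma10.;
    /* Formats tell SAS how to display the stored numeric values */
    format sale_date date9. amount dollar12.2;
    datalines;
12/12/2018 1,234.50
;
run;

proc print data=work.fmt_demo;
run;
```

Both variables are stored as plain numbers; only the way they are read and displayed differs.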

18. Explain why the double trailing @@ is used in the INPUT statement.
  • The double trailing @@ tells SAS to hold the current input record across iterations of the DATA step, so the next execution of the INPUT statement continues reading from the same record instead of moving to a new one. It is used when one raw record contains several observations.
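A minimal sketch with several observations per data line. The data set name, variable names and values are made up.

```sas
/* Each data line holds several id/score pairs;
   @@ keeps the line in the input buffer across DATA step iterations */
data work.scores;
    input id score @@;
    datalines;
1 90 2 85 3 77
4 60 5 95
;
run;
```

This step creates five observations from two data lines. Without @@, only the first pair on each line would be read.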

19. Explain DATA _NULL_.
  • DATA _NULL_ executes all the statements in the DATA step without creating a data set. It is typically used to write to the log or a file with PUT, or to create macro variables.
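A typical use is a quick calculation written to the log, sketched here with the SASHELP.CLASS sample data set. The variable name total is chosen for the example.

```sas
data _null_;
    set sashelp.class end=last;
    total + weight;                            /* sum statement: running total, retained across iterations */
    if last then put 'Total weight: ' total;   /* written to the log, no data set created */
run;
```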

20. What is the difference between PROC MEANS and PROC SUMMARY?
  • PROC MEANS: Produces a printed report in the OUTPUT window by default. If there is no VAR statement, it analyzes all the numeric variables.
  • PROC SUMMARY: Produces a printed report only when the PRINT option is specified in the PROC statement. It analyzes the variables listed in the VAR statement; without a VAR statement it only counts observations.
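The two procedures can be compared on the SASHELP.CLASS sample data set. The output data set and the statistic name are assumptions made for this sketch.

```sas
/* PROC MEANS: prints by default and analyzes all numeric variables */
proc means data=sashelp.class;
run;

/* PROC SUMMARY: needs PRINT to produce a report */
proc summary data=sashelp.class print;
    var height weight;
run;

/* PROC SUMMARY is more often used to write statistics to a data set */
proc summary data=sashelp.class nway;
    class sex;
    var height;
    output out=work.height_stats mean=mean_height;
run;
```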

21. Mention SAS system options to debug SAS macros.
  • MLOGIC
  • MPRINT
  • SYMBOLGEN
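The options can be turned on around a macro call and off again afterwards. The macro below is a hypothetical example written only to give the options something to trace.

```sas
options mprint mlogic symbolgen;

%macro show_rows(ds=, n=5);          /* hypothetical macro */
    %if &n > 0 %then %do;
        proc print data=&ds(obs=&n);
        run;
    %end;
%mend show_rows;

%show_rows(ds=sashelp.class, n=3)

/* Switch the tracing off to keep later logs short */
options nomprint nomlogic nosymbolgen;
```

MPRINT shows the SAS code the macro generates, MLOGIC traces the macro's execution flow, and SYMBOLGEN shows how each macro variable resolves.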


22. What is the difference between SYMPUT and SYMGET?
  • SYMPUT: a CALL routine that stores a value from the DATA step in a macro variable.
  • SYMGET: a function that returns the value of a macro variable to the DATA step during execution.
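A round trip through both can be sketched as follows. The macro variable name, data set name and new variables are made up, and SASHELP.CLASS is assumed to be available.

```sas
/* DATA step value -> macro variable */
data _null_;
    call symput('max_age', '16');
run;

/* Macro variable -> DATA step value */
data work.flagged;
    set sashelp.class;
    limit = input(symget('max_age'), 8.);   /* SYMGET returns character, so convert it */
    over_limit = (age >= limit);            /* 1 if true, 0 if false */
run;
```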

23. What are the programming errors that you have committed?
  • Not checking the log after submitting a program
  • Missing semicolon
  • Ending PROC SQL with a RUN statement instead of QUIT

24. What is the difference between the SAS DATA step and SAS PROCs?
  • The SAS DATA step is used to read in and manipulate data.
  • SAS PROCs are prebuilt routines that perform tasks on a SAS data set, such as sorting, reporting and analysis.

25. What is the use of the %INCLUDE statement?
  • The %INCLUDE statement reads an entire file into the current SAS program and submits it to the SAS System immediately.
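A minimal sketch of the statement. The file path is hypothetical and has to be replaced with a real program file.

```sas
/* Run a shared setup program before the rest of the code;
   the path below is a placeholder */
%include '/path/to/setup.sas' / source2;
```

The SOURCE2 option writes the included statements to the log, which makes it easier to see what was run.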



