Wednesday, February 13, 2019

Free SAS Tutorials: Dataset creation


Dataset

A SAS dataset is generally organized as a table of rows and columns. In SAS, rows are called observations and columns are called variables.

Dataset creation

The following statements are used to create a dataset:

  • DATA - The DATA step always begins with a DATA statement, which tells SAS that you are creating a new dataset and gives it a name, e.g. outdata.
  • INPUT - Defines the variables to be read into the dataset.
  • Dollar sign ($) - Placed after a variable name in the INPUT statement to identify that variable as character; variables without it are read as numeric.
  • DATALINES - Indicates that the lines following the DATALINES statement are the actual data.
  • RUN - The DATA step ends with a RUN statement.


Example 1:


data datasetname;
     /* var1 is numeric; var2 is character because of the $ */
     input var1 var2 $;
     datalines;
1 anil
2 raj
3 ravi
4 neetu
;
run;
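
To check what Example 1 produced, the dataset can be printed and its variable attributes listed. This is a minimal sketch, assuming Example 1 has already been run in the same SAS session so that datasetname exists in the WORK library:

```sas
/* List the observations of the dataset created in Example 1 */
proc print data=datasetname;
run;

/* Show the variable names, types and lengths */
proc contents data=datasetname;
run;
```

PROC CONTENTS is a quick way to confirm that var1 was read as numeric and var2 as character.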


Example 2:


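data Test;
/* Column input: each variable is read from a fixed range of columns */
input Item $ 1-6 Color $ 8-14 Investment 16-22 Profit 24-31;
/* Display formats: character width 9; numeric width 15 with 2 decimals */
format Item Color $9. Investment Profit 15.2;
datalines;
SHIRTS ORANGE  2431354 83952431
TIES   BLUE    498432  2349123
SUITS  BLUE    9482121 69839123
BELTS  MAGENTA 7693    14893
SHOES  MAGENTA 7936712 22956
;
run;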

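The INPUT statement in this example shows another way of defining variables, known as column input: each variable name is followed by the range of column positions it occupies in the data lines (for example, Item is read from columns 1-6).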
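
One practical reason to use column positions is that a character value may contain a blank, which the simple space-separated input of Example 1 would split into two values. The sketch below illustrates this; the dataset name people and its variables are hypothetical and chosen only for illustration:

```sas
data people;   /* hypothetical dataset name */
    /* Name occupies columns 1-10, Age columns 12-13 */
    input Name $ 1-10 Age 12-13;
    datalines;
Anil Kumar 34
Neetu      29
;
run;

proc print data=people;
run;
```

Because Name is read from a fixed column range, the blank inside "Anil Kumar" is kept as part of the value. The data lines must stay aligned to the stated columns for this to work.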


Datasets are of two types: temporary and permanent.

Temporary dataset:

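These datasets are not available after the SAS session ends. By default they are stored in the WORK library (a storage location for SAS datasets that is cleared when the session ends).

The following two statements are equivalent, because SAS uses the WORK library by default when no library is specified: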

data work.sample;
or
data sample;
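
As a rough illustration of this equivalence, the sketch below creates a dataset with a one-level name and then reads it back with the two-level name. The variable names and values are made up for the example:

```sas
data sample;                     /* stored as work.sample by default */
    input id name $;
    datalines;
1 anil
2 raj
;
run;

proc print data=work.sample;     /* same dataset, referenced with its two-level name */
run;
```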

Permanent dataset:

These datasets are saved in a location that remains available after the SAS session is closed. They are usually referenced as libref.dataset_name. A libref is a name that is temporarily assigned to a library for the duration of the session, and a library is a storage location where SAS datasets are saved.

A LIBNAME statement associates the libref with the SAS library. In the following PROC PRINT step, PROCLIB is the library reference and EMP is the SAS data set within the library:
libname proclib 'SAS-library';
proc print data=proclib.emp;
run;
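
The snippet above only reads an existing dataset. Creating a permanent dataset follows the same pattern with a two-level name in the DATA statement. This is a sketch under assumptions: the libref mylib and the path '/path/to/folder' are placeholders, and the path must be replaced with a directory that exists and is writable on your system:

```sas
/* '/path/to/folder' is a placeholder, not a real location */
libname mylib '/path/to/folder';

data mylib.emp;                  /* written to the folder behind mylib */
    input id name $;
    datalines;
1 anil
2 raj
;
run;

/* In a later session, assign the libref again to reach the stored dataset */
libname mylib '/path/to/folder';
proc print data=mylib.emp;
run;
```

The libref itself does not survive the session, but the dataset file does, which is why the LIBNAME statement is repeated before the dataset is read again.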

Dataset creation using PROC SQL:


PROC SQL can be used to create a dataset either by connecting to an external database such as Oracle or from an existing dataset.

Example: Creating the dataset sample from the existing dataset sashelp.cars

Proc sql;
create table sample as
select * from sashelp.cars;
quit;
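
The same approach can create a dataset that holds only some columns and rows of the source. The sketch below assumes the Make, Model and MSRP columns of sashelp.cars as shipped with SAS; the table name toyota is hypothetical:

```sas
proc sql;
    /* Keep three columns and only the rows for one make */
    create table toyota as
    select Make, Model, MSRP
    from sashelp.cars
    where Make = 'Toyota';
quit;
```

Since toyota is a one-level name, the result is a temporary dataset in the WORK library; using a two-level name such as libref.toyota would store it permanently.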