
Tuesday, December 18, 2018

Free SAS tutorials: SAS software without installation

SAS OnDemand for Academics is free software that requires no installation. It runs in the cloud, so an internet connection is required, and it can be accessed from anywhere.


Steps for accessing SAS OnDemand for Academics:


  1. Register yourself by clicking on the link: SAS Registration
  2. Enter your First Name, Last Name, and Email Address in the form. Select the Country in which you reside. Click Submit.
  3. You will receive an email from the SAS team to activate your profile
  4. Click on the link in that email and set your password, entering your email ID when asked
  5. You will receive another email with your user ID and the link to access SAS Studio

Click on the link to sign in: Sign In to SAS

Once you log in, the screen below appears:


Advantages of the software:

  • It's free
  • Can be accessed from anywhere
  • No installation is required

The following products are available on the software:

  • Base SAS
  • SAS/STAT software 
  • SAS/GRAPH software
  • SAS/ETS software 
  • SAS/OR software, including OPT, PRS, IVS, and LSO 
  • SAS/IML software 
  • SAS/CONNECT 
  • SAS High-Performance Forecasting 
  • SAS/ACCESS Interface to PC Files 
  • SAS/QC software

Browsers supported:

  • Microsoft Internet Explorer 9, 10, and 11 (Microsoft Internet Explorer 10 or later is recommended for certain features, such as the ability to drag and drop files) 
  • Mozilla Firefox 21 or later 
  • Google Chrome 27 or later 
  • Apple Safari 6.0 or later (on Mac OS X) 


Please feel free to drop your comments. If you need 1-1 online training on data science, SAS & SQL, drop a comment below.


Subscribe for Email to get free updates on Data Science.

Monday, December 17, 2018

Data Science summarized in ONE picture





Wednesday, December 12, 2018

Case Study : CHICAGO CRIME DATA ANALYSIS

Problem Statement:  

The location below holds historical crime data for the city of Chicago. The data has attributes such as location, date/time, address, and zip code. The requirement is to build a visual insights dashboard from the existing data to support better prevention and policing. You can use any visualization tool you wish.

Solution:

Business Analysis:

There are many visualization tools available in the market. I used Tableau for the visualization. To draw insights from this data, I built the charts below.
    • Crime Trending Analysis: The goal is to understand the periods during which more crimes occur than in the rest of the year.
    • Type and Location Trends: Top 5 locations with the most crimes.
    • Location Analysis: Sub-location crime analysis. The goal is to understand exactly where the police have to pay attention during patrolling.
    • Type Analysis: Different types of crimes at different locations.
    • Geo Analysis: A geo chart with spots that supports the location analysis.
    • Zipcode Analysis: Crime analysis by zip code.
    • Community Area Analysis: Identify the top few communities by crime rate.
    • Arrest Analysis: Provide insights into where arrests are happening. The police can compare the arrest analysis with the geo analysis and streamline their operations accordingly.

Visualization Outputs:












From the graph, we can see that theft happens mostly on streets, while battery happens mostly on sidewalks and in residences, apartments, etc.



















Summary:


From the above graphs, the following insights can be drawn:
  • Most crimes happen in the middle of the year (May, June, July, and August) and during the evening hours after 6 PM. Crimes are also observed around noon.
  • 60% of the crimes happen in these locations: streets, residences, apartments, and sidewalks.
  • 60% of the crimes are theft, battery, criminal damage, or narcotics.
  • Policing needs to concentrate on the beats within each zip code or district.
  • Police should concentrate more on arrest performance and find new techniques to arrest criminals. Only 26% of the criminals were arrested in 2015.

Where should Chicago Police Focus?

  • Type Vs Location focus?
    • Street <=> Theft
    • Apartment <=> Battery
    • Sidewalk <=> Narcotics
  • Which places to focus?
    • Zipcode: 0000X, 001XX, 002XX
    • Community: 25
  • When to have increased policing?
    • May - October: 12:00 PM, 18:00 - 21:00
    • Arrests : Focus on all types excluding Narcotics


      Statistical Modeling ideas:

      •        Use the past/historical data (i.e. 15 years) to predict future crime.


      •        A few of the advanced analytics we can perform are:
        •        Clustering
        •        Hypothesis testing
        •        Trending analysis
        •        Pattern detection


      •        Identifying the trends in the crimes can help police monitor those areas.


      •        Logistic regression can be used to predict whether a person is a criminal or not.


      •        Pattern analysis can help in identifying the patterns followed by criminals.


      •        A model can be built to predict the occurrence of crime.


      •        Data required for more analysis:
        •        Information on criminal age and background, gender, race can help in identifying the crime trends.
        •        Census data to understand the population by area. Using this data we can understand the crime and population relation.




      Top 25 most frequently asked SAS interview questions and answers


      1. What is the Program Data Vector (PDV)? What are its functions?
      • The PDV (Program Data Vector) is a logical area in memory where SAS builds a dataset one observation at a time; its function is to hold the current observation. SAS processes a DATA step in two phases: the compilation phase and the execution phase. During the compilation phase, the input buffer is created to hold a record from an external file, and then the PDV is created. The PDV is a set of buffers that includes all variables referenced either explicitly or implicitly in the DATA step, plus two automatic variables, _N_ and _ERROR_. It is created at compile time, then used at execution time as the location where the working values of variables are stored as the DATA step program processes them.

      2. Difference between IF and WHERE:

      • IF: the subsetting IF can be used only in a DATA step. Many IF statements can be used in one DATA step. The record must be read into the program data vector before IF can perform the selection.
      • WHERE: can be used in a DATA step as well as in a PROC. A second WHERE replaces the first unless the ALSO option is used. The data is subset before the record is read into the PDV.
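
The difference can be sketched as below. The dataset work.scores and its values are made up for illustration only.

```sas
/* Hypothetical input dataset for illustration */
data work.scores;
    input name $ score;
    datalines;
Ann 82
Bob 45
Cal 91
;
run;

/* WHERE: observations are filtered before they are read into the PDV */
data work.pass_where;
    set work.scores;
    where score >= 50;
run;

/* Subsetting IF: every observation is read into the PDV, then tested */
data work.pass_if;
    set work.scores;
    if score >= 50;
run;

/* WHERE also works inside a procedure; a subsetting IF does not */
proc print data=work.scores;
    where score >= 50;
run;
```

Both DATA steps should produce the same two observations here; the difference lies in when the selection happens.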


      3. Ways to create macro variable
      • In addition to the %LET statement, other features of the macro language that create macro variables are:
        • iterative %DO statement
        • %GLOBAL statement
        • %INPUT statement
        • INTO clause of the SELECT statement in SQL
        • %LOCAL statement
        • %MACRO statement
        • SYMPUT routine and SYMPUTN routine in SCL
        • %WINDOW statement
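
A minimal sketch of three of these methods follows. The macro variable names (cutoff, run_date, n_rows) are made up for illustration; sashelp.class is a sample dataset shipped with SAS.

```sas
/* 1. %LET assigns a value directly */
%let cutoff = 50;

/* 2. CALL SYMPUT creates a macro variable from a DATA step value */
data _null_;
    call symput('run_date', put(today(), date9.));
run;

/* 3. The INTO clause in PROC SQL stores a query result */
proc sql noprint;
    select count(*) into :n_rows trimmed
    from sashelp.class;
quit;

/* Write the three values to the log */
%put cutoff=&cutoff run_date=&run_date n_rows=&n_rows;
```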

      4. How to convert rows to columns/ columns to rows in SAS
      • PROC TRANSPOSE is used to convert rows to columns and columns to rows
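
As a rough illustration, the sketch below turns one row per id and subject into one row per id. The dataset and variable names are hypothetical.

```sas
/* Hypothetical long-format data: one row per id and subject */
data work.long;
    input id $ subject $ marks;
    datalines;
A math 90
A art 75
B math 60
B art 88
;
run;

/* BY-group processing needs the data sorted by id */
proc sort data=work.long;
    by id;
run;

/* Each subject value becomes a column; one output row per id */
proc transpose data=work.long out=work.wide(drop=_name_);
    by id;
    id subject;
    var marks;
run;
```

The output dataset work.wide should contain the columns id, math and art.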

      5. What is the difference between using drop = data set option in data statement and set statement?
      • If you do not want to process certain variables and do not want them to appear in the new dataset, specify the drop= option in the SET statement. The variables are never read into the DATA step.
        • Example: set datasetname(drop= var1 var2);
      • If you want to process certain variables but do not want them to appear in the output dataset, specify the drop= option in the DATA statement.
        • Example: data newdatasetname(drop=var1 var2);
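
A short sketch of both placements, using the sashelp.class sample dataset. The output dataset names and the size_ratio calculation are made up for illustration.

```sas
/* drop= on SET: height and weight are never read, */
/* so they cannot be used inside this DATA step    */
data work.no_measures;
    set sashelp.class(drop=height weight);
run;

/* drop= on DATA: height and weight are available for the */
/* calculation but are not written to the output dataset  */
data work.ratio_only(drop=height weight);
    set sashelp.class;
    size_ratio = weight / height;
run;
```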

      6. What is the difference between "+" operator and SUM function?
      • "+" operator returns the output as missing if there are any missing values in the data
        • Example: Y= 3 + . + 2 output: Y=.
      • SUM function returns the sum of non missing values even if there are missing values in the data
        • Example: Y= sum(3 , . ,2) output: Y=5
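
The two results can be checked in the log with a small DATA _NULL_ step. The variable names y_plus and y_sum are chosen for illustration.

```sas
data _null_;
    y_plus = 3 + . + 2;      /* a missing operand makes the result missing */
    y_sum  = sum(3, ., 2);   /* SUM ignores missing arguments */
    put y_plus= y_sum=;      /* log shows: y_plus=. y_sum=5 */
run;
```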

      7. How many datatypes are there in SAS?
      • We have 2 datatypes in SAS: 
        • Numeric
        • Character
      • Date is considered as numeric datatype.


      8. How to remove duplicate observations in SAS?
      • Here are a few techniques to remove duplicates:
        • Using the NODUP or NODUPKEY option in PROC SORT
          • Example: proc sort data=datasetname nodup; by var1; run;
        • Using the first. and last. variables
          • Example: 
                                      /* inputdata must already be sorted by id */
                                      data datasetname;
                                      set inputdata;
                                      by id;
                                      /* keep only the first observation of each id */
                                      if first.id;
                                      run;

              9. What is the difference between input and put function in SAS?
              • Put : converts numeric to character
                • Example:
                                      data put_function;
                                          pincode_num= 123456;
                                          pincode_char= put(pincode_num, 6.);
                                      run;
              • Input :  converts character to numeric
                • Example:
                                        data input_function;
                                            salary_char= '12345678';
                                            salary_num= input(salary_char, 8.);
                                        run;     


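                  10. What is the default length when using the SCAN function?
                  • In a DATA step, a variable that receives the result of SCAN gets a length of 200 if its length has not been defined earlier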

                  11. Name a few SAS functions that you have worked with.
                  • substr
                  • find
                  • intnx
                  • intck
                  • index
                  • catx 
                  • sum
                  • anyalnum
                  • scan

                  12. How do SAS dates work?
                  • A SAS date is stored as a numeric value. Jan 1, 1960 has the SAS date value 0. Any date after this is the number of days from this date, and earlier dates are negative. For example, Jan 1, 1961 is 366, because 1960 was a leap year.
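
The stored values can be inspected with date literals. The variable names d0 and d1 are chosen for illustration.

```sas
data _null_;
    d0 = '01JAN1960'd;   /* date literal for the reference date */
    d1 = '01JAN1961'd;
    put d0= d1=;         /* raw numeric values: d0=0 d1=366 */
    put d1= date9.;      /* same value shown with a date format: d1=01JAN1961 */
run;
```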

                  13. What are the default statistics of PROC MEANS?
                  • N, mean, standard deviation, minimum, maximum

                  14. What are a few procedures that you have worked with?
                    • PROC SORT: Sorts the data by the variables listed in the BY statement. Using NODUPKEY, we can remove duplicates.
                    • PROC APPEND: Adds one dataset to another. Use the FORCE option when the variables differ between the two datasets, for example in length.

                    15. What is the difference between nodupkey and nodup in sort procedure?
                    • NODUP checks whole observations and removes one that is identical to the observation just before it. NODUPKEY compares only the BY variable values and removes any observation whose BY values repeat those of an observation already kept.
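
A small sketch of the difference, using a made-up dataset work.visits. NODUPRECS is the full name of the NODUP option.

```sas
/* Hypothetical data: id 1 has one exact duplicate row */
data work.visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

/* NODUPRECS (alias NODUP): drops a row only when every */
/* variable matches the previously written row          */
proc sort data=work.visits out=work.no_dup_rows noduprecs;
    by id;
run;

/* NODUPKEY: keeps just one row per BY value */
proc sort data=work.visits out=work.no_dup_keys nodupkey;
    by id;
run;
```

With this input, the first sort should keep three rows (1 A, 1 B, 2 A) and the second should keep two (one per id).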

                    16. What is the difference between VAR V1 - V3 and VAR V1 -- V3?
                    • VAR V1 - V3 returns the numbered range: the variables V1, V2 and V3
                    • VAR V1 -- V3 returns all the variables from V1 to V3 in the order they are stored in the dataset
                      • For example, if there are 5 variables stored in the order V1 name id V2 V3, then all five variables are returned
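
The example above can be tried out as follows. The dataset work.vars and its single row are made up for illustration.

```sas
/* Variables are stored in the order V1 name id V2 V3 */
data work.vars;
    input V1 name $ id V2 V3;
    datalines;
1 Ann 101 2 3
;
run;

/* Numbered range list: prints V1, V2, V3 */
proc print data=work.vars;
    var V1-V3;
run;

/* Name range list: prints V1, name, id, V2, V3 */
proc print data=work.vars;
    var V1--V3;
run;
```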

                    17. What is the difference between format and informat?
                    • An informat is a specification for how raw data should be read. 
                    • A format is a layout specification for how a variable should be printed or displayed.

                    18. Explain why the double trailing @@ is used in the INPUT statement.

                    • A double trailing @@ in an INPUT statement tells SAS to hold the current record across DATA step iterations, so the next execution of the INPUT statement keeps reading from the same record instead of moving to a new one. It is used when one input line contains several observations.
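
A minimal sketch with made-up values, where each input line holds more than one observation:

```sas
data work.pairs;
    input x y @@;   /* hold the line until all of its values are read */
    datalines;
1 2 3 4 5 6
7 8
;
run;
```

This should create four observations: (1,2), (3,4), (5,6) and (7,8). Without the @@, only the first pair on each line would be read.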

                    19. Explain DATA _NULL_.
                    • DATA _NULL_ executes all the statements in the DATA step without creating a dataset. It is useful for writing to the log or to an external file, and for creating macro variables.
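
For example, the sketch below counts observations and writes the result to the log without creating any dataset. It uses the sashelp.class sample dataset; the age threshold and the variable name n_older are arbitrary.

```sas
data _null_;
    set sashelp.class end=last;
    if age > 14 then n_older + 1;   /* sum statement keeps a running count */
    if last then put 'Students older than 14: ' n_older;
run;
```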

                    20. What is the difference between Proc MEANS and Proc SUMMARY?

                    • PROC MEANS: produces a printed report in the OUTPUT window by default. By default it analyzes all the numeric variables.
                    • PROC SUMMARY: produces a printed report only when the PRINT option is included in the PROC statement. It analyzes the variables listed in the VAR statement.
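
A sketch of the typical usage of each, based on the sashelp.class sample dataset. The output dataset work.stats and the names mean_height and mean_weight are made up for illustration.

```sas
/* PROC MEANS: prints a report for all numeric variables by default */
proc means data=sashelp.class;
run;

/* PROC SUMMARY: prints only when the PRINT option is given */
proc summary data=sashelp.class print;
    var height weight;
run;

/* More commonly, PROC SUMMARY writes its statistics to a dataset */
proc summary data=sashelp.class;
    var height weight;
    output out=work.stats mean=mean_height mean_weight;
run;
```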

                    21. Mention SAS system options to debug SAS macros.
                    • MLOGIC
                    • MPRINT
                    • SYMBOLGEN
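
A minimal sketch of switching these options on around a macro call. The macro show_rows and its parameters are hypothetical.

```sas
/* Turn on macro debugging output in the log */
options mlogic mprint symbolgen;

/* Hypothetical macro: print the first n rows of a dataset */
%macro show_rows(ds=, n=5);
    proc print data=&ds(obs=&n);
    run;
%mend show_rows;

%show_rows(ds=sashelp.class, n=3)

/* Turn the options off again to keep the log short */
options nomlogic nomprint nosymbolgen;
```

MLOGIC traces macro execution, MPRINT shows the SAS code the macro generates, and SYMBOLGEN shows what each macro variable resolves to.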


                    22. What is the difference between SYMPUT and SYMGET?


                    • SYMPUT: used in a DATA step to store a value from the dataset in a macro variable.
                    • SYMGET: used in a DATA step to retrieve the value of a macro variable into the dataset.
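
The round trip can be sketched as below, using the sashelp.class sample dataset. The macro variable first_name, the dataset work.tagged and the variable source are made up for illustration.

```sas
/* SYMPUT: copy a DATA step value into a macro variable */
data _null_;
    set sashelp.class(obs=1);
    call symput('first_name', strip(name));
run;

/* SYMGET: read the macro variable back into a DATA step variable */
data work.tagged;
    length source $ 8;          /* SYMGET would otherwise default to length 200 */
    set sashelp.class;
    source = symget('first_name');
run;
```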

                    23. What programming errors have you made?
                    • Not checking the log after submitting a program
                    • Missing semicolon
                    • Using a RUN statement instead of QUIT in PROC SQL

                    24. What is the difference between the SAS DATA STEP and SAS PROCs?
                    • SAS DATA STEP is used to read in and manipulate data.
                    • SAS PROCs are subroutines that perform tasks on a SAS dataset.

                      25. What is the use of the %INCLUDE statement?
                      • %INCLUDE statement reads an entire file into the current SAS program you are running and submits that file to the SAS System immediately.
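
A one-line sketch of its use. The file path is hypothetical and must point to an existing SAS program on your system.

```sas
/* Hypothetical path: run the statements stored in setup.sas;      */
/* the SOURCE2 option also writes the included lines to the log    */
%include '/path/to/setup.sas' / source2;
```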



