Stata for Students: Correlations
A Pearson's correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates the strength and direction of the linear relationship between the two variables. When you use the correlate command in Stata, listwise deletion of missing data is done by default. Correlation analysis is conducted to examine the relationship between dependent and independent variables.
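As a rough sketch of the listwise deletion mentioned above, the following uses the auto example data set that ships with Stata. That data set is an assumption made here so the example runs anywhere; it is not the data used in this article. The pwcorr command is also an addition not discussed in the text above; it is shown only as a contrast, because it uses every observation available for each pair of variables.

```stata
* Load an example data set that ships with Stata (a stand-in, not this article's data)
sysuse auto, clear

* correlate uses listwise deletion: rep78 has missing values, so every
* correlation in the matrix is computed on the complete cases only
correlate price mpg rep78
scalar n_listwise = r(N)

* pwcorr uses pairwise deletion; obs reports the number of observations
* used for each pair and sig reports the p-value
pwcorr price mpg rep78, obs sig

display "Observations used by correlate: " n_listwise
```

Comparing the observation counts reported by the two commands shows how many cases listwise deletion drops.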
This article is part of the Stata for Students series.
If you are new to Stata we strongly recommend reading all the articles in the Stata Basics section. A correlation is a measure of how strongly related two quantitative variables are. It can only perfectly measure linear relationships, but a linear relationship will serve as a first approximation to many other kinds of relationships.
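To illustrate the point about linear relationships, here is a minimal sketch using a tiny artificial data set. The variable names x, y_line and y_curve are made up for this example. One variable is an exact linear function of x and the other is an exact but curved function of x.

```stata
* Build a tiny artificial data set: x takes the values -2, -1, 0, 1, 2
clear
set obs 5
generate x = _n - 3

* y_line is an exact linear function of x; y_curve depends on x but not linearly
generate y_line = 2*x + 1
generate y_curve = x^2

* Correlation with the linear function
correlate x y_line
scalar r_line = r(rho)

* Correlation with the curved function
correlate x y_curve
scalar r_curve = r(rho)

display "Linear: " r_line "   Curved: " r_curve
```

The first correlation is 1 and the second is 0, even though y_curve is completely determined by x. This is why a correlation of zero does not by itself show that two variables are unrelated.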
You can calculate correlations for categorical variables and the results you get will sometimes point you in the right direction, but there are better ways to describe relationships involving categorical variables. Correlation coefficients range from -1 to 1. A positive correlation coefficient means the two variables tend to move together: a higher value of one variable is associated with a higher value of the other. The larger the coefficient in absolute value, the stronger the relationship.
A negative correlation coefficient means they tend to move in opposite directions: a higher value of one variable is associated with a lower value of the other. Variables which are independent will have a correlation of zero, but variables which are related but not in a linear way can also have a correlation of zero.

Setting Up

If you plan to carry out the examples in this article, make sure you've downloaded the GSS sample to your U: drive. Then create a do file called cor.
If you plan on applying what you learn directly to your homework, create a similar do file but have it load the data set used for your assignment.

Each section gives a brief description of the aim of the statistical test, when it is used, an example showing the Stata commands and Stata output with a brief interpretation of the output. You can see the page Choosing the Correct Statistical Test for a table that shows an overview of when each test is appropriate to use.
In deciding which test is appropriate to use, it is important to consider the type of variables that you have.
About the hsb data file

Most of the examples in this page will use a data file called hsb2, high school and beyond. This data file contains observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race).
It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst). You can get the hsb2 data file from within Stata with the use command.

One sample t-test

For example, using the hsb2 data file, say we wish to test whether the average writing score (write) differs significantly from a hypothesized value. We can do this with the ttest command. We would conclude that this group of students has a significantly higher mean on the writing test than the hypothesized value.

See also Stata Class Notes: Analyzing Data

One sample median test

A one sample median test allows us to test whether a sample median differs significantly from a hypothesized value.
We will use the same variable, write, as we did in the one sample t-test example above, but we do not need to assume that it is interval and normally distributed (we only need to assume that write is an ordinal variable and that its distribution is symmetric).
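The one sample t-test and the one sample median test can be sketched side by side as follows. This is only an illustration under stated assumptions: it uses the auto data set that ships with Stata and its mpg variable as a stand-in for write, and the hypothesized value of 20 is made up for the example. With the hsb2 data in memory you would substitute write and your own hypothesized value.

```stata
* Stand-in data: the auto data set shipped with Stata (not the hsb2 file)
sysuse auto, clear

* One sample t-test: does the mean of mpg differ from the hypothetical value 20?
ttest mpg == 20
scalar t_mean = r(mu_1)
scalar t_p = r(p)

* One sample median test (Wilcoxon signed-rank test) against the same value;
* this does not assume that mpg is normally distributed
signrank mpg = 20

display "Sample mean: " t_mean "   two-sided p-value from ttest: " t_p
```

Running both tests against the same hypothesized value makes it easy to see whether the conclusion depends on the normality assumption.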
We will test whether the median writing score (write) differs significantly from a hypothesized value.

See also Stata Code Fragment: Descriptives, ttests, Anova and Regression

Binomial test

A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value.
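A minimal sketch of a one sample binomial test, again using the auto data set that ships with Stata as a stand-in. The variable foreign (coded 0 and 1) plays the role of the two-level variable, and the hypothesized proportion of 0.5 is chosen only for illustration.

```stata
* Stand-in data: the auto data set shipped with Stata
sysuse auto, clear

* foreign is coded 0/1; test whether the proportion of 1s differs from 0.5
bitest foreign == 0.5
scalar n_trials = r(N)

display "Number of observations tested: " n_trials
```

With the hsb2 data in memory, the same command form would apply to a 0/1 variable such as female.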
Chi-square goodness of fit

A chi-square goodness of fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions.
We want to test whether the observed proportions from our sample differ significantly from these hypothesized proportions. To conduct the chi-square goodness of fit test, you need to first download the csgof program that performs this test.
You can download csgof from within Stata by typing search csgof (see How can I use the search command to search for programs and get additional help?).
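For intuition about what a chi-square goodness of fit test computes, here is a hand computation using only built-in Stata commands. It is not the csgof program itself, only a sketch of the Pearson statistic. The data set (auto, shipped with Stata), the variable foreign, and the hypothesized proportions of 70% and 30% are all assumptions made for this example.

```stata
* Stand-in data: the auto data set shipped with Stata
sysuse auto, clear

* Hypothesized proportions (made up for illustration): 70% domestic, 30% foreign
scalar p0 = 0.70
scalar p1 = 0.30

* Observed counts in each category
quietly count if foreign == 0
scalar o0 = r(N)
quietly count if foreign == 1
scalar o1 = r(N)
scalar n = o0 + o1

* Expected counts under the hypothesized proportions
scalar e0 = p0 * n
scalar e1 = p1 * n

* Pearson chi-square statistic and its p-value (df = number of categories - 1)
scalar chisq = (o0 - e0)^2 / e0 + (o1 - e1)^2 / e1
scalar pval = chi2tail(1, chisq)

display "chi2(1) = " chisq "   p = " pval
```

A small statistic and a large p-value indicate that the observed proportions are close to the hypothesized ones.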