Research for Global Development

Transposing variables and sub-sampling with statistical packages


Last week we needed to transform a 3,000-household survey into an individual-level survey. This was a two-step process. First, we had to transpose the data from column-based variables (there was one column for each question for each household member) into row-based observations, where each individual has his/her own row. Second, we needed to randomly select one adult (aged 15 or older) from each household roster. Then, voilà, we would have a sub-sample of 3,000 individuals, one per household, provided every household contained at least one adult.

For the first step, transposing the 13 column-based member slots into row-based observations, SPSS has a super useful command, helpfully named VARSTOCASES. The sample code below stacks the 13 sex columns (a1_1 to a1_13) and the 13 age columns (a2_1 to a2_13) into one row per household member, while carrying the household ID (hhid) over to every member of the same household. The Index1 variable records which of the 13 slots each new row came from.

SPSS Syntax:
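* Stack the 13 member slots into one row per household member.
* NULL=DROP discards empty slots (rows where both sex and age are missing).
VARSTOCASES
/ID=id
/MAKE sex FROM a1_1 a1_2 a1_3 a1_4 a1_5 a1_6 a1_7 a1_8 a1_9 a1_10 a1_11 a1_12 a1_13
/MAKE age FROM a2_1 a2_2 a2_3 a2_4 a2_5 a2_6 a2_7 a2_8 a2_9 a2_10 a2_11 a2_12 a2_13
/INDEX=Index1(13)
/KEEP=hhid
/NULL=DROP.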
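For readers working outside SPSS, the same wide-to-long reshape can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how VARSTOCASES is implemented: it assumes each household arrives as a dictionary with an hhid key and columns named a1_1 to a1_13 (sex) and a2_1 to a2_13 (age), as in the SPSS example, and that an unused member slot is either absent or None. The helper name wide_to_long is hypothetical. Skipping slots where both values are empty mimics /NULL=DROP, and the index1 field mimics /INDEX=Index1; the sketch does not recreate the /ID counter.

```python
# Minimal wide-to-long sketch in the spirit of SPSS VARSTOCASES.
# Assumption: columns a1_1..a1_13 hold sex and a2_1..a2_13 hold age,
# as in the SPSS example; None or a missing key marks an unused slot.
N_MEMBERS = 13  # member slots per household in the questionnaire


def wide_to_long(households, drop_empty=True):
    """Return one dict per household member (hypothetical helper)."""
    rows = []
    for h in households:
        for k in range(1, N_MEMBERS + 1):
            sex = h.get(f"a1_{k}")
            age = h.get(f"a2_{k}")
            # Like /NULL=DROP: skip slots where every stacked value is empty.
            if drop_empty and sex is None and age is None:
                continue
            # index1 mirrors /INDEX=Index1: which slot (1-13) the row came from.
            rows.append({"hhid": h["hhid"], "index1": k, "sex": sex, "age": age})
    return rows


# Usage: two toy households (invented values, for illustration only).
households = [
    {"hhid": 101, "a1_1": 1, "a2_1": 42, "a1_2": 2, "a2_2": 39, "a1_3": 1, "a2_3": 9},
    {"hhid": 102, "a1_1": 2, "a2_1": 67},
]
people = wide_to_long(households)
print(len(people))  # 4
print(people[0])    # {'hhid': 101, 'index1': 1, 'sex': 1, 'age': 42}
```

Passing drop_empty=False keeps all 13 slots per household, which corresponds to /NULL=KEEP.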

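Once we had our household roster, the next challenge was to find a way to generate a stratified sample, treating each household as a stratum and drawing one adult from each. SPSS requires you to purchase an additional module, unhelpfully named “Complex Samples.” R has a function called strata() in the sampling package, but its examples and documentation were a bit befuddling. I ended up going with Stata’s bsample command, which had the simplest code and the best documentation to back it up. bsample is Stata’s bootstrap sampler and draws with replacement, but when only one observation is drawn per stratum that makes no difference: the result is one randomly chosen adult per household. The code below restricts the data to people aged 15 and over, then selects one individual from each household.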

Stata Syntax:
clear
use "C:YourData.dta"
tab hhid
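drop if age < 15 | missing(age)  // adults only; Stata treats missing as larger than any number
set seed 12345                    // arbitrary seed, so the draw can be reproduced
bsample 1, strata(hhid)           // one random adult per household
tab hhid                          // each hhid should now appear once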
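The household-level draw can likewise be sketched in plain Python. This swaps in simple random selection of one eligible member per household instead of Stata’s bootstrap sampler; for a single draw per stratum the two give the same kind of result. It is a sketch under assumptions: rows are shaped like the stacked SPSS output (one dictionary per person with hhid and age keys), a missing age is represented as None and excluded, the seed 12345 is arbitrary, and the helper name one_adult_per_household is hypothetical.

```python
import random
from collections import defaultdict


def one_adult_per_household(people, min_age=15, seed=12345):
    """Draw one member aged min_age or older from each household (hypothetical helper).

    Rows whose age is None are treated as missing and are never eligible.
    """
    rng = random.Random(seed)  # arbitrary fixed seed, so the draw is reproducible
    eligible = defaultdict(list)
    for p in people:
        if p["age"] is not None and p["age"] >= min_age:
            eligible[p["hhid"]].append(p)
    # Each household is its own stratum: pick exactly one eligible member.
    # Households with no eligible member never enter `eligible`, so they drop out.
    return [rng.choice(eligible[hhid]) for hhid in sorted(eligible)]


# Usage with toy rows (invented values, for illustration only).
people = [
    {"hhid": 101, "index1": 1, "sex": 1, "age": 42},
    {"hhid": 101, "index1": 2, "sex": 2, "age": 39},
    {"hhid": 101, "index1": 3, "sex": 1, "age": 9},
    {"hhid": 102, "index1": 1, "sex": 2, "age": 67},
    {"hhid": 103, "index1": 1, "sex": 1, "age": 12},
    {"hhid": 103, "index1": 2, "sex": 2, "age": None},
]
sample = one_adult_per_household(people)
print(sorted(p["hhid"] for p in sample))  # [101, 102]
```

Household 103 has no member aged 15 or over, so it drops out; in real data this means the sub-sample is smaller than the number of households whenever a roster contains only children.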

InterMedia


Contact Us:

InterMedia Headquarters

1825 K Street, NW
Suite 650
Washington, D.C. 20006
+1.202.434.9310
FAX: +1 202 434 9560

InterMedia Africa

UN Avenue, Gigiri Nairobi
Box 10224
City Square 00200
Nairobi, Kenya
+254.720.109183