IMPORTANT: THIS PROJECT IS ONLY FOR SURVEY ANALYSTS.
Your Creator assignment is to draft an analysis plan that contains the following sections and tasks.
DO NOT START THIS PROJECT IF YOU ARE A SURVEY MANAGER.
Cleaning the data is the first step in data analysis and ensures data quality. The following checks can be written as syntax in the selected software.
Data cleaning will be performed using the ASSERTLIST (VCQI) command in Stata or standard tools in R. Records that fail a check can be exported to a spreadsheet for verification.
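The kind of check described above can be sketched as follows. This is a minimal Python illustration only, not the ASSERTLIST command or the R code used in the plan; the field names (child_id, dob, bcg_date, interview_date) and the four checks are assumptions chosen for the example.

```python
import csv
from datetime import date


def find_failures(records):
    """Return one tuple (child_id, check_name, detail) per failed check.

    `records` is a list of dicts; the keys used here are hypothetical.
    """
    failures = []
    seen_ids = set()
    for rec in records:
        cid = rec.get("child_id")
        if not cid:
            failures.append((cid, "missing_id", "child_id is empty"))
        elif cid in seen_ids:
            failures.append((cid, "duplicate_id", "child_id appears more than once"))
        seen_ids.add(cid)

        dob = rec.get("dob")
        bcg = rec.get("bcg_date")
        visit = rec.get("interview_date")
        # A vaccination date cannot precede the date of birth.
        if dob and bcg and bcg < dob:
            failures.append((cid, "bcg_before_birth", f"{bcg} < {dob}"))
        # A vaccination date cannot fall after the interview.
        if bcg and visit and bcg > visit:
            failures.append((cid, "bcg_after_interview", f"{bcg} > {visit}"))
    return failures


def export_failures(failures, path):
    """Write the failed checks to a CSV file that opens in a spreadsheet."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["child_id", "check", "detail"])
        writer.writerows(failures)
```

Each failed check becomes one row, so the exported file can be reviewed and corrected record by record.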
A sample weight is a statistical measure of how many members of the population each survey respondent represents. In a weighted survey, each respondent selected for the sample stands for a number of eligible respondents in the population from which the sample was drawn. Weighting the sample is therefore important for estimating population coverage.
There are three categories of weights: the design weights, response weights, and the post-stratification weights.
Which of these weights to use depends on the type of analysis. The most important ones here are the design weights and the response weights. All of them will be calculated in Excel shells before they are matched with the eligible children.
The design weights are obtained over three stages of sampling, and each stage needs two pieces of information.
The sampling weight for a child in a particular household within a cluster is 1/(p1*p2*p3), where p1, p2 and p3 are the selection probabilities at the three stages. Since p3=1, this reduces to 1/(p1*p2*1), which is the household design weight. At this stage, we should have a spreadsheet that looks like the attached one.
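The design weight calculation can be sketched in Python as below. This is an illustration under a stated assumption: each stage's selection probability is taken to be the number of units selected divided by the number of units available, which would not hold for probability-proportional-to-size selection. The function names are hypothetical.

```python
def selection_probability(n_selected, n_available):
    """Selection probability at one stage, assuming equal-probability
    selection (an assumption of this sketch, not stated in the plan)."""
    if n_available <= 0 or not 0 < n_selected <= n_available:
        raise ValueError("need 0 < n_selected <= n_available")
    return n_selected / n_available


def design_weight(p1, p2, p3=1.0):
    """Design weight 1/(p1*p2*p3); p3 defaults to 1 as in the text."""
    return 1.0 / (p1 * p2 * p3)


# Hypothetical example: 5 of 20 clusters and 10 of 80 households selected.
p1 = selection_probability(5, 20)
p2 = selection_probability(10, 80)
weight = design_weight(p1, p2)
```

With these made-up counts, each sampled child stands for 32 eligible children.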
2. Calculate response weights (adjust for non-response) at the household level and child level
Here we need the number of households that were not interviewed despite repeated visits, and the number of eligible respondents who did not participate. Using a new variable in the data set, identify respondents (code 1) and non-respondents (code 0), for both households and eligible children.
Then we calculate the response rate at two levels: at the household level, the number of households with a complete interview divided by the total number of households within the stratum; and at the respondent level, the number of respondents in a household divided by the total number of eligible children within that household. The final response rate is the product of the two. In other words, the response rate of a respondent is the number of eligible children with complete interviews divided by the number of eligible respondents per stratum, and the response weight is its reciprocal.
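The two-level response adjustment described above can be sketched in Python as follows. The function and argument names are hypothetical, and the counts in the example are invented for illustration.

```python
def response_rate(n_responding, n_eligible):
    """Share of eligible units that responded; must be greater than zero."""
    if n_eligible <= 0 or not 0 < n_responding <= n_eligible:
        raise ValueError("need 0 < n_responding <= n_eligible")
    return n_responding / n_eligible


def response_weight(hh_complete, hh_total, children_complete, children_eligible):
    """Reciprocal of the final response rate, which is the product of the
    household-level rate and the child-level rate."""
    final_rate = (response_rate(hh_complete, hh_total)
                  * response_rate(children_complete, children_eligible))
    return 1.0 / final_rate


# Hypothetical example: 40 of 80 households interviewed in the stratum,
# and 3 of 4 eligible children with a complete interview.
rw = response_weight(40, 80, 3, 4)

# The non-response-adjusted weight is the design weight times the response weight.
adjusted_weight = 32.0 * rw
```

Zero responders raise an error rather than produce an infinite weight; in practice such cells are usually merged with a neighbouring cell before the adjustment.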
To summarize,
3. Post-stratification weights
Post-stratification weighting is possible when the objective is to estimate coverage at the national or sub-regional level by combining estimates across classes such as gender, ethnicity, or geographical zone. It also requires accurate data on the total eligible population in each class, stratum or state. Post-stratified weights then make the sum of weights in each class proportional to the known eligible population.
For this reason, the eligible population totals for each state and zone are needed; these can be obtained from a census agency. The eligible totals for each sex subgroup within each state and zone, which can also be obtained from a census agency, are needed as well.
In each geographic area, or demographic group within a geographic area, calculate the post-stratified weight of each respondent from the respondent's non-response-adjusted weight: multiply this weight by the known population of the stratum and divide it by the sum of weights in the stratum; that is:
scaled_weight(i) = unscaled_weight(i) * (known eligible population total for the stratum) / sum(unscaled weights in the stratum)
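The scaling formula above can be sketched in Python as follows. The stratum labels, weights and population totals are made up for the example, and the function name is hypothetical.

```python
from collections import defaultdict


def post_stratify(records, population_totals):
    """Scale each unscaled weight so that the weights in a stratum sum to
    the known eligible population total of that stratum.

    records: list of (stratum, unscaled_weight) pairs.
    population_totals: dict mapping stratum to its known eligible total.
    """
    weight_sums = defaultdict(float)
    for stratum, w in records:
        weight_sums[stratum] += w
    return [w * population_totals[stratum] / weight_sums[stratum]
            for stratum, w in records]


# Hypothetical data: two strata with known eligible totals of 80 and 50.
records = [("A", 2.0), ("A", 6.0), ("B", 5.0)]
scaled = post_stratify(records, {"A": 80, "B": 50})
```

After scaling, the weights in stratum A sum to 80 and the single weight in stratum B equals 50, which is the property the formula is meant to guarantee.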
Each of the following tables will be filled with weighted percentages, representing the population proportions.
We have provided Table 6 to support the team in the next campaign. The table will help document the design effect (DEFF) and intra-class correlation (ICC) by providing recent data (coverage results) on the proportion of vaccinated children. This proportion will be used, as the expected vaccination coverage, together with a desired precision to calculate the effective sample size (ESS). The DEFF is then a function of the known target number of respondents per cluster and the ICC, and it scales the ESS up to the required sample size. The ICC could be obtained by fitting a linear mixed model.
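The relationship between DEFF, ICC and ESS can be sketched in Python as below, using the standard approximation DEFF = 1 + (m - 1) * ICC, where m is the target number of respondents per cluster. The ESS is taken as an input here, and the numbers in the example are invented; any inflation for expected non-response is left out of this sketch.

```python
import math


def design_effect(respondents_per_cluster, icc):
    """Approximate design effect: 1 + (m - 1) * ICC.

    This sketch restricts the ICC to the range 0 to 1.
    """
    if respondents_per_cluster < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("need respondents_per_cluster >= 1 and 0 <= icc <= 1")
    return 1.0 + (respondents_per_cluster - 1) * icc


def required_sample_size(ess, deff):
    """Completed interviews needed: ESS scaled up by the design effect."""
    return math.ceil(ess * deff)


# Hypothetical planning values: 10 respondents per cluster, ICC of 0.125,
# and an effective sample size of 200.
deff = design_effect(10, 0.125)
n_required = required_sample_size(200, deff)
```

With these invented inputs the design effect is 2.125, so 425 completed interviews would be needed to match the precision of a simple random sample of 200.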
The attached spreadsheet shows the different table shells. Each table can include the two-sided 95% confidence interval alongside the estimate, because it shows how precise the point estimate is; its calculation will be based on the weighted sample size. One can also include the weighted and unweighted sample sizes for each stratum, to make it easy to see how prevalence varies between strata; this helps to identify strata with low coverage and to track demographic characteristics that influence the unweighted sample size. The interval is interpreted as follows: if the survey were replicated 100 times and the interval calculated each time, about 95 of the intervals would be expected to contain the true coverage value; in this sense we are 95% confident that the interval contains the population coverage.
In this section, we present the vaccination coverage of BCG disaggregated by sex and by the 14 strata. We also present the results for sex disaggregated by the 12 states and the 2 zones. Percentages of children with evidence of vaccination based on a date on a home-based record were calculated. The syntax of the program that calculates the results (see attached file) was written in a text file (Notepad).
We also used the survey package in R, namely the function “svydesign”, which represents the study design, and the function “svyby”, which gives coverage by different levels, together with “svyciprop” to calculate proportions and “confint” to extract confidence intervals. The design here is a two-level cluster sampling design with 12 strata, 655 clusters, and 1649 households. We treated the clusters as the primary sampling units and based the design model on them.
Confidence intervals were calculated using the logit method and based on the complex sampling design.
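The idea behind a logit confidence interval under a stratified cluster design can be sketched in Python as below. This is not the R code used for the results and is not a reimplementation of svyciprop: it assumes a with-replacement linearization variance at the cluster level and a fixed critical value of 1.96, whereas the R function may use a different critical value (for example one based on the design degrees of freedom). The data in the example are invented.

```python
import math
from collections import defaultdict


def weighted_proportion_logit_ci(rows, crit=1.96):
    """Weighted proportion with a logit-scale confidence interval.

    rows: iterable of (stratum, cluster, weight, y) with y equal to 0 or 1.
    Returns (p, lower, upper).
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    p = sum(w * y for _, _, w, y in rows) / total_w
    if p <= 0.0 or p >= 1.0:
        raise ValueError("logit interval is undefined when the estimate is 0 or 1")

    # Linearized contribution of each observation, totalled per cluster (PSU).
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        psu_totals[stratum][cluster] += w * (y - p) / total_w

    # Between-cluster variance within each stratum, summed over strata.
    variance = 0.0
    for clusters in psu_totals.values():
        n = len(clusters)
        if n < 2:
            raise ValueError("each stratum needs at least two clusters")
        mean = sum(clusters.values()) / n
        variance += n / (n - 1) * sum((z - mean) ** 2 for z in clusters.values())

    # Delta method: standard error of logit(p) is se(p) / (p * (1 - p)).
    se_logit = math.sqrt(variance) / (p * (1.0 - p))
    centre = math.log(p / (1.0 - p))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return p, expit(centre - crit * se_logit), expit(centre + crit * se_logit)


# Hypothetical data: two strata with two clusters each.
example = [
    ("S1", "c1", 2.0, 1), ("S1", "c1", 2.0, 0),
    ("S1", "c2", 2.0, 0), ("S1", "c2", 2.0, 0),
    ("S2", "c3", 1.0, 1), ("S2", "c3", 1.0, 1),
    ("S2", "c4", 1.0, 0), ("S2", "c4", 1.0, 1),
]
p_hat, lower, upper = weighted_proportion_logit_ci(example)
```

Because the interval is built on the log-odds scale and then transformed back, its limits always stay between 0 and 1, which matters for strata with very low coverage.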
Results obtained in R were exported to an Excel file to avoid manual copying and pasting, and arranged by state, zone and sex.
The attached spreadsheet shows the results.
The 95% confidence interval is shown to indicate the precision of each estimate.
1. Summary of the methods
Data cleaning was performed on identifier variables, demographic variables and dates. Vaccination dates were checked for consistency and plausibility. Where mistakes were found, the data set was amended to include corrections to the indicator of interest. Tools such as the assertlist command in VCQI and R program code were used for this.
In addition, descriptive statistics were used to calculate the response rates in each stratum. The probabilities of selection were documented and used to calculate the design weights, response weights and post-stratification weights. Data were weighted to fit population totals in each state, zone and sex. An Excel tool was used to calculate the different sampling weights, which were then matched with the individual-level data.
For the purpose of estimation, a derived variable was created for a valid BCG vaccination. This indicator identified age-eligible children with a home-based record who had a tick mark with a date on the card. We assumed that the sum of survey weights in each stratum is an estimate of the relative count of the eligible population. Across states, zones and sex, weighted valid coverage for BCG was obtained with 95% confidence intervals based on a complex sample design. Results by sex were also presented disaggregated by state and zone. Confidence intervals for proportions were computed with the logit method, which keeps the limits between 0 and 1 and so behaves better than a symmetric interval when coverage is close to 0% or 100%. All statistical analyses were carried out in R, version 3.1.3.
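The derived indicator and the weighted coverage can be sketched in Python as below. The field names (age_eligible, has_card, bcg_tick, bcg_date, weight) are hypothetical, and the sketch assumes that the denominator is all age-eligible children; the records in the example are invented.

```python
def valid_bcg(child):
    """1 if the child is age-eligible, has a home-based record, and the card
    shows a BCG tick mark with a date; otherwise 0."""
    return int(
        bool(child["age_eligible"])
        and bool(child["has_card"])
        and bool(child["bcg_tick"])
        and child["bcg_date"] is not None
    )


def weighted_coverage(children):
    """Weighted share of age-eligible children with a valid BCG dose."""
    eligible = [c for c in children if c["age_eligible"]]
    total_weight = sum(c["weight"] for c in eligible)
    return sum(c["weight"] * valid_bcg(c) for c in eligible) / total_weight


# Hypothetical records: the second child has a tick mark but no date.
children = [
    {"age_eligible": True, "has_card": True, "bcg_tick": True, "bcg_date": "2016-02-01", "weight": 1.0},
    {"age_eligible": True, "has_card": True, "bcg_tick": True, "bcg_date": None, "weight": 3.0},
    {"age_eligible": True, "has_card": True, "bcg_tick": True, "bcg_date": "2016-05-10", "weight": 4.0},
]
coverage = weighted_coverage(children)
```

A tick mark without a date does not count as valid here, so the second child contributes to the denominator only.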
2. Results summary of the Nigeria combined MICS/NICS, 2016-2017
Results by state (12), zone (2) and sex
A total of 12 states, 655 clusters and 1649 households were selected. The unweighted sample size was 1728 eligible children, and the weighted sample size, which fits population totals by sex, state and zone, was 5545 (Table 1 in the result section). The graph (image attached) shows the percentages of children who had received a valid dose of BCG, disaggregated by sex (male, female), state (12), and zone (North East, South South). Vaccination coverage ranged from 4% to 69%, with a critically low estimate in Yobe (4.42%). In addition, although states in the South South zone tended to have higher coverage than states in the North East zone, all percentages remained below 80%, and the corresponding confidence limits were also below 80%. Hence, all strata showed low evidence of valid BCG. In the North East zone, an estimated 22% of the children who were eligible for the survey had a home-based record and had received a valid dose of BCG. By contrast, the estimate was 48% in the South South zone.
Results on sex disaggregated by states and zones
Female coverage (30.4%; 95% CI=27-34) tended to be higher than male coverage (27%; 95% CI=22-33). In addition, of the 1808 boys in the sample who received a valid dose of BCG, 44.1% in the South South zone received it before the age of 1 year, compared with 51.9% among the 1873 girls. Coverage was lower among both boys and girls in the North East zone.
1. Identification of concerns/caveats
In planning the study, the team must consider health facility registers to obtain more data and ensure its quality. They can seek support from the local health authority to identify all relevant registers. In home-based records, they must only consider a card with legible BCG vaccination data that includes a day, a month, and a year.
The big picture here is low BCG coverage in every stratum and demographic group, suggesting an urgent need for supplementary immunization activities or post-campaign activities to reinforce and evaluate BCG coverage. It is worth looking carefully at states where none of the children in the survey received BCG. In addition, access to the health facilities that serve these states should be assessed, to understand why so many children in the sample were not vaccinated. Mothers who give birth should be advised on the importance of BCG before they leave the hospital. The different states should also be studied with regard to the quality of recording and reporting of vaccinations. A tick mark on the card must be accompanied by a consistent date, because low coverage could indicate either that many children had home-based records but did not receive BCG at birth, or that they received BCG and it was not recorded. Furthermore, the team must compare the reasons for non-vaccination between strata with higher and lower coverage values. Awareness campaigns must be increased to improve caretakers' adherence to the vaccination schedule.
2. Strengths and limitations
Valid coverage is important from an immunological point of view;
As far as limitations are concerned,