BIOSTATISTICS ASSIGNMENT 2
Data and Data Cleaning
In this study, the dataset contains, among other things, potential factors for COVID-related stress among health caregivers in a university-based hospital. It has 12 variables and 114 cases.
First, the data was copied into a separate Excel sheet, renamed Health_Data, and imported into SPSS. As part of the data cleaning process, the imported data was checked for any extra variables that might have been created during the import; none were found. Next, the variables were renamed so that the variable names are short and precise, while the variable labels keep the detailed description of what each variable stands for. The names given to each of the 12 variables are shown in the SPSS data file.
The response texts were also shortened in SPSS to make them suitable for statistical analysis. Working on one variable at a time, the Ctrl+F (Find and Replace) option was used to replace long responses such as "3 Applied to me very much, or most of the time – ALMOST ALWAYS" with shorter ones such as "ALMOST ALWAYS". The corresponding coding was then set in Variable View to match each response with its assigned score (in this case, 3 = ALMOST ALWAYS).
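The find-and-replace and coding step can be sketched in Python. This is a minimal sketch, assuming every raw response follows the pattern of the example response quoted above: a leading score, a long description, an en dash, then the short label. The function names clean_response and value_labels are hypothetical, and apart from ALMOST ALWAYS the response wordings are illustrative only.

```python
def clean_response(raw: str) -> tuple[str, int]:
    """Split one raw response into (short_label, score).

    Assumes the pattern "<score> <long description> – <SHORT LABEL>",
    e.g. "3 Applied to me very much, or most of the time – ALMOST ALWAYS".
    """
    score_text, _, rest = raw.strip().partition(" ")
    # Keep only the text after the last en dash (U+2013); if no dash is
    # present, rpartition puts the whole string in the last element.
    _, _, short_label = rest.rpartition("\u2013")
    return short_label.strip(), int(score_text)


def value_labels(raw_responses: list[str]) -> dict[int, str]:
    """Build a score -> short label mapping, similar to SPSS value labels."""
    labels = {}
    for raw in raw_responses:
        label, score = clean_response(raw)
        labels[score] = label
    return dict(sorted(labels.items()))


raw = "3 Applied to me very much, or most of the time – ALMOST ALWAYS"
print(clean_response(raw))  # ('ALMOST ALWAYS', 3)
```

Keeping the numeric score separate from the short label mirrors the split between stored values and value labels in Variable View.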
Another important aspect of the data cleaning process in SPSS is ensuring that the variables have the correct type (string or numeric) and measurement level (nominal, ordinal, or scale). These settings were checked in Variable View.
Missing Observations and Imputation
Using descriptive statistics (the Frequencies option), the number of missing values for each variable was established.
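The valid and missing counts reported by a frequencies table can be reproduced with a short Python sketch. It assumes that missing responses are stored as None; the function name missing_summary and the toy values are hypothetical, not taken from the study's dataset.

```python
def missing_summary(data: dict[str, list]) -> dict[str, tuple[int, int]]:
    """Return {variable: (n_valid, n_missing)}, treating None as missing."""
    summary = {}
    for name, values in data.items():
        n_missing = sum(v is None for v in values)
        summary[name] = (len(values) - n_missing, n_missing)
    return summary


# Hypothetical toy data: four cases, two of them missing a Fear rating.
toy = {"PPE": [1, 0, 1, 1], "Fear": [7, None, 5, None]}
print(missing_summary(toy))  # {'PPE': (4, 0), 'Fear': (2, 2)}
```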
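The table below shows the number of valid (complete) and missing observations for each of the 12 variables in the dataset (N = 114).
Table: Valid and missing observations per variable
1. Are you provided with adequate personal protective equipment when needed? Valid: 114; Missing: 0
2. Have you had a direct contact with a confirmed COVID-19 case? Valid: 114; Missing: 0
3. On a scale of 0 to 10, how do you rate your fear of getting infected with COVID-19? (0 = I have no fears at all, 10 = I am very afraid) Valid: 108; Missing: 6
4. Do you find it hard to cope with the current situation? Valid: 114; Missing: 0
5. Do you think you have a good social/emotional support during this pandemic? Valid: 114; Missing: 0
6. S1: I found it hard to wind down. Valid: 114; Missing: 0
7. S2: I tended to over-react to situations. Valid: 114; Missing: 0
8. S3: I felt that I was using a lot of nervous energy. Valid: 114; Missing: 0
9. S4: I found myself getting agitated. Valid: 114; Missing: 0
10. S5: I found it difficult to relax. Valid: 114; Missing: 0
11. S6: I was intolerant of anything that kept me from getting on with what I was doing. Valid: 114; Missing: 0
12. S7: I felt that I was rather touchy. Valid: 114; Missing: 0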
All the variables have complete observations except the third (fear of getting infected with COVID-19), which has 6 missing values. To identify the specific cases with missing values in this variable, the Filter option in Excel was used. A screenshot of the cases with missing observations is displayed below:
Given that the first row holds the variable names, cases 23, 24, 25, 26, 27, and 28 have missing observations for the variable Fear ("On scale 0 to 10, how do you rate your fear of getting infected with COVID-19 (0 = I have no fears at all, 10 = I am very afraid)"). Mean imputation (Transform > Replace Missing Values, using the Series mean method in SPSS) was used to fill in the missing positions. Since the mean of the complete observations was 5.4, the imputed value was rounded to 5, and this value replaced all six missing positions.
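The two steps above, locating the cases with a missing Fear rating and filling them with the rounded mean of the observed ratings, can be sketched as follows. This is a minimal sketch, assuming missing entries are stored as None and cases are numbered from 1; the rounding step follows the procedure described in the text, and mean_impute and the toy ratings are hypothetical.

```python
def mean_impute(values: list) -> tuple[list, list[int], int]:
    """Series-mean imputation with the mean rounded to a whole number.

    Returns (imputed_values, missing_case_numbers, fill_value), where case
    numbers start at 1 and missing entries are stored as None.
    """
    missing_cases = [i + 1 for i, v in enumerate(values) if v is None]
    observed = [v for v in values if v is not None]
    # round() without ndigits returns an int; Python rounds exact halves
    # to the nearest even number (round(4.5) == 4).
    fill = round(sum(observed) / len(observed))
    imputed = [fill if v is None else v for v in values]
    return imputed, missing_cases, fill


# Hypothetical Fear ratings whose observed mean is 5.4, as in the study.
fear = [5, 6, None, 4, 7, 5]
print(mean_impute(fear))  # ([5, 6, 5, 4, 7, 5], [3], 5)
```

Mean imputation keeps the sample size at 114 but shrinks the variance of the imputed variable, since every filled case receives the same value.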
Stress Score
The stress_score variable was created in SPSS by summing the scores of the seven stress items (labeled S1-S7) in the original dataset, using the Transform > Compute Variable option. In the Numeric Expression box, the SUM function (from the Statistical function group) was used to add the seven item scores row by row, that is, separately for each respondent.
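The row-wise sum can be illustrated with a small Python sketch. It assumes each respondent's seven item scores (S1-S7, each coded 0-3) are held in a list; the function name stress_score follows the variable name in the text, and the two example rows are hypothetical.

```python
def stress_score(items: list[int]) -> int:
    """Add the seven item scores for one respondent (possible range 0-21)."""
    if len(items) != 7:
        raise ValueError("expected exactly seven scores (S1-S7)")
    return sum(items)


# Hypothetical respondents: one row per person, one column per item.
rows = [[0, 1, 0, 2, 1, 0, 1], [3, 3, 2, 3, 2, 3, 3]]
print([stress_score(r) for r in rows])  # [5, 19]
```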
All the necessary recoding was carried out in SPSS, and the recoded variables form part of the restructured dataset.
A new binary variable, stress_classification, was created from stress_score to label each respondent as showing absence or presence of stress. Normal and Mild scores were classified as absence of stress, while Moderate, Severe, and Extremely Severe scores were classified as presence of stress.
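The binary classification can be written as a simple mapping from the recoded stress categories. This sketch assumes the category codes Normal = 0 to Extremely Severe = 4 and the 0/1 coding for absence/presence given under Old and New Variables; the function name classify is hypothetical.

```python
ABSENCE, PRESENCE = 0, 1
CATEGORY_NAMES = ["Normal", "Mild", "Moderate", "Severe", "Extremely Severe"]


def classify(category: int) -> int:
    """Map a stress category code (0-4) to stress_classification (0 or 1)."""
    if category not in range(len(CATEGORY_NAMES)):
        raise ValueError(f"unknown stress category code: {category}")
    # Normal (0) and Mild (1) -> absence; Moderate, Severe, Extremely Severe -> presence.
    return ABSENCE if category <= 1 else PRESENCE


print([classify(c) for c in range(5)])  # [0, 0, 1, 1, 1]
```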
Old and New Variables
Before the scores for the seven variables (S1-S7) were aggregated into one variable (stress_score), each variable was ordinal, with responses ranging from Never = 0 to Almost Always = 3. The aggregated variable (stress_score) is still ordinal in nature, with categories ranging from 0-4 (Normal) to 14+ (Extremely Severe). The values of stress_score were then recoded into Normal = 0, Mild = 1, Moderate = 2, Severe = 3, and Extremely Severe = 4 using the Transform > Recode into Same Variables option in SPSS. Even after the recoding, stress_score remains ordinal. The stress_classification variable, which classifies stress scores into absence or presence of stress, is nominal: the value 0 (Normal and Mild stress scores) represents absence of stress, whereas the value 1 (Moderate, Severe, and Extremely Severe stress scores) represents presence of stress.
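The recode from summed score to ordinal category is a lookup against the lowest score of each band. In this sketch, the Mild lower bound (5) and the Extremely Severe lower bound (14) follow from the ranges stated above, but the Moderate and Severe lower bounds are not given in the text: the values 7 and 11 are hypothetical placeholders and should be replaced with the cut-offs actually used. The function name recode is also hypothetical.

```python
from bisect import bisect_right

CATEGORY_NAMES = ["Normal", "Mild", "Moderate", "Severe", "Extremely Severe"]

# Lowest stress_score in Mild, Moderate, Severe and Extremely Severe.
# 5 and 14 follow from the text (Normal = 0-4, Extremely Severe = 14+);
# 7 and 11 are HYPOTHETICAL placeholders for the unstated middle cut-offs.
LOWER_BOUNDS = (5, 7, 11, 14)


def recode(score: int, lower_bounds: tuple[int, ...] = LOWER_BOUNDS) -> int:
    """Return the category code 0-4 for a summed stress score."""
    if score < 0:
        raise ValueError("stress_score cannot be negative")
    # bisect_right counts how many lower bounds are <= score,
    # which is exactly the category code.
    return bisect_right(lower_bounds, score)


for s in (0, 4, 5, 13, 14, 21):
    print(s, CATEGORY_NAMES[recode(s)])
```

Keeping the cut-offs in one tuple makes it easy to check them against the scoring manual without touching the lookup logic.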
The factors for COVID-related stress cover the provision of appropriate PPE, contact with a confirmed COVID-19 case, the rating of fear of getting infected with COVID-19, finding it hard to cope with the current situation, and perceived social or emotional support during the COVID-19 pandemic. All these variables are categorical in nature, so bar graphs are the appropriate graphical representation. The bar graphs were produced in SPSS and are displayed below.
On the question of protective gear, the majority of the respondents feel that they are provided with enough PPE. The majority of the health caregivers have been in contact with COVID-19 patients. No clear pattern emerges from the fear-rating variable. The majority of health care workers do not find it hard to cope with the current situation, and the majority also believe they have good social/emotional support during the pandemic.
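A bar graph of a categorical variable plots the frequency of each category. The sketch below computes those frequencies and renders them as text bars so that it runs without a plotting library. The function name text_bar_chart and the PPE answers are hypothetical.

```python
from collections import Counter


def text_bar_chart(responses: list[str]) -> str:
    """Render category frequencies as horizontal text bars, most frequent first."""
    counts = Counter(responses)
    width = max(len(label) for label in counts)
    return "\n".join(
        f"{label:<{width}} | {'#' * n} {n}" for label, n in counts.most_common()
    )


# Hypothetical answers to the PPE question.
ppe = ["Yes", "Yes", "No", "Yes"]
print(text_bar_chart(ppe))
```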
For the stress score and classification variables, a cross-tabulation shows how many health care workers fall into each stress classification. The cross-tabulation results are displayed in the following table:
Table: Stress score * classification into absence or presence of stress crosstabulation (columns: Absence of Stress, Presence of Stress)
From the table, it is evident that the majority of the respondents fall in the Normal/Mild stress score categories, which are labeled "absence of stress". These results are consistent with the bar graphs for the five COVID-related stress factors.
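A cross-tabulation is a count of cases for every combination of two categorical variables. This sketch builds one from paired lists and fills empty cells with 0, as a crosstab table would; the function name crosstab and the five toy respondents are hypothetical.

```python
from collections import Counter

STRESS_LEVELS = ["Normal", "Mild", "Moderate", "Severe", "Extremely Severe"]
CLASS_LEVELS = ["Absence of Stress", "Presence of Stress"]


def crosstab(row_values: list[str], col_values: list[str],
             row_levels: list[str], col_levels: list[str]) -> dict[str, dict[str, int]]:
    """Count cases for each (row level, column level) pair, including empty cells."""
    if len(row_values) != len(col_values):
        raise ValueError("both variables must have one value per case")
    counts = Counter(zip(row_values, col_values))
    # Counter returns 0 for pairs that never occur.
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical toy data for five respondents.
scores = ["Normal", "Mild", "Severe", "Normal", "Moderate"]
classes = ["Absence of Stress" if s in ("Normal", "Mild") else "Presence of Stress"
           for s in scores]

table = crosstab(scores, classes, STRESS_LEVELS, CLASS_LEVELS)
for level, row in table.items():
    print(f"{level:<17}", row["Absence of Stress"], row["Presence of Stress"])
```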
A univariable analysis was carried out to examine the relationship between fear and stress score. A multinomial logistic regression was conducted, with the stress score category as the outcome and fear as the only predictor. The results are shown below.
Table: Logistic regression results for fear, including the 95% Confidence Interval for Exp(B)
a. The reference category is: Extremely Severe.
From the results, only the Normal stress score category (compared with the Extremely Severe reference category) shows a significant association with the fear variable.
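Assuming the model is a multinomial logistic regression (which the reference-category footnote suggests), each non-reference category j has its own equation, log[P(category j) / P(Extremely Severe)] = b0j + b1j × Fear, and Exp(B) for fear is the factor by which those odds change per one-point increase in fear. The sketch below converts a coefficient B and its standard error into Exp(B) with a Wald-type 95% interval, exp(B ± 1.96 × SE). It assumes this is how the interval in the output was formed, and the B and SE values are hypothetical, not the study's estimates.

```python
import math


def exp_b_with_ci(b: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Return (Exp(B), lower, upper) using the Wald interval exp(B +/- z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when the 95% interval for Exp(B) does not contain 1."""
    return lower > 1 or upper < 1


# Hypothetical B and SE for the fear coefficient in one category contrast.
odds_ratio, low, high = exp_b_with_ci(b=0.30, se=0.12)
print(f"Exp(B) = {odds_ratio:.2f}, 95% CI {low:.2f} to {high:.2f}, "
      f"excludes 1: {ci_excludes_one(low, high)}")
```

An interval for Exp(B) that excludes 1 corresponds to a Wald test that is significant at the 5% level, which is how a single significant category contrast can be read from such a table.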
Further Multivariable Analysis
For further multivariable analysis, more quantitative variables would be required, since the remaining independent variables are categorical in nature.
Most Significant Variable
Based on the analysis above, the factor most significantly associated with COVID-related stress is the fear of being infected with the disease.