Assignment 4 - Programming Structure in R
The side-by-side boxplots show that patients rated Bad on the first assessment and High on the second assessment tended to have higher blood pressure. The High final decision category is also associated with higher blood pressure. In other words, in this small sample the doctors' decisions appear consistent with BP: patients labeled as more concerning tend to have higher BP readings.
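A minimal sketch of how boxplots like these can be drawn in base R. The data frame `df`, its column names, and every value in it are hypothetical stand-ins for illustration, not the assignment's actual data.

```r
# Hypothetical stand-in data; these are NOT the assignment's actual values.
df <- data.frame(
  BP     = c(110, 95, 140, 160, 88, 150),
  First  = factor(c("Good", "Good", "Bad", "Bad", "Good", "Bad")),
  Second = factor(c("Low", "Low", "High", "High", "Low", "High")),
  Final  = factor(c("Low", "Low", "High", "High", "Low", "High"))
)

# Put three plots in one row so the groups can be compared at a glance.
op <- par(mfrow = c(1, 3))
boxplot(BP ~ First,  data = df, main = "BP by first assessment",  ylab = "BP")
boxplot(BP ~ Second, data = df, main = "BP by second assessment", ylab = "BP")
boxplot(BP ~ Final,  data = df, main = "BP by final decision",    ylab = "BP")
par(op)  # restore the previous plotting layout

# Group medians give a numeric check on what the boxplots suggest.
med_first <- tapply(df$BP, df$First, median)
print(med_first)
```

Comparing group medians alongside the plot is a quick way to confirm that the visual impression is not just an artifact of the axis scale.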
The histograms show two things: Visit Frequency is concentrated between 0.2 and 0.6, while Blood Pressure is much more spread out and has clear outliers, such as a very low value around 30 and a very high value above 200. In a real clinical dataset, values this extreme would prompt data-quality checks (measurement error, unit mix-ups, or genuine but rare cases) before looking for patterns.
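One way to turn that data-quality check into code is to flag readings outside a plausible range. This is only a sketch: the vectors `bp` and `freq` hold made-up values, and the `lower` and `upper` limits are assumptions chosen for illustration, not clinical thresholds.

```r
# Hypothetical values for illustration only; NOT the assignment's data.
bp   <- c(30, 92, 101, 108, 115, 120, 205)
freq <- c(0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.6)

# Histograms side by side, one per variable.
op <- par(mfrow = c(1, 2))
hist(freq, main = "Visit Frequency", xlab = "Frequency")
hist(bp,   main = "Blood Pressure",  xlab = "BP")
par(op)  # restore the previous plotting layout

# Assumed plausibility limits, picked only for this sketch.
lower <- 60
upper <- 200

# Flag readings outside the assumed range and keep their row positions
# so they can be traced back to the source records.
suspect <- bp < lower | bp > upper
flagged <- data.frame(row = which(suspect), BP = bp[suspect])
print(flagged)
```

Flagging the rows, instead of deleting them, leaves room to decide later whether a value is an error or a genuine rare case.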
Because the dataset is small and made up (only 10 rows, reduced to 9 after cleaning), the patterns here are illustrative, not generalizable. I used na.omit() to handle missing data, which removed the one row with an NA in the first assessment. That kept the plots easy to read, but it also reduced the sample size, and if the value was not missing at random, dropping the row may have biased the comparisons. The visualizations are still useful for quickly seeing how the assessments relate to BP and for spotting unusual values that need a closer look.
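The effect of na.omit() can be checked directly by comparing row counts and looking at what was dropped. The sketch below uses a small hypothetical data frame (`df`, with made-up values), not the assignment's data.

```r
# Hypothetical data with one missing first-assessment value.
df <- data.frame(
  BP    = c(100, 120, 135, 90),
  First = c(1, 0, NA, 1)
)

# na.omit() drops every row that contains at least one NA.
clean <- na.omit(df)
n_dropped <- nrow(df) - nrow(clean)

# Inspect the dropped rows before trusting the cleaned data.
dropped_rows <- df[!complete.cases(df), ]
print(dropped_rows)

# Alternative: keep all rows and skip NAs only where a summary needs it,
# so the BP value in the incomplete row still counts toward BP summaries.
mean_bp_all <- mean(df$BP)
mean_first  <- mean(df$First, na.rm = TRUE)
```

Looking at the dropped rows shows whether an unusual BP value left the analysis along with the missing assessment, which is exactly the kind of non-random loss that could skew a comparison.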
https://github.com/shanzay28/r-programming-assignments/tree/main/Module-4-Programming-Structure




