Posts

Module 9. Assignment

1. How do the syntax and workflow differ between base, lattice, and ggplot2?
In base R you call plot() and hist() and specify labels, colors, and other details yourself. Lattice has a formula interface (y ~ x | group) that makes small multiples simple to construct without writing loops by hand. ggplot2 uses a consistent layered syntax that can be extended across plots; the layers include data, aesthetics, geoms, and facets.

2. Which system gave you the most control or produced the most “publication‑quality” output with minimal code?
Layering (geom_point() + geom_smooth()) and consistent theming in ggplot2 produce professional-looking figures quickly. Lattice is an excellent way to create conditioned views with very little code. Base is the quickest for rapid inspections, but it requires more manual styling to look good for publication.

3. Any challenges or surprises you encountered when switching between systems?
The mental model changes every time: Base is easy to ...
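As a rough illustration of the three workflows compared above, here is a minimal sketch that draws the same relationship in each system. The mtcars dataset and the choice of wt, mpg, and cyl are assumptions made for illustration, not the data used in the assignment, and the lattice and ggplot2 packages need to be installed.

```r
library(lattice)
library(ggplot2)

# Base R: one call, with labels and colors set by hand
plot(mtcars$wt, mtcars$mpg,
     main = "MPG vs. weight", xlab = "Weight (1000 lbs)", ylab = "MPG",
     col = "steelblue", pch = 19)

# lattice: the formula y ~ x | group builds one panel per cylinder count
p_lattice <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                    xlab = "Weight (1000 lbs)", ylab = "MPG")
print(p_lattice)

# ggplot2: data + aesthetics + geoms + facets, added layer by layer
p_ggplot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ cyl) +
  theme_minimal()
print(p_ggplot)
```

The lattice and ggplot2 calls return plot objects that can be stored and printed later, whereas the base call draws directly on the current graphics device.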

Assignment 8

For this project, I learned how to load data into R, summarize it, filter it, and save the results in different file types. First, I set my working directory so that R could find files quickly and save them in a tidy place. After that, I used read.table() to read the data into a data frame. This helped me learn how headers and separators work when reading real data. Next, I used the ddply() function from the plyr package to find the average grade and age for each gender. This taught me how to produce grouped summaries in R. Lastly, I used the subset() and grepl() functions to practice filtering data. I extracted all the student names that started with "i" and saved both the names-only and full filtered results. By following these steps, I learned how to use R to automatically clean data, analyze it, and write it to a file. These are important skills for working with datasets efficiently.
# R Code
setwd("C:/Users/shanz/OneDrive/Documents/Assigment 6")
x <- read.table("Assignme...
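The read, summarize, filter, and save steps described above can be sketched roughly as follows. This is not the assignment's code: the student records, column names, and temporary file paths are hypothetical, and base R's aggregate() stands in for the plyr ddply() call so the sketch runs without extra packages.

```r
# Hypothetical student records, written to a temp file so there is something to read
students <- data.frame(
  Name  = c("Ivan", "Maria", "Irene", "Tom"),
  Age   = c(20, 22, 21, 23),
  Sex   = c("Male", "Female", "Female", "Male"),
  Grade = c(80, 90, 70, 60)
)
in_file <- tempfile(fileext = ".txt")
write.table(students, in_file, sep = ",", row.names = FALSE)

# header = TRUE treats the first line as column names; sep sets the field separator
x <- read.table(in_file, header = TRUE, sep = ",")

# Grouped summary: mean Age and Grade per Sex (base R stand-in for ddply)
group_means <- aggregate(cbind(Age, Grade) ~ Sex, data = x, FUN = mean)
print(group_means)

# Keep rows whose Name starts with "i" (case-insensitive match is an assumption)
i_students <- subset(x, grepl("^i", Name, ignore.case = TRUE))

# Save the full filtered rows and the names-only version
write.csv(i_students, tempfile(fileext = ".csv"), row.names = FALSE)
write.csv(i_students["Name"], tempfile(fileext = ".csv"), row.names = FALSE)
```

Writing to tempfile() paths keeps the sketch from touching the working directory; in a real project the output paths would be named files.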

Module 7. Assignment

This week, I studied Object-Oriented Programming (OOP) in R and discovered that R treats everything as an object, including integers, vectors, data frames, and functions. The lecture covered the two main systems used in R: S3 and S4. S3 is the older, simpler system: you can quickly attach a class to a list and write custom print methods. S4 is the newer, more formal system, with built-in validation and explicit class definitions with slots. We also looked at how generic functions such as summary(), print(), and plot() behave differently across object types. In the end, I was able to create my own S3 and S4 objects, implement basic methods, and understand how R decides which version of a function to call, a process known as method dispatch.
# Load the mtcars data
data("mtcars")
# Show the first few rows
head(mtcars)
# Describe its structure
str(mtcars)
# Test Generic Function...
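To make the S3/S4 contrast concrete, here is a minimal sketch of one object in each system. The class names (student, Student4), the fields, and the validity rule are hypothetical choices for illustration, not the objects built in the assignment.

```r
library(methods)

# S3: attach a class attribute to a plain list
s3_student <- list(name = "Ana", age = 21)
class(s3_student) <- "student"

# S3 method: print() dispatches to print.student for objects of class "student"
print.student <- function(x, ...) {
  cat("Student:", x$name, "| age:", x$age, "\n")
  invisible(x)
}
print(s3_student)

# S4: explicit class definition with typed slots and a validity check
setClass("Student4",
  slots = c(name = "character", age = "numeric"),
  validity = function(object) {
    if (length(object@age) != 1 || object@age < 0) {
      "age must be a single non-negative number"
    } else {
      TRUE
    }
  }
)

# S4 method: show() is what R calls when an S4 object is printed
setMethod("show", "Student4", function(object) {
  cat("Student4:", object@name, "| age:", object@age, "\n")
})

s4_student <- new("Student4", name = "Ana", age = 21)
s4_student
```

Calling new("Student4", name = "Ana", age = -1) would stop with a validity error, whereas the S3 list accepts any contents without complaint.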

Module 6 – Linear Algebra in R (Part 2)

# 1. Matrix Addition & Subtraction
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)
# Addition
A_plus_B <- A + B
A_plus_B
# Subtraction
A_minus_B <- A - B
A_minus_B
Explanation: Matrix addition and subtraction work element-by-element on matrices of the same size, combining or contrasting values at the same positions. This is useful for quickly aggregating or comparing structured numeric data.

# 2. Create a Diagonal Matrix
D <- diag(c(4, 1, 2, 3))
D
Explanation: diag() places the supplied numbers along the main diagonal and fills all other entries with zeros. Diagonal matrices are commonly used for identity/scaling operations and as building blocks in linear algebra.

# 3. Construct a Custom 5 × 5 Matrix
M <- diag(3, 5, 5)
M[1, 2:5] <- 1
M[2:5, 1] <- 2
M
Explanation: started with a diagonal of 3’s and then...

Assignment #5

Why solve(A) and det(A) work
A is a square matrix (10×10), hence det(A) is defined and equals 0, which means that A is singular (not invertible). Because det(A) = 0, solve(A) correctly raises a singular-system error (there is no inverse).

Why operations on B fail (non‑square matrix)
B is not square (10×100). Inverses and determinants are only defined for square matrices, so both calls fail by definition.

A determinant close to 0 indicates near-singularity and numerically unstable computations. It is better to solve systems with solve(A, b) (or qr.solve/SVD) than to form an explicit inverse; it is more reliable and faster.

https://github.com/shanzay28/r-programming-assignments/edit/main/Doing%20Math%20in%20R%20-%20Part%201%20-README.md
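The behaviour described above can be reproduced on a smaller scale. The sketch below uses hypothetical 2×2 and 2×3 matrices in place of the assignment's 10×10 A and 10×100 B, and a small helper (try_msg, also hypothetical) to capture error messages instead of halting.

```r
# Hypothetical small stand-ins for the assignment's matrices
A <- matrix(c(1, 2, 2, 4), nrow = 2)  # second column is 2 * the first, so A is singular
B <- matrix(1:6, nrow = 2)            # 2 x 3, not square

det(A)  # defined because A is square

# Helper: return the error message of a failing call, or "no error"
try_msg <- function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}

msg_solve_A <- try_msg(solve(A))  # singular: no inverse exists
msg_det_B   <- try_msg(det(B))    # determinant undefined for non-square input
msg_solve_B <- try_msg(solve(B))  # inverse undefined for non-square input
print(c(msg_solve_A, msg_det_B, msg_solve_B))

# For an invertible system, solve(M, b) avoids forming the inverse explicitly
M <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)
x     <- solve(M, b)
x_inv <- solve(M) %*% b  # same answer here, but more work and less stable in general
```

On a well-conditioned matrix like M the two approaches agree; the difference in accuracy shows up as the determinant approaches zero.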

Assignment 4 - Programming Structure in R

The side-by-side boxplots show that patients who had a Bad rating on the first assessment and a High rating on the second assessment usually had higher blood pressure. The High final decision category also goes along with high blood pressure. In other words, the doctors' decisions in this small sample appear to be in line with BP: patients labeled as more worrying tend to have higher BP values. The histograms show two things: Visit Frequency is clustered between 0.2 and 0.6, while Blood Pressure is considerably more spread out and has distinct outliers, such as a very low value at 30 and a very high value over 200. In a real clinical dataset, values this extreme would prompt data-quality checks (measurement error, unit mix-ups, or true but uncommon occurrences) before looking for patterns. The patterns in this dataset are only illustrative, not generalizable, since it is small and made up (only 10 rows, reduced to 9 after cleaning). I used n...
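One way to run the kind of data-quality check mentioned above is to flag readings that fall outside the boxplot whiskers. The sketch below is only an illustration: the bp vector is hypothetical (it is not the assignment's data), with 30 and 205 chosen to mimic the extremes described.

```r
# Hypothetical blood-pressure readings; 30 and 205 mimic the extremes described above
bp <- c(120, 135, 30, 142, 118, 205, 126, 131, 110)

# Values more than 1.5 * IQR beyond the hinges, the rule boxplot() uses for outlier points
flagged <- boxplot.stats(bp)$out
flagged

# Positions of the flagged readings, for manual review before any cleaning
flagged_rows <- which(bp %in% flagged)
flagged_rows
```

Flagging is deliberately separate from removal: whether a reading is an error or a rare true value is a judgment that needs the original records.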

Module 3 - Introduction to Data Frame

From this chapter and exercise, I have taken away a number of operations I can perform to explore and understand a data frame in R. Functions such as str(), head() and summary() let me see the structure and column names of my dataset, as well as summary statistics for each variable. I also practiced calculating statistics such as the mean, median and range, which helped me further understand what was going on with the poll data. The results indicate that Donald is leading with the most support in both polls, but his strength varies between ABC and CBS. Ted does well. Carly and Hillary still trail far behind. The biggest difference between the two polls was for Donald, who received many more points on CBS than on ABC. Since the dataset is entirely manufactured, none of these results can be taken at face value. There is no sample size or margin of error behind the numbers and no demographics to break it down. The data is only useful if you want to practice R skills that involve creati...
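The exploration steps described above look roughly like this. The poll numbers below are hypothetical placeholders chosen to match the narrative (Donald ahead in both polls and higher on CBS); they are not the assignment's values, and only the four candidates named in the text are included.

```r
# Hypothetical poll results; the assignment's actual numbers are not reproduced here
polls <- data.frame(
  Name = c("Donald", "Ted", "Carly", "Hillary"),
  ABC  = c(60, 50, 3, 12),
  CBS  = c(75, 45, 2, 15)
)

# Structure, first rows, and per-column summaries
str(polls)
head(polls)
summary(polls)

# Individual statistics on single columns
mean(polls$ABC)
median(polls$CBS)
range(polls$ABC)

# Difference between the two polls, and the candidate with the largest gap
polls$Diff <- polls$CBS - polls$ABC
biggest_gap <- polls$Name[which.max(abs(polls$Diff))]
biggest_gap
```

Adding the Diff column makes the comparison between polls explicit instead of reading it off two summaries side by side.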