Sowmyalakshmi
King County is one of the fastest-growing counties in the US. The early 20th century was a time of population growth and industrial diversification. The dataset used for this analysis covers house sales from 2014 to 2015. The primary goal is to analyze the factors that affect the sale price of the houses.
HOW?
Brainstorm the key questions, analyze the dataset, and answer each question using basic functions, charts, plots, statistical tests, and models.
KEY QUESTIONS
1. What is the mean home price in Seattle?
2. Are renovated houses more expensive than non-renovated ones?
3. On average, how many years after construction is a house renovated?
4. When and where was the highest number of houses sold?
5. Has the living area (sqft_living) decreased over time?
6. How does price vary with the number of bedrooms and bathrooms?
7. How is sqft_above correlated with price?
8. Is the price of a home with a waterfront higher than that of a home without one? Are waterfront homes also more numerous?
9. Are houses with a basement priced higher than houses without one?
10. What is the expected price of a 1,000 sqft home with a waterfront?
DATASET EXPLORATION
•This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between 2014 and 2015.
•The target of the analysis is the sale price of the houses.
•The predictors available to us are the sale date, renovation year, number of bedrooms and bathrooms, number of floors, square footage of the house (both living area and lot), and waterfront status. The dataset also contains the grade and condition of each house.
Types of variables:
•continuous: sqft_living, sqft_lot, sqft_above, sqft_basement
•discrete: bedrooms, bathrooms, grade, floors, condition
•categorical: waterfront
DATA WRANGLING
The id, lat and long columns are removed.
Separate data frames were created to answer specific questions.
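As a rough illustration of this step, the sketch below uses plain Python rather than the R functions listed under FUNCTIONS. It assumes each house is a dict keyed by column name; the toy rows, the helper `drop_columns`, and the `waterfront_frame` subset are hypothetical.

```python
# Toy records standing in for rows of the King County data (values are made up).
houses = [
    {"id": 1, "lat": 47.51, "long": -122.26, "price": 221900, "waterfront": 0},
    {"id": 2, "lat": 47.72, "long": -122.32, "price": 538000, "waterfront": 1},
]

DROP = {"id", "lat", "long"}

def drop_columns(rows, columns):
    """Return copies of the rows without the given columns (originals are untouched)."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

cleaned = drop_columns(houses, DROP)

# A question-specific "data frame": only the columns needed for the waterfront question.
waterfront_frame = [{"price": r["price"], "waterfront": r["waterfront"]} for r in cleaned]
```

Building each question's subset from the cleaned rows keeps the per-question frames small and independent of one another.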
FUNCTIONS
filter(), mutate(), mean(), as.integer(), substr(),
group_by(), summarize(), distinct(), cat()
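In R, `group_by()` followed by `summarize()` produces one summary value per group. A minimal Python analogue is sketched below, assuming rows are dicts; the helper `group_mean`, the `renovated` flag, and the prices are hypothetical toy values, not results from the dataset.

```python
from collections import defaultdict

def group_mean(rows, key, value):
    """Mean of `value` for each distinct `key` (like group_by() + summarize(mean()))."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy rows; `renovated` is a hypothetical flag derived from the renovation year.
rows = [
    {"renovated": True, "price": 700000},
    {"renovated": True, "price": 500000},
    {"renovated": False, "price": 400000},
    {"renovated": False, "price": 450000},
    {"renovated": False, "price": 350000},
]
print(group_mean(rows, "renovated", "price"))  # {True: 600000.0, False: 400000.0}
```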
Tasks performed on the data for modeling
Outliers (price > 2,000,000) are removed from the dataset.
The data is split into training and testing sets.
Models are created and trained on the training data, then tested by predicting values for the testing data.
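The three modeling steps can be sketched in Python as follows. The deck does not name the models or the split ratio, so this assumes a single predictor (`sqft_living`), an ordinary least-squares line, an 80/20 split with a fixed seed, and that the 2,000,000 cutoff applies to `price`. All helper names and the toy data are hypothetical.

```python
import random

PRICE_CAP = 2_000_000  # outlier threshold from the slide (assumed to apply to price)

def remove_outliers(rows, cap=PRICE_CAP):
    """Keep only houses priced at or below the cap."""
    return [r for r in rows if r["price"] <= cap]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split off a test set (80/20 by default)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Toy data: price = 50,000 + 250 * sqft exactly, plus one outlier above the cap.
rows = [{"sqft_living": s, "price": 50_000 + 250 * s} for s in range(800, 4800, 200)]
rows.append({"sqft_living": 5000, "price": 7_000_000})

clean = remove_outliers(rows)
train, test = train_test_split(clean)
intercept, slope = fit_line([r["sqft_living"] for r in train], [r["price"] for r in train])

# Predict prices for the held-out test rows.
predictions = [intercept + slope * r["sqft_living"] for r in test]
```

Fitting only on `train` and predicting on `test` keeps the evaluation honest: the test rows never influence the fitted slope and intercept.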
Data is a precious thing and will last longer than the systems themselves. -- Tim Berners-Lee
ANSWERS
1. The mean price of houses in Seattle is $534,963.77.
2. Renovated houses are more expensive than non-renovated ones.
3. On average, a house is renovated about 56.3 years after it is built.
4. The highest number of houses (432) was sold in 2014, in zip code 98103.
5. No. The living area (sqft_living) has only increased over time.
6. The numbers of bedrooms and bathrooms do not have a significant effect on house price.
7. As sqft_above increases, price increases, and the fitted trend line shows a steady upward slope. So sqft_above has a significant effect on house price.
8. Waterfront houses are priced higher than houses without a waterfront, and they are fewer in number.
9. Houses with a basement are priced higher than houses without one.
10. The expected price of a 1,000 sqft home with a waterfront is $274,662.90.
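Answer 3 can be computed as the mean gap between renovation year and construction year, taken over renovated houses only. This sketch assumes columns named `yr_built` and `yr_renovated`, with `yr_renovated == 0` meaning the house was never renovated; the helper name and rows are hypothetical toy values.

```python
def mean_years_to_renovation(rows):
    """Average (yr_renovated - yr_built) over houses that were renovated.

    Assumes yr_renovated == 0 marks a house that was never renovated.
    """
    gaps = [r["yr_renovated"] - r["yr_built"] for r in rows if r["yr_renovated"] != 0]
    return sum(gaps) / len(gaps)

# Toy rows (made up).
rows = [
    {"yr_built": 1940, "yr_renovated": 2000},
    {"yr_built": 1955, "yr_renovated": 2005},
    {"yr_built": 1990, "yr_renovated": 0},  # never renovated, excluded
]
print(round(mean_years_to_renovation(rows), 1))  # 55.0
```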
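For answer 4, the sale year can be taken from the first four characters of the date string (the role `substr()` plays in R), and sales can then be counted per (year, zip code) pair. The `date` format `YYYYMMDDT000000` and the `zipcode` column name are assumptions; the rows are toy values.

```python
from collections import Counter

# Toy rows; the date format "YYYYMMDDT000000" is an assumption about the raw file.
rows = [
    {"date": "20141013T000000", "zipcode": 98103},
    {"date": "20141209T000000", "zipcode": 98103},
    {"date": "20150225T000000", "zipcode": 98103},
    {"date": "20140627T000000", "zipcode": 98038},
]

# Like substr(date, 1, 4) in R: the first four characters give the sale year.
counts = Counter((int(r["date"][:4]), r["zipcode"]) for r in rows)
(year, zipcode), n = counts.most_common(1)[0]
print(year, zipcode, n)  # 2014 98103 2
```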
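The trend line behind answer 7 can be read as a least-squares fit, with Pearson's r summarizing how closely price follows `sqft_above`. A standard-library sketch follows; the helper names and the five (sqft_above, price) pairs are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def trend_slope(xs, ys):
    """Slope of the least-squares trend line of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Toy values (not from the dataset).
sqft_above = [1000, 1500, 2000, 2500, 3000]
price = [300000, 390000, 520000, 580000, 700000]

r = pearson_r(sqft_above, price)
slope = trend_slope(sqft_above, price)  # dollars per additional square foot above ground
```

For these toy values the slope is 198 dollars per square foot and r is close to 1, the kind of steady upward trend answer 7 describes.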
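Answer 8 makes two comparisons: mean price and number of houses in each waterfront group. A sketch, assuming `waterfront` is coded 0/1; the rows are toy values.

```python
from statistics import mean

# Toy rows (made up); waterfront is assumed to be coded 1 = yes, 0 = no.
rows = [
    {"waterfront": 1, "price": 1_200_000},
    {"waterfront": 1, "price": 900_000},
    {"waterfront": 0, "price": 450_000},
    {"waterfront": 0, "price": 520_000},
    {"waterfront": 0, "price": 380_000},
    {"waterfront": 0, "price": 610_000},
]

summary = {}
for flag in (1, 0):
    prices = [r["price"] for r in rows if r["waterfront"] == flag]
    summary[flag] = (len(prices), mean(prices))  # (count, mean price) per group
    print(f"waterfront={flag}: n={len(prices)}, mean price={mean(prices):,.0f}")
```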
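The deck does not say which model produced the estimate in answer 10. As an illustration only, the sketch below fits an ordinary least-squares model of price on `sqft_living` and `waterfront` via the normal equations, then evaluates it at 1,000 sqft with `waterfront = 1`. The choice of predictors, the helper names `solve` and `fit_ols`, and the toy data are assumptions, and the toy result has no relation to the $274,662.90 figure.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(features, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(f) for f in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

# Toy data generated from price = 40,000 + 200 * sqft_living + 300,000 * waterfront.
features = [(s, w) for s in (900, 1400, 2000, 2600, 3100) for w in (0, 1)]
prices = [40_000 + 200 * s + 300_000 * w for s, w in features]

b0, b_sqft, b_water = fit_ols(features, prices)
expected = b0 + b_sqft * 1000 + b_water * 1  # 1,000 sqft home with a waterfront
```

Solving the normal equations directly is fine for two predictors. With many correlated predictors, a QR-based solver is numerically safer.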