
Mean vs Median

R day 2:

I was working on a Kaggle dataset of Airbnb listings in New York City. When I ran the summary function on the price variable in R, I noticed a large gap between the mean and the median.

summary(ab$price)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.0    69.0   106.0   152.7   175.0 10000.0

In this case, which statistic describes the typical price better: the mean or the median?

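To answer this question, we first plot the density distribution of the price variable.
As the graph shows, the price distribution is strongly right-skewed (positively skewed): most listings sit at the low end, and a long tail stretches toward the 10,000 maximum. That tail is why the mean (152.7) sits well above the median (106.0).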

Can you guess which one makes more sense?
Yes, it is the median that tells a better story about Airbnb prices in NYC, because a handful of very expensive listings pull the mean upward.

library(ggplot2)
# density of listing prices
d1 <- ggplot(ab, aes(price)) + geom_density(alpha = 0.2)
d1
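To see why a long tail moves the mean but not the median, here is a minimal sketch. The prices vector is made up for illustration and is not taken from the Kaggle data; only the 10000 value echoes the maximum reported by summary above.

```r
# Made-up nightly prices (NOT the Kaggle data): mostly modest, one extreme listing
prices <- c(60, 75, 90, 100, 110, 120, 150, 180, 10000)

mean(prices)    # pulled far upward by the single 10000 listing
median(prices)  # stays at the middle value of the sorted prices

# Drop the extreme listing and compare again
trimmed <- prices[prices < 10000]
mean(trimmed)    # now close to the median
median(trimmed)  # barely moves
```

Removing one listing changes the mean by an order of magnitude, while the median shifts only slightly. That is the behaviour the Airbnb price summary hints at.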

What if the data is not skewed, or only slightly skewed?

In that case, the mean is a reliable description of the central tendency of the data. For example, with simulated, normally distributed vegetable lengths:

carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))

#Now, combine your two dataframes into one.  First make a new column in each.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'

#and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)

#now make your lovely plot
p <- ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)

p
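As a rough numeric check alongside the plot, the sketch below rebuilds the same simulated data and compares the mean and median for each vegetable. The set.seed call is an assumption added here only so the numbers are reproducible; it is not part of the original code.

```r
set.seed(1)  # assumption: fixed seed, added only for reproducibility

# Same simulated lengths as in the plotting code
carrots <- data.frame(length = rnorm(100000, 6, 2), veg = "carrot")
cukes <- data.frame(length = rnorm(50000, 7, 2.5), veg = "cuke")
vegLengths <- rbind(carrots, cukes)

# Mean and median of length for each vegetable
means <- tapply(vegLengths$length, vegLengths$veg, mean)
medians <- tapply(vegLengths$length, vegLengths$veg, median)
round(cbind(mean = means, median = medians), 2)
```

For symmetric data like this, the two columns should agree closely, so either statistic describes the center well.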

By examining the density distributions of the data, we can now draw a conclusion.

Conclusion:

If the data is normally distributed or only slightly skewed, the mean describes the central tendency of the dataset well.
If the data is strongly skewed, the median is the more representative measure, because it is far less sensitive to extreme values.
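If you want a number to go with the visual check, one option is the moment-based sample skewness. The sketch below is only an illustration: sample_skewness is a hypothetical helper written here in base R, and the simulated vectors stand in for real data. Values near 0 suggest a roughly symmetric distribution, and clearly positive values suggest a long right tail. How large is "too skewed" for the mean is a judgment call, not a fixed rule.

```r
# Hypothetical helper (not from any package): moment-based sample skewness
sample_skewness <- function(x) {
  x <- x[!is.na(x)]          # ignore missing values
  m <- mean(x)
  s <- sqrt(mean((x - m)^2)) # population-style standard deviation
  mean((x - m)^3) / s^3      # third standardized moment
}

set.seed(42)  # assumption: fixed seed, added only for reproducibility
symmetric <- rnorm(10000, mean = 6, sd = 2)                # roughly symmetric
right_skewed <- rlnorm(10000, meanlog = 4.7, sdlog = 0.7)  # long right tail

sample_skewness(symmetric)     # close to 0
sample_skewness(right_skewed)  # clearly positive

# In the right-skewed sample the mean sits above the median
c(mean = mean(right_skewed), median = median(right_skewed))
```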

Thanks to Jun.z for sharing all these stats tricks with me.
