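Series and DataFrame objects hold a lot of information, but often we only need part of it. To look at the first or last few rows of a table, use the head() and tail() methods, which return the first n rows and the last n rows of a table or Series respectively, where n defaults to 5: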
In [41]: df.head(2)
Out[41]:
   School     Grade            Name  Gender  Height  Weight  Transfer
0       A  Freshman    Gaopeng Yang  Female   158.9    46.0         N
1       B  Freshman  Changqiang You    Male   166.5    70.0         N
In [42]: df.tail(3)
Out[42]:
     School      Grade            Name  Gender  Height  Weight  Transfer
197       A     Senior  Chengqiang Chu  Female   153.9    45.0         N
198       A     Senior   Chengmei Shen    Male   175.3    71.0         N
199       D  Sophomore     Chunpeng Lv    Male   155.7    51.0         N
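As a minimal, self-contained sketch of the same calls, the snippet below builds a small toy table. The table `toy` and all of its values are invented for illustration and are not the dataset shown above. It checks that n defaults to 5 and that both methods also work on a Series:

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Height": [158.9, 166.5, 170.2, 153.9, 175.3, 155.7, 162.0],
})

first_default = toy.head()           # no argument: n defaults to 5
first_two = toy.head(2)              # first 2 rows
last_three = toy.tail(3)             # last 3 rows
series_tail = toy["Height"].tail(2)  # the same methods exist on a Series

print(first_two)
print(last_three)
```

Both methods keep the original index labels, so the rows returned by tail() are labeled with their positions in the full table rather than renumbered from 0.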
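The info() method prints an overview of the table's structure, and the describe() method returns the main summary statistics for the table's numeric columns: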
In [43]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   School    200 non-null    object
 1   Grade     200 non-null    object
 2   Name      200 non-null    object
 3   Gender    200 non-null    object
 4   Height    183 non-null    float64
 5   Weight    189 non-null    float64
 6   Transfer  188 non-null    object
dtypes: float64(2), object(5)
memory usage: 11.1+ KB
In [44]: df.describe()
Out[44]:
           Height      Weight
count  183.000000  189.000000
mean   163.218033   55.015873
std      8.608879   12.824294
min    145.400000   34.000000
25%    157.150000   46.000000
50%    161.900000   51.000000
75%    167.500000   65.000000
max    193.900000   89.000000
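The following sketch shows how the two methods treat missing values, which is why the counts above differ from the 200 rows in the table. The table `toy` and its values are hypothetical. Because info() prints its report instead of returning it, the sketch passes a text buffer through the `buf` parameter to capture the output:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data with missing values, invented for illustration only
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Height": [160.0, np.nan, 170.0, 180.0],
    "Weight": [50.0, 60.0, np.nan, np.nan],
})

# info() writes its report to stdout (or to buf) and returns None
buffer = io.StringIO()
result = toy.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe() returns a DataFrame; by default only numeric columns are
# summarized, and missing values are excluded from every statistic
stats = toy.describe()
height_count = stats.loc["count", "Height"]
height_mean = stats.loc["mean", "Height"]
print(stats)
```

Here the count for Height is 3 and its mean is 170, because the missing entry is skipped rather than counted as zero. The Name column does not appear in the describe() result at all.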
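Note
info() and describe() only give a first-pass summary. For a more thorough and effective look at a dataset, especially one with many columns, the pandas-profiling package (since renamed ydata-profiling) is recommended.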