pandas file input and output: which commands install the third-party libraries?

February 16, 2023, 10:23:48

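Reading and writing Excel files with pandas depends on three third-party libraries: xlrd, xlwt and openpyxl (CSV and plain-text files need no extra packages). If they are not installed, use one of the following commands: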

# Install with either the conda command or the pip command
$ conda install xlrd xlwt openpyxl
$ pip install xlrd xlwt openpyxl
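Before running either command, it can be handy to check which of these packages are already importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of pandas.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# The three packages named in the text above
excel_deps = ["xlrd", "xlwt", "openpyxl"]
print("missing:", missing_packages(excel_deps))
```

An empty list means nothing needs to be installed.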

CSV, TXT and Excel files can be read with read_csv(), read_table() and read_excel() respectively. The argument passed in is the absolute or relative path of the file.

In [3]:   df_csv = pd.read_csv('data/ch2/my_csv.csv')
          df_csv

Out[3]:      col1 col2  col3    col4      col5
          0     2    a   1.4   apple  2020/1/1
          1     3    b   3.4  banana  2020/1/2
          2     6    c   2.5  orange  2020/1/5

In [4]:   df_txt = pd.read_table('data/ch2/my_table.txt')
          df_txt

Out[4]:     col1  col2  col3    col4      col5
          0    2     a   1.4   apple  2020/1/1
          1    3     b   3.4  banana  2020/1/2
          2    6     c   2.5  orange  2020/1/5

In [5]:   df_excel = pd.read_excel('data/ch2/my_excel.xlsx')
          df_excel

Out[5]:     col1  col2  col3    col4      col5
          0    2     a   1.4   apple  2020/1/1
          1    3     b   3.4  banana  2020/1/2
          2    6     c   2.5  orange  2020/1/5
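The data files above are not included here, so the following sketch reproduces the same table from in-memory text instead. It assumes that `my_csv.csv` is comma-separated and `my_table.txt` is tab-separated (the defaults of read_csv() and read_table()); the variable names `csv_text` and `tab_text` are illustrative only.

```python
import io

import pandas as pd

# Assumed contents of my_csv.csv, reconstructed from the output shown above
csv_text = (
    "col1,col2,col3,col4,col5\n"
    "2,a,1.4,apple,2020/1/1\n"
    "3,b,3.4,banana,2020/1/2\n"
    "6,c,2.5,orange,2020/1/5\n"
)
# Same data, tab-separated, standing in for my_table.txt
tab_text = csv_text.replace(",", "\t")

# Both readers accept a path or a file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
df_txt = pd.read_table(io.StringIO(tab_text))

print(df_csv)
print(df_csv.equals(df_txt))
```

Passing a file path instead of `io.StringIO(...)` works the same way.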

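These functions share several common parameters. Setting header to None means the first row is not used as the column names. index_col specifies one or more columns to use as the index (indexes are covered in detail in Chapter 3). usecols specifies the set of columns to read, with all columns read by default. parse_dates specifies the columns to convert to datetimes (time series are covered in Chapter 10). nrows specifies the number of data rows to read. All of these parameters are available in the three functions above.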

In [6]:   pd.read_table('data/ch2/my_table.txt', header=None)

Out[6]:        0     1     2       3         4
          0 col1  col2  col3    col4      col5
          1    2     a   1.4   apple  2020/1/1
          2    3     b   3.4  banana  2020/1/2
          3    6     c   2.5  orange  2020/1/5

In [7]:   pd.read_csv('data/ch2/my_csv.csv', index_col=['col1', 'col2'])

Out[7]:                 col3    col4      col5
          col1 col2 
          2    a         1.4   apple  2020/1/1
          3    b         3.4  banana  2020/1/2
          6    c         2.5  orange  2020/1/5

In [8]:   pd.read_table('data/ch2/my_table.txt', usecols=['col1', 'col2'])

Out[8]:     col1 col2 
          0    2    a
          1    3    b
          2    6    c

In [9]:   # col5 is now parsed as a datetime instead of being kept as the original string
          pd.read_csv('data/ch2/my_csv.csv', parse_dates=['col5'])

Out[9]:     col1  col2  col3    col4        col5
          0    2     a   1.4   apple  2020-01-01
          1    3     b   3.4  banana  2020-01-02
          2    6     c   2.5  orange  2020-01-05

In [10]:   pd.read_excel('data/ch2/my_excel.xlsx', nrows=2)

Out[10]:     col1  col2  col3    col4      col5
           0    2     a   1.4   apple  2020/1/1
           1    3     b   3.4  banana  2020/1/2
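The effect of each common parameter can be checked programmatically rather than by eye. The sketch below assumes the same table contents as the examples above, supplied as in-memory text, and uses read_csv() for all calls for brevity.

```python
import io

import pandas as pd

# Assumed contents of the example file, reconstructed from the outputs above
csv_text = (
    "col1,col2,col3,col4,col5\n"
    "2,a,1.4,apple,2020/1/1\n"
    "3,b,3.4,banana,2020/1/2\n"
    "6,c,2.5,orange,2020/1/5\n"
)

# header=None: the first line becomes an ordinary data row
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
# index_col: the listed columns form a (multi-level) index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=["col1", "col2"])
# usecols and nrows: keep two columns and only the first two data rows
subset = pd.read_csv(io.StringIO(csv_text), usecols=["col1", "col2"], nrows=2)
# parse_dates: col5 is converted to a datetime dtype
dated = pd.read_csv(io.StringIO(csv_text), parse_dates=["col5"])

print(no_header.shape)
print(list(indexed.index.names))
print(subset.shape)
print(dated["col5"].dtype)
```

Checking `dtype` is the most reliable way to confirm that parse_dates took effect, since the printed table alone can be misleading.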

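When reading TXT files, the delimiter is often something other than the default tab character. read_table() has a separator parameter, sep, which lets the user specify a custom delimiter for reading such data. For example, the table read below uses "||||" as its delimiter: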

In [11]:   pd.read_table('data/ch2/my_table_special_sep.txt')

Out[11]:                 col1 |||| col2
           0  TS |||| This is an apple.
           1    GQ |||| My name is Bob.
           2         WT |||| Well done!

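The result above is clearly not what we want. In this case, pass the sep parameter and also set engine to 'python', because a separator longer than one character is treated as a regular expression, which requires the Python parsing engine: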

In [12]:   pd.read_table(
               'data/ch2/my_table_special_sep.txt',
               sep=r'\|\|\|\|',
               engine='python'
           )

Out[12]:    col1               col2
           0  TS  This is an apple.
           1  GQ    My name is Bob.
           2  WT         Well done!
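One detail worth noting is that the regular expression matches only the four pipe characters, so any spaces around the delimiter stay in the column names and values. The following sketch assumes the file contents match the output shown earlier and strips that whitespace afterwards; the stripping step is an addition for illustration and is not part of the original example.

```python
import io

import pandas as pd

# Assumed contents of my_table_special_sep.txt, reconstructed from the output above
raw_text = (
    "col1 |||| col2\n"
    "TS |||| This is an apple.\n"
    "GQ |||| My name is Bob.\n"
    "WT |||| Well done!\n"
)

# A raw string keeps the backslashes intact; each pipe is escaped because
# "|" means alternation in a regular expression
df = pd.read_table(io.StringIO(raw_text), sep=r"\|\|\|\|", engine="python")

# Remove the spaces left around the delimiter
df.columns = df.columns.str.strip()
df["col1"] = df["col1"].str.strip()
df["col2"] = df["col2"].str.strip()

print(df)
```

An alternative would be to fold the optional whitespace into the pattern itself, for example `sep=r"\s*\|\|\|\|\s*"`.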
