Python Programming for Beginners: Study Notes (9)

## Python Lesson 4

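### A new data format: CSV

- Plain text, using some character set such as ASCII, Unicode, EBCDIC, or GB2312 (common in Simplified Chinese environments);
- Made up of records (typically one record per line);
- Each record is split into fields by a delimiter (typically a comma, semicolon, or tab; sometimes the delimiter may include optional spaces);
- Every record has the same sequence of fields.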
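
To make the record/field structure concrete, here is a minimal sketch using Python's standard `csv` module on an in-memory string. The sample rows are made up for illustration; they only borrow a few column names from the grade table used later in these notes.

```python
import csv
import io

# Hypothetical sample data: one header record followed by two data records.
text = "学号,姓名,数学\n1,张小文,86\n2,李清,85\n"

# csv.reader splits each record (line) into fields on the delimiter.
rows = list(csv.reader(io.StringIO(text), delimiter=","))
header, records = rows[0], rows[1:]

print(header)   # ['学号', '姓名', '数学']
print(records)  # [['1', '张小文', '86'], ['2', '李清', '85']]
```

Note that the `csv` module returns every field as a string; pandas, introduced next, infers numeric types on its own.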

#### pandas


```python
import pandas as pd
import numpy as np
```


```python
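# read_csv accepts a path directly, so no file handle is left open.
# If the file is not UTF-8 encoded, pass a matching encoding, e.g. encoding='gbk'.
df = pd.read_csv('K:/Code/jupyter-notebook/Python Study/成绩表.csv')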
```


```python
# head() returns the first 5 rows by default
df.head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孙明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>陈平</td>
      <td>男</td>
      <td>8</td>
      <td>1003</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>73</td>
      <td>86</td>
      <td>81</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>刘东</td>
      <td>男</td>
      <td>20</td>
      <td>1001</td>
      <td>88</td>
      <td>74</td>
      <td>77</td>
      <td>65</td>
      <td>85</td>
      <td>71</td>
    </tr>
  </tbody>
</table>
</div>




```python
type(df)
```




    pandas.core.frame.DataFrame



### DataFrame


```python
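# Column names: 学号 student ID, 姓名 name, 性别 gender, 年龄 age, 班级 class,
# 计算机 computing, 英语 English, 数学 math, 语文 Chinese, 物理 physics, 化学 chemistry
print(df.columns)
# Row index
print(df.index)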
```

    Index(['学号', '姓名', '性别', '年龄', '班级', '计算机', '英语', '数学', '语文', '物理', '化学'], dtype='object')
    RangeIndex(start=0, stop=8, step=1)
    


```python
df.loc[0]
```




    学号        1
    姓名      张小文
    性别        男
    年龄       20
    班级     1002
    计算机      56
    英语       62
    数学       86
    语文       85
    物理       86
    化学       75
    Name: 0, dtype: object




```python
# Select the rows whose math (数学) score is greater than 80
df[df.数学 > 80]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
  </tbody>
</table>
</div>




```python
df[df.数学 < 70]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>黄佳</td>
      <td>女</td>
      <td>20</td>
      <td>1002</td>
      <td>81</td>
      <td>78</td>
      <td>58</td>
      <td>84</td>
      <td>90</td>
      <td>82</td>
    </tr>
  </tbody>
</table>
</div>




```python
# Combined filter: wrap each condition in parentheses and join them with &
df[(df.语文 >= 80) & (df.数学 >= 80) & (df.英语 >= 80)]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孙明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
  </tbody>
</table>
</div>
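
The parentheses in the combined filter matter because `&` binds more tightly than comparison operators in Python. The sketch below shows AND, OR and NOT on a made-up three-row table (the names and scores are hypothetical, not taken from the CSV).

```python
import pandas as pd

# Hypothetical miniature of the grade table.
demo = pd.DataFrame({
    "姓名": ["a", "b", "c"],
    "语文": [85, 90, 60],
    "数学": [86, 70, 95],
})

both = demo[(demo["语文"] >= 80) & (demo["数学"] >= 80)]    # AND
either = demo[(demo["语文"] >= 80) | (demo["数学"] >= 80)]  # OR
not_math = demo[~(demo["数学"] >= 80)]                      # NOT

print(both["姓名"].tolist())      # ['a']
print(either["姓名"].tolist())    # ['a', 'b', 'c']
print(not_math["姓名"].tolist())  # ['b']
```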



### Sorting


```python
df.sort_values(['数学','语文']).head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>黄佳</td>
      <td>女</td>
      <td>20</td>
      <td>1002</td>
      <td>81</td>
      <td>78</td>
      <td>58</td>
      <td>84</td>
      <td>90</td>
      <td>82</td>
    </tr>
    <tr>
      <th>6</th>
      <td>7</td>
      <td>王大力</td>
      <td>男</td>
      <td>18</td>
      <td>1003</td>
      <td>85</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>84</td>
      <td>69</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>刘东</td>
      <td>男</td>
      <td>20</td>
      <td>1001</td>
      <td>88</td>
      <td>74</td>
      <td>77</td>
      <td>65</td>
      <td>85</td>
      <td>71</td>
    </tr>
    <tr>
      <th>5</th>
      <td>6</td>
      <td>严云峰</td>
      <td>男</td>
      <td>19</td>
      <td>1001</td>
      <td>84</td>
      <td>87</td>
      <td>77</td>
      <td>80</td>
      <td>70</td>
      <td>81</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>陈平</td>
      <td>男</td>
      <td>8</td>
      <td>1003</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>73</td>
      <td>86</td>
      <td>81</td>
    </tr>
  </tbody>
</table>
</div>
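
`sort_values` sorts in ascending order by default and uses the second column only to break ties in the first. A small sketch with made-up rows, using the `ascending` parameter to mix directions:

```python
import pandas as pd

# Hypothetical rows; two students share the same math score.
demo = pd.DataFrame({
    "姓名": ["a", "b", "c", "d"],
    "数学": [77, 58, 77, 86],
    "语文": [65, 84, 80, 85],
})

# Ascending by 数学; ties are broken by 语文 in descending order.
ranked = demo.sort_values(["数学", "语文"], ascending=[True, False])
print(ranked["姓名"].tolist())  # ['b', 'c', 'a', 'd']

# sort_values returns a new DataFrame; the original order is untouched.
print(demo["姓名"].tolist())    # ['a', 'b', 'c', 'd']
```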



### Accessing rows


```python
# Look up a row by its index label
df.loc[1]
```




    学号        2
    姓名       李清
    性别        女
    年龄       19
    班级     1001
    计算机      94
    英语       65
    数学       85
    语文       90
    物理       84
    化学       75
    Name: 1, dtype: object



### Indexing


```python
scores = {
    '英语': [90,70,89],
    '数学': [64,78,48],
    '姓名': ['wang','li','sun']
}
df = pd.DataFrame(scores, index = ['one','two','three'])
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>英语</th>
      <th>数学</th>
      <th>姓名</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>one</th>
      <td>90</td>
      <td>64</td>
      <td>wang</td>
    </tr>
    <tr>
      <th>two</th>
      <td>70</td>
      <td>78</td>
      <td>li</td>
    </tr>
    <tr>
      <th>three</th>
      <td>89</td>
      <td>48</td>
      <td>sun</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.index
```




    Index(['one', 'two', 'three'], dtype='object')




```python
df.loc['one']
```




    英语      90
    数学      64
    姓名    wang
    Name: one, dtype: object




```python
# iloc selects by integer position (the actual n-th row), which helps when the index is not numeric
df.iloc[0]
```




    英语      90
    数学      64
    姓名    wang
    Name: one, dtype: object




```python
# .ix mixed the behavior of loc and iloc; it was deprecated in pandas 0.20 and removed in pandas 1.0.
# Use .loc for label-based indexing or .iloc for positional indexing instead.
df.iloc[0]
```




    英语      90
    数学      64
    姓名    wang
    Name: one, dtype: object




```python
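# Back on the grade table loaded from the CSV (not the small frame above); .loc slices include the end label
df.loc[:2]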
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孙明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.iloc[:3]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孙明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
  </tbody>
</table>
</div>
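
Both calls above return the same three rows because `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position. A minimal sketch on made-up data:

```python
import pandas as pd

demo = pd.DataFrame({"数学": [86, 85, 80, 78]})  # default index 0..3

by_label = demo.loc[:2]      # labels 0, 1, 2: end label included
by_position = demo.iloc[:2]  # positions 0, 1: end position excluded
print(len(by_label), len(by_position))  # 3 2

# With a non-numeric index the two styles are easier to tell apart.
named = pd.DataFrame({"数学": [64, 78, 48]}, index=["one", "two", "three"])
print(named.loc["one":"two"]["数学"].tolist())  # [64, 78]
print(named.iloc[0:2]["数学"].tolist())          # [64, 78]
```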




```python
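# df[0] does not select a row: it looks for a column labeled 0 and raises KeyError here
# df[0]

# Slicing with [] does select multiple rows
df[:2]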
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
  </tbody>
</table>
</div>




```python
# The underlying data of the DataFrame as a NumPy array
df.values
```




    array([[1, '张小文', '男', 20, 1002, 56, 62, 86, 85, 86, 75],
           [2, '李清', '女', 19, 1001, 94, 65, 85, 90, 84, 75],
           [3, '孙明', '男', 19, 1003, 74, 85, 80, 84, 86, 91],
           [4, '陈平', '男', 8, 1003, 85, 75, 78, 73, 86, 81],
           [5, '刘东', '男', 20, 1001, 88, 74, 77, 65, 85, 71],
           [6, '严云峰', '男', 19, 1001, 84, 87, 77, 80, 70, 81],
           [7, '王大力', '男', 18, 1003, 85, 85, 75, 78, 84, 69],
           [8, '黄佳', '女', 20, 1002, 81, 78, 58, 84, 90, 82]], dtype=object)




```python
df.数学.values
```




    array([86, 85, 80, 78, 77, 77, 75, 58], dtype=int64)




```python
# Simple statistics: how often each value occurs
df.数学.value_counts()
```




    77    2
    78    1
    75    1
    58    1
    86    1
    85    1
    80    1
    Name: 数学, dtype: int64




```python
new = df[['数学','语文']].head()
new
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>数学</th>
      <th>语文</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>86</td>
      <td>85</td>
    </tr>
    <tr>
      <th>1</th>
      <td>85</td>
      <td>90</td>
    </tr>
    <tr>
      <th>2</th>
      <td>80</td>
      <td>84</td>
    </tr>
    <tr>
      <th>3</th>
      <td>78</td>
      <td>73</td>
    </tr>
    <tr>
      <th>4</th>
      <td>77</td>
      <td>65</td>
    </tr>
  </tbody>
</table>
</div>




```python
new * 2
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>数学</th>
      <th>语文</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>172</td>
      <td>170</td>
    </tr>
    <tr>
      <th>1</th>
      <td>170</td>
      <td>180</td>
    </tr>
    <tr>
      <th>2</th>
      <td>160</td>
      <td>168</td>
    </tr>
    <tr>
      <th>3</th>
      <td>156</td>
      <td>146</td>
    </tr>
    <tr>
      <th>4</th>
      <td>154</td>
      <td>130</td>
    </tr>
  </tbody>
</table>
</div>



### Key points


```python
# Map a score to a grade label: 优秀 (excellent), 良 (good), 及格 (pass), 不及格 (fail)
def func(score):
    if score>=80:
        return '优秀'
    elif score>=70:
        return '良'
    elif score>=60:
        return '及格'
    else:
        return '不及格'
df['数学分类'] = df.数学.map(func)
```


```python
df.head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
      <th>数学分类</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
      <td>优秀</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
      <td>优秀</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孙明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
      <td>优秀</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>陈平</td>
      <td>男</td>
      <td>8</td>
      <td>1003</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>73</td>
      <td>86</td>
      <td>81</td>
      <td>良</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>刘东</td>
      <td>男</td>
      <td>20</td>
      <td>1001</td>
      <td>88</td>
      <td>74</td>
      <td>77</td>
      <td>65</td>
      <td>85</td>
      <td>71</td>
      <td>良</td>
    </tr>
  </tbody>
</table>
</div>
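
As an alternative to writing the thresholds as an `if`/`elif` chain, the same binning can be sketched with `pd.cut`. This is not the approach used in the notes, only an equivalent under the assumption that scores lie between 0 and 100; the sample scores are made up.

```python
import pandas as pd

scores = pd.Series([86, 80, 78, 60, 58], name="数学")

# right=False makes each bin closed on the left: [0, 60), [60, 70), [70, 80), [80, 101).
# The upper edge 101 is an assumption that no score exceeds 100.
labels = pd.cut(
    scores,
    bins=[0, 60, 70, 80, 101],
    right=False,
    labels=["不及格", "及格", "良", "优秀"],
)
print(labels.astype(str).tolist())  # ['优秀', '优秀', '良', '及格', '不及格']
```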




```python
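# applymap applies a function to every element of a DataFrame; very useful
# (deprecated since pandas 2.1 in favor of DataFrame.map)
def func(number):
    return number + 10
# Equivalent lambda form
func = lambda number: number + 10

df.applymap(lambda x: str(x) + ' -').head(2)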
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
      <th>数学分类</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1 -</td>
      <td>张小文 -</td>
      <td>男 -</td>
      <td>20 -</td>
      <td>1002 -</td>
      <td>56 -</td>
      <td>62 -</td>
      <td>86 -</td>
      <td>85 -</td>
      <td>86 -</td>
      <td>75 -</td>
      <td>优秀 -</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2 -</td>
      <td>李清 -</td>
      <td>女 -</td>
      <td>19 -</td>
      <td>1001 -</td>
      <td>94 -</td>
      <td>65 -</td>
      <td>85 -</td>
      <td>90 -</td>
      <td>84 -</td>
      <td>75 -</td>
      <td>优秀 -</td>
    </tr>
  </tbody>
</table>
</div>
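
Because `applymap` was renamed to `DataFrame.map` in pandas 2.1, a version-tolerant sketch can pick whichever method exists. The small frame below is made up, and the `hasattr` check is only one possible way to handle both versions.

```python
import pandas as pd

demo = pd.DataFrame({"数学": [86, 85], "语文": [85, 90]})

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap.
elementwise = demo.map if hasattr(demo, "map") else demo.applymap

plus_ten = elementwise(lambda number: number + 10)
print(plus_ten.values.tolist())  # [[96, 95], [95, 100]]

# For simple arithmetic the vectorized form produces the same values.
print((demo + 10).values.tolist() == plus_ten.values.tolist())  # True
```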



### Anonymous functions


```python
[i+ 100 for i in range(10)]
```




    [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]




```python
def func(x):
    return x + 100
```


```python
list(map(func,range(10)))
# When a function is this simple, rarely used, or not worth naming, use an anonymous function (lambda)
list(map(lambda x: x + 100,range(10)))
```




    [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
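
Lambdas are most often passed as short callbacks to other functions. A minimal sketch with the built-in `sorted` and `filter`, using made-up (name, score) pairs:

```python
# Hypothetical (name, score) pairs.
pairs = [("wang", 64), ("li", 78), ("sun", 48)]

# sorted() calls the key function once per element.
by_score = sorted(pairs, key=lambda pair: pair[1], reverse=True)
print(by_score)  # [('li', 78), ('wang', 64), ('sun', 48)]

# filter() keeps the elements for which the lambda returns True.
passed = list(filter(lambda pair: pair[1] >= 60, pairs))
print(passed)  # [('wang', 64), ('li', 78)]
```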




```python
# To build a new column from several columns, use apply with axis=1 (row by row)
df['new_score'] = df.apply(lambda x: x.数学 + x.语文, axis = 1)
```


```python
# First few rows (not displayed: a notebook cell only shows its last expression)
df.head(2)
# Last few rows
df.tail(2)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
      <th>数学分类</th>
      <th>new_score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>6</th>
      <td>7</td>
      <td>王大力</td>
      <td>男</td>
      <td>18</td>
      <td>1003</td>
      <td>85</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>84</td>
      <td>69</td>
      <td>良</td>
      <td>153</td>
    </tr>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>黄佳</td>
      <td>女</td>
      <td>20</td>
      <td>1002</td>
      <td>81</td>
      <td>78</td>
      <td>58</td>
      <td>84</td>
      <td>90</td>
      <td>82</td>
      <td>不及格</td>
      <td>142</td>
    </tr>
  </tbody>
</table>
</div>
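
For a plain sum of two columns, `apply` with `axis=1` is not strictly needed: adding the columns directly gives the same values and is usually faster. A sketch on two made-up rows that mirror the last two students above:

```python
import pandas as pd

demo = pd.DataFrame({"数学": [75, 58], "语文": [78, 84]})

# Row-wise apply: the lambda receives one row (a Series) at a time.
via_apply = demo.apply(lambda row: row["数学"] + row["语文"], axis=1)

# Vectorized: whole columns are added in a single operation.
via_columns = demo["数学"] + demo["语文"]

print(via_apply.tolist())    # [153, 142]
print(via_columns.tolist())  # [153, 142]
```

`apply` remains the more flexible choice when the per-row logic involves branching or several steps.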



### Many pandas DataFrame operations closely resemble operations on NumPy 2D arrays
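
A small sketch of that similarity, on made-up numbers: shape, positional slicing and column-wise reductions look almost the same on both objects.

```python
import numpy as np
import pandas as pd

arr = np.array([[86, 85], [85, 90], [80, 84]])
demo = pd.DataFrame(arr, columns=["数学", "语文"])

print(arr.shape, demo.shape)       # (3, 2) (3, 2)
print(arr[:2, 0].tolist())         # [86, 85]
print(demo.iloc[:2, 0].tolist())   # [86, 85]
print(arr.sum(axis=0).tolist())    # [251, 259]
print(demo.sum(axis=0).tolist())   # [251, 259]
```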

<h1 style="text-align:center">Plotting with matplotlib</h1>


```python
df = df.drop(['new_score'],axis = 1)
```


```python
df.head(2)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>学号</th>
      <th>姓名</th>
      <th>性别</th>
      <th>年龄</th>
      <th>班级</th>
      <th>计算机</th>
      <th>英语</th>
      <th>数学</th>
      <th>语文</th>
      <th>物理</th>
      <th>化学</th>
      <th>数学分类</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>张小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
      <td>优秀</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
      <td>优秀</td>
    </tr>
  </tbody>
</table>
</div>



### Plotting


```python
import numpy as np
import matplotlib.pyplot as plt
# In a Jupyter notebook, this magic makes figures render inline below the cell
%matplotlib inline 
```


```python
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.plot(x, np.cos(x))
```




    [<matplotlib.lines.Line2D at 0x1b3061cc7f0>]




![png](output_48_1.png)



```python
plt.plot(x, y, '--')
```




    [<matplotlib.lines.Line2D at 0x1b3082c71d0>]




![png](output_49_1.png)



```python
fig = plt.figure()
plt.plot(x, y, '--')
```




    [<matplotlib.lines.Line2D at 0x1b30832ca58>]




![png](output_50_1.png)



```python
fig.savefig('K:/Code/jupyter-notebook/Python Study/first_figure.png')
```


```python
# Dashed line style in the top panel, default solid line in the bottom panel
plt.subplot(2,1,1)
plt.plot(x, np.sin(x),'--')

plt.subplot(2,1,2)
plt.plot(x, np.cos(x))
```




    [<matplotlib.lines.Line2D at 0x1b308395198>]




![png](output_52_1.png)
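
The same two-panel figure can also be sketched with matplotlib's object-oriented interface, where `plt.subplots` returns the figure and its axes explicitly. The `Agg` backend line is an assumption so the sketch runs outside a notebook; omit it inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running as a script
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One figure with a 2x1 grid of axes; sharex links the two x axes.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
ax_top.plot(x, np.sin(x), "--", label="sin(x)")
ax_bottom.plot(x, np.cos(x), label="cos(x)")
ax_top.legend()
ax_bottom.legend()

print(len(fig.axes))  # 2
```

Holding on to `ax_top` and `ax_bottom` makes it clear which panel each call affects, which `plt.subplot` leaves implicit.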



```python
# Dot (circle marker) style
x = np.linspace(0,10,20)
plt.plot(x, np.sin(x),'o')
```




    [<matplotlib.lines.Line2D at 0x1b3084f4940>]




![png](output_53_1.png)



```python
# color sets the marker color
x = np.linspace(0,10,20)
plt.plot(x, np.sin(x),'o',color= 'red')
```




    [<matplotlib.lines.Line2D at 0x1b30855bef0>]




![png](output_54_1.png)



```python
# Add a label to each line
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y,'--',label='sin(x)')
plt.plot(x, np.cos(x),'o',label='cos(x)')
# legend() displays the labels; loc controls where the legend is placed (1 = upper right)
plt.legend(loc= 1 )
```




    <matplotlib.legend.Legend at 0x1b309907198>




![png](output_55_1.png)



```python
plt.legend?
# When a function is unfamiliar, append ? in IPython/Jupyter to view its documentation
```


```python
# plot() accepts many optional parameters
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x,y,'-p',color = 'green',
        markersize = 10,linewidth = 4,
        markeredgecolor = 'orange',
        markeredgewidth=2)
plt.ylim(-0.5,0.8)
```




    (-0.5, 0.8)




![png](output_57_1.png)



```python
# See the documentation for the full parameter list
plt.plot?
```


```python
# ylim and xlim restrict the displayed axis ranges
plt.plot(x,y,'-p',color = 'green',
        markersize = 10,linewidth = 4,
        markeredgecolor = 'orange',
        markeredgewidth=2)
plt.ylim(-0.5,1.2)
plt.xlim(2,8)
```




    (2, 8)




![png](output_59_1.png)



```python
# Scatter plot
plt.scatter(x,y,s=100,c='red')
```




    <matplotlib.collections.PathCollection at 0x1b309da0c88>




![png](output_60_1.png)



```python
plt.style.use('classic')

x = np.random.randn(100)
y = np.random.randn(100)
colors = np.random.randn(100)
# Marker sizes must be non-negative: randn produced negative sizes and a
# RuntimeWarning in sqrt, so draw from the uniform rand instead.
sizes = 1000 * np.random.rand(100)
plt.scatter(x,y,c=colors,s=sizes,alpha=0.4)
plt.colorbar()
```




    <matplotlib.colorbar.Colorbar at 0x1b309fe4f98>




![png](output_61_2.png)


### Built-in plotting in pandas

### Line plots


```python
import pandas as pd
df = pd.DataFrame(np.random.randn(100,4).cumsum(0),columns=['A','B','C','D'])
df.plot()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c0c88d0>




![png](output_64_1.png)


### Bar charts


```python
df = pd.DataFrame(np.random.randint(10,50,(3,4)),columns=['A','B','C','D'],index = ['one','two','three'])
df.plot.bar()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c284898>




![png](output_66_1.png)



```python
df.B.plot.bar()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c16c9b0>




![png](output_67_1.png)



```python
# Equivalent to the df.plot.bar() call used earlier
df.plot(kind = 'bar')
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c190898>




![png](output_68_1.png)



```python
# Stack the columns on top of each other within each bar
df.plot(kind = 'bar',stacked = True)
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c223978>




![png](output_69_1.png)


### Histograms


```python
df = pd.DataFrame(np.random.randn(100,4),columns=['A','B','C','D'])
df.hist(column='A',grid=True,figsize=(10,5))
```




    array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001B30DE24DD8>]],
          dtype=object)




![png](output_71_1.png)
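
What the histogram draws can be computed directly with `np.histogram`, which returns the count per bin and the bin edges. This is a sketch on freshly generated data with an arbitrarily chosen seed, so the numbers will not match the figure above; `bins=10` mirrors the default of `DataFrame.hist`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
values = rng.standard_normal(100)

# Ten equal-width bins spanning the range of the data.
counts, edges = np.histogram(values, bins=10)

print(counts.sum())  # 100
print(len(edges))    # 11
```

Ten bins need eleven edges, and every sample falls into exactly one bin, so the counts add up to the sample size.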


### Density plots


```python
# Equivalent to df.plot(kind = 'kde')
# Note: this requires SciPy (pip install scipy); without it you get ModuleNotFoundError: No module named 'scipy'
df.plot.kde()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30e082d30>




![png](output_73_1.png)


### 3D plots with matplotlib


```python
from mpl_toolkits.mplot3d import Axes3D  
from matplotlib import cm  
from matplotlib.ticker import LinearLocator, FormatStrFormatter  
import matplotlib.pyplot as plt  
import numpy as np  
 
fig = plt.figure()  
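# fig.gca(projection='3d') was deprecated in matplotlib 3.4 and later removed
ax = fig.add_subplot(111, projection='3d')
# x-axis range; values must not repeat
X = np.arange(-5, 5, 0.25)
# y-axis range; values must not repeat
Y = np.arange(-5, 5, 0.25)
# Build the coordinate grid
X, Y = np.meshgrid(X, Y)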
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)  
Z = np.sin(R)  

# Plot the surface
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,  
        linewidth=0, antialiased=False)  

# Customize the z axis
ax.set_zlim(-1.01, 1.01)  
ax.zaxis.set_major_locator(LinearLocator(10))  
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))  
 
# Add a color bar which maps values to colors
fig.colorbar(surf, shrink=0.5, aspect=5)  
 
plt.show() 
```


![png](output_75_0.png)
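
The `np.meshgrid` step is what turns two 1D ranges into the 2D coordinate arrays that `plot_surface` expects. A minimal sketch on a much smaller, made-up grid shows the resulting shapes:

```python
import numpy as np

xs = np.arange(-1, 1, 0.5)  # [-1.0, -0.5, 0.0, 0.5]
ys = np.arange(0, 3, 1)     # [0, 1, 2]

# Each output has one row per y value and one column per x value.
X, Y = np.meshgrid(xs, ys)
print(X.shape, Y.shape)  # (3, 4) (3, 4)
print(X[0].tolist())     # [-1.0, -0.5, 0.0, 0.5]
print(Y[:, 0].tolist())  # [0, 1, 2]

# Element-wise math on the grids yields one Z value per (x, y) pair.
Z = np.sin(np.sqrt(X**2 + Y**2))
print(Z.shape)  # (3, 4)
```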

Author: allengao
