## Python Lesson 4

### A new data format: CSV

- Plain text in some character set, such as ASCII, Unicode, EBCDIC, or GB2312 (used in Simplified Chinese environments);
- Made up of records (typically one record per line);
- Each record is split into fields by a delimiter (typically a comma, semicolon, or tab; some variants allow optional spaces around the delimiter);
- Every record has the same sequence of fields.

#### pandas

```python
import pandas as pd
import numpy as np
```

```python
# The with block closes the file once pandas has read it
with open('K:/Code/jupyter-notebook/Python Study/成绩表.csv') as f:
    df = pd.read_csv(f)
```

```python
# head() returns the first 5 rows by default
df.head()
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td> <td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> </tr>
<tr> <th>2</th> <td>3</td> <td>孙明</td> <td>男</td> <td>19</td> <td>1003</td> <td>74</td> <td>85</td> <td>80</td> <td>84</td> <td>86</td> <td>91</td> </tr>
<tr> <th>3</th> <td>4</td> <td>陈平</td> <td>男</td> <td>8</td> <td>1003</td> <td>85</td> <td>75</td> <td>78</td> <td>73</td> <td>86</td> <td>81</td> </tr>
<tr> <th>4</th> <td>5</td> <td>刘东</td> <td>男</td> <td>20</td> <td>1001</td> <td>88</td> <td>74</td> <td>77</td> <td>65</td> <td>85</td> <td>71</td> </tr>
</tbody>
</table>
</div>

```python
type(df)
```

    pandas.core.frame.DataFrame

### DataFrame

```python
# Column names
print(df.columns)
# Row index
print(df.index)
```

    Index(['学号', '姓名', '性别', '年龄', '班级', '计算机', '英语', '数学', '语文', '物理', '化学'], dtype='object')
    RangeIndex(start=0, stop=8, step=1)

```python
df.loc[0]
```

    学号        1
    姓名      张小文
    性别        男
    年龄       20
    班级     1002
    计算机      56
    英语       62
    数学       86
    语文       85
    物理       86
    化学       75
    Name: 0, dtype: object

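
The CSV description at the start of this lesson (records, delimiters, fields) can be tried out without a file on disk. The following is a minimal sketch, assuming a made-up two-record text that uses `;` as its delimiter; `csv_text` and `demo` are illustrative names, not part of the original lesson.

```python
import io
import pandas as pd

# Hypothetical two-record CSV with ';' as the delimiter (not the real 成绩表.csv)
csv_text = "学号;姓名;数学\n1;张小文;86\n2;李清;85\n"

# sep tells read_csv which delimiter splits each record into fields
demo = pd.read_csv(io.StringIO(csv_text), sep=';')

print(demo.columns.tolist())  # field names, taken from the first line
print(len(demo))              # number of data records
print(demo.loc[0, '姓名'])    # one field of the record labelled 0
```

When reading a real file whose text is not UTF-8, `read_csv` also takes an `encoding` argument; which value is right depends on how the file was saved.
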
```python
# Select the rows whose 数学 (math) score is above 80
df[df.数学 > 80]
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td> <td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> </tr>
</tbody>
</table>
</div>

```python
df[df.数学 < 70]
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>7</th> <td>8</td> <td>黄佳</td> <td>女</td> <td>20</td> <td>1002</td> <td>81</td> <td>78</td> <td>58</td> <td>84</td> <td>90</td> <td>82</td> </tr>
</tbody>
</table>
</div>

```python
# Compound filter: join the conditions with & and wrap each one in parentheses
df[(df.语文 >= 80) & (df.数学 >= 80) & (df.英语 >= 80)]
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>2</th> <td>3</td> <td>孙明</td> <td>男</td>
<td>19</td> <td>1003</td> <td>74</td> <td>85</td> <td>80</td> <td>84</td> <td>86</td> <td>91</td> </tr>
</tbody>
</table>
</div>

### Sorting

```python
# Sort by 数学 first and break ties with 语文 (ascending by default)
df.sort_values(['数学','语文']).head()
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>7</th> <td>8</td> <td>黄佳</td> <td>女</td> <td>20</td> <td>1002</td> <td>81</td> <td>78</td> <td>58</td> <td>84</td> <td>90</td> <td>82</td> </tr>
<tr> <th>6</th> <td>7</td> <td>王大力</td> <td>男</td> <td>18</td> <td>1003</td> <td>85</td> <td>85</td> <td>75</td> <td>78</td> <td>84</td> <td>69</td> </tr>
<tr> <th>4</th> <td>5</td> <td>刘东</td> <td>男</td> <td>20</td> <td>1001</td> <td>88</td> <td>74</td> <td>77</td> <td>65</td> <td>85</td> <td>71</td> </tr>
<tr> <th>5</th> <td>6</td> <td>严云峰</td> <td>男</td> <td>19</td> <td>1001</td> <td>84</td> <td>87</td> <td>77</td> <td>80</td> <td>70</td> <td>81</td> </tr>
<tr> <th>3</th> <td>4</td> <td>陈平</td> <td>男</td> <td>8</td> <td>1003</td> <td>85</td> <td>75</td> <td>78</td> <td>73</td> <td>86</td> <td>81</td> </tr>
</tbody>
</table>
</div>

### Access

```python
# Look up a row by its index label
df.loc[1]
```

    学号        2
    姓名       李清
    性别        女
    年龄       19
    班级     1001
    计算机      94
    英语       65
    数学       85
    语文       90
    物理       84
    化学       75
    Name: 1, dtype: object

### Index

```python
scores = {
    '英语': [90, 70, 89],
    '数学': [64, 78, 48],
    '姓名': ['wang', 'li', 'sun']
}
# Use string labels instead of the default 0, 1, 2 index
df = pd.DataFrame(scores, index=['one', 'two', 'three'])
df
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>英语</th> <th>数学</th>
<th>姓名</th> </tr>
</thead>
<tbody>
<tr> <th>one</th> <td>90</td> <td>64</td> <td>wang</td> </tr>
<tr> <th>two</th> <td>70</td> <td>78</td> <td>li</td> </tr>
<tr> <th>three</th> <td>89</td> <td>48</td> <td>sun</td> </tr>
</tbody>
</table>
</div>

```python
df.index
```

    Index(['one', 'two', 'three'], dtype='object')

```python
df.loc['one']
```

    英语      90
    数学      64
    姓名    wang
    Name: one, dtype: object

```python
# iloc selects by integer position (the literal n-th row); use it when the index labels are not integers
df.iloc[0]
```

    英语      90
    数学      64
    姓名    wang
    Name: one, dtype: object

```python
# .ix combined loc and iloc; it was deprecated in pandas 0.20 and removed in pandas 1.0,
# so this call only runs on old versions. Use .loc or .iloc instead.
df.ix[0]
```

    c:\python\python36\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning:
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing

    See the documentation here:
    http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
      """Entry point for launching an IPython kernel.

    英语      90
    数学      64
    姓名    wang
    Name: one, dtype: object

The remaining cells in this section use the grade table loaded from 成绩表.csv again, so re-run the `read_csv` cell first so that `df` refers to it rather than to the small frame above.

```python
# loc slices by label and includes the end label, so this returns rows 0, 1 and 2
df.loc[:2]
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td> <td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> </tr>
<tr> <th>2</th> <td>3</td> <td>孙明</td> <td>男</td> <td>19</td> <td>1003</td> <td>74</td> <td>85</td> <td>80</td> <td>84</td> <td>86</td> <td>91</td> </tr>
</tbody>
</table>
</div>

```python
# iloc slices by position and excludes the end position, so this also returns rows 0, 1 and 2
df.iloc[:3]
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align:
top; } .dataframe thead th { text-align: right; } </style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td> <td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> </tr>
<tr> <th>2</th> <td>3</td> <td>孙明</td> <td>男</td> <td>19</td> <td>1003</td> <td>74</td> <td>85</td> <td>80</td> <td>84</td> <td>86</td> <td>91</td> </tr>
</tbody>
</table>
</div>

```python
# df[0] raises a KeyError: df[...] with a single key looks up a column, not a row
# df[0]
# A slice inside df[...] does select rows
df[:2]
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td> <td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> </tr>
</tbody>
</table>
</div>

```python
# The NumPy array underlying the DataFrame
df.values
```

    array([[1, '张小文', '男', 20, 1002, 56, 62, 86, 85, 86, 75],
           [2, '李清', '女', 19, 1001, 94, 65, 85, 90, 84, 75],
           [3, '孙明', '男', 19, 1003, 74, 85, 80, 84, 86, 91],
           [4, '陈平', '男', 8, 1003, 85, 75, 78, 73, 86, 81],
           [5, '刘东', '男', 20, 1001, 88, 74, 77, 65, 85, 71],
           [6, '严云峰', '男', 19, 1001, 84, 87, 77, 80, 70, 81],
           [7, '王大力', '男', 18, 1003, 85, 85, 75, 78,
            84, 69],
           [8, '黄佳', '女', 20, 1002, 81, 78, 58, 84, 90, 82]], dtype=object)

```python
df.数学.values
```

    array([86, 85, 80, 78, 77, 77, 75, 58], dtype=int64)

```python
# Simple statistics: count how often each score occurs
df.数学.value_counts()
```

    77    2
    78    1
    75    1
    58    1
    86    1
    85    1
    80    1
    Name: 数学, dtype: int64

```python
# Select several columns by passing a list of column names
new = df[['数学','语文']].head()
new
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>数学</th> <th>语文</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>86</td> <td>85</td> </tr>
<tr> <th>1</th> <td>85</td> <td>90</td> </tr>
<tr> <th>2</th> <td>80</td> <td>84</td> </tr>
<tr> <th>3</th> <td>78</td> <td>73</td> </tr>
<tr> <th>4</th> <td>77</td> <td>65</td> </tr>
</tbody>
</table>
</div>

```python
# Arithmetic is applied element by element
new * 2
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>数学</th> <th>语文</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>172</td> <td>170</td> </tr>
<tr> <th>1</th> <td>170</td> <td>180</td> </tr>
<tr> <th>2</th> <td>160</td> <td>168</td> </tr>
<tr> <th>3</th> <td>156</td> <td>146</td> </tr>
<tr> <th>4</th> <td>154</td> <td>130</td> </tr>
</tbody>
</table>
</div>

### Key points

```python
def func(score):
    if score >= 80:
        return '优秀'    # excellent
    elif score >= 70:
        return '良'      # good
    elif score >= 60:
        return '及格'    # pass
    else:
        return '不及格'  # fail

# map applies func to every value of the column
df['数学分类'] = df.数学.map(func)
```

```python
df.head()
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th>
<th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> <th>数学分类</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td> <td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> <td>优秀</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> <td>优秀</td> </tr>
<tr> <th>2</th> <td>3</td> <td>孙明</td> <td>男</td> <td>19</td> <td>1003</td> <td>74</td> <td>85</td> <td>80</td> <td>84</td> <td>86</td> <td>91</td> <td>优秀</td> </tr>
<tr> <th>3</th> <td>4</td> <td>陈平</td> <td>男</td> <td>8</td> <td>1003</td> <td>85</td> <td>75</td> <td>78</td> <td>73</td> <td>86</td> <td>81</td> <td>良</td> </tr>
<tr> <th>4</th> <td>5</td> <td>刘东</td> <td>男</td> <td>20</td> <td>1001</td> <td>88</td> <td>74</td> <td>77</td> <td>65</td> <td>85</td> <td>71</td> <td>良</td> </tr>
</tbody>
</table>
</div>

```python
# applymap applies a function to every element of the DataFrame; it is used a lot
# (pandas 2.1 renamed it to DataFrame.map and deprecated the applymap name)
def func(number):
    return number + 10

# Equivalent lambda form
func = lambda number: number + 10

df.applymap(lambda x: str(x) + ' -').head(2)
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> <th>数学分类</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1 -</td> <td>张小文 -</td> <td>男 -</td> <td>20 -</td> <td>1002 -</td> <td>56 -</td> <td>62 -</td> <td>86 -</td> <td>85 -</td> <td>86 -</td> <td>75 -</td> <td>优秀 -</td> </tr>
<tr> <th>1</th> <td>2 -</td> <td>李清 -</td> <td>女 -</td> <td>19 -</td> <td>1001 -</td> <td>94 -</td> <td>65 -</td> <td>85 -</td> <td>90 -</td> <td>84 -</td> <td>75 -</td> <td>优秀 -</td> </tr>
</tbody>
</table>
</div>

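
The grading function passed to `map` above is a set of score thresholds, which can also be expressed as bins. The following is a sketch of that alternative using `pd.cut`, not the lesson's own method; `math_scores` is a stand-in Series holding the 数学 values from the table, and the upper bin edge of 101 is an assumption that scores never exceed 100.

```python
import pandas as pd

# Stand-in for df.数学, using the eight scores shown in the grade table
math_scores = pd.Series([86, 85, 80, 78, 77, 77, 75, 58], name='数学')

def func(score):
    if score >= 80:
        return '优秀'
    elif score >= 70:
        return '良'
    elif score >= 60:
        return '及格'
    else:
        return '不及格'

# right=False makes the bins half-open: [0, 60), [60, 70), [70, 80), [80, 101)
labels = ['不及格', '及格', '良', '优秀']
binned = pd.cut(math_scores, bins=[0, 60, 70, 80, 101], right=False, labels=labels)

# Element-by-element version from the lesson, for comparison
mapped = math_scores.map(func)
print(binned.astype(str).tolist() == mapped.tolist())
```

`pd.cut` returns a categorical column, so the labels keep their order (不及格 < 及格 < 良 < 优秀), which can be convenient for later sorting or grouping.
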
### Anonymous functions

```python
[i + 100 for i in range(10)]
```

    [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

```python
def func(x):
    return x + 100
```

```python
list(map(func, range(10)))
# When a function is trivial, rarely used, or not worth naming, use an anonymous lambda instead
list(map(lambda x: x + 100, range(10)))
```

    [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

```python
# To build a new column from several columns, use apply with axis=1 (the function receives one row at a time)
df['new_score'] = df.apply(lambda x: x.数学 + x.语文, axis=1)
```

```python
# First rows (a notebook cell only displays its last expression, so only the tail is shown below)
df.head(2)
# Last rows
df.tail(2)
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> <th>数学分类</th> <th>new_score</th> </tr>
</thead>
<tbody>
<tr> <th>6</th> <td>7</td> <td>王大力</td> <td>男</td> <td>18</td> <td>1003</td> <td>85</td> <td>85</td> <td>75</td> <td>78</td> <td>84</td> <td>69</td> <td>良</td> <td>153</td> </tr>
<tr> <th>7</th> <td>8</td> <td>黄佳</td> <td>女</td> <td>20</td> <td>1002</td> <td>81</td> <td>78</td> <td>58</td> <td>84</td> <td>90</td> <td>82</td> <td>不及格</td> <td>142</td> </tr>
</tbody>
</table>
</div>

### Many pandas DataFrame operations closely resemble operations on two-dimensional NumPy arrays

<h1 style="text-align:center">Plotting with matplotlib</h1>

```python
# Drop the helper column again; axis=1 means a column is dropped
df = df.drop(['new_score'], axis=1)
```

```python
df.head(2)
```

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>学号</th> <th>姓名</th> <th>性别</th> <th>年龄</th> <th>班级</th> <th>计算机</th> <th>英语</th> <th>数学</th> <th>语文</th> <th>物理</th> <th>化学</th> <th>数学分类</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>张小文</td> <td>男</td> <td>20</td> <td>1002</td>
<td>56</td> <td>62</td> <td>86</td> <td>85</td> <td>86</td> <td>75</td> <td>优秀</td> </tr>
<tr> <th>1</th> <td>2</td> <td>李清</td> <td>女</td> <td>19</td> <td>1001</td> <td>94</td> <td>65</td> <td>85</td> <td>90</td> <td>84</td> <td>75</td> <td>优秀</td> </tr>
</tbody>
</table>
</div>

### Plotting

```python
import numpy as np
import matplotlib.pyplot as plt
# This magic renders figures inline in the notebook; do not leave it out
%matplotlib inline
```

```python
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.plot(x, np.cos(x))
```

    [<matplotlib.lines.Line2D at 0x1b3061cc7f0>]

```python
# '--' draws a dashed line
plt.plot(x, y, '--')
```

    [<matplotlib.lines.Line2D at 0x1b3082c71d0>]

```python
# Keep a handle on the figure so that it can be saved afterwards
fig = plt.figure()
plt.plot(x, y, '--')
```

    [<matplotlib.lines.Line2D at 0x1b30832ca58>]

```python
fig.savefig('K:/Code/jupyter-notebook/Python Study/first_figure.png')
```

```python
# Two stacked subplots: dashed line on top, solid line below
plt.subplot(2, 1, 1)
plt.plot(x, np.sin(x), '--')
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))
```

    [<matplotlib.lines.Line2D at 0x1b308395198>]

```python
# 'o' draws point markers
x = np.linspace(0, 10, 20)
plt.plot(x, np.sin(x), 'o')
```

    [<matplotlib.lines.Line2D at 0x1b3084f4940>]

```python
# color sets the color
x = np.linspace(0, 10, 20)
plt.plot(x, np.sin(x), 'o', color='red')
```

    [<matplotlib.lines.Line2D at 0x1b30855bef0>]

```python
# Add labels
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, '--', label='sin(x)')
plt.plot(x, np.cos(x), 'o', label='cos(x)')
# legend displays the labels; loc sets where the legend is placed (1 means upper right)
plt.legend(loc=1)
```

    <matplotlib.legend.Legend at 0x1b309907198>

```python
# In IPython/Jupyter, append ? to an unfamiliar function to view its documentation
plt.legend?
```

```python
# plot accepts a large number of optional parameters
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x, y, '-p', color='green',
         markersize=10, linewidth=4,
         markeredgecolor='orange',
         markeredgewidth=2)
plt.ylim(-0.5, 0.8)
```

    (-0.5, 0.8)

```python
# See the documentation for the full list of parameters
plt.plot?
```

```python
# ylim and xlim restrict the visible range of each axis
plt.plot(x, y, '-p', color='green',
         markersize=10, linewidth=4,
         markeredgecolor='orange',
         markeredgewidth=2)
plt.ylim(-0.5, 1.2)
plt.xlim(2, 8)
```

    (2, 8)

```python
# Scatter plot
plt.scatter(x, y, s=100, c='red')
```

    <matplotlib.collections.PathCollection at 0x1b309da0c88>

```python
plt.style.use('classic')
x = np.random.randn(100)
y = np.random.randn(100)
colors = np.random.randn(100)
# Marker sizes must be non-negative; randn yields negative sizes and a RuntimeWarning in sqrt, so use rand
sizes = 1000 * np.random.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.4)
plt.colorbar()
```

    <matplotlib.colorbar.Colorbar at 0x1b309fe4f98>

### Built-in plotting in pandas

### Line plots

```python
import pandas as pd
# cumsum(0) accumulates down each column, which turns random noise into random walks
df = pd.DataFrame(np.random.randn(100, 4).cumsum(0), columns=['A', 'B', 'C', 'D'])
df.plot()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c0c88d0>

### Bar charts

```python
df = pd.DataFrame(np.random.randint(10, 50, (3, 4)), columns=['A', 'B', 'C', 'D'], index=['one', 'two', 'three'])
df.plot.bar()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c284898>

```python
# Plot a single column
df.B.plot.bar()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c16c9b0>

```python
# Equivalent to df.plot.bar() above
df.plot(kind='bar')
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c190898>

```python
# Stack the bars of each row on top of one another
df.plot(kind='bar', stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c223978>

### Histogram

```python
df = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
df.hist(column='A', grid=True, figsize=(10, 5))
```

    array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001B30DE24DD8>]],
          dtype=object)

### Density plot

```python
# Equivalent to df.plot(kind='kde')
# Note: this requires SciPy. Install it with pip install scipy first,
# otherwise you get ModuleNotFoundError: No module named 'scipy'
df.plot.kde()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1b30e082d30>

### 3D plots with matplotlib

```python
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
# Newer Matplotlib releases no longer accept projection in fig.gca(); add_subplot works on old and new versions
ax = fig.add_subplot(111, projection='3d')

# x-axis range; the values must not repeat
X = np.arange(-5, 5, 0.25)
# y-axis range; the values must not repeat
Y = np.arange(-5, 5, 0.25)
# Build the grid
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

# Plot the surface
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet, linewidth=0, antialiased=False)

# Customize the z axis
ax.set_zlim(-1.01, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

# Add a color bar which maps values to colors
fig.colorbar(surf, shrink=0.5, aspect=5)

plt.show()
```
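
The grid-building step in the 3D example is easier to follow on a tiny input. The following is a minimal sketch, assuming two short hand-picked ranges rather than the `np.arange(-5, 5, 0.25)` ranges used above; it only prints arrays and draws nothing.

```python
import numpy as np

x = np.arange(-1, 2)   # three x values: -1, 0, 1
y = np.arange(0, 2)    # two y values: 0, 1

# meshgrid repeats x along rows and y along columns,
# so both results have shape (len(y), len(x))
X, Y = np.meshgrid(x, y)

# Evaluate the same surface as in the 3D example at every grid point
Z = np.sin(np.sqrt(X**2 + Y**2))

print(X)
print(Y)
print(Z.shape)
```

Each pair `(X[i, j], Y[i, j])` is one point of the grid, which is why `plot_surface` takes three 2D arrays of the same shape.
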