<h1 style="text-align:center">Titanic Data Processing and Analysis</h1>

```python
import pandas as pd
%matplotlib inline
```

#### Import the data

```python
titanic = pd.read_csv('K:/Code/jupyter-notebook/Python Study/train.csv')
```

#### Quick preview

```python
titanic.head()
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Name</th> <th>Sex</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Ticket</th> <th>Fare</th> <th>Cabin</th> <th>Embarked</th> </tr> </thead> <tbody>
<tr> <th>0</th> <td>1</td> <td>0</td> <td>3</td> <td>Braund, Mr. Owen Harris</td> <td>male</td> <td>22.0</td> <td>1</td> <td>0</td> <td>A/5 21171</td> <td>7.2500</td> <td>NaN</td> <td>S</td> </tr>
<tr> <th>1</th> <td>2</td> <td>1</td> <td>1</td> <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>0</td> <td>PC 17599</td> <td>71.2833</td> <td>C85</td> <td>C</td> </tr>
<tr> <th>2</th> <td>3</td> <td>1</td> <td>3</td> <td>Heikkinen, Miss. Laina</td> <td>female</td> <td>26.0</td> <td>0</td> <td>0</td> <td>STON/O2. 3101282</td> <td>7.9250</td> <td>NaN</td> <td>S</td> </tr>
<tr> <th>3</th> <td>4</td> <td>1</td> <td>1</td> <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td> <td>female</td> <td>35.0</td> <td>1</td> <td>0</td> <td>113803</td> <td>53.1000</td> <td>C123</td> <td>S</td> </tr>
<tr> <th>4</th> <td>5</td> <td>0</td> <td>3</td> <td>Allen, Mr. 
William Henry</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>373450</td> <td>8.0500</td> <td>NaN</td> <td>S</td> </tr> </tbody> </table> </div>

|Column|Meaning|
|---|---|
|Pclass|Ticket class, a proxy for social class (1 = upper; 2 = middle; 3 = lower)|
|Survived|Whether the passenger survived (0 = no, 1 = yes)|
|Name|Name|
|Sex|Sex|
|Age|Age|
|SibSp|Number of siblings and spouses aboard|
|Parch|Number of parents and children aboard|
|Ticket|Ticket number|
|Fare|Ticket fare|
|Cabin|Cabin|
|Embarked|Port of embarkation|

```python
titanic.info()
```

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.6+ KB

```python
# Compute simple summary statistics for all numeric columns
titanic.describe()
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Fare</th> </tr> </thead> <tbody>
<tr> <th>count</th> <td>891.000000</td> <td>891.000000</td> <td>891.000000</td> <td>714.000000</td> <td>891.000000</td> <td>891.000000</td> <td>891.000000</td> </tr>
<tr> <th>mean</th> <td>446.000000</td> <td>0.383838</td> <td>2.308642</td> <td>29.699118</td> <td>0.523008</td> <td>0.381594</td> <td>32.204208</td> </tr>
<tr> <th>std</th> <td>257.353842</td> <td>0.486592</td> <td>0.836071</td> <td>14.526497</td> <td>1.102743</td> <td>0.806057</td> <td>49.693429</td> </tr>
<tr> <th>min</th> <td>1.000000</td> <td>0.000000</td> <td>1.000000</td> <td>0.420000</td> <td>0.000000</td> <td>0.000000</td> <td>0.000000</td> </tr>
<tr> <th>25%</th> <td>223.500000</td> 
<td>0.000000</td> <td>2.000000</td> <td>20.125000</td> <td>0.000000</td> <td>0.000000</td> <td>7.910400</td> </tr>
<tr> <th>50%</th> <td>446.000000</td> <td>0.000000</td> <td>3.000000</td> <td>28.000000</td> <td>0.000000</td> <td>0.000000</td> <td>14.454200</td> </tr>
<tr> <th>75%</th> <td>668.500000</td> <td>1.000000</td> <td>3.000000</td> <td>38.000000</td> <td>1.000000</td> <td>0.000000</td> <td>31.000000</td> </tr>
<tr> <th>max</th> <td>891.000000</td> <td>1.000000</td> <td>3.000000</td> <td>80.000000</td> <td>8.000000</td> <td>6.000000</td> <td>512.329200</td> </tr> </tbody> </table> </div>

```python
# isnull() flags missing values; sum() counts them per column
titanic.isnull().sum()
```

    PassengerId      0
    Survived         0
    Pclass           0
    Name             0
    Sex              0
    Age            177
    SibSp            0
    Parch            0
    Ticket           0
    Fare             0
    Cabin          687
    Embarked         2
    dtype: int64

#### Handle missing values

```python
# fillna can fill every missing value in the DataFrame; uncomment to try it
# titanic.fillna(0)
# Fill a single column
# titanic.Age.fillna(0)
# Median age
titanic.Age.median()
# fillna returns a new Series with missing ages replaced by the median
# titanic.Age.fillna(titanic.Age.median())
# Assign the filled Series back to the column; calling fillna(inplace=True)
# on a selected column may not update the DataFrame in recent pandas versions
titanic['Age'] = titanic.Age.fillna(titanic.Age.median())
# Check the missing-value counts again
titanic.isnull().sum()
```

### Analysis by sex

```python
# A simple count summary; used very often
titanic.Sex.value_counts()
```

    male      577
    female    314
    Name: Sex, dtype: int64

```python
# Number of men and women among the survivors
survived = titanic[titanic.Survived==1].Sex.value_counts()
```

```python
# Number of men and women among those who did not survive
dead = titanic[titanic.Survived==0].Sex.value_counts()
```

```python
df = pd.DataFrame([survived,dead],index=['survived','dead'])
df.plot.bar()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496afd27f0>

```python
# The plot renders, but it is not the view we want
# Transpose the DataFrame to swap rows and columns
df = df.T
df
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>survived</th> <th>dead</th> </tr> </thead> <tbody>
<tr> <th>female</th> <td>233</td> <td>81</td> </tr>
<tr> 
<th>male</th> <td>109</td> <td>468</td> </tr> </tbody> </table> </div>

```python
df.plot.bar()
# Equivalent to df.plot(kind='bar')
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d1d7940>

```python
# Still not the result we want; stack the bars instead
df.plot(kind = 'bar',stacked = True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d22aef0>

```python
# Share of survivors among women and among men
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d2b7470>

#### The plots above show that sex strongly affects survival: about 74% of women survived (233 of 314) versus about 19% of men (109 of 577)

### Analysis by age

```python
# Simple count summary
# titanic.Age.value_counts()
```

```python
survived = titanic[titanic.Survived==1].Age
dead = titanic[titanic.Survived==0].Age
df = pd.DataFrame([survived,dead],index=['survived','dead'])
df = df.T
df.plot.hist(stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d3c4be0>

```python
# Use more bins in the histogram
df.plot.hist(stacked = True,bins = 30)
# The very tall bar in the middle appears because the missing ages were replaced with the median
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496e42f588>

```python
# A density plot is a bit more intuitive
df.plot.kde()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496e4c7dd8>

```python
# Inspect the age distribution to choose the x-axis range of the plot
titanic.Age.describe()
```

    count    891.000000
    mean      29.361582
    std       13.019697
    min        0.420000
    25%       22.000000
    50%       28.000000
    75%       35.000000
    max       80.000000
    Name: Age, dtype: float64

```python
# Restrict the x-axis range
df.plot.kde(xlim=(0,80))
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496e511c18>

```python
age = 16
young = titanic[titanic.Age<=age]['Survived'].value_counts()
old = titanic[titanic.Age>age]['Survived'].value_counts()
df = pd.DataFrame([young,old],index = ['young','old'])
# Rename by label (0 = dead, 1 = survived) so the names do not depend on column order
df = df.rename(columns={0: 'dead', 1: 'survived'})
df.plot.bar(stacked = True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f3a3b70>

```python
# Share of survivors among passengers older than 16 and among those aged 16 or younger
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f407c50>

### Analysis by fare

```python
# Fare can be analyzed the same way as age
survived = titanic[titanic.Survived==1].Fare
dead = titanic[titanic.Survived==0].Fare
df = pd.DataFrame([survived,dead],index = ['survived','dead'])
df = df.T
df.plot.kde()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f47b978>

```python
# Check the range of fares before setting xlim
titanic.Fare.describe()
```

    count    891.000000
    mean      32.204208
    std       49.693429
    min        0.000000
    25%        7.910400
    50%       14.454200
    75%       31.000000
    max      512.329200
    Name: Fare, dtype: float64

```python
df.plot(kind = 'kde',xlim = (0,513))
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f45bba8>

#### The plot suggests that passengers who paid low fares had a lower survival rate

### Combined features

```python
# For example, look at how age and fare together relate to survival
import matplotlib.pyplot as plt
plt.scatter(titanic[titanic.Survived==0].Age, titanic[titanic.Survived==0].Fare)
```

    <matplotlib.collections.PathCollection at 0x1496f597a58>

```python
# Not very readable; restyle the points and label the axes
ax = plt.subplot()
# Non-survivors
age = titanic[titanic.Survived==0].Age
fare = titanic[titanic.Survived==0].Fare
plt.scatter(age, fare,s=20,alpha=0.3,linewidths=1,edgecolors='gray')
# Survivors
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.3,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```

    Text(0,0.5,'fare')

```python
# Survivors only
ax = plt.subplot()
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.5,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```

    Text(0,0.5,'fare')

### Hidden features

```python
# Extract the title from each name (Mr, Mrs, Miss)
titanic.Name
```

    0      Braund, Mr. Owen Harris
    1      Cumings, Mrs. John Bradley (Florence Briggs Th...
    2      Heikkinen, Miss. Laina
    3      Futrelle, Mrs. Jacques Heath (Lily May Peel)
    4      Allen, Mr. William Henry
    5      Moran, Mr. James
    6      McCarthy, Mr. Timothy J
    7      Palsson, Master. Gosta Leonard
    8      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
    9      Nasser, Mrs. Nicholas (Adele Achem)
    10     Sandstrom, Miss. Marguerite Rut
    11     Bonnell, Miss. Elizabeth
    12     Saundercock, Mr. William Henry
    13     Andersson, Mr. Anders Johan
    14     Vestrom, Miss. Hulda Amanda Adolfina
    15     Hewlett, Mrs. (Mary D Kingcome)
    16     Rice, Master. Eugene
    17     Williams, Mr. Charles Eugene
    18     Vander Planke, Mrs. Julius (Emelia Maria Vande...
    19     Masselmani, Mrs. Fatima
    20     Fynney, Mr. Joseph J
    21     Beesley, Mr. Lawrence
    22     McGowan, Miss. Anna "Annie"
    23     Sloper, Mr. William Thompson
    24     Palsson, Miss. Torborg Danira
    25     Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
    26     Emir, Mr. Farred Chehab
    27     Fortune, Mr. Charles Alexander
    28     O'Dwyer, Miss. Ellen "Nellie"
    29     Todoroff, Mr. Lalio
    ...
    861    Giles, Mr. Frederick Edward
    862    Swift, Mrs. Frederick Joel (Margaret Welles Ba...
    863    Sage, Miss. Dorothy Edith "Dolly"
    864    Gill, Mr. John William
    865    Bystrom, Mrs. (Karolina)
    866    Duran y More, Miss. Asuncion
    867    Roebling, Mr. Washington Augustus II
    868    van Melkebeke, Mr. Philemon
    869    Johnson, Master. Harold Theodor
    870    Balkic, Mr. Cerin
    871    Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
    872    Carlsson, Mr. Frans Olof
    873    Vander Cruyssen, Mr. Victor
    874    Abelson, Mrs. Samuel (Hannah Wizosky)
    875    Najib, Miss. Adele Kiamie "Jane"
    876    Gustafsson, Mr. Alfred Ossian
    877    Petroff, Mr. Nedelio
    878    Laleff, Mr. Kristo
    879    Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
    880    Shelley, Mrs. William (Imanita Parrish Hall)
    881    Markun, Mr. Johann
    882    Dahlberg, Miss. Gerda Ulrika
    883    Banfield, Mr. Frederick James
    884    Sutehall, Mr. Henry Jr
    885    Rice, Mrs. William (Margaret Norton)
    886    Montvila, Rev. Juozas
    887    Graham, Miss. Margaret Edith
    888    Johnston, Miss. Catherine Helen "Carrie"
    889    Behr, Mr. Karl Howell
    890    Dooley, Mr. Patrick
    Name: Name, Length: 891, dtype: object

```python
titanic['title'] = titanic.Name.apply(lambda name: name.split(',')[1].split('.')[0].strip())
```

```python
s= 'Williams, Mr.Howard Hugh "harry"'
s.split(',')[-1].split('.')[0].strip()
```

    'Mr'

```python
titanic.title.value_counts()
# For example, if a passenger's title is Mr and his age is unknown, the age can be
# filled with the mean age of all passengers titled Mr,
# instead of the overall median used earlier, which was the simplest option.
```

    Mr              517
    Miss            182
    Mrs             125
    Master           40
    Dr                7
    Rev               6
    Mlle              2
    Major             2
    Col               2
    Capt              1
    Ms                1
    Mme               1
    Jonkheer          1
    the Countess      1
    Don               1
    Lady              1
    Sir               1
    Name: title, dtype: int64

### GDP

```python
### Nighttime-light imagery: the brightness of the lights can serve as a simple proxy for GDP
```

```python
titanic.head()
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Name</th> <th>Sex</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Ticket</th> <th>Fare</th> <th>Cabin</th> <th>Embarked</th> <th>title</th> </tr> </thead> <tbody>
<tr> <th>0</th> <td>1</td> <td>0</td> <td>3</td> <td>Braund, Mr. Owen Harris</td> <td>male</td> <td>22.0</td> <td>1</td> <td>0</td> <td>A/5 21171</td> <td>7.2500</td> <td>NaN</td> <td>S</td> <td>Mr</td> </tr>
<tr> <th>1</th> <td>2</td> <td>1</td> <td>1</td> <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>0</td> <td>PC 17599</td> <td>71.2833</td> <td>C85</td> <td>C</td> <td>Mrs</td> </tr>
<tr> <th>2</th> <td>3</td> <td>1</td> <td>3</td> <td>Heikkinen, Miss. Laina</td> <td>female</td> <td>26.0</td> <td>0</td> <td>0</td> <td>STON/O2. 3101282</td> <td>7.9250</td> <td>NaN</td> <td>S</td> <td>Miss</td> </tr>
<tr> <th>3</th> <td>4</td> <td>1</td> <td>1</td> <td>Futrelle, Mrs. 
Jacques Heath (Lily May Peel)</td> <td>female</td> <td>35.0</td> <td>1</td> <td>0</td> <td>113803</td> <td>53.1000</td> <td>C123</td> <td>S</td> <td>Mrs</td> </tr>
<tr> <th>4</th> <td>5</td> <td>0</td> <td>3</td> <td>Allen, Mr. William Henry</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>373450</td> <td>8.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> </tr> </tbody> </table> </div>

```python
titanic['family_size'] = titanic.SibSp + titanic.Parch + 1
```

```python
titanic
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Name</th> <th>Sex</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Ticket</th> <th>Fare</th> <th>Cabin</th> <th>Embarked</th> <th>title</th> <th>family_size</th> </tr> </thead> <tbody>
<tr> <th>0</th> <td>1</td> <td>0</td> <td>3</td> <td>Braund, Mr. Owen Harris</td> <td>male</td> <td>22.0</td> <td>1</td> <td>0</td> <td>A/5 21171</td> <td>7.2500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>2</td> </tr>
<tr> <th>1</th> <td>2</td> <td>1</td> <td>1</td> <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>0</td> <td>PC 17599</td> <td>71.2833</td> <td>C85</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr>
<tr> <th>2</th> <td>3</td> <td>1</td> <td>3</td> <td>Heikkinen, Miss. Laina</td> <td>female</td> <td>26.0</td> <td>0</td> <td>0</td> <td>STON/O2. 3101282</td> <td>7.9250</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>1</td> </tr>
<tr> <th>3</th> <td>4</td> <td>1</td> <td>1</td> <td>Futrelle, Mrs. 
Jacques Heath (Lily May Peel)</td> <td>female</td> <td>35.0</td> <td>1</td> <td>0</td> <td>113803</td> <td>53.1000</td> <td>C123</td> <td>S</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>4</th> <td>5</td> <td>0</td> <td>3</td> <td>Allen, Mr. William Henry</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>373450</td> <td>8.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>5</th> <td>6</td> <td>0</td> <td>3</td> <td>Moran, Mr. James</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>330877</td> <td>8.4583</td> <td>NaN</td> <td>Q</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>6</th> <td>7</td> <td>0</td> <td>1</td> <td>McCarthy, Mr. Timothy J</td> <td>male</td> <td>54.0</td> <td>0</td> <td>0</td> <td>17463</td> <td>51.8625</td> <td>E46</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>7</th> <td>8</td> <td>0</td> <td>3</td> <td>Palsson, Master. Gosta Leonard</td> <td>male</td> <td>2.0</td> <td>3</td> <td>1</td> <td>349909</td> <td>21.0750</td> <td>NaN</td> <td>S</td> <td>Master</td> <td>5</td> </tr> <tr> <th>8</th> <td>9</td> <td>1</td> <td>3</td> <td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td> <td>female</td> <td>27.0</td> <td>0</td> <td>2</td> <td>347742</td> <td>11.1333</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>3</td> </tr> <tr> <th>9</th> <td>10</td> <td>1</td> <td>2</td> <td>Nasser, Mrs. Nicholas (Adele Achem)</td> <td>female</td> <td>14.0</td> <td>1</td> <td>0</td> <td>237736</td> <td>30.0708</td> <td>NaN</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>10</th> <td>11</td> <td>1</td> <td>3</td> <td>Sandstrom, Miss. Marguerite Rut</td> <td>female</td> <td>4.0</td> <td>1</td> <td>1</td> <td>PP 9549</td> <td>16.7000</td> <td>G6</td> <td>S</td> <td>Miss</td> <td>3</td> </tr> <tr> <th>11</th> <td>12</td> <td>1</td> <td>1</td> <td>Bonnell, Miss. 
Elizabeth</td> <td>female</td> <td>58.0</td> <td>0</td> <td>0</td> <td>113783</td> <td>26.5500</td> <td>C103</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>12</th> <td>13</td> <td>0</td> <td>3</td> <td>Saundercock, Mr. William Henry</td> <td>male</td> <td>20.0</td> <td>0</td> <td>0</td> <td>A/5. 2151</td> <td>8.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>13</th> <td>14</td> <td>0</td> <td>3</td> <td>Andersson, Mr. Anders Johan</td> <td>male</td> <td>39.0</td> <td>1</td> <td>5</td> <td>347082</td> <td>31.2750</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>7</td> </tr> <tr> <th>14</th> <td>15</td> <td>0</td> <td>3</td> <td>Vestrom, Miss. Hulda Amanda Adolfina</td> <td>female</td> <td>14.0</td> <td>0</td> <td>0</td> <td>350406</td> <td>7.8542</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>15</th> <td>16</td> <td>1</td> <td>2</td> <td>Hewlett, Mrs. (Mary D Kingcome)</td> <td>female</td> <td>55.0</td> <td>0</td> <td>0</td> <td>248706</td> <td>16.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>16</th> <td>17</td> <td>0</td> <td>3</td> <td>Rice, Master. Eugene</td> <td>male</td> <td>2.0</td> <td>4</td> <td>1</td> <td>382652</td> <td>29.1250</td> <td>NaN</td> <td>Q</td> <td>Master</td> <td>6</td> </tr> <tr> <th>17</th> <td>18</td> <td>1</td> <td>2</td> <td>Williams, Mr. Charles Eugene</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>244373</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>18</th> <td>19</td> <td>0</td> <td>3</td> <td>Vander Planke, Mrs. Julius (Emelia Maria Vande...</td> <td>female</td> <td>31.0</td> <td>1</td> <td>0</td> <td>345763</td> <td>18.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>19</th> <td>20</td> <td>1</td> <td>3</td> <td>Masselmani, Mrs. 
Fatima</td> <td>female</td> <td>28.0</td> <td>0</td> <td>0</td> <td>2649</td> <td>7.2250</td> <td>NaN</td> <td>C</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>20</th> <td>21</td> <td>0</td> <td>2</td> <td>Fynney, Mr. Joseph J</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>239865</td> <td>26.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>21</th> <td>22</td> <td>1</td> <td>2</td> <td>Beesley, Mr. Lawrence</td> <td>male</td> <td>34.0</td> <td>0</td> <td>0</td> <td>248698</td> <td>13.0000</td> <td>D56</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>22</th> <td>23</td> <td>1</td> <td>3</td> <td>McGowan, Miss. Anna "Annie"</td> <td>female</td> <td>15.0</td> <td>0</td> <td>0</td> <td>330923</td> <td>8.0292</td> <td>NaN</td> <td>Q</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>23</th> <td>24</td> <td>1</td> <td>1</td> <td>Sloper, Mr. William Thompson</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>113788</td> <td>35.5000</td> <td>A6</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>24</th> <td>25</td> <td>0</td> <td>3</td> <td>Palsson, Miss. Torborg Danira</td> <td>female</td> <td>8.0</td> <td>3</td> <td>1</td> <td>349909</td> <td>21.0750</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>5</td> </tr> <tr> <th>25</th> <td>26</td> <td>1</td> <td>3</td> <td>Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>5</td> <td>347077</td> <td>31.3875</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>7</td> </tr> <tr> <th>26</th> <td>27</td> <td>0</td> <td>3</td> <td>Emir, Mr. Farred Chehab</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>2631</td> <td>7.2250</td> <td>NaN</td> <td>C</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>27</th> <td>28</td> <td>0</td> <td>1</td> <td>Fortune, Mr. 
Charles Alexander</td> <td>male</td> <td>19.0</td> <td>3</td> <td>2</td> <td>19950</td> <td>263.0000</td> <td>C23 C25 C27</td> <td>S</td> <td>Mr</td> <td>6</td> </tr> <tr> <th>28</th> <td>29</td> <td>1</td> <td>3</td> <td>O'Dwyer, Miss. Ellen "Nellie"</td> <td>female</td> <td>28.0</td> <td>0</td> <td>0</td> <td>330959</td> <td>7.8792</td> <td>NaN</td> <td>Q</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>29</th> <td>30</td> <td>0</td> <td>3</td> <td>Todoroff, Mr. Lalio</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>349216</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr> <tr> <th>861</th> <td>862</td> <td>0</td> <td>2</td> <td>Giles, Mr. Frederick Edward</td> <td>male</td> <td>21.0</td> <td>1</td> <td>0</td> <td>28134</td> <td>11.5000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>2</td> </tr> <tr> <th>862</th> <td>863</td> <td>1</td> <td>1</td> <td>Swift, Mrs. Frederick Joel (Margaret Welles Ba...</td> <td>female</td> <td>48.0</td> <td>0</td> <td>0</td> <td>17466</td> <td>25.9292</td> <td>D17</td> <td>S</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>863</th> <td>864</td> <td>0</td> <td>3</td> <td>Sage, Miss. Dorothy Edith "Dolly"</td> <td>female</td> <td>28.0</td> <td>8</td> <td>2</td> <td>CA. 2343</td> <td>69.5500</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>11</td> </tr> <tr> <th>864</th> <td>865</td> <td>0</td> <td>2</td> <td>Gill, Mr. John William</td> <td>male</td> <td>24.0</td> <td>0</td> <td>0</td> <td>233866</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>865</th> <td>866</td> <td>1</td> <td>2</td> <td>Bystrom, Mrs. 
(Karolina)</td> <td>female</td> <td>42.0</td> <td>0</td> <td>0</td> <td>236852</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>866</th> <td>867</td> <td>1</td> <td>2</td> <td>Duran y More, Miss. Asuncion</td> <td>female</td> <td>27.0</td> <td>1</td> <td>0</td> <td>SC/PARIS 2149</td> <td>13.8583</td> <td>NaN</td> <td>C</td> <td>Miss</td> <td>2</td> </tr> <tr> <th>867</th> <td>868</td> <td>0</td> <td>1</td> <td>Roebling, Mr. Washington Augustus II</td> <td>male</td> <td>31.0</td> <td>0</td> <td>0</td> <td>PC 17590</td> <td>50.4958</td> <td>A24</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>868</th> <td>869</td> <td>0</td> <td>3</td> <td>van Melkebeke, Mr. Philemon</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>345777</td> <td>9.5000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>869</th> <td>870</td> <td>1</td> <td>3</td> <td>Johnson, Master. Harold Theodor</td> <td>male</td> <td>4.0</td> <td>1</td> <td>1</td> <td>347742</td> <td>11.1333</td> <td>NaN</td> <td>S</td> <td>Master</td> <td>3</td> </tr> <tr> <th>870</th> <td>871</td> <td>0</td> <td>3</td> <td>Balkic, Mr. Cerin</td> <td>male</td> <td>26.0</td> <td>0</td> <td>0</td> <td>349248</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>871</th> <td>872</td> <td>1</td> <td>1</td> <td>Beckwith, Mrs. Richard Leonard (Sallie Monypeny)</td> <td>female</td> <td>47.0</td> <td>1</td> <td>1</td> <td>11751</td> <td>52.5542</td> <td>D35</td> <td>S</td> <td>Mrs</td> <td>3</td> </tr> <tr> <th>872</th> <td>873</td> <td>0</td> <td>1</td> <td>Carlsson, Mr. Frans Olof</td> <td>male</td> <td>33.0</td> <td>0</td> <td>0</td> <td>695</td> <td>5.0000</td> <td>B51 B53 B55</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>873</th> <td>874</td> <td>0</td> <td>3</td> <td>Vander Cruyssen, Mr. 
Victor</td> <td>male</td> <td>47.0</td> <td>0</td> <td>0</td> <td>345765</td> <td>9.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>874</th> <td>875</td> <td>1</td> <td>2</td> <td>Abelson, Mrs. Samuel (Hannah Wizosky)</td> <td>female</td> <td>28.0</td> <td>1</td> <td>0</td> <td>P/PP 3381</td> <td>24.0000</td> <td>NaN</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>875</th> <td>876</td> <td>1</td> <td>3</td> <td>Najib, Miss. Adele Kiamie "Jane"</td> <td>female</td> <td>15.0</td> <td>0</td> <td>0</td> <td>2667</td> <td>7.2250</td> <td>NaN</td> <td>C</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>876</th> <td>877</td> <td>0</td> <td>3</td> <td>Gustafsson, Mr. Alfred Ossian</td> <td>male</td> <td>20.0</td> <td>0</td> <td>0</td> <td>7534</td> <td>9.8458</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>877</th> <td>878</td> <td>0</td> <td>3</td> <td>Petroff, Mr. Nedelio</td> <td>male</td> <td>19.0</td> <td>0</td> <td>0</td> <td>349212</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>878</th> <td>879</td> <td>0</td> <td>3</td> <td>Laleff, Mr. Kristo</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>349217</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>879</th> <td>880</td> <td>1</td> <td>1</td> <td>Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)</td> <td>female</td> <td>56.0</td> <td>0</td> <td>1</td> <td>11767</td> <td>83.1583</td> <td>C50</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>880</th> <td>881</td> <td>1</td> <td>2</td> <td>Shelley, Mrs. William (Imanita Parrish Hall)</td> <td>female</td> <td>25.0</td> <td>0</td> <td>1</td> <td>230433</td> <td>26.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>881</th> <td>882</td> <td>0</td> <td>3</td> <td>Markun, Mr. 
Johann</td> <td>male</td> <td>33.0</td> <td>0</td> <td>0</td> <td>349257</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>882</th> <td>883</td> <td>0</td> <td>3</td> <td>Dahlberg, Miss. Gerda Ulrika</td> <td>female</td> <td>22.0</td> <td>0</td> <td>0</td> <td>7552</td> <td>10.5167</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>883</th> <td>884</td> <td>0</td> <td>2</td> <td>Banfield, Mr. Frederick James</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>C.A./SOTON 34068</td> <td>10.5000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>884</th> <td>885</td> <td>0</td> <td>3</td> <td>Sutehall, Mr. Henry Jr</td> <td>male</td> <td>25.0</td> <td>0</td> <td>0</td> <td>SOTON/OQ 392076</td> <td>7.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>885</th> <td>886</td> <td>0</td> <td>3</td> <td>Rice, Mrs. William (Margaret Norton)</td> <td>female</td> <td>39.0</td> <td>0</td> <td>5</td> <td>382652</td> <td>29.1250</td> <td>NaN</td> <td>Q</td> <td>Mrs</td> <td>6</td> </tr> <tr> <th>886</th> <td>887</td> <td>0</td> <td>2</td> <td>Montvila, Rev. Juozas</td> <td>male</td> <td>27.0</td> <td>0</td> <td>0</td> <td>211536</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Rev</td> <td>1</td> </tr> <tr> <th>887</th> <td>888</td> <td>1</td> <td>1</td> <td>Graham, Miss. Margaret Edith</td> <td>female</td> <td>19.0</td> <td>0</td> <td>0</td> <td>112053</td> <td>30.0000</td> <td>B42</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>888</th> <td>889</td> <td>0</td> <td>3</td> <td>Johnston, Miss. Catherine Helen "Carrie"</td> <td>female</td> <td>28.0</td> <td>1</td> <td>2</td> <td>W./C. 6607</td> <td>23.4500</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>4</td> </tr> <tr> <th>889</th> <td>890</td> <td>1</td> <td>1</td> <td>Behr, Mr. 
Karl Howell</td> <td>male</td> <td>26.0</td> <td>0</td> <td>0</td> <td>111369</td> <td>30.0000</td> <td>C148</td> <td>C</td> <td>Mr</td> <td>1</td> </tr>
<tr> <th>890</th> <td>891</td> <td>0</td> <td>3</td> <td>Dooley, Mr. Patrick</td> <td>male</td> <td>32.0</td> <td>0</td> <td>0</td> <td>370376</td> <td>7.7500</td> <td>NaN</td> <td>Q</td> <td>Mr</td> <td>1</td> </tr> </tbody> </table> <p>891 rows × 14 columns</p> </div>

```python
titanic.family_size.value_counts()
```

    1     537
    2     161
    3     102
    4      29
    6      22
    5      15
    7      12
    11      7
    8       6
    Name: family_size, dtype: int64

```python
def func(family_size):
    if family_size == 1:
        return 'Singleton'
    if 2 <= family_size <= 4:
        return 'SmallFamily'
    if family_size > 4:
        return 'LargeFamily'

titanic['family_type'] = titanic.family_size.apply(func)
```

```python
titanic.family_type.value_counts()
```

    Singleton      537
    SmallFamily    292
    LargeFamily     62
    Name: family_type, dtype: int64
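
A natural next step, not part of the original notebook, is to check whether the derived `family_type` feature relates to survival. The sketch below is only an illustration under stated assumptions: `toy` is a small made-up DataFrame standing in for `titanic`, and `family_type_of` is a hypothetical helper that mirrors the bucketing rule used above. The mean of the 0/1 `Survived` column per group gives that group's survival rate.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `titanic` DataFrame;
# the values are invented for illustration and are not taken from train.csv.
toy = pd.DataFrame({
    'Survived': [1, 0, 1, 1, 0, 0, 1, 0],
    'SibSp':    [0, 0, 1, 1, 3, 4, 0, 5],
    'Parch':    [0, 0, 0, 2, 2, 1, 1, 2],
})


def family_type_of(family_size):
    """Bucket a family size using the same thresholds as the notebook."""
    if family_size == 1:
        return 'Singleton'
    if 2 <= family_size <= 4:
        return 'SmallFamily'
    return 'LargeFamily'


# Family size counts the passenger plus siblings/spouses and parents/children.
toy['family_size'] = toy.SibSp + toy.Parch + 1
toy['family_type'] = toy.family_size.apply(family_type_of)

# Survived is 0 or 1, so its mean within each group is the share of survivors.
rate_by_family = toy.groupby('family_type')['Survived'].mean()
print(rate_by_family)
```

On the real data the same `groupby` call would be applied to `titanic` directly, and `rate_by_family.plot.bar()` would give a chart comparable to the sex and age plots above.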