\n",
+ " "
+ ],
+ "text/plain": [
+ " name gender age ... ticketno fare survived\n",
+ "0 Abbing, Mr. Anthony male 42.0 ... 5547.0 7.11 0\n",
+ "1 Abbott, Mr. Eugene Joseph male 13.0 ... 2673.0 20.05 0\n",
+ "2 Abbott, Mr. Rossmore Edward male 16.0 ... 2673.0 20.05 0\n",
+ "3 Abbott, Mrs. Rhoda Mary 'Rosa' female 39.0 ... 2673.0 20.05 1\n",
+ "4 Abelseth, Miss. Karen Marie female 16.0 ... 348125.0 7.13 1\n",
+ "... ... ... ... ... ... ... ...\n",
+ "2202 Wynn, Mr. Walter male 41.0 ... NaN NaN 1\n",
+ "2203 Yearsley, Mr. Harry male 40.0 ... NaN NaN 1\n",
+ "2204 Young, Mr. Francis James male 32.0 ... NaN NaN 0\n",
+ "2205 Zanetti, Sig. Minio male 20.0 ... NaN NaN 0\n",
+ "2206 Zarracchi, Sig. L. male 26.0 ... NaN NaN 0\n",
+ "\n",
+ "[2207 rows x 9 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 225
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c6RRWPfWJtY5"
+ },
+ "source": [
+ "The categorical variables in this data set are:\n",
+ "- `gender` (male or female)\n",
+ "- `class`: what class they were in (1st, 2nd, or 3rd) or what type of crew member they were\n",
+ "- `embarked`: where they embarked (Belfast, Southampton, Cherbourg, or Queenstown)\n",
+ "- `country`: their country of origin\n",
+ "- `ticketno`: the ticket number\n",
+ "- `survived`: whether or not they survived the disaster\n",
+ "\n",
+ "Note that `age` and `fare` are quantitative variables. It is tempting to consider `name` a categorical variable, but it is not, since (almost) every person has a unique name. In order for a variable to be categorical, it must take on a _limited_ set of values, ideally with each level appearing multiple times in the data set. Otherwise, the analyses that we describe in this chapter will not be very meaningful."
+ ]
+ },
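+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One rough way to check whether a variable takes on a limited set of values is to count its distinct values. The cell below is a minimal sketch (not part of the original analysis) that assumes `df_titanic` is loaded as above. The `.nunique()` method reports the number of distinct non-missing values in each column, so a column like `name` should have nearly as many distinct values as there are rows, while `gender` should have only two."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Count the distinct (non-missing) values in each column.\n",
+ "# Columns with few distinct values relative to the number of rows\n",
+ "# are reasonable candidates for categorical variables.\n",
+ "df_titanic.nunique()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },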
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XRBPQWpZJtZm"
+ },
+ "source": [
+ "# Analyzing One Categorical Variable\n",
+ "\n",
+ "In this lesson, we focus on a single categorical variable. For a high-level summary of a categorical variable, we can use the `.describe()` method. The behavior of `.describe()` depends on whether `pandas` treats the variable as quantitative or categorical, which is why it is important to cast categorical variables to the right type, as we discussed in the previous chapter."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nru7HFtNJtZn",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "9a4d920b-873a-458c-9dfe-a4248baf161b"
+ },
+ "source": [
+ "df_titanic[\"class\"].describe()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "count 2207\n",
+ "unique 7\n",
+ "top 3rd\n",
+ "freq 709\n",
+ "Name: class, dtype: object"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 226
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "AUwJjfTQJtZt"
+ },
+ "source": [
+ "To completely summarize a single categorical variable, we report the number of times each level appeared, or its **frequency**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "E4kcGK9TJtZt",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "e14dca12-4e4a-4853-dc2a-69a479e0a6fe"
+ },
+ "source": [
+ "class_counts = df_titanic[\"class\"].value_counts()\n",
+ "class_counts"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "3rd 709\n",
+ "victualling crew 431\n",
+ "1st 324\n",
+ "engineering crew 324\n",
+ "2nd 284\n",
+ "restaurant staff 69\n",
+ "deck crew 66\n",
+ "Name: class, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 227
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Bmk5Y7zQJtZw"
+ },
+ "source": [
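+ "Notice that the levels are sorted in decreasing order of frequency by default. We can also report the levels without this sorting, by passing `sort=False` (recent versions of `pandas` typically list the levels in the order they first appear in the data set, while older versions may return them in an arbitrary order, as in the output here)..."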
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "yEK0-PecJtZx",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "4f1b1486-258b-4c07-ac86-bccd180c7ab9"
+ },
+ "source": [
+ "df_titanic[\"class\"].value_counts(sort=False)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "engineering crew 324\n",
+ "victualling crew 431\n",
+ "1st 324\n",
+ "restaurant staff 69\n",
+ "3rd 709\n",
+ "deck crew 66\n",
+ "2nd 284\n",
+ "Name: class, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 228
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7P-TqJvrJtZ0"
+ },
+ "source": [
+ "...or in alphabetical order, by sorting the index:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "w1jjt1nuJtZ0",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "aa821cea-1fd0-4555-a657-c74ad196821d"
+ },
+ "source": [
+ "class_counts.sort_index()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "1st 324\n",
+ "2nd 284\n",
+ "3rd 709\n",
+ "deck crew 66\n",
+ "engineering crew 324\n",
+ "restaurant staff 69\n",
+ "victualling crew 431\n",
+ "Name: class, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 229
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kCTkd_7OJtZ3"
+ },
+ "source": [
+ "Note that this produces a _new_ `Series` with the index sorted; it does not modify the original `Series` `class_counts`. To sort the original `Series`, we need to specify that the sorting should be done in place, just like we did for `.set_index()` in Chapter 1."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6tMY5S4CJtZ3",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "78377215-6382-4b0b-d1d5-895f7761ef5a"
+ },
+ "source": [
+ "class_counts.sort_index(inplace=True)\n",
+ "class_counts"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "1st 324\n",
+ "2nd 284\n",
+ "3rd 709\n",
+ "deck crew 66\n",
+ "engineering crew 324\n",
+ "restaurant staff 69\n",
+ "victualling crew 431\n",
+ "Name: class, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 230
+ }
+ ]
+ },
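+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you prefer to avoid `inplace=True`, an equivalent approach is to assign the sorted result back to the same name. The cell below is a small sketch of that alternative; it leaves `class_counts` sorted by its index, just as the in-place version does."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "# Reassign the sorted copy instead of sorting in place.\n",
+ "class_counts = class_counts.sort_index()\n",
+ "class_counts"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },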
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HBFyIJmYJtZ8"
+ },
+ "source": [
+ "Any other order would require selecting the levels manually, in the desired order."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Q7vi4FgIJtZ9",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "5289ef2d-79d9-463e-db08-c12be3216fb5"
+ },
+ "source": [
+ "class_counts.loc[\n",
+ " [\"1st\", \"2nd\", \"3rd\",\n",
+ " \"deck crew\", \"engineering crew\", \"victualling crew\",\n",
+ " \"restaurant staff\"]\n",
+ "]"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "1st 324\n",
+ "2nd 284\n",
+ "3rd 709\n",
+ "deck crew 66\n",
+ "engineering crew 324\n",
+ "victualling crew 431\n",
+ "restaurant staff 69\n",
+ "Name: class, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 231
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IKNQM_UxJtZ_"
+ },
+ "source": [
+ "This information can be visualized using a **bar chart**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "1Pr1BpI9JtZ_",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 359
+ },
+ "outputId": "e0cbb48e-36ae-4412-f900-4f4b4e68974c"
+ },
+ "source": [
+ "class_counts.plot.bar()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 232
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAFFCAYAAAAXcq1YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5xdZX3v8c+XmwjKfUwhSQkqQqnlEqNAsdpKrXJR0FIUL+RQjrGVKmptTU+16nmhxfa8pKAtNYoaqlYu4iEFtNKIKLSgw0WuUmKEQ9JAInITtAh8zx/r2bIzTjJ7T2b2mv3wfb9e89prPWvNzC+8Nt9Z+1nPeh7ZJiIi6rJZ2wVERMTUS7hHRFQo4R4RUaGEe0REhRLuEREV2qLtAgB22WUXz5s3r+0yIiKGyjXXXPMj2yPjHZsR4T5v3jxGR0fbLiMiYqhIunNDx9ItExFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVGjCcJe0l6Tru74elPROSTtJulTS7eV1x3K+JJ0haYWkGyTNn/5/RkREdJvwCVXbtwH7A0jaHFgNfAVYDCy3faqkxWX/vcBhwJ7l60DgzPIalZi3+OJp/fl3nHrEtP78iKeCfrtlDgV+YPtO4ChgaWlfChxdto8CznbjKmAHSbtOSbUREdGTfsP99cA/l+1ZtteU7buBWWV7NnBX1/esKm3rkbRI0qik0XXr1vVZRkREbEzP4S5pK+DVwHljj7lZiLWvxVhtL7G9wPaCkZFxJzWLiIhJ6ufK/TDgWtv3lP17Ot0t5XVtaV8NzO36vjmlLSIiBqSfcD+OJ7tkAJYBC8v2QuDCrvbjy6iZg4AHurpvIiJiAHqaz13StsDLgbd2NZ8KnCvpROBO4NjSfglwOLACeAQ4YcqqjYiInvQU7rYfBnYe03YvzeiZsecaOGlKqouIiEnJE6oRERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFegp3STtIOl/S9yXdKulgSTtJulTS7eV1x3KuJJ0haYWkGyTNn95/QkREjNXrlfvpwNds7w3sB9wKLAaW294TWF72AQ4D9ixfi4Azp7TiiIiY0IThLml74CXAWQC2H7V9P3AUsLScthQ4umwfBZztxlXADpJ2nfLKIyJig3q5ct8DWAd8VtJ1kj4taVtglu015Zy7gVllezZwV9f3ryptERExIL2E+xbAfOBM2wcAD/NkFwwAtg24n18saZGkUUmj69at6+dbIyJiAr2E+ypgle2ry/75NGF/T6e7pbyuLcdXA3O7vn9OaVuP7SW2F9heMDIyMtn6IyJiHBOGu+27gbsk7VWaDgVuAZYBC0vbQuDCsr0MOL6MmjkIeKCr+yYiIgZgix7PezvwBUlbASuBE2j+MJwr6UTgTuDYcu4lwOHACuCRcm5ERAxQT+Fu+3pgwTiHDh3nXAMnbWJdERGxCfKEakREhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhXoKd0l3SLpR0vWSRkvbTpIulXR7ed2xtEvSGZJWSLpB0vzp/AdERMQv6+fK/Xds7297QdlfDCy3vSewvOwDHAbsWb4WAWdOVbEREdGbTemWOQpYWraXAkd3tZ/txlXADpJ23YTfExERfeo13A18XdI1khaVtlm215Ttu4FZZXs2cFfX964qbeuRtEjSqKTRdevWTaL0iIjYkC16PO/FtldLehZwqaTvdx+0bUnu5xfbXgIsAViwYEFf
3xsRERvX05W77dXldS3wFeBFwD2d7pbyuracvhqY2/Xtc0pbREQMyIThLmlbSc/sbAO/B9wELAMWltMWAheW7WXA8WXUzEHAA13dNxERMQC9dMvMAr4iqXP+F21/TdJ3gXMlnQjcCRxbzr8EOBxYATwCnDDlVUdExEZNGO62VwL7jdN+L3DoOO0GTpqS6iIiYlLyhGpERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFep1mb2IiKe8eYsvntaff8epR0zZz8qVe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhXoOd0mbS7pO0kVlfw9JV0taIekcSVuV9qeV/RXl+LzpKT0iIjaknyv3k4Fbu/Y/Cpxm+7nAfcCJpf1E4L7Sflo5LyIiBqincJc0BzgC+HTZF/Ay4PxyylLg6LJ9VNmnHD+0nB8REQPS65X73wF/DjxR9ncG7rf9WNlfBcwu27OBuwDK8QfK+euRtEjSqKTRdevWTbL8iIgYz4ThLulIYK3ta6byF9teYnuB7QUjIyNT+aMjIp7yepl+4BDg1ZIOB7YGtgNOB3aQtEW5Op8DrC7nrwbmAqskbQFsD9w75ZVHRMQGTXjlbvsvbM+xPQ94PfAN228ELgOOKactBC4s28vKPuX4N2x7SquOiIiN2pRx7u8F3i1pBU2f+lml/Sxg59L+bmDxppUYERH96mtWSNvfBL5ZtlcCLxrnnJ8BfzAFtUVExCTlCdWIiAol3CMiKpRwj4ioUMI9IqJCCfeIiAol3CMiKpRwj4ioUMI9IqJCCfeIiAol3CMiKpRwj4ioUMI9IqJCCfeIiAol3CMiKpRwj4ioUMI9IqJCfS3WMVPMW3zxtP78O049Ylp/fsRk5b0fvcqVe0REhRLuEREVSrhHRFRownCXtLWk70j6nqSbJX2otO8h6WpJKySdI2mr0v60sr+iHJ83vf+EiIgYq5cr9/8GXmZ7P2B/4JWSDgI+Cpxm+7nAfcCJ5fwTgftK+2nlvIiIGKAJw92Nn5TdLcuXgZcB55f2pcDRZfuosk85fqgkTVnFERExoZ763CVtLul6YC1wKfAD4H7bj5VTVgGzy/Zs4C6AcvwBYOdxfuYiSaOSRtetW7dp/4qIiFhPT+Fu+3Hb+wNzgBcBe2/qL7a9xPYC2wtGRkY29cdFRESXvkbL2L4fuAw4GNhBUuchqDnA6rK9GpgLUI5vD9w7JdVGRERPehktMyJph7L9dODlwK00IX9MOW0hcGHZXlb2Kce/YdtTWXRERGxcL9MP7AoslbQ5zR+Dc21fJOkW4EuSTgGuA84q558F/JOkFcCPgddPQ90REbERE4a77RuAA8ZpX0nT/z62/WfAH0xJdZXK/CARMd3yhGpERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFUq4R0RUKOEeEVGhhHtERIUS7hERFUq4R0RUKOEeEVGhCcNd0lxJl0m6RdLNkk4u7TtJulTS7eV1x9IuSWdIWiHpBknzp/sfERER6+vlyv0x4E9t7wMcBJwkaR9gMbDc9p7A8rIPcBiwZ/laBJw55VVHRMRGTRjuttfYvrZsPwTcCswGjgKWltOWAkeX7aOAs924CthB0q5TXnlERGxQX33ukuYBBwBXA7NsrymH7gZmle3ZwF1d37aqtEVExID0HO6SngF8GXin7Qe7j9k24H5+saRFkkYlja5bt66fb42IiAn0FO6StqQJ9i/YvqA039Ppbimva0v7amBu17fPKW3rsb3E9gLbC0ZGRiZbf0REjKOX0TICzgJutf2xrkPLgIVleyFwYVf78WXUzEHAA13dNxERMQBb9HDOIcCbgRslXV/a/hdwKnCupBOBO4Fjy7FLgMOBFcAjwAlTWnFERExo
wnC3fQWgDRw+dJzzDZy0iXVFRMQmyBOqEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhRLuEREVSrhHRFQo4R4RUaGEe0REhSYMd0mfkbRW0k1dbTtJulTS7eV1x9IuSWdIWiHpBknzp7P4iIgYXy9X7p8DXjmmbTGw3PaewPKyD3AYsGf5WgScOTVlRkREPyYMd9vfAn48pvkoYGnZXgoc3dV+thtXATtI2nWqio2IiN5Mts99lu01ZftuYFbZng3c1XXeqtIWEREDtMk3VG0bcL/fJ2mRpFFJo+vWrdvUMiIiostkw/2eTndLeV1b2lcDc7vOm1PafontJbYX2F4wMjIyyTIiImI8kw33ZcDCsr0QuLCr/fgyauYg4IGu7puIiBiQLSY6QdI/A78N7CJpFfAB4FTgXEknAncCx5bTLwEOB1YAjwAnTEPNERExgQnD3fZxGzh06DjnGjhpU4uKiIhNkydUIyIqlHCPiKhQwj0iokIJ94iICk14QzUiYqrMW3zxtP78O049Ylp//jBJuMdTznQGTMIlZop0y0REVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFEu4RERVKuEdEVCjhHhFRoYR7RESFpiXcJb1S0m2SVkhaPB2/IyIiNmzKw13S5sDfA4cB+wDHSdpnqn9PRERs2HRcub8IWGF7pe1HgS8BR03D74mIiA2Q7an9gdIxwCtt/8+y/2bgQNt/Mua8RcCisrsXcNuUFrK+XYAfTePPn26pvz3DXDuk/rZNd/272x4Z70Bra6jaXgIsGcTvkjRqe8Egftd0SP3tGebaIfW3rc36p6NbZjUwt2t/TmmLiIgBmY5w/y6wp6Q9JG0FvB5YNg2/JyIiNmDKu2VsPybpT4B/BTYHPmP75qn+PX0aSPfPNEr97Rnm2iH1t621+qf8hmpERLQvT6hGRFQo4R4RUaGEe0QXSVu3XcNTkaTl5fWjbdeyKSQdKunpbdcBLY5zj5ihbpJ0D/Dt8nWF7Qdarqlnkk4EvmX79rZr6dOukn4TeLWkLwHqPmj72nbK6tvxwJmSfkzz/vkWzXvovkEXUu0NVUmH2L5yoraZRtL8jR2fyW9ySf8CbPANZfvVAyxn0iT9KvBbwCHA4cD9tvdvt6reSPoQTe3zgGtowuXbtq9vs66JlCfbTwReTDOcujvcbftlrRQ2SZJ2A44B3gPsZnvgF9I1h/u1tudP1DbTSLqsbG4NLAC+R/NG3xcYtX1wW7VNRNJLy+ZrgV8BPl/2jwPusf2uVgrrg6Q5NOH4UmA/4Mc0V15/3WphfSpdA2+hCZfZtjdvuaSN6lx4Sfor2/+77XomS9KbaN4/v0Ez7cAVNH9c/2PgtdQW7pIOBn4TeCdwWteh7YDX2N6vlcL6JOkC4AO2byz7zwc+aPuYdiub2HiPXA/LY+SSnqC5cvyI7Qvbrqdfkt5H84njGcB1PBkua1otbAKSrrH9gmG4ANsYST8CfgD8I3CZ7TvaqqXGPvetaN7YWwDP7Gp/kOZj0rDYqxPsALZvkvRrbRbUh20lPdv2SgBJewDbtlxTrw6g6Rp4Q1mL4HbgcttntVtWz14LPAZcDFwO/Ift/263pJ78XNISYLakM8YetP2OFmrqm+1dJP068BLgw5L2BG6z/eZB11JduNu+HLhc0uds3wkgaTPgGbYfbLe6vtwg6dM82bXxRuCGFuvpxzuBb0paSdOltDtPzgA6o9n+nqQf0Fx9/RbwJpoumqEId9vzJW1Hc/X+cmCJpLW2X9xyaRM5Evhd4BU09wqGUvlv/6s07/l5wPbAE63UUlu3TIekLwJ/BDxO8zF7O+B023/bamE9KkPy/pjmCgCaG2Nn2v5Ze1VNrPwhPQa4ENi7NH9/SK4e
kTQKPA34d8qImc5FwjAo3XedewYLgLto/g1/1WphPZK0n+3vtV3HZEm6gaYr7AqaUUurWqul4nC/3vb+kt4IzAcWA9fY3rfl0qo3LP3r45E0Yntd23VMlqSLeHIY53dt/7zlkvpSLmpOBH6dZlABALb/sLWiJkHSNrYfabOGmh9i2lLSlsDRwLIhfJMfIulSSf8paWXnq+26evRvkt4jaa6knTpfbRfVo80knSXpqwCS9iljx4eC7SOBM4B7h+09X/wTzUirV9DcM5gDPNRqRX2QdLCkW4Dvl/39JP1DG7XUHO6fBO6guZH3LUm7A0PzMApNH+/HaG7uvbDraxi8DjiJpitplKYPdbTVinr3OZoZTXcr+/9Jcw9hKEh6FXA98LWyv7+kYZpy+7m23w88bHspcARwYMs19ePvaP4w3QvNPRye7FodqGrD3fYZtmfbPtxN39P/A85uu64+PGD7q7bX2r6389V2UROR9CLgWNt70Nwk+3vgJNvPbreynu1i+1zKTTDbj9HctxkWH6RZx/h+gPLw0h5tFtSnzqeN+8v9g+2BZ7VYT99s3zWmqZX3T3WjZTbEtiW9H/hU27X06DJJfwtcAPziZuQMf0L1A8BhwBaSLqUJmW8CiyUdYPvDbdbXo4cl7Ux50lbSQQzXJ76f235AWv/p/baKmYQlknYE3kezyM8zgPe3W1Jf7irTKLh0C58M3NpGIdXdUC13q8c9BDzP9tMGWc9kdT2p2m1GP4Yt6UZgf5rRJncDc2w/WJ6WvHoYbmaX6R8+DjwfuAkYAY6xPRTDUCWdBSynGUDw+8A7gC1t/1GrhfVI0h62fzhR20wlaRfgdJphnQK+DpzcxqfuGq/cZ9H0eY2dqEc0w9tmPEl7A6fQBOJPutoPa6+qnjxm+3HgEUk/6DxXYPun5cnPGU3S5jRDCF8K7EXznrltyG5Mvh34S5pPe1+kuX9wSqsV9efLNKPbup0PvKCFWvpS3j+n235j27VAneF+Ec0DS780UZKkbw6+nP5IegfNzchbgbMkndz1GPyHga+2VtzEHu0aAvaL/xkltfYgRz9sPy7pONunAW0vDdm3Ei4X2/4dmoAfGuWC5teB7SW9tuvQdnQNiZzJyvtnd0lb2X607XqqC3fbGxy2ZvsNg6xlkt4CvMD2TyTNA86XNM/26YyZBnUGeknnYSXb3WG+JbCwnZL6dqWkTwDnAA93GmfyvY6OEi5PSNp+mKYpLvaiuQG/A/CqrvaHaP6fGBYrad5Dy1j//fOxQRdSXbhXYLNOV4ztOyT9Nk3A784MD/cNPYVq+0c0M+QNg87Uvt0zExqYsfc6xvgJcGO5od0dLjN6bpby6fRCSQe3MYPiFOpMXbEZ689tNXAJ95nnHkn7d7qVyhX8kcBnaKYRjWlUujSG2QXla1i9RtLNwE9pxurvC7zL9uc3/m0zg+0PtV1DR3WjZYZdmU/8Mdt3j3Nsxi82MuwkfQT4G9v3l/0dgT+1/b52K+uNpG2Bn5Ub251++Ke1/Sh8r7qmDXkNTTfNu2nmaBmWqbovBf5gzPvnS7ZfMehaqn2IaVjZXjVesJdjCfbpd1jnf0wAN8ujHd5iPf1aDnSv4fl04N9aqmUytiyvRwDnDeG9g5Fx3j+tPISVcI9Y3+aSfvEsRBmjPxTPRhRbdw+fLdvbtFhPv/5F0vdpRlstlzQCzOiZUMd4XM0yjQCUe2WtdI+kzz1ifV+gCZXPlv0TgKUt1tOvhyXN74zukfQCmv7roWB7saS/oZl+43FJjwBHtV1XH/4SuELS5TQDIH6LltYySJ97xBiSXknzhCHApbb/tc16+iHphcCXgP+iCZdfAV5ne2gXwBg25SnVg8ruVWW02ODrSLhH1KXMabJX2R22J2xjiiTcIyIqlBuqETFjSFreS1tMLDdUI7pIesHY/mlJR9q+qK2angrK8nrbALuUseGdp7G3A2a3VlifNrDi2ENtdI0l3CPW9ylJx9u+CUDScTQr
MQ1FuJcpi8d6ALizLDwyU72V5r/zbjQrd3XC/UHgE20VNQnXAnNpZqUVzVw5d0u6B3jLIG9sp889ooukZ9NMMfsGmmFsxwNHDsvDNJKuopky9waacHk+zQyX2wN/bPvrLZY3IUlvt/3xtuuYLEmfAs7vjLCS9Hs08+p/lmY64IEtGZhwjxhD0vOA/0uzNONrbA/NOHFJFwDvt31z2d+HZhK0PwcusL3/xr5/JigrGc2jq2fB9lAskSnpRtu/MabtBtv7dqZWGFQt6ZaJ4BerSHVf6ewEbA5cLYlhWEWqeF4n2AFs3yJpb9srxyy9NyNJ+ifgOTSLfHfWHjXDs/7xGknvpXnWAJrF4u8pc/wMdE2DhHtE48i2C5giN0s6k/XD5ZYypcIwjHdfAOzj4e1SeAPwAZpPfgBXlrbNgWMHWUi6ZSK6lAWxb7b9UNnfDvg121e3W1lvylw4bwNeXJquBP6BZn6WbbrnnZmJJJ0HvMP2mrZrGXYJ94gukq4D5neuHCVtBozaHm8USkyxsjD8/sB3aNaBBcD2q1srqg/lfs17+OV7BgNf7CXdMhHrU3eXgO0nJA3N/yeSDgE+COzO+uHy7LZq6tMH2y5gE50H/CPwaZ68Z9CKoXnTRgzIyrJI+Zll/20062IOi7OAd9GMFW81XCbD9uVt17CJHrN95sSnTb9MPxCxvj8CfhNYDawCDqSlKVsn6QHbX7W91va9na+2i+qVpIMkfVfSTyQ9KulxSQ+2XVcf/kXS2yTtKmmnzlcbhaTPPaIikk6lGZlxAev3WV/bWlF9kDQKvJ6me2MBzUNkz7P9F60W1iNJPxyn2W10iyXcI7qUG2JnArNsP1/SvsCrbZ/Scmk9KTckx3IbN/QmQ9Ko7QWdB39K23W2D2i7tmGTcI/oUlbQ+TPgk51AkXST7ee3W9lTg6Rv0SyU8mngbmAN8D9m+gLZkl5m+xuSXjvecdsXDLqm3FCNWN82tr8z5mnOmTzhFgCS3mT785LePd5x2x8bdE2T9Gaae4F/QnNjeC7N3Cwz3UuBbwCvGueYabrJBirhHrG+H0l6DmUqAknH0Fw9znTbltdntlrFJiiP6H/E9htpHrr6UMsl9cz2B8rrCW3X0pFwj1jfScASYG9Jq4EfAm9qt6SJ2f5kCccHbZ/Wdj2TURbE3l3SVrYfbbuefmzoE1NHG5+cEu4RXWyvBH5X0rbAZp1pCIZBCcfjgKEM92IlcKWkZcDDncYh6FaacZ+YckM1gpl55TUZkk4DtgTOYf1wHJahkB8Yr9320HTRzBQJ9wjWC5W9gBcCy8r+q4Dv2J7xXTMw/EMhh5WkMzZ23PY7BlVLR8I9oksZindE16yQzwQutv2Sdit7aih/nH4plGb6HydJCzd23PbSQdXSkT73iPXNArpv5j1a2oaCpFnAR4DdbB9WVmI62PZZLZfWq/d0bW9NMwxyxg9FbSO8J5Jwj1jf2cB3JH2l7B8NfK69cvr2OZr1Ov+y7P8nTf/7UIT7OAtIXynpO60UMwmSRoD3AvvQ/HEC2vnkkYnDIrrY/jBwAs3q9fcBJ9j+63ar6ssuts+lLOlm+zGGaHbI7sm2JO0i6RU0i3sPiy8AtwJ70IzTvwP4bhuF5Mo9YowysmQoRpeM42FJO/PkQ1gHAQ+0W1JfrqGpXTTdMT8ETmy1ov7sbPssSSeX6Ysvl5Rwj4hN9m6akT7PkXQlMAIc025JvbO9R9s1bKLOOrVrJB0B/BfNYusDl9EyEZUpK0ftRXP1e5vtYVgY+xckPZ9f7rM+u72KeifpSODbNHPifBzYDviQ7WUb/cbpqCXhHlEPSdvQXL3vbvstkvYE9rJ9Ucul9aQ8b/DbNOF+CXAYcIXtofn0MVMk3CMqIukcmn7r48t89NsA/257/5ZL64mkG4H9gOts71eGdn7e9stbLq0nkj7L+OP0/3DQtaTPPaIuz7H9ujLHDLYf0Zj5i2e4n5ZFyR+TtB2wlqaLY1h0f0La
GngNTb/7wCXcI+ryqKSn8+RomefQtdzeEBiVtAPwKZpPID8B/qPdknpn+8vd+5L+GbiijVrSLRNREUkvB95H02f9deAQmpWMvtlmXZMhaR6wne0bWi5l0iTtRTN9xXMH/rsT7hF1KePcD6IZLXOV7R+1XFLPJC23fehEbTOVpIdYv8/9buAvxl7RD0K6ZSLqszXN07VbAPtIwva3Wq5poyRtDWwD7CJpR5o/TNAMJZzdWmF9sj1j5nVPuEdURNJHgdcBN1OmIKC5kpzR4Q68FXgnsBtNX3sn3B8EPtFWUf2aSZ880i0TURFJtwH72h6mm6i/IOnttj/edh396vrkcRnNOP3uTx5fs733oGvKlXtEXVbSrMQ0lOEO3C3pmbYfkvQ+YD5wyhCsJDXjPnnkyj2iIpK+TPMQ0HK6Ar6NlYAmQ9INtveV9GLgFOBvgb+yfWDLpfVkJn3yyJV7RF2W8eQSgcOoMz3xEcAS2xdLOqXNgvr0hKQdbN8PUG4OH2f7HwZdSK7cI2LGkHQRsBp4OU2XzE9p1rDdr9XCeiTp+rFTPUi6zvYBg64lV+4RFZB0ru1jy9ws481tsm8LZU3GscArgf9j+35JuwJ/1nJN/dhcklyumiVtDmzVRiEJ94g6nFxej2y1ik1U5sJZC7wYuJ1mwY7b262qL18DzpH0ybL/1tI2cOmWiYgZo0z5u4BmmuLnSdoNOM/2IS2X1hNJm9EEemdc+6XAp20PfKnDhHtERcZ5/B2aZfZGgT+1vXLwVfVO0vXAAcC1nX7qzgiadisbPumWiajL3wGrgC/SjLV+PfAcmjVhP0PzgM1M9qhtS+r0WW/bdkG9mIn3PHLlHlERSd8bO7KkM4JjvGMzjaT3AHvSjJb5a+APgS/OlLHjGyJpV9trJO0+3nHbdw66ply5R9TlEUnHAueX/WOAn5XtGX0lVxYVOQfYm+bJzr1oHmC6tNXCemB7Tdn8feBLtltZoKNbrtwjKiLp2cDpwME0YX4V8C6aseMvsN3KwhG9knSj7d9ou47JKjeEjwV+TPOH6jzb97RSS8I9ImYKSUuBT9j+btu1bApJ+9LMzvn7wCrbvzvoGtItE1ERSSPAW4B5dP3/3cYCzZN0IPBGSXcCD6J1U5QAAAEYSURBVNPcFPYQjpZZS7NQx73As9ooIOEeUZcLgW8D/8aT87QMk1e0XcCmkPQ2mm6ZEeA84C22b2mjloR7RF22sf3etouYrDZGlUyxucA7bV/fdiHpc4+oSJlB8d9tX9J2LdGuhHtERcoTqtsCj5avTp/1dq0WFgOXcI+IqNBmbRcQEVNHjTdJen/ZnyvpRW3XFYOXK/eIikg6E3gCeJntXysrAX3d9gtbLi0GLKNlIupyoO35kq4DsH2fpFYWi4h2pVsmoi4/L6v/dGZVHKG5ko+nmIR7RF3OAL4CPEvSh4ErgI+0W1K0IX3uEZWRtDfNSkACltu+teWSogUJ94iICqVbJiKiQgn3iIgKJdwjIiqUcI+IqND/Bzfi+FNyP9VCAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "D2ST50hRJtaB"
+ },
+ "source": [
+ "Instead of reporting counts, we can report proportions (or probabilities), which are called **relative frequencies**. We can calculate the relative frequencies by specifying `normalize=True` in `.value_counts()`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "D-3JccgEJtaC",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "0ce37e3f-e4d6-4e9b-c27f-1843eb462cb5"
+ },
+ "source": [
+ "class_probs = df_titanic[\"class\"].value_counts(normalize=True)\n",
+ "class_probs.sort_index()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "1st 0.146806\n",
+ "2nd 0.128681\n",
+ "3rd 0.321251\n",
+ "deck crew 0.029905\n",
+ "engineering crew 0.146806\n",
+ "restaurant staff 0.031264\n",
+ "victualling crew 0.195288\n",
+ "Name: class, dtype: float64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 233
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "g6EKTxBBJtaE"
+ },
+ "source": [
+ "This is equivalent to taking the counts and dividing by their sum."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Z2X-cH0wJtaF",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "bd520517-5980-4563-d7c3-f1d6177ae9f4"
+ },
+ "source": [
+ "class_counts / class_counts.sum()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "1st 0.146806\n",
+ "2nd 0.128681\n",
+ "3rd 0.321251\n",
+ "deck crew 0.029905\n",
+ "engineering crew 0.146806\n",
+ "restaurant staff 0.031264\n",
+ "victualling crew 0.195288\n",
+ "Name: class, dtype: float64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 234
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a5wGkd9AJtaH"
+ },
+ "source": [
+ "Notice that the relative frequencies add up to 1.0, by construction. We can report these relative frequencies using probability notation. For example:\n",
+ "\n",
+ "$$ P(\\text{1st class}) = 0.146806. $$\n",
+ "\n",
+ "The complete collection of probabilities of all levels of a variable is called the **distribution** of that variable. So the code above calculates the distribution of `class` on the Titanic."
+ ]
+ },
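+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check (a sketch, assuming `class_probs` was computed as above), we can confirm that the relative frequencies sum to 1. Because the proportions are floating-point numbers, the comparison uses a tolerance rather than exact equality."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# The relative frequencies should sum to 1, up to floating-point rounding.\n",
+ "np.isclose(class_probs.sum(), 1.0)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },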
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "swUzbXFGJtaH"
+ },
+ "source": [
+ "The bar chart for relative frequencies (i.e., probabilities) has the same shape as the bar chart for frequencies (i.e., counts). The only difference is the scale of the $y$-axis."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LNEzMug6JtaI",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 359
+ },
+ "outputId": "821a0559-5ae2-487b-9a42-b6783fb2768e"
+ },
+ "source": [
+ "class_probs.sort_index().plot.bar()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 235
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAFFCAYAAADijCboAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5RdZX3/8feHIFBQLJqplQBJwAhG5RpBi5dWQUJBUIsCSqWWJbVKvdX+jEsFpaKov6X1gpQoeK2/cJHWVGKRIqJgkQwXwWBTQkRJihIBQQGNgc/vj70PHMbJzJ7JzNnnPPm81pqVs28nX1gnn9nneZ79PLJNRESUa4u2C4iIiOmVoI+IKFyCPiKicAn6iIjCJegjIgq3ZdsFjDRz5kzPmTOn7TIiIgbKNddc8wvbQ6Md67ugnzNnDsPDw22XERExUCT9ZGPH0nQTEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4vnsyNgbLnEUXTev733r6YdP6/hGbg9zRR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QUrlHQS1ooaaWkVZIWjXL89ZJulHS9pCskze869s76upWSDpnK4iMiYnzjBr2kGcAZwKHAfODY7iCvfcX2M23vDXwY+Gh97XzgGODpwELg0/X7RUREjzS5o98fWGV7te31wBLgyO4TbN/btbkd4Pr1kcAS27+1/WNgVf1+ERHRI00mNZsF3Na1vQY4YORJkt4IvA3YCnhh17VXjbh21qQqjYiISZmyzljbZ9jeDXgH8O6JXCvpREnDkobXrVs3VSVFRATNgn4tsHPX9k71vo1ZArx0ItfaXmx7ge0FQ0NDDUqKiIimmgT9cmCepLmStqLqXF3afYKkeV2bhwE316+XAsdI2lrSXGAecPWmlx0REU2N20Zve4Okk4CLgRnAObZXSDoVGLa9FDhJ0kHA74C7gePra1dIOg+4CdgAvNH2g9P03xIREaNotMKU7WXAshH7Tu56/eYxrj0NOG2yBUZExKbJk7EREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuEZBL2mhpJWSVklaNMrxt0m6SdINki6VNLvr2IOSrq9/lk5l8RERMb4txztB0gzgDOBgYA2wXNJS2zd1nXYdsMD2/ZL+FvgwcHR97AHbe09x3RER0VCTO/r9gVW2V9teDywBjuw+wfZltu+vN68CdpraMiMiYrKaBP0s4Lau7TX1vo05AfhG1/Y2koYlXSXppZOoMSIiNsG4TTcTIek4YAHwgq7ds22vlbQr8C1JN9q+ZcR1JwInAuyyyy5TWVJExGavyR39WmDnru2d6n2PIukg4F3AEbZ/29lve23952rg28A+I6+1vdj2AtsLhoaGJvQfEBERY2sS9MuBeZLmStoKOAZ41OgZSfsAZ1GF/B1d+3eQtHX9eiZwINDdiRsREdNs3KYb2xsknQRcDMwAzrG9QtKpwLDtpcBHgMcC50sC+KntI4CnAWdJeojql8rpI0brRETENGvURm97GbBsxL6Tu14ftJHrvgc8c1MKjIiITZMnYyMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMI1CnpJCyWtlLRK0qJRjr9N0k2SbpB0qaTZXceOl3Rz/XP8VBYfERHjGzfoJc0AzgAOBeYD
x0qaP+K064AFtvcELgA+XF/7BOAU4ABgf+AUSTtMXfkRETGeJnf0+wOrbK+2vR5YAhzZfYLty2zfX29eBexUvz4EuMT2XbbvBi4BFk5N6RER0USToJ8F3Na1vabetzEnAN+YyLWSTpQ0LGl43bp1DUqKiIimprQzVtJxwALgIxO5zvZi2wtsLxgaGprKkiIiNntNgn4tsHPX9k71vkeRdBDwLuAI27+dyLURETF9tmxwznJgnqS5VCF9DPCq7hMk7QOcBSy0fUfXoYuBD3R1wL4YeOcmVx0R0bI5iy6a1ve/9fTDpuy9xg162xsknUQV2jOAc2yvkHQqMGx7KVVTzWOB8yUB/NT2EbbvkvSPVL8sAE61fdeUVR8REeNqckeP7WXAshH7Tu56fdAY154DnDPZAiMiYtPkydiIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXKOgl7RQ0kpJqyQtGuX48yVdK2mDpKNGHHtQ0vX1z9KpKjwiIprZcrwTJM0AzgAOBtYAyyUttX1T12k/Bf4KePsob/GA7b2noNaIiJiEcYMe2B9YZXs1gKQlwJHAw0Fv+9b62EPTUGNERGyCJk03s4DburbX1Pua2kbSsKSrJL10tBMknVifM7xu3boJvHVERIynF52xs20vAF4F/JOk3UaeYHux7QW2FwwNDfWgpIiIzUeTppu1wM5d2zvV+xqxvbb+c7WkbwP7ALdMoMYxzVl00VS91ahuPf2waX3/iE2Rz3800eSOfjkwT9JcSVsBxwCNRs9I2kHS1vXrmcCBdLXtR0TE9Bs36G1vAE4CLgZ+BJxne4WkUyUdASDpWZLWAK8AzpK0or78acCwpB8AlwGnjxitExER06xJ0w22lwHLRuw7uev1cqomnZHXfQ945ibWGBERmyBPxkZEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVr9MBUTJ/MVRIR0y139BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QUrlHQS1ooaaWkVZIWjXL8+ZKulbRB0lEjjh0v6eb65/ipKjwiIpoZN+glzQDOAA4F5gPHSpo/4rSfAn8FfGXEtU8ATgEOAPYHTpG0w6aXHRERTTW5o98fWGV7te31wBLgyO4TbN9q+wbgoRHXHgJcYvsu23cDlwALp6DuiIhoqEnQzwJu69peU+9rotG1kk6UNCxpeN26dQ3fOiIimuiLzljbi20vsL1gaGio7XIiIorSJOjXAjt3be9U72tiU66NiIgp0CTolwPzJM2VtBVwDLC04ftfDLxY0g51J+yL630REdEj4wa97Q3ASVQB/SPgPNsrJJ0q6QgASc+StAZ4BXCWpBX1tXcB/0j1y2I5cGq9LyIiemTLJifZXgYsG7Hv5K7Xy6maZUa79hzgnE2oMSIiNkFfdMZGRMT0SdBHRBQuQR8RUbgEfURE4RL0ERGFS9BHRBQuQR8RUbgEfURE4RL0ERGFS9BHRBQuQR8RUbgEfURE4RL0ERGFS9BHRBQuQR8RUbgEfURE4RL0ERGFS9BHRBQuQR8RUbgEfURE4RL0ERGFS9BHRBQuQR8RUbhGQS9poaSVklZJWjTK8a0lnVsf/76kOfX+OZIekHR9/fPPU1t+RESMZ8vxTpA0AzgDOBhYAyyXtNT2TV2nnQDcbfspko4BPgQcXR+7xfbeU1x3REQ01OSOfn9gle3VttcDS4AjR5xzJPCF+vUFwIskaerKjIiIyWoS9LOA27q219T7Rj3H9gbgHuCJ9bG5kq6TdLmk521ivRERMUHjNt1sotuBXWzf
KWk/4N8kPd32vd0nSToROBFgl112meaSIiI2L03u6NcCO3dt71TvG/UcSVsCjwfutP1b23cC2L4GuAV46si/wPZi2wtsLxgaGpr4f0VERGxUk6BfDsyTNFfSVsAxwNIR5ywFjq9fHwV8y7YlDdWduUjaFZgHrJ6a0iMioolxm25sb5B0EnAxMAM4x/YKSacCw7aXAmcDX5K0CriL6pcBwPOBUyX9DngIeL3tu6bjPyQiIkbXqI3e9jJg2Yh9J3e9/g3wilGu+yrw1U2sMSIiNkGejI2IKFyCPiKicAn6iIjCJegjIgqXoI+IKFyCPiKicAn6iIjCTfdcNxERGzVn0UXT9t63nn7YtL33oEnQx2ZtOoMGEjbRH9J0ExFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVrFPSSFkpaKWmVpEWjHN9a0rn18e9LmtN17J31/pWSDpm60iMioolxg17SDOAM4FBgPnCspPkjTjsBuNv2U4CPAR+qr50PHAM8HVgIfLp+v4iI6JEmd/T7A6tsr7a9HlgCHDninCOBL9SvLwBeJEn1/iW2f2v7x8Cq+v0iIqJHmiwlOAu4rWt7DXDAxs6xvUHSPcAT6/1Xjbh21si/QNKJwIn15q8lrWxU/eTMBH7R9GR9aBormZzU367U367G9Q9y7TCp+mdv7EBfrBlrezGwuBd/l6Rh2wt68XdNh9TfrtTfrkGuv83amzTdrAV27treqd436jmStgQeD9zZ8NqIiJhGTYJ+OTBP0lxJW1F1ri4dcc5S4Pj69VHAt2y73n9MPSpnLjAPuHpqSo+IiCbGbbqp29xPAi4GZgDn2F4h6VRg2PZS4GzgS5JWAXdR/TKgPu884CZgA/BG2w9O039LUz1pIppGqb9dqb9dg1x/a7WruvGOiIhS5cnYiIjCJegjIgqXoI8Yg6Rt2q5hcyTp0vrP/hsN35CkF0n6g7brgD4ZRx/Rx34o6efAd+ufK2zf03JNjUk6AfiO7ZvbrmWCnizpT4AjJC0B1H3Q9rXtlDUhrwHOlHQX1WfnO1Sfn7t7Xchm0Rkr6UDbV463r99I2nes4/3+YZf078BGP2C2j+hhOZMmaRfgecCBwJ8Dv7S9d7tVNSPpfVS1zwGuoQqb79q+vs26xiPpKKo5tJ5LNcS7O+ht+4WtFDYJknakGnb+dmBH2z2/wd5cgv5a2/uOt6/fSLqsfrkNsAD4AdUHfk+qoa3Paau2JiS9oH75cuCPgS/X28cCP7f91lYKmwBJO1EF5QuAvaiGD19h+4OtFjZBdRPC66jCZpbtvp5csHMjJulk26e2Xc9kSDqO6rPzTKqpD66g+iX7Xz2vpeSgl/Qc4E+At1DNqtmxPfAy23u1UtgESboQOMX2jfX2M4D32j6q3cqaGe3R70F5lF3SQ1R3lB+w/bW265koSe+m+ibyWOA6Hgmb21stbBySrrG93yDckG2MpF8AtwD/DFxm+9a2aim9jX4rqg/4lsDjuvbfS/VValDs3gl5ANs/lPS0NguaoO0k7Wp7NUD9lPR2LdfU1D5UzQevqtdiuBm43PbZ7ZbV2MupHla8CLgc+C/bv223pEZ+J2kxMEvSJ0YetP2mFmqaENszJT0deD5wmqR5wErbf9nrWooOetuXA5dL+rztnwBI2gJ4rO17261uQm6Q9Fkeafp4NXBDi/VM1FuAb0taTdX0NJtHZivta7Z/IOkWqjuz5wHHUTXjDETQ295X0vZUd/UHA4sl3WH7uS2XNp7DgYOAQ6j6FgZO/f99F6rP+xyqOcAeaqWWkptuOiR9BXg98CDV1/DtgY/b/kirhTVUD/H7W6o7A6g61M60/Zv2qmqm/sV6FPA1YI96938PyF0lkoaBrYHvUY+86dw0DIK6ma/Tx7CAajrx79o+udXCGpK0l+0ftF3HZEi6gaqp7Aqq
kU9rWqtlMwn6623vLenVwL7AIuAa23u2XNpmYVDa40cjacj2urbrmCxJX+eRoaHLbf+u5ZImpL7JOYFqlbqHn2mw/detFTVBkra1fX+bNWwuD0w9RtJjgJcCSwfww36gpEsk/Y+k1Z2ftuuagP+U9HZJO0t6Quen7aIa2kLS2ZK+AdXymPXY9IFg+3DgE8Cdg/a5r32JasTWIVR9DDsBv2q1ooYkPUfSTcB/19t7Sfp0G7VsLkF/FnArVQfgdyTNBgbmoReq9uCPUnUKPqvrZ1AcDbyRqslpmKrNdbjVipr7PNXMrTvW2/9D1ecwECS9BLge+I96e29JI6cZ72dPsf0e4D7bXwAO4/dXuOtX/0T1C+pOqPp7eKT5tac2i6C3/Qnbs2z/eT1P/k+BL7Zd1wTcY/sbtu+wfWfnp+2impC0P/BK23OpOtjOoJquetd2K2tspu3zqDvRbG+g6usZFO+lWqf5lwD1g1Jz2yxogjrfQn5Z9zc8HvijFuuZENu3jdjVymen6FE3G2Pbkt4DfKbtWhq6TNJHgAuBhzsxB+DJ2FOAQ4EtJV1CFTjfBhZJ2sf2aW3W19B9kp5I/YSvpGczWN8Gf2f7HunRMwi0VcwkLJa0A/BuqoWMHgu8p92SGrutnsbBddPxm4EftVFI0Z2xda/3qIeAp9reupf1TFbXE7Ld+v4xcEk3AntTjVr5GbCT7XvrpzS/Pwid4fU0FJ8EngH8EBgCjrI9EMNbJZ0NXEo1AOEvgDcBj7H9+lYLa0jSXNs/Hm9fP5I0E/g41TBRAd8E3tzGt/HS7+ifRNVGNnISIVENl+t7kvYA3k8VjL/u2n9oe1U1tqFeUex+Sbd0nl2w/UD9xGlfkzSDaljiC4DdqT43KwesU/PvgHdRfRP8ClV/w/tbrWhivko1Uq7bBcB+LdTSWP3Z+bjtV7ddC5Qf9F+nejjq9yZwkvTt3pczMZLeRNWJ+SPgbElv7noM/zTgG60V18z6rqFlD//DlNTagyMTYftBScfa/hiwou16JqoOm4ts/xlV2A+M+gbn6cDjJb2869D2dA2z7Ff1Z2e2pK1sr2+7nqKD3vZGh8HZflUva5mk1wH72f61pDnABZLm2P44I6Zt7VPP7zwYZbs72B/DI4vJ97srJX0KOBe4r7Oz3/tH4OGweUjS4wdpauXa7lSd938IvKRr/6+o/l0MgtVUn5+lPPqz89FeF1J00Bdgi05zje1bJf0pVdjPZgCCfmNPv9r+BdVsfoOgMx1x9wyKBvq6f6TLr4Eb687w7rDp67li6m+uX5P0nDZme5winakztuDRc231XIK+v/1c0t6dpqf6zv5w4ByqqU9jmtXNHoPswvpnUL1M0grgAapnAfYE3mr7y2Nf1j7b72u7ho6iR90Munou9A22fzbKsb5fOKUEkj4AfNj2L+vtHYC/t/3uditrRtJ2wG/qTvFOu/3WbT+S31TX9CUvo2rKeRvVvDF9P8V4/S3qFSM+O0tsH9LrWjaLB6YGle01o4V8fSwh3xuHdv6hArhaBu7PW6xnoi4Futct/QPgP1uqZTIeU/95GHD+gPU1DI3y2WnlYa8EfcTYZkh6+HmL+hmAgXj+orZN97Dc+vW2LdYzUf8u6b+pRm1dKmkI6PtZW2sPqlqGEoC6b62VJpS00UeM7V+oAuZz9fZrgS+0WM9E3Sdp384oIUn7UbV3DwTbiyR9mGoakAcl3Q8c2XZdDb0LuELS5VSDJ55HS+swpI0+YhySFlI93Qhwie2L26xnIiQ9C1gC/C9V2PwxcLTtgVzMY9DUT8c+u968qh5x1vs6EvQRZavnWdm93hy0J3tjCiToIyIKl87YiOhbki5tsi/Gls7YiDFI2m9ke7akw21/va2aNgf1EoLbAjPr8eedJ8G3B2a1VtgEbGQVtV+10XSWoI8Y22ckvcb2DwEkHUu1wtRABH09zfJI9wA/qRdR6Vd/Q/X/eUeqFck6QX8v8Km2ipqga4GdqWbP
FdW8PT+T9HPgdb3sEE8bfcQYJO1KNS3uq6iGx70GOHxQHtyRdBXVNL83UIXNM6hm4nw88Le2v9lieeOS9He2P9l2HZMh6TPABZ1RWpJeTLUmwOeopjDu2ZKICfqIcUh6KvBvVEtQvsz2wIxDl3Qh8B7bK+rt+VQTtP0f4ELbe491fT+oV2maQ1cLhO2+XwpU0o22nzli3w229+xM7dCrWtJ0EzGKenWs7rugJwAzgO9LYhBWx6o9tRPyALZvkrSH7dUjlhfsS5K+BOxGtcB5Z71VMxhrPt8u6R1UzzEAHE01UeEMerweQ4I+YnSHt13AFFkh6UweHTY31dM6DMJ4+gXAfA9m08OrgFOovg0CXFnvmwG8speFpOkmYgz1YuArbP+q3t4eeJrt77dbWTP13DxvAJ5b77oS+DTVfDHbds+D048knQ+8yfbtbdcyyBL0EWOQdB2wb+eOUtIWwLDt0UazxBSTdBnV4i9XU617C4DtI1orqqG6b+ft/H7/Qs8XrUnTTcTY1N1sYPshSQPz70bSgcB7gdk8Omx2baumCXpv2wVsgvOBfwY+yyP9C60YmA9sREtW14u0n1lvv4FqLdBBcTbwVqqx6K2GzWTYvrztGjbBBttnjn/a9MsUCBFjez3wJ8BaYA1wAC1NNTtJ99j+hu07bN/Z+Wm7qKYkPVvSckm/lrRe0oOS7m27rob+XdIbJD1Z0hM6P20Ukjb6iIJJOp1qlMeFPLqN+9rWipoAScPAMVTNIAuoHlh7qu13tlpYA5J+PMput9FslqCPGEPdoXYm8CTbz5C0J3CE7fe3XFojdWfmSG6jQ3AyJA3bXtB50Kjed53tfdqubZAk6CPGUK8O9A/AWZ1wkfRD289ot7LNg6TvUC368lngZ8DtwF/18+Lgkl5o+1uSXj7acdsX9rqmdMZGjG1b21ePeIq0nycDA0DScba/LOltox23/dFe1zRJf0nVl3gSVafyzlTzxfSzFwDfAl4yyjFTNaP1VII+Ymy/kLQb9XQIko6iuqvsd9vVfz6u1So2QT1VwAdsv5rqAa/3tVxSI7ZPqf98bdu1dCToI8b2RmAxsIektcCPgePaLWl8ts+qg/Je2x9ru57JqBcDny1pK9vr266nqY19i+po49tUgj5iDLZXAwdJ2g7YojMVwiCog/JYYCCDvrYauFLSUuC+zs4+b3rqu29R6YyNGEU/3pVNhqSPAY8BzuXRQTkowytPGW2/7YFoxukXCfqIUXQFzO7As4Cl9fZLgKtt933zDQz+8MpBJOkTYx23/aZe1dKRoI8YQz2877Cu2SsfB1xk+/ntVrZ5qH9R/V5I9fMvKknHj3Xc9hd6VUtH2ugjxvYkoLsjcH29byBIehLwAWBH24fWK0w9x/bZLZfW1Nu7Xm9DNbSyr4e3thHk40nQR4zti8DVkv613n4p8Pn2ypmwz1OtUfquevt/qNrrByLoR1lA+0pJV7dSzARJGgLeAcyn+iUFtPNtJJOaRYzB9mnAa4G765/X2v5gu1VNyEzb51EvXWd7AwM0i2X3ZGCSZko6hGph80HwL8CPgLlUzwDcCixvo5Dc0UeMox6hMhCjVEZxn6Qn8sgDX88G7mm3pAm5hqp2UTXZ/Bg4odWKmnui7bMlvbmebvlySQn6iJhyb6MaMbSbpCuBIeCodktqzvbctmvYBJ01eW+XdBjwv1SLzPdcRt1EFK5eEWt3qrvilbYHYVHwh0l6Br/fzv3F9ipqRtLhwHep5uf5JLA98D7bS8e8cDpqSdBHlEvStlR39bNtv07SPGB3219vubRG6ucZ/pQq6JcBhwJX2B6YbyX9IEEfUTBJ51K1c7+mnk9/W+B7tvduubRGJN0I7AVcZ3uverjol20f3HJp45L0OUZ/BuCve11L2ugjyrab7aPrOW+wfb9GzLnc5x6oF2TfIGl74A6qppBB0P2taRvgZVTt9D2XoI8o23pJf8Ajo252o2tJwQEwLOkPgc9QfTP5NfBf
7ZbUjO2vdm9L+n/AFW3UkqabiIJJOhh4N1Ub9zeBA6lWaPp2m3VNhqQ5wPa2b2i5lEmRtDvV9BlP6fnfnaCPKFs9jv7ZVKNurrL9i5ZLakzSpbZfNN6+fiTpVzy6jf5nwDtH3un3QppuIsq3DdVTvVsC8yVh+zst1zQmSdsA2wIzJe1A9UsKqiGKs1orbAJs98289An6iIJJ+hBwNLCCehoEqrvMvg564G+AtwA7UrXNd4L+XuBTbRU1Ef30bSRNNxEFk7QS2NP2IHXAPkzS39n+ZNt1TETXt5HLqJ4B6P428h+29+h1TbmjjyjbaqoVpgYy6IGfSXqc7V9JejewL/D+Pl8hq+++jeSOPqJgkr5K9cDRpXSFfRurHE2GpBts7ynpucD7gY8AJ9s+oOXSxtVP30ZyRx9RtqU8sgziIOpMqXwYsNj2RZLe32ZBE/CQpD+0/UuAulP5WNuf7nUhuaOPiL4l6evAWuBgqmabB6jW7N2r1cIakHT9yKkmJF1ne59e15I7+ogCSTrP9ivruWJGm29lzxbKmoxXAguB/2v7l5KeDPxDyzU1NUOSXN9NS5oBbNVGIQn6iDK9uf7z8Far2ET13Dx3AM8FbqZafOTmdqtq7D+AcyWdVW//Tb2v59J0ExF9q56meAHV1MpPlbQjcL7tA1subVyStqAK9864+UuAz9ru+VKOCfqIgo3yGD5USwkOA39ve3Xvq2pO0vXAPsC1nbbtzkicdisbLGm6iSjbPwFrgK9Qjec+BtiNag3cc6ge6Oln621bUqede7u2CxpPP/aP5I4+omCSfjByhEpnNMhox/qNpLcD86hG3XwQ+GvgK/0yPn00kp5s+3ZJs0c7bvsnva4pd/QRZbtf0iuBC+rto4Df1K/7+i6vXiDlXGAPqqdKd6d6WOqSVgsbh+3b65d/ASyx3cpiI91yRx9RMEm7Ah8HnkMV7FcBb6Uam76f7VYWwmhK0o22n9l2HZNRdyS/EriL6hfW+bZ/3kotCfqI6FeSvgB8yvbytmuZLEl7Us0g+hfAGtsH9bqGNN1EFEzSEPA6YA5d/97bWKB6kg4AXi3pJ8B9VB3KHrBRN3dQLTpyJ/BHbRSQoI8o29eA7wL/ySPzxgySQ9ouYElbtOkAAAD0SURBVLIkvYGq6WYIOB94ne2b2qglQR9Rtm1tv6PtIiarjREqU2hn4C22r2+7kLTRRxSsnunxe7aXtV1LtCdBH1Gw+snY7YD19U+njXv7VguLnkrQR0QUbou2C4iI6aPKcZLeU2/vLGn/tuuK3sodfUTBJJ0JPAS80PbT6lWOvmn7WS2XFj2UUTcRZTvA9r6SrgOwfbekVha/iPak6SaibL+rVzbqzP44RHWHH5uRBH1E2T4B/CvwR5JOA64APtBuSdFraaOPKJykPahWORJwqe0ftVxS9FiCPiKicGm6iYgoXII+IqJwCfqIiMIl6CMiCvf/AUvgIrYuXNhKAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "T-aiM7OfJtaM"
+ },
+ "source": [
+ "In the next lesson, we will see why it is often more useful to plot the relative frequencies (i.e., probabilities) rather than the frequencies (i.e., counts)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "O1R6vBYwJtaN"
+ },
+ "source": [
+ "## Transforming Categorical Variables\n",
+ "\n",
+ "A categorical variable can be transformed by mapping its levels to new levels. For example, we may only be interested in whether a person on the Titanic was a passenger or a crew member, in which case the variable `class` is too detailed. We can create a new variable, `type`, that is derived from the existing variable `class`. Observations with a `class` of \"1st\", \"2nd\", or \"3rd\" get a value of \"passenger\", while observations with a `class` of \"victualling crew\", \"engineering crew\", or \"deck crew\" get a value of \"crew\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gsA4gHKhJtaN",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "outputId": "20317968-2946-4ea2-f708-3d37fc2147a9"
+ },
+ "source": [
+ "df_titanic[\"type\"] = df_titanic[\"class\"].map({\n",
+ " \"1st\": \"passenger\",\n",
+ " \"2nd\": \"passenger\",\n",
+ " \"3rd\": \"passenger\",\n",
+ " \"victualling crew\": \"crew\",\n",
+ " \"engineering crew\": \"crew\",\n",
+ " \"deck crew\": \"crew\"\n",
+ "})\n",
+ "\n",
+ "df_titanic"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ ],
+ "text/plain": [
+ " name gender age ... fare survived type\n",
+ "0 Abbing, Mr. Anthony male 42.0 ... 7.11 0 passenger\n",
+ "1 Abbott, Mr. Eugene Joseph male 13.0 ... 20.05 0 passenger\n",
+ "2 Abbott, Mr. Rossmore Edward male 16.0 ... 20.05 0 passenger\n",
+ "3 Abbott, Mrs. Rhoda Mary 'Rosa' female 39.0 ... 20.05 1 passenger\n",
+ "4 Abelseth, Miss. Karen Marie female 16.0 ... 7.13 1 passenger\n",
+ "... ... ... ... ... ... ... ...\n",
+ "2202 Wynn, Mr. Walter male 41.0 ... NaN 1 crew\n",
+ "2203 Yearsley, Mr. Harry male 40.0 ... NaN 1 crew\n",
+ "2204 Young, Mr. Francis James male 32.0 ... NaN 0 crew\n",
+ "2205 Zanetti, Sig. Minio male 20.0 ... NaN 0 NaN\n",
+ "2206 Zarracchi, Sig. L. male 26.0 ... NaN 0 NaN\n",
+ "\n",
+ "[2207 rows x 10 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 236
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u082T2xbJtaP"
+ },
+ "source": [
+ "Upon closer inspection of this `DataFrame`, we see that we accidentally left out the level \"restaurant staff\" in the input to `.map()`. Any levels that are unspecified will be mapped to the missing value `NaN`.\n",
+ "\n",
+ "This suggests a more concise way to define the new variable `type`. We can specify only the levels for passengers in the mapping and then fill in the missing values afterwards."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fL2fQ8O8JtaQ",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "outputId": "6650f480-1e00-42d4-8652-4fc986d96a37"
+ },
+ "source": [
+ "df_titanic[\"type\"] = df_titanic[\"class\"].map({\n",
+ " \"1st\": \"passenger\",\n",
+ " \"2nd\": \"passenger\",\n",
+ " \"3rd\": \"passenger\"\n",
+ "})\n",
+ "\n",
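+ "# Replace all missing values with \"crew\". Assign the result back instead of\n",
+ "# calling fillna(inplace=True) on a column selection, which may not update df_titanic.\n",
+ "df_titanic[\"type\"] = df_titanic[\"type\"].fillna(\"crew\")\n",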
+ "\n",
+ "df_titanic"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ ],
+ "text/plain": [
+ " name gender age ... fare survived type\n",
+ "0 Abbing, Mr. Anthony male 42.0 ... 7.11 0 passenger\n",
+ "1 Abbott, Mr. Eugene Joseph male 13.0 ... 20.05 0 passenger\n",
+ "2 Abbott, Mr. Rossmore Edward male 16.0 ... 20.05 0 passenger\n",
+ "5 Abelseth, Mr. Olaus Jørgensen male 25.0 ... 7.13 1 passenger\n",
+ "6 Abelson, Mr. Samuel male 30.0 ... 24.00 0 passenger\n",
+ "... ... ... ... ... ... ... ...\n",
+ "2202 Wynn, Mr. Walter male 41.0 ... NaN 1 crew\n",
+ "2203 Yearsley, Mr. Harry male 40.0 ... NaN 1 crew\n",
+ "2204 Young, Mr. Francis James male 32.0 ... NaN 0 crew\n",
+ "2205 Zanetti, Sig. Minio male 20.0 ... NaN 0 crew\n",
+ "2206 Zarracchi, Sig. L. male 26.0 ... NaN 0 crew\n",
+ "\n",
+ "[1718 rows x 10 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 241
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1NXzBSg6Jtai"
+ },
+ "source": [
+ "Note that every person in this new `DataFrame` is male. If you inspect the index, you will see that rows 3 and 4 are missing. That is because those two passengers were female."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rj6sUAzlJtaj"
+ },
+ "source": [
+ "Finally, we can apply the methods we've learned in this lesson to `df_male` to answer the original question: \"What fraction of males were crew members?\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "PL5AZdDCJtak",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "44ae33d6-be08-4979-9f82-2433f43dc89a"
+ },
+ "source": [
+ "df_male[\"type\"].value_counts(normalize=True)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "crew 0.504657\n",
+ "passenger 0.495343\n",
+ "Name: type, dtype: float64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 242
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "h8JMRXKrJtam"
+ },
+ "source": [
+ "It appears that about 50.5% of the males were crew members. We can write this using _conditional probability_ notation:\n",
+ "\n",
+ "$$ P(\\text{crew} | \\text{male}) = 0.504657. $$\n",
+ "\n",
+ "The bar $|$ is read as \"given\". The information after the bar is the given information. In this case, we were interested in the probability that a person was a crew member, \"given\" that they were male. That is, after restricting to the males on board (i.e., using a boolean mask as a filter), we want to know the relative frequency of crew members."
+ ]
+ },
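+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough sketch of where this number comes from: a conditional relative frequency is a ratio of two counts, the number of rows that satisfy both conditions divided by the number of rows that satisfy the given condition. Assuming the 1718 rows of `df_male` are exactly the males in the data set, a proportion of 0.504657 corresponds to about 867 male crew members. This count is inferred from the proportion rather than read directly from the data.\n",
+ "\n",
+ "$$ P(\\text{crew} | \\text{male}) = \\frac{\\text{number of male crew members}}{\\text{number of males}} \\approx \\frac{867}{1718} \\approx 0.5047. $$"
+ ]
+ },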
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZGI7SaceJtam"
+ },
+ "source": [
+ "We can also filter on multiple criteria. For example, if we want to know the fraction of male _survivors_ who were crew members, we need to combine two boolean masks, one based on the column `gender` and another based on the column `survived`. The two masks can be combined using the logical operator `&`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "5-OhXQDXJtan",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "00c9ec24-5076-4e28-f2de-91fc5e48d0ba"
+ },
+ "source": [
+ "(df_titanic[\"gender\"] == \"male\") & (df_titanic[\"survived\"] == 1)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0 False\n",
+ "1 False\n",
+ "2 False\n",
+ "3 False\n",
+ "4 False\n",
+ " ... \n",
+ "2202 True\n",
+ "2203 True\n",
+ "2204 False\n",
+ "2205 False\n",
+ "2206 False\n",
+ "Length: 2207, dtype: bool"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 243
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ygJIJbtOJtat"
+ },
+ "source": [
+ "Notice that the logical operator was _broadcast_ (that word again!) over the elements of the two `Series`. In other words, the logical operator was applied to each element, producing a `Series` of booleans.\n",
+ "\n",
+ "Now we can use this new boolean mask to filter the `DataFrame`, just as we did before."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "KdfY3t-wJtau",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "outputId": "4f562db4-d0b3-4142-cfc0-3c890416fa0b"
+ },
+ "source": [
+ "df_male_survivors = df_titanic[(df_titanic[\"gender\"] == \"male\") & (df_titanic[\"survived\"] == 1)]\n",
+ "df_male_survivors"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "\n",
+ "