diff --git a/2_1_Analyzing_One_Categorical_Variable_Daniel_Kim.ipynb b/2_1_Analyzing_One_Categorical_Variable_Daniel_Kim.ipynb new file mode 100644 index 0000000..65847ce --- /dev/null +++ b/2_1_Analyzing_One_Categorical_Variable_Daniel_Kim.ipynb @@ -0,0 +1,2662 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + }, + "colab": { + "name": "2.1 Analyzing One Categorical Variable - Daniel Kim", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0bNZijfPJtYw" + }, + "source": [ + "Recall from Chapter 1 that a **categorical variable** is a variable that can take on one of a limited set of values (which are called _levels_). In this chapter, we will discuss ways to analyze categorical variables. Throughout this chapter, we use the Titanic data set as a working example. This data set contains information about all of the people who were aboard the RMS Titanic when it sank in 1912, including both passengers and crew. Let's start by reading in this data set." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_YFx6GgqJtYz", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "outputId": "2b83eaf0-3193-45a8-b3b7-4e90540a24b3" + }, + "source": [ + "import pandas as pd\n", + "\n", + "data_dir = \"http://dlsun.github.io/pods/data/\"\n", + "df_titanic = pd.read_csv(data_dir + \"titanic.csv\")\n", + "df_titanic" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "\n", + "
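<table>
<thead>
<tr><th></th><th>name</th><th>gender</th><th>age</th><th>class</th><th>embarked</th><th>country</th><th>ticketno</th><th>fare</th><th>survived</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>Abbing, Mr. Anthony</td><td>male</td><td>42.0</td><td>3rd</td><td>S</td><td>United States</td><td>5547.0</td><td>7.11</td><td>0</td></tr>
<tr><th>1</th><td>Abbott, Mr. Eugene Joseph</td><td>male</td><td>13.0</td><td>3rd</td><td>S</td><td>United States</td><td>2673.0</td><td>20.05</td><td>0</td></tr>
<tr><th>2</th><td>Abbott, Mr. Rossmore Edward</td><td>male</td><td>16.0</td><td>3rd</td><td>S</td><td>United States</td><td>2673.0</td><td>20.05</td><td>0</td></tr>
<tr><th>3</th><td>Abbott, Mrs. Rhoda Mary 'Rosa'</td><td>female</td><td>39.0</td><td>3rd</td><td>S</td><td>England</td><td>2673.0</td><td>20.05</td><td>1</td></tr>
<tr><th>4</th><td>Abelseth, Miss. Karen Marie</td><td>female</td><td>16.0</td><td>3rd</td><td>S</td><td>Norway</td><td>348125.0</td><td>7.13</td><td>1</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>2202</th><td>Wynn, Mr. Walter</td><td>male</td><td>41.0</td><td>deck crew</td><td>B</td><td>England</td><td>NaN</td><td>NaN</td><td>1</td></tr>
<tr><th>2203</th><td>Yearsley, Mr. Harry</td><td>male</td><td>40.0</td><td>victualling crew</td><td>S</td><td>England</td><td>NaN</td><td>NaN</td><td>1</td></tr>
<tr><th>2204</th><td>Young, Mr. Francis James</td><td>male</td><td>32.0</td><td>engineering crew</td><td>S</td><td>England</td><td>NaN</td><td>NaN</td><td>0</td></tr>
<tr><th>2205</th><td>Zanetti, Sig. Minio</td><td>male</td><td>20.0</td><td>restaurant staff</td><td>S</td><td>England</td><td>NaN</td><td>NaN</td><td>0</td></tr>
<tr><th>2206</th><td>Zarracchi, Sig. L.</td><td>male</td><td>26.0</td><td>restaurant staff</td><td>S</td><td>England</td><td>NaN</td><td>NaN</td><td>0</td></tr>
</tbody>
</table>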
\n", + "

2207 rows × 9 columns

\n", + "
\n", + " " + ], + "text/plain": [ + " name gender age ... ticketno fare survived\n", + "0 Abbing, Mr. Anthony male 42.0 ... 5547.0 7.11 0\n", + "1 Abbott, Mr. Eugene Joseph male 13.0 ... 2673.0 20.05 0\n", + "2 Abbott, Mr. Rossmore Edward male 16.0 ... 2673.0 20.05 0\n", + "3 Abbott, Mrs. Rhoda Mary 'Rosa' female 39.0 ... 2673.0 20.05 1\n", + "4 Abelseth, Miss. Karen Marie female 16.0 ... 348125.0 7.13 1\n", + "... ... ... ... ... ... ... ...\n", + "2202 Wynn, Mr. Walter male 41.0 ... NaN NaN 1\n", + "2203 Yearsley, Mr. Harry male 40.0 ... NaN NaN 1\n", + "2204 Young, Mr. Francis James male 32.0 ... NaN NaN 0\n", + "2205 Zanetti, Sig. Minio male 20.0 ... NaN NaN 0\n", + "2206 Zarracchi, Sig. L. male 26.0 ... NaN NaN 0\n", + "\n", + "[2207 rows x 9 columns]" + ] + }, + "metadata": {}, + "execution_count": 225 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c6RRWPfWJtY5" + }, + "source": [ + "The categorical variables in this data set are:\n", + "- `gender` (male or female)\n", + "- `class`: what class they were in (1st, 2nd, or 3rd) or what type of crew member they were\n", + "- `embarked`: where they embarked (Belfast, Southampton, Cherbourg, or Queenstown)\n", + "- `country`: their country of origin\n", + "- `ticketno`: the ticket number\n", + "- `survived`: whether or not they survived the disaster\n", + "\n", + "Note that `age` and `fare` are quantitative variables. It is tempting to consider `name` a categorical variable, but it is not, since (almost) every person has a unique name. In order for a variable to be categorical, it must take on a _limited_ set of values, ideally with each level appearing multiple times in the data set. Otherwise, the analyses that we describe in this chapter will not be very meaningful." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XRBPQWpZJtZm" + }, + "source": [ + "# Analyzing One Categorical Variable\n", + "\n", + "In this lesson, we focus on a single categorical variable. 
For a high-level summary of a categorical variable, we can use the `.describe()` method. Note that `.describe()` behaves differently depending on whether `pandas` treats the variable as quantitative or categorical, which is why it is important to cast categorical variables to the right type, as we discussed in the previous chapter." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nru7HFtNJtZn", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "9a4d920b-873a-458c-9dfe-a4248baf161b" + }, + "source": [ + "df_titanic[\"class\"].describe()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 2207\n", + "unique 7\n", + "top 3rd\n", + "freq 709\n", + "Name: class, dtype: object" + ] + }, + "metadata": {}, + "execution_count": 226 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AUwJjfTQJtZt" + }, + "source": [ + "To completely summarize a single categorical variable, we report the number of times each level appeared, or its **frequency**." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E4kcGK9TJtZt", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e14dca12-4e4a-4853-dc2a-69a479e0a6fe" + }, + "source": [ + "class_counts = df_titanic[\"class\"].value_counts()\n", + "class_counts" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "3rd 709\n", + "victualling crew 431\n", + "1st 324\n", + "engineering crew 324\n", + "2nd 284\n", + "restaurant staff 69\n", + "deck crew 66\n", + "Name: class, dtype: int64" + ] + }, + "metadata": {}, + "execution_count": 227 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bmk5Y7zQJtZw" + }, + "source": [ + "Notice that the levels are sorted in decreasing order of frequency by default. We can also report the levels unsorted, by passing `sort=False` (the resulting order is not guaranteed; in the output below, it does not match the order in which the levels first appear in the data set)..."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "yEK0-PecJtZx", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4f1b1486-258b-4c07-ac86-bccd180c7ab9" + }, + "source": [ + "df_titanic[\"class\"].value_counts(sort=False)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "engineering crew 324\n", + "victualling crew 431\n", + "1st 324\n", + "restaurant staff 69\n", + "3rd 709\n", + "deck crew 66\n", + "2nd 284\n", + "Name: class, dtype: int64" + ] + }, + "metadata": {}, + "execution_count": 228 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7P-TqJvrJtZ0" + }, + "source": [ + "...or in alphabetical order, by sorting the index:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "w1jjt1nuJtZ0", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "aa821cea-1fd0-4555-a657-c74ad196821d" + }, + "source": [ + "class_counts.sort_index()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1st 324\n", + "2nd 284\n", + "3rd 709\n", + "deck crew 66\n", + "engineering crew 324\n", + "restaurant staff 69\n", + "victualling crew 431\n", + "Name: class, dtype: int64" + ] + }, + "metadata": {}, + "execution_count": 229 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kCTkd_7OJtZ3" + }, + "source": [ + "Note that this produces a _new_ `Series` with the index sorted. It does not sort the original `Series` `class_counts`. To sort the original series, we need to specify that the sorting should be done in place, just like we did for `.set_index()` in Chapter 1." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6tMY5S4CJtZ3", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "78377215-6382-4b0b-d1d5-895f7761ef5a" + }, + "source": [ + "class_counts.sort_index(inplace=True)\n", + "class_counts" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1st 324\n", + "2nd 284\n", + "3rd 709\n", + "deck crew 66\n", + "engineering crew 324\n", + "restaurant staff 69\n", + "victualling crew 431\n", + "Name: class, dtype: int64" + ] + }, + "metadata": {}, + "execution_count": 230 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HBFyIJmYJtZ8" + }, + "source": [ + "Any other order would require selecting the levels manually, in the desired order." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Q7vi4FgIJtZ9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "5289ef2d-79d9-463e-db08-c12be3216fb5" + }, + "source": [ + "class_counts.loc[\n", + " [\"1st\", \"2nd\", \"3rd\",\n", + " \"deck crew\", \"engineering crew\", \"victualling crew\",\n", + " \"restaurant staff\"]\n", + "]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1st 324\n", + "2nd 284\n", + "3rd 709\n", + "deck crew 66\n", + "engineering crew 324\n", + "victualling crew 431\n", + "restaurant staff 69\n", + "Name: class, dtype: int64" + ] + }, + "metadata": {}, + "execution_count": 231 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IKNQM_UxJtZ_" + }, + "source": [ + "This information can be visualized using a **bar chart**." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1Pr1BpI9JtZ_", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 359 + }, + "outputId": "e0cbb48e-36ae-4412-f900-4f4b4e68974c" + }, + "source": [ + "class_counts.plot.bar()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 232 + }, + { + "output_type": "display_data", + "data": { + "image/png": "[base64-encoded PNG omitted: bar chart of class_counts, one bar per level of class, with counts on the y-axis]\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D2ST50hRJtaB" + }, + "source": [ + "Instead of reporting counts, we can also report proportions or probabilities, or the **relative frequencies**. We can calculate the relative frequencies by specifying `normalize=True` in `.value_counts()`." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "D-3JccgEJtaC", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0ce37e3f-e4d6-4e9b-c27f-1843eb462cb5" + }, + "source": [ + "class_probs = df_titanic[\"class\"].value_counts(normalize=True)\n", + "class_probs.sort_index()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1st 0.146806\n", + "2nd 0.128681\n", + "3rd 0.321251\n", + "deck crew 0.029905\n", + "engineering crew 0.146806\n", + "restaurant staff 0.031264\n", + "victualling crew 0.195288\n", + "Name: class, dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 233 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g6EKTxBBJtaE" + }, + "source": [ + "This is equivalent to taking the counts and dividing by their sum." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z2X-cH0wJtaF", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "bd520517-5980-4563-d7c3-f1d6177ae9f4" + }, + "source": [ + "class_counts / class_counts.sum()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1st 0.146806\n", + "2nd 0.128681\n", + "3rd 0.321251\n", + "deck crew 0.029905\n", + "engineering crew 0.146806\n", + "restaurant staff 0.031264\n", + "victualling crew 0.195288\n", + "Name: class, dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 234 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a5wGkd9AJtaH" + }, + "source": [ + "Notice that the relative frequencies add up to 1.0, by construction. We can report these relative frequencies using probability notation. For example:\n", + "\n", + "$$ P(\\text{1st class}) = 0.146806. $$\n", + "\n", + "The complete collection of probabilities of all levels of a variable is called the **distribution** of that variable. So the code above calculates the distribution of \"class\" on the Titanic." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "swUzbXFGJtaH" + }, + "source": [ + "The bar chart for relative frequencies (i.e., probabilities) looks qualitatively the same as the bar chart for frequencies (i.e., counts). The only difference is that the scale on the $y$-axis is different." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LNEzMug6JtaI", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 359 + }, + "outputId": "821a0559-5ae2-487b-9a42-b6783fb2768e" + }, + "source": [ + "class_probs.sort_index().plot.bar()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 235 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAFFCAYAAADijCboAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5RdZX3/8feHIFBQLJqplQBJwAhG5RpBi5dWQUJBUIsCSqWWJbVKvdX+jEsFpaKov6X1gpQoeK2/cJHWVGKRIqJgkQwXwWBTQkRJihIBQQGNgc/vj70PHMbJzJ7JzNnnPPm81pqVs28nX1gnn9nneZ79PLJNRESUa4u2C4iIiOmVoI+IKFyCPiKicAn6iIjCJegjIgq3ZdsFjDRz5kzPmTOn7TIiIgbKNddc8wvbQ6Md67ugnzNnDsPDw22XERExUCT9ZGPH0nQTEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4vnsyNgbLnEUXTev733r6YdP6/hGbg9zRR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QUrlHQS1ooaaWkVZIWjXL89ZJulHS9pCskze869s76upWSDpnK4iMiYnzjBr2kGcAZwKHAfODY7iCvfcX2M23vDXwY+Gh97XzgGODpwELg0/X7RUREjzS5o98fWGV7te31wBLgyO4TbN/btbkd4Pr1kcAS27+1/WNgVf1+ERHRI00mNZsF3Na1vQY4YORJkt4IvA3YCnhh17VXjbh21qQqjYiISZmyzljbZ9jeDXgH8O6JXCvpREnDkobXrVs3VSVFRATNgn4tsHPX9k71vo1ZArx0ItfaXmx7ge0FQ0NDDUqKiIimmgT9cmCepLmStqLqXF3afYKkeV2bhwE316+XAsdI2lrSXGAecPWmlx0REU2N20Zve4Okk4CLgRnAObZXSDoVGLa9FDhJ0kHA74C7gePra1dIOg+4CdgAvNH2g9P03xIREaNotMKU7WXAshH7Tu56/eYxrj0NOG2yBUZExKbJk7EREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuEZBL2mhpJWSVklaNMrxt0m6SdINki6VNLvr2IOSrq9/lk5l8RERMb4txztB0gzgDOBgYA2wXNJS2zd1nXYdsMD2/ZL+FvgwcHR97AHbe09x3RER0VCTO/r9gVW2V9teDywBjuw+wfZltu+vN68CdpraMiMiYrKaBP0s4Lau7TX1vo05AfhG1/Y2koYlXSXppZOoMSIiNsG4TTcTIek4YAHwgq7ds22vlbQr8C1JN9q+ZcR1Jw
InAuyyyy5TWVJExGavyR39WmDnru2d6n2PIukg4F3AEbZ/29lve23952rg28A+I6+1vdj2AtsLhoaGJvQfEBERY2sS9MuBeZLmStoKOAZ41OgZSfsAZ1GF/B1d+3eQtHX9eiZwINDdiRsREdNs3KYb2xsknQRcDMwAzrG9QtKpwLDtpcBHgMcC50sC+KntI4CnAWdJeojql8rpI0brRETENGvURm97GbBsxL6Tu14ftJHrvgc8c1MKjIiITZMnYyMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMI1CnpJCyWtlLRK0qJRjr9N0k2SbpB0qaTZXceOl3Rz/XP8VBYfERHjGzfoJc0AzgAOBeYDx0qaP+K064AFtvcELgA+XF/7BOAU4ABgf+AUSTtMXfkRETGeJnf0+wOrbK+2vR5YAhzZfYLty2zfX29eBexUvz4EuMT2XbbvBi4BFk5N6RER0USToJ8F3Na1vabetzEnAN+YyLWSTpQ0LGl43bp1DUqKiIimprQzVtJxwALgIxO5zvZi2wtsLxgaGprKkiIiNntNgn4tsHPX9k71vkeRdBDwLuAI27+dyLURETF9tmxwznJgnqS5VCF9DPCq7hMk7QOcBSy0fUfXoYuBD3R1wL4YeOcmVx0R0bI5iy6a1ve/9fTDpuy9xg162xsknUQV2jOAc2yvkHQqMGx7KVVTzWOB8yUB/NT2EbbvkvSPVL8sAE61fdeUVR8REeNqckeP7WXAshH7Tu56fdAY154DnDPZAiMiYtPkydiIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXII+IqJwCfqIiMIl6CMiCpegj4goXKOgl7RQ0kpJqyQtGuX48yVdK2mDpKNGHHtQ0vX1z9KpKjwiIprZcrwTJM0AzgAOBtYAyyUttX1T12k/Bf4KePsob/GA7b2noNaIiJiEcYMe2B9YZXs1gKQlwJHAw0Fv+9b62EPTUGNERGyCJk03s4DburbX1Pua2kbSsKSrJL10tBMknVifM7xu3boJvHVERIynF52xs20vAF4F/JOk3UaeYHux7QW2FwwNDfWgpIiIzUeTppu1wM5d2zvV+xqxvbb+c7WkbwP7ALdMoMYxzVl00VS91ahuPf2waX3/iE2Rz3800eSOfjkwT9JcSVsBxwCNRs9I2kHS1vXrmcCBdLXtR0TE9Bs36G1vAE4CLgZ+BJxne4WkUyUdASDpWZLWAK8AzpK0or78acCwpB8AlwGnjxitExER06xJ0w22lwHLRuw7uev1cqomnZHXfQ945ibWGBERmyBPxkZEFC5BHxFRuAR9REThEvQREYVL0EdEFC5BHxFRuAR9REThEvQREYVr9MBUTJ/MVRIR0y139BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QULkEfEVG4BH1EROES9BERhUvQR0QUrlHQS1ooaaWkVZIWjXL8+ZKulbRB0lEjjh0v6eb65/ipKjwiIpoZN+glzQDOAA4F5gPHSpo/4rSfAn8FfGXEtU8ATgEOAPYHTpG0w6aXHRERTTW5o98fWGV7te31wBLgyO4TbN9q+wbgoRHXHgJcYvsu23cDlwALp6DuiIhoqEnQzwJu69peU+9rotG1kk6UNCxpeN26dQ3fOiIimuiLzljbi20vsL1gaGio7XIiIorSJOjXAjt3be9U72tiU66NiIgp0CTolwPzJM2VtBVwDLC04ftfDLxY0g51J+yL630REdEj4w
[base64-encoded PNG image data omitted]\n", + 
"text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T-aiM7OfJtaM" + }, + "source": [ + "In the next lesson, we will see why it is often more useful to plot the relative frequencies (i.e., probabilities) rather than the frequencies (i.e., counts)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O1R6vBYwJtaN" + }, + "source": [ + "## Transforming Categorical Variables\n", + "\n", + "A categorical variable can be transformed by mapping its levels to new levels. For example, we may only be interested in whether a person on the Titanic was a passenger or a crew member. The variable `class` is too detailed. We can create a new variable, `type`, that is derived from the existing variable `class`. Observations with a `class` of \"1st\", \"2nd\", or \"3rd\" get a value of \"passenger\", while observations with a `class` of \"victualling crew\", \"engineering crew\", or \"deck crew\" get a value of \"crew\"." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gsA4gHKhJtaN", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "outputId": "20317968-2946-4ea2-f708-3d37fc2147a9" + }, + "source": [ + "df_titanic[\"type\"] = df_titanic[\"class\"].map({\n", + "    \"1st\": \"passenger\",\n", + "    \"2nd\": \"passenger\",\n", + "    \"3rd\": \"passenger\",\n", + "    \"victualling crew\": \"crew\",\n", + "    \"engineering crew\": \"crew\",\n", + "    \"deck crew\": \"crew\"\n", + "})\n", + "\n", + "df_titanic" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "\n", + "
| | name | gender | age | class | embarked | country | ticketno | fare | survived | type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbing, Mr. Anthony | male | 42.0 | 3rd | S | United States | 5547.0 | 7.11 | 0 | passenger |
| 1 | Abbott, Mr. Eugene Joseph | male | 13.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 | passenger |
| 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 | passenger |
| 3 | Abbott, Mrs. Rhoda Mary 'Rosa' | female | 39.0 | 3rd | S | England | 2673.0 | 20.05 | 1 | passenger |
| 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 3rd | S | Norway | 348125.0 | 7.13 | 1 | passenger |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2202 | Wynn, Mr. Walter | male | 41.0 | deck crew | B | England | NaN | NaN | 1 | crew |
| 2203 | Yearsley, Mr. Harry | male | 40.0 | victualling crew | S | England | NaN | NaN | 1 | crew |
| 2204 | Young, Mr. Francis James | male | 32.0 | engineering crew | S | England | NaN | NaN | 0 | crew |
| 2205 | Zanetti, Sig. Minio | male | 20.0 | restaurant staff | S | England | NaN | NaN | 0 | NaN |
| 2206 | Zarracchi, Sig. L. | male | 26.0 | restaurant staff | S | England | NaN | NaN | 0 | NaN |

2207 rows × 10 columns
\n", + " " + ], + "text/plain": [ + " name gender age ... fare survived type\n", + "0 Abbing, Mr. Anthony male 42.0 ... 7.11 0 passenger\n", + "1 Abbott, Mr. Eugene Joseph male 13.0 ... 20.05 0 passenger\n", + "2 Abbott, Mr. Rossmore Edward male 16.0 ... 20.05 0 passenger\n", + "3 Abbott, Mrs. Rhoda Mary 'Rosa' female 39.0 ... 20.05 1 passenger\n", + "4 Abelseth, Miss. Karen Marie female 16.0 ... 7.13 1 passenger\n", + "... ... ... ... ... ... ... ...\n", + "2202 Wynn, Mr. Walter male 41.0 ... NaN 1 crew\n", + "2203 Yearsley, Mr. Harry male 40.0 ... NaN 1 crew\n", + "2204 Young, Mr. Francis James male 32.0 ... NaN 0 crew\n", + "2205 Zanetti, Sig. Minio male 20.0 ... NaN 0 NaN\n", + "2206 Zarracchi, Sig. L. male 26.0 ... NaN 0 NaN\n", + "\n", + "[2207 rows x 10 columns]" + ] + }, + "metadata": {}, + "execution_count": 236 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u082T2xbJtaP" + }, + "source": [ + "Upon closer inspection of this `DataFrame`, we see that we accidentally left out the level \"restaurant staff\" in the input to `.map()`. Any level that is not specified in the mapping is mapped to the missing value `NaN`.\n", + "\n", + "This suggests a more concise way to define the new variable `type`. We can specify only the levels for passengers in the mapping and then fill in the missing values afterwards." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fL2fQ8O8JtaQ", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "outputId": "6650f480-1e00-42d4-8652-4fc986d96a37" + }, + "source": [ + "df_titanic[\"type\"] = df_titanic[\"class\"].map({\n", + "    \"1st\": \"passenger\",\n", + "    \"2nd\": \"passenger\",\n", + "    \"3rd\": \"passenger\"\n", + "})\n", + "\n", + "# Replace all missing values with \"crew\". Assigning the result back avoids\n", + "# a chained in-place update, which newer pandas versions warn about.\n", + "df_titanic[\"type\"] = df_titanic[\"type\"].fillna(\"crew\")\n", + "\n", + "df_titanic" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "\n", + "
| | name | gender | age | class | embarked | country | ticketno | fare | survived | type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbing, Mr. Anthony | male | 42.0 | 3rd | S | United States | 5547.0 | 7.11 | 0 | passenger |
| 1 | Abbott, Mr. Eugene Joseph | male | 13.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 | passenger |
| 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 | passenger |
| 3 | Abbott, Mrs. Rhoda Mary 'Rosa' | female | 39.0 | 3rd | S | England | 2673.0 | 20.05 | 1 | passenger |
| 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 3rd | S | Norway | 348125.0 | 7.13 | 1 | passenger |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2202 | Wynn, Mr. Walter | male | 41.0 | deck crew | B | England | NaN | NaN | 1 | crew |
| 2203 | Yearsley, Mr. Harry | male | 40.0 | victualling crew | S | England | NaN | NaN | 1 | crew |
| 2204 | Young, Mr. Francis James | male | 32.0 | engineering crew | S | England | NaN | NaN | 0 | crew |
| 2205 | Zanetti, Sig. Minio | male | 20.0 | restaurant staff | S | England | NaN | NaN | 0 | crew |
| 2206 | Zarracchi, Sig. L. | male | 26.0 | restaurant staff | S | England | NaN | NaN | 0 | crew |

2207 rows × 10 columns
\n", + " " + ], + "text/plain": [ + " name gender age ... fare survived type\n", + "0 Abbing, Mr. Anthony male 42.0 ... 7.11 0 passenger\n", + "1 Abbott, Mr. Eugene Joseph male 13.0 ... 20.05 0 passenger\n", + "2 Abbott, Mr. Rossmore Edward male 16.0 ... 20.05 0 passenger\n", + "3 Abbott, Mrs. Rhoda Mary 'Rosa' female 39.0 ... 20.05 1 passenger\n", + "4 Abelseth, Miss. Karen Marie female 16.0 ... 7.13 1 passenger\n", + "... ... ... ... ... ... ... ...\n", + "2202 Wynn, Mr. Walter male 41.0 ... NaN 1 crew\n", + "2203 Yearsley, Mr. Harry male 40.0 ... NaN 1 crew\n", + "2204 Young, Mr. Francis James male 32.0 ... NaN 0 crew\n", + "2205 Zanetti, Sig. Minio male 20.0 ... NaN 0 crew\n", + "2206 Zarracchi, Sig. L. male 26.0 ... NaN 0 crew\n", + "\n", + "[2207 rows x 10 columns]" + ] + }, + "metadata": {}, + "execution_count": 237 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "scGxkqKfXVk-" + }, + "source": [ + "For more complex mappings, the `.map()` method also accepts a function. So the mapping above can also be expressed as a function that is applied to each value of `class`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "q2LwDPWMXsXH", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f2477ce4-7a5c-4066-f394-6da72ba58751" + }, + "source": [ + "def class_to_type(c):\n", + "    if c in [\"1st\", \"2nd\", \"3rd\"]:\n", + "        return \"passenger\"\n", + "    else:\n", + "        return \"crew\"\n", + "\n", + "df_titanic[\"class\"].map(class_to_type)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 passenger\n", + "1 passenger\n", + "2 passenger\n", + "3 passenger\n", + "4 passenger\n", + " ... 
\n", + "2202 crew\n", + "2203 crew\n", + "2204 crew\n", + "2205 crew\n", + "2206 crew\n", + "Name: class, Length: 2207, dtype: object" + ] + }, + "metadata": {}, + "execution_count": 238 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oNImjvbGJtaS" + }, + "source": [ + "We can apply the techniques we learned above to calculate the _distribution_ of this new variable, which only has two levels." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TEGiAcFAJtaT", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c4c4291e-2a50-40ae-a3c4-017080c834fc" + }, + "source": [ + "df_titanic[\"type\"].value_counts(normalize=True)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "passenger 0.596738\n", + "crew 0.403262\n", + "Name: type, dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 239 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yF-G7EIXJtaV" + }, + "source": [ + "## Conditional Probabilities\n", + "\n", + "What fraction of males were crew members? To answer questions like this, we have to filter the `DataFrame` to include only males.\n", + "\n", + "The standard way to filter a `DataFrame` is to use a **boolean mask**. A boolean mask is simply a `Series` of booleans whose index matches the index of the `DataFrame`.\n", + "\n", + "The easiest way to create a boolean mask is to use one of the standard comparison operators `==`, `<`, `>`, and `!=` on an existing column in the `DataFrame`. For example, the following code produces a boolean mask that is equal to `True` for the males on the Titanic and `False` otherwise." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LaR02lRMJtaW", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4740738d-54e3-4819-c227-7f6b67568b0a" + }, + "source": [ + "df_titanic[\"gender\"] == \"male\"" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 True\n", + "1 True\n", + "2 True\n", + "3 False\n", + "4 False\n", + " ... \n", + "2202 True\n", + "2203 True\n", + "2204 True\n", + "2205 True\n", + "2206 True\n", + "Name: gender, Length: 2207, dtype: bool" + ] + }, + "metadata": {}, + "execution_count": 240 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fPlvq_-mJtad" + }, + "source": [ + "Notice the subtle way the equality operator `==` is being used here! We are comparing an array with a string, i.e.,\n", + "\n", + "\\begin{align}\n", + "& \\begin{bmatrix} \\text{\"male\"} \\\\ \\text{\"male\"} \\\\ \\text{\"male\"} \\\\ \\text{\"female\"} \\\\ \\vdots \\\\ \\text{\"male\"} \\end{bmatrix} & \\text{with} & & \\text{\"male\"}.\n", + "\\end{align}\n", + "\n", + "In most programming languages, this comparison would make no sense. An array of strings is obviously not equal to a string. However, `pandas` automatically applies the equality operator to _each_ element of the array. 
As a result, we get an entire array (i.e., a `Series`) of booleans.\n", + "\n", + "\\begin{align}\n", + " \\begin{bmatrix} \\text{\"male\"} \\\\ \\text{\"male\"} \\\\ \\text{\"male\"} \\\\ \\text{\"female\"} \\\\ \\vdots \\\\ \\text{\"male\"} \\end{bmatrix} &== \\text{\"male\"} &\\Longrightarrow & & \\begin{bmatrix} \\text{True} \\\\ \\text{True} \\\\ \\text{True} \\\\ \\text{False} \\\\ \\vdots \\\\ \\text{True} \\end{bmatrix}.\n", + "\\end{align}\n", + "\n", + "When an operation is applied to each element of an array, it is said to be **broadcast** over that array.\n", + "\n", + "Now, we can use the boolean mask as a filter on the `DataFrame` to extract the rows where the mask equals `True`." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1O6x3ZoZJtaf", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "outputId": "1fbda666-db9f-4384-dfac-58c74bf312da" + }, + "source": [ + "df_male = df_titanic[df_titanic[\"gender\"] == \"male\"]\n", + "df_male" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "\n", + "
| | name | gender | age | class | embarked | country | ticketno | fare | survived | type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbing, Mr. Anthony | male | 42.0 | 3rd | S | United States | 5547.0 | 7.11 | 0 | passenger |
| 1 | Abbott, Mr. Eugene Joseph | male | 13.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 | passenger |
| 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 | passenger |
| 5 | Abelseth, Mr. Olaus Jørgensen | male | 25.0 | 3rd | S | United States | 348122.0 | 7.13 | 1 | passenger |
| 6 | Abelson, Mr. Samuel | male | 30.0 | 2nd | C | France | 3381.0 | 24.00 | 0 | passenger |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2202 | Wynn, Mr. Walter | male | 41.0 | deck crew | B | England | NaN | NaN | 1 | crew |
| 2203 | Yearsley, Mr. Harry | male | 40.0 | victualling crew | S | England | NaN | NaN | 1 | crew |
| 2204 | Young, Mr. Francis James | male | 32.0 | engineering crew | S | England | NaN | NaN | 0 | crew |
| 2205 | Zanetti, Sig. Minio | male | 20.0 | restaurant staff | S | England | NaN | NaN | 0 | crew |
| 2206 | Zarracchi, Sig. L. | male | 26.0 | restaurant staff | S | England | NaN | NaN | 0 | crew |

1718 rows × 10 columns
\n", + " " + ], + "text/plain": [ + " name gender age ... fare survived type\n", + "0 Abbing, Mr. Anthony male 42.0 ... 7.11 0 passenger\n", + "1 Abbott, Mr. Eugene Joseph male 13.0 ... 20.05 0 passenger\n", + "2 Abbott, Mr. Rossmore Edward male 16.0 ... 20.05 0 passenger\n", + "5 Abelseth, Mr. Olaus Jørgensen male 25.0 ... 7.13 1 passenger\n", + "6 Abelson, Mr. Samuel male 30.0 ... 24.00 0 passenger\n", + "... ... ... ... ... ... ... ...\n", + "2202 Wynn, Mr. Walter male 41.0 ... NaN 1 crew\n", + "2203 Yearsley, Mr. Harry male 40.0 ... NaN 1 crew\n", + "2204 Young, Mr. Francis James male 32.0 ... NaN 0 crew\n", + "2205 Zanetti, Sig. Minio male 20.0 ... NaN 0 crew\n", + "2206 Zarracchi, Sig. L. male 26.0 ... NaN 0 crew\n", + "\n", + "[1718 rows x 10 columns]" + ] + }, + "metadata": {}, + "execution_count": 241 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1NXzBSg6Jtai" + }, + "source": [ + "Note that every person in this new `DataFrame` is male. If you inspect the index, you will see that rows 3 and 4 are missing. That is because those passengers were female."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rj6sUAzlJtaj" + }, + "source": [ + "Finally, we can apply the methods we've learned in this lesson to `df_male` to answer the original question: \"What fraction of males were crew members?\"" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PL5AZdDCJtak", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "44ae33d6-be08-4979-9f82-2433f43dc89a" + }, + "source": [ + "df_male[\"type\"].value_counts(normalize=True)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "crew 0.504657\n", + "passenger 0.495343\n", + "Name: type, dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 242 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h8JMRXKrJtam" + }, + "source": [ + "It appears that about 50.5% of the males were crew members. We can notate this using _conditional probability_ notation:\n", + "\n", + "$$ P(\\text{crew} | \\text{male}) = 0.504657. $$\n", + "\n", + "The bar $|$ is read as \"given\". The information after the bar is the given information. In this case, we were interested in the probability a person was a crew member, \"given\" they were male. That is, after restricting to the males on board (i.e., using a boolean mask as a filter), we want to know the relative frequency of crew members." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZGI7SaceJtam" + }, + "source": [ + "We can also filter on multiple criteria. For example, if we want to know the fraction of male _survivors_ who were crew members, we need to combine two boolean masks, one based on the column `gender` and another based on the column `survived`. The two masks can be combined using the logical operator `&`."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5-OhXQDXJtan", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "00c9ec24-5076-4e28-f2de-91fc5e48d0ba" + }, + "source": [ + "(df_titanic[\"gender\"] == \"male\") & (df_titanic[\"survived\"] == 1)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 False\n", + "3 False\n", + "4 False\n", + " ... \n", + "2202 True\n", + "2203 True\n", + "2204 False\n", + "2205 False\n", + "2206 False\n", + "Length: 2207, dtype: bool" + ] + }, + "metadata": {}, + "execution_count": 243 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ygJIJbtOJtat" + }, + "source": [ + "Notice that the logical operator was _broadcast_ (that word again!) over the elements of the two `Series`. In other words, the logical operator was applied to each element, producing a `Series` of booleans.\n", + "\n", + "Now we can use this new boolean mask to filter the `DataFrame`, just as we did before." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KdfY3t-wJtau", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "outputId": "4f562db4-d0b3-4142-cfc0-3c890416fa0b" + }, + "source": [ + "df_male_survivors = df_titanic[(df_titanic[\"gender\"] == \"male\") & (df_titanic[\"survived\"] == 1)]\n", + "df_male_survivors" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "\n", + "
| | name | gender | age | class | embarked | country | ticketno | fare | survived | type |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Abelseth, Mr. Olaus Jørgensen | male | 25.000000 | 3rd | S | United States | 348122.0 | 7.1300 | 1 | passenger |
| 8 | Abī-Al-Munà, Mr. Nāsīf Qāsim | male | 27.000000 | 3rd | C | Lebanon | 2699.0 | 18.1509 | 1 | passenger |
| 9 | Abrahamsson, Mr. Abraham August Johannes | male | 20.000000 | 3rd | S | Finland | 3101284.0 | 7.1806 | 1 | passenger |
| 13 | Aks, Master. Frank Philip | male | 0.833333 | 3rd | S | England | 392091.0 | 9.0700 | 1 | passenger |
| 22 | Allison, Master. Hudson Trevor | male | 0.916667 | 1st | S | Canada | 113781.0 | 151.1600 | 1 | passenger |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2185 | Windebank, Mr. Alfred Edgar | male | 38.000000 | victualling crew | S | England | NaN | NaN | 1 | crew |
| 2188 | Witter, Mr. James William Cheetham | male | 31.000000 | victualling crew | S | England | NaN | NaN | 1 | crew |
| 2200 | Wright, Mr. William | male | 40.000000 | victualling crew | S | England | NaN | NaN | 1 | crew |
| 2202 | Wynn, Mr. Walter | male | 41.000000 | deck crew | B | England | NaN | NaN | 1 | crew |
| 2203 | Yearsley, Mr. Harry | male | 40.000000 | victualling crew | S | England | NaN | NaN | 1 | crew |

352 rows × 10 columns
\n", + " " + ], + "text/plain": [ + " name gender ... survived type\n", + "5 Abelseth, Mr. Olaus Jørgensen male ... 1 passenger\n", + "8 Abī-Al-Munà, Mr. Nāsīf Qāsim male ... 1 passenger\n", + "9 Abrahamsson, Mr. Abraham August Johannes male ... 1 passenger\n", + "13 Aks, Master. Frank Philip male ... 1 passenger\n", + "22 Allison, Master. Hudson Trevor male ... 1 passenger\n", + "... ... ... ... ... ...\n", + "2185 Windebank, Mr. Alfred Edgar male ... 1 crew\n", + "2188 Witter, Mr. James William Cheetham male ... 1 crew\n", + "2200 Wright, Mr. William male ... 1 crew\n", + "2202 Wynn, Mr. Walter male ... 1 crew\n", + "2203 Yearsley, Mr. Harry male ... 1 crew\n", + "\n", + "[352 rows x 10 columns]" + ] + }, + "metadata": {}, + "execution_count": 244 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xabQIG56Jtaw" + }, + "source": [ + "Besides `&`, there are two other logical operators that can be used to modify and combine boolean masks.\n", + "\n", + "- `&` means \"and\"\n", + "- `|` means \"or\"\n", + "- `~` means \"not\"\n", + "\n", + "Like `&`, the operators `|` and `~` are broadcast over the boolean masks." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tbOe6iaWJtaw" + }, + "source": [ + "## Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "joCIoBDTJtax" + }, + "source": [ + "_Exercises 1-2 ask you to continue working with the Titanic data set explored in this lesson._" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1GsCobcZJtax" + }, + "source": [ + "1\\. What proportion of crew members were male? Express your answer using probability notation. How does this proportion differ from the proportion that was calculated in the lesson?" 
+ ] + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "data_dir = \"http://dlsun.github.io/pods/data/\"\n", + "df_titanic = pd.read_csv(data_dir + \"titanic.csv\")\n", + "\n", + "# Map each level of `class` to either \"passenger\" or \"crew\"\n", + "def class_to_type(c):\n", + "    if c in [\"1st\", \"2nd\", \"3rd\"]:\n", + "        return \"passenger\"\n", + "    else:\n", + "        return \"crew\"\n", + "df_titanic[\"class\"] = df_titanic[\"class\"].map(class_to_type)\n", + "\n", + "# Keep only the crew, then compute the relative frequency of each gender\n", + "df_crew = df_titanic[(df_titanic[\"class\"] == \"crew\")]\n", + "df_crew.value_counts(\"gender\", normalize=True)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "gBn7zNMvu9OD", + "outputId": "d2a4a8d9-11e0-4e99-8d58-bb08bad67080" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "gender\n", + "male 0.974157\n", + "female 0.025843\n", + "dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 245 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "The probability of being male given being crew is P(male|crew) = 0.974.\n", + "\n", + "This differs from P(crew|male) = 0.505, the probability of being crew given that someone was male.\n", + "\n", + "In other words, about half of the males on the ship were crew, while the crew was overwhelmingly male." + ], + "metadata": { + "id": "u_IsbcEMxeys" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YMb6xZ8-Jtay" + }, + "source": [ + "2\\. What is the distribution of gender among passengers on the Titanic? What is the distribution of gender among passengers on the Titanic who survived? Express your answers using probability notation."
+ ] + }, + { + "cell_type": "code", + "source": [ + "print(df_titanic.value_counts(\"gender\", normalize=True), \"\\n\\n\", \"survivor\", df_titanic[(df_titanic[\"survived\"] == 1)].value_counts(\"gender\", normalize=True))\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-kHQkjR8ykoQ", + "outputId": "64c27b39-0d30-4bd2-c74f-b27eea1c6204" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "gender\n", + "male 0.778432\n", + "female 0.221568\n", + "dtype: float64 \n", + "\n", + " survivor gender\n", + "female 0.504923\n", + "male 0.495077\n", + "dtype: float64\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "***gender (all rows of `df_titanic`):***\n", + "\n", + "P(male) = 0.778\n", + "\n", + "P(female) = 0.222\n", + "\n", + "***gender among survivors:***\n", + "\n", + "P(female|survived) = 0.505\n", + "\n", + "P(male|survived) = 0.495\n", + "\n", + "Note: the code above does not filter on passenger status, so both distributions include crew as well as passengers.\n" + ], + "metadata": { + "id": "tVQ8UAbCUkDC" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QZ16JR0GJtaz" + }, + "source": [ + "_Exercises 3-5 deal with the OKCupid data set, which consists of user profiles in the San Francisco Bay Area on the dating website OKCupid. This data set is available at the URL https://dlsun.github.io/pods/data/okcupid.csv._" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jSNtXGq1Jta0" + }, + "source": [ + "3\\. Make a visualization of the distribution of drinking status." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "data_dir = \"http://dlsun.github.io/pods/data/\"\n", + "df_okc = pd.read_csv(data_dir + \"okcupid.csv\")\n", + "props = df_okc.value_counts(\"drinks\", normalize=True)\n", + "props.plot.bar()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 345 + }, + "id": "2yZ50wd1WtcV", + "outputId": "c258a7a9-64f4-4473-e4a4-9256ed8f79af" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 247 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAE3CAYAAAC6r7qRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAbYklEQVR4nO3dfZxdVX3v8c+3SYMgIF4ZqYVgIg1wEVFhiBbFp0INcgm2+BCu9YoVU1pTaVOtodhgg9dqEbRVXmpQUFs1PPjQ8RJv8CIUtSoZHjXBaIwoiVoCUqGghOD3/rH3hMMwmXNC9pk9s/i+X6/zyuy918z5nVdmvmeftddaW7aJiIip7zfaLiAiIpqRQI+IKEQCPSKiEAn0iIhCJNAjIgoxva0n3nvvvT1r1qy2nj4iYkq67rrr7rA9MNax1gJ91qxZDA8Pt/X0ERFTkqQfbe9YulwiIgqRQI+IKEQCPSKiEAn0iIhCJNAjIgqRQI+IKEQCPSKiEAn0iIhCJNAjIgrR2kzRR2vWkssn9PlufffxE/p8ERGPVs7QIyIKkUCPiChET4EuaZ6kdZLWS1oyxvH3SbqxfnxP0n82X2pERIynax+6pGnA+cCxwEZgtaQh22tH2tj+y472fw48uw+1RkTEOHo5Q58LrLe9wfYWYAVw4jjtTwY+00RxERHRu14CfV/gto7tjfW+R5D0VGA28JXtHF8oaVjS8ObNm3e01oiIGEfTF0UXAJfZfnCsg7aX2x60PTgwMOYNNyIi4lHqJdA3ATM7tver941lAeluiYhoRS+BvhqYI2m2pBlUoT00upGkg4EnAt9otsSIiOhF10C3vRVYBKwCbgEusb1G0jJJ8zuaLgBW2HZ/So2IiPH0NPXf9kpg5ah9S0dtv6O5siIiYkdlpmhERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUoqdAlzRP0jpJ6yUt2U6bV0laK2mNpE83W2ZERHTT9SbRkqYB5wPHAhuB1ZKGbK/taDMHOAN4nu27JD25XwVHRMTYejlDnwust73B9hZgBXDiqDZvBM63fReA7dubLTMiIrrpJdD3BW7r2N5Y7+t0IHCgpK9L+qakeWP9IEkLJQ1LGt68efOjqzgiIsbU1EXR6cAc4EXAycAFkvYa3cj2ctuDtgcHBgYaeuqIiIDeAn0TMLNje796X6e
NwJDtB2z/EPgeVcBHRMQE6SXQVwNzJM2WNANYAAyNavMFqrNzJO1N1QWzocE6IyKii66BbnsrsAhYBdwCXGJ7jaRlkubXzVYBd0paC1wFvNX2nf0qOiIiHqnrsEUA2yuBlaP2Le342sDi+hERES3ITNGIiEIk0CMiCpFAj4goRAI9IqIQCfSIiEIk0CMiCpFAj4goRAI9IqIQCfSIiEIk0CMiCpFAj4goRAI9IqIQCfSIiEIk0CMiCpFAj4goRAI9IqIQCfSIiEIk0CMiCpFAj4goRAI9IqIQPQW6pHmS1klaL2nJGMdPkbRZ0o3149TmS42IiPFM79ZA0jTgfOBYYCOwWtKQ7bWjml5se1EfaoyIiB70coY+F1hve4PtLcAK4MT+lhURETuql0DfF7itY3tjvW+0kyTdLOkySTPH+kGSFkoaljS8efPmR1FuRERsT1MXRb8IzLJ9GPBl4BNjNbK93Pag7cGBgYGGnjoiIqC3QN8EdJ5x71fv28b2nbbvrzc/ChzRTHkREdGrXgJ9NTBH0mxJM4AFwFBnA0lP6dicD9zSXIkREdGLrqNcbG+VtAhYBUwDLrS9RtIyYNj2EPBmSfOBrcDPgVP6WHNERIyha6AD2F4JrBy1b2nH12cAZzRbWkRE7IjMFI2IKEQCPSKiEAn0iIhCJNAjIgqRQI+IKEQCPSKiEAn0iIhCJNAjIgqRQI+IKEQCPSKiEAn0iIhCJNAjIgqRQI+IKEQCPSKiEAn0iIhCJNAjIgqRQI+IKEQCPSKiEAn0iIhCJNAjIgrRU6BLmidpnaT1kpaM0+4kSZY02FyJERHRi66BLmkacD5wHHAIcLKkQ8ZotwdwOvCtpouMiIjuejlDnwust73B9hZgBXDiGO3OBt4D/KrB+iIioke9BPq+wG0d2xvrfdtIOhyYafvy8X6QpIWShiUNb968eYeLjYiI7dvpi6KSfgM4D/irbm1tL7c9aHtwYGBgZ586IiI69BLom4CZHdv71ftG7AEcClwt6VbgucBQLoxGREysXgJ9NTBH0mxJM4AFwNDIQdu/sL237Vm2ZwHfBObbHu5LxRERMaaugW57K7AIWAXcAlxie42kZZLm97vAiIjozfReGtleCawctW/pdtq+aOfLioiIHZWZohERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCF6CnRJ8yStk7Re0pIxjp8m6duSbpT0NUmHNF9qRESMp2ugS5oGnA8cBxwCnDxGYH/a9jNsPwv4B+C8xiuNiIhx9XKGPhdYb3uD7S3ACuDEzga27+7YfDzg5kqMiIheTO+hzb7AbR3bG4HnjG4k6U3AYmAG8JKxfpCkhcBCgP33339Ha42IiHE0dlHU9vm2DwDeBrx9O22W2x60PTgwMNDUU0dEBL0F+iZgZsf2fvW+7VkBvHxnioqIiB3XS6CvBuZImi1pBrAAGOpsIGlOx+bxwPebKzEiInrRtQ/d9lZJi4BVwDTgQttrJC0Dhm0PAYskHQM8ANwFvK6fRUdExCP1clEU2yuBlaP2Le34+vSG64qIiB2UmaIREYVIoEdEFCKBHhFRiAR6REQhEugREYVIoEdEFCKBHhFRiAR6REQhEugREYVIoEdEFCKBHhFRiAR6REQhEugREYVIoEdEFCKBHhFRiAR6REQhEugREYVIoEdEFCKBHhFRiJ4CXdI8SeskrZe0ZIzjiyWtlXSzpCslPbX5UiMiYjxdA13SNOB84DjgEOBkSYeManYDMGj7MOAy4B+aLjQiIsbXyxn6XGC97Q22twArgBM7G9i+yvZ99eY3gf2aLTMiIrrpJdD3BW7r2N5Y79ueNwBfGuuApIWShiUNb968ufcqIyKiq0Yvikr6I2AQOGes47aX2x60PTgwMNDkU0d
EPOZN76HNJmBmx/Z+9b6HkXQMcCbwQtv3N1NeRET0qpcz9NXAHEmzJc0AFgBDnQ0kPRv4CDDf9u3NlxkREd10DXTbW4FFwCrgFuAS22skLZM0v252DrA7cKmkGyUNbefHRUREn/TS5YLtlcDKUfuWdnx9TMN1RUTEDspM0YiIQiTQIyIKkUCPiChEAj0iohAJ9IiIQiTQIyIKkUCPiChEAj0iohAJ9IiIQiTQIyIKkUCPiChEAj0iohAJ9IiIQiTQIyIKkUCPiChEAj0iohAJ9IiIQiTQIyIKkUCPiChEAj0iohA9BbqkeZLWSVovackYx18g6XpJWyW9ovkyIyKim+ndGkiaBpwPHAtsBFZLGrK9tqPZj4FTgLf0o8jHkllLLp/Q57v13cdP6PNFRP90DXRgLrDe9gYASSuAE4FtgW771vrYr/tQY0RE9KCXQN8XuK1jeyPwnEfzZJIWAgsB9t9//0fzI2KKyyeQiP6Z0IuitpfbHrQ9ODAwMJFPHRFRvF4CfRMws2N7v3pfRERMIr0E+mpgjqTZkmYAC4Ch/pYVERE7qmug294KLAJWAbcAl9heI2mZpPkAko6UtBF4JfARSWv6WXRERDxSLxdFsb0SWDlq39KOr1dTdcVERERLMlM0IqIQCfSIiEIk0CMiCpFAj4goRAI9IqIQCfSIiEL0NGwxIrrLOjXRtpyhR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhUigR0QUoqfVFiXNA/4RmAZ81Pa7Rx3fBfgkcARwJ/Bq27c2W2pEtCmrSU5+XQNd0jTgfOBYYCOwWtKQ7bUdzd4A3GX7dyQtAN4DvLofBUdE9EMJb1i9dLnMBdbb3mB7C7ACOHFUmxOBT9RfXwb8niQ1V2ZERHQj2+M3kF4BzLN9ar39WuA5thd1tPlO3WZjvf2Dus0do37WQmBhvXkQsK6pF9KDvYE7uraauvL6pq6SXxvk9TXtqbYHxjowoXcssr0cWD6RzzlC0rDtwTaeeyLk9U1dJb82yOubSL10uWwCZnZs71fvG7ONpOnAE6gujkZExATpJdBXA3MkzZY0A1gADI1qMwS8rv76FcBX3K0vJyIiGtW1y8X2VkmLgFVUwxYvtL1G0jJg2PYQ8DHgnyWtB35OFfqTTStdPRMor2/qKvm1QV7fhOl6UTQiIqaGzBSNiChEAj0iohAJ9IiIQkzoOPSJJmma7QfbriMCQNLi8Y7bPm+iaum3esmQfejIGNs/bq+ix4bSz9C/L+kcSYe0XUg/SDpX0tPbriN6tkeXRxEk/TnwH8CXgcvrx/9ptagGSfqcpOMlTbr8LHqUi6Q9qIZQvp7qzetCYIXtu1strCGSTqV6bdOBi4DP2P5Fu1U1Q9IfUi3y9mRA9cO292y1sOiqHr78HNtFTi6UdAzV391zgUuBi2xP5DIm21V0oHeS9ELg08BeVAuInW17fbtVNUPSQVS/YCcDXwcusH1Vu1XtnDoUTrB9S9u1NEXSP4133PabJ6qWfpJ0FXCs7a1t19JPkp5A9Td3JnAbcAHwL7YfaKum4vvQgeOpwm4WcC7wKeBoYCVwYGvFNaR+jQfXjzuAm4DFkv7E9mSc4NWr/ygpzGvXtV3ABNkAXC3pcuD+kZ2FXSN4EvBHwGuBG6hy5flUM+Zf1FZdRQc68H3gKuAc2//esf8ySS9oqabGSHofcAJwJfAu29fWh94jaVJ8BNwJw5IuBr7Aw0Phc+2VtHNsf6J7qyL8uH7MqB9FkfR5qtVi/5nqU+RP60MXSxpur7LCu1wk7W77v9quo18kvR64xPa9Yxx7wlTuT5d00Ri7bfuPJ7yYhkkaAN4GHAI8bmS/7Ze0VlQfSNrN9n1t19E0SS+erF2aRQa6pA8A231hU72vUtLh4x23ff1E1RI7TtIVwMXAW4DTqD6mb7b9tlYLa4ik36Va32l32/tLeib
wJ7b/rOXSdkp9oX67JsOnx1K7XFr92DMBzh3nmIEpf6Yn6UDgQ8A+tg+VdBgw3/Y7Wy6tCU+y/TFJp9v+N+DfJK1uu6gGvR94KfWqrLZvKqGLk6p7c3sMJND7ofS+StsvbruGCXAB8FbgIwC2b5b0aaCEQB8ZBfFTSccDPwH+W4v1NM72baPuQjnlJ/jZfn3bNXRTZKBL+iLjd7nMn8By+kbSbsBiYH/bCyXNAQ6yXcIkjt1sXzsqFEoZBvfOesjbXwEfAPYE/rLdkhp1m6SjAEv6TeB0oJgRS5L2Ad4F/Lbt4+qJi79r+2Mtl1ZmoAPvbbuACXIR1VC4o+rtTVQTHUoI9DskHUD9xlzf2/an43/L1NDxhvsLoMRPW6cB/wjsS/U7eQUwpfvPR/k41d/emfX296iuiSTQ+6Hul3wsOMD2qyWdDGD7Po06pZ3C3kR144CDJW0Cfgi8pt2SokcH2X7Y/5Wk51FNeivB3rYvkXQGbLsJ0KToUpp0axE0SdIcSZdJWitpw8ij7boatEXSrjx0FnsAHWO2pzjbPgYYAA62/XwK/30tyAd63DdV3VtPLBr5u3su1aet1hV5ht7hIuAs4H1UH21H1nQpxVnA/wVmSvoU8DzglFYras5ngcNHjbG/DDiipXoaI2m27R922zfV1MMVjwIGRq0suSfV7StLsZhqBM8Bkr5OddLxynZLqpQe6LvavlKSbP8IeIek64ClbRe2s+qV3p4I/CHVIkECTrd9R6uF7SRJBwNPB54watzvnnRMwpniPguMnktQwpvVbwK7U+VK5+qRd1PdPL4Ua4AXUs0WFbCOSXKiWHqg318H3/frG11vovqFm/Js/1rSX9u+hGp50lIcBPwPqkXUOsf93gO8sZWKGvIYeLM6y/bvSXq67b9ru5g++obtw6mCHQBJ1/PIN+kJV3qgnw7sBrwZOJtqws3rWq2oWf9P0luorrBv65qw/fP2Stppr7D9Wkl/Y/tdbRfTsGLfrGpPqYcrPkPSs6nOXreZ6jOYJf0W1cidXUe9vj2pcqZ1RU79f6yQNFafq20/bcKLaYiktcAxwJeoVq0bHQpT+c0KqPqabX+j7TqaVg8tfQPVqoOjZ2t7qq9VI+l1VNeoBnn467sH+PhkmPpfZKBLer/tv9jeBKNSJhaVSNKbgT8FnkbVRbbtEFP8zWqEpMdRBd/TefjiXFN+4TEASX9r++y26+gXSSfZ/mzbdYyl1EA/wvZ19U0tHqGkceqSDuWRq/Z9sr2KmiHpQ8CHgZE1QK6xfVOLJTVG0qXAd4H/CSyjGl9/i+3TWy2sQZLm89D/3dWFzF7epl6yYfQb8rL2KqoUGegjJD0e+KXtX9fb04BdSlnSU9JZVN0Sh1DdsOM44Gu2p/yIAkmnA6dSLXgk4OVUd2Ka8uOZJd1g+9mSbrZ9WD09/qu2n9t2bU2Q9PfAXKqbPkB1V5/Vtv+mvaqaI+nDVH3mLwY+SjWC51rbb2i1MMoP9G8Cx4ysiS5pd+AK20eN/51Tg6RvA88EbrD9zHqNiX+xfWzLpe00STdTrY9xb739eKrRBYe1W9nOk3St7bmSrqGaEv8zqkCY8t1JsO3/7lmjTqRuKOH/DqrXV78Rj/y7O/Al20e3XdukGDvZR4/rvMFF/fWkuBrdkF/VfzRbJe0J3A7MbLmmpoiHr9D3IKMukE5hyyU9EXg71QSVtVQ3xC7JXh1fP6G1KvrjV/W/90n6barVM5/SYj3blD5s8V5Jh48Ml5J0BPDLlmtqRL1my82S9qJaavY64L+AUkZPXAR8q77dF1RdLq0vftQE2x+tv7yG6uJvaf4euEHVzaJF1Ze+pN2SGvXF+u/uHOB6qoEXF7RbUqX0LpcjgRVU600L+C3g1baLuFmvpG/bfkb99SxgT9s3t1pUg+o7Mz2/3vyq7RvarCd6J+kpwJH15rW2f9ZmPU2pJyo+d+QexZJ2oeoJmBRruRQd6AD1BaeD6s11th8Yr/1UIuk
TwAdtl3S3m4hJbeSidtt1jKXoQK/D/E/pGD4FfKSUUJf0XeB3gB9RzRQdGatdxMWnUknaxfb93fbF5CTpvVRdm5/zJAvQ0gP9o1QLBo3cku61wIO2T22vquZIeupY++uFyGKSknR9vRbIuPticpJ0D/B4qgv1v+ShE6k9Wy2M8i+KHmn7mR3bX5FUxOQUSHBPNVNhLZAmSDoXuND2mq6NpyDbe3Rv1Y7SA/1BSQfY/gGApKdRwM1qY8p6KdVaIPsB53XsvwcoYtJN7RaqoZnTqUYrfWayXDRsQj3C7DXAbNtnS5oJPMX2tS2XVnyXy0uo7v83cpeiWcDrbV/VVk0Rk3ktkCZJOojqpjInU91+7oIS/vbqZSl+DbzE9n+v5xRcYfvILt/ad6VPLHoScCjV8rlfoTpzKOZMIaasKyWdJ2m4fpwrqajJN/Xs0IPrxx3ATcBiSStaLawZz7H9JuoJRrbvAma0W1Kl9ED/W9t3U/VRvhj4IPChdkuK4GNU3Syvqh93U3VNFEHS+6gWH3sZ8C7bR9h+j+0TgEk53G8HPVC/YY3cU3SA6oy9dcX3odf/Hk/1ce9ySe9ss6AI4ADbJ3Vs/52kG1urpkF1//LPqdZyuXeMJnMnuKR++Cfg88A+kv431eJcb2+3pErpZ+ibJH0EeDWwsp7VVfprjsnvl5JGZsAi6XkUsiRFPS77VdsJc0q4OGr7U8BfA++imoX+ctuXtltVpfQz9FcB84D32v7PejryW1uuKeI04JN1v/nIGe0prVbUrOslHVn4DObdgJFul11brmWboke5RExm9QqZ1Nd5ilH6DGZJS4FXAp/lobX6L7XdenduAj1igtVdfydRDaPd9il5Mtzxpgmlz2CWtA54pu1f1du7AjfaPmj87+y/9CdHTLx/BU4EtlKdwY48ilAH90yqcdo/Au6jrKz5CR23ngN24eH3v21NztAjJpik79g+tO06+qW+NeIgcJDtA+ubQFxq+3ktl9YISV+gWhr4y1R96McC1wIbAWy/ua3aSr8oGjEZ/bukZ9j+dtuF9MkfUI03vx7A9k8kTdr1Tx6Fz9ePEVe3VMcjJNAjJt7zgVMk/RC4n8IuGgJbbFvSyMSbx7ddUJNsj6zeSj3tf+ZkubFMAj1i4h3XdgF9dkk9/2MvSW8E/phJcou2Jki6GphPlZ/XAbdL+rrtxa0WRvrQI6IPJB0L/D7Vp49Vtr/cckmNGbljkaRTqc7Oz5J082T4hJUz9IholKTFwMUlhfgo0+tJiq8Czmy7mE4lDSWKiMlhD+AKSV+VtEjSPm0X1LBlwCrgB7ZX1/dZ+H7LNQHpcomIPpF0GNU6SicBG20f03JJxcsZekT0y+3Az4A7gSe3XEtjJB0o6UpJ36m3D5OU1RYjojyS/qweCXIl1U1m3jgZLhg26ALgDOABgHrI4oJWK6rlomhENG0m8Be2i1jjfQy72b62Wvp9m61tFdMpgR4RjbJ9Rts19Nkdkg7goTsWvQL4abslVXJRNCJiB9SjWpYDRwF3AT8EXjMZVpNMoEdE9KAeX99pV6rrkPcC2D5vwosaJV0uERG9GVlg7CCq1Rb/lWom7GupVltsXc7QIyJ2gKRrgONt31Nv7wFcbvsF7VaWYYsRETtqH2BLx/aWel/r0uUSEbFjPglcK2lkTfSXAx9vr5yHpMslImIHSTocOLrevMb2DW3WMyKBHhFRiPShR0QUIoEeEVGIBHo8Zkh6h6S3jLH/NEn/q8v3niLpg/2rLmLnZZRLPKZJmm77w23XEdGEnKFH0SSdKel7kr5GNcMPSVdLer+kYeD0zjP3+th7JF1bf9/RY/zM4yV9Q9Lekl4p6TuSbqonnES0JmfoUSxJR1CtU/0sqt/166nu0g4ww/Zg3e4do751uu25kl4GnAVsu9OOpD8AFgMvs32XpKXAS21vkrRXX19QRBcJ9CjZ0cDnbd8HIGmo49jF43zf5+p/rwNmdex/CTAI/L7tu+t9Xwc
+LumSju+LaEW6XOKx6t5xjt1f//sgDz/p+QHVAk0HjuywfRrwdqqbOlwn6UkN1xnRswR6lOwa4OWSdq0XUDphJ3/ej6huePxJSU8HkHSA7W/ZXgpspgr2iFakyyWKZft6SRcDN1HdsHh1Az/zu5JeA1wq6QTgHElzqJZRvbJ+rohWZOp/REQh0uUSEVGIBHpERCES6BERhUigR0QUIoEeEVGIBHpERCES6BERhfj/gPJF6jHX0ZMAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YuMsp3GLJta1" + }, + "source": [ + "4\\. Create a new variable that indicates whether the user has, likes, or dislikes cats. Visualize the distribution of this variable." + ] + }, + { + "cell_type": "code", + "source": [ + "def cat_status(c):\n", + " c = str(c)\n", + " if \"has cats\" in c:\n", + " return \"has cats\"\n", + " elif \"dislikes cats\" in c:\n", + " return \"dislikes cats\" \n", + " elif \"likes cats\" in c:\n", + " return \"likes cats\"\n", + " \n", + "df_okc[\"pets\"] = df_okc[\"pets\"].map(cat_status)\n", + "props = df_okc.value_counts(\"pets\", normalize=True)\n", + "props.plot.bar()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 347 + }, + "id": "s7QTNYPrX6O9", + "outputId": "41a8ac48-55da-4bba-9fe9-607c0138d8d9" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 248 + }, + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXQAAAE5CAYAAACApdvhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAVUElEQVR4nO3df9TedX3f8efLBPyBBJzc7WwIJkq0S5m6GrG2q7VqaSiaqFAHbXeksmbdMcOOlrO4KvakXQ/gjjtnNNuM83dlFOnmUkgXOah1paXNjWW4gKlZwBLOPEbkV7UaIu/9cX1vubi5k/u6kyv55v7cz8c59+H6fL+f5HqFC175Xt+fqSokSfPf0/oOIEkaDwtdkhphoUtSIyx0SWqEhS5JjVjc1xufdtpptXz58r7eXpLmpdtvv/0bVTUx07reCn358uVMTk729faSNC8l+erB1rnLRZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGtHblaLH2vKNN/Ud4ai698rz+o4gqWduoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpESMVepI1SXYl2Z1k40HmvDXJXUl2Jrl2vDElSbOZ9W6LSRYBm4GfAfYCO5Jsraq7huasBN4F/ERVPZjkB45WYEnSzEbZQj8b2F1Ve6pqP3AdsG7anF8BNlfVgwBV9fXxxpQkzWaUQl8K3Dc03tstG/Yi4EVJbk1yW5I1M/1GSdYnmUwyuW/fvsNLLEma0bgOii4GVgKvAS4CPpjk1OmTqmpLVa2uqtUTExNjemtJEoxW6PcDy4bGp3fLhu0FtlbVY1V1D/DXDApeknSMjFLoO4CVSVYkORG4ENg6bc6nGWydk+Q0Brtg9owxpyRpFrMWelUdADYA24G7geurameSTUnWdtO2Aw8kuQv4HHB5VT1wtEJLkp5qpIdEV9U2YNu0ZVcMvS7gsu5HktQDrxSVpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREjFXqSNUl2JdmdZOMM6y9Osi/JHd3PPxt/VEnSoSyebUKSRcBm4GeAvcCOJFur6q5pU/+gqjYchYySpBGMsoV+NrC7qvZU1X7gOmDd0Y0lSZqrUQp9KXDf0Hhvt2y685PcmeSGJMtm+o2SrE8ymWRy3759hxFXknQw4zoo+kfA8qp6CXAz8LGZJlXVlqpaXVWrJyYmxvTWkiQYrdDvB4a3uE/vln1fVT1QVd/thv8FePl44kmSRjVKoe8AViZZkeRE4EJg6/CEJM8bGq4F7h5fREnSKGY9y6WqDiTZAGwHFgEfrqqdSTYBk1W1Fbg0yVrgAPBN4OKjmFmSNINZCx2gqrYB26Ytu2Lo9buAd403miRpLrxSVJIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNWKkQk+yJsmuJLuTbDzEvPOTVJLV44soSRrFrIWeZBGwGTgXWAVclGTVDPNOBt4J/MW4Q0qSZjfKFvrZwO6q2lNV+4HrgHUzzPtt4CrgO2PMJ0ka0SiFvhS4b2i8t1v2fUl+FFhWVTeNMZskaQ6O+KBokqcB7wd+fYS565NMJpnct2/fkb61JGnIKIV+P7BsaHx6t2zKycBZwOeT3Av8GLB1pgOjVbWlqlZX1eqJiYnDTy1JeopRCn0HsDLJiiQnAhcCW6dWVtXDVXVaVS2vquXAbcDaqpo8KoklSTOatdCr6gCwAdgO3A1cX1U7k2xKsvZ
oB5QkjWbxKJOqahuwbdqyKw4y9zVHHkuSNFdeKSpJjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSI0Yq9CRrkuxKsjvJxhnW/2qSLyW5I8mfJlk1/qiSpEOZtdCTLAI2A+cCq4CLZijsa6vqH1bVy4CrgfePPakk6ZBG2UI/G9hdVXuqaj9wHbBueEJVPTI0PAmo8UWUJI1i8QhzlgL3DY33Aq+cPinJO4DLgBOB144lnSRpZGM7KFpVm6vqhcC/Bt4905wk65NMJpnct2/fuN5aksRohX4/sGxofHq37GCuA94004qq2lJVq6tq9cTExOgpJUmzGqXQdwArk6xIciJwIbB1eEKSlUPD84CvjC+iJGkUs+5Dr6oDSTYA24FFwIerameSTcBkVW0FNiR5PfAY8CDwtqMZWpL0VKMcFKWqtgHbpi27Yuj1O8ecS5I0R14pKkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNWKke7lIfVu+8aa+Ixw19155Xt8R1Ai30CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEaMVOhJ1iTZlWR3ko0zrL8syV1J7kxyS5Lnjz+qJOlQZi30JIuAzcC5wCrgoiSrpk37K2B1Vb0EuAG4etxBJUmHNsoW+tnA7qraU1X7geuAdcMTqupzVfXtbngbcPp4Y0qSZjNKoS8F7hsa7+2WHcwlwB/PtCLJ+iSTSSb37ds3ekpJ0qzGelA0yS8Bq4H3zbS+qrZU1eqqWj0xMTHOt5akBW+UB1zcDywbGp/eLXuSJK8HfhP4qar67njiSZJGNcoW+g5gZZIVSU4ELgS2Dk9I8o+ADwBrq+rr448pSZrNrIVeVQeADcB24G7g+qramWRTkrXdtPcBzwY+leSOJFsP8ttJko6SkZ4pWlXbgG3Tll0x9Pr1Y84lSZojrxSVpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREjFXqSNUl2JdmdZOMM61+d5ItJDiS5YPwxJUmzmbXQkywCNgPnAquAi5Ksmjbtb4CLgWvHHVCSNJrFI8w5G9hdVXsAklwHrAPumppQVfd26x4/ChklSSMYpdCXAvcNjfcCrzycN0uyHlgPcMYZZxzObyFpnlm+8aa+IxxV9155Xt8Rvu+YHhStqi1VtbqqVk9MTBzLt5ak5o1S6PcDy4bGp3fLJEnHkVEKfQewMsmKJCcCFwJbj24sSdJczVroVXUA2ABsB+4Grq+qnUk2JVkLkOQVSfYCPw98IMnOoxlakvRUoxwUpaq2AdumLbti6PUOBrtiJEk98UpRSWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhoxUqEnWZNkV5LdSTbOsP7pSf6gW/8XSZaPO6gk6dBmLfQki4DNwLnAKuCiJKumTbsEeLCqzgT+PXDVuINKkg5tlC30s4HdVbWnqvYD1wHrps1ZB3yse30D8LokGV9MSdJsFo8wZylw39B4L/DKg82pqgNJHgaeC3xjeFKS9cD6bvi3SXYdTuh54jSm/fmPpvidaJz87Oa31j+/5x9sxSiFPjZVtQXYcizfsy9JJqtqdd85NHd+dvPbQv78Rtnlcj+wbGh8erdsxjlJFgOnAA+MI6AkaTSjFPoOYGW
SFUlOBC4Etk6bsxV4W/f6AuCzVVXjiylJms2su1y6feIbgO3AIuDDVbUzySZgsqq2Ah8CPpFkN/BNBqW/0C2IXUuN8rOb3xbs5xc3pCWpDV4pKkmNsNAlqREWujQkydOSLOk7h3Q4LPQxSfLOJEsy8KEkX0xyTt+5NLsk13af3UnA/wHuSnJ537k0miQnJXla9/pFSdYmOaHvXH2w0Mfn7VX1CHAO8BzgnwJX9htJI1rVfXZvAv4YWMHg89P88AXgGUmWAp9h8Nl9tNdEPbHQx2fq3jU/B3yiqnYOLdPx7YRui+5NwNaqeqzvQJqTVNW3gbcA/7Gqfh74kZ4z9cJCH5/bk3yGQaFvT3Iy8HjPmTSaDwD3AicBX0jyfODhXhNpLpLkVcAvAjd1yxb1mKc3noc+Jt0+vJcBe6rqoSTPBZZW1Z09R9MskqyoqnuGxgHOrKqv9BhLI0ryauA3gFur6qokLwB+raou7TnaMecW+vjcXFVfrKqHAKrqAQb3htfx7w+HB91tK67rKYvm7geram1VXQVQVXuA/9Vzpl4c07sttijJM4BnAacleQ5P7DdfwuC2wjpOJflhBvtaT0nylqFVS4Bn9JNKh+FdwKdGWNY8C/3I/XPg14AfAm7niUJ/BPi9vkJpJC8G3gCcCrxxaPmjwK/0kkgjS3Iug2NWS5P8h6FVS4AD/aTql/vQxyTJv6yqa/rOoblL8qqq+vO+c2hukryUwXGrTcAVQ6seBT5XVQ/2EqxHFvoYJTmLwXNXv/91vao+3l8ijaLbbXYJg90vw5/d23sLpZElOcFTTQc8KDomSd4LXNP9/DRwNbC211Aa1SeAvw/8LPAnDB7i8miviTQXy5PckOSuJHumfvoO1QcLfXwuAF4HfK2qfhl4KYMnN+n4d2ZVvQf4VlV9DDiPpz43V8evjwD/icF+858GPg78fq+JemKhj8/fVdXjwIHu5k5f58mP7tPxa+rr+kPdbrNTgB/oMY/m5plVdQuDXchfrarfYvCX8oLjWS7jM5nkVOCDDM52+VvAA23zw5bulNN3M3ic4rN58kE2Hd++213Y95Xu6Wr3M/gMFxwPih4FSZYDS7xKVDr6krwCuJvB6ae/zeC0xfdV1W29BuuBhT4mSd7M4OHYD3fjU4HXVNWn+02m2ST5XeDqqat8u631X6+qd/ebTJob96GPz3unyhygK4f39phHozt3qswBuvOXf67HPJqDJDd3G1BT4+ck2d5npr5Y6OMz079Lj1HMD4uSPH1qkOSZwNMPMV/Hl9Nm+At5QR7UtnDGZzLJ+4HN3fgdDA6O6vj3SeCWJB/pxr8MfKzHPJqbx5OcUVV/A9Dd/nhB7kt2H/qYdI8vew/wegb/Md0M/Nuq+lavwTSSJGsYfHYwuHPmgvzKPh91n90WBheFBfhJYP1C/AwtdEnzXpLTgB/rhrdV1Tf6zNMXC12SGuFBUUlqhIUuDelOeXtJ3zmkw2Ghj0mSq5MsSXJCkluS7EvyS33n0uySfL777P4e8EXgg90ZS5oHkrxw6rTTJK9JcunweekLiYU+PudU1SMMnoBzL3AmcHmviTSqU7rP7i3Ax6vqlTxxxouOf38IfC/JmQzOdlkGXNtvpH5Y6OMzdU7/ecCnhq8a1XFvcZLnAW8Fbuw7jObs8ao6ALwZuKaqLgee13OmXljo43Njki8DL2dwkcoE8J2eM2k0m4DtwO6q2pHkBcBXes6k0T2W5CLgbTzxF/IJPebpjactjlG3D/bhqvped6HRyVX1tb5zSS1Lsgr4VeDPq+q/JlkBvLWqruo52jFnoY9JkmcBlwFnVNX6JCuBF1eVX+GPcz5TdP7r7r9zRlXt6jtLn9zlMj4fAfYDP96N7wd+p784mgOfKTqPJXkjcAfwP7vxy5Js7TdVPyz08XlhVV1N9zizqvo2g/tK6PjnM0Xnt98CzgYeAqiqO4AX9BmoLxb6+OzvvvYVDM6NBb7bbySNyGeKzm+PzXBW2eO
9JOmZt88dn/cy+Mq3LMkngZ8ALu41kUY19UzR9+AzReejnUl+gcF97VcClwJ/1nOmXnhQdEy6M1zC4I5vAW5jcJbLPb0GkxrXnZDwm8A53aLPAJuqasF9Q7bQxyTJrQweZfZIN/4HDC4wOqvfZJpNd9n4+cByhr61VtWmvjJpdEkuqaoPTVt2ZVVt7CtTX9yHPj6/C/xRkpOSvBy4AfBeLvPD/wDWAQeAbw39aH44P8kvTg2S/B4w0WOe3rgPfUyq6qYkJzB4UtHJwJur6q97jqXRnF5Va/oOocN2PrA1yePAGuChqrqk50y9cJfLEUpyDU9+fuHrgP/L4AZdVNWlPcTSHCTZwuAeIF/qO4tG1x23mnIy8GngVroD2lX1zT5y9clCP0JJ3nao9d15zToOJfkSg7+MFwMrgT0MTjUNUFXlfdGPY0nuYfD5ZeifU6qqFty56Ba6Fqzu6fAHVVVfPVZZpHGw0I9Qkuur6q1DW3tP4laedHQkeW1VfTbJW2ZaX1X/7Vhn6psHRY/cO7t/vqHXFNLC81PAZ4E3zrCugAVX6G6hS1Ij3EI/QkkeZYZdLTxxYG3JMY4kLQhJLjvU+qpacM+FtdCPUFWd3HcGaYHy/71p3OUiSY3w0n9J81qSq5MsSXJCkluS7EuyIG+7YaFLmu/O6W6K9wYGV2ifCVzea6KeWOiS5rupY4HnMbjD6fSHXSwYHhSVNN/dmOTLwN8B/yLJBPCdnjP1woOikua97kZdD1fV97oHXiypqq/1netYcwtd0rw006X/yZOey77grhS10CXNV176P427XCSpEW6hS5qXvPT/qSx0SfPV1KX/LwZeAWztxm8E/rKXRD1zl4ukeS3JF4DzqurRbnwycFNVvbrfZMeeFxZJmu9+ENg/NN7fLVtw3OUiab77OPCXSf57N34T8NH+4vTHXS6S5r0kPwr8ZDf8QlX9VZ95+mKhS1Ij3IcuSY2w0CWpERa6dAhJLk7yQ33nkEZhoUuHdjFgoWtesNC1oCRZnuTLST6Z5O4kNyR5VpKXJ/mTJLcn2Z7keUkuAFYDn0xyR5JnJrkyyV1J7kzy7/r+80jDPMtFC0qS5cA9wD+uqluTfBi4G3gzsK6q9iX5J8DPVtXbk3we+I2qmkzyXODPgB+uqkpyalU91M+fRHoqLyzSQnRfVd3avf594N8AZwE3d/fTXgT8vxl+3cMMnoTzoSQ3Ajceg6zSyCx0LUTTv5Y+Cuysqlcd8hdVHUhyNvA64AJgA/DaoxNRmjv3oWshOiPJVHn/AnAbMDG1LMkJSX6kW/8o3V39kjwbOKWqtgH/CnjpsY0tHZqFroVoF/COJHcDzwGuYbDFfVWS/w3cAfx4N/ejwH9OcgeDYr8xyZ3AnwKHvB+3dKx5UFQLSndQ9MaqOqvnKNLYuYUuSY1wC12SGuEWuiQ1wkKXpEZY6JLUCAtdkhphoUtSI/4/N/6Y6Ugd74AAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "beEYGriXJta2" + }, + "source": [ + "5\\. If you were a heterosexual female interested in dating a non-smoker, how many options would you have in this data set?" + ] + }, + { + "cell_type": "code", + "source": [ + "# straight males that don't smoke\n", + "df_options = df_okc[(df_okc[\"sex\"] == \"m\") & (df_okc[\"smokes\"] == \"no\") & (df_okc[\"orientation\"] == \"straight\")]\n", + "len(df_options)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "OP6STVPLdssy", + "outputId": "30291764-27ca-49eb-9b4a-2e38284aeda4" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1107" + ] + }, + "metadata": {}, + "execution_count": 249 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "1107 options (straight males that don't smoke)" + ], + "metadata": { + "id": "zlekh1HZfMIr" + } + } + ] +} \ No newline at end of file