259 changes: 245 additions & 14 deletions lab-hypothesis-testing.ipynb
@@ -51,7 +51,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -278,7 +278,7 @@
"[800 rows x 11 columns]"
]
},
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@@ -297,11 +297,83 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#code here"
"dragon_hp = df[df[\"Type 1\"] == \"Dragon\"][\"HP\"]\n",
"grass_hp = df[df[\"Type 1\"] == \"Grass\"][\"HP\"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"T-statistic: 3.3349632905124063\n",
"P-value: 0.0015987219490841197\n"
]
}
],
"source": [
"# T-test\n",
"\n",
"t_stat, p_value = st.ttest_ind(dragon_hp, grass_hp, equal_var=False)\n",
"\n",
"print(\"T-statistic:\", t_stat)\n",
"print(\"P-value:\", p_value)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"One-sided p-value: 0.0007993609745420598\n"
]
}
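],
"source": [
"# Halving the two-sided p-value is valid here only because t_stat > 0,\n",
"# i.e. the observed difference is in the hypothesized direction (Dragon > Grass).\n",
"p_value_one_sided = p_value / 2\n",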
"\n",
"print(\"One-sided p-value:\", p_value_one_sided)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reject the null hypothesis\n"
]
}
],
"source": [
"alpha = 0.05\n",
"\n",
"if p_value_one_sided < alpha:\n",
"    print(\"Reject the null hypothesis\")\n",
"else:\n",
"    print(\"Fail to reject the null hypothesis\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We performed an independent two-sample t-test with unequal variances (Welch's t-test) to compare the HP of Pokémon whose `Type 1` is Dragon with those whose `Type 1` is Grass. The test gave t ≈ 3.33 with a one-sided p-value of about 0.0008, which is below the 0.05 significance level. We therefore reject the null hypothesis and conclude that Dragon-type Pokémon have a significantly higher mean HP than Grass-type Pokémon."
]
},
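The one-sided p-value obtained by halving can also be requested from the test directly. The following is a minimal sketch, assuming SciPy 1.6 or newer (where `ttest_ind` accepts an `alternative` argument). It uses synthetic samples rather than the notebook's `dragon_hp` and `grass_hp`; the names `group_a` and `group_b` are illustrative only.

```python
import numpy as np
from scipy import stats as st

# Synthetic stand-ins for the two HP samples (NOT the notebook's data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=80, scale=20, size=32)
group_b = rng.normal(loc=67, scale=19, size=70)

# Two-sided Welch test, as in the notebook cell.
t_two, p_two = st.ttest_ind(group_a, group_b, equal_var=False)

# One-sided Welch test: the alternative is mean(group_a) > mean(group_b).
t_one, p_one = st.ttest_ind(
    group_a, group_b, equal_var=False, alternative="greater"
)

# Halving the two-sided p-value only matches when t is positive;
# when t is negative the one-sided p-value is 1 - p_two / 2.
p_halved = p_two / 2 if t_two > 0 else 1 - p_two / 2

print("one-sided p (alternative='greater'):", p_one)
print("one-sided p (from two-sided):       ", p_halved)
```

Passing `alternative="greater"` avoids having to check the sign of the t-statistic by hand before halving.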
{
@@ -313,11 +385,68 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 7,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HP\n",
"t-stat: 8.981370483625046\n",
"p-value: 1.0026911708035284e-13\n",
"Reject H0 → Significant difference\n",
"\n",
"Attack\n",
"t-stat: 10.438133539322203\n",
"p-value: 2.520372449236646e-16\n",
"Reject H0 → Significant difference\n",
"\n",
"Defense\n",
"t-stat: 7.637078164784618\n",
"p-value: 4.826998494919331e-11\n",
"Reject H0 → Significant difference\n",
"\n",
"Sp. Atk\n",
"t-stat: 13.417449984138461\n",
"p-value: 1.5514614112239816e-21\n",
"Reject H0 → Significant difference\n",
"\n",
"Sp. Def\n",
"t-stat: 10.015696613114878\n",
"p-value: 2.2949327864052826e-15\n",
"Reject H0 → Significant difference\n",
"\n",
"Speed\n",
"t-stat: 11.47504444631443\n",
"p-value: 1.0490163118824507e-18\n",
"Reject H0 → Significant difference\n",
"\n"
]
}
],
"source": [
"#code here"
"legendary = df[df[\"Legendary\"] == True]\n",
"non_legendary = df[df[\"Legendary\"] == False]\n",
"\n",
"stats = [\"HP\",\"Attack\",\"Defense\",\"Sp. Atk\",\"Sp. Def\",\"Speed\"]\n",
"\n",
"for stat in stats:\n",
"\n",
"    t_stat, p_value = st.ttest_ind(\n",
"        legendary[stat],\n",
"        non_legendary[stat],\n",
"        equal_var=False\n",
"    )\n",
"\n",
"    print(f\"{stat}\")\n",
"    print(\"t-stat:\", t_stat)\n",
"    print(\"p-value:\", p_value)\n",
"\n",
"    if p_value < 0.05:\n",
"        print(\"Reject H0 → Significant difference\\n\")\n",
"    else:\n",
"        print(\"Fail to reject H0 → No significant difference\\n\")"
]
},
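The loop runs six tests at alpha = 0.05 each, so a multiple-comparison adjustment may be worth considering. The following is a minimal sketch, assuming a Bonferroni correction, which is not part of the original notebook. The samples are synthetic and the names `legendary_like` and `other_like` are illustrative. For the notebook's own results, the largest reported p-value (about 4.8e-11, for Defense) is far below 0.05 / 6 ≈ 0.0083, so the conclusions would not change.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(1)
stat_names = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

alpha = 0.05
# Bonferroni: divide the family-wise alpha by the number of tests.
bonferroni_alpha = alpha / len(stat_names)

results = {}
for name in stat_names:
    # Synthetic stand-ins for the two groups (NOT the notebook's data).
    legendary_like = rng.normal(loc=100, scale=25, size=65)
    other_like = rng.normal(loc=70, scale=25, size=735)

    # Welch's t-test, matching equal_var=False in the notebook.
    t_stat, p_value = st.ttest_ind(legendary_like, other_like, equal_var=False)
    results[name] = {
        "t": t_stat,
        "p": p_value,
        "reject": bool(p_value < bonferroni_alpha),
    }

for name, res in results.items():
    print(name, res)
```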
{
@@ -337,7 +466,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -453,7 +582,7 @@
"4 624.0 262.0 1.9250 65500.0 "
]
},
"execution_count": 5,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -483,17 +612,119 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": []
"source": [
"# School and hospital coordinates as (longitude, latitude) pairs\n",
"\n",
"school = (-118, 34)\n",
"hospital = (-122, 37)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": []
"source": [
"def euclidean_distance(x1, y1, x2, y2):\n",
"    # Straight-line distance in coordinate units (degrees for longitude/latitude)\n",
"    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Calculating distances\n",
"\n",
"df[\"dist_school\"] = euclidean_distance(\n",
"    df[\"longitude\"], df[\"latitude\"],\n",
"    school[0], school[1]\n",
")\n",
"\n",
"df[\"dist_hospital\"] = euclidean_distance(\n",
"    df[\"longitude\"], df[\"latitude\"],\n",
"    hospital[0], hospital[1]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Flag houses within 0.5 degrees of either the school or the hospital\n",
"\n",
"df[\"close\"] = (df[\"dist_school\"] < 0.5) | (df[\"dist_hospital\"] < 0.5)"
]
},
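To make the distance and threshold logic concrete, here is a minimal sketch on a three-row toy frame. The `toy` DataFrame and its coordinates are made up for illustration and are not taken from the housing dataset; the function and the two reference points are copied from the cells above.

```python
import numpy as np
import pandas as pd


def euclidean_distance(x1, y1, x2, y2):
    # Straight-line distance in coordinate units (degrees here).
    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


# Reference points as (longitude, latitude), as in the notebook.
school = (-118, 34)
hospital = (-122, 37)

# Hypothetical rows: near the school, near the hospital, far from both.
toy = pd.DataFrame(
    {
        "longitude": [-118.3, -121.9, -120.0],
        "latitude": [34.2, 37.1, 35.5],
    }
)

toy["dist_school"] = euclidean_distance(toy["longitude"], toy["latitude"], *school)
toy["dist_hospital"] = euclidean_distance(toy["longitude"], toy["latitude"], *hospital)

# A row is "close" if it lies within 0.5 degrees of either point.
toy["close"] = (toy["dist_school"] < 0.5) | (toy["dist_hospital"] < 0.5)

print(toy)
```

The threshold is expressed in degrees, not kilometres. One degree of latitude is roughly 111 km, so 0.5 covers a fairly wide area around each point.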
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"close_houses = df[df[\"close\"] == True][\"median_house_value\"]\n",
"far_houses = df[df[\"close\"] == False][\"median_house_value\"]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"T-statistic: 37.992330214201516\n",
"P-value: 3.0064957768592614e-301\n"
]
}
],
"source": [
"# T-test\n",
"\n",
"t_stat, p_value = st.ttest_ind(close_houses, far_houses, equal_var=False)\n",
"\n",
"print(\"T-statistic:\", t_stat)\n",
"print(\"P-value:\", p_value)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reject H0\n"
]
}
],
"source": [
"# Decision\n",
"\n",
"alpha = 0.05\n",
"\n",
"if p_value < alpha:\n",
"    print(\"Reject H0\")\n",
"else:\n",
"    print(\"Fail to reject H0\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We performed an independent two-sample t-test with unequal variances (Welch's t-test) to compare the median house values of neighborhoods within 0.5 degrees of the school or hospital with those located farther away. The test gave t ≈ 37.99 with p ≈ 3.0e-301, far below the 0.05 significance level. We therefore reject the null hypothesis. Because the t-statistic is positive, the close group has the higher mean, so houses near the school or hospital tend to be more expensive."
]
}
],
"metadata": {
@@ -512,7 +743,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.13.11"
}
},
"nbformat": 4,