Course materials for General Assembly's Data Science course in Washington, DC (11/30/15 - 02/22/16).
Instructor: Keegan Hines (blog, [github](https://github.com/keeganhines), twitter)
| Monday | Wednesday |
|---|---|
| 11/30: Introduction to Data Science | 12/02: Command Line, Version Control |
| 12/07: Data Reading and Cleaning | 12/09: Exploratory Data Analysis |
| 12/14: Visualization | 12/16: Machine Learning |
| 12/21: Getting Data | 12/23: No Class |
| 12/28: K-Nearest Neighbors | 12/30: Basic Model Evaluation |
| 01/04: Linear Regression | 01/06: First Project Presentation |
| 01/11: Logistic Regression | 01/13: Advanced Model Evaluation |
| 01/18: No Class | 01/20: Naive Bayes and Text Data |
| 01/25: Natural Language Processing | 01/27: Kaggle Competition |
| 02/01: Decision Trees | 02/03: Ensembling |
| 02/08: Advanced scikit-learn, Clustering | 02/10: Regularization, Regex |
| 02/15: No Class | 02/17: Course Review |
| 02/22: Final Project Presentations | |
- Codecademy's Python course: Good beginner material, including tons of in-browser exercises.
- Dataquest: Uses interactive exercises to teach Python in the context of data science.
- Google's Python Class: Slightly more advanced, including hours of useful lecture videos and downloadable exercises (with solutions).
- Introduction to Python: A series of IPython notebooks that do a great job explaining core Python concepts and data structures.
- Python for Informatics: A very beginner-oriented book, with associated slides and videos.
- A Crash Course in Python for Scientists: Read through the Overview section for a very quick introduction to Python.
- Python 2.7 Quick Reference: My beginner-oriented guide that demonstrates Python concepts through short, well-commented examples.
- Beginner and intermediate workshop code: Useful for review and reference.
- Python Tutor: Allows you to visualize the execution of Python code.
| Resource | Web | Cost | Notes |
|---|---|---|---|
| Data Community DC (DC2) | http://www.datacommunitydc.org/ | $ | DC2 is an umbrella organization of several local Meetup groups, each focused on a different aspect of data science. These Meetups are almost always free to attend. |
| DataSociety | http://datasociety.co/ | $$ | Introductory online data science courses with a focus on R. |
| District Data Labs | http://www.districtdatalabs.com/#!workshops/cwef | $$ | Weekend workshops and online courses which each focus on an advanced data science concept. They also have a part-time incubator program where participants collaborate on a final project. |
| General Assembly | https://generalassemb.ly/education/data-science | $$$ | Part-time classes. You're already here! Good job! |
| Academic | https://gradanalytics.georgetown.edu, http://datasci.columbian.gwu.edu/, http://volgenau.gmu.edu/data-analytics-engineering | $$$$ | Many universities now offer master's degrees and certificate programs in data science. These are quite expensive. |
- Welcome from General Assembly staff
- Course overview (slides)
- Group exercise
- [Survey](http://goo.gl/forms/Gu23YkwnPh)
- Analyze results (notebook)
- Introduction to data science (slides)
- Types of data (slides) and public data sources
- Discuss the course project: requirements and example projects
- Slack tour
Homework:
- Find and bring in a dataset that is professionally relevant to you.
- Work through GA's friendly command line tutorial using Terminal (Linux/Mac) or Git Bash (Windows).
- Read through this command line reference, and complete the pre-class exercise at the bottom. (There's nothing you need to submit once you're done.)
- Watch videos 1 through 8 (21 minutes) of Introduction to Git and GitHub, or read sections 1.1 through 2.2 of Pro Git.
- If your laptop has any setup issues, please work with us to resolve them by Wednesday. If your laptop has not yet been checked, you should come early on Wednesday, or just walk through the setup checklist yourself (and let us know you have done so).
Resources:
- For a useful look at the different types of data scientists, read Analyzing the Analyzers (32 pages).
- For some thoughts on what it's like to be a data scientist, read these short posts from Win-Vector and Datascope Analytics.
- Quora has a data science topic FAQ with lots of interesting Q&A.
- Keep up with local data-related events through the Data Community DC event calendar or weekly newsletter.
Homework:
- Find and bring in a professionally relevant dataset (if not yet done).
- Complete the command line homework assignment with the Chipotle data. Submit your work through this [form](http://goo.gl/forms/EamVci00DO).
- If we don't get through the Python material, read over the beginner and intermediate Python background. If you don't feel comfortable with the Python fundamentals (except the optional stuff toward the end), you should spend time this weekend practicing basic Python:
    - Introduction to Python does a great job explaining Python essentials and includes tons of example code.
    - If you like learning from a book, Python for Informatics has useful chapters on strings, lists, and dictionaries.
    - If you prefer interactive exercises, try these lessons from Codecademy: "Python Lists and Dictionaries" and "A Day at the Supermarket".
    - If you have more time, try missions 2 and 3 from Dataquest's Learning Python course.
    - If you've already mastered these topics and want more of a challenge, try solving Python Challenge number 1 (decoding a message) and send me your code in Slack.
- To give you a framework for thinking about your project, watch What is machine learning, and how does it work? (10 minutes). (This is the IPython notebook shown in the video.) Alternatively, read A Visual Introduction to Machine Learning, which focuses on a specific machine learning model called decision trees.
- Optional: Browse through some more example student projects, which may help to inspire your own project!
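
If you are unsure whether you are comfortable with the Python fundamentals mentioned above (strings, lists, and dictionaries), here is a minimal sketch of the kind of code you should be able to read and write. The sentence and variable names are made up for illustration and are not taken from any of the linked resources.

```python
# Hypothetical sample text, invented purely for illustration.
sentence = "data science is fun and data is everywhere"

# Strings: split the sentence into a list of words.
words = sentence.split()

# Lists: indexing, slicing, and length.
first_word = words[0]
last_two = words[-2:]
num_words = len(words)

# Dictionaries: count how often each word appears.
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

print(first_word)
print(last_two)
print(num_words)
print(word_counts["data"])
```

If every line here makes sense to you, you are probably in good shape for the next class. If not, the resources listed above are a good place to start.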
Git and Markdown Resources:
- Pro Git is an excellent book for learning Git. Read the first two chapters to gain a deeper understanding of version control and basic commands.
- If you want to practice a lot of Git (and learn many more commands), Git Immersion looks promising.
- If you want to understand how to contribute on GitHub, you first have to understand forks and pull requests.
- GitRef is my favorite reference guide for Git commands, and Git quick reference for beginners is a shorter guide with commands grouped by workflow.
- Cracking the Code to GitHub's Growth explains why GitHub is so popular among developers.
- Markdown Cheatsheet provides a thorough set of Markdown examples with concise explanations. GitHub's Mastering Markdown is a simpler and more attractive guide, but is less comprehensive.
Command Line Resources:
- If you want to go much deeper into the command line, Data Science at the Command Line is a great book. The companion website provides installation instructions for a "data science toolbox" (a virtual machine with many more command line tools), as well as a long reference guide to popular command line tools.
- If you want to do more at the command line with CSV files, try out csvkit, which can be installed via pip.
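
To give a sense of the kind of CSV inspection discussed above, here is a rough sketch in Python. Note that it uses Python's built-in `csv` module rather than csvkit itself, and the sample rows are hypothetical data invented for illustration (they are not the Chipotle homework data). csvkit can be installed with `pip install csvkit` and offers similar summaries directly at the command line.

```python
import csv

# Hypothetical CSV content, invented for illustration only.
raw = "item,quantity,price\nburrito,2,8.50\nchips,1,2.25\nburrito,1,8.50\n"

# csv.DictReader accepts any iterable of lines and maps each row
# to a dictionary keyed by the header row.
reader = csv.DictReader(raw.splitlines())
rows = list(reader)

# List the column names found in the header.
column_names = reader.fieldnames

# Simple summaries: total items ordered and total revenue.
total_quantity = sum(int(row["quantity"]) for row in rows)
total_revenue = sum(int(row["quantity"]) * float(row["price"]) for row in rows)

print(column_names)
print(total_quantity)
print(total_revenue)
```

For a real file, you would pass an open file object to `csv.DictReader` instead of the in-memory string used here.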