Getting Started with Python

Welcome to the Python Data Expert course! In this comprehensive program, you will embark on an exciting journey to become a proficient data expert using the powerful programming language, Python. Throughout this course, you will gain a deep understanding of Python's data manipulation and analysis capabilities, mastering techniques to extract, transform, and visualize data efficiently. From handling large datasets to implementing machine learning algorithms, this course will equip you with the essential skills needed to tackle real-world data challenges. Get ready to unlock the full potential of Python and become a sought-after data expert in today's data-driven world!

About Python

Python is an open-source programming language that is popular for general software development and scripting, web development and data science. Since Python is a general-purpose language, the base language does not include many of the data structures and functions necessary for data analysis. Python does, however, have an extensive selection of add-on packages that give it great flexibility, and its robust library of data-oriented add-ons makes it one of the most popular languages for data science. The majority of the programming kernels you'll find are written in Python.
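As a small illustration of why add-on packages matter, the sketch below uses only base Python. The variable names and values are made up for this example. Multiplying a list by a number repeats the list rather than scaling its values, so element-wise arithmetic needs an explicit comprehension. Data-oriented packages such as NumPy and pandas provide this kind of operation directly.

```python
# Hypothetical sample data, for illustration only.
prices = [1.5, 2.0, 3.25]

# In base Python, list * 2 repeats the list instead of doubling each value.
repeated = prices * 2

# Element-wise arithmetic needs an explicit comprehension.
doubled = [p * 2 for p in prices]

print(repeated)  # [1.5, 2.0, 3.25, 1.5, 2.0, 3.25]
print(doubled)   # [3.0, 4.0, 6.5]
```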

If you're wondering whether you should start your data science journey by learning Python or R, there's no correct answer. Both languages are used extensively in data science and it is a good idea to learn the basics of both. Python is the most popular language for data science and most Kaggle users recommend learning it first. R is great for data exploration, statistics and plotting, as many of the functions you need are built into the language.

Python Setup

There are several environments for working with notebooks, and any of them will work for this guide. If you're used to a particular environment, keep using it. However, if you're a novice, I strongly recommend a local Jupyter environment.

If you would like to set up a local Python environment on your computer, you will need to download Python. It is easiest to download the Anaconda Python distribution from Anaconda, Inc. (formerly Continuum Analytics). Anaconda bundles Python with dozens of popular data analysis libraries, and it comes with a nice integrated development environment (a fancy code editor) called Spyder.
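Once Python is installed, a quick check like the one below can confirm that the interpreter runs and show which data analysis libraries are available. This is a minimal sketch: the package names in the list are examples chosen for illustration, not a required set, and you can swap in whichever libraries you care about.

```python
import importlib.util
import sys

# Report the version of the Python interpreter running this code.
version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")

# Example package names chosen for illustration; adjust to your needs.
packages = ["numpy", "pandas", "matplotlib"]

# find_spec returns None when a top-level package is not installed.
installed = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, found in installed.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

The check only looks the packages up without importing them, so it runs quickly even when large libraries are installed.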

Using Notebooks

The programming notebooks used in this guide, known as Jupyter notebooks, provide a handy way of structuring text and code in a single document that can be rendered as HTML (a web page) so that it can be viewed in your web browser. Programming notebooks consist of two types of cells: Markdown and Code. Markdown cells contain plain text that can be given additional structure using a text formatting language called Markdown. Code cells consist of code that you can run interactively while you are editing the notebook. Clicking on any part of the notebook while you are in edit mode after forking it will highlight the cell containing the text or code. Click on this line of text to select this Markdown cell!

Now that you have selected this Markdown cell, you should notice some formatting tools in the upper left corner of the cell that allow you to perform some common formatting tasks like adding web links and making lists. On the right side, you will see a tab that says Markdown in blue, followed by Code in gray. The word Markdown in blue indicates that this is currently a Markdown text cell. You can convert Markdown cells to code cells and back by clicking on the appropriate word on the tab. Try selecting the cell below and then converting it from a Markdown cell to a code cell:

Turn this cell into a code cell!

5 + 10

Code cells differ from Markdown cells in that you can run them: the code they contain is executed, and whatever output it produces appears below the cell. After selecting a code cell, hold down the control key and press enter to run its contents. You can also run a selected cell by clicking the blue "play" button to its left. Try running the cell above that you converted from Markdown to code!

The code cell above should have run the arithmetic operation 5 + 10 and produced an output value of 15. Congratulations, you've run your first Python code!

If you change the contents of a code cell and run it again, the output will change according to the new code. Try changing the code in the cell above from "5 + 10" to "5 * 10" and run it again. The output should now be 50, since you changed the operation from addition to multiplication.
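The two versions of the cell can be written out as follows. In a notebook, the value of the last expression in a cell is displayed automatically. The variable names and print calls here are additions for illustration, so that the sketch also works as a plain script.

```python
# The original cell contents: addition.
total = 5 + 10
print(total)  # 15

# The edited cell contents: multiplication.
product = 5 * 10
print(product)  # 50
```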

Notebook Shortcuts

The Jupyter programming environment comes with a variety of keyboard shortcuts that can help you do common operations more quickly. First of all, note that the Jupyter notebook has two distinct modes: edit mode and command mode. When you click on a Markdown or code cell like this one (click on it now!), the notebook is put into edit mode, allowing you to directly edit the contents of the cell. While in edit mode, you have access to keyboard shortcuts that deal with editing the contents of the selected cell. Command mode gives you a variety of higher-level commands for things like creating new cells, deleting cells and converting cells from one type to another.

While in edit mode, press the escape key to enter command mode. Try it now!

You should notice the flashing text edit cursor disappear when you enter command mode. Now that you are in command mode, press the "P" key to bring up a list of notebook commands and then search for "show keyboard shortcuts" and click on it. You'll notice two lists: one for keyboard commands that work in edit mode and one for commands that work while you are in command mode. Don't be overwhelmed by the number of keyboard shortcuts: it is not really important that you learn any of them for this guide beyond using control + enter to run code, but I wanted you to be aware that you can use various shortcuts to help navigate the notebooks.

It should be noted that control + enter to run code cells works in both edit mode and command mode. Some other useful shortcuts while in command mode include: "A" to create a new cell above the current cell, "B" to create a new cell below the current cell, "M" to convert the current cell to Markdown, "Y" to convert the current cell to code and "DD" (press "D" twice) to delete the current cell.

Wrap Up

This lesson is just the tip of the iceberg. Although we spent most of our time discussing the programming environment we will use throughout this guide, we are now well prepared to jump in and start learning Python. In the following lessons, we will continue our journey by learning the very basics of Python and programming in general, giving you the foundation necessary to move on to using Python as a tool for data analysis and predictive modeling. We will start slow, but over the course of this guide you will learn all the tools you need to start exploring data, generating plots and making predictive models with Python. We will use the Titanic disaster data set as our primary motivating example, but the tools you learn will be transferable to any data project you're looking to tackle.