Since pandasql is refusing to connect in Google Colab (to directly execute SQL queries on a pandas dataframe), we will set up our own SQL server in MySQL to query it.
Here are the steps:
- Download MySQL for your machine. I used MySQL Installer 8.0.32, which is a .msi file (like an .exe that walks you through the installation process). There is another MySQL download that leads you to a .zip file, which is basically doing the same thing on hard mode since it doesn’t have that walkthrough process.
- I just kept it as Developer Default:
- There are Visual Studio requirements (below) that I apparently haven’t fulfilled yet, but MySQL seems to work properly even if we ignore it for now.
- Installation completed without errors:
- Keep the default settings:
- Pick and set a root password of your choice.
- Default settings:
- Check if it works within the installer:
- Click on Local instance MySQLServer and Log into your server on MySQL Workbench:
- So now we have a blank file. We now need to create a schema, and then import the CSV as a new table.
- To create a schema, right click anywhere in the white space of the “Schemas” tab and select New Schema.
- Type in the name of the schema that you want to call it. I called mine Uber cuz we’re going to pretend we are working with Uber rideshare datasets. Click Apply on the bottom right corner. You should now see “uber” as a schema on the left side. If you don’t, right click in the blank area on the left side under SCHEMAS and select “Refresh All”.
- Right click on “Tables” underneath Uber and click on Table Data Import Wizard.
- Import the data. I just called my table “trips”, so that when we reference it, it will be “SELECT * FROM uber.trips”.
- Right click on Tables and select “Refresh All”. That’s where the table is.
- Test out SELECT * FROM uber.trips to see if it works! It should!
- Ctrl+Enter is the keyboard shortcut to run the query without using a mouse.