SQL With CSVs – KDnuggets

Write SQL queries to analyze CSV files using a simple command-line tool.

Image by Author

You can run SQL queries directly on a CSV file. In this post, we will review one of the most popular CLI tools for processing CSV files and learn how to run SQL queries on them hassle-free.

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
We will focus on running SQL queries with it, then displaying and saving the results.

You can install csvkit using pip by running: pip install csvkit

After installation, you can read the documentation in the shell by running csvsql --help, or read the documentation online.

To run SQL, use the --query argument of csvsql and write the query inside quotation marks. For longer SQL queries, you can use line breaks inside the quotes, which works fine in both PowerShell and Bash.

Note: the table name in the query must match the name of the CSV file given at the end of the command, without the .csv extension.
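To make the naming rule concrete, here is a minimal Python sketch of the same idea: load a CSV into an in-memory SQLite table named after the file and run a query against it. This is an illustration, not csvkit's actual implementation. The helper query_csv, the file name chains.csv, and the sample rows are all made up for this example, and unlike csvsql the sketch does no type inference, so every value stays a string.

```python
import csv
import io
import sqlite3
from pathlib import Path


def query_csv(csv_text, file_name, sql):
    """Load CSV text into an in-memory SQLite table and run one query."""
    # The table is named after the file, minus its extension:
    # "chains.csv" becomes the table "chains".
    table = Path(file_name).stem
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    cursor = conn.execute(sql)
    names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return names, rows


# Made-up sample data, not the Kaggle dataset used in this post.
SAMPLE = "chain,sales\nChain A,10\nChain B,4\nChain C,2\n"

names, rows = query_csv(SAMPLE, "chains.csv", "SELECT * FROM chains LIMIT 2")
print(names)
print(rows)
```

Querying a table name that does not match the file (for example FROM restaurants here) fails with a "no such table" error, which is the mistake the note above warns about.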

In this section, we will use a modified version of the Top 50 Fast-Food Chains in USA dataset from Kaggle to run the SQL query on a CSV file and print and save the results.

First, we will run a simple query to test if the command works.

Output:
The results show four columns and the first two records of fast food chains in the USA.

Next, let’s run a more complex SQL query that selects the top 5 fast food chains with sales greater than or equal to 4 billion dollars.
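The shape of such a query can be sketched in Python with the built-in sqlite3 module. The table name chains, the column names chain and sales, the unit (billions of dollars), and every row below are assumptions made up for illustration; the real dataset's columns may differ. The sketch also assumes that "top 5" means the five largest by sales.

```python
import sqlite3

# Made-up rows: (chain, sales in billions of dollars).
ROWS = [
    ("Chain A", 12.5), ("Chain B", 3.9), ("Chain C", 4.0), ("Chain D", 7.2),
    ("Chain E", 5.1), ("Chain F", 9.8), ("Chain G", 4.6), ("Chain H", 1.3),
]

# Filter first (WHERE), then rank (ORDER BY), then keep the top five (LIMIT).
QUERY = """
SELECT chain, sales
FROM chains
WHERE sales >= 4
ORDER BY sales DESC
LIMIT 5
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chains (chain TEXT, sales REAL)")
conn.executemany("INSERT INTO chains VALUES (?, ?)", ROWS)
top_five = conn.execute(QUERY).fetchall()
conn.close()

for chain, sales in top_five:
    print(chain, sales)
```

Six of the made-up rows pass the filter, so LIMIT 5 drops the smallest of them (Chain C at exactly 4.0).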

We will pipe (“|”) the query result to the csvlook command, which renders it as a formatted table.

Output:
The table layout makes the SQL query output much easier to read.
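For intuition about what this formatting step involves, here is a small Python sketch that pads each column to a common width and draws a pipe-delimited table. It is not csvlook's code and does not claim to reproduce its exact output; render_table is a hypothetical helper and the rows are made up.

```python
def render_table(header, rows):
    """Return rows as a fixed-width, pipe-delimited text table."""
    cells = [[str(value) for value in row] for row in [header, *rows]]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]

    def fmt(row):
        padded = (value.ljust(width) for value, width in zip(row, widths))
        return "| " + " | ".join(padded) + " |"

    rule = "| " + " | ".join("-" * width for width in widths) + " |"
    return "\n".join([fmt(cells[0]), rule, *(fmt(row) for row in cells[1:])])


# Made-up query result.
table = render_table(["chain", "sales"], [("Chain A", 12.5), ("Chain F", 9.8)])
print(table)
```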

We will use redirection (“>”) to save the SQL query results to a CSV file. You can provide either a file name or a full path to the file.

Result:
As we can see, the file is successfully saved in the current directory.
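If you prefer to do the saving step from a script rather than with shell redirection, a rough Python equivalent looks like the sketch below. The table, the rows, and the output file name filtered.csv are made up for illustration, and the file goes to a temporary directory rather than the current one to keep the example tidy.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chains (chain TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO chains VALUES (?, ?)",
    [("Chain A", 12.5), ("Chain B", 3.9), ("Chain C", 4.0)],  # made-up rows
)
cursor = conn.execute(
    "SELECT chain, sales FROM chains WHERE sales >= 4 ORDER BY sales DESC"
)

# Hypothetical output path inside a fresh temporary directory.
out_path = Path(tempfile.mkdtemp()) / "filtered.csv"
with out_path.open("w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)  # one CSV line per result row
conn.close()

saved = out_path.read_text(encoding="utf-8")
print(saved)
```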

Image by Author | Filtered CSV file using SQL

A handy command-line tool like this helps you automate tasks and data pipelines. You can also use free online SQL tools to start working on a data analysis project.
I recommend Deepnote for running SQL queries on CSV files in seconds. It uses DuckDB in the background.


Do let me know if you have questions about using SQL in Jupyter notebook, Deepnote, DuckDB, and csvkit. I will try my best to assist you.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a bachelor’s degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
