Python Tips & Tricks:
Ice Cream and Pickles
For those of us just starting out on our data science journey, working with Python and Jupyter Notebooks can be a bit daunting. Codespeak sometimes reads like Japanese, and trying to search for useful tips among the plethora of Stack Overflow information can be utterly overwhelming. The following is dedicated to fellow food lovers and those on lunch break simply looking for tips to make life easier and save time.
Here are two tools I found useful:
1) Ice Cream
Ice Cream (the icecream package) is a Python library that improves on the print() function for debugging, making your output more readable.
Do you ever get tired of writing print()? Especially when you need to see more than one output at a time?
It gets especially tedious when you want to see what that output is referring to. For example, what does 30 belong to?
Well, we could do this:
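As a rough sketch, with num1 and num2 as made-up variables (the values are assumptions for illustration), the bare print() calls and the labelled workaround might look like this:

```python
# Hypothetical variables; the values are made up for illustration.
num1 = 30
num2 = 10

# Bare prints: the output is just 30 and 10, with no hint of which is which.
print(num1)
print(num2)

# Labelled prints: clearer, but every variable name has to be typed twice.
print("num1", num1)
print("num2", num2)
```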
Notice how we have to write out print('num1', num1) to see it in the output?
This is where Ice Cream comes in handy!
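Here is a minimal sketch of the same idea with icecream, assuming the package is installed (pip install icecream) and reusing the made-up num1 and num2 variables. The try/except fallback follows the pattern suggested in the icecream README, so the snippet still runs (silently) if the package is missing:

```python
# Assumes icecream is installed: pip install icecream
try:
    from icecream import ic
except ImportError:
    # Fallback: a stand-in that prints nothing and just returns its arguments.
    ic = lambda *a: None if not a else (a[0] if len(a) == 1 else a)

# Hypothetical variables; the values are made up for illustration.
num1 = 30
num2 = 10

# ic() prints each expression next to its value (to stderr by default),
# for example a line like "ic| num1: 30".
ic(num1)
ic(num2)

# Several variables can go in a single call.
ic(num1, num2)

# ic() also returns what it was given, so it can wrap an expression in place.
total = ic(num1) + ic(num2)
```

Because ic() hands back its argument, it can be dropped into existing code without changing what that code computes.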
So why use icecream?
- It allows us to see the variables that our output references, which saves confusion, especially when you have many outputs.
- It's shorter than typing out 'print'. Time is money!
2) Pickles
“a serialized, byte-wise .pkl file that preserves a Python object precisely and exactly”
What is Pickling?
- A serialization process for Python data structures
- Useful for when you need to store an object and retrieve it later.
Well, why not just read in a file (JSON, CSV)? Is this not the same thing?
- Yes and no.
We have likely all experienced the incredible frustration (i.e. blinding rage) that comes with saving and reading in a file. Maybe you saved the dataframe to a .csv, but the data was somehow altered along the way. Now you have the pleasure of starting from scratch to clean your data set, and between fixing delimiters, dropping rows, removing headers, etc., you have wasted valuable time. **Sigh**
Pickling is exactly what it sounds like: an easy way to PRESERVE and reuse almost any Python object.
Intrigued? Here's a quick tutorial:
In this case, we are declaring a dictionary as our object to pickle. It could, however, be almost anything: a dataframe, a matrix, a fitted model.
Three things here:
- Note that in Step 2, the filename does not need to have an extension.
- 'wb' simply stands for write binary (the file will be written as bytes).
- If you would like to specify where the file is being saved to, you can instead swap the ‘filename’ with a file path.
pickle.dump() takes two required arguments:
- the name of the object to be pickled (in this case our dictionary)
- the file where the object is being pickled (outfile)
You will now see your pickled file ‘food.pkl’ saved in the same directory that you are already working in, unless you specified another filepath.
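Put together, the steps described above might look like the sketch below. The dictionary contents are made up for illustration; the names filename, outfile, and food.pkl follow the text.

```python
import pickle

# Step 1: the object to pickle (a made-up dictionary for illustration).
food = {"ice cream": "dessert", "pickles": "snack"}

# Step 2: choose a filename (or a full file path) and open it in
# write-binary mode.
filename = "food.pkl"
outfile = open(filename, "wb")

# Step 3: dump the object into the open file.
pickle.dump(food, outfile)

# Step 4: close the file so everything is flushed to disk.
outfile.close()
```

Writing the middle steps as a with open(filename, "wb") as outfile: block would close the file automatically and save the last step.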
Now, how to unpickle a file:
To unpickle a file we simply do the opposite of what we did to pickle our dictionary in the first place.
- outfile becomes infile,
- 'wb' → 'rb' (meaning read binary),
- pickle.dump() becomes pickle.load(),
- then close the infile.
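A sketch of the reverse trip, under the same assumptions (a made-up dictionary saved as food.pkl). The first few lines only recreate the file so the snippet runs on its own; new_food is a hypothetical name for the restored object.

```python
import pickle

filename = "food.pkl"

# Recreate the pickled file so this snippet is self-contained.
outfile = open(filename, "wb")
pickle.dump({"ice cream": "dessert", "pickles": "snack"}, outfile)
outfile.close()

# Open the file in read-binary mode.
infile = open(filename, "rb")

# Load the object back into memory.
new_food = pickle.load(infile)

# Close the file.
infile.close()

print(new_food)
print(type(new_food))
```

The object comes back as a regular dictionary, with the same keys and values it had when it was pickled.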
Note: This only works for Python; pickle files are not compatible with other programming languages.
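- There is a version of pickle written in C, cPickle, which can be up to 1,000x faster. Very useful for large datasets. In Python 3, the standard pickle module uses this C implementation automatically when it is available.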
- Pickling can prove to be especially useful when working with machine learning algorithms, saving time and energy (both yours and your computer's) from rewriting a model and retraining it all over again.
- While pickling works for most objects, it does not work for all. Lambda functions are one example that can't be serialized; however, there is a workaround: simply use the dill package instead of pickle.
- Be aware! Unpickling can execute arbitrary code, so a pickled file from an unknown source may carry malicious code. As a rule of thumb, only unpickle files you trust, and do not use pickling when the object contains sensitive data, since a pickle file is not encrypted.
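To illustrate the lambda point above, here is a small sketch. It assumes dill is installed (pip install dill) for the second half and skips that part otherwise; square is a made-up example function.

```python
import pickle

# A made-up lambda to try to serialize.
square = lambda x: x * x

# The standard pickle module refuses to serialize a lambda.
try:
    pickle.dumps(square)
    pickle_failed = False
except (pickle.PicklingError, AttributeError):
    pickle_failed = True
print("pickle failed:", pickle_failed)

# dill offers the same dumps()/loads() interface and can handle lambdas.
try:
    import dill
except ImportError:
    dill = None  # dill is a third-party package and may not be installed

if dill is not None:
    restored = dill.loads(dill.dumps(square))
    print(restored(4))
```

The same security caveat applies to dill files as to pickle files: only load ones you trust.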
For more, check out: