Emulating Amplifies Learning, Tools for Sharing Work
Hi everyone! 👋
Happy Tuesday! This is a piece about an epiphany I had this past week. It’s one that seems obvious in hindsight, but until you actually do it, you don’t fully appreciate its power.
Let’s get to it.
Emulating Amplifies Learning
Imagine reading an incredible essay or newsletter by an author you admire, or reading the source code of a library you love to use. Most of the time I read the material, learn something new, maybe take notes, and maybe reference them later. I’m starting to realize that might not be enough, and that I’m leaving a lot on the table.
I saw an old tweet from Andrej Karpathy, one of the foremost AI practitioners and the former Director of AI at Tesla, announcing a 2.5-hour YouTube tutorial on the basic concepts of neural networks. I immediately resolved to watch it in its entirety when I got home.
I not only watched the entire video tutorial, but I also paused it every 5-10 minutes, typed out every line of code, tinkered with it, and tried to put myself in his shoes.
Why did he write the Python class this way?
Why did he explain backpropagation the way he did?
How did he arrive at this explanation that is so simple to understand?
I now understand the basics of neural networks at a deeper level. My aha moment was thinking of a neural network as a computational graph with the following types of nodes:
a scalar node holds a numeric value and a numeric gradient
an operation node performs an operation, such as addition or multiplication, that converts its input sibling scalar nodes into an output scalar node
The numeric gradient tells a scalar node how to update its value to move toward some ideal prediction. Calculating gradients is the essence of all neural networks, whether it’s a toy model or GPT-3. No matter how big and complicated the model is, calculating the gradient for each node requires just the following (a concrete sketch follows the list):
the values of the sibling nodes
the operator node it feeds into
the gradient of the output node that the sibling and operator nodes created
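To make this concrete, here is a minimal sketch of a scalar node in the spirit of the tutorial’s micrograd-style Value class. This is my own simplification, not Karpathy’s exact code:

```python
# A minimal micrograd-style scalar node (a sketch, not the tutorial's exact code).
# Each Value holds a numeric value, a numeric gradient, and enough bookkeeping
# to backpropagate through the operation that produced it.

class Value:
    def __init__(self, data, _children=(), _op=""):
        self.data = data               # the numeric value of this scalar node
        self.grad = 0.0                # d(output)/d(this node), filled in by backprop
        self._backward = lambda: None  # how to push out.grad back to the inputs
        self._prev = set(_children)    # the sibling scalar nodes that fed in
        self._op = _op                 # the operator node that produced this value

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), "+")
        def _backward():
            # For addition, the output gradient flows to each input unchanged.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), "*")
        def _backward():
            # For multiplication, each input's gradient is its sibling's value
            # times the output gradient (the chain rule, applied locally).
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out
```

For example, with w1 = Value(-3.0) and x1 = Value(2.0), setting out = w1 * x1 and out.grad = 1.0, then calling out._backward(), leaves w1.grad == 2.0 and x1.grad == -3.0: each input’s gradient is exactly its sibling’s value times the output’s gradient.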
Below is a visualization of a section of the computational graph. w1, x1, w2, x2 are the input scalar nodes, the * nodes are the operator nodes, and the result is two output nodes that are themselves input scalar nodes further downstream in the computational graph. This visualization is part of the neural network code you write in the tutorial.
Pedagogy is baked into the tutorial code. How cool is that?!
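For the curious, here is a rough sketch of how a visualization like this can be generated with the graphviz Python package. It is loosely modeled on the tutorial’s drawing helper, but the function name and traversal details are my own, and it assumes the toy Value class sketched above:

```python
# A rough sketch of a computational-graph visualizer using the graphviz package.
# Loosely modeled on the tutorial's drawing helper; the details are my own.
from graphviz import Digraph

def draw(root):
    dot = Digraph(graph_attr={"rankdir": "LR"})  # left-to-right, like the video
    # Walk the graph backward from the output, collecting nodes and edges.
    nodes, edges, stack = set(), set(), [root]
    while stack:
        v = stack.pop()
        if v in nodes:
            continue
        nodes.add(v)
        for child in v._prev:
            edges.add((child, v))
            stack.append(child)
    for v in nodes:
        uid = str(id(v))
        # Each scalar node displays its value and its gradient.
        dot.node(uid, label=f"data {v.data:.4f} | grad {v.grad:.4f}", shape="record")
        if v._op:
            # Operator nodes ('+', '*') get their own small node feeding the output.
            dot.node(uid + v._op, label=v._op)
            dot.edge(uid + v._op, uid)
    for a, b in edges:
        dot.edge(str(id(a)), str(id(b)) + b._op)
    return dot
```

Calling draw(out) on the toy example above renders each scalar node with its data and grad, with small + and * operator nodes wired in between.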
Before the tutorial, I thought of neural networks as chained matrix multiplications, with a library like PyTorch magically calculating all the gradients. Stepping through the tutorial, building a toy neural network node by node, and inspecting how the values and gradients change as the network trains gave me a huge boost in my understanding of the inner workings.
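For contrast, here is the same two-input product written with PyTorch’s real autograd API, which does the gradient bookkeeping that the toy Value class makes explicit:

```python
import torch

# The same w1 * x1 example, with PyTorch tracking gradients for us.
w1 = torch.tensor(-3.0, requires_grad=True)
x1 = torch.tensor(2.0, requires_grad=True)
out = w1 * x1
out.backward()  # backpropagate from the output; fills in .grad on the leaf tensors
print(w1.grad.item(), x1.grad.item())  # 2.0 -3.0, matching the toy Value class
```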
I felt I was able to achieve this boost in understanding because 1) I was willing to obsess over every detail in the tutorial and 2) Andrej Karpathy presented the material in such a logical and coherent way that it was easy to follow. It felt like you could reach a flow state going through the tutorial in great detail. The cycle was:
Watch Andrej Karpathy type out new code and explain it.
Pause the video and type out the new code myself in my own coding environment.
Inspect the results and play with the new code. This is arguably the most important step.
Unpause and repeat the cycle.
It was going through this cycle that I gained a better appreciation for what it means to know something so deeply that you can unlock deeper levels of understanding for others on the same topic.
I’m adding emulation to my pedagogical toolkit, and I plan to type out some essays by my favorite writers to improve my writing skills as well.
Tools for Sharing Work
Let me just start off by saying that GitHub is incredible for sharing software. But it doesn’t serve one very valuable use case: an easy way to demonstrate your incredible software with an interactive user interface.
Enter Streamlit.
Streamlit allows anyone to build data apps with just Python scripts. Under the hood, the Python script renders React components, so Streamlit acts as a bridge between data scientists, who primarily work in Python, and React. I wrote about my experiences further in my blog.
I think the biggest unlock for a tool like Streamlit is that it removes friction for Python developers, especially those in the data and ML space who may not have front-end experience, to share their work.
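To give a flavor of what that looks like, here is a minimal, hypothetical Streamlit app (my own toy example, not one from my blog post). Saved as app.py, it runs with streamlit run app.py and renders an interactive slider and chart in the browser:

```python
# A minimal, hypothetical Streamlit data app: an interactive random-walk plot.
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk demo")
steps = st.slider("Number of steps", 10, 1000, 100)  # interactive widget
walk = np.cumsum(np.random.randn(steps))             # toy data to chart
st.line_chart(pd.DataFrame({"position": walk}))      # rendered as a React chart
```

Everything here is plain Python; Streamlit reruns the script on each interaction and turns it into a web UI, which is exactly the friction removal described above.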
Interesting reads and listens from the past week
Sam Altman (CEO of OpenAI) talks about opportunities in the AI space. I agree with him that there’s a huge opportunity in tailoring pre-trained models for custom domains and tasks, rather than companies building models from scratch, which would require collecting massive datasets.
Packy talks about writing down the future you envision and reverse-engineering the steps to get there. He likens the process to the Amazon future press release exercise.
Great blog post from Hugging Face that describes how to run Large Language Models without having a lot of expensive machines on hand.