INTERVIEW: Polysemanticity w/ Dr. Darryl Wright
Darryl and I discuss his background, how he became interested in machine learning, and a project we are currently working on that investigates penalizing polysemanticity during the training of neural networks.
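The episode does not spell out the exact penalty we use, but a minimal sketch may help frame the discussion. The example below assumes an L1-style sparsity penalty on hidden activations added to the task loss, a common way to discourage individual neurons from responding to many unrelated features; the model, coefficient, and helper names are hypothetical, not the method from the project.

```python
# Illustrative sketch only: assumes an L1 sparsity penalty on hidden
# activations as a stand-in "polysemanticity penalty". All names and
# hyperparameters here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=64, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = F.relu(self.fc1(x))        # hidden activations to be penalized
        return self.fc2(h), h

model = SmallMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
penalty_weight = 1e-3                  # hypothetical coefficient

def training_step(x, y):
    logits, h = model(x)
    task_loss = F.cross_entropy(logits, y)
    # Sparsity term: encourages each input to activate only a few hidden
    # units, one rough proxy for reducing polysemantic neurons.
    sparsity_penalty = h.abs().mean()
    loss = task_loss + penalty_weight * sparsity_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```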
Chapters
01:46 - Interview begins
02:14 - Supernovae classification
08:58 - Penalizing polysemanticity
20:58 - Our “toy model”
30:06 - Task description
32:47 - Addressing hurdles
39:20 - Lessons learned
Links
Links to all articles and papers mentioned throughout the episode are listed below, in order of appearance.
- Zooniverse
- BlueDot Impact
- AI Safety Support
- Zoom In: An Introduction to Circuits
- MNIST dataset on PapersWithCode
- MNIST on Wikipedia
- Clusterability in Neural Networks
- CIFAR-10 dataset
- Effective Altruism Global
- CLIP Blog
- CLIP on GitHub
- Long Term Future Fund
- Engineering Monosemanticity in Toy Models