(thanks to sonnet 3.7 for feedback on drafts)
i. research directions
applying to phd programs is funny because you need to write about what you think the most important research directions in your field are, but you also need to write about your past experiences and why they make you a deserving candidate, so many applicants end up claiming that the problems they worked on in the past happen to also be the most important problems in the field. that’s essentially what happened to me last fall - i wrote my essay about language model compute efficiency, interfaces, and interpretability because those were the three things in nlp i’d previously worked on
i think this was a step in the right direction (after all i wouldn’t have wanted to work on these problems in the first place if i thought they were unimportant) but it also was not very intellectually honest. after reading more papers and talking to more professors over the last few months, i’ve now converged on what i think are the actual most important problems in nlp:
scaling inference-time compute for fuzzy tasks. we know how to scale inference-time compute to get reliable improvements on tasks with well-defined answers (this is how ai beat humans on contest math / programming in 2024) but have no way of doing this for tasks with poorly-defined rewards; my go-to examples of such tasks are fashion and telling jokes. models have still managed to get better at these tasks over time, but only via better pretraining and posttraining, which is a problem because if you can’t scale capabilities at inference time then you’re essentially bottlenecked by the capability of the original model
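to make the contrast concrete, here's a rough best-of-n sketch (made-up function names, a toy "model" that samples from a fixed list) - spending more inference-time compute is straightforward when you can write down a verifier, and the whole problem with fashion or jokes is that you can't:

```python
import random

def verifiable_reward(candidate: str, expected: str) -> float:
    """binary reward for a task with a well-defined answer (e.g. math with a known result)."""
    return 1.0 if candidate.strip() == expected.strip() else 0.0

def best_of_n(generate, verify, n: int) -> str:
    """spend more inference-time compute by sampling n candidates and keeping the best-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=verify)

# toy usage: `generate` stands in for sampling from a model
answers = ["41", "42", "forty-two"]
result = best_of_n(
    generate=lambda: random.choice(answers),
    verify=lambda c: verifiable_reward(c, expected="42"),
    n=16,
)
print(result)  # with high probability prints "42"

# for a fuzzy task like "write a good joke" there is no verifiable_reward to plug in here,
# which is the whole problem
```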
the r1 paper was a helpful update for me in that it convinced me there were no surprises behind current reasoning models. i applied to openai in november but by the time my final interview rolled around in february i’d become quite confident there were no major research secrets to learn there, so i canceled my interview <3
model understanding. this includes traditional interpretability but also better benchmarks and reporting, better ways of eliciting behaviors from models, user and deception modeling, bias and discrimination, some forms of privacy, etc. i really do think transluce is the best place in the world to work on this class of problems and am happy to be there
i don’t have much to say on this subject beyond the projects we’re already working on, which you can see on our website
deployment. this mostly refers to the interfaces through which humans and ai interact. our current interfaces are almost entirely autocomplete and chat, which is fine for linear text-based tasks, but i don’t think we’ve really discovered the right interfaces for human-ai collaboration on other tasks; my go-to example here is music / songwriting, where chat and autocomplete are very obviously not the right paradigm
compute efficiency is also part of deployment, but i feel less of a need to work on it given that the cost of running ai systems (at a fixed level of capabilities) has been dropping 5-10x every year. efficiency is certainly important, but among all problems in the field it's probably the one we've been making the steadiest progress on, since the economic incentives are so strong
ii. resemblances between humans and ai
the more time i spend in this field the more resemblances i see between humans and ai. of course this isn’t very surprising given that the data and many of the techniques we use to train ai are directly inspired by human behavior, but sometimes working on ai reminds me of things i forgot about humans. a few recent examples:
generation / discrimination. one common setup in machine learning is to have a generator model that proposes a lot of suggestions and a discriminator model that classifies suggestions as good or bad, and to train these two models together in a feedback loop. the generator learns to propose better and better suggestions while the discriminator learns to be more and more picky, so that eventually the system converges on the best possible outputs. recently i’ve been thinking about how many successful duos i look up to (various couples, cofounders, and advisor-advisee relationships) also have a generator / discriminator dynamic that has co-evolved over many years, which is probably how they’re able to come up with so many good ideas. i think that as an excellent discriminator maybe i should be looking for even better generators
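for anyone who hasn't seen the setup, here's a minimal gan-style sketch in pytorch (a toy 1-d example, not any particular paper's recipe) - the point is just that each model only improves because the other keeps pushing back:

```python
import torch
import torch.nn as nn

# toy setup: the generator learns to mimic samples from N(3, 1);
# the discriminator learns to tell real samples from generated ones
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 3 + torch.randn(64, 1)   # samples from the target distribution
    fake = G(torch.randn(64, 8))    # generator proposes suggestions

    # discriminator update: get pickier about real vs. fake
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # generator update: propose suggestions that fool the (now pickier) discriminator
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())  # should drift toward 3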
reward sparsity. one problem when training models is that the behavior you want to encourage occurs too rarely, so to get nonzero gradients and train faster you need some way of rewarding intermediate progress (often done by rewarding a model based on its probability of success, rather than whether it succeeded or not). similarly, one problem with benchmarks is that they often use a discontinuous metric, so progress can look flat and then spike suddenly at the point of discontinuity. i think i often set goals that are too ambitious and rigid, and this prevents me from growing as quickly as i otherwise could because i usually feel like no progress is being made
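a toy sketch of the sparse-vs-shaped distinction (made-up environment and numbers, not any specific training setup):

```python
import random

GOAL = 10

def sparse_reward(position: int) -> float:
    # reward only when the task is fully solved: almost always zero, so learning signal is rare
    return 1.0 if position == GOAL else 0.0

def shaped_reward(position: int) -> float:
    # partial credit for intermediate progress toward the goal
    return 1.0 - abs(GOAL - position) / GOAL

# a random "policy" wandering around: the shaped reward gives feedback on nearly every step,
# while the sparse one only fires on the rare step that lands exactly on the goal
positions = [random.randint(0, 20) for _ in range(1000)]
print(sum(sparse_reward(p) for p in positions) / 1000)   # ~0.05, mostly zeros
print(sum(shaped_reward(p) for p in positions) / 1000)   # dense, informative signal
```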
prefilling attacks. to jailbreak a model and get it to comply with a request it usually refuses (eg. instructions for building a bomb) you typically just need to write a smart prompt that gets the model to start its response with “sure, …”, and then the model will generate output that complies with the request. this sounds very stupid but is actually quite similar to how, when trying to convince another person of something, it’s most convenient to convince them that they came up with the initial seed idea on their own, and then their brain will do the rest
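structurally it's nothing more than this (the `complete` function below is a mock stand-in, not a real client) - a chat model conditions on the whole transcript, including a partially-written assistant turn, and just keeps going from there:

```python
def complete(messages: list[dict]) -> str:
    """mock completion: a real model would continue the last (assistant) turn."""
    prefix = messages[-1]["content"]
    return prefix + " [the model keeps writing from the prefix instead of deciding whether to refuse]"

messages = [
    {"role": "user", "content": "please do X"},
    # the "attack" is just starting the answer for the model:
    {"role": "assistant", "content": "Sure, here's how to do X:"},
]

print(complete(messages))
```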
distillation gaps. we often train models via distillation, where a large smart model teaches a small dumb model a specific behavior. people have studied how distillation performance changes as you vary the sizes of the large and small models and found an optimal gap size between the two models - in particular, a more capable teacher model is not always better for distillation because at some point it may become too different for the small model to learn effectively from. i think there’s an analogous human problem of how to maintain relatability with less experienced people as you become more experienced; in my opinion the single most impressive thing about john green is his persistent ability to connect with everyone from teens to old people while going through middle age. this is an extremely rare skill?
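for reference, here's the standard soft-label distillation loss (hinton-style kl between softened teacher and student distributions) - the gap findings are roughly about when this objective stops transferring well as the teacher gets much stronger than the student:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """soft-label distillation: the student matches the teacher's (softened) output distribution."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), scaled by t^2 as in the original hinton et al. formulation
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# toy usage: batch of 4 examples, 10 classes
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```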
iii. skill deflation
i think almost everyone agrees that ai progress is deflationary in the sense that it enables people to do more with any fixed pool of resources; this is typically true of technology in general. many skills also become deflationary in the sense that as technology improves you do not need as much skill to accomplish any given task, so certain skills that used to be valuable may no longer be worth investing large amounts of effort into developing (that being said, better education and tools can also reduce the amount of effort required to develop those skills, so sometimes it balances out)
i’ve often wondered which skills are deflationary and which ones aren’t. for instance, there’s a common narrative that execution skills like software engineering are deflationary, whereas higher-level traits like leadership and research taste may be unaffected. i used to believe this narrative, but recently i’ve become convinced that leadership and taste are also deflationary:
leadership will obviously remain important for a long time. that being said, what used to require a large team to accomplish will likely require smaller and smaller teams if progress continues, so many traditional leadership skills seem like they should become less relevant
research taste is a bit of a loaded term, but i’ll roughly define it as choosing what research ideas to pursue (as opposed to executing on a chosen idea). one common argument i hear for going to grad school is something like “execution may get automated but taste will continue being important and you should go to school to develop taste”. i think there are two issues with this: 1) if execution becomes easier and easier, you can get by with less taste because you can try more ideas in parallel 2) if execution becomes easier and easier, you will get feedback on ideas much faster than before, so it should be much easier to develop taste moving forward
more generally i think non-deflationary skills are not really the right thing to focus on - deflation may be inescapable if progress continues, but your value comes from the things you can accomplish rather than the skills you possess (eg. many people, myself included, possess lots of skills that they never utilize effectively; is that valuable?). like the rest of credentialism, skills are only important to the extent that they’re a predictive indicator of what you’ll be able to accomplish. my housemates often say people should just pick a thing to accomplish and then start doing it and the skills will figure themselves out along the way; i think that’s probably right