Interviewing engineers is hard because it's adversarial

2023-05-18

Hiring is one of the most important functions in any company. The people who are hired end up deciding who else is hired (etc), and this basically determines what kind of company it becomes.

For software companies, the stakes for hiring engineers are very high. Software is cumulative and subject to multiplicative compounding: having software sooner lets companies do more, better and faster, including building software. Software engineering is one of the few disciplines where practitioners can increase their own productivity by building better software tools for themselves with their existing skills. But this means that a bad hire that leads to a delay in the development schedule can almost never be "made up" in the future, because during the delay the company doesn't have the benefits of the finished software. Bad hires literally make the company permanently smaller, slower, and less successful than it would have been otherwise.

Most software engineers hate interviewing and correctly point out that almost all interviews in this industry are uniformly bad. They're a stressful time sink, require too many touchpoints over too long a timeline, are often opaque, test skills that barely relate to the actual job, produce seemingly arbitrary results, and, worse, are often influenced by factors unrelated to qualifications.

The terrible state of interviewing is sort of an idle obsession in our industry; there are many opinions but very little in the way of rigorous analysis or discussion. In my experience the most commonly overlooked consideration is that interviewing is necessarily adversarial: the two sides involved have different incentives and goals. Hiring orgs want to find and hire the best candidate for their given budget. Candidates want an offer - the highest one possible, regardless of whether they are the best fit (and often, whether they are even qualified for the role). This tension makes it hard for either side to approach the problem with a 100% collaborative mindset, which is why most naive ideas for how to make interviews "better" either fail to scale, or would never work in the first place.

So here are just a few considerations that must be accounted for when designing interviews:

Factors to consider when designing an interview

Total ordering: You might be forgiven for thinking an ideal interview has a specific and clearly delineated "bar" that a candidate needs to clear to receive an offer. They either do well enough to get an offer, or they don't. Theoretically this maps to a boolean. In practice, this needs to be an integer, because what if multiple candidates pass and there's only one open position? You need a way to rank candidates, not just pass/fail them.
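To make the boolean-versus-integer point concrete, here is a minimal sketch. The candidate names, the scores, and the bar of 70 are all made up for illustration; a real rubric would be more involved. It only shows that a pass/fail check can't pick between two candidates who both clear the bar, while a score can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: int  # hypothetical rubric total; higher is better


BAR = 70  # hypothetical passing threshold, chosen only for this example


def passes(candidate: Candidate) -> bool:
    """The boolean view: did the candidate clear the bar?"""
    return candidate.score >= BAR


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """The integer view: order candidates from strongest to weakest."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


pool = [Candidate("A", 82), Candidate("B", 74), Candidate("C", 61)]

# Two candidates pass, so pass/fail alone can't fill a single opening.
passed = [c for c in pool if passes(c)]

# Ranking the candidates who passed picks one.
offer = rank(passed)[0] if passed else None
```

With one open position, `passed` contains both A and B, and only the ranking says which of them gets the offer.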

Cheating: Glassdoor (etc) makes it extremely easy to cheat by "studying to the test" which causes at least two problems: selectivity unintentionally drifts down over time (more candidates will pass, even though the interview as designed should've failed them), and it becomes difficult to fairly compare candidates, because some probably cheated and some didn't. It's hard to counter this; basically you have a sliding scale where you're always trying to balance "freshness of your questions to mitigate cheating" and "consistency so the questions are 1) of similar difficulty over time and 2) make relative comparisons easier".
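The drift is easy to see with some back-of-the-envelope arithmetic. This is a toy model with invented numbers, not data: assume candidates seeing a question cold pass 20% of the time, candidates who studied the leaked question pass 80% of the time, and the share of candidates who have seen it grows as it circulates.

```python
def observed_pass_rate(cold_rate: float, prepped_rate: float, leaked_share: float) -> float:
    """Blend the pass rates of candidates who saw the question cold and
    candidates who studied it in advance, weighted by how many did each.

    All three inputs are hypothetical fractions between 0 and 1.
    """
    return (1 - leaked_share) * cold_rate + leaked_share * prepped_rate


# Illustrative only: the same question, as more of the pool has seen it.
for leaked_share in (0.0, 0.25, 0.5):
    rate = observed_pass_rate(cold_rate=0.2, prepped_rate=0.8, leaked_share=leaked_share)
    print(f"{leaked_share:.0%} of candidates prepped -> {rate:.0%} pass")
```

Nothing about the question or the grading changed, but under these assumptions the pass rate climbs as the leaked share does, and two candidates with the same result may not have the same ability.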

Selectivity: Every interview question should give the candidate more opportunities to demonstrate they are a good fit, and this means questions should be calibrated to the expectations of the role. But in practice it's often necessary to develop a question to weed out clearly unqualified candidates - that is to say, passing the question is necessary but not sufficient for an offer. Suppose you're hiring a senior machine learning engineer. If your goal is "minimize the risk that I proceed with someone from the bottom 30% of the candidate pool that has objectively never done any machine learning" then "explain to me in your own words how a support vector machine works" might suffice as a screener. Arriving at the "correct" difficulty for screener and main interview questions is a difficult problem, and the candidate pool is not static: in general you'll find candidates magically get better at answering questions over time because of Glassdoor.

Repeatability & scalability: Interviews need to be somewhat repeatable and controlled, because that's a prerequisite for targeting a specific selectivity and getting a total ordering. Companies can't really be like "let's hire people strictly on the strength of their GitHub profile." Not everyone uses GitHub. Not everyone has the time or inclination to make contributions or do them publicly. How would you compare someone with a lot of C++ side projects to someone else who helps write a lot of documentation in the JavaScript ecosystem? Overall this is just really difficult to scale and calibrate and grade. In a related sense, repeatability is highest when isolating variables and removing extraneous details, which is naturally in tension with conducting an interview that "mirrors an engineer's day-to-day." You often can't rely too much on something from the candidate's past, so asking candidates to perform some task or challenge often ends up as part of a well-rounded interview process.

Trainability: A good heuristic to go on is "if you can't explain what a good answer is and why it's a good answer then you haven't thought rigorously enough about it, or it's a bad question." You need to be able to train people to give the interview question properly, answer clarifying questions, ask appropriate followups, and most important - grade candidates fairly. Since a good question doesn't ask for a factoid that someone can memorize and regurgitate, this can be much harder than it seems. Forcing yourself to actually put pen to paper and lay out what, exactly, is a "good/better/best" answer without accidentally encoding some flaw into it (like implied knowledge of some random factoid) can be surprisingly difficult.

Selection bias / pain tolerance: Given how high stakes hiring is, it's safe to predict that companies would take much more time to interview candidates if they could have unlimited time, money, and candidate patience. But sought-after engineers have lots of options, so they get snapped up quickly. The engineers who are *most* willing to jump through 8-round interviews in a good economy are (statistically speaking) going to be the ones who are mediocre-to-bad. Joel Spolsky famously complained about this almost 2 decades ago (until he switched to complaining about how much Facebook paid). It's the "market for lemons" principle: this person must not be very good, because they're interviewing! This results in an equilibrium where most companies know they have, at best, 3-4 hours of interviewing to decide whether to make an offer or not, because they know that longer processes will repel good candidates and only leave bad candidates, overwhelming any benefits of higher confidence hiring decisions.

"I would just do x"

Most people who think there is a Better Way to interview software engineers have an interview process in mind that fails at least one of these considerations. Again, interviewing is adversarial, so you can't assume candidates won't try to game the process [1], or that they are representing their skills and work experience accurately. At the same time, being mindful of time constraints is necessary to minimize pain and avoid repelling good candidates with arduously long funnels. This limits freedom of movement quite a bit when it comes to designing a process that is both fair and effective.

This problem always settles into an equilibrium where smaller firms have an edge: it doesn't matter that their interview process isn't very scalable because they're not operating at scale. But once companies get big they all seem to basically revert to leetcode, which is evidence that the currently-innovative practices don't scale up well.

Is it possible to do better than the current meta? Will we ever move beyond "system design challenges on whiteboards" and "leetcoding exercises on CoderPad?" I'm sure we'll get there. Once something is both 1) clearly better and 2) durably works at scale, market forces essentially guarantee it'll be copied until it takes over. But right now it's not clear to me that the most vocal complainers have anything to offer in terms of (repeatable, scalable, sufficiently Glassdoor-proof) alternatives.



[1] Candidates on Zoom calls cheating with a coach next to them (or in their ear) is a thing; feel free to ask me how I know.