Friday, June 24, 2016

Good and Bad Metrics

We learnt not that long ago that 66 new journals were banned by Thomson-Reuters for abusing impact factors by excessive self-citation. While the crimes committed by these journals may have been egregious, the subtle, and sometimes not-so-subtle, abuse of impact factors is pervasive.

Curiously, I don't find the crimes surprising. In fact, I would be surprised if such manipulations did not occur.

If someone (Thomson-Reuters) tells you, "I will measure your performance by this simple yardstick," and that metric has real consequences (whether libraries buy your journal), then clearly you (the publisher) are going to do everything you can to push that metric as high as you can.

If you build a simple metric or index to quantify complex stuff (academic ranking of universities, IQ to measure smartness, student performance on standardized tests to determine teacher pay etc.), which is linked to a "real" prize, you can rest assured that your metric will be gamed. As I have said before:
I gather this fascination has something to do with our inability to grapple with multidimensional complexity. We try to project a complex high-dimensional space onto a simple scalar. We like scalars because we can intuitively compare two scalars. We can order them, plot them on graphs, and run statistics on them with ease.
We can step outside the academic realm and look at a few examples.

A simple metric works best when the "thing" being measured is simple. For example, in a 100m sprint or the high jump, the only thing you care about is speed or height, respectively. I think of the underlying stuff as being "one-dimensional". There is nothing to game here (no pun intended); if you can run faster, you deserve to be champion.

One example where a simple metric of a somewhat complex thing actually works alright is Google PageRank. Before Google came along with the really cool idea of "one link, one vote", which reduced the complex task of organizing the relative importance of websites to solving an eigenvalue problem, web search was really hit or miss.
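To make the "eigenvalue problem" a little more concrete, here is a minimal sketch of the textbook PageRank power iteration. It is only an illustration, not what Google actually runs: the damping factor of 0.85, the function name `pagerank`, and the four-page toy web are all assumptions made up for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Textbook power iteration. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess

    for _ in range(max_iter):
        # Rank sitting on pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # "One link, one vote": each page splits its rank among its targets.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the ranks have settled
            break
    return rank


# Hypothetical toy web: three pages link to "c", nobody links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The ranks that the loop settles on are the entries of the dominant eigenvector of the (damped) link matrix, which is why the whole ordering problem collapses to a single number per page. It also hints at how the metric can be gamed: adding more pages that point to "c" pushes its score up regardless of what "c" actually contains.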

When people did not know about the metric (PageRank in this case), it worked beautifully. Once the metric was public knowledge, and Google became a virtual monopoly in this business, the business of "search engine optimization" (SEO), which seeks to game the metric, suddenly became very lucrative.

Now Google has to do secret stuff to keep the abusers out. Their efforts, and the consolidation of the web (Wikipedia is the #1 link almost always), have helped, so that the metric has not been completely compromised (or so I think, since I don't know what Google doesn't show me).

This is a useful example, since it exposes the conflict that Google has to manage between Google Search users (who want the most relevant results to surface to the top) and websites (which want to surface to the top, regardless of relevance).

A final example, whose story perhaps has the greatest relevance to academic short-cut metrics, is the somewhat whimsical metric of FICO credit scores in the US. These scores are supposed to determine a person's credit-worthiness, and have real significance if you want to get a mortgage or car loan. A pesky problem with the score is that it treats a perfectly frugal person, who pays her bills on time (in cash) and has never taken on any form of debt, with contempt.

A bigger problem is the reliability of the score (see Credit Scores: Not-so-Magic-Numbers), perhaps because the key ingredients that go into the score are reasonably well-known and easily gamed. One of the heartening reactions:
Golden West Financial (WB), a longtime FICO skeptic, is one of the few mortgage lenders to minimize its use in recent years—and it credits that decision for its below-average mortgage losses. Now a subsidiary of Wachovia (WB), Golden West's delinquency rate on traditional mortgages is running at 0.75%, vs. 1.04% for the industry. Richard Atkinson, who oversees part of Golden West's mortgage unit from San Antonio, says the bank calls to verify employment, examines a borrower's stock holdings and other assets, and employs a team of appraisers who are judged not by the volume of loans but by the accuracy of the appraisal over the life of the loan. "The way we do business is a lot more costly, and cost was a big reason many competitors embraced credit scoring," he says. "But some of our best borrowers had low FICO scores and our worst had FICO scores of 750."
How great it would be if we academics adopted a similar approach. 
