Clueless Fundatma: July 2015

Wednesday, July 29, 2015

nohup and disown

GNU Screen is good for "detaching" jobs associated with a particular terminal, so that the job carries on even when, for example, the terminal is killed, or you log out from a server.

"nohup" and "disown" are other useful Linux commands with a far shallower learning curve.

Case A: You know beforehand that you want to run a background job without interruption

nohup foo&
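A slightly fuller sketch of Case A, in case it helps. When its output would go to a terminal, nohup appends it to a file called nohup.out, so it is often handy to redirect the output yourself. Here "sleep 2" is only a stand-in for the real job (foo), and job.log is a made-up file name.

```shell
# "sleep 2" is a stand-in for the real job (foo); job.log is a hypothetical file name
nohup sleep 2 > job.log 2>&1 &
job_pid=$!    # PID of the job just sent to the background

# kill -0 sends no signal; it only checks that the process exists
if kill -0 "$job_pid" 2>/dev/null; then
    job_state="running"
else
    job_state="finished"
fi
echo "job $job_pid is $job_state"
```

Saving the PID from $! lets you check on the job (or kill it) later from the same shell, without having to hunt for it in the output of ps.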

Case B: You submitted several background jobs, but now you want the jobs to persist even after you kill the terminal

foo1&
foo2&
disown

Or, if you want to disown only a particular job, use the "jobs" command to find its job ID, and then run:

disown %1 # disowns only foo1
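Here is a rough bash sketch of Case B that can also be run as a script. It assumes bash (disown is a shell builtin, not a standalone program), uses "sleep 30" as a stand-in for foo1 and foo2, and addresses the job by PID instead of %1 so that it does not depend on job numbering.

```shell
# "sleep 30" stands in for the real jobs (foo1, foo2)
sleep 30 &
pid1=$!
sleep 30 &
pid2=$!

jobs_before=$(jobs -p | wc -l | tr -d ' ')   # jobs the shell is tracking

disown "$pid1"    # like "disown %1" when the first job is job 1, but addressed by PID

jobs_after=$(jobs -p | wc -l | tr -d ' ')    # one fewer than before

# the disowned process is gone from the job table but still running
if kill -0 "$pid1" 2>/dev/null; then
    still_alive="yes"
else
    still_alive="no"
fi
echo "tracked jobs: $jobs_before -> $jobs_after; PID $pid1 alive: $still_alive"

kill "$pid1" "$pid2"    # clean up the demo processes
```

A related variant is "disown -h", which leaves the job in the job table but marks it so that the shell does not pass SIGHUP on to it.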

Here is a nice thread on StackExchange on the difference between nohup and disown.

Thursday, July 23, 2015

Computers, Algorithms, Insight

Michael Nielsen has an interesting piece on "The Rise of Computer-Aided Explanation" in Quanta Magazine.

Thus, we can view both statistical translation and computer-assisted proofs as instances of a much more general phenomenon: the rise of computer-assisted explanation. Such explanations are becoming increasingly important, not just in linguistics and mathematics, but in nearly all areas of human knowledge.

But as smart skeptics like Chomsky and Deligne (and critics in other fields) have pointed out, these explanations can be unsatisfying. They argue that these computer techniques are not offering us the sort of insight provided by an orthodox approach. In short, they’re not real explanations.

The ability to scavenge through "Big Data" and perform extraordinary brute-force computations allows us to find "explanations". But is an explanation really an explanation if a human cannot comprehend it?

Friday, July 17, 2015

Python Links

1. Automate the Boring Stuff:

Discusses topics like interfacing with Excel, Word, and PDF docs, scheduling tasks and emails, manipulating images, etc.

The same author has two free online books on writing simple computer games with Python.

2. A deep dive into matplotlib (Jake Vanderplas)

An obligatory link to two handy libraries built on top of matplotlib, mpltools and seaborn, which add functionality and let you customize the look and feel of matplotlib plots.

3. A keynote talk on the state of the Python stack for scientific computing (HT John D. Cook)

Monday, July 13, 2015

Background and Foreground Jobs in Linux

Linux/Unix lets you control how you interface with jobs quite conveniently. Here is a cheat-sheet I keep for my own use:

Send Foreground to Background

Start foreground job in terminal
Press Ctrl+Z to suspend job
Type "bg" to send it to background
Check background jobs with jobs, top, or ps commands

Example:

$ sleep 1000 # job running in foreground
^Z
[1]+ Stopped sleep 1000

$ bg
[1]+ sleep 1000 &

$ jobs
[1]+ Running sleep 1000 &

Send Background to Foreground

Suppose you have multiple background jobs running
Use jobs command to list them
"fg" brings last background job into foreground
fg %1 brings job #1 listed in output of the jobs command

Example:

$ sleep 200& # submit job #1
[1] 9074

$ sleep 100& # submit job #2
[2] 9075

$ fg # bring job #2 to foreground
sleep 100

^Z # suspend it
[2]+ Stopped sleep 100

$ bg # put it back into background
[2]+ sleep 100 &

$ jobs # check if both jobs running
[1]- Running sleep 200 &
[2]+ Running sleep 100 &

$ fg %1 # bring job #1 to foreground
sleep 200

^C # kill it using Ctrl+C

$ jobs # check to see if job #2 is still running
[2]+ Running sleep 100 &

Tuesday, July 7, 2015

Uncloudy Computing

Cloud computing has been a buzzword in the software industry for quite a few years now. I've often asked my software engineering friends, "what is so special about cloud computing? Haven't we had the client-server model since close to the dawn of the modern computing era?"

Sure, I can now access my files on Dropbox or Google Drive from anywhere, using any device, but I've been remotely connecting to servers for nearly two decades now. It doesn't strike me as a radical idea.

This morning, I was listening to BackStory, when one of the hosts, Brian Balogh, provided a good analogy. He was discussing with historian Bernie Carlson how Nikola Tesla and Edward Dean Adams used alternating current to transmit power from Niagara Falls to Buffalo.

Here is the relevant portion of the conversation from the transcript (emphasis mine):

BERNIE CARLSON: First, all of a sudden, companies could save money, because they could hook up to the grid and buy their energy from the electric company, and they could get rid of having to have big piles of coal in the backyard, in the yard of the factory, and a steam engine that the coal would feed and provide that. So all of the sudden–

BRIAN: Bernie, was this analogous to the cloud in computing, where companies that really have very little to do with computing can now just rely on a source of memory? They don’t have to go invest in huge banks of computers in order to store their data?

BERNIE CARLSON: Absolutely. That’s a perfect analogy. Companies didn’t have to have their own steam engines. They didn’t have to have their own generators. They didn’t have to have all the people that were working around those.

So the key insight (for me) is that the hype around cloud computing is not really around how it affects an individual, whose computing infrastructure may include a laptop, a desktop, a tablet, and a phone. This "infrastructure" is relatively inexpensive to administer and maintain.

It is about how, in principle, businesses no longer have to maintain and operate their own IT departments. Like the transition from local generators to the grid, this form of centralization potentially saves headaches, real-estate, and costs (due to scale and expertise).

If every business operates its own generator or IT system, there is overcapacity: each business sizes its infrastructure to meet its own peak demand, and much of that capacity lies idle during off-peak hours. Centralization facilitates efficient resource allocation.

Of course, the analogy is not perfect; information is not a commodity like electricity. Issues of secrecy and security are unique to information. Software upgrades, unlike electric equipment upgrades at a utility, may cause uneven disruption, etc.

In any case, I found the analogy useful.

Sunday, July 5, 2015

Algorithm Links

1. Can Algorithms Hire Better? (nytimes)

The No Argument

“I look for passion and hustle, and there’s no data algorithm that could ever get to the bottom of that,” said Amish Shah, founder and chief executive of Millennium Search, an executive search firm for the tech industry. “It’s an intuition, gut feel, chemistry.” He compared it to first meeting his wife.

The Yes Argument

“Similarity between the interviewer and interviewee — they’re from the same region, went to the same school, wore the same shirt, ordered the same tea — is hugely influential, even though it’s not predictive of how they perform down the road,” said Cade Massey, who studies behavior and judgment at the Wharton School of the University of Pennsylvania.

2. Can Algorithms Provide Better Financial Advice? (The Economist)

The platforms work by asking customers a few questions about who they are and what they are saving for. Applying textbook techniques for building up a balanced portfolio—more stable bonds for someone about to retire, more volatile equities for a younger investor, and so on—the algorithm suggests a mix of assets to invest in. Nearly all plump for around a dozen index funds which cheaply track major bond or stock indices such as the S&P500. They keep clear of mutual funds, let alone individual company shares.

3. Can Algorithms be Great Investors?

If you've heard of Jim Simons or Ray Dalio, you already know that rule-based investing can produce out-performance over prolonged periods of time.

4. Can Algorithms Replace (Some) Doctors? (techcrunch, EconTalk)

Let’s start with healthcare (or sickcare, as many knowledgeable people call it). Think about what happens when you visit a doctor. You have to physically go to the hospital or some office, where you wait (with no real predictability for how long), and then the nurse probably takes you in and checks your vitals. Only after all this does the doctor show up and, after some friendly banter, asks you to describe your own symptoms. The doctor assesses them and hunts around (probably in your throat or lungs) for clues as to their source, provides the diagnosis, writes a prescription, and sends you off.

The entire encounter should take no more than 15 minutes and usually takes probably less than that. Sometimes a test or two may be ordered, if you can afford it. And, as we all know, most of the time, it turns out to be some routine diagnosis with a standard treatment . . . something a computer algorithm could do if the treatment involved no harm, or at least do as well as the median doctor.

Wednesday, July 1, 2015

Random Number Generators

In preparation for a language- and platform-agnostic course on Markov Chain Monte Carlo, I compiled links to random number generators (for distributions beyond the uniform) on a bunch of different platforms.

Python: NumPy by itself has a formidable list of distributions it can sample from. The scipy.stats module adds even more firepower, not only by increasing the number of standard distributions you can sample from, but also by letting you do neat things like plot the PDF, CDF, etc.
GNU Octave: A fairly extensive list that contains most of the usual suspects comes standard. The "Statistics" package at OctaveForge adds to this set, and like the scipy.stats module lets you do more with standard distributions.
Matlab: Core Matlab has only the barebones RNG capability - essentially uniform and normal distributions. You can enhance it by purchasing the Statistics and Machine Learning Toolbox. Also see John's implementation of RANLIB for Matlab below.

For compiled languages, my colleague John Burkardt has implementations of RANLIB/RNGLIB, which allow you to sample from "Beta, Chi-square, Exponential, F, Gamma, Multivariate normal, Noncentral chi-square, Noncentral F, Univariate normal, random permutations, Real uniform, Binomial, Negative Binomial, Multinomial, Poisson and Integer uniform".

Fortran90: It is worth pointing out that one should perform speed tests before settling on a workhorse RNG. Built-in RNGs may not be the best. Netlib also provides a fair amount of RNG capability in Fortran 77.
C++: John D. Cook also has a standalone implementation, and one that uses TR1. ALGLIB also has a limited set in C++ and a bunch of other platforms.
C: I don't code in C much anymore.