Part 2: For Today’s Graduate, Just One Word: Statistics
The New York Times - August 6, 2009
MOUNTAIN VIEW, Calif. — At Harvard, Carrie Grimes majored in anthropology and archaeology and ventured to places like Honduras, where she studied Mayan settlement patterns by mapping where artifacts were found. But she was drawn to what she calls “all the computer and math stuff” that was part of the job.
“People think of field archaeology as Indiana Jones, but much of what you really do is data analysis,” she said.
Now Ms. Grimes does a different kind of digging. She works at Google, where she uses statistical analysis of mounds of data to come up with ways to improve its search engine.
Ms. Grimes is an Internet-age statistician, one of many who are changing the image of the profession as a place for dronish number nerds. They are finding themselves increasingly in demand, and even cool.
“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”

The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore — sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.

Yet data is merely the raw material of knowledge. “We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”

The new breed of statisticians tackle that problem. They use powerful computers and sophisticated mathematical models to hunt for meaningful patterns and insights in vast troves of data. The applications are as diverse as improving Internet search and online advertising, culling gene sequencing information for cancer research and analyzing sensor and location data to optimize the handling of food shipments.

Even the recently ended Netflix contest, which offered $1 million to anyone who could significantly improve the company’s movie recommendation system, was a battle waged with the weapons of modern statistics.

Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics.

They are certainly welcomed in the White House these days. “Robust, unbiased data are the first step toward addressing our long-term economic needs and key policy priorities,” Peter R. Orszag, director of the Office of Management and Budget, declared in a speech in May. Later that day, Mr. Orszag confessed in a blog entry that his talk on the importance of statistics was a subject “near to my (admittedly wonkish) heart.”

I.B.M., seeing an opportunity in data-hunting services, created a Business Analytics and Optimization Services group in April. The unit will tap the expertise of the more than 200 mathematicians, statisticians and other data analysts in its research labs — but that number is not enough. I.B.M. plans to retrain or hire 4,000 more analysts across the company.

In another sign of the growing interest in the field, an estimated 6,400 people are attending the statistics profession’s annual conference in Washington this week, up from around 5,400 in recent years, according to the American Statistical Association. The attendees, men and women, young and graying, looked much like any other crowd of tourists in the nation’s capital. But their rapt exchanges were filled with talk of randomization, parameters, regressions and data clusters. The data surge is elevating a profession that traditionally tackled less visible and less lucrative work, like figuring out life expectancy rates for insurance companies.

Ms. Grimes, 32, got her doctorate in statistics from Stanford in 2003 and joined Google later that year. She is now one of many statisticians in a group of 250 data analysts. She uses statistical modeling to help improve the company’s search technology.
For example, Ms. Grimes worked on an algorithm to fine-tune Google’s crawler software, which roams the Web to constantly update its search index. The model increased the chances that the crawler would scan frequently updated Web pages and make fewer trips to more static ones.

The goal, Ms. Grimes explained, is to make tiny gains in the efficiency of computer and network use. “Even an improvement of a percent or two can be huge, when you do things over the millions and billions of times we do things at Google,” she said.
It is the size of the data sets on the Web that opens new worlds of discovery. Traditionally, social sciences tracked people’s behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” said Jon Kleinberg, a computer scientist and social networking researcher at Cornell.

For example, in research just published, Mr. Kleinberg and two colleagues followed the flow of ideas across cyberspace. They tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that scanned for phrases associated with news topics like “lipstick on a pig.”

The Cornell researchers found that, generally, the traditional media leads and the blogs follow, typically by 2.5 hours. But a handful of blogs were quickest to quotes that later gained wide attention.

The rich lode of Web data, experts warn, has its perils. Its sheer volume can easily overwhelm statistical models. Statisticians also caution that strong correlations of data do not necessarily prove a cause-and-effect link.

For example, in the late 1940s, before there was a polio vaccine, public health experts in America noted that polio cases increased in step with the consumption of ice cream and soft drinks, according to David Alan Grier, a historian and statistician at George Washington University. Eliminating such treats was even recommended as part of an anti-polio diet. It turned out that polio outbreaks were most common in the hot months of summer, when people naturally ate more ice cream, showing only an association, Mr. Grier said.
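
The polio and ice cream story can be reproduced with a small simulation. The sketch below is illustrative only: the numbers are invented, not historical data, and the names (temperature, ice_cream, polio, pearson, residuals) are assumptions made for the example. Temperature drives both series, the two series never influence each other, and yet they correlate strongly until the effect of temperature is removed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented data: temperature is the hidden common cause of both series.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
polio = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson(ice_cream, polio)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(polio, temperature))

print(f"raw correlation:            {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

With these made-up settings the raw correlation comes out strong, while the correlation between the temperature-adjusted series sits near zero, which is the pattern Mr. Grier describes: an association without a cause-and-effect link.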

If the data explosion magnifies longstanding issues in statistics, it also opens up new frontiers.
“The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd,” said Daniel Gruhl, an I.B.M. researcher whose recent work includes mining medical data to improve treatment. “And that makes it easier for humans to do what they are good at — explain those anomalies.”

What is Statistics?
 by Jordan Neus (from http://www.fiu.edu/~neusj/whatisstatistics.html)

Statistics is becoming increasingly important in modern society.  We are constantly bombarded with charts, graphs, and statistics of various types, all meant to give us succinct information for making decisions.  Sometimes this information is presented in a manner meant to sway us toward a particular view.  As consumers and decision makers we must be aware of this.  Which drug should we take?  Which car should we buy?  Where will the economy go?  Who is infected with a particular deadly disease?  These are all examples of questions which are usually relegated to the statistician for analysis and dissemination.  This lecture will attempt to introduce the beginning student to some of the reasoning behind the necessity of statistical inference.

In order to realistically understand the subject of Statistics it is important to appreciate the rationale behind why and how Statistics is used by the world at large.  That is, why do we need Statistics anyway?  This, perhaps, is a bit philosophical, yet I cannot overemphasize the need for thinking along these lines.  Without proper perspective, Statistics becomes a mere mathematical exercise, diverging from the true nature of the subject.

In order to begin our analysis as to why Statistics is a necessary type of reasoning we must begin by addressing the nature of science and experimentation.  A characteristic method used by scientists is to study a relatively small collection of objects, say 2500 people, and a characteristic, say longevity, and through experimentation or observation, draw a conclusion appropriate for the entire class of objects (i.e. people, in general).  For example, suppose a study published results suggesting people who own pets live longer.  Would this mean that all people who own pets are likely to live long lives?  Does owning a pet cause longevity?  Suppose the people in the study, by chance, were on the whole, very healthy people, and therefore lived long lives: Would this invalidate the researcher’s assertion that people who own pets live longer?  The obvious problem with this type of reasoning is that these issues can never be proved absolutely.  This type of scientific reasoning is called inductive reasoning and is inherently flawed.  One can never study a sample and expect conclusions to hold true for the entire population with absolute certainty.  This is exactly why Statistics is needed.

In contrast to the lack of certainty associated with inductive reasoning, the type of logic used in Mathematics is absolutely certain.  The mathematician begins with general principles and logically concludes more specific relationships.  This type of reasoning from the general to the particular is called deductive reasoning.   A rather simplistic (but nevertheless correct) example is based on the principle that two numbers can be added in any order, thereby giving the same sum.  This is called the axiom of commutativity.  An example of deductive reasoning would be to assert that since this holds for any two numbers, surely this must hold for the numbers two and three, in particular.  We are, therefore, absolutely certain that 2 + 3 = 3 + 2, given the axiom of commutativity.

In its applied form, Statistics then becomes a bridge between the inductive uncertainty of science and the deductive certainty of Mathematics.  In his classic book, The Design of Experiments, Sir Ronald A. Fisher expresses this idea beautifully:

We may at once admit that any inference from the particular to the general must be attended with some degree of uncertainty, but this is not the same as to admit that such inference cannot be absolutely rigorous, for the nature and degree of the uncertainty may itself be capable of rigorous expression.

Statistics, therefore, is the mathematical method by which the uncertainty inherent in the scientific method is rigorously quantified.
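
One way to see what "rigorously quantified" uncertainty can look like is a confidence interval for a mean. The sketch below is only an illustration under stated assumptions: the population (lifespans normally distributed around a hypothetical TRUE_MEAN of 78 years with a standard deviation of 12) is invented, and the helper names mean_interval and draw_sample are made up for this example. It draws samples of 2500 people, as in the study described earlier, and uses the usual normal approximation for an approximate 95% interval.

```python
import random
import statistics

def mean_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(sample)
    centre = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / n ** 0.5
    return centre - margin, centre + margin

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_MEAN = 78.0        # hypothetical population mean lifespan, in years

def draw_sample(n=2500):
    """Draw n lifespans from the invented population."""
    return [rng.gauss(TRUE_MEAN, 12.0) for _ in range(n)]

# A single study: one sample, one interval.
low, high = mean_interval(draw_sample())
print(f"one sample of 2500: interval [{low:.2f}, {high:.2f}]")

# Many repeated studies: how often does the interval contain the true mean?
trials = 200
covered = 0
for _ in range(trials):
    lo, hi = mean_interval(draw_sample())
    if lo <= TRUE_MEAN <= hi:
        covered += 1
coverage = covered / trials
print(f"intervals containing the true mean: {coverage:.0%}")
```

No single sample proves anything about the whole population with certainty, but the procedure itself has a known long-run behavior: across repeated samples, roughly 95% of such intervals should contain the true mean. That is the sense in which the degree of uncertainty is itself capable of rigorous expression.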

Arthur Benjamin: Teach statistics before calculus!
Uploaded on Jun 29, 2009 http://www.ted.com Someone always asks the math teacher, "Am I going to use calculus in real life?" And for most of us, says Arthur Benjamin, the answer is no. He offers a bold proposal on how to make math education relevant in the digital age.
React to the above pieces and TED Talk in at least three paragraphs: