The academic world of statistics stresses understanding theory, types of analyses, calculations and interpretations. In the world of profit-driven business and government regulations, though, there’s even more to consider, especially if you are conducting the analysis for a boss or other client. How you propose to do the work, how you support the data generation, how you interact with others, how you package the results, and what you recommend your client should do are all part of the big picture. Here is an opportunity to practice skills that a statistician needs to have in addition to the number crunching.
If you’ve completed Stats 101, it’s not unlikely that, one day, your boss will ask you to analyze some data. You may be assigned that job because you are the only person in the office who has the training. You may be assigned the job because you are smart and curious and can quickly learn new skills on your own. You may be assigned the job because you are a rising star whom the boss wants to give an opportunity to shine. Or you may just be the most junior staff member in the office and, therefore, the dumping ground for all the crap assignments.
Here are six hypothetical situations for you to practice on. These scenarios are brief descriptions, devoid of many essential facts yet chock-full of superfluous distractions. Feel free to customize the scenarios to stimulate your own thinking and any discussions you may have with other data analysts.
There are no unique correct answers. A student will answer differently from a seasoned professional. A data analyst trained in mathematics will answer differently from an individual trained in economics or engineering. What’s important is that you visualize how you might go about approaching these scenarios. And so …
- You are Chris, a new hire at a small nonprofit organization dedicated to consumer protection. The city’s public transit system is in the process of requesting funding from the State to purchase new buses and train cars to replace obsolete equipment. The new equipment is desperately needed because many of the vehicles are well past their design life and can be repaired only by hand-fabricating replacement parts. The system is touting improvements in its traditionally poor on-time performance and customer satisfaction since the new CEO, Fred, took over two years ago. It cites these improvements as evidence that the system merits an infusion of cash to continue its equipment replacement program. There have been dozens of rallies held to support the system, including a sit-in in the mayor’s office, which was covered by the local TV news consumer reporter, Ned. One counter-protest was also held to oppose a rumored expansion of the maintenance yard in a minority community, but it did not make the news. The District’s Representative to the State Legislature, Ted, has championed the transit system’s cause by writing the funding legislation. Ted also promised to lobby Jed, one of the State’s two U.S. Senators, for a Federal grant to expand the system. A political opponent of Jed’s accused him of taking kickbacks, but no charges were ever filed. Fred and Ted are fraternal twins. Fred is a redhead like his mother, Ingrid. Ted is bald like his father, Jed. Your organization has obtained public records for the past five years on the system’s riders, customer satisfaction, and on-time performance. For each route, the information includes the number of riders on each conveyance (bus or train), the number of seats available on the conveyance, the actual and scheduled times for the run, and any comments on equipment malfunctions or EMT/police calls for assistance. There are five train routes, each run twenty times per day, and one hundred bus routes, each run fifty times per day.
Because you took Stats 101, your boss, Ingrid, has asked you to analyze the data to see if the transit system’s claims are valid. You’re anxious to impress everyone on your first assignment. What would you do?
- You are Dawn. You made a bet with your friend Bob that your favorite weather forecaster, Ororo Munroe, is more accurate than his favorite forecaster, David Drake. You want to design a study that you and Bob can carry out to determine who the better weather person is. What would you do?
- You are Darren, a college student in a Quantitative Methods class. Your semester assignment is to conduct a survey of students involving some aspect of campus life. You plan to major in psychology, so you need to make sure the survey project is done well. Because you arrived late for class, you ended up in the last group, consisting of a jock, a nerd, a miscreant, a snob, and a stoner. You volunteer to be leader of the group. The group decides to study coffee consumption by comparing students’ preferences for different sizes and types of coffee drinks from two local coffee shops. You also want to analyze characteristics of the students, perhaps attributes like sex, age, race, class year, and height/weight. The nerd also wants to analyze the chemistry of the coffee drinks using variables such as water hardness and iron content, coffee bean type, sugar content, and temperature. The jock wants to conduct the survey in the fraternities and sororities on campus because he has contacts in all the houses who will facilitate the study. The snob, on the other hand, wants to conduct the survey anonymously over the web instead of in person because it will be less work. The miscreant wants to get free food and drinks from the coffee shops and later sell them the data and results. The stoner also wants to gather data on drug usage and frequency and type of sexual activity to compare to the coffee preferences. You have ten weeks to complete the project. What would you do?
- You are Lois, an average resident of an average middle-class community in an average city of about 50,000. Over the past two years, you’ve heard from at least a dozen neighbors about someone in their families being diagnosed with a rare form of brain cancer. You asked State health officials to investigate the cases, but they dismissed your concerns as coincidental. What would you do?
- You are Sid, a math teacher in a large urban high school of about 2,500 students. As a fifth-year teacher, you will receive tenure at the end of the year if you receive another year of satisfactory ratings. Your last review contained a recommendation that you become more involved in supporting the school’s administration. Principal Onyx, your boss, is especially concerned about security because of the growing number of incidents of in-school violence occurring across the country. He has compiled hundreds of reports on current and former students who have had illegal and otherwise prohibited items confiscated, ranging from weapons and drugs to cell phones and gum. Each report has information on the student’s background, including grades, family address and contact information, medical information, and a log of disciplinary actions. Onyx indicated that he will also provide similar files on students who have had no disciplinary actions, which can be used for comparison. He said that, if needed, he has a contact in the Sheriff’s office who can obtain information on student vehicles and traffic violations. The Principal wants you to develop a model to predict which students would be most likely to have contraband or be involved in some infraction of school rules. What would you do?
- You are Liz. For the past four years, you have sold your artwork through a website a friend built for you. Now, you’re thinking of quitting your day job and selling your art full time. It’s a big decision with a lot of uncertainty and risk, so you decide to look at the data accumulated by the website database. The data include: number of visitors, their Internet location, the pages viewed on your site, and the sites they visited before and after your site. For customers who purchased artwork, you also have data on: type of art, product price, tax, shipping, customer delivery address, credit card information, and gift wrap options. You’re looking for the data to give you profiles of your website visitors and customers and, from that, tell you how to improve your sales. If you can generate enough sales, you can fulfill your life’s dream of making a living from your artwork. What would you do?
Did You Consider …
Here are some things to think about as you go through the hypothetical situations:
- What is the problem or question that needs to be addressed by the analysis? How important would the analysis be? Should the work even be done? Is there a better way to answer the question or solve the problem than statistics?
- Who would do the work, you or some hired help? If you would hire a data analyst to do the work, how would you identify, contract, manage, and compensate him or her? If you plan to do the work yourself, how can you apply your primary area of expertise to the problem? What help (e.g., people, information, tools) might you need to obtain?
- Are there valid data available or would they have to be generated? If you need to generate data, how will you control bias and variability? Could publicly available information be used to augment the data?
- What data analysis techniques would need to be used? What software and special expertise would you need? Are there technical assumptions or caveats that should be considered?
- Could the analysis be kept small (e.g., relatively unsophisticated descriptive statistics and graphs), completed in steps (e.g., initially at a small scale like a pilot study), or would the study need to be thorough and technically defensible?
- How long do you think it will take to scrub and analyze the data? Where might the schedule for getting the work done be problematic? Would funding be needed, and if so, where might it come from? Might the source of funding introduce any unintentional bias or apparent conflict of interest?
- Is there likely to be media attention or legal proceedings associated with the results? Are there any potential ethical dilemmas or political complications? Are you competing with someone else for the work? Might the results produce some undesirable outcome? What other risks to you, your client, and other stakeholders might there be in the work?
Oh, and don’t forget all the statistical specifications and decisions you have to address.
Getting the Answer that’s Right for You
In life, the correct answers aren’t in the back of the book. Sometimes there is more than one acceptable answer, even many. Sometimes there are none. Analyzing data is like taking a long road trip. Most of the trip has nothing to do with your destination, but you have to go through it to get there. If you’re not proficient in data analysis, it can be like the last bridge, tunnel, or traffic jam you have to get through, white-knuckled and sweating, before you reach your destination. If you are a statistician, it’s more like the last rest stop where you can relieve your pent-up anxiety before you cruise home. Whether your analysis will be an aggravating traffic jam or a tranquil rest stop will depend on your confidence. Confidence comes from practice. Go for it!
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.