Seminar Schedule




MSCS 310


Web Scraping and Capture-Recapture: Can they really be used to produce official statistics?

Linda J Young, Ph.D.

Chief Mathematical Statistician and Director of Research and Development

USDA National Agricultural Statistics Service

Research at USDA’s National Agricultural Statistics Service (NASS) focuses on developing improved methods for meeting NASS’s mission to provide timely, accurate, and useful statistics in service to U.S. agriculture. Examples of the current, wide-ranging research projects will be discussed, and one of these projects will be explored in greater depth.

NASS maintains a list frame of all known farms and potential farms in the United States (US). Although extensive effort is made to keep this list as complete as possible, not all farms are on the NASS list frame. Farms in the emerging sectors of agriculture, including urban, organic, and local foods farms, tend to be smaller, more transient, more dispersed, and more diverse than the traditional farms in the rural areas of the US. They are also less likely to be on the NASS list frame. For the 2012 Census of Agriculture, NASS used capture-recapture methods to account not only for the undercoverage of the NASS list frame but also for nonresponse and misclassification. The two capture-recapture samples were the respondents from (1) the NASS list frame and (2) the June Agricultural Survey (JAS) sample drawn from the NASS area frame. A challenge with using the area frame for the second sample is that the types of farms often not well covered by the NASS list frame tend to be sparse in the JAS sample. Thus, NASS has been evaluating the use of web-scraped list frames as a second frame from which a sample could be drawn within a capture-recapture framework to assess undercoverage not only for the census but also for surveys. The projects that have been conducted in this area, as well as a large feasibility study now underway, will be presented. The assumptions underlying these methods and their validity will be discussed, and open questions will be highlighted.
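The dual-sample logic behind this adjustment can be illustrated with the classic Lincoln-Petersen capture-recapture estimator. This is a minimal sketch with made-up counts, not NASS data, and NASS's actual methodology also models nonresponse and misclassification:

```python
# Lincoln-Petersen capture-recapture sketch (hypothetical counts only):
# estimate the total number of farms from two overlapping samples.

def lincoln_petersen(n1, n2, m):
    """n1: units captured in the first sample (e.g., list-frame respondents),
    n2: units captured in the second sample (e.g., area-frame sample),
    m:  units captured in both samples (the overlap)."""
    if m == 0:
        raise ValueError("no overlap between samples: estimator is undefined")
    return n1 * n2 / m

# Example: 900 farms in sample 1, 200 in sample 2, 180 in both.
print(lincoln_petersen(900, 200, 180))  # -> 1000.0
```

The intuition: the fraction of the second sample also seen in the first (180/200) estimates the coverage rate of the first sample, so the total is 900 divided by that rate.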




MSCS 310


Seminars for GTAs and Others: Searching for Educational Effectiveness

Dr. Adam Molnar, Department of Statistics

The Statistics department has launched two new initiatives to improve success rates in introductory statistics (Stat 2013 and 2023). Both initiatives, corequisite math support and Supplemental Instruction (SI), have records of success at other colleges. This talk will discuss issues in designing a publishable evaluation study. How do we evaluate student math ability? How do we minimize instructor effect? How can treatment variables and covariates be collected? Results from the fall corequisite section, which can be used in planning and course improvement, will also be presented.




MSCS 310

The Generalized Benford Distribution 

Dr. Ole J. Forsberg
Assistant Professor 
Knox College

The Benford distribution, originally formulated by Simon Newcomb in 1881, is used in forensic accountancy to detect irregular charges and fraud. It is also used in electoral forensics to detect certain types of fraud in elections. Unfortunately, one assumption underlying the Benford distribution is not satisfied in elections.

That assumption is that the upper bound of the underlying log-uniform distribution is an integer power of 10. In elections, however, the upper bound is the number of votes cast (turnout), so the assumption is not met, and the Benford distribution should not be used to detect fraud in elections.
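For reference, the classical Benford first-digit probabilities that follow from the power-of-10 assumption can be computed directly. This is a small illustrative sketch of the textbook distribution, not the speaker's proposed generalization:

```python
import math

# First-digit probabilities under the classical Benford distribution,
# P(d) = log10(1 + 1/d), which presumes the log-uniform upper bound is
# an integer power of 10 -- the assumption that turnout violates.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"P(first digit = {d}) = {p:.4f}")

print(f"total = {sum(benford.values()):.4f}")  # probabilities sum to 1.0000
```

The leading digit 1 occurs with probability log10(2), about 30.1%, while 9 occurs only about 4.6% of the time; deviations from these proportions are what forensic applications flag.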

This presentation examines the origin of the Benford distribution, as well as its assumptions. From there, it proposes a generalization designed to account for a finite upper bound on the underlying distribution. Unfortunately, this generalization offers its own challenges, namely that each electoral division has a different upper bound. A novel solution is explored and applied to three elections.



A unified approach to false discovery rate control under dependence that incorporates null distribution and shrinkage estimation

Josie Akosa
PhD Candidate
OSU Department of Statistics

The false discovery rate (FDR) is a less stringent error rate than familywise error control, and FDR-controlling procedures are correspondingly more powerful multiple testing procedures for large-scale inference; the FDR is therefore the preferred error rate to control in such studies. The validity and accuracy of any FDR-controlling procedure, however, depend on whether the chosen test statistic is optimal, whether the null distributions are correctly or conservatively specified, and whether the data are independent across tests. This study proposes two methods that provide asymptotic FDR control. The first method incorporates null distribution and shrinkage estimation into the original procedures of Benjamini and Hochberg (1995) and Benjamini et al. (2006). Extensive Monte Carlo simulations show that the proposed procedures are more stable and as powerful as, or substantially more powerful than, several procedures proposed for finite-sample inferential problems, provided there are at least 30 observations in each group of a case-control experiment. The second part of the study proposes a step-down procedure that explicitly incorporates information about the dependence structure of the test statistics, thereby providing a gain in power. One main distinction of this approach from existing stepwise procedures is the null distribution used in place of the unknown distribution of the test statistics; this null distribution does not rely on the restrictive subset pivotality assumption of Westfall and Young (1993).
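As background, the Benjamini and Hochberg (1995) step-up procedure that the first method builds on can be sketched as follows. This is the textbook baseline with illustrative p-values, not the proposed procedure:

```python
# Benjamini-Hochberg (1995) step-up procedure: reject the hypotheses
# corresponding to the k smallest p-values, where k is the largest rank
# satisfying p_(k) <= (k / m) * alpha. Controls the FDR at level alpha
# under independence (and positive dependence).

def benjamini_hochberg(pvals, alpha=0.05):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank meeting the step-up criterion
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))
# -> [True, True, False, False, False, False, False, False]
```

Note that 0.039 is below alpha = 0.05 yet is not rejected: its step-up threshold is (3/8) * 0.05 = 0.01875, which illustrates how the procedure scales the cutoff with rank.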