# ATHDDA Week 1 titles and abstracts

Monday

1. Ejaz Ahmed (Brock). Title: High Dimensional Data Analysis: Making Sense or Folly. Abstract: In high-dimensional settings, where the number of variables exceeds the number of observations or grows with the sample size, many penalized regularization strategies have been studied for simultaneous variable selection and post-selection estimation. Penalized estimation yields good results when the model at hand is assumed to be sparse. However, in a real scenario a model may include both sparse signals and weak signals. In this setting, variable selection methods may fail to distinguish predictors with weak signals from those with sparse signals and will treat weak signals as sparse signals. Prediction based on a selected submodel may therefore be unreliable because of selection bias. We suggest a high-dimensional shrinkage estimation strategy to improve the prediction performance of a submodel. Such a high-dimensional shrinkage estimator (HDSE) is constructed by shrinking a full model estimator in the direction of a candidate submodel. We demonstrate that the proposed HDSE performs uniformly better than the full model estimator. Interestingly, it also improves the prediction performance of the selected submodel. The relative performance of the proposed HDSE strategy is appraised through both simulation studies and a real data analysis.
2. Robert Ghrist. Title: Local-to-Global Data: Applied Algebraic Topology. Abstract: Many contemporary challenges in the sciences concern the inference of global features from local data. This passage from local to global data is as subtle as it is fundamental; however, it is not unprecedented. In the mathematical sciences, several types of local-to-global challenges have been overcome with new techniques from topology, homological algebra, and sheaves. This talk will outline both the vision and the first steps of exporting homological and topological tools to the data sciences, with an abundance of examples.
3. Tim Johnson. Title: An Overview of the Statistical Analysis of Neuroimaging Data. Abstract: In this talk I will present an overview of the statistical analysis of neuroimaging data. In particular, I will focus on the analysis of fMRI data, including brain mapping and connectivity studies. I will discuss both the merits and pitfalls of the most commonly used methods and models. I will also briefly discuss Random Field Theory (RFT), which is used to address the massive multiple comparisons problem in fMRI analysis. It turns out that the Euler characteristic, a topological measure, plays a central role in RFT.
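
As a rough illustration of the shrinkage idea in Ejaz Ahmed's abstract, the sketch below moves from a candidate submodel estimate toward the full model estimate using a positive-part Stein-type weight. It is a sketch under stated assumptions, not the HDSE from the talk: the ridge full-model estimator, the distance statistic `T`, the classical `p2 - 2` constant, and the names `ridge` and `stein_type_shrinkage` are all illustrative choices, and the submodel indices `selected` are assumed to come from some earlier selection step.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam I)^{-1} X'y; well defined even when p > n."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def stein_type_shrinkage(X, y, selected, lam=1.0):
    """Shrink a full-model ridge estimate toward a submodel estimate (positive-part Stein rule).

    Assumption: `selected` holds the column indices of a candidate submodel;
    the remaining p2 columns are the ones the submodel drops.
    Returns (shrinkage estimate, full estimate, submodel estimate, weight).
    """
    n, p = X.shape
    beta_full = ridge(X, y, lam)
    beta_sub = np.zeros(p)
    beta_sub[selected] = ridge(X[:, selected], y, lam)
    p2 = p - len(selected)
    diff = beta_full - beta_sub
    resid = y - X @ beta_sub
    sigma2 = resid @ resid / max(n - len(selected), 1)  # noise variance from the submodel fit
    T = (X @ diff) @ (X @ diff) / sigma2                  # scaled distance between the two fits
    weight = 0.0 if T <= 0 else min(1.0, max(0.0, 1.0 - (p2 - 2) / T))
    return beta_sub + weight * diff, beta_full, beta_sub, weight
```

The weight stays in [0, 1], so the result always lies on the segment between the submodel and full model estimates; a large `T` (the submodel fit is far from the full fit) keeps the estimate close to the full model.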
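
To make the Euler characteristic in Tim Johnson's abstract concrete, the sketch below computes it for a thresholded 2-D image, treating each above-threshold pixel as a closed unit square and counting V - E + F for their union. In RFT, the expected Euler characteristic of such excursion sets at a high threshold is used to approximate family-wise error p-values; that modelling step is not shown, and the function name is hypothetical.

```python
import numpy as np

def euler_characteristic(mask):
    """Euler characteristic V - E + F of the union of closed unit squares
    placed at the True pixels of a 2-D boolean array (8-connected foreground)."""
    mask = np.asarray(mask, dtype=bool)
    vertices, edges = set(), set()
    faces = 0
    for i, j in zip(*np.nonzero(mask)):
        i, j = int(i), int(j)
        faces += 1
        # the four corners of pixel (i, j) in (row, column) grid coordinates
        a, b, c, d = (i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)
        vertices.update((a, b, c, d))
        # the four sides; sets store corners and sides shared by neighbours only once
        edges.update(((a, b), (c, d), (a, c), (b, d)))
    return len(vertices) - len(edges) + faces
```

For a statistic image `stat` and a threshold `u`, `euler_characteristic(stat > u)` gives the Euler characteristic of the excursion set above `u`: roughly, the number of blobs minus the number of holes.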

Tuesday

1. Mat Taddy. Title: Big Data and Bayesian Nonparametrics. Abstract: Big Data is often characterized by large sample sizes and strange variable distributions. For example, consumer spending on an e-commerce website may involve tens to hundreds of millions of observations each week, with density spikes at zero and elsewhere and very fat right tails. Such spending data will also be accompanied by a large set of potential covariates. These properties, big and strange, call for nonparametric analysis. We revisit a flavor of distribution-free Bayesian nonparametrics that approximates the data generating process (DGP) with a multinomial sampling model. This model then serves as the basis for analysis of statistics (functionals of the DGP) that are useful for decision making regardless of the true DGP. For example, we'll discuss analysis of a least-squares indexing of treatment effect heterogeneity onto user characteristics, as well as analysis of decision trees developed for fraud prediction. The result is a framework for scalable nonparametric Bayesian decision making on massive data.
2. Ryan Budney. Title: Abelian Groups and Smith Normal Form. Abstract: Abelian groups are the language one uses to describe homology. This talk will cover what abelian groups are, how they are related to one another and classified, how one can compute with them, and mechanisms for coping with large computations.
3. Mike Mandell. Title: Introduction to Simplicial Homology. Abstract: This talk will introduce the basic definitions and properties of the homology of simplicial complexes.
4. Joel Hass. Title: Filtered spaces and barcodes. Abstract: Understanding the structure of data at different scales leads naturally to the notion of a filtered space. Persistent homology allows us to understand the structure of such spaces in a way that separates noise from real information. The output of a persistent homology computation is a "barcode" that captures when holes of a given dimension are created and when they are filled. We will introduce the concepts of a filtered space and a barcode and give some simple examples.
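
One standard way to use a multinomial sampling model as a posterior for the DGP, as in Mat Taddy's abstract, is the Bayesian bootstrap: in the limit of a non-informative Dirichlet prior, each posterior draw puts Dirichlet(1, ..., 1) weights on the observed data points, and any statistic is re-evaluated under those weights. The sketch below assumes this flat-Dirichlet version and applies it to a least-squares coefficient; the function name is hypothetical and the talk's exact formulation may differ.

```python
import numpy as np

def bayesian_bootstrap_ols(X, y, n_draws=1000, seed=None):
    """Posterior draws of least-squares coefficients under the Bayesian bootstrap.

    Each draw re-weights the n observations with Dirichlet(1, ..., 1) weights
    and solves the weighted least-squares normal equations.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    weights = rng.dirichlet(np.ones(n), size=n_draws)   # shape (n_draws, n)
    draws = np.empty((n_draws, p))
    for b, w in enumerate(weights):
        Xw = X * w[:, None]                               # rows scaled by their weights
        draws[b] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return draws
```

Quantiles of a column of `draws` then give a posterior interval for that coefficient without assuming a parametric error distribution.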
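
Ryan Budney's abstract centers on computing with finitely generated abelian groups, where Smith normal form does the work: for an m x n integer matrix A, the cokernel Z^m / im(A) is isomorphic to Z^(m - r) plus Z/d_1 + ... + Z/d_r, where d_1 | d_2 | ... | d_r are the r nonzero diagonal entries of its Smith normal form. The pure-Python sketch below computes those entries by repeatedly pivoting on a smallest nonzero entry; it is written for clarity, not for the large computations the talk addresses, and the function names are hypothetical.

```python
def smith_diagonal(A):
    """Nonzero diagonal entries d1 | d2 | ... of the Smith normal form of an integer matrix."""
    M = [[int(x) for x in row] for row in A]
    m = len(M)
    n = len(M[0]) if m else 0
    diag, t = [], 0
    while t < min(m, n):
        # pick a nonzero entry of smallest absolute value in the unreduced block
        candidates = [(abs(M[i][j]), i, j) for i in range(t, m) for j in range(t, n) if M[i][j]]
        if not candidates:
            break
        _, pi, pj = min(candidates)
        M[t], M[pi] = M[pi], M[t]                      # row swap
        for row in M:                                  # column swap
            row[t], row[pj] = row[pj], row[t]
        p = M[t][t]
        for i in range(t + 1, m):                      # row ops: reduce column t mod p
            q = M[i][t] // p
            M[i] = [a - q * b for a, b in zip(M[i], M[t])]
        for j in range(t + 1, n):                      # column ops: reduce row t mod p
            q = M[t][j] // p
            for row in M:
                row[j] -= q * row[t]
        if any(M[i][t] for i in range(t + 1, m)) or any(M[t][j] for j in range(t + 1, n)):
            continue                                   # a smaller remainder appeared: re-pivot
        bad = [i for i in range(t + 1, m) if any(M[i][j] % p for j in range(t + 1, n))]
        if bad:                                        # enforce divisibility by mixing in a row
            M[t] = [a + b for a, b in zip(M[t], M[bad[0]])]
            continue
        diag.append(abs(p))
        t += 1
    return diag


def cokernel(A):
    """(free rank, torsion coefficients) of Z^m / im(A) for an m x n integer matrix A."""
    d = smith_diagonal(A)
    return len(A) - len(d), [x for x in d if x > 1]
```

For example, the boundary matrix of a hollow triangle (vertices by edges) has cokernel Z, recovering H_0 = Z.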
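
For Mike Mandell's introduction to simplicial homology, the sketch below computes Betti numbers over the rationals. The boundary of [v_0, ..., v_k] is the alternating sum over i of (-1)^i times the face with v_i omitted, and b_k = dim C_k - rank d_k - rank d_(k+1). It assumes simplices are given as tuples of integer vertex labels and uses floating-point rank, which is adequate for small examples; rational coefficients ignore torsion, which needs integer methods such as Smith normal form. The function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def closure(maximal_simplices):
    """All nonempty faces of the given simplices, as sorted vertex tuples."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(combinations(s, k))
    return faces

def boundary_matrix(k_simplices, faces_below):
    """Matrix of the boundary map C_k -> C_{k-1} in the given (ordered) bases."""
    row = {f: r for r, f in enumerate(faces_below)}
    D = np.zeros((len(faces_below), len(k_simplices)))
    for c, s in enumerate(k_simplices):
        for i in range(len(s)):
            D[row[s[:i] + s[i + 1:]], c] = (-1) ** i   # drop vertex i, sign (-1)^i
    return D

def betti_numbers(maximal_simplices):
    faces = closure(maximal_simplices)
    top = max(len(f) for f in faces) - 1
    by_dim = [sorted(f for f in faces if len(f) == k + 1) for k in range(top + 1)]
    # rank[k] = rank of the boundary map out of dimension k (rank[0] = rank[top + 1] = 0)
    rank = [0] * (top + 2)
    for k in range(1, top + 1):
        rank[k] = int(np.linalg.matrix_rank(boundary_matrix(by_dim[k], by_dim[k - 1])))
    return [len(by_dim[k]) - rank[k] - rank[k + 1] for k in range(top + 1)]
```

A hollow triangle gives `[1, 1]` (one component, one loop), while the filled triangle gives `[1, 0, 0]`.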
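
As a simple instance of the filtered spaces and barcodes in Joel Hass's abstract, the sketch below computes the 0-dimensional barcode of a finite point cloud under the Vietoris-Rips filtration, assuming the convention that an edge enters at the distance between its endpoints (some authors use half the distance). Every point is born at scale 0, and a bar ends whenever two clusters merge, which a union-find structure over the sorted edges tracks; the function name is hypothetical.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """0-dimensional Vietoris-Rips barcode of a finite point cloud.

    Returns (birth, death) pairs; an edge enters at the distance between its
    endpoints, and one bar never dies (death = math.inf).
    """
    parent = list(range(len(points)))

    def find(x):
        # root of x's cluster, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # two clusters merge: one bar (born at 0) dies at d
            parent[ri] = rj
            bars.append((0.0, d))
    if points:
        bars.append((0.0, math.inf))
    return bars
```

For the points 0, 1, and 3 on a line, the bars end at 1, 2, and never: the long bar in the middle is what separates the outlying point from the pair.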

Wednesday

1. Michael Lesnick. Title: Stability of persistent homology. Abstract: This talk will introduce the stability theorem for persistent homology, along with some extensions. The stability theorem is arguably the central result in persistence theory: it provides the core mathematical justification for the use of persistent homology in the study of noisy data and serves as a starting point for the development of statistical foundations for persistence.
2. Julie Zhou. Title: Robust statistics. Abstract: Since outliers can have a huge influence on some statistical procedures, it is important to use robust statistics for data analysis and outlier detection. In the talk, I will give a short introduction to robust statistics. Two important measures, the influence function and the breakdown point, will be defined to assess the robustness of statistics. Robust location and scale estimators, robust regression, confidence regions, and robust covariance matrices will be discussed. Various examples and R functions for robust procedures will be given.
3. Vidit Nanda. Title: Discrete Morse theory for computing homology. Abstract: From a theoretical complexity perspective, the algebraic question of how one computes the homology (persistent or otherwise) of a finite cell complex has a very satisfying solution. However, as topological methods pervade data analysis, one requires significantly faster computation: there is no solace in cubic-complexity algorithms when staring down billions of simplices. In this talk, we will examine a discrete version of Morse theory that can be used to whittle a gigantic cell complex down to a tractable size without losing any (persistent) homological information.
4. Rachel Levanger. Title: Using Persistent Homology to study dynamics in the space of persistence diagrams, Part I. Abstract: It is common practice to study dynamical systems on domains with periodic boundary conditions to remove boundary effects imposed by a domain of finite size. While this solves one problem, it potentially creates another: two solutions that are symmetry-related may be seen as separate solutions to the system. While there are classical methods to determine whether two solutions are symmetry-related, we show how persistent homology is a natural tool that can be used to quotient out these symmetries. Since the talk is meant as an introduction to persistent homology and applied topology, we assume no background in fluid dynamics and only minimal familiarity with dynamical systems. Rather, the talk aims to familiarize the audience with two flavors of applications of persistent homology (reducing a scalar field to a persistence diagram and analyzing the shape of a point cloud) and to show how these two methods work together to say something about the dynamics of a time-evolving system.
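
The stability theorem in Michael Lesnick's abstract says, in its basic form for sublevel-set filtrations, that the bottleneck distance between the persistence diagrams of f and g is at most sup |f - g|. The sketch below checks this numerically for 0-dimensional persistence of functions on a path graph. It assumes the elder rule for assigning bars, omits zero-length bars, reports the essential (never-dying) class only by its birth, and computes the bottleneck distance by brute force over matchings, which is feasible only for tiny diagrams; all names are hypothetical.

```python
from itertools import permutations

def sublevel_h0(values):
    """Finite 0-dim bars of the sublevel-set filtration of a function on a path graph,
    plus the birth of the one essential (never-dying) class."""
    parent, birth, bars = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for v in sorted(range(len(values)), key=lambda i: values[i]):
        parent[v], birth[v] = v, values[v]
        for u in (v - 1, v + 1):                 # neighbours on the path
            if u in parent:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # elder rule: the component born later dies at the current value
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if values[v] > birth[young]:     # skip zero-length bars
                    bars.append((birth[young], values[v]))
                parent[young] = old
    return bars, min(values)

def bottleneck(D1, D2):
    """Brute-force bottleneck distance between two small finite diagrams."""
    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None or q is None:               # match a point to the diagonal
            b, d = p if q is None else q
            return (d - b) / 2
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    U = list(D1) + [None] * len(D2)
    V = list(D2) + [None] * len(D1)
    return min(max((cost(p, q) for p, q in zip(U, perm)), default=0.0)
               for perm in permutations(V))
```

Perturbing every value of a function by at most 0.15 moves its finite bars by at most 0.15 in bottleneck distance, as the theorem predicts.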
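
Julie Zhou's abstract contrasts estimators by their breakdown point and influence function: the sample mean has breakdown point 0, while the median has breakdown point 1/2. The sketch below is a Huber M-estimator of location, which has a bounded influence function, computed by iteratively reweighted averaging with the scale fixed at the normalized median absolute deviation. The tuning constant k = 1.345 is a common default; the sketch is in Python rather than the R functions mentioned in the talk, and the function names are hypothetical.

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled by 1.4826 to estimate sigma consistently at the normal."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def huber_location(x, k=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of location via iteratively reweighted means, scale fixed at MAD."""
    x = np.asarray(x, dtype=float)
    mu, s = np.median(x), mad(x)          # robust starting point and scale
    if s == 0:
        return float(mu)
    for _ in range(max_iter):
        r = np.abs(x - mu) / s
        w = np.minimum(1.0, k / np.maximum(r, 1e-12))   # Huber weights psi(r) / r
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol * s:
            return float(mu_new)
        mu = mu_new
    return float(mu)
```

With 100 observations near 10 and 20 gross outliers at 1000, the mean is pulled above 100, while the Huber estimate and the median stay near 10.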
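
The simplest form of the reduction in Vidit Nanda's Wednesday abstract is an elementary collapse: if a simplex s is a face of exactly one other simplex t, the pair (s, t) can be removed without changing homology. Such collapses are a special case of the acyclic matchings of discrete Morse theory; the talk's method is more general and also preserves persistent homology, which this sketch does not track. The greedy strategy and the names below are illustrative assumptions, and greedy collapsing can get stuck on some complexes even when a full collapse exists.

```python
from itertools import combinations

def closure(maximal_simplices):
    """All nonempty faces of the given simplices, as sorted vertex tuples."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(combinations(s, k))
    return faces

def euler_char(faces):
    return sum((-1) ** (len(s) - 1) for s in faces)

def find_free_pair(faces):
    """Return (s, t) where s is a codimension-1 face of exactly one simplex t, else None."""
    cofaces = {s: [] for s in faces}
    for t in faces:
        for i in range(len(t)):
            f = t[:i] + t[i + 1:]
            if f:
                cofaces[f].append(t)
    for s in sorted(faces):
        if len(cofaces[s]) == 1:
            return s, cofaces[s][0]
    return None

def collapse(faces):
    """Greedily remove free pairs; each removal preserves homology."""
    faces = set(faces)
    while True:
        pair = find_free_pair(faces)
        if pair is None:
            return faces
        faces -= set(pair)
```

A filled triangle with an extra edge attached collapses to a single vertex, while a hollow triangle has no free faces and is left unchanged.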
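
The symmetry point in Rachel Levanger's abstract can be seen on a small scale: 0-dimensional sublevel-set persistence of a field on a periodic 1-D domain depends only on the values and their cyclic adjacency, so shifted and reflected copies of a solution give identical diagrams. The sketch below assumes a field sampled on a ring of grid points (a cycle graph) and uses the elder rule; the names and example values are hypothetical, and the fields and diagrams in the talk are richer than this.

```python
def periodic_sublevel_h0(values):
    """Sorted finite 0-dim bars of the sublevel-set filtration of a field on a ring of grid points."""
    n = len(values)
    parent, birth, bars = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for v in sorted(range(n), key=lambda i: values[i]):
        parent[v], birth[v] = v, values[v]
        for u in ((v - 1) % n, (v + 1) % n):     # periodic boundary: the ends are neighbours
            if u in parent:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # elder rule: the younger component dies at the current value
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if values[v] > birth[young]:
                    bars.append((birth[young], values[v]))
                parent[young] = old
    return sorted(bars)
```

Comparing diagrams rather than raw fields therefore identifies solutions that differ only by a translation or reflection of the periodic domain.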

Thursday

1. Justin Curry. Title: Clustering with Cosheaves. Abstract: We will begin by recalling how the Reeb graph tracks clusters for a real-valued map. From there we will develop the analogous construction for maps to more general parameter spaces, such as R^n. This will motivate the introduction of stratification theory and the theory of constructible cosheaves; these are equivalent to functors from MacPherson's entrance path category, which I will describe.
2. Farouk Nathoo. Title: High-dimensional statistics for neuro-imaging. Abstract: I will discuss three problems involving the analysis of neuroimaging data where the number of unknown parameters is far greater than the number of observations. In the first case I will discuss the neuroelectromagnetic inverse problem that arises in studies involving electroencephalography (EEG) and magnetoencephalography (MEG). This is an ill-posed inverse problem that involves the recovery of time-varying neural activity at a large number of locations within the brain, from electromagnetic signals recorded at a relatively small number of external locations on or near the scalp. Framing this problem within the context of spatial variable selection for an underdetermined functional linear model, we propose a spatial mixture formulation where the profile of electrical activity within the brain is represented through location-specific spike-and-slab priors based on a spatial logistic specification. The prior specification accommodates spatial clustering in brain activation, while also allowing for the inclusion of auxiliary information derived from alternative imaging modalities, such as functional magnetic resonance imaging (fMRI). We develop a variational Bayes approach for computing estimates of neural source activity, and incorporate a nonparametric bootstrap for interval estimation.

In the second case I will discuss statistical analysis for Imaging Genomics. Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. In this setting, high-dimensional regression for multi-SNP association analysis is challenging because the response variables obtained through brain imaging comprise potentially interlinked endophenotypes, and, in addition, there is a desire to incorporate a biological group structure among SNPs based on the genes to which they belong. Wang et al. (Bioinformatics, 2012) have recently developed an approach for the analysis of imaging genomic studies based on penalized regression with a novel group l_{2,1}-norm penalty that encourages sparsity at the gene level. While this approach incorporates a number of useful features, a shortcoming is that it furnishes only a point estimate; techniques for obtaining valid standard errors or interval estimates are not provided. We solve this problem by developing a corresponding Bayesian formulation based on a three-level hierarchical model that allows for full posterior inference using Gibbs sampling. Selection of tuning parameters for the model is discussed in detail, and our proposed methodology is investigated using simulation studies as well as through the analysis of a large dataset collected as part of the Alzheimer's Disease Neuroimaging Initiative.

In the third case I will discuss the problem of brain decoding, a problem that involves the determination of a subject’s cognitive state or an associated stimulus from functional neuroimaging data measuring brain activity. In this setting the cognitive state is typically characterized by an element of a finite set, and the neuroimaging data comprise voluminous amounts of spatiotemporal data measuring some aspect of the neural signal. The associated statistical problem is one of classification from high-dimensional data. We explore the use of functional principal component analysis, mutual information networks, and persistent homology for examining the data through exploratory analysis and for constructing features characterizing the neural signal for brain decoding. These features are incorporated into a classifier based on symmetric multinomial logistic regression with elastic net regularization. The approaches are illustrated in an application where the task is to infer, from brain activity measured with magnetoencephalography (MEG), the type of video stimulus shown to a subject.
3. Hike.
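
Justin Curry's abstract starts from the Reeb graph, which tracks how clusters of a space change with the value of a real-valued map. A common discrete approximation is the Mapper construction: cover the range of the map by overlapping intervals, take the connected components (clusters) of each preimage as nodes, and join nodes that share points. The sketch below implements that approximation on a graph; it is not the cosheaf construction from the talk, and the names, cover, and example are assumptions.

```python
import math

def components(vertices, adjacency):
    """Connected components (as frozensets) of the subgraph induced on `vertices`."""
    vertices, seen, comps = set(vertices), set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adjacency[u]:
                if w in vertices and w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(frozenset(comp))
    return comps

def mapper_graph(values, adjacency, intervals):
    """Nodes: clusters of each preimage f^{-1}([lo, hi]); edges: clusters sharing a vertex."""
    nodes = []
    for lo, hi in intervals:
        preimage = [v for v, f in enumerate(values) if lo <= f <= hi]
        nodes.extend(components(preimage, adjacency))
    edges = {(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]}
    return nodes, edges
```

For 16 points on a circle with the height function and the cover [-1.1, -0.3], [-0.5, 0.5], [0.3, 1.1], the middle preimage splits into a left and a right cluster, and the result is a 4-node, 4-edge loop, matching the Reeb graph of a circle under its height function.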
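
The group l_{2,1}-norm penalty in Farouk Nathoo's abstract sums, over genes, the norm of the block of coefficients belonging to each gene's SNPs, so whole genes tend to be kept or dropped together. The sketch below assumes a coefficient matrix W with one row per SNP and one column per imaging phenotype, non-overlapping gene groups, and the Frobenius norm of each gene's block; check Wang et al. (Bioinformatics, 2012) for the exact penalty they use. It also gives the proximal operator (blockwise soft-thresholding) that proximal-gradient solvers for such penalties rely on; the function names are hypothetical.

```python
import numpy as np

def group_l21_norm(W, groups):
    """Sum over gene groups of the Frobenius norm of each group's block of rows of W."""
    W = np.asarray(W, dtype=float)
    return float(sum(np.linalg.norm(W[idx, :]) for idx in groups))

def group_l21_prox(W, groups, lam):
    """Proximal operator of lam * group_l21_norm for non-overlapping groups.

    Each block is scaled by max(0, 1 - lam / ||block||), so blocks with small
    norm are set exactly to zero; rows outside every group are left unchanged.
    """
    W = np.asarray(W, dtype=float)
    out = W.copy()
    for idx in groups:
        block = W[idx, :]
        norm = np.linalg.norm(block)
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[idx, :] = scale * block
    return out
```

With `lam = 2.5`, a gene block of norm 5 is halved while a block of norm 1 is zeroed, which is how the penalty produces sparsity at the gene level.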

Friday

1. Peter Bubenik. Title: Topological Data Analysis and Machine Learning. Abstract: Topology aggregates local metric data to provide a global summary of the ‘shape’ of the data. Persistent homology provides a summary of how the shape of the data changes with respect to changes of a parameter. For example, if the parameter is scale, then persistent homology provides a multiscale descriptor of the data. I will give an introduction to various ways in which topological data analysis can be used to provide features and kernels useful for statistical analysis and machine learning. The necessary addition to previous work is a function from the set of topological summaries to a feature space. More precisely, one requires a map from the set of persistence diagrams to a Hilbert space. From this feature map, one obtains a kernel. Using such features and kernels, one can combine topological data analysis with techniques from statistics and machine learning. These methods provide a principled approach to dimension reduction and visualization, helping scientists and engineers make sense of their big data. I will show how this approach can be used to differentiate two classes of proteins.
2. Joel Hass. Title: Diffeomorphisms of surfaces and some applications. Abstract: A tremendous amount of geometric data is being gathered from MRI, ultrasound, scanners, satellites, cameras, etc. Much of this data concerns the geometry of surfaces, such as brain cortices, faces, protein surfaces, and bones. I will discuss how methods developed in low-dimensional topology and geometry can be used to measure the resemblance of pairs of surfaces. I’ll show several applications, including alignment of brain images, protein structure prediction, and automatic construction of evolutionary trees.
3. Vidit Nanda. Title: 2d persistence and protein compressibility. Abstract: A standard question in contemporary proteomics asks which properties of proteins may be directly inferred from their molecular structure. Using only X-ray crystallography data (of the type cataloged in the Protein Data Bank), I will outline a method that accurately estimates the compressibility of a given protein. The method involves imposing a filtered simplicial structure around the atom centers, computing various algebraic-topological invariants, and applying some rudimentary statistical techniques. This is joint work with Marcio Gameiro, Yasu Hiraoka, Shunsuke Izumi, Miro Kramar and Konstantin Mischaikow.
4. Panel
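
One concrete feature map of the kind Peter Bubenik's abstract describes is the persistence landscape, which sends a persistence diagram to a sequence of functions lambda_k(t), where lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t)) over the bars (b, d). Landscapes live in a Hilbert space of functions, so their inner product gives a kernel. The sketch below assumes finite bars, evaluates the first few landscape functions on a uniform grid, and approximates the inner product by a Riemann sum; the names are hypothetical and the talk may use other feature maps.

```python
import numpy as np

def landscape(bars, grid, k_max=3):
    """First k_max persistence landscape functions of finite bars (b, d), sampled on grid."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    b, d = bars[:, :1], bars[:, 1:]                          # column vectors, shape (m, 1)
    tents = np.maximum(0.0, np.minimum(grid - b, d - grid))  # one tent function per bar, (m, T)
    tents = -np.sort(-tents, axis=0)                         # row k = k-th largest value
    out = np.zeros((k_max, grid.size))
    m = min(k_max, tents.shape[0])
    out[:m] = tents[:m]
    return out

def landscape_kernel(bars1, bars2, grid, k_max=3):
    """Riemann-sum approximation of the L2 inner product of two landscapes (uniform grid)."""
    step = grid[1] - grid[0]
    return float(np.sum(landscape(bars1, grid, k_max) * landscape(bars2, grid, k_max)) * step)
```

The rows of `landscape(...)` can be flattened into a feature vector for standard classifiers, or `landscape_kernel` can be passed to a kernel method such as a support vector machine.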
