We create and study algorithms for “everything data”¹: data mining, network science, machine learning, data science, knowledge discovery, databases, and much more. You can read more about what we do in the Q&A with Matteo for the college website.
The methods we develop often leverage randomness (e.g., sampling, statistical hypothesis testing, sketches) and offer strong guarantees on their performance.
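As a small illustration of the kind of randomized primitive mentioned above (this is classic textbook material, not code from the group's papers), here is reservoir sampling (Algorithm R), which maintains a uniform random sample of k items from a stream of unknown length in a single pass:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream
    of unknown length, using a single pass (Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample

# Example: sample 5 items uniformly from a stream of a million.
print(reservoir_sample(range(10**6), 5))
```

Each item ends up in the final sample with probability exactly k/n, which is the kind of guarantee (here, exact; in the group's work, often approximate with provable error bounds) that makes sampling-based algorithms attractive.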
Our research is sponsored, in part, by the National Science Foundation under award #2006765.
When we are not working together at the whiteboard, writing code, or reading papers, you can find us in courses such as COSC-254 Data Mining, COSC-257 Databases, COSC-351 Information Theory, and COSC-355 Network Science, or in an independent study course (COSC-490) with Matteo.
Amherst Mammoths student/alumni authors are in italics.
Steedman Jenkins, Stefan Walzer-Goldfeld, and Matteo Riondato. SPEck: Mining Statistically-significant Sequential Patterns Efficiently with Exact Sampling. Data Mining and Knowledge Discovery (S.I. for ECML PKDD’22). GitHub repo
Alexander Lee, Stefan Walzer-Goldfeld, Shukry Zablah, and Matteo Riondato. A Scalable Parallel Algorithm for Balanced Sampling. AAAI’22 (student abstract). GitHub repo
Cyrus Cousins, Chloe Wohlgemuth, and Matteo Riondato. Bavarian: Betweenness Centrality Approximation with Variance-Aware Rademacher Averages. ACM KDD’21.
Maryam Abuissa’24 (Spring’22–), algorithms for statistically-sound knowledge discovery.
Michelle Contreras Catalan’25 (Fall’22–), algorithms for statistically-sound knowledge discovery.
Dhyey Mavani’25 (Fall’22–), efficient implementations of algorithms for significant pattern mining.
Sarah Park’23 (Summer’22–), honors thesis on higher-power methods for statistically-significant patterns.
Stefan Walzer-Goldfeld’23 (Fall’20–), honors thesis on null models for rich data, null models for sequence datasets, scalable algorithms for cube sampling.
Steedman Jenkins’23 (Spring’21–Fall’22), mining statistically-significant patterns.
Shengdi Lin’23E (Spring’22–Fall’22), honors thesis on rapidly-converging MCMC methods for statistically-significant patterns.
Adam Gibbs’22 (Fall’21–Spring’22), parallel/distributed subgraph matching with security applications.
Alexander Lee’22 (Fall’20–Spring’22), scalable algorithms for cube sampling, statistical evaluation of data mining results.
Vaibhav Shah’23 (Fall’21), teaching materials for the COSC-254 Data Mining course.
Maryam Abuissa’24 (Summer’21), teaching materials for the COSC-111 Introduction to Computer Science 1 course.
Holden Lee’22 (Spring’20–Spring’21), sampling-based algorithms for frequent subgraph mining.
Isaac Caruso’21 (Fall’20–Spring’21), honors thesis on “Modeling Biological Species Presence with Gaussian Processes”.
Conrad Kuklinsky’21 (Spring’19–Fall’20), VC-dimension for intersections of half-spaces.
Margaret Drew’22 (Summer’20), assignments and videos for the COSC-111 Introduction to Computer Science 1 course.
Kathleen Isenegger’20 (Fall’19–Spring’20), honors thesis on “Approximate Mining of High-Utility Itemsets through Sampling”.
Chloe Wohlgemuth’22 (Fall’20–Spring’22), sampling-based algorithms for centrality approximations in graphs.
Shukry Zablah’20 (Fall’19–Spring’20), honors thesis on “A Parallel Algorithm for Balanced Sampling”.
If you are an Amherst student interested in mixing data, computer science, probability, and statistics, please contact Matteo. With rare exceptions, you should have taken, or be enrolled in, COSC-211 Data Structures. Having taken one or more courses in probability and/or statistics (e.g., STAT-135, COSC-223, MATH/STAT-360) is a plus, but not necessary: if you haven’t, you will likely get to spend a semester learning about probability in computer science, possibly in an independent study course with Matteo.
1. X* means “everything X”, in computer science jargon.