We create and learn about algorithms for “everything data”¹: data mining, network science, machine learning, data science, knowledge discovery, databases, and much more. You can read more about what we do in the Q&A with Matteo for the college website.
The methods we develop often leverage randomness (e.g., sampling, statistical hypothesis testing, sketches) and offer strong guarantees on their performance.
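As an illustration of the kind of randomized, guarantee-carrying technique mentioned above (not code from our projects), here is a minimal sketch of reservoir sampling (Algorithm R), which maintains a uniform random sample of k items from a stream of unknown length in a single pass and O(k) memory:

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of k items from an iterable of
    unknown length, using a single pass and O(k) memory (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1); this maintains
            # the invariant that every item seen so far is in the
            # reservoir with equal probability.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: sample 5 items uniformly from a million-element stream.
print(reservoir_sample(range(1_000_000), 5, seed=42))
```

The guarantee here is exact uniformity; many of the group's methods instead trade exactness for scalability, with probabilistic bounds on the approximation error.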
Our research is sponsored, in part, by the National Science Foundation under award #2006765.
When we are not working together at the whiteboard, writing code, or reading papers, you can find us in courses such as COSC-254 Data Mining, COSC-257 Databases, and COSC-355 Network Science, or in an independent study course (COSC-490) with Matteo.
Mammoths student/alumni authors in italics.
Maryam Abuissa’24 (Spring’22–), learning about probability and computing.
Adam Gibbs’22 (Fall’21–), parallel/distributed subgraph matching with security applications.
Steedman Jenkins’23 (Spring’21–), mining statistically-significant patterns.
Alexander Lee’22 (Fall’20–), scalable algorithms for cube sampling, statistical evaluation of data mining results.
Shengdi Lin’23E (Spring’21), honors thesis on statistically-significant patterns.
Stefan Walzer-Goldfeld’23 (Fall’20–), scalable algorithms for cube sampling, mining statistically-significant patterns.
Vaibhav Shah’23 (Fall’21), teaching materials for the COSC-254 Data Mining course.
Maryam Abuissa’24 (Summer’21), teaching materials for the COSC-111 Introduction to Computer Science 1 course.
Holden Lee’22 (Spring’20–Spring’21), sampling-based algorithms for frequent subgraph mining.
Isaac Caruso’21 (Fall’20–Spring’21), honors thesis on “Modeling Biological Species Presence with Gaussian Processes”.
Conrad Kuklinsky’21 (Spring’19–Fall’20), VC-dimension for intersections of half-spaces.
Margaret Drew’22 (Summer’20), assignments and videos for the COSC-111 Introduction to Computer Science 1 course.
Kathleen Isenegger’20 (Fall’19–Spring’20), honors thesis on “Approximate Mining of High-Utility Itemsets through Sampling”.
Chloe Wohlgemuth’22 (Fall’20–Fall’21), sampling-based algorithms for centrality approximations in graphs.
Shukry Zablah’20 (Fall’19–Spring’20), honors thesis on “A Parallel Algorithm for Balanced Sampling”.
If you are an Amherst student interested in mixing data, computer science, probability, and statistics, please contact Matteo. With rare exceptions, you should have taken, or be enrolled in, COSC-211 Data Structures. Having taken one or more courses in probability and/or statistics (e.g., STAT-135, COSC-223, MATH/STAT-360) is a plus, but not necessary: if you haven’t, you will likely get to spend a semester learning about probability in computer science, possibly in an independent study course with Matteo.
¹ X* means “everything X”, in computer science jargon.