“I kid you not — we are doing something that’s never been done before,” said Yassine Dhouib ’24 about the research that he, Dara Levy ’23, and Visiting Assistant Professor of Computer Science Dave Perkins are conducting this summer. The trio are working on two different projects in the field of computer science, aiming to improve and streamline industry-standard algorithms.
The first project focuses on the k-nearest neighbors (KNN) algorithm, commonly used in machine learning and data science. KNN lets the user make predictions about unfamiliar data by finding a set number of the closest known data points to a given data point, a process that can be rather time-consuming.
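For readers who want to see the idea in code, here is a minimal sketch of standard KNN classification in Python. It is an illustration only, not the team's code: the function name `knn_predict`, the toy dataset, and the choice of Euclidean distance with a majority vote are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k closest training points.

    `train` is a list of (point, label) pairs, where each point is a tuple
    of numbers. Hypothetical helper written for illustration.
    """
    # Sort every training point by its Euclidean distance to the query.
    # This full scan is what makes plain KNN slow on large datasets.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented): two small clusters labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

near_a = knn_predict(train, (1.1, 1.0), k=3)
near_b = knn_predict(train, (5.1, 5.0), k=3)
```

Every prediction compares the query against the entire training set, which is why running time grows with the amount of data.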
Major: Computer science
Hometown: La Soukra, Tunisia
High school: Pioneer School of Ariana
One application of KNN is in the healthcare industry, where it is used to sift through massive amounts of medical information and draw on certain characteristics to predict, for example, whether a patient has breast cancer or heart problems. Analyzing large amounts of potentially life-saving data is where KNN can be most practical, and also where improvements to it would be most valuable.
Dhouib provided a basic explanation of how the team streamlined the KNN algorithm: “The way we did it is called AkNN, which stands for aggregate k-nearest neighbor. The way it works is that if we’ve used [for example] five neighbors to label one data point … we take those five nearest neighbors and reduce them into this one data point. So, as we run through the algorithm, the data we’re testing against keep getting smaller and the running time gets shorter.”
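Dhouib's description can be sketched roughly as follows. The article does not say how the neighbors are combined, so this example makes two loud assumptions: the k neighbors are merged into their centroid (coordinate-wise average), and the merged point takes the label that won the vote. The function name `aknn_label_all` is hypothetical, and the team's actual AkNN method may differ.

```python
import math
from collections import Counter


def aknn_label_all(train, queries, k=3):
    """Label each query, then collapse the neighbors used into one point.

    ASSUMPTION: neighbors are merged into their centroid and given the
    voted label. The source does not specify the aggregation rule.
    """
    data = list(train)
    predictions = []
    for query in queries:
        kk = min(k, len(data))
        # Indices of the kk training points closest to the query.
        order = sorted(range(len(data)),
                       key=lambda i: math.dist(data[i][0], query))
        nearest = order[:kk]
        label = Counter(data[i][1] for i in nearest).most_common(1)[0][0]
        predictions.append(label)
        # Merge the neighbors into a single aggregate point (assumed rule).
        centroid = tuple(
            sum(data[i][0][d] for i in nearest) / kk
            for d in range(len(query))
        )
        chosen = set(nearest)
        data = [p for i, p in enumerate(data) if i not in chosen]
        data.append((centroid, label))
    return predictions, data


# Toy data (invented): two small clusters labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
predictions, remaining = aknn_label_all(train, [(1.0, 1.0), (5.0, 5.0)], k=3)
```

In this sketch each query removes k points and adds one back, so the set being searched shrinks by k - 1 points per query, which matches the shrinking running time Dhouib describes.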
Dhouib noted that he and Levy have to be sure that as the running time decreases, the accuracy of the program does not decrease as well. Levy said that a significant drop does not seem to have occurred and that their tweaks appear to have made the program “faster and more efficient without losing too much accuracy.”
Majors: History and Economics
Hometown: Glen Cove, N.Y.
High school: Glen Cove High School
The second project is what they call the “pivot” project. Pivots are used to divide large datasets into more manageable chunks, using medians to predict where certain values would fall. Dhouib and Levy are looking to create a more “intelligent” pivot, which will allow them to run the algorithm more efficiently.
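To make the role of a pivot concrete, here is a small Python sketch of a standard partition step that picks its pivot with the common median-of-three heuristic. This is a textbook technique chosen for illustration, not the “intelligent” pivot the team is developing, and the function names `median_of_three` and `partition` are assumptions for this example.

```python
import statistics


def median_of_three(values):
    """Estimate a good pivot as the median of the first, middle, and last items."""
    first, middle, last = values[0], values[len(values) // 2], values[-1]
    return statistics.median([first, middle, last])


def partition(values):
    """Split `values` into chunks smaller than, equal to, and larger than the pivot."""
    pivot = median_of_three(values)
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return smaller, equal, larger


# Toy data (invented for the example).
values = [9, 3, 7, 1, 8, 2, 5]
smaller, equal, larger = partition(values)
```

The closer the pivot lands to the true median, the more evenly the data splits, which is why a better pivot choice can make algorithms built on partitioning run more efficiently.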
At this point, they are getting close to the end of their work. Dhouib and Levy have completed the KNN project, which culminated in a paper they hope to get published in a scientific journal. Similarly, they plan to write up and publish a paper on the results of the pivot project by the end of the summer.
Both Dhouib and Levy expressed their appreciation for the collaborative nature of the research. “If we’re stuck, we try to help each other,” said Levy. A few times per week, they have been meeting with Perkins to discuss their findings and plan their next steps. “It’s been really fun,” remarked Dhouib.