Joymallya Chakraborty

I am currently a Ph.D. student in the RAISE Lab at North Carolina State University, under the supervision of Dr. Tim Menzies. My research interests include algorithmic bias, ML model optimization, and the interpretability and explanation of black-box ML models.

Before coming to NC State, I was a full-stack software developer at TCG Digital. I obtained my bachelor's degree in Computer Science from Jadavpur University. I spent two wonderful summers in Bellevue, WA and one summer in Yorktown Heights, NY doing research internships at Intel Corporation and IBM T.J. Watson Research Labs, respectively.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub


My research mainly focuses on solving real-world problems in the software engineering field using data mining and artificial intelligence methods. Previously, I worked on finding "discrimination" on social coding platforms. Currently, my research focuses on finding and mitigating algorithmic "bias" in machine learning models.


Making Fair ML Software using Trustworthy Explanation

Joymallya Chakraborty, Kewen Peng, Tim Menzies
ASE 2020 (NIER)

Machine learning software is being used in many applications (finance, hiring, admissions, criminal justice) that have a huge social impact. But sometimes the behavior of this software is biased, and it discriminates based on sensitive attributes such as sex and race. Prior work concentrated on finding and mitigating bias in ML models. A recent trend is to use instance-based, model-agnostic explanation methods such as LIME to find bias in model predictions. Our work concentrates on finding the shortcomings of current bias measures and explanation methods. We show how our proposed method, based on K nearest neighbors, can overcome those shortcomings and find the underlying bias of black-box models. Our results are more trustworthy and helpful for practitioners. Finally, we describe our future framework combining explanation and planning to build fair software.


Fairway: A way to build fair ML Software

Joymallya Chakraborty, Suvodeep Majumder, Zhe Yu, Tim Menzies

Machine learning software is increasingly being used to make decisions that affect people's lives. But sometimes the core part of this software (the learned model) behaves in a biased manner that gives undue advantages to a specific group of people (determined by sex, race, etc.). In this work, we (a) explain how ground-truth bias in training data affects machine learning model fairness and how to find that bias in AI software, and (b) propose a method, Fairway, which combines pre-processing and in-processing approaches to remove ethical bias from training data and the trained model. Our results show that we can find and mitigate bias in a learned model without much damage to the predictive performance of that model. We propose that (1) testing for bias and (2) bias mitigation should be a routine part of the machine learning software development life cycle. Fairway offers much support for these two purposes.


Software Engineering for Fairness: A Case Study with Hyperparameter Optimization

Joymallya Chakraborty, Tianpei Xia, Fahmid M. Fahid, Tim Menzies
ASE 2019 (LBR Workshop)

Machine learning software is increasingly being used to make decisions that affect people's lives. Potentially, the application of that software will result in fairer decisions because (unlike humans) machine learning software is not biased. However, recent results show that the software within many data mining packages exhibits "group discrimination"; i.e., its decisions are inappropriately affected by "protected attributes" (e.g., race, gender, age). This paper shows that making fairness a goal during hyperparameter optimization can preserve the predictive power of a model learned from a data miner while also generating fairer results. To the best of our knowledge, this is the first application of hyperparameter optimization as a tool for software engineers to generate fairer software.


Predicting Breakdowns in Cloud Services (with SPIKE)

Jianfeng Chen, Joymallya Chakraborty, Tim Menzies, Philip Clark, Kevin Haverlock, Snehit Cherian

Maintaining web services is a mission-critical task. Any downtime of web-based services means loss of revenue. Worse, such downtime can damage the reputation of an organization as a reliable service provider (and in the current competitive web services market, such a loss of reputation causes extensive loss of future revenue). To address this issue, we developed SPIKE, a data mining tool that can predict upcoming service breakdowns half an hour into the future.


Why Software Projects need Heroes

Suvodeep Majumder, Joymallya Chakraborty, Amritanshu Agrawal, Tim Menzies
TSE 2019 (Under review)

A "hero" project is one where 80% or more of the contributions are made by 20% of the developers. In the literature, such projects are deprecated since they might cause bottlenecks in development and communication. This paper explores the effect of having heroes in a project from a code quality perspective. After experimenting on 1100+ GitHub projects, we conclude that heroes are a very useful part of modern open source projects.


Measuring the Effects of Gender Bias on GitHub

Nasif Imtiaz, Justin Middleton, Joymallya Chakraborty, Neill Robson, Gina Bai, Emerson Murphy-Hill
ICSE 2019

Diversity, including gender diversity, is valued by many software development organizations, yet the field remains dominated by men. One reason for this lack of diversity is gender bias. In this paper, we study the effects of that bias by using an existing framework derived from the gender studies literature. We adapt the four main effects proposed in the framework by posing hypotheses about how they might manifest on GitHub, then evaluate those hypotheses quantitatively. While our results show that effects are largely invisible on the GitHub platform itself, there are still signals of women concentrating their work in fewer places and being more restrained in communication than men.


Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review

Maumita Chakraborty, Sumon Chowdhury, Joymallya Chakraborty, Ranjan Mehera, Rajat Kumar Pal
Complex & Intelligent Systems (Springer), 2018

Generation of all possible spanning trees of a graph is a major area of research in graph theory, as the number of spanning trees of a graph increases exponentially with graph size. Several algorithms of varying efficiency have been developed since the early 1960s by researchers around the globe. This article is an exhaustive literature survey of these algorithms, assuming the input to be a simple undirected connected graph of finite order, and contains detailed analysis and comparisons of both the theoretical and experimental behavior of these algorithms.

Industrial Experience

Doctorate Research Intern

June 2020 - August 2020 (Yorktown Heights, NY)

I worked on state management and persistence in a project called Mono2Micro, which converts monolith applications to microservices.


Software Engineer Research Intern

May 2019 - August 2019 (Bellevue, Seattle)

I worked on post-training quantization of ONNX and TensorFlow DL models and on computational graph optimization in ONNX Runtime.

May 2018 - August 2018 (Bellevue, Seattle)

I explored optimization opportunities in .NET Core garbage collection and implemented proof-of-concept (PoC) prototypes. The prototypes were then verified against different workloads.


Software Developer

July 2015 - June 2017 (Salt Lake, Kolkata)

I was a core developer on two different projects. For the first, I designed and developed a B2B travel search engine, and I was responsible for implementing middleware services and integrating them with the front end. For the second project, I designed and implemented software to retrieve, analyze, transform and report data for business intelligence. It allows users to create different dashboards using its own customizable visualizations. It also features advanced analytics concepts such as data modelling, forecasting, and determining product affinity.


ASE 2019

I attended the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019) in San Diego, California. I presented a poster in the Late Breaking Results section. The poster was about my short paper, Software Engineering for Fairness: A Case Study with Hyperparameter Optimization.



I attended the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019) in Tallinn, Estonia and presented two papers there. The first paper, TERMINATOR: Better Automated UI Test Case Prioritization, is related to test case prioritization, and the second paper, Predicting Breakdowns in Cloud Services (with SPIKE), is related to cloud computing.

TA Experience

CSC230 - Fall 2017 (C and Software Tools)

CSC326 - Spring 2018 (Software Engineering)

CSC520 - Fall 2018 (Artificial Intelligence)

Last updated: 05/14/2019