How we built guap
Guap started as a way to bring everybody along on the AI journey, with no entry cost and no friction, by expressing a data science project's progress in terms of a business metric. A tool that makes your ML progress visible to everyone.
We believe that every data scientist should routinely translate their technical metrics into business metrics, making sure their work is inclusive enough and that they're not leaving money on the table.
And because we're convinced of this, we built guap to annihilate every excuse not to do it. We wanted a bulletproof argument; that's our level of dedication. If you've heard about guap but still work on unviable ML products, we're sorry.
These are the main principles behind its development.
Leading AI solutions to success should be everybody's matter, not only business people's. By linking the output of the ML model with business outcomes, we make data science more accessible, without the need for a Ph.D. in Computer Science or Statistics.
The other way around should share the same DNA: if you're a data person yourself, guap should feel immediately familiar, following well-adopted methods and languages.
This is why we've designed guap to be easy to install with a single command: pip install guap. We chose Python, the most used programming language among data teams according to Anaconda's 2020 State of Data Science report. And to keep the learning curve close to zero, we took inspiration from Scikit-Learn, used by almost 83% of data scientists according to the Kaggle 2020 survey, for the library's usage and workflow.
Ultimately, going open source under the Apache License 2.0 was a no-brainer, since we deeply believe in the impact of open tools on the adoption of ML.
When you talk about applied AI, you'll hear a lot about how every use case is different. Even if the saying is true, it's also an easy answer, and we're not satisfied with it. The problem we're trying to solve shouldn't be limited to a specific approach or use case.
So we've found an approach that can be applied to every supervised ML problem, and even to every framework, by leveraging what's common to every model's output: a confusion matrix.
The confusion matrix is a great fit since it's:
- Representative of the model's performance, since it's computed by evaluating the trained model on a test set of fresh data whose ground truth is known.
- The home of the most used metrics, like accuracy, precision, recall, F1 score, MCC, and more.
- An exhaustive view of every scenario the end-user will face, which makes it a great way to onboard non-technical stakeholders on the AI journey.
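To make the idea concrete, here is a minimal sketch of how a confusion matrix can be weighted into a dollar outcome. This is not guap's actual API; the function names and the dollar values are illustrative assumptions.

```python
# Sketch: translate a binary confusion matrix into a business outcome.
# Function names and dollar values are illustrative, not guap's API.

def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for a binary problem (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def business_value(y_true, y_pred, value_tp, value_fp, value_fn, value_tn):
    """Weight each confusion-matrix cell by its business value (in $)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp * value_tp + fp * value_fp + fn * value_fn + tn * value_tn

# Example: a caught churner is worth $100, a wasted retention offer costs $10,
# a missed churner costs $100, a correctly ignored customer costs nothing.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
profit = business_value(y_true, y_pred,
                        value_tp=100, value_fp=-10, value_fn=-100, value_tn=0)
print(profit)  # 2 TP, 1 FP, 1 FN, 2 TN -> 200 - 10 - 100 + 0 = 90
```

The same four counts that feed accuracy or recall feed the dollar figure; only the weights change, and those weights are the conversation to have with the business side.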
What about unsupervised learning? Any ML algorithm needs some technical metric to optimize. That metric relates in some way to business results, and that relationship can be found through analysis or experimentation, but there is no one-size-fits-all tool for it yet. That's today's sole limitation to guap's agnostic approach.
As data teams shouldn't be isolated from the other teams in the org, guap, in our opinion, shouldn't be isolated from the other tools the data team uses.
As of today, you can bring guap into the tools you already use and love, from notebooks to platforms like WandB or MLflow. This lets you generate a guap score anywhere in the ML lifecycle, no matter which tools you use: not only at the end of training, but also during training, for example if you want to optimize on the guap metric between two epochs.
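As a sketch of what per-epoch tracking could look like: compute a dollar score on a validation set after each epoch and log it next to loss and accuracy. The predictions and dollar values below are made up for illustration, and the plain dict stands in for an experiment tracker; with MLflow you would call `mlflow.log_metric("profit", profit, step=epoch)` instead.

```python
# Sketch: track a dollar score per epoch, next to the usual technical metrics.
# Predictions and dollar values are illustrative assumptions; with MLflow
# installed, replace the dict write with
# mlflow.log_metric("profit", value, step=epoch).

def profit(y_true, y_pred, value_tp=100, value_fp=-10, value_fn=-100):
    """Dollar outcome of binary predictions (true negatives cost nothing)."""
    total = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            total += value_tp
        elif t == 0 and p == 1:
            total += value_fp
        elif t == 1 and p == 0:
            total += value_fn
    return total

y_true = [1, 0, 1, 1, 0]
epoch_predictions = {   # validation-set predictions after each epoch
    1: [0, 0, 0, 1, 1],
    2: [1, 0, 0, 1, 0],
    3: [1, 0, 1, 1, 0],
}

tracked = {}
for epoch, y_pred in epoch_predictions.items():
    tracked[epoch] = profit(y_true, y_pred)  # here: tracker/logging call

print(tracked)  # {1: -110, 2: 100, 3: 300}
```

Seeing the score in dollars per epoch makes "is the model getting better?" answerable by anyone watching the run, not just the person who knows what a good log-loss looks like.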
Looking ahead, we will bridge teams by providing an API that the business team can also use for data visualization, for example to project your model's output onto a profit curve.
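A profit curve like that can be sketched by sweeping the decision threshold over the model's scores and computing the dollar outcome at each point. Again, the scores and dollar values here are illustrative assumptions, not guap's actual API.

```python
# Sketch: project model scores onto a profit curve by sweeping the decision
# threshold. Scores and dollar values are illustrative assumptions.

def profit_at_threshold(y_true, scores, threshold,
                        value_tp=100, value_fp=-10, value_fn=-100):
    """Dollar outcome when predicting positive for scores >= threshold."""
    total = 0
    for t, s in zip(y_true, scores):
        p = 1 if s >= threshold else 0
        if t == 1 and p == 1:
            total += value_tp
        elif t == 0 and p == 1:
            total += value_fp
        elif t == 1 and p == 0:
            total += value_fn
    return total

y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # model's predicted probabilities

# Evaluate profit at thresholds 0.0, 0.1, ..., 1.0 and pick the best one.
curve = [(t / 10, profit_at_threshold(y_true, scores, t / 10))
         for t in range(11)]
best_threshold, best_profit = max(curve, key=lambda point: point[1])
print(best_threshold, best_profit)  # 0.2 280
```

Plotted, `curve` is exactly the kind of chart a business stakeholder can read without knowing what a threshold is: dollars on the y-axis, and a visible peak where the model should operate.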
We can't wait to see how you'll adopt guap!
This is how we've built guap!
From day one, these were our specifications, and they became our product principles.
No matter where guap is headed, it will stay simple, agnostic, and free: you have no excuse not to use it for the greater good. Yes, I said for the greater good. It might sound funny since we're talking monetization, business, and dollars.
But hear me out: removing the barriers that keep executives from understanding how the model is performing, and how costly missing a good prediction can be, has side effects: maximizing profit is a hidden way to minimize bias and discrimination, since you'll be working to reduce false positive and false negative situations. guap isn't just about guap, after all.