Brainome.ai – Machine Learning Fundamentally Different
https://www.brainome.ai/

Bank Telemarketing Demo
Tue, 12 Jul 2022 – https://www.brainome.ai/2022/07/12/bank-telemarketing-ml-demo/

Brainome builds three models in under 10 minutes for the Bank Telemarketing dataset, provided by Satoshida Tamoto via Kaggle, to predict whether a bank client will subscribe to a term deposit.

Watch or follow along with our demo, which analyzes the success of bank telemarketing using a dataset provided by Satoshida Tamoto through Kaggle to predict whether a bank client will subscribe to a term deposit. We will use a subset of the original dataset with 45,210 rows.

You can download the subset of the dataset we used here, or view the full dataset provided by UC Irvine here.

Success of Bank Telemarketing Dataset provided by Satoshida Tamoto via Kaggle

Follow along with our video demo, or jump over to a terminal to run Brainome with us. If you have not used Brainome before, start with our quickstart tutorial.

First, we run Brainome with the -h flag to see all of the options available to us. We will use -f to force the model type, -e to increase the effort, and -split to set how the dataset is divided between training and validation.
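
In a terminal, that first step is just the help flag (a quick sketch; run it against your own install for the authoritative option list):

    brainome -h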

In our first run, we will simply run Brainome in AutoML mode to analyze the data, immediately ignoring the column “Index” with the -ignorecolumns flag. Running in AutoML mode lets us compare four types of models and then use that information to force the model we prefer in later runs. In this first run, we can see that Random Forest is the recommended model and that the dataset was split 50% for training and 50% for validation. You can follow a summary of the results on the right-hand side of the video.
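
A sketch of that first command, assuming the downloaded subset is saved as bank_marketing.csv (the filename is illustrative; confirm the exact flag syntax with brainome -h):

    # AutoML mode: no model forced, so Brainome compares model types itself.
    brainome bank_marketing.csv -ignorecolumns Index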

For the second run, we increase the effort to 10, force the model to Random Forest, and force the split to 80%. Let’s take a look at the results.
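
That run looks roughly like this (using RF as the Random Forest identifier and a percentage value for -split are assumptions based on the video):

    # Force Random Forest, raise the effort to 10, train on 80% of the rows.
    brainome bank_marketing.csv -ignorecolumns Index -f RF -e 10 -split 80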

The run time is a little longer here, but we see a slight improvement in our results from increasing the effort and the split: about a 1% increase in training accuracy and a little over half a percent in validation accuracy. Let’s try another run with even higher effort to compare the results.

We increase the effort to 50 and run again.
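
A sketch of the third run, under the same assumptions as above:

    # Same model and split, but five times the effort.
    brainome bank_marketing.csv -ignorecolumns Index -f RF -e 50 -split 80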

This run took much longer, just over 8 minutes. The results are mixed: a very slight increase in training accuracy, and a decrease in validation accuracy.

Summary of results from our three model builds using Brainome

The real question here is: is more effort actually better?

We can see that the middle model is the most useful. There is a tendency to focus on accuracy alone, but as we see here, the highest-effort run also increased the model’s capacity. Accepting a slightly lower validation accuracy is worthwhile because the Memory Equivalent Capacity (MEC) is reduced, and a smaller MEC means less memorization and better generalization.

Get started with pip install brainome and try a dataset of your own.
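
A minimal first session might look like this (your_data.csv is a placeholder for any labeled CSV):

    pip install brainome
    brainome your_data.csv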

Celebrate Earth Day with Brainome
Mon, 25 Apr 2022 – https://www.brainome.ai/2022/04/25/celebrate-earth-day-with-brainome/

Let’s open the conversation about the environmental risk of data science as we celebrate Earth Day.


According to the Earth Day Network, we are currently in the largest period of species extinction in 60 million years. All species of mammals, birds, reptiles, amphibians, arthropods, fish, crustaceans, corals and plants have declined, in many cases, severely.

Let’s open the conversation about the environmental risk of data science as we celebrate Earth Day, and challenge the assumption of “infinite and costless compute resources” that has driven the industry for the past decade and led to the rise of extremely large and complicated models.

How can we reduce the environmental risk of data science? 

One path is to make the whole machine learning process more efficient – by minimizing training data as well as the compute resources needed to create and operate models.  

This is exactly what Brainome does. Our measurements-based approach means no more hyperparameter tuning, no more guessing and checking, and no more open-ended experiments to determine data quality. In short, Brainome believes that just because you have a lot of CPUs and GPUs at your disposal doesn’t mean you have to use them all. We replace raw compute with scientific measurements and build incredibly small models 30x faster than Google, Microsoft, and Amazon.

  • Brainome tells you what data is necessary to build the best model.
  • Brainome builds models 30x faster than competing platforms.
  • Brainome’s models are extremely compact and can predict 300x faster than competing platforms.
  • Brainome’s results are repeatable and reproducible.
  • Brainome’s measurements-based approach enables companies to drastically reduce the carbon footprint of their data science initiatives.

This Earth Day, be part of the solution. Work smarter, not harder. Use measurements.

Brainome selected for IRCAI GLOBAL TOP 100
Mon, 25 Apr 2022 – https://www.brainome.ai/2022/04/25/global-top-100/

Brainome was selected as one of the top 100 projects solving problems related to the 17 United Nations Sustainable Development Goals for 2021.

Brainome was awarded in two categories, SDG 3 (Good Health and Well-being) and SDG 9 (Industry, Innovation and Infrastructure), and was selected as a promising company and product for the International Awards on Excellent Research in Artificial Intelligence and Sustainable Development.

Together, IRCAI and UNESCO reach over 193 countries around the world, with a network potentially reaching hundreds of AI researchers across continents.

The award celebrates the best corporate responsibility and sustainability initiatives and programs, business models using AI services, and excellent research from around the world.

Read more about our project here.

What is Learnability and How Do You Measure It?
Mon, 14 Dec 2020 – https://www.brainome.ai/2020/12/14/learnability/

Learnability is a lot like irony: you know it immediately when you see it, but it’s a little hard to define.

Before we dive into the details, though, let’s first understand why learnability is important in the context of supervised machine learning classification problems, which are our current focus here at Brainome.

What is classification? Here are some examples: ball vs strike, good credit risk vs bad credit risk, dog vs cat vs horse vs cow, etc. For these types of problems, the most fundamental question is always: can I create an accurate and generalized model (classifier) from the data I have collected? Another way of saying this is: how learnable is my data? Because, fundamentally, the more “learnable” your data is, the better your classifier will be.

So what makes one data set learnable and another data set not learnable? Let’s take a look at a few examples.

Example A: 2, 4, 6, 8

If you’re given the set of numbers [2, 4, 6, 8] and asked to guess the next number in the sequence, what would your guess be? Most people would say 10, and that the number after that is 12, because the rule is “+2”. This is the epitome of a learnable data set: the explanatory rule (or pattern, if you like) that governs it is immediately obvious.

Interestingly, adding more instances to this data set does not make it more learnable. If you were given [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, etc.], you would still have figured out the rule after analyzing just the first 4 numbers. At some point in every learnable data set, your learning must plateau. In other words, the explanatory rule has to settle at some point.

Example B: 6, 5, 1, 3

If we play the same game with [6, 5, 1, 3], then it’s a little more challenging. There is no obvious rule or pattern. And that’s because this data set consists of 4 numbers chosen at random. The overly clever readers out there may think there’s a pattern (-1, -4, +2, etc.) but we assure you that this data set is as random as [8, 6, 7, 5, 3, 0, 9] — the string of digits made famous by Tommy Tutone’s classic 80’s pop hit 🙂 

The thing about Jenny’s number (and everyone’s phone number for that matter) is that it’s just a random sequence of digits with no rhyme or reason whatsoever (no rule). And here we have the epitome of an unlearnable data set. By definition, randomness is not learnable and the best one can do with a random data set (e.g., phone numbers) is to memorize it. Said another way, randomness (“unlearnability”) is the enemy of prediction.

Human Learning vs Machine Learning

The lead character in the Amazon Prime series “Mozart in the Jungle” is a symphony conductor whose life revolves around classical music. In one episode, he asks his audience: “Why do you call it ‘classical music’?” His point is: when the symphonies were first written and performed for audiences, they were known as just “music”. Similarly, there is a modern discipline called “machine learning” which some people associate with magical predictive powers. But, the reality is that computers learn exactly the same way that humans learn — which shouldn’t really be a surprise since humans invented computers … and machine learning.

So how exactly do humans learn? Well, there are 2 fundamental ways: (1) we memorize; or (2) we recognize patterns & rules and remember the patterns & rules. As explained above, some data sets can only be memorized because they are inherently random. Non-random data sets, however, don’t require memorization because rules can be extracted and used for prediction (i.e., learning occurs). 

Compressible = Learnable = Generalization

In our Memorization is Worst Case Generalization article, we discuss two different strategies for teaching children how to multiply. One requires memorization. If we were to ask a child (or a computer) to “learn” multiplication by memorizing an ever-growing multiplication table, that learner (human or digital) would eventually run out of memory. And short of having infinite memory (which is not possible), there would still be many multiplication problems that the memorization-driven learner could not solve.

Alternatively, if we teach the learner a simple rule — multiplication is just adding the same number over and over again — then they would be able to solve an INFINITE number of multiplication problems while still having a lot of memory left over to learn other topics. And the memory footprint of a recursive program that adds numbers together to implement multiplication is infinitesimally small compared to the memory footprint of a 1 trillion row X 1 trillion column multiplication table. Simply put, multiplication is pretty easy to learn … as long as your learning strategy isn’t memorization.
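
To make the contrast concrete, here is a minimal Python sketch of the “rule” learner (our illustration, not anything from the article): a few lines that implement multiplication as repeated addition and therefore cover infinitely many problems that no finite table could.

    # Multiplication learned as a rule: add the same number over and over.
    def multiply(a: int, b: int) -> int:
        result = 0
        for _ in range(b):  # add `a` to itself `b` times (b >= 0)
            result += a
        return result

    print(multiply(7, 8))        # 56 -- an entry a table might hold
    print(multiply(1234, 5678))  # 7006652 -- an entry no table could afford to hold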

So there’s clearly a relationship between compression and learnability. And if you’ve read “Memorization is Worst Case Generalization”, it’s now plainly obvious that the title of that article could (should?) have been “Memorization is Worst Case Learnability”. FYI, the relationship between compression and learnability / generalization is explored in detail in this YouTube lecture.

The Punch Line (aka “I can learn that data set using X bits of memory”)

It’s probably safe to say that most adults know how to multiply. But, since we’re all unique, it’s probably also safe to assume that the multiplication model in each of our heads is slightly different and occupies a different amount of memory for everybody. Similarly, anyone who is 5 or older has a pretty general dog vs. cat classifier somewhere in their brain. Like the multiplication model, the dog vs. cat classifier is slightly different for everyone. What we cannot do today is measure the amount of memory the dog vs. cat classifier in your organic brain occupies (i.e., the number of neurons it took for you to learn the difference between dogs and cats). 

What Brainome can do today is measure the amount of memory a machine learning model needs in order to learn an arbitrary data set. The details for how we do this are explained in this academic paper. Knowing how much memory is required to learn a data set is the key to measuring (quantifying) learnability. 

If we ask ourselves, “What is the hardest thing in the world to learn?”, it should be pretty clear from Example B above that the answer is random data. Randomness can only be memorized; it cannot be learned. In fact, it is the opposite of learnable. If we can measure how much memory is needed to learn a random data set of arbitrary size and class balance, then we have our stake in the ground for determining learnability. If the same machine learning model requires less memory to learn your data set than it requires to learn random data, then it’s a clear indication that your data isn’t random and that there are rules and patterns that can be identified, learned, and used for prediction.
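
Brainome’s actual capacity measurement is described in the paper linked above. As a crude stand-in (an illustration of the principle, not Brainome’s method), the Python sketch below fits an ordinary decision tree to a public data set twice, once with the real labels and once with the labels shuffled into randomness, and compares how much model “memory” (tree nodes) each fit requires:

    # Illustration only (not Brainome's measurement): compare the model
    # "memory" needed to fit real labels vs. randomly shuffled labels.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    y_random = np.random.default_rng(0).permutation(y)  # destroy the rules

    def nodes_to_fit(features, labels):
        # An unpruned tree fits its training labels perfectly; its node
        # count is a rough proxy for the memory the fit required.
        tree = DecisionTreeClassifier(random_state=0).fit(features, labels)
        return tree.tree_.node_count

    print("nodes for real labels:    ", nodes_to_fit(X, y))
    print("nodes for shuffled labels:", nodes_to_fit(X, y_random))
    # Far fewer nodes for the real labels means the data contains
    # learnable structure rather than pure randomness.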

Epilogue

Hopefully, you now have a better understanding of learnability and why it is the fundamental principle behind Brainome’s approach to machine learning. 


P.S. If you’d like to measure the learnability of your own data sets, please visit the Brainome Web Demo.

Memorization is Worst Case Generalization
Wed, 18 Nov 2020 – https://www.brainome.ai/2020/11/18/memorization-is-worst-case-generalization/

Do you remember when you first learned to multiply? Were you one of the many kids who were given a multiplication table and told to memorize it?


Does this look familiar?

If so, I’m sure you remember the special pain of reading the table over and over again, committing all of the combinations to memory so you could pass the test that was scheduled for the next day. And pass you did … because young children are really good at memorization. In fact, I’m sure you got 100% on your first multiplication test.

But did you really understand what multiplication meant? If the test had consisted of problems that were NOT in the table you memorized, would you have scored 100%? Likely not, because kids who memorize multiplication tables haven’t really learned anything; all they’ve done is implement a lookup table in their heads. And the problem with lookup tables is that the inputs have to match exactly for the table to produce any result. In other words, lookup tables (and children who memorize) are very accurate, but they don’t generalize at all.
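
The lookup-table behavior is easy to demonstrate in a few lines of Python (a toy illustration of the point above):

    # A memorized 10x10 multiplication "model" is just a lookup table.
    table = {(a, b): a * b for a in range(1, 11) for b in range(1, 11)}

    print(table[(7, 8)])        # 56 -- perfectly accurate on memorized input
    print(table.get((12, 12)))  # None -- zero generalization beyond the table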

So what is the relationship between multiplication tables and supervised machine learning? Well, to answer that question, we first have to ask ourselves:

What is the goal of machine learning?

Machine Learning Goals

Here at Brainome, we believe that the goal of machine learning is to produce predictive models that are both accurate AND general. We want models that understand that multiplying means adding the same number over and over again and can apply this rule to solve ANY multiplication problem.

As explained above, one can always achieve 100% accuracy simply by memorizing the training data. A child that memorizes a multiplication table will be 100% accurate but have zero generalization. Similarly, models that use memorization (overfit) to achieve high accuracy are not desired because they cannot handle novel input. In other words, overfit models are not useful in the real world.

But don’t just take our word for it.

Consider the words of François Chollet, an AI researcher at Google, the creator of Keras, and the author of ‘Deep Learning with Python’, who has made the same point on Twitter.

Now, you might ask: How does one measure generalization in machine learning? 

Please check out the Brainome Web Demo to get the answer  🙂

Genomics: Using AI measurements to shorten research and be first to market
Tue, 17 Nov 2020 – https://www.brainome.ai/2020/11/17/genomics-using-ai-measurements-to-shorten-research-and-be-first-to-market/

With all the advancements in AI, the big question for the life science sector remains:


How can machine learning be applied to Genomics?

Can you use machine learning techniques to predict cancer in patients or the success rate of a new drug, or even to understand why Covid affects some people and not others?

Using machine learning to solve problems in genomics is a lot more challenging than applying it in most other domains. This is mainly because acquiring data through clinical trials is expensive and time-consuming. In addition, you end up with a very hard dataset to work with:

  • Limited rows – a few hundred patients
  • A very large number of columns – roughly 21,000, each representing a gene expression

Extracting a working model from such a challenging dataset using typical machine learning algorithms would be nearly impossible without overfitting. Many algorithms require a minimum of 100 data points per class, which makes small data sets unusable and forces you to spend more money on data collection and compute.

This is where the next step in machine learning makes a huge impact: measurements.

Brainome’s model-aware pre-training measurements are able to pinpoint the handful of relevant attributes that accurately drive the classification. These few genes are the only ones needed in your model to maximize accuracy on unseen data. Suddenly, a complex problem with thousands of columns is transformed into a manageable data set, making it entirely feasible to build a general predictor.
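
Brainome’s measurement engine is proprietary, but the flavor of pinpointing the few attributes that matter can be sketched with off-the-shelf tools. The Python below is an illustration of the idea, not Brainome’s method: it scores synthetic “gene expression” columns by mutual information with the label and surfaces the one column where signal was planted (the column count is scaled down from the study’s roughly 21,000 for speed):

    # Illustration only (not Brainome's engine): rank columns by mutual
    # information with the label and keep the handful that carry signal.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    n_samples, n_genes = 584, 2_000            # scaled down from ~21,000 genes
    X = rng.normal(size=(n_samples, n_genes))  # synthetic "expression" values
    y = rng.integers(0, 2, size=n_samples)     # healthy (0) vs. cancerous (1)
    X[:, 42] += 3.0 * y                        # plant signal in one fake gene

    scores = mutual_info_classif(X, y, random_state=0)
    print("most informative columns:", np.argsort(scores)[::-1][:5])
    # Column 42 should rank first; the other columns are pure noise.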

Brainome has recently been working on a joint ovarian cancer study with Cedars-Sinai. The gene expression data gathered by Cedars-Sinai contained 584 sample cells, each with 21,000 gene expression features, labeled as healthy or cancerous. Brainome processed this data through its measurement engine, Daimensions™.

In a couple of minutes, Daimensions™ was able to extract a single gene: VWA7.

Daimensions™ built a model with that single parameter that predicts ovarian cancer with 100% accuracy. The team at Cedars-Sinai confirmed Brainome’s finding that VWA7 is instrumental in predicting ovarian cancer.

From this study and several more like it, we can answer with confidence that measurements can and should be used in machine learning for genomics. Not only can we answer complex questions, we can do it faster, saving time and money on collecting possibly unnecessary data and on excessive computing resources.

Is “more data” and “more compute” a thing of the past?
Wed, 04 Nov 2020 – https://www.brainome.ai/2020/11/04/data-compute/

The trend in machine learning for the last 20 years has been that the more data you have and the more you spend on computation, the more likely you are to succeed.

WRONG! There is no data like enough data and the right data.

Let’s play a game!

If I give you the sequence 2, 4, 6, 8 and ask you what comes next, you will more than likely answer 10. For that same sequence, if I ask what comes after 1000, you will tell me 1002. When providing you with the initial number sequence, there is no benefit in going all the way to 100. You will tell me very quickly: That’s enough. I got it! It’s +2. More data will take more time, use more compute, and so on. For your ML task, it means more unnecessary dollars spent.

Let’s try another one. Given the sequence 6, 5, 1, 3, can you guess what comes next? Don’t waste brain power; you won’t be able to answer me. Why? Because it’s the last 4 digits of my phone number. It doesn’t matter how many data points I give you; there is no way to extrapolate a rule. There is no rule. Phone numbers are random so that people can’t guess them. This is why having the right data is key. Wasting time and money trying to solve the unsolvable makes no sense.

Gerald Friedland, CTO at Brainome and UC Berkeley professor, says: “Memorization is worst case generalization.” The holy grail in ML is to achieve best-case generalization and avoid overfitting.

The state of the art in machine learning is to take existing models, throw as much data as possible at them, and see how they perform. Then you tune hyperparameters to increase accuracy. This method only promotes the continued collection of data and an ever-increasing need for compute power. Should we keep doing that?

When building a bridge, the construction crew doesn’t duplicate the 100 other bridges they built previously and see which one doesn’t collapse. That would take far too much time, cost too much, and most likely not be a perfect fit.

This is why you should always MEASURE FIRST!

Benefits & Applications 6
Tue, 20 Oct 2020 – https://www.brainome.ai/2020/10/20/benefits-applications-6/

December 10, 2020 at 10:00AM PT
Tiny Daimensions™: IoT, Wearables, and Edge Computing



About Benefits & Applications Part 6

Tiny Daimensions™: IoT, Wearables, and Edge Computing

How does Brainome’s “measure first” approach to machine learning make detecting and responding to data drift easy and reliable? How do you keep public-facing machine learning models in AdTech and FinTech deployments fresh and well-tuned? How does Brainome’s novel pre-build measurement approach make it possible to know when your model’s operating conditions have changed enough to warrant a reaction? What should that reaction be? Join us for our webinar “Detect Data Drift with Daimensions™ – Fast!” to see hands on how it’s done.


Meet your hosts

Bertrand Irissou

Bertrand is CEO at Brainome. He is a serial tech entrepreneur who previously founded two successful companies: Asic Advantage Inc. and Audeme.

Lin Chase

Lin is VP of Product Management at Brainome. She is an experienced executive with an extensive track record in the successful development and delivery of artificial intelligence technologies in complex business environments.


Benefits & Applications 5
Tue, 20 Oct 2020 – https://www.brainome.ai/2020/10/20/benefits-applications-5/

December 3, 2020 at 10:00AM PT
Benefits of being a compiler: simple ops, ubiquitous deployment, explainability



About Benefits & Applications Part 5

Benefits of being a compiler: simple ops, ubiquitous deployment, explainability

Brainome’s Daimensions™ is a compiler that takes in data and puts out predictors that are compact, standalone (Python) code. How does this make explainability so much easier with Brainome than with other machine learning approaches? How does the fact that Daimensions™ is a compiler dramatically simplify machine learning operations and deployment? Join us for our webinar “Daimensions™: the Benefits of Being a Compiler” to look under the hood on a predictor created by the system, and to see hands-on how it all works.


Meet your hosts

Bertrand Irissou

Bertrand is CEO at Brainome. He is a serial tech entrepreneur who previously founded two successful companies: Asic Advantage Inc. and Audeme.

Lin Chase

Lin is VP of Product Management at Brainome. She is an experienced executive with an extensive track record in the successful development and delivery of artificial intelligence technologies in complex business environments.


Core Tech 301
Tue, 20 Oct 2020 – https://www.brainome.ai/2020/10/20/core-tech-301-2/

December 8, 2020 at 10:00AM PT
The big, deep picture: telling the whole story fast, with cool examples



About Core Tech 301

The big, deep picture: telling the whole story fast, with cool examples

What’s the complete story on how Brainome is bringing the first systematic approach to measurement to machine learning? Join us for our “Core Tech 301” webinar for a fast-paced, mathematically specific walk-through of both the underlying technical innovations and the key benefits that Brainome’s approach brings to the table. This webinar includes details on how Brainome’s “measure first” approach to machine learning delivers powerful new impact in real-world projects.


Meet your hosts

Gerald Friedland

Dr. Gerald Friedland is CTO of Brainome, Inc. and also teaches as an adjunct professor in the Electrical Engineering and Computer Sciences department at UC Berkeley. Before that, he was with Lawrence Livermore National Lab and the International Computer Science Institute in Berkeley. His work focuses on signal processing and machine learning. He has published more than 250 peer-reviewed articles in conferences and journals, as well as 3 books. Dr. Friedland received his doctorate (summa cum laude) in computer science from Freie Universitaet Berlin, Germany, in 2006.

Lin Chase

Lin is VP of Product Management at Brainome. She is an experienced executive with an extensive track record in the successful development and delivery of artificial intelligence technologies in complex business environments.

