As shown in the last post, machine learning is great for pattern recognition. But it has become powerful enough to be used in many more areas than pattern recognition alone, such as:
- Drug Discovery and toxicology
Discovering new treatments for human diseases is one of the most complicated tasks there is. A new compound takes years to research and test, and may still never reach the market. Such failures are caused by insufficient efficacy, undetected interactions with other compounds, or toxic effects. Traditionally, virtual drug screening has used only the experimental data from the particular disease being studied. However, as the volume of experimental drug screening data across many diseases continues to grow, several research groups have demonstrated that data from multiple diseases can be leveraged with multitask neural networks to improve virtual screening effectiveness. In 2012, a team led by George Dahl won the “Merck Molecular Activity Challenge” using multi-task deep neural networks to predict the biomolecular target of a compound, with an average area under the curve (AUC) of 0.938.
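Since AUC is the metric quoted throughout this post, here is a minimal sketch of how it can be computed, assuming it means the area under the ROC curve: the probability that a randomly chosen active compound gets a higher score than a randomly chosen inactive one. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not taken from any of the challenges mentioned.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active compound, 0 = inactive compound.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every active compound is ranked above every inactive one, while 0.5 is what random scoring would give. The pairwise loop is quadratic, which is fine for a toy example; real evaluations usually rely on a sorted-rank implementation from a library.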
Deep neural networks have also been applied to toxicity prediction because they automatically construct complex features and allow multi-task learning. In 2014, Sepp Hochreiter’s group won the “Tox21 Data Challenge” of the NIH, FDA and NCATS by using deep multitask neural networks to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs, with an average area under the curve (AUC) of 0.846.
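To make the idea of multi-task learning more concrete, the sketch below shows only the structure of a multitask network: one hidden layer shared by all tasks, followed by a separate output unit per task. It is not the architecture used by any of the teams mentioned. The class name `MultitaskNet`, the task names, the layer sizes and the random untrained weights are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MultitaskNet:
    """One shared hidden layer feeding one sigmoid output per task."""

    def __init__(self, n_features, n_hidden, task_names, seed=0):
        rng = random.Random(seed)
        # Shared weights: every task reuses the same learned features.
        self.w_shared = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
            for _ in range(n_hidden)
        ]
        self.b_shared = [0.0] * n_hidden
        # Task-specific heads: one weight vector and bias per task.
        self.heads = {
            name: ([rng.uniform(-0.5, 0.5) for _ in range(n_hidden)], 0.0)
            for name in task_names
        }

    def predict(self, features):
        # Shared representation with a ReLU activation.
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(self.w_shared, self.b_shared)
        ]
        # Each head turns the shared features into one probability.
        return {
            name: sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)
            for name, (weights, bias) in self.heads.items()
        }


# Hypothetical tasks and a hypothetical 4-bit molecular fingerprint.
tasks = ["toxicity_assay_a", "toxicity_assay_b", "target_binding"]
net = MultitaskNet(n_features=4, n_hidden=8, task_names=tasks)
predictions = net.predict([1.0, 0.0, 1.0, 1.0])
for task, probability in predictions.items():
    print(f"{task}: {probability:.3f}")
```

The weights here are random, so the outputs are meaningless until the network is trained. The point is the shape: during training, errors from every task update the shared layer, which is how data from one assay or disease can help predictions for another.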
AtomNet, introduced in 2015, was the first structure-based deep convolutional neural network designed to predict the bioactivity of small molecules for drug discovery applications. It achieved an AUC greater than 0.9 on 57.8% of the targets in the DUDE benchmark. AtomNet has already been used to explore questions in cancer, neurological diseases, antivirals, antiparasitics, and antibiotics, and molecules it predicted have become lead candidates in research programs.
- Bioinformatics
Bioinformatics is the field that develops methods and software tools for working with biological data. Deep learning has many applications in bioinformatics: it can help predict amino acid sequences, RNA splicing, gene ontology and more.
DNdisorder is a system that uses DNNs to predict protein disorder. It uses restricted Boltzmann machines to initialise the weights and then fine-tunes them with backpropagation. This system has an AUC of 0.8299.
- Recommendation systems
A recommendation system is a system that predicts user responses to options. There are various types of recommendation systems:
Content-based systems, which examine the properties of the recommended items. For example, if a Netflix user has watched many cowboy movies, the system recommends a movie classified in the database as having the “cowboy” genre.
Collaborative filtering systems, which recommend items based on similarity measures between users and/or items. The items recommended to a user are those preferred by similar users.
Hybrid recommender systems, which combine content-based and collaborative filtering. Netflix uses a hybrid system: it makes recommendations by comparing the watching and searching habits of similar users (collaborative filtering), as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).
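The content-based case from the cowboy movie example can be sketched in a few lines. This is a deliberately simplified illustration, not how Netflix works: the catalogue, the titles and the function `recommend_by_genre` are hypothetical, and the only item property considered is a single genre label.

```python
from collections import Counter

# Hypothetical catalogue: title -> genre stored in the database.
CATALOGUE = {
    "Western A": "cowboy",
    "Western B": "cowboy",
    "Western C": "cowboy",
    "Comedy A": "comedy",
    "Comedy B": "comedy",
}


def recommend_by_genre(watched, catalogue, k=2):
    """Recommend up to k unseen titles from the user's most-watched genre."""
    genre_counts = Counter(catalogue[title] for title in watched)
    if not genre_counts:
        return []  # No history yet, so nothing to base a recommendation on.
    favourite, _ = genre_counts.most_common(1)[0]
    unseen = [
        title
        for title, genre in catalogue.items()
        if genre == favourite and title not in watched
    ]
    return sorted(unseen)[:k]


watched = ["Western A", "Western B", "Comedy A"]
recommendations = recommend_by_genre(watched, CATALOGUE)
print(recommendations)
```

Note that this approach only needs the user's own history and the item properties; it never looks at what other users watched, which is the key difference from collaborative filtering.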
Using DNNs, these systems can learn user preferences and recommend new content based on past choices.
Spotify, for example, has projects that use convolutional and recurrent NNs with collaborative filtering to predict the next artist, album or track, given the history of streams.
Fig.1. Collaborative Filtering
(The purple user watched movies A, B and C. Because the blue user watched movies A and B, the system recommends movie C to them.)
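The situation in Fig. 1 can be reproduced with a minimal user-based collaborative filtering sketch. It assumes Jaccard similarity between viewing histories as the similarity measure, which is just one simple choice among many. The function names and the extra "green" user are hypothetical additions for illustration.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, k=1):
    """Recommend up to k unseen items, weighted by how similar their viewers are."""
    seen = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        similarity = jaccard(seen, items)
        # Every item the other user saw, and the target did not, gets a vote.
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:k]


histories = {
    "purple": {"A", "B", "C"},
    "blue": {"A", "B"},
    "green": {"D"},  # Hypothetical user with no overlap with blue.
}
suggested = recommend("blue", histories)
print(suggested)
```

Purple and blue share two of three movies, so purple's remaining movie C is recommended to blue. Movie D is never suggested because the green user has nothing in common with blue.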
- Customer relationship management
Customer relationship management (CRM) is a strategy for managing the interaction between an enterprise and its customers. CRM analyses data about customers’ history with a company in order to improve business relationships with them, focusing on retention in order to drive sales growth. Data mining provides the technology to analyse large volumes of data and detect hidden patterns, converting raw data into valuable information. DNNs can help predict customers’ future behavior and make recommendations for improving customer relationships. Some studies suggest this approach.
References
- Google, http://googleresearch.blogspot.pt/2015/03/large-scale-machine-learning-for-drug.html
- Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively Multitask Networks for Drug Discovery. arXiv:1502.02072v1 (2015)
- Wallach, I., Dzamba, M., Heifets, A.: AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. arXiv:1510.02855v1 (2015)