Jupyter Dash

posted May 30, 2020, 7:25 AM by Chris G   [ updated May 30, 2020, 7:25 AM ]

This library makes it easy to develop Plotly Dash apps interactively from within Jupyter environments (e.g. classic Notebook, JupyterLab, Visual Studio Code notebooks, nteract, PyCharm notebooks, etc.).

JupyterLab example

Quiet log noise with Python and machine learning

posted May 8, 2020, 6:49 AM by Chris G   [ updated May 8, 2020, 6:52 AM ]

Logreduce saves debugging time by picking out anomalies from mountains of log data.

Continuous integration (CI) jobs can generate massive volumes of data. When a job fails, figuring out what went wrong can be a tedious process that involves investigating logs to discover the root cause—which is often found in a fraction of the total job output. To make it easier to separate the most relevant data from the rest, the Logreduce machine learning model is trained using previous successful job runs to extract anomalies from failed runs' logs.

This principle can also be applied to other use cases, for example, extracting anomalies from Journald or other systemwide regular log files.

Using machine learning to reduce noise

A typical log file contains many nominal events ("baselines") along with a few exceptions that are relevant to the developer. Baselines may contain random elements such as timestamps or unique identifiers that are difficult to detect and remove. To remove the baseline events, we can use a k-nearest neighbors pattern recognition algorithm (k-NN).

Log events must be converted to numeric values for k-NN regression. Using the generic feature extraction tool HashingVectorizer enables the process to be applied to any type of log. It hashes each word and encodes each event in a sparse matrix. To further reduce the search space, tokenization removes known random words, such as dates or IP addresses.
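The tokenization and hashing steps can be sketched in plain Python. This is only an illustration of the hashing trick, not scikit-learn's HashingVectorizer and not Logreduce's actual code: the patterns in `RANDOM_TOKEN_PATTERNS`, the function names, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import re
import zlib

# Hypothetical patterns for "known random words"; real logs will need more.
RANDOM_TOKEN_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "IP"),  # IPv4 addresses
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "DATE"),      # ISO dates
    (re.compile(r"\b\d{2}:\d{2}:\d{2}\b"), "TIME"),      # clock times
    (re.compile(r"\b\d+\b"), "NUM"),                     # any other number
]


def tokenize(line):
    """Replace random-looking tokens with placeholders, then split into words."""
    for pattern, placeholder in RANDOM_TOKEN_PATTERNS:
        line = pattern.sub(placeholder, line)
    return re.findall(r"[A-Za-z_]+", line)


def hash_vectorize(line, n_features=2**20):
    """Encode a log line as a sparse vector: {column index: word count}."""
    vector = {}
    for word in tokenize(line):
        # Hash each word to a fixed column, so no vocabulary has to be stored.
        index = zlib.crc32(word.encode("utf-8")) % n_features
        vector[index] = vector.get(index, 0) + 1
    return vector


first = hash_vectorize("2020-05-08 06:49:01 sshd: accepted from 10.0.0.1 port 22")
second = hash_vectorize("2020-05-09 11:02:33 sshd: accepted from 192.168.1.7 port 22")
print(first == second)
```

Because the date, time, IP address, and port number are replaced by placeholders before hashing, two events that differ only in those fields end up with the same sparse vector.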

Once the model is trained, the k-NN search tells us the distance of each new event from the baseline.
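To make the idea of "distance from the baseline" concrete, here is a minimal brute-force sketch. It assumes plain word counts as vectors, cosine distance as the metric, and k = 1; the function names are hypothetical, and a real implementation would use a proper search tree rather than scanning every baseline event.

```python
import math
from collections import Counter


def vectorize(line):
    """Very rough stand-in for feature extraction: lowercase word counts."""
    return Counter(line.lower().split())


def cosine_distance(a, b):
    """Return 0.0 for identical directions and 1.0 for no words in common."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_distance(baseline_vectors, line):
    """Distance from a new event to its closest baseline event (k = 1)."""
    target = vectorize(line)
    return min(cosine_distance(target, vector) for vector in baseline_vectors)


# "Training" here is just vectorizing lines from a successful run.
baseline = [vectorize(line) for line in (
    "service started ok",
    "connection accepted from client",
)]

for event in ("service started ok", "kernel panic detected"):
    print(round(nearest_distance(baseline, event), 3), "|", event)
```

An event that also appears in the baseline gets a distance of about zero, while an event sharing no words with any baseline line gets the maximum distance, so a threshold on this value separates nominal events from anomalies.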

This Jupyter notebook demonstrates the process and graphs the sparse matrix vectors.

Introducing Logreduce

The Logreduce Python software transparently implements this process. Logreduce's initial goal was to assist with Zuul CI job failure analyses using the build database, and it is now integrated into the Software Factory development forge's job logs process.

At its simplest, Logreduce compares files or directories and removes lines that are similar. Logreduce builds a model for each source file and outputs each of the target's lines whose distance is above a defined threshold, using the following syntax: distance | filename:line-number: line-content.

$ logreduce diff /var/log/audit/audit.log.1 /var/log/audit/audit.log
INFO  logreduce.Classifier - Training took 21.982s at 0.364MB/(1.314kl/s) (8.000 MB - 28.884 kilo-lines)
0.244 | audit.log:19963:        type=USER_AUTH acct="root" exe="/usr/bin/su"
INFO  logreduce.Classifier - Testing took 18.297s at 0.306MB/(1.094kl/s) (5.607 MB - 20.015 kilo-lines)
99.99% reduction (from 20015 lines to 1
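If you want to post-process this output, the anomaly lines can be picked apart with a regular expression. The sketch below is not part of Logreduce; the `ANOMALY_LINE` pattern and the `parse_anomaly` helper are hypothetical and assume the output follows exactly the syntax described above.

```python
import re

# distance | filename:line-number: line-content
ANOMALY_LINE = re.compile(
    r"^(?P<distance>\d+\.\d+) \| "
    r"(?P<filename>[^:]+):(?P<line_number>\d+):\s*(?P<content>.*)$"
)


def parse_anomaly(line):
    """Return (distance, filename, line_number, content), or None for other lines."""
    match = ANOMALY_LINE.match(line)
    if match is None:
        return None  # e.g. the INFO lines printed by the classifier
    return (
        float(match["distance"]),
        match["filename"],
        int(match["line_number"]),
        match["content"],
    )


sample = '0.244 | audit.log:19963:        type=USER_AUTH acct="root" exe="/usr/bin/su"'
print(parse_anomaly(sample))
```

Lines that do not match the format, such as the training and testing timing messages, simply return None and can be skipped.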

For more advanced use, Logreduce can train a model offline and reuse it later. Many variants of the baselines can be used to fit the k-NN search tree.

$ logreduce dir-train audit.clf /var/log/audit/audit.log.*
INFO  logreduce.Classifier - Training took 80.883s at 0.396MB/(1.397kl/s) (32.001 MB - 112.977 kilo-lines)
DEBUG logreduce.Classifier - audit.clf: written
$ logreduce dir-run audit.clf /var/log/audit/audit.log

Logreduce also implements interfaces to discover baselines for Journald time ranges (days/weeks/months) and Zuul CI job build histories. It can also generate HTML reports that group anomalies found in multiple files in a simple interface.

KNN in Python

posted May 5, 2020, 1:54 PM by Chris G   [ updated May 5, 2020, 1:55 PM ]

In this article you will learn about a very simple yet powerful algorithm called KNN, or K-Nearest Neighbor. The first sections contain a detailed yet clear explanation of this algorithm. At the end of the article you can find an example using KNN (implemented in Python).

KNN Explained

KNN is a very popular algorithm; it is one of the top 10 AI algorithms (see Top 10 AI Algorithms). Its popularity springs from the fact that it is very easy to understand and interpret, yet its accuracy is often comparable to or even better than that of other, more complicated algorithms.

KNN is a supervised algorithm (which means that the training data is labeled; see Supervised and Unsupervised Algorithms). It is also non-parametric and lazy (instance-based).

Why is it lazy? Because it does not explicitly learn a model; instead, it saves all the training data and uses the whole training set for classification or prediction. This is in contrast to other techniques like SVM, where you can discard all non-support vectors without any problem.

This means that the training process is very fast: it just saves all the values from the data set. The real problems are the huge memory consumption (because we have to store all the data) and the time complexity at testing time (since classifying a given observation requires a pass over the whole data set). But in general it’s a very useful algorithm for small data sets (or if you have lots of time and memory) or for educational purposes.
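The lazy behavior described above can be seen in a minimal sketch in plain Python. The `KNNClassifier` class and its `fit`/`predict` methods are hypothetical names chosen for this illustration, and the sketch assumes Euclidean distance and a simple majority vote.

```python
import math
from collections import Counter


class KNNClassifier:
    """Brute-force k-nearest-neighbor classifier (illustrative only)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" only stores the data: fast, but memory grows with the set.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Classifying scans the whole training set, which is the slow part.
        distances = sorted(
            (math.dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        )
        nearest_labels = [label for _, label in distances[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


model = KNNClassifier(k=3).fit(
    [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict((1.5, 1.5)), model.predict((8.5, 8.5)))
```

Note that `fit` does no computation at all, while `predict` computes one distance per stored point; this is exactly the trade-off between training time and testing time mentioned above.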

Another important assumption is that the data lies in a metric space. This means that we can define a metric for calculating the distance between data points. Defining a distance metric can be a real challenge (see Nearest Neighbor Classification and Retrieval). An interesting idea is to learn the distance metric using machine learning, mainly by converting the data to a vector space, representing the differences between objects as distances between vectors, and learning those differences. That is another topic, which we will talk about later.
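For numeric data, a few common metrics are easy to write down. The functions below are a small sketch, not a recommendation of which metric to use; the choice depends on the data, and the function names are just illustrative.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference along any single axis."""
    return max(abs(x - y) for x, y in zip(a, b))


p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))
```

The same pair of points can be "closer" or "farther" depending on the metric, which is why the choice of metric affects which neighbors KNN considers nearest.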

Auto PY to EXE

posted Apr 24, 2020, 3:00 PM by Chris G   [ updated Apr 24, 2020, 3:00 PM ]

A .py to .exe converter using a simple graphical interface built using Eel and PyInstaller in Python.

Empty interface

Grid Studio - combines the advantages of spreadsheets and Python

posted Apr 17, 2020, 11:40 AM by Chris G   [ updated Apr 17, 2020, 11:41 AM ]

Say Goodbye to Excel? A Simple Evaluation of Python Grid Studio Using COVID-19 Data

An alternative tool for data analysts and data scientists with Python programming skills.

Recently, I found an excellent open-source project, “Grid Studio”. This library combines the advantages of spreadsheets and Python for data analytics.

Have you been thinking that

  • When you use MS Excel, you want to use your Python skills and libraries such as Numpy, Pandas, SciPy, Matplotlib and Scikit-learn to generate and manipulate data.
  • When you use Python, you want a tabular view of the data to see the current dataset in real time, but all you can do is output df.head() manually.

OK, this library can satisfy all your requirements.

Before anything else, let’s have a look at what it looks like. Grid Studio is a Web-based application. Here is the Web UI.

PyCaret - An open source low-code machine learning library

posted Apr 13, 2020, 9:20 AM by Chris G   [ updated Apr 13, 2020, 9:21 AM ]

Why PyCaret

PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within seconds in your choice of notebook environment.

Build a COVID-19 Dashboard in Python with Plotly Dash

posted Apr 13, 2020, 9:05 AM by Chris G   [ updated Apr 13, 2020, 9:13 AM ]

Track Coronavirus cases with your own analytics dashboard

openfaas - Serverless Functions, Made Simple.

posted Aug 23, 2019, 10:00 AM by Chris G   [ updated Aug 23, 2019, 10:00 AM ]

Serverless Functions, Made Simple.

OpenFaaS® makes it simple to turn anything into a serverless function that runs on Linux or Windows through Docker Swarm or Kubernetes.

How to Code a Neural Network with Backpropagation from scratch In Python

posted Aug 20, 2019, 7:56 AM by Chris G   [ updated Aug 20, 2019, 7:57 AM ]

The backpropagation algorithm is used in the classical feed-forward artificial neural network.

It is the technique still used to train large deep learning networks.

In this tutorial, you will discover how to implement the backpropagation algorithm for a neural network from scratch with Python.

After completing this tutorial, you will know:

  • How to forward-propagate an input to calculate an output.
  • How to back-propagate error and train a network.
  • How to apply the backpropagation algorithm to a real-world predictive modeling problem.
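As a rough preview of the first two points, here is a minimal sketch of forward propagation and backpropagation for a network with one hidden layer. It is not the tutorial's code: the function names, the sigmoid activation, the squared-error loss, the learning rate, and the toy OR data set are all assumptions made for this illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden, seed=0):
    """Each neuron is a list of weights with its bias stored last."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, inputs):
    """Forward-propagate an input; return hidden activations and the output."""
    hidden, output = network
    hidden_out = [
        sigmoid(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])
        for neuron in hidden
    ]
    out = sigmoid(sum(w * h for w, h in zip(output, hidden_out)) + output[-1])
    return hidden_out, out


def train_step(network, inputs, target, lr=0.5):
    """Back-propagate the squared error for one sample and update weights."""
    hidden, output = network
    hidden_out, out = forward(network, inputs)
    # Error signal at the output neuron (derivative of the sigmoid included).
    delta_out = (out - target) * out * (1.0 - out)
    # Error signals at the hidden neurons, computed before any weight changes.
    delta_hidden = [
        delta_out * output[j] * h * (1.0 - h) for j, h in enumerate(hidden_out)
    ]
    for j, h in enumerate(hidden_out):
        output[j] -= lr * delta_out * h
    output[-1] -= lr * delta_out
    for neuron, delta in zip(hidden, delta_hidden):
        for i, x in enumerate(inputs):
            neuron[i] -= lr * delta * x
        neuron[-1] -= lr * delta


def mean_squared_error(network, dataset):
    return sum((forward(network, x)[1] - t) ** 2 for x, t in dataset) / len(dataset)


# Toy data set: the logical OR function.
dataset = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
network = init_network(n_inputs=2, n_hidden=2)
error_before = mean_squared_error(network, dataset)
for _ in range(2000):
    for inputs, target in dataset:
        train_step(network, inputs, target)
print(error_before, mean_squared_error(network, dataset))
```

The error printed after training should be lower than the error before it; a real-world problem would need a proper data set, input normalization, and an evaluation on held-out data.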

Python Face Recognition

posted Jul 1, 2019, 1:12 PM by Chris G   [ updated Jul 1, 2019, 1:13 PM ]

This is by far the easiest face recognition (yes, recognition, not just detection!) library available for Python:

It is relatively easy to add your own photos to recognize, and this can even run on a Raspberry Pi Zero; see this article:

Face Recognition Raspberry Pi Zero Party Greeter
