Python

Industrial-Strength Natural Language Processing in Python

posted Mar 14, 2018, 6:58 PM by Chris G   [ updated Mar 15, 2018, 4:56 AM ]


Fastest in the world

spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.

Get things done

spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. We like to think of spaCy as the Ruby on Rails of Natural Language Processing.

Deep learning

spaCy is the best way to prepare text for deep learning. It interoperates seamlessly with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's awesome AI ecosystem. With spaCy, you can easily construct linguistically sophisticated statistical models for a variety of NLP problems.




Atom - best cross-platform editor for Python

posted Oct 21, 2017, 8:03 AM by Chris G   [ updated Oct 21, 2017, 8:03 AM ]

Atom is a text editor that's modern, approachable, yet hackable to the core—a tool you can customize to do anything but also use productively without ever touching a config file.


https://atom.io/

Follow this article to configure Atom to be the best cross-platform editor for Python:


Python Tools : Penetration Testers Arsenal

posted Feb 19, 2016, 12:59 PM by Chris G   [ updated Feb 19, 2016, 12:59 PM ]

Originally posted on


Network

Scapy, Scapy3k: Send, sniff, dissect and forge network packets. Usable interactively or as a library

pypcap, Pcapy and pylibpcap: Several different Python bindings for libpcap

libdnet: Low-level networking routines, including interface lookup and Ethernet frame transmission

dpkt: Fast, simple packet creation/parsing, with definitions for the basic TCP/IP protocols

Impacket: Craft and decode network packets. Includes support for higher-level protocols such as NMB and SMB

pynids: Libnids wrapper offering sniffing, IP defragmentation, TCP stream reassembly and port scan detection

Dirtbags py-pcap: Read pcap files without libpcap

flowgrep: Grep through packet payloads using regular expressions

Knock Subdomain Scan: Enumerate subdomains on a target domain through a wordlist

SubBrute: Fast subdomain enumeration tool

Mallory: Extensible TCP/UDP man-in-the-middle proxy, supports modifying non-standard protocols on the fly

Pytbull: Flexible IDS/IPS testing framework (shipped with more than 300 tests)

Debugging and Reverse Engineering

Paimei: Reverse engineering framework, includes PyDBG, PIDA, pGRAPH

Immunity Debugger: Scriptable GUI and command line debugger

mona.py: PyCommand for Immunity Debugger that replaces and improves on pvefindaddr

IDAPython: IDA Pro plugin that integrates the Python programming language, allowing scripts to run in IDA Pro

PyEMU: Fully scriptable IA-32 emulator, useful for malware analysis

pefile: Read and work with Portable Executable (aka PE) files

pydasm: Python interface to the libdasm x86 disassembling library

PyDbgEng: Python wrapper for the Microsoft Windows Debugging Engine

uhooker: Intercept API calls inside DLLs, as well as arbitrary addresses within the executable file in memory

diStorm: Disassembler library for AMD64, licensed under the BSD license

python-ptrace: Debugger using ptrace (Linux, BSD and Darwin system call to trace processes) written in Python

vdb / vtrace: vtrace is a cross-platform process debugging API implemented in Python, and vdb is a debugger which uses it

Androguard: Reverse engineering and analysis of Android applications

Capstone: Lightweight multi-platform, multi-architecture disassembly framework with Python bindings

PyBFD: Python interface to the GNU Binary File Descriptor (BFD) library

Fuzzing

Sulley: Fuzzer development and fuzz testing framework consisting of multiple extensible components

Peach Fuzzing Platform: Extensible fuzzing framework for generation and mutation based fuzzing (v2 was written in Python)

antiparser: Fuzz testing and fault injection API

TAOF (The Art of Fuzzing): Includes ProxyFuzz, a man-in-the-middle non-deterministic network fuzzer

untidy: General purpose XML fuzzer

Powerfuzzer: Highly automated and fully customizable web fuzzer (HTTP protocol based application fuzzer)

SMUDGE: Pure Python network protocol fuzzer

Mistress: Probe file formats on the fly and protocols with malformed data, based on pre-defined patterns

Fuzzbox: Multi-codec media fuzzer

Forensic Fuzzing Tools: Generate fuzzed files, fuzzed file systems, and file systems containing fuzzed files in order to test the robustness of forensics tools and examination systems

Windows IPC Fuzzing Tools: Tools used to fuzz applications that use Windows Interprocess Communication mechanisms

WSBang: Perform automated security testing of SOAP based web services

Construct: Library for parsing and building of data structures (binary or textual). Define your data structures in a declarative manner

fuzzer.py (feliam): Simple fuzzer by Felipe Andres Manzano

Fusil: Python library used to write fuzzing programs

Web

Requests: Elegant and simple HTTP library, built for human beings

HTTPie: Human-friendly cURL-like command line HTTP client

ProxMon: Processes proxy logs and reports discovered issues

WSMap: Find web service endpoints and discovery files

Twill: Browse the Web from a command-line interface. Supports automated Web testing

Ghost.py: Webkit web client written in Python

Windmill: Web testing tool designed to let you painlessly automate and debug your web application

FunkLoad: Functional and load web tester

spynner: Programmatic web browsing module for Python with JavaScript/AJAX support

python-spidermonkey: Bridge to the Mozilla SpiderMonkey JavaScript engine; allows for the evaluation and calling of JavaScript scripts and functions

mitmproxy: SSL-Capable, intercepting HTTP proxy. Console interface allows traffic flows to be inspected and edited on the fly

pathod / pathoc: Pathological daemon/client for tormenting HTTP clients and servers

Forensics

Volatility: Extract digital artifacts from volatile memory (RAM) samples

Rekall: Memory analysis framework developed by Google

LibForensics: Library for developing digital forensics applications

TrIDLib: Identify file types from their binary signatures. Now includes Python binding

aft: Android forensic toolkit

Malware Analysis

pyew: Command line hexadecimal editor and disassembler, mainly to analyze malware

Exefilter: Filter file formats in e-mails, web pages or files. Detects many common file formats and can remove active content

pyClamAV: Add virus detection capabilities to your Python software

jsunpack-n: Generic JavaScript unpacker: emulates browser functionality to detect exploits that target browser and browser plug-in vulnerabilities

yara-python: Identify and classify malware samples

phoneyc: Pure Python honeyclient implementation

CapTipper: Analyse, explore and revive malicious HTTP traffic from a PCAP file

PDF

peepdf: Python tool to analyse and explore PDF files to find out if they can be harmful

Didier Stevens' PDF tools: Analyse, identify and create PDF files (includes PDFiD, pdf-parser, make-pdf and mPDF)

Opaf: Open PDF Analysis Framework. Converts PDF to an XML tree that can be analyzed and modified.

Origapy: Python wrapper for the Origami Ruby module which sanitizes PDF files

pyPDF2: Pure Python PDF toolkit: extract info, split, merge, crop, encrypt, decrypt...

PDFMiner: Extract text from PDF files

python-poppler-qt4: Python binding for the Poppler PDF library, including Qt4 support

Misc

InlineEgg: Toolbox of classes for writing small assembly programs in Python

Exomind: Framework for building decorated graphs and developing open-source intelligence modules and ideas, centered on social network services, search engines and instant messaging

RevHosts: Enumerate virtual hosts for a given IP address

simplejson: JSON encoder/decoder, e.g. to use Google's AJAX API

PyMangle: Command line tool and Python library used to create word lists for use with other penetration testing tools

Hachoir: View and edit a binary stream field by field

Other Useful Libraries And Tools

IPython: Enhanced interactive Python shell with many features for object introspection, system shell access, and its own special command system

Beautiful Soup: HTML parser optimized for screen-scraping

matplotlib: Make 2D plots of arrays

Mayavi: 3D Scientific data visualization and plotting

RTGraph3D: Create dynamic graphs in 3D

Twisted: Event-driven networking engine

Suds: Lightweight SOAP client for consuming Web Services

M2Crypto: Most complete OpenSSL wrapper

NetworkX: Graph library (edges, nodes)

Pandas: Library providing high-performance, easy-to-use data structures and data analysis tools

pyparsing: General parsing module

lxml: Most feature-rich and easy-to-use library for working with XML and HTML in the Python language

Whoosh: Fast, featureful full-text indexing and searching library implemented in pure Python

Pexpect: Control and automate other programs, similar to Don Libes' `Expect` system

Sikuli: Visual technology to search and automate GUIs using screenshots. Scriptable in Jython

PyQt and PySide: Python bindings for the Qt application framework and GUI library

Books

Violent Python by TJ O'Connor: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers

Grey Hat Python by Justin Seitz: Python Programming for Hackers and Reverse Engineers.

Black Hat Python by Justin Seitz: Python Programming for Hackers and Pentesters

Python Penetration Testing Essentials by Mohit: Employ the power of Python to get the best out of pentesting

Python for Secret Agents by Steven F. Lott: Analyze, encrypt, and uncover intelligence data using Python

More Stuff

SecurityTube Python Scripting Expert (SPSE) is an online course and certification offered by Vivek Ramachandran.


The Python Arsenal for Reverse Engineering is a large collection of tools related to reverse engineering.

There is a SANS paper about Python libraries helpful for forensic analysis (PDF).

For more Python libraries, please have a look at PyPI, the Python Package Index.

OpenCV with Python tutorials

posted Jan 11, 2016, 11:01 AM by Chris G   [ updated Jan 31, 2016, 2:21 PM ]


Some great video tutorials on how to use OpenCV with Python:






https://jjyap.wordpress.com/2014/05/24/installing-opencv-2-4-9-on-mac-osx-with-python-support/

Useful Python Libraries

posted Jan 6, 2016, 6:46 AM by Chris G   [ updated Jan 11, 2016, 11:02 AM ]

Featherweight function-to-Internet-callable-function server

Expose Python functions (or class methods) as a web-enabled function for others to call

Goals:

  • Data scientist focused tool to publish simple APIs
  • It is a "featherweight" server which turns your R&D code into a web-enabled function
  • Solve the "but how can we quickly plumb our new-data-sci-code into the demo environment so it shows value to the bosses?" problem without writing a "proper server" (especially if you don't know how to write a Proper Server)
  • Publishes a function using Flask with just 3 lines and little web knowledge
  • Supports scikit-learn and numpy objects (without making you think about correct JSON encoding)
  • Useful error messages are provided at run-time to help diagnose issues
  • Text arguments from an HTTP call are automatically converted to float arguments by default

It does not solve these problems:

  • It is not scalable (it isn't designed for production use)
  • It has no security
  • It does not replace Flask, Django or any other Proper Web Framework

Written for:

  • Python 3.4+
  • Flask 0.10+
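
To make the "text arguments are converted to float" goal concrete, here is a rough sketch of the same idea. It deliberately uses only the standard library's WSGI conventions instead of Flask, and it is not the featherweight library's own API: `make_app` and `add` are hypothetical names invented for this illustration.

```python
import json
from urllib.parse import parse_qs


def make_app(func):
    """Wrap func as a WSGI application (hypothetical helper, for illustration)."""
    def app(environ, start_response):
        query = parse_qs(environ.get('QUERY_STRING', ''))
        try:
            # Text arguments from the HTTP call become float arguments
            kwargs = {name: float(values[0]) for name, values in query.items()}
            body = {'success': True, 'result': func(**kwargs)}
            status = '200 OK'
        except (TypeError, ValueError) as err:
            # Report a readable error instead of failing silently
            body = {'success': False, 'error': str(err)}
            status = '400 Bad Request'
        payload = json.dumps(body).encode('utf-8')
        start_response(status, [('Content-Type', 'application/json'),
                                ('Content-Length', str(len(payload)))])
        return [payload]
    return app


def add(x, y):
    return x + y


app = make_app(add)
```

For a local demo, `app` could be served with `wsgiref.simple_server.make_server('localhost', 8000, app).serve_forever()` and called as `/?x=1&y=2.5`. Like the library described above, a sketch like this has no security and is not meant for production.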






Python REST API Framework

Eve is an open source Python REST API framework designed for human beings. It lets you effortlessly build and deploy highly customizable, fully featured RESTful Web Services.

Eve is powered by Flask, Redis, Cerberus and Events, and offers support for both MongoDB and SQL backends.

The codebase is thoroughly tested under Python 2.6, 2.7, 3.3, 3.4 and PyPy.





Use the Gensim implementation of Word2Vec

posted Nov 24, 2015, 1:26 PM by Chris G   [ updated Jan 11, 2016, 11:03 AM ]

 Some basic examples for the use of the Python Word2Vec implementation in Gensim:

 

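#!/usr/bin/env python

from gensim.models import Word2Vec

sentences = [['first', 'sentence'], ['second', 'sentence']]
# train word2vec on the two sentences
model = Word2Vec(sentences, min_count=1)

# or with different options; this needs a real corpus, because min_count=5
# would drop every word of the two toy sentences above
# (vector_size was called size before gensim 4.0)
# model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

# Persist a model to disk with:
fname = '/tmp/mymodel'
model.save(fname)

# Advanced users can load a model and continue training it with more sentences:
model = Word2Vec.load(fname)
more_sentences = [['third', 'sentence'], ['fourth', 'sentence']]  # example data
model.build_vocab(more_sentences, update=True)  # register the new words first
model.train(more_sentences, total_examples=len(more_sentences), epochs=model.epochs)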
 





A more memory-efficient way: stream the sentences from the text files in a directory, one line at a time:

 

#!/usr/bin/env python

import os

from gensim.models import Word2Vec


class MySentences(object):
    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        # yield one tokenized line at a time instead of loading everything
        for fname in os.listdir(self.dirname):
            with open(os.path.join(self.dirname, fname)) as f:
                for line in f:
                    yield line.split()


sentences = MySentences('/some/directory')  # a memory-friendly iterator
model = Word2Vec(sentences)





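Load an existing model. Pre-trained vectors such as the GoogleNews set used below come in the C word2vec format; to train your own model instead, the “text8” corpus can be downloaded from http://mattmahoney.net/dc/text8.zip .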

 

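#!/usr/bin/env python

from gensim.models import KeyedVectors, Word2Vec

# a model saved earlier with model.save():
#model = Word2Vec.load('path/to/your/model')

# vectors in the original C word2vec format; in current gensim releases
# load_word2vec_format lives on KeyedVectors rather than on Word2Vec
#model = KeyedVectors.load_word2vec_format('/tmp/vectors.txt', binary=False)  # C text format
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)  # C binary format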




Some examples of use:

 
  
print(model.similarity('france', 'spain'))

print(model.most_similar_cosmul(positive=['baghdad', 'england'], negative=['london']))

print(model.n_similarity(['sushi', 'shop'], ['japanese', 'restaurant']))

print(model.n_similarity(['restaurant', 'japanese'], ['japanese', 'restaurant']))

print(model.n_similarity(['sushi'], ['restaurant']) == model.similarity('sushi', 'restaurant'))

print(model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1))
print(model.doesnt_match("breakfast cereal dinner lunch".split()))
print(model.similarity('woman', 'man'))

print(model.most_similar(['man']))
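
To see what the similarity calls above compute, here is a small standard-library sketch. It assumes, as far as I understand gensim's behaviour, that similarity is the cosine between two word vectors and that n_similarity is the cosine between the mean vectors of two word lists. The 3-dimensional vectors are invented for illustration; real models use hundreds of dimensions.

```python
import math

# Toy vectors, made up for this example only
vectors = {
    'sushi': [0.9, 0.1, 0.3],
    'shop': [0.2, 0.8, 0.1],
    'japanese': [0.8, 0.2, 0.4],
    'restaurant': [0.3, 0.7, 0.2],
}


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(words):
    """Component-wise average of the vectors for the given words."""
    rows = [vectors[w] for w in words]
    return [sum(col) / len(rows) for col in zip(*rows)]


def similarity(w1, w2):
    return cosine(vectors[w1], vectors[w2])


def n_similarity(ws1, ws2):
    return cosine(mean_vector(ws1), mean_vector(ws2))


print(similarity('sushi', 'restaurant'))
print(n_similarity(['sushi', 'shop'], ['japanese', 'restaurant']))
```

Under this reading, the order of words inside a list does not matter, and a one-word list reduces n_similarity to plain similarity, which is what the two comparisons in the examples above check.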

This is just the beginning...


http://radimrehurek.com/gensim/models/word2vec.html


Find more pre-trained Word2Vec models at https://code.google.com/p/word2vec/


Or pull the entire Wikipedia data:



https://radimrehurek.com/gensim/wiki.html


Transfer files from a Linux system in seconds

posted Oct 8, 2015, 12:51 PM by Chris G   [ updated Nov 24, 2015, 4:48 PM ]

This is by far the easiest way to make logfiles available to developers, or to get many files from a Linux system.
 
All you need is Python. This is the simplest way to start a basic web server that serves the current directory:
 
python -m SimpleHTTPServer 8080   # Python 2
python3 -m http.server 8080       # Python 3
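
If the server has to be started from a script rather than the shell, the same thing can be sketched with the standard library's `http.server` module. This assumes Python 3.7 or later (for `ThreadingHTTPServer` and the handler's `directory` argument); `serve_directory` is a hypothetical helper name, not part of the standard library.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8080):
    """Serve `directory` over HTTP in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(('', port), handler)
    # daemon=True so the thread does not keep the interpreter alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A typical use would be `server = serve_directory('/var/log')`, followed by `server.shutdown()` once the files have been fetched. As with the one-liner, there is no authentication, so it should only run on a trusted network.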

Python Web Development: Understanding Django for Beginners

posted Sep 15, 2015, 4:10 PM by Chris G   [ updated Jan 11, 2016, 11:04 AM ]


How to implement a neural network in Python

posted Jun 12, 2015, 9:09 AM by Chris G   [ updated Jan 11, 2016, 11:05 AM ]

A great Python tutorial:


Using neural nets to recognize handwritten digits





How the backpropagation algorithm works





Brain-inspired algorithms may make for optimized computational networks






Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs







NVIDIA Get Started with Deep Learning



  • If you are a data scientist designing neural networks for image classification use the NVIDIA Deep Learning GPU Training System (DIGITS)
  • If you are a deep learning researcher or developer choose one of these widely-used open source deep learning frameworks and accelerate it with the CUDA Deep Neural Network (cuDNN) library:
    • Caffe – developed by Yangqing Jia while in the PhD program at University of California at Berkeley
    • Theano - A Python library that allows you to efficiently define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays
    • Torch - A scientific computing framework with wide support for machine learning algorithms

  • Andrew Ng's Coursera course provides a good introduction to deep learning (Coursera, YouTube)
  • Yann LeCun’s NYU Course on Deep Learning, Spring 2014 (TechTalks)
  • Geoffrey Hinton's “Neural Networks for Machine Learning” course from Oct 2012 (Coursera)
  • Rob Fergus's "Deep Learning for Computer Vision" tutorial from NIPS 2013 (slides, video)
  • Caltech's introductory deep learning course taught by Yasser Abu-Mostafa (YouTube)
  • Stanford CS224d: Deep Learning for Natural Language Processing (video, slides, tutorials)

Using Python for Natural Language Processing (NLP)

posted May 29, 2015, 8:06 AM by Chris G   [ updated Jun 1, 2015, 4:04 AM ]

Here are a few practical articles on using Python for NLP:
 
Gensim and Word2vec:

Similarity Queries
 
Modern Methods for Sentiment Analysis
 
Word2vec Tutorial
 
models.word2vec – Deep learning with word2vec


Using sklearn:

Teaching a Computer to Read
http://blog.scripted.com/staff/nlp-hacking-in-python/

