Featured Posts

A method for JIT'ing algorithms and data structures with LLVMA method for JIT'ing algorithms and data structures... Hello folks, I always post about Python and EvoComp (Pyevolve), but this time it's about C, LLVM, search algorithms and data structures. This post describes the efforts to implement an idea: to JIT...

Readmore

Send Subversion commit messages to TwitterSend Subversion commit messages to Twitter Hello, this is a bridge between Subversion (svn) and Twitter, the intent of this tool is to update a Twitter account when new commit messages arrives in a Subversion repository. We almost never have access...

Readmore

An analysis of Benford's law applied to TwitterAn analysis of Benford's law applied to Twitter Benford's law is one of those very weird things that we can't explain, and when we discovery more and more events which obeys the law, we became astonished. Two men (Simon Newcomb - 1881 and Frank Benford...

Readmore

TSP on Nokia N73 using PyS60 and PyevolveTSP on Nokia N73 using PyS60 and Pyevolve As I promised on the post about Pyevolve on N73, I've ported the TSP problem to run on the cellphone. The script will draw the best individual of every generation on the screen using the PyS60 Canvas API....

Readmore

Python: acessing near real-time MODIS images and fire data from NASA's Aqua and Terra satellitesPython: acessing near real-time MODIS images and fire... From Modis Rapid Response System site: The MODIS Rapid Response System was developed to provide daily satellite images of the Earth's landmasses in near real time. True-color, photo-like imagery and...

Readmore

  • Prev
  • Next

A method for JIT’ing algorithms and data structures with LLVM

Posted on : 22-11-2009 | By : Perone | In : LLVM, c

2

llvm_dragon

Hello folks, I always post about Python and EvoComp (Pyevolve), but this time it’s about C, LLVM, search algorithms and data structures. This post describes the efforts to implement an idea: to JIT (verb) algorithms and the data structures used by them, together.

AVL Tree Intro

Here is a short intro to AVL Trees from Wikipedia:

In computer science, an AVL tree is a self-balancing binary search tree, and it is the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

The problem and the idea

When we have a data structure and algorithms to handle (insert, remove and lookup) that structure, the native code of our algorithm is usually full of overhead; for example, in an AVL Tree (Balanced Binary Tree), the overhead appear in: checking if we really have a left or right node while traversing the nodes for lookups, accessing nodes inside nodes, etc. This overhead creates unnecessary assembly operations which in turn, creates native code overhead, even when the compiler optimize it. This overhead directly impacts on the performance of our algorithm (this traditional approach, of course, give us a very flexible structure and the complexity (not Big-O) is easy to handle, but we pay for it: performance loss).

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Pyevolve on SIGEVOlution

Posted on : 09-11-2009 | By : Perone | In : News, Pyevolve, Python, Science

3

SIGEVOlution200901WebCover

I’m proud to announce that Pyevolve is featuring on the last issue of SIGEVOlution (Volume 4, Issue 1), a newsletter from the ACM Special Interest Group on Evolutionary Computation. I would like to thank the newsletter editor Pier Luca Lanzi and the board for the corrections in the article and for the well done reformatted version of the paper.

Pyevolve is currently in version 0.5, in a few months I’ll be releasing the new 0.6 release with the new major features that are currently implemented in the development version only (you can check it at the subversion repository in sourceforge.net).

I hope you enjoy the article !

Yours,
- Christian S. Perone

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Pyevolve benchmark on different Python flavors

Posted on : 19-10-2009 | By : Perone | In : Pyevolve, Python

14

So I did a comparative of Pyevolve GP/GA core in different Python interpreters. I’ve used my Pentium Core 2 Duo (E4500 @ 2.20GHz, 1GB RAM), using Ubuntu 9.04 and Windows XP SP3 just for IronPython 2.6.1 (IronPython doesn’t run with Mono, so I used the win xp with .net 2.0).

The interpreters used were:

Unladen Swallow 2009Q2

I tried using 2009Q3 (the currently main trunk), but I think it’s unstable yet, cause it was more slow than 2009Q2, so I used 2009Q2; I compiled it with GCC 4.3.3 just using the default configure parameters (./configure).

CPython 2.6.2

I used the default CPython package of Ubuntu 9.04.

CPython 2.5.4

I used the default CPython package of Ubuntu 9.04 too, the python2.5 package.

PyPy 1.1.0 (svn:r68612)

I used the last svn version of the repository, the release 68612. My Pentium Core 2 Duo had only 1GB of RAM, and the PyPy translation process eats more RAM than Java (sorry for the joke), so I used a notebook with 3GB of RAM to create the pypy-c, what took 1 hour (I used –opt=3) and a beautiful ascii Mandelbrot fractal !

Jython 2.5.1

I used the default installer from the Jython project site. I used the Sun JRE 1.6.0_16.

IronPython 2.6.10920.0

I’ve used the 2.6 RC1 available at IronPython project site with MS .NET 2.0.

To test the GA core I’ve used this source-code (a simple sphere function):

?View Code PYTHON
from pyevolve import G1DList
from pyevolve import Mutators, Initializators
from pyevolve import GSimpleGA, Consts
 
# This is the Sphere Function
def sphere(xlist):
   total = 0
   for i in xlist:
      total += i**2
   return total
 
def run_main():
   genome = G1DList.G1DList(140)
   genome.setParams(rangemin=-5.12, rangemax=5.13)
   genome.initializator.set(Initializators.G1DListInitializatorReal)
   genome.mutator.set(Mutators.G1DListMutatorRealGaussian)
   genome.evaluator.set(sphere)
 
   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(1500)
   ga.setMutationRate(0.01)
   ga.evolve(freq_stats=500)
 
   best = ga.bestIndividual()
 
if __name__ == "__main__":
   run_main()

And to test the GP core, I’ve used this source-code (a simple symbolic regression):

?View Code PYTHON
from pyevolve import GTree
from pyevolve import Mutators
from pyevolve import GSimpleGA, Consts, Util
import math
 
rmse_accum = Util.ErrorAccumulator()
 
def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))
 
def eval_func(chromosome):
   global rmse_accum
   rmse_accum.reset()
   code_comp = chromosome.getCompiledCode()
 
   for a in xrange(0, 10):
      for b in xrange(0, 10):
         evaluated     = eval(code_comp)
         target        = math.sqrt((a*a)+(b*b))
         rmse_accum   += (target, evaluated)
   return rmse_accum.getRMSE()
 
def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=4, method="ramped")
   genome.evaluator += eval_func
   genome.mutator.set(Mutators.GTreeGPMutatorSubtree)
 
   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setParams(gp_terminals       = ['a', 'b'],
                gp_function_prefix = "gp")
 
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(40)
   ga.setCrossoverRate(1.0)
   ga.setMutationRate(0.08)
   ga.setPopulationSize(800)
 
   ga(freq_stats=10)
   best = ga.bestIndividual()
 
if __name__ == "__main__":
   main_run()

UPDATE 19/08: the x-axis is measured in “seconds“, and the y-axis is the python flavor;

The results are are described in the graph below:

pyevolve_pyvmsAs we can see, Unladen Swallow 2009Q2 did a little better performance than CPython 2.6.2, but Jython and PyPy (experimental) were left behind in that scenario, even behind IronPython 2.6.1.

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Successful pyevolve multiprocessing speedup for Genetic Programming

Posted on : 11-10-2009 | By : Perone | In : Pyevolve, Python, genetic programming

0

As we know, Genetic Programming usually requires intensive processing power for the fitness functions and tree manipulations (in crossover operations), and this fact can be a huge problem when using a pure Python approach like Pyevolve. So, to overcome this situation, I’ve used the Python multiprocessing features to implement a parallel fitness evaluation approach in Pyevolve and I was surprised by the super linear speedup I got for a cpu bound fitness function used to do the symbolic regression of the Pythagoras theorem: c = \sqrt{a^2 + b^2}. I’ve used the same seed for the GP, so it has consumed nearly the same cpu resources for both test categories. Here are the results I obtained:

pyevolve_multiprocessing

The first fitness landscape I’ve used had 2.500 points and the later had a fitness landscape of 6.400 points, here is the source code I’ve used (you just need to turn on the multiprocessing option using the setMultiProcessing method, so Pyevolve will use multiprocessing when you have more than one single core, you can enable the logging feature to check what’s going on behind the scenes):

?View Code PYTHON
from pyevolve import *
import math
 
rmse_accum = Util.ErrorAccumulator()
 
def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))
 
def eval_func(chromosome):
   global rmse_accum
   rmse_accum.reset()
   code_comp = chromosome.getCompiledCode()
 
   for a in xrange(0, 80):
      for b in xrange(0, 80):
         evaluated     = eval(code_comp)
         target        = math.sqrt((a*a)+(b*b))
         rmse_accum   += (target, evaluated)
   return rmse_accum.getRMSE()
 
def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=4, method="ramped")
   genome.evaluator += eval_func
   genome.mutator.set(Mutators.GTreeGPMutatorSubtree)
 
   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setParams(gp_terminals       = ['a', 'b'],
                gp_function_prefix = "gp")
 
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(20)
   ga.setCrossoverRate(1.0)
   ga.setMutationRate(0.08)
   ga.setPopulationSize(800)
   ga.setMultiProcessing(True)
 
   ga(freq_stats=5)
   best = ga.bestIndividual()
 
if __name__ == "__main__":
   main_run()

As you can see, the population size was 800 individuals with a 8% mutation rate and a 100% crossover rate for a simple 20 generations evolution. Of course you don’t need so many points in the fitness landscape, I’ve used 2.500+ points to create a cpu intensive fitness function, otherwise, the speedup can be less than 1.0 due the communication overhead between the processes. For the first case (2.500 points fitness landscape) I’ve got a 3.33x speedup and for the last case (6.400 points fitness landscape) I’ve got a 3.28x speedup. The tests were executed in a 2 cores pc (Intel Core 2 Duo).

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Meanwhile, at the Hall of Justice!

Posted on : 05-10-2009 | By : Perone | In : Genetic Algorithms, News, Science

0

UPDATE 05/10: there is an article in the Physorg too.

Sometimes we face new applications for EC, but for this I was not expecting, from Eurekalert:

WASHINGTON, Oct. 5 — Criminals are having a harder time hiding their faces, thanks to new software that helps witnesses recreate and recognize suspects using principles borrowed from the fields of optics and genetics.

(…)

His software generates its own faces that progressively evolve to match the witness’ memories. The witness starts with a general description such as “I remember a young white male with dark hair.” Nine different computer-generated faces that roughly fit the description are generated, and the witness identifies the best and worst matches. The software uses the best fit as a template to automatically generate nine new faces with slightly tweaked features, based on what it learned from the rejected faces.

“Over a number of generations, the computer can learn what face you’re looking for,” says Solomon.

Read the full article here.

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Beautiful Django

Posted on : 01-10-2009 | By : Perone | In : Python, Time Waste

13

The ugly web is over; the trick is to add a Django middleware to process every HttpResponse (with content-type text/html) of Django using BeautifulSoup. The source-code of the middleware is simple:

?View Code PYTHON
from BeautifulSoup import BeautifulSoup
 
class BeautifulMiddleware(object):
    def process_response(self, request, response):
        if response.status_code == 200:
            if response["content-type"].startswith("text/html"):
                beauty = BeautifulSoup(response.content)
                response.content = beauty.prettify()
        return response

We simple check for HTTP response code 200 and then check for a “text/html” content and use BeautifulSoup to process the response. See an example of what it does:

1) I’d a html in my Django application, very ugly and with missing tags:

imagem

This HTML template will be rendered as showed above by Django without the BeautifulSoup middleware, but with the middleware pluged in the settings of your Django app, it will render that html source:

imagem2

BeautifulSoup has figured out sensible places to put the closing tags of the HTML source and has created a pretty indented structure, automagically =)

It’s very easy and interesting create new django middlewares, examples can be JavaScript obfuscators, compressors, automatic performance analysis of html code to improve the render speed of browser and these sort of things.

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

On the irreversibility of evolution

Posted on : 27-09-2009 | By : Perone | In : News, Science

0

evo_comic

Today I’ve read about an important work done by a team of evolutionary biologists of the University of Oregon, which reveals an important result about the evolutionary irreversibility. The concept of irreversibility states that the future results of evolution at any point in time must depend on the present state and by the past, showing the determinism of evolution; on the other hand, the evolution reversibility dictates that the natural selection can produce the same forms in any given environment, independent of history.

This question about the irreversibility of evolution has remained unsolved because of the fact that we rarely know what features the ancestors had and what the mechanisms was used to evolve into the actual organisms, but the team of Joe Thornton has solved those issues by studying the problem at the molecular level, resurrecting ancestral proteins (GR1) as they existed long ago and using manipulation to study evolutionary process in two directions: forward and reverse.

The results of the work done by the research team was:

Our observations suggest that history and contingency during glucocorticoid receptor evolution strongly limited the pathways that could be deterministically followed under selection.

(…)

Selection is an extraordinarily powerful evolutionary force; nevertheless, our observations suggest that, because of the complexity of glucocorticoid receptor architecture, low-probability permissive substitutions were required to open some mutational trajectories to exploration under selection, whereas restrictive substitutions closed other potential paths. Under selection, some kind of adaptation will always occur, but the specific adaptive forms that are realized depend on the historical trajectory that precedes them. The conditions that once facilitated evolution of the glucocorticoid receptor’s ancestors were destroyed during the realization of its present form. The past is difficult to recover because it was built on the foundation of its own history, one irrevocably different from that of the present and its many possible futures.

So my friend, that’s the way nature evolve, possible never looking back. But this is a great new step for future works and research on the irreversibility of evolution.

References

[1] http://www.nature.com/nature/journal/v461/n7263/abs/nature08249.html
[2] http://www.uoregon.edu/~joet/PDF/bridgham-thornton-nature2009.pdf
[3] http://sciencenow.sciencemag.org/cgi/content/full/2009/923/1

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

New SIGEvolution issue – Volume 3 Issue 4!

Posted on : 22-09-2009 | By : Perone | In : Genetic Algorithms, News, Science, genetic programming

0

Great news, the new issue of SIGEvolution was released, click in the cover above to go to the SIGEvolution site; this issue has an interview is with Hans-Paul Schwefel.

sig_cover

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

n-queens problem using Pyevolve

Posted on : 19-09-2009 | By : Perone | In : Genetic Algorithms, Pyevolve, Python

4

Last night I’ve read a post on Reddit written by Matthew Rollings showing a code in Python to solve Eight Queens puzzle using EA. So I decided to implement it in Python again but this time using Pyevolve, here is the code:

?View Code PYTHON
from pyevolve import *
from random import shuffle
 
BOARD_SIZE = 64
 
def queens_eval(genome):
   collisions = 0
   for i in xrange(0, BOARD_SIZE):
      if i not in genome: return 0
   for i in xrange(0, BOARD_SIZE):
      col = False
      for j in xrange(0, BOARD_SIZE):
         if (i != j) and (abs(i-j) == abs(genome[j]-genome[i])):
            col = True
      if col == True: collisions +=1
   return BOARD_SIZE-collisions
 
def queens_init(genome, **args):
   genome.genomeList = range(0, BOARD_SIZE)
   shuffle(genome.genomeList)
 
def run_main():
   genome = G1DList.G1DList(BOARD_SIZE)
   genome.setParams(bestrawscore=BOARD_SIZE, rounddecimal=2)
   genome.initializator.set(queens_init)
   genome.mutator.set(Mutators.G1DListMutatorSwap)
   genome.crossover.set(Crossovers.G1DListCrossoverCutCrossfill)
   genome.evaluator.set(queens_eval)
 
   ga = GSimpleGA.GSimpleGA(genome)
   ga.terminationCriteria.set(GSimpleGA.RawScoreCriteria)
   ga.setMinimax(Consts.minimaxType["maximize"])
 
   ga.setPopulationSize(100)
   ga.setGenerations(5000)
   ga.setMutationRate(0.02)
   ga.setCrossoverRate(1.0)
 
   # This DBAdapter is to create graphs later, it'll store statistics in
   # a SQLite db file
   sqlite_adapter = DBAdapters.DBSQLite(identify="queens")
   ga.setDBAdapter(sqlite_adapter)
 
   ga.evolve(freq_stats=10)
 
   best = ga.bestIndividual()
   print best
   print "\nBest individual score: %.2f\n" % (best.score,)
 
if __name__ == "__main__":
   run_main()

It tooks 49 generations to solve a 64×64 (4.096 chess squares) chessboard, here is the output:

Gen. 0 (0.00%): Max/Min/Avg Fitness(Raw) [20.83(27.00)/13.63(7.00)/17.36(17.36)]
Gen. 10 (0.20%): Max/Min/Avg Fitness(Raw) [55.10(50.00)/39.35(43.00)/45.92(45.92)]
Gen. 20 (0.40%): Max/Min/Avg Fitness(Raw) [52.51(55.00)/28.37(24.00)/43.76(43.76)]
Gen. 30 (0.60%): Max/Min/Avg Fitness(Raw) [67.45(62.00)/51.92(54.00)/56.21(56.21)]
Gen. 40 (0.80%): Max/Min/Avg Fitness(Raw) [65.50(62.00)/19.89(31.00)/54.58(54.58)]

        Evolution stopped by Termination Criteria function !

Gen. 49 (0.98%): Max/Min/Avg Fitness(Raw) [69.67(64.00)/54.03(56.00)/58.06(58.06)]
Total time elapsed: 39.141 seconds.

And here is the plots generated by the Graph Plot Tool of Pyevolve:

fig1

fig3

fig5

fig8

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Word is smart, but OpenOffice is wise

Posted on : 18-09-2009 | By : Perone | In : Python, Time Waste

7

UPDATE 19/09: it seems that some people had misunderstood the post title, so here is a clarification: I’m not comparing Word with OpenOffice or something like that, the title refers to the design choices of OpenOffice in using Python 2.6 as an option for scripting language, it’s a humorous title and should not be considered in literal sense.

This is a very simple, but powerful, Python script to call Google Sets API (in fact it’s not an API call – since Google doesn’t have an official API for Sets service – but an interesting and well done scraper using BeautifulSoup) inside the OpenOffice 3.1.1 Writer… anyway, you can check the video to understand what it really does:

And here is the very complex source-code:

?View Code PYTHON
from xgoogle.googlesets import GoogleSets
 
def growMyLines():
   """ Calls Google Set unofficial API (xgoogle) """
   doc = XSCRIPTCONTEXT.getDocument()
   controller = doc.getCurrentController()
   selection = controller.getSelection()
   count = selection.getCount();
   text_range = selection.getByIndex(0);
   lines_list = text_range.getString().split("\n");
   gset = GoogleSets(lines_list)
   gset_results = gset.get_results()
   results_concat = "\n".join(gset_results)
   text_range.setString(results_concat);
 
g_exportedScripts = growMyLines,

You need to put the “xgoogle” module inside the “OpenOffice.org 3\Basis\program\python-core-2.6.1\lib” path, and the above script inside “OpenOffice.org 3\Basis\share\Scripts\python”.

I hope you enjoyed =) with new Python 2.6 core in OpenOffice 3, they have increased the productivity potential at the limit.

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Send Subversion commit messages to Twitter

Posted on : 12-08-2009 | By : Perone | In : Python

2

Hello, this is a bridge between Subversion (svn) and Twitter, the intent of this tool is to update a Twitter account when new commit messages arrives in a Subversion repository. We almost never have access to svn repository to add a post-commit hook in a way to call our script and send updates to twitter, so this tool was done to overcome that situation. Using it, you can monitor for example a svn repository from Google Hosting, from Sourceforge.net, etc…

The process of the tool is simple: it will firstly check Twitter account for the last svn commit message (the messages always start with a “$” prefix, or with another user-defined prefix), and then it will check the svn repository server to verify if it has new commits compared to the last tweet revision, if it has, it will update twitter with newer commit messages.

Here is a simple example of a Twitter account updated with commit messages:

twittsvn

The tool is very simple to use in command-line:

Python twittsvn v.0.1
By Christian S. Perone
http://pyevolve.sourceforge.net/wordpress

Usage: twittsvn.py [options]

Options:
  -h, --help            show this help message and exit

  Twitter Options:
    Twitter Accounting Options

    -u TWITTER_USERNAME, --username=TWITTER_USERNAME
                        Twitter username (required).
    -p TWITTER_PASSWORD, --password=TWITTER_PASSWORD
                        Twitter password (required).
    -r TWITTER_REPONAME, --reponame=TWITTER_REPONAME
                        Repository name (required).
    -c TWITTER_COUNT, --twittercount=TWITTER_COUNT
                        How many tweets to fetch from Twitter, default is
                        '30'.

  Subversion Options:
    Subversion Options

    -s SVN_PATH, --spath=SVN_PATH
                        Subversion path, default is '.'.
    -n SVN_NUMLOG, --numlog=SVN_NUMLOG
                        Number of SVN logs to get, default is '5'.

And here is a simple example:

# python twittsvn.py -u twitter_username -p twitter_password \
-r any_repository_name

You must execute it in a repository directory (not the server, your local files) or use the “-s” option to specify the path of your local repository. You should put this script to execute periodicaly using cron or something like that.

To use the tool, you must install pysvn and python-twitter. To install Python-twitter you can use “easy_install python-twitter”, but for pysvn you must check the download section at the project site.

Here is the source-code of the twittsvn.py:

?Download twittsvn.py
# python-twittsvn - A SVN/Twitter bridge
# Copyright (C) 2009  Christian S. Perone
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see .
 
import sys
import re
import time
from optparse import OptionParser, OptionGroup
 
__version__ = "0.1"
__author__  = "Christian S. Perone "
 
TWITTER_PREFIX = "$"
TWITTER_MSG    = TWITTER_PREFIX + "[%s] Rev. %d by [%s] %s: %s"
 
try:
   import pysvn
except ImportError:
   raise ImportError, "pysvn not found, check http://pysvn.tigris.org/project_downloads.html"
 
try:
   import twitter
except ImportError:
   raise ImportError, "python-twitter not found, check http://code.google.com/p/python-twitter"
 
def ssl_server_trust_prompt(trust_dict):
    return True, 5, True
 
def run_main():
   parser = OptionParser()
   print "Python twittsvn v.%s\nBy %s" % (__version__, __author__)
   print "http://pyevolve.sourceforge.net/wordpress\n"
 
   group_twitter = OptionGroup(parser, "Twitter Options",
                       "Twitter Accounting Options")
   group_twitter.add_option("-u", "--username", dest="twitter_username",
                            help="Twitter username (required).",
                            type="string")
   group_twitter.add_option("-p", "--password", dest="twitter_password",
                            help="Twitter password (required).",
                            type="string")
   group_twitter.add_option("-r", "--reponame", dest="twitter_reponame",
                            help="Repository name (required).",
                            type="string")
   group_twitter.add_option("-c", "--twittercount", dest="twitter_count",
                            help="How many tweets to fetch from Twitter, default is '30'.",
                            type="int", default=30)
   parser.add_option_group(group_twitter)
 
   group_svn = OptionGroup(parser, "Subversion Options",
                       "Subversion Options")
   group_svn.add_option("-s", "--spath", dest="svn_path",
                        help="Subversion path, default is '.'.",
                        default=".", type="string")
   group_svn.add_option("-n", "--numlog", dest="svn_numlog",
                        help="Number of SVN logs to get, default is '5'.",
                        default=5, type="int")
   parser.add_option_group(group_svn)
 
   (options, args) = parser.parse_args()   
 
   if options.twitter_username is None:
      parser.print_help()
      print "\nError: you must specify a Twitter username !"
      return
 
   if options.twitter_password is None:
      parser.print_help()
      print "\nError: you must specify a Twitter password !"
      return
 
   if options.twitter_reponame is None:
      parser.print_help()
      print "\nError: you must specify any repository name !"
      return
 
   twitter_api = twitter.Api(username=options.twitter_username,
                             password=options.twitter_password)
   twitter_api.SetCache(None) # Dammit cache !
   svn_api     = pysvn.Client()
 
   svn_api.callback_ssl_server_trust_prompt = ssl_server_trust_prompt
 
   print "Checking Twitter synchronization..."
   status_list = twitter_api.GetUserTimeline(options.twitter_username,
                                             count=options.twitter_count)
   print "Got %d statuses to check..." % len(status_list)
 
   last_twitter_commit = None
   for status in status_list:
      if status.text.startswith(TWITTER_PREFIX):
         print "SVN Commit messages found !"
         last_twitter_commit = status
         break     
 
   print "Checking SVN logs for ['%s']..." % options.svn_path
   log_list = svn_api.log(options.svn_path, limit=options.svn_numlog)
 
   if last_twitter_commit is None:
      print "No twitter SVN commit messages found, posting last %d svn commit messages..." % options.svn_numlog
      log_list.reverse()
 
      for log in log_list:
         message = log["message"].strip()
         date    = time.ctime(log["date"])
         if len(message) <= 0: message = "(no message)"
         twitter_api.PostUpdate(TWITTER_MSG % (options.twitter_reponame,
                                               log["revision"].number,
                                               log["author"], date, message))
      print "Posted %d svn commit messages to twitter !" % len(log_list)
   else:
      print "SVN commit messages found in twitter, checking last revision message...."
      msg_regex        = re.compile(r'Rev\. (\d+) by')
      last_rev_twitter = int(msg_regex.findall(last_twitter_commit.text)[0])
 
      print "Last revision detected in twitter is #%d, checking for new svn commit messages..." % last_rev_twitter
      rev_num = pysvn.Revision(pysvn.opt_revision_kind.number, last_rev_twitter+1)
 
      try:
         log_list = svn_api.log(options.svn_path, revision_end=rev_num,
                                limit=options.svn_numlog)
      except pysvn.ClientError:
         print "No more revisions found !"
         log_list = []
 
      if len(log_list) <= 0:
         print "No new SVN commit messages found !"
         print "Updated !"
         return
 
      log_list.reverse()
 
      print "Posting new messages to twitter..."
      posted_new = 0
      for log in log_list:
         message = log["message"].strip()
         date    = time.ctime(log["date"])
         if len(message) <= 0:             
            message = "(no message)"
            if log["revision"].number > last_rev_twitter:
            twitter_api.PostUpdate(TWITTER_MSG % (options.twitter_reponame,
                                                  log["revision"].number,
                                                  log["author"], date, message ))
            posted_new+=1
      print "Posted new %d messages to twitter !" % posted_new
      print "Updated!"
 
if __name__ == "__main__":
   run_main()
Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

An analysis of Benford’s law applied to Twitter

Posted on : 11-08-2009 | By : Perone | In : News, Python, Science

8

Benford’s law is one of those very weird things that we can’t explain, and when we discovery more and more events which obeys the law, we became astonished. Two men (Simon Newcomb – 1881 and Frank Benford – 1938) noted the law in the same way, while flipping pages of a logarithmic table book; they noticed that the pages in the beginning of the book were dirtier than the pages at the end.

Currently, there is no a priori criteria that says to us when a dataset will or will not obey the Benford’s Law. And it is because of this, that I’ve done an analysis on the Twitter Public Timeline.

The Twitter API to get Public Timeline is simple useless for this analysis, because in the API Docs, they say that the Public Timeline is cached for 60 seconds ! 60 seconds is an eternity, and there are a request rating of 150 request/hour. So, it doesn’t help, buuuuuut, there are an alpha testing API with pretty and very useful streams of data from the Public Timeline; there are many methods in the Twitter Streaming API, and the most interesting is the “Firehose”, which returns ALL the public statuses, but this method is only available for intere$ting people, and I’m not one of them. Buuuut, we have “Spritzer”, which returns a portion of all public statuses, since it’s only what we have available in the moment, it MUST be useful, and it’s a pretty stream of data =)

So, I’ve got the Spritzer real-time stream of data and processed each new status which arrived with a regex to find all the numbers in the status; if the status was “I have 5 dogs and 3 cats”, the numbers collected should be “[5, 3]“. All those accumulated numbers were then checked against the Benford’s Law. I’ve used Matplotlib to plot the two curves (the Benford’s Law and the Twitter statuses digits distribution) to empirically observe the correlation between them. You can note in the upper right corner of the video, the Pearson’s correlation between two distributions too.

Here is the video (I’ve seen only after the creating of the video, the color of  the curves are the inverse as seen in legend):

The video represents the capture of 15 minutes (3.160 captured statuses) of the Twitter Public Timeline. In the end of the video, you can see a Pearson’s correlation of 0.95, I think it’s a good correlation, since the Pearson’s varies from -1 to 1, so 0.95 it’s a strong correlation between the variables. It’s seems that we have found another Benford’s son =)

The little tool to handle the Twitter Spritzer stream of data and plot the correlation graph in real-time was entire written in Python, I’ll do a clean-up and post it here soon I got time. The tool has generated 1823 png images that were merged using ffmpeg.

I hope you enjoyed =)

UPDATE 11/08: the user “poobare” has cited a interesting paper about Benford’s Law on Reddit, here is the link.

More posts about Benford’s Law

Prime Numbers and the Benford’s Law

Delicious.com, checking user numbers against Benford’s Law

Benford’s Law meets Python and Apple Stock Prices

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Announce: python-gstringc 1.0

Posted on : 08-08-2009 | By : Perone | In : Python

1

Last week I’ve done the ctype wrapper of Glib GString, but the performance issues when compared with cStringIO or StringIO was very poor, due to overhead of the ctypes calls. So I’ve written a new C extension for Python, the source-code (for linux) and install packages (for win32) are available at the project site. Here is some performance comparisons:

gstring

gstring_linux

Those tests were done to test string concatenation (append) of the three modules. You can find more about the tests and performance in the project site.

I hope you enjoy =)

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Pyevolve: announce of the user group

Posted on : 04-08-2009 | By : Perone | In : Pyevolve, Python

0

image5

This post is to announce the Pyevolve user group I’ve created, since some users are requesting it and I think it’s important to exchange experiences and help users, here is the group (it’s hosted on google groups).

Cya !

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

GLib GString wrapper for Python

Posted on : 04-08-2009 | By : Perone | In : Python

1

I’ve done a little wrapper of GLib/GString functionality using Python ctypes. GString of GLib is an amazing and very stable work done by GLib team, the core used by GTK+ and many other libraries. The wrapper I’ve done works in Windows and Linux, but the performance results seems more interesting for Windows, however, the GString functionality is very interesting for any platform. The project is hosted here in Google project hosting.

Here is some examples of the wrapper:

?View Code PYTHON
>>> from GString import GString
>>> obj1 = GString("test")
>>> obj1
<GString [test]>
>>> print obj1
test
>>> obj1+obj1
<GString [testtest]>
>>> obj1+=obj1
>>> obj1
<GString [testtest]>
>>> obj1.truncate(5)
<GString [testt]>
>>> obj1.insert(2, "xx")
<GString [texxstt]>
>>> obj1.assign("test")
<GString [test]>
>>> obj1.erase(2,2)
<GString [te]>
>>> obj1.assign("12345")
>>> for i in xrange(len(obj1)):
...    print obj1[i]
...
1
2
3
4
5

I hope you enjoyed the wrapper, you can find performance comparisons in the project home site.

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Jinja2 in a Java JSP

Posted on : 29-07-2009 | By : Perone | In : Python, java

0

This is a simple trick possible using Jython; to call jinja2 template engine under JSP we instantiate the PythonInterpreter class of Jython, set some parameters in the interpreter and then call jinja2 to render a template and write to the Java “out” object.

To install Jython, just download the last stable version Jython 2.5 and then install it as “Standalone” version; the installer will create a simple jython.jar file under the installation directory, copy this package to your Java web project under the \WEB-INF\lib or where you put your web application libraries.

Later, copy the jinja2 module to the \build\classes.

Create a simple template under the WebContent\templates\template.html with contents below:

This is just a Jinja2 template test !
 
Parameter "p" = <strong>{{ request.getParameter("p") }}</strong>
 
Getting session attribute: <strong>{{ session.getAttribute("session_attribute") }}</strong>
 
Iterating over Java array:
<ul>
{% for user in users %}
	<li>{{ user }}</li>
{% endfor %}</ul>

And now create a file called index.jsp in the root of the WebContent, with the contents:

<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
    pageEncoding="ISO-8859-1"%>
<%@page import="org.python.util.PythonInterpreter" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Jinja2 Template under JSP</title>
</head>
<body>
<%
	PythonInterpreter py = new PythonInterpreter();
 
	String[] users = {"User Number One", "User Number Two"};
	session.setAttribute("session_attribute", "Testing Session");
 
	py.set("out",     out);
	py.set("request", request);
	py.set("session", session);
	py.set("users",   users);
	py.set("context_path", application.getRealPath("/") + "templates");
 
	py.exec("from jinja2 import Environment, FileSystemLoader");
	py.exec("env = Environment(loader=FileSystemLoader(context_path))");
	py.exec("template = env.get_template('template.html')");
	py.exec("out.write(template.render(locals()))");
%>
</body>
</html>

The structure will be like this:

structure

And the output will be something like this:

out

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

The most possible True Random Number Generator

Posted on : 28-07-2009 | By : Perone | In : Article, Philosophy, Python

27

We know that our physical world is deterministic (and please, don’t come with the Dr Quantum videos of YouTube, by more strange and successful that the QM laws are, they do not fully represent a solid understand, and there are still many gaps to be filled), which means that creating a TRUE random number generator it’s impossible. Some people uses atmospheric noise to create random numbers, but it’s just a hard-to-predict approach and not a TRUE random number generator, there are some people using the radioactive decay too, but it can still be just a unknow mechanism. When I say by “TRUE”, I’m talking about a number that is created by something that could break the causality law, something that have an effect without prior cause. So, I think, the secret for creating a True Random Number Generator (TRNG) instead of a Pseudo-Random Number Generator (PRNG) is to link our generator with the only thing that is supposed to break the causality law: us.

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Approximating Pi number using Genetic Programming

Posted on : 22-07-2009 | By : Perone | In : News, Pyevolve, Python, Time Waste, genetic programming

9

pi

As many (or very few in the real life haha) people know, today is the Pi Approximation Day ! So it’s time to make a contribution to celebrate this funny day =)

My contribution is to use Python and Pyevolve to approximate Pi number using Genetic Programming approach. I’ve created the functions gp_add(+), gp_sub(-), gp_div(/), gp_mul(*) and gp_sqrt (square root) to use as non-terminals of the GP. The fitness function is very simple too, it simple returns the absolute difference between the Python math.pi and the evaluated individual. I’ve used also a population size of 1k individuals with max tree depth of 8 and the random ephemeral constants as random integers. The best approximation I’ve got while running the GP for about 8 minutes (40 generations) was 3.1416185511, best for 3 digits, you can improve it and let it run for more time to get better approximations.

Here is the formulae I’ve got with the GP (click to enlarge):

tree_pi

And here is the output of the script:

Best (0): 3.1577998365
        Error: 0.0162071829
Best (10): 3.1417973679
        Error: 0.0002047143
Best (20): 3.1417973679
        Error: 0.0002047143
Best (30): 3.1417973679
        Error: 0.0002047143
Best (40): 3.1416185511
        Error: 0.0000258975

- GenomeBase
        Score:                   0.000026
        Fitness:                 15751.020831

        Params:          {'max_depth': 8, 'method': 'ramped'}

        Slot [Evaluator] (Count: 1)
        Slot [Initializator] (Count: 1)
                Name: GTreeGPInitializator - Weight: 0.50
                Doc: This initializator accepts the follow parameters:

   *max_depth*
      The max depth of the tree

   *method*
      The method, accepts "grow" or "full"

   .. versionadded:: 0.6
      The *GTreeGPInitializator* function.

        Slot [Mutator] (Count: 1)
                Name: GTreeGPMutatorSubtree - Weight: 0.50
                Doc:  The mutator of GTreeGP, Subtree Mutator

   .. versionadded:: 0.6
      The *GTreeGPMutatorSubtree* function

        Slot [Crossover] (Count: 1)
                Name: GTreeGPCrossoverSinglePoint - Weight: 0.50

- GTree
        Height:                 8
        Nodes:                  21

GTreeNodeBase [Childs=1] - [gp_sqrt]
  GTreeNodeBase [Childs=2] - [gp_div]
    GTreeNodeBase [Childs=2] - [gp_add]
      GTreeNodeBase [Childs=0] - [26]
      GTreeNodeBase [Childs=2] - [gp_div]
        GTreeNodeBase [Childs=2] - [gp_mul]
          GTreeNodeBase [Childs=2] - [gp_add]
            GTreeNodeBase [Childs=2] - [gp_sub]
              GTreeNodeBase [Childs=0] - [34]
              GTreeNodeBase [Childs=2] - [gp_sub]
                GTreeNodeBase [Childs=0] - [44]
                GTreeNodeBase [Childs=0] - [1]
            GTreeNodeBase [Childs=2] - [gp_mul]
              GTreeNodeBase [Childs=0] - [49]
              GTreeNodeBase [Childs=0] - [43]
          GTreeNodeBase [Childs=1] - [gp_sqrt]
            GTreeNodeBase [Childs=0] - [18]
        GTreeNodeBase [Childs=0] - [16]
    GTreeNodeBase [Childs=2] - [gp_add]
      GTreeNodeBase [Childs=0] - [24]
      GTreeNodeBase [Childs=0] - [35]

- GTreeGP
        Expression: gp_sqrt(gp_div(gp_add(26,
gp_div(gp_mul(gp_add(gp_sub(34,
gp_sub(44, 1)), gp_mul(49, 43)), gp_sqrt(18)),
16)), gp_add(24, 35)))

And finally, here is the source code:

?View Code PYTHON
from __future__ import division
from pyevolve import *
import math
 
def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_div(a, b): return 1 if b==0 else a/b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))
 
def eval_func(chromosome):
   code_comp = chromosome.getCompiledCode()
   ret = eval(code_comp)
   return abs(math.pi - ret)
 
def step_callback(engine):
   gen = engine.getCurrentGeneration()
   if gen % 10 == 0:
      best = engine.bestIndividual()
      best_pi = eval(best.getCompiledCode())
      print "Best (%d): %.10f" % (gen, best_pi)
      print "\tError: %.10f" % (abs(math.pi - best_pi))
 
   return False
 
def main_run():
   genome = GTree.GTreeGP()
 
   genome.setParams(max_depth=8, method="ramped")
   genome.evaluator += eval_func
 
   ga = GSimpleGA.GSimpleGA(genome)
   ga.setParams(gp_terminals       = ['ephemeral:random.randint(1, 50)'],
                gp_function_prefix = "gp")
 
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(50000)
   ga.setCrossoverRate(1.0)
   ga.setMutationRate(0.09)
   ga.setPopulationSize(1000)
   ga.stepCallback.set(step_callback)
 
   ga.evolve()
   best = ga.bestIndividual()
   best.writeDotImage("tree_pi.png")
 
   print best
 
if __name__ == "__main__":
   main_run()

If you are interested why today is the Pi Approximation day, see some resources:

Little Cartoon

Some Background History

Some Pi Approximations

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

Genetic Programming and Flex layouts

Posted on : 29-06-2009 | By : Perone | In : Pyevolve, Python, genetic programming

0

To show how Genetic Programming of Pyevolve can be flexible, I’ve done a simple example using Adobe Flex and Pyevolve, the example is just to show how to evolve some kind of Flex layouts, I’ve not implemented the fitness function, this example will just create a random Flex layout using MXML. So, here is the Pyevolve code of the example:

?View Code PYTHON
import random
from pyevolve import *
 
def gp_hbox(x, y):
   return "%s %s" % (x,y)
 
def gp_vbox(x, y):
   return "%s %s" % (x,y)
 
def gp_panel(x, y):
   return "%s %s" % (x,y)
 
def eval_func(chromosome):
   code_comp = chromosome.getCompiledCode()
 
   for a in xrange(0, 5):
      for b in xrange(0, 5):
         evaluated     = eval(code_comp)
   return random.randint(1,100)
 
def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=5, method="ramped")
   genome.evaluator += eval_func
 
   ga = GSimpleGA.GSimpleGA(genome)
 
   button     = repr("&lt;mx:Button label='Button'/&gt;")
   label      = repr("&lt;mx:Label text='Label'/&gt;")
   text_input = repr("&lt;mx:TextInput width='50'/&gt;")
 
   ga.setParams(gp_terminals       = [button, label, text_input],
                gp_function_prefix = "gp")
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.evolve(freq_stats=5)
   print ga.bestIndividual()
 
if __name__ == "__main__":
   main_run()

As you can see, I’ve created the layout tags like HBox, VBox and Panel as functions of GP and the Button, Labe, TextInput as terminals of the GP, the result is very funny, it’s just a random layout, but you can use your imagination to create some nice and interesting fitness functions.

Here is the SWF generated from a random individual of the population:

I hope you enjoyed =)

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm

FISL 10 – “Forum Internacional de Software Livre”

Posted on : 25-06-2009 | By : Perone | In : News

0

fisl10-tecposts

I’m very proud of my city today, because it is here (in Porto Alegre, RS, Brazil) that the FISL 2009 (10th) is happening ! The FISL is a conference which is attracting more than 7.000 attendees in the same time I’m writing this post. All these people (developers, companies, etc..) are here to talk and focus on “free software”. The conference is being presented with the presence of distinguished speakers like Richard Stallman (FSF), Peter Sunde (from Pirate Bay), Jon “Maddog” Hall (founder of Open Source Internation) and many other great speakers and Python people too =)  even the president of Brazil will be present tomorrow (friday) !!!

Unfortunately, almost the all sessions are in Portuguese, but this conference is a “must-go” for every developer ! See here some photos of the event.

You can find more information about FISL here:

The main site of the event

A Javalobby article about the first day

The real-time transmissions and some videos of conference

The session list of the event

Share:
  • Reddit
  • del.icio.us
  • Digg
  • Mixx
  • Slashdot
  • StumbleUpon
  • Technorati
  • Facebook
  • TwitThis
  • description
  • Live
  • NewsVine
  • Ping.fm