Friday, February 18, 2005

The average abstract

Now that I have finished downloading my share of 2004's hep-th papers as provided by Joanna Karczmarek, we can start to have some fun. First, we untar the abstract files and then combine them into a single file with

find .|grep txt|xargs -n1 perl -e '$/="\\\\";<>;<>;print <>;'>all_abs

This gives a nice collection of 38766 lines of more or less random high energy speak.
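In case the one-liner looks too cryptic: it sets Perl's input record separator $/ to a double backslash, throws away the first two records of each file and prints the rest. Here is a minimal sketch of the same steps as a small function. It assumes each file consists of two leading blocks (header and metadata), each terminated by a double backslash, followed by the abstract. The function name abstract_part and the sample content are made up for illustration.

```perl
use strict;
use warnings;

# Return everything after the second "\\" separator of one abstract file,
# i.e. what the one-liner prints for that file.
sub abstract_part {
    my ($text) = @_;
    # Read from the string as if it were a file (in-memory filehandle).
    open(my $fh, '<', \$text) or die "cannot read from string: $!";
    local $/ = "\\\\";      # records end at a double backslash, not at a newline
    my $skip = <$fh>;       # first record: whatever precedes the first \\
    $skip = <$fh>;          # second record: everything up to the second \\
    my @rest = <$fh>;       # all remaining records, the abstract among them
    return join '', @rest;
}

# Hypothetical file content, made up for illustration only.
my $sample = "----\n\\\\\nPaper: hep-th/0000000\nTitle: A toy title\n\\\\\n  We study a toy model.\n\\\\\n";
print abstract_part($sample);
```

Note that the closing double backslash of the abstract survives this step, so whatever processes all_abs next has to be prepared to skip it.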

With the help of a simple Markov model (one that just uses the probability that the next word is B given that the previous one was A), we can easily produce a lot more of this. For example, I got

goldbergerwise mechanism be taken to the presence of topics unification is unlikely that of a quotient the results in order d field strengths the framework which contrary to dimensional minkowski superspace formalism as well below which should be fixed where the gauge hamiltonian for the kinetic term entering this paper we study of continuity restrictions in powers of the computation of both a subset of v the role we develop the bps breaking potential and preserves n extra dimensions using the scalar quantum string theory motivated in relation between free energy density is decaying mode sector of local geometries the

or even

sigmamodel has no chiral ring write down to symmetric massless states with matter and consistency of the effects of mtheory in the help of motion of those of supersymmetric type ii orbifolds including the newly formed in kkltlike vacua we investigate the transition in the acceleration we consider the presence of a tensionful codimensionone brane is finite k coincident branes wrapping cycles on the usual finetuning between the black supertubes of the edges evolve through the presence of gravitation together with a mass with qdeformed harmonic oscillator with scalar field coupled to end of spinorbit interactions for communication here

Locally, it looks quite good, although the grammar is a bit lacking. And all this with a single page of Perl:


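# Learning: count how often each word is followed by each other word.
# (The loop header, the two counting lines and the closing braces are
# missing from the listing as posted; they are filled in here the way
# the rest of the script needs them.)
while(<>){
    @words = split /\s+/;
    foreach $word(@words){
        next if $word =~ /[{}\<\>\/\\]/; # not a real word (TeX, markup, \\ separators)
        $word = lc($word);
        next if $word =~ /html/;
        $word =~ s/[^a-z]//g;
        next unless $word;
        if($lastword){
            $occurence{$lastword}++;        # how often $lastword has a successor
            $successor{$lastword}{$word}++; # how often that successor is $word
        }
        $lastword = $word;
    }
}

@words = keys %occurence;

#foreach $word(sort @words){
#    print "Learned $word... $occurence{$word} times\n";
#    foreach $follow(keys %{$successor{$word}}){
#        print "\t $follow\n";
#    }
#}

# Generating: start at a random word and print 100 words.
$now = $words[scalar rand(@words)];

for $i(1..100){
    print "$now ";
    %follow = %{$successor{$now} || {}};
    unless(%follow){ $now = $words[rand @words]; next; } # dead end: start afresh
    # Weighted draw: successors are picked in proportion to their counts.
    $number = rand($occurence{$now});
    foreach $next(keys %follow){
        # print "$next $number";
        if(($number -= $follow{$next})<=0){
            $now = $next;
            last;
        }
    }
}
print "\n";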
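
The weighted draw at the end of the script is perhaps the least obvious part, so here is a toy version of it that can be checked by hand. This is only a sketch: the helper names learn and pick and the toy sentence are mine, not part of the script above. Also, pick takes the random number as an argument and walks through the successors in sorted order so that the result is reproducible. It compares with < rather than <= so that whole-number inputs land in the right bin. For the real-valued output of rand this makes no practical difference.

```perl
use strict;
use warnings;

# Count word pairs in a text: $successor{A}{B} is how often B follows A.
sub learn {
    my ($text) = @_;
    my (%successor, $last);
    for my $word (split ' ', lc $text) {
        $successor{$last}{$word}++ if defined $last;
        $last = $word;
    }
    return \%successor;
}

# Pick a successor for a number $r in [0, total count): walk through the
# successors in a fixed order and subtract their counts until $r is used up.
sub pick {
    my ($follow, $r) = @_;
    for my $next (sort keys %$follow) {
        return $next if ($r -= $follow->{$next}) < 0;
    }
    return;  # only reached if $r was not below the total count
}

my $successor = learn("the brane wraps the cycle and the brane decays");
my $follow = $successor->{the};   # brane follows "the" twice, cycle once

my $total = 0;
$total += $_ for values %$follow;

# A random draw: "brane" should come up about two times out of three.
print pick($follow, rand($total)), "\n";
```

With the counts brane = 2 and cycle = 1, any number in [0, 2) gives "brane" and any number in [2, 3) gives "cycle", which is exactly the ratio of the counts.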

1 comment:

Anonymous said...

Maybe the famous PhD theses by the two brothers (from the reverse Sokal hoax) were written in this way? :-)