Benjamin Redling



I joined Prof. Udo Hahn's group in November 2006 as a system administrator.

My main task at the beginning was to accompany the implementation of the group's planned SAN infrastructure. I restructured my predecessor's attempts and reduced unnecessary complexity by communicating and cooperating with the university's main data center in Jena.

By mid-2007 I had expanded this cleaned-up groundwork into the necessary three-rack installation, with a proper power concept, legally required safety measures, and capacity planning. This preparation made a 15-node / 60-core HPC cluster possible. In 2009 I upgraded the cluster's ECC RAM from 180 GB to 480 GB myself, and I kept it running economically for nearly 10 years.

In the following years, as I gained a deeper understanding of the (former) group's workflow, I was able to improve the replicability of its experiments (e.g. by advising and guiding colleagues to use automatically acquired baseline corpora instead of ad hoc downloads). I also showed how to reduce the group's research data archive from more than 52M files to well under 3M.

Becoming, in effect, the group's data manager taught me what is really needed in the long run. Together with further promotion of scripted replicability, this allowed me to cut the maintenance bill by €12,000/year and to move the small group from a complex, expensive, and unsafe on-site fibre-channel SAN to economical off-site backup solutions.

During my second (part-time) parental leave I planned and supervised the installation of the group's Ganeti cluster (a virtualization project originally developed at Google). It provides virtualization with increased availability and VM live migration. This allowed us to avoid the market leader's bills of more than €9,000/year for the same setup (6 nodes, live migration, real-time replicated VM storage).

In 2017 I improved the run time of one of the main processing pipelines by analysing a bottleneck caused by overeager locking in a PostgreSQL-based queue implementation. This was made possible by monitoring via Check_MK, determined just-in-time learning of PostgreSQL lock types, and long-standing Java knowledge. The fix reduced the run time from more than a week to two days.

Recently I started tackling the rough edges of CUDA installations for deep learning. If you are also interested in combining SLURM/batch schedulers/clusters, distributed TensorFlow, and IPython/Jupyter, drop me a mail: we should meet and talk, whatever your level of experience.

Co-Authorship

Ulrike Krieg-Holz, Christian Schuschnig, Franz Matthies, Benjamin Redling and Udo Hahn: CodE Alltag: A German-Language E-Mail Corpus. In: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis (Eds.): Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016, pp. 2543-2550.


Recent Working Experience

  • Installation and maintenance of the group's SLURM cluster (15+ nodes)
  • Installation and troubleshooting of the group's MATLAB Distributed Computing Server (6 nodes, up to 95 cores/workers)
  • Setup and tuning of PostgreSQL installations (2TB+)
  • Basic Linux infrastructure (NFS4 file server, OpenLDAP, FAI)
  • Monitoring and analysis with Check_MK / OMD
  • Setup of Elasticsearch clusters
  • Planning and coaching of a complex installation, and ongoing maintenance, of the group's Ganeti (KVM/DRBD) cluster (currently 6 nodes)
  • Supervision of student assistants (literature/ranking, usability, web design, PHP development)
  • Supporting junior Java developers (stack trace interpretation, language/VM implementation details, Tomcat setup and troubleshooting, general problem solving, and engineering methodology such as versioning, bisecting, project documentation, etc.)
  • Configuration management and orchestration via Ansible and basic shell scripting
  • Basic Python knowledge

Contact

Benjamin Redling
System administrator
 
Email

Phone  +49 3641 9-44323
Postal Address
Fürstengraben 27
07743 Jena
Germany
Room E 011