Self-Assembled Magnetic Surface Structures
Model: 
M. Belkin^, A. Glatz*, A. Snezhko*, and I. S. Aranson* (^Department of Chemical Engineering,
Northwestern University, 2145 Sheridan Rd, Evanston, IL 60208; *Materials Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439)

Abstract: We propose a first-principles model for self-assembled
magnetic surface structures at the water-air interface reported in
earlier experiments [2-4]. The model is
based on the Navier-Stokes equation for liquids in the shallow-water
approximation, coupled to Newton's equations for interacting magnetic
particles suspended at the water-air interface. The model reproduces
most of the observed phenomenology, including spontaneous formation
of magnetic snake-like structures, generation of large-scale vortex
flows, complex ferromagnetic/antiferromagnetic ordering of the snake,
and self-propulsion of bead-snake hybrids.
The model provides valuable insights into self-organization
phenomena in a broad range of non-equilibrium magnetic and
electrostatic systems with competing interactions.
Details on the computations: (see also [1]) The computational algorithm is implemented for graphics processing units (GPUs). The dynamic equations in [1] were solved in a periodic x,y domain by a quasi-spectral method, advancing in discrete time steps with a semi-implicit algorithm. We used a domain area of 160x160 in dimensionless units (lengths are normalized by the layer height h_{0}) on a grid of 1024x1024 points, with up to 225 magnetic particles.

The code was written for massively parallel GPUs and run on an NVIDIA GTX285 GPU with a peak performance of 1 TFlop. Typically, a speedup of more than 100 times was achieved compared to a fast Intel i7 CPU. This GPU setup allowed us to harness supercomputer-like power on a single desktop computer.

The algorithms were implemented using the NVIDIA CUDA programming model, which allowed us to parallelize the code by unwrapping all mesh-point and particle loops into so-called kernel routines, in which a thread index replaces the loop variables. The linear part of equation (2) was solved in Fourier space using the CUDA fast Fourier transform toolbox routines. An important point in the implementation of the CUDA kernel routines is to avoid simultaneous write access to the data structures in use, and to avoid data transfer between the host CPU system and the graphics card.

Simulation parameters: (see also [1])
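For orientation, the grid quoted above corresponds to a spacing of 160/1024 = 0.15625 in units of h_{0}. The semi-implicit quasi-spectral time stepping described in the computational details can be illustrated with a minimal sketch. This is not the authors' CUDA code and does not use the equations of [1]. It assumes a 1D Burgers-type stand-in equation, u_t = nu u_xx - u u_x, a naive discrete Fourier transform in place of the GPU FFT routines, and hypothetical function names (dft, wavenumbers, semi_implicit_step). The linear term is treated implicitly in Fourier space and the nonlinear term explicitly in real space.

```python
import cmath
import math


def dft(values, sign):
    """Naive O(N^2) unnormalized DFT: sign=-1 is forward, sign=+1 is inverse."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]


def wavenumbers(n, length):
    """Angular wavenumbers of a periodic grid, in standard FFT ordering."""
    return [2.0 * math.pi / length * (k if k <= n // 2 else k - n) for k in range(n)]


def semi_implicit_step(u, dt, nu, length, nonlinear=True):
    """Advance the stand-in equation u_t = nu*u_xx - u*u_x by one time step.

    The diffusion term is implicit (backward Euler) in Fourier space;
    the nonlinear term is explicit (forward Euler), evaluated in real space.
    """
    n = len(u)
    ks = wavenumbers(n, length)
    u_hat = dft(u, -1)
    if nonlinear:
        # Spectral derivative; taking the real part drops the Nyquist contribution.
        du = [z.real / n for z in dft([1j * k * a for k, a in zip(ks, u_hat)], +1)]
        n_hat = dft([-a * b for a, b in zip(u, du)], -1)
    else:
        n_hat = [0.0] * n
    # Every Fourier mode is updated independently of the others. This is the
    # kind of loop that maps onto one GPU thread per mode in a kernel routine.
    new_hat = [
        (a + dt * b) / (1.0 + dt * nu * k * k) for a, b, k in zip(u_hat, n_hat, ks)
    ]
    return [z.real / n for z in dft(new_hat, +1)]


if __name__ == "__main__":
    n, length = 16, 2.0 * math.pi
    u0 = [math.sin(2.0 * math.pi * j / n) for j in range(n)]
    u1 = semi_implicit_step(u0, dt=0.1, nu=0.5, length=length, nonlinear=False)
    print("decay factor of mode k=1:", u1[4] / u0[4])
```

With the nonlinear term switched off, each mode decays by the factor 1/(1 + dt nu k^2) per step. This is what makes the implicit treatment of the linear part stable for large wavenumbers.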
Supplementary materials:

References:
[1] arxiv.org, tba
[2] A. Snezhko, I. S. Aranson, and W.K. Kwok, Phys. Rev. Lett. 96, 078701 (2006).
[3] M. Belkin, A. Snezhko, I. S. Aranson, and W.K. Kwok, Driven Magnetic Particles on a Fluid Surface: Pattern Assisted Surface Flows, Phys. Rev. Lett. 99, 158301 (2007).
[4] A. Snezhko, M. Belkin, I. S. Aranson, and W.K. Kwok, Phys. Rev. Lett. 102, 118103 (2009).