# GPU server instructions


## Getting access 

Contact your supervisor to get login credentials to access the GPU server.

Access the server over secure shell (SSH):
* Linux or macOS, e.g.:
```
    $ ssh lut4753.pc.lut.fi
```
* Windows: see PuTTY (http://www.putty.org/)


## Available hardware

CPUs: 2x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz

GPUs:
* 2x [GTX Titan 6GB GDDR5](https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications)
* 2x [GTX 1080Ti 11 GB GDDR5X](https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080-ti/)

To check the current GPU load and available resources, use:
```
nvidia-smi
```
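For scripted checks, the same information can be read in a machine-friendly form. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH`; it relies on the `--query-gpu` and `--format=csv,noheader,nounits` options, and the helper names (`parse_gpu_status`, `query_gpu_status`) are illustrative only, not part of any server tooling.

```python
import subprocess

# Fields requested from nvidia-smi; the order must match the parsing below.
QUERY = "index,name,memory.used,memory.total,utilization.gpu"


def _to_int(value):
    """Convert a numeric field to int; nvidia-smi prints e.g. "[N/A]" when unsupported."""
    return int(value) if value.isdigit() else None


def parse_gpu_status(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, used, total, util = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
            "utilization_pct": _to_int(util),
        })
    return gpus


def query_gpu_status():
    """Run nvidia-smi and return one dict per GPU (requires an NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_status(output)
```

Calling `query_gpu_status()` on the server returns one dictionary per GPU, which can then be filtered or logged as needed.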

**[WARNING] Important notes!!!**

We have a limited amount of GPU resources, so we kindly ask you to use the server
responsibly. If your computations do not require all the available GPUs, consider
setting the `CUDA_VISIBLE_DEVICES` environment variable, e.g., to make only the
GPU with index 1 visible:
```
export CUDA_VISIBLE_DEVICES=1
```
or in Python, before importing your framework:
```python
import os

# Set this before the framework initializes CUDA (most simply, before importing it).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```
Some frameworks (e.g., TensorFlow) are greedy: by default they allocate memory on
all visible devices while effectively using only one of them. By explicitly
specifying which GPU you need, you avoid blocking other users of the server.
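One way to follow this advice automatically is to expose only the GPU that currently has the most free memory. The following is a minimal sketch, assuming `nvidia-smi` is available; the function names are hypothetical, and free memory is only a rough proxy for whether a GPU is in use. Because the server mixes GPU models, the sketch also sets `CUDA_DEVICE_ORDER=PCI_BUS_ID` so that CUDA numbers the devices the same way `nvidia-smi` does (by default CUDA orders devices fastest-first).

```python
import os
import subprocess


def pick_freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory, or None if empty.

    `csv_text` is the output of
    `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits`.
    """
    best_index, best_free = None, -1
    for line in csv_text.strip().splitlines():
        index, free_mib = (int(field) for field in line.split(","))
        if free_mib > best_free:
            best_index, best_free = index, free_mib
    return best_index


def use_freest_gpu():
    """Make only the GPU with the most free memory visible to CUDA.

    Call this before importing TensorFlow, PyTorch, or another CUDA framework.
    """
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    index = pick_freest_gpu(output)
    if index is None:
        raise RuntimeError("nvidia-smi reported no GPUs")
    # Match nvidia-smi's numbering; CUDA's default order is fastest-first.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

Note that two users running such a check at the same moment can still pick the same GPU, so it complements `nvidia-smi` rather than replacing a quick manual look.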

## Local storage and data transfer

TBD.

Currently the server has only a small SSD (OS and staff home directories) and a
large hard disk (mounted at `/media`). If you are working with large datasets
and/or your experiments produce large log or model files, store them on the hard
disk. Do not fill the SSD with anything extra.
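Before writing large files, it can help to check how much space is left on the target disk. The following is a minimal sketch using the standard library's `shutil.disk_usage`; the helper names `free_gib` and `has_room` are illustrative.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, required_gib):
    """Return True if `path` has at least `required_gib` GiB free."""
    return free_gib(path) >= required_gib
```

For example, `has_room("/media", 50)` checks whether the large disk has at least 50 GiB free before an experiment starts writing checkpoints there.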

Suppose you have two local directories with code and data, e.g.:
```
/home/user/code/
/home/user/data/
```
You can copy your files to the server with the [scp](https://linux.die.net/man/1/scp) command:
```
scp -r /home/user/code user_login@server_ip:/media/students/user/
scp -r /home/user/data user_login@server_ip:/media/students/user/
```
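The same transfer can be scripted, for instance when several directories have to be uploaded repeatedly. The following is a minimal sketch that wraps the `scp -r` call shown above; `build_scp_command` and `upload` are hypothetical helper names, and `user_login`, `server_ip`, and the target path are the same placeholders as in the commands above.

```python
import subprocess


def build_scp_command(local_dir, login, host, remote_dir):
    """Build the argument list for a recursive scp upload."""
    return ["scp", "-r", local_dir, f"{login}@{host}:{remote_dir}"]


def upload(local_dir, login, host, remote_dir):
    """Run scp and raise CalledProcessError if the copy fails."""
    subprocess.run(build_scp_command(local_dir, login, host, remote_dir), check=True)
```

For example, `upload("/home/user/data", "user_login", "server_ip", "/media/students/user/")` performs the second copy from the commands above.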

## Dependencies and software installation

We are trying to keep the server clean and to avoid version conflicts. Thus,
Docker is the preferred general way of handling dependencies (see the Docker
section). Python users may also consider using virtual environments (see the
Python users section).

### Docker

From Wikipedia:
> Docker is a computer program that performs operating-system-level virtualization, 
> also known as "containerization". It was first released in 2013 and is developed by 
> Docker, Inc.
>
> Docker is used to run software packages called "containers". Containers are 
> isolated from each other and bundle their own application, tools, libraries and 
> configuration files; they can communicate with each other through well-defined 
> channels. All containers are run by a single operating system kernel and are thus
> more lightweight than virtual machines. Containers are created from "images" that
> specify their precise contents. Images are often created by combining and modifying 
> standard images downloaded from public repositories. 

General instructions on how to use Docker can be found at https://docker-curriculum.com/

There are plenty of docker images available, e.g.:
* [Anaconda](https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139) 
* [PyTorch](https://hub.docker.com/r/pytorch/pytorch/)
* [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/CNTK-Docker-Containers)
* [Caffe](https://github.com/BVLC/caffe/tree/master/docker)
* [Caffe2](https://caffe2.ai/docs/docker-setup.html)
* [Theano](https://hub.docker.com/r/kaixhin/cuda-theano/) (may no longer be maintained)
* [TensorFlow](https://www.tensorflow.org/install/docker)

More detailed instructions about basic docker usage with examples: https://git.it.lut.fi/mvpr-common/docker-examples
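
As a rough illustration of how a container can be limited to a single GPU, the following sketch builds a `docker run` command line. It assumes Docker 19.03 or newer with the NVIDIA Container Toolkit, which provides the `--gpus` option; older setups use the `nvidia-docker` wrapper instead, so check the repository linked above for what actually applies on this server. The helper name, the `/workspace` mount point, and the example script name are assumptions.

```python
import subprocess


def build_docker_command(image, command, gpu_index, host_dir, container_dir="/workspace"):
    """Build a `docker run` argument list restricted to one GPU."""
    return [
        "docker", "run", "--rm",              # remove the container when it exits
        "--gpus", f"device={gpu_index}",      # expose only this GPU to the container
        "-v", f"{host_dir}:{container_dir}",  # mount code/data from the host
        "-w", container_dir,                  # start in the mounted directory
        image,
        *command,
    ]


def run_in_container(image, command, gpu_index, host_dir):
    """Run the command in a container and raise if it exits with an error."""
    subprocess.run(build_docker_command(image, command, gpu_index, host_dir), check=True)
```

For example, `run_in_container("pytorch/pytorch", ["python", "train.py"], 1, "/media/students/user/code")` would run a hypothetical `train.py` with only GPU 1 visible inside the container.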

### Python users
If you are using Python and do not want to use Docker, we suggest using virtual
environments created with the [virtualenv](https://www.pythonforbeginners.com/basics/how-to-use-python-virtualenv/)
or [venv](https://docs.python.org/3/library/venv.html) modules. Instructions can
be found by following the links.
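
A virtual environment can also be created programmatically with the standard library's `venv` module. The following is a minimal sketch; `create_env` is a hypothetical helper, and the environment location is up to you.

```python
import os
import venv


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in `env_dir` and return its Python path."""
    venv.create(env_dir, with_pip=with_pip)
    # On Linux and macOS the interpreter lives in bin/; on Windows in Scripts/.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)
```

Packages can then be installed into the environment by running the returned interpreter with `-m pip install <package>`, without touching the system-wide Python installation.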

### MATLAB

TBD.


## Additional questions and support

Contact: azat.garifullin@lut.fi