# GPU server instructions


## Getting access 

Contact your supervisor to get login credentials to access the GPU server.

Access using the secure shell:
* Linux, macOS, or Windows (PowerShell), e.g.:
```
    $ ssh lut4753.pc.lut.fi
```
* Windows: if you are using an older version in which ssh is not supported in PowerShell, see [PuTTY](http://www.putty.org/).


## Available hardware

CPUs: 2x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz

GPUs:
* 1x [Titan RTX 24GB GDDR6](https://www.nvidia.com/en-us/deep-learning-ai/products/titan-rtx/)
* 2x [GTX 1080Ti 11GB GDDR5X](https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080-ti/)

* GPUs moved to lut8100.pc.lut.fi: 2x [GTX Titan 6GB GDDR5](https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications)

To check the current GPU load and available resources, use:
```
nvidia-smi
```

**[WARNING] Important notes!!!**

We have a limited amount of GPU resources, so we kindly ask you to use the server
responsibly. If your computations do not require all the available GPUs, consider
setting the $CUDA_VISIBLE_DEVICES variable in your environment, e.g., to make only
the GPU with index 1 visible:
```
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1
```
or in Python (set these before importing any framework that initializes CUDA):
```python
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="1"
```

(Setting the environment variable $CUDA_DEVICE_ORDER to PCI_BUS_ID makes the GPU
IDs match those reported by the nvidia-smi command.)

Some frameworks (e.g., TensorFlow) are greedy: they allocate resources on all
the available devices while effectively using only one of them. By
explicitly specifying which GPU device you need, you do not block other users
from using the server.   
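
If you prefer to pick a GPU automatically rather than hard-coding an index, the following is a minimal sketch of one possible approach. It assumes that memory in use, as reported by `nvidia-smi --query-gpu`, is an acceptable proxy for how busy a GPU is; the function names are hypothetical and not part of any server tooling.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' CSV rows into (index, used, total) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Each row looks like "0, 10240, 24576" (values in MiB when 'nounits' is used).
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append((index, used, total))
    return gpus


def least_used_gpu(gpus):
    """Return the index of the GPU with the least memory in use."""
    if not gpus:
        raise ValueError("no GPUs reported")
    return min(gpus, key=lambda gpu: gpu[1])[0]


def select_least_used_gpu():
    """Query nvidia-smi and make only the least-used GPU visible to this process."""
    csv_text = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    index = least_used_gpu(parse_gpu_memory(csv_text))
    # Use the same ordering as nvidia-smi so that the index refers to the same card.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

Call `select_least_used_gpu()` before importing TensorFlow, PyTorch, or any other framework that initializes CUDA. Note that this is only a heuristic: another user may start a job on the same GPU right after the check, so it is still worth looking at the `nvidia-smi` output yourself.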

## Local storage and data transfer

TBD.

Currently, the server has only a small SSD (OS and staff home directories) and a
large HDD (mounted at /media). If you are working with large datasets and/or your
experiments produce large log or model files, consider where they are
stored. Do not fill the SSD with anything extra.
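
Before writing large files, it can help to check how much space is left on the target drive. The sketch below uses `shutil.disk_usage` from the Python standard library; the helper names are hypothetical, and any threshold you pass in is your own choice rather than a server policy.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, required_gib):
    """Return True if at least `required_gib` GiB are free on the filesystem containing `path`."""
    return free_gib(path) >= required_gib
```

For example, `has_room("/media", 50)` checks whether the large drive still has at least 50 GiB free (50 is an arbitrary example value).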

Assuming that a user has two directories with code and data, e.g.:
```
/home/user/code/
/home/user/data/
```
You can copy your files with the [scp](https://linux.die.net/man/1/scp) command:
```
scp -r /home/user/code user_login@server_ip:/media/students/user/
scp -r /home/user/data user_login@server_ip:/media/students/user/
```

## Proposed workflow for new users

The proposed workflow uses Docker and conda to keep each user's environment isolated and independent. It is also oriented towards the use of [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/).

For convenience, we have prepared a number of bash scripts that simplify the workflow; see the [relevant scripts and documentation](docker_utils/README.md).

## Dependencies and software installation

We are trying to keep the server clean and to avoid version conflicts. Thus,
Docker is the preferred general way of handling dependencies (see the Docker section).
Python users may also consider using virtualenv (see the Python users section).

### Docker

From Wikipedia:
> Docker is a computer program that performs operating-system-level
> virtualization, also known as "containerization". It was first released in
> 2013 and is developed by Docker, Inc.
>
> Docker is used to run software packages called "containers". Containers are
> isolated from each other and bundle their own application, tools, libraries
> and configuration files; they can communicate with each other through
> well-defined channels. All containers are run by a single operating system
> kernel and are thus more lightweight than virtual machines. Containers are
> created from "images" that specify their precise contents. Images are often
> created by combining and modifying standard images downloaded from public
> repositories. 

General instructions on how to use Docker can be found at https://docker-curriculum.com/

There are plenty of docker images available, e.g.:
* [Anaconda](https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139) 
* [PyTorch](https://hub.docker.com/r/pytorch/pytorch/)
* [Caffe](https://github.com/BVLC/caffe/tree/master/docker)
* [Caffe2](https://caffe2.ai/docs/docker-setup.html)
* [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/CNTK-Docker-Containers)
* [TensorFlow](https://www.tensorflow.org/install/docker)

### Python users
If you are using Python and do not want to use Docker, we suggest using
virtual environments with the [virtualenv](https://www.pythonforbeginners.com/basics/how-to-use-python-virtualenv/)
or [venv](https://docs.python.org/3/library/venv.html) modules. Instructions can
be found at the linked pages.
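
As a rough illustration, a virtual environment can also be created from Python itself with the standard-library `venv` module. This is a minimal sketch, not a prescribed setup: the `create_env` helper and any directory you pass to it are hypothetical.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in `env_dir` and return its path."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

For example, `create_env("/media/students/user/envs/project")` would place the environment on the large drive (the path is illustrative only). Activate it afterwards with `source <env_dir>/bin/activate` and install your packages with `pip` inside it.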

### MATLAB

TBD.

## Additional questions and support

Contact: azat.garifullin@lut.fi