Standalone backup agents for Linux, Mac, AIX & Solaris workloads on-premises or in the public cloud
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Pipeline timeout

Post by Alvi » 1 person likes this post

After watching the initial backup job with Beta 2 do nothing for half an hour, I looked at the Job.log and found:

Code: Select all

[02.09.2016 14:26:29] <140070657427200> lpbcore|     [3509] out: --pex:11;52706672640;49142562816;0;49142562816;46649471144;22;77;81;21;77;71;131172927891526970
[02.09.2016 14:26:29] <140070657427200> lpbcore| Session progress: 41.2911%; processed size: 632840896512 processing speed: 107575188
[02.09.2016 14:56:33] <140070665819904> lpbcore|     [16d1] out: --asyncNtf:Pipeline timeout: it seems that pipeline hanged.
[02.09.2016 14:56:33] <140070674212608> lpbcore|     [281d] out: --asyncNtf:Pipeline timeout: it seems that pipeline hanged.
[02.09.2016 14:56:33] <140070682605312> lpbcore|     [5ec1] out: --asyncNtf:Pipeline timeout: it seems that pipeline hanged.
[02.09.2016 14:56:33] <140070657427200> lpbcore|     [3509] out: --asyncNtf:Pipeline timeout: it seems that pipeline hanged.
I tried to stop the job via "S", but Veeam did not shut down the job.
I tried to stop the veeamservice; nothing happened (only the veeamservice itself stopped).
ps auxww | grep veeam told me:

Code: Select all

root      7704  9.3  0.0      0     0 ?        Zl   14:15   6:13 [veeamservice] <defunct>
root      8433  0.0  0.0      0     0 ?        S<   14:18   0:00 [veeamimage2]
root      8448  1.0  6.3 1394004 515652 ?      SNl  14:18   0:40 veeamagent --log /var/log/veeam/Backup/FullBackup/Session__20160902__141848___F01e052f8_Bcbc0_B4d77_Bb381_Bbe34369f0da0_G/Agent.Target.log
root      8465  6.1  2.9 2926856 236104 ?      SNl  14:18   3:53 veeamagent --log /var/log/veeam/Backup/FullBackup/Session__20160902__141848___F01e052f8_Bcbc0_B4d77_Bb381_Bbe34369f0da0_G/Agent.Source.log
I think I need to reboot the machine to get into a clean state?
PTide
Product Manager
Posts: 6551
Liked: 765 times
Joined: May 19, 2015 1:46 pm
Contact:

Re: Pipeline timeout

Post by PTide »

Hi,

Do you back up to some sort of shared folder (NFS/SMB)? What are the backup job settings?

Also, please provide an archive of the /var/log/veeam directory as well as the output of lsblk -af and dmesg -T.
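For reference, the requested data can be collected roughly like this (a minimal sketch; the output paths are just examples):

Code: Select all

# Archive the Veeam Agent logs (adjust the output path as needed)
tar czf /tmp/veeam_logs.tar.gz /var/log/veeam

# Capture the block device layout and the kernel log with timestamps
lsblk -af > /tmp/lsblk.txt
dmesg -T  > /tmp/dmesg.txt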

Thanks
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Re: Pipeline timeout

Post by Alvi » 1 person likes this post

Is there no job config file somewhere? :)
It was just a simple full backup by volumes. All volumes except the target were chosen.

Where should I send the tar file with all that information?
PTide
Product Manager
Posts: 6551
Liked: 765 times
Joined: May 19, 2015 1:46 pm
Contact:

Re: Pipeline timeout

Post by PTide »

Is there no job config file somewhere?
You can export config with this command:

Code: Select all

veeamconfig config export --file my_config
<...> all volumes except the target were chosen.
Is the target a separate physical block device? If so, is it local or connected remotely (iSCSI/FC)?
Where should I send the tar file with all that information?
Please use Dropbox or any other service of your choice.

Thanks
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Re: Pipeline timeout

Post by Alvi » 1 person likes this post

https://www.wetransfer.com/downloads/d7 ... 315/f14538

The target is an LVM volume on a local disk (see sdd in lsblk.txt).
The volumes sda1, sda2, sdb1 and sdc1 were the sources.
Everything is local.

The same configuration worked in Beta 1.

Have a nice weekend, I am shutting down now :)
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Re: Pipeline timeout

Post by Alvi » 1 person likes this post

On the next try the job hung again, this time at 98%... :?

https://www.wetransfer.com/downloads/8f ... 406/cff20a

The disk sdc only backed up 1.1 MB... should this happen with a volume-based backup?
That volume worked fine in Beta 1.

During the backup the system again became nearly unresponsive; you had to wait several seconds for every keystroke in a shell. The CPU load went up; I saw load averages of about 6.93. I would like to throttle Veeam during backups... are there any current or planned options for this?
Gostev
Chief Product Officer
Posts: 31816
Liked: 7312 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Pipeline timeout

Post by Gostev »

Actually, Beta 2 has throttling built in: the backup process is started at lower-than-default priority, so it should not steal CPU resources from regular processes. As long as CPU resources are available, there is no reason not to use them to the fullest, so high load by itself is not a problem. The fact that your system becomes unresponsive is unexpected, though, and needs to be investigated.
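For what it's worth, a quick way to confirm the reduced priority from a shell (a sketch, assuming standard procps tools; process names taken from the outputs earlier in this thread):

Code: Select all

# Show nice value (NI) and priority of the running agent processes;
# a positive NI means they yield the CPU to normal-priority work
ps -o pid,ni,pri,stat,comm -C veeamagent

# Or watch them live in top, filtered to the agent PIDs
top -p "$(pgrep -d, veeamagent)"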
tschlappinger
Lurker
Posts: 1
Liked: 1 time
Joined: Sep 05, 2016 8:26 am
Full Name: Timo Schlappinger
Contact:

Re: Pipeline timeout

Post by tschlappinger » 1 person likes this post

I have the same problem. I can back up "/" and "/home" as a file-level backup, but when I back up the entire machine it stops at about 97 percent and I have to stop it manually.
It is an encrypted LVM volume on Fedora 24 x64. I back up to an external USB 3.0 HDD.

Attached you will find the log files, config and dmesg output:
https://dl.dropboxusercontent.com/u/202 ... onsult.zip

If you need any other information, just ask for it!
PTide
Product Manager
Posts: 6551
Liked: 765 times
Joined: May 19, 2015 1:46 pm
Contact:

Re: Pipeline timeout

Post by PTide »

The target is an LVM volume on a local disk (see sdd in lsblk.txt).
The volumes sda1, sda2, sdb1 and sdc1 were the sources.
Everything is local.
As far as I can see, sda2 is a PV belonging to the vgmint VG, which contains the root LV and a swap LV; sdb1 is a PV belonging to the vgdata VG, which contains the "data" LV; sdc1 is a LUKS-encrypted volume; and sdd1 is a volume with LUKS on it and LVM on top of the LUKS layer. Is that correct? Please also note that backup of encrypted devices is not officially supported yet. By the way, have you tried excluding sdc1 from the backup job?
The CPU load went up; I saw load averages of about 6.93
Did you manage to spot the biggest CPU consumer with htop or top? I suspect the LUKS layer might be to blame for the CPU overhead, as encryption algorithms can be CPU-hungry.
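For example, something along these lines should show it without an interactive top session (a sketch; kcryptd and dmcrypt_write are the usual dm-crypt kernel worker names):

Code: Select all

# List the top CPU consumers, sorted by CPU usage
ps -eo pid,ni,pcpu,pmem,comm --sort=-pcpu | head -n 15

# dm-crypt does its encryption in kernel worker threads;
# if kcryptd/dmcrypt_write rank high, the LUKS layer is a likely culprit
ps -eo pid,pcpu,comm | grep -E 'kcryptd|dmcrypt_write'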

Thanks
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Re: Pipeline timeout

Post by Alvi » 2 people like this post

Meanwhile I had a third try:
This time I closed all programs, locked the screen and thought "come back in an hour and have a look"...
The third try ended in a "Snapshot overflow" after only 17 minutes.
Here are the logs:
https://www.wetransfer.com/downloads/43 ... 526/e4181b

The disk layout:
Correct. sdc1 is LUKS-encrypted with LVM on top, just not mounted at backup time. The LVM on top of the LUKS on sdd1 is mounted to /mnt/backup, which is chosen as the backup target directory.
sdc1 did back up in Beta 1 (at least Beta 1 told me 80 G were copied; I never tried to restore that volume in Beta 1).
I would have thought a volume-based backup would back up all blocks, ignoring the underlying filesystem. That would make restores "all or nothing", but that would be better than nothing.

I just started the fourth try:
It seems the pipeline hangs after 16 minutes. I can now say that the pipeline in tries 2 and 4 hung just a few minutes after opening Google Chrome. This time I only tried to answer your post, but maybe Chrome uses so many resources that something similar to the "Snapshot overflow" happens to the job?
The logs will not contain the pipeline-hung message; this message often came a long time after the connection was really lost. I accidentally hit "S" instead of "Esc" when leaving the Veeam console, so that would explain the cancel request in the logs, but the console still says "Stopping" even 15 minutes later...
https://www.wetransfer.com/downloads/ec ... 452/26e2a6

Watching top / htop:
While the system was unresponsive, top showed two jobs at the top: one "veeamagent" consuming about 120-166 CPU% and a second "veeamagent" consuming about 25-30 CPU%. Those processes were niced to NI 10, bringing them to priority PR 30.
A few lines below the usual system entries came jobs like "veeamimage1", niced to NI -20 at PR 0.
Within htop I could not see much (I'm not used to htop and don't really know how to read it); at least I saw about 18 lines containing "veeamagent" for the current session ID (also showing NI 10 and PRI 30), with some system lines (kworker, etc.) in between.
I tried to take some screenshots, but the screenshot was always taken just as the system became responsive again, showing "normal" values in top.
The current load went up to about 12. I agree with you that the CPU should not sit idle just for the sake of it :) . I think LUKS sounds like a possibility, but I don't know how to verify that on my system... top did not help me there (top and htop kept running fine during the "hangs"; only keyboard and mouse seemed to "wait").
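A possible way to check whether the stalls are I/O-bound rather than CPU-bound (a sketch, assuming the standard procps/sysstat tools are installed) would be to watch the I/O-wait and per-device utilisation during a run:

Code: Select all

# CPU and I/O-wait statistics every 2 seconds (watch the "wa" column)
vmstat 2

# Extended per-device statistics; a device stuck near 100 %util is the bottleneck
iostat -x 2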

I will reformat the target without LUKS and then start the next try without sdc1; I will report back after that.
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Re: Pipeline timeout

Post by Alvi » 2 people like this post

This time I excluded sdc and reformatted sdd1 as "plain" ext4, so no LUKS is involved now. I did not start Chrome or anything else.

The system remained unresponsive but backed up successfully; here is a sample top:

Code: Select all

top - 19:21:58 up  1:29,  1 user,  load average: 5.23, 5.03, 5.19
Tasks: 282 total,   1 running, 281 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  4.1 sy, 23.8 ni, 44.9 id, 26.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  8116576 total,   139084 free,  2205844 used,  5771648 buff/cache
KiB Swap: 16383996 total, 16383996 free,        0 used.  5503264 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5687 root      30  10 2933896 480784  15868 S 204,3  5,9  39:53.71 veeamagent
 5670 root      30  10 1694060 819984  15276 S  13,3 10,1   6:21.39 veeamagent
   63 root      20   0       0      0      0 S   5,3  0,0   1:20.68 kswapd0
 5662 root       0 -20       0      0      0 S   3,0  0,0   0:22.85 veeamimage2
 5604 root      20   0  428432  21848  12756 S   2,7  0,3   2:11.26 veeam
 4066 root      20   0  459480  81140  62332 S   2,0  1,0   1:43.19 Xorg
 4750 sven      20   0 1586976 180036  70480 S   1,7  2,2   1:00.38 cinnamon
 4966 sven      20   0  501092  44048  29196 S   0,7  0,5   0:22.87 gnome-terminal-
 7761 root      20   0       0      0      0 D   0,7  0,0   0:08.24 kworker/u16:2
 4053 root      20   0  866960  16496  10232 S   0,3  0,2   0:07.43 veeamservice
 8044 sven      20   0   42308   3928   3208 R   0,3  0,0   0:00.03 top
    1 root      20   0  186320   6992   4048 S   0,0  0,1   0:01.01 systemd
    2 root      20   0       0      0      0 S   0,0  0,0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0,0  0,0   0:00.08 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0,0  0,0   0:24.73 kworker/u16:0
    7 root      20   0       0      0      0 S   0,0  0,0   0:02.68 rcu_sched
    8 root      20   0       0      0      0 S   0,0  0,0   0:00.00 rcu_bh
    9 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/0
   10 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 watchdog/0
   11 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 watchdog/1
   12 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/1
   13 root      20   0       0      0      0 S   0,0  0,0   0:00.08 ksoftirqd/1
   15 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/1:0H
   16 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 watchdog/2
   17 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/2
   18 root      20   0       0      0      0 S   0,0  0,0   0:00.08 ksoftirqd/2
   20 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/2:0H
   21 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 watchdog/3
   22 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/3
   23 root      20   0       0      0      0 S   0,0  0,0   0:00.08 ksoftirqd/3
   25 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/3:0H
   26 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 watchdog/4
I will try the "no-LUKS" setup again tomorrow and start Chrome, to verify whether Chrome or its resource usage may be a possible cause of the failures.
PTide
Product Manager
Posts: 6551
Liked: 765 times
Joined: May 19, 2015 1:46 pm
Contact:

Re: Pipeline timeout

Post by PTide »

I appreciate your research very much! Looking forward to getting new results from you.
Alvi
Influencer
Posts: 12
Liked: 10 times
Joined: Jul 08, 2015 10:19 am
Contact:

Re: Pipeline timeout

Post by Alvi » 1 person likes this post

I did an incremental backup on top of yesterday's successful full backup:
Everything went smoothly and fine; it needed about 14 minutes. That was after a full shutdown, so do you save CBT data now?

I then deleted the backup and did a new full backup while "working" with Chrome and other apps ("working" is generous when your system responds only every 15-60 seconds). Flash videos kept playing (sometimes the sound was fine but the picture hung), and everything backed up successfully, but as mentioned, the system remained unresponsive. Up to this point no LUKS was involved.
PTide
Product Manager
Posts: 6551
Liked: 765 times
Joined: May 19, 2015 1:46 pm
Contact:

Re: Pipeline timeout

Post by PTide »

Hi tschlappinger,
If you need any other information, just ask for it!
May I ask you to exclude the encrypted volumes from the backup job and try the backup again?

Alvi,

Another question:
I just started the fourth try:
It seems the pipeline hangs after 16 minutes <...> The logs will not contain the pipeline-hung message; this message often came a long time after the connection was really lost.
Then why did you say it was a pipeline timeout if no pipeline error message had been shown?

Thanks