-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
CEPH/45Drives
Looking to see if anyone has experience with CEPH-backed XFS server(s) as a VBR repository.
The design would be a front-end server with a CEPH driver hosting a single large XFS repository that can grow quite large simply by expanding the CEPH cluster behind it.
---------------------------------------------------------------------------------------------------------------
Currently running all-flash in a traditional RAID on ReFS and it's working great, but RAID cards are generally limited in scale, maxing out at ~32 drives per VD (which, at the current 7.68TB maximum drive size, yields roughly 209-216TB usable in RAID-5/RAID-6/RAID-5E). Within 24 months it's likely our repository needs will be closer to, or over, 300TB.
With the rebuild times on even SATA SSDs, I'm not super worried about running RAID-5 with a cold or hot spare, but given the lack of scale - short of doing a bunch of 2U-3U repositories and organizing around that - a cluster solution is becoming more attractive.
---------------------------------------------------------------------------------------------------------------
On the all-flash conversation: we do weekly full tape jobs in excess of 80TB (growing to 160TB+ within 24 months) that need at least 600MB/s WHILE running normal backup jobs. NL-SAS in the configuration required to ensure rebuilds don't cripple the environment (spindle-dense RAID-60) more or less supports the sequential load, but any time you want to do anything else with it, good-luck-have-fun. Breaking it out into a SOBR ruins our ReFS/XFS block clone for synthetic fulls, so that isn't really an option. Plus the amount of power is substantially different for a massive NL-SAS setup vs flash. I'm entertaining ExaGrid for some quotes as well, but it'd require multiple of their EX84 units, which I'm sure isn't going to be cheap.
-
- Veteran
- Posts: 528
- Liked: 143 times
- Joined: Aug 20, 2015 9:30 pm
- Contact:
Re: CEPH/45Drives
If you need more than 32 drives, why not run RAID 50/RAID 60? I've got volumes with at least 40 drives in RAID 60 on a MegaRAID controller.
-
- Veeam Legend
- Posts: 1197
- Liked: 415 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: CEPH/45Drives
Or: just use LVM (with striping across multiple RAIDs if you want even more speed)! The flexibility LVM brings is just too great, especially since it gives you a way to migrate the XFS without losing block cloning.
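To make that concrete, here's a minimal sketch of a striped LVM layout with a reflink-enabled XFS on top, assuming two hardware-RAID virtual disks; the device, VG and mount names are made up for illustration:
Code:
# Two RAID virtual disks exposed by the controller (hypothetical device names)
pvcreate /dev/sdb /dev/sdc
vgcreate vg_repo /dev/sdb /dev/sdc
# Stripe the logical volume across both PVs for extra throughput
lvcreate --type striped -i 2 -I 256k -l 100%FREE -n lv_repo vg_repo
# Reflink-enabled XFS so Veeam fast clone (block cloning) keeps working
mkfs.xfs -m reflink=1,crc=1 /dev/vg_repo/lv_repo
mount -o noatime /dev/vg_repo/lv_repo /mnt/veeam-repo
Growing later is then just vgextend with another RAID volume, lvextend, and xfs_growfs.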
-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
Re: CEPH/45Drives
Thanks for the responses. I've run RAID 50 before in the Cisco "Veeam Machine" S3260 with good results, but those were NL-SAS and nowhere near the kind of I/O I currently need (though great for cheap and deep). The problem with 5+0 and 6+0 on SSD is that if it's struggling on a RAID-5 of 24 SSDs, it's going to be worse doing parity AND striping. At that point, Linux LVM...
One option is to just do a 32-bay Linux box with XFS on it - it's not as expensive, but it lacks the object storage side, which is what makes the Ceph cluster very attractive. Otherwise it's another 25-30k for a 4x1U storage cluster running something like MinIO - a good solution, but OpEx in cost, which is hard to get approved at a company that has historically been very CapEx heavy.
With LVM, are you saying you would just run Veeam and XFS in Docker, or does XFS in an LVM act like a virtual disk that can be copied between servers? I actually do that currently for one of my repositories, as I have dedupe enabled (and you can't do dedupe on a volume over 64TB, and the RAID it sits on is 80TB) - but the fact that I don't lose the block clone when moving the VHDX is really cool.
-
- Veeam Legend
- Posts: 1197
- Liked: 415 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: CEPH/45Drives
I've never used LVM to copy something between servers, but in theory you could attach an external storage device (iSCSI, FC), move the volumes online, and once that's finished detach it and attach it to another host. But all of that is at the block level. Thinking about it, you could do some crazy stuff with loopback devices; that way you could migrate a volume via normal files.
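For reference, a rough sketch of that kind of online move with pvmove; device and VG names are hypothetical:
Code:
# New storage (e.g. an iSCSI/FC LUN) shows up as /dev/sdd (hypothetical)
pvcreate /dev/sdd
vgextend vg_repo /dev/sdd
# Move all extents off the old PV while the XFS stays mounted and in use
pvmove /dev/sdb /dev/sdd
# Afterwards, drop the old PV from the VG and detach it
vgreduce vg_repo /dev/sdb
pvremove /dev/sdb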
-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
Re: CEPH/45Drives
It's one of the frustrating things about the ReFS integration - it's amazing - but GLHF if you want to move it off onto another server. Right now my 140TB used on the main volume holds... 489.5TB in files. And it could be a lot more to be honest, but we keep mid-monthly tapes for a year and monthly tapes forever, so we don't 'need' to keep as many on disk.
But with VHDX you're limited to a 64TB max size on ReFS, so in the future one way of doing this could be to make multiple VHDXs with ReFS per server, and then moving them to new storage would keep the savings...
Scale-out is an option as well, but - as far as I understand it - scale-out doesn't do well with ReFS, as those blocks have to be on the same ReFS repository anyway.
-
- Chief Product Officer
- Posts: 31690
- Liked: 7201 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: CEPH/45Drives
The good news is, V12 will have the ability to move backups to another ReFS or XFS volume with block cloning, thus preserving those great physical disk space savings you're seeing.
-
- Veeam Legend
- Posts: 1197
- Liked: 415 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: CEPH/45Drives
@Gostev Did I understand right? ReFS to XFS and back?!
Wow!
Please make that feature not an all-or-nothing thing, so that we can migrate one, many, or all backups at will!
-
- Product Manager
- Posts: 9689
- Liked: 2562 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: CEPH/45Drives
That was announced at VeeamON 2021. It was my favorite announcement from Anton.
Makes migration to new storage much easier.
Product Management Analyst @ Veeam Software
-
- Novice
- Posts: 3
- Liked: never
- Joined: May 03, 2013 10:20 pm
- Full Name: Jens Galsgaard
- Contact:
Re: CEPH/45Drives
While not on 45Drives hardware, I run a small vanilla Ceph cluster on Supermicro hardware. In front of it I have a Linux host with a reflink-enabled XFS file system, which serves as a Linux repository for Veeam.
Adding drives to Ceph, upgrading Ceph, and updating the OS (Rocky 8) have been a breeze and never affect production.
Put one host in maintenance mode and do what you have planned; once it's out of maintenance mode the cluster will backfill what it needs.
I started the cluster on 3 VMs, each with 5x500GB vDisks. When I was satisfied with the "PoC" I just added physical servers to the cluster and took the virtual disks out one by one. The cluster rebalances the data across the cluster when you add or remove disks.
The above is somewhat simplified, as disks are handled by OSDs (object storage daemons).
When gathering knowledge on the subject of Ceph, beware of "old" content and a steep learning curve.
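For anyone new to Ceph, a simplified version of that maintenance flow looks roughly like this; exact commands depend on your Ceph release and deployment tooling:
Code:
# Tell Ceph not to mark OSDs out or rebalance while one host is down for patching
ceph osd set noout
ceph osd set norebalance
# ...patch and reboot the host, wait for its OSDs to rejoin...
ceph osd unset norebalance
ceph osd unset noout
# Watch recovery/backfill until the cluster is back to HEALTH_OK
ceph -s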
-
- Enthusiast
- Posts: 43
- Liked: 8 times
- Joined: Aug 24, 2012 11:59 am
- Contact:
Re: CEPH/45Drives
I have experience with deploying Ceph (upstream and RHCS), with VBR and/or VBO as storage consumers, across a wide range of Ceph cluster sizes.
Scaling to between 300TB and 1PB usable is quite achievable.
Apart from designing the hardware to match your performance requirements, you also need to keep in mind that if you make a single large XFS repository then, at least on RHEL, the max _certified_ FS size is 1PiB ( https://access.redhat.com/solutions/1532 )
IIRC you need to diverge from the default FS options to make it larger.
Why consider Ceph for VBR:
#1 It scales, a lot, both out and up.
#2 It provides block, object (S3-compatible), and file storage in one solution
#3 It can be setup as a stretched cluster, as long as RTT is low.
#4 It can do async replication of block and/or S3 to another Ceph cluster, say for a DR site or to a 3rd party for archival, etc.
#5 It runs on standard hardware (no vendor lock-in)
#6 Ceph is self-healing, so you can take the "cattle" approach to your storage instead of the "house pet" approach, since there is no need to rush to the datacenter just because a failure domain (disk, server, PDU, etc.) has failed.
As @gallenat0r points out, you should be aware of how old / what version any documents refer to, since both the "ease-of-use" aspect of Ceph and its features have improved/changed quite a bit over the past few releases.
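As a rough illustration of the "XFS repository on Ceph block storage" pattern discussed in this thread (not anyone's exact deployment - pool, profile and image names are invented, PG counts and tuning are omitted, and exact syntax varies a bit by release):
Code:
# Erasure-coded data pool (4+2) plus a small replicated pool for RBD metadata
ceph osd erasure-code-profile set ec42 k=4 m=2
ceph osd pool create rbd_ec_data erasure ec42
ceph osd pool set rbd_ec_data allow_ec_overwrites true
ceph osd pool create rbd_meta replicated
rbd pool init rbd_meta
# A 50TB image whose data lands in the EC pool
rbd create --size 50T --data-pool rbd_ec_data rbd_meta/vbr-xfs1
# On the Linux gateway: map the image and format it as a reflink-enabled XFS repo
rbd map rbd_meta/vbr-xfs1
mkfs.xfs -m reflink=1,crc=1 /dev/rbd0
mount -o noatime /dev/rbd0 /mnt/veeam-repo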
-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
Re: CEPH/45Drives
Wanted to offer an update on this as there's not much out there, and this community is pretty big on word of mouth.
Overall, it's working great. We back up two Pure Storage arrays (X50R4 and C60R3) behind a Hyper-V cluster of (7) R650s, with end-to-end 32Gb FC for the HBAs and end-to-end SFP28 (25Gbps) networking. Most of the time the bottleneck is the proxy (which is saying something, as those are dual Xeon Gold 6326 with 768GB of memory per host), and on occasion it's the destination - but that's mostly because CEPH likes more simultaneous writes than our jobs currently generate; if we broke our jobs up into multiple smaller jobs and ran them at the same time, it'd probably go faster... But I rarely see it drop below 500MB/s on any of my jobs, which range in size from 5TB to 30TB (incremental runs are normally more in the 10GB-50GB range every two hours or so).
45 Drives
- Canadian company - makes logistics for warranty stuff a little harder, but they pay shipping
- Support is not 24/7
- Despite not being 24/7, support is amazing
- They also offer Linux block-hour support, which I personally leverage since I'm only 'okay' with Linux - and especially since I'm running CEPH, it's been very worth it
- Prices are great; yeah, you could probably whitebox it cheaper, but I'm fond of having someone else deal with hardware issues (even if I'm technically doing the on-site hardware swaps)
- And on that note, yes, it's not on-site support - but their hardware is very user-friendly, and they're constantly improving their designs
- Boy, is this system (CEPH itself) a bit confusing - if you want set-and-forget, buy a Vast Data or Pure FlashBlade
- FOR THE COST I personally think it's well worth it, especially having 45 Drives help with the support of CEPH (in theory you could probably buy support for CEPH elsewhere, but these guys are smart cookies)
- Flexibility isn't bad but isn't as simple as the Vast/Pure options out there
- Performance on Block/XFS is fine - it's not super amazing but considering what it's doing, I have no complaints
- Performance on object is... okay? I really can't find much information on what sort of performance object storage is 'supposed' to get, but for VBO / M365 stuff it's more than enough (I haven't tried direct-to-object VBR or the Archive tier on it yet)
- BIGGEST ISSUE is just keeping the system updated - I don't have Ansible or Chef to automate it, which would make it a lot easier - but even so it's not 'hard'; you just have to put the CEPH cluster into maintenance and reboot each system one at a time
- POSSIBLE issue for some users: by default they don't configure any load balancing on the file-level side, as the Veeam data mover lives on only one of my gateways, not both / not load balanced. In theory I think it can still be done, but I haven't dug in too hard; frankly, if one of my gateways dies, they helped me build a quick set of commands that brings the XFS file system up on the 2nd gateway, and then I just need to change the IP in Veeam for that repo to point to the new gateway and it'll be fine (I'd love to hear some Linux nerds' ideas on how to load balance between them or set up an active/passive configuration, but I don't think it's really feasible)
- Note: S3 natively supports load balancing between the gateways; it's just how we're presenting the storage to the gateways for XFS that makes it not inherently 'load balanced' for ingress/egress - the actual cluster data is fully LB/HA
- 2x1U gateway boxes running (1) Xeon Silver and 2xSFP28
- 4x2U Cluster boxes running (1) Xeon Gold and 4xSFP28 - each cluster node currently has 15x7.68TB Micron (Pro?) SSDs and can go up to I think 30 or 35
- My usable capacity is around 300TB with the 4:2 erasure coding
- The XFS is running on a Linux 'raid-0' of 4x50TB volumes presented by CEPH to one of the gateways, resulting in a 200TB volume (a rough sketch of one way to build that layout follows after this list)
- Note: while it's raid-0, that's not really a concern, as the underlying CEPH handles each of those disks across the 4 cluster nodes; I've literally lost an entire node (all 15 drives) with no impact, which is how I can update it without impact (except the gateway, which is a pretty quick reboot anyway)
- One minor change we found is that the OSDs were having false-positive failures until we bumped up the memory allocation to each of them - likely because CEPH was originally built for HDD and expects slower I/O from the disks, so the SSDs seem to need more memory to keep up; thankfully these were all specced with 256GB of RAM per node, so we have more than enough both now and if I fully populate them (they designed it to scale very well - literally just buy new SSDs and pay for the CEPH nerds to configure the new drives)
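Whether that 'raid-0' is mdadm or LVM striping isn't stated above; one hypothetical way to stitch four mapped 50TB RBD images into a single 200TB reflink-enabled XFS (device names invented; redundancy is provided by Ceph underneath, not by the stripe):
Code:
# Four RBD images already mapped on the gateway (hypothetical device names)
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3
# Single large XFS with reflink so fast clone / synthetic fulls stay cheap
mkfs.xfs -m reflink=1,crc=1 /dev/md0
mount -o noatime /dev/md0 /mnt/veeam-repo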
Here are some stats I collected... I'll include the script I made for this below, because we don't own Veeam ONE to give me pretty reporting
(Workload is misc infrastructure servers)
Backup Type: Incremental
Over 22 of Incremental
Average Transferred Data: 96.5 GB
Average Processing Rate: 991.51 MB/s
Average Bottleneck: 87.14
Average Duration: 47.19 minutes
Average Processed Used Size: 12887.52 GB
Average Total Objects: 73.86
Backup Type: Synthetic
Over 3 of Synthetic
Average Transferred Data: 92.7 GB
Average Processing Rate: 1020.5 MB/s
Average Bottleneck: 72.33
Average Duration: 41.94 minutes
Average Processed Used Size: 11342.74 GB
Average Total Objects: 65
(Workload is file/sql/app servers w/o dedupe enabled on File Server)
Backup Type: Incremental
Over 315 of Incremental
Average Transferred Data: 22.4 GB
Average Processing Rate: 555.49 MB/s
Average Bottleneck: 78.15
Average Duration: 23.3 minutes
Average Processed Used Size: 25926.19 GB
Average Total Objects: 7
Backup Type: Synthetic
Over 5 of Synthetic
Average Transferred Data: 31.76 GB
Average Processing Rate: 737.63 MB/s
Average Bottleneck: 74.2
Average Duration: 26.11 minutes
Average Processed Used Size: 24690.57 GB
Average Total Objects: 7
(Workload is a Windows File Server w/Windows dedupe enabled)
Backup Type: Incremental
Over 331 of Incremental
Average Transferred Data: 26.71 GB
Average Processing Rate: 918.82 MB/s
Average Bottleneck: 67.83
Average Duration: 7.96 minutes
Average Processed Used Size: 38380.93 GB
Average Total Objects: 1.01
Backup Type: Synthetic
Over 2 of Synthetic
Average Transferred Data: 0.24 GB
Average Processing Rate: 141.42 MB/s
Average Bottleneck: 41
Average Duration: 57.32 minutes
Average Processed Used Size: 38150.53 GB
Average Total Objects: 1
(Workload is Misc SQL/App servers)
Backup Type: Incremental
Over 277 of Incremental
Average Transferred Data: 19.57 GB
Average Processing Rate: 776.88 MB/s
Average Bottleneck: 75.06
Average Duration: 29.93 minutes
Average Processed Used Size: 12521.46 GB
Average Total Objects: 41.05
Backup Type: Synthetic
Over 4 of Synthetic
Average Transferred Data: 11.35 GB
Average Processing Rate: 777.29 MB/s
Average Bottleneck: 73.25
Average Duration: 23.6 minutes
Average Processed Used Size: 9436.96 GB
Average Total Objects: 31
Code:
<#
====================================================
| Veeam 30-day stats script for backup jobs |
| Created by: RDW |
| Contact: Veeam Forums? If I respond.... |
| Revision: 1.1 |
====================================================#>
# Get a list of all Veeam jobs with "backup" in the job type, excluding "Windows Agent Backup" and "Backup Copy" types, and display them with numbers and types
$allJobs = Get-VBRJob | Where-Object { $_.TypeToString -like '*backup*' -and $_.TypeToString -notlike '*Windows Agent Backup*' -and $_.TypeToString -notlike '*Backup Copy*' }
$jobNumberMapping = @{} # Create a hashtable to map job numbers to job names
for ($i = 0; $i -lt $allJobs.Count; $i++) {
$job = $allJobs[$i]
$jobType = $job.TypeToString # Get the job type
$jobNumber = $i + 1
$jobNumberMapping[$jobNumber] = $job.Name
Write-Host "$($jobNumber). $($job.Name) ($jobType)"
}
# Prompt the hobos to select a job
$jobIndex = Read-Host "Enter the number of the job you want to analyze"
# Validate user input
if ($jobIndex -match '^\d+$' -and $jobNumberMapping.ContainsKey([int]$jobIndex)) {
$jobName = $jobNumberMapping[[int]$jobIndex]
$selectedJob = $allJobs | Where-Object { $_.Name -eq $jobName }
$jobType = $selectedJob.TypeToString # Get the job type
# Calculate the start and end dates for the last month
$endDate = Get-Date
$startDate = $endDate.AddMonths(-1)
# Get all sessions for the selected job name within the last month
$sessions = Get-VBRBackupSession | Where-Object { $_.JobName -eq $jobName -and $_.EndTime -ge $startDate -and $_.EndTime -le $endDate -and $_.Result -eq "Success" }
# Sort the sessions by end time in descending order (most recent first)
$sessions = $sessions | Sort-Object -Property EndTime -Descending
# Initialize variables to store total transferred data, average processing rate, average bottleneck, total duration, total processed used size, and total objects
$totalTransferredData = 0
$averageProcessingRate = 0
$totalBottleneck = 0
$totalDuration = [TimeSpan]::Zero
$totalProcessedUsedSize = 0
$totalObjects = 0
$incrementalSessions = @()
$syntheticSessions = @()
# Iterate through sessions and separate them into Incremental and Synthetic
foreach ($session in $sessions) {
$totalTransferredData += $session.Progress.TransferedSize
$averageProcessingRate += $session.Progress.AvgSpeed
$bottleneckInfo = $session.Progress.BottleneckInfo
$highestBottleneckCategory = $bottleneckInfo.PSObject.Properties | Sort-Object { $_.Value } | Select-Object -Last 1
$totalBottleneck += [int]$highestBottleneckCategory.Value
$totalDuration += $session.Progress.Duration
$totalProcessedUsedSize += $session.Progress.ProcessedUsedSize
$totalObjects += $session.Progress.TotalObjects
# Separate sessions into Incremental and Synthetic based on their names
if ($session.Name -like "*Incremental*") {
$incrementalSessions += $session
} elseif ($session.Name -like "*Synthetic*") {
$syntheticSessions += $session
}
}
# Calculate the average transferred data, processing rate, bottleneck, duration, processed used size, and objects for each type
$averageTransferredDataGB = [math]::Round(($totalTransferredData / $incrementalSessions.Count) / 1GB, 2) # GB
$averageProcessingRateMBps = [math]::Round($averageProcessingRate / $incrementalSessions.Count / 1MB, 2) # MB/s
$averageBottleneck = [math]::Round($totalBottleneck / $incrementalSessions.Count, 2)
$averageDurationSeconds = [math]::Round($totalDuration.TotalSeconds / $incrementalSessions.Count, 2)
$durationUnit = "seconds"
if ($averageDurationSeconds -ge 3600) {
$averageDuration = [math]::Round($averageDurationSeconds / 3600, 2)
$durationUnit = "hours"
} elseif ($averageDurationSeconds -ge 120) {
$averageDuration = [math]::Round($averageDurationSeconds / 60, 2)
$durationUnit = "minutes"
} else {
$averageDuration = $averageDurationSeconds
}
$averageProcessedUsedSizeGB = [math]::Round(($totalProcessedUsedSize / $incrementalSessions.Count) / 1GB, 2) # GB
$averageTotalObjects = [math]::Round($totalObjects / $incrementalSessions.Count, 2)
# Display the results with colors
Write-Host "Job Name: $($jobName)" -ForegroundColor Yellow
Write-Host "Job Type: $($jobType)" -ForegroundColor Yellow
# Display Incremental results
Write-Host "Backup Type: Incremental" -ForegroundColor Yellow
Write-Host "Over $($incrementalSessions.Count) of Incremental" -ForegroundColor Yellow
Write-Host "Average Transferred Data: $averageTransferredDataGB GB" -ForegroundColor Green
Write-Host "Average Processing Rate: $averageProcessingRateMBps MB/s" -ForegroundColor Green
Write-Host "Average Bottleneck: $averageBottleneck" -ForegroundColor Green
Write-Host "Average Duration: $averageDuration $durationUnit" -ForegroundColor Green
Write-Host "Average Processed Used Size: $averageProcessedUsedSizeGB GB" -ForegroundColor Green
Write-Host "Average Total Objects: $averageTotalObjects" -ForegroundColor Green
# Calculate the same statistics for Synthetic sessions
$totalTransferredData = 0
$averageProcessingRate = 0
$totalBottleneck = 0
$totalDuration = [TimeSpan]::Zero
$totalProcessedUsedSize = 0
$totalObjects = 0
foreach ($session in $syntheticSessions) {
$totalTransferredData += $session.Progress.TransferedSize
$averageProcessingRate += $session.Progress.AvgSpeed
$bottleneckInfo = $session.Progress.BottleneckInfo
$highestBottleneckCategory = $bottleneckInfo.PSObject.Properties | Sort-Object { $_.Value } | Select-Object -Last 1
$totalBottleneck += [int]$highestBottleneckCategory.Value
$totalDuration += $session.Progress.Duration
$totalProcessedUsedSize += $session.Progress.ProcessedUsedSize
$totalObjects += $session.Progress.TotalObjects
}
$averageTransferredDataGB = [math]::Round(($totalTransferredData / $syntheticSessions.Count) / 1GB, 2) # GB
$averageProcessingRateMBps = [math]::Round($averageProcessingRate / $syntheticSessions.Count / 1MB, 2) # MB/s
$averageBottleneck = [math]::Round($totalBottleneck / $syntheticSessions.Count, 2)
$averageDurationSeconds = [math]::Round($totalDuration.TotalSeconds / $syntheticSessions.Count, 2)
$durationUnit = "seconds"
if ($averageDurationSeconds -ge 3600) {
$averageDuration = [math]::Round($averageDurationSeconds / 3600, 2)
$durationUnit = "hours"
} elseif ($averageDurationSeconds -ge 120) {
$averageDuration = [math]::Round($averageDurationSeconds / 60, 2)
$durationUnit = "minutes"
} else {
$averageDuration = $averageDurationSeconds
}
$averageProcessedUsedSizeGB = [math]::Round(($totalProcessedUsedSize / $syntheticSessions.Count) / 1GB, 2) # GB
$averageTotalObjects = [math]::Round($totalObjects / $syntheticSessions.Count, 2)
# Display Synthetic results
Write-Host "Backup Type: Synthetic" -ForegroundColor Yellow
Write-Host "Over $($syntheticSessions.Count) of Synthetic" -ForegroundColor Yellow
Write-Host "Average Transferred Data: $averageTransferredDataGB GB" -ForegroundColor Green
Write-Host "Average Processing Rate: $averageProcessingRateMBps MB/s" -ForegroundColor Green
Write-Host "Average Bottleneck: $averageBottleneck" -ForegroundColor Green
Write-Host "Average Duration: $averageDuration $durationUnit" -ForegroundColor Green
Write-Host "Average Processed Used Size: $averageProcessedUsedSizeGB GB" -ForegroundColor Green
Write-Host "Average Total Objects: $averageTotalObjects" -ForegroundColor Green
} else {
Write-Host "Invalid selection. Please enter a valid number." -ForegroundColor Red
}
-
- Chief Product Officer
- Posts: 31690
- Liked: 7201 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: CEPH/45Drives
Thank you for coming back with this elaborate feedback. Could you clarify whether 45 Drives only provides hardware, or also supports CEPH and Linux? From your post above it looks like you do at least buy Linux support from them; what about CEPH itself? Are you on your own with it, or do they help you with CEPH too?
-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
Re: CEPH/45Drives
They absolutely help you - within that 'block of hours' you can purchase, which IMHO is not a bad cost - with CEPH as well. Again, they don't make CEPH - I THINK IBM might now offer enterprise support on it, as they're the current 'owners' of the project (it's still OSS though) - but 45 Drives have some cool nerds there who know their stuff. They DO include the initial setup in their cost, and if you're comfortable with Linux in general, CEPH isn't all that alien a concept or product. The dashboards and monitoring all use Prometheus and Grafana, and it's all configured by 45Drives for a "turn key" solution.
I guess technically one gripe I have is that all of their configuration for those backend systems uses HTTP, not HTTPS, for Prometheus/Grafana, but that graph/performance data has basically no value that 'requires' it to be encrypted (obviously our backups are being encrypted by Veeam anyway).
As a bonus you can also get custom face plates! My CIO cheaped out on me and made them take that off the quote for my CEPH cluster systems (despite it being only like $1,500 for 6 custom face plates, I think? It wasn't much, and you can design whatever you want).
I should note: I can't edit it, but that single file server isn't on CEPH - I thought I'd migrated it, but it's still on a Windows ReFS all-SSD (Micron ION) 24x7.68TB RAID-5 system, so it actually gives an interesting comparison in performance now that I think about it.
This is a smaller but similar design Windows File Server (with Windows Dedupe on the data in the OS) being backed up to CEPH:
Backup Type: Incremental
Over 259 of Incremental
Average Transferred Data: 10.46 GB
Average Processing Rate: 307.59 MB/s
Average Bottleneck: 77.68
Average Duration: 18.62 minutes
Average Processed Used Size: 24197.67 GB
Average Total Objects: 1.02
Backup Type: Synthetic
Over 4 of Synthetic
Average Transferred Data: 1.06 GB
Average Processing Rate: 177.15 MB/s
Average Bottleneck: 84
Average Duration: 21.81 minutes
Average Processed Used Size: 23720.21 GB
Average Total Objects: 1
So from what I'm seeing anyway, the synthetic full jobs on the XFS/CEPH repo seem faster than a straight-up SSD RAID-5. Again, the incrementals are twice as large on the other one, but only ~17GB more - and there are 'more' incrementals in the one on CEPH, so I'd think that means more fragmentation as well, but it still seems to do better... weird.
I also just realized that the bottleneck logic in my script got messed up - it's just reporting the number and not 'where' the bottleneck is. Most of the time I was seeing it in Target Network, oddly enough - which I think is a bit odd, as that's an SFP28 connection and I don't tend to see it get hammered fully... Maybe the Intel drivers are just a bit unoptimized or something. Though honestly, I sat in on some really cool sessions at the USENIX 2023 conference this year that made me realize 'bigger' isn't always better when it comes to networking, as a lot of these higher-bandwidth NICs are literally too fast for the OS/application layer to fully utilize - so it's not impossible that the congestion is outside of the layer-1 side.
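For reference, an untested sketch of how the session loop could also record which category the bottleneck was, using only the BottleneckInfo properties the script already reads (the $bottleneckNames variable is new and just for illustration):
Code:
# Initialize alongside the other counters
$bottleneckNames = @()
foreach ($session in $sessions) {
    $bottleneckInfo = $session.Progress.BottleneckInfo
    # Highest-valued category = the reported bottleneck
    $highestBottleneck = $bottleneckInfo.PSObject.Properties |
        Sort-Object { [int]$_.Value } | Select-Object -Last 1
    $totalBottleneck += [int]$highestBottleneck.Value   # the percentage, as before
    $bottleneckNames += $highestBottleneck.Name          # e.g. Source / Proxy / Network / Target
}
# Most frequent bottleneck category across the selected sessions
$topBottleneck = $bottleneckNames | Group-Object | Sort-Object Count -Descending |
    Select-Object -First 1 -ExpandProperty Name
Write-Host "Most common bottleneck: $topBottleneck" -ForegroundColor Green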