gitlab.com/jokerrs1/Sia@v1.3.2/modules/renter/download.go

package renter

// The download code follows a hopefully clean/intuitive flow for getting very
// high and computationally efficient parallelism on downloads. When a download
// is requested, it gets split into its respective chunks (which are downloaded
// individually) and then put into the download heap. The primary purpose of the
// download heap is to keep downloads on standby until there is enough memory
// available to send the downloads off to the workers. The heap is sorted first
// by priority, and then by a few other criteria as well.
//
// Some downloads, in particular downloads issued by the repair code, have
// already had their memory allocated. These downloads get to skip the heap and
// go straight to the workers.
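//
// As a rough sketch of the ordering described above, the download heap's Less
// method could look something like the following (the 'downloadChunkHeap' type
// and the exact tie-breaking rules are assumed here for illustration; the real
// heap lives outside this file):
//
//	func (dch downloadChunkHeap) Less(i, j int) bool {
//		// Higher-priority chunks are handed to workers first.
//		if dch[i].staticPriority != dch[j].staticPriority {
//			return dch[i].staticPriority > dch[j].staticPriority
//		}
//		// Tie-break on when the parent download started, so older
//		// downloads complete before newer ones.
//		if !dch[i].download.staticStartTime.Equal(dch[j].download.staticStartTime) {
//			return dch[i].download.staticStartTime.Before(dch[j].download.staticStartTime)
//		}
//		// Finally, prefer earlier chunks within the same download.
//		return dch[i].staticChunkIndex < dch[j].staticChunkIndex
//	}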
//
// When a download is distributed to workers, it is given to every single worker
// without checking whether that worker is appropriate for the download. Each
// worker has its own queue, which is bottlenecked by the fact that a worker can
// only process one item at a time. When the worker gets to a download request,
// it determines whether it is suited for downloading that particular file. The
// criteria it uses include whether or not it has a piece of that chunk, how
// many other workers are currently downloading pieces or have completed pieces
// for that chunk, and finally things like worker latency and worker price.
//
// If the worker chooses to download a piece, it will register itself with that
// piece, so that other workers know how many workers are downloading each
// piece. This keeps everything cleanly coordinated and prevents too many
// workers from downloading a given piece, without needing a giant, messy
// coordinator tracking everything. If a worker chooses not to download a piece,
// it will add itself to the list of standby workers, so that in the event of a
// failure, the worker can be returned to and used again as a backup worker. The
// worker may also decide that it is not suitable at all (for example, if the
// worker has recently had some consecutive failures, or if the worker doesn't
// have access to a piece of that chunk), in which case it will mark itself as
// unavailable to the chunk.
//
// As workers complete, they will release memory and check on the overall state
// of the chunk. If some workers fail, they will enlist the standby workers to
// pick up the slack.
//
// When the final required piece finishes downloading, the worker that completed
// the final piece will spin up a separate thread to decrypt, decode, and write
// out the download. That thread will then clean up any remaining resources, and
// if this was the final unfinished chunk in the download, it'll mark the
// download as complete.

// The download process has a slightly complicating factor, which is overdrive
// workers. Traditionally, if you need 10 pieces to recover a file, you will use
// 10 workers. But if you have an overdrive of '2', you will actually use 12
// workers, meaning you download 2 more pieces than you need. This means that up
// to two of the workers can be slow or fail and the download can still complete
// quickly. This complicates resource handling, because not all memory can be
// released as soon as a download completes - there may be overdrive workers
// still out fetching the file. To handle this, a catchall 'cleanUp' function is
// used which gets called every time a worker finishes, and every time recovery
// completes. The result is that memory gets cleaned up as required, and no
// overarching coordination is needed between the overdrive workers (who do not
// even know that they are overdrive workers) and the recovery function.
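//
// A minimal sketch of that cleanup pattern, assuming hypothetical bookkeeping
// fields on the chunk and a 'Return' method on the memoryManager (the real
// cleanup logic lives with the chunk download code, not in this file). The
// method is safe to call after every worker completion and after recovery,
// and releases memory exactly once:
//
//	func (udc *unfinishedDownloadChunk) cleanUp() {
//		// Wait until every worker that grabbed a piece has reported back,
//		// or until recovery has completed.
//		if udc.workersRemaining > 0 && !udc.recoveryComplete {
//			return
//		}
//		// Release this chunk's memory exactly once.
//		if !udc.memoryReleased {
//			udc.memoryReleased = true
//			udc.download.memoryManager.Return(udc.memoryAllocated)
//		}
//	}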

// By default, the download code organizes itself around having maximum
// possible throughput. That is, it is highly parallel, and exploits that
// parallelism as efficiently and effectively as possible. The hostdb does a
// good job of selecting for hosts that have good traits, so we can generally
// assume that every host or worker at our disposal is reasonably effective in
// all dimensions, and that the overall selection is generally geared towards
// the user's preferences.
//
// We can leverage the standby workers in each unfinishedDownloadChunk to
// emphasize various traits. For example, if we want to prioritize latency,
// we'll put a filter in the 'managedProcessDownloadChunk' function that has a
// worker go on standby instead of accepting a chunk if the latency is higher
// than the targeted latency. These filters can target other traits as well,
// such as price and total throughput.

// TODO: One of the most requested features from users is to improve the
// latency of the system. The lowest-hanging fruit actually isn't here; right
// now the hostdb doesn't discriminate based on latency at all, and simply
// adding some sort of latency scoring will probably be the biggest thing that
// we can do to improve overall file latency.
//
// After we do that, the second most important thing that we can do is enable
// partial downloads. It's hard to have low latency when you need to download a
// full 40 MiB before getting any data back at all. If we can leverage partial
// downloads to drop that to something like 256kb, we'll get much better overall
// latency for small files and for starting video streams.
//
// After both of those, we can leverage worker latency discrimination. We can
// add code to 'managedProcessDownloadChunk' to put a worker on standby
// initially instead of having it grab a piece if the latency of the worker is
// higher than that of the faster workers. This will prevent the slow workers
// from bottlenecking a chunk that we are trying to download quickly, though it
// will harm overall system throughput because it means that the slower workers
// will idle some of the time.
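//
// A rough sketch of what such a filter inside 'managedProcessDownloadChunk'
// could look like (the worker's latency field and the standby helper used here
// are assumptions for illustration, not definitions from this file):
//
//	// This worker's measured latency is above the chunk's target, so don't
//	// grab a piece yet. Stay on standby in case the faster workers fail
//	// and the piece is still needed.
//	if w.latency > udc.staticLatencyTarget {
//		udc.markWorkerStandby(w)
//		return
//	}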

// TODO: Currently the number of overdrive workers is set to '2' for the first 2
// chunks of any user-initiated download. But really, this should be a parameter
// of downloading that gets set by the user through the API on a per-file basis
// instead of being set by default.

// TODO: I tried to write the code such that the transition to true partial
// downloads would be as seamless as possible, but there's a lot of work that
// still needs to be done to make that fully possible. The most disruptive thing
// probably is the place where we call 'Sector' in worker.managedDownload.
// That's going to need to be changed to a partial sector. This is probably
// going to result in downloading that's 64-byte aligned instead of perfectly
// byte-aligned. Further, the encryption and erasure coding may also have
// alignment requirements which interfere with how the call to Sector can work.
// So you need to make sure that in 'managedDownload' you download at least
// enough data to fit the alignment requirements of all 3 steps (download from
// host, encryption, erasure coding). After the logical data has been recovered,
// we slice it down to whatever is meant to be written to the underlying
// downloadWriter; that code is going to need to be adjusted as well to slice
// things in the right way.
//
// Overall I don't think it's going to be all that difficult, but it's not
// nearly as clean-cut as some of the other potential extensions that we can do.
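//
// One way to reason about those alignment requirements is to fold them into a
// single combined alignment and widen the requested range to fit it. A
// hypothetical helper (the name and the single-alignment assumption are
// illustrative only):
//
//	// alignRange widens [offset, offset+length) so that both ends land on
//	// 'align'-byte boundaries, e.g. the 64-byte segments used by a partial
//	// Sector call.
//	func alignRange(offset, length, align uint64) (alignedOffset, alignedLength uint64) {
//		start := offset - offset%align // round the start down
//		end := offset + length         // exclusive end of the range
//		if end%align != 0 {
//			end += align - end%align // round the end up
//		}
//		return start, end - start
//	}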

// TODO: Right now the whole download will build and send off chunks even if
// there are not enough hosts to download the file, and even if there are not
// enough hosts to download a particular chunk. For the downloads and chunks
// which are doomed from the outset, we can skip some computation by checking
// and failing earlier. Another optimization we can make is to not count a
// worker for a chunk if the worker's contract does not appear in the chunk
// heap.

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
	"time"

	"github.com/NebulousLabs/Sia/modules"
	"github.com/NebulousLabs/Sia/persist"
	"github.com/NebulousLabs/Sia/types"

	"github.com/NebulousLabs/errors"
)

type (
	// A download is a file download that has been queued by the renter.
	download struct {
		// Data progress variables.
		atomicDataReceived         uint64 // Incremented as data completes, will stop at 100% file progress.
		atomicTotalDataTransferred uint64 // Incremented as data arrives, includes overdrive, contract negotiation, etc.

		// Other progress variables.
		chunksRemaining uint64        // Number of chunks whose downloads are incomplete.
		completeChan    chan struct{} // Closed once the download is complete.
		err             error         // Only set if there was an error which prevented the download from completing.

		// Timestamp information.
		endTime         time.Time // Set immediately before closing 'completeChan'.
		staticStartTime time.Time // Set immediately when the download object is created.

		// Basic information about the file.
		destination       downloadDestination
		destinationString string // The string reported to the user to indicate the download's destination.
		destinationType   string // "memory buffer", "http stream", "file", etc.
		staticLength      uint64 // Length to download starting from the offset.
		staticOffset      uint64 // Offset within the file to start the download.
		staticSiaPath     string // The path of the siafile at the time the download started.

		// Retrieval settings for the file.
		staticLatencyTarget time.Duration // In milliseconds. Lower latency results in lower total system throughput.
		staticOverdrive     int           // How many extra pieces to download to prevent slow hosts from being a bottleneck.
		staticPriority      uint64        // Downloads with higher priority will complete first.

		// Utilities.
		log           *persist.Logger // Same log as the renter.
		memoryManager *memoryManager  // Same memoryManager used across the renter.
		mu            sync.Mutex      // Unique to the download object.
	}

	// downloadParams is the set of parameters to use when downloading a file.
	downloadParams struct {
		destination       downloadDestination // The place to write the downloaded data.
		destinationType   string              // "file", "buffer", "http stream", etc.
		destinationString string              // The string to report to the user for the destination.
		file              *file               // The file to download.

		latencyTarget time.Duration // Workers above this latency will be automatically put on standby initially.
		length        uint64        // Length of download. Cannot be 0.
		needsMemory   bool          // Whether new memory needs to be allocated to perform the download.
		offset        uint64        // Offset within the file to start the download. Must be less than the total filesize.
		overdrive     int           // How many extra pieces to download to prevent slow hosts from being a bottleneck.
		priority      uint64        // Files with a higher priority will be downloaded first.
	}
)

// managedFail will mark the download as complete, but with the provided error.
// If the download has already failed, the error will be updated to be a
// concatenation of the previous error and the new error.
func (d *download) managedFail(err error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	// If the download is already complete, extend the error.
	complete := d.staticComplete()
	if complete && d.err != nil {
		return
	} else if complete && d.err == nil {
		d.log.Critical("download is marked as completed without error, but then managedFail was called with err:", err)
		return
	}

	// Mark the download as complete and set the error.
	d.err = err
	close(d.completeChan)
	err = d.destination.Close()
	if err != nil {
		d.log.Println("unable to close download destination:", err)
	}
}

// staticComplete is a helper function to indicate whether or not the download
// has completed.
func (d *download) staticComplete() bool {
	select {
	case <-d.completeChan:
		return true
	default:
		return false
	}
}

// Err returns the error encountered by a download, if it exists.
func (d *download) Err() (err error) {
	d.mu.Lock()
	err = d.err
	d.mu.Unlock()
	return err
}

// newDownload creates and initializes a download based on the provided
// parameters.
func (r *Renter) newDownload(params downloadParams) (*download, error) {
	// Input validation.
	if params.file == nil {
		return nil, errors.New("no file provided when requesting download")
	}
	if params.length <= 0 {
		return nil, errors.New("download length must be a positive whole number")
	}
	if params.offset < 0 {
		return nil, errors.New("download offset cannot be a negative number")
	}
	if params.offset+params.length > params.file.size {
		return nil, errors.New("download is requesting data past the boundary of the file")
	}

	// Create the download object.
	d := &download{
		completeChan: make(chan struct{}),

		staticStartTime: time.Now(),

		destination:         params.destination,
		destinationString:   params.destinationString,
		staticLatencyTarget: params.latencyTarget,
		staticLength:        params.length,
		staticOffset:        params.offset,
		staticOverdrive:     params.overdrive,
		staticSiaPath:       params.file.name,
		staticPriority:      params.priority,

		log:           r.log,
		memoryManager: r.memoryManager,
	}

	// Determine which chunks to download.
	minChunk := params.offset / params.file.staticChunkSize()
	maxChunk := (params.offset + params.length - 1) / params.file.staticChunkSize()

	// For each chunk, assemble a mapping from the contract id to the index of
	// the piece within the chunk that the contract is responsible for.
	chunkMaps := make([]map[types.FileContractID]downloadPieceInfo, maxChunk-minChunk+1)
	for i := range chunkMaps {
		chunkMaps[i] = make(map[types.FileContractID]downloadPieceInfo)
	}
	params.file.mu.Lock()
	for id, contract := range params.file.contracts {
		resolvedID := r.hostContractor.ResolveID(id)
		for _, piece := range contract.Pieces {
			if piece.Chunk >= minChunk && piece.Chunk <= maxChunk {
				// Sanity check - the same worker should not have two pieces
				// for the same chunk.
				_, exists := chunkMaps[piece.Chunk-minChunk][resolvedID]
				if exists {
					r.log.Println("ERROR: Worker has multiple pieces uploaded for the same chunk.")
				}
				chunkMaps[piece.Chunk-minChunk][resolvedID] = downloadPieceInfo{
					index: piece.Piece,
					root:  piece.MerkleRoot,
				}
			}
		}
	}
	params.file.mu.Unlock()

	// Queue the downloads for each chunk.
	writeOffset := int64(0) // where to write a chunk within the download destination.
	d.chunksRemaining += maxChunk - minChunk + 1
	for i := minChunk; i <= maxChunk; i++ {
		udc := &unfinishedDownloadChunk{
			destination: params.destination,
			erasureCode: params.file.erasureCode,
			masterKey:   params.file.masterKey,

			staticChunkIndex: i,
			staticChunkMap:   chunkMaps[i-minChunk],
			staticChunkSize:  params.file.staticChunkSize(),
			staticPieceSize:  params.file.pieceSize,

			// TODO: 25ms is just a guess for a good default. Really, we want
			// to set the latency target such that slower workers will pick up
			// the later chunks, but only if there's a very strong chance that
			// they'll finish before the earlier chunks finish, so that they do
			// not detract from the low latency.
			//
			// TODO: There is some sane minimum latency that should actually be
			// set based on the number of pieces 'n', and the 'n' fastest
			// workers that we have.
			staticLatencyTarget: params.latencyTarget + (25 * time.Duration(i-minChunk)), // Increase target by 25ms per chunk.
			staticNeedsMemory:   params.needsMemory,
			staticPriority:      params.priority,

			physicalChunkData: make([][]byte, params.file.erasureCode.NumPieces()),
			pieceUsage:        make([]bool, params.file.erasureCode.NumPieces()),

			download: d,
		}

		// Set the fetchOffset - the offset within the chunk that we start
		// downloading from.
		if i == minChunk {
			udc.staticFetchOffset = params.offset % params.file.staticChunkSize()
		} else {
			udc.staticFetchOffset = 0
		}
		// Set the fetchLength - the number of bytes to fetch within the chunk
		// that we start downloading from.
		if i == maxChunk && (params.length+params.offset)%params.file.staticChunkSize() != 0 {
			udc.staticFetchLength = ((params.length + params.offset) % params.file.staticChunkSize()) - udc.staticFetchOffset
		} else {
			udc.staticFetchLength = params.file.staticChunkSize() - udc.staticFetchOffset
		}
		// Set the writeOffset within the destination for where the data should
		// be written.
		udc.staticWriteOffset = writeOffset
		writeOffset += int64(udc.staticFetchLength)

		// TODO: Currently all chunks are given overdrive. This should probably
		// be changed once the hostdb knows how to measure host speed/latency
		// and once we can assign overdrive dynamically.
		udc.staticOverdrive = params.overdrive

		// Add this chunk to the chunk heap, and notify the download loop that
		// there is work to do.
		r.managedAddChunkToDownloadHeap(udc)
		select {
		case r.newDownloads <- struct{}{}:
		default:
		}
	}
	return d, nil
}
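
// As a concrete illustration of the chunk arithmetic above (the numbers are
// hypothetical): with a chunk size of 100 bytes, offset = 250, and
// length = 120, minChunk = 250/100 = 2 and maxChunk = (250+120-1)/100 = 3.
// Chunk 2 gets fetchOffset = 250%100 = 50 and fetchLength = 100-50 = 50;
// chunk 3 gets fetchOffset = 0 and fetchLength = (250+120)%100 = 70. The two
// fetch lengths sum to the requested 120 bytes, and chunk 3's data is written
// at writeOffset 50 within the destination.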

// Download performs a file download using the passed parameters.
func (r *Renter) Download(p modules.RenterDownloadParameters) error {
	// Look up the file associated with the nickname.
	lockID := r.mu.RLock()
	file, exists := r.files[p.SiaPath]
	r.mu.RUnlock(lockID)
	if !exists {
		return fmt.Errorf("no file with that path: %s", p.SiaPath)
	}

	// Validate download parameters.
	isHTTPResp := p.Httpwriter != nil
	if p.Async && isHTTPResp {
		return errors.New("cannot async download to http response")
	}
	if isHTTPResp && p.Destination != "" {
		return errors.New("destination cannot be specified when downloading to http response")
	}
	if !isHTTPResp && p.Destination == "" {
		return errors.New("destination not supplied")
	}
	if p.Destination != "" && !filepath.IsAbs(p.Destination) {
		return errors.New("destination must be an absolute path")
	}
	if p.Offset == file.size {
		return errors.New("offset equals filesize")
	}
	// Sentinel: if length == 0, download the entire file.
	if p.Length == 0 {
		p.Length = file.size - p.Offset
	}
	// Check whether the offset and length are valid.
	if p.Offset < 0 || p.Offset+p.Length > file.size {
		return fmt.Errorf("offset and length combination invalid, max byte is at index %d", file.size-1)
	}

	// Instantiate the correct downloadWriter implementation.
	var dw downloadDestination
	var destinationType string
	if isHTTPResp {
		dw = newDownloadDestinationWriteCloserFromWriter(p.Httpwriter)
		destinationType = "http stream"
	} else {
		osFile, err := os.OpenFile(p.Destination, os.O_CREATE|os.O_WRONLY, os.FileMode(file.mode))
		if err != nil {
			return err
		}
		dw = osFile
		destinationType = "file"
	}

	// Create the download object.
	d, err := r.newDownload(downloadParams{
		destination:       dw,
		destinationType:   destinationType,
		destinationString: p.Destination,
		file:              file,

		latencyTarget: 25e3 * time.Millisecond, // TODO: high default until full latency support is added.
		length:        p.Length,
		needsMemory:   true,
		offset:        p.Offset,
		overdrive:     3, // TODO: moderate default until full overdrive support is added.
		priority:      5, // TODO: moderate default until full priority support is added.
	})
	if err != nil {
		return err
	}

	// Add the download object to the download history.
	r.downloadHistoryMu.Lock()
	r.downloadHistory = append(r.downloadHistory, d)
	r.downloadHistoryMu.Unlock()

	// Block until the download has completed.
	select {
	case <-d.completeChan:
		return d.Err()
	case <-r.tg.StopChan():
		return errors.New("download interrupted by shutdown")
	}
}
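
// A minimal usage sketch from the caller's side (the siapath and destination
// below are made up): a blocking download of an entire file to disk.
//
//	err := r.Download(modules.RenterDownloadParameters{
//		SiaPath:     "myfolder/movie.mp4",
//		Destination: "/home/user/movie.mp4",
//		// Offset and Length are left at zero, so the whole file is fetched.
//	})
//	if err != nil {
//		// The download failed or was interrupted by renter shutdown.
//	}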

// DownloadHistory returns the list of downloads that have been performed. Will
// include downloads that have not yet completed. Downloads will be roughly,
// but not precisely, sorted according to start time.
//
// TODO: Currently the DownloadHistory only contains downloads from this
// session, does not contain downloads that were executed for the purposes of
// repairing, and has no way to clear the download history if it gets long or
// unwieldy. It's not entirely certain which of the missing features are
// actually desirable; please consult the core team and the app dev community
// before deciding what to implement.
func (r *Renter) DownloadHistory() []modules.DownloadInfo {
	r.downloadHistoryMu.Lock()
	defer r.downloadHistoryMu.Unlock()

	downloads := make([]modules.DownloadInfo, len(r.downloadHistory))
	for i := range r.downloadHistory {
		// Order from most recent to least recent.
		d := r.downloadHistory[len(r.downloadHistory)-i-1]
		d.mu.Lock() // Lock required for d.endTime only.
		downloads[i] = modules.DownloadInfo{
			Destination:     d.destinationString,
			DestinationType: d.destinationType,
			Length:          d.staticLength,
			Offset:          d.staticOffset,
			SiaPath:         d.staticSiaPath,

			Completed:            d.staticComplete(),
			EndTime:              d.endTime,
			Received:             atomic.LoadUint64(&d.atomicDataReceived),
			StartTime:            d.staticStartTime,
			TotalDataTransferred: atomic.LoadUint64(&d.atomicTotalDataTransferred),
		}
		// Release the download lock before calling d.Err(), which will acquire
		// the lock. The error needs to be checked separately because we need
		// to know if it's 'nil' before grabbing the error string.
		d.mu.Unlock()
		if d.Err() != nil {
			downloads[i].Error = d.Err().Error()
		} else {
			downloads[i].Error = ""
		}
	}
	return downloads
}
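
// A rough sketch of how a caller might derive progress from the history
// (illustrative only; Length is never zero for a queued download):
//
//	for _, di := range r.DownloadHistory() {
//		pct := 100 * float64(di.Received) / float64(di.Length)
//		fmt.Printf("%s: %.1f%% complete\n", di.SiaPath, pct)
//	}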