github.com/Synthesix/Sia@v1.3.3-0.20180413141344-f863baeed3ca/modules/renter/download.go

package renter

// The download code follows a hopefully clean/intuitive flow for getting very
// high, computationally efficient parallelism on downloads. When a download
// is requested, it gets split into its respective chunks (which are downloaded
// individually) and then put into the download heap. The primary purpose of the
// download heap is to keep downloads on standby until there is enough memory
// available to send the downloads off to the workers. The heap is sorted first
// by priority, and then by a few other criteria as well.
//
// Some downloads, in particular downloads issued by the repair code, have
// already had their memory allocated. These downloads get to skip the heap and
// go straight to the workers.
//
// When a download is distributed to workers, it is given to every single worker
// without checking whether that worker is appropriate for the download. Each
// worker has its own queue, which is bottlenecked by the fact that a worker
// can only process one item at a time. When the worker gets to a download
// request, it determines whether it is suited for downloading that particular
// file. The criteria it uses include whether or not it has a piece of that
// chunk, how many other workers are currently downloading pieces or have
// completed pieces for that chunk, and finally things like worker latency and
// worker price.
//
// If the worker chooses to download a piece, it will register itself with that
// piece, so that other workers know how many workers are downloading each
// piece. This keeps everything cleanly coordinated and prevents too many
// workers from downloading a given piece, while at the same time you don't need
// a giant messy coordinator tracking everything. If a worker chooses not to
// download a piece, it will add itself to the list of standby workers, so that
// in the event of a failure, the worker can be returned to and used again as a
// backup worker. The worker may also decide that it is not suitable at all (for
// example, if the worker has recently had some consecutive failures, or if the
// worker doesn't have access to a piece of that chunk), in which case it will
// mark itself as unavailable to the chunk.
//
// As workers complete, they will release memory and check on the overall state
// of the chunk. If some workers fail, they will enlist the standby workers to
// pick up the slack.
//
// When the final required piece finishes downloading, the worker who completed
// the final piece will spin up a separate thread to decrypt, decode, and write
// out the download. That thread will then clean up any remaining resources, and
// if this was the final unfinished chunk in the download, it'll mark the
// download as complete.
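
// To make the heap ordering described at the top of this comment concrete,
// here is a standalone, illustrative sketch of a priority-first ordering built
// on container/heap. It is not this package's actual heap; 'queuedChunk' and
// its fields are hypothetical simplifications, and the real heap may break
// ties on other criteria.
//
//	type queuedChunk struct {
//		priority  uint64    // higher priority pops first
//		startTime time.Time // earlier start pops first on ties
//	}
//
//	type chunkHeap []*queuedChunk
//
//	func (h chunkHeap) Len() int { return len(h) }
//	func (h chunkHeap) Less(i, j int) bool {
//		if h[i].priority != h[j].priority {
//			return h[i].priority > h[j].priority
//		}
//		return h[i].startTime.Before(h[j].startTime)
//	}
//	func (h chunkHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
//	func (h *chunkHeap) Push(x interface{}) { *h = append(*h, x.(*queuedChunk)) }
//	func (h *chunkHeap) Pop() interface{} {
//		old := *h
//		x := old[len(old)-1]
//		*h = old[:len(old)-1]
//		return x
//	}
//
// With this ordering, heap.Pop always yields the highest-priority chunk that
// is still waiting for memory.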

// The download process has a slightly complicating factor, which is overdrive
// workers. Traditionally, if you need 10 pieces to recover a file, you will use
// 10 workers. But if you have an overdrive of '2', you will actually use 12
// workers, meaning you download 2 more pieces than you need. This means that up
// to two of the workers can be slow or fail and the download can still complete
// quickly. This complicates resource handling, because not all memory can be
// released as soon as a download completes - there may be overdrive workers
// still out fetching the file. To handle this, a catchall 'cleanUp' function is
// used which gets called every time a worker finishes, and every time recovery
// completes. The result is that memory gets cleaned up as required, and no
// overarching coordination is needed between the overdrive workers (who do not
// even know that they are overdrive workers) and the recovery function.

// By default, the download code organizes itself around having maximum possible
// throughput. That is, it is highly parallel, and exploits that parallelism as
// efficiently and effectively as possible. The hostdb does a good job of
// selecting for hosts that have good traits, so we can generally assume that
// every host or worker at our disposal is reasonably effective in all
// dimensions, and that the overall selection is generally geared towards the
// user's preferences.
//
// We can leverage the standby workers in each unfinishedDownloadChunk to
// emphasize various traits. For example, if we want to prioritize latency,
// we'll put a filter in the 'managedProcessDownloadChunk' function that has a
// worker go on standby instead of accepting a chunk if its latency is higher
// than the targeted latency. These filters can target other traits as well,
// such as price and total throughput.

// TODO: One of the most requested features from users is to improve the
// latency of the system. The lowest-hanging fruit actually isn't here: right
// now the hostdb doesn't discriminate based on latency at all, and simply
// adding some sort of latency scoring there is probably the biggest single
// thing we can do to improve overall file latency.
//
// After we do that, the second most important thing that we can do is enable
// partial downloads. It's hard to achieve low latency when you need to
// download a full 40 MiB before getting any data back at all. If we can
// leverage partial downloads to drop that to something like 256 KiB, we'll get
// much better overall latency for small files and for starting video streams.
//
// After both of those, we can leverage worker latency discrimination. We can
// add code to 'managedProcessDownloadChunk' to put a worker on standby
// initially instead of having it grab a piece if the latency of the worker is
// higher than that of the fastest workers. This will prevent the slow workers
// from bottlenecking a chunk that we are trying to download quickly, though it
// will harm overall system throughput because it means that the slower workers
// will sit idle some of the time.

// TODO: Currently the number of overdrive workers is set to '2' for the first 2
// chunks of any user-initiated download. But really, this should be a parameter
// of downloading that gets set by the user through the API on a per-file basis
// instead of set by default.
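
// As a concrete illustration of the standby filter idea above, a check along
// these lines could run when a worker considers a chunk. This is a
// hypothetical sketch, not the current implementation, and the names are
// assumed:
//
//	// latencyFilter reports whether a worker should go on standby for a
//	// chunk rather than grabbing a piece immediately.
//	func latencyFilter(workerLatency, targetLatency time.Duration) bool {
//		// Workers slower than the chunk's latency target are held in
//		// reserve; they are only activated if faster workers fail.
//		return workerLatency > targetLatency
//	}
//
// A worker filtered this way is not discarded - it stays registered as a
// standby worker so the chunk can still fall back to it on failure.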

// TODO: I tried to write the code such that the transition to true partial
// downloads would be as seamless as possible, but there's a lot of work that
// still needs to be done to make that fully possible. The most disruptive thing
// is probably the place where we call 'Sector' in worker.managedDownload.
// That's going to need to be changed to a partial sector. This is probably
// going to result in downloading that's 64-byte aligned instead of perfectly
// byte-aligned. Further, the encryption and erasure coding may also have
// alignment requirements which interfere with how the call to Sector can work.
// So you need to make sure that in 'managedDownload' you download at least
// enough data to fit the alignment requirements of all 3 steps (download from
// host, encryption, erasure coding). After the logical data has been recovered,
// we slice it down to whatever is meant to be written to the underlying
// downloadWriter; that code is going to need to be adjusted as well to slice
// things in the right way.
//
// Overall I don't think it's going to be all that difficult, but it's not
// nearly as clean-cut as some of the other potential extensions that we can do.

// TODO: Right now the whole download will build and send off chunks even if
// there are not enough hosts to download the file, and even if there are not
// enough hosts to download a particular chunk. For the downloads and chunks
// which are doomed from the outset, we can skip some computation by checking
// and failing earlier. Another optimization we can make is to not count a
// worker for a chunk if the worker's contract does not appear in the chunk
// heap.
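
// To make the alignment requirement from the partial-download TODO above
// concrete, a partial fetch would need to widen the requested byte range to
// the coarsest alignment of the three steps. This is an illustrative sketch
// with assumed names, not code from this package:
//
//	// alignedRange widens [offset, offset+length) so that both edges land
//	// on 'align'-byte boundaries (e.g. align = 64 for partial sectors).
//	func alignedRange(offset, length, align uint64) (start, size uint64) {
//		start = offset - offset%align // round the start down
//		end := offset + length
//		if rem := end % align; rem != 0 {
//			end += align - rem // round the end up
//		}
//		return start, end - start
//	}
//
// For example, alignedRange(100, 50, 64) fetches [64, 192), and the recovered
// data is then sliced back down to the original [100, 150) before being
// written to the downloadWriter.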

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
	"time"

	"github.com/Synthesix/Sia/modules"
	"github.com/Synthesix/Sia/persist"
	"github.com/Synthesix/Sia/types"

	"github.com/NebulousLabs/errors"
)

type (
	// A download is a file download that has been queued by the renter.
	download struct {
		// Data progress variables.
		atomicDataReceived         uint64 // Incremented as data completes, will stop at 100% file progress.
		atomicTotalDataTransferred uint64 // Incremented as data arrives, includes overdrive, contract negotiation, etc.

		// Other progress variables.
		chunksRemaining uint64        // Number of chunks whose downloads are incomplete.
		completeChan    chan struct{} // Closed once the download is complete.
		err             error         // Only set if there was an error which prevented the download from completing.

		// Timestamp information.
		endTime         time.Time // Set immediately before closing 'completeChan'.
		staticStartTime time.Time // Set immediately when the download object is created.

		// Basic information about the file.
		destination           downloadDestination
		destinationString     string // The string reported to the user to indicate the download's destination.
		staticDestinationType string // "memory buffer", "http stream", "file", etc.
		staticLength          uint64 // Length to download starting from the offset.
		staticOffset          uint64 // Offset within the file to start the download.
		staticSiaPath         string // The path of the siafile at the time the download started.

		// Retrieval settings for the file.
		staticLatencyTarget time.Duration // Lower latency results in lower total system throughput.
		staticOverdrive     int           // How many extra pieces to download to prevent slow hosts from being a bottleneck.
		staticPriority      uint64        // Downloads with higher priority will complete first.

		// Utilities.
		log           *persist.Logger // Same log as the renter.
		memoryManager *memoryManager  // Same memoryManager used across the renter.
		mu            sync.Mutex      // Unique to the download object.
	}

	// downloadParams is the set of parameters to use when downloading a file.
	downloadParams struct {
		destination       downloadDestination // The place to write the downloaded data.
		destinationType   string              // "file", "buffer", "http stream", etc.
		destinationString string              // The string to report to the user for the destination.
		file              *file               // The file to download.

		latencyTarget time.Duration // Workers above this latency will be automatically put on standby initially.
		length        uint64        // Length of download. Cannot be 0.
		needsMemory   bool          // Whether new memory needs to be allocated to perform the download.
		offset        uint64        // Offset within the file to start the download. Must be less than the total filesize.
		overdrive     int           // How many extra pieces to download to prevent slow hosts from being a bottleneck.
		priority      uint64        // Files with a higher priority will be downloaded first.
	}
)

// managedFail will mark the download as complete, but with the provided error.
// If the download has already failed, the new error is ignored; if the
// download completed without error, a critical is logged.
func (d *download) managedFail(err error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	// If the download is already complete, do not extend the error.
	complete := d.staticComplete()
	if complete && d.err != nil {
		return
	} else if complete && d.err == nil {
		d.log.Critical("download is marked as completed without error, but then managedFail was called with err:", err)
		return
	}

	// Mark the download as complete and set the error.
	d.err = err
	close(d.completeChan)
	err = d.destination.Close()
	if err != nil {
		d.log.Println("unable to close download destination:", err)
	}
}

// staticComplete is a helper function to indicate whether or not the download
// has completed.
func (d *download) staticComplete() bool {
	select {
	case <-d.completeChan:
		return true
	default:
		return false
	}
}

// Err returns the error encountered by a download, if it exists.
func (d *download) Err() (err error) {
	d.mu.Lock()
	err = d.err
	d.mu.Unlock()
	return err
}
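
// The completion signalling above follows a common Go pattern: completion is
// broadcast by closing a channel exactly once, and checked without blocking
// via a select with a default case. A standalone sketch of the pattern, with
// illustrative names rather than this package's:
//
//	type task struct {
//		done chan struct{} // created with make(chan struct{}), closed exactly once
//	}
//
//	func (t *task) finish() { close(t.done) } // must only ever be called once
//	func (t *task) isDone() bool {
//		select {
//		case <-t.done:
//			return true
//		default:
//			return false
//		}
//	}
//
// Closing the channel wakes every waiter at once, which is why Download can
// simply block on '<-d.completeChan'.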

// newDownload creates and initializes a download based on the provided
// parameters.
func (r *Renter) newDownload(params downloadParams) (*download, error) {
	// Input validation.
	if params.file == nil {
		return nil, errors.New("no file provided when requesting download")
	}
	if params.length <= 0 {
		return nil, errors.New("download length must be a positive whole number")
	}
	if params.offset < 0 {
		return nil, errors.New("download offset cannot be a negative number")
	}
	if params.offset+params.length > params.file.size {
		return nil, errors.New("download is requesting data past the boundary of the file")
	}

	// Create the download object.
	d := &download{
		completeChan: make(chan struct{}),

		staticStartTime: time.Now(),

		destination:           params.destination,
		destinationString:     params.destinationString,
		staticDestinationType: params.destinationType,
		staticLatencyTarget:   params.latencyTarget,
		staticLength:          params.length,
		staticOffset:          params.offset,
		staticOverdrive:       params.overdrive,
		staticSiaPath:         params.file.name,
		staticPriority:        params.priority,

		log:           r.log,
		memoryManager: r.memoryManager,
	}

	// Determine which chunks to download.
	minChunk := params.offset / params.file.staticChunkSize()
	maxChunk := (params.offset + params.length - 1) / params.file.staticChunkSize()

	// For each chunk, assemble a mapping from the contract id to the index of
	// the piece within the chunk that the contract is responsible for.
	chunkMaps := make([]map[types.FileContractID]downloadPieceInfo, maxChunk-minChunk+1)
	for i := range chunkMaps {
		chunkMaps[i] = make(map[types.FileContractID]downloadPieceInfo)
	}
	params.file.mu.Lock()
	for id, contract := range params.file.contracts {
		resolvedID := r.hostContractor.ResolveID(id)
		for _, piece := range contract.Pieces {
			if piece.Chunk >= minChunk && piece.Chunk <= maxChunk {
				// Sanity check - the same worker should not have two pieces
				// for the same chunk.
				_, exists := chunkMaps[piece.Chunk-minChunk][resolvedID]
				if exists {
					r.log.Println("ERROR: Worker has multiple pieces uploaded for the same chunk.")
				}
				chunkMaps[piece.Chunk-minChunk][resolvedID] = downloadPieceInfo{
					index: piece.Piece,
					root:  piece.MerkleRoot,
				}
			}
		}
	}
	params.file.mu.Unlock()

	// Queue the downloads for each chunk.
	writeOffset := int64(0) // where to write a chunk within the download destination.
	d.chunksRemaining += maxChunk - minChunk + 1
	for i := minChunk; i <= maxChunk; i++ {
		udc := &unfinishedDownloadChunk{
			destination: params.destination,
			erasureCode: params.file.erasureCode,
			masterKey:   params.file.masterKey,

			staticChunkIndex: i,
			staticCacheID:    fmt.Sprintf("%v:%v", d.staticSiaPath, i),
			staticChunkMap:   chunkMaps[i-minChunk],
			staticChunkSize:  params.file.staticChunkSize(),
			staticPieceSize:  params.file.pieceSize,

			// TODO: 25ms is just a guess for a good default. Really, we want
			// to set the latency target such that slower workers will pick up
			// the later chunks, but only if there's a very strong chance that
			// they'll finish before the earlier chunks finish, so that they
			// do not hurt the download's overall latency.
			//
			// TODO: There is some sane minimum latency that should actually
			// be set based on the number of pieces 'n', and the 'n' fastest
			// workers that we have.
			staticLatencyTarget: params.latencyTarget + (25 * time.Millisecond * time.Duration(i-minChunk)), // Increase target by 25ms per chunk.
			staticNeedsMemory:   params.needsMemory,
			staticPriority:      params.priority,

			physicalChunkData: make([][]byte, params.file.erasureCode.NumPieces()),
			pieceUsage:        make([]bool, params.file.erasureCode.NumPieces()),

			download:   d,
			chunkCache: r.chunkCache,
			cacheMu:    r.cmu,
		}

		// Set the fetchOffset - the offset within the chunk that we start
		// downloading from.
		if i == minChunk {
			udc.staticFetchOffset = params.offset % params.file.staticChunkSize()
		} else {
			udc.staticFetchOffset = 0
		}
		// Set the fetchLength - the number of bytes to fetch within the chunk
		// that we start downloading from.
		if i == maxChunk && (params.length+params.offset)%params.file.staticChunkSize() != 0 {
			udc.staticFetchLength = ((params.length + params.offset) % params.file.staticChunkSize()) - udc.staticFetchOffset
		} else {
			udc.staticFetchLength = params.file.staticChunkSize() - udc.staticFetchOffset
		}
		// Set the writeOffset within the destination for where the data
		// should be written.
		udc.staticWriteOffset = writeOffset
		writeOffset += int64(udc.staticFetchLength)

		// TODO: Currently all chunks are given overdrive. This should
		// probably be changed once the hostdb knows how to measure host
		// speed/latency and once we can assign overdrive dynamically.
		udc.staticOverdrive = params.overdrive

		// Add this chunk to the chunk heap, and notify the download loop that
		// there is work to do.
		r.managedAddChunkToDownloadHeap(udc)
		select {
		case r.newDownloads <- struct{}{}:
		default:
		}
	}
	return d, nil
}
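
// To illustrate the chunk arithmetic above with concrete (hypothetical)
// numbers: with a chunk size of 100 bytes, offset 250, and length 130, we get
// minChunk = 250/100 = 2 and maxChunk = (250+130-1)/100 = 3. Chunk 2 gets
// fetchOffset 250%100 = 50 and fetchLength 100-50 = 50; chunk 3 gets
// fetchOffset 0 and, since (250+130)%100 = 80 != 0, fetchLength 80-0 = 80.
// 50+80 = 130 bytes total, as requested. A standalone sketch of the same
// arithmetic:
//
//	func fetchRange(i, minChunk, maxChunk, offset, length, chunkSize uint64) (fetchOffset, fetchLength uint64) {
//		if i == minChunk {
//			fetchOffset = offset % chunkSize // partial first chunk
//		}
//		if i == maxChunk && (offset+length)%chunkSize != 0 {
//			fetchLength = (offset+length)%chunkSize - fetchOffset // partial last chunk
//		} else {
//			fetchLength = chunkSize - fetchOffset
//		}
//		return fetchOffset, fetchLength
//	}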

// Download performs a file download using the passed parameters.
func (r *Renter) Download(p modules.RenterDownloadParameters) error {
	// Look up the file associated with the nickname.
	lockID := r.mu.RLock()
	file, exists := r.files[p.SiaPath]
	r.mu.RUnlock(lockID)
	if !exists {
		return fmt.Errorf("no file with that path: %s", p.SiaPath)
	}

	// Validate download parameters.
	isHTTPResp := p.Httpwriter != nil
	if p.Async && isHTTPResp {
		return errors.New("cannot async download to http response")
	}
	if isHTTPResp && p.Destination != "" {
		return errors.New("destination cannot be specified when downloading to http response")
	}
	if !isHTTPResp && p.Destination == "" {
		return errors.New("destination not supplied")
	}
	if p.Destination != "" && !filepath.IsAbs(p.Destination) {
		return errors.New("destination must be an absolute path")
	}
	if p.Offset == file.size {
		return errors.New("offset equals filesize")
	}
	// Sentinel: if length == 0, download the entire file.
	if p.Length == 0 {
		p.Length = file.size - p.Offset
	}
	// Check whether the offset and length are valid.
	if p.Offset < 0 || p.Offset+p.Length > file.size {
		return fmt.Errorf("offset and length combination invalid, max byte is at index %d", file.size-1)
	}

	// Instantiate the correct downloadWriter implementation.
	var dw downloadDestination
	var destinationType string
	if isHTTPResp {
		dw = newDownloadDestinationWriteCloserFromWriter(p.Httpwriter)
		destinationType = "http stream"
	} else {
		osFile, err := os.OpenFile(p.Destination, os.O_CREATE|os.O_WRONLY, os.FileMode(file.mode))
		if err != nil {
			return err
		}
		dw = osFile
		destinationType = "file"
	}

	// Create the download object.
	d, err := r.newDownload(downloadParams{
		destination:       dw,
		destinationType:   destinationType,
		destinationString: p.Destination,
		file:              file,

		latencyTarget: 25e3 * time.Millisecond, // TODO: high default until full latency support is added.
		length:        p.Length,
		needsMemory:   true,
		offset:        p.Offset,
		overdrive:     3, // TODO: moderate default until full overdrive support is added.
		priority:      5, // TODO: moderate default until full priority support is added.
	})
	if err != nil {
		return err
	}

	// Add the download object to the download queue.
	r.downloadHistoryMu.Lock()
	r.downloadHistory = append(r.downloadHistory, d)
	r.downloadHistoryMu.Unlock()

	// Block until the download has completed.
	select {
	case <-d.completeChan:
		return d.Err()
	case <-r.tg.StopChan():
		return errors.New("download interrupted by shutdown")
	}
}
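
// A minimal usage sketch for the method above, assuming a *Renter 'r' and a
// file stored at the siapath "movies/demo.mp4" (both hypothetical). A Length
// of 0 is the sentinel for "download the whole file", and the destination
// must be an absolute path:
//
//	err := r.Download(modules.RenterDownloadParameters{
//		SiaPath:     "movies/demo.mp4",
//		Destination: "/tmp/demo.mp4",
//		Offset:      0,
//		Length:      0, // 0 means the entire file
//	})
//	if err != nil {
//		// handle the failure (bad parameters, shutdown, or download error)
//	}
//
// Note that the call blocks until the download completes or the renter shuts
// down.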

// DownloadHistory returns the list of downloads that have been performed. It
// will include downloads that have not yet completed. Downloads will be
// roughly, but not precisely, sorted according to start time.
//
// TODO: Currently the DownloadHistory only contains downloads from this
// session, does not contain downloads that were executed for the purposes of
// repairing, and has no way to clear the download history if it gets long or
// unwieldy. It's not entirely certain which of the missing features are
// actually desirable; please consult the core team + app dev community before
// deciding what to implement.
func (r *Renter) DownloadHistory() []modules.DownloadInfo {
	r.downloadHistoryMu.Lock()
	defer r.downloadHistoryMu.Unlock()

	downloads := make([]modules.DownloadInfo, len(r.downloadHistory))
	for i := range r.downloadHistory {
		// Order from most recent to least recent.
		d := r.downloadHistory[len(r.downloadHistory)-i-1]
		d.mu.Lock() // Lock required for d.endTime only.
		downloads[i] = modules.DownloadInfo{
			Destination:     d.destinationString,
			DestinationType: d.staticDestinationType,
			Length:          d.staticLength,
			Offset:          d.staticOffset,
			SiaPath:         d.staticSiaPath,

			Completed:            d.staticComplete(),
			EndTime:              d.endTime,
			Received:             atomic.LoadUint64(&d.atomicDataReceived),
			StartTime:            d.staticStartTime,
			TotalDataTransferred: atomic.LoadUint64(&d.atomicTotalDataTransferred),
		}
		// Release the download lock before calling d.Err(), which will
		// acquire the lock. The error needs to be checked separately because
		// we need to know if it's 'nil' before grabbing the error string.
		d.mu.Unlock()
		if d.Err() != nil {
			downloads[i].Error = d.Err().Error()
		} else {
			downloads[i].Error = ""
		}
	}
	return downloads
}
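
// The progress counters read above use atomic.LoadUint64 rather than holding
// d.mu, so writers elsewhere must use the matching atomic stores. A
// standalone sketch of the pattern, with illustrative names:
//
//	type progress struct {
//		atomicReceived uint64 // accessed only via sync/atomic
//	}
//
//	func (p *progress) add(n uint64) { atomic.AddUint64(&p.atomicReceived, n) }
//	func (p *progress) read() uint64 { return atomic.LoadUint64(&p.atomicReceived) }
//
// This lets frequent progress updates avoid contending on the mutex that
// guards the rest of the download state.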