github.com/Synthesix/Sia@v1.3.3-0.20180413141344-f863baeed3ca/modules/renter/download.go

package renter

// The download code follows a hopefully clean/intuitive flow for getting very
// high, computationally efficient parallelism on downloads. When a download
// is requested, it gets split into its respective chunks (which are downloaded
// individually) and then put into the download heap. The primary purpose of the
// download heap is to keep downloads on standby until there is enough memory
// available to send the downloads off to the workers. The heap is sorted first
// by priority, and then by a few other criteria as well.
//
// Some downloads, in particular downloads issued by the repair code, have
// already had their memory allocated. These downloads get to skip the heap and
// go straight to the workers.
//
// When a download is distributed to workers, it is given to every single worker
// without checking whether that worker is appropriate for the download. Each
// worker has its own queue, which is bottlenecked by the fact that a worker
// can only process one item at a time. When the worker gets to a download
// request, it determines whether it is suited for downloading that particular
// file. The criteria it uses include whether or not it has a piece of that
// chunk, how many other workers are currently downloading pieces or have
// completed pieces for that chunk, and finally things like worker latency and
// worker price.
//
// If the worker chooses to download a piece, it will register itself with that
// piece, so that other workers know how many workers are downloading each
// piece. This keeps everything cleanly coordinated and prevents too many
// workers from downloading a given piece, while at the same time you don't need
// a giant messy coordinator tracking everything. If a worker chooses not to
// download a piece, it will add itself to the list of standby workers, so that
// in the event of a failure, the worker can be returned to and used again as a
// backup worker. The worker may also decide that it is not suitable at all (for
// example, if the worker has recently had some consecutive failures, or if the
// worker doesn't have access to a piece of that chunk), in which case it will
// mark itself as unavailable to the chunk.
//
// As workers complete, they will release memory and check on the overall state
// of the chunk. If some workers fail, they will enlist the standby workers to
// pick up the slack.
//
// When the final required piece finishes downloading, the worker who completed
// the final piece will spin up a separate thread to decrypt, decode, and write
// out the download. That thread will then clean up any remaining resources, and
// if this was the final unfinished chunk in the download, it'll mark the
// download as complete.
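
// To make the heap ordering described at the top of this comment concrete,
// here is a standalone, illustrative sketch of a priority-first ordering built
// on container/heap. It is not this package's actual heap; 'queuedChunk' and
// its fields are hypothetical simplifications, and the real heap may break
// ties on other criteria.
//
//	type queuedChunk struct {
//		priority  uint64    // higher priority pops first
//		startTime time.Time // earlier start pops first on ties
//	}
//
//	type chunkHeap []*queuedChunk
//
//	func (h chunkHeap) Len() int { return len(h) }
//	func (h chunkHeap) Less(i, j int) bool {
//		if h[i].priority != h[j].priority {
//			return h[i].priority > h[j].priority
//		}
//		return h[i].startTime.Before(h[j].startTime)
//	}
//	func (h chunkHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
//	func (h *chunkHeap) Push(x interface{}) { *h = append(*h, x.(*queuedChunk)) }
//	func (h *chunkHeap) Pop() interface{} {
//		old := *h
//		x := old[len(old)-1]
//		*h = old[:len(old)-1]
//		return x
//	}
//
// With this ordering, heap.Pop always yields the highest-priority chunk that
// is still waiting for memory.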

// The download process has a slightly complicating factor, which is overdrive
// workers. Traditionally, if you need 10 pieces to recover a file, you will use
// 10 workers. But if you have an overdrive of '2', you will actually use 12
// workers, meaning you download 2 more pieces than you need. This means that up
// to two of the workers can be slow or fail and the download can still complete
// quickly. This complicates resource handling, because not all memory can be
// released as soon as a download completes - there may be overdrive workers
// still out fetching the file. To handle this, a catchall 'cleanUp' function is
// used which gets called every time a worker finishes, and every time recovery
// completes. The result is that memory gets cleaned up as required, and no
// overarching coordination is needed between the overdrive workers (who do not
// even know that they are overdrive workers) and the recovery function.

// By default, the download code organizes itself around having maximum possible
// throughput. That is, it is highly parallel, and exploits that parallelism as
// efficiently and effectively as possible. The hostdb does a good job of
// selecting for hosts that have good traits, so we can generally assume that
// every host or worker at our disposal is reasonably effective in all
// dimensions, and that the overall selection is generally geared towards the
// user's preferences.
//
// We can leverage the standby workers in each unfinishedDownloadChunk to
// emphasize various traits. For example, if we want to prioritize latency,
// we'll put a filter in the 'managedProcessDownloadChunk' function that has a
// worker go on standby instead of accepting a chunk if its latency is higher
// than the targeted latency. These filters can target other traits as well,
// such as price and total throughput.

// TODO: One of the most requested features from users is to improve the
// latency of the system. The lowest-hanging fruit actually isn't here: right
// now the hostdb doesn't discriminate based on latency at all, and simply
// adding some sort of latency scoring there is probably the biggest single
// thing we can do to improve overall file latency.
//
// After we do that, the second most important thing that we can do is enable
// partial downloads. It's hard to achieve low latency when you need to
// download a full 40 MiB before getting any data back at all. If we can
// leverage partial downloads to drop that to something like 256 KiB, we'll get
// much better overall latency for small files and for starting video streams.
//
// After both of those, we can leverage worker latency discrimination. We can
// add code to 'managedProcessDownloadChunk' to put a worker on standby
// initially instead of having it grab a piece if the latency of the worker is
// higher than that of the fastest workers. This will prevent the slow workers
// from bottlenecking a chunk that we are trying to download quickly, though it
// will harm overall system throughput because it means that the slower workers
// will sit idle some of the time.

// TODO: Currently the number of overdrive workers is set to '2' for the first 2
// chunks of any user-initiated download. But really, this should be a parameter
// of downloading that gets set by the user through the API on a per-file basis
// instead of set by default.
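
// As a concrete illustration of the standby filter idea above, a check along
// these lines could run when a worker considers a chunk. This is a
// hypothetical sketch, not the current implementation, and the names are
// assumed:
//
//	// latencyFilter reports whether a worker should go on standby for a
//	// chunk rather than grabbing a piece immediately.
//	func latencyFilter(workerLatency, targetLatency time.Duration) bool {
//		// Workers slower than the chunk's latency target are held in
//		// reserve; they are only activated if faster workers fail.
//		return workerLatency > targetLatency
//	}
//
// A worker filtered this way is not discarded - it stays registered as a
// standby worker so the chunk can still fall back to it on failure.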

// TODO: I tried to write the code such that the transition to true partial
// downloads would be as seamless as possible, but there's a lot of work that
// still needs to be done to make that fully possible. The most disruptive thing
// is probably the place where we call 'Sector' in worker.managedDownload.
// That's going to need to be changed to a partial sector. This is probably
// going to result in downloading that's 64-byte aligned instead of perfectly
// byte-aligned. Further, the encryption and erasure coding may also have
// alignment requirements which interfere with how the call to Sector can work.
// So you need to make sure that in 'managedDownload' you download at least
// enough data to fit the alignment requirements of all 3 steps (download from
// host, encryption, erasure coding). After the logical data has been recovered,
// we slice it down to whatever is meant to be written to the underlying
// downloadWriter; that code is going to need to be adjusted as well to slice
// things in the right way.
//
// Overall I don't think it's going to be all that difficult, but it's not
// nearly as clean-cut as some of the other potential extensions that we can do.

// TODO: Right now the whole download will build and send off chunks even if
// there are not enough hosts to download the file, and even if there are not
// enough hosts to download a particular chunk. For the downloads and chunks
// which are doomed from the outset, we can skip some computation by checking
// and failing earlier. Another optimization we can make is to not count a
// worker for a chunk if the worker's contract does not appear in the chunk
// heap.
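
// To make the alignment requirement from the partial-download TODO above
// concrete, a partial fetch would need to widen the requested byte range to
// the coarsest alignment of the three steps. This is an illustrative sketch
// with assumed names, not code from this package:
//
//	// alignedRange widens [offset, offset+length) so that both edges land
//	// on 'align'-byte boundaries (e.g. align = 64 for partial sectors).
//	func alignedRange(offset, length, align uint64) (start, size uint64) {
//		start = offset - offset%align // round the start down
//		end := offset + length
//		if rem := end % align; rem != 0 {
//			end += align - rem // round the end up
//		}
//		return start, end - start
//	}
//
// For example, alignedRange(100, 50, 64) fetches [64, 192), and the recovered
// data is then sliced back down to the original [100, 150) before being
// written to the downloadWriter.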

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
	"time"

	"github.com/Synthesix/Sia/modules"
	"github.com/Synthesix/Sia/persist"
	"github.com/Synthesix/Sia/types"

	"github.com/NebulousLabs/errors"
)

type (
	// A download is a file download that has been queued by the renter.
	download struct {
		// Data progress variables.
		atomicDataReceived         uint64 // Incremented as data completes, will stop at 100% file progress.
		atomicTotalDataTransferred uint64 // Incremented as data arrives, includes overdrive, contract negotiation, etc.

		// Other progress variables.
		chunksRemaining uint64        // Number of chunks whose downloads are incomplete.
		completeChan    chan struct{} // Closed once the download is complete.
		err             error         // Only set if there was an error which prevented the download from completing.

		// Timestamp information.
		endTime         time.Time // Set immediately before closing 'completeChan'.
		staticStartTime time.Time // Set immediately when the download object is created.

		// Basic information about the file.
		destination           downloadDestination
		destinationString     string // The string reported to the user to indicate the download's destination.
		staticDestinationType string // "memory buffer", "http stream", "file", etc.
		staticLength          uint64 // Length to download starting from the offset.
		staticOffset          uint64 // Offset within the file to start the download.
		staticSiaPath         string // The path of the siafile at the time the download started.

		// Retrieval settings for the file.
		staticLatencyTarget time.Duration // Lower latency results in lower total system throughput.
		staticOverdrive     int           // How many extra pieces to download to prevent slow hosts from being a bottleneck.
		staticPriority      uint64        // Downloads with higher priority will complete first.

		// Utilities.
		log           *persist.Logger // Same log as the renter.
		memoryManager *memoryManager  // Same memoryManager used across the renter.
		mu            sync.Mutex      // Unique to the download object.
	}

	// downloadParams is the set of parameters to use when downloading a file.
	downloadParams struct {
		destination       downloadDestination // The place to write the downloaded data.
		destinationType   string              // "file", "buffer", "http stream", etc.
		destinationString string              // The string to report to the user for the destination.
		file              *file               // The file to download.

		latencyTarget time.Duration // Workers above this latency will be automatically put on standby initially.
		length        uint64        // Length of download. Cannot be 0.
		needsMemory   bool          // Whether new memory needs to be allocated to perform the download.
		offset        uint64        // Offset within the file to start the download. Must be less than the total filesize.
		overdrive     int           // How many extra pieces to download to prevent slow hosts from being a bottleneck.
		priority      uint64        // Files with a higher priority will be downloaded first.
	}
)

// managedFail will mark the download as complete, but with the provided error.
// If the download has already failed, the new error is ignored; if the
// download completed without error, a critical is logged.
func (d *download) managedFail(err error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	// If the download is already complete, do not extend the error.
	complete := d.staticComplete()
	if complete && d.err != nil {
		return
	} else if complete && d.err == nil {
		d.log.Critical("download is marked as completed without error, but then managedFail was called with err:", err)
		return
	}

	// Mark the download as complete and set the error.
	d.err = err
	close(d.completeChan)
	err = d.destination.Close()
	if err != nil {
		d.log.Println("unable to close download destination:", err)
	}
}

// staticComplete is a helper function to indicate whether or not the download
// has completed.
func (d *download) staticComplete() bool {
	select {
	case <-d.completeChan:
		return true
	default:
		return false
	}
}

// Err returns the error encountered by a download, if it exists.
func (d *download) Err() (err error) {
	d.mu.Lock()
	err = d.err
	d.mu.Unlock()
	return err
}
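
// The completion signalling above follows a common Go pattern: completion is
// broadcast by closing a channel exactly once, and checked without blocking
// via a select with a default case. A standalone sketch of the pattern, with
// illustrative names rather than this package's:
//
//	type task struct {
//		done chan struct{} // created with make(chan struct{}), closed exactly once
//	}
//
//	func (t *task) finish() { close(t.done) } // must only ever be called once
//	func (t *task) isDone() bool {
//		select {
//		case <-t.done:
//			return true
//		default:
//			return false
//		}
//	}
//
// Closing the channel wakes every waiter at once, which is why Download can
// simply block on '<-d.completeChan'.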

// newDownload creates and initializes a download based on the provided
// parameters.
func (r *Renter) newDownload(params downloadParams) (*download, error) {
	// Input validation.
	if params.file == nil {
		return nil, errors.New("no file provided when requesting download")
	}
	if params.length <= 0 {
		return nil, errors.New("download length must be a positive whole number")
	}
	if params.offset < 0 {
		return nil, errors.New("download offset cannot be a negative number")
	}
	if params.offset+params.length > params.file.size {
		return nil, errors.New("download is requesting data past the boundary of the file")
	}

	// Create the download object.
	d := &download{
		completeChan: make(chan struct{}),

		staticStartTime: time.Now(),

		destination:           params.destination,
		destinationString:     params.destinationString,
		staticDestinationType: params.destinationType,
		staticLatencyTarget:   params.latencyTarget,
		staticLength:          params.length,
		staticOffset:          params.offset,
		staticOverdrive:       params.overdrive,
		staticSiaPath:         params.file.name,
		staticPriority:        params.priority,

		log:           r.log,
		memoryManager: r.memoryManager,
	}

	// Determine which chunks to download.
	minChunk := params.offset / params.file.staticChunkSize()
	maxChunk := (params.offset + params.length - 1) / params.file.staticChunkSize()

	// For each chunk, assemble a mapping from the contract id to the index of
	// the piece within the chunk that the contract is responsible for.
	chunkMaps := make([]map[types.FileContractID]downloadPieceInfo, maxChunk-minChunk+1)
	for i := range chunkMaps {
		chunkMaps[i] = make(map[types.FileContractID]downloadPieceInfo)
	}
	params.file.mu.Lock()
	for id, contract := range params.file.contracts {
		resolvedID := r.hostContractor.ResolveID(id)
		for _, piece := range contract.Pieces {
			if piece.Chunk >= minChunk && piece.Chunk <= maxChunk {
				// Sanity check - the same worker should not have two pieces
				// for the same chunk.
				_, exists := chunkMaps[piece.Chunk-minChunk][resolvedID]
				if exists {
					r.log.Println("ERROR: Worker has multiple pieces uploaded for the same chunk.")
				}
				chunkMaps[piece.Chunk-minChunk][resolvedID] = downloadPieceInfo{
					index: piece.Piece,
					root:  piece.MerkleRoot,
				}
			}
		}
	}
	params.file.mu.Unlock()

	// Queue the downloads for each chunk.
	writeOffset := int64(0) // where to write a chunk within the download destination.
	d.chunksRemaining += maxChunk - minChunk + 1
	for i := minChunk; i <= maxChunk; i++ {
		udc := &unfinishedDownloadChunk{
			destination: params.destination,
			erasureCode: params.file.erasureCode,
			masterKey:   params.file.masterKey,

			staticChunkIndex: i,
			staticCacheID:    fmt.Sprintf("%v:%v", d.staticSiaPath, i),
			staticChunkMap:   chunkMaps[i-minChunk],
			staticChunkSize:  params.file.staticChunkSize(),
			staticPieceSize:  params.file.pieceSize,

			// TODO: 25ms is just a guess for a good default. Really, we want
			// to set the latency target such that slower workers will pick up
			// the later chunks, but only if there's a very strong chance that
			// they'll finish before the earlier chunks finish, so that they
			// do not hurt the download's overall latency.
			//
			// TODO: There is some sane minimum latency that should actually
			// be set based on the number of pieces 'n', and the 'n' fastest
			// workers that we have.
			staticLatencyTarget: params.latencyTarget + (25 * time.Millisecond * time.Duration(i-minChunk)), // Increase target by 25ms per chunk.
			staticNeedsMemory:   params.needsMemory,
			staticPriority:      params.priority,

			physicalChunkData: make([][]byte, params.file.erasureCode.NumPieces()),
			pieceUsage:        make([]bool, params.file.erasureCode.NumPieces()),

			download:   d,
			chunkCache: r.chunkCache,
			cacheMu:    r.cmu,
		}

		// Set the fetchOffset - the offset within the chunk that we start
		// downloading from.
		if i == minChunk {
			udc.staticFetchOffset = params.offset % params.file.staticChunkSize()
		} else {
			udc.staticFetchOffset = 0
		}
		// Set the fetchLength - the number of bytes to fetch within the chunk
		// that we start downloading from.
		if i == maxChunk && (params.length+params.offset)%params.file.staticChunkSize() != 0 {
			udc.staticFetchLength = ((params.length + params.offset) % params.file.staticChunkSize()) - udc.staticFetchOffset
		} else {
			udc.staticFetchLength = params.file.staticChunkSize() - udc.staticFetchOffset
		}
		// Set the writeOffset within the destination for where the data
		// should be written.
		udc.staticWriteOffset = writeOffset
		writeOffset += int64(udc.staticFetchLength)

		// TODO: Currently all chunks are given overdrive. This should
		// probably be changed once the hostdb knows how to measure host
		// speed/latency and once we can assign overdrive dynamically.
		udc.staticOverdrive = params.overdrive

		// Add this chunk to the chunk heap, and notify the download loop that
		// there is work to do.
		r.managedAddChunkToDownloadHeap(udc)
		select {
		case r.newDownloads <- struct{}{}:
		default:
		}
	}
	return d, nil
}
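
// To illustrate the chunk arithmetic above with concrete (hypothetical)
// numbers: with a chunk size of 100 bytes, offset 250, and length 130, we get
// minChunk = 250/100 = 2 and maxChunk = (250+130-1)/100 = 3. Chunk 2 gets
// fetchOffset 250%100 = 50 and fetchLength 100-50 = 50; chunk 3 gets
// fetchOffset 0 and, since (250+130)%100 = 80 != 0, fetchLength 80-0 = 80.
// 50+80 = 130 bytes total, as requested. A standalone sketch of the same
// arithmetic:
//
//	func fetchRange(i, minChunk, maxChunk, offset, length, chunkSize uint64) (fetchOffset, fetchLength uint64) {
//		if i == minChunk {
//			fetchOffset = offset % chunkSize // partial first chunk
//		}
//		if i == maxChunk && (offset+length)%chunkSize != 0 {
//			fetchLength = (offset+length)%chunkSize - fetchOffset // partial last chunk
//		} else {
//			fetchLength = chunkSize - fetchOffset
//		}
//		return fetchOffset, fetchLength
//	}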

// Download performs a file download using the passed parameters.
func (r *Renter) Download(p modules.RenterDownloadParameters) error {
	// Look up the file associated with the nickname.
	lockID := r.mu.RLock()
	file, exists := r.files[p.SiaPath]
	r.mu.RUnlock(lockID)
	if !exists {
		return fmt.Errorf("no file with that path: %s", p.SiaPath)
	}

	// Validate download parameters.
	isHTTPResp := p.Httpwriter != nil
	if p.Async && isHTTPResp {
		return errors.New("cannot async download to http response")
	}
	if isHTTPResp && p.Destination != "" {
		return errors.New("destination cannot be specified when downloading to http response")
	}
	if !isHTTPResp && p.Destination == "" {
		return errors.New("destination not supplied")
	}
	if p.Destination != "" && !filepath.IsAbs(p.Destination) {
		return errors.New("destination must be an absolute path")
	}
	if p.Offset == file.size {
		return errors.New("offset equals filesize")
	}
	// Sentinel: if length == 0, download the entire file.
	if p.Length == 0 {
		p.Length = file.size - p.Offset
	}
	// Check whether the offset and length are valid.
	if p.Offset < 0 || p.Offset+p.Length > file.size {
		return fmt.Errorf("offset and length combination invalid, max byte is at index %d", file.size-1)
	}

	// Instantiate the correct downloadWriter implementation.
	var dw downloadDestination
	var destinationType string
	if isHTTPResp {
		dw = newDownloadDestinationWriteCloserFromWriter(p.Httpwriter)
		destinationType = "http stream"
	} else {
		osFile, err := os.OpenFile(p.Destination, os.O_CREATE|os.O_WRONLY, os.FileMode(file.mode))
		if err != nil {
			return err
		}
		dw = osFile
		destinationType = "file"
	}

	// Create the download object.
	d, err := r.newDownload(downloadParams{
		destination:       dw,
		destinationType:   destinationType,
		destinationString: p.Destination,
		file:              file,

		latencyTarget: 25e3 * time.Millisecond, // TODO: high default until full latency support is added.
		length:        p.Length,
		needsMemory:   true,
		offset:        p.Offset,
		overdrive:     3, // TODO: moderate default until full overdrive support is added.
		priority:      5, // TODO: moderate default until full priority support is added.
	})
	if err != nil {
		return err
	}

	// Add the download object to the download queue.
	r.downloadHistoryMu.Lock()
	r.downloadHistory = append(r.downloadHistory, d)
	r.downloadHistoryMu.Unlock()

	// Block until the download has completed.
	select {
	case <-d.completeChan:
		return d.Err()
	case <-r.tg.StopChan():
		return errors.New("download interrupted by shutdown")
	}
}
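
// A minimal usage sketch for the method above, assuming a *Renter 'r' and a
// file stored at the siapath "movies/demo.mp4" (both hypothetical). A Length
// of 0 is the sentinel for "download the whole file", and the destination
// must be an absolute path:
//
//	err := r.Download(modules.RenterDownloadParameters{
//		SiaPath:     "movies/demo.mp4",
//		Destination: "/tmp/demo.mp4",
//		Offset:      0,
//		Length:      0, // 0 means the entire file
//	})
//	if err != nil {
//		// handle the failure (bad parameters, shutdown, or download error)
//	}
//
// Note that the call blocks until the download completes or the renter shuts
// down.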

// DownloadHistory returns the list of downloads that have been performed. It
// will include downloads that have not yet completed. Downloads will be
// roughly, but not precisely, sorted according to start time.
//
// TODO: Currently the DownloadHistory only contains downloads from this
// session, does not contain downloads that were executed for the purposes of
// repairing, and has no way to clear the download history if it gets long or
// unwieldy. It's not entirely certain which of the missing features are
// actually desirable; please consult the core team + app dev community before
// deciding what to implement.
func (r *Renter) DownloadHistory() []modules.DownloadInfo {
	r.downloadHistoryMu.Lock()
	defer r.downloadHistoryMu.Unlock()

	downloads := make([]modules.DownloadInfo, len(r.downloadHistory))
	for i := range r.downloadHistory {
		// Order from most recent to least recent.
		d := r.downloadHistory[len(r.downloadHistory)-i-1]
		d.mu.Lock() // Lock required for d.endTime only.
		downloads[i] = modules.DownloadInfo{
			Destination:     d.destinationString,
			DestinationType: d.staticDestinationType,
			Length:          d.staticLength,
			Offset:          d.staticOffset,
			SiaPath:         d.staticSiaPath,

			Completed:            d.staticComplete(),
			EndTime:              d.endTime,
			Received:             atomic.LoadUint64(&d.atomicDataReceived),
			StartTime:            d.staticStartTime,
			TotalDataTransferred: atomic.LoadUint64(&d.atomicTotalDataTransferred),
		}
		// Release the download lock before calling d.Err(), which will
		// acquire the lock. The error needs to be checked separately because
		// we need to know if it's 'nil' before grabbing the error string.
		d.mu.Unlock()
		if d.Err() != nil {
			downloads[i].Error = d.Err().Error()
		} else {
			downloads[i].Error = ""
		}
	}
	return downloads
}
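
// The progress counters read above use atomic.LoadUint64 rather than holding
// d.mu, so writers elsewhere must use the matching atomic stores. A
// standalone sketch of the pattern, with illustrative names:
//
//	type progress struct {
//		atomicReceived uint64 // accessed only via sync/atomic
//	}
//
//	func (p *progress) add(n uint64) { atomic.AddUint64(&p.atomicReceived, n) }
//	func (p *progress) read() uint64 { return atomic.LoadUint64(&p.atomicReceived) }
//
// This lets frequent progress updates avoid contending on the mutex that
// guards the rest of the download state.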