github.com/filecoin-project/bacalhau@v0.3.23-0.20230228154132-45c989550ace/ops/terraform/remote_files/scripts/http-domain-allowlist.txt

# This is the domain allowlist used for HTTP networking for "the Bacalhau team
# provided compute nodes" (as opposed to "all compute nodes on the Bacalhau
# network").
#
# This list is very much about *our* perception of risk, what things *we* are
# comfortable with, and ensuring that *our* nodes are not performing
# illegal/questionable behaviour (as opposed to trying to define an allowlist
# for all compute providers to use, which would be much harder).
#
# Why do we have network restrictions?
# ====================================
# Broadly, to stop certain behaviours on our nodes that we don't want to
# support.
#
# 1. Illegal behaviour on our nodes is our problem and we are liable for it (or
#    will have to put in effort to contest that we are liable). Example: using
#    a job to download copyrighted files from one place and upload them to
#    another.
# 2. Behaviour that our hosting provider(s) deem against their ToS might result
#    in the shutdown of our compute nodes.
# 3. Behaviour that circumvents the Bacalhau network operation (e.g. for paid
#    jobs, using network connections to publish results before they've been
#    paid for, and then denying payment).
# 4. Behaviour that degrades the ability of our nodes to serve legitimate
#    requests and/or encourages flood use of our nodes (e.g. for unpaid jobs,
#    using our nodes as bitcoin miners, constantly, in a loop, meaning that our
#    ability to serve legitimate volunteer/example requests is reduced).
# 5. Behaviour that allows a job to be compromised by a malicious actor to
#    achieve one of the above behaviours (e.g. for paid jobs, allowing a
#    compromised package to download and run a bitcoin miner, sucking up all of
#    the user's money).
#
# What use cases should we use networking to meet?
# ================================================
# Well, we have a sensible idea of what not to use it for:
#
# * As a summary of above: nothing illegal, nothing against Google ToS, nothing
#   to circumvent our network principles, nothing that allows repeated use of
#   the network to the point of degradation for other users.
# * Bacalhau already has data input and output using storage and publishers, so
#   we shouldn't use networking to meet any use cases that are already
#   possible using a suitably formatted job spec.
#
# And as a suggestion, we could start with the following list that we have
# observed people asking for:
#
# * Jobs where a tool expects a certain HTTP API for proper operation (e.g.
#   build tools, like go, cargo, gem, pip etc) and hence where bringing data in
#   via IPFS/URL download is not feasible/practical
# * Jobs whose sole role is to provide some [IPFS] consolidation of Internet
#   endpoints (e.g. a job that downloads data/scrapes web pages from a set of
#   domains, and archives the results on IPFS/Filecoin)
# * Jobs that enable our own use cases or those of our partners whom we have a
#   trusted relationship with (e.g. Project Frog, or people we give grants to)
#   and hence have more trust that the privilege will not be abused
#
# We should also treat jobs that will only make safe HTTP requests more
# leniently than those that do not (i.e. a job just downloading data is
# generally safer to approve than one that is POSTing results to different
# places). So domains that are predominantly read-only are normally fine,
# whereas those with writeable APIs need more care.
#
# How do I know what domains to approve?
# ======================================
# You need to assure yourself that approving access to the domain meets ALL of
# the requirements in the first list of the above section and ONE OF the
# requirements in the second list of the above section.
#
# Start by checking out the domain on the web: what is it used for? Does it have
# good documentation of what can be done on it? Is it mainly for read-only data
# access or does it also include writable endpoints? Is the organisation
# operating it easy to find?
#
# Are the operators of the domain likely to have a content policy and
# moderation? If so, it is unlikely that the domain is currently being used for
# nefarious purposes, and anything our user does to try to use it that way
# will be shut down/removed. Generally bigger players that display data
# publicly (e.g. GitHub) will have this.
#
# Remember that we should operate a "default deny" policy: if we can't be
# reasonably confident the domain access will be used appropriately, we just say
# no.
#
# We also need to think about what a domain "could" be used for outside of what
# the requestor is asking to use it for. E.g. if they are saying they only want
# to use it to download some static files, but access to the domain could also
# enable some bitcoin-mining workflow, we should probably be saying no to that
# request.
#
# (We may find that domain-based allow-listing is not enough, and we need to go
# to the next level: job-based allow-listing, e.g. you can only access these
# domains if you want to run certain jobs we have approved.)
#
# Who updates this file and how?
# ==============================
# Anyone who has access to update Bacalhau compute nodes in production also has
# the ability to approve or deny allowlist changes. They should think through
# the above rationale and come to a decision.
#
# Community members who want to use new domains can either make the request on
# Slack or submit a GitHub PR against the allowlist that includes the domains
# they want to use.

# example domains
example.com

# golang dependencies
proxy.golang.org
sum.golang.org
index.golang.org
storage.googleapis.com

# boinc.multi-pool.info/latinsquares BOINC project
78.26.93.125
boinc.berkeley.edu
boinc.multi-pool.info

# einsteinathome.org BOINC project
einsteinathome.org
scheduler.einsteinathome.org
einstein.phys.uwm.edu
einstein-dl.syr.edu
.aei.uni-hannover.de
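
The file above does not say how its entries are matched, so the following is only a minimal sketch of one plausible reading, not the way Bacalhau or its HTTP gateway actually evaluates the list. It assumes that lines starting with `#` and blank lines are ignored, that a plain entry (including an IP address) must match the requested host exactly, and that an entry with a leading dot such as `.aei.uni-hannover.de` covers the bare domain and all of its subdomains. The function names `parse_allowlist` and `is_allowed` are hypothetical.

```python
def parse_allowlist(text):
    """Return allowlist entries, skipping blank lines and '#' comment lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(line.lower())
    return entries


def is_allowed(host, entries):
    """Default deny: allow only an exact match or an (assumed) suffix match."""
    host = host.lower().rstrip(".")
    for entry in entries:
        if entry.startswith("."):
            # ASSUMPTION: a leading dot covers the bare domain and every
            # subdomain; the allowlist file itself does not define this.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


# A short excerpt in the same format as the allowlist file.
SAMPLE = """\
# golang dependencies
proxy.golang.org

# einsteinathome.org BOINC project
einsteinathome.org
.aei.uni-hannover.de
"""

entries = parse_allowlist(SAMPLE)
```

Under these assumptions, `is_allowed("www.aei.uni-hannover.de", entries)` is true, while `is_allowed("scheduler.einsteinathome.org", entries)` is false for this excerpt because plain entries do not cover subdomains. That would be consistent with the full list naming `scheduler.einsteinathome.org` separately, although the file does not confirm the rule.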