github.com/filecoin-project/bacalhau@v0.3.23-0.20230228154132-45c989550ace/ops/terraform/remote_files/scripts/http-domain-allowlist.txt

     1  # This is the domain allowlist used for HTTP networking for "the Bacalhau team
     2  # provided compute nodes" (as opposed to "all compute nodes on the Bacalhau
     3  # network").
     4  #
     5  # This list is very much about *our* perception of risk, what things *we* are
     6  # comfortable with, and ensuring that *our* nodes are not performing
     7  # illegal/questionable behaviour (as opposed to trying to define an allowlist
     8  # for all compute providers to use, which would be much harder).
     9  #
    10  # Why do we have network restrictions?
    11  # ====================================
    12  # Broadly, to stop certain behaviours on our nodes that we don't want to
    13  # support.
    14  #
    15  # 1. Illegal behaviour on our nodes is our problem and we are liable for it (or
    16  #    will have to put in effort to contest we are liable). Example: using a job
    17  #    to download copyrighted files from one place and upload them to another.
    18  # 2. Behaviour that our hosting provider(s) deem against their ToS might result
    19  #    in the shutdown of our compute nodes.
    20  # 3. Behaviour that circumvents the Bacalhau network operation (e.g. for paid
    21  #    jobs, using network connections to publish results before they've been paid
    22  #    for, and then denying payment).
    23  # 4. Behaviour that degrades the ability of our nodes to serve legitimate
    24  #    requests and/or encourages flood use of our nodes (e.g. for unpaid jobs,
    25  #    using our nodes as bitcoin miners, constantly, in a loop, meaning that our
    26  #    ability to serve legitimate volunteer/example requests is reduced).
    27  # 5. Behaviour that allows a job to be compromised by a malicious actor to
    28  #    achieve one of the above behaviours (e.g. for paid jobs, allowing a
    29  #    compromised package to download and run a bitcoin miner, sucking up all of
    30  #    the user's money).
    31  #
    32  # What use cases should we use networking to meet?
    33  # ================================================
    34  # Well, we have a sensible idea of what not to use it for:
    35  #
    36  # * As a summary of the above: nothing illegal, nothing against Google's ToS,
    37  #   nothing to circumvent our network principles, nothing that allows repeated
    38  #   use of the network to the point of degradation for other users.
    39  # * Bacalhau already has data input and output using storage and publishers, so
    40  #   we shouldn't use networking to meet any use cases where it is already
    41  #   possible using a suitably formatted job spec.
    42  #
    43  # And as a suggestion, we could start with the following list that we have
    44  # observed people asking for:
    45  #
    46  # * Jobs where a tool expects a certain HTTP API for proper operation (e.g.
    47  #   build tools, like go, cargo, gem, pip etc) and hence where bringing data in
    48  #   via IPFS/URL download is not feasible/practical
    49  # * Jobs whose sole role is to provide some [IPFS] consolidation of Internet
    50  #   endpoints (e.g. a job that downloads data/scrapes web pages from a set of
    51  #   domains, and archives the results on IPFS/Filecoin)
    52  # * Jobs that enable our own use cases or those of our partners whom we have a
    53  #   trusted relationship with (e.g. Project Frog, or people we give grants to)
    54  #   and hence have more trust that the privilege will not be abused
    55  #
    56  # We should also treat jobs that will only make safe HTTP requests more
    57  # leniently than those that do not (i.e. a job just downloading data is
    58  # generally safer to approve than one that is POSTing results to different
    59  # places). So domains that are predominantly read-only are normally fine,
    60  # whereas those with writeable APIs need more care.
    61  #
    62  # How do I know what domains to approve?
    63  # ======================================
    64  # You need to assure yourself that approving access to the domain meets ALL of
    65  # the requirements in the first list of the above section and ONE OF the
    66  # requirements in the second list of the above section.
    67  #
    68  # Start by checking out the domain on the web: what is it used for? Does it have
    69  # good documentation of what can be done on it? Is it mainly for read-only data
    70  # access or does it also include writable endpoints? Is the organisation
    71  # operating it easy to find?
    72  #
    73  # Are the operators of the domain likely to have a content policy and
    74  # moderation? If so, the domain is unlikely to be in use for nefarious purposes
    75  # already, and anything our users do that tries to use it that way is likely to
    76  # be shut down or removed. Generally, bigger players that display data publicly
    77  # (e.g. GitHub) will have this.
    78  #
    79  # Remember that we should operate a "default deny" policy – if we can't be
    80  # reasonably confident the domain access will be used appropriately, we just say
    81  # no.
    82  #
    83  # We also need to think about what a domain "could" be used for outside of what
    84  # the requestor is asking to use it for. E.g. if they are saying they only want
    85  # to use it to download some static files, but access to the domain could also
    86  # enable some bitcoin-mining workflow, we should probably be saying no to that
    87  # request.
    88  #
    89  # (We may find that domain-based allow-listing is not enough, and we need to go
    90  # to the next level – job-based allow-listing, e.g. you can only access these
    91  # domains if you want to run certain jobs we have approved.)
    92  #
    93  # Who updates this file and how?
    94  # ==============================
    95  # Anyone who has access to update Bacalhau compute nodes in production also has
    96  # the ability to approve or deny allowlist changes. They should think through
    97  # the above rationale and come to a decision.
    98  #
    99  # Community members who want to use new domains can either make the request on
   100  # Slack or submit a GitHub PR against the allowlist that includes the domains
   101  # they want to use.
   102  
   103  # example domains
   104  example.com
   105  
   106  # golang dependencies
   107  proxy.golang.org
   108  sum.golang.org
   109  index.golang.org
   110  storage.googleapis.com
   111  
   112  # boinc.multi-pool.info/latinsquares BOINC project
   113  78.26.93.125
   114  boinc.berkeley.edu
   115  boinc.multi-pool.info
   116  
   117  # einsteinathome.org BOINC project
   118  einsteinathome.org
   119  scheduler.einsteinathome.org
   120  einstein.phys.uwm.edu
   121  einstein-dl.syr.edu
   122  .aei.uni-hannover.de
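
The file itself does not say how its entries are matched against a requested host. As a rough illustration only, the Python sketch below assumes that lines starting with `#` are comments, that a plain entry (a hostname or an IP address) must match exactly, and that an entry with a leading dot (such as `.aei.uni-hannover.de`) covers that domain and all of its subdomains. The function names `parse_allowlist` and `is_allowed` are hypothetical and are not part of Bacalhau.

```python
def parse_allowlist(text):
    """Return the non-comment, non-blank entries of an allowlist file."""
    entries = []
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if line:
            entries.append(line.lower())
    return entries


def is_allowed(host, entries):
    """Default deny: a host is allowed only if some entry matches it."""
    host = host.lower().rstrip(".")
    for entry in entries:
        if entry.startswith("."):
            # Assumption: ".example.org" covers example.org and any subdomain.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            # Plain entries (hostnames or IP addresses) must match exactly.
            return True
    return False


sample = """
# golang dependencies
proxy.golang.org

# einsteinathome.org BOINC project
.aei.uni-hannover.de
78.26.93.125
"""
entries = parse_allowlist(sample)
print(is_allowed("proxy.golang.org", entries))
print(is_allowed("www.aei.uni-hannover.de", entries))
print(is_allowed("unlisted.example.net", entries))
```

Under these assumptions any host that no entry matches is rejected, which mirrors the "default deny" policy described in the comments above.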