github.com/slspeek/camlistore_namedsearch@v0.0.0-20140519202248-ed6f70f7721a/website/content/docs/overview (about)

<h1>Camlistore Overview</h1>

<p>Camlistore is your <b>personal storage system for life</b>.</p>

<h2>Summary</h2>

The project began because I wanted to...
<ul>
<li>... <b>store all my stuff forever</b>, not worrying about deleting or losing stuff.</li>

<li>... <b>save stuff easily</b>, and <b>without categorizing it or choosing a location</b> whenever I save it.  I just want a data dumptruck that I can throw stuff at whenever.</li>

<li>... <b>never lose anything</b> because nothing can be overwritten (all blobs are content-addressable), and there's no delete support.  (Optional garbage collection is coming later.)</li>

<li>... be able to <b>search for anything</b> I once stored.</li>

<li>... be able to <b>browse and visualize</b> stuff I've stored.</li>

<li>... <b>not always be forced into a POSIX-y filesystem model</b>. That involves thinking of where to put stuff, and most of the time I don't even want filenames. If I take a bunch of photos, those don't have filenames (or not good ones, and not unique). They just exist. They don't need a directory or a name. Likewise with blog posts, comments, likes, bookmarks, etc. They're just objects.</li>

<li>... <b>have a POSIX-y filesystem when I want one</b>. And it should all be logically available on my tiny laptop's SSD, even if my laptop's disk is minuscule compared to my entire repo.  That is, there should actually be a caching virtual filesystem, not a daemon running rsync in the background. If I have to have a complete copy of my data locally, or I have to "choose which folders" to sync, that's broken.</li>

<li>... <b>be able to synthesize POSIX-y filesystems from search queries</b> over my higher-level objects, e.g. a "recent" directory of recent photos from my Android phone. (This all works already in 0.1.)</li>

<li>... <b>not write another CMS, ever</b>. Camlistore should be able to store and model any type of content, so it can just be a backend for other apps.</li>

<li>... have <b>backups of all my social network content</b> that I create daily on other people's servers, to protect myself if my account is hijacked or the company goes evil, changes ownership, or goes out of business.</li>

<li>... have both a <b>web UI</b> and <b>command-line tools</b>, as well as a <b>FUSE filesystem</b>.</li>

<li>... <b>be in control</b> of my data, but also still be able to utilize big companies' infrastructure cloud products if desired.</li>

<li>... <b>be able to share content</b> with both technical and non-technical friends.</li>

</ul>

<p>Most of this works as of the 0.1 <a href="/download">release</a>, and the rest and more is in progress.</p>

<h2>Longer Answer</h2>

<p>Throughout our lives, we all continue to generate content, whether
that's writing documents, taking photos, writing comments online,
liking our friends' posts on social networks, etc. Our content is
typically spread between a mix of different companies' servers ("The
Cloud") and our own hardware (laptops, phones, etc).  All of these
things are prone to failure: companies go out of business, change
ownership, or kill products. Personal hard drives fail, and laptops and
phones are dropped.</p>

<p>It would be nice if we were a bit more in control. At least, it
would be nice if we had a reliable backup of all our content. Once we
have all our content, it's then nice to search it, view it, and
directly serve it or share it out to others (publicly or with select
ACLs), regardless of the original host's policies.</p>

<p>Camlistore is a system to do all that.</p>

<p>While Camlistore can store files like a traditional filesystem
(think: "directories", "files", "filenames"), it's specialized in
storing higher-level objects, which can represent anything.</p>

<p>In addition to an implementation, Camlistore is also a schema for
how to represent many types of content. Much JSON is used.</p>

<p>Because every type of content in Camlistore is represented using
content-addressable blobs (even metadata), it's impossible to
"overwrite" things. It also means it's easy for Camlistore to sync in
any direction between your devices and Camlistore storage servers, without
versioning or conflict resolution issues.</p>
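
<p>A minimal Go sketch of why content addressing rules out overwrites and makes sync direction-agnostic. The <code>Store</code> type, its methods, and the <code>sha1-</code> hex naming are assumptions made for illustration, not Camlistore's actual API.</p>

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// BlobRef names a blob by the digest of its bytes (assumed format: "sha1-<hex>").
type BlobRef string

// RefOf computes the content address of data.
func RefOf(data []byte) BlobRef {
	sum := sha1.Sum(data)
	return BlobRef("sha1-" + hex.EncodeToString(sum[:]))
}

// Store is a toy in-memory blob store (hypothetical, for illustration only).
type Store map[BlobRef][]byte

// Put stores data under its own digest. Storing the same bytes twice is a
// no-op, and different bytes get a different key, so nothing is overwritten.
func (s Store) Put(data []byte) BlobRef {
	ref := RefOf(data)
	if _, ok := s[ref]; !ok {
		s[ref] = append([]byte(nil), data...)
	}
	return ref
}

// SyncTo copies every blob that dst is missing and reports how many were
// copied. Equal refs imply equal contents, so there is nothing to merge.
func (s Store) SyncTo(dst Store) (copied int) {
	for ref, data := range s {
		if _, ok := dst[ref]; !ok {
			dst[ref] = data
			copied++
		}
	}
	return copied
}

func main() {
	laptop, server := Store{}, Store{}
	ref := laptop.Put([]byte("hello world"))
	server.Put([]byte("a photo"))

	fmt.Println("ref:", ref)
	fmt.Println("laptop -> server copied:", laptop.SyncTo(server))
	fmt.Println("server -> laptop copied:", server.SyncTo(laptop))
	fmt.Println("blobs on each side:", len(laptop), len(server))
}
```

<p>Because a blob's name is derived from its bytes, syncing in either direction reduces to "copy what the other side lacks", which is why no versioning or conflict resolution is needed at this layer.</p>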

<p>Camlistore can represent both immutable information (like snapshots
of filesystem trees) and mutable
information. Mutable information is represented by storing immutable,
timestamped, GPG-signed blobs representing a mutation request. The
current state of an object is just the application of all mutation
blobs up until that point in time. Thus all history is recorded and
you can look at an object as it existed at any point in time, just by
ignoring mutations after a certain point.</p>
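
<p>The replay idea can be sketched in Go roughly as follows. The <code>Claim</code> struct, its <code>"set"</code>/<code>"del"</code> operations, and the <code>StateAt</code> and <code>day</code> helpers are hypothetical simplifications; signing and blob storage of the mutation requests are omitted entirely.</p>

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Claim is a simplified, hypothetical mutation request. In the real system
// each one would be an immutable, signed blob; here it is a plain struct.
type Claim struct {
	Date  time.Time
	Op    string // "set" or "del" (assumed operation names)
	Attr  string
	Value string
}

// StateAt replays claims in timestamp order and ignores any claim dated
// after `at`, yielding the object's attributes as of that moment.
func StateAt(claims []Claim, at time.Time) map[string]string {
	sorted := append([]Claim(nil), claims...) // leave the caller's slice alone
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Date.Before(sorted[j].Date)
	})
	state := map[string]string{}
	for _, c := range sorted {
		if c.Date.After(at) {
			break // everything from here on is "in the future"
		}
		switch c.Op {
		case "set":
			state[c.Attr] = c.Value
		case "del":
			delete(state, c.Attr)
		}
	}
	return state
}

// day is a small helper returning midnight UTC on the n-th of June 2013.
func day(n int) time.Time {
	return time.Date(2013, time.June, n, 0, 0, 0, 0, time.UTC)
}

func main() {
	claims := []Claim{
		{Date: day(1), Op: "set", Attr: "title", Value: "Draft"},
		{Date: day(5), Op: "set", Attr: "title", Value: "Final"},
		{Date: day(3), Op: "set", Attr: "tag", Value: "vacation"},
		{Date: day(7), Op: "del", Attr: "tag"},
	}
	fmt.Println("as of day 4:", StateAt(claims, day(4)))
	fmt.Println("as of day 9:", StateAt(claims, day(9)))
}
```

<p>No claim is ever edited or removed, so asking for an earlier time simply stops the replay sooner.</p>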

<p>Despite using parts of the OpenPGP spec, users don't need to use
the GnuPG tools or go to key signing events or anything dorky like
that.</p>

<p>You are in control of your Camlistore server(s), whether you run
your own copy or use a hosted version. In the latter case, you're at
least logically in control, analogous to how you're in charge of your
email (and it's your private repository of all your email), even if a
big company runs your email for you. Of course, you can also store all
your email in Camlistore too, but Gmail's interface and search are much
better.</p>

<p>Responsible (or paranoid) users would set up their Camlistore
servers to cross-replicate and mirror between different big companies'
cloud platforms if they're not able to run their own servers in
different geographical areas (e.g. cross-replicating between
different big disks stored within a family).</p>

<p>A Camlistore server comprises several parts, all of which are
optional and can be turned on or off per-instance:</p>

<ul>

 <li><b>Storage</b>: the most basic part of a Camlistore server is
  storage. This is anything which can Get or Put a blob (named by its
  content-addressable digest), and enumerate those blobs, sorted by
  their digest. The only metadata a storage server needs to track
  per-blob is its size. (No other metadata is permitted, as it's
  stored elsewhere.) Implementations are trivial and exist for local
  disk, Amazon S3, Google Storage, etc. They're also composable, so
  there exists "shard", "replica", "remote", "conditional", and
  "encrypt" (in-progress) storage targets, which layer upon
  others.</li>

  <li><b>Index</b>: index is implemented in terms of the Storage
  interface, so it can be synchronously or asynchronously replicated to
  from other storage types. Putting a blob indexes it, enumerating
  returns what has been indexed, and getting isn't supported. An
  abstraction within Camlistore similar to the storage abstractions
  means that any underlying system which can store keys &amp; values and
  can scan in sorted order from a point can be used to store
  Camlistore's indexes. Implementations are likewise trivial and exist
  for memory (for development), SQLite, LevelDB, MySQL, Postgres,
  MongoDB, App Engine, etc. Dynamo and others would be trivial.</li>

  <li><b>Search</b>: pointing Camlistore's search handlers at an index
  means you can search for your things.  It's worth pointing out that
  you can lose your index at any time. If the database holding your index
  becomes corrupt, just delete it all and re-replicate from your storage
  to your index: it'll be re-indexed and search will work again.</li>

  <li><b>User Interface</b>: the web user interface lets you click
  around and view your content, and do searches. Of course, you could
  also just use the command-line tools or API.</li>

</ul>
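
<p>To make the storage and index roles above more concrete, here is a rough Go sketch. The <code>Storage</code> interface, <code>MemStorage</code>, the word-based <code>Index</code>, and <code>Rebuild</code> are all names invented for this example and are much simpler than Camlistore's real interfaces; the point is only that an index can always be regenerated by enumerating storage.</p>

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// SizedRef is the only per-blob metadata storage tracks: its name and size.
type SizedRef struct {
	Ref  string
	Size int
}

// Storage is a hypothetical minimal interface for the storage role:
// put a blob, get a blob, and enumerate blobs sorted by digest.
type Storage interface {
	Put(data []byte) string
	Get(ref string) ([]byte, bool)
	Enumerate(after string) []SizedRef
}

// MemStorage keeps blobs in a map keyed by their digest.
type MemStorage map[string][]byte

func (m MemStorage) Put(data []byte) string {
	sum := sha1.Sum(data)
	ref := "sha1-" + hex.EncodeToString(sum[:])
	m[ref] = append([]byte(nil), data...)
	return ref
}

func (m MemStorage) Get(ref string) ([]byte, bool) {
	data, ok := m[ref]
	return data, ok
}

// Enumerate returns every blob whose ref sorts after `after`, in sorted order.
func (m MemStorage) Enumerate(after string) []SizedRef {
	var out []SizedRef
	for ref, data := range m {
		if ref > after {
			out = append(out, SizedRef{Ref: ref, Size: len(data)})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Ref < out[j].Ref })
	return out
}

// Index is a toy search index mapping a lowercased word to blob refs.
type Index map[string][]string

// Add indexes one blob by splitting its contents into words.
func (ix Index) Add(ref string, data []byte) {
	for _, w := range strings.Fields(strings.ToLower(string(data))) {
		ix[w] = append(ix[w], ref)
	}
}

// Rebuild creates a fresh index by enumerating storage and re-indexing
// every blob, which is all that is needed after losing the old index.
func Rebuild(src Storage) Index {
	ix := Index{}
	for _, sr := range src.Enumerate("") {
		if data, ok := src.Get(sr.Ref); ok {
			ix.Add(sr.Ref, data)
		}
	}
	return ix
}

func main() {
	store := MemStorage{}
	store.Put([]byte("beach photo"))
	ref := store.Put([]byte("tax documents"))

	ix := Rebuild(store) // e.g. after deleting a corrupt index database
	fmt.Println("found tax blob:", ix["tax"][0] == ref)
}
```

<p>Since storage is the source of truth and the index is derived from it, throwing the index away costs only the time it takes to run something like <code>Rebuild</code> again.</p>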

<p>Enough words for now.  See <a href="/docs/">the docs</a> and code for more.</p>

<p><em>Last updated 2013-06-12</em></p>