What are we talking about this time?
This time we are going to talk about AWS S3 storage. I have chosen to start with S3, as storage lies at the heart of a great many cloud services, in both AWS and Azure. For example you can:
- Use S3 blobs to create external SQL tables (AWS Athena)
- Use S3 storage with Kafka
- Use S3 with data warehouses such as AWS Redshift
- Use S3 with Apache Spark
- Use S3 with AWS Lambda
- Receive events when a new S3 operation occurs
These are just some of the things you can do using S3 storage, so it's a great starting point. There are other storage options in AWS, such as:
- Glacier (archive / slow moving data storage)
- EFS (file system)
- Storage Gateway
I will probably cover some of these in future posts too, but for now let's stick to what this post will cover, which is standard S3.
Initial setup
If you did not read the very first part of this series of posts, I urge you to go and read that one now as it shows you how to get started with AWS, and create an IAM user : https://sachabarbs.wordpress.com/2018/08/30/aws-initial-setup/
Where is the code?
The code for this post can be found here in GitHub : https://github.com/sachabarber/AWS/tree/master/Storage/S3BucketsAndKeys
Ok so how does S3 work?
S3 has a concept of a bucket, which is a top level entity. You may have multiple buckets, each of which may have metadata, public/private permissions, automatic encryption and so on enabled on it. Each bucket contains the files you have uploaded. Conceptually there is not a lot more to it.
IAM user privileges needed for S3
You will need to add the following permission to your IAM user to allow them to use S3:
- AmazonS3FullAccess
Obviously if you are working in a team you will not want to give out full access, but for this series of posts this is fine.
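If you do want to lock things down for a team, a rough sketch of a more scoped IAM policy might look like this (the bucket name here is a hypothetical placeholder, not one from the demo code):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::my-demo-bucket",
        "arn:aws:s3:::my-demo-bucket/*"
      ]
    }
  ]
}
```

Note that bucket-level actions (like ListBucket) apply to the bucket ARN, while object-level actions apply to the `/*` resource.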
So what will we try and cover in this post?
To my mind, what I am trying to cover is most of the stuff I would want to do had I been using Azure Blob Storage. To that end, this is the list of things I will cover in this post:
- List all buckets
- Create a bucket
- Write an object to a bucket
- Write an object to a bucket with a pre-check to see if it exists
- Write a Stream to a bucket
- Read an object from a bucket
- Delete an object from a bucket
- List all objects in a bucket
Install the nugets
So let's start. The first thing we need to do is install the NuGet packages, which for this demo are:
- AWSSDK.S3
- Nito.AsyncEx (nice async/await extensions, such as awaitable console apps)
Ok, now that we have those in place, and we know how to use the default profile linked to the IAM user for this demo series (thanks to the 1st post), we can go through the items on the list above one by one.
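As a reminder of why Nito.AsyncEx is in the package list, here is a minimal sketch of how it gives a console app an awaitable entry point (the class and method names here are my own illustration, not necessarily those in the demo repo):

```csharp
using System;
using System.Threading.Tasks;
using Amazon.S3;
using Nito.AsyncEx;

class Program
{
    static void Main(string[] args)
    {
        // AsyncContext.Run lets us await inside a console app's entry point
        // (before C# 7.1, Main itself could not be async)
        AsyncContext.Run(() => MainAsync(args));
    }

    static async Task MainAsync(string[] args)
    {
        // picks up credentials from the default profile set up in the 1st post
        var client = new AmazonS3Client();
        var response = await client.ListBucketsAsync();
        Console.WriteLine($"You own {response.Buckets.Count} bucket(s)");
    }
}
```

With that scaffolding in place, each of the snippets below can simply be dropped into MainAsync.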
List all buckets
var client = new AmazonS3Client();
ListBucketsResponse response = await client.ListBucketsAsync();
foreach (S3Bucket bucket in response.Buckets)
{
    Console.WriteLine("You own Bucket with name: {0}", bucket.BucketName);
}
Create a bucket
This example also shows you how to make the bucket public or private via a canned ACL:
var client = new AmazonS3Client();
if (client.DoesS3BucketExist(bucketToCreate))
{
    Console.WriteLine($"{bucketToCreate} already exists, skipping this step");
}
else
{
    PutBucketRequest putBucketRequest = new PutBucketRequest()
    {
        BucketName = bucketToCreate,
        BucketRegion = S3Region.EUW2,
        CannedACL = isPublic ? S3CannedACL.PublicRead : S3CannedACL.Private
    };
    var response = await client.PutBucketAsync(putBucketRequest);
}
Write an object to a bucket
This example also shows you how to attach metadata to an object in a bucket:
var client = new AmazonS3Client();

// simple object put
PutObjectRequest request = new PutObjectRequest()
{
    ContentBody = "this is a test",
    BucketName = bucketName,
    Key = keyName
};
PutObjectResponse response = await client.PutObjectAsync(request);

// put an object with some metadata attached
// (a PutObjectRequest needs a body too, via ContentBody / InputStream / FilePath)
PutObjectRequest titledRequest = new PutObjectRequest()
{
    ContentBody = "this is a test",
    BucketName = bucketName,
    Key = keyName
};
titledRequest.Metadata.Add("title", "the title");
await client.PutObjectAsync(titledRequest);
Write an object to a bucket with a pre-check to see if it exists
var client = new AmazonS3Client();
if (!S3FileExists(bucketName, uniqueKeyName))
{
    // simple object put
    Console.WriteLine($"Adding file {uniqueKeyName}");
    PutObjectRequest request = new PutObjectRequest()
    {
        ContentBody = "this is a test",
        BucketName = bucketName,
        Key = uniqueKeyName
    };
    PutObjectResponse response = await client.PutObjectAsync(request);
}
else
{
    Console.WriteLine($"File {uniqueKeyName} existed");
}

....

bool S3FileExists(string bucketName, string keyName)
{
    var s3FileInfo = new Amazon.S3.IO.S3FileInfo(client, bucketName, keyName);
    return s3FileInfo.Exists;
}
Write a Stream to a bucket
There are plenty of times when you want to upload or download data as a Stream, because you don't want the memory hit of loading it all in one go. For this reason it is a good idea to work with Stream objects. Here is an upload example using the TransferUtility, which is something we might look at in a dedicated post:
var client = new AmazonS3Client();

// sly local function to turn a string into a Stream
async Task<Stream> GenerateStreamFromStringAsync(string s)
{
    var stream = new MemoryStream();
    var writer = new StreamWriter(stream);
    await writer.WriteAsync(s);
    await writer.FlushAsync();
    stream.Position = 0;
    return stream;
}

var bucketToCreate = $"public-{bucketName}";
await CreateABucketAsync(bucketToCreate);

var fileTransferUtility = new TransferUtility(client);
var fileName = Guid.NewGuid().ToString("N");
using (var streamToUpload = await GenerateStreamFromStringAsync("some random string contents"))
{
    var uploadRequest = new TransferUtilityUploadRequest()
    {
        InputStream = streamToUpload,
        Key = $"{fileName}.txt",
        BucketName = bucketToCreate,
        CannedACL = S3CannedACL.PublicRead
    };
    await fileTransferUtility.UploadAsync(uploadRequest);
}
Console.WriteLine($"Upload using stream to file '{fileName}' completed");
Read an object from a bucket
var client = new AmazonS3Client();
GetObjectRequest request = new GetObjectRequest()
{
    BucketName = bucketName,
    Key = keyName
};
using (GetObjectResponse response = await client.GetObjectAsync(request))
{
    string title = response.Metadata["x-amz-meta-title"];
    Console.WriteLine("The object's title is {0}", title);
    string dest = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), keyName);
    if (!File.Exists(dest))
    {
        await response.WriteResponseStreamToFileAsync(dest, true, CancellationToken.None);
    }
}
Delete an object from a bucket
var client = new AmazonS3Client();
DeleteObjectRequest request = new DeleteObjectRequest()
{
    BucketName = bucketName,
    Key = keyName
};
await client.DeleteObjectAsync(request);
List all objects in a bucket
var client = new AmazonS3Client();
ListObjectsRequest request = new ListObjectsRequest();
request.BucketName = bucketName;
ListObjectsResponse response = await client.ListObjectsAsync(request);
foreach (S3Object entry in response.S3Objects)
{
    Console.WriteLine("key = {0} size = {1}", entry.Key, entry.Size);
}

// list only things starting with "foo"
request.Prefix = "foo";
response = await client.ListObjectsAsync(request);
foreach (S3Object entry in response.S3Objects)
{
    Console.WriteLine("key = {0} size = {1}", entry.Key, entry.Size);
}

// list only things that come after "bar" alphabetically
request.Prefix = null;
request.Marker = "bar";
response = await client.ListObjectsAsync(request);
foreach (S3Object entry in response.S3Objects)
{
    Console.WriteLine("key = {0} size = {1}", entry.Key, entry.Size);
}

// only list 3 things
request.Prefix = null;
request.Marker = null;
request.MaxKeys = 3;
response = await client.ListObjectsAsync(request);
foreach (S3Object entry in response.S3Objects)
{
    Console.WriteLine("key = {0} size = {1}", entry.Key, entry.Size);
}
After running the demo code associated with this article, you should see console output confirming each of the operations above.
Conclusion
I think I have shown the most common things you may want to do with AWS S3. We will definitely be looking at some of its integration points with other services in the future. I hope I have also shown that this stuff is not that hard, and that the APIs are quite intuitive.