How to Pull Files From S3 Based on Uploaded Date
AWS S3 Source
Amazon Simple Storage Service (Amazon S3) provides a web services interface that can be used to store and retrieve any amount of data from anywhere on the web. Use an Amazon S3 Source to upload data to Sumo Logic from S3.
One Amazon S3 Source can collect data from a single S3 bucket. However, you can configure multiple S3 Sources to collect from one S3 bucket. For example, you could use one S3 Source to collect one particular data type, then configure another S3 Source to collect another data type.
For information on S3 performance optimization, see Request Rate and Performance Considerations.
Compressed data
An S3 Source can collect either plain text or gzip-compressed text. ZIP files are not supported.
Data is treated as plain text by default, but gzip decompression will be used if both of the following conditions apply:
- The target file has a .gz or .gzip extension, or no file extension.
- The target file's initial bytes match the gzip file format.
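As an illustration of those two conditions, here is a minimal Python sketch, not Sumo Logic's actual implementation, of how a collector could decide whether to decompress a file. The gzip magic bytes 0x1f 0x8b are part of the gzip format; the function name is invented for this example.

import gzip
import os

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

def read_log_file(path: str) -> bytes:
    """Decompress only if BOTH the extension rule and the magic bytes match."""
    name = os.path.basename(path)
    extension_ok = name.endswith((".gz", ".gzip")) or "." not in name
    with open(path, "rb") as f:
        magic_ok = f.read(2) == GZIP_MAGIC
    if extension_ok and magic_ok:
        with gzip.open(path, "rb") as f:
            return f.read()   # gzip-compressed text
    with open(path, "rb") as f:
        return f.read()       # plain text by default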
Configure an Amazon S3 Source
- Grant Sumo Logic access to an Amazon S3 bucket.
- Enable logging in AWS using the Amazon Console.
- Confirm that logs are being delivered to the Amazon S3 bucket.
- Add an AWS S3 Source to collect objects from your Amazon S3 bucket. See below for details.
AWS S3 Source
When you create an AWS S3 Source, you add it to a Hosted Collector. Before creating the Source, identify the Hosted Collector you want to use or create a new Hosted Collector. For instructions, see Configure a Hosted Collector.
Rules
- If you're editing the Collection should begin date on a Source, the new date must be after the current Collection should begin date.
- Sumo Logic supports log files (S3 objects) that do NOT change after they are uploaded to S3. Support is not provided if your logging approach relies on updating files stored in an S3 bucket. S3 does not have a concept of updating existing files; you can only overwrite an existing file. When this overwrite happens, S3 considers it a new file object, or a new version of the file, and that file object gets its own unique version ID.
Sumo Logic scans an S3 bucket based on the path expression supplied, or receives an SNS notification when a new file object is created. As part of this, we receive a file name (key) and the object's ID, which are compared against a list of file objects already ingested. If a matching file ID is not found, the contents of the file are ingested in full.
When you overwrite a file in S3, the file object gets a new version ID and, as a result, Sumo Logic sees it as a new file and ingests all of it. If with each version you post to S3 you are only appending to the end of the file, this will lead to duplicate messages being ingested, one message for each version of the file you created in S3 (see the sketch after this list).
- Duplicate logs are collected if you change the Use AWS versioned APIs setting from Yes to No while the S3 bucket has versioning enabled.
- Glacier objects will not be collected and are ignored.
- If you're using SNS, you need to create a separate topic and subscription for each Source.
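To see the version behavior described above for yourself, you can list every stored version of a single key with boto3. This is a minimal sketch; the bucket and key names are placeholders:

import boto3

s3 = boto3.client("s3")

# In a versioned bucket, every overwrite of the same key creates a new
# version ID, which is why a version-aware collector ingests each one
# as a brand-new object.
resp = s3.list_object_versions(Bucket="example-bucket", Prefix="logs/app.log")
for v in resp.get("Versions", []):
    print(v["Key"], v["VersionId"], v["LastModified"])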
Cisco Umbrella
Cisco Umbrella offers logging to a Cisco-managed S3 bucket. Collection from these buckets has the following limitations:
- AWS versioned APIs are not supported. The Use AWS versioned APIs setting on the Source must be disabled.
- S3 Event Notifications Integration is not supported.
- Access must be provided with an Access ID and Key. Role-based access is not supported.
- Use a prefix in the path expression so it doesn't point to the root directory.
S3 Event Notifications Integration
Sumo's S3 integration combines scan-based discovery and event-based discovery into a unified integration that lets you maintain a low-latency integration for new content while providing assurance that no data is missed or dropped. When you enable event-based notifications, S3 automatically publishes new files to Amazon Simple Notification Service (SNS) topics to which Sumo Logic can be subscribed. This notifies Sumo Logic immediately when new files are added to your S3 bucket so we can collect them. For more information about SNS, see the Amazon SNS product detail page.
Enabling event-based notifications is an S3 bucket-level operation that subscribes to an SNS topic. An SNS topic is an access point that Sumo Logic can dynamically subscribe to in order to receive event notifications. When creating a Source that collects from an S3 bucket, Sumo assigns an endpoint URL to the Source. The URL is for you to use in the AWS subscription to the SNS topic so AWS notifies Sumo when there are new files. See Configuring Amazon S3 Event Notifications for more information.
You can adjust when and how AWS handles communication attempts with Sumo Logic. See Setting Amazon SNS Delivery Retry Policies for details.
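For instance, retry behavior can be set through the DeliveryPolicy attribute of the SNS subscription. A hedged boto3 sketch, with a placeholder subscription ARN and illustrative retry values; consult the SNS documentation for the full policy schema:

import json
import boto3

sns = boto3.client("sns")

# Illustrative retry policy: retry failed HTTPS deliveries to the Sumo
# endpoint several times, backing off between attempts.
retry_policy = {
    "healthyRetryPolicy": {
        "minDelayTarget": 20,   # seconds before the first retry
        "maxDelayTarget": 300,  # upper bound between retries
        "numRetries": 5,
    }
}
sns.set_subscription_attributes(
    SubscriptionArn="arn:aws:sns:us-east-1:123456789012:sumo-topic:subscription-id",
    AttributeName="DeliveryPolicy",
    AttributeValue=json.dumps(retry_policy),
)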
Create an AWS S3 Source
- In Sumo Logic select Manage Data > Collection > Collection.
- On the Collectors page, click Add Source next to a Hosted Collector, either an existing Hosted Collector or one you have created for this purpose.
- Select Amazon S3.
- Enter a name for the new Source. A description is optional.
- Select an S3 region or keep the default value of Others. The S3 region must match the appropriate S3 bucket created in your Amazon account.
- Use AWS versioned APIs? Select No to collect from managed buckets where versioning is not enabled, such as Cisco Umbrella. Selecting Yes uses the list-object-versions and get-object-version AWS S3 APIs and requires your credentials to have ListObjectVersions and GetObjectVersion permissions.
- For Bucket Name, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS.
- For Path Expression, enter the wildcard pattern that matches the S3 objects you'd like to collect. You can use one wildcard (*) in this string. Recursive path expressions use a single wildcard and do NOT use a leading forward slash. See About Amazon Path Expressions for details.
- Collection should begin. Choose or enter how far back you'd like to begin collecting historical logs. You can either:
- Choose a predefined value from the dropdown list, ranging from "Now" to "72 hours ago" to "All Time", or
- Enter a relative value. To enter a relative value, click the Collection should begin field and press the delete key on your keyboard to clear the field. Then, enter a relative time expression, for example -1w. You can define when you want collection to begin in terms of months (M), weeks (w), days (d), hours (h), and minutes (m).
- For Source Category, enter any string to tag the output collected from this Source. (Category metadata is stored in a searchable field called _sourceCategory.)
- Fields. Click the + Add Field link to add custom log metadata Fields.
- Define the fields you want to associate; each field needs a name (key) and value.
- For AWS Access you have two Access Method options. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred; this was completed in the prerequisite step Grant Sumo Logic access to an AWS Product.
- For Role-based access, enter the Role ARN that was provided by AWS after creating the role.
- For Key access, enter the Access Key ID and Secret Access Key. See AWS Access Key ID and AWS Secret Access Key for details.
- Log File Discovery. You have the option to set up Amazon Simple Notification Service (SNS) to notify Sumo Logic of new items in your S3 bucket. A scan interval is required and automatically applied to detect log files.
- Scan Interval. Sumo Logic will periodically scan your S3 bucket for new items in addition to SNS notifications. Automatic is recommended so you do not incur additional AWS charges. This sets the scan interval based on whether the Source is subscribed to an SNS topic endpoint and how often new files are detected over time.
If the Source is not subscribed to an SNS topic and set to Automatic, the scan interval is 5 minutes. You may enter a set frequency to scan your S3 bucket for new data. To learn more about scan interval considerations, see About setting the S3 Scan Interval.
- SNS Subscription Endpoint (Highly Recommended). New files will be collected by Sumo Logic as soon as the notification is received. This provides faster collection versus having to wait for the next scan to detect the new file.
To set up the subscription you need to get an endpoint URL from Sumo to provide to AWS. This process will save your Source and begin scanning your S3 bucket when the endpoint URL is generated. Click Create URL and use the provided endpoint URL when creating your subscription in step C.
Set up SNS in AWS (Highly Recommended)
- Go to Services > Simple Notification Service and click Create Topic. Enter a Topic name and click Create topic. Copy the provided Topic ARN; you'll need this for the next step.
- Again go to Services > Simple Notification Service and click Create Subscription. Paste the Topic ARN from step B above. Select HTTPS as the protocol and enter the Endpoint URL provided while creating the S3 source in Sumo Logic. Click Create subscription and a confirmation request will be sent to Sumo Logic. The request will be automatically confirmed by Sumo Logic.
- Select the Topic created in step B and navigate to Actions > Edit Topic Policy. Use the following policy template, replacing the SNS-topic-ARN and bucket-name placeholders in the Resource section of the JSON policy with your actual SNS topic ARN and S3 bucket name:
{
    "Version": "2008-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "AWS": "*"
        },
        "Action": [
            "SNS:Publish"
        ],
        "Resource": "SNS-topic-ARN",
        "Condition": {
            "ArnLike": {
                "aws:SourceArn": "arn:aws:s3:*:*:bucket-name"
            }
        }
    }]
}
- Go to Services > S3 and select the bucket to which you want to attach the notifications. Navigate to Properties > Events > Add Notification. Enter a Name for the event notification. In the Events section select All object create events. In the Send to section (notification destination) select SNS Topic. An SNS section becomes available; select the name of the topic you created in step B from the dropdown. Click Save.
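If you'd rather script these AWS-side steps than click through the console, a boto3 sketch along these lines should work. It assumes the topic policy from the previous step is already in place; the topic name, bucket name, and Sumo endpoint URL are placeholders for your own values:

import boto3

sns = boto3.client("sns")
s3 = boto3.client("s3")

# Step B: create the topic.
topic_arn = sns.create_topic(Name="sumo-s3-notifications")["TopicArn"]

# Step C: subscribe the Sumo endpoint URL over HTTPS; Sumo Logic
# confirms the subscription request automatically.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="https",
    Endpoint="https://YOUR-SUMO-ENDPOINT-URL",
)

# Step E: publish all object-create events from the bucket to the topic.
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [{
            "TopicArn": topic_arn,
            "Events": ["s3:ObjectCreated:*"],
        }]
    },
)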
Complete setup in Sumo Logic
- Set any of the following under Advanced:
- Enable Timestamp Parsing. This option is selected by default. If it's deselected, no timestamp information is parsed at all.
- Time Zone. There are two options for Time Zone. You can use the time zone present in your log files, and then choose an option in case time zone information is missing from a log message. Or, you can have Sumo Logic completely disregard any time zone information present in logs by forcing a time zone. It's very important to have the proper time zone set, no matter which option you choose. If the time zone of logs can't be determined, Sumo Logic assigns logs UTC; if the rest of your logs are from another time zone your search results will be affected.
- Timestamp Format. By default, Sumo Logic will automatically detect the timestamp format of your logs. However, you can manually specify a timestamp format for a Source. See Timestamps, Time Zones, Time Ranges, and Date Formats for more information.
- Enable Multiline Processing. See Collecting Multiline Logs for details on multiline processing and its options. This is enabled by default. Use this option if you're working with multiline messages (for example, log4j or exception stack traces). Deselect this option if you want to avoid unnecessary processing when collecting single-message-per-line files (for example, Linux system.log). Choose one of the following:
- Infer Boundaries. Enable when you want Sumo Logic to automatically attempt to determine which lines belong to the same message. If you deselect the Infer Boundaries option, you will need to enter a regular expression in the Boundary Regex field to use for detecting the entire first line of multiline messages.
- Boundary Regex. You can specify the boundary between messages using a regular expression. Enter a regular expression that matches the entire first line of every multiline message in your log files (see the sketch after these steps).
- Create any Processing Rules you'd like for the AWS Source.
- When you are finished configuring the Source, click Save.
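As a sketch of the Boundary Regex idea: if each message begins with a timestamp such as 2023-04-01 12:00:00, a pattern matching the entire first line could look like the following. The log format and pattern here are examples, not requirements:

import re

# A first line starts with "YYYY-MM-DD HH:MM:SS"; anything else is a
# continuation of the previous message.
boundary = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*$")

for line in [
    "2023-04-01 12:00:00 ERROR something failed",  # new message
    "    at com.example.Foo.bar(Foo.java:42)",     # continuation line
    "2023-04-01 12:00:05 INFO recovered",          # new message
]:
    print(bool(boundary.match(line)), line)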
SNS with one bucket and multiple Sources
When collecting from one AWS S3 bucket with multiple Sumo Sources, you need to create a separate topic and subscription for each Source. Subscriptions and Sumo Sources should both map to only one endpoint. If you were to have multiple subscriptions, Sumo would collect your objects multiple times.
Each topic needs a separate filter (prefix/suffix) so that collection does not overlap. For example, you could configure a bucket with two notifications whose filters (prefix/suffix) notify Sumo separately about new objects in different folders, as in the sketch below.
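A boto3 sketch of that layout, with two illustrative topics and folder prefixes standing in for your own names:

import boto3

s3 = boto3.client("s3")

# Two notifications on one bucket; the prefix filters keep them from
# overlapping, so each Sumo Source only hears about its own folder.
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:sumo-source-a",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [
                    {"Name": "prefix", "Value": "folderA/"},
                ]}},
            },
            {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:sumo-source-b",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [
                    {"Name": "prefix", "Value": "folderB/"},
                ]}},
            },
        ]
    },
)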
Update Source to use S3 Event Notifications
- In Sumo Logic select Manage Data > Collection > Collection.
- On the Collection page navigate to your Source and click Edit. Scroll down to Log File Discovery and note the Endpoint URL provided; you will use this in step 13.C when creating your subscription.
- Complete steps 13.B through 13.E for configuring SNS Notifications.
Troubleshoot S3 Event Notifications
In the web interface under Log File Discovery it shows a red exclamation mark with "Sumo Logic has not received a validation request from AWS".
Steps to troubleshoot:
- Refresh the Source's page to view the latest status of the subscription in the SNS Subscription section by clicking Cancel and then Edit on the Source in the Collection tab.
- Verify you have enabled sending Notifications from your S3 bucket to the appropriate SNS topic. This is done in step 13.E.
- If you didn't use CloudFormation, check that the SNS topic has a confirmed subscription to the URL in the AWS console. A "Pending Confirmation" state likely means that you entered the wrong URL while creating the subscription.
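You can also check the subscription state programmatically; an unconfirmed subscription reports the literal string "PendingConfirmation" in place of a real subscription ARN. A minimal boto3 sketch with a placeholder topic ARN:

import boto3

sns = boto3.client("sns")

resp = sns.list_subscriptions_by_topic(
    TopicArn="arn:aws:sns:us-east-1:123456789012:sumo-s3-notifications"
)
for sub in resp["Subscriptions"]:
    # Confirmed subscriptions show a full ARN here; unconfirmed ones
    # show "PendingConfirmation".
    print(sub["Endpoint"], "->", sub["SubscriptionArn"])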
In the web interface under Log File Discovery it shows a green check with "Sumo Logic has received an AWS validation request at this endpoint." but you still have high latencies.
The green check confirms that the endpoint was used correctly, but it does not mean Sumo is receiving notifications successfully.
Steps to troubleshoot:
- AWS writes CloudTrail and S3 Audit Logs to S3 with a latency of a few minutes. If you're seeing latencies of around 10 minutes for these Sources, it is likely because AWS is writing them to S3 later than expected.
- Verify you have enabled sending Notifications from your S3 bucket to the appropriate SNS topic. This is done in step 13.
Source: https://help.sumologic.com/03Send-Data/Sources/02Sources-for-Hosted-Collectors/Amazon-Web-Services/AWS-S3-Source