Tag Archives: search

February 26, 2014 · 1:13 PM

Avoid SharePoint eDiscovery Headaches Before They Begin

by Barry Murphy

Last week, this blog featured highlights from the very successful SharePoint eDiscovery webinar with Reed Smith. It is a topic that came up time and again in my tenure as an analyst; companies and government agencies are becoming tightly bound to SharePoint and want to be sure that eDiscovery ordeals can be avoided.

Microsoft SharePoint continues to gain widespread traction in organizations large and small. A recent Forrester Research survey found that 66% of respondents will deploy SharePoint server in the next 12 months (source: Forrester Research August 2013 Global SharePoint Usage online survey). The attraction to SharePoint is obvious: the system creates business benefits by enabling collaboration in efficient ways, providing ways to track versions of documents edited by multiple parties, allowing non-technical business people to apply basic workflow to content-driven processes, and providing faster access to information (via search and integration with the MS Office suite of apps).

For all the value it creates, SharePoint can cause eDiscovery and information governance headaches if proper planning does not take place. It is all too easy to assume that if information is searchable, eDiscovery will be no problem when the time comes. But, as is often the case in life, the devil is in the details. Because SharePoint allows users to add value to content (e.g., by adding workflow tasks), there is the factor of metadata to consider.

In eDiscovery, metadata is a critical component of ESI that can wreak havoc on collection efforts. When we talk about metadata for native ESI, we are usually concerned with the operating system (OS) fields that the file system maintains, for example in the File Allocation Table (FAT) on FAT volumes. Different file systems support a wide variety of fields, such as different dates, attributes, permissions and file name formats (long vs. short). These fields are not usually stored within the actual file and so are very vulnerable to alteration or complete loss when items are read or copied. Forensic collection is focused on preserving this ‘envelope’ information so that evidence can be authenticated and the context reconstructed in court. That is only half of the metadata story. Microsoft Office and other programs retain non-displayed information within the header and body of all common file types, especially with the adoption of the XML-based Office 2007 file formats.
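To make the fragility of that ‘envelope’ information concrete, here is a minimal Python sketch. It is an illustration only, not how any particular collection tool works: `capture_envelope` and `demo` are hypothetical names, and the file name and timestamp are made up. It records a file's file system metadata alongside an MD5 of its content, then shows that an ordinary copy keeps the content but resets the modified date, while a metadata-aware copy keeps both.

```python
import hashlib
import os
import shutil
import tempfile


def capture_envelope(path):
    """Record file system ("envelope") metadata plus an MD5 of the content."""
    st = os.stat(path)  # stat first: reading the file below may update its access time
    with open(path, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()
    return {
        "name": os.path.basename(path),
        "size": st.st_size,
        "modified": st.st_mtime,
        "accessed": st.st_atime,
        "mode": oct(st.st_mode),
        "md5": digest,
    }


def demo():
    """Copy one file two ways and return the envelope of each result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "memo.txt")  # hypothetical document
        with open(src, "w") as fh:
            fh.write("quarterly numbers")
        # Pretend the document was last modified in 2001.
        os.utime(src, (1_000_000_000, 1_000_000_000))

        before = capture_envelope(src)
        # shutil.copy copies content and permission bits, but not timestamps.
        plain = capture_envelope(shutil.copy(src, os.path.join(tmp, "plain.txt")))
        # shutil.copy2 additionally tries to preserve timestamps.
        kept = capture_envelope(shutil.copy2(src, os.path.join(tmp, "kept.txt")))
        return before, plain, kept
```

Calling `demo()` returns three records whose `md5` values match, while the `modified` value of the plain copy differs from the original. The content hash alone would not reveal that the envelope had been altered.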

This metadata issue is only compounded with SharePoint because most collection efforts are focused only on grabbing SharePoint document libraries (as they are stored on file systems). But SharePoint is so much more than document libraries (if it weren’t, it would simply be another file share sitting out there).

A defensible SharePoint collection solution will be able to capture document libraries and metadata, and truly enable contextual preservation. When considering a SharePoint collection tool, be sure that it:

Allows for incremental preservation: monitoring changes, identifying and preserving different document versions, and incrementally preserving to multiple matters over time
Maps to custodian access and prevents over-preservation, collecting and preserving only what is relevant to a particular custodian, not the entire SharePoint site
Is deployable in the same fashion as SharePoint, which tends to be highly decentralized and siloed; thus the collection solution should be deployable on demand, directly to SharePoint sites through a highly automated, remote installation, with intuitive administration and operation performed through a standard web browser
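The first requirement in the list above, incremental preservation across versions and matters, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a SharePoint integration: `PreservationStore` is a hypothetical class, documents are passed in as plain bytes, and no SharePoint API is called. Each distinct document version is stored once, keyed by its MD5, and each matter simply records which versions it holds.

```python
import hashlib


class PreservationStore:
    """Toy content-addressed store: each distinct document version is kept once."""

    def __init__(self):
        self.versions = {}  # md5 hex digest -> document bytes
        self.matters = {}   # matter name -> list of (document path, md5 hex digest)

    def preserve(self, matter, documents):
        """Preserve documents (path -> bytes) to a matter.

        Returns the paths whose content had not been stored before, so a
        repeat pass over unchanged documents stores nothing new.
        """
        holds = self.matters.setdefault(matter, [])
        newly_stored = []
        for path, data in sorted(documents.items()):
            digest = hashlib.md5(data).hexdigest()
            if digest not in self.versions:
                self.versions[digest] = data
                newly_stored.append(path)
            if (path, digest) not in holds:
                holds.append((path, digest))
        return newly_stored
```

In this sketch, preserving the same unchanged document a second time stores nothing, an edited version is added as a new entry, and a second matter that needs an already-preserved version only adds a reference to it.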

At the end of the day, informed customers will make sure that the collection tool can capture not just SharePoint document libraries but all metadata as well. Also, look for solutions that will not impact the production environment too heavily; you don’t want to bring SharePoint to its knees when it is a valuable business application. And finally, get legal and IT together on the same page about how to reasonably prove that your SharePoint preservation and collection methodologies and tools are defensible.

Filed under Information Governance

Tagged as Barry Murphy, collection, defensible, document libraries, e-discovery, eDiscovery, ESI, Forrester Research, metadata, Microsoft, Microsoft Office, preservation, Reed Smith, search, SharePoint, Webinar

November 18, 2013 · 9:21 AM

Dr. Michael Levitt: World Famous Scientist, Nobel Laureate, and X1 Power User

Michael Levitt
Nobel Prize in Chemistry 2013

Recently I had the distinct honor of speaking with Dr. Michael Levitt, winner of the 2013 Nobel Prize in Chemistry and a highly regarded Professor of Structural Biology at Stanford University. The Nobel Committee awarded Dr. Levitt a Nobel in recognition of his research in computational biology, “for the development of multiscale models for complex chemical systems.” He is also a “huge fan” of X1. When Dr. Levitt and I spoke, he discussed his daily use of X1 Search and how it is essential to his research and professional productivity. “X1 saves me many hours per week,” per his unsolicited email to us at X1 that initiated our dialogue, “I cannot survive without it.”

A computer-savvy scientist, Dr. Levitt relies on a Macintosh laptop with VMware virtualization running a Windows OS, where he stores 200 gigabytes of data, including more than 300,000 emails totaling 40 gigabytes, and of course relies on X1 to make sense of it all. “Next to my computer itself, X1 is the one tool I can’t do without,” explained Dr. Levitt. “People use the term ‘big data’ a lot these days, but the most important ‘big data’ for me is the 200 gigabytes on my laptop that consists of decades of research, important communications with fellow academics, and other key resources. X1 enables me to find what I am looking for instantaneously. It is a very effective interface to all of my information.”

Dr. Levitt credits X1’s lightning-fast, iterative and faceted search capability, along with X1’s reliability and stability, as enabling him to quickly and tactically sift through 200 gigabytes of emails and academic research. “X1 is an intimate part of my workflow — it is essentially an extension of my mind when I engage in information retrieval, which is many times an hour during my workday.”

In addition to locating his research and other critical data, X1 proved very handy to Dr. Levitt in managing an important email response project. “When I was awarded the Nobel, I received over two thousand congratulatory emails. I used X1 to cross reference my sent folder to make sure I replied to them all. That X1 shortcut saved me several hours alone!”

Dr. Levitt’s testimonial echoes similar sentiments expressed by many high-powered business professionals at top financial institutions, major law firms, consulting companies and science and engineering firms. They all rely on X1 to dramatically enhance their productivity by quickly locating their information amongst an ever-increasing avalanche of emails and other data.

We here at X1 extend our congratulations to Dr. Levitt for his 2013 Nobel Prize in Chemistry, as well as our sincere thanks to him for reaching out to us and sharing his enthusiastic feedback on X1 Search, which, incidentally, he offered completely gratis. “Just keep developing great software” is all he asked for in return.

___________________________________________________________________

For more information about X1 Search 8, including a free 14-day trial, please visit here.

Filed under Case Study, Desktop Search

Tagged as 2013, business professionals, Chemistry, fast, Laureate, Michael Levitt, Nobel, Nobel Prize, productivity, Scientist, search, Stanford University, winner, X1

May 29, 2013 · 9:13 AM

X1 Rises Again

Earlier this month Robert Mitchell at Computerworld proclaimed that X1 had reemerged in the world of search with X1 Search 8, the new release of our flagship and industry-leading X1 desktop search. (See: X1 Rises Again with Desktop Search 8, Virtual Edition). I think Computerworld is spot on and aptly describes the response to and success of X1 Search 8. X1S8 (or “8”) is a major advancement of X1 Search. As mentioned in our recent webinar showcasing X1S8, I want to thank our hundreds of thousands of loyal and longtime X1 customers, plus the many new customers who have joined us in the weeks since our highly successful launch of 8.

Overall the response has been tremendous! Since the May 7 release, we have seen shattered sales records coupled with very exciting feedback from our customers, new and old. Based upon the feedback, three improvements to X1S8 are particularly resonating. First, the new and streamlined interface, combined with a faster and more responsive product, provides an enhanced and highly intuitive user experience. Second, our business users from enterprises large and small are very happy with the built-in and integrated SharePoint support (see video here). Finally, the unique support of virtual desktop infrastructure (VDI) demonstrates that X1 is once again a cutting-edge technology that supports our customers’ current as well as future requirements.

X1 has always been a great solution. In July 2010, Network World declared the previous version of X1 to be the leader in its class, selecting X1 as its Clear Choice winner ahead of competitors such as the native Windows Outlook search and Google desktop search. Perhaps not coincidentally, shortly after Network World anointed X1, Google announced the end of life for its desktop search. Additionally, with Windows 8, Microsoft is apparently moving away from integrated desktop search, as the latest Windows OS no longer features a federated search option.

What this means is that X1 not only has risen again, but has a clear lead in the field of search as 8 is a major upgrade to our award-winning previous version. This is particularly true given the virtualization capabilities of X1S8, which we will now be rolling out to many large enterprises that previously could not support any desktop search within their VDI environment. We believe that eventually desktops will be hosted in the cloud, which will require a cloud-based virtual desktop architecture, which X1 already and uniquely supports. So X1 is ready for the cloud when our customers are.

Recently X1 has enjoyed the support of a number of investors who have enabled us to double down on our support of current customers and the development of next-generation search and eDiscovery solutions to support enterprises large and small. This includes our enterprise eDiscovery strategy, but it also means channeling additional resources into our core X1 desktop search technology. Look for more innovations to support our loyal customer base, including new mobile support and hybrid cloud search. And if you haven’t yet seen X1 Search 8, please take it for a spin with a free trial available at this link.

Filed under Desktop Search, Virtualized Environment

Tagged as Computerworld, desktop search, Google, Network World, Robert Mitchell, search, SharePoint, VDI, Windows 8, X1

June 27, 2012 · 2:50 PM

Authenticating Internet Web Pages as Evidence: a New Approach

By John Patzakis and Brent Botta

In recent posts, we have addressed the issue of evidentiary authentication of social media data. (See previous entries here and here). General Internet site data available through standard web browsing, instead of social media data provided by APIs or user credentials, presents slightly different but just as compelling challenges.

The Internet provides torrential amounts of evidence potentially relevant to litigation matters, with courts routinely facing proffers of data preserved from various websites. This evidence must be authenticated in all cases, and the authentication standard is no different for website data or chat room evidence than for any other. Under Federal Rule of Evidence 901(a), “The requirement of authentication … is satisfied by evidence sufficient to support a finding that the matter in question is what its proponent claims.” United States v. Simpson, 152 F.3d 1241, 1249 (10th Cir. 1998).

Ideally, a proponent of the evidence can rely on uncontroverted direct testimony from the creator of the web page in question. In many cases, however, that option is not available. In such situations, the testimony of the viewer/collector of the Internet evidence “in combination with circumstantial indicia of authenticity (such as the dates and web addresses), would support a finding” that the website documents are what the proponent asserts. Perfect 10, Inc. v. Cybernet Ventures, Inc. (C.D.Cal.2002) 213 F.Supp.2d 1146, 1154. (emphasis added) (See also, Lorraine v. Markel American Insurance Company, 241 F.R.D. 534, 546 (D.Md. May 4, 2007) (citing Perfect 10, and referencing MD5 hash values as an additional element of potential “circumstantial indicia” for authentication of electronic evidence)).

One of the many benefits of X1 Social Discovery is its ability to preserve and display to the user all the available “circumstantial indicia” (to borrow the Perfect 10 court’s term) in order to present the best case possible for the authenticity of Internet-based evidence collected with the software. This includes collecting all available metadata and generating an MD5 checksum or “hash value” of the preserved data.

But HTML web pages pose unique authentication challenges, and merely generating an MD5 checksum of the entire web page, or just the web page source file, provides limited value because web pages are constantly changing due to their very fluid and dynamic nature. In fact, a web page collected twice in immediate succession would very likely yield two different MD5 checksums. This is because web pages typically feature links to many external items that are dynamically loaded upon each page view. These external links point to cascading style sheets (CSS), graphical images, JavaScript files and other supporting files. This linked content can be stored on another server in the same domain, but is often located somewhere else on the Internet.

When the Web browser loads a web page, it consolidates all these items into one viewable page for the user. Since the Web page source file contains only the links to the files to be loaded, the MD5 checksum of the source file can remain unchanged even if the content of the linked files becomes completely different. Therefore, the content of the linked items must be considered in assessing the authenticity of the Web page. X1 Social Discovery addresses these challenges by first generating an MD5 checksum log representing each item that constitutes the Web page, including the main Web page’s source. Then an MD5 representing the content of all the items contained within the web page is generated and preserved.
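The general idea of a per-item checksum log plus one overall checksum can be sketched in Python as follows. This is an illustration under stated assumptions, not X1 Social Discovery's actual implementation: the post does not say how the overall MD5 is derived, so this sketch simply hashes the sorted log of URLs and item digests, and `hash_page_items` is a hypothetical name.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def hash_page_items(items):
    """Build a per-item MD5 log and one composite MD5 for a captured page.

    items maps each item's URL (page source, CSS, images, scripts) to the
    raw bytes that were collected for it.
    """
    # Sort by URL so the result does not depend on download order.
    log = {url: md5_hex(body) for url, body in sorted(items.items())}

    # Assumption: the composite is an MD5 over the log entries themselves.
    composite = hashlib.md5()
    for url, digest in log.items():
        composite.update(f"{url}\t{digest}\n".encode("utf-8"))
    return log, composite.hexdigest()
```

If two captures share an identical page source but a linked style sheet has changed, the source's entry in the log stays the same while the style sheet's entry and the composite value both differ, which is exactly the case a checksum of the source file alone would miss.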

To further complicate Web collections, entire sections of a Web page are often not visible to the viewer. These hidden areas serve various purposes, including metatagging for Internet search engine optimization. The servers that host Websites can either store static Web pages or dynamically created pages that usually change each time a user visits the Website, even though the actual content may appear unchanged.

In order to address this additional challenge, X1 Social Discovery utilizes two different MD5 fields for each item that makes up a Web page. The first is the acquisition hash, which is computed from the actual collected information. The second is the content hash. The content hash is based on the actual “BODY” of a Web page and ignores the hidden metadata. By taking this approach, the content hash will show if the user-viewable content has actually changed, not just a hidden metadata tag provided by the server. To illustrate, below is a screenshot from the metadata view of X1 Social Discovery for website capture evidence, reflecting the generation of MD5 checksums for individual objects on a single webpage:

The time stamp of the capture and the URL of the web page are also documented in the case. By generating hash values of all individual objects within the web page, the examiner is better able to pinpoint any changes that may have occurred in subsequent captures. Additionally, if there is a specific item appearing on the web page, such as an incriminating image, then it is important to have an individual MD5 checksum of that key piece of evidence. Finally, any document file found on a captured web page, such as a PDF, PowerPoint, or Word document, will also be individually collected by X1 Social Discovery with corresponding acquisition and content hash values generated.
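The distinction between an acquisition hash and a content hash, and its use in comparing captures, can be sketched in Python. This is a simplified illustration, not the product's actual logic: the function names are hypothetical, the body is extracted with a basic regular expression rather than a full HTML parser, and items without a body element (images, style sheets) fall back to hashing the whole item.

```python
import hashlib
import re

# Simplified: grabs whatever sits between <body ...> and </body>.
BODY_RE = re.compile(rb"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def acquisition_hash(raw: bytes) -> str:
    """MD5 of the item exactly as collected."""
    return hashlib.md5(raw).hexdigest()


def content_hash(raw: bytes) -> str:
    """MD5 of the body only, ignoring head-level metadata when a body exists."""
    match = BODY_RE.search(raw)
    body = match.group(1) if match else raw
    return hashlib.md5(body).hexdigest()


def changed_items(first, second):
    """Compare two captures (URL -> bytes) and describe what differs per item."""
    report = {}
    for url in sorted(set(first) | set(second)):
        a, b = first.get(url), second.get(url)
        if a is None or b is None:
            report[url] = "missing from one capture"
        elif content_hash(a) != content_hash(b):
            report[url] = "content changed"
        elif acquisition_hash(a) != acquisition_hash(b):
            report[url] = "hidden metadata changed only"
    return report
```

Two captures of a page that differ only in a hidden meta tag produce different acquisition hashes but the same content hash, so `changed_items` reports that page as "hidden metadata changed only", while an image whose bytes differ is reported as "content changed".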

We believe this approach to authentication of website evidence is unique in its detail and presents a new standard. This authentication process supports the equally innovative automated and integrated web collection capabilities of X1 Social Discovery, which is the only solution of its kind to collect website evidence through either a one-off capture or full crawling, including on a scheduled basis, and to make that information instantly reviewable in native file format through a federated search that includes multiple pieces of social media and website evidence in a single case. In all, X1 Social Discovery is a powerful solution to effectively collect from social media and general websites across the web for both relevant content and all available “circumstantial indicia.”

Filed under Authentication, Best Practices, Preservation & Collection

Tagged as authenticating, authentication, circumstantial incidia, collection, evidence, forensic, Inc. v. Cybernet Ventures, Inc.;, internet evidence, internet site data, internet web pages, investigation, litigation, Lorraine v. Markel American Insurance Company, MD5 checksum, MD5 hash, Perfect 10, preservation, search, social media, time stamp, United v. Simpson, website site investigation, X1 Discovery, X1 Social Discovery

April 4, 2012 · 2:30 PM

Defining Truly Cloud-Capable eDiscovery Software

Last week we discussed the challenges of searching and collecting data in Infrastructure as a Service (IaaS) cloud deployments (such as the Amazon cloud or Rackspace) for eDiscovery purposes. Today we discuss what is needed for eDiscovery and enterprise search vendors to provide a truly cloud-capable solution and provide a decoder ring of sorts to cut through the hype. And there is a lot of hype now that the cloud has become the latest eDiscovery hot button, with vendor marketing claims far surpassing actual capabilities.

In fact, many eDiscovery and enterprise software vendors claim to support the cloud, but are simply re-branding their long-existing SaaS offerings, which really has nothing to do with supporting IaaS. Barry Murphy of the eDiscovery Journal aptly identified this marketing practice as “cloud washing.” Data hosting, especially where the vendor’s manual labor is routinely required to upload and process data, does not meet defined cloud standards. Neither does a process that primarily exports data through APIs or other means out of its resident cloud environment to slowly migrate the cloud data to the vendor tools, instead of deploying the tools (and their processing power) to the data where it resides in the cloud. In order to truly support IaaS cloud deployments, eDiscovery and enterprise search software must meet the following three core requirements:

1. Automated installation and virtualization: The eDiscovery and search solution must immediately and rapidly install, execute and efficiently operate in a virtualized environment without rigid hardware requirements or on-site physical access. This is impossible if the solution is fused to hardware appliances or otherwise requires a complex on-site installation process. Hardware appliance solutions are by definition not cloud deployable, and enterprise search installations often require many months of labor to install and configure, so it is a significant question whether many of these vendors will be able to support robust IaaS cloud deployments in the reasonably foreseeable future.

2. On-demand self-service: In its definition of cloud computing, the National Institute of Standards and Technology (NIST) identifies on-demand self-service as an essential characteristic of the cloud where a “consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.”

Many hosted eDiscovery services require shipping of data to the provider or extensive behind-the-scenes manual labor to load and configure the systems for data ingestion. Conversely, solutions that truly support cloud IaaS will spin up, ingest data and fully operate in an automated fashion without the need for manual on-premises labor for configuration or data import.

3. Rapid elasticity: NIST describes this characteristic as capabilities that “scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.” This important benefit of cloud computing is accomplished by a parallelized software architecture designed to dynamically scale out over potentially several dozen virtualized servers to enable rapid ingestion, processing and analysis of data sets in that cloud environment. This capability would allow several terabytes of data to be indexed and processed within 2 to 4 hours on a highly automated basis at far less cost than non-cloud eDiscovery efforts.
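A quick back-of-the-envelope calculation shows why scaling out matters for that kind of turnaround. The Python sketch below uses illustrative figures only: 3 terabytes stands in for "several terabytes" and 36 servers for "several dozen", and the function name is hypothetical. It also assumes the work divides perfectly evenly across servers, which real workloads rarely achieve.

```python
def per_server_throughput_mb_s(total_terabytes, servers, hours):
    """Sustained MB/s each server must process, assuming perfectly even scaling."""
    total_megabytes = total_terabytes * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return total_megabytes / (servers * hours * 3600)


# Illustrative figures, not measurements:
four_hours = per_server_throughput_mb_s(3, 36, 4)     # about 5.8 MB/s per server
two_hours = per_server_throughput_mb_s(3, 36, 2)      # about 11.6 MB/s per server
single_server = per_server_throughput_mb_s(3, 1, 4)   # about 208 MB/s on one machine
```

Under these assumptions each of the 36 servers needs to sustain only a few megabytes per second, whereas a single machine would have to index and process more than 200 MB/s continuously for four hours to finish the same job.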

However, many characteristics of leading eDiscovery solutions fundamentally prevent them from supporting this core cloud requirement. Most eDiscovery early case assessment solutions are developed and configured toward a monolithic processing schema designed to operate on a single expensive hardware apparatus. While recently spawning some bold marketing claims of high speeds and feeds, such architecture is very ill-suited to the cloud, which is powered by highly distributed processing across multitudes of servers. Additionally, many of the leading eDiscovery and enterprise search solutions are tightly integrated with third-party databases and other OEM technology that cannot be easily decoupled (and also present possible licensing constraints), making such elasticity physically and even legally impossible.

So is there eDiscovery software that will truly support the IaaS cloud based upon these requirements, and address up to terabytes of data? Stay tuned….

Filed under Cloud Data, Enterprise eDiscovery, IaaS

Tagged as Amazon, automated installation, cloud, cloud-capable, cloud-deployable, collect, eDiscovery, enterprise, Enterprise Search, IaaS, Infrastructure as a Service, Rackspace, rapid elasticity, search, self service, terabytes of data, virtualization
