Using MongoDB/GridFS and Spring Data
Published: 2019-05-25


I recently delved into MongoDB for the first time, and although I was skeptical at first, I now believe it is my preference to use a NOSQL database over a traditional RDBMS. I rarely just fall in love with a new technology, but the flexibility, ease of use, scalability and versatility of Mongo are good reasons to give it a chance. Here are some of the advantages of MongoDB.

  • NOSQL – A more object-oriented way to access your data, with no complex SQL commands to learn or remember
  • File Storage – Mongo is a master of storing flat files. Relational databases have never been good at this.
  • No DBA – The requirement for database administration is greatly minimized with NOSQL solutions
  • No schema, complex structures or normalization. This can be both a good thing and a bad thing. Inevitably, everyone has worked on a project that was over-normalized and hated it.
  • No complex join logic

Spring Data for Mongo

My first stop when coding against Mongo was to figure out how Spring supported it and, without fail, I was not disappointed. Spring Data provides a MongoTemplate and a GridFsTemplate for dealing with Mongo. GridFS is the Mongo file storage mechanism that allows you to store whole files in Mongo. The Mongo NOSQL database utilizes a JSON-like object storage technique, and GridFS uses BSON (Binary JSON) to store file data.

As the name implies, a NOSQL database doesn’t use any SQL statements for data manipulation, but it does have a robust mechanism to accomplish the same ends. Before we start interacting with Mongo, let’s look at some of the components I used to accomplish the examples I am going to show you.

  • Spring 3.1.0.RELEASE
  • Spring Data MongoDB 1.1.0.M2
  • Mongo Java Driver 2.8.0
  • AspectJ (Optional) 1.7.0
  • Maven (Optional) LATEST

The very first thing we need to configure is our context.xml file. I always start a project with one of these but I use Spring annotations as much as possible to keep the file clean.

In short, the context file is setting up a few things.

  • The database factory that the templates will use to get a connection
  • The MongoTemplate and GridFSTemplate
  • Annotation support
  • Annotation @Configuration support if needed (Optional)
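Putting those pieces together, a minimal mongo-config.xml could look something like the sketch below. The bean names, the localhost connection details and the database name "test" are assumptions for illustration, not taken from the original project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:mongo="http://www.springframework.org/schema/data/mongo"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://www.springframework.org/schema/context
         http://www.springframework.org/schema/context/spring-context.xsd
         http://www.springframework.org/schema/data/mongo
         http://www.springframework.org/schema/data/mongo/spring-mongo.xsd">

  <!-- Database factory the templates use to obtain connections -->
  <mongo:db-factory id="mongoDbFactory" host="localhost" port="27017" dbname="test"/>

  <!-- Default converter used by the GridFsTemplate (registered under id "mappingConverter") -->
  <mongo:mapping-converter/>

  <!-- The MongoTemplate and GridFsTemplate -->
  <bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
    <constructor-arg ref="mongoDbFactory"/>
  </bean>
  <bean id="gridFsTemplate" class="org.springframework.data.mongodb.gridfs.GridFsTemplate">
    <constructor-arg ref="mongoDbFactory"/>
    <constructor-arg ref="mappingConverter"/>
  </bean>

  <!-- Annotation support and component scanning for @Service beans -->
  <context:annotation-config/>
  <context:component-scan base-package="com.doozer.mongospring"/>

  <!-- @Configurable support via AspectJ weaving (optional) -->
  <context:spring-configured/>
</beans>
```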

Let’s take a look at my App class, which is the main entry point for this Java application.

...
@Configurable
public class App {

    @Autowired
    public MongoOperations mongoOperation;

    @Autowired
    public StorageService storageService;

    ApplicationContext ctx;

    public App() {
        ctx = new GenericXmlApplicationContext("mongo-config.xml");
    }
...
I am using AspectJ to weave my dependencies and inject them at compile or load time. If you are not using AspectJ, you need to look up the MongoOperations and StorageService beans from the context itself. The StorageService is a simple @Service bean that provides an abstraction on top of the GridFsTemplate.

...
@Service("storageService")
public class StorageServiceImpl implements StorageService {

    @Autowired
    private GridFsOperations gridOperation;

    @Override
    public String save(InputStream inputStream, String contentType, String filename) {
        DBObject metaData = new BasicDBObject();
        metaData.put("meta1", filename);
        metaData.put("meta2", contentType);
        GridFSFile file = gridOperation.store(inputStream, filename, metaData);
        return file.getId().toString();
    }

    @Override
    public GridFSDBFile get(String id) {
        System.out.println("Finding by ID: " + id);
        return gridOperation.findOne(new Query(Criteria.where("_id").is(new ObjectId(id))));
    }

    @Override
    public List<GridFSDBFile> listFiles() {
        return gridOperation.find(null);
    }

    @Override
    public GridFSDBFile getByFilename(String filename) {
        return gridOperation.findOne(new Query(Criteria.where("filename").is(filename)));
    }
}
...

Our StorageServiceImpl is merely making calls to the GridFsOperations object and simplifying them. This class is not strictly necessary, since you can inject the GridFsOperations object into any class, but if you are planning on keeping a clean separation so that you can swap out Mongo/GridFS later for something else, it makes sense.

Mongo Template

Now we are ready to interact with Mongo. First, let’s deal with creating and saving some textual data. The operations below show a few examples of working with data in the Mongo database using the MongoTemplate.

User user = new User("1", "Joe", "Coffee", 30);

// save
mongoOperation.save(user);

// find
User savedUser = mongoOperation.findOne(
        new Query(Criteria.where("id").is("1")), User.class);
System.out.println("savedUser : " + savedUser);

// update
mongoOperation.updateFirst(
        new Query(Criteria.where("firstname").is("Joe")),
        Update.update("lastname", "Java"), User.class);

// find
User updatedUser = mongoOperation.findOne(
        new Query(Criteria.where("id").is("1")), User.class);
System.out.println("updatedUser : " + updatedUser);

// delete
// mongoOperation.remove(
//         new Query(Criteria.where("id").is("1")), User.class);

// list
List<User> listUser = mongoOperation.findAll(User.class);
System.out.println("Number of user = " + listUser.size());
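The queries above assume a User class along these lines. This is only a sketch: the field names (id, firstname, lastname, age) are inferred from the queries, and the getters, setters and toString are my assumptions about the rest of the class.

```java
// A minimal sketch of the User POJO used in the examples; no Spring Data
// annotations are required for it to be saved and queried.
class User {

    private String id;
    private String firstname;
    private String lastname;
    private int age;

    public User() {
        // default no-arg constructor used by Spring Data when mapping documents
    }

    public User(String id, String firstname, String lastname, int age) {
        this.id = id;
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getFirstname() { return firstname; }
    public void setFirstname(String firstname) { this.firstname = firstname; }

    public String getLastname() { return lastname; }
    public void setLastname(String lastname) { this.lastname = lastname; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    @Override
    public String toString() {
        return "User [id=" + id + ", firstname=" + firstname
                + ", lastname=" + lastname + ", age=" + age + "]";
    }
}
```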
As you can see, it is fairly easy to interact with Mongo using Spring and a simple User object. The User object is just a POJO with no special annotations. Now, let’s interact with files using our StorageService abstraction over GridFS.

// StorageService storageService = (StorageService) ctx.getBean("storageService"); // if not using AspectJ weaving

String id = storageService.save(
        App.class.getClassLoader().getResourceAsStream("test.doc"),
        "doc", "test.doc");

GridFSDBFile file1 = storageService.get(id);
System.out.println(file1.getMetaData());

GridFSDBFile file = storageService.getByFilename("test.doc");
System.out.println(file.getMetaData());

List<GridFSDBFile> files = storageService.listFiles();
for (GridFSDBFile file2 : files) {
    System.out.println(file2);
}
The great thing about Mongo is that you can store metadata about the file itself. Let’s look at the output of our file as printed by the code above.

{ "_id" : { "$oid" : "502a61f6c2e662074ea64e52"} , "chunkSize" : 262144 , "length" : 1627645 , "md5" : "da5cb016718d5366d29925fa6a2bd350" , "filename" : "test.doc" , "contentType" : null , "uploadDate" : { "$date" : "2012-08-14T14:34:30.071Z"} , "aliases" : null , "metadata" : { "meta1" : "test.doc" , "meta2" : "doc"}}

Using Mongo, you can associate any metadata with your file you wish and retrieve the file by that data at a later time. Spring support for GridFS is in its infancy, but I fully expect it to only grow as all Spring projects do.

Query Metadata

The power of Mongo also lies in the metadata concepts I mentioned earlier; relational databases just don’t have this concept. Mongo stores implicit metadata about the files, and it also allows you to attach any data you wish as custom metadata. You can query this data in the same fashion you would query Mongo directly, using dot notation.

gridOperation.findOne(new Query(Criteria.where("metadata.meta1").is("test.doc")));

Map Reduce

Mongo offers MapReduce, a powerful mechanism for batch processing and aggregation that is somewhat similar to SQL’s GROUP BY. The MapReduce algorithm breaks a big task into two smaller steps. The map function takes a large input and divides it into smaller pieces, then hands that data off to a reduce function, which distills the individual answers from the map function into one final output. This can be quite a challenge to get your head around when you first look at it, as it requires embedded JavaScript. I highly recommend reading the MongoDB documentation on MapReduce before attempting to write any MapReduce code.
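To make the map/reduce split concrete, here is a hedged sketch of the two JavaScript functions you could run from the mongo shell to count documents per author. The "posts" collection and "author" field are hypothetical examples, not taken from this article.

```javascript
// Map: called once per document; emits a (key, value) pair per document.
var mapFn = function () {
    emit(this.author, 1); // one count, keyed by the document's author
};

// Reduce: called with a key and the array of values emitted for that key;
// must distill them into a single value (here, a sum of the counts).
var reduceFn = function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
};

// In the mongo shell you would then run something like:
// db.posts.mapReduce(mapFn, reduceFn, { out: "posts_per_author" });
```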

Full-Text Search

MongoDB has no inherent mechanism for searching the text stored in GridFS files; however, this isn’t a unique limitation, as most relational databases also have problems with this or require very expensive add-ons for the functionality. There are a few approaches that could serve as a start if you are using the Java language. The first would be to simply extract the text and attach it as metadata on the file object. That is a really messy solution and screams of inefficiency, but for smaller files it is a possibility. A more ideal solution would be to use a full-text engine such as Apache Lucene to create a searchable index of the file content and store that index alongside the files.

Scaling with Sharding

While very difficult to say in mixed company, Sharding describes MongoDB’s ability to scale horizontally automatically. Some of the benefits of this process as described by the Mongo web site are:

  • Automatic balancing for changes in load and data distribution
  • Easy addition of new machines without down time
  • Scaling to one thousand nodes
  • No single points of failure
  • Automatic failover

Configuration

  • One to 1000 shards. Shards are partitions of data. Each shard consists of one or more mongod processes which store the data for that shard. When multiple mongod processes are in a single shard, they each store the same data – that is, they replicate to each other.
  • Either one or three config server processes. For production systems use three.
  • One or more mongos routing processes.

For testing purposes, it’s possible to start all the required processes on a single server, whereas in a production situation a number of deployment topologies are possible.

Once the shards (mongod processes), config servers, and mongos processes are running, configuration is simply a matter of issuing a series of commands to establish the various shards as part of the cluster. Once the cluster has been established, you can begin sharding individual collections.
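That series of commands could look roughly like the sketch below for a single-machine test cluster. This is illustrative only: the ports, data paths, and the "blog.posts" collection and its shard key are hypothetical, and a real deployment would spread these processes across machines.

```shell
# Start two shards, a config server, and a mongos router (test layout only)
mongod --shardsvr  --port 27018 --dbpath /data/shard1 &
mongod --shardsvr  --port 27019 --dbpath /data/shard2 &
mongod --configsvr --port 27020 --dbpath /data/config &
mongos --configdb localhost:27020 --port 27017 &

# Then, from a mongo shell connected to the mongos:
#   sh.addShard("localhost:27018")
#   sh.addShard("localhost:27019")
#   sh.enableSharding("blog")
#   sh.shardCollection("blog.posts", { "author" : 1 })
```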

Import, Export and Backup

Getting data in and out of Mongo is very simple and straight forward. Mongo has the following commands that allow you to accomplish these tasks:

  • mongoimport
  • mongoexport
  • mongodump
  • mongorestore

You can even delve into the data at hand to export pieces and parts of collections by specifying them in the commands and mixing in dot notation, or you can choose to dump data by using a query.

$ ./mongodump --db blog --collection posts --out - > blogposts.bson
$ ./mongodump --db blog --collection posts \
      -q '{"created_at" : { "$gte" : {"$date" : 1293868800000},
                            "$lt"  : {"$date" : 1296460800000} } }'

mongodump even takes an --oplog argument to get point-in-time backups. Mongo’s backup capabilities are as robust as any relational database’s.

Limitations of MongoDB

Mongo has a few limitations. In some ways, a few of these limitations can be seen as benefits as well.

  • No Joining across collections
  • No transactional support
  • No referential integrity support
  • No full text search for GridFS files built in
  • Traditional SQL-driven reporting tools like Crystal Reports and business intelligence tools are useless with Mongo

Conclusions

The advantages of MongoDB as a database far outweigh the disadvantages. I would recommend a Mongo NOSQL database for any project regardless of the programming language you are using; Mongo has drivers for everything. I do, however, think that in certain scenarios where you are dealing with rapid, realtime OLTP transactions, MongoDB may fall short of competing with a high-performance RDBMS such as Oracle. For the average IT project, I believe Mongo is well-suited. If you still aren’t sold on Mongo by now (I would be pretty shocked if you weren’t), then feast your eyes on the companies that are using MongoDB as their backend database today.

  • FourSquare
  • Bit.ly
  • github
  • Eventbrite
  • Grooveshark
  • Craigslist
  • Intuit

The list goes on and on… There are also several other NOSQL solutions out there that enjoy popularity.

  • CouchDB
  • RavenDB
  • CouchBase

Optional Components

I used several optional components for my exercises. I wanted to address these for the folks who may not be familiar with them.

AspectJ and @Configurable

Many folks would ask why I chose to use aspect weaving instead of just looking up the objects from the context in the App object. @Configurable allows you to use the @Autowired annotation on a class that is not managed by the Spring context. This process requires load-time or compile-time weaving to work. In Eclipse I use the AJDT plugin, and for Maven I use the AspectJ plugin to achieve this. The weaving process looks for certain aspects and then weaves the dependencies into the byte code. It solves a lot of chicken-and-egg problems when dealing with Spring.

Maven

If you are using Maven and you want all of the dependencies I used for the examples, here is the pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.doozer</groupId>
  <artifactId>MongoSpring</artifactId>
  <packaging>jar</packaging>
  <version>1.0</version>
  <name>MongoSpring</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <spring.version>3.1.0.RELEASE</spring.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.6.6</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>1.6.6</version>
      <exclusions>
        <exclusion>
          <artifactId>slf4j-api</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.6.6</version>
      <exclusions>
        <exclusion>
          <artifactId>slf4j-api</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-core</artifactId>
      <version>${spring.version}</version>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-context</artifactId>
      <version>${spring.version}</version>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-aop</artifactId>
      <version>${spring.version}</version>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-aspects</artifactId>
      <version>${spring.version}</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongo-java-driver</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.aspectj</groupId>
      <artifactId>aspectjweaver</artifactId>
      <version>1.7.0</version>
    </dependency>
    <dependency>
      <groupId>org.aspectj</groupId>
      <artifactId>aspectjrt</artifactId>
      <version>1.7.0</version>
    </dependency>
    <dependency>
      <groupId>org.springframework.data</groupId>
      <artifactId>spring-data-mongodb</artifactId>
      <version>1.1.0.M2</version>
    </dependency>
    <dependency>
      <groupId>cglib</groupId>
      <artifactId>cglib</artifactId>
      <version>2.2</version>
    </dependency>
    <dependency>
      <groupId>javax.persistence</groupId>
      <artifactId>persistence-api</artifactId>
      <version>1.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>prepare-package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/lib</outputDirectory>
              <overWriteReleases>false</overWriteReleases>
              <overWriteSnapshots>false</overWriteSnapshots>
              <overWriteIfNewer>true</overWriteIfNewer>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>com.doozer.mongospring.core.App</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>aspectj-maven-plugin</artifactId>
        <version>1.6</version>
        <configuration>
          <aspectLibraries>
            <aspectLibrary>
              <groupId>org.springframework</groupId>
              <artifactId>spring-aspects</artifactId>
            </aspectLibrary>
          </aspectLibraries>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
